Recent comments in /f/MachineLearning
Pas7alavista t1_jeg5dhh wrote
Reply to comment by mattsverstaps in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
>so the extra dimensions are unnecessary
Yes, one reason for embedding is to extract relevant features.
Also, any finite-dimensional inner product space has an orthonormal basis, and the math is easiest that way, so there's not much reason to describe a space using non-orthogonal dimensions. There's also nothing stopping you from doing so, though.
>Doesn't it suggest a pattern in data if a mapping is found that reduces dimension
Yeah generally you wouldn't attempt to use ML methods on data where you think there is no pattern
>Something something Linear algebra
I think you might be thinking of the span and/or basis, but it's hard for me to interpret your question.
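To make the earlier dimension-reduction point concrete, here's a minimal sketch (just numpy, with made-up toy data): if the data really does lie along one line, PCA finds an orthonormal basis in which a single dimension carries essentially all the variance — that line is the "pattern".

```python
import numpy as np

# Toy data: 200 points in 3 dimensions that (up to noise) all lie along one line,
# i.e. every point is roughly a scalar multiple of a single direction vector.
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, -0.5])
t = rng.uniform(-1, 1, size=200)
X = np.outer(t, direction) + 0.01 * rng.normal(size=(200, 3))

# PCA via SVD of the centered data: the singular values tell us how much
# variance each (orthonormal) principal direction explains.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(explained)  # e.g. [~0.999, tiny, tiny] -> one dimension captures the "pattern"
```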
darthmeck t1_jeg46cc wrote
Reply to [D][N] LAION Launches Petition to Establish an International Publicly Funded Supercomputing Facility for Open Source Large-scale AI Research and its Safety by stringShuffle
I don’t know how they’d go about doing this, but there need to be provisions ensuring it can never become a for-profit agency. OpenAI gained traction by doing cutting-edge research and touting it as open to the public (or at least to researchers), then pulled the rug out from under everyone when they struck gold. If LAION discovers a new architecture that dwarfs the capability of LLMs, they should never be able to say “ok, time to start a company and mint billions now!”.
lxe t1_jeg2h5j wrote
Reply to comment by aliasaria in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Thank you. I much appreciate the explanation.
planetofthemapes15 t1_jeg1iqc wrote
Reply to [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
Cool, I had a mental model very similar to this which I was planning on implementing next week. I'll just try yours and if I make an improvement I'll submit a PR.
PassingTumbleweed t1_jeg11bt wrote
Reply to [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
Thanks for sharing! Can you explain the internals a bit more? How do you convert the user input into GPT prompt(s) and how do you turn the response(s) into a probability distribution?
Appropriate-Crab-379 t1_jefz9og wrote
Reply to comment by dreaming_geometry in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
There’s a ton of noise, and not every technique is worth learning, because in a few years a bunch of these concepts will be superseded by something new.
Dapper_Cherry1025 t1_jefywqj wrote
Reply to comment by KerfuffleV2 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Well, that's probably because I specifically asked it to use an internal monologue. I think what I'm trying to say is that each part of its response does seem to flow in a logical way that I found easy to understand. Heck, when I refined my prompt down for 3.5 I was able to get it to admit that it couldn't come up with a solution when I tried to get a more complicated example.
I also find it very interesting that when chatgpt starts a sentence with something like "Yes, because..." I know right away that the answer is probably incorrect, because after it replies "Yes" it will then try to justify the yes even if it is wrong. However, if you can get it to investigate a problem as shown in the example, it can actually try different things before arriving at a solution.
turnip_burrito t1_jefysiz wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
My prompt:
> Suppose I have an N>>1 dimensional space, finite in extent along any given axis, in which a set of M random vectors are dispersed (each coordinate of each vector is randomly sampled from a uniform distribution spanning some bounded range of the space). What can we say about the distances in this space between the M vectors?
I left my prompt open ended to not give it any ideas one way or another.
Its response makes sense to me. The spread of the pairwise distances relative to their mean should shrink as dimension N grows: each squared distance is a sum over N independent coordinate differences, so its fluctuations average out. If N is large, the distribution of pairwise distances narrows until nearly all points are roughly the same distance from each other. (The random sampling is a way to build in lack of correlation, like the unrelated ideas you mentioned.)
Of course, the reverse is also true: if dimension N is small, originally "far" points can end up closer or farther (which way is unpredictable and depends on which dimensions are removed), because the averaging over random sample fluctuations disappears.
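For what it's worth, a quick numpy sketch of the same claim (the parameter choices are arbitrary):

```python
import numpy as np
from itertools import combinations

# Sample M random vectors with coordinates drawn uniformly from [0, 1] and look
# at how spread out the pairwise distances are, relative to their mean, as the
# dimension N grows.
rng = np.random.default_rng(0)
M = 200
for N in [2, 10, 100, 1000]:
    X = rng.uniform(0, 1, size=(M, N))
    dists = np.array([np.linalg.norm(X[i] - X[j]) for i, j in combinations(range(M), 2)])
    print(f"N={N:5d}  mean={dists.mean():.3f}  std/mean={dists.std() / dists.mean():.3f}")
# std/mean shrinks as N grows: in high dimension almost all pairs are about equally far apart.
```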
Zealousideal_Low1287 t1_jefx466 wrote
Reply to comment by shn29 in [D] [R] On what kind of machine does Midjorney, the art generating AI, runs on? by shn29
Somewhere between a high end gaming card and 8 A100s probably. So £1k - 160k ish worth of GPUs?
mattsverstaps t1_jefvu45 wrote
Reply to comment by Pas7alavista in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
So the extra dimensions are unnecessary? I just realised that there could be some situations in which non orthogonal dimensions are preferable. I can’t exactly think of them. Doesn’t it suggest a pattern in data if a mapping is found that reduces the dimension? Like I picture from linear algebra 101 finding a line that everything is a multiple of so one dimension would do and that line is a ‘pattern’? Sorry I’m high.
itsyourboiirow t1_jefs1oh wrote
Reply to [D] Simple Questions Thread by AutoModerator
Any people/organizations to follow on Twitter for all things machine learning (traditional ML, deep neural networks, LLMs, etc.)?
monks-cat t1_jefqotb wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
Context radically changes the "distance" between concepts. So in your example isotropy isn't necessarily a desired property of a LLM. In poetry, for example, we combine two concepts that would seemingly be very far apart in the original space but should be mapped rather closely in the embedding.
The problem I see with this whole idea, though, is that a "concept" doesn't inherently seem to be represented by a list of features. Two concepts interacting aren't necessarily the intersection of their features.
I'll try to see if I can come up with concrete examples in language.
pasr9 t1_jefqoii wrote
Reply to comment by polawiaczperel in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
I'm more interested in them releasing the dataset used to fine tune it.
lacker t1_jefpz2c wrote
Reply to [D][N] LAION Launches Petition to Establish an International Publicly Funded Supercomputing Facility for Open Source Large-scale AI Research and its Safety by stringShuffle
I’m a big fan of open source AI research, but creating a new facility doesn’t seem like the way to go. If you’re making a GPU cluster that has to be shared among a bunch of different academic groups, you’ll have to build resource-sharing software, infrastructure tools, etc, and spend all this money on what is essentially an AWS clone.
Wouldn’t it be more effective to simply give this money to AI research groups and let them buy infrastructure from the most cost-effective provider? If AWS works best, fine, if it’s some smaller infrastructure provider, that’s fine too.
This proposal seems like it would actually divert money away from AI, by spending a lot of money rebuilding the standard cluster infrastructure stuff that cloud providers already have.
ReasonableObjection t1_jefpe2n wrote
Reply to comment by grotundeek_apocolyps in [D] AI Explainability and Alignment through Natural Language Internal Interfaces by jackfaker
Thank you so much for the thoughtful reply!
Will read into these and may reach out to you with other questions.
Edit - as far as how I'm feeling... at the moment just curious, been asking lots of questions about this the last few days and reading any resources people are kind enough to share :-)
artsybashev t1_jefp0o2 wrote
Reply to comment by Sopel97 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Yes, the only thing they can do is ban you from their service.
shn29 OP t1_jefmsm2 wrote
Reply to comment by MotionTwelveBeeSix in [D] [R] On what kind of machine does Midjorney, the art generating AI, runs on? by shn29
I think I pointed out that they're doing it at large scale. I was curious how much processing power it would require if I wanted to run it locally. And at the end of the post I said "theoretically of course".
MotionTwelveBeeSix t1_jefm3ak wrote
Requirements for running locally aren’t related to what the company uses; they’re serving thousands or tens of thousands of images every minute, and you’re not.
You can run Stable Diffusion locally, at fairly high resolution and with equally high-quality models, on pretty much any modern graphics card.
Ricenaros t1_jefllyl wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
What does (in)finite have to do with anything? Infinity is an abstract mathematical concept used for modeling purposes and has nothing to do with physical reality.
WikiSummarizerBot t1_jefl95t wrote
Reply to comment by grotundeek_apocolyps in [D] AI Explainability and Alignment through Natural Language Internal Interfaces by jackfaker
>Landauer's principle is a physical principle pertaining to the lower theoretical limit of energy consumption of computation. It holds that an irreversible change in information stored in a computer, such as merging two computational paths, dissipates a minimum amount of heat to its surroundings.
Solomonoff's theory of inductive inference
>Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction.
grotundeek_apocolyps t1_jefl7kd wrote
Reply to comment by ReasonableObjection in [D] AI Explainability and Alignment through Natural Language Internal Interfaces by jackfaker
The crux of the matter is that there are fundamental limitations to the power of computation. It is physically impossible to create an AI, or any other kind of intelligent agent, that can overpower everything else in the physical world by virtue of sheer smartness.
Depending on where you're coming from, this is not an easy thing to understand; it usually requires a lot of education. The simplest metaphor I've thought of is the speed of light: it seems intuitively plausible that a powerful enough rocket ship should be able to fly faster than the speed of light, but the laws of physics prohibit it.
Similarly, it seems intuitively plausible that a smart enough agent should be able to solve any problem arbitrarily quickly, thereby enabling it to (for example) conquer the world or destroy humanity, but that too is physically impossible.
There are a lot of ways to understand why this is true. I'll give you a few places to start.
- Landauer's principle: infinite computation would require infinite resources
- Solomonoff induction is uncomputable: the optimal general method of Bayesian induction is literally impossible to compute, even in principle
- chaotic dynamics cannot be predicted: control requires prediction, but the finite precision of measurement and the aforementioned limits on computation mean that our control over the world is fundamentally limited, and intelligence can never overcome this fact (see the sketch below)
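To make that last point a bit more tangible, here's a tiny self-contained sketch. The logistic map is a textbook chaotic system; the specific numbers are arbitrary choices of mine:

```python
# Two trajectories of the logistic map at r=4 that start 1e-12 apart become
# completely decorrelated after a few dozen steps, so any finite measurement
# precision puts a hard ceiling on how far ahead you can predict, no matter
# how much compute or intelligence you throw at it.
r = 4.0
x, y = 0.3, 0.3 + 1e-12  # initial conditions differing by one part in 10^12

for step in range(60):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 10 == 9:
        print(f"step {step + 1:2d}: |x - y| = {abs(x - y):.3e}")
# The gap grows roughly exponentially until it is as large as the attractor itself.
```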
The people who have thought about this "for 30+ years" and come to a different conclusion are charlatans. I don't know of a gentler way of putting it. What do you tell someone when they ask you to explain why someone who has been running a cult for 30 years isn't really talking directly to god?
Something to note on the more psychological end of things is that a person's ability to understand things is fundamentally limited by their understanding of their own emotions. The consequence of this is that you should also be thinking about how you're feeling when you're reading hysterical nonsense about the robot apocalypse, because that's going to affect how likely you are to believe things that aren't true. People often fixate on things that have a strong emotional valence, irrespective of their accuracy.
KerfuffleV2 t1_jefkhxs wrote
Reply to comment by Dapper_Cherry1025 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
> Something about these distillations feels fundamentally different than when interacting with the larger models.
It may not have anything to do with size. ChatGPT is just adding a lot of comfort-phrases into its response instead of just responding. "Hmm, this is an interesting challenge", "Let's see", etc. Some of that may be based on the system prompt, some of it may be training to specifically produce more natural sounding responses.
All "Hmm", "interesting challenge" and stuff that makes it sound like a person isn't actually adding any actual information that's relevant to answering the query though. (Also, you may be paying for those extraneous tokens.)
Jean-Porte t1_jeg5xpd wrote
Reply to [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
How does this compare to Hugging Face zero-shot NLI pipelines, e.g. https://huggingface.co/sileod/deberta-v3-base-tasksource-nli ?
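For anyone unfamiliar, the NLI-based pipeline being referenced looks roughly like this (model name taken from the link above; the example text and labels are made up):

```python
from transformers import pipeline

# Zero-shot classification via NLI: the model scores each candidate label as a
# hypothesis entailed by the input text.
classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",
)
result = classifier(
    "The battery died after two days of light use.",
    candidate_labels=["battery life", "screen quality", "shipping"],
)
print(result["labels"], result["scores"])
```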