Recent comments in /f/MachineLearning
lgastako t1_jeayn8v wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
I know training a model from scratch will work, but the context of the conversation is fine-tuning an existing model. I'm saying I would love to see examples of the claims people are making actually working, because I have only been able to find and create examples of it not working very well at all.
Ricenaros t1_jeax41q wrote
Reply to comment by RecoilS14 in [D] Simple Questions Thread by AutoModerator
I would suggest picking up either pytorch or tensorflow and sticking with one of these while you learn (personally I'd choose pytorch). It'll be easy to go back and learn the other one if needed once you get more comfortable with the material.
Ricenaros t1_jeawpf3 wrote
Reply to comment by sparkpuppy in [D] Simple Questions Thread by AutoModerator
It refers to the number of scalars needed to specify the model. At the heart of machine learning is matrix multiplication. Consider an input vector x of size (n x 1) and a linear transformation y = Wx + b. In this case, the (m x n) matrix W (the weights) and the (m x 1) vector b (the bias) are the model parameters. Learning consists of tweaking W and b in a way that lowers the loss function. For this simple linear layer there are m*n + m scalar parameters (the elements of W plus the elements of b).
Hyperparameters on the other hand are things like learning rate, batch size, number of epochs, etc.
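For a concrete picture, here's a minimal sketch (assuming PyTorch; the sizes are made up for illustration) that counts the parameters of a single linear layer:

```python
import torch.nn as nn

# A linear layer computing y = Wx + b, with n = 10 inputs and m = 3 outputs
layer = nn.Linear(in_features=10, out_features=3)

# Count every scalar in W and b
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 3*10 + 3 = 33 (30 entries of W plus 3 entries of b)
```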
Hope this helps.
alpolvovolvere t1_jeavq3v wrote
Reply to [D] Simple Questions Thread by AutoModerator
I'm trying to use Whisper in Python to produce a transcription of an 8-minute Japanese-language mp4. No matter which model I use, the script's execution screeches to a halt after a few seconds, going from 9 MiB/s to around 200 KiB/s. Is this a "thing"? Like, is it just something that everyone knows about? Is there a way to make this faster?
sandys1 t1_jeav3l4 wrote
Reply to comment by jaxolingo in [D] The best way to train an LLM on company data by jaxolingo
I'm not able to DM you. Can you please DM me?
FermiAnyon t1_jeauumy wrote
This topic in general is super interesting...
So the big difference between humans and these large transformers, on paper, is that humans learn to model things in their environments whether it's tools or people or whatever and it's on that basis that we use analogy and make predictions about things. But we ultimately interact with a small number of inputs, basically our five senses... so the thing I find super interesting is the question of whether these models, even ones that just interact with text, are learning to model just the text itself or if they're actually learning models of things that, with more data/compute would enable them to model more things...
I guess the question at hand is whether this ability to model things and make analogies and abstract things is some totally separate process that we haven't started working with yet, or whether it's an emergent property of just having enough weights to basically be properly isotropic with regard to the actual complexity of the world we live in.
Jadien t1_jeastxv wrote
I've only skimmed the link (and its sub-links), but the basic idea is this:
If you've trained a model to predict the next move in an Othello game, given the board state as an input, you cannot necessarily conclude that the model also has the ability to perform similar tasks, like "Determine whether a given move is legal" or "Determine what the board state will be after executing a move". Those abilities might help a model predict the next move but are not required.
However:
> Context: A recent paper trained a model to play legal moves in Othello by predicting the next move, and found that it had spontaneously learned to compute the full board state - an emergent world representation.
In the process of optimizing the model's ability to predict moves, the model did also develop the ability to compute the next board state, given the initial state, previous moves, and predicted move (thank you, /u/ditchfieldcaleb).
The author's contribution:
> I find that actually, there's a linear representation of the board state!
>
> This is evidence for the linear representation hypothesis: that models, in general, compute features and represent them linearly, as directions in space! (If they don't, mechanistic interpretability would be way harder)
Which is to say that the model's internal prediction of the next board state is fairly interpretable by humans: There's some square-ish set of activations in the model that correspond to the square-ish Othello board. That's another property of the model that is a reasonable outcome but isn't a foregone conclusion.
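As a rough illustration of what a "linear probe" means here, a toy sketch with random stand-ins for the real Othello-GPT activations and labels (not the author's actual setup):

```python
import torch
import torch.nn as nn

d_model, n_squares, n_states = 512, 64, 3       # hidden size, 8x8 board, empty/mine/theirs
acts = torch.randn(1000, d_model)               # stand-in for residual-stream activations
board = torch.randint(0, n_states, (1000, n_squares))  # stand-in board-state labels

probe = nn.Linear(d_model, n_squares * n_states)  # a single linear map, no nonlinearity
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    logits = probe(acts).view(-1, n_squares, n_states)
    loss = loss_fn(logits.reshape(-1, n_states), board.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

High probe accuracy on held-out positions would be the kind of evidence described above for a feature (the board state) being represented linearly, i.e. as directions in activation space.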
theotherquantumjim t1_jearyqw wrote
Reply to comment by TitusPullo4 in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents
It absolutely is not.
step21 t1_jearoln wrote
It means he says it has a representation of its world, not just statistics. He may or may not be right. (Also, I didn't read all of it yet; it's f'ing long.)
EquipmentStandard892 t1_jeaqt6u wrote
Reply to comment by JustOneAvailableName in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I've already had that in mind. I found an interesting paper about integrating LLMs in a specific way designed to handle autonomous task execution given a direct objective/goal. Combining that with this RNN approach seems to be the way to go for increasing the cognitive development of the whole system. Using the RNN the way our subconscious does, and indexing this into a vector space capable of hybrid search (or something like SPLADE search engines), or even building a neural attention graph network to store the rules that aggregate the raw tokens into the vector space, could drastically improve the performance of small language models, maybe leading to further optimization beyond the token limit span.
Article about integrating memory and task/objectives using multiple LLM instances: https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/
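For what it's worth, here's a toy sketch of the "index into a vector space and retrieve" part (random stand-in embeddings; a real setup would use a sentence-embedding model and a vector database like the Pinecone one described in the linked article):

```python
import numpy as np

rng = np.random.default_rng(0)
memory_texts = ["completed task A", "notes on objective B", "result of subtask C"]
memory_vecs = rng.normal(size=(len(memory_texts), 384))   # stand-in embeddings
memory_vecs /= np.linalg.norm(memory_vecs, axis=1, keepdims=True)

query_vec = rng.normal(size=384)
query_vec /= np.linalg.norm(query_vec)

scores = memory_vecs @ query_vec        # cosine similarity (vectors are unit-norm)
top = np.argsort(scores)[::-1][:2]      # the two most relevant "memories"
print([memory_texts[i] for i in top])
```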
qalis t1_jeaqs4u wrote
Airflow, Metaflow, ZenML, Kedro
Daveboi7 t1_jeapo15 wrote
Reply to comment by matterhayes in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Nice one
Evening_Ad6637 t1_jeapgrs wrote
Reply to comment by machineko in [D] Training a 65b LLaMA model by Business-Lead2679
That sounds very interesting. I'm sorry if this question is trivial or stupid, but I'm an absolute newcomer to this field. Is there a way to train the model as you describe it here (https://xturing.stochastic.ai/quickstart) with only, or almost only, CPU power? The thing is, I have the following specs: an i5 @ 3.5 GHz, 16 GB DDR4 RAM, and only a Radeon Pro 575 4 GB graphics card. But since I saw how fast Alpaca runs on my CPU and RAM, I'm hoping I could also fine-tune a LLaMA model with this equipment. I would be very grateful for more information about possibilities in this direction.
Clicketrie t1_jeapcmv wrote
Reply to "[D]" Is wandb.ai worth using? by frodo_mavinchotil
At least with tools like Comet and W&B you have authentication and you can avoid logging anything you don't want logged. MLflow has no authentication.
saintshing t1_jeaowjz wrote
Reply to comment by A_Light_Spark in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I almost missed it too. There are too many new results.
The craziest thing is that it's all done by one person, while the big tech companies are all working on transformer models.
[deleted] t1_jeanzis wrote
Reply to comment by fishybird in [D] Simple Questions Thread by AutoModerator
[removed]
MysteryInc152 t1_jeanjj9 wrote
Reply to comment by ChuckSeven in [D] Can large language models be applied to language translation? by matthkamis
>LLM can do translation but they are significantly worse than translation models trained on translation data.
This is not true at all lol. They're better by a wide margin.
MysteryInc152 t1_jeanb01 wrote
Reply to comment by ChuckSeven in [D] Can large language models be applied to language translation? by matthkamis
>LLM trained on a multi-lingual corpus can be prompted to translate but they are far inferior to actual translation models.
No lol. You would know this if you'd ever actually tried to translate with GPT-4 and the like. They're far superior to the current SOTA.
https://github.com/ogkalu2/Human-parity-on-machine-translations
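If anyone wants to try it themselves, here's a minimal sketch of prompting an LLM for translation (assuming the OpenAI Python client >= 1.0 and an OPENAI_API_KEY in the environment; the model name and prompt are just illustrative):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a professional Japanese-to-English translator."},
        {"role": "user", "content": "Translate to English: 吾輩は猫である。名前はまだ無い。"},
    ],
    temperature=0,  # keep the output as deterministic as possible for translation
)
print(resp.choices[0].message.content)
```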
currentscurrents t1_jean0il wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Other researchers are working on an LLM for whales.
Looks feasible to me; whale calls are no more alien to the computer than English is. The hard part is collecting enough data.
r_linux_mod_isahoe t1_jeamasz wrote
Reply to comment by Crow-Scare in [D] Directed Graph-based Machine Learning Pipeline tool? by Driiper
Those guys are so committed they call their pipelines DAGs.
turfptax OP t1_jealzsn wrote
Reply to comment by StudentBrilliant3388 in [N] Predicting Finger Movement and Pressure with Machine Learning and Open Hardware Bracelet by turfptax
Thank you!
Everything we produce is open source and open hardware in the hopes that it helps people and makes the resources for the tech freely available to everyone.
I truly believe that machine learning can help humanity and that it will advance so many technologies in the years to come :D
WokeAssBaller t1_jealxm2 wrote
Reply to comment by lgastako in [D] The best way to train an LLM on company data by jaxolingo
Train one from scratch
Nobodyet94 t1_jealqfl wrote
Reply to comment by gmork_13 in [D] Simple Questions Thread by AutoModerator
Thanks! Well, I have a GTX 1660 and 16 GB of RAM, and yes, it has to be a transformer used for vision. The fact is that I'm not creative enough to choose a project, haha.
SatoshiNotMe t1_jealb7d wrote
Reply to comment by matterhayes in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Is there a "nice" way to use this model, (say, via the command-line like in the GPT4All or alpaca.cpp repos), rather than in a databricks notebook or in HG spaces? For example I'd like to chat with it on my M1 MacBook Pro. Any pointers appreciated!
Adventurous_Win8348 t1_jeazn5c wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hi, I want to make an ML model that can listen to the sound of the road, tell what kind of vehicles are passing (auto, lorry, or bus), count how many vehicles have passed through, and give real-time feedback. I don't know how to code.