Recent comments in /f/MachineLearning
nullbyte420 t1_jecf0se wrote
Reply to comment by waxroy-finerayfool in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
I love it when software people spontaneously discover 20th century French philosophy. Check out Saussure and Baudrillard in particular, who wrote a lot on literally this some 60 years ago.
UseNew5079 t1_jecefwx wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Excellent-quality responses from this model. This could actually be usable.
[deleted] t1_jece89n wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
[removed]
Rei1003 t1_jecdo3h wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
What's the point of these 10B models? It now seems more reasonable to work with either 100B models (via API) or 1B models.
Koda_20 t1_jecddbs wrote
Reply to comment by currentscurrents in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I feel like they're starting with whales because it generates more publicity, because of Nemo lol
They're probably not, but I thought it was funny.
grotundeek_apocolyps t1_jecdbtm wrote
Reply to [D] AI Explainability and Alignment through Natural Language Internal Interfaces by jackfaker
"AI alignment" / "AI safety" are not credible fields of study.
wind_dude t1_jecct1i wrote
Reply to comment by KerfuffleV2 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Ahh, sorry, I was referring to the dataset pulled from ShareGPT that was used for fine-tuning. ShareGPT has disappeared since the media hype about Google using it for Bard.

Yes, the LLaMA weights are everywhere, including on HF in converted form for HF Transformers.
WokeAssBaller t1_jecc92g wrote
Reply to comment by lgastako in [D] The best way to train an LLM on company data by jaxolingo
What a waste of time
MasterEpictetus t1_jecc69k wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Why am I only hearing about this now? It sounds amazing!
sebzim4500 t1_jecbyml wrote
Reply to comment by IntrepidTieKnot in [R] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - Yaobo Liang et al Microsoft 2023 by Singularian2501
I think the "feedback to API developers" idea is novel and useful.
KerfuffleV2 t1_jecbxy7 wrote
Reply to comment by wind_dude in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
It's based on LLaMA, so it has basically the same problem as anything based on LLaMA. From the repo: "We plan to release the model weights by providing a version of delta weights that build on the original LLaMA weights, but we are still figuring out a proper way to do so." Edit: Never mind.
You will still probably need a way to get hold of the original LLaMA weights (which isn't the hardest thing...).
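For anyone wondering what "delta weights" means in practice, here is a minimal sketch of how such deltas are typically merged back into base weights. The paths and the exact merge procedure are assumptions for illustration, not the project's actual release script:

```python
# Hypothetical sketch: merging delta weights into base LLaMA weights.
# Paths are placeholders; the real release may ship its own conversion script.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("path/to/llama-13b-hf", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("path/to/vicuna-13b-delta", torch_dtype=torch.float16)

# Recover the fine-tuned weights by adding each delta tensor to the matching base tensor.
merged_state = base.state_dict()
for name, delta_param in delta.state_dict().items():
    merged_state[name] = merged_state[name] + delta_param

base.load_state_dict(merged_state)
base.save_pretrained("path/to/vicuna-13b-merged")
```

The point of distributing only deltas is that the fine-tuned model can be shared without redistributing the original LLaMA weights themselves.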
wind_dude t1_jecbli5 wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
What are the concerns with the release of the [ShareGPT] dataset? I really hope it does get released, since it looks like ShareGPT has shut down API access, and even web access.
roselan t1_jecbcr4 wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Results from the demo are amazingly good for a 13b model. I'm floored!
I wonder how much memory the demo needs to run.
lgastako t1_jecb96v wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
Ok, so can you point me to an example of it working well?
wind_dude t1_jec9lb4 wrote
Reply to comment by lazybottle in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Interesting, I didn't realise the dataset was on HF with a different license. The dataset (https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json) is also in the code repo, which has the Apache 2.0 license, so the dataset would be covered by it.
gmork_13 t1_jec8rzf wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Nice.
Quick question: is ChatGPT assumed to be 100% on that chart, or has it been rated as 100% without knowing it is rating itself? I'm assuming ChatGPT here is GPT-4.
lazybottle t1_jec8i0c wrote
Reply to comment by wind_dude in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Alpaca is not Apache 2.0
https://huggingface.co/datasets/tatsu-lab/alpaca#licensing-information
> The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0).
Edit: I see the source of confusion. https://github.com/tatsu-lab/stanford_alpaca
While the code is released under Apache 2.0, the instruct dataset, as pointed out by OP, is not. One could potentially reproduce the steps, possibly with human ground truth, and release under a more amenable data license.
EvilMegaDroid t1_jec89t6 wrote
Reply to comment by Zealousideal-Ice9957 in [D] FOMO on the rapid pace of LLMs by 00001746
That would be insane (I mean, as noted, it's not impossible, given that people have come together to improve big open-source projects like Linux, mpv, etc.).
I checked it out for a while but got confused. Is everyone supposed to be able to access the data? Because I could not.
CatalyzeX_code_bot t1_jec3evn wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Found relevant code at https://github.com/dmlc/mxnet-memonger + all code implementations here
andreichiffa t1_jec26vk wrote
Reply to comment by Jadien in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
Which is basically the self-attention mechanism plus the universal-approximator nature of NNs. So I am not sure what that proves or what is new about it.
gmork_13 t1_jebwta4 wrote
Reply to comment by Nobodyet94 in [D] Simple Questions Thread by AutoModerator
Just pick the one that doesn't require too much compute (don't go for too high res images) and make sure you can find tutorials or guides for it.
danielbln t1_jebwikt wrote
Reply to comment by IntrepidTieKnot in [R] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - Yaobo Liang et al Microsoft 2023 by Singularian2501
Yeah, sounds exactly like chain-of-thought reasoning with tools, i.e. LangChain and also ChatGPT plugins.
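Roughly, the pattern is a loop where the model either answers or asks for a tool call. The sketch below is purely illustrative: the `llm` callable, the JSON protocol, and the tool registry are all made-up assumptions, not LangChain's or the plugins' actual API:

```python
# Illustrative tool-use loop: the model either answers or requests a tool call.
import json

def run_agent(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    transcript = question
    for _ in range(max_steps):
        # Assumed: llm returns JSON like {"tool": ..., "input": ...} or {"answer": ...}
        msg = json.loads(llm(transcript))
        if "answer" in msg:
            return msg["answer"]
        result = tools[msg["tool"]](msg["input"])  # call the requested tool
        transcript += f"\nTool {msg['tool']} returned: {result}"
    return "No answer within step limit."
```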
j_lyf t1_jebvmnp wrote
Where do embeddings come into all of this?
thecity2 t1_jebvmmo wrote
Reply to [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Lol I just figured out why it’s called LLaMA. Guess I have some catching up to do. 🫤😆
Business-Lead2679 OP t1_jecfagu wrote
Reply to comment by Rei1003 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
The main point of these open-source 10B models is to make them fit on average consumer hardware while still providing great performance, even offline. A 100B model is hard to train because of its size, and even harder to host on a server powerful enough to handle multiple requests at the same time while still providing good response generation speed. Not to mention how expensive that can be to run. When it comes to 1B models, they usually do not achieve good performance, as they do not have enough data. Some models of this size are good, yes, but a 10B model is usually significantly better if trained correctly, and it can still fit on consumer hardware.
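As a rough back-of-the-envelope illustration of why ~13B is the sweet spot for consumer hardware (assuming ~2 bytes per parameter in fp16 and ~0.5 bytes with 4-bit quantization, and ignoring activations and KV cache):

```python
# Back-of-the-envelope weight-memory estimates (assumed bytes/param; ignores activations/KV cache).
def weight_memory_gb(n_params_billion: float, bytes_per_param: float) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for size in (1, 13, 100):
    fp16 = weight_memory_gb(size, 2.0)   # fp16/bf16
    int4 = weight_memory_gb(size, 0.5)   # 4-bit quantized
    print(f"{size:>4}B params: ~{fp16:.1f} GB fp16, ~{int4:.1f} GB 4-bit")
```

Under those assumptions a 13B model is roughly 24 GB in fp16 but only ~6 GB at 4-bit, which fits on a single consumer GPU, while a 100B model needs on the order of 200 GB in fp16 and stays out of reach for most home setups.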