Recent comments in /f/MachineLearning
nullbyte420 t1_jecf0se wrote
Reply to comment by waxroy-finerayfool in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
I love it when software people spontaneously discover 20th century French philosophy. Check out Saussure and Baudrillard in particular, who wrote a lot on literally this some 60 years ago.
UseNew5079 t1_jecefwx wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Excellent-quality responses from this model. This could actually be usable.
[deleted] t1_jece89n wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
[removed]
Rei1003 t1_jecdo3h wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
What's the point of these 10B models? It now seems more reasonable to work with either 100B models (via API) or 1B models.
Koda_20 t1_jecddbs wrote
Reply to comment by currentscurrents in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I feel like they're starting with whales because it generates more publicity, because of Nemo lol
They're probably not, but I thought it was funny.
grotundeek_apocolyps t1_jecdbtm wrote
Reply to [D] AI Explainability and Alignment through Natural Language Internal Interfaces by jackfaker
"AI alignment" / "AI safety" are not credible fields of study.
wind_dude t1_jecct1i wrote
Reply to comment by KerfuffleV2 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Ahh, sorry, I was referring to the dataset pulled from ShareGPT that was used for fine-tuning. ShareGPT has disappeared since the media hype about Google using it for Bard.

Yes, the LLaMA weights are everywhere, including on HF in converted form for HF Transformers.
WokeAssBaller t1_jecc92g wrote
Reply to comment by lgastako in [D] The best way to train an LLM on company data by jaxolingo
What a waste of time
MasterEpictetus t1_jecc69k wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Why am I only hearing about this now? It sounds amazing!
sebzim4500 t1_jecbyml wrote
Reply to comment by IntrepidTieKnot in [R] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - Yaobo Liang et al Microsoft 2023 by Singularian2501
I think the "feedback to API developers" idea is novel and useful.
KerfuffleV2 t1_jecbxy7 wrote
Reply to comment by wind_dude in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
It's based on LLaMA, so it has basically the same problem as anything based on LLaMA. From the repo: "We plan to release the model weights by providing a version of delta weights that build on the original LLaMA weights, but we are still figuring out a proper way to do so." Edit: Never mind.
You will still probably need a way to get hold of the original LLaMA weights (which isn't the hardest thing...).
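For anyone wondering what "delta weights" means in practice, here is a minimal sketch of how such deltas are typically merged back into base weights. The paths and the exact merge procedure are assumptions for illustration, not the project's actual release script:

```python
# Hypothetical sketch: merging delta weights into base LLaMA weights.
# Paths are placeholders; the real release may ship its own conversion script.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("path/to/llama-13b-hf", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("path/to/vicuna-13b-delta", torch_dtype=torch.float16)

# Recover the fine-tuned weights by adding each delta tensor to the matching base tensor.
merged_state = base.state_dict()
for name, delta_param in delta.state_dict().items():
    merged_state[name] = merged_state[name] + delta_param

base.load_state_dict(merged_state)
base.save_pretrained("path/to/vicuna-13b-merged")
```

The point of distributing only deltas is that the fine-tuned model can be shared without redistributing the original LLaMA weights themselves.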
wind_dude t1_jecbli5 wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
What are the concerns with the release of the [ShareGPT] dataset? I really hope it does get released, since it looks like ShareGPT has shut down API access, and even web access.
roselan t1_jecbcr4 wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Results from the demo are amazingly good for a 13b model. I'm floored!
I wonder how much memory the demo needs to run.
lgastako t1_jecb96v wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
Ok, so can you point me to an example of it working well?
wind_dude t1_jec9lb4 wrote
Reply to comment by lazybottle in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Interesting, I didn't realise the dataset was on HF with a different license. The dataset (https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json) is also in the code repo, which has the Apache 2.0 license, so the dataset would be covered by it.
gmork_13 t1_jec8rzf wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Nice.
Quick question: is ChatGPT assumed to be 100% on that chart, or has it been rated as 100% without knowing it is rating itself? I'm assuming ChatGPT here is GPT-4.
lazybottle t1_jec8i0c wrote
Reply to comment by wind_dude in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Alpaca is not Apache 2.0
https://huggingface.co/datasets/tatsu-lab/alpaca#licensing-information
> The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0).
Edit: I see the source of confusion. https://github.com/tatsu-lab/stanford_alpaca
While the code is released under Apache 2.0, the instruct dataset, as pointed out by OP, is not. One could potentially reproduce the steps, possibly with human ground truth, and release under a more amenable data license.
EvilMegaDroid t1_jec89t6 wrote
Reply to comment by Zealousideal-Ice9957 in [D] FOMO on the rapid pace of LLMs by 00001746
That would be insane (I mean, as noted, it's not impossible, given that people have come together to improve big open-source projects like Linux, mpv, etc.).
I checked it out for a while but got confused. Is everyone supposed to be able to access the data? Because I could not.
CatalyzeX_code_bot t1_jec3evn wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Found relevant code at https://github.com/dmlc/mxnet-memonger + all code implementations here
andreichiffa t1_jec26vk wrote
Reply to comment by Jadien in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
Which is basically the self-attention mechanism plus the universal-approximator nature of NNs. So I am not sure what that proves or what is new about it.
gmork_13 t1_jebwta4 wrote
Reply to comment by Nobodyet94 in [D] Simple Questions Thread by AutoModerator
Just pick the one that doesn't require too much compute (don't go for too high res images) and make sure you can find tutorials or guides for it.
danielbln t1_jebwikt wrote
Reply to comment by IntrepidTieKnot in [R] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - Yaobo Liang et al Microsoft 2023 by Singularian2501
Yeah, sounds exactly like chain-of-thought reasoning with tools, i.e. LangChain and also ChatGPT plugins.
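Roughly, the pattern is a loop where the model either answers or asks for a tool call. The sketch below is purely illustrative: the `llm` callable, the JSON protocol, and the tool registry are all made-up assumptions, not LangChain's or the plugins' actual API:

```python
# Illustrative tool-use loop: the model either answers or requests a tool call.
import json

def run_agent(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    transcript = question
    for _ in range(max_steps):
        # Assumed: llm returns JSON like {"tool": ..., "input": ...} or {"answer": ...}
        msg = json.loads(llm(transcript))
        if "answer" in msg:
            return msg["answer"]
        result = tools[msg["tool"]](msg["input"])  # call the requested tool
        transcript += f"\nTool {msg['tool']} returned: {result}"
    return "No answer within step limit."
```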
j_lyf t1_jebvmnp wrote
Where do embeddings come into all of this?
thecity2 t1_jebvmmo wrote
Reply to [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Lol I just figured out why it’s called LLaMA. Guess I have some catching up to do. 🫤😆
Business-Lead2679 OP t1_jecfagu wrote
Reply to comment by Rei1003 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
The main point of these open-source 10B models is to make them fit on average consumer hardware while still providing great performance, even offline. A 100B model is hard to train because of its size, and even harder to host on a server powerful enough to handle multiple requests at the same time while still providing good response generation speed. Not to mention how expensive that can be to run. When it comes to 1B models, they usually do not achieve good performance, as they do not have enough data. Some models of this size are good, yes, but a 10B model is usually significantly better if trained correctly, and it can still fit on consumer hardware.
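As a rough back-of-the-envelope illustration of why ~13B is the sweet spot for consumer hardware (assuming ~2 bytes per parameter in fp16 and ~0.5 bytes with 4-bit quantization, and ignoring activations and KV cache):

```python
# Back-of-the-envelope weight-memory estimates (assumed bytes/param; ignores activations/KV cache).
def weight_memory_gb(n_params_billion: float, bytes_per_param: float) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for size in (1, 13, 100):
    fp16 = weight_memory_gb(size, 2.0)   # fp16/bf16
    int4 = weight_memory_gb(size, 0.5)   # 4-bit quantized
    print(f"{size:>4}B params: ~{fp16:.1f} GB fp16, ~{int4:.1f} GB 4-bit")
```

Under those assumptions a 13B model is roughly 24 GB in fp16 but only ~6 GB at 4-bit, which fits on a single consumer GPU, while a 100B model needs on the order of 200 GB in fp16 and stays out of reach for most home setups.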