gmork_13 t1_je9e6wu wrote

I'm wondering the same thing.
In the LoRA paper they compared pros and cons against other adapter methods (and LoRA won out). Though you could technically use both, you'd probably pick one.

Indeed, this adapter wins out over LoRA on adapter weight size, but since we're talking about megabytes the difference is almost negligible (in this scenario). It's a shame they didn't include LoRA training time in their comparison.
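
For reference, here's a minimal sketch of the low-rank update LoRA applies to a frozen linear layer, just to show why the saved adapter weights land in the MB range (the dimensions and rank below are illustrative, not the paper's settings):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (W + B @ A)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init, so the update starts at 0
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# A 4096x4096 projection at r=8 adds only ~65k trainable params,
# so the saved adapter checkpoint scales with the rank, not the full matrix.
layer = LoRALinear(nn.Linear(4096, 4096, bias=False), r=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 65536
```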

They say 1hr on 8*A100, whereas the alpaca-LoRA github says 4-5 hrs on 1*4090.
Eight A100s (assuming the 80GB variant) is 640GB of VRAM, as opposed to the 4090's 24GB - there are also differences in speed, and the alpaca-LoRA repo may have run on an 8-bit quantized base model.

Since the adapter paper says nothing about quantization, I'm assuming it's 640GB of VRAM used on the full fp32 (or fp16?) 7B model for one hour, compared to the alpaca-LoRA repo, which runs in 24GB of VRAM on an int8 7B model for 4-5 hrs.
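
As a rough sanity check on those numbers, a back-of-envelope calculation for the weight memory alone (gradients, optimizer state and activations come on top of this):

```python
# Weight memory for a 7B-parameter model at different precisions (weights only).
params = 7e9
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB")
# fp32: 28 GB, fp16: 14 GB, int8: 7 GB
```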

They both train on the Stanford Alpaca dataset, but alpaca-LoRA trains for 3 epochs on the cleaned dataset whereas LLaMA-Adapter trains on the full dataset for 5 epochs.
That's a lot of small differences to account for if you're trying to figure out what's faster.
It can be done, but the question remains whether the end result is comparable and whether it was trained to an optimal point.

Since the authors trained alpaca-LoRA themselves, why didn't they report how long alpaca-LoRA took in their comparison table? They trained on the same hardware and dataset, I assume.

If the only differences between this adapter and others are, as they mention in the paper, the gating, zero-init and multi-modality, then the downsides of adapters mentioned in the LoRA paper (inference bottlenecks) might still hold. I'm no expert though.
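
My rough reading of the zero-init gating idea, as a minimal PyTorch sketch - this is not the paper's exact architecture; the prompt length, head count and where the residual is added are my assumptions:

```python
import torch
import torch.nn as nn

class GatedPromptAdapter(nn.Module):
    """Learnable prompt tokens whose contribution is scaled by a gate that
    starts at zero, so the pretrained model's behaviour is untouched at init."""
    def __init__(self, dim: int, prompt_len: int = 10, n_heads: int = 8):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init gating factor
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, seq, dim). Cross-attend from the hidden states to the
        # learnable prompt and add the gated result as a residual.
        prompt = self.prompt.unsqueeze(0).expand(x.shape[0], -1, -1)
        out, _ = self.attn(x, prompt, prompt)
        return x + torch.tanh(self.gate) * out
```

Because the gate is zero at the start, early training can't wreck the pretrained representations, which is the selling point over naively prepending trainable prompts.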

8

gmork_13 t1_je7h3rc wrote

Having started with TF and moved to torch myself, torch was just easier to work with when doing something a bit out of the ordinary. Since then it has gained in popularity, and with popularity come lots of walkthroughs, documentation, video guides and research papers with GitHub repos.

1

gmork_13 t1_je7fmm8 wrote

I'm assuming you don't mean missing values in your dataset.

  1. You can create 'missing' data, but if you create it out of the data you already give to the model, you're sort of doing the work for it. For compute-efficiency reasons you might want to avoid giving it 'unnecessary' data. What counts as unnecessary can be hard to define. Think about what you want the model to grasp in the first place.

  2. I'm not sure what you mean by performing a test. If you were to train a language model, the context of a word would define its meaning. You can always take a model's output probabilities and act on them if you'd like (for instance, if the probability mass is spread over lots of low-probability alternatives, flag or handle it - see the sketch below).
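
A minimal, hypothetical sketch of that last point (the threshold and vocab size are arbitrary):

```python
import torch
import torch.nn.functional as F

def flag_low_confidence(logits: torch.Tensor, threshold: float = 0.3):
    """Given logits of shape (vocab_size,), return the top token, its probability,
    and whether the distribution is too flat to trust (many low-prob options)."""
    probs = F.softmax(logits, dim=-1)
    top_prob, top_idx = probs.max(dim=-1)
    return top_idx.item(), top_prob.item(), top_prob.item() < threshold

# Toy usage with random logits standing in for a real model's output.
logits = torch.randn(32000)
token, p, uncertain = flag_low_confidence(logits)
print(token, round(p, 3), "uncertain" if uncertain else "confident")
```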

1

gmork_13 t1_jc6e3ox wrote

The way I was going to implement it with the ChatGPT API was to store the conversation and have the model itself extract keywords from the conversation so far as it neared the token limit.

Then you can inject the keywords and search the previous conversation.

But this is still nothing like truly extending the actual memory of the model.
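
A rough sketch of what I had in mind, assuming the official openai-python client and tiktoken for token counting - the budget, the number of turns kept verbatim and the prompt wording are all placeholders:

```python
from openai import OpenAI  # official openai-python client
import tiktoken

client = OpenAI()
MODEL = "gpt-3.5-turbo"
TOKEN_BUDGET = 3500  # leave headroom under the model's context limit

enc = tiktoken.encoding_for_model(MODEL)

def n_tokens(messages):
    # Rough count: exact accounting adds a few overhead tokens per message.
    return sum(len(enc.encode(m["content"])) for m in messages)

def summarize_to_keywords(messages):
    """Ask the model itself to compress the conversation so far into keywords."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=messages + [{
            "role": "user",
            "content": "Extract the key topics, names and decisions from this "
                       "conversation as a short comma-separated keyword list."
        }],
    )
    return resp.choices[0].message.content

def maybe_compress(messages):
    # When the running conversation nears the budget, replace the older turns
    # with a keyword summary injected as a system message.
    if n_tokens(messages) > TOKEN_BUDGET:
        keywords = summarize_to_keywords(messages)
        recent = messages[-4:]  # keep the last few turns verbatim
        return [{"role": "system",
                 "content": f"Keywords from earlier conversation: {keywords}"}] + recent
    return messages
```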

1