light24bulbs t1_jdks13d wrote

Question: I notice there's a focus here on fine-tuning for instruction following, which is clearly different from the main training, where the LLM just reads text and tries to predict the next word.

Is there any easy way to continue that bulk part of the training with some additional data? Everyone seems to be trying to get there by injecting embedded text chunks into prompts (my team included), but that approach just stinks for a lot of uses.

8

elbiot t1_jdlgxnz wrote

In my understanding, if you have text, it's not a challenge to train on next-word prediction. Just keep the learning rate low. The reason there's a focus on instruction-based fine-tuning is that that data is harder to come by.

My only experience is with a sentence-embedding model (using SBERT): I trained on a 50/50 mix of my new text and the original training data, and the model both got better at embedding my text and didn't forget how to do what it was originally trained on.
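The 50/50 mixing described above can be sketched in a few lines. This is a hypothetical helper (the name `mix_datasets` and the toy data are mine), showing the idea of balancing new-domain text against original training data as a simple guard against catastrophic forgetting:

```python
import random

def mix_datasets(new_texts, original_texts, seed=0):
    """Build a 50/50 training mix of new-domain text and original
    training data, so the model learns the new domain without
    forgetting what it was originally trained on."""
    rng = random.Random(seed)
    n = min(len(new_texts), len(original_texts))
    mixed = rng.sample(new_texts, n) + rng.sample(original_texts, n)
    rng.shuffle(mixed)
    return mixed

# Toy usage: equal parts domain text and original data.
batch = mix_datasets(["my domain sentence"] * 100,
                     ["original sentence"] * 100)
```

In a real pipeline the two lists would be streams of documents (or batches) drawn from each corpus; the key design choice is just the fixed sampling ratio.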

5

light24bulbs t1_jdlrnll wrote

That's cool, that's exactly what I want to do. I'm hunting around for a ready-made pipeline to do that on top of a good open source model.

3

visarga t1_jdloh24 wrote

Since RLHF finetuning is short, you can continue training your original model and then run RLHF again.

2

baffo32 t1_jdnppmp wrote

this is the same task as instruction tuning. instruction tuning just uses specific datasets where instructions are followed. it's called "finetuning" but nowadays people are using adapters and PEFT to do this on low-end systems.
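The adapter idea behind PEFT can be shown without any framework. Below is a minimal NumPy sketch of the low-rank adapter (LoRA) technique that PEFT implements; the dimensions and initialization scale are arbitrary, and a real setup would apply this per attention/MLP weight matrix via a library:

```python
import numpy as np

# LoRA sketch: keep the pretrained weight W frozen and train only a
# low-rank update B @ A, which has far fewer parameters than W.
d_in, d_out, rank = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weights
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable, small init
B = np.zeros((d_out, rank))                 # trainable, zero init

def lora_forward(x, scale=1.0):
    # y = (W + scale * B @ A) @ x; identical to the frozen model at
    # initialization because B is all zeros.
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=(d_in,))
assert np.allclose(lora_forward(x), W @ x)  # unchanged before training

# Trainable parameters: rank*(d_in + d_out) vs. the full d_in*d_out.
print(rank * (d_in + d_out), "vs", d_in * d_out)
```

This is why it runs on low-end systems: only `A` and `B` (here 512 values vs. 4096 for `W`) need gradients and optimizer state.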

1

light24bulbs t1_jdntdbb wrote

I'm not hoping to do instruction tuning; I want to do additional pre-training.

1

baffo32 t1_jdo24su wrote

It is the same thing. The Alpaca data is just further pretraining data consisting of instructions and responses. Doing this is called finetuning.

1

baffo32 t1_jdrhj77 wrote

I was still confused by your response. If you want a model to behave as though it had been given different pretraining data, you would probably first finetune on the new bulk data, and then finetune on the target task, such as instruction following.

Instruction following is indeed just predicting the next word: on data where the next word obeys the instructions preceding it.
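Concretely, an instruction/response pair is just templated into a flat document and trained on with the ordinary next-word objective. A sketch (the helper name is mine; the template loosely follows the Alpaca format):

```python
def to_training_document(instruction, response):
    """Flatten an instruction/response pair into plain text, so that
    standard next-word-prediction training teaches the model to
    produce responses that obey the preceding instruction."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

doc = to_training_document("Name the capital of France.", "Paris.")
```

From the model's point of view `doc` is just more text; the "### Response:" marker is only there so the same prompt shape can be reused at inference time.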

1

light24bulbs t1_jdrm9kh wrote

That's the part I wasn't getting. I assumed the fine-tuning involved a different process. I see now that it is in fact just more training data, often templated into a document in such a way that it's framed clearly for the LLM.

The confusing thing is that most of the LLM-as-a-service companies, OpenAI included, will ONLY take data in the question/answer format, as if that's the only data you'd want to use to fine-tune.

What if I want to feed a book in so we can talk about the book? A set of legal documents? Documentation of my project? Transcriptions of TV shows?

There are so many use cases for training on top of an already pre-trained LLM that aren't just question answering.

I'm into training LLaMA now. I simply took some training code I found, removed the JSON-parsing question/answer templating stuff, and was done.
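With the templating removed, preparing raw text (a book, legal documents, transcripts) reduces to chunking it into fixed-length blocks. A rough sketch of that step, split by words for simplicity (a real pipeline would chunk by tokenizer tokens, and `chunk_text` is a name I've made up):

```python
def chunk_text(text, block_size=512, stride=512):
    """Split raw text into fixed-size word blocks for continued
    pretraining. Every block is plain next-word-prediction data;
    no question/answer templating is needed."""
    words = text.split()
    return [
        " ".join(words[i:i + block_size])
        for i in range(0, len(words), stride)
    ]

# 1200 words -> three blocks of 512, 512, and 176 words.
blocks = chunk_text("word " * 1200, block_size=512)
```

Setting `stride` smaller than `block_size` would produce overlapping blocks, which some pipelines use so no next-word context is lost at block boundaries.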

1

nemorocksharder t1_jdz8kt5 wrote

What you're describing is exactly what I have been looking to do too, and I'm really surprised I'm not hearing more about it. Have you found any useful approaches to essentially adding to the LLM's corpus with target material/text? Or anyone else trying to do this?

1