MysteryInc152

MysteryInc152 t1_j6okowf wrote

Not sure what you mean by penalize, but say you wanted an LLM that wasn't instruction fine-tuned to translate between 2 languages it knows.

Your input would be

Language x: "text of language x"

Language y: "translated language x text"

You'd do this for a few examples. 2 or 3 should be good, or even one depending on the task. Then finally:

Language x: "text you want translated"

Language y: The model would translate the text and output here

All generative transformer LLMs work this way given enough scale. GPT-2 (only 1.5B parameters) does not have the necessary scale.
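A minimal sketch of putting that prompt together, just to make the layout concrete. The function name `build_few_shot_prompt` and the English/French example pairs are made up for illustration. The resulting string is what you'd send to whatever completion model you're using.

```python
def build_few_shot_prompt(examples, query, src_label="Language x", tgt_label="Language y"):
    """Lay out (source, translation) pairs as plain completion-style text."""
    blocks = []
    for src, tgt in examples:
        # One solved example: source line, blank line, translated line
        blocks.append(f'{src_label}: "{src}"\n\n{tgt_label}: "{tgt}"')
    # Final block leaves the target label open so the model continues from there
    blocks.append(f'{src_label}: "{query}"\n\n{tgt_label}:')
    return "\n\n".join(blocks)


# Hypothetical example pairs, purely illustrative
examples = [("Good morning", "Bonjour"), ("Thank you", "Merci")]
prompt = build_few_shot_prompt(examples, "See you tomorrow", "English", "French")
print(prompt)
```

The prompt ends on the open "French:" label, so a plain text completer has nothing sensible to do except continue with the translation.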

1

MysteryInc152 t1_j5uvo3i wrote

Nothing that would beat OpenAI's stuff (i.e. Google's stuff) is open to the public for inference or finetuning.

I think the best open source alternative is this

https://github.com/THUDM/GLM-130B

https://huggingface.co/spaces/THUDM/GLM-130B

But it's not instruction finetuned, so you have to prompt/approach it like a text completer. You'll also need 4x RTX 3090 GPUs to get it running locally.

The best open source instruction finetuned models are the Flan-T5 models

https://huggingface.co/google/flan-t5-xxl

If you're not necessarily looking for open source but still want actual alternatives that aren't just an API wrapper around GPT, you can try Cohere

https://cohere.ai/pricing

Good thing is that it's completely free for non-commercial or non-production use

or Aleph Alpha

https://app.aleph-alpha.com/

Not free, but the pricing is decent and they have a visual language model as well, something like Flamingo

https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model

6

MysteryInc152 t1_j4l8fwz wrote

Yeah well, that's not really how these models work. There's no pulling from a database and there's no external searching. The model was trained and frozen.

While it's possible the model could be given access to some external database in the future, that's not going to happen with previous chat entries you have no right or access to. That's a privacy can of worms no corporation with any sense will open, and it would be prohibitively expensive for no real gain at all.

2