Recent comments in /f/MachineLearning

Business-Lead2679 OP t1_jecfagu wrote

The main point of these open-source 10B models is to make them fit on average consumer hardware while still providing great performance, even offline. A 100B model is hard to train because of its size, and even harder to serve on hardware powerful enough to handle multiple requests at the same time while still generating responses quickly, not to mention how expensive that is to run. As for 1B models, they usually do not achieve good performance because they simply do not have enough capacity. Some models at that size are good, yes, but a 10B model, if trained correctly, is usually significantly better and can still fit on consumer hardware.
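To make the "fits on consumer hardware" point concrete, here is a rough back-of-the-envelope sketch of the memory needed just to hold the weights (it ignores activations, KV cache, and runtime overhead, so treat these as lower bounds):

```python
# Approximate memory footprint of model weights alone.
def weight_memory_gb(n_params_billion: float, bits_per_param: int) -> float:
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

for size in (1, 10, 100):
    fp16 = weight_memory_gb(size, 16)
    q4 = weight_memory_gb(size, 4)
    print(f"{size:>3}B params: ~{fp16:.0f} GB at fp16, ~{q4:.1f} GB at 4-bit")

# 1B:   ~2 GB fp16,  ~0.5 GB 4-bit
# 10B:  ~20 GB fp16, ~5 GB 4-bit   -> fits on a typical consumer GPU / in RAM
# 100B: ~200 GB fp16, ~50 GB 4-bit -> generally does not
```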

12

KerfuffleV2 t1_jecbxy7 wrote

It's based on Llama, so it has basically the same problem as anything based on Llama. From the repo: "We plan to release the model weights by providing a version of delta weights that build on the original LLaMA weights, but we are still figuring out a proper way to do so." Edit: Never mind.

You will probably still need a way to get hold of the original Llama weights (which isn't the hardest thing...)
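For anyone unfamiliar with what "delta weights" means in practice: you download the delta checkpoint and add it, tensor by tensor, to the original LLaMA weights you already have. A minimal sketch of that idea (file names and checkpoint format here are illustrative, not from the repo; actual releases ship their own apply-delta scripts):

```python
import torch

def apply_delta(base_path: str, delta_path: str, out_path: str) -> None:
    """Reconstruct full weights by adding a delta checkpoint to the base
    LLaMA state dict (hypothetical single-file checkpoints)."""
    base = torch.load(base_path, map_location="cpu")
    delta = torch.load(delta_path, map_location="cpu")
    merged = {name: base[name] + delta[name] for name in delta}
    torch.save(merged, out_path)

# apply_delta("llama-7b.pth", "model-delta.pth", "model-full.pth")
```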

−5

lazybottle t1_jec8i0c wrote

Alpaca is not Apache 2.0

https://huggingface.co/datasets/tatsu-lab/alpaca#licensing-information

> The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0).

Edit: I see the source of confusion. https://github.com/tatsu-lab/stanford_alpaca

While the code is released under Apache 2.0, the instruction dataset, as pointed out by OP, is not. One could potentially reproduce the steps, possibly with human-written ground truth, and release the result under a more permissive data license.

1