Recent comments in /f/MachineLearning

aliasaria t1_jefj33h wrote

It's a very different way to finetune a model efficiently.

All these tools try to nudge an existing large model, without having to nudge all the weights.

A simplistic explanation of LoRA is that it freezes the whole pretrained model and, for each big weight matrix, learns a pair of small low-rank matrices whose product is the "nudge" that gets added to the original weights.

This tool, instead, leaves the original weights untouched and adds new learnable weights (extra tokens at the start of the prompt) on top of the existing model.

One advantage of LoRA, in this case, is that you can merge your LoRA-finetuned weights into the original model, and the result is a new model that is exactly the same size and shape as the original. With the technique in this paper, however, the final model has a different shape from the original. But the concept is arguably simpler.
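
To make the merge idea concrete, here's a minimal sketch in PyTorch (toy shapes and my own variable names, not from either paper):

```python
import torch

# Frozen pretrained weight of one linear layer: (out_features, in_features)
W = torch.randn(768, 768)

# LoRA update: B @ A is a rank-r "nudge", with r << 768
r, alpha = 8, 16
A = torch.randn(r, 768) * 0.01   # trained
B = torch.zeros(768, r)          # trained (conventionally initialized to zero)

# Merge: the result has exactly the same shape as W, so the finetuned
# model is a drop-in replacement for the original.
W_merged = W + (alpha / r) * (B @ A)
assert W_merged.shape == W.shape
```

Because `W_merged` has the same shape as `W`, none of the inference or serving code has to change.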

2

aliasaria t1_jefih93 wrote

A short answer is that it is "just different". It's another way to tweak an existing LLM to do a new task, without having to finetune the whole system. Conceptually, this way is simpler than LoRA and seems to work as well or better.

In the paper, the authors mention that one advantage is that you can use this technique to add new modalities. The whole method works by adding to the prompt at the topmost layer(s), so you can add not just words but also tokens that come from, say, an image. They have an example at the top of page 4 with a picture of a baby opening a door.
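
As a toy sketch of that idea (my own shapes, not the paper's actual code): the method learns a handful of extra prompt vectors and concatenates them in front of the hidden states at the topmost layer(s), which is why those vectors could just as well come from an image encoder.

```python
import torch

batch, seq_len, d_model = 2, 32, 512
n_prefix = 10

# Hidden states entering one of the top transformer layers
h = torch.randn(batch, seq_len, d_model)

# Learnable prefix tokens, trained while the base model stays frozen.
# In a multimodal variant these could instead be projected image features.
prefix = torch.nn.Parameter(torch.zeros(n_prefix, d_model))

# Prepend the prefix to every sequence in the batch
h_with_prefix = torch.cat([prefix.expand(batch, -1, -1), h], dim=1)
print(h_with_prefix.shape)  # torch.Size([2, 42, 512]) -- longer than the original
```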

2

mejdounarodni t1_jeff83b wrote

Hey, I don't know how relevant this is, but are there any voice-cloning tools for major languages other than English, such as Spanish, Russian, or Mandarin Chinese? So far I have only found them for English and, I think, French. I have seen some sites claiming to support other languages because, arguably, you can type in text in any language you want... but the phonemes used to recreate what you have written are those of English, so it's a bit absurd, really. Any tips would be appreciated.

1

farleyknight t1_jef8a4v wrote

I had the exact same question! Just found on the GitHub page

> We plan to release the model weights by providing a version of delta weights that build on the original LLaMA weights, but we are still figuring out a proper way to do so. In this example, we demonstrate the usage of our distributed serving system using OPT models. Later, you can apply similar commands to serve Vicuna, just as shown in our demo.
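
For anyone wondering what that means in practice: a "delta" is just a weight-by-weight diff that you add to the original LLaMA checkpoint yourself. A rough sketch of the reconstruction (hypothetical file names; their actual release script may differ):

```python
import torch

# Hypothetical paths -- the project hadn't released its actual script yet.
base = torch.load("llama-7b.pt")        # original LLaMA state dict
delta = torch.load("vicuna-delta.pt")   # released diff, same keys and shapes

# Reconstruct the finetuned weights by adding the delta to the base.
finetuned = {name: base[name] + delta[name] for name in base}
torch.save(finetuned, "vicuna-7b.pt")
```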

5

Scew t1_jef6kiu wrote

Lol they opened up their AI to be trained for free for "research purposes..." Sounds similar to how certain corporations profited greatly from events over the past couple of years... Wonder if they'll even go as far as calling people some kind of hero for helping them make a bigger profit >.>

4

Scew t1_jef616t wrote

If a corporation were trying to discredit an open-source alternative to one of their projects, it might look like spreading negative propaganda about the alternative or highlighting its perceived weaknesses. For example:

- **FUD:** The corporation may spread Fear, Uncertainty, and Doubt (FUD) about the open-source alternative, such as by suggesting that it is not secure, reliable, or compatible with other systems.

- **Highlighting perceived weaknesses:** The corporation may emphasize areas where the open-source alternative falls short compared to the corporation's proprietary solution.

- **Undermining community support:** The corporation may attempt to undermine community support for the open-source alternative by spreading misinformation about the project's development or suggesting that it lacks the necessary resources to succeed.

- **Offering alternative solutions:** The corporation may promote solutions that they claim are superior to the open-source alternative, such as by highlighting their own proprietary products or services.

- **Funding competitors:** The corporation may fund competitors who are developing similar solutions to the open-source alternative, with the intention of creating negative publicity or drawing attention away from the alternative.

These tactics can be effective in diminishing support for the open-source alternative, but they can also be perceived as unethical and manipulative, potentially damaging the corporation's reputation and relationship with the open-source community.

2

nbviewerbot t1_jef48jq wrote

I see you've posted a GitHub link to a Jupyter Notebook! GitHub doesn't render large Jupyter Notebooks, so just in case, here is an nbviewer link to the notebook:

https://nbviewer.jupyter.org/url/github.com/kddubey/cappr/blob/main/demos/copa.ipynb

Want to run the code yourself? Here is a binder link to start your own Jupyter server and try it out!

https://mybinder.org/v2/gh/kddubey/cappr/main?filepath=demos%2Fcopa.ipynb


^(I am a bot.) ^(Feedback) ^(|) ^(GitHub) ^(|) ^(Author)

10

LinuxSpinach t1_jeexz48 wrote

Only the code for initializing and training the model has been released under GPL... which leaves a substantial gap toward having anything useful. You would still have to replicate all of the training to produce weights that you can use commercially, which is a bridge too far for most individuals and small businesses.

2

AllowFreeSpeech t1_jeevp3b wrote

What bothers me is that most researchers don't care to use any model compression or efficiency techniques. They want others to pay for their architectural inefficiencies. IMO such funding would be a bad idea if it were to stifle competition between neural architectures, and a good idea otherwise.

For example, is matrix-matrix multiplication necessary, or can matrix-vector multiplication do the job? Similarly, are dense networks necessary, or can sparse networks do the job? Alternatively, the funding could go toward engineering optical and analog hardware that is significantly more power efficient.
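
To put a toy number on the sparsity question (my own example, not a benchmark): pruning a dense weight matrix to 90% zeros cuts the multiply-adds in a matrix-vector product by roughly 10x, provided the software and hardware can actually skip the zeros.

```python
import torch

W = torch.randn(1024, 1024)   # dense layer: ~1M weights
x = torch.randn(1024)

# Zero out the 90% smallest-magnitude weights
threshold = W.abs().flatten().quantile(0.9)
W_sparse = torch.where(W.abs() >= threshold, W, torch.zeros_like(W))

dense_macs = W.numel()                    # multiply-adds in W @ x
sparse_macs = int((W_sparse != 0).sum())  # only nonzero weights do work
print(dense_macs, sparse_macs)            # ~10x fewer, output shape unchanged

y = W_sparse @ x                          # still a valid matrix-vector product
```

The catch, of course, is that dense hardware doesn't skip zeros for free, which is exactly where the optical/analog engineering argument comes in.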

3