
Scarlet_pot2 t1_je92iud wrote

Most of this is precise and correct, but it sounds like you're saying the transformer architecture is the GPUs? The transformer architecture is the neural network and how it's structured. It's code. The paper "Attention Is All You Need" describes how the transformer architecture is built.
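To make the "it's code" point concrete, here's a minimal sketch of one transformer block in PyTorch. The dimensions and the pre-norm layout are my assumptions, not from the paper (which uses post-norm), and a real model also needs embeddings, causal masking, and an output head:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm transformer block: self-attention + feed-forward."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # self-attention with a residual connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # position-wise feed-forward with a residual connection
        return x + self.ff(self.norm2(x))

# the "architecture" is mostly just a stack of these blocks
blocks = nn.Sequential(*[TransformerBlock() for _ in range(6)])
```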

After you have the transformer written out, you train it on GPUs using the data you gathered. Free large datasets such as The Pile by EleutherAI can be used to train on. This part is automatic.
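And "train it on GPUs" is roughly a loop like the one below. The tiny model and the random batches are toy stand-ins so it actually runs; in reality the model is the transformer stack above and the batches come from a tokenized dataset like The Pile. The point is that once the loop is running, the rest is just compute:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model, seq_len = 1000, 64, 32
model = nn.Sequential(                 # toy stand-in for a real LM
    nn.Embedding(vocab, d_model),
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    nn.Linear(d_model, vocab),
)
device = "cuda" if torch.cuda.is_available() else "cpu"  # GPUs go here
model = model.to(device)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(100):
    # stand-in data: real batches are token ids from your dataset
    tokens = torch.randint(0, vocab, (8, seq_len), device=device)
    logits = model(tokens[:, :-1])            # predict the next token
    loss = F.cross_entropy(
        logits.reshape(-1, vocab),            # (batch*seq, vocab)
        tokens[:, 1:].reshape(-1),            # shifted targets
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```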

The human-involved parts are the data gathering, data cleaning, and designing the architecture before training. Then afterwards humans do finetuning / RLHF (reinforcement learning from human feedback).
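RLHF is a whole pipeline of its own, but the human-feedback core is surprisingly small: labelers pick the better of two responses, and a reward model is trained to agree with them. A hedged sketch of that preference loss, with a hypothetical toy reward model standing in for what is really a full finetuned LM:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Hypothetical stand-in: scores a response with one scalar."""
    def __init__(self, vocab=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.score = nn.Linear(d_model, 1)

    def forward(self, tokens):               # (batch, seq) -> (batch,)
        return self.score(self.embed(tokens).mean(dim=1)).squeeze(-1)

def preference_loss(rm, chosen, rejected):
    # Bradley-Terry style loss: push the human-preferred response
    # to score higher than the rejected one
    return -F.logsigmoid(rm(chosen) - rm(rejected)).mean()

rm = TinyRewardModel()
chosen = torch.randint(0, 1000, (8, 32))    # token ids labelers preferred
rejected = torch.randint(0, 1000, (8, 32))  # token ids they rejected
loss = preference_loss(rm, chosen, rejected)
```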

Those are the six steps: data gathering, data cleaning, designing the architecture, training, finetuning, and RLHF. Making an AI model can seem hard and like magic, but it can be broken down into manageable steps. It's doable, especially if you have a group of people who specialize in the different steps: maybe someone who's good with the data aspects, someone good at writing the architecture, someone good at finetuning, and some people to do RLHF.
