master3243 t1_iz2f181 wrote

Interesting read, I'm always interested in research about alternatives to backprop.

One important paragraph (for the curious, that won't read the paper):

> The forward-forward algorithm is somewhat slower than backpropagation and does not generalize quite as well on several of the toy problems investigated in this paper so it is unlikely to replace backpropagation for applications where power is not an issue. The exciting exploration of the abilities of very large models trained on very large datasets will continue to use backpropagation.

> The two areas in which the forward-forward algorithm may be superior to backpropagation are as a model of learning in cortex and as a way of making use of very low-power analog hardware without resorting to reinforcement learning (Jabri and Flower, 1992).

44

master3243 t1_iyxcdtw wrote

It's very easy to point and criticize, but what exactly do you propose be done in this type of situation?

Ban the authors because they acknowledged and rectified their error? Good job, you just guaranteed that no author will ever speak up about any mistakes they legitimately made.

Not to mention that their updated results are still a massive improvement.

18

master3243 t1_iw1x35r wrote

> They theoretically show that, different from naive identity mapping, their initialization methods can avoid training degeneracy when the network dimension increases. In addition, they empirically show that they can achieve better performance than random initializations on image classification tasks, such as CIFAR-10 and ImageNet. They also show some nice properties of the model trained by their initialization methods, such as low-rank and sparse solutions.

1

master3243 t1_iw1h2h7 wrote

> potentially removes a lot of random variance from the process of training

You don't need the results of this paper for that.

One of my teams had a pipeline where every single script would initialize the seed of all random number generators (NumPy, PyTorch, Python's random) to 42.

This essentially removed all stochasticity (beyond machine-precision effects) between different training runs with the same inputs.
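A minimal sketch of that kind of seeding helper (the function name `seed_everything` and the exact set of libraries seeded are illustrative assumptions; 42 is the seed mentioned above):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin every RNG the pipeline touches so repeated runs
    with the same inputs produce the same results."""
    random.seed(seed)        # Python's built-in random module
    np.random.seed(seed)     # NumPy's global RNG
    torch.manual_seed(seed)  # PyTorch (CPU; also seeds CUDA RNGs)


# Reseeding before each run makes the draws repeat exactly.
seed_everything()
first = torch.randn(3)
seed_everything()
second = torch.randn(3)
assert torch.equal(first, second)
```

Every script calling this at startup is what removes the run-to-run variance; forgetting it in even one script reintroduces stochasticity.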

5

master3243 t1_iuzrd2n wrote

> In the US AI created art can't be covered by copyright

What? The answer is literally one Google search away.

Kashtanova obtained a US copyright registration for an 18-page comic whose art was created with Midjourney.

Sources:

Artist receives first known US copyright registration for latent diffusion AI art

A New York Artist Claims to Have Set a Precedent by Copyrighting Their A.I.-Assisted Comic Book. But the Law May Not Agree

5

master3243 t1_iuz0d6h wrote

Do they implicitly mean DALL-E 2, or do they actually mean 1?

I can't tell anymore, and I feel it's entirely possible they're trying to push the generic name "DALL-E" to refer to their newest model.

I still sometimes jokingly refer to it as "unCLIP" as that is what they called their model in the original paper.

70

master3243 t1_iutotkj wrote

It's not just about learning different categories.

Imagine you're trying to study a social network of people, take Twitter users for example. The individual nodes would probably be the users along with their associated data (past tweets, bio, etc.), while the edges would be the connections between users that you care about (e.g., A follows B, A tweeted at B, A retweeted a post by B). You can see how each of those connections carries information beyond a binary yes or no (e.g., when did A follow B? How many of B's tweets had A seen? How many followers did B have at the time? How many tweets did B have then?)

You can see how an individual edge can carry an extremely rich feature vector between nodes A and B, where those features are separate from the features belonging to either node A or B themselves. Thus, a binary adjacency matrix may not be enough to capture the intrinsic properties of such a system.
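As a toy illustration (all numbers and feature names here are made up, not from any real dataset), edge features can be stored alongside the binary adjacency matrix like this:

```python
import numpy as np

num_users = 4  # tiny hypothetical Twitter-like graph

# Binary adjacency: only records WHETHER user A follows user B.
adj = np.zeros((num_users, num_users), dtype=np.int8)
adj[0, 1] = 1  # user 0 follows user 1

# Edge features: each directed edge A -> B carries a vector, e.g.
# [days since A followed B, number of B's tweets A has seen,
#  B's follower count at follow time, B's tweet count at follow time].
edge_dim = 4
edge_features = np.zeros((num_users, num_users, edge_dim), dtype=np.float32)
edge_features[0, 1] = [120.0, 37.0, 5400.0, 812.0]

# Node features (bio embedding, activity stats, etc.) live separately
# from the edge features, as described above.
node_dim = 8
node_features = np.zeros((num_users, node_dim), dtype=np.float32)
```

The dense `(N, N, edge_dim)` tensor is just for clarity; real graph libraries store the same information sparsely as an edge list with per-edge attribute vectors.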

2

master3243 t1_itwsc56 wrote

It might also be informative to know the details of the communication. No matter what, it was wrong and I believe the paper should be rejected. But the repercussions for the reviewer might depend on their intentions, which should be inferable from the details of that communication.

5

master3243 t1_iss7aiv wrote

I remember trying to publish a theory paper (statistical learning theory) at ICML and getting criticized by two reviewers who complained that the paper had no experimental justification (despite it being a pure information-theoretic lower bound for any learned algorithm, which is impossible to justify experimentally??). My professor and I doubt they understood what was happening.

The third reviewer was extremely knowledgeable in this area, and we truly appreciated their comments, which definitely helped improve the paper.

20

master3243 t1_iscb9mu wrote

My link also says that heavier objects can fall slower than lighter objects, as in the styrofoam board that was heavier than the small ball yet fell slower.

In the absence of more detail, such as the shapes of the objects and whether air drag is included, it is fair to say that the most correct answer to the "which" question is "both". I would only count the "heavy first" answer as correct IF it included a discussion of air drag; otherwise the correct answer is "both". But that's my opinion, not objectively the only way to interpret this.

Especially given a model with so many physics articles and materials in its dataset, it's a pretty big failure that it can't answer this properly.

1

master3243 t1_irq9k6g wrote

That's just how math is done in research. If you don't like that, you'll hate pure math papers even more, where they state a theorem up front and then give the steps establishing it as a true statement.

The intuition behind how the author arrived at the final theorem is (justifiably) left entirely in the author's scratch paper or notebooks.

Some authors do give insight into the steps they took or their general intuition, which is always nice, but it's not a requirement.

It's also worth mentioning that a lot of us like doing research but don't like writing research papers (writing is just a necessity due to humans lacking telepathic communication), so giving out more info is an optional step in a disliked process, which is why it's often skipped.

5