Recent comments in /f/MachineLearning
Roger_Cockfoster t1_jegy0u7 wrote
Reply to comment by LoaderD in [News] Twitter algorithm now open source by John-The-Bomb-2
In fairness, it doesn't really matter what you interact with. Twitter is just a sewer of alt-right hate speech for everyone.
KD_A OP t1_jegxas8 wrote
Reply to comment by PassingTumbleweed in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
Interesting, and I think I know what you mean. One naive idea is a "top-k tokens" system: for each completion, consider the k highest-probability tokens (conditional on the previous ones) at each completion-token position, and then sum the average likelihoods across all k^n (n = # completion tokens) paths. That would be one way to address this synonym problem. But ofc it results in way more computation.
Edit: actually, thinking a bit more, I think the synonym problem is more-or-less a non-issue for LMs trained to do next-token prediction.
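For what it's worth, the top-k path enumeration described above can be sketched with a toy stand-in for the model's next-token distribution. Everything here (the three-word vocabulary, the probabilities, the function names) is invented for illustration; a real version would query an LM's logits instead:

```python
import itertools

# Toy stand-in for an LM: returns the k highest-probability next tokens
# given the tokens so far. This toy distribution ignores the prefix;
# a real implementation would condition on it.
def top_k_next(prefix, k):
    vocab = {"cat": 0.5, "dog": 0.3, "bird": 0.2}
    return dict(sorted(vocab.items(), key=lambda kv: -kv[1])[:k])

def top_k_path_score(prompt_tokens, n, k):
    """Sum the average token likelihood over all k**n top-k paths."""
    total = 0.0
    # Enumerate every length-n sequence of rank choices (k**n paths).
    for ranks in itertools.product(range(k), repeat=n):
        prefix = list(prompt_tokens)
        probs = []
        for r in ranks:
            choices = list(top_k_next(prefix, k).items())
            token, p = choices[r]
            probs.append(p)
            prefix.append(token)
        total += sum(probs) / n  # average likelihood of this path
    return total

score = top_k_path_score(["the"], n=2, k=2)
```

Even in this toy, the cost is k^n forward passes' worth of paths, which is the computation blow-up mentioned above.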
codingwoman_ t1_jegws5h wrote
Apparently there is an Elon feature, as well as ones for Republicans and Democrats?
https://github.com/twitter/the-algorithm/blob/7f90d0ca342b928b479b512ec51ac2c3821f5922/home-mixer/server/src/main/scala/com/twitter/home_mixer/functional_component/decorator/HomeTweetTypePredicates.scala#L228
Simusid OP t1_jegwcb6 wrote
Reply to comment by IntelArtiGen in [discussion] Anybody Working with VITMAE? by Simusid
All good info, thanks for the tips. I think ML for audio lags far behind imagery and NLP. I'm particularly interested in transients and weak signals.
londons_explorer t1_jegw8tx wrote
Reply to comment by CommunismDoesntWork in [News] Twitter algorithm now open source by John-The-Bomb-2
The claims are plausible accidents from a technical perspective. It's very possible for a system that handles blocklists to choke on the longest blocklist it has ever seen and fail to add new things to the list.
londons_explorer t1_jegw0va wrote
Reply to comment by Ulfgardleo in [News] Twitter algorithm now open source by John-The-Bomb-2
Parts of this code dump are for recommendations and ranking.
PassingTumbleweed t1_jegvhb5 wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
What I was thinking is that some kind of hierarchical LLM taxonomy might be interesting, where you can re-jigger the conditional probability tree onto any arbitrary vocab of token sequences.
IntelArtiGen t1_jeguknc wrote
Reply to [discussion] Anybody Working with VITMAE? by Simusid
I've used autoencoders on spectrograms and in theory you don't need an A100 or 80M spectrograms to have some results.
I've not used ViTMAE specifically but I've read similar papers. I'm not sure how to interpret the value of the loss. You can use some tips which are valid for most DL projects. Can your model overfit on a smaller version of your dataset (1000 spectrograms)? If yes, perhaps your model isn't large / efficient enough to process your whole dataset (though bird songs shouldn't be that hard to learn imo). At least you could easily do more epochs faster with this method and debug some parameters. If your model can't overfit, you may have a problem in your pre/post processing.
Do ViTMAE models need normalized inputs? Spectrograms can have large values by default, which may not be easy to process, and they may be hard to normalize. Your input and your output should be in a coherent range of values, and you should use the right layers in your model if you want that to happen. Also, fp16 training can mess with that.
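As a rough illustration of that normalization point (the spectrogram values here are synthetic, not from any particular dataset): log compression plus standardization is one common way to tame the dynamic range before feeding a model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "spectrogram": raw power values spanning several orders of magnitude,
# the kind of range that can destabilize training (especially in fp16).
spec = rng.uniform(0.0, 1.0, size=(128, 64)) * 10 ** rng.uniform(0, 4, size=(128, 64))

# Log compression tames the dynamic range; the epsilon avoids log(0).
log_spec = np.log10(spec + 1e-6)

# Standardize to zero mean / unit variance so the model sees a coherent range.
norm_spec = (log_spec - log_spec.mean()) / (log_spec.std() + 1e-8)
```

Whether to standardize globally, per frequency bin, or per example depends on the setup; this is just the simplest variant.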
ViTMAE isn't specifically for sounds, right? I think there have been multiple attempts to use it for sounds; this paper (https://arxiv.org/pdf/2212.09058v1.pdf) cites other papers:
>Inspired by the success of the recent visual pre-training method MAE [He et al., 2022], MSM-MAE [Niizumi et al., 2022], MaskSpec [Chong et al., 2022], MAE-AST [Baade et al., 2022] and Audio-MAE [Xu et al., 2022] learn the audio representations following the Transformer-based encoder-decoder design and reconstruction pre-training task in MAE
You can try to see their results and how they made it work, these papers probably also published their code.
Be careful with how you process sounds; the pre/post processing is different from images, which may introduce some problems.
Pas7alavista t1_jegu8de wrote
Reply to comment by mattsverstaps in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
The span describes the entire space. A spanning set is a set of vectors that you can combine using addition and scalar multiplication to obtain any other vector in the space. For example, a spanning set for the real number plane would be {(1,0), (0,1)}. This particular set is also an orthonormal basis, and you can think of each vector as representing one of two orthogonal dimensions: their dot product is 0, and each has length 1.
However, any set of two vectors that are not on the same line will span the real number plane. For example, {(1,1), (0,1)} spans the real number plane, but they are not orthogonal.
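A quick numerical check of that claim, with the non-orthogonal set {(1,1), (0,1)} (the target vector is arbitrary, chosen just for illustration): since the two vectors are linearly independent, any point in the plane has unique coordinates in this basis.

```python
import numpy as np

# Columns are the spanning vectors {(1,1), (0,1)}: not orthogonal,
# but still a basis for the plane since they are linearly independent.
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])

target = np.array([3.0, -2.0])       # an arbitrary vector in R^2
coeffs = np.linalg.solve(B, target)  # coordinates of target in this basis

reconstructed = B @ coeffs           # recovers the target exactly
dot = np.dot(B[:, 0], B[:, 1])       # nonzero -> the basis is not orthogonal
```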
Overall though it is always important to be aware of your input space, and the features/dimensions that you use to represent it. You can easily introduce bias or just noise in a number of ways if you aren't thorough. One example would be not normalizing your data.
turnip_burrito t1_jegu7uk wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
Yeah I made the simplification of random vectors myself just to approximate what uncorrelated "features" in an embedding space could be like.
One thing that's relevant for embedding space size is Takens' theorem: https://en.wikipedia.org/wiki/Takens%27s_theorem?wprov=sfla1
If you have an originally D-dimensional system (measured using correlation or information dimension, for example), and you time-delay embed data from the system, you need at most 2*D+1 embedding dimensions (it can be lower) to ensure no false nearest neighbors.
This sets an upper bound if you use time delays. For a *non-*time-delayed embedding, I don't know the answer. I asked GPT-4 and it said no analytical method for determining the embedding dimension M ahead of time presently exists. An experimental method does exist that you can perform before training a model: grow the number of embedding dimensions M and calculate the false-nearest-neighbor (FNN) fraction each time M grows. Once FNN drops to near zero, you've found a suitable M.
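That grow-M-and-check-FNN procedure can be sketched for the time-delay case. This is a simplified version of the Kennel et al. criterion (only the relative-distance test, with a made-up threshold of 10), demonstrated on the Hénon map, a 2-D system observed through one coordinate:

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Rows are delay vectors [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

def fnn_fraction(x, m, tau=1, rtol=10.0):
    """Fraction of nearest neighbors in m dims that separate by more than
    a factor rtol once the (m+1)-th delay coordinate is added."""
    emb_next = delay_embed(x, m + 1, tau)
    emb = delay_embed(x, m, tau)[: len(emb_next)]
    false = 0
    for i in range(len(emb)):
        d = np.linalg.norm(emb - emb[i], axis=1)
        d[i] = np.inf                 # ignore the point itself
        j = int(np.argmin(d))         # nearest neighbor in m dims
        gap = abs(emb_next[i, -1] - emb_next[j, -1])
        if d[j] > 0 and gap / d[j] > rtol:
            false += 1
    return false / len(emb)

# Henon map, observed through its x coordinate alone.
x = np.empty(2000)
x[0], x[1] = 0.1, 0.1
for i in range(2, len(x)):
    x[i] = 1.0 - 1.4 * x[i - 1] ** 2 + 0.3 * x[i - 2]

# FNN should be substantial at m=1 and drop once m unfolds the dynamics.
fracs = [fnn_fraction(x, m) for m in (1, 2, 3)]
```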
One neat part about all this is that if you have some complex D-dimensional manifold or distribution with features that "poke out" into different directions in the embedding space (imagine a wheel hub with spokes), then increasing the embedding space size M will also increase the distance between the spokes. If M gets large enough, all the spokes should be nearly equal in distance from each other, but points along a singular spoke are also far from each other in most directions except for just a small subset.
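A quick numerical illustration of that crowding-toward-equidistance effect, with random unit vectors standing in for the spoke directions (the vector count and dimensions are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)

def angle_spread(m, n_vecs=50):
    """Std of pairwise cosine similarities between random unit vectors in R^m."""
    v = rng.normal(size=(n_vecs, m))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    cos = v @ v.T
    iu = np.triu_indices(n_vecs, k=1)  # upper triangle: each pair once
    return cos[iu].std()

# As the embedding dimension M grows, random directions crowd toward
# mutual orthogonality: the spread of pairwise angles shrinks (~1/sqrt(M)).
spreads = [angle_spread(m) for m in (2, 32, 512)]
```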
I don't think that making it super large would actually make learning on the data any easier, though. Best to stick close to the minimum embedding dimension M. If you go larger, then measurement noise in your data becomes more represented in the embedded distribution. That noise also unfolds as you increase M, which means that if you're only trying to predict the D-dimensional system, you'll have a harder time, because now you're predicting a (D+large#)-dimensional system and the D-dimensional distribution gets lost in the larger one.
MjrK t1_jegtjqj wrote
Reply to comment by lordofbitterdrinks in [News] Twitter algorithm now open source by John-The-Bomb-2
We don't know, and likely we won't.
Unless perhaps someone internal checks and leaks the important missing details later on...
But for now, it does seem robust enough to be reflective of what they have probably been using up to some recent point - but that's still just speculation.
[deleted] t1_jegtjma wrote
[deleted]
mcilrain t1_jegtja1 wrote
Reply to comment by Long_Educational in [News] Twitter algorithm now open source by John-The-Bomb-2
Those are probably part of the advertisement system.
Everlier t1_jegtfkv wrote
Reply to comment by hapliniste in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
I indeed hope so as well, it looks very decent
Educational-Net303 t1_jegta0z wrote
Reply to comment by LoaderD in [News] Twitter algorithm now open source by John-The-Bomb-2
Get rid of the if statement and you just recreated Twitter's recommendation algorithm
LoaderD t1_jegsuar wrote
Reply to comment by ZestyData in [News] Twitter algorithm now open source by John-The-Bomb-2
> Here we have a world-class complex recommendation
...You know this is Twitter's recommender system, right? All the tweets I interact with are ML-related, from very 'left' people like Jeremy Howard.
My recommender system could legit be:
    if interested_in_finance_or_ML:
        recommend_alt_right_hate_speech_accounts()
        recommend_crypto_scam_ads()
CommunismDoesntWork t1_jegsr5b wrote
Reply to comment by junkboxraider in [News] Twitter algorithm now open source by John-The-Bomb-2
As far as I know, there was never any evidence to back up those claims
KD_A OP t1_jegsqe6 wrote
Reply to comment by PassingTumbleweed in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
That's a good criticism. I'd guess that this issue is quite problem-dependent. And I'd hope that an LM is good enough to discriminate between a correct-but-many-synonyms class and a wrong-but-few-synonyms class. (We're using the word "synonym", but we really mean "high-probability token path given the prompt".) It's hard for me to come up with examples where this problem arises in a real classification task. But they may be out there.
Simusid OP t1_jegspl8 wrote
Reply to comment by Art10001 in [discussion] Anybody Working with VITMAE? by Simusid
VITMAE isn't a generative model. The intent is to use unlabeled data to train the encoder. After that, the decoder is thrown away. Then (in theory) I would use a relatively small amount of labeled data and the encoder with a new head to do traditional supervised classification.
f10101 t1_jegslrv wrote
Reply to comment by Ulfgardleo in [News] Twitter algorithm now open source by John-The-Bomb-2
I wonder whether they added that flag before or after the day they accidentally made people see only Elon's tweets on their timeline: https://www.theverge.com/2023/2/13/23598514/twitter-algorithm-elon-musk-tweets
MjrK t1_jegsl6v wrote
Necessary-Meringue-1 t1_jegshy4 wrote
It's a pretty cool resource to get to look at an enterprise recommendation algorithm like that.
As an aside, if you want a chuckle, search the term "Elon" in the repo:
https://github.com/twitter/the-algorithm/search?q=elon
https://github.com/twitter/the-algorithm/search?q=elon&type=issues
[edit 1]
since it's gone now, here's the backup provided by u/MjrK: https://i.imgur.com/jxqaByA.png
[edit 2] lol
https://github.com/twitter/the-algorithm/commit/ec83d01dcaebf369444d75ed04b3625a0a645eb9#diff-a58270fa1b8b745cd0bd311bed9cd24c983de80f96e7bd445e16e88b61e492b8L225
Competitive-Song-539 t1_jegs9qz wrote
Reply to comment by lacker in [D][N] LAION Launches Petition to Establish an International Publicly Funded Supercomputing Facility for Open Source Large-scale AI Research and its Safety by stringShuffle
Yeah good point
ninjasaid13 t1_jegy25s wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Everybody is re-releasing LLaMA models with a different name and license.