SatoshiNotMe
SatoshiNotMe t1_jeakml0 wrote
Reply to comment by matterhayes in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
thanks!
SatoshiNotMe t1_jdtemml wrote
So if the notebook is tuning on a fixed dataset, anyone running it will arrive at the same weights after an expensive compute, which seems wasteful. Why not just share the weights, i.e. the final trained + tuned model? Or is that already available?
SatoshiNotMe t1_jdpgrat wrote
Reply to comment by SatoshiNotMe in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Looking at the repo, well, it does look like we need to run this in a DB notebook.
SatoshiNotMe t1_jdpgj80 wrote
I hope this is not closely tied to the Databricks ecosystem (i.e. their notebooks, spark clusters etc). Running things in DB notebooks is not a pleasant experience.
SatoshiNotMe t1_jdke4cu wrote
Reply to comment by JohnFatherJohn in [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII
How do you input images to GPT4? Via the API?
SatoshiNotMe t1_jdkd8l5 wrote
Reply to comment by farmingvillein in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
I’m curious about this as well. I see it’s multimodal, but how do I use it with images? The ChatGPT Plus interface clearly does not handle images. Does the API handle images?
SatoshiNotMe t1_jacas00 wrote
Reply to [D] More stable alternative to wandb? by not_particulary
I never liked wandb’s aggressive forced annual subscription pricing. I’ve been a happy user of ClearML for a year now. I only use their hosted service for experiment tracking; I don’t have my own server.
No specific experience with long running jobs etc.
SatoshiNotMe t1_ja3z3f2 wrote
Reply to comment by davidmezzetti in [P] Introducing txtchat, next-generation conversational search and workflows by davidmezzetti
You may get better discussion on HN. Speaking of which, I have a trove of HN discussion links bookmarked, and these are a goldmine of info. Would something like your approach work for “chatting” with these and getting useful answers?
SatoshiNotMe t1_ja2swra wrote
Reply to [P] Introducing txtchat, next-generation conversational search and workflows by davidmezzetti
This is very interesting. Ignore the down-voters. Thank you for sharing 🙏
SatoshiNotMe t1_j8d79wz wrote
Reply to comment by SatoshiNotMe in [D] Quality of posts in this sub going down by MurlocXYZ
Also compare Sebastian Raschka’s post today about his Transformers Tutorial in this sub (inexplicably downvoted to 62%), vs the same post on HN last week.
SatoshiNotMe t1_j8d53d3 wrote
Reply to [D] Quality of posts in this sub going down by MurlocXYZ
Agreed. I often see more nuanced discussions on ML-related topics on Hacker News. E.g. this post on Toolformer last week, compared to the same topic posted in this sub today:
https://news.ycombinator.com/item?id=34757265
Also I think many serious ML folks even avoid posting here.
SatoshiNotMe t1_j88fz8v wrote
Reply to [D] Have their been any attempts to create a programming language specifically for machine learning? by throwaway957280
I agree. Some of the things that make ML code inscrutable are that (a) every tensor has a shape that you have to guess and keep track of as it passes through the various layers, and (b) layers or operations whose effect on tensor shapes you constantly have to look up.
I’ve settled on two best practices to mitigate these:
- Always include the tensor dimensions in the variable name: e.g. x_b_t_e is a tensor of shape (b,t,e), a trick I learned at a Berkeley DRL workshop many years ago.
- Einops all the things! https://einops.rocks/
With einops you can express ops and layers in a transparent way by how the tensor dims change. And now suddenly your code is refreshingly clear.
The Einops page gives many nice examples but here’s a quick preview. Contrast these two lines:
```
y = x.view(x.shape[0], -1)                              # x: (batch, 256, 19, 19)
y_b_chw = rearrange(x_b_c_h_w, 'b c h w -> b (c h w)')
```
Yes, it’s a little verbose, but I find this helps hugely with the two issues mentioned above. YMMV :)
SatoshiNotMe t1_j856nri wrote
Reply to comment by Rieux_n_Tarrou in [P] Introducing arxivGPT: chrome extension that summarizes arxived research papers using chatGPT by _sshin_
A lot of people just write “using ChatGPT” in their app headlines when in fact they are actually using the GPT3 API. I will generously interpret this as being due to this genuine confusion :)
SatoshiNotMe t1_j71t20w wrote
Reply to [R] Graph Mixer Networks by asarig_
For those not clued in, can you briefly explain what MLP-Mixers are and how they are relevant to GNNs?
SatoshiNotMe t1_j6wnj9w wrote
Reply to [P] An open source tool for repeatable PyTorch experiments by embedding your code in each model checkpoint by latefordinnerstudios
Very nice, and I appreciate you sharing the code as well as the motivation on your blog. The code example to save a snapshot looks simple. Did I understand correctly that when you reload a snapshot, it puts your current directory into the git state corresponding to that checkpoint?
SatoshiNotMe t1_j6hmia3 wrote
Also, subscribe to the LabML trending-papers newsletter. I like this because it’s based on papers trending on Twitter, which means I don’t have to actually go doom-scrolling on Twitter :)
SatoshiNotMe t1_j5ponoh wrote
Reply to comment by SimonJDPrince in [P] New textbook: Understanding Deep Learning by SimonJDPrince
Looks like a great book so far. I think it is definitely valuable to focus on giving a clear understanding of selected topics rather than covering everything at the cost of depth of understanding.
SatoshiNotMe t1_j5emmmu wrote
Reply to comment by Veggies-are-okay in [R] Is there a way to combine a knowledge graph and other types of data for ML purposes? by Low-Mood3229
This great survey/intro on GNNs was just published this week:
https://papers.labml.ai/paper/050c8888986f11ed8f9c3d8021bca7c8
SatoshiNotMe t1_j4uqnbg wrote
Reply to [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
Thanks for sharing. What is the Pile? Never heard of it.
SatoshiNotMe t1_j4pp5zy wrote
Reply to [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
This is called Online Learning, as opposed to Batch Learning. It’s a somewhat neglected topic in terms of available packages, but there is one here (it has decision trees, not RF):
https://github.com/online-ml/river
There is a nice interview with the author on the ML Podcast
https://podcasts.apple.com/us/podcast/the-machine-learning-podcast/id1626358243?i=1000577393019
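To illustrate the online-vs-batch distinction, here’s a toy sketch in plain Python (this mirrors the spirit of river’s one-example-at-a-time interface, but is not river’s actual API): a linear model updated per example with SGD instead of being refit on the full dataset.

```python
def make_model(n_features):
    """A linear model is just a weight vector, initialized to zeros."""
    return [0.0] * n_features

def predict_one(w, x):
    """Predict for a single example: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def learn_one(w, x, y, lr=0.1):
    """Update the model on ONE example (online), rather than refitting
    on all data (batch): a single SGD step on squared error."""
    err = predict_one(w, x) - y
    for i, xi in enumerate(x):
        w[i] -= lr * err * xi
    return w

# Simulated stream of (x, y) pairs where y = 2*x0 + 1*x1.
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)] * 50

w = make_model(2)
for x, y in stream:          # each example is seen once, then discarded
    w = learn_one(w, x, y)

print(predict_one(w, [1.0, 1.0]))  # ≈ 3.0
```

Tree-based models like the Hoeffding trees in river use a cleverer per-example update, but the interface idea is the same: learn from one observation at a time, no retraining on the full history.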
SatoshiNotMe t1_j3n5p3v wrote
Reply to comment by LahmacunBear in [R] Learning Learning-Rates: SteDy Optimizer by LahmacunBear
Thanks! I was just curious for future reference. I’ll need to first read the papers to see if it can help with my projects.
SatoshiNotMe t1_j3lhi10 wrote
Reply to comment by LahmacunBear in [R] Learning Learning-Rates: SteDy Optimizer by LahmacunBear
Are either of these open source and easily usable as a PyTorch optimizer?
SatoshiNotMe t1_j2wotox wrote
Reply to comment by olegranmo in [R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder. by olegranmo
Appreciate this! Will have to dig into your book
SatoshiNotMe t1_j2wb9xi wrote
Reply to comment by olegranmo in [R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder. by olegranmo
Intrigued by this. Any chance you could give a one paragraph summary of what a Tsetlin machine is?
SatoshiNotMe t1_jealb7d wrote
Reply to comment by matterhayes in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Is there a "nice" way to use this model (say, via the command line, like in the GPT4All or alpaca.cpp repos), rather than in a Databricks notebook or in HF Spaces? For example, I'd like to chat with it on my M1 MacBook Pro. Any pointers appreciated!