Recent comments in /f/deeplearning
deepForward t1_jdvdq24 wrote
If you already have some labelled chairs, train a first model on them, then run it on the images that contain chairs but no labels. Do a second pass with your enriched dataset, and eventually a third, etc.
You can bootstrap the labelling that way. It should help you label a decent amount of chairs, and you can then manually label the remaining ones.
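A rough sketch of that bootstrapping loop (assuming the Ultralytics YOLO API; the dataset yaml, folder names, confidence threshold and epoch count are placeholders to swap for your own):

```python
# Rough sketch of the pseudo-labelling loop described above.
from pathlib import Path
from ultralytics import YOLO  # assumes the Ultralytics package

CONF_THRESHOLD = 0.6  # only keep confident pseudo-labels

# 1. Train a first model on the images that already have chair labels.
model = YOLO("yolov8n.pt")
model.train(data="chairs_labelled.yaml", epochs=50)

# 2. Run it on the images where chairs appear but are not labelled.
results = model.predict(source="images_unlabelled/", conf=CONF_THRESHOLD)

# 3. Write confident detections out as YOLO-format label files,
#    merge them into the dataset, then retrain (pass 2, 3, ...).
out_dir = Path("pseudo_labels")
out_dir.mkdir(exist_ok=True)
for r in results:
    lines = []
    for box in r.boxes:
        cls_id = int(box.cls)
        x, y, w, h = box.xywhn[0].tolist()  # normalised cx, cy, w, h
        lines.append(f"{cls_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    if lines:
        (out_dir / (Path(r.path).stem + ".txt")).write_text("\n".join(lines))
```

It's worth spot-checking a sample of the pseudo-labels before merging them back in, since low-quality boxes compound over passes.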
thebruce87m t1_jdvby1c wrote
Reply to comment by AI-without-data in Training only Labelled Bbox for Object Detection. by AI-without-data
Sounds like you need to label the chairs.
AI-without-data OP t1_jdv5l7k wrote
Reply to comment by mmeeh in Training only Labelled Bbox for Object Detection. by AI-without-data
I need the chair class for the model, but some images don't have chair labels even though they contain chairs. And I want to use those images for training because they include other classes that should be trained.
mmeeh t1_jdusxmd wrote
Reply to comment by AI-without-data in Training only Labelled Bbox for Object Detection. by AI-without-data
Each of these bboxes has a label when you're training with YOLO; you can just write some code that identifies the bboxes labelled as chair so you can exclude them from the dataset.
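For example, a minimal sketch assuming standard YOLO-format .txt label files (one file per image, each line starting with an integer class id); `CHAIR_CLASS_ID` is a placeholder for whatever index the chair class has in your dataset:

```python
# Minimal sketch: split label files by whether they contain the chair class,
# assuming one YOLO-format .txt per image: "<class_id> <cx> <cy> <w> <h>" per line.
from pathlib import Path

CHAIR_CLASS_ID = 56  # placeholder: whatever index "chair" has in your dataset

def has_chair(label_file: Path) -> bool:
    for line in label_file.read_text().splitlines():
        if line.strip() and int(line.split()[0]) == CHAIR_CLASS_ID:
            return True
    return False

labels_dir = Path("dataset/labels/train")
with_chair = [p for p in labels_dir.glob("*.txt") if has_chair(p)]
without_chair = [p for p in labels_dir.glob("*.txt") if not has_chair(p)]
print(f"{len(with_chair)} label files contain chairs, {len(without_chair)} do not")
```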
AI-without-data OP t1_jdurfc6 wrote
Reply to comment by mmeeh in Training only Labelled Bbox for Object Detection. by AI-without-data
Thank you for the comment. So do I need to manually modify the images to remove chairs one by one?
mmeeh t1_jdupe4m wrote
Either you label the chairs in the non-labelled images or you remove the chairs from the labelled images... You can try some image-manipulation techniques to remove the non-labelled chairs, but unless you use Photoshop or some highly advanced AI to remove those chairs, it's a really bad idea to use that dataset to train a model to recognize objects that include chairs...
mikonvergence t1_jdufvlv wrote
Reply to comment by OraOraP in Using Stable Diffusion's training method for Reverse engineering? by OraOraP
Right, I mean denoising diffusion as a term for a wide range of methods based on reversing some forward process. Some interesting work (such as cold diffusion) has been done on using other types of degradation apart from additive Gaussian noise.
And yeah, the change of both content and dimensionality requires you to put together some very novel and non-obvious techniques.
OraOraP OP t1_jdufnll wrote
Reply to comment by mikonvergence in Using Stable Diffusion's training method for Reverse engineering? by OraOraP
I didn't mean to use the denoising process directly for reverse engineering. I was just thinking the idea of `step-by-step reverting` could be used in some ML model for reverse engineering.
Though you have a point. Unlike the denoising process, reverse engineering would require changes of dimensionality in the intermediate steps, making it more difficult than denoising.
mikonvergence t1_jduf732 wrote
You are definitely stepping outside of the domain of what is understood as denoising diffusion because it seems that your data dimensionality (shape) needs to change during the forward process.
The current definition of diffusion models is that they compute the likelihood gradient of your data (equivalent to predicting standard noise in the sample), and then take a step in that constant data space. So all networks have the same output shape as input.
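For illustration, a minimal sketch of that noise-prediction objective (assuming PyTorch; `model` is a placeholder network taking `(x_t, t)`), where the predicted noise has exactly the same shape as the input:

```python
# Minimal sketch of the standard noise-prediction (DDPM-style) objective in PyTorch.
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alphas_cumprod):
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)                           # same shape as the data
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward process keeps the shape
    pred_noise = model(x_t, t)                             # network output: same shape as x_t
    return F.mse_loss(pred_noise, noise)
```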
Perhaps you can use transformers to handle evolving data lengths, but as far as I can tell, you're entering uncharted research territory.
I can recommend this open-source course I made for understanding the details of denoising diffusion for images https://github.com/mikonvergence/DiffusionFastForward
StrippedSilicon t1_jdte8lj wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That's why I'm appealing to the "we don't actually understand what it's doing" case. Certainly the AGI-like intelligence explanation falls apart in a lot of cases, but the explanation that it only spits out the training data in a different order or context doesn't work either.
BellyDancerUrgot t1_jdtci38 wrote
Reply to comment by StrippedSilicon in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Well, let me ask you: how does it fail simple problems if it can solve more complex ones? If you solved these problems analytically, then it stands to reason that you wouldn't ever make an error on a question as simple as that.
StrippedSilicon t1_jdt7h5o wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
So... how exactly does it solve a complicated math problem it hasn't seen before by only regurgitating information?
BellyDancerUrgot t1_jds7yao wrote
Reply to comment by suflaj in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Empty vessels make much noise seems to be a quote u live by. I’ll let the readers of this thread determine who between us has contributed to the discussion and who writes extensively verbose commentary, ironically, with 0 content.
BellyDancerUrgot t1_jds7iva wrote
Reply to comment by StrippedSilicon in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The reason I say it's a recontextualization and lacks deeper understanding is that it doesn't hallucinate sometimes, it hallucinates all the time; sometimes the hallucinations align with reality, that's all. Take this thread for example:
- https://twitter.com/ylecun/status/1639685628722806786?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
- https://twitter.com/stanislavfort/status/1639731204307005443?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
- https://twitter.com/phillipharr1s/status/1640029380670881793?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
A system that fully understood the underlying structure of the question would not give you varying answers with the same prompt.
Inconclusive is the third most likely answer. Despite having a big bias toward the correct answer (keywords like 'dubious', for example), it still makes mistakes on a rather simple question. Sometimes it gets it right with the bias, sometimes even without the bias.
Language imo lacks causality for intelligence, since it's a mere byproduct of intelligence. Which is why these models imo hallucinate all the time; sometimes the hallucinations line up with reality and sometimes they don't. The likelihood of the former is just increased because of the huge training set size.
StrippedSilicon t1_jdrldvz wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Recontextualizing information is not unfair, but I'm not sure it really explains things like the example in 4.4, where it answers a math olympiad question that there's no way was in the training set (assuming they're being honest about the training set). I don't know how a model can arrive at the answer it does without some kind of deeper understanding than just putting existing information together in a different order. Maybe the most correct thing is simply to admit we don't really know what's going on, since 100 billion parameters, or however big GPT-4 is, is beyond simple interpretation.
"Open"AI's recent turn to secrecy isn't helping things either.
suflaj t1_jdqh5se wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
> Gpt hallucinates a lot and is unreliable for any factual work.
No, I understand that's what you're saying; however, this is not a claim that you can even check. You have already demonstrated that your definitions are not aligned with generally accepted ones (particularly for intuition), so without concrete examples this statement is hard to take seriously.
> Your wall of text can be summarized as, “I’m gonna debate you by suggesting no one knows the definition of AGI.”
I'm sad that's what you got from my response. The point was to challenge your claims about whether GPT-4 is or isn't AGI, given that you're judging that by properties which might be irrelevant to the definition. It is sad that you are personally attacking me instead of addressing my concerns.
> No one knows what the definition of intuition is
That is not correct. Here are some definitions of intuition:
- an ability to understand or know something immediately based on your feelings rather than fact (Cambridge)
- the power or faculty of attaining to direct knowledge or cognition without evident rational thought and inference (Merriam-Webster)
- a natural ability or power that makes it possible to know something without any proof or evidence : a feeling that guides a person to act a certain way without fully understanding why (Britannica)
You might notice that all three of these definitions are satisfied by DL models in general.
> but what we know is that memory does not play a part in it.
This is also not true: https://journals.sagepub.com/doi/full/10.1177/1555343416686476
The question is: why are you making stuff up when the counterevidence is one Google search away?
> It’s actually hilarious that you bring up source citation as some form of trump card after I mention how everything you know about GPT4 is something someone has told you to believe in without any real discernible and reproducible evidence.
I bring it up because you have not provided any other basis for your claims. You refuse to provide the logs so your claims can be checked. Your claims are contrary to my experience, and it seems to others' experience as well. You claim things contrary to contemporary science. I do not want to discard your claims outright, and I do not want to personally attack you despite being given ample opportunity to do so; I'm asking you to give me something we can discuss and not turn this into "you're wrong because I have a different experience".
> Instead of maybe asking me to spoon feed you spend a whole of 20 secs googling.
I'm not asking you to spoon-feed me; I'm asking you to carry your own burden of proof. It's really shameful for a self-proclaimed academic to be offended by someone asking them to elaborate.
Now, could you explain what those links mean? The first one, for example, does not help your cause. Not only does it not concern GPT-4 but rather Bard, a model significantly less performant than even ChatGPT; it also claims that the model is not actually hallucinating but failing to understand sarcasm.
The second link also doesn't help your cause: rather than examining the generalization potential of a model, it suggests the issue is with the data. It also does not evaluate the newer problems as a whole, but only a subset.
The 3rd and 4th links also do not help your cause. First, they do not claim what you are claiming. Second, they list concerns (and I applaud them for at least elaborating a lot more than you), but they do not really test them. Rather than claims, they present hypotheses.
> “I don’t quite get it how works” + “it surprises me” ≠ it could maybe be sentient if I squint.
Yeah. Also note: "I don't quite get how it works" + "It doesn't satisfy my arbitrary criteria on generalization" ≠ It doesn't generalize
> after I acknowledged and corrected the mistake myself
I corrected your correction. It would be great if you could recognize that evaluating performance on a small subset of problems is not the same as evaluating whether the model aces anything.
> maybe you have some word quota you were trying to fulfill with that
Not at all. I just want to be very clear, given that I am criticising your (in)ability to clearly present arguments; doing otherwise would be hypocritical.
> My point is, it’s good at solving leetcode when it’s present in the training set.
Of course it is. However, your actual claim was this:
> Also the ones it does solve it solves at a really fast rate.
Your claim suggested that the speed at which it solves them is somehow relevant to which problems it solves correctly. This is demonstrably false, and that is what I corrected you on.
> Ps- also kindly refrain from passing remarks on my understanding of the subject when the only arguments you can make are refuting others without intellectual dissent.
I am not passing those remarks. You yourself claim you are not all that familiar with the topic. Some of your claims have cast doubt not only on your competence on the matter, but now even on the truthfulness of your experiences. For example, I have begun to doubt whether you have even used GPT-4, given your reluctance to provide your logs.
The argument I am making is that I don't have the same experience. And it's not only me... Note, however, that I am not confidently saying that I am right or that you are wrong; I am, first and foremost, asking you to provide the logs so we can check your claims, which for now are contrary to the general public's opinion. Then we can discuss what actually happened.
> It’s quite easy to say, “no I don’t believe u prove it” while also not being able to distinguish between Q K and V if it hit u on the face.
It's also quite easy to copy paste the logs that could save us from what has now turned into a debate (and might soon lead to a block if personal attacks continue), yet here we are.
So I ask you again - can you provide us with the logs that you experienced hallucination with?
EDIT since he (u/BellyDancerUrgot) downvoted and blocked me
> Empty vessels make much noise seems to be a quote u live by. I’ll let the readers of this thread determine who between us has contributed to the discussion and who writes extensively verbose commentary, ironically, with 0 content.
I think whoever reads this is going to be sad. Ultimately, I think you should make sure as few people as possible see this; this kind of approach brings shame not only to your academic career, but also to you as a person. You are young, though, so you will learn in time not to be overly enthusiastic.
Jaffa6 t1_jdq1rua wrote
Reply to comment by Vegetable-Skill-9700 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
It's worth noting that some models were designed according to this once it came out, and I believe it did have some impact in the community, but yeah, it wouldn't surprise me if it's still a problem.
Glad you liked it!
Vegetable-Skill-9700 OP t1_jdprcom wrote
Reply to comment by Readityesterday2 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Thanks for the encouraging comment! Do you have a current use-case where you feel you can leverage UpTrain?
Vegetable-Skill-9700 OP t1_jdpr474 wrote
Reply to comment by Jaffa6 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Thanks for sharing! It's a great read. I agree that most of the current models are most likely under-trained.
Vegetable-Skill-9700 OP t1_jdpr15o wrote
Reply to comment by cameldrv in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I agree that a 175B model will always perform better than a 6B model on general tasks, so maybe that is a great model for demos. But as you build a product on top of this model that is used in a certain way and satisfies a certain use case, wouldn't it make sense to use a smaller model and fine-tune it on the relevant dataset?
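For instance, here is a rough sketch of what that kind of task-specific fine-tuning could look like (assuming the Hugging Face transformers/peft/datasets stack; the base model name, dataset file and hyperparameters are placeholders), using LoRA adapters so only a small fraction of the weights are trained:

```python
# Rough sketch: fine-tune a small open model on a task-specific dataset with LoRA.
# Assumes Hugging Face transformers, peft and datasets; names/paths are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "EleutherAI/gpt-j-6b"  # placeholder ~6B base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with small trainable LoRA adapters instead of full fine-tuning.
# target_modules are model-specific; q_proj/v_proj match GPT-J's attention layers.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Placeholder dataset: one JSON line per example with a "text" field.
data = load_dataset("json", data_files="my_posts.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```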
FirstOrderCat t1_jdpqp5l wrote
Reply to comment by Vegetable-Skill-9700 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I don't know; it's hard to say whether you will be able to create a sufficient dataset for your case.
Vegetable-Skill-9700 OP t1_jdpqefg wrote
Reply to comment by FirstOrderCat in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
But do we really need all that info in most practical use cases? Say I am using an LM to write Reddit posts; it probably only needs to learn the subjects I write about along with my style of writing. Wouldn't a well-trained model on a highly refined dataset (one with high-quality examples of my posts) perform better than GPT-4?
elbiot t1_jdpgqoz wrote
Reply to comment by OraOraP in Using Stable Diffusion's training method for Reverse engineering? by OraOraP
I'm just talking about diffusion models in general and the concept of denoising. LLMs are what you would use: not the way you'd train a diffusion model, but the way you'd train an LLM.
OraOraP OP t1_jdpc3du wrote
Reply to comment by howtorewriteaname in Using Stable Diffusion's training method for Reverse engineering? by OraOraP
Just crawling open-source code and compiling it with the special compiler would produce a massive amount of training data, if the special compiler I mentioned in the post is easy to make.
AI-without-data OP t1_jdvfwff wrote
Reply to comment by thebruce87m in Training only Labelled Bbox for Object Detection. by AI-without-data
So do I need to label chairs in all the images?
OK, for example, there is the famous COCO dataset. Some objects, for example 'book', exist in some images but are not labeled (not in all of them; many images do have the 'book' object labeled). Yet people use the dataset for training and somehow detect the 'book' object well. I just want to know how they handle the unlabeled 'book' instances in some of the data.