Think_Olive_1000

Think_Olive_1000 t1_j17ndks wrote

https://openai.com/blog/faulty-reward-functions/

First result I get when I google "reinforcement learning short circuit".

Pretty well known issue breh

>The RL agent finds an isolated lagoon where it can turn in a large circle and repeatedly knock over three targets, timing its movement so as to always knock over the targets just as they repopulate. Despite repeatedly catching on fire, crashing into other boats, and going the wrong way on the track, our agent manages to achieve a higher score using this strategy than is possible by completing the course in the normal way. Our agent achieves a score on average 20 percent higher than that achieved by human players.

It's short-circuiting its reward function. You'd be amazed how many words there are to describe something going faulty. "Short circuit" seemed appropriate, and it fits what's happening here.
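The boat-race exploit above can be sketched as a toy simulation. Everything here is made up for illustration (the reward values, the step budget, the respawn timing); the point is just that when a proxy reward pays per target and targets respawn, a score-greedy policy farms the loop instead of finishing the course:

```python
# Toy sketch of reward hacking. The proxy reward (points per target knocked
# over) diverges from the true goal (finishing the race). All numbers are
# hypothetical, chosen so the exploit pays more than honest play.

FINISH_REWARD = 1000      # true-goal payoff, earned once at the finish line
TARGET_REWARD = 40        # proxy payoff, earned every time a target respawns

def finish_course(steps=100):
    """Honest policy: spend the whole episode racing to the finish."""
    return FINISH_REWARD

def farm_targets(steps=100, respawn_every=2):
    """Degenerate policy: circle the lagoon hitting respawning targets."""
    return (steps // respawn_every) * TARGET_REWARD

honest = finish_course()
hacked = farm_targets()
print(honest, hacked)   # 1000 2000
```

An optimizer that only sees the score will prefer `farm_targets` every time, which is the whole failure mode: the reward function, not the agent, is what's broken.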

5

Think_Olive_1000 t1_iyr4e8f wrote

Reply to this sub by TinyBurbz

Idgaf what artists want, need, think, do. If your ability to draw is all that defines you then that's real tough pal.

3

Think_Olive_1000 t1_iyr3nj3 wrote

I one hundred percent agree: it cannot be our way of getting to AGI or ASI or anything that can remotely reason intelligently. But it can and will be useful for a lot of applications, and in lots of ways it's more useful than Google because it can somewhat understand the context of what I'm talking about. I've used it to debug code, the very code it itself generated, purely through back-and-forth conversation with my local IDE open to run it. I only hope it gets better at this companion-type role, because sometimes it kind of sucks even at that. I'll be happy if we can get that far.

1

Think_Olive_1000 t1_ixmm3a2 wrote

Yes, but how well it works will be limited by whether you can find and exploit a similarity between the tasks.

Tangentially related: when OpenAI was training their speech recognition model Whisper, they found that training it to also perform translation inexplicably improved its performance on plain English transcription.
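The mechanism usually credited for that kind of gain is a shared representation trained by multiple task losses. A minimal sketch of the structure, with made-up linear "encoder" and per-task heads (nothing here is Whisper's actual architecture), looks like this: both task losses push gradients through the same shared weights.

```python
# Minimal multitask-training sketch: one shared encoder, two task heads.
# All shapes, data, and names are hypothetical; this only illustrates
# how two objectives update the same shared parameters.
import numpy as np

rng = np.random.default_rng(0)

W_shared = rng.normal(size=(4, 4))          # shared "encoder" weights
head_transcribe = rng.normal(size=(4, 1))   # task-specific head 1
head_translate = rng.normal(size=(4, 1))    # task-specific head 2

def loss(W, head, x, y):
    pred = x @ W @ head
    return float(np.mean((pred - y) ** 2))

def grad_shared(W, head, x, y):
    # analytic gradient of the MSE loss w.r.t. the shared weights W
    err = x @ W @ head - y
    return 2 * x.T @ err @ head.T / len(x)

x = rng.normal(size=(32, 4))
y1 = x @ rng.normal(size=(4, 1))   # synthetic "transcription" targets
y2 = x @ rng.normal(size=(4, 1))   # synthetic "translation" targets

total_before = loss(W_shared, head_transcribe, x, y1) + \
               loss(W_shared, head_translate, x, y2)

for _ in range(300):
    # both tasks' gradients flow into the one shared weight matrix
    W_shared -= 0.005 * (grad_shared(W_shared, head_transcribe, x, y1)
                         + grad_shared(W_shared, head_translate, x, y2))

total_after = loss(W_shared, head_transcribe, x, y1) + \
              loss(W_shared, head_translate, x, y2)
```

Because the second task shapes the same `W_shared`, it can act as a regularizer or extra supervision for the first, which is one plausible reading of the Whisper result.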

1

Think_Olive_1000 t1_is6e8p7 wrote

You can arrange rocks on a beach to get Turing completeness, but that doesn't mean moving them around will ever make them sentient. Sure, the rocks can perform arbitrary computation, but they never form a cohesive experiencing machine, or something that can simulate a reality of any kind. When you move bits around inside a PC, it's exactly the same.

https://xkcd.com/505/
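The "rocks can compute anything" claim can be made concrete with a hypothetical pebble computer: a rock sitting in the left spot is 0, the right spot is 1, and one mechanical rock-moving rule, NAND, is enough for universal computation, since every Boolean function can be built from NAND alone. Here XOR is assembled from nothing but that rule:

```python
# Hypothetical pebble computer: a bit is a rock in the "left" (0) or
# "right" (1) spot. NAND is functionally complete, so this one
# rock-moving rule suffices to build any circuit.

def nand(a, b):
    """The single rule: output rock goes right unless both inputs are right."""
    return 0 if (a == 1 and b == 1) else 1

def xor(a, b):
    # the standard XOR-from-four-NANDs construction
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

print([xor(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Which is exactly the xkcd point: the computation is substrate-independent, and that's precisely why computing, by itself, says nothing about experience.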

0