Submitted by No_Possibility_7588 t3_10gu68z in deeplearning
I predicted that a certain change in the architecture of my agents would boost their coordination (in the context of multi-agent reinforcement learning). However, when I tested this in the Meetup environment it did not work: the modified agents perform slightly worse than the baseline.

This is how the environment works: three agents must collectively choose one of K landmarks and congregate near it. At each time step, each agent receives a reward equal to the change in distance between itself and the landmark currently closest to all three agents. The goal landmark therefore changes depending on the agents' current positions. When all three agents are adjacent to the same landmark, the agents receive a bonus of 1 and the episode ends.
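In code terms, the per-step reward looks roughly like this (a simplified sketch of the description above, not the actual environment implementation; positions are 2-D points, "adjacent" means within a fixed radius, and all names and the radius are placeholders):

```python
import numpy as np

def meetup_step_reward(agent_pos, prev_pos, landmarks, adjacency_radius=1.0):
    """Toy sketch of the Meetup reward; names and radius are placeholders."""
    # Goal landmark: the one currently closest to all three agents,
    # taken here as the landmark with the smallest summed distance.
    totals = [sum(np.linalg.norm(a - lm) for a in agent_pos) for lm in landmarks]
    goal = landmarks[int(np.argmin(totals))]

    # Each agent is rewarded by how much closer it got to the goal landmark
    # since the previous step (positive when approaching, negative otherwise).
    rewards = [float(np.linalg.norm(p - goal) - np.linalg.norm(a - goal))
               for a, p in zip(agent_pos, prev_pos)]

    # When all three agents are adjacent to the goal, add the bonus and end the episode.
    done = all(np.linalg.norm(a - goal) <= adjacency_radius for a in agent_pos)
    if done:
        rewards = [r + 1.0 for r in rewards]
    return rewards, done
```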
Scientifically speaking, how can I be rigorous about testing this hypothesis again? A few ideas:
1 Repeat the experiment multiple times with different random seeds to ensure that the results are robust and not driven by random variation (see the sketch after this list).
2 Vary the parameters of the agent
- Vary the number of modules used in the policy and test the effect on coordination.
- Increase the number of agents
3 Vary the parameters of the environment
- Change the number of landmarks
- Add distractors
4 Test another environment
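For idea 1, a minimal sketch of what a seed-robust comparison could look like, assuming a `train_and_evaluate(config, seed)` helper that returns a scalar score (that function is hypothetical, not something from my setup):

```python
import numpy as np
from scipy import stats

def compare_to_baseline(train_and_evaluate, new_config, baseline_config, seeds=range(10)):
    """Run both configurations over several seeds and test whether the gap is real."""
    new = np.array([train_and_evaluate(new_config, seed) for seed in seeds])
    base = np.array([train_and_evaluate(baseline_config, seed) for seed in seeds])

    print(f"new:      {new.mean():.3f} +/- {new.std(ddof=1):.3f}")
    print(f"baseline: {base.mean():.3f} +/- {base.std(ddof=1):.3f}")

    # Welch's t-test: is the difference larger than seed-to-seed noise explains?
    t, p = stats.ttest_ind(new, base, equal_var=False)
    print(f"Welch t = {t:.2f}, p = {p:.3f}")
    return new, base
```

Reporting mean +/- std over seeds plus a significance test would make "slightly worse" quantifiable, and the same loop could be wrapped in a sweep over the number of agents or landmarks for ideas 2 and 3.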
What do you think?
-
suflaj t1_j54vkhl wrote
Well, at a minimum you would need to explain why it didn't meet your expectations. Elaborate on what grounds you formed your hypothesis, and either why that basis turned out to be wrong or what happened that you didn't predict.
I, for example, also assumed that vision transformers would get better performance on the task I had. When they didn't (they were sometimes outperformed by YOLO v1), I investigated why, laid out evidence that it was not human error (aside from my judgement), and made suggestions on how to proceed. To do that I reran the experiment many times, changed hyperparameters, and swapped out detectors, all to narrow down that it wasn't inadequate settings but the architecture and the specific model themselves.
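In practice that kind of narrowing down is just a grid over everything except the thing you're testing. A rough sketch, assuming a hypothetical `run_experiment(arch, lr, seed)` helper that returns a score:

```python
import itertools

def ablation_grid(run_experiment, archs=("baseline", "modified"),
                  lrs=(1e-4, 3e-4, 1e-3), seeds=range(5)):
    """Rerun the comparison over hyperparameters and seeds so only the architecture varies."""
    results = {}
    for arch, lr, seed in itertools.product(archs, lrs, seeds):
        results[(arch, lr, seed)] = run_experiment(arch=arch, lr=lr, seed=seed)
    # If the modified architecture loses across all learning rates and seeds,
    # inadequate settings become an unlikely explanation for the gap.
    return results
```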