In this topic, we summarize the questions about the Challenge tasks asked at the meetups, as well as ones we receive online.
If you don't find the answer you're looking for, feel free to ask your question here, and we will reply as soon as we can.
Q: Is there going to be a benchmark agent published?
A: There might be - we are considering building it.
Q: Does the agent know when the tasks switch?
A: No, the agent does not receive any extra information; it needs to figure this out by itself. We discussed this internally: it is possible to detect that a task has switched, for example by tracking statistics of the agent's own performance. In any case, building an agent that can recognize a task switch on its own is part of the challenge, and we're curious about the proposed solutions.
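To illustrate the statistics-based idea mentioned above, here is a minimal, hypothetical sketch (not an official approach): flag a likely task switch when the recent success rate drops well below the long-run rate. The window size and threshold are invented for illustration.

```python
from collections import deque

def make_switch_detector(window=20, drop_threshold=0.5):
    """Return an observer that flags a likely task switch.

    Hypothetical heuristic: compare the success rate over a sliding
    window with the overall success rate; a sharp drop suggests the
    environment moved to a new task. `window` and `drop_threshold`
    are illustrative values, not part of the Challenge spec.
    """
    recent = deque(maxlen=window)  # last `window` outcomes (1 = reward)
    history_sum = 0                # total successes seen so far
    history_len = 0                # total steps seen so far

    def observe(reward):
        nonlocal history_sum, history_len
        success = 1 if reward > 0 else 0
        recent.append(success)
        history_sum += success
        history_len += 1
        if history_len < 2 * window:
            return False  # not enough data to compare rates yet
        overall = history_sum / history_len
        windowed = sum(recent) / len(recent)
        return windowed < drop_threshold * overall

    return observe
```

A detector built this way would, for example, stay quiet while rewards keep arriving and fire shortly after they dry up.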
Q: What is the success criterion for the agent that indicates that it did not solve a task by randomness?
A: The agent should be able to solve more than one task instance; 10 successfully solved instances in a row indicate that it's not pure luck.
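The "N in a row" criterion can be expressed as a simple streak counter; a sketch, where `required` mirrors REQUIRED_CONSECUTIVE_REWARDS:

```python
def solved(rewards, required=10):
    """Return True once `required` consecutive positive rewards occur.

    Sketch of the success criterion described above; any non-positive
    reward resets the streak, so isolated lucky guesses don't count.
    """
    streak = 0
    for r in rewards:
        streak = streak + 1 if r > 0 else 0
        if streak >= required:
            return True
    return False
```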
Q: Will the evaluation curriculum also have tasks with a fixed order?
Q: Is the use of Azure optional?
A: Yes. We expect that participants will primarily use their own machines, turning to Azure if they lack resources or to test their solution on the evaluation platform before submitting it.
Q: What is the purpose of the semicolon in the training tasks?
A: The semicolon works as a delimiter; if an agent learns to recognize a delimiter, that can help it solve future tasks (for instance, evaluation tasks might also contain a delimiter, though not the same one as in the training tasks).
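One hypothetical way an agent could generalize beyond the semicolon is to guess the delimiter from the stream itself. The sketch below is an invented heuristic, not part of the Challenge: among a few punctuation candidates, pick the one that splits the stream into the most uniformly sized chunks.

```python
def guess_delimiter(stream, candidates=";,|:./"):
    """Guess which character acts as a delimiter in a task's stream.

    Hypothetical heuristic: a true delimiter tends to split the stream
    into many chunks of similar length, so we score each candidate by
    the variance of its chunk lengths and return the lowest.
    """
    best, best_score = None, float("inf")
    for c in candidates:
        parts = stream.split(c)
        if len(parts) < 3:
            continue  # character is too rare to be a delimiter
        lengths = [len(p) for p in parts]
        mean = sum(lengths) / len(lengths)
        variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
        if variance < best_score:
            best, best_score = c, variance
    return best
```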
Q: Can the agent access internet?
A: Not during the evaluation phase :-)
Q: Can the participants submit a trained agent?
A: Yes, they can pre-train their agent on the training tasks. In addition, they can use their own tasks or data sets.
Q: According to the README file that is available at https://github.com/general-ai-challenge/Round1 the default value for REQUIRED_CONSECUTIVE_REWARDS is 10. Can this value be less than 10 in the evaluation round?
Exploiting this exact value can be a big advantage for detecting a task-instance switch. If this value can change, the luckiest participant who happens to make the best guess may win, because we cannot submit multiple solutions that differ only in parameters.
A: We're planning to randomize REQUIRED_CONSECUTIVE_REWARDS between tasks in the evaluation phase - there will be no guarantee that any two tasks use the same value of REQUIRED_CONSECUTIVE_REWARDS.
The principle for detecting that an agent has learned to solve a task will remain the same, though - we will be watching for a number of consecutive successes.
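The effect of this randomization can be sketched as follows. Note that the range (5, 15) is an invented assumption purely for illustration; only the fact that the value varies between tasks comes from the answer above.

```python
import random

def evaluate(agent_rewards_per_task, seed=0):
    """Sketch of evaluation with a per-task randomized streak requirement.

    Each task draws its own REQUIRED_CONSECUTIVE_REWARDS, so an agent
    that hardcodes the default of 10 gains nothing. The (5, 15) bounds
    are a made-up assumption, not the official range.
    """
    rng = random.Random(seed)
    results = []
    for rewards in agent_rewards_per_task:
        required = rng.randint(5, 15)  # assumption: actual range unknown
        streak, learned = 0, False
        for r in rewards:
            streak = streak + 1 if r > 0 else 0
            if streak >= required:
                learned = True
                break
        results.append(learned)
    return results
```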