Page 21 of the specifications PDF reads:
A.188.8.131.52 Instance success criterion
The agent successfully solves a task instance if, for each input (not feedback) digit that is shown for the second time, third time, n-th time, it predicts the correct feedback digit. If the agent makes a mistake, the environment keeps presenting data from the current task instance until a timeout occurs or the agent finally solves the instance (although such solution is not considered a success).
I'm not sure what that last part means. The agent is bound to make some wrong guesses and won't get everything right straight away. My assumption is that there are two thresholds: a solving threshold and a later timeout threshold, both counted in guesses. If the agent hasn't solved the task instance by the solving threshold, it can keep trying until the timeout threshold is reached. If it solves the instance after the solving threshold but before the timeout, that still helps the agent (it should have learned how to solve future instances of the task), but that particular instance would not count as a success. Is that correct?
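To make my reading of the quoted criterion concrete, here is a minimal Python sketch. The function name and the tuple layout are my own invention, not anything from the specification, and it only covers the "repeated input must be answered correctly" part, not the timeout behaviour I'm asking about.

```python
def instance_succeeded(trials):
    """Check the quoted success criterion over one task instance.

    `trials` is a hypothetical list of tuples
    (input_digit, predicted_feedback, correct_feedback), in presentation order.
    """
    seen = set()
    for input_digit, predicted, correct in trials:
        # Only inputs shown for the second, third, ... time are judged.
        if input_digit in seen and predicted != correct:
            return False
        seen.add(input_digit)
    return True


# First exposures may be wrong; repeats must be right.
ok = instance_succeeded([(1, 0, 5), (2, 9, 7), (1, 5, 5), (2, 7, 7)])
bad = instance_succeeded([(1, 0, 5), (1, 4, 5), (1, 5, 5)])
```

Under this reading, `bad` corresponds to the case in the last sentence of the quote: the agent eventually answers correctly, but the instance has already been failed.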
When we submit our agents, they will communicate with the evaluation tester via ZeroMQ. If our agents connect to localhost (127.0.0.1), e.g. on port 5556, then a skilled programmer could have their agent analyze loads, processes and tmp files on the local machine to gain an unfair advantage, for example by detecting when a task or task instance has changed (or worse). Will the evaluation tester run on a different machine instead? If so, our agents will have to accept an IP address and port number as arguments when they are executed, right?
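For reference, this is roughly what I mean by accepting the address and port as arguments. It is only a sketch: the `--host` and `--port` flag names, the helper names and the `zmq.PAIR` socket type are my assumptions, since I don't know what the evaluation tester actually expects. It needs the pyzmq package for the connection step.

```python
import argparse


def parse_endpoint(argv=None):
    """Build a ZeroMQ TCP endpoint string from command-line arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical agent launcher")
    parser.add_argument("--host", default="127.0.0.1")  # assumed flag name
    parser.add_argument("--port", type=int, default=5556)  # assumed flag name
    args = parser.parse_args(argv)
    return f"tcp://{args.host}:{args.port}"


def connect(endpoint):
    """Open a socket to the tester; the PAIR socket type is an assumption."""
    import zmq  # pyzmq, imported here so argument parsing works without it

    context = zmq.Context()
    socket = context.socket(zmq.PAIR)
    socket.connect(endpoint)
    return context, socket
```

An agent started with `--host 10.0.0.7 --port 6000` would then call `connect(parse_endpoint())` and talk to `tcp://10.0.0.7:6000`; with no arguments it falls back to the localhost defaults mentioned above.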