Hi, cool project :)
I took a look at the evals and noticed that there are only 127 eval files. Furthermore, only 107 of them seem to pass the tests.
Would it be possible for you to post the rest of the eval files?
If not, a list of the instances you resolved would be great.
Thanks!