
New SWE-bench submission #2

@EwoutH


Hi @theskcd!

Since you are currently leading the SWE-bench Lite leaderboard, I was curious whether you plan to submit a new version of Aide, perhaps to multiple leaderboards.

The SWE Bench Lite Analysis blog post noted:

Impossible questions
There are a set of impossible questions which cannot be solved by any framework or human. These involve questions where the error needs to be formatted exactly as the test.

SWE-bench Verified might help with that: all 500 of its tasks have been human-validated for quality. Aide might even become the first to break the 50% threshold and make history that way!
