About SM-100

The SM-100 benchmark is an evaluation framework designed to assess how well software agents can navigate complex codebases and identify real bugs. It provides a standardized way to measure the effectiveness of AI-powered software engineering tools on real-world codebases.

What is SM-100?

SM-100 evaluates software agents along several dimensions: detecting bugs without any hints (a needle-in-the-haystack search across the codebase), identifying bugs when given the specific PR or commit that introduced them, and remediating the issues they discover.
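
To make these dimensions concrete, below is a minimal, hypothetical sketch of how a single benchmark case might be represented and scored. The class names, fields, and scoring stub are illustrative assumptions, not SM-100's actual data format, harness, or scoring method.

```python
# Hypothetical illustration only: this structure is an assumption,
# not SM-100's actual data format or API.
from dataclasses import dataclass
from enum import Enum


class EvaluationMode(Enum):
    """The three dimensions described above (names are illustrative)."""
    NEEDLE_IN_HAYSTACK = "find_bug_without_context"  # no hints, whole-repo search
    PR_CONTEXT = "find_bug_given_commit"             # agent sees the introducing PR/commit
    REMEDIATION = "fix_discovered_bug"               # agent must propose a fix


@dataclass
class BenchmarkCase:
    """One known bug in one repository, evaluated under a given mode."""
    repo_url: str          # open-source repository containing the bug
    commit_sha: str        # commit at which the bug is present
    bug_description: str   # ground-truth description used for scoring
    mode: EvaluationMode   # which of the three dimensions is being tested


def score_case(case: BenchmarkCase, agent_report: str) -> bool:
    """Toy scoring stub: did the agent's report mention the ground-truth bug?

    Real scoring would be more involved; this only illustrates the idea of
    checking an agent's output against a known, annotated bug.
    """
    return case.bug_description.lower() in agent_report.lower()
```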

Learn More

For an overview of the SM-100 benchmark, watch our presentation at the AI Engineer World's Fair.

Get Involved

Have questions about the benchmark or want to contribute? Visit our GitHub repository to ask a question, report an issue, or submit improvements to help make SM-100 even better.