Dataset to Evaluate Coding Agents on Maintainability
Official Website · GitHub · Paper
Overview
NITR (Needle in the Repo) is a benchmark for evaluating coding agents on repository-grounded software engineering tasks. It emphasizes maintainability rather than surface-level patch completion alone.
Focus
The benchmark is organized around interpretable maintainability dimensions such as change locality, reuse, responsibility decomposition, extension structure, dependency control, testability, lifecycle management, and side-effect isolation.
Goal
The goal is to diagnose where coding agents succeed or fail when they must make changes that stay local, reusable, testable, and structurally coherent inside an existing codebase.
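As a rough illustration of what a per-dimension diagnosis could look like, the sketch below groups task outcomes by maintainability dimension and reports a pass rate for each. The dimension names come from the Focus section; the `(dimension, passed)` record shape, the function name `pass_rate_by_dimension`, and the sample outcomes are all hypothetical and are not the benchmark's actual output format or scoring method.

```python
from collections import defaultdict

# Dimension names taken from the Focus section above.
DIMENSIONS = (
    "change locality",
    "reuse",
    "responsibility decomposition",
    "extension structure",
    "dependency control",
    "testability",
    "lifecycle management",
    "side-effect isolation",
)


def pass_rate_by_dimension(results):
    """Return {dimension: fraction of tasks passed} for dimensions present in results.

    `results` is an iterable of (dimension, passed) pairs. This record shape is
    an assumption made for illustration, not the benchmark's real schema.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for dimension, ok in results:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        total[dimension] += 1
        passed[dimension] += bool(ok)
    # Keep the canonical dimension order and skip dimensions with no tasks.
    return {d: passed[d] / total[d] for d in DIMENSIONS if total[d]}


# Made-up outcomes, used only to exercise the function.
example = [
    ("reuse", True),
    ("reuse", False),
    ("testability", True),
    ("change locality", False),
]
rates = pass_rate_by_dimension(example)
for dimension, rate in rates.items():
    print(f"{dimension}: {rate:.0%}")
```

Reporting one rate per dimension, instead of a single aggregate score, keeps the result interpretable: a low rate on one dimension points to a specific kind of structural weakness in the agent's changes.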