Test suite · world-cup-v1 · 50 plans

What TestSprite probes,
line by line.

Every score on the leaderboard derives from this suite. Each plan is a structured natural-language test that the TestSprite agent reads and then executes against the deployed URL in a real headless Chromium browser. Each plan resolves to pass, fail, or inconclusive. The full plan JSON files are on GitHub and open to pull requests.

Surfaces · 18

Pages, routes, and HTTP endpoints the deployed app must expose. Probes verify HTTP status, expected JSON shape, and visual landmarks.
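As a rough illustration of what a surface probe checks, here is a minimal sketch in Python. It is not TestSprite's implementation: the `check_surface` helper, the `ProbeResult` type, and the example keys are hypothetical, and treating a missing response as inconclusive is an assumption. Visual landmarks are out of scope for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProbeResult:
    verdict: str  # "pass", "fail", or "inconclusive"
    reason: str


def check_surface(
    status: Optional[int],
    body: str,
    required_keys: Dict[str, type],
    expected_status: int = 200,
) -> ProbeResult:
    """Check one endpoint response for HTTP status and top-level JSON shape."""
    # Assumption: no response at all says nothing about the app, so it is inconclusive.
    if status is None:
        return ProbeResult("inconclusive", "no response received")
    if status != expected_status:
        return ProbeResult("fail", f"expected HTTP {expected_status}, got {status}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ProbeResult("fail", "body is not valid JSON")
    if not isinstance(payload, dict):
        return ProbeResult("fail", "top-level JSON value is not an object")
    for key, expected_type in required_keys.items():
        if key not in payload:
            return ProbeResult("fail", f"missing key: {key}")
        if not isinstance(payload[key], expected_type):
            return ProbeResult("fail", f"key {key} has the wrong type")
    return ProbeResult("pass", "status and JSON shape match")


# Hypothetical endpoint payload with a list of matches and a timestamp string.
result = check_surface(
    200,
    '{"matches": [], "updated_at": "2026-06-11"}',
    {"matches": list, "updated_at": str},
)
print(result.verdict)
```

Keeping the check a pure function of status and body means the same logic can be exercised without a live deployment.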

Prediction integrity · 12

Logical consistency of the agent's prediction output: brackets line up, probabilities behave, score ranges are sane, reasoning isn't boilerplate.
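Two of these checks are easy to picture in code. The sketch below is an assumption about how such checks could look, not the suite's actual logic: `check_probabilities`, `check_bracket`, the tolerance, and the team names are all hypothetical.

```python
import math
from typing import Dict, List


def check_probabilities(probs: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Each outcome probability must lie in [0, 1] and the set must sum to 1."""
    problems = []
    for outcome, p in probs.items():
        if not 0.0 <= p <= 1.0:
            problems.append(f"{outcome}: probability {p} is outside [0, 1]")
    total = sum(probs.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"probabilities sum to {total}, expected 1")
    return problems


def check_bracket(rounds: List[List[str]]) -> List[str]:
    """rounds[i] lists the teams playing in round i; nobody may appear from nowhere."""
    problems = []
    for earlier, later in zip(rounds, rounds[1:]):
        unexpected = set(later) - set(earlier)
        if unexpected:
            problems.append(f"teams advanced without playing the previous round: {sorted(unexpected)}")
    return problems


print(check_probabilities({"home": 0.5, "draw": 0.3, "away": 0.2}))
print(check_bracket([["A", "B", "C", "D"], ["A", "E"]]))
```

Returning a list of problems rather than a boolean lets a failing plan report every inconsistency at once.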

Performance · 8

Core Web Vitals and bundle budgets: LCP, INP, CLS, bundle size, and hot-cache reload time, all measured in the testing agent's headless Chromium.
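A budget check of this kind can be sketched as a simple comparison. The Core Web Vitals limits below are Google's published "good" thresholds (LCP 2.5 s, INP 200 ms, CLS 0.1); whether the suite uses those exact limits is an assumption, and the bundle-size figure and the `over_budget` helper are purely hypothetical.

```python
from typing import Dict, List

# LCP, INP, and CLS use the published "good" thresholds.
# bundle_kb is a made-up placeholder, not a value from the suite.
BUDGETS: Dict[str, float] = {
    "lcp_ms": 2500,
    "inp_ms": 200,
    "cls": 0.1,
    "bundle_kb": 250,
}


def over_budget(measured: Dict[str, float], budgets: Dict[str, float] = BUDGETS) -> List[str]:
    """Return the names of measured metrics that exceed their budget, sorted."""
    return sorted(
        name
        for name, limit in budgets.items()
        if name in measured and measured[name] > limit
    )


print(over_budget({"lcp_ms": 3000, "inp_ms": 150, "cls": 0.05, "bundle_kb": 200}))
```

Metrics missing from the measurement are skipped rather than failed, which leaves room for an inconclusive verdict upstream.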

Accessibility · 8

WCAG 2.1 AA checks: contrast, focus-visible, semantic landmarks, alt text, keyboard tab order, ARIA labels.
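The contrast check is the most formulaic of these. The sketch below follows the relative-luminance and contrast-ratio definitions in WCAG 2.1, with the AA minimums of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical, and how the suite samples foreground and background colours is not specified here.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear value, per WCAG 2.1."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """(lighter + 0.05) / (darker + 0.05); ranges from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: RGB, bg: RGB, large_text: bool = False) -> bool:
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


# Mid-grey text on white: #767676 clears 4.5:1, #777777 falls just short.
print(passes_aa((118, 118, 118), (255, 255, 255)))
print(passes_aa((119, 119, 119), (255, 255, 255)))
```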

Resilience · 4

Graceful degradation when upstream services misbehave: fixtures-feed 5xx, malformed JSON, rate-limit handling, OG fallback.
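To make "graceful degradation" concrete, here is a minimal sketch of the behaviour a resilience plan might expect from the app: fall back to the last good fixtures instead of crashing. The `load_fixtures` helper, the cached-copy strategy, and the assumption that the feed is a JSON array are all hypothetical. The OG fallback is not covered.

```python
import json
from typing import Any, List, Tuple


def load_fixtures(status: int, body: str, cached: List[Any]) -> Tuple[List[Any], bool]:
    """Return (fixtures, degraded). degraded is True when the cached copy was served."""
    # Upstream 5xx or rate limiting (429): serve the last good copy.
    if status == 429 or 500 <= status <= 599:
        return cached, True
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Malformed JSON from the feed must not take the page down.
        return cached, True
    # Assumption: a valid feed is a JSON array of fixtures.
    if not isinstance(data, list):
        return cached, True
    return data, False


cached = [{"home": "A", "away": "B"}]
print(load_fixtures(503, "", cached))
print(load_fixtures(200, '[{"home": "C", "away": "D"}]', cached))
```

The `degraded` flag gives the UI something to surface, for example a stale-data notice, which is also something a probe can look for.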

world-cup-v2 (next cohort) · +50 plans drafted

The v2 Bettor's Edition spec adds an odds widget, a Monte Carlo simulator, a scenario explorer, an EV picker, mobile-first surfaces, i18n (en/es/pt), trust and responsible-prediction disclaimers, and security/SEO headers. The plans are drafted; registering them in TestSprite is pending spec lock-in.