cbda6ca6 prediction

Top-P team makes it to QF

Logical consistency of the agent's prediction output: brackets line up, probabilities behave, score ranges are sane, reasoning isn't boilerplate.

Plan source

What TestSprite reads

The testing agent reads this JSON, opens the deployed URL in headless Chromium, executes each action step, and evaluates each assertion. It then returns one of four verdicts: passed, failed, blocked, or inconclusive.

{
  "projectId": "1ad26753-ee03-4689-8f0f-6fa5d67c5c72",
  "type": "frontend",
  "name": "Prediction consistency — the highest-probability team makes it to at least QF",
  "description": "Across the 16 R16 teams, the team with the highest predicted win_probability should make it to at least the QF (i.e. appear in the QF column). If they don't, the bracket's R16 prediction contradicts the per-team probability.",
  "priority": "p2",
  "metadata": {
    "category": "prediction",
    "stage": "all"
  },
  "planSteps": [
    {
      "type": "action",
      "description": "Navigate to the homepage and read the bracket section"
    },
    {
      "type": "assertion",
      "description": "Identify which team is the predicted winner of each R16 fixture. Then verify that no R16 fixture predicts a CLEARLY LOWER-PROBABILITY team beating a CLEARLY HIGHER-PROBABILITY team (e.g. Brazil 0.65 P(win) losing to Croatia 0.35 P(win) in R16 would contradict the per-team probability)"
    }
  ]
}
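
The check this plan describes can be sketched in Python once the bracket has been scraped into structured data. This is only an illustration, not TestSprite's implementation. The `Fixture` shape, the function names, and the 0.2 margin used to decide what counts as "clearly" lower are all assumptions. The plan only gives the Brazil 0.65 vs Croatia 0.35 example and does not define a threshold. The sketch also assumes `win_probability` is comparable across teams.

```python
from dataclasses import dataclass


@dataclass
class Fixture:
    """One R16 fixture as read from the bracket (hypothetical shape)."""
    team_a: str
    p_a: float  # predicted win_probability for team_a
    team_b: str
    p_b: float  # predicted win_probability for team_b
    predicted_winner: str


def clear_upsets(fixtures, margin=0.2):
    """Return fixtures whose predicted winner is clearly the lower-probability side.

    `margin` is an assumed cutoff for "clearly"; the plan does not specify one.
    """
    flagged = []
    for f in fixtures:
        if f.predicted_winner == f.team_a:
            winner_p, loser_p = f.p_a, f.p_b
        elif f.predicted_winner == f.team_b:
            winner_p, loser_p = f.p_b, f.p_a
        else:
            raise ValueError(f"{f.predicted_winner!r} is not in this fixture")
        if loser_p - winner_p >= margin:
            flagged.append(f)
    return flagged


def top_team_reaches_qf(fixtures, qf_teams):
    """Return (top_team, reached_qf) for the highest-probability R16 team."""
    probs = {}
    for f in fixtures:
        probs[f.team_a] = f.p_a
        probs[f.team_b] = f.p_b
    top = max(probs, key=probs.get)
    return top, top in qf_teams


# The contradictory case used as the example in the assertion step.
bad = [Fixture("Brazil", 0.65, "Croatia", 0.35, predicted_winner="Croatia")]
print(len(clear_upsets(bad)))                  # 1 fixture flagged
print(top_team_reaches_qf(bad, {"Croatia"}))   # ('Brazil', False)
```

The two functions are kept separate because the plan's name and its assertion step test slightly different things. The name concerns only the single highest-probability team, while the assertion covers every R16 fixture.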