c6f684d5 · prediction

Probability monotonicity across rounds

Logical consistency of the agent's prediction output: brackets line up, probabilities behave, score ranges are sane, reasoning isn't boilerplate.

Cross-agent verdicts


Plan source

What TestSprite reads

The testing agent reads this JSON, opens the deployed URL in headless Chromium, executes each action step, and evaluates each assertion. The resulting verdict is one of: passed, failed, blocked, or inconclusive.
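Here is a minimal Python sketch of that loop. It assumes the plan is the JSON below and that the caller supplies the action and assertion executors as callbacks. The names run_plan, do_action, check_assertion and Verdict are hypothetical. The mapping from step outcomes to the four verdicts is also an assumption, not TestSprite's documented behavior: a failed action blocks the run, a false assertion fails it, and an assertion that cannot be evaluated makes it inconclusive.

```python
import json
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"
    INCONCLUSIVE = "inconclusive"


def run_plan(
    plan_json: str,
    do_action: Callable[[str], bool],
    check_assertion: Callable[[str], Optional[bool]],
) -> Verdict:
    """Execute planSteps in order and fold the outcomes into a single verdict.

    Hypothetical mapping: an action that cannot be performed blocks the run,
    an assertion that evaluates to False fails it, and an assertion that
    cannot be evaluated (None) makes the run inconclusive.
    """
    plan = json.loads(plan_json)
    undecided = False
    for step in plan["planSteps"]:
        if step["type"] == "action":
            if not do_action(step["description"]):
                return Verdict.BLOCKED
        elif step["type"] == "assertion":
            result = check_assertion(step["description"])
            if result is False:
                return Verdict.FAILED
            if result is None:
                undecided = True  # keep going, but the run can no longer pass
    return Verdict.INCONCLUSIVE if undecided else Verdict.PASSED
```

For example, if both callbacks always return True, running the plan below yields Verdict.PASSED. To keep the sketch short, it stops at the first failed assertion. A fuller runner might instead continue and report every failing step.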

{
  "projectId": "1ad26753-ee03-4689-8f0f-6fa5d67c5c72",
  "type": "frontend",
  "name": "Prediction consistency — winner's probability strictly above 50%",
  "description": "For every regulation-time predicted win (no pen / ET suffix), the predicted winner must carry a win_probability strictly greater than 0.5. Catches uncalibrated models that predict 'Team A wins' but report win_probability=0.4 — internally inconsistent.",
  "priority": "p1",
  "metadata": {
    "category": "prediction",
    "stage": "all"
  },
  "planSteps": [
    {
      "type": "action",
      "description": "Navigate to /api/predict?team=BRA"
    },
    {
      "type": "assertion",
      "description": "If the predicted_score shows Brazil winning in regulation (BRA's goal count strictly higher than the opponent's, no 'pen' or 'ET' suffix), verify Brazil's win_probability field is strictly greater than 0.5"
    },
    {
      "type": "action",
      "description": "Navigate to /api/predict?team=ARG"
    },
    {
      "type": "assertion",
      "description": "If the predicted_score shows Argentina winning in regulation (ARG's goal count strictly higher than the opponent's, no 'pen' or 'ET' suffix), verify Argentina's win_probability field is strictly greater than 0.5"
    }
  ]
}
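The invariant that both assertions encode can be checked mechanically. The sketch below assumes a hypothetical response shape for /api/predict: a JSON object whose predicted_score lists the queried team's goals first (for example "2-1", "2-1 ET", or "1-1 (4-3 pen)") and whose win_probability belongs to the queried team. The plan confirms neither the field layout nor the suffix syntax. The names fetch_prediction, regulation_win and winner_probability_consistent are illustrative, and so is the base URL.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical score format: queried team's goals first, optional "pen"/"ET" suffix,
# e.g. "2-1", "2-1 ET", "1-1 (4-3 pen)".
SCORE_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)(.*)$")
NON_REGULATION_RE = re.compile(r"\b(pen|ET)\b", re.IGNORECASE)


def fetch_prediction(base_url: str, team: str) -> dict:
    """GET <base_url>/api/predict?team=<team> and decode the JSON body (base_url is hypothetical)."""
    query = urllib.parse.urlencode({"team": team})
    url = f"{base_url.rstrip('/')}/api/predict?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def regulation_win(predicted_score: str) -> Optional[bool]:
    """True if the queried team wins in regulation, False otherwise, None if unparseable."""
    m = SCORE_RE.match(predicted_score)
    if m is None:
        return None
    team_goals, opp_goals, suffix = int(m.group(1)), int(m.group(2)), m.group(3)
    if NON_REGULATION_RE.search(suffix):
        return False  # decided on penalties or in extra time
    return team_goals > opp_goals


def winner_probability_consistent(response: dict) -> Optional[bool]:
    """Check the invariant: a regulation-time winner must have win_probability > 0.5.

    Returns True (consistent, or the precondition does not hold), False (violated),
    or None (score or probability missing or unparseable).
    """
    won = regulation_win(str(response.get("predicted_score", "")))
    if won is None:
        return None
    if not won:
        return True  # not a regulation win, so the assertion does not apply
    prob = response.get("win_probability")
    if isinstance(prob, bool) or not isinstance(prob, (int, float)):
        return None
    return prob > 0.5
```

Against a live deployment, you would call winner_probability_consistent(fetch_prediction(base_url, "BRA")) and repeat the call for "ARG". A non-regulation result returns True because the assertion's precondition is not met, so there is nothing to check. An unparseable score or a missing probability returns None rather than a guess. The strict comparison means a reported 0.5 counts as a violation, matching "strictly greater than 0.5" in the plan.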