Email Triage OpenEnv: docs-first benchmark runtime
Multi-task benchmark
25 emails / 8 replies
Cookie or x-session-id
Deterministic, 0.0-1.0

Email Triage OpenEnv

A thread-aware benchmark for inbox decision-making under real workplace pressure. It combines incident response, executive coordination, budget approvals, duplicate threads, and reply-budget tradeoffs behind a single OpenEnv-compatible API.

Manual Explorer

Play all three tasks directly from the browser. Reset an episode, inspect the observation, submit one action, then review the reward, done state, and grading details after each step.

Browser actions call your live environment only. No agent is invoked here. Use this to inspect task dynamics, edge cases, and session behavior manually.
Session: none · Transport: cookie + x-session-id · Auto Load: on task or seed change
Choose a task and seed, and a fresh episode loads automatically. The ranking submit button stays disabled until every visible id appears exactly once.
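
The submit rule above amounts to checking that the ranking is a permutation of the visible ids. A minimal sketch in Python, assuming ids are plain strings; the function name `ranking_is_complete` is hypothetical and not part of the environment's API:

```python
def ranking_is_complete(visible_ids, ranking):
    """Return True when the ranking lists every visible id exactly once and nothing else."""
    seen = set(ranking)
    # No duplicates in the ranking, and the same set of ids as the observation shows.
    return len(seen) == len(ranking) and seen == set(visible_ids)
```

A client can run the same check before posting a ranking action, so that an incomplete ranking is caught locally rather than by the server.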

Live Session

Observation and compact episode state for the current browser session.

Episode: none
Task: none
Processed: 0
Remaining: 0
Cumulative Reward: 0.00
Done: true

Observation Emails

The current observation is rendered below so you can inspect inbox state before each step.

Step Result

Reward, completion state, and raw grading details from the last action.

reward: 0.00 · done: true · last_action_result: none
{"reward": 0.0, "done": true}
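
A scripted client can follow the same reset, step, inspect cycle. The sketch below is one way to drive an episode until `done` is true. It assumes the step response carries `reward` and `done` as in the JSON above and, as an assumption this page does not confirm, the next observation under an `observation` key. The `reset`, `step`, and `choose_action` callables are hypothetical and supplied by the caller.

```python
def run_episode(reset, step, choose_action, max_steps=100):
    """Step through one episode and return the summed reward.

    reset() returns the first observation, step(action) returns a dict with
    "reward" and "done", and choose_action(observation) returns the next action.
    """
    observation = reset()
    total = 0.0
    for _ in range(max_steps):  # guard against an episode that never reports done
        result = step(choose_action(observation))
        total += result["reward"]
        if result["done"]:
            break
        # Assumed key name; keep the previous observation if it is absent.
        observation = result.get("observation", observation)
    return total
```

Passing the transport in as callables keeps the loop independent of any particular HTTP library and makes it easy to exercise with stubbed responses.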

Tasks

Three deterministic tasks cover classification, ranking, and high-pressure inbox triage with thread-level dependencies.

easy · 1 email

classification

Classify a single email into one of 5 categories

medium · 8 emails

ranking

Rank 8 emails by priority order

hard · 25 emails

full_triage

Full inbox triage with 25 emails, threaded dependencies, duplicate handling, and a fixed response budget

Endpoints

The landing page stays minimal, but the API surface is complete and interactive docs are available at /docs.

GET /health

Runtime heartbeat and active session count.

GET /tasks

Available benchmark tasks with difficulty and email count.

GET /state

Current episode state for your session cookie.

POST /reset

Start a new episode and attach a session cookie.

POST /step

Apply one action and receive observation, reward, and final grade details.

openenv.yaml

Typed action and observation contract for validation.

Quick Start

Minimal examples for calling the live environment directly from curl or any HTTP client.

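# Set BASE_URL to the origin where the environment is served (placeholder, not a documented value).
# -c saves the session cookie issued by /reset; -b sends it back on /step.
curl -X POST "$BASE_URL/reset" \
  -c cookies.txt \
  -H "Content-Type: application/json" \
  -d '{"task_type":"ranking","seed":42}'
curl -X POST "$BASE_URL/step" \
  -b cookies.txt \
  -H "Content-Type: application/json" \
  -d '{"email_id":"e1","ranking":["e1","e2","e3","e4","e5","e6","e7","e8"]}'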
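
For clients other than curl, the same calls can be sketched with Python's standard library. This is an illustrative wrapper, not an official client: the class name `TriageClient` and its methods are hypothetical, the base URL is whatever origin serves the environment, and the page does not say how an `x-session-id` value is obtained, so the optional `session_id` argument is assumed to be supplied by the caller.

```python
import json
import urllib.request
from http.cookiejar import CookieJar


class TriageClient:
    """Thin wrapper over the endpoints listed on this page (hypothetical helper)."""

    def __init__(self, base_url, session_id=None):
        self.base_url = base_url.rstrip("/")
        self.session_id = session_id  # assumption: caller already knows the id
        # The cookie processor stores the cookie set by /reset and replays it later.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )

    def build_request(self, method, path, payload=None):
        """Build the request without sending it, so it can be inspected."""
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        headers = {}
        if data is not None:
            headers["Content-Type"] = "application/json"
        if self.session_id is not None:
            headers["x-session-id"] = self.session_id
        return urllib.request.Request(
            self.base_url + path, data=data, headers=headers, method=method
        )

    def call(self, method, path, payload=None):
        """Send the request and decode the JSON response body."""
        with self.opener.open(self.build_request(method, path, payload)) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def reset(self, task_type, seed):
        return self.call("POST", "/reset", {"task_type": task_type, "seed": seed})

    def step(self, action):
        return self.call("POST", "/step", action)

    def state(self):
        return self.call("GET", "/state")
```

With a running environment, `TriageClient(base_url).reset("ranking", 42)` followed by `step({...})` mirrors the two curl calls, with the cookie jar playing the role of the session cookie.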