deerflow2

Xinmin Zeng 8d2e55a05f fix(subagent): structured subagent_status field over text parsing (#3146) (#3154)

* fix(subagent): structured subagent_status field over text parsing

Closes #3146.

## Why

The frontend used to derive subtask card state by string-matching the leading text of the `task` tool's result. That contract surface was fragile: `#3107` BUG-007 and the `#3131` review both surfaced cases where new backend wording (`Task cancelled by user.`, `Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware` exception wrappers) silently broke the card lifecycle. The frontend fallback kept growing more prefixes; any future rewording would break it again.

## Design

1. Backend → frontend contract: `ToolMessage.additional_kwargs` carries `subagent_status` (one of `completed | failed | cancelled | timed_out | polling_timed_out`) and an optional `subagent_error` blob. The frontend prefers it over parsing `content`.
2. Centralised stamping, not 8 sprinkled stamps: rather than have each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:` paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware` stamps the field after every task-tool call. Adding a new return path in `task_tool.py` cannot now skip the stamp.
3. Cross-language contract fixture: the prefix→status mapping is the one piece both sides must agree on. The shared fixture at `contracts/subagent_status_contract.json` lists every backend return string, the expected status, and what the error substring should contain. Backend test (`backend/tests/test_subagent_status_contract.py`) and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`) both load that fixture and assert the same cases. A wording drift on either side fails the matching language's test.
4. Round-trip serialisation pinned: the round-trip test asserts `ToolMessage.model_dump_json()` → `model_validate_json()` preserves `additional_kwargs.subagent_status`. Catches the case where a future LangChain or Pydantic upgrade silently strips unknown kwargs.
5. Frontend status collapse documented: the backend has five status values, the frontend card has three (`completed | failed | in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all collapse to `failed` with the original status preserved in `error`. `parseSubtaskResult` returns `in_progress` for unknown values so a backend that ships a new enum variant before the frontend upgrades degrades to the legacy prefix fallback instead of getting pinned.

## Changes

Backend:
- `deerflow.subagents.status_contract`: new module exporting `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`, `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and `make_subagent_additional_kwargs(status, error)`.
- `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status` helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call` stamp on the success path; `_build_error_message` stamps on the wrapper path (carrying `ExcClass: detail` into `subagent_error`). Non-task tools are untouched.
- New tests: `test_subagent_status_contract.py` (19 cases from the shared fixture + status-enum / blank-error / unknown-status rejection) and `test_tool_error_handling_subagent_stamp.py` (middleware integration: terminal-content stamps, non-terminal doesn't, non-task tools untouched, async path mirrors sync, existing additional_kwargs survive, JSON round-trip preserved).

Frontend:
- `parseSubtaskResult(text, additionalKwargs?)`: prefers the structured stamp; falls back to the legacy prefix matcher for historical threads / unknown future status values.
- `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse.
- `message-list.tsx` passes `message.additional_kwargs` through.
- `subtask-result.test.ts` adds a structured-status block + a fixture-driven contract block; legacy prefix tests stay green for the fallback path.

Contract:
- `contracts/subagent_status_contract.json`: single source of truth both languages load. Whitespace variants, varied N for polling timeouts, the 3 pre-execution `Error:` returns task_tool produces, and the middleware wrapper shape are all in there.

## Test plan

- `make lint` clean (backend + frontend).
- `pytest tests/test_subagent_status_contract.py tests/test_tool_error_handling_subagent_stamp.py` → 37 passed.
- `pnpm test --run` → 103 passed (was 76, +27 new).

## Migration / fallback retirement

The text-prefix fallback stays in place until backend telemetry shows the frontend never hits it for newly produced messages. At that point a follow-up PR can drop the prefix branches and keep only the structured-status branch.

Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior prefix-only fix), #3146 (this issue).

* fix(subtask): back-fill result/error from text when structured status present

Three follow-ups on the PR #3154 review:

1. `readStructuredStatus` no longer short-circuits the prefix parse. The backend currently stamps only the `subagent_status` enum value; the human-facing `result` body and wrapped-error message still live in `ToolMessage.content`. Dropping the text parse meant successful tasks rendered empty completed pills and wrapped failures lost their diagnostic. Now both shapes get composed: structured status wins, `result`/`error` come from text when both sides agree, and a lying success body under a `failed` stamp is dropped instead of leaking.
2. Replace the ESM-incompatible `__dirname` fixture lookup in subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`. The frontend package is `"type": "module"`, so the previous path would have thrown at runtime if anything ever changed under the contract directory.
3. Drop the `$schema` reference from contracts/subagent_status_contract.json pointing at a file that doesn't exist in the tree.

Three new tests cover the structured + text composition: completed back-fills the success body, failed back-fills the wrapper text, and unrecognised content under a `failed` stamp stays empty rather than echoing noise.

2026-06-07 22:49:55 +08:00
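The stamping contract described in the commit message could look roughly like the sketch below. It is not the code in `deerflow.subagents.status_contract`: the exported names and the five status values are taken from the commit message, but the prefix table, the choice to copy the content into `subagent_error`, and the helpers `stamp_additional_kwargs` and `card_status` are assumptions made for illustration. Plain dicts stand in for `ToolMessage.additional_kwargs`, and the frontend collapse is ported to Python only to show the mapping.

```python
from __future__ import annotations

import re

# Key names and the five status values are quoted in the commit message.
SUBAGENT_STATUS_KEY = "subagent_status"
SUBAGENT_ERROR_KEY = "subagent_error"
SUBAGENT_STATUS_VALUES = frozenset(
    {"completed", "failed", "cancelled", "timed_out", "polling_timed_out"}
)

# ASSUMPTION: illustrative prefix table. Only "Task cancelled by user.",
# "Task polling timed out after N minutes" and the "Error:" prefix appear in
# the commit message; the other wordings are hypothetical stand-ins.
_PREFIX_RULES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"Task cancelled by user\."), "cancelled"),
    (re.compile(r"Task polling timed out after \d+ minutes"), "polling_timed_out"),
    (re.compile(r"Task timed out"), "timed_out"),  # hypothetical wording
    (re.compile(r"Task succeeded"), "completed"),  # hypothetical wording
    (re.compile(r"Task failed"), "failed"),  # hypothetical wording
    (re.compile(r"Error:"), "failed"),
]


def extract_subagent_status(content: str) -> str | None:
    """Return the terminal status implied by the leading text, else None."""
    text = content.lstrip()  # tolerate leading-whitespace variants
    for pattern, status in _PREFIX_RULES:
        if pattern.match(text):
            return status
    return None  # non-terminal content stays unstamped


def make_subagent_additional_kwargs(status: str, error: str | None = None) -> dict[str, str]:
    """Build the kwargs fragment, rejecting unknown statuses and blank errors."""
    if status not in SUBAGENT_STATUS_VALUES:
        raise ValueError(f"unknown subagent status: {status!r}")
    kwargs = {SUBAGENT_STATUS_KEY: status}
    if error is not None:
        if not error.strip():
            raise ValueError("subagent error must not be blank")
        kwargs[SUBAGENT_ERROR_KEY] = error
    return kwargs


def stamp_additional_kwargs(content: str, existing: dict | None = None) -> dict:
    """Hypothetical central stamp: merge the status into existing kwargs."""
    merged = dict(existing or {})  # existing keys survive the stamp
    status = extract_subagent_status(content)
    if status is not None:
        # ASSUMPTION: non-success content doubles as the error blob.
        error = content.strip() if status != "completed" else None
        merged.update(make_subagent_additional_kwargs(status, error))
    return merged


# Five backend statuses collapse to the card's terminal states.
STRUCTURED_STATUS_TO_SUBTASK = {
    "completed": "completed",
    "failed": "failed",
    "cancelled": "failed",
    "timed_out": "failed",
    "polling_timed_out": "failed",
}


def card_status(additional_kwargs: dict) -> str | None:
    """Return the card state, or None so the caller can fall back to prefixes."""
    return STRUCTURED_STATUS_TO_SUBTASK.get(additional_kwargs.get(SUBAGENT_STATUS_KEY))
```

Keeping the stamp in one helper mirrors the design point in the commit message: a new return path only needs to produce content, and the single call site decides whether that content is terminal.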
__init__.py	feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203)	2026-03-29 15:31:18 +08:00
clarification_middleware.py	fix(backend): make clarification messages idempotent (#2350) (#2351)	2026-04-19 22:00:58 +08:00
dangling_tool_call_middleware.py	fix(runtime): guide malformed write_file recovery (#3040)	2026-05-29 17:46:24 +08:00
deferred_tool_filter_middleware.py	fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) (#3342)	2026-06-02 22:43:22 +08:00
dynamic_context_middleware.py	fix(lint): remove duplicate is_dynamic_context_reminder definition (#2837)	2026-05-09 23:40:46 +08:00
llm_error_handling_middleware.py	fix(#3189): prevent write_file streaming timeout on long reports (#3195)	2026-06-07 17:47:11 +08:00
loop_detection_middleware.py	feat(loop-detection): defer warning injection (#2752)	2026-05-21 14:36:07 +08:00
memory_middleware.py	refactor: thread app_config through lead and subagent task path (#2666)	2026-05-02 06:37:49 +08:00
safety_finish_reason_middleware.py	fix(runtime): suppress tool execution when provider safety-terminates with tool_calls (#3035)	2026-05-22 21:20:28 +08:00
safety_termination_detectors.py	fix(runtime): suppress tool execution when provider safety-terminates with tool_calls (#3035)	2026-05-22 21:20:28 +08:00
sandbox_audit_middleware.py	feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881)	2026-04-07 17:15:24 +08:00
subagent_limit_middleware.py	fix(middleware): sync raw tool call metadata (#2757)	2026-05-08 10:08:53 +08:00
summarization_middleware.py	fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) (#3378)	2026-06-07 17:55:04 +08:00
thread_data_middleware.py	feat: enhance chat history loading with new hooks and UI components (#2338)	2026-04-26 11:20:17 +08:00
title_middleware.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944)	2026-05-21 16:49:31 +08:00
todo_middleware.py	fix(todo): reuse thread state schema (#3206)	2026-05-26 23:58:08 +08:00
token_usage_middleware.py	feat: stream subagent token usage to header via terminal task events (#2882)	2026-05-13 23:52:19 +08:00
tool_call_metadata.py	fix(middleware): sync raw tool call metadata (#2757)	2026-05-08 10:08:53 +08:00
tool_error_handling_middleware.py	fix(subagent): structured subagent_status field over text parsing (#3146) (#3154)	2026-06-07 22:49:55 +08:00
tool_output_budget_middleware.py	feat(agent): add ToolOutputBudgetMiddleware for oversized tool output protection (#3303)	2026-05-29 22:59:26 +08:00
uploads_middleware.py	fix(agents): offload UploadsMiddleware uploads scan off the event loop (#3311)	2026-05-30 21:46:35 +08:00
view_image_middleware.py	fix(backend): preserve viewed image reducer metadata (#1900)	2026-04-06 16:47:19 +08:00