deerflow2

Xinmin Zeng e93f658472 fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107) (#3131)	2026-05-21 21:18:10 +08:00

* fix(task-tool): unwrap callback manager when locating usage recorder

  `config["callbacks"]` may arrive as a `BaseCallbackManager` (e.g. the `AsyncCallbackManager` LangChain hands to async tool runs), not just a plain list. The previous `for cb in callbacks` loop raised `TypeError: 'AsyncCallbackManager' object is not iterable`, which `ToolErrorHandlingMiddleware` then converted into a failed `task` ToolMessage even though the subagent had completed internally; Ultra mode lost subagent results and the lead agent fell back to redoing the work.

  Unwrap `BaseCallbackManager.handlers` before searching for the recorder.

  Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): treat any task tool error as a terminal subtask failure

  The subtask card status machine matched only three English prefixes (`Task Succeeded. Result:`, `Task failed.`, `Task timed out`). Anything else fell through to `in_progress`, so a `task` tool error wrapped by `ToolErrorHandlingMiddleware` (`Error: Tool 'task' failed ...`) left the card spinning forever even after the run had ended.

  Extract the prefix logic into `parseSubtaskResult` and recognise any leading `Error:` token as a terminal failure. The extracted function is unit-tested against the legacy prefixes plus the `AsyncCallbackManager` regression captured in the upstream issue.

  Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(frontend): exclude hidden, reasoning, and tool payloads from chat export

  `formatThreadAsMarkdown` / `formatThreadAsJSON` iterated raw messages without running the UI-level `isHiddenFromUIMessage` filter. Exported transcripts therefore included `hide_from_ui` system reminders, memory injections, provider `reasoning_content`, tool calls, and tool result messages, content that is intentionally hidden in the chat view.

  Filter the export to the user-visible transcript by default and gate reasoning / tool calls / tool messages / hidden messages behind explicit `ExportOptions` flags so a future debug export can opt back in without forking the formatter.

  Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(gateway): route get_config through get_app_config for mtime hot reload

  `get_config(request)` returned the `app.state.config` snapshot captured at startup. The worker / lead-agent path then threaded that frozen `AppConfig` through `RunContext` and `agent_factory`, so per-run fields edited in `config.yaml` (notably `max_tokens`) were ignored until the gateway process was restarted, even though `get_app_config()` already does mtime-based reload at the bottom layer.

  Route the request dependency through `get_app_config()` directly. Runtime `ContextVar` overrides (`push_current_app_config`) and test-injected singletons (`set_app_config`) keep working; `app.state.config` is now only read at startup for one-shot bootstrap (logging level, IM channels, `langgraph_runtime` engines).

  `tests/test_gateway_deps_config.py` encoded the old snapshot contract and is removed; `tests/test_gateway_config_freshness.py` replaces it with mtime, ContextVar, and `set_app_config` coverage. `test_skills_custom_router.py` and `test_uploads_router.py` now inject test configs via FastAPI `dependency_overrides[get_config]` instead of mutating `app.state.config`.

  Document the hot-reload boundary in `backend/CLAUDE.md` so reviewers know which fields are picked up on the next request vs. which still require a restart (`database`, `checkpointer`, `run_events`, `stream_bridge`, `sandbox.use`, `log_level`, `channels.`).

  Refs: bytedance/deer-flow#3107 (BUG-001)

  fix(gateway): broaden get_config 503 to any config-load failure

  Address review feedback on the previous commit:

  1. Narrow exception catch removed. The old contract returned 503 whenever `app.state.config is None`. The first cut only mapped `FileNotFoundError`, leaving `PermissionError`, YAML parse errors, and pydantic `ValidationError` to bubble up as 500. At the request boundary we treat any inability to materialise the config as "configuration not available" (503) and log the original exception so the operator still has the stack.
  2. Removed the unused `request: Request` parameter and the matching `# noqa: ARG001`. FastAPI's `Depends()` does not require the dependency to accept `Request`; the only call site uses the no-arg form.
  3. `backend/CLAUDE.md` boundary now lists the reason each field is restart-required (engine binding, singleton caching, one-shot `apply_logging_level`, etc.), not just the field name, so reviewers do not have to reverse-engineer the boundary themselves.

  Tests parametrise four exception classes (`FileNotFoundError`, `PermissionError`, `ValueError`, `RuntimeError`) and assert 503 for each.

  Refs: bytedance/deer-flow#3107 (BUG-001)

* fix(task-tool): defend _find_usage_recorder against non-list callbacks

  Address review feedback. The previous commit handled the two common shapes LangChain hands to async tool runs (a plain `list[BaseCallbackHandler]` and a `BaseCallbackManager` subclass) but iterated any other shape directly, which would still raise `TypeError` if e.g. a single handler instance leaked through without a list wrapper.

  Treat any non-list, non-manager `config["callbacks"]` value as "no recorder" rather than crash. Docstring now lists all four shapes explicitly.

  New tests cover the single-handler-object case, `runtime is None`, `callbacks is None`, and `runtime.config` being a non-dict, all required to be silent no-ops.

  Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): drop dead identity ternary and add opt-in export tests

  Address review feedback on the previous export commit:

  1. Removed the no-op `typeof msg.content === "string" ? msg.content : msg.content` expression in `formatThreadAsJSON`. Both branches returned the same value; the message content now flows through unchanged whether it is a string or the rich `MessageContent[]` shape (LangChain JSON-serialises the array structure correctly already).
  2. Expanded the JSDoc on `ExportOptions` to make it clearer that the four flags are not currently wired to any UI control; callers wanting a debug export must build the options object explicitly. The default behaviour continues to match the explicit prescription in bytedance/deer-flow#3107 BUG-006.
  3. Added opt-in coverage. The previous tests only exercised the `options = {}` default path; the new cases verify each flag flips the corresponding payload back into the export so a future debug-export surface does not silently break the contract.

  Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(frontend): export subtask prefix constants and document fallback intent

  Address review feedback on the previous BUG-007 commit:

  1. `SUCCESS_PREFIX`, `FAILURE_PREFIX`, `TIMEOUT_PREFIX`, and the `ERROR_WRAPPER_PATTERN` regex are now exported. The JSDoc explicitly pins them as part of the backend↔frontend contract defined in `task_tool.py` and `tool_error_handling_middleware.py`, so any future structured-status migration (e.g. backend writing `additional_kwargs.subagent_status` instead of leading text) can reference these from one canonical place rather than redefine them.
  2. The `in_progress` fallback now carries a docstring explaining the deliberate choice: LangChain only ever emits a `ToolMessage` once the tool itself has returned, so unrecognised content means the contract has drifted and "still running" is the right operator signal (eagerly marking it terminal-failed would mask the drift).

  No behaviour change; this is documentation and an API export.

  Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(gateway): drop app.state.config snapshot and freeze run_events_config

  Address @ShenAC-SAC's BUG-001 review on #3131. The previous cut still stored an ``AppConfig`` snapshot on ``app.state.config`` for startup bootstrap. Two follow-on hazards from that:

  1. Future code touching the gateway lifespan could accidentally start reading ``app.state.config`` again, silently regressing the request hot path back to a stale snapshot.
  2. ``get_run_context()`` paired a freshly-reloaded ``AppConfig`` with the startup-bound ``event_store`` and a live ``run_events_config`` field, so an operator who edited ``run_events.backend`` mid-flight would have produced a run context whose ``event_store`` and ``run_events_config`` referred to different backends.

  Clean approach (aligned with the direction in PR #3128):

  - ``lifespan()`` keeps a local ``startup_config`` variable and passes it explicitly into ``langgraph_runtime(app, startup_config)`` and into ``start_channel_service``. No ``app.state.config`` attribute is set at any point.
  - ``langgraph_runtime`` now accepts ``startup_config`` as a required parameter, removing the ``getattr(app.state, "config", None)`` lookup and the "config not initialised" runtime error.
  - The matching ``run_events_config`` is frozen onto ``app.state`` next to ``run_event_store`` so ``get_run_context`` reads the two from the same startup-time source. ``app_config`` continues to be resolved live via ``get_app_config()``.
  - ``backend/CLAUDE.md`` boundary explanation updated to spell out the ``startup_config`` / ``get_app_config()`` split.

  New regression test ``test_run_context_app_config_reflects_yaml_edit`` exercises the worker-feeding path: it asserts that ``ctx.app_config`` follows a mid-flight ``config.yaml`` edit while ``ctx.run_events_config`` stays frozen to the startup snapshot the event store was built from.

  Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): parse Task cancelled and polling timed out as terminal

  Address @ShenAC-SAC's BUG-007 review on #3131. `task_tool.py` actually emits five terminal strings:

  - `Task Succeeded. Result: …`
  - `Task failed. …`
  - `Task timed out. …`
  - `Task cancelled by user.` ← previously matched none
  - `Task polling timed out after N minutes …` ← previously matched none

  The previous cut handled three; the last two fell through to the "unknown content" branch and pushed the subtask card back to `in_progress` even though the backend had already reached a terminal state.

  Add explicit matches plus regression tests for both. The `in_progress` fallback is now reserved for genuinely unrecognised output (i.e. contract drift), as documented.

  Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review

* fix(frontend): sanitize JSON export content via the Markdown content path

  Address @ShenAC-SAC's BUG-006 review and the Copilot inline comment on #3131. The previous cut filtered hidden/tool messages out of the JSON export but still serialised `msg.content` verbatim, so:

  - inline `<think>…</think>` wrappers stayed in the exported `content` even with `includeReasoning: false`,
  - content-array thinking blocks leaked the `thinking` field,
  - `<uploaded_files>…</uploaded_files>` markers leaked the workspace paths a user uploaded files to.

  JSON now goes through the same sanitiser the Markdown path uses (`extractContentFromMessage` + `stripUploadedFilesTag`). Reasoning and tool_calls remain gated behind their `ExportOptions` flags. AI / human rows that sanitise to empty content with no opted-in reasoning or tool calls are dropped so the JSON matches the Markdown path's `continue` on empty assistant fragments.

  New regression tests cover the three leak shapes the reviewer called out plus the empty-content-drop case.

  Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(gateway): align lifespan stub with langgraph_runtime two-arg signature

  Codex round-3 review of c0bc7a06 flagged this: changing `langgraph_runtime` to require `startup_config` as a second positional argument broke the one-arg stub `_noop_langgraph_runtime(_app)` in `test_gateway_lifespan_shutdown.py`, which is patched into `app.gateway.app.langgraph_runtime` by the lifespan shutdown bounded-timeout regression. Lifespan would then call the stub with two args and raise `TypeError` before the bounded-shutdown assertion ran.

  Update the stub to match the new signature. The shutdown test itself is unaffected; it only cares about the channel `stop_channel_service` hang path.

  Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): strip every known backend marker in export, not just uploads

  Codex round-3 review of 258ca800 and the matching maintainer feedback on PR #3131 made the same point: the JSON export now ran the Markdown-side sanitiser, but that sanitiser only stripped `<uploaded_files>`. The full set of payloads middleware embeds inside message `content` is larger:

  - `<uploaded_files>`: `UploadsMiddleware`
  - `<system-reminder>`: `DynamicContextMiddleware`
  - `<memory>`: `DynamicContextMiddleware` (nested inside system-reminder)
  - `<current_date>`: `DynamicContextMiddleware`

  The primary protection is still `isHiddenFromUIMessage`: the `<system-reminder>` HumanMessage is marked `hide_from_ui: true` and never reaches the formatter. This commit adds the second line of defence so a regression that drops the `hide_from_ui` flag (or any future middleware that injects the same tag vocabulary into a visible HumanMessage) cannot leak the payload into the export file.

  Concrete changes:

  - New `INTERNAL_MARKER_TAGS` constant + `stripInternalMarkers(content)` helper in `core/messages/utils.ts`. The constant doubles as documentation for the backend↔frontend contract.
  - `formatMessageContent` in `export.ts` now calls `stripInternalMarkers` instead of `stripUploadedFilesTag`. UI render paths (`message-list-item.tsx`) keep using the narrower function so a user legitimately typing `<memory>` in a meta-discussion is preserved.
  - The "drop empty rows" guard in `buildJSONMessage` switched from `=== undefined` to truthy `!` checks. Codex spotted the asymmetry: when `extractReasoningContentFromMessage` returned the empty string (which it legitimately can), the JSON path emitted `{reasoning: ""}` while the Markdown path's `!reasoning` `continue` correctly dropped the row.

  New regression tests cover the defence-in-depth strip with a `<system-reminder><memory><current_date>` payload deliberately not marked `hide_from_ui`; tool-message sanitization under `includeToolMessages: true`; the mixed-content-array case (`thinking + text + image_url`); and the opted-in empty-reasoning drop.

  Live verification on a real Ultra-mode thread that uploaded a PDF (`曾鑫民-薪资交易流水.pdf`): backend state's first HumanMessage carries the `<uploaded_files>` block (with `/mnt/user-data/uploads/...` paths) as part of a content-array. The Markdown and JSON export blobs both come back free of `<uploaded_files>`, `<system-reminder>`, `<current_date>`, `tool_calls`, and reasoning, while preserving the user's `这是什么？` prompt and the assistant's visible answer.

  Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(frontend): cover trim, varied N, and pre-execution Error: prefixes

  Codex round-3 review of 50e2c257 flagged three coverage gaps in the subtask-status parser:

  1. `Task cancelled by user.` and `Task polling timed out` previously had no whitespace-trim coverage; the original trim test only exercised the success prefix. Streaming chunks can arrive with leading/trailing newlines; the regex needed an explicit assertion.
  2. The polling-timeout case was tested only at one `N` (15 minutes). The backend interpolates the live `timeout_seconds // 60` value, so the matcher must hold for any positive integer. Now we run the case for 1, 5, and 60 minutes.
  3. `task_tool.py` also emits three `Error:` strings for pre-execution failures: unknown subagent type, host-bash disabled, and "task disappeared from background tasks". They are intentionally handled by `ERROR_WRAPPER_PATTERN` rather than dedicated prefixes (the wrapper already produces the right terminal-failed shape) but had no test coverage proving that wiring. Codex was right that a refactor splitting one of them off into its own prefix would silently break things.

  The JSDoc on the constants block now spells the three pre-execution errors out so the relationship between `task_tool.py` returns and the prefix vocabulary is explicit.

  No production code change beyond the docstring; this commit is pure coverage hardening for the contract that already exists.

  Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review
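
The callback handling described in the two task-tool commits above can be sketched as follows. This is a minimal illustration, not the repository's `_find_usage_recorder`: the classes `BaseCallbackHandler` and `BaseCallbackManager` are local stand-ins for the LangChain types of the same name, and `UsageRecorder` and `find_usage_recorder` are hypothetical names chosen for the example.

```python
from types import SimpleNamespace
from typing import Any, Optional


class BaseCallbackHandler:
    """Local stand-in for LangChain's handler base class (assumption)."""


class BaseCallbackManager:
    """Local stand-in for LangChain's manager, which wraps handlers in `.handlers`."""

    def __init__(self, handlers: list) -> None:
        self.handlers = handlers


class UsageRecorder(BaseCallbackHandler):
    """Hypothetical recorder type that the task tool looks for."""


def find_usage_recorder(runtime: Any) -> Optional[UsageRecorder]:
    """Return the first UsageRecorder in runtime.config["callbacks"], else None.

    Shapes handled: None, a plain list, a callback manager, and any other
    value (for example a bare handler), which is treated as "no recorder".
    """
    config = getattr(runtime, "config", None)
    if not isinstance(config, dict):
        return None  # runtime is None, or config is not a dict

    callbacks = config.get("callbacks")
    if isinstance(callbacks, BaseCallbackManager):
        callbacks = callbacks.handlers  # unwrap the manager before iterating
    if not isinstance(callbacks, list):
        return None  # None or an unexpected shape: stay silent, do not raise

    for cb in callbacks:
        if isinstance(cb, UsageRecorder):
            return cb
    return None


# Usage: a manager-wrapped recorder is found; a bare handler is ignored.
recorder = UsageRecorder()
wrapped = SimpleNamespace(config={"callbacks": BaseCallbackManager([recorder])})
bare = SimpleNamespace(config={"callbacks": recorder})
print(find_usage_recorder(wrapped) is recorder, find_usage_recorder(bare))
```

Returning `None` for unexpected shapes keeps the lookup a silent no-op, which matches the intent stated in the commit message: a missing recorder should never turn a completed subagent run into a failed `task` ToolMessage.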
..
support	chore(dev): add async/thread boundary detector (#2936 )	2026-05-20 10:00:17 +08:00
_agent_e2e_helpers.py	fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867 )	2026-05-12 23:18:54 +08:00
_router_auth_helpers.py	fix the lint error in backend	2026-04-26 15:09:25 +08:00
conftest.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00
test_acp_config.py	feat(acp): add env field to ACPAgentConfig for subprocess env injection (#1447 )	2026-03-27 20:03:30 +08:00
test_aio_sandbox_local_backend.py	[security] fix(sandbox): bind local Docker ports to loopback (#2633 )	2026-04-30 11:40:28 +08:00
test_aio_sandbox_provider.py	fix(sandbox): avoid blocking sandbox readiness polling (#2822 )	2026-05-21 14:44:34 +08:00
test_aio_sandbox_readiness.py	fix(sandbox): avoid blocking sandbox readiness polling (#2822 )	2026-05-21 14:44:34 +08:00
test_aio_sandbox.py	feat(sandbox) Adds download file interface in Sandbox (#3038 )	2026-05-20 10:16:31 +08:00
test_app_config_reload.py	fix(config): reset config-backed singletons on hot reload (#2588 )	2026-05-06 10:17:55 +08:00
test_artifacts_router.py	fix(gateway): cap skill artifact preview size (#2963 )	2026-05-15 22:15:58 +08:00
test_auth_config.py	fix(auth): persist auto-generated JWT secret to survive restarts (#2933 )	2026-05-16 09:24:40 +08:00
test_auth_errors.py	feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008 )	2026-04-26 11:08:11 +08:00
test_auth_middleware.py	feat: implement process-local internal authentication for Gateway and enhance CSRF handling	2026-04-26 22:20:57 +08:00
test_auth_type_system.py	feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008 )	2026-04-26 11:08:11 +08:00
test_auth.py	fix(security): harden auth system and fix run journal logic bug (#2593 )	2026-04-28 11:34:07 +08:00
test_blocking_io_detector.py	test: add blocking IO detector (#2924 )	2026-05-13 23:56:06 +08:00
test_blocking_io_probe_integration.py	test: add blocking IO detector (#2924 )	2026-05-13 23:56:06 +08:00
test_cancel_run_idempotent.py	fix(runtime): make RunManager.cancel() idempotent for already-interrupted runs (#3055 ) (#3058 )	2026-05-20 16:37:36 +08:00
test_channel_file_attachments.py	[security] fix(upload): reject symlinked upload destinations (#2623 )	2026-05-02 15:19:28 +08:00
test_channels.py	feat(loop-detection): defer warning injection (#2752 )	2026-05-21 14:36:07 +08:00
test_check_script.py	fix(check): windows pnpm version detection in check script (#2189 )	2026-04-14 10:29:44 +08:00
test_checkpointer_none_fix.py	feat(persistence):Unified persistence layer with event store, feedback, and rebase cleanup (#2134 )	2026-04-26 11:09:55 +08:00
test_checkpointer.py	fix(packaging): add postgres extra for store/checkpointer supportFix postgres extra install guidance (#2584 )	2026-05-09 09:49:08 +08:00
test_clarification_middleware.py	fix(backend): make clarification messages idempotent (#2350 ) (#2351 )	2026-04-19 22:00:58 +08:00
test_claude_provider_oauth_billing.py	fix(oauth): Harden Claude OAuth cache-control handling (#1583 )	2026-03-30 07:41:18 +08:00
test_claude_provider_prompt_caching.py	fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (#2449 )	2026-04-25 19:40:06 +08:00
test_cli_auth_providers.py	fix(provider): preserve streamed Codex output when response.completed.output is empty (#1928 )	2026-04-07 18:21:22 +08:00
test_client_e2e.py	fix(harness): resolve runtime paths from project root (#2642 )	2026-05-01 22:19:50 +08:00
test_client_langfuse_metadata.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00
test_client_live.py	[Security] Address critical host-shell escape in LocalSandboxProvider (#1547 )	2026-03-29 21:03:58 +08:00
test_client_message_serialization.py	feat: refine token usage display modes (#2329 )	2026-05-04 09:56:16 +08:00
test_client.py	feat: refine token usage display modes (#2329 )	2026-05-04 09:56:16 +08:00
test_codex_provider.py	chroe(2585): keep polishing the code of codex token usage (#2689 )	2026-05-02 15:04:11 +08:00
test_config_version.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131 )	2026-03-14 22:55:52 +08:00
test_converters.py	feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930 )	2026-04-26 11:05:47 +08:00
test_create_deerflow_agent_live.py	feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203 )	2026-03-29 15:31:18 +08:00
test_create_deerflow_agent.py	feat(loop-detection): make loop detection configurable with per-tool frequency overrides (#2711 )	2026-05-07 16:15:15 +08:00
test_credential_loader.py	feat(loop-detection): make loop detection configurable with per-tool frequency overrides (#2711 )	2026-05-07 16:15:15 +08:00
test_csrf_middleware.py	feat: static system prompt with DynamicContextMiddleware for prefix-cache optimization (#2801 )	2026-05-09 09:27:02 +08:00
test_custom_agent.py	feat(agent): add custom-agent self-updates with user isolation (#2713 )	2026-05-05 23:17:42 +08:00
test_dangling_tool_call_middleware.py	test(middleware): lock tool-call transcript boundary invariants (#3049 )	2026-05-19 22:34:51 +08:00
test_deferred_tool_promotion_real_llm.py	fix(tools): preserve tool_search promotions across re-entrant get_available_tools (#2885 )	2026-05-13 23:45:47 +08:00
test_deferred_tool_registry_promotion.py	fix(tools): preserve tool_search promotions across re-entrant get_available_tools (#2885 )	2026-05-13 23:45:47 +08:00
test_detect_thread_boundaries.py	chore(dev): add async/thread boundary detector (#2936 )	2026-05-20 10:00:17 +08:00
test_detect_uv_extras.py	fix(scripts): preserve uv extras across `make dev` restarts (#2754 ) (#2767 )	2026-05-10 22:28:29 +08:00
test_dev_entrypoint.py	fix(scripts): preserve uv extras across `make dev` restarts (#2754 ) (#2767 )	2026-05-10 22:28:29 +08:00
test_dingtalk_channel.py	feat(channels): add DingTalk channel integration (#2628 )	2026-04-30 11:25:33 +08:00
test_discord_channel.py	feat(channels): add Discord channel integration (#1806 )	2026-04-11 17:48:04 +08:00
test_docker_sandbox_mode_detection.py	fix Windows Docker sandbox path mounting (#1634 )	2026-03-31 22:19:27 +08:00
test_doctor.py	feat(dx): Setup Wizard + doctor command — closes #2030 (#2034 )	2026-04-10 17:43:39 +08:00
test_dynamic_context_middleware.py	fix(harness): preserve dynamic context across summarization (#2823 )	2026-05-09 19:39:36 +08:00
test_ensure_admin.py	refactor: Remove init_token handling from admin initialization logic and related tests	2026-04-26 11:09:56 +08:00
test_exa_tools.py	feat(community): add Exa search as community tool provider (#1357 )	2026-04-08 17:13:39 +08:00
test_feedback.py	feat(persistence):Unified persistence layer with event store, feedback, and rebase cleanup (#2134 )	2026-04-26 11:09:55 +08:00
test_feishu_parser.py	Feature/feishu receive file (#1608 )	2026-04-06 22:14:12 +08:00
test_file_conversion.py	[security] fix(uploads): require explicit opt-in for host-side document conversion (#2332 )	2026-04-18 22:47:42 +08:00
test_firecrawl_tools.py	feat(dx): Setup Wizard + doctor command — closes #2030 (#2034 )	2026-04-10 17:43:39 +08:00
test_gateway_config_freshness.py	fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107 ) (#3131 )	2026-05-21 21:18:10 +08:00
test_gateway_docs_toggle.py	fix(nginx): defer CORS to gateway allowlist (#2861 )	2026-05-11 17:38:37 +08:00
test_gateway_lifespan_shutdown.py	fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107 ) (#3131 )	2026-05-21 21:18:10 +08:00
test_gateway_runtime_cleanup.py	fix(nginx): defer CORS to gateway allowlist (#2861 )	2026-05-11 17:38:37 +08:00
test_gateway_services.py	fix(gateway): preserve message additional_kwargs in normalize_input (#3132 ) (#3136 )	2026-05-21 21:06:19 +08:00
test_guardrail_middleware.py	feat(guardrails): add pre-tool-call authorization middleware with pluggable providers (#1240 )	2026-03-23 18:07:33 +08:00
test_harness_boundary.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131 )	2026-03-14 22:55:52 +08:00
test_infoquest_client.py	feat(harness): integration ACP agent tool (#1344 )	2026-03-26 14:20:18 +08:00
test_initialize_admin.py	fix(auth): replace setup-status 429 rate limit with cached response (#2915 )	2026-05-18 22:07:01 +08:00
test_invoke_acp_agent_tool.py	fix(harness): wrap all async-only tools for sync clients (#2935 )	2026-05-19 22:11:46 +08:00
test_jina_client.py	fix(jina): log transient failures at WARNING without traceback (#2484 ) (#2485 )	2026-04-24 16:00:14 +08:00
test_langgraph_auth.py	fix(security): harden auth system and fix run journal logic bug (#2593 )	2026-04-28 11:34:07 +08:00
test_lead_agent_model_resolution.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00
test_lead_agent_prompt.py	fix(skills): enforce allowed-tools metadata (#2626 )	2026-05-07 08:34:43 +08:00
test_lead_agent_skills.py	fix(skills): enforce allowed-tools metadata (#2626 )	2026-05-07 08:34:43 +08:00
test_llm_error_handling_middleware.py	refactor: thread app_config through middleware factories (#2652 )	2026-04-30 12:41:09 +08:00
test_local_bash_tool_loading.py	fix(sandbox): improve sandbox security and preserve multimodal content (#2114 )	2026-04-11 16:52:10 +08:00
test_local_sandbox_encoding.py	fix(sandbox): disable msys path conversion (#2766 )	2026-05-08 10:13:11 +08:00
test_local_sandbox_provider_mounts.py	feat(sandbox) Adds download file interface in Sandbox (#3038 )	2026-05-20 10:16:31 +08:00
test_local_sandbox_virtual_path_contract.py	fix(sandbox): uphold /mnt/user-data contract at Sandbox API boundary (#2873 ) (#2881 )	2026-05-17 08:26:04 +08:00
test_local_skill_storage_write.py	refactor(skills): Unified skill storage capability (#2613 )	2026-05-01 13:23:26 +08:00
test_logging_level_from_config.py	fix(config): unify log_level from config.yaml across Gateway and debug entry points (#2601 )	2026-04-30 22:27:14 +08:00
test_loop_detection_config.py	feat(loop-detection): make loop detection configurable with per-tool frequency overrides (#2711 )	2026-05-07 16:15:15 +08:00
test_loop_detection_middleware.py	feat(loop-detection): defer warning injection (#2752 )	2026-05-21 14:36:07 +08:00
test_mcp_client_config.py	Fix env resolution in MCP config lists (#2556 )	2026-05-21 07:27:00 +08:00
test_mcp_config_secrets.py	fix(security): mask sensitive values in MCP config API responses (#2667 )	2026-05-21 10:28:57 +08:00
test_mcp_custom_interceptors.py	feat(mcp): support custom tool interceptors via extensions_config.json (#2451 )	2026-04-25 09:18:13 +08:00
test_mcp_oauth.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131 )	2026-03-14 22:55:52 +08:00
test_mcp_sync_wrapper.py	fix(harness): wrap all async-only tools for sync clients (#2935 )	2026-05-19 22:11:46 +08:00
test_memory_prompt_injection.py	fix: inject longTermBackground into memory prompt (#1734 )	2026-04-03 11:21:58 +08:00
test_memory_queue_user_isolation.py	fix(memory): isolate queued memory updates by agent (#2941 )	2026-05-15 10:26:35 +08:00
test_memory_queue.py	fix(memory): isolate queued memory updates by agent (#2941 )	2026-05-15 10:26:35 +08:00
test_memory_router.py	feat(persistence): per-user filesystem isolation, run-scoped APIs, and state/history simplification (#2153 )	2026-04-26 11:13:01 +08:00
test_memory_storage_user_isolation.py	fix the lint error in backend	2026-04-26 15:09:25 +08:00
test_memory_storage.py	fix: Memory update system has cache corruption, data loss, and thread-safety bugs (#2251 )	2026-04-17 12:00:31 +08:00
test_memory_thread_meta_isolation.py	feat(persistence):Unified persistence layer with event store, feedback, and rebase cleanup (#2134 )	2026-04-26 11:09:55 +08:00
test_memory_updater_user_isolation.py	fix the lint error in backend	2026-04-26 15:09:25 +08:00
test_memory_updater.py	fix(trace):memory 中文 in trace info is unicode escape sequence. (#3104 )	2026-05-20 22:34:10 +08:00
test_memory_upload_filtering.py	feat: flush memory before summarization (#2176 )	2026-04-14 15:01:06 +08:00
test_migration_user_isolation.py	feat(agent): add custom-agent self-updates with user isolation (#2713 )	2026-05-05 23:17:42 +08:00
test_mindie_provider.py	feat(channels): enhance Discord with mention-only mode, thread routing, and typing indicators (#2842 )	2026-05-15 22:30:05 +08:00
test_model_config.py	feat(codex): support explicit OpenAI Responses API config (#1235 )	2026-03-22 20:39:26 +08:00
test_model_factory.py	feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930 )	2026-04-26 11:05:47 +08:00
test_owner_isolation.py	feat(persistence):Unified persistence layer with event store, feedback, and rebase cleanup (#2134 )	2026-04-26 11:09:55 +08:00
test_patched_deepseek.py	fix: resolve missing serialized kwargs in PatchedChatDeepSeek (#2025 )	2026-04-09 16:07:16 +08:00
test_patched_minimax.py	fix: improve MiniMax code plan integration (#1169 )	2026-03-20 17:18:59 +08:00
test_patched_openai.py	fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180 ) (#1205 )	2026-03-26 15:07:05 +08:00
test_paths_user_isolation.py	feat(agent): add custom-agent self-updates with user isolation (#2713 )	2026-05-05 23:17:42 +08:00
test_persistence_scaffold.py	fix(packaging): add postgres extra for store/checkpointer supportFix postgres extra install guidance (#2584 )	2026-05-09 09:49:08 +08:00
test_persistence_timezone.py	fix(persistence): emit tz-aware timestamps from SQLite-backed stores (#3130 )	2026-05-21 16:22:09 +08:00
test_present_file_tool_core_logic.py	feat(persistence): per-user filesystem isolation, run-scoped APIs, and state/history simplification (#2153 )	2026-04-26 11:13:01 +08:00
test_provisioner_kubeconfig.py	feat(provisioner): add optional PVC support for sandbox volumes (#2020 )	2026-04-10 20:40:30 +08:00
test_provisioner_pvc_volumes.py	fix(sandbox): scope provisioner PVC data by user (#2973 )	2026-05-17 15:23:42 +08:00
test_readability.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131 )	2026-03-14 22:55:52 +08:00
test_reflection_resolvers.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131 )	2026-03-14 22:55:52 +08:00
test_remote_sandbox_backend.py	fix(sandbox): avoid blocking sandbox readiness polling (#2822 )	2026-05-21 14:44:34 +08:00
test_run_event_store_pagination.py	fix the lint error in backend	2026-04-26 15:09:25 +08:00
test_run_event_store.py	fix(runtime): avoid postgres aggregate row lock (#2962 )	2026-05-15 10:32:09 +08:00
test_run_journal.py	fix(runtime): persist run message summaries (#2850 )	2026-05-11 19:54:00 +08:00
test_run_manager.py	fix(harness)!: hydrate runs from RunStore and persist interrupted status (#2932 )	2026-05-18 22:25:02 +08:00
test_run_naming.py	feat(trace):LangGraph -> lead_agent and set custom agent_name to run_name (#3101 )	2026-05-21 14:48:28 +08:00
test_run_repository.py	fix(harness)!: hydrate runs from RunStore and persist interrupted status (#2932 )	2026-05-18 22:25:02 +08:00
test_run_worker_rollback.py	feat(trace):LangGraph -> lead_agent and set custom agent_name to run_name (#3101 )	2026-05-21 14:48:28 +08:00
test_runs_api_endpoints.py	fix the lint error in backend	2026-04-26 15:09:25 +08:00
test_runtime_lifecycle_e2e.py	test(runtime): add lifecycle e2e coverage (#2946 )	2026-05-20 14:52:58 +08:00
test_runtime_paths.py	fix(harness): restore legacy skills path fallback (#2694 ) (#2696 )	2026-05-03 23:40:59 +08:00
test_sandbox_audit_middleware.py	feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881 )	2026-04-07 17:15:24 +08:00
test_sandbox_middleware.py	fix(sandbox): avoid blocking sandbox readiness polling (#2822 )	2026-05-21 14:44:34 +08:00
test_sandbox_orphan_reconciliation_e2e.py	fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976 )	2026-04-09 17:21:23 +08:00
test_sandbox_orphan_reconciliation.py	fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976 )	2026-04-09 17:21:23 +08:00
test_sandbox_search_tools.py	fix(sandbox): add missing path masking in ls_tool output (#2317 )	2026-04-18 08:46:59 +08:00
test_sandbox_tools_security.py	fix(runtime): bound write_file execution-failure observations (#3133 )	2026-05-21 20:35:46 +08:00
test_security_scanner.py	fix(skills): make security scanner JSON parsing robust for LLM output variations (#2987 )	2026-05-17 08:59:42 +08:00
test_serialization.py	feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403 )	2026-03-30 16:02:23 +08:00
test_serialize_message_content.py	feat(harness): integration ACP agent tool (#1344 )	2026-03-26 14:20:18 +08:00
test_serper_tools.py	feat(community): add Serper web search provider (#2630 )	2026-05-02 16:22:35 +08:00
test_setup_agent_e2e_user_isolation.py	fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867 )	2026-05-12 23:18:54 +08:00
test_setup_agent_http_e2e_real_server.py	fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867 )	2026-05-12 23:18:54 +08:00
test_setup_agent_tool.py	fix: keep new agent bootstrap in user scope (#2784 )	2026-05-09 19:43:50 +08:00
test_setup_wizard.py	enable token usage by default (#2841 )	2026-05-10 22:00:57 +08:00
test_skill_manage_tool.py	refactor(skills): Unified skill storage capability (#2613 )	2026-05-01 13:23:26 +08:00
test_skills_archive_root.py	refactor: extract shared skill installer and upload manager to harness (#1202 )	2026-03-25 16:28:33 +08:00
test_skills_bundled.py	fix(skills): validate bundled SKILL.md front-matter in CI (fixes #2443 ) (#2457 )	2026-04-23 14:06:14 +08:00
test_skills_custom_router.py	fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107 ) (#3131 )	2026-05-21 21:18:10 +08:00
test_skills_installer.py	refactor(skills): Unified skill storage capability (#2613 )	2026-05-01 13:23:26 +08:00
test_skills_loader.py	fix(harness): restore legacy skills path fallback (#2694 ) (#2696 )	2026-05-03 23:40:59 +08:00
test_skills_parser.py	fix(skills): enforce allowed-tools metadata (#2626 )	2026-05-07 08:34:43 +08:00
test_skills_validation.py	fix(skills): enforce allowed-tools metadata (#2626 )	2026-05-07 08:34:43 +08:00
test_sse_format.py	feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403 )	2026-03-30 16:02:23 +08:00
test_stream_bridge.py	Fix(#1702 ): stream resume run (#1858 )	2026-04-06 14:51:10 +08:00
test_subagent_executor.py	fix(subagents): make subagent timeout terminal state atomic (#2583 )	2026-05-18 22:19:32 +08:00
test_subagent_limit_middleware.py	fix(middleware): sync raw tool call metadata (#2757 )	2026-05-08 10:08:53 +08:00
test_subagent_prompt_security.py	feat(subagents): support per-subagent skill loading and custom subagent types (#2253 )	2026-04-23 23:59:47 +08:00
test_subagent_skills_config.py	refactor: thread app_config through lead and subagent task path (#2666 )	2026-05-02 06:37:49 +08:00
test_subagent_timeout_config.py	feat(subagents): allow model override per subagent in config.yaml (#2064 )	2026-04-12 16:40:21 +08:00
test_subagent_token_collector.py	fix: bucket subagent token usage into parent run totals (#2838 )	2026-05-10 22:47:30 +08:00
test_suggestions_router.py	refactor: thread release config through lead path (#2612 )	2026-04-28 14:53:18 +08:00
test_summarization_middleware.py	fix(memory): isolate queued memory updates by agent (#2941 )	2026-05-15 10:26:35 +08:00
test_task_tool_core_logic.py	fix(task-tool): cancel and schedule deferred cleanup on polling safety timeout (#3097 )	2026-05-21 07:47:19 +08:00
test_task_tool_usage_recorder.py	fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107 ) (#3131 )	2026-05-21 21:18:10 +08:00
test_thread_data_middleware.py	Fix Windows backend test compatibility (#1384 )	2026-03-26 17:39:16 +08:00
test_thread_meta_repo.py	perf(harness): push thread metadata filters into SQL (#2865 )	2026-05-12 23:21:22 +08:00
test_thread_run_messages_pagination.py	fix(harness)!: hydrate runs from RunStore and persist interrupted status (#2932 )	2026-05-18 22:25:02 +08:00
test_thread_token_usage.py	fix: use backend thread token usage for header total (#2800 )	2026-05-09 19:40:32 +08:00
test_threads_router.py	perf(harness): push thread metadata filters into SQL (#2865 )	2026-05-12 23:21:22 +08:00
test_title_generation.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131 )	2026-03-14 22:55:52 +08:00
test_title_middleware_core_logic.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00
test_todo_middleware.py	fix(middleware): Prevent todo completion reminder IMMessage leak (#2907 )	2026-05-15 22:12:37 +08:00
test_token_usage_config.py	enable token usage by default (#2841 )	2026-05-10 22:00:57 +08:00
test_token_usage_middleware.py	feat: stream subagent token usage to header via terminal task events (#2882 )	2026-05-13 23:52:19 +08:00
test_token_usage.py	feat(harness): integration ACP agent tool (#1344 )	2026-03-26 14:20:18 +08:00
test_tool_args_schema_no_pydantic_warning.py	fix(tools): make write_file append discoverable in model-facing schema (#2843 )	2026-05-10 23:09:03 +08:00
test_tool_deduplication.py	fix(harness): wrap all async-only tools for sync clients (#2935 )	2026-05-19 22:11:46 +08:00
test_tool_error_handling_middleware.py	fix(subagents): use model override for tools and middleware (#2641 )	2026-05-01 22:21:10 +08:00
test_tool_output_truncation.py	fix: add output truncation to ls_tool to prevent context window overflow (#1896 )	2026-04-06 15:09:57 +08:00
test_tool_search.py	fix: gate deferred MCP tool execution (#2513 )	2026-04-24 22:45:41 +08:00
test_tracing_config.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00
test_tracing_factory.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00
test_tracing_metadata.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00
test_update_agent_e2e_user_isolation.py	fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867 )	2026-05-12 23:18:54 +08:00
test_update_agent_tool.py	feat(agent): add custom-agent self-updates with user isolation (#2713 )	2026-05-05 23:17:42 +08:00
test_uploads_manager.py	fix(uploads): add Windows support for safe symlink-protected uploads (#2794 )	2026-05-09 18:21:54 +08:00
test_uploads_middleware_core_logic.py	feat(persistence): per-user filesystem isolation, run-scoped APIs, and state/history simplification (#2153 )	2026-04-26 11:13:01 +08:00
test_uploads_router.py	fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107 ) (#3131 )	2026-05-21 21:18:10 +08:00
test_user_context.py	fix the lint error in backend	2026-04-26 15:09:25 +08:00
test_utils_time.py	fix(gateway): return ISO 8601 timestamps from threads endpoints (#2599 )	2026-05-02 15:16:16 +08:00
test_view_image_middleware.py	test: add unit tests for ViewImageMiddleware (#2256 )	2026-04-15 23:54:30 +08:00
test_view_image_tool.py	fix(harness): constrain view_image to thread data paths (#2557 )	2026-04-28 11:13:17 +08:00
test_vllm_provider.py	feat(models): add vLLM provider support (#1860 )	2026-04-06 15:18:34 +08:00
test_wechat_channel.py	feat: add WeChat channel integration (#1869 )	2026-04-10 20:49:28 +08:00
test_worker_langfuse_metadata.py	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 )	2026-05-21 16:49:31 +08:00