deerflow2

Author	SHA1	Message	Date
john lee	cbf8b194e8	fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#3084 ) * fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#2816) - JsonlRunEventStore: offload all file I/O to asyncio.to_thread() so the event loop is never blocked; add per-thread asyncio.Lock to serialise concurrent puts and prevent interleaved JSONL lines - Split _ensure_seq_loaded into a sync _compute_max_seq (runs in thread) and an async wrapper; seq counter is recovered from disk on fresh store init - DbRunEventStore.put_batch: raise ValueError when events span multiple thread_ids (previously silently assumed same thread) - Add test_jsonl_event_store_async_io.py: 12 tests covering lock reuse, concurrent seq monotonicity, disk recovery, and mixed-thread batch rejection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address Copilot review comments - delete_by_thread: pop _write_locks after releasing the lock to prevent unbounded growth when threads are repeatedly created and deleted - tests: add regression guard asserting asyncio.to_thread is called for _write_record in put(); assert _write_locks entry removed on delete * fix(lint): move patch import to local scope to fix ruff I001 * fix(lint): apply ruff check+format fixes to test file * fix(runtime): address review feedback for JSONL async I/O hardening (#2816) Use setdefault for atomic lock init in _get_write_lock; pop _write_locks inside the held lock scope in delete_by_thread; update test docstring and assert lock entry also cleared on delete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: rayhpeng <rayhpeng@gmail.com>	2026-05-29 09:27:53 +08:00
AochenShen99	44677c5eb4	feat(provider) Add patched MiMo reasoning content support (#3298 ) * Add patched MiMo reasoning content support * Clarify MiMo patched model coverage * Remove unused MiMo payload index * Address MiMo review nits	2026-05-28 18:24:32 +08:00
AochenShen99	8decfd327e	Fix custom skill install permissions (#3241 ) * Fix custom skill install permissions * Fix skill upload test portability * Keep custom skill writes sandbox readable * Clear sandbox write bits on skill permissions * Limit custom skill write permission updates	2026-05-28 15:48:32 +08:00
Lawrance_YXLiao	3cb75887c1	fix(memory): parse wrapped memory update json responses (#3252 ) * fix(memory): parse wrapped memory update json responses * test(memory): format wrapped response coverage * fix(memory): guard malformed nested memory facts * fix(memory): require full update object when parsing responses * fix(memory): fail closed on unsafe partial removals * style(memory): format updater tests	2026-05-28 07:46:44 +08:00
Willem Jiang	162fb2143e	fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyioRuntimeError (#3203 ) (#3224 ) * fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyio RuntimeError (#3203) HTTP/SSE transports use anyio.TaskGroup internally for streamable connections. These task groups have cancel scopes bound to the async task that created them, so closing a pooled session from a different task raises RuntimeError. Restrict session pooling to stdio transports only. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * docs: clarify MCP pooling applies only to stdio tools Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2dd9881d-54c6-45fd-90bc-154a09e29841 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-27 08:32:57 +08:00
QY	92905e9e3e	fix(todo): reuse thread state schema (#3206 ) Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-26 23:58:08 +08:00
AochenShen99	e344be8d94	feat(tests): add Blockbuster runtime gate for event-loop blocking IO (#3229 ) * feat(tests): add Blockbuster runtime gate for event-loop blocking IO Adds a strict runtime gate that fails CI when sync blocking IO calls run on the asyncio event loop thread through DeerFlow business code. Components: - backend/tests/support/detectors/blocking_io_runtime.py — Blockbuster context scoped to `app.` and `deerflow.` so test infrastructure, pytest internals, and third-party libraries stay silent. - backend/tests/blocking_io/conftest.py — pytest_runtest_protocol hookwrapper that wraps every item (setup + call + teardown) with the strict context. Respects `@pytest.mark.allow_blocking_io` opt-out. - backend/tests/blocking_io/test_skills_load.py — regression anchor for the #1917 fix (asyncio.to_thread offload around LocalSkillStorage.load_skills). - backend/tests/blocking_io/test_sqlite_lifespan.py — regression anchor for the #1912 fix (asyncio.to_thread offload around ensure_sqlite_parent_dir). - backend/tests/blocking_io/test_gate_smoke.py — meta-test asserting the gate actually catches unoffloaded blocking IO and that the `@pytest.mark.allow_blocking_io` opt-out works. - backend/Makefile — `make test-blocking-io` target. - .github/workflows/backend-blocking-io-tests.yml — hard-fail PR gate on ubuntu-latest. Windows matrix deferred to follow-up. Dependencies: - blockbuster>=1.5.26,<1.6 added to dev group. Coverage boundary (called out in PR body): the gate only catches blocking IO on code paths the test suite actually exercises. Static AST inventory (separate, informational) is the complementary coverage tool. Three blind spot categories — untested paths, mocked-away paths, env-mismatched paths — are documented in the PR description. Findings surfaced while authoring this PR: - resolve_sqlite_conn_str in runtime/store/_sqlite_utils.py:19 does sync Path.resolve() -> os.path.abspath on the lifespan loop thread, ahead of the #1912 fix. Not addressed here; tracked as follow-up. Tests: 4 passed locally (`make test-blocking-io`). Lint/format: clean (`ruff check` and `ruff format --check`). * fix(tests): scope Blockbuster gate to blocking-io suite * fix(tests): harden Blockbuster runtime gate * test(blocking-io): add project rule extension point * test(blocking-io): address review cleanup	2026-05-26 23:03:49 +08:00
Willem Jiang	f9b7071304	fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127 ) (#3134 ) * fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127) When using AIO sandbox with LocalContainerBackend, uploaded files are created with 0o600 (owner-only) permissions by the gateway process running as root. The sandbox process inside the Docker container runs as a non-root user and cannot read these bind-mounted files, causing a "Permission denied" error on read_file. Add `needs_upload_permission_adjustment` attribute to SandboxProvider (default True) to indicate that uploaded files need chmod adjustment. LocalSandboxProvider opts out (same user). A new `_make_file_sandbox_readable` function adds S_IRGRP \| S_IROTH bits after files are written, changing permissions from 0o600 to 0o644 so the sandbox can read the uploads. fixes #3127 * fix(uploads): unconditionally adjust file permissions for sandbox access The conditional check meant uploaded files retained 0o600 permissions in some Docker sandbox configurations, preventing the sandbox process (UID 1000) from reading them. Always add group/other read bits so every sandbox setup can access uploaded content. Also add read bits to the sync-path writable helper as defense in depth.	2026-05-25 09:26:18 +08:00
Huixin615	8785658a2e	fix(agents): preserve todos state across node updates (#3180 ) * fix(agents): preserve todos state across node updates ThreadState.todos had no reducer, so any downstream node returning a partial state without todos was implicitly setting it to None, which LangGraph then used to overwrite the previously streamed value. This caused the to-do list to render correctly during streaming but vanish once streaming completed. Add a merge_todos reducer that keeps the last non-None value, mirroring the merge_artifacts pattern already used in the same file. An explicit empty list is still respected so that 'user cleared todos' works. Tests: 10 new unit tests in tests/test_thread_state_reducers.py covering merge_todos plus regression coverage for merge_artifacts and merge_viewed_images. All 69 thread-related tests pass locally. Closes #3123 * test(agents): add annotation binding regression guard Address Copilot review feedback on #3123: - Add TestThreadStateAnnotations asserting that ThreadState.todos is Annotated with merge_todos. Without this guard, reverting the Annotated[list \| None, merge_todos] binding would silently regress #3123 while all existing reducer unit tests continue to pass. - Align test imports to 'from deerflow.agents.thread_state import ...' matching the rest of the backend test suite.	2026-05-23 23:25:38 +08:00
rayhpeng	0fb05825a2	fix(runtime): make run creation persistence atomic (#3152 ) * fix runtime run creation persistence atomicity * fix run creation cancellation rollback * fix run manager test cleanup await * clarify run creation rollback on cancellation * document new run persistence rollback boundary --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-23 22:43:34 +08:00
AochenShen99	66d6a6a4e8	fix: harden run finalization persistence (#3155 ) * fix: harden run finalization persistence * style: format gateway recovery test * fix: align run repository return types * fix: harden completion recovery follow-up	2026-05-23 00:09:06 +08:00
Nan Gao	f0bae28636	fix(middleware): handle repeated tool call ids (#3143 ) * fix(middleware): handle repeated tool call ids * add tests * refactor(middleware): rely on tool result queues	2026-05-22 21:44:05 +08:00
Lawrance_YXLiao	2eeb597985	fix(runs): expose active progress counters (#3148 ) * fix(runs): expose active progress counters * fix(runs): avoid delayed progress flush on completion * fix(runs): tighten progress snapshot semantics * fix(runs): preserve omitted progress fields * chore(runs): remove duplicate journal initialization	2026-05-22 21:42:14 +08:00
Xinmin Zeng	be0eae9825	fix(runtime): suppress tool execution when provider safety-terminates with tool_calls (#3035 ) * fix(runtime): suppress tool execution when provider safety-terminates with tool_calls When a provider stops generation for safety reasons (OpenAI/Moonshot finish_reason=content_filter, Anthropic stop_reason=refusal, Gemini finish_reason=SAFETY/BLOCKLIST/PROHIBITED_CONTENT/SPII/RECITATION/ IMAGE_SAFETY/...), the response may still carry truncated tool_calls. LangChain's tool router treats any non-empty tool_calls as executable, so partial arguments (e.g. write_file with a half-finished markdown) get dispatched and the agent loops on retry. Add SafetyFinishReasonMiddleware at after_model: detect safety termination via a pluggable detector registry, clear both structured tool_calls and raw additional_kwargs.tool_calls / function_call, preserve response_metadata.finish_reason for downstream observers, stamp additional_kwargs.safety_termination for traces, append a user-facing explanation to message content (list-aware for thinking blocks), and emit a safety_termination custom stream event so SSE consumers can reconcile any "tool starting..." UI. Default detectors cover OpenAI-compatible content_filter, Anthropic refusal, and Gemini safety enums (text + image). Custom providers are added via reflection (same pattern as guardrails). Wired into both lead-agent and subagent runtimes. Closes #3028 * fix(runtime): persist safety_termination as a middleware audit event Address review on #3035: the SSE custom event is great for live consumers but invisible to post-run audit. RunEventStore should carry its own row so operators can answer "which runs were safety-suppressed today?" from a single SQL query without joining the message body. Worker now exposes the run-scoped RunJournal via runtime.context["__run_journal"] (sentinel key, internal channel). SafetyFinishReasonMiddleware calls the previously-unused RunJournal.record_middleware, which emits event_type = "middleware:safety_termination" category = "middleware" content = {name, hook, action, changes={ detector, reason_field, reason_value, suppressed_tool_call_count, suppressed_tool_call_names, suppressed_tool_call_ids, message_id, extras}} Tool arguments are deliberately excluded — those are the very content the provider filtered and persisting them would defeat the purpose of the safety filter (per review note in #3035). Graceful skips when journal is absent (subagent runtime, unit tests, no-event-store local dev). Journal exceptions never propagate into the agent loop. Refs #3028 * fix(runtime): satisfy ruff format + address Copilot review - ruff format on safety_finish_reason_config.py and e2e demo (CI lint failed on ruff format --check; backend Makefile lint target runs ruff check AND ruff format --check). - Docstring on SafetyFinishReasonConfig now says resolve_variable to match the actual loader used in from_config (the wording was resolve_class previously; behavior is unchanged — resolve_variable mirrors how guardrails.provider is loaded). - Switch the AIMessage type check in SafetyFinishReasonMiddleware._apply from getattr(last, "type") == "ai" to isinstance(last, AIMessage), matching TokenUsageMiddleware / TodoMiddleware / ViewImageMiddleware / SummarizationMiddleware which are the dominant pattern. Refs #3028	2026-05-22 21:20:28 +08:00
Willem Jiang	c881d95898	fix(mcp): persist MCP sessions across tool calls for stateful servers (#3089 ) * fix(mcp): persist MCP sessions across tool calls for stateful servers MCP tools loaded via langchain-mcp-adapters created a new session on every call, causing stateful servers like Playwright to lose browser state (pages, forms) between consecutive tool invocations within the same thread. Add MCPSessionPool that maintains persistent sessions scoped by (server_name, thread_id). Tool calls within the same thread now reuse the same MCP session, preserving server-side state. Sessions are evicted in LRU order (max 256) and cleaned up on cache invalidation. Fixes #3054 * fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127) When using AIO sandbox with LocalContainerBackend, uploaded files are created with 0o600 (owner-only) permissions by the gateway process running as root. The sandbox process inside the Docker container runs as a non-root user and cannot read these bind-mounted files, causing a "Permission denied" error on read_file. Add `needs_upload_permission_adjustment` attribute to SandboxProvider (default True) to indicate that uploaded files need chmod adjustment. LocalSandboxProvider opts out (same user). A new `_make_file_sandbox_readable` function adds S_IRGRP \| S_IROTH bits after files are written, changing permissions from 0o600 to 0o644 so the sandbox can read the uploads. * fix(mcp): address review comments on session pool and tools - _extract_thread_id: return "default" instead of stringifying None when get_config() returns no thread_id - call_with_persistent_session: fix *arguments annotation from dict[str,Any] to Any - Replace private _convert_call_tool_result import with a local implementation that handles all MCP content block types - _make_session_pool_tool: accept tool_interceptors and apply the configured interceptor chain on every call (preserving OAuth and custom interceptors) - MCPSessionPool: replace asyncio.Lock with threading.Lock; restructure get/close methods to never await while holding the lock; add close_all_sync() that closes sessions on their owning event loops - reset_mcp_tools_cache: use pool.close_all_sync() instead of asyncio.run-in-thread to close sessions deterministically - test: add test_session_pool_tool_sync_wrapper_path_is_safe covering tool invocation via the sync wrapper (tool.func) path Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9e7f9e7f-1d2b-464a-b3b7-7f1649b74122 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> fix(mcp): extract SESSION_CLOSE_TIMEOUT to class constant Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9e7f9e7f-1d2b-464a-b3b7-7f1649b74122 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * Potential fix for pull request finding 'Empty except' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-05-21 23:22:20 +08:00
Xinmin Zeng	e93f658472	fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107 ) (#3131 ) * fix(task-tool): unwrap callback manager when locating usage recorder `config["callbacks"]` may arrive as a `BaseCallbackManager` (e.g. the `AsyncCallbackManager` LangChain hands to async tool runs), not just a plain list. The previous `for cb in callbacks` loop raised `TypeError: 'AsyncCallbackManager' object is not iterable`, which `ToolErrorHandlingMiddleware` then converted into a failed `task` ToolMessage even though the subagent had completed internally — Ultra mode lost subagent results and the lead agent fell back to redoing the work. Unwrap `BaseCallbackManager.handlers` before searching for the recorder. Refs: bytedance/deer-flow#3107 (BUG-002) * fix(frontend): treat any task tool error as a terminal subtask failure The subtask card status machine matched only three English prefixes (`Task Succeeded. Result:`, `Task failed.`, `Task timed out`). Anything else fell through to `in_progress`, so a `task` tool error wrapped by `ToolErrorHandlingMiddleware` (`Error: Tool 'task' failed ...`) left the card spinning forever even after the run had ended. Extract the prefix logic into `parseSubtaskResult` and recognise any leading `Error:` token as a terminal failure. The extracted function is unit-tested against the legacy prefixes plus the `AsyncCallbackManager` regression captured in the upstream issue. Refs: bytedance/deer-flow#3107 (BUG-007) * fix(frontend): exclude hidden, reasoning, and tool payloads from chat export `formatThreadAsMarkdown` / `formatThreadAsJSON` iterated raw messages without running the UI-level `isHiddenFromUIMessage` filter. Exported transcripts therefore included `hide_from_ui` system reminders, memory injections, provider `reasoning_content`, tool calls, and tool result messages — content that is intentionally hidden in the chat view. Filter the export to the user-visible transcript by default and gate reasoning / tool calls / tool messages / hidden messages behind explicit `ExportOptions` flags so a future debug export can opt back in without forking the formatter. Refs: bytedance/deer-flow#3107 (BUG-006) * fix(gateway): route get_config through get_app_config for mtime hot reload `get_config(request)` returned the `app.state.config` snapshot captured at startup. The worker / lead-agent path then threaded that frozen `AppConfig` through `RunContext` and `agent_factory`, so per-run fields edited in `config.yaml` (notably `max_tokens`) were ignored until the gateway process was restarted — even though `get_app_config()` already does mtime-based reload at the bottom layer. Route the request dependency through `get_app_config()` directly. Runtime `ContextVar` overrides (`push_current_app_config`) and test-injected singletons (`set_app_config`) keep working; `app.state.config` is now only read at startup for one-shot bootstrap (logging level, IM channels, `langgraph_runtime` engines). `tests/test_gateway_deps_config.py` encoded the old snapshot contract and is removed; `tests/test_gateway_config_freshness.py` replaces it with mtime, ContextVar, and `set_app_config` coverage. `test_skills_custom_router.py` and `test_uploads_router.py` now inject test configs via FastAPI `dependency_overrides[get_config]` instead of mutating `app.state.config`. Document the hot-reload boundary in `backend/CLAUDE.md` so reviewers know which fields are picked up on the next request vs. which still require a restart (`database`, `checkpointer`, `run_events`, `stream_bridge`, `sandbox.use`, `log_level`, `channels.`). Refs: bytedance/deer-flow#3107 (BUG-001) fix(gateway): broaden get_config 503 to any config-load failure Address review feedback on the previous commit: 1. Narrow exception catch removed. The old contract returned 503 whenever `app.state.config is None`. The first cut only mapped `FileNotFoundError`, leaving `PermissionError`, YAML parse errors, and pydantic `ValidationError` to bubble up as 500. At the request boundary we treat any inability to materialise the config as "configuration not available" (503) and log the original exception so the operator still has the stack. 2. Removed the unused `request: Request` parameter and the matching `# noqa: ARG001`. FastAPI's `Depends()` does not require the dependency to accept `Request`; the only call site uses the no-arg form. 3. `backend/CLAUDE.md` boundary now lists the reason each field is restart-required (engine binding, singleton caching, one-shot `apply_logging_level`, etc.), not just the field name, so reviewers do not have to reverse-engineer the boundary themselves. Tests parametrise four exception classes (`FileNotFoundError`, `PermissionError`, `ValueError`, `RuntimeError`) and assert 503 for each. Refs: bytedance/deer-flow#3107 (BUG-001) * fix(task-tool): defend _find_usage_recorder against non-list callbacks Address review feedback. The previous commit handled the two common shapes LangChain hands to async tool runs — a plain `list[BaseCallbackHandler]` and a `BaseCallbackManager` subclass — but iterated any other shape directly, which would still raise `TypeError` if e.g. a single handler instance leaked through without a list wrapper. Treat any non-list, non-manager `config["callbacks"]` value as "no recorder" rather than crash. Docstring now lists all four shapes explicitly. New tests cover the single-handler-object case, `runtime is None`, `callbacks is None`, and `runtime.config` being a non-dict — all required to be silent no-ops. Refs: bytedance/deer-flow#3107 (BUG-002) * fix(frontend): drop dead identity ternary and add opt-in export tests Address review feedback on the previous export commit: 1. Removed the no-op `typeof msg.content === "string" ? msg.content : msg.content` expression in `formatThreadAsJSON`. Both branches returned the same value; the message content now flows through unchanged whether it is a string or the rich `MessageContent[]` shape (LangChain JSON-serialises the array structure correctly already). 2. Expanded the JSDoc on `ExportOptions` to make it clearer that the four flags are not currently wired to any UI control — callers wanting a debug export must build the options object explicitly. The default behaviour continues to match the explicit prescription in bytedance/deer-flow#3107 BUG-006. 3. Added opt-in coverage. The previous tests only exercised the `options = {}` default path; the new cases verify each flag flips the corresponding payload back into the export so a future debug-export surface does not silently break the contract. Refs: bytedance/deer-flow#3107 (BUG-006) * fix(frontend): export subtask prefix constants and document fallback intent Address review feedback on the previous BUG-007 commit: 1. `SUCCESS_PREFIX`, `FAILURE_PREFIX`, `TIMEOUT_PREFIX`, and the `ERROR_WRAPPER_PATTERN` regex are now exported. The JSDoc explicitly pins them as part of the backend↔frontend contract defined in `task_tool.py` and `tool_error_handling_middleware.py`, so any future structured-status migration (e.g. backend writing `additional_kwargs.subagent_status` instead of leading text) can reference these from one canonical place rather than redefine them. 2. The `in_progress` fallback now carries a docstring explaining the deliberate choice — LangChain only ever emits a `ToolMessage` once the tool itself has returned, so unrecognised content means the contract has drifted and "still running" is the right operator signal (eagerly marking it terminal-failed would mask the drift). No behaviour change; this is documentation and an API export. Refs: bytedance/deer-flow#3107 (BUG-007) * fix(gateway): drop app.state.config snapshot and freeze run_events_config Address @ShenAC-SAC's BUG-001 review on #3131. The previous cut still stored an ``AppConfig`` snapshot on ``app.state.config`` for startup bootstrap. Two follow-on hazards from that: 1. Future code touching the gateway lifespan could accidentally start reading ``app.state.config`` again, silently regressing the request hot path back to a stale snapshot. 2. ``get_run_context()`` paired a freshly-reloaded ``AppConfig`` with the startup-bound ``event_store`` and a live ``run_events_config`` field — so an operator who edited ``run_events.backend`` mid-flight would have produced a run context whose ``event_store`` and ``run_events_config`` referred to different backends. Clean approach (aligned with the direction in PR #3128): - ``lifespan()`` keeps a local ``startup_config`` variable and passes it explicitly into ``langgraph_runtime(app, startup_config)`` and into ``start_channel_service``. No ``app.state.config`` attribute is set at any point. - ``langgraph_runtime`` now accepts ``startup_config`` as a required parameter, removing the ``getattr(app.state, "config", None)`` lookup and the "config not initialised" runtime error. - The matching ``run_events_config`` is frozen onto ``app.state`` next to ``run_event_store`` so ``get_run_context`` reads the two from the same startup-time source. ``app_config`` continues to be resolved live via ``get_app_config()``. - ``backend/CLAUDE.md`` boundary explanation updated to spell out the ``startup_config`` / ``get_app_config()`` split. New regression test ``test_run_context_app_config_reflects_yaml_edit`` exercises the worker-feeding path: it asserts that ``ctx.app_config`` follows a mid-flight ``config.yaml`` edit while ``ctx.run_events_config`` stays frozen to the startup snapshot the event store was built from. Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review * fix(frontend): parse Task cancelled and polling timed out as terminal Address @ShenAC-SAC's BUG-007 review on #3131. `task_tool.py` actually emits five terminal strings: - `Task Succeeded. Result: …` - `Task failed. …` - `Task timed out. …` - `Task cancelled by user.` ← previously matched none - `Task polling timed out after N minutes …` ← previously matched none The previous cut handled three; the last two fell through to the "unknown content" branch and pushed the subtask card back to `in_progress` even though the backend had already reached a terminal state. Add explicit matches plus regression tests for both. The `in_progress` fallback is now reserved for genuinely unrecognised output (i.e. contract drift), as documented. Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review * fix(frontend): sanitize JSON export content via the Markdown content path Address @ShenAC-SAC's BUG-006 review and the Copilot inline comment on #3131. The previous cut filtered hidden/tool messages out of the JSON export but still serialised `msg.content` verbatim, so: - inline `<think>…</think>` wrappers stayed in the exported `content` even with `includeReasoning: false`, - content-array thinking blocks leaked the `thinking` field, - `<uploaded_files>…</uploaded_files>` markers leaked the workspace paths a user uploaded files to. JSON now goes through the same sanitiser the Markdown path uses (`extractContentFromMessage` + `stripUploadedFilesTag`). Reasoning and tool_calls remain gated behind their `ExportOptions` flags. AI / human rows that sanitise to empty content with no opted-in reasoning or tool calls are dropped so the JSON matches the Markdown path's `continue` on empty assistant fragments. New regression tests cover the three leak shapes the reviewer called out plus the empty-content-drop case. Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review * test(gateway): align lifespan stub with langgraph_runtime two-arg signature Codex round-3 review of c0bc7a06 flagged this: changing `langgraph_runtime` to require `startup_config` as a second positional argument broke the one-arg stub `_noop_langgraph_runtime(_app)` in `test_gateway_lifespan_shutdown.py`, which is patched into `app.gateway.app.langgraph_runtime` by the lifespan shutdown bounded-timeout regression. Lifespan would then call the stub with two args and raise `TypeError` before the bounded-shutdown assertion ran. Update the stub to match the new signature. The shutdown test itself is unaffected — it only cares about the channel `stop_channel_service` hang path. Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review * fix(frontend): strip every known backend marker in export, not just uploads Codex round-3 review of 258ca800 and the matching maintainer feedback on PR #3131 made the same point: the JSON export now ran the Markdown-side sanitiser, but that sanitiser only stripped `<uploaded_files>`. The full set of payloads middleware embeds inside message `content` is larger: - `<uploaded_files>` — `UploadsMiddleware` - `<system-reminder>` — `DynamicContextMiddleware` - `<memory>` — `DynamicContextMiddleware` (nested inside system-reminder) - `<current_date>` — `DynamicContextMiddleware` The primary protection is still `isHiddenFromUIMessage`: the `<system-reminder>` HumanMessage is marked `hide_from_ui: true` and never reaches the formatter. This commit adds the second line of defence so a regression that drops the `hide_from_ui` flag — or any future middleware that injects the same tag vocabulary into a visible HumanMessage — cannot leak the payload into the export file. Concrete changes: - New `INTERNAL_MARKER_TAGS` constant + `stripInternalMarkers(content)` helper in `core/messages/utils.ts`. The constant doubles as documentation for the backend↔frontend contract. - `formatMessageContent` in `export.ts` now calls `stripInternalMarkers` instead of `stripUploadedFilesTag`. UI render paths (`message-list-item.tsx`) keep using the narrower function so a user legitimately typing `<memory>` in a meta-discussion is preserved. - The "drop empty rows" guard in `buildJSONMessage` switched from `=== undefined` to truthy `!` checks. Codex spotted the asymmetry: when `extractReasoningContentFromMessage` returned the empty string (which it legitimately can), the JSON path emitted `{reasoning: ""}` while the Markdown path's `!reasoning` `continue` correctly dropped the row. New regression tests cover the defence-in-depth strip with a `<system-reminder><memory><current_date>` payload deliberately not marked `hide_from_ui`; tool-message sanitization under `includeToolMessages: true`; the mixed-content-array case (`thinking + text + image_url`); and the opted-in empty-reasoning drop. Live verification on a real Ultra-mode thread that uploaded a PDF (`曾鑫民-薪资交易流水.pdf`): backend state's first HumanMessage carries the `<uploaded_files>` block (with `/mnt/user-data/uploads/...` paths) as part of a content-array. The Markdown and JSON export blobs both come back free of `<uploaded_files>`, `<system-reminder>`, `<current_date>`, `tool_calls`, and reasoning — while preserving the user's `这是什么？` prompt and the assistant's visible answer. Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review * test(frontend): cover trim, varied N, and pre-execution Error: prefixes Codex round-3 review of 50e2c257 flagged three coverage gaps in the subtask-status parser: 1. `Task cancelled by user.` and `Task polling timed out` previously had no whitespace-trim coverage — the original trim test only exercised the success prefix. Streaming chunks can arrive with leading/trailing newlines; the regex needed an explicit assertion. 2. The polling-timeout case was tested only at one `N` (15 minutes). The backend interpolates the live `timeout_seconds // 60` value, so the matcher must hold for any positive integer. Now we run the case for 1, 5, and 60 minutes. 3. `task_tool.py` also emits three `Error:` strings for pre-execution failures — unknown subagent type, host-bash disabled, and "task disappeared from background tasks". They are intentionally handled by `ERROR_WRAPPER_PATTERN` rather than dedicated prefixes (the wrapper already produces the right terminal-failed shape) but had no test coverage proving that wiring. Codex was right that a refactor splitting one of them off into its own prefix would silently break things. The JSDoc on the constants block now spells the three pre-execution errors out so the relationship between `task_tool.py` returns and the prefix vocabulary is explicit. No production code change beyond the docstring — this commit is pure coverage hardening for the contract that already exists. Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review	2026-05-21 21:18:10 +08:00
Lawrance_YXLiao	1c5c585741	fix(runtime): bound write_file execution-failure observations (#3133 ) * fix(runtime): bound write_file execution-failure observations * fix(runtime): preserve write_file error prefixes * test(runtime): trim write_file prefix assertions * refactor(runtime): drop redundant exception suffix for permission/directory write errors Address Copilot review on #3133: the PermissionError and IsADirectoryError branches now return self-contained, non-redundant messages (e.g. "Error: Permission denied writing to file: /mnt/...") via direct truncation, instead of going through _format_write_file_error which appended a duplicate ": PermissionError: permission denied" suffix. OSError, SandboxError and the generic Exception branches keep the unified "Failed to write file '{path}': {ExceptionType}: {detail}" format so the model still sees a stable, machine-readable error class. Removes the now-unused message= parameter from _format_write_file_error, keeping a single code path. Truncation contract (<= 2000 chars) and host-path sanitization unchanged. * fix(runtime): handle write_file sandbox init errors Initialize the requested path before sandbox setup so early sandbox failures can still return a bounded write_file error. Add a regression test for sandbox initialization failures. * style(test): format sandbox security tests	2026-05-21 20:35:46 +08:00
Xinmin Zeng	df95154282	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 ) * fix(tracing): propagate session_id and user_id into Langfuse traces Adds Langfuse v4 reserved trace attributes (langfuse_session_id, langfuse_user_id, langfuse_trace_name, langfuse_tags) to RunnableConfig.metadata inside the run worker, so the langchain CallbackHandler can lift them onto the root trace. - New deerflow.tracing.metadata.build_langfuse_trace_metadata() returns the reserved keys when Langfuse is in the enabled providers, else {}. - worker.run_agent merges them with setdefault so caller-supplied keys win, allowing per-request overrides from upstream metadata. - session_id mirrors the LangGraph thread_id; user_id reads get_effective_user_id() (falls back to "default" in no-auth mode). - trace_name defaults to "lead-agent"; tags carry env and model name when DEER_FLOW_ENV (or ENVIRONMENT) and a model name are present. Closes #2930 * fix(tracing): attach Langfuse callback at graph root so metadata propagates The first commit injected ``langfuse_session_id`` / ``langfuse_user_id`` / ``langfuse_trace_name`` / ``langfuse_tags`` into ``RunnableConfig.metadata``, but on ``main`` the Langfuse callback is attached at model level (``models/factory.py``). LangChain still threads ``parent_run_id`` through the contextvar, so the handler sees the model as a nested observation and ``__on_llm_action`` strips the ``langfuse_`` keys (``keep_langfuse_trace_attributes=False``). The trace's top-level ``sessionId`` / ``userId`` therefore stayed empty in deer-flow's LangGraph runtime — confirmed live against a real Langfuse instance. This commit moves the callback to the graph invocation root* so the handler fires ``on_chain_start(parent_run_id=None)`` and runs the ``propagate_attributes`` path that actually lifts ``session_id`` / ``user_id`` onto the trace: - ``models/factory.py``: add ``attach_tracing`` keyword (default ``True``) so standalone callers (``MemoryUpdater``, etc.) keep their direct model-level tracing. - ``agents/lead_agent/agent.py``: call ``build_tracing_callbacks()`` once inside ``_make_lead_agent`` and append the result to ``config["callbacks"]``; the four in-graph ``create_chat_model`` sites (bootstrap, default agent, sync + async summarization) pass ``attach_tracing=False`` to avoid duplicate spans. - ``agents/middlewares/title_middleware.py``: same ``attach_tracing=False`` for the title-generation model, since it inherits the graph's RunnableConfig via ``_get_runnable_config``. Test updates: - ``tests/test_lead_agent_model_resolution.py`` and ``tests/test_title_middleware_core_logic.py``: extend the fake ``create_chat_model`` signatures / mock assertions to accept the new ``attach_tracing`` kwarg. - ``tests/test_worker_langfuse_metadata.py``: switch the no-user fallback test from direct ContextVar mutation to ``monkeypatch.setattr`` on ``get_effective_user_id`` to avoid pollution across the langfuse OTel global tracer provider. - ``tests/conftest.py``: add an autouse fixture that resets ``deerflow.config.title_config._title_config`` to its pristine default after every test. Any test that loads the real ``config.yaml`` (via ``get_app_config()``) calls ``load_title_config_from_dict`` and mutates the module-level singleton, which previously poisoned the title-middleware suite when run after, e.g., the new ``test_worker_langfuse_metadata.py`` cases. The fixture is independent of this PR's main change but unblocks the cross-file test run. Live verification (same Langfuse instance as before): - Drove ``worker.run_agent`` against the real ``make_lead_agent`` + ``gpt-4o-mini`` for three distinct ``user_context`` identities (``fancy-engineer``, ``alice-pm``, ``bob-designer``). - Each run produced one ``lead-agent`` trace whose top-level ``sessionId`` / ``userId`` / ``tags`` carry the expected values, e.g. ``session=e2e-2930-8f347c-alice-pm user=alice-pm name='lead-agent' tags=['model:gpt-4o-mini']``. Refs #2930. * fix(tracing): extend root-callback + metadata injection to the embedded client Addresses Copilot review on PR #2944. Commit 2 disabled model-level tracing for ``TitleMiddleware`` and ``_create_summarization_middleware`` because ``_make_lead_agent`` now attaches the tracing callbacks at the graph invocation root. But the embedded ``DeerFlowClient`` does not call ``_make_lead_agent`` — it calls ``_build_middlewares`` directly and never appends the tracing handlers to its ``RunnableConfig``. So under the embedded path, title-generation and summarization LLM calls were left untraced — a regression introduced by this PR. This commit mirrors the gateway worker's injection in ``DeerFlowClient.stream``: - Append ``build_tracing_callbacks()`` to ``config["callbacks"]`` so the Langfuse handler sees ``on_chain_start(parent_run_id=None)`` at the graph root and runs the ``propagate_attributes`` path. - Merge ``build_langfuse_trace_metadata(...)`` into ``config["metadata"]`` with ``setdefault`` so caller-supplied keys still win. - ``_ensure_agent`` now creates its main model with ``attach_tracing=False`` to avoid duplicate spans now that the callback lives at the graph root. Docs: - ``backend/CLAUDE.md`` Tracing section rewritten to describe the graph-root attachment model (replacing the inaccurate "at model-creation time" wording). - ``README.md`` Langfuse section now lists both injection points (worker + client) instead of only the worker path. Tests: - ``tests/test_client_langfuse_metadata.py`` (new, 3 cases): callbacks + metadata are injected when Langfuse is enabled, caller-supplied metadata overrides win via ``setdefault``, and the injection is inert when Langfuse is disabled. Live verification on the real Langfuse instance: === user=fancy-client === id=cbd22847.. session=client-2930-6b9491-fancy-client user=fancy-client name='lead-agent' === user=alice-client === id=b4f6f576.. session=client-2930-6b9491-alice-client user=alice-client name='lead-agent' Refs #2930. * refactor(tracing): address maintainer review on PR #2944 Addresses @WillemJiang's 5 comments. 1. Duplicated metadata-injection code between worker.py and client.py New ``deerflow.tracing.inject_langfuse_metadata(config, ...)`` helper takes the 10-line build + merge + setdefault logic that was duplicated in ``runtime/runs/worker.py`` and ``client.py``. Both callers now share a single source of truth, so the two paths cannot drift. 2. Direct private-attribute mutation in conftest.py and tests Added public ``reset_tracing_config()`` / ``reset_title_config()`` functions. ``tests/conftest.py`` and every test that previously did ``tracing_module._tracing_config = None`` or ``title_module._title_config = TitleConfig()`` now goes through the public API. A future internal rename will surface as an ImportError instead of a silent no-op. 3. client.py reading os.environ directly ``DeerFlowClient.__init__`` grows an optional ``environment`` parameter so programmatic callers can pass the deployment label explicitly. ``stream()`` consults ``self._environment`` first and only falls back to ``DEER_FLOW_ENV`` / ``ENVIRONMENT`` env vars when nothing was passed in. Backwards compatible — env-var behaviour preserved for callers that opt to keep using it. 4. build_tracing_callbacks() cached on hot path Not implemented. Inspected the langfuse v4 ``langchain.CallbackHandler`` constructor: it only resolves the module-level singleton client via ``get_client()`` and initialises a few dicts (no I/O, no env parsing at construction time). The build is essentially free. Caching would trade a non-measurable speedup for two real risks: handler instances carry per-run state internally (``_run_states``, ``_root_run_states``, ``last_trace_id``), and tracing config can be reloaded by env-var changes between runs. Will revisit if profiling ever shows it as a hot spot. 5. attach_tracing=False easy to forget at new in-graph call sites - Module docstring at the top of ``lead_agent/agent.py`` documents the invariant ("every in-graph ``create_chat_model`` MUST pass ``attach_tracing=False``") and enumerates the current sites. - New regression test ``test_make_lead_agent_attaches_tracing_callbacks_at_graph_root`` in ``tests/test_lead_agent_model_resolution.py`` locks both halves of the invariant: ``config["callbacks"]`` carries the tracing handler after ``_make_lead_agent``, AND every ``create_chat_model`` call captured by the test passes ``attach_tracing=False``. A future in-graph site that forgets the flag will fail this test. Lint clean. Full touched-suite bundle: 246 passed. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-21 16:49:31 +08:00
Xinmin Zeng	31513c2ccb	fix(persistence): emit tz-aware timestamps from SQLite-backed stores (#3130 ) SQLAlchemy's DateTime(timezone=True) is a no-op on SQLite (the backend has no native tz type), so values round-tripped through the DB come back as naive datetimes. The four SQL _row_to_dict helpers were calling .isoformat() directly on those naive values, shipping timezone-less strings like "2026-05-20T06:10:22.970977" out of the API. The browser's new Date(...) then parses them as local time, shifting recent threads in /threads/search by the local UTC offset (about 8h in Asia/Shanghai). Route the four call sites through coerce_iso() instead — it already normalizes naive values as UTC and emits "+00:00" so the wire format always carries tz. No data migration is needed; existing SQLite rows read back via the corrected serializer. PostgreSQL deployments are unaffected because timestamptz preserves tzinfo end-to-end. Closes #3120	2026-05-21 16:22:09 +08:00
Airene Fang	923f516deb	feat(trace):LangGraph -> lead_agent and set custom agent_name to run_name (#3101 ) * feat(trace):LangGraph -> lead_agent and set user custom agent name to run_name * feat(trace):follow github copilot suggest * feat(trace):Refactor run_name resolution and improve test coverage	2026-05-21 14:48:28 +08:00
AochenShen99	8b697245eb	fix(sandbox): avoid blocking sandbox readiness polling (#2822 ) * fix(sandbox): offload async sandbox acquisition Run blocking sandbox provider acquisition through the async provider hook so eager sandbox setup does not stall the event loop. * fix(sandbox): add async readiness polling Introduce an async sandbox readiness poller using httpx and asyncio.sleep while preserving the existing synchronous API. * test(sandbox): cover async readiness polling Lock in non-blocking readiness behavior so the async helper does not regress to requests.get or time.sleep. * fix(sandbox): allow anonymous backend creation * fix(sandbox): use async readiness in provider acquisition * fix(sandbox): use async acquisition for lazy tools * test(sandbox): cover anonymous remote creation * fix(sandbox): clamp async readiness timeout budget * fix(sandbox): offload async lock file handling * fix(sandbox): delegate async middleware fallthrough * docs(sandbox): document async acquisition path * fix(sandbox): offload async sandbox release * docs(sandbox): mention async release hook * fix(sandbox): address async lock review Reduce duplicate sync/async sandbox acquisition state handling and move async thread-lock waits onto a dedicated executor with cancellation-safe cleanup. * chore: retrigger ci Retrigger GitHub Actions after upstream main fixed the stale PR merge lint failure. * test(sandbox): sync backend unit fixtures --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-21 14:44:34 +08:00
Nan Gao	dcc6f1e678	feat(loop-detection): defer warning injection (#2752 ) * fix(loop-detection): defer warn injection to wrap_model_call The warn branch in LoopDetectionMiddleware injected a HumanMessage into state from after_model. The tools node had not yet produced ToolMessage responses to the previous AIMessage(tool_calls=...), so the new HumanMessage landed between the assistant's tool_calls and their responses. OpenAI/Moonshot reject the next request with "tool_call_ids did not have response messages" because their validators require tool_calls to be followed immediately by tool messages. Detection now runs in after_model as before, but only enqueues the warning into a per-thread list. Injection happens in wrap_model_call, where every prior ToolMessage is already present in request.messages. The warning is appended at the end as HumanMessage(name="loop_warning") — pairing intact, AIMessage semantics untouched, no SystemMessage issues for Anthropic. Closes #2029, addresses #2255 #2293 #2304 #2511. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(channels): remove loop warning display filter * feat(loop-detection): scope pending warnings by run * docs(loop-detection): update docs * test(loop-detection): assert deferred warnings are queued * fix(loop-detection): cap transient warning state * docs: update docs * add async awrap_model_call test coverage * docs(loop-detection): document transient warnings --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 14:36:07 +08:00
InitBoy	e19bec1422	fix(task-tool): cancel and schedule deferred cleanup on polling safety timeout (#3097 ) When the poll loop's safety-net timeout fires (poll_count > max_poll_count), the background subagent task was abandoned without cancellation or cleanup, leaving a stale entry in _background_tasks indefinitely. The original code had a comment promising "the cleanup will happen when the executor completes", but run_task() in executor.py never calls cleanup_background_task after reaching a terminal state -- the promise was never implemented. This change mirrors the asyncio.CancelledError path: signal cooperative cancellation via request_cancel_background_task and schedule _deferred_cleanup_subagent_task to remove the entry once the background thread reaches a terminal state. Direct cleanup at poll-timeout time would introduce a race: run_task() could remove the entry while the poll loop is still mid-iteration, causing a spurious "Task disappeared" error. The deferred approach avoids this by waiting for terminal state before removal. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 07:47:19 +08:00
Yuyi Ao	9afeaf66bc	Fix env resolution in MCP config lists (#2556 ) * Fix env resolution in MCP config lists * fix:unset env variable and consistent function --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-21 07:27:00 +08:00
Airene Fang	b6b3650e50	fix(trace):memory 中文 in trace info is unicode escape sequence. (#3104 ) * fix(trace):memory 中文 in trace is unicode * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-20 22:34:10 +08:00
john lee	9b19cca91c	fix(runtime): make RunManager.cancel() idempotent for already-interrupted runs (#3055 ) (#3058 ) A second cancel() call on an interrupted run returned False, causing the cancel and stream_existing_run router endpoints to raise 409 on double-stop. Fix: return True inside the lock when record.status == RunStatus.interrupted. This covers both the POST /cancel and POST /join endpoints without any re-fetch or extra get() call — the idempotency lives at the source. Also fixes stream_existing_run (the LangGraph SDK stop-button path), which had the identical cancel() → 409 pattern and was not covered by the original PR. Both endpoints share the fix automatically. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 16:37:36 +08:00
Xun	e37912e2c8	feat(sandbox) Adds download file interface in Sandbox (#3038 ) * Add download interface in Sandbox * fix * fix * del invalidate test * fix * safe download * improve	2026-05-20 10:16:31 +08:00
AochenShen99	3599b570a9	fix(harness): wrap all async-only tools for sync clients (#2935 )	2026-05-19 22:11:46 +08:00
He Wang	c810e9f809	fix(harness)!: hydrate runs from RunStore and persist interrupted status (#2932 ) * fix(harness): hydrate run history from RunStore and persist cancellation status fix: - Make RunManager.get() async and hydrate from RunStore when in-memory record is missing - Merge store rows into list_by_thread() with in-memory precedence for active runs - Persist interrupted status to RunStore in cancel() and create_or_reject(interrupt\|rollback) - Extract _persist_status() to reuse the best-effort store update pattern - Await run_mgr.get() in all gateway endpoints - Return 409 with distinct message for store-only runs not active on current worker Closes #2812, Closes #2813 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(harness): consistent sort and guarded hydration in RunManager fix: - list_by_thread() now sorts by created_at desc (newest first) even when no RunStore is configured, matching the store-backed code path - guard _record_from_store() call sites in get() and list_by_thread() with best-effort error handling so a single malformed store row cannot turn read paths into 500s test: - update test_list_by_thread assertion to expect newest-first order - seed MemoryRunStore via public put() API instead of writing to _runs * fix(harness): guard store-only runs from streaming and fix get() TOCTOU Add RunRecord.store_only flag set by _record_from_store so callers can distinguish hydrated history from live in-memory runs. join_run and stream_existing_run (action=None) now return 409 instead of hanging forever on an empty MemoryStreamBridge channel. Re-check _runs under lock after the store await in RunManager.get() so a concurrent create() that lands between the two checks returns the authoritative in-memory record rather than a stale store-hydrated copy. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> * fix(harness): reorder bridge fetch in join_run and make list_by_thread limit explicit Move get_stream_bridge() after the store_only guard in join_run so a missing bridge cannot produce 503 for historical runs before the 409 guard fires. Add limit parameter to RunManager.list_by_thread (default 100, matching the store's page size) and pass it explicitly to the store call. Update docstring to document the limit instead of claiming all runs are returned. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> * fix(harness): cap list_by_thread result to limit after merge Apply [:limit] to all return paths in list_by_thread so the method consistently returns at most limit records regardless of how many in-memory runs exist, making the limit parameter a true upper bound on the response size rather than just a store-query hint. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> * fix `list_by_thread` docstring Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(runtime): add update_model_name to RunStore to prevent SQL integrity errors RunManager.update_model_name() was calling _persist_to_store() which uses RunStore.put(), but RunRepository.put() is insert-only. This caused integrity errors when updating model_name for existing runs in SQL-backed stores. fix: - Add abstract update_model_name method to RunStore base class - Implement update_model_name in MemoryRunStore - Implement update_model_name in RunRepository with proper normalization - Add _persist_model_name helper in RunManager - Update RunManager.update_model_name to use the new method test: - Add tests for update_model_name functionality - Add integration tests for RunManager with SQL-backed store Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(runtime): handle NULL status/on_disconnect in _record_from_store `dict.get(key, default)` only uses the default when the key is absent, so a SQL row with an explicit NULL status would pass `None` to `RunStatus(None)` and raise, breaking hydration for otherwise valid rows. Switch to `row.get(...) or fallback` so both missing and NULL values get a safe default. Add tests for get() and list_by_thread() with a NULL status row to prevent regression. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> * fix(runs): address PR review feedback on store consistency changes - Fix list_by_thread limit semantics: pass store_limit = max(0, limit - len(memory_records)) to store so newer store records are not crowded out by in-memory records - Remove dead code: cancelled guard after raise is always True, simplify to if wait and record.task - Document _record_from_store NULL fallback policy (status→pending, on_disconnect→cancel) in docstring Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-18 22:25:02 +08:00
KiteEater	3acca12614	fix(subagents): make subagent timeout terminal state atomic (#2583 ) * Guard subagent terminal state transitions * fix: publish subagent terminal status last * Fix subagent timeout test to avoid blocking event loop * Fix subagent timeout test tracking * Refine subagent terminal state handling --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-18 22:19:32 +08:00
Willem Jiang	39f901d3a5	fix(runs): restore historical runs from persistent store after gateway restart (#2989 ) * fix(runs): restore historical runs from persistent store after gateway restart RunManager.list_by_thread() and get() only queried the in-memory _runs dict, returning empty results after a restart even when PostgreSQL had the records. Add store fallback to both read paths and a new async aget() for the API endpoint, keeping sync get() for internal callers that need live task/abort_event state. Fixes #2984 * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(runs): scope run store fallback reads by user id Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e73daada-1215-4bc1-ab7d-7117826c5013 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * test(runs): clarify ordering expectation and mock store filters Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e73daada-1215-4bc1-ab7d-7117826c5013 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * test(runs): make user filter fallback assertions explicit Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e73daada-1215-4bc1-ab7d-7117826c5013 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * test(runs): verify user-isolated fallback behavior with memory store Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e73daada-1215-4bc1-ab7d-7117826c5013 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * update the code with feedback from issue-2984 --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-17 20:03:21 +08:00
魔力鸟	e74e126ed3	fix(sandbox): scope provisioner PVC data by user (#2973 ) * fix(sandbox): scope provisioner PVC data by user * Address provisioner PVC review feedback	2026-05-17 15:23:42 +08:00
Willem Jiang	a814ab50b5	fix(skills): make security scanner JSON parsing robust for LLM output variations (#2987 ) The moderation model's response was silently falling through to a conservative block when LLMs wrapped structured output in markdown code fences, added prose around the JSON, returned case-variant decisions (e.g. "Allow"), or included nested braces in the reason field. The greedy `\{.*\}` regex also over-matched on nested braces. - Rewrite _extract_json_object() with markdown fence stripping and brace-balanced string-aware extraction - Normalize decision field to lowercase for case-insensitive matching - Distinguish "model unavailable" from "unparseable output" in fallback - Strengthen system prompt to explicitly forbid code fences and prose - Add 15 tests covering all reported scenarios Fixes #2985	2026-05-17 08:59:42 +08:00
Xinmin Zeng	380255f722	fix(sandbox): uphold /mnt/user-data contract at Sandbox API boundary (#2873 ) (#2881 ) * fix(sandbox): uphold /mnt/user-data contract at Sandbox API boundary (#2873) LocalSandboxProvider used a process-wide singleton with no /mnt/user-data mapping, forcing every caller to translate virtual paths via tools.py before invoking the public Sandbox API. AIO already exposes /mnt/user-data natively (per-thread bind mounts), so the same code path behaved differently across implementations — and direct callers like uploads.py:282 / feishu.py:389 only worked thanks to the `uses_thread_data_mounts` workaround flag. Switch the provider to a dual-track cache: keep the `"local"` singleton for legacy acquire(None) callers (backward-compat for existing tests and scripts), and create a per-thread LocalSandbox with id `"local:{tid}"` for acquire(thread_id). Each per-thread instance carries PathMapping entries for /mnt/user-data, its three subdirs, and /mnt/acp-workspace, mirroring how AioSandboxProvider mounts those paths into its container. is_local_sandbox() now recognises both id formats. `_agent_written_paths` becomes per-thread (it was a process-wide set that leaked across threads — a latent isolation bug also fixed by this change). Verified via TDD: a new contract test suite hits the public Sandbox API directly (write/read/list/exec/glob/grep/update + per-thread isolation + lifecycle). 3212 backend tests still pass, ruff is clean. * fix(sandbox): address Copilot review on #2881 Three follow-ups from Copilot's review of the LocalSandboxProvider refactor: 1. Synchronisation: ``acquire`` / ``get`` / ``reset`` mutated the cache without any lock, so concurrent acquire of the same ``thread_id`` could create two ``LocalSandbox`` instances and lose one's ``_agent_written_paths`` state. Add a provider-wide ``threading.Lock`` (matching ``AioSandboxProvider``) and build per-thread mappings outside the lock to avoid holding it during the ``ensure_thread_dirs`` filesystem touch. 2. Memory bound: ``_thread_sandboxes`` grew monotonically. Replace the plain dict with an ``OrderedDict`` LRU capped at ``DEFAULT_MAX_CACHED_THREAD_SANDBOXES`` (256, configurable per provider instance). ``get`` promotes touched threads to the MRU end so an active thread isn't evicted under load. Eviction is graceful: the next ``acquire`` rebuilds a fresh sandbox; only ``_agent_written_paths`` (reverse-resolve hint) is lost. 3. Docs: update ``CLAUDE.md`` to reflect the new per-thread architecture, the LRU cap, and that ``is_local_sandbox`` recognises both id formats. New regression tests: - Concurrent ``acquire("alpha")`` from 8 threads yields a single instance (slow-init injection forces the race window wide open). - Concurrent ``acquire`` of distinct thread_ids yields distinct instances. - The cache evicts the least-recently-used thread once the cap is exceeded. - ``get`` promotes recency so a polled thread survives a later acquire-storm.	2026-05-17 08:26:04 +08:00
Nan Gao	0c37509b38	fix(middleware): Prevent todo completion reminder IMMessage leak (#2907 ) * fix(middleware): Prevent todo completion reminder IMMessage leak (#2892) * make format * fix(middleware): Clear stale todo reminder counts (#2892) * add size guard for _completion_reminder_counts and add a integration test	2026-05-15 22:12:37 +08:00
LawranceLiao	181d836541	fix(middleware): normalize tool result adjacency before model calls (#2939 ) * normalizing tool-call transcripts before invocation * test(middleware): cover tool result regrouping edge cases	2026-05-15 22:09:04 +08:00
Nan Gao	45060a9ffc	fix(runtime): avoid postgres aggregate row lock (#2962 )	2026-05-15 10:32:09 +08:00
LawranceLiao	722c690f4f	fix(memory): isolate queued memory updates by agent (#2941 ) * fix(memory): isolate queued memory updates by agent * fix(memory): include user in queue identity * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-15 10:26:35 +08:00
YuJitang	eab7ae3d62	feat: stream subagent token usage to header via terminal task events (#2882 ) * feat: real-time subagent token usage display in header and per-turn Backend: - Persist subagent token usage to AIMessage.usage_metadata via TokenUsageMiddleware, so accumulateUsage() naturally includes subagent tokens without frontend state management - Cache subagent usage by tool_call_id in task_tool, write back to the dispatching AIMessage on next model response - Emit subagent token usage on all terminal task events (task_completed, task_failed, task_cancelled, task_timed_out) - Report subagent usage to parent RunJournal for API totals - Search backward from ToolMessage to find dispatching AIMessage for correct multi-tool-call attribution Frontend: - Remove subagentUsage state, custom event handling, and prop threading — subagent tokens are now embedded in message metadata - Simplify selectHeaderTokenUsage (no subagentUsage parameter) - Per-turn inline badges show turn-specific usage via message accumulation - Remove isLoading guard from MessageTokenUsageList for dynamic updates during streaming * fix: prevent header token double counting from baseline reset race onFinish, onError, and thread-switch useEffect all reset pendingUsageBaselineMessageIdsRef to an empty Set. If thread.isLoading is still true on the next render, all messages pass the getMessagesAfterBaseline filter and their tokens are added to backendUsage (which already includes them), causing the header to display up to 2× the actual token count. Capture current message IDs instead of using an empty Set so that getMessagesAfterBaseline correctly returns no pending messages even if thread.isLoading lags behind the stream end. * fix: write back subagent tokens for all concurrent task tool calls TokenUsageMiddleware only processed messages[-2], so when a single model response dispatched multiple task tool calls only the last ToolMessage had its cached subagent usage written back to the dispatch AIMessage.usage_metadata. Earlier tasks' usage stayed in _subagent_usage_cache indefinitely (leak) and never appeared in the per-turn inline token display. Walk backward through all consecutive ToolMessages before the new AIMessage, and accumulate updates targeting the same dispatch message into one state update so overlapping writes don't clobber each other. * fix: clean up subagent usage cache entry on task cancellation When a task_tool invocation is cancelled via CancelledError, any cached subagent usage entry leaked because the TokenUsageMiddleware writeback path never fires after cancellation. Pop the cache entry before re-raising to prevent unbounded growth of the module-level _subagent_usage_cache dict. * fix: address token usage review feedback * fix: handle missing config for subagent usage cache --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-13 23:52:19 +08:00
Xinmin Zeng	f1a0ab699a	fix(tools): preserve tool_search promotions across re-entrant get_available_tools (#2885 ) * fix(tools): preserve tool_search promotions across re-entrant get_available_tools Closes #2884. ``get_available_tools`` used to unconditionally call ``reset_deferred_registry()`` and rebuild a fresh ``DeferredToolRegistry`` on every invocation. That works for the first call of a request (the ContextVar starts at its default of ``None``), but any RE-ENTRANT call during the same async context — e.g. ``task_tool`` building a subagent's toolset, or a custom middleware that rebuilds tools mid-run — wiped any ``tool_search`` promotions the parent agent had already made. The ``DeferredToolFilterMiddleware`` would then re-hide those tools from the next model call, leaving the agent able to see a tool's name (via the prior ``tool_search`` result that's still in conversation history) but unable to invoke it. Fix: when the ContextVar already holds a registry, reuse it instead of rebuilding. Fresh requests still get a fresh registry because each new graph run starts in a new asyncio task with the ContextVar at ``None``. ## Verification - Unit-level reproduction (``test_get_available_tools_resets_registry_wiping_promotion``): promote a tool in the registry, call ``get_available_tools`` again, assert the promotion is preserved. Fails on main, passes on this branch. - Graph-execution reproduction (two tests): drive a real ``langchain.agents.create_agent`` graph with the real ``DeferredToolFilterMiddleware`` through two model turns, including one that issues a re-entrant ``get_available_tools`` call to simulate the task_tool subagent path. - Real-LLM end-to-end (``test_deferred_tool_promotion_real_llm.py``, opt-in via ``ONEAPI_E2E=1``): drives the same flow against a real OpenAI-compatible model (verified on GPT-5.4-mini through the one-api gateway), watches the model call the promoted ``fake_calculator`` through the deferred-filter middleware, and asserts the right arithmetic result. Passes against the fixed branch. - Companion update to ``test_tool_deduplication.py``: dropped the ``@patch("deerflow.tools.tools.reset_deferred_registry")`` decorators because the symbol is no longer imported there. - Test fixtures in the new files patch ``deerflow.tools.tools.get_app_config`` with a minimal ``model_construct``-ed ``AppConfig`` instead of calling the real loader, so they never trigger ``_apply_singleton_configs`` and never leak ``_memory_config``/``_title_config``/… mutations into the rest of the suite. Full backend suite: 3208 passed / 14 skipped / 0 failed. ruff check + format clean. * fix(tools): address Copilot review on #2885 - tools.py: rewrite the reuse-path comment to spell out (a) why we don't reconcile the registry against the current ``mcp_tools`` snapshot — the MCP cache doesn't refresh mid-graph-run, the lead agent's ``ToolNode`` is already bound to the previous tool set anyway, and ``promote()`` drops the entry so a naive re-sync misclassifies promotions as new tools — and (b) why the log uses ``max(0, …)`` to avoid negative counts when the cache shrinks between snapshots. - Replace direct ``ts_mod._registry_var.set(None)`` in test fixtures with the public ``reset_deferred_registry()`` helper so tests don't couple to module internals. - Correct the docstring path in ``test_deferred_tool_registry_promotion.py`` to match the actual monkeypatch target (``deerflow.mcp.cache.get_cached_mcp_tools``). - Rename ``test_get_available_tools_resets_registry_wiping_promotion`` to ``test_get_available_tools_preserves_promotions_across_reentrant_calls`` so the test name describes the contract being asserted, not the bug it originally reproduced. Full backend suite: 3208 passed / 14 skipped. Real-LLM e2e: 1 passed.	2026-05-13 23:45:47 +08:00
Eilen Shin	2a1ac06bf4	fix(persistence): reuse token usage model grouping expression (#2910 )	2026-05-13 15:49:34 +08:00
He Wang	e9deb6c2f2	perf(harness): push thread metadata filters into SQL (#2865 ) * perf(harness): push thread metadata filters into SQL Replace Python-side metadata filtering (5x overfetch + in-memory match) with database-side json_extract predicates so LIMIT/OFFSET pagination is exact regardless of match density. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com> * fix(harness): add dialect-aware JsonMatch compiler for type-safe metadata SQL filters Replace SQLAlchemy JSON index/comparator APIs with a custom JsonMatch ColumnElement that compiles to json_type/json_extract on SQLite and jsonb_typeof/->>/-> on PostgreSQL. Tighten key validation regex to single-segment identifiers, handle None/bool/numeric value types with json_type-based discrimination, and strengthen test coverage for edge cases and discriminability. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com> * fix(harness): address Copilot review comments on JSON metadata filters - Use json_typeof instead of jsonb_typeof in PostgreSQL compiler; the metadata_json column is JSON not JSONB so jsonb_typeof would error at runtime on any PostgreSQL backend - Align _is_safe_json_key with json_match's _KEY_CHARSET_RE so keys containing hyphens or leading digits are not silently skipped - Add thread_id as secondary ORDER BY in search() to make pagination deterministic when updated_at values collide; remove asyncio.sleep from the pagination regression test Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> * fix(harness): address remaining review comments on metadata SQL filters - Remove _is_safe_json_key() and reuse json_match ValueError to avoid validator drift (Copilot #3217603895, #3217411616) - Raise ValueError when all metadata keys are rejected so callers never get silent unfiltered results (WillemJiang) - Fix integer precision: split int/float branches, bind int as Integer() with INTEGER/BIGINT CAST instead of float() coercion (Copilot #3217603972) - Fix jsonb_typeof -> json_typeof on JSON column (Copilot #3217411579) - Replace manual _cleanup() calls with async yield fixture so teardown always runs (Copilot #3217604019) - Remove asyncio.sleep(0.01) pagination ordering; use thread_id secondary sort instead (Copilot #3217411636) - Add type annotations to _bind/_build_clause/_compile_* and remove EOL comments from _Dialect fields (coding.mdc) - Expand test coverage: boolean/null/mixed-type/large-int precision, partial unsafe-key skip with caplog assertion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(harness): address third-round Copilot review comments on JsonMatch - Reject unsupported value types (list, dict, ...) in JsonMatch.__init__ with TypeError so inherit_cache=True never receives an unhashable value and callers get an explicit error instead of silent str() coercion (Copilot #3217933201) - Upgrade int bindparam from Integer() to BigInteger() to align with BIGINT CAST and avoid overflow on large integers (Copilot #3217933252) - Catch TypeError alongside ValueError in search() so non-string metadata keys are warned and skipped rather than raising unexpectedly (Copilot #3217933300) - Add three tests: json_match rejects unsupported value types, search() warns and raises on non-string key, search() warns and raises on unsupported value type Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(harness): address fourth-round Copilot review comments on JsonMatch - Add CASE WHEN guard for PostgreSQL integer matching: json_typeof returns 'number' for both ints and floats; wrap CAST in CASE with regex guard '^-?[0-9]+$' so float rows never trigger CAST error (Copilot #3218413860) - Validate isinstance(key, str) before regex match in JsonMatch.__init__ so non-string keys raise ValueError consistently instead of TypeError from re.match (Copilot #3218413900) - Include exception message in metadata filter skip warning so callers can distinguish invalid key from unsupported value type (Copilot #3218413924) - Update tests: assert CASE WHEN guard in PG int compilation, cover non-string key ValueError in test_json_match_rejects_unsafe_key Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(harness): align ThreadMetaStore.search() signature with sql.py implementation Use `dict[str, Any]` for `metadata` and `list[dict[str, Any]]` as return type in base class and MemoryThreadMetaStore to resolve an LSP signature mismatch; also correct a test docstring that cited the wrong exception type. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(harness): surface InvalidMetadataFilterError as HTTP 400 in search endpoint Replace bare ValueError with a domain-specific InvalidMetadataFilterError (subclass of ValueError) so the Gateway handler can catch it and return HTTP 400 instead of letting it bubble up as a 500. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com> * fix(harness): sanitize metadata keys in log output to prevent log injection Use ascii() instead of %r to escape control characters in client-supplied metadata keys before logging, preventing multiline/forged log entries. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(harness): validate metadata filters at API boundary and dedupe key/value rules - Add Pydantic ``field_validator`` on ``ThreadSearchRequest.metadata`` so unsafe keys / unsupported value types are rejected with HTTP 422 from both SQL and memory backends (closes Copilot review 3218830849). - Export ``validate_metadata_filter_key`` / ``validate_metadata_filter_value`` (and ``ALLOWED_FILTER_VALUE_TYPES``) from ``json_compat`` and have ``JsonMatch.__init__`` reuse them — the Gateway-side validator and the SQL-side ``JsonMatch`` constructor now share one admission rule and cannot drift. - Format ``InvalidMetadataFilterError`` rejected-keys list as a comma-separated plain string instead of a Python list repr so the surfaced HTTP 400 detail is readable (closes Copilot review 3218830899). - Update router tests to cover both 422 boundary paths plus the 400 defense-in-depth path when a backend still raises the error. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(harness): harden JsonMatch compile-time key validation against __init__ bypass Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> * fix: address review feedback on metadata filter SQL push-down - Add signed 64-bit range check to validate_metadata_filter_value; give out-of-range ints a distinct TypeError message. - Replace assert guards in _compile_sqlite/_compile_pg with explicit if/raise so they survive python -O optimisation. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4 <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 23:21:22 +08:00
Xinmin Zeng	68d8caec1f	fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867 ) * fix(agents): make update_agent honor runtime.context user_id like setup_agent PR #2784 hardened setup_agent to prefer runtime.context["user_id"] (set by inject_authenticated_user_context from the auth-validated request) over the contextvar, so an agent created during the bootstrap flow always lands under users/<auth_uid>/agents/<name>. update_agent was left calling get_effective_user_id() unconditionally — the same class of bug that produced issues #2782 / #2862 still applies whenever the contextvar is not available on the executing task (background work, future cross-process drivers, checkpoint resume on a different task). In that regime update_agent silently routes writes to users/default/agents/<name>, corrupting the shared default bucket and losing the user's edit. Extract the resolution policy into a shared resolve_runtime_user_id helper on deerflow.runtime.user_context and route both setup_agent and update_agent through it so the two halves of the lifecycle stay in lockstep. Add load-bearing end-to-end tests that drive a real langchain.agents create_agent graph with a fake LLM, exercising the full pipeline: HTTP wire format -> app.gateway.services.start_run config-assembly -> deerflow.runtime.runs.worker._build_runtime_context -> langchain.agents create_agent graph -> ToolNode dispatch (sync + async + sub-graph + ContextThreadPoolExecutor) -> setup_agent / update_agent The negative-control tests intentionally land in users/default/ to prove the positive tests are actually load-bearing rather than vacuously passing. The new test_update_agent_e2e_user_isolation suite included a test that failed against main and now passes after this fix. * style: ruff format on new e2e tests * test(e2e): real-server HTTP test driving setup_agent through the full ASGI stack Adds tests/test_setup_agent_http_e2e_real_server.py — a single load-bearing test that drives the entire FastAPI gateway through starlette.testclient. TestClient with no mocks above the LLM: - lifespan boots (config, sqlite engine, LangGraph runtime, channels) - POST /api/v1/auth/register (real password hash, real sqlite write, issues access_token + csrf_token cookies) - POST /api/threads (real thread_meta + checkpoint creation) - POST /api/threads/{id}/runs/stream with the exact wire shape the React frontend sends (assistant_id + input + config + context with agent_name/is_bootstrap) - AuthMiddleware -> CSRFMiddleware -> require_permission -> start_run -> inject_authenticated_user_context -> asyncio.create_task(run_agent) -> worker._build_runtime_context -> Runtime injection -> ToolNode dispatch -> real setup_agent - Asserts SOUL.md is under users/<authenticated_uid>/agents/<name>/ and NOT under users/default/agents/<name>/. DEER_FLOW_HOME and the sqlite path are redirected into tmp_path so the test never touches the real .deer-flow directory or developer database. The only patch above the LLM boundary is replacing create_chat_model with a fake that emits a single setup_agent tool_call. This is the "真实验证" answer: it reproduces what curl-against-uvicorn would do, minus the network socket layer. * test: address Copilot review on user-isolation e2e tests - Drop "currently expected to FAIL" wording from update_agent e2e docstring and header (Copilot review): the fix is in this PR, the test pins the corrected behaviour rather than driving a future change. - Rephrase the assertion failure messages from "BUG:" to "REGRESSION:" to match the test's role on the fixed branch. - Bound _drain_stream with a wall-clock timeout, a max-bytes cap, and an early break on the "event: end" SSE frame (Copilot review). Stops the test from hanging on a stuck run or runaway heartbeat loop. - Replace the misleading "patch both module aliases" comment with an explanation of why patching lead_agent.agent.create_chat_model is the only correct target (Copilot review): lead_agent rebinds the symbol into its own namespace at import time, so patching deerflow.models is too late. * test(refactor): address WillemJiang review on user-isolation e2e tests - Extract the duplicated FakeToolCallingModel (and a build_single_tool_call_model helper) into tests/_agent_e2e_helpers.py. All three e2e files now import from the shared module instead of redefining the shim locally. - Convert the manual p.start() / p.stop() try/finally blocks in test_update_agent_e2e_user_isolation.py to contextlib.ExitStack so patch lifecycle is Pythonic and exception-safe. - Lift the isolated_app fixture's private-attribute resets into a named _reset_process_singletons helper with a comment block explaining why each singleton has to be invalidated for true e2e isolation, and why raising=False is intentional. Makes the fragility visible and the intent self-documenting rather than leaving the resets inline as opaque monkeypatch calls. Net change: -59 lines (143 -> 84) across the three test files, with every assertion intact. Full suite remains 69 passed / lint clean. * test(e2e): make real-server test self-supply its config CI's actions/checkout only ships config.example.yaml (the real config.yaml is gitignored), so the production config-discovery search (./config.yaml -> ../config.yaml -> $DEER_FLOW_CONFIG_PATH) finds nothing and the test fails at lifespan boot with FileNotFoundError. The dev-machine run passed only because a local config.yaml happened to exist. Write a minimal AppConfig-valid yaml into tmp_path and pin DEER_FLOW_CONFIG_PATH to it. The yaml carries just what the schema requires (a single fake-test-model entry, LocalSandboxProvider, sqlite database). The LLM never gets instantiated because the test patches create_chat_model on the lead agent module, so the api_key/base_url stay placeholders. Verified by hiding the local config.yaml to mirror the CI checkout — the test now passes in both environments.	2026-05-12 23:18:54 +08:00
Nan Gao	20d2d2b373	fix(middleware): Handle invalid tool calls in dangling pairing middleware (#2890 ) (#2891 )	2026-05-12 10:55:13 +08:00
AochenShen99	bedbf2291e	fix(harness): wrap async-only config tools for sync client execution (#2878 ) * fix(harness): wrap async-only config tools for sync clients * refactor(tools): share async tool sync wrapper	2026-05-11 22:14:13 +08:00
Yi Tang	de253e4a0a	feat(run): Propagates `model_name` from the gateway request through the runtime and persistence stack to the SQLite database. (#2775 ) * feat(run): propagate model_name from gateway request context to persistence layer Pass model_name through the full run creation pipeline — from RunCreateRequest.context in the gateway, through RunManager, to the RunStore interface and SQL persistence. This enables client-specified model selection to be recorded per-run in the database. * feat(run): add model allowlist validation and effective model name capture - Validate model_name against allowlist in gateway services.py using get_app_config().get_model_config() - Truncate model_name to 128 chars to match DB column constraint - In worker.py, capture effective model name from agent.metadata after agent creation and persist if resolved differently than requested * feat(run): add defense-in-depth model_name normalization and round-trip persistence tests - Add _normalize_model_name() to RunRepository for whitespace stripping and 128-char truncation before DB writes. - Add round-trip unit tests for model_name creation and default None in test_run_manager.py. * fix(run): coerce non-string model_name values before strip/truncate in _normalize_model_name * fix(gateway): add runtime type guard for model_name coercion in gateway services Add isinstance check and str() coercion before calling .strip() to prevent AttributeError when non-string types (int, None, etc.) flow through the gateway. Paired with SQL integration test for end-to-end model_name persistence across gateway → langgraph → persistence layer. * fix(run): drop Alembic migration for model_name (no-op) and expose public update method on RunManager - Drop a1b2c3d4e5f6 migration: model_name already exists in RunRow schema and is auto-created via Base.metadata.create_all() at startup - Add update_model_name() public method to RunManager to replace the private _persist_to_store call in worker.py, preserving internal locking/persistence	2026-05-11 21:45:18 +08:00
Nan Gao	2eb11f97ab	fix(runtime): persist run message summaries (#2850 ) * fix(runtime): persist run message summaries (#2849) * fix(runtime): dedupe run message summaries	2026-05-11 19:54:00 +08:00
Willem Jiang	813d3c94ef	fix(subagents): consolidate system_prompt and skills into single SystemMessage (#2701 ) * fix(subagents): consolidate system_prompt and skills into single SystemMessage Some LLM APIs (vLLM, Xinference, Chinese LLM providers) reject multiple system messages with \”System message must be at the beginning.\” The subagent executor was sending separate SystemMessages for the configured system_prompt and each loaded skill, which caused failures when calling task tool with sub-agents. Merge system_prompt and all skill content into one SystemMessage in the initial state, and pass system_prompt=None to create_agent() so the factory doesn't prepend a second one. Fixes #2693 * fix(subagents): update SubagentConfig.system_prompt to str \| None and add astream regression test Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2ee03a26-e19b-4106-abc5-c76a2906383b Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * fixed the lint error * fix the lint error in the backend * fix the unit test error of test_subagent_executor --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-11 09:59:06 +08:00
KiteEater	2b5bece744	fix(harness): reset local sandbox singleton with provider lifecycle (#2834 ) * Fix local sandbox singleton reset on provider lifecycle * Fix local sandbox singleton reset on provider reset --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-11 07:42:15 +08:00
Maz Benoscar	30a5846219	fix(tools): make write_file append discoverable in model-facing schema (#2843 ) * fix: make tool argument behavior discoverable The write_file tool already supported append=false by default with append=true for end-of-file writes, but the parsed docstring did not describe append in the model-facing schema. This records the overwrite default and append path in the tool description, adds resilient schema regression coverage, and keeps backend sandbox docs aligned. The regression now also checks that every public parameter in the existing tool schema test matrix has a description. Enabling docstring parsing on setup_agent and update_agent fills the two existing gaps with their existing Args docs instead of duplicating descriptions elsewhere. Constraint: Issue #2831 asks for a small docstring/schema discoverability fix without changing runtime file-writing behavior Rejected: Changing write_file defaults \| would alter existing overwrite semantics and broaden the fix beyond schema discoverability Rejected: Exact phrase assertions \| too brittle for future docstring rewording while testing the same behavior Confidence: high Scope-risk: narrow Directive: Keep model-facing tool parameters documented through parsed docstrings or equivalent schema descriptions Tested: cd backend && uv run pytest tests/test_setup_agent_tool.py tests/test_update_agent_tool.py tests/test_tool_args_schema_no_pydantic_warning.py tests/test_sandbox_tools_security.py::test_str_replace_and_append_on_same_path_should_preserve_both_updates -q Tested: cd backend && uv run ruff check packages/harness/deerflow/sandbox/tools.py packages/harness/deerflow/tools/builtins/setup_agent_tool.py packages/harness/deerflow/tools/builtins/update_agent_tool.py tests/test_tool_args_schema_no_pydantic_warning.py Not-tested: Full backend test suite Co-authored-by: OmX <omx@oh-my-codex.dev> * Fix the lint error --------- Co-authored-by: OmX <omx@oh-my-codex.dev> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-10 23:09:03 +08:00
YuJitang	9892a7d468	fix: bucket subagent token usage into parent run totals (#2838 ) * fix: bucket subagent token usage into RunRow.subagent_tokens Add caller-bucketed token tracking to RunJournal so subagent and middleware LLM calls are written to the correct RunRow columns instead of all falling into lead_agent_tokens (default 0). - RunJournal: accumulate _lead_agent_tokens / _subagent_tokens / _middleware_tokens in on_llm_end, deduped by langchain run_id. Add record_external_llm_usage_records() for external sources (respects track_token_usage flag). Return caller buckets from get_completion_data(). - SubagentTokenCollector: new lightweight callback handler that collects LLM usage within subagent execution. - SubagentExecutor: wire collector into subagent run_config and sync records to SubagentResult on every chunk (timeout/cancel safe). - SubagentResult: add token_usage_records and usage_reported fields. - task_tool: report subagent usage to parent RunJournal on every terminal status (COMPLETED/FAILED/CANCELLED/TIMED_OUT), including the CancelledError path, guarded against double-reporting. No DB migration needed — RunRow columns already exist. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: address token usage review feedback * Address review follow-ups --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-10 22:47:30 +08:00
Xinmin Zeng	94da8f67d7	fix(scripts): preserve uv extras across `make dev` restarts (#2754 ) (#2767 ) `make dev` ran `uv sync` unconditionally on every restart, wiping any optional extras the user had installed manually with `uv sync --all-packages --extra postgres`. The Docker image-build path already solved this via the `UV_EXTRAS` build-arg in backend/Dockerfile; the local serve.sh path and the docker-compose-dev startup command were the remaining outliers. `scripts/serve.sh` now resolves extras before `uv sync`: 1. honors `UV_EXTRAS` (parity with backend/Dockerfile and docker/docker-compose.yaml — no new convention introduced); 2. falls back to parsing config.yaml — `database.backend: postgres` or legacy `checkpointer.type: postgres` auto-pins `--extra postgres`, so the common case needs zero extra config. 3. detector stderr is no longer suppressed, so whitelist warnings or crashes surface to the dev terminal (review feedback). Detection lives in `scripts/detect_uv_extras.py` (stdlib-only — has to run before the venv exists). Extra names are validated against `^[A-Za-z][A-Za-z0-9_-]$` so a stray shell metacharacter in `.env` cannot reach `uv sync` downstream (defense in depth). `docker/docker-compose-dev.yaml`'s startup command is now extracted to `docker/dev-entrypoint.sh` (review feedback — the inline command had grown to a ~350-char one-liner). The script: - parses comma/whitespace-separated UV_EXTRAS, applying the same `^[A-Za-z][A-Za-z0-9_-]$` whitelist as the local detector; - emits one `--extra X` flag per token, so `UV_EXTRAS=postgres,ollama` works in Docker dev too (harmonized with local — review feedback); - calls `uv sync --all-packages` (PR #2584) so workspace member extras (deerflow-harness's postgres extra) are installed; - keeps the existing self-heal `(uv sync \|\| (recreate venv && retry))` branch; - exposes `--print-extras` for dry-run testing. The compose file mounts the script read-only at runtime, so script edits take effect on `make docker-restart` without an image rebuild. The `--no-sync` alternative (a separate suggestion in the issue thread) was considered but rejected for dev paths because it would drop the self-heal branch and the auto-pickup of new pyproject deps. `--no-sync` is already in use for the production CMD (`backend/Dockerfile:101`) where it's appropriate. Updates the asyncpg-missing error message to include the `--all-packages` flag (matching #2584) plus the persistent install flow, and expands `config.example.yaml` so all three install paths (local / docker dev / docker image build) are documented with their multi-extra capabilities. Tests: - `tests/test_detect_uv_extras.py` (21 tests) — local-path env parsing, YAML edge cases, env-vs-config precedence, whitelist rejection of shell metacharacters. - `tests/test_dev_entrypoint.py` (15 tests) — docker-path validation via `--print-extras`, multi-extra parsing, metacharacter abort. - `tests/test_persistence_scaffold.py` (22 tests, unchanged) — passes with the merged `--all-packages --extra postgres` error message. Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-10 22:28:29 +08:00
YuJitang	5127f08e1a	enable token usage by default (#2841 )	2026-05-10 22:00:57 +08:00
DanielWalnut	08ee7adeba	fix(lint): remove duplicate is_dynamic_context_reminder definition (#2837 ) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 23:40:46 +08:00
Eilen Shin	1c96a6afc8	fix: keep new agent bootstrap in user scope (#2784 )	2026-05-09 19:43:50 +08:00
DanielWalnut	881ff71252	fix(harness): preserve dynamic context across summarization (#2823 )	2026-05-09 19:39:36 +08:00
DanielWalnut	f76e4e35c8	fix title generation with dynamic context reminder (#2830 )	2026-05-09 18:22:58 +08:00
yangyufan	0d1053ca44	fix(uploads): add Windows support for safe symlink-protected uploads (#2794 ) * fix(uploads): add Windows support for safe symlink-protected uploads * fix(uploads): update tests and translate comments;	2026-05-09 18:21:54 +08:00
KiteEater	7caf03e97c	fix(packaging): add postgres extra for store/checkpointer supportFix postgres extra install guidance (#2584 ) * Fix postgres extra install guidance * Fix postgres install message lint * Format postgres install messages * Fix postgres install guidance and config docs	2026-05-09 09:49:08 +08:00
DanielWalnut	c1b7f1d189	feat: static system prompt with DynamicContextMiddleware for prefix-cache optimization (#2801 ) * feat(middleware): inject dynamic context via DynamicContextMiddleware Move memory and current date out of the system prompt and into a dedicated <system-reminder> HumanMessage injected once per session (frozen-snapshot pattern) via a new DynamicContextMiddleware. This keeps the system prompt byte-exact across all users and sessions, enabling maximum Anthropic/Bedrock prefix-cache reuse. Key design decisions: - ID-swap technique: reminder takes the first HumanMessage's ID (replacing it in-place via add_messages), original content gets a derived `{id}__user` ID (appended after). Preserves correct ordering. - hide_from_ui: True on reminder messages so frontend filters them out. - Midnight crossing: date-update reminder injected before the current turn's HumanMessage when the conversation spans midnight. - INFO-level logging for production diagnostics. Also adds prompt-caching breakpoint budget enforcement tests and updates ClaudeChatModel docs to reference the new pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(token-usage): log input/output token detail breakdown in middleware Extend the LLM token usage log line to include input_token_details and output_token_details (cache_creation, cache_read, reasoning, audio, etc.) when present. Adds tests covering Anthropic cache detail logging from both usage_metadata and response_metadata. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: fix nginx * fix(middleware): always inject date; gate memory on injection_enabled Date injection is now unconditional — it is part of the static system prompt replacement and should always be present. Memory injection remains gated by `memory.injection_enabled` in the app config. Previously the entire DynamicContextMiddleware was skipped when injection_enabled was False, which also suppressed the date. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): format files and correct test assertions for token usage middleware - ruff format dynamic_context_middleware.py and test_claude_provider_prompt_caching.py - Remove unused pytest import from test_dynamic_context_middleware.py - Fix two tests that asserted response_metadata fallback logic that doesn't exist: replace with tests that match actual middleware behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(middleware): address Copilot review comments on DynamicContextMiddleware - Use additional_kwargs flag for reminder detection instead of content substring matching, so user messages containing '<system-reminder>' are not mistakenly treated as injected reminders - Generate stable UUID when original HumanMessage.id is None to prevent ambiguous 'None__user' derived IDs and message collisions - Downgrade per-turn no-op log to DEBUG; keep actual injection events at INFO - Add two new tests: missing-id UUID fallback and user-text false-positive Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 09:27:02 +08:00
DanielWalnut	2b1fcb3e43	fix(task): remove max_turns parameter from task tool interface (#2783 ) * fix(task): remove max_turns parameter from task tool interface Subagents should always use their configured max_turns value. Exposing this parameter allowed callers to override the admin-configured limit, which is undesirable. The value is now exclusively driven by subagent config (per-agent overrides and global defaults in config.yaml). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-08 15:05:24 +08:00
He Wang	7de9b5828b	fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning (#2774 ) * fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning Add deerflow/tools/types.py with: Runtime = ToolRuntime[dict[str, Any], ThreadState] Replace every runtime: ToolRuntime[ContextT, ThreadState] and runtime: ToolRuntime[dict[str, Any], ThreadState] annotation in sandbox/tools.py, present_file_tool.py, task_tool.py, view_image_tool.py, and skill_manage_tool.py with the new Runtime alias. The unbound ContextT TypeVar (default None) caused PydanticSerializationUnexpectedValue warnings on every tool call because LangChain's BaseTool._parse_input calls model_dump() on the auto-generated args_schema while DeerFlow passes a dict as runtime context. Binding the context to dict[str, Any] aligns Pydantic's serialization expectations with reality and removes the noise from all run modes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> * fix(tools): extend Runtime alias to setup_agent and update_agent tools Replace bare ToolRuntime annotations in setup_agent_tool.py and update_agent_tool.py with the shared Runtime alias introduced in the previous commit, and add both tools to the Pydantic serialization warning regression test (13 cases total). Co-authored-by: Cursor <cursoragent@cursor.com> * test(tools): loosen Pydantic warning filter to avoid version-specific format Replace the brittle "field_name='context'" substring check with a looser "context" match so the assertion stays valid if Pydantic changes its internal warning format across versions. Co-authored-by: Cursor <cursoragent@cursor.com> * test(tools): simplify warning filter and clean up docstring Remove the "context" substring condition from the Pydantic warning filter — asserting that no PydanticSerializationUnexpectedValue fires at all is both simpler and more comprehensive, since the test payload contains only the tool's own args plus runtime. Also update the module docstring to remove the version-specific warning format example that was inconsistent with the looser filter. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 14:50:33 +08:00
Eilen Shin	37db689349	fix(events): serialize structured db event content (#2762 )	2026-05-08 10:17:17 +08:00
Eilen Shin	bd45cb2846	fix(sandbox): disable msys path conversion (#2766 )	2026-05-08 10:13:11 +08:00
Eilen Shin	5fd0e6ac89	fix(middleware): sync raw tool call metadata (#2757 )	2026-05-08 10:08:53 +08:00
Tao Liu	daa3ffc29b	feat(loop-detection): make loop detection configurable with per-tool frequency overrides (#2711 ) * Make loop detection configurable Expose LoopDetectionMiddleware thresholds through config.yaml while preserving existing defaults and allowing the middleware to be disabled. Refs bytedance/deer-flow#2517 * feat(loop-detection): add per-tool tool_freq_overrides to Phase 1 Adds ToolFreqOverride model and tool_freq_overrides field to LoopDetectionConfig, wires it through LoopDetectionMiddleware, and documents the option in config.example.yaml. Resolves the gap flagged in the #2586 review: without per-tool overrides, users hit by #2510/#2511 (RNA-seq workflows exceeding the bash hard limit) had no way to raise thresholds for one tool without loosening the global limit for every tool. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * docs(loop-detection): document tool_freq_overrides in LoopDetectionMiddleware docstring Add the missing Args entry for tool_freq_overrides, explaining the (warn, hard_limit) tuple structure and how per-tool thresholds supersede the global tool_freq_warn / tool_freq_hard_limit for named tools. Also run ruff format on the three files flagged by the lint check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(loop-detection): validate LoopDetectionMiddleware __init__ params eagerly Raise clear ValueError at construction time instead of crashing at unpack-time inside _track_and_check when bad values are passed: - tool_freq_overrides: must be 2-tuples of positive ints with hard_limit >= warn - scalar thresholds: warn_threshold, hard_limit, tool_freq_warn, tool_freq_hard_limit must be >= 1 and hard limits must >= their warn pairs - window_size, max_tracked_threads must be >= 1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): isolate credential loader directory-path test from real ~/.claude The test didn't monkeypatch HOME, so on any machine with real Claude Code credentials at ~/.claude/.credentials.json the function fell through to those credentials and the assertion failed. Adding HOME redirect ensures the default credential path doesn't exist during the test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style(test): add blank lines after import pytest in TestInitValidation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(loop-detection): collapse dual validation to LoopDetectionConfig Modifications - LoopDetectionMiddleware.__init__: stripped of all ValueError raises; becomes a plain field-assignment constructor. - LoopDetectionMiddleware.from_config: classmethod that builds the middleware from a Pydantic-validated LoopDetectionConfig and handles the ToolFreqOverride -> tuple[int, int] conversion. - agents/factory.py: SDK construction routed through LoopDetectionMiddleware.from_config(LoopDetectionConfig()) so the defaults path is Pydantic-validated too. - agents/lead_agent/agent.py: uses from_config instead of unpacking config fields by hand. - tests/test_loop_detection_middleware.py: deleted TestInitValidation (16 methods exercising the removed __init__ checks); added TestFromConfig (4 tests: scalar field mapping, override tuple conversion, empty overrides, behavioral smoke test). Result: one validation layer (Pydantic), zero duplication, no __new__ hacks. Both production construction sites flow through LoopDetectionConfig. Test results make test -> 2977 passed, 18 skipped, 0 failed (137s) make format -> All checks passed; 411 files left unchanged * feat(agents): make loop_detection configurable in create_deerflow_agent Adds a `loop_detection: bool \| AgentMiddleware = True` field to RuntimeFeatures, mirroring the existing pattern used by `sandbox`, `memory`, and `vision`. SDK users can now disable LoopDetectionMiddleware or replace it with a custom instance built from their own LoopDetectionConfig — e.g. `LoopDetectionMiddleware.from_config(my_cfg)` — instead of being stuck with the hardcoded defaults previously installed by the SDK factory. The lead-agent path (which already reads AppConfig.loop_detection) is unchanged, and the default `True` preserves prior always-on behavior for all existing callers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: knight0940 <631532668@qq.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Amorend <142649913+knight0940@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-07 16:15:15 +08:00
AochenShen99	cef4224381	fix(skills): enforce allowed-tools metadata (#2626 ) * fix(skills): parse allowed-tools frontmatter * fix(skills): validate allowed-tools metadata * fix(skills): add shared allowed-tools policy * fix(subagents): enforce skill allowed-tools * fix(agent): enforce skill allowed-tools * refactor(skills): dedupe TypeVar and reuse cached enabled skills - Drop redundant module-level TypeVar in tool_policy; rely on PEP 695 syntax. - Expose get_cached_enabled_skills() and have the lead agent reuse it instead of synchronously rescanning skills on every request. * fix(agent): expose config-scoped skill cache * fix(subagents): pass filtered tools explicitly * fix(skills): clean allowed-tools policy feedback	2026-05-07 08:34:43 +08:00
KiteEater	4ead2c6b19	fix(config): reset config-backed singletons on hot reload (#2588 ) * Fix stale config singletons on reload * fix(config): update checkpointer imports after runtime move * Fix config reload singleton mutation on validation failure --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-06 10:17:55 +08:00
yangzheli	59c4a3f0a4	feat(agent): add custom-agent self-updates with user isolation (#2713 ) * feat(agent): add update_agent tool for in-chat custom-agent self-updates (#2616) Custom agents had no built-in way to persist updates to their own SOUL.md / config.yaml from a normal chat — `setup_agent` was only bound during the bootstrap flow, so when the user asked the agent to refine its description or personality, the agent would shell out via bash/write_file and the edits landed in a temporary sandbox/tool workspace instead of `{base_dir}/agents/{agent_name}/`. Changes: - New `update_agent` builtin tool with partial-update semantics (only the fields you pass are written) and atomic temp-file + os.replace writes so a failed update never corrupts existing SOUL.md / config.yaml. - Lead agent now binds `update_agent` in the non-bootstrap path whenever `agent_name` is set in the runtime context. Default agent (no agent_name) and bootstrap flow are unchanged. - New `<self_update>` system-prompt section is injected for custom agents, instructing them to use `update_agent` — and explicitly NOT bash / write_file — to persist self-updates. - Tests: 11 new cases in `tests/test_update_agent_tool.py` covering validation (missing/invalid agent_name, unknown agent, no fields), partial updates (soul-only, description-only, skills=[] vs omitted), no-op detection, atomic-write safety, and AgentConfig round-tripping; plus 2 new cases in `tests/test_lead_agent_prompt.py` covering the self-update prompt section. - Docs: updated backend/CLAUDE.md builtin tools list and tools.mdx (en/zh) with the new tool description. * feat(agent): isolate custom agents per user Store custom agent definitions under the effective user, keep legacy agents readable until migration, and cover API/tool/migration behavior with tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: consistent write/delete targets & add --user-id to migration --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-05 23:17:42 +08:00
Nan Gao	e8675f266d	fix(loop-detection): keep tool-call pairing on warn injection (#2724 ) (#2725 ) * fix(loop-detection): keep tool-call pairing on warn injection (#2724) * make format * fix(loop-detection): avoid IMMessage leak to downstream consumer * fix(channels): filter loop warning text from IM replies	2026-05-05 18:53:49 +08:00
Xun	680187ddc2	fix: Supplement list_running in RemoteSandboxBackend (#2716 ) * fix: Supplement list_running in RemoteSandboxBackend * fix * except requests.RequestException as exc: * fix	2026-05-05 18:53:10 +08:00
YuJitang	d02f762ab0	feat: refine token usage display modes (#2329 ) * feat: refine token usage display modes * docs: clarify token usage accounting semantics * fix: avoid duplicate subtask debug keys * style: format token usage tests * chore: address token attribution review feedback * Update test_token_usage_middleware.py * Update test_token_usage_middleware.py * chore: simplify token attribution fallback * fix token usage metadata follow-up handling --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-04 09:56:16 +08:00
Nan Gao	f80ac961ec	fix(harness): restore legacy skills path fallback (#2694 ) (#2696 ) * fix(harness): restore legacy skills path fallback (#2694) * fix(format): make format * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-03 23:40:59 +08:00
wanxsb	44ab21fc44	feat(community): add Serper web search provider (#2630 ) * feat(community): add Serper web search provider Add a new community search provider backed by the Serper Google Search API (https://serper.dev). Serper returns real-time Google results via a simple JSON API and requires only an API key — no extra Python package. Changes: - backend/packages/harness/deerflow/community/serper/__init__.py - backend/packages/harness/deerflow/community/serper/tools.py Implements web_search_tool using httpx (already a project dependency). API key is read from config.yaml `api_key` field or SERPER_API_KEY env var. Follows the same interface / output shape as the existing ddg_search provider. Exposes max_results parameter (default 5) with config override logic. - backend/tests/test_serper_tools.py Unit tests covering API key resolution, config overrides, HTTP errors, empty results, and parameter passing. - config.example.yaml: add commented-out Serper example alongside other providers - .env.example: add SERPER_API_KEY placeholder Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix the lint error * Fix the lint error --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-02 16:22:35 +08:00
Hinotobi	e543bbf5d6	[security] fix(upload): reject symlinked upload destinations (#2623 ) * fix: reject symlinked upload destinations * test: harden upload destination checks * fix: address PR feedback for #2623 * test: cover safe upload re-uploads * fix: preserve upload limit checks after rebase * fix(upload): stream safe HTTP upload writes	2026-05-02 15:19:28 +08:00
Xinmin Zeng	ca3332f8bf	fix(gateway): return ISO 8601 timestamps from threads endpoints (#2599 ) * fix(gateway): return ISO 8601 timestamps from threads endpoints (#2594) ThreadResponse documents created_at / updated_at as ISO timestamps, matching the LangGraph Platform schema (langgraph_sdk.schema.Thread exposes them as datetime, JSON-encoded as ISO 8601). The gateway threads router was instead emitting str(time.time()) — unix-second floats — breaking frontend new Date() parsing and producing a mixed ISO/unix wire format that also corrupted the search sort order. Centralize timestamp generation in deerflow.utils.time: - now_iso() — datetime.now(UTC).isoformat() - coerce_iso(x) — heals legacy unix-timestamp strings on read so the store converges to ISO without a one-shot migration threads.py: replace 6 time.time() call sites with now_iso(); wrap all read paths and Phase-2 checkpoint metadata with coerce_iso(); _store_upsert opportunistically heals legacy created_at on update; drop unused time import. thread_runs.py: reuse now_iso() instead of a private duplicate _now_iso(), preventing future drift between the two timestamp call sites. Tests: 9 unit tests for the helper; 5 integration tests pinning the ISO contract for create/get/patch/search and the legacy-healing path on the internal store upsert. Full suite: 2144 passed, 15 skipped, 0 failed. Closes #2594 * fix(gateway): coerce checkpoint metadata timestamps to ISO on read After the merge with main, three additional read paths in ``threads.py`` were still emitting raw ``str(metadata.get("created_at", ""))`` — ``get_thread_state``, ``update_thread_state``, and ``get_thread_history``. Same root cause as #2594: when the checkpoint metadata's ``created_at`` is a unix-second float (legacy data, or a checkpoint written by an older Gateway version), ``str(float)`` produces ``"1777252410.411327"`` and the frontend's ``new Date(...)`` returns ``Invalid Date``. The fix on the ``/threads/{id}`` GET path was already in place; these three sibling endpoints needed the same treatment. All four call sites now flow through ``coerce_iso``, so: - legacy float metadata heals to ISO on the way out, - ISO metadata passes through unchanged, - ``datetime`` instances (which the new ``coerce_iso`` branch handles explicitly) emit with the ``T`` separator instead of falling through to the space-separated ``str(datetime)`` form. Coverage added for the two endpoints not already pinned by the merge: - ``test_get_thread_state_returns_iso_for_legacy_checkpoint_metadata`` - ``test_get_thread_history_returns_iso_for_legacy_checkpoint_metadata`` Both pre-seed a checkpoint whose metadata carries the literal float from the issue body and assert the wire format is ISO.	2026-05-02 15:16:16 +08:00
Willem Jiang	bb8b234d85	chroe(2585): keep polishing the code of codex token usage (#2689 )	2026-05-02 15:04:11 +08:00
KiteEater	17447fccbe	fix(runtime): make rollback restore checkpoint supersede newer checkpoints (#2582 ) * Restore rollback checkpoints with fresh ids * Tighten rollback checkpoint tests and imports * Update test_run_worker_rollback.py --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-02 11:25:45 +08:00
KiteEater	866d1ca409	Populate Codex usage metadata for token accounting (#2585 )	2026-05-02 11:16:03 +08:00
greatmengqi	8ba01dfd83	refactor: thread app_config through lead and subagent task path (#2666 ) * refactor: thread app config through lead prompt * fix: honor explicit app config across runtime paths * style: format subagent executor tests * fix: thread resolved app config and guard subagents-only fallback Address two PR review findings: 1. _create_summarization_middleware passed the original (possibly None) app_config into create_chat_model, forcing the model factory back to ambient get_app_config() and risking config drift between the middleware's resolved view and the model's view. Pass the resolved AppConfig instance through end-to-end. 2. get_available_subagent_names accepted Any-typed config and forwarded it to is_host_bash_allowed, which reads ``.sandbox``. A SubagentsAppConfig (also accepted upstream as a sum-type input) has no ``.sandbox`` attribute and would be silently treated as "no sandbox configured", incorrectly disabling the bash subagent. Guard on hasattr and fall back to ambient lookup otherwise. Adds regression tests for both paths. * chore: simplify hasattr guard and tighten regression tests - Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant. - Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent). - Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison. --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>	2026-05-02 06:37:49 +08:00
Willem Jiang	189b82405c	fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination (#2685 ) * fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination The agent_sandbox library's shell API defaults no_change_timeout to 120 seconds. When AioSandbox.execute_command() called exec_command() without this parameter, commands producing no output for 120s would return with NO_CHANGE_TIMEOUT status even though the script was still running. Pass no_change_timeout=600 to all exec_command calls (matching the client-level HTTP timeout) so long-running commands are not cut short. Fixes #2668 * test(sandbox): add assertions for no_change_timeout in execute_command and list_dir Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2f37bc72-0826-4443-a6ba-e5b78c22fb5a Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-01 22:27:02 +08:00
Nan Gao	487c1d939f	fix(subagents): use model override for tools and middleware (#2641 ) * fix(subagents): use model override for tools and middleware * fix(config): resolve effective subagent model * fix(subagents): defer app config loading * fix(subagents): fully defer config.yaml load in executor __init__ The previous attempt only relocated the explicit get_app_config() call, but left resolve_subagent_model_name(...) running eagerly in __init__. That helper has its own internal get_app_config() fallback, which still fired when both app_config and parent_model were None and config.model == "inherit" — exactly the path unit tests hit, breaking 21 tests in CI with FileNotFoundError: config.yaml. Skip the eager resolve in __init__ when it would require loading the config file, and defer to _create_agent (which already has the app_config or get_app_config() fallback).	2026-05-01 22:21:10 +08:00
Nan Gao	c09c334544	fix(harness): resolve runtime paths from project root (#2642 ) * fix(harness): resolve runtime paths from project root * docs(config): update * fix(config): address runtime path review feedback * test(config): fix skills path e2e root * test(config): cover legacy config fallback when project root lacks config files Verifies that when DEER_FLOW_PROJECT_ROOT is unset and cwd has no config.yaml/extensions_config.json, AppConfig and ExtensionsConfig fall back to the legacy backend/repo-root candidates — the backward-compat path requested in PR #2642 review. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-01 22:19:50 +08:00
JerryLee	83938cf35a	fix(subagents): propagate user context across threaded execution (#2676 )	2026-05-01 16:27:18 +08:00
yangzheli	78633c69ac	fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2679 ) * fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2677) When creating a custom agent via the web UI, SOUL.md was always written to the global base_dir/SOUL.md instead of agents/<name>/SOUL.md. Root cause: the bootstrap flow sends agent_name via body.context, but two layers were broken: 1. services.py only forwarded body.context keys into config["configurable"]; config["context"] was never populated. 2. worker.py constructed the parent Runtime with a hard-coded {thread_id, run_id} context, ignoring config["context"] entirely. After the langgraph >= 1.1.9 bump (#98a5b34f), ToolRuntime.context no longer falls back to configurable, so setup_agent's runtime.context.get("agent_name") returned None and the tool's silent agent_name=None -> base_dir fallback kicked in, overwriting the global SOUL.md. Fix: - services.py: extract merge_run_context_overrides() and write the whitelisted context keys into both configurable (legacy readers) and context (langgraph 1.1+ ToolRuntime consumers). - worker.py: extract _build_runtime_context() and merge config["context"] into the Runtime's context (without letting callers override thread_id/run_id). The base_dir fallback in setup_agent_tool.py is left in place because the IM /bootstrap channel command depends on it. That code path can be tightened in a follow-up. Adds regression tests covering both helpers. * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-05-01 16:00:11 +08:00
greatmengqi	8b61c94e1d	fix: keep lead agent graph factory signature compatible (#2678 ) Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>	2026-05-01 15:43:28 +08:00
Xun	1ad1420e31	refactor(skills): Unified skill storage capability (#2613 )	2026-05-01 13:23:26 +08:00
He Wang	eba3b9e18d	fix(config): unify log_level from config.yaml across Gateway and debug entry points (#2601 ) Centralize log level parsing in `logging_level_from_config()` and application in `apply_logging_level()` within `deerflow.config.app_config`. - Gateway lifespan applies configured log level on startup - `debug.py` uses shared helpers instead of local duplicates - `apply_logging_level()` targets only `deerflow`/`app` logger hierarchies so third-party library verbosity is not affected; root handler levels are only lowered (never raised) to allow configured loggers through without suppressing third-party output; root logger level is not modified - Config field description updated to clarify scope - Tests save/restore global logging state to avoid test pollution Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 22:27:14 +08:00
Willem Jiang	c0da278269	fix(memory): replace short-lived asyncio.run() with persistent event loop (#2627 ) * fix(memory): replace short-lived asyncio.run() with persistent event loop to prevent zombie httpx connections The memory updater used asyncio.run() inside daemon threads, creating and destroying short-lived event loops on every update. Langchain providers (e.g. langchain-anthropic) cache httpx AsyncClient instances globally via @lru_cache, so SSL connections created on a loop that is subsequently destroyed become zombie connections in the shared pool. When the main agent's lead run later reuses one of these connections, httpx/anyio triggers RuntimeError: Event loop is closed during connection cleanup. Replace the ThreadPoolExecutor + asyncio.run() pattern with a _MemoryLoopRunner that maintains a single persistent event loop in a daemon thread for the process lifetime. Since the loop never closes, connections bound to it never become invalid. The _run_async_update_sync function now submits coroutines to this persistent loop via run_coroutine_threadsafe instead of creating throwaway loops. * update the code to address the review comments * Fix the review comments of 2615 P1 — user_id forwarded through sync path: Added user_id parameter to _prepare_update_prompt, _finalize_update, and _do_update_memory_sync, and forwarded it to get_memory_data(agent_name, user_id=user_id) and save(..., user_id=user_id). The update_memory() entry point now passes user_id through both the executor.submit path and the direct call path. Added TestUserIdForwarding with two regression tests (sync + async) verifying get_memory_data and save receive the correct user_id. P2 — aupdate_memory() delegates to sync: Replaced the model.ainvoke() call with asyncio.to_thread(self._do_update_memory_sync, ...). This eliminates the unsafe async provider client path entirely — all memory updater entry points now use the isolated sync model.invoke() path. Updated the test from asserting ainvoke is awaited to asserting invoke is called and ainvoke is not. Nit — duplicate comment removed: Removed the duplicated # Matches sentences... comment on line 230. * Chore(test): update the code of test_memory_updater --------- Co-authored-by: rayhpeng <rayhpeng@gmail.com>	2026-04-30 17:59:57 +08:00
KiteEater	7dea1666ce	fix: avoid temporary event loops in async subagent execution (#2414 ) * fix: avoid temporary event loops in async subagent execution * Rename isolated subagent loop globals * Harden isolated subagent loop shutdown and logging * Sort subagent executor imports * Format subagent executor * Remove isolated loop pool from subagent executor * Format subagent executor cleanup --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-30 15:29:17 +08:00
greatmengqi	38714b6ceb	refactor: thread app_config through middleware factories (#2652 ) * refactor: thread app_config through middleware factories Continues the incremental config-refactor sequence (#2611 root, #2612 lead path) one layer deeper into the middleware factories. Two ambient lookups inside _build_runtime_middlewares are eliminated and the LLMErrorHandling band-aid removed: - _build_runtime_middlewares / build_lead_runtime_middlewares / build_subagent_runtime_middlewares now require app_config: AppConfig. - get_guardrails_config() inside the factory is replaced with app_config.guardrails (semantically identical — same default-factory GuardrailsConfig — verified by direct equality check). - LLMErrorHandlingMiddleware.__init__ now requires app_config and reads circuit_breaker fields directly. The class-level circuit_failure_threshold / circuit_recovery_timeout_sec defaults are removed along with the try/except (FileNotFoundError, RuntimeError): pass band-aid — the let-it-crash invariant the rest of the refactor enforces. Caller chain (already-resolved app_config sources): - _build_middlewares in lead_agent/agent.py: reorder so resolved_app_config = app_config or get_app_config() is computed BEFORE build_lead_runtime_middlewares is called, then passed as kwarg. - SubagentExecutor: optional app_config parameter (mirrors the lead-agent pattern); _create_agent does the same `or get_app_config()` fallback at agent-build time, so task_tool callers don't need to plumb app_config through yet (typed-context plumbing for tool runtimes is a separate refactor). Tests: - test_llm_error_handling_middleware: _make_app_config helper using AppConfig(sandbox=SandboxConfig(use="test")) — same minimal-config pattern conftest already uses. Three direct LLMErrorHandlingMiddleware() calls each followed by post-construction circuit_breaker mutation fold cleanly into _build_middleware(circuit_failure_threshold=..., circuit_recovery_timeout_sec=...). Verification: - tests/test_llm_error_handling_middleware.py — 14 passed - tests/test_subagent_executor.py — 28 passed - tests/test_tool_error_handling_middleware.py — 6 passed - tests/test_task_tool_core_logic.py — 18 passed (verifies task_tool unchanged behavior) - Full suite: 2697 passed, 3 skipped. The single intermittent failure in tests/test_client_e2e.py::test_tool_call_produces_events is pre-existing LLM flakiness (the test asserts the model decided to call a tool; reproduces 1/3 on unchanged main as well). * fix: address middleware app config review comments * fix: satisfy app config annotation lint * test: cover explicit app config middleware wiring --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>	2026-04-30 12:41:09 +08:00
Hinotobi	74081a85a6	[security] fix(sandbox): bind local Docker ports to loopback (#2633 ) * fix(sandbox): bind local Docker ports to loopback * fix(sandbox): preserve IPv6 loopback Docker binds * fix(sandbox): log Docker bind host selection	2026-04-30 11:40:28 +08:00
Willem Jiang	844ad8e528	Merge branch 'main' into release/2.0-rc	2026-04-28 15:44:02 +08:00
pyp0327	395c14357b	chore(adpator):Adapt MindIE engine model and improve testing and fixes (#2523 ) * feat(models): 适配 MindIE引擎的模型 * test: add unit tests for MindIEChatModel adapter and fix PR review comments * chore: update uv.lock with pytest-asyncio * build: add pytest-asyncio to test dependencies * fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex) * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * test_mindie_provider:format * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(mindie): prevent nested tool_call params from leaking into outer args * fixed by escaping XML entities in _fix_messages and unescaping during parse, with regression tests added. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-04-28 15:09:31 +08:00
greatmengqi	e82940c03d	refactor: thread release config through lead path (#2612 ) Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>	2026-04-28 14:53:18 +08:00
DanielWalnut	6bd88fe14c	fix(sandbox): block host bash traversal escapes (#2560 ) * fix(sandbox): block host bash traversal escapes Fixes #2535 * fix(sandbox): harden local bash path guards * fix(sandbox): avoid bash cd argument false positives * Fix the lint error Add function to resolve and validate user data path. * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-28 12:18:41 +08:00
DanielWalnut	39c5da94f3	fix(sandbox): prevent local custom mount symlink escapes (#2558 ) * fix(sandbox): prevent local custom mount symlink escapes Fixes #2506 * fix(sandbox): harden custom mount symlink handling * fix(sandbox): format internal symlink directory listings	2026-04-28 11:59:46 +08:00
DanielWalnut	707ed328dd	fix(skills): scan skill archives before install (#2561 ) * fix(skills): scan skill archives before install Fixes #2536 * fix(skills): scan archive support files before install * style(skills): format archive installer * fix(skills): address archive install review comments	2026-04-28 11:56:11 +08:00
DanielWalnut	f7dfb88a30	fix(aio-sandbox): redact env values in container logs (#2562 ) * fix(aio-sandbox): redact env values in container logs Fixes #2534 * fix(aio-sandbox): address env log review comments	2026-04-28 11:47:56 +08:00
Willem Jiang	69649d8aae	Fix the issues when reviewing 2566 persistant part (#2604 ) * Fix the code review command of journal & event store P0,P1 issues * Fix the code review command of journal & event store P2 issues * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/packages/harness/deerflow/runtime/journal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Refactor logger debug message formatting --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-28 11:44:40 +08:00

1 2 3 4 5 ...

322 Commits