deerflow2

Author	SHA1	Message	Date
yorick	02569136df	fix(sandbox): improve sandbox security and preserve multimodal content (#2114 ) * fix: improve sandbox security and preserve multimodal content * Add unit test modifications for test_injects_uploaded_files_tag_into_list_content * format updated_content * Add regression tests for multimodal upload content and host bash default safety	2026-04-11 16:52:10 +08:00
Jin	718dddde75	fix(sandbox): prevent memory leak in file operation locks using WeakValueDictionary (#2096 ) * fix(sandbox): prevent memory leak in file operation locks using WeakValueDictionary * lint: fix lint issue in sandbox tools security	2026-04-10 22:55:53 +08:00
Zic-Wang	fa96acdf4b	feat: add WeChat channel integration (#1869 ) * feat: add WeChat channel integration * fix(backend): recover stale channel threads and align upload artifact handling * refactor(wechat): reduce scope and restore QR bootstrap * fix(backend): sort manager imports for Ruff lint * fix(tests): add missing patch import in test_channels.py * Update backend/app/channels/wechat.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/app/channels/manager.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(wechat): streamline allowed file extensions initialization and clean up test file --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-10 20:49:28 +08:00
Willem Jiang	90299e2710	feat(provisioner): add optional PVC support for sandbox volumes (#2020 ) * feat(provisioner): add optional PVC support for sandbox volumes (#1978) Add SKILLS_PVC_NAME and USERDATA_PVC_NAME env vars to allow sandbox Pods to use PersistentVolumeClaims instead of hostPath volumes. This prevents data loss in production when pods are rescheduled across nodes. When USERDATA_PVC_NAME is set, a subPath of threads/{thread_id}/user-data is used so a single PVC can serve multiple threads. Falls back to hostPath when the new env vars are not set, preserving backward compatibility. * add unit test for provisioner pvc volumes * refactor: extract shared provisioner_module fixture to conftest.py Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e7ccf708-c6ba-40e4-844a-b526bdb249dd Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: JeffJiang <for-eleven@hotmail.com>	2026-04-10 20:40:30 +08:00
greatmengqi	b1aabe88b8	fix(backend): stream DeerFlowClient AI text as token deltas (#1969 ) (#1974 ) * fix(backend): stream DeerFlowClient AI text as token deltas (#1969) DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values", "custom"] which only delivers full-state snapshots at graph-node boundaries, so AI replies were dumped as a single messages-tuple event per node instead of streaming token-by-token. `client.stream("hello")` looked identical to `client.chat("hello")` — the bug reported in #1969. Subscribe to "messages" mode as well, forward AIMessageChunk deltas as messages-tuple events with delta semantics (consumers accumulate by id), and dedup the values-snapshot path so it does not re-synthesize AI text that was already streamed. Introduce a per-id usage_metadata counter so the final AIMessage in the values snapshot and the final "messages" chunk — which carry the same cumulative usage — are not double-counted. chat() now accumulates per-id deltas and returns the last message's full accumulated text. Non-streaming mock sources (single event per id) are a degenerate case of the same logic, keeping existing callers and tests backward compatible. Verified end-to-end against a real LLM: a 15-number count emits 35 messages-tuple events with BPE subword boundaries clearly visible ("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across the window, end-event usage matches the values-snapshot usage exactly (not doubled). tests/test_client_live.py::TestLiveStreaming passes. New unit tests: - test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3 delta events with correct content/id/usage, values-snapshot does not duplicate, usage counted once. - test_chat_accumulates_streamed_deltas: chat() rebuilds full text from deltas. - test_messages_mode_tool_message: ToolMessage delivered via messages mode is not duplicated by the values-snapshot synthesis path. 
The stream() docstring now documents why this client does not reuse Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw LangChain objects vs serialized dicts, single caller vs HTTP fan-out). Fixes #1969 * refactor(backend): simplify DeerFlowClient streaming helpers (#1969) Post-review cleanup for the token-level streaming fix. No behavior change for correct inputs; one efficiency regression fixed. Fix: chat() O(n²) accumulator ----------------------------- `chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`, which is O(n) per concat → O(n²) total over a streamed response. At ~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks it costs roughly 100-300 ms of pure copying. Switched to `dict[str, list[str]]` + `"".join()` once at return. Cleanup ------- - Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`, and `_tool_message_event` static helpers. The messages-mode and values-mode branches previously repeated four inline dict literals each; they now call the same builders. - `StreamEvent.type` is now typed as `Literal["values", "messages-tuple", "custom", "end"]` via a `StreamEventType` alias. Makes the closed set explicit and catches typos at type-check time. - Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`, `.tool_calls`, `.id` all have default values on the base class, so the `getattr(..., None)` fallbacks were dead code. Removed from the hot path. - `_account_usage` parameter type loosened to `Any` so that LangChain's `UsageMetadata` TypedDict is accepted under strict type checking. - Trimmed narrating comments on `seen_ids` / `streamed_ids` / the values-synthesis skip block; kept the non-obvious ones that document the cross-mode dedup invariant. Net diff: -15 lines. All 132 unit tests + harness boundary test still pass; ruff check and ruff format pass. 
* docs(backend): add STREAMING.md design note (#1969) Dedicated design document for the token-level streaming architecture, prompted by the bug investigation in #1969. Contents: - Why two parallel streaming paths exist (Gateway HTTP/async vs DeerFlowClient sync/in-process) and why they cannot be merged. - LangGraph's three-layer mode naming (Graph "messages" vs Platform SDK "messages-tuple" vs HTTP SSE) and why a shared string constant would be harmful. - Gateway path: run_agent + StreamBridge + sse_consumer with a sequence diagram. - DeerFlowClient path: sync generator + direct yield, delta semantics, chat() accumulator. - Why the three id sets (seen_ids / streamed_ids / counted_usage_ids) each carry an independent invariant and cannot be collapsed. - End-to-end sequence for a real conversation turn. - Lessons from #1969: why mock-based tests missed the bug, why BPE subword boundaries in live output are the strongest correctness signal, and the regression test that locks it in. - Source code location index. Also: - Link from backend/CLAUDE.md Embedded Client section. - Link from backend/docs/README.md under Feature Documentation. * test(backend): add refactor regression guards for stream() (#1969) Three new tests in TestStream that lock the contract introduced by PR #1974 so any future refactor (sync->async migration, sharing a core with Gateway's run_agent, dedup strategy change) cannot silently change behavior. - test_dedup_requires_messages_before_values_invariant: canary that documents the order-dependence of cross-mode dedup. streamed_ids is populated only by the messages branch, so values-before-messages for the same id produces duplicate AI text events. Real LangGraph never inverts this order, but a refactor that does (or that makes dedup idempotent) must update this test deliberately. - test_messages_mode_golden_event_sequence: locks the exact event sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1 end) for a canonical streaming turn. 
List equality gives a clear diff on any drift in order, type, or payload shape. - test_chat_accumulates_in_linear_time: perf canary for the O(n^2) fix in commit 1f11ba10. 10,000 single-char chunks must accumulate in under 1s; the threshold is wide enough to pass on slow CI but tight enough to fail if buffer = buffer + delta is restored. All three tests pass alongside the existing 12 TestStream tests (15/15). ruff check + ruff format clean. * docs(backend): clarify stream() docstring on JSON serialization (#1969) Replace the misleading "raw LangChain objects (AIMessage, usage_metadata as dataclasses), not dicts" claim in the "Why not reuse Gateway's run_agent?" section. The implementation already yields plain Python dicts (StreamEvent.data is dict, and usage_metadata is a TypedDict), so the original wording suggested a richer return type than the API actually delivers. The corrected wording focuses on what is actually true and relevant: this client skips the JSON/SSE serialization layer that Gateway adds for HTTP wire transmission, and yields stream event payloads directly as Python data structures. Addresses Copilot review feedback on PR #1974. * test(backend): document none-id messages dedup limitation (#1969) Add test_none_id_chunks_produce_duplicates_known_limitation to TestStream that explicitly documents and asserts the current behavior when an LLM provider emits AIMessageChunk with id=None (vLLM, certain custom backends). The cross-mode dedup machinery cannot record a None id in streamed_ids (guarded by ``if msg_id:``), so the values snapshot's reassembled AIMessage with a real id falls through and synthesizes a duplicate AI text event. The test asserts len == 2 and locks this as a known limitation rather than silently letting future contributors hit it without context. Why this is documented rather than fixed: * Falling back to ``metadata.get("id")`` does not help — LangGraph's messages-mode metadata never carries the message id. 
* Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the values snapshot uses the same fallback, which it does not. * A real fix requires provider cooperation (always emit chunk ids) or content-based dedup (false-positive risk), neither of which belongs in this PR. If a real fix lands, replace this test with a positive assertion that dedup works for None-id chunks. Addresses Copilot review feedback on PR #1974 (client.py:515). * fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942) - Fix `font-norma` typo to `font-normal` in message-list subtask count - Fix dark mode `--border` using reddish hue (22.216) instead of neutral - Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground` - Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground` - Add missing `font-sans` to welcome description `<pre>` for consistency - Make case-study-section padding responsive (`px-4 md:px-20`) Closes #1940 * docs: clarify deployment sizing guidance (#1963) * fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960) After history.replaceState updates the URL from /chats/new to /chats/{UUID}, Next.js useParams does not update because replaceState bypasses the router. The useEffect in useThreadChat would then set threadIdFromPath ('new') as the threadId, causing the LangGraph SDK to call POST /threads/new/history which returns HTTP 422 (Invalid thread ID: must be a UUID). This fix adds a guard to skip the threadId update when threadIdFromPath is the literal string 'new', preserving the already-correct UUID that was set when the thread was created. 
* fix(frontend): avoid using route new as thread id (#1967) Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> * Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965) * Fix event loop conflict in SubagentExecutor.execute() When SubagentExecutor.execute() is called from within an already-running event loop (e.g., when the parent agent uses async/await), calling asyncio.run() creates a new event loop that conflicts with asyncio primitives (like httpx.AsyncClient) that were created in and bound to the parent loop. This fix detects if we're already in a running event loop, and if so, runs the subagent in a separate thread with its own isolated event loop to avoid conflicts. Fixes: sub-task cards not appearing in Ultra mode when using async parent agents Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(subagent): harden isolated event loop execution --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(backend): remove dead getattr in _tool_message_event --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com> Co-authored-by: 13ernkastel <LennonCMJ@live.com> Co-authored-by: siwuai <458372151@qq.com> Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com> Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-10 18:16:38 +08:00
DanielWalnut	eef0a6e2da	feat(dx): Setup Wizard + doctor command — closes #2030 (#2034 )	2026-04-10 17:43:39 +08:00
shivam johri	194bab4691	feat(config): add when_thinking_disabled support for model configs (#1970 ) * feat(config): add when_thinking_disabled support for model configs Allow users to explicitly configure what parameters are sent to the model when thinking is disabled, via a new `when_thinking_disabled` field in model config. This mirrors the existing `when_thinking_enabled` pattern and takes full precedence over the hardcoded disable behavior when set. Backwards compatible — existing configs work unchanged. Closes #1675 * fix(config): address copilot review — gate when_thinking_disabled independently - Switch truthiness check to `is not None` so empty dict overrides work - Restructure disable path so when_thinking_disabled is gated independently of has_thinking_settings, allowing it to work without when_thinking_enabled - Update test to reflect new behavior	2026-04-09 18:49:00 +08:00
luo jiyin	35f141fc48	feat: implement full checkpoint rollback on user cancellation (#1867 ) * feat: implement full checkpoint rollback on user cancellation - Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes - Add _rollback_to_pre_run_checkpoint() function to restore thread state - Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods - Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes - Remove obsolete TODO comment (Phase 2) as rollback is now complete This resolves the TODO(Phase 2) comment and enables full thread state restoration when a run is cancelled by the user. * fix: address rollback review feedback * fix: strengthen checkpoint rollback validation and error handling - Validate restored_config structure and checkpoint_id before use - Raise RuntimeError on malformed pending_writes instead of silent skip - Normalize None checkpoint_ns to empty string instead of "None" - Move delete_thread to only execute when pre_run_snapshot is None - Add docstring noting non-atomic rollback as known limitation This addresses review feedback on PR #1867 regarding data integrity in the checkpoint rollback implementation. * test: add comprehensive coverage for checkpoint rollback edge cases - test_rollback_restores_snapshot_without_deleting_thread - test_rollback_deletes_thread_when_no_snapshot_exists - test_rollback_raises_when_restore_config_has_no_checkpoint_id - test_rollback_normalizes_none_checkpoint_ns_to_root_namespace - test_rollback_raises_on_malformed_pending_write_not_a_tuple - test_rollback_raises_on_malformed_pending_write_non_string_channel - test_rollback_propagates_aput_writes_failure Covers all scenarios from PR #1867 review feedback. * test: format rollback worker tests	2026-04-09 17:56:36 +08:00
Xinmin Zeng	0b6fa8b9e1	fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976 ) * fix(sandbox): add startup reconciliation to prevent orphaned container leaks Sandbox containers were never cleaned up when the managing process restarted, because all lifecycle tracking lived in in-memory dictionaries. This adds startup reconciliation that enumerates running containers via `docker ps` and either destroys orphans (age > idle_timeout) or adopts them into the warm pool. Closes #1972 * fix(sandbox): address Copilot review — adopt-all strategy, improved error handling - Reconciliation now adopts all containers into warm pool unconditionally, letting the idle checker decide cleanup. Avoids destroying containers that another concurrent process may still be using. - list_running() logs stderr on docker ps failure and catches FileNotFoundError/OSError. - Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP. - E2E test docstring corrected to match actual coverage scope. * fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene - _reconcile_orphans(): merge check-and-insert into a single lock acquisition per container to eliminate the TOCTOU window. - list_running(): batch the per-container docker inspect into a single call. Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect). Parse port and created_at from the inspect JSON payload. - Extract _parse_docker_timestamp() and _extract_host_port() as module-level pure helpers and test them directly. - Move datetime/json imports to module top level. - _make_provider_for_reconciliation(): document the __new__ bypass and the lockstep coupling to AioSandboxProvider.__init__. - Add assertion that list_running() makes exactly ONE inspect call.	2026-04-09 17:21:23 +08:00
Admire	563383c60f	fix(agent): file-io path guidance in agent prompts (#2019 ) * fix(prompt): guide workspace-relative file io * Clarify bash agent file IO path guidance	2026-04-09 16:12:34 +08:00
Xun	1b74d84590	fix: resolve missing serialized kwargs in PatchedChatDeepSeek (#2025 ) * add tests * fix ci * fix ci	2026-04-09 16:07:16 +08:00
Octopus	616caa92b1	fix(models): resolve duplicate keyword argument error when reasoning_effort appears in both config and kwargs (#2017 ) When a model config includes `reasoning_effort` as an extra YAML field (ModelConfig uses `extra="allow"`), and the thinking-disabled code path also injects `reasoning_effort="minimal"` into kwargs, the previous `model_class(**kwargs, **model_settings_from_config)` call raises: TypeError: got multiple values for keyword argument 'reasoning_effort' Fix by merging the two dicts before instantiation, giving runtime kwargs precedence over config values: `{**model_settings_from_config, **kwargs}`. Fixes #1977 Co-authored-by: octo-patch <octo-patch@github.com>	2026-04-09 15:09:39 +08:00
knukn	31a3c9a3de	feat(client): add thread query methods `list_threads` and `get_thread` (#1609 ) * feat(client): add thread query methods `list_threads` and `get_thread` Implemented two public API methods in `DeerFlowClient` to query threads using the underlying `checkpointer`. * Update backend/packages/harness/deerflow/client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/packages/harness/deerflow/client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/tests/test_client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/packages/harness/deerflow/client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(deerflow): Fix possible KeyError issue when sorting threads * fix unit test --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-09 15:00:22 +08:00
Xinmin Zeng	ad6d934a5f	fix(middleware): handle string-serialized options in ClarificationMiddleware (#1997 ) * fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995) Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON strings instead of native arrays. Add defensive type checking in _format_clarification_message() to deserialize string options before iteration, preventing per-character rendering. * fix(middleware): normalize options after JSON deserialization Address Copilot review feedback: - Add post-deserialization normalization so options is always a list (handles json.loads returning a scalar string, dict, or None) - Add test for JSON-encoded scalar string ("development") - Fix test_json_string_with_mixed_types to use actual mixed types	2026-04-08 21:04:20 +08:00
hung_ng__	5350b2fb24	feat(community): add Exa search as community tool provider (#1357 ) * feat(community): add Exa search as community tool provider Add Exa (exa.ai) as a new community search provider alongside Tavily, Firecrawl, InfoQuest, and Jina AI. Exa is an AI-native search engine with neural, keyword, and auto search types. New files: - community/exa/tools.py: web_search_tool and web_fetch_tool - tests/test_exa_tools.py: 10 unit tests with mocked Exa client Changes: - pyproject.toml: add exa-py dependency - config.example.yaml: add commented-out Exa configuration examples Usage: set `use: deerflow.community.exa.tools:web_search_tool` in config.yaml and provide EXA_API_KEY. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(community): address PR review comments for Exa tools - Make _get_exa_client() accept tool_name param so web_fetch reads its own config - Remove __init__.py to match namespace package pattern of other providers - Add duplicate tool name warning in config.example.yaml - Add regression tests for web_fetch config resolution Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update revision in uv.lock to 3 --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-08 17:13:39 +08:00
Saber	e5b149068c	Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965 ) * Fix event loop conflict in SubagentExecutor.execute() When SubagentExecutor.execute() is called from within an already-running event loop (e.g., when the parent agent uses async/await), calling asyncio.run() creates a new event loop that conflicts with asyncio primitives (like httpx.AsyncClient) that were created in and bound to the parent loop. This fix detects if we're already in a running event loop, and if so, runs the subagent in a separate thread with its own isolated event loop to avoid conflicts. Fixes: sub-task cards not appearing in Ultra mode when using async parent agents Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(subagent): harden isolated event loop execution --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:46:06 +08:00
Async23	0948c7a4e1	fix(provider): preserve streamed Codex output when response.completed.output is empty (#1928 ) * fix: preserve streamed Codex output items * fix: prefer completed Codex output over streamed placeholders	2026-04-07 18:21:22 +08:00
koppx	c3170f22da	fix(backend): make loop detection hash tool calls by stable keys (#1911 ) * fix(backend): make loop detection hash tool calls by stable keys The loop detection middleware previously hashed full tool call arguments, which made repeated calls look different when only non-essential argument details changed. In particular, `read_file` calls with nearby line ranges could bypass repetition detection even when the agent was effectively reading the same file region again and again. - Hash tool calls using stable keys instead of the full raw args payload - Bucket `read_file` line ranges so nearby reads map to the same region key - Prefer stable identifiers such as `path`, `url`, `query`, or `command` before falling back to JSON serialization of args - Keep hashing order-independent so the same tool call set produces the same hash regardless of call order Fixes #1905 * fix(backend): harden loop detection hash normalization - Normalize and parse stringified tool args defensively - Expand stable key derivation to include pattern, glob, and cmd - Normalize reversed read_file ranges before bucketing Fixes #1905 * fix(backend): harden loop detection tool format * exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged. --------- Co-authored-by: JeffJiang <for-eleven@hotmail.com>	2026-04-07 17:46:33 +08:00
KKK	3b3e8e1b0b	feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881 ) * fix(sandbox): strengthen regex coverage in SandboxAuditMiddleware Expand high-risk patterns from 6 to 13 and medium-risk from 4 to 6, closing several bypass vectors identified by cross-referencing Claude Code's BashSecurity validator chain against DeerFlow's threat model. High-risk additions: - Generalised pipe-to-sh (replaces narrow curl|sh rule) - Targeted command substitution ($() / backtick with dangerous executables) - base64 decode piped to execution - Overwrite system binaries (/usr/bin/, /bin/, /sbin/) - Overwrite shell startup files (~/.bashrc, ~/.profile, etc.) - /proc/*/environ leakage - LD_PRELOAD / LD_LIBRARY_PATH hijack - /dev/tcp/ bash built-in networking Medium-risk additions: - sudo/su (no-op under Docker root, warn only) - PATH= modification (long attack chain, warn only) Design decisions: - Command substitution uses targeted matching (curl/wget/bash/sh/python/ruby/perl/base64) rather than blanket block to avoid false positives on safe usage like $(date) or `whoami`. - Skipped encoding/obfuscation checks (hex, octal, Unicode homoglyphs) as ROI is low in Docker sandbox — LLMs don't generate encoded commands and container isolation bounds the blast radius. - Merged pip/pip3 into single pip3? pattern. feat(sandbox): compound command splitting and fork bomb detection Split compound bash commands (&&, ||, ;) into sub-commands and classify each independently — prevents dangerous commands hidden after safe prefixes (e.g. "cd /workspace && rm -rf /") from bypassing detection. 
- Add _split_compound_command() with shlex quote-aware splitting - Add fork bomb detection patterns (classic and while-loop variants) - Most severe verdict wins; block short-circuits - 15 new tests covering compound commands, splitting, and fork bombs * test(sandbox): add async tests for fork bomb and compound commands Cover awrap_tool_call path for fork bomb detection (3 variants) and compound command splitting (block/warn/pass scenarios). * fix(sandbox): address Copilot review — no-whitespace operators, >>/etc/, whole-command scan - _split_compound_command: replace shlex-based implementation with a character-by-character quote/escape-aware scanner. shlex.split only separates '&&' / '||' / ';' when they are surrounded by whitespace, so payloads like 'rm -rf /&&echo ok' or 'safe;rm -rf /' bypassed the previous splitter and therefore the per-sub-command classifier. - _HIGH_RISK_PATTERNS: change r'>\s*/etc/' to r'>+\s*/etc/' so append redirection ('>>/etc/hosts') is also blocked. - _classify_command: run a whole-command high-risk scan before splitting. Structural attacks like 'while true; do bash & done' span multiple shell statements — splitting on ';' destroys the pattern context, so the raw command must be scanned first. - tests: add no-whitespace operator cases to TestSplitCompoundCommand and test_compound_command_classification to lock in the bypass fix.	2026-04-07 17:15:24 +08:00
lulusiyuyu	f0dd8cb0d2	fix(subagents): add cooperative cancellation for subagent threads (#1873 ) * fix(subagents): add cooperative cancellation for subagent threads Subagent tasks run inside ThreadPoolExecutor threads with their own event loop (asyncio.run). When a user clicks stop, RunManager cancels the parent asyncio.Task, but Future.cancel() cannot terminate a running thread and asyncio.Event does not propagate across event loops. This causes subagent threads to keep executing (writing files, calling LLMs) even after the user explicitly stops the run. Fix: add a threading.Event (cancel_event) to SubagentResult and check it cooperatively in _aexecute()'s astream iteration loop. On cancel, request_cancel_background_task() sets the event, and the thread exits at the next iteration boundary. Changes: - executor.py: Add cancel_event field to SubagentResult, check it in _aexecute loop, set it on timeout, add request_cancel_background_task - task_tool.py: Call request_cancel_background_task on CancelledError * fix(subagents): guard cancel status and add pre-check before astream - Only overwrite status to FAILED when still RUNNING, preserving TIMED_OUT set by the scheduler thread. - Add cancel_event pre-check before entering the astream loop so cancellation is detected immediately when already signalled. * fix(subagents): guard status updates with lock to prevent race condition Wrap the check-and-set on result.status in _aexecute with _background_tasks_lock so the timeout handler in execute_async cannot interleave between the read and write. * fix(subagents): add dedicated CANCELLED status for user cancellation Introduce SubagentStatus.CANCELLED to distinguish user-initiated cancellation from actual execution failures. Update _aexecute, task_tool polling, cleanup terminal-status sets, and test fixtures. 
* test(subagents): add cancellation tests and fix timeout regression test - Add dedicated TestCooperativeCancellation test class with 6 tests: - Pre-set cancel_event prevents astream from starting - Mid-stream cancel_event returns CANCELLED immediately - request_cancel_background_task() sets cancel_event correctly - request_cancel on nonexistent task is a no-op - Real execute_async timeout does not overwrite CANCELLED (deterministic threading.Event sync, no wall-clock sleeps) - cleanup_background_task removes CANCELLED tasks - Add task_tool cancellation coverage: - test_cancellation_calls_request_cancel: assert CancelledError path calls request_cancel_background_task(task_id) - test_task_tool_returns_cancelled_message: assert CANCELLED polling branch emits task_cancelled event and returns expected message - Fix pre-existing test infrastructure issue: add deerflow.sandbox.security to _MOCKED_MODULE_NAMES (fixes ModuleNotFoundError for all executor tests) - Add RUNNING guard to timeout handler in executor.py to prevent TIMED_OUT from overwriting CANCELLED status - Add cooperative cancellation granularity comment documenting that cancellation is only detected at astream iteration boundaries --------- Co-authored-by: lulusiyuyu <lulusiyuyu@users.noreply.github.com>	2026-04-07 11:12:25 +08:00
DanielWalnut	7643a46fca	fix(skill): make skill prompt cache refresh nonblocking (#1924 ) * fix: make skill prompt cache refresh nonblocking * fix: harden skills prompt cache refresh * chore: add timeout to skills cache warm-up	2026-04-07 10:50:34 +08:00
Markus Corazzione	c4da0e8ca9	Move async SQLite mkdir off the event loop (#1921 ) Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>	2026-04-07 10:47:20 +08:00
JilongSun	88e535269e	Feature/feishu receive file (#1608 ) * feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). * style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling	2026-04-06 22:14:12 +08:00
DanielWalnut	888f7bfb9d	Implement skill self-evolution and skill_manage flow (#1874 ) * chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-06 22:07:11 +08:00
KKK	055e4df049	fix(sandbox): add input sanitisation guard to SandboxAuditMiddleware (#1872 ) * fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware Add _validate_input() to reject malformed bash commands before regex classification: empty commands, oversized commands (>10 000 chars), and null bytes that could cause detection/execution layer inconsistency. * fix(sandbox): address Copilot review — type guard, log truncation, reject reason - Coerce None/non-string command to str before validation - Truncate oversized commands in audit logs to prevent log amplification - Propagate reject_reason through _pre_process() to block message - Remove L2 label from comments and test class names * fix(sandbox): isinstance type guard + async input sanitisation tests Address review comments: - Replace str() coercion with isinstance(raw_command, str) guard so non-string truthy values (0, [], False) fall back to empty string instead of passing validation as "0"/"[]"/"False". - Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests covering empty, null-byte, oversized, and None command via awrap_tool_call path.	2026-04-06 17:21:58 +08:00
Zhou	1ced6e977c	fix(backend): preserve viewed image reducer metadata (#1900 ) Fix concurrent viewed_images state updates for multi-image input by preserving the reducer metadata in the vision middleware state schema.	2026-04-06 16:47:19 +08:00
NmanQAQ	dd30e609f7	feat(models): add vLLM provider support (#1860 ) support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking. Co-authored-by: NmanQAQ <normangyao@qq.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-06 15:18:34 +08:00
yangzheli	5fd2c581f6	fix: add output truncation to ls_tool to prevent context window overflow (#1896 ) ls_tool was the only sandbox tool without output size limits, allowing multi-MB results from large directories to blow up the model context window. Add head-truncation (configurable via ls_output_max_chars, default 20000) consistent with existing bash and read_file truncation. Closes #1887 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 15:09:57 +08:00
肖	7c68dd4ad4	Fix(#1702 ): stream resume run (#1858 ) * fix: repair stream resume run metadata # Conflicts: # backend/packages/harness/deerflow/runtime/stream_bridge/memory.py # frontend/src/core/threads/hooks.ts * fix(stream): repair resumable replay validation --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-06 14:51:10 +08:00
suyua9	29575c32f9	fix: expose custom events from DeerFlowClient.stream() (#1827 ) * fix: expose custom client stream events Signed-off-by: suyua9 <1521777066@qq.com> * fix(client): normalize streamed custom mode values * test(client): satisfy backend ruff import ordering --------- Signed-off-by: suyua9 <1521777066@qq.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-06 10:09:39 +08:00
Chris Z	117fa9b05d	fix(channels): normalize slack allowed user ids (#1802 ) * fix(channels): normalize slack allowed user ids * style(channels): apply backend formatter --------- Co-authored-by: haimingZZ <15558128926@qq.com> Co-authored-by: suyua9 <1521777066@qq.com>	2026-04-05 18:04:21 +08:00
thefoolgy	8049785de6	fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804 ) * fix(memory): case-insensitive fact deduplication and positive reinforcement detection Two fixes to the memory system: 1. _fact_content_key() now lowercases content before comparison, preventing semantically duplicate facts like "User prefers Python" and "user prefers python" from being stored separately. 2. Adds detect_reinforcement() to MemoryMiddleware (closes #1719), mirroring detect_correction(). When users signal approval ("yes exactly", "perfect", "完全正确", etc.), the memory updater now receives reinforcement_detected=True and injects a hint prompting the LLM to record confirmed preferences and behaviors with high confidence. Changes across the full signal path: - memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement() - queue.py: reinforcement_detected field in ConversationContext and add() - updater.py: reinforcement_detected param in update_memory() and update_memory_from_conversation(); builds reinforcement_hint alongside the existing correction_hint Tests: 11 new tests covering deduplication, hint injection, and signal detection (Chinese + English patterns, window boundary, conflict with correction). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(memory): address Copilot review comments on reinforcement detection - Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants - Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise - Use casefold() instead of lower() for Unicode-aware fact deduplication - Add missing test coverage for reinforcement_detected OR merge and forwarding in queue --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 16:23:00 +08:00
Evan Wu	9ca68ffaaa	fix: preserve virtual path separator style (#1828 ) * fix: preserve virtual path separator style * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-05 15:52:22 +08:00
Markus Corazzione	0ffe5a73c1	chore(config): Increase subagent max-turn limits (#1852 )	2026-04-05 15:41:00 +08:00
Octopus	a283d4a02d	fix: include soul field in GET /api/agents list response (fixes #1819 ) (#1863 ) Previously, the list endpoint always returned soul=null because _agent_config_to_response() was called without include_soul=True. This caused confusion since PUT /api/agents/{name} and GET /api/agents/{name} both returned the soul content, but the list endpoint silently omitted it. Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>	2026-04-05 10:49:58 +08:00
DanielWalnut	2a150f5d4a	fix: unblock concurrent threads and workspace hydration (#1839 ) * fix: unblock concurrent threads and workspace hydration * fix: restore async title generation * fix: address PR review feedback * style: format lead agent prompt	2026-04-04 21:19:35 +08:00
SHIYAO ZHANG	163121d327	fix(uploads): handle split-bold headings and artefacts in extract_outline (#1838 ) * feat(uploads): guide agent to use grep/glob/read_file for uploaded documents Add workflow guidance to the <uploaded_files> context block so the agent knows to use grep and glob (added in #1784) alongside read_file when working with uploaded documents, rather than falling back to web search. This is the final piece of the three-PR PDF agentic search pipeline: - PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings - PR2 (#1738): document outline injected into agent context with line numbers - PR3 (this): agent guided to use outline + grep + read_file workflow * feat(uploads): add file-first priority and fallback guidance to uploaded_files context * fix(uploads): handle split-bold headings and artefacts in extract_outline - Add _clean_bold_title() to merge adjacent bold spans ( ) produced by pymupdf4llm when bold text crosses span boundaries - Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise <num> <title> headings common in academic papers; excludes pure-number table headers and rows with more than 4 bold blocks - When outline is empty, read first 5 non-empty lines of the .md as a content preview and surface a grep hint in the agent context - Update _format_file_entry to render the preview + grep hint instead of silently omitting the outline section - Add 3 new extract_outline tests and 2 new middleware tests (65 total) * fix(uploads): address Copilot review comments on extract_outline regex - Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII titles (e.g. 1 概述); pure-numeric/punctuation blocks still excluded - Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input - Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex) - Log debug message with exc_info when preview extraction fails	2026-04-04 14:25:08 +08:00
fengxsong	19809800f1	feat: support wecom channel (#1390 ) * feat: support wecom channel * fix: sending file to client Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * test: add unit tests for wecom channel Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * docs: add example configs and setup docs Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * revert pypi default index setting Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * revert: keeping codes in harness untouched Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * fix: format issue Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * fix: resolve Copilot comments Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> --------- Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-04 11:28:35 +08:00
ppyt	db82b59254	fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware (#1823 ) * fix: inject longTermBackground into memory prompt The format_memory_for_injection function only processed recentMonths and earlierContext from the history section, silently dropping longTermBackground. The LLM writes longTermBackground correctly and it persists to memory.json, but it was never injected into the system prompt — making the user's long-term background invisible to the AI. Add the missing field handling and a regression test. * fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware LangChain AIMessage.content can be str \| list. When using providers that return structured content blocks (e.g. Anthropic thinking mode, certain OpenAI-compatible gateways), content is a list of dicts like [{"type": "text", "text": "..."}]. The hard_limit branch in _apply() concatenated content with a string via (last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises TypeError when content is a non-empty list (list + str is invalid). Add _append_text() static method that: - Returns the text directly when content is None - Appends a {"type": "text"} block when content is a list - Falls back to string concatenation when content is a str This is consistent with how other modules in the project already handle list content (client.py._extract_text, memory_middleware, executor.py). * test(middleware): add unit tests for _append_text and list content hard stop Add regression tests to verify LoopDetectionMiddleware handles list-type AIMessage.content correctly during hard stop: - TestAppendText: unit tests for the new _append_text() static method covering None, str, list (including empty list) content types - TestHardStopWithListContent: integration tests verifying hard stop works correctly with list content (Anthropic thinking mode), None content, and str content Requested by reviewer in PR #1823. 
* fix(middleware): improve _append_text robustness and test isolation - Add explicit isinstance(content, str) check with fallback for unexpected types (coerce to str) to prevent TypeError on edge cases - Deep-copy list content in _make_state() test helper to prevent shared mutable references across test iterations - Add test_unexpected_type_coerced_to_str: verify fallback for non-str/list/None content types - Add test_list_content_not_mutated_in_place: verify _append_text does not modify the original list * style: fix ruff format whitespace in test file --------- Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>	2026-04-04 10:38:22 +08:00
SHIYAO ZHANG	ddfc988bef	feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload (#1727 ) * feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload - Introduce pymupdf4llm as an optional PDF converter with better heading detection and table preservation than MarkItDown - Auto mode: prefer pymupdf4llm when installed; fall back to MarkItDown when output is suspiciously sparse (image-based / scanned PDFs) - Sparsity check uses chars-per-page (< 50 chars/page) rather than an absolute threshold, correctly handling both short and long documents - Large files (> 1 MB) are offloaded to asyncio.to_thread() to avoid blocking the event loop (related: #1569) - Add UploadsConfig with pdf_converter field (auto/pymupdf4llm/markitdown) - Add pymupdf4llm as optional dependency: pip install deerflow-harness[pymupdf] - Add 14 unit tests covering sparsity heuristic, routing logic, and async path * fix(uploads): address Copilot review comments on PDF converter - Fix docstring: MIN_CHARS_PYMUPDF -> _MIN_CHARS_PER_PAGE (typo) - Fix file handle leak: wrap pymupdf.open in try/finally to ensure doc.close() - Fix silent fallback gap: _convert_pdf_with_pymupdf4llm now catches all conversion exceptions (not just ImportError), so encrypted/corrupt PDFs fall back to MarkItDown instead of propagating - Tighten type: pdf_converter field changed from str to Literal[auto\|pymupdf4llm\|markitdown] - Normalize config value: _get_pdf_converter() strips and lowercases the raw config string, warns and falls back to 'auto' on unknown values	2026-04-03 21:59:45 +08:00
SHIYAO ZHANG	5ff230eafd	feat(uploads): inject document outline into agent context for converted files (#1738 ) * feat(uploads): inject document outline into agent context for converted files Extract headings from converted .md files and inject them into the <uploaded_files> context block so the agent can navigate large documents by line number before reading. - Add `extract_outline()` to `file_conversion.py`: recognises standard Markdown headings (#/##/###) and SEC-style bold structural headings (ITEM N. BUSINESS, PART II); caps at 50 entries; excludes cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES) - Add `_extract_outline_for_file()` helper in `uploads_middleware.py`: looks for a sibling `.md` file produced by the conversion pipeline - Update `UploadsMiddleware._create_files_message()` to render the outline under each file entry with `L{line}: {title}` format and a `read_file` prompt for range-based reading - Tests: 10 new tests for `extract_outline()`, 4 new tests for outline injection in `UploadsMiddleware`; existing test updated for new `outline` field in `uploaded_files` state Partially addresses #1647 (agent ignores uploaded files). * fix(uploads): stream outline file reads and strip inline bold from heading titles - Switch extract_outline() from read_text().splitlines() to open()+line iteration so large converted documents are not loaded into memory on every agent turn; exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion) - Strip ... wrapper from standard Markdown heading titles before appending to outline so agent context stays clean (e.g. "## Overview" → "Overview") (Copilot suggestion) - Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py to satisfy ruff CI lint * fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES When extract_outline() hits the cap it now appends a sentinel entry {"truncated": True} instead of silently dropping the rest of the headings. 
UploadsMiddleware reads the sentinel and renders a hint line: ... (showing first 50 headings; use `read_file` to explore further) Without this the agent had no way to know the outline was incomplete and would treat the first 50 headings as the full document structure. * fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id runtime.context does not always carry thread_id (depends on LangGraph invocation path). ThreadDataMiddleware already falls back to get_config().configurable.thread_id — apply the same pattern so UploadsMiddleware can resolve the uploads directory and attach outlines in all invocation paths. * style: apply ruff format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-03 20:52:47 +08:00
d 🔹	6dbdd4674f	fix: guarantee END sentinel delivery when stream bridge queue is full (#1695 ) When MemoryStreamBridge queue reaches capacity, publish_end() previously used the same 30s timeout + drop strategy as regular events. If the END sentinel was dropped, subscribe() would loop forever waiting for it, causing the SSE connection to hang indefinitely and leaking _queues and _counters resources for that run_id. Changes: - publish_end() now evicts oldest regular events when queue is full to guarantee END sentinel delivery — the sentinel is the only signal that allows subscribers to terminate - Added per-run drop counters (_dropped_counts) with dropped_count() and dropped_total properties for observability - cleanup() and close() now clear drop counters - publish() logs total dropped count per run for easier debugging Tests: - test_end_sentinel_delivered_when_queue_full: verifies END arrives even with a completely full queue - test_end_sentinel_evicts_oldest_events: verifies eviction behavior - test_end_sentinel_no_eviction_when_space_available: no side effects when queue has room - test_concurrent_tasks_end_sentinel: 4 concurrent producer/consumer pairs all terminate properly - test_dropped_count_tracking, test_dropped_total, test_cleanup_clears_dropped_counts, test_close_clears_dropped_counts: drop counter coverage Closes #1689 Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>	2026-04-03 20:12:30 +08:00
Octopus	83039fa22c	fix: use SystemMessage+HumanMessage for follow-up question generation (#1751 ) * fix: use SystemMessage+HumanMessage for follow-up question generation (fixes #1697) Some models (e.g. MiniMax-M2.7) require the system prompt and user content to be passed as separate message objects rather than a single combined string. Invoking with a plain string sends everything as a HumanMessage, which causes these models to ignore the generation instructions and fail to produce valid follow-up questions. * test: verify model is invoked with SystemMessage and HumanMessage	2026-04-03 20:09:01 +08:00
finallylly	1694c616ef	feat(sandbox): add read-only support for local sandbox path mappings (#1808 )	2026-04-03 19:46:22 +08:00
DanielWalnut	c6cdf200ce	feat(sandbox): add built-in grep and glob tools (#1784 ) * feat(sandbox): add grep and glob tools * refactor(aio-sandbox): use native file search APIs * fix(sandbox): address review issues in grep/glob tools - aio_sandbox: use should_ignore_path() instead of should_ignore_name() for include_dirs=True branch to filter nested ignored paths correctly - aio_sandbox: add early exit when max_results reached in glob loop - aio_sandbox: guard entry.path.startswith(path) before stripping prefix - aio_sandbox: validate regex locally before sending to remote API - search: skip lines exceeding max_line_chars to prevent ReDoS - search: remove resolve() syscall in os.walk loop - tools: avoid double get_thread_data() call in glob_tool/grep_tool - tests: add 6 new cases covering the above code paths - tests: patch get_app_config in truncation test to isolate config * Fix sandbox grep/glob review feedback * Remove unrelated Langfuse RFC from PR	2026-04-03 16:03:06 +08:00
Admire	48565664e0	fix ACP mcpServers payload (#1735 ) * fix ACP mcpServers payload * Handle invalid ACP MCP config	2026-04-03 15:28:56 +08:00
knukn	76fad8b08d	feat(client): add `available_skills` parameter to DeerFlowClient (#1779 ) * feat(client): add `available_skills` parameter to DeerFlowClient for dynamic runtime skill filtering * Update backend/packages/harness/deerflow/client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(client): include `agent_name` and `available_skills` in agent config cache key --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-03 11:22:58 +08:00
ppyt	5664b9d413	fix: inject longTermBackground into memory prompt (#1734 ) The format_memory_for_injection function only processed recentMonths and earlierContext from the history section, silently dropping longTermBackground. The LLM writes longTermBackground correctly and it persists to memory.json, but it was never injected into the system prompt — making the user's long-term background invisible to the AI. Add the missing field handling and a regression test. Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>	2026-04-03 11:21:58 +08:00
Subham Singhania	6de9c7b43f	Improve Python reliability in channel retries and thread typing (#1776 ) Agent-Logs-Url: https://github.com/0xxy0/deer-flow/sessions/95336da6-e16d-43b4-834a-e5534c9396c5 Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-04-03 07:50:11 +08:00
moose-lab	f56d0b4869	fix(sandbox): exclude URL paths from absolute path validation (#1385 ) (#1419 ) * fix(sandbox): URL paths misclassified as unsafe absolute paths (#1385) In local sandbox mode, when the bash tool runs its absolute-path safety check on a command, it misidentifies HTTPS URLs in curl commands (e.g. https://example.com/api/v1/check) as local absolute paths and blocks them. Root cause: the negative lookbehind (?<![:\w]) in the _ABSOLUTE_PATH_PATTERN regex excludes only colons and word characters, but the second slash in :// is preceded by the first slash (/), which is not in the exclusion list, so //example.com/api/... is matched as the absolute path /example.com/api/.... Fix: add the slash character to the negative lookbehind, changing it to (?<![:\w/]), so that the consecutive slashes in :// no longer trigger absolute-path matching. Also add URL-related unit test cases. Signed-off-by: moose-lab <moose-lab@users.noreply.github.com> * fix(sandbox): refine absolute path regex to preserve file:// defense-in-depth Change lookbehind from (?<![:\w/]) to (?<![:\w])(?<!:/) so only the second slash in :// sequences is excluded. This keeps URL paths from false-positiving while still letting the regex detect /etc/passwd in file:///etc/passwd. Also add explicit file:// URL blocking and tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Signed-off-by: moose-lab <moose-lab@users.noreply.github.com> Co-authored-by: moose-lab <moose-lab@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-02 16:09:14 +08:00
Varian_米泽	a2cb38f62b	fix: prevent concurrent subagent file write conflicts in sandbox tools (#1714 ) * fix: prevent concurrent subagent file write conflicts Serialize same-path str_replace operations in sandbox tools Guard AioSandbox write_file/update_file with the existing sandbox lock Add regression tests for concurrent str_replace and append races Verify with backend full tests and ruff lint checks * fix(sandbox): Fix the concurrency issue of file operations on the same path in isolated sandboxes. Ensure that different sandbox instances use independent locks for file operations on the same virtual path to avoid concurrency conflicts. Change the lock key from a single path to a composite key of (sandbox.id, path), and add tests to verify the concurrent safety of isolated sandboxes. * feat(sandbox): Extract file operation lock logic to standalone module and fix concurrency issues Extract file operation lock related logic from tools.py into a separate file_operation_lock.py module. Fix data race issues during concurrent str_replace and write_file operations.	2026-04-02 15:39:41 +08:00
knukn	f8fb8d6fb1	feat/per agent skill filter (#1650 ) * feat(agent): add skills field to AgentConfig and update the lead_agent system prompt Add a skills field to AgentConfig to support configuring the skills available to an agent. Update the lead_agent system prompt template to include available-skill information. * fix: resolve agent skill configuration edge cases and add tests * Update backend/packages/harness/deerflow/agents/lead_agent/prompt.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * refactor(agent): address PR review comments for skills configuration - Add detailed docstring to `skills` field in `AgentConfig` to clarify the semantics of `None` vs `[]`. - Add unit tests in `test_custom_agent.py` to verify `load_agent_config()` correctly parses omitted skills and explicit empty lists. - Fix `test_make_lead_agent_empty_skills_passed_correctly` to include `agent_name` in the runtime config, ensuring it exercises the real code path. * docs: add configuration notes on per-agent skill filtering Add notes to the example config file and the documentation explaining how to restrict the loaded skills through the agent's config.yaml file. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-02 15:02:09 +08:00
totoyang	2d1f90d5dc	feat(tracing): add optional Langfuse support (#1717 ) * feat(tracing): add optional Langfuse support * Fix tracing fail-fast behavior for explicitly enabled providers * fix(lint)	2026-04-02 13:06:10 +08:00
肖	3a672b39c7	Fix/1681 llm call retry handling (#1683 ) * fix(runtime): handle llm call errors gracefully * fix(runtime): preserve graph control flow in llm retry middleware --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>	2026-04-02 10:12:17 +08:00
SHIYAO ZHANG	df5339b5d0	feat(sandbox): truncate oversized bash and read_file tool outputs (#1677 ) * feat(sandbox): truncate oversized bash and read_file tool outputs Long tool outputs (large directory listings, multi-MB source files) can overflow the model's context window. Two new configurable limits: - bash_output_max_chars (default 20000): middle-truncates bash output, preserving both head and tail so stderr at the end is not lost - read_file_output_max_chars (default 50000): head-truncates file output with a hint to use start_line/end_line for targeted reads Both limits are enforced at the tool layer (sandbox/tools.py) rather than middleware, so truncation is guaranteed regardless of call path. Setting either limit to 0 disables truncation entirely. Measured: read_file on a 250KB source file drops from 63,698 tokens to 19,927 tokens (69% reduction) with the default limit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(tests): remove unused pytest import and fix import sort order * style: apply ruff format to sandbox/tools.py * refactor(sandbox): address Copilot review feedback on truncation feature - strict hard cap: while-loop ensures result (including marker) ≤ max_chars - max_chars=0 now returns "" instead of original output - get_app_config() wrapped in try/except with fallback to defaults - sandbox_config.py: add ge=0 validation on truncation limit fields - config.example.yaml: bump config_version 4→5 - tests: add len(result) <= max_chars assertions, edge-case (max=0, small max, various sizes) tests; fix skipped-count test for strict hard cap * refactor(sandbox): replace while-loop truncation with fixed marker budget Use a pre-allocated constant (_MARKER_MAX_LEN) instead of a convergence loop to ensure result <= max_chars. Simpler, safer, and skipped-char count in the marker is now an exact predictable value. 
* refactor(sandbox): compute marker budget dynamically instead of hardcoding * fix(sandbox): make max_chars=0 disable truncation instead of returning empty string --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: JeffJiang <for-eleven@hotmail.com>	2026-04-02 09:22:41 +08:00
LYU Yichen	0a379602b8	fix: avoid treating Feishu file paths as commands (#1654 ) Feishu channel classified any slash-prefixed text (including absolute paths such as /mnt/user-data/...) as a COMMAND, causing them to be misrouted through the command pipeline instead of the chat pipeline. Fix by introducing a shared KNOWN_CHANNEL_COMMANDS frozenset in app/channels/commands.py — the single authoritative source for the set of supported slash commands. Both the Feishu inbound parser and the ChannelManager's unknown-command reply now derive from it, so adding or removing a command requires only one edit. Changes: - app/channels/commands.py (new): defines KNOWN_CHANNEL_COMMANDS - app/channels/feishu.py: replace local KNOWN_FEISHU_COMMANDS with the shared constant; _is_feishu_command() now gates on it - app/channels/manager.py: import KNOWN_CHANNEL_COMMANDS and use it in the unknown-command fallback reply so the displayed list stays in sync - tests/test_feishu_parser.py: parametrize over every entry in KNOWN_CHANNEL_COMMANDS (each must yield msg_type=command) and add parametrized chat cases for /unknown, absolute paths, etc. Made with Cursor Made-with: Cursor Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-01 23:23:00 +08:00
Jason	1fb5acee39	fix(gateway): prevent 400 error when client sends context with configurable (#1660 ) * fix(gateway): prevent 400 error when client sends context with configurable Fixes #1290 LangGraph >= 0.6.0 rejects requests that include both 'configurable' and 'context' in the run config. If the client (e.g. useStream hook) sends a 'context' key, we now honour it and skip creating our own 'configurable' dict to avoid the conflict. When no 'context' is provided, we fall back to the existing 'configurable' behaviour with thread_id. * fix(gateway): address review feedback — warn on dual keys, fix runtime injection, add tests - Log a warning when client sends both 'context' and 'configurable' so it's no longer silently dropped (reviewer feedback) - Ensure thread_id is available in config['context'] when present so middlewares can find it there too - Add test coverage for the context path, the both-keys-present case, passthrough of other keys, and the no-config fallback * style: ruff format services.py --------- Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-01 23:21:32 +08:00
Alian	e97c8c9943	fix(skills): support parsing multiline YAML strings in SKILL.md frontmatter (#1703 ) * fix(skills): support parsing multiline YAML strings in SKILL.md frontmatter * test(skills): add tests for multiline YAML descriptions	2026-04-01 23:08:30 +08:00
rayhpeng	c2ff59a5b1	fix(gateway): merge context field into configurable for langgraph-compat runs (#1699 ) (#1707 ) The langgraph-compat layer dropped the DeerFlow-specific `context` field from run requests, causing agent config (subagent_enabled, is_plan_mode, thinking_enabled, etc.) to fall back to defaults. Add `context` to RunCreateRequest and merge allowlisted keys into config.configurable in start_run, with existing configurable values taking precedence. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 17:17:09 +08:00
Shengyuan Wang	2f3744f807	refactor: replace sync requests with async httpx in Jina AI client (#1603 ) * refactor: replace sync requests with async httpx in Jina AI client Replace synchronous `requests.post()` with `httpx.AsyncClient` in JinaClient.crawl() and make web_fetch_tool async. This is part of the planned async concurrency optimization for the agent hot path (see docs/TODO.md). * fix: address Copilot review feedback on async Jina client - Short-circuit error strings in web_fetch_tool before passing to ReadabilityExtractor, preventing misleading extraction results - Log missing JINA_API_KEY warning only once per process to reduce noise under concurrent async fetching - Use logger.exception instead of logger.error in crawl exception handler to preserve stack traces for debugging - Add async web_fetch_tool tests and warn-once coverage * fix: mock get_app_config in web_fetch_tool tests for CI The web_fetch_tool tests failed in CI because get_app_config requires a config.yaml file that isn't present in the test environment. Mock the config loader to remove the filesystem dependency. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-01 17:02:39 +08:00
AochenShen99	0cdecf7b30	feat(memory): structured reflection + correction detection in MemoryMiddleware (#1620 ) (#1668 ) * feat(memory): add structured reflection and correction detection * fix(memory): align sourceError schema and prompt guidance --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-01 16:45:29 +08:00
LYU Yichen	3e461d9d08	fix: use safe docker bind mount syntax for sandbox mounts (#1655 ) Docker's -v host:container syntax is ambiguous for Windows drive-letter paths (e.g. D:/...) because ':' is both the drive separator and the volume separator, causing mount failures on Windows hosts. Introduce _format_container_mount() which uses '--mount type=bind,...' for Docker (unambiguous on all platforms) and keeps '-v' for Apple Container runtime which does not support the --mount flag yet. Adds unit tests covering Windows paths, read-only mounts, and Apple Container pass-through. Made-with: Cursor	2026-04-01 11:42:12 +08:00
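A rough sketch of the mount-formatting idea described in the commit above, assuming a helper that returns CLI arguments. The function name, its parameters, and the runtime string `"docker"` are illustrative assumptions; the real `_format_container_mount()` may differ. The `:ro` suffix on the `-v` branch follows Docker's `-v` convention and is assumed here to apply to the other runtime as well.

```python
from typing import List


def format_container_mount(
    runtime: str, host_path: str, container_path: str, read_only: bool = False
) -> List[str]:
    """Build bind-mount CLI arguments for a container runtime (hypothetical helper)."""
    if runtime == "docker":
        # --mount uses key=value pairs, so a Windows drive letter such as
        # "D:/..." is not confused with the host:container separator.
        spec = f"type=bind,source={host_path},target={container_path}"
        if read_only:
            spec += ",readonly"
        return ["--mount", spec]
    # Fallback for runtimes without --mount support: classic -v syntax.
    spec = f"{host_path}:{container_path}"
    if read_only:
        spec += ":ro"
    return ["-v", spec]
```

For example, `format_container_mount("docker", "D:/data", "/mnt/user-data", read_only=True)` yields a single `--mount` specification with no colon-separated fields to misparse.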
d 🔹	6ff60f2af1	fix(gateway): forward assistant_id as agent_name in build_run_config (#1667 ) * fix(gateway): forward assistant_id as agent_name in build_run_config Fixes #1644 When the LangGraph Platform-compatible /runs endpoint receives a custom assistant_id (e.g. 'finalis'), the Gateway's build_run_config() silently ignored it — configurable['agent_name'] was never set, so make_lead_agent fell through to the default lead agent and SOUL.md was never loaded. Root cause (introduced in #1403): resolve_agent_factory() correctly falls back to make_lead_agent for all assistant_id values, but build_run_config() had no assistant_id parameter and never injected configurable['agent_name']. The full call chain: POST /runs (assistant_id='finalis') → resolve_agent_factory('finalis') # returns make_lead_agent ✓ → build_run_config(thread_id, ...) # no agent_name injected ✗ → make_lead_agent(config) → cfg.get('agent_name') → None → load_agent_soul(None) → base SOUL.md (doesn't exist) → None Fix: - Add keyword-only parameter to build_run_config(). - When assistant_id is set and differs from 'lead_agent', inject it as configurable['agent_name'] (matching the channel manager's existing _resolve_run_params() logic for IM channels). - Honour an explicit configurable['agent_name'] in the request body; assistant_id mapping only fills the gap when it is absent. - Remove stale log-only branch from resolve_agent_factory(); update docstring to explain the factory/configurable split. 
Tests added (test_gateway_services.py): - Custom assistant_id injects configurable['agent_name'] - 'lead_agent' assistant_id does NOT inject agent_name - None assistant_id does NOT inject agent_name - Explicit configurable['agent_name'] in request is not overwritten - resolve_agent_factory returns make_lead_agent for all inputs * style: format with ruff * fix: validate and normalize assistant_id to prevent path traversal Addresses Copilot review: strip/lowercase/replace underscores and reject names that don't match [a-z0-9-]+, consistent with ChannelManager._normalize_custom_agent_name(). --------- Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>	2026-04-01 11:15:56 +08:00
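The resolution rules listed in the commit above (an explicit `configurable['agent_name']` wins, `'lead_agent'` and `None` inject nothing, and custom IDs are normalized and validated against `[a-z0-9-]+`) can be sketched as below. The function names are hypothetical, and replacing underscores with hyphens is an assumption; the commit only says underscores are replaced.

```python
import re
from typing import Any, Dict, Optional

_AGENT_NAME_RE = re.compile(r"[a-z0-9-]+")


def normalize_agent_name(assistant_id: str) -> str:
    """Strip, lowercase, and replace underscores, then validate the result."""
    name = assistant_id.strip().lower().replace("_", "-")  # hyphen is assumed
    if not _AGENT_NAME_RE.fullmatch(name):
        # Rejects values such as "../etc" that could be used for path traversal.
        raise ValueError(f"invalid agent name: {assistant_id!r}")
    return name


def resolve_agent_name(
    assistant_id: Optional[str], configurable: Dict[str, Any]
) -> Optional[str]:
    """Pick the agent_name to inject, or None to fall back to the default agent."""
    explicit = configurable.get("agent_name")
    if explicit:
        return explicit  # an explicit value in the request is never overwritten
    if not assistant_id or assistant_id == "lead_agent":
        return None
    return normalize_agent_name(assistant_id)
```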
Matt Van Horn	a3bfea631c	fix(sandbox): serialize concurrent exec_command calls in AioSandbox (#1435 ) * fix(sandbox): serialize concurrent exec_command calls in AioSandbox The AIO sandbox container maintains a single persistent shell session that corrupts when multiple exec_command requests arrive concurrently (e.g. when ToolNode issues parallel tool_calls). The corrupted session returns 'ErrorObservation' strings as output, cascading into subsequent commands. Add a threading.Lock to AioSandbox to serialize shell commands. As a secondary defense, detect ErrorObservation in output and retry with a fresh session ID. Fixes #1433 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(sandbox): address Copilot review findings - Fix shell injection in list_dir: use shlex.quote(path) to escape user-provided paths in the find command - Narrow ErrorObservation retry condition from broad substring match to the specific corruption signature to prevent false retries - Improve test_lock_prevents_concurrent_execution: use threading.Barrier to ensure all workers contend for the lock simultaneously - Improve test_list_dir_uses_lock: assert lock.locked() is True during exec_command to verify lock acquisition * style: auto-format with ruff --------- Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:33:35 +08:00
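The two techniques named in the commit above, serializing shell commands with a `threading.Lock` and escaping paths with `shlex.quote`, can be illustrated with a small stand-in class. `SerializedShell` and its `run` callable are hypothetical and do not mirror the actual `AioSandbox` API; the `find` flags are also only illustrative.

```python
import shlex
import threading
from typing import Callable


class SerializedShell:
    """Toy wrapper that lets only one command reach the shell session at a time."""

    def __init__(self, run: Callable[[str], str]) -> None:
        self._run = run  # callable that executes one command string
        self._lock = threading.Lock()

    def exec_command(self, command: str) -> str:
        # Parallel callers queue on the lock instead of interleaving
        # inside the single persistent shell session.
        with self._lock:
            return self._run(command)

    def list_dir(self, path: str) -> str:
        # shlex.quote turns the path into a single shell word, so input
        # such as "a; rm -rf /" cannot inject a second command.
        return self.exec_command(f"find {shlex.quote(path)} -maxdepth 1")
```

A plain `Lock` is enough here because `list_dir` acquires it only once, through `exec_command`; a design that took the lock in both methods would need an `RLock` instead.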
Admire	aae59a8ba8	fix: surface configured sandbox mounts to agents (#1638 ) * fix: surface configured sandbox mounts to agents * fix: address PR review feedback --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-31 22:22:30 +08:00
Admire	3ff15423d6	fix Windows Docker sandbox path mounting (#1634 ) * fix windows docker sandbox paths * fix windows sandbox mount validation * fix backend checks for windows sandbox path PR	2026-03-31 22:19:27 +08:00
Admire	9a557751d6	feat: support memory import and export (#1521 ) * feat: support memory import and export * fix(memory): address review feedback * style: format memory settings page --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-30 17:25:47 +08:00
rayhpeng	34e835bc33	feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403 ) * feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli Implement all core LangGraph Platform API endpoints in the Gateway, allowing it to fully replace the langgraph-cli dev server for local development. This eliminates a heavyweight dependency and simplifies the development stack. Changes: - Add runs lifecycle endpoints (create, stream, wait, cancel, join) - Add threads CRUD and search endpoints - Add assistants compatibility endpoints (search, get, graph, schemas) - Add StreamBridge (in-memory pub/sub for SSE) and async provider - Add RunManager with atomic create_or_reject (eliminates TOCTOU race) - Add worker with interrupt/rollback cancel actions and runtime context injection - Route /api/langgraph/* to Gateway in nginx config - Skip langgraph-cli startup by default (SKIP_LANGGRAPH_SERVER=0 to restore) - Add unit tests for RunManager, SSE format, and StreamBridge * fix: drain bridge queue on client disconnect to prevent backpressure When on_disconnect=continue, keep consuming events from the bridge without yielding, so the worker is not blocked by a full queue. Only on_disconnect=cancel breaks out immediately. 
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix: remove pytest import Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix: Fix default stream_mode to ["values", "messages-tuple"] Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix: Remove unused if_exists field from ThreadCreateRequest Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix: address review comments on gateway LangGraph API - Mount runs.py router in app.py (missing include_router) - Normalize interrupt_before/after "" to node list before run_agent() - Use entry.id for SSE event ID instead of counter - Drain bridge queue on disconnect when on_disconnect=continue - Reuse serialization helper in wait_run() for consistent wire format - Reject unsupported multitask_strategy with 400 - Remove SKIP_LANGGRAPH_SERVER fallback, always use Gateway feat: extract app.state access into deps.py Encapsulate read/write operations for singleton objects (RunManager, StreamBridge, checkpointer) held in app.state into a shared utility, reducing repeated access patterns across router modules. * feat: extract deerflow.runtime.serialization module with tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace duplicated serialization with deerflow.runtime.serialization Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: extract app/gateway/services.py with run lifecycle logic Create a service layer that centralizes SSE formatting, input/config normalization, and run lifecycle management. Router modules will delegate to these functions instead of using private cross-imported helpers. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: wire routers to use services layer, remove cross-module private imports Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: apply ruff formatting to refactored files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(runtime): support LangGraph dev server and add compat route - Enable official LangGraph dev server for local development workflow - Decouple runtime components from agents package for better separation - Provide gateway-backed fallback route when dev server is skipped - Simplify lifecycle management using context manager in gateway * feat(runtime): add Store providers with auto-backend selection - Add async_provider.py and provider.py under deerflow/runtime/store/ - Support memory, sqlite, postgres backends matching checkpointer config - Integrate into FastAPI lifespan via AsyncExitStack in deps.py - Replace hardcoded InMemoryStore with config-driven factory * refactor(gateway): migrate thread management from checkpointer to Store and resolve multiple endpoint failures - Add Store-backed CRUD helpers (_store_get, _store_put, _store_upsert) - Replace checkpoint-scanning search with two-phase strategy: phase 1 reads Store (O(threads)), phase 2 backfills from checkpointer for legacy/LangGraph Server threads with lazy migration - Extend Store record schema with values field for title persistence - Sync thread title from checkpoint to Store after run completion - Fix /threads/{id}/runs/{run_id}/stream 405 by accepting both GET and POST methods; POST handles interrupt/rollback actions - Fix /threads/{id}/state 500 by separating read_config and write_config, adding checkpoint_ns to configurable, and shallow-copying checkpoint/metadata before mutation - Sync title to Store on state update for immediate search reflection - Move _upsert_thread_in_store into services.py, remove duplicate logic - Add _sync_thread_title_after_run: 
await run task, read final checkpoint title, write back to Store record - Spawn title sync as background task from start_run when Store exists * refactor(runtime): deduplicate store and checkpointer provider logic Extract _ensure_sqlite_parent_dir() helper into checkpointer/provider.py and use it in all three places that previously inlined the same mkdir logic. Consolidate duplicate error constants in store/async_provider.py by importing from store/provider.py instead of redefining them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(runtime): move SQLite helpers to runtime/store, checkpointer imports from store _resolve_sqlite_conn_str and _ensure_sqlite_parent_dir now live in runtime/store/provider.py. agents/checkpointer/provider and agents/checkpointer/async_provider import from there, reversing the previous dependency direction (store → checkpointer becomes checkpointer → store). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(runtime): extract SQLite helpers into runtime/store/_sqlite_utils.py Move resolve_sqlite_conn_str and ensure_sqlite_parent_dir out of checkpointer/provider.py into a dedicated _sqlite_utils module. Functions are now public (no underscore prefix), making cross-module imports semantically correct. All four provider files import from the single shared location. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(gateway): use adelete_thread to fully remove thread checkpoints on delete AsyncSqliteSaver has no adelete method — the previous hasattr check always evaluated to False, silently leaving all checkpoint rows in the database. Switch to adelete_thread(thread_id) which deletes every checkpoint and pending-write row for the thread across all namespaces (including sub-graph checkpoints). 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(gateway): remove dead bridge_cm/ckpt_cm code and fix StrEnum lint app.py had unreachable code after the async-with lifespan refactor: bridge_cm and ckpt_cm were referenced but never defined (F821), and the channel service startup/shutdown was outside the langgraph_runtime block so it never ran. Move channel service lifecycle inside the async-with block where it belongs. Replace str+Enum inheritance in RunStatus and DisconnectMode with StrEnum as suggested by UP042. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style: format with ruff --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: JeffJiang <for-eleven@hotmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-30 16:02:23 +08:00
d 🔹	9bcdba6038	fix: promote deferred tools after tool_search returns schema (#1570 ) * fix: promote matched tools from deferred registry after tool_search returns schema After tool_search returns a tool's full schema, the tool is promoted (removed from the deferred registry) so DeferredToolFilterMiddleware stops filtering it from bind_tools on subsequent LLM calls. Without this, deferred tools are permanently filtered — the LLM gets the schema from tool_search but can never invoke the tool because the middleware keeps stripping it. Fixes #1554 * test: add promote() and tool_search promotion tests Tests cover: - promote removes tools from registry - promote nonexistent/empty is no-op - search returns nothing after promote - middleware passes promoted tools through - tool_search auto-promotes matched tools (select + keyword) * fix: address review — lint blank line + empty registry guard - Add missing blank line between FakeRequest methods (E301) - Use 'if not registry' to handle empty registries consistently --------- Co-authored-by: d 🔹 <258577966+voidborne-d@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-30 11:23:15 +08:00
SHIYAO ZHANG	9aa3ff7c48	feat(sandbox): add SandboxAuditMiddleware for bash command security auditing (#1532 ) * feat(sandbox): add SandboxAuditMiddleware for bash command security auditing Addresses the LocalSandbox escape vector reported in #1224 where bash tool calls can execute destructive commands against the host filesystem. - Add SandboxAuditMiddleware with three-tier command classification: - High-risk (block): rm -rf /, curl\|bash, dd if=, mkfs, /etc/shadow access - Medium-risk (warn): pip install, apt install, chmod 777 - Safe (pass): normal workspace operations - Register middleware after GuardrailMiddleware in _build_runtime_middlewares, applied to both lead agent and subagents - Structured audit log via standard logger (visible in langgraph.log) - Medium-risk commands execute but append a warning to the tool result, allowing the LLM to self-correct without blocking legitimate workflows - High-risk commands return an error ToolMessage without calling the handler, so the agent loop continues gracefully * fix(lint): sort imports in test_sandbox_audit_middleware * refactor(sandbox-audit): address Copilot review feedback (3/5/6) - Fix class docstring to match implementation: medium-risk commands are executed with a warning appended (not rejected), and cwd anchoring note removed (handled in a separate PR) - Remove capsys.disabled() from benchmark test to avoid CI log noise; keep assertions for recall/precision targets - Remove misleading 'cwd fix' from test module docstring * test(sandbox-audit): add async tests for awrap_tool_call * fix(sandbox-audit): address Copilot review feedback (1/2) - Narrow rm high-risk regex to only block truly destructive targets (/, /, ~, ~/, /home, /root); legitimate workspace paths like /mnt/user-data/ are no longer false-positived - Handle list-typed ToolMessage content in _append_warn_to_result; append a text block instead of str()-ing the list to avoid breaking structured content normalization * style: apply ruff format to 
sandbox_audit_middleware files * fix(sandbox-audit): update benchmark comment to match assert-based implementation --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-30 07:48:31 +08:00
Markus Corazzione	5ceb19f6f6	fix(oauth): Harden Claude OAuth cache-control handling (#1583 )	2026-03-30 07:41:18 +08:00
Admire	fc7de7fffe	feat: support manual add and edit for memory facts (#1538 ) * feat: support manual add and edit for memory facts * fix: restore memory updater save helper * fix: address memory fact review feedback * fix: remove duplicate memory fact edit action * docs: simplify memory fact review setup * docs: relax memory review startup instructions * fix: clear rebase marker in memory settings page * fix: address memory fact review and format issues * fix: address memory fact review feedback * refactor: make memory fact updates explicit patch semantics --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-29 23:53:23 +08:00
SHIYAO ZHANG	cdb2a3a017	fix(sandbox): anchor relative paths to thread workspace in local mode (#1522 ) * fix(task_tool): fallback to configurable thread_id when context is missing task_tool only read thread_id from runtime.context, but when invoked via LangGraph Server, thread_id lives in config.configurable instead. Add the same fallback that ThreadDataMiddleware uses (PR #1237). Fixes subagent execution failure: 'Thread ID is required in runtime context or config.configurable' * remove debug logging from task_tool * fix(sandbox): anchor relative paths to thread workspace in local mode In local sandbox mode, bash commands using relative paths were resolved against the langgraph server process cwd (backend/) instead of the per-thread workspace directory. This allowed relative-path writes to escape the thread isolation boundary. Root cause: validate_local_bash_command_paths and replace_virtual_paths_in_command only process absolute paths (scanning for '/' prefix). Relative paths pass through untouched and inherit the process cwd at subprocess.run time. Fix: after virtual path translation, prepend `cd {workspace} &&` to anchor the shell's cwd to the thread-isolated workspace directory before execution. shlex.quote() ensures paths with spaces or special characters are handled safely. This mirrors the approach used by OpenHands (fixed cwd at execution layer) and is the correct fix for local mode where each subprocess.run is an independent process with no persistent shell session. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(sandbox): extract _apply_cwd_prefix and add unit tests Extract the workspace cd-prefix logic from bash_tool into a dedicated _apply_cwd_prefix() helper so it can be unit-tested in isolation. Add four tests covering: normal prefix, no thread_data, missing workspace_path, and paths with spaces (shlex.quote). 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * revert: remove unrelated configurable thread_id fallback from sandbox/tools.py This change belongs in a separate PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style: remove trailing whitespace in test_sandbox_tools_security --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-29 23:21:06 +08:00
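The cwd-anchoring fix described in the commit above amounts to prepending a quoted `cd` to the command. A minimal sketch, assuming a simplified signature; the real `_apply_cwd_prefix()` takes thread data rather than a bare path, and `apply_cwd_prefix` here is a hypothetical stand-in.

```python
import shlex
from typing import Optional


def apply_cwd_prefix(command: str, workspace_path: Optional[str]) -> str:
    """Anchor a shell command to the thread workspace, if one is known."""
    if not workspace_path:
        # No workspace information: leave the command unchanged.
        return command
    # shlex.quote keeps paths with spaces or shell metacharacters intact.
    return f"cd {shlex.quote(workspace_path)} && {command}"
```

Because `&&` short-circuits, the original command does not run at all if the `cd` fails, so a missing workspace directory cannot silently fall back to the process cwd.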
Admire	68c9e09a7a	fix: add Windows shell fallback for local sandbox (#1505 ) * fix: add Windows shell fallback for local sandbox * fix: handle PowerShell execution on Windows * fix: handle Windows local shell execution --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-29 21:31:29 +08:00
13ernkastel	92c7a20cb7	[Security] Address critical host-shell escape in LocalSandboxProvider (#1547 ) * fix(security): disable host bash by default in local sandbox * fix(security): address review feedback for local bash hardening * fix(ci): sort live test imports for lint * style: apply backend formatter --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-29 21:03:58 +08:00
DAN	9e5ba74ecd	fix(sandbox): allow MCP filesystem server paths in local bash commands (#1527 ) * feat/bug-fix: copy the allowed path configurations in MCP filesystem tools to bash tool. With updated unit test * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-29 17:10:27 +08:00
greatmengqi	25df82cbfd	style: format unformatted files and add .omc/ to prettierignore (#1539 ) Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>	2026-03-29 16:45:31 +08:00
greatmengqi	084dc7e748	ci: enforce code formatting checks for backend and frontend (#1536 )	2026-03-29 15:34:38 +08:00
greatmengqi	06a623f9c8	feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203 )	2026-03-29 15:31:18 +08:00
Admire	7eb3a150b5	feat: add memory management actions and local filters in memory settings (#1467 ) * Add MVP memory management actions * Fix memory settings locale coverage * Polish memory management interactions * Add memory search and type filters * Refine memory settings review feedback * docs: simplify memory settings review setup * fix: restore memory updater compatibility helpers * fix: address memory settings review feedback * docs: soften memory sample review wording --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: JeffJiang <for-eleven@hotmail.com>	2026-03-29 13:14:45 +08:00
knukn	481494b9c0	feat(client): support custom middleware injection (#1520 ) * feat(client): support custom middleware injection Add support for custom middleware, allowing custom middleware list to be passed when initializing DeerFlowClient. These middleware will be injected after the default middleware when creating the agent, extending the agent's functionality. * feat: inject custom middlewares before ClarificationMiddleware to preserve ordering - Add `custom_middlewares` param to `_build_middlewares` - Inject custom middlewares right before `ClarificationMiddleware` to keep it as the last in the chain - Remove unsafe `.extend()` in `client.py` - Update tests in `test_client.py` and `test_lead_agent_model_resolution.py` to assert correct injection ordering	2026-03-29 11:24:46 +08:00
Nan Gao	89183ae76a	fix(channel): reject concurrent same-thread runs (#1465 ) (#1475 ) * fix(channel): reject concurrent same-thread runs (#1465) * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(lint): sort imports in manager.py and test_channels.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(channel): widen _is_thread_busy_error to BaseException and downgrade busy log to warning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-29 09:55:47 +08:00
DanielWalnut	18e3487888	Support custom channel assistant IDs via lead_agent (#1500 ) * Support custom channel assistant IDs via lead agent * Normalize custom channel agent names	2026-03-28 19:07:38 +08:00
DanielWalnut	c2dd8937ed	Fix IM channel backend URLs in Docker (#1497 ) * Fix IM channel backend URLs in Docker * Address Copilot review comments	2026-03-28 16:37:41 +08:00
taka6745	43ef3691a5	fix(oauth): inject billing header for Claude oAuth Models (#1442 ) * fix(oauth): inject billing header for non-Haiku model access The Anthropic Messages API requires a billing identification block in the system prompt when using Claude Code OAuth tokens (sk-ant-oat) to access non-Haiku models (Opus, Sonnet). Without it, the API returns a generic 400 "Error" with no actionable detail. This was discovered by intercepting Claude Code CLI requests — the CLI injects an `x-anthropic-billing-header` text block as the first system prompt entry on every request. Third-party consumers of the same OAuth tokens must do the same. Changes: - Add `_apply_oauth_billing()` to `ClaudeChatModel` that prepends the billing header block to the system prompt when `_is_oauth` is True - Add `metadata.user_id` with device/session identifiers (required by the API alongside the billing header) - Called from `_get_request_payload()` before prompt caching runs Verified with Claude Max OAuth tokens against all three model tiers: - claude-opus-4-6: 200 OK - claude-sonnet-4-6: 200 OK - claude-haiku-4-5-20251001: 200 OK (was already working) Closes #1245 fix(oauth): address review feedback on billing header injection - Make OAUTH_BILLING_HEADER configurable via ANTHROPIC_BILLING_HEADER env var - Normalize billing block to always be first in system list (strip + reinsert) - Guard metadata with isinstance check for non-dict values - Replace os.uname() with socket.gethostname() for Windows compat - Fix docstrings to say "all OAuth requests" instead of "non-Haiku" - Move inline imports to module level (fixes ruff I001) - Add 9 unit tests for _apply_oauth_billing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 08:49:34 +08:00
Andrew Barnes	50f50d7654	test: add unit tests for skill frontmatter validation (#1309 ) * test: add unit tests for skill frontmatter validation Cover _validate_skill_frontmatter logic: - Valid minimal and full-field skills - Missing SKILL.md, missing frontmatter, invalid YAML - Required field validation (name, description) - Unexpected key rejection - Name format: hyphen-case, no leading/trailing/consecutive hyphens - Name and description length limits - Angle bracket rejection in description * test: fix unused variables flagged by ruff F841 Replace unused tuple elements with _ and add assertions on msg/name return values in success-path tests. * test: address review feedback on unused variables * test: consolidate validation tests into single module Move the UTF-8/windows-locale test from test_skills_router.py into test_skills_validation.py and remove test_skills_router.py to eliminate duplicated assertions and future maintenance drift. * fix: match assertion strings to actual validation messages --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-27 20:20:31 +08:00
DanielWalnut	8590249db4	feat(acp): add env field to ACPAgentConfig for subprocess env injection (#1447 ) Allow per-agent environment variables to be declared in config.yaml under acp_agents.<name>.env. Values prefixed with $ are resolved from the host environment at invocation time, consistent with other config fields. Passes None to spawn_agent_process when env is empty so the subprocess inherits the parent environment unchanged. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:03:30 +08:00
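The env handling described in the commit above (values prefixed with `$` resolved from the host environment, and `None` passed when `env` is empty) might look roughly like this. `resolve_agent_env` is a hypothetical name, and substituting an empty string for an unset host variable is an assumption; the commit does not say how missing variables are handled.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_agent_env(
    env: Optional[Mapping[str, str]],
    host_env: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Resolve "$NAME" values from the host environment at invocation time."""
    if not env:
        # None tells the spawner to inherit the parent environment unchanged.
        return None
    host = os.environ if host_env is None else host_env
    resolved: Dict[str, str] = {}
    for key, value in env.items():
        if value.startswith("$"):
            # Assumption: an unset host variable resolves to an empty string.
            resolved[key] = host.get(value[1:], "")
        else:
            resolved[key] = value
    return resolved
```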
Admire	40a4acbbed	fix(sandbox): Relax upload permissions for aio sandbox sync (#1409 ) * Relax upload permissions for aio sandbox sync * Harden upload permission sync checks --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-27 17:37:44 +08:00
luo jiyin	43a19f9627	fix(task): avoid blocking in task tool polling (#1320 ) * fix: avoid blocking in task tool polling * test: adapt task tool polling tests for async tool * fix: clean up cancelled task tool polling --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-27 17:12:40 +08:00
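knukn	1c542ab7f1	feat(memory): Introduce configurable memory storage abstraction (#1353 ) * feat(memory storage): add configurable memory storage provider support. Implement the MemoryStorage abstract base class for memory storage and the FileMemoryStorage file-based implementation. Refactor memory data loading and saving logic into the storage provider. Add a storage_class config option to support custom storage providers * refactor(memory): refactor the memory storage module and update related tests. Move memory storage logic from the updater module to a standalone storage module. Use the storage interface pattern instead of direct file operations. Update all related tests to use the new storage interface * Update backend/packages/harness/deerflow/agents/memory/storage.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/packages/harness/deerflow/agents/memory/storage.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(memory storage): add a thread-safety lock and more test cases. Add a thread lock to make initialization of the memory storage singleton thread-safe. Add validation tests for invalid agent names. Add test cases for singleton thread safety and exception handling * Update backend/tests/test_memory_storage.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(agents): validate agent names with a unified pattern. Change agent name validation to use the AGENT_NAME_PATTERN defined in the repository, keeping the codebase consistent and preventing security issues such as path traversal. Also update the test cases to cover more invalid names. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-27 07:41:06 +08:00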
13ernkastel	0d3cefaa5a	fix(gateway): enforce safe download for active artifact MIME types to mitigate stored XSS (#1389 ) * docs: refocus security review on high-confidence artifact XSS * fix(gateway): block inline active-content artifacts to mitigate XSS * chore: remove security review markdown from PR * Delete SECURITY_REVIEW.md * fix(gateway): harden artifact attachment handling	2026-03-26 17:44:25 +08:00
Admire	b9583f7204	Fix Windows backend test compatibility (#1384 ) * Fix Windows backend test compatibility * Preserve ACP path style on Windows * Fix installer import ordering * Address review comments for Windows fixes --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-26 17:39:16 +08:00
Willem Jiang	a087fe7bcc	fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180 ) (#1205 ) * fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180) When using Gemini with thinking enabled through an OpenAI-compatible gateway, the API requires that fields on thinking content blocks are preserved and echoed back verbatim in subsequent requests. Standard silently drops these signatures when serializing messages, causing HTTP 400 errors: Changes: - Add PatchedChatOpenAI adapter that re-injects signed thinking blocks into request payloads, preserving the signature chain across multi-turn conversations with tool calls. - Support two LangChain storage patterns: additional_kwargs.thinking_blocks and content list. - Add 11 unit tests covering signed/unsigned blocks, storage patterns, edge cases, and precedence rules. - Update config.example.yaml with Gemini + thinking gateway example. - Update CONFIGURATION.md with detailed guidance and error explanation. Fixes: #1180 * Updated the patched_openai.py with thought_signature of function call * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * docs: fix inaccurate thought_signature description in CONFIGURATION.md (#1220) * Initial plan * docs: fix CONFIGURATION.md wording for thought_signature - tool-call objects, not thinking blocks Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/360f5226-4631-48a7-a050-189094af8ffe --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>	2026-03-26 15:07:05 +08:00
Admire	080a03f3bc	fix(config): fix summarization model alias resolution (#1378 ) Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-26 14:48:45 +08:00
DanielWalnut	d119214fee	feat(harness): integration ACP agent tool (#1344 ) * refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.) and app (app.) Physically split the monolithic backend/src/ package into two layers: - Harness (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - App* (`app/`): unpublished application code with import prefix `app.`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src. → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.` or `import app.` statement is found. 
This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. 
config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow.*/app.* after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(harness): add tool-first ACP agent invocation (#37) * feat(harness): add tool-first ACP agent invocation * build(harness): make ACP dependency required * fix(harness): address ACP review feedback * feat(harness): decouple ACP agent workspace from thread data ACP agents (codex, claude-code) previously used per-thread workspace directories, causing path resolution complexity and coupling task execution to DeerFlow's internal thread data layout. This change: - Replace _resolve_cwd() with a fixed _get_work_dir() that always uses {base_dir}/acp-workspace/, eliminating virtual path translation and thread_id lookups - Introduce /mnt/acp-workspace virtual path for lead agent read-only access to ACP agent output files (same pattern as /mnt/skills) - Add security guards: read-only validation, path traversal prevention, command path allowlisting, and output masking for acp-workspace - Update system prompt and tool description to guide LLM: send self-contained tasks to ACP agents, copy results via /mnt/acp-workspace - Add 11 new security tests for ACP workspace path handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(prompt): inject ACP section only when ACP agents are configured The ACP agent guidance in the system prompt is now conditionally built by _build_acp_section(), which checks get_acp_agents() and returns an empty string when no ACP agents are configured. This avoids polluting the prompt with irrelevant instructions for users who don't use ACP. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix lint * fix(harness): address Copilot review comments on sandbox path handling and ACP tool - local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/") and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching inside /mnt/skills-extra - local_sandbox_provider: replace print() with logger.warning(..., exc_info=True) - invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue; move full prompt from INFO to DEBUG level (truncated to 200 chars) - sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation; remove misleading "read-only" language from validate_local_bash_command_paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry P1.1 – ACP workspace thread isolation - Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths - `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses `{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to global workspace when thread_id is absent or invalid - `_invoke` extracts thread_id from `RunnableConfig` via `Annotated[RunnableConfig, InjectedToolArg]` - `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`, `_resolve_acp_workspace_path(path, thread_id)`, and all callers (`replace_virtual_paths_in_command`, `mask_local_paths_in_output`, `ls_tool`, `read_file_tool`) now resolve ACP paths per-thread P1.2 – ACP permission guardrail - New `auto_approve_permissions: bool = False` field in `ACPAgentConfig` - `_build_permission_response(options, *, auto_approve: bool)` now defaults to deny; only approves when `auto_approve=True` - Document field in `config.example.yaml` P2 – Deferred tool registry race condition - Replace module-level `_registry` global with `contextvars.ContextVar` - Each asyncio request context gets its own registry; worker threads inherit 
the context automatically via `loop.run_in_executor` - Expose `get_deferred_registry` / `set_deferred_registry` / `reset_deferred_registry` helpers Tests: 831 pass (57 for affected modules, 3 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> fix(sandbox): mount /mnt/acp-workspace in docker sandbox container The AioSandboxProvider was not mounting the ACP workspace into the sandbox container, so /mnt/acp-workspace was inaccessible when the lead agent tried to read ACP results in docker mode. Changes: - `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so the directory exists before the sandbox container starts — required for Docker volume mounts - `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`) - Update stale CLAUDE.md description (was "fixed global workspace") Tests: `test_aio_sandbox_provider.py` (4 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): remove unused imports in test_aio_sandbox_provider Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix config --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 14:20:18 +08:00
Andrew Barnes	ac97dc6d42	test: add unit tests for TodoMiddleware (#1307 ) * test: add unit tests for TodoMiddleware Cover context-loss detection logic: - _todos_in_messages and _reminder_in_messages helpers - _format_todos formatting - Reminder injection when write_todos truncated - No-op when todos visible or reminder already present - abefore_model async delegation * test: fix event loop error in todo middleware async test Use asyncio.run() instead of get_event_loop().run_until_complete() to avoid RuntimeError on Python 3.12 where no default event loop exists in the main thread. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-26 00:20:50 +08:00
Andrew Barnes	1f0ae64e02	test: add unit tests for DanglingToolCallMiddleware (#1305 ) * test: add unit tests for DanglingToolCallMiddleware Cover message patching logic for dangling tool calls: - No-op when all tool calls have responses - Synthetic ToolMessage insertion at correct positions - Mixed responded/dangling scenarios - wrap_model_call and awrap_model_call integration * test: fix async tests and strengthen override assertions - Use @pytest.mark.anyio + async def instead of deprecated asyncio.get_event_loop().run_until_complete() (fixes Py3.12 CI failure) - Assert that override() receives the correct patched messages kwarg in both wrap_model_call and awrap_model_call tests --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-26 00:20:08 +08:00
greatmengqi	b8bc80d89b	refactor: extract shared skill installer and upload manager to harness (#1202 ) * refactor: extract shared skill installer and upload manager to harness Move duplicated business logic from Gateway routers and Client into shared harness modules, eliminating code duplication. New shared modules: - deerflow.skills.installer: 6 functions (zip security, extraction, install) - deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate, list, delete, get_uploads_dir, ensure_uploads_dir) Key improvements: - SkillAlreadyExistsError replaces stringly-typed 409 status routing - normalize_filename rejects backslash-containing filenames - Read paths (list/delete) no longer mkdir via get_uploads_dir - Write paths use ensure_uploads_dir for explicit directory creation - list_files_in_dir does stat inside scandir context (no re-stat) - install_skill_from_archive uses single is_file() check (one syscall) - Fix agent config key not reset on update_mcp_config/update_skill Tests: 42 new (22 installer + 20 upload manager) + client hardening * refactor: centralize upload URL construction and clean up installer - Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing() into shared manager.py, eliminating 6 duplicated URL constructions across Gateway router and Client - Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of hardcoded "mnt/user-data/uploads" strings - Eliminate TOCTOU pre-checks and double file read in installer — single ZipFile() open with exception handling replaces is_file() + is_zipfile() + ZipFile() sequence - Add missing re-exports: ensure_uploads_dir in uploads/__init__.py, SkillAlreadyExistsError in skills/__init__.py - Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS - Hoist sandbox_uploads_dir(thread_id) before loop in uploads router * fix: add input validation for thread_id and filename length - Reject thread_id containing unsafe filesystem characters (only allow 
alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs like <script> or shell metacharacters - Reject filenames longer than 255 bytes (OS limit) in normalize_filename - Gateway upload router maps ValueError to 400 for invalid thread_id * fix: address PR review — symlink safety, input validation coverage, error ordering - list_files_in_dir: use follow_symlinks=False to prevent symlink metadata leakage; check is_dir() instead of exists() for non-directory paths - install_skill_from_archive: restore is_file() pre-check before extension validation so error messages match the documented exception contract - validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so all entry points (upload/list/delete) are protected - delete_uploaded_file: catch ValueError from thread_id validation (was 500) - requires_llm marker: also skip when OPENAI_API_KEY is unset - e2e fixture: update TitleMiddleware exclusion comment (kept filtering — middleware triggers extra LLM calls that add non-determinism to tests) * chore: revert uv.lock to main — no dependency changes in this PR * fix: use monkeypatch for global config in e2e fixture to prevent test pollution The e2e_env fixture was calling set_title_config() and set_summarization_config() directly, which mutated global singletons without automatic cleanup. When pytest ran test_client_e2e.py before test_title_middleware_core_logic.py, the leaked enabled=False caused 5 title tests to fail in CI. Switched to monkeypatch.setattr on the module-level private variables so pytest restores the originals after each test. * fix: address code review — URL encoding, API consistency, test isolation - upload_artifact_url: percent-encode filename to handle spaces/#/? 
- deduplicate_filename: mutate seen set in place (caller no longer needs manual .add() — less error-prone API) - list_files_in_dir: document that size is int, enrich stringifies - e2e fixture: monkeypatch _app_config instead of set_app_config() to prevent global singleton pollution (same pattern as title/summarization fix) - _make_e2e_config: read LLM connection details from env vars so external contributors can override defaults - Update tests to match new deduplicate_filename contract * docs: rewrite RFC in English and add alternatives/breaking changes sections * fix: address code review feedback on PR #1202 - Rename deduplicate_filename to claim_unique_filename to make the in-place set mutation explicit in the function name - Replace PermissionError with PathTraversalError(ValueError) for path traversal detection — malformed input is 400, not 403 * fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>	2026-03-25 16:28:33 +08:00
Andrew Barnes	ec46ae075d	test: add unit tests for SubagentLimitMiddleware (#1306 ) * test: add unit tests for SubagentLimitMiddleware Cover subagent limit enforcement: - _clamp_subagent_limit boundary clamping - Task call truncation when exceeding limit - Non-task tool calls preserved during truncation - after_model/aafter_model delegation * Update test_subagent_limit_middleware.py * Fix import statement for MAX_CONCURRENT_SUBAGENTS --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-25 10:20:16 +08:00
Andrew Barnes	afb0f66c73	test: add unit tests for skills parser (#1308 ) Cover parse_skill_file logic: - Valid SKILL.md parsing with all fields - Missing required fields (name, description) return None - Missing/wrong filename returns None - Optional license field handling - Custom and default relative_path behavior - Colons in description values - Empty front matter handling Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-25 10:17:40 +08:00
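
Note on d119214fee: the "path-segment boundary" fix described there (== or startswith +"/") can be sketched as below. This is a minimal illustration only, not the project's code; the helper name is_under_prefix is hypothetical, and the real _resolve_path may differ.

```python
def is_under_prefix(path: str, prefix: str) -> bool:
    """Return True only if `path` is `prefix` itself or lies below it.

    A bare startswith(prefix) would wrongly accept /mnt/skills-extra
    for the prefix /mnt/skills; requiring an exact match or a "/"
    right after the prefix enforces a path-segment boundary.
    (Hypothetical helper, assumed for illustration.)
    """
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


if __name__ == "__main__":
    print(is_under_prefix("/mnt/skills/a.md", "/mnt/skills"))
    print(is_under_prefix("/mnt/skills-extra/a.md", "/mnt/skills"))
```

The first call matches because the prefix is followed by a "/"; the second does not, which is the case the commit message names.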


219 Commits