* feat(channels): add Discord channel integration
Add a Discord bot channel following the existing Telegram/Slack pattern.
The bot listens for messages, creates conversation threads, and relays
responses back to Discord with 2000-char message splitting.
- DiscordChannel extends Channel base class
- Lazy imports discord.py with install hint
- Thread-based conversations (each Discord thread maps to a DeerFlow thread)
- Allowed guilds filter for access control
- File attachment support via discord.File
- Registered in service.py and manager.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(channels): address Copilot review suggestions for Discord integration
- Disable @everyone/@here mentions via AllowedMentions.none()
- Add 10s timeout to client close to prevent shutdown hangs
- Log publish_inbound errors via future callback instead of silently dropping
- Open file handle on caller thread to avoid cross-thread ownership issues
- Notify user in channel when thread creation fails
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(discord): resolve lint errors in Discord channel
- Replace asyncio.TimeoutError with builtin TimeoutError (UP041)
- Remove extraneous f-string prefix (F541)
- Apply ruff format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tests): remove fake langgraph_sdk shim from test_discord_channel
The module-level sys.modules.setdefault shim installed a fake
langgraph_sdk.errors.ConflictError during pytest collection. Because
pytest imports all test modules before running them, test_channels.py
then imported the fake ConflictError instead of the real one.
In test_handle_feishu_stream_conflict_sends_busy_message, the test
constructs ConflictError(message, response=..., body=...). The fake
only subclasses Exception (which takes no kwargs), so the construction
raised TypeError. The manager's _is_thread_busy_error check then saw a
TypeError instead of a ConflictError and fell through to the generic
'An error occurred' message.
langgraph_sdk is a real dependency, so the shim is unnecessary.
Removing it makes both test files import the same real ConflictError
and the full suite pass (1773 passed, 15 skipped).
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox
In LocalSandbox mode, read_file and write_file now transform
container paths in file content, matching the path handling
behavior of bash tool.
- write_file: resolves virtual paths in content to system paths
before writing, so scripts with /mnt/user-data paths work
when executed
- read_file: reverse-resolves system paths back to virtual
paths in returned content for consistency
This fixes scenarios where agents write Python scripts with
virtual paths, then execute them via bash tool expecting the
paths to work.
Fixes#1778
* fix(sandbox): address Copilot review — dedicated content resolver + forward-slash safety + tests
- Extract _resolve_paths_in_content() separate from _resolve_paths_in_command()
to decouple file-content path resolution from shell-command parsing
- Normalize resolved paths to forward slashes to avoid Windows backslash
escape issues in source files (e.g. \U in Python string literals)
- Add 4 focused tests: write resolves content, forward-slash guarantee,
read reverse-resolves content, and write→read roundtrip
* style: fix ruff lint — remove extraneous f-string prefix
* fix(sandbox): only reverse-resolve paths in agent-written files
read_file previously applied _reverse_resolve_paths_in_output to ALL
file content, which could silently rewrite paths in user uploads and
external tool output (Willem Jiang review on #1935).
Now tracks files written through write_file in _agent_written_paths.
Only those files get reverse-resolved on read. Non-agent files are
returned as-is.
---------
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
* fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware
The existing hash-based loop detection only catches identical tool call
sets. When the agent calls the same tool type (e.g. read_file) on many
different files, each call produces a unique hash and bypasses detection.
This causes the agent to exhaust recursion_limit, consuming 150K-225K
tokens per failed run.
Add a second detection layer that tracks cumulative call counts per tool
type per thread. Warns at 30 calls (configurable) and forces stop at 50.
The hard stop message now uses the actual returned message instead of a
hardcoded constant, so both hash-based and frequency-based stops produce
accurate diagnostics.
Also fix _apply() to use the warning message returned by
_track_and_check() for hard stops, instead of always using _HARD_STOP_MSG.
Closes#1987
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(lint): remove unused imports and fix line length
- Remove unused _TOOL_FREQ_HARD_STOP_MSG and _TOOL_FREQ_WARNING_MSG
imports from test file (F401)
- Break long _TOOL_FREQ_WARNING_MSG string to fit within 240 char limit (E501)
* style: apply ruff format
* test: add LRU eviction and per-thread reset coverage for frequency state
Address review feedback from @WillemJiang:
- Verify _tool_freq and _tool_freq_warned are cleaned on LRU eviction
- Add test for reset(thread_id=...) clearing only the target thread's
frequency state while leaving others intact
* fix(makefile): route Windows shell-script targets through Git Bash (#2060)
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Asish Kumar <87874775+officialasishkumar@users.noreply.github.com>
* fix: improve sandbox security and preserve multimodal content
* Add unit test modifications for test_injects_uploaded_files_tag_into_list_content
* format updated_content
* Add regression tests for multimodal upload content and host bash default safety
* feat(provisioner): add optional PVC support for sandbox volumes (#1978)
Add SKILLS_PVC_NAME and USERDATA_PVC_NAME env vars to allow sandbox
Pods to use PersistentVolumeClaims instead of hostPath volumes. This
prevents data loss in production when pods are rescheduled across nodes.
When USERDATA_PVC_NAME is set, a subPath of threads/{thread_id}/user-data
is used so a single PVC can serve multiple threads. Falls back to hostPath
when the new env vars are not set, preserving backward compatibility.
* add unit test for provisioner pvc volumes
* refactor: extract shared provisioner_module fixture to conftest.py
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e7ccf708-c6ba-40e4-844a-b526bdb249dd
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)
DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values",
"custom"] which only delivers full-state snapshots at graph-node
boundaries, so AI replies were dumped as a single messages-tuple event
per node instead of streaming token-by-token. `client.stream("hello")`
looked identical to `client.chat("hello")` — the bug reported in #1969.
Subscribe to "messages" mode as well, forward AIMessageChunk deltas as
messages-tuple events with delta semantics (consumers accumulate by id),
and dedup the values-snapshot path so it does not re-synthesize AI
text that was already streamed. Introduce a per-id usage_metadata
counter so the final AIMessage in the values snapshot and the final
"messages" chunk — which carry the same cumulative usage — are not
double-counted.
chat() now accumulates per-id deltas and returns the last message's
full accumulated text. Non-streaming mock sources (single event per id)
are a degenerate case of the same logic, keeping existing callers and
tests backward compatible.
Verified end-to-end against a real LLM: a 15-number count emits 35
messages-tuple events with BPE subword boundaries clearly visible
("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across
the window, end-event usage matches the values-snapshot usage exactly
(not doubled). tests/test_client_live.py::TestLiveStreaming passes.
New unit tests:
- test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3
delta events with correct content/id/usage, values-snapshot does not
duplicate, usage counted once.
- test_chat_accumulates_streamed_deltas: chat() rebuilds full text
from deltas.
- test_messages_mode_tool_message: ToolMessage delivered via messages
mode is not duplicated by the values-snapshot synthesis path.
The stream() docstring now documents why this client does not reuse
Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw
LangChain objects vs serialized dicts, single caller vs HTTP fan-out).
Fixes#1969
* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)
Post-review cleanup for the token-level streaming fix. No behavior
change for correct inputs; one efficiency regression fixed.
Fix: chat() O(n²) accumulator
-----------------------------
`chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`,
which is O(n) per concat → O(n²) total over a streamed response. At
~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks
it costs roughly 100-300 ms of pure copying. Switched to
`dict[str, list[str]]` + `"".join()` once at return.
Cleanup
-------
- Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`,
and `_tool_message_event` static helpers. The messages-mode and
values-mode branches previously repeated four inline dict literals each;
they now call the same builders.
- `StreamEvent.type` is now typed as `Literal["values", "messages-tuple",
"custom", "end"]` via a `StreamEventType` alias. Makes the closed set
explicit and catches typos at type-check time.
- Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`,
`.tool_calls`, `.id` all have default values on the base class, so the
`getattr(..., None)` fallbacks were dead code. Removed from the hot
path.
- `_account_usage` parameter type loosened to `Any` so that LangChain's
`UsageMetadata` TypedDict is accepted under strict type checking.
- Trimmed narrating comments on `seen_ids` / `streamed_ids` / the
values-synthesis skip block; kept the non-obvious ones that document
the cross-mode dedup invariant.
Net diff: -15 lines. All 132 unit tests + harness boundary test still
pass; ruff check and ruff format pass.
* docs(backend): add STREAMING.md design note (#1969)
Dedicated design document for the token-level streaming architecture,
prompted by the bug investigation in #1969.
Contents:
- Why two parallel streaming paths exist (Gateway HTTP/async vs
DeerFlowClient sync/in-process) and why they cannot be merged.
- LangGraph's three-layer mode naming (Graph "messages" vs Platform
SDK "messages-tuple" vs HTTP SSE) and why a shared string constant
would be harmful.
- Gateway path: run_agent + StreamBridge + sse_consumer with a
sequence diagram.
- DeerFlowClient path: sync generator + direct yield, delta semantics,
chat() accumulator.
- Why the three id sets (seen_ids / streamed_ids / counted_usage_ids)
each carry an independent invariant and cannot be collapsed.
- End-to-end sequence for a real conversation turn.
- Lessons from #1969: why mock-based tests missed the bug, why
BPE subword boundaries in live output are the strongest
correctness signal, and the regression test that locks it in.
- Source code location index.
Also:
- Link from backend/CLAUDE.md Embedded Client section.
- Link from backend/docs/README.md under Feature Documentation.
* test(backend): add refactor regression guards for stream() (#1969)
Three new tests in TestStream that lock the contract introduced by
PR #1974 so any future refactor (sync->async migration, sharing a
core with Gateway's run_agent, dedup strategy change) cannot
silently change behavior.
- test_dedup_requires_messages_before_values_invariant: canary that
documents the order-dependence of cross-mode dedup. streamed_ids
is populated only by the messages branch, so values-before-messages
for the same id produces duplicate AI text events. Real LangGraph
never inverts this order, but a refactor that does (or that makes
dedup idempotent) must update this test deliberately.
- test_messages_mode_golden_event_sequence: locks the *exact* event
sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1
end) for a canonical streaming turn. List equality gives a clear
diff on any drift in order, type, or payload shape.
- test_chat_accumulates_in_linear_time: perf canary for the O(n^2)
fix in commit 1f11ba10. 10,000 single-char chunks must accumulate
in under 1s; the threshold is wide enough to pass on slow CI but
tight enough to fail if buffer = buffer + delta is restored.
All three tests pass alongside the existing 12 TestStream tests
(15/15). ruff check + ruff format clean.
* docs(backend): clarify stream() docstring on JSON serialization (#1969)
Replace the misleading "raw LangChain objects (AIMessage,
usage_metadata as dataclasses), not dicts" claim in the
"Why not reuse Gateway's run_agent?" section. The implementation
already yields plain Python dicts (StreamEvent.data is dict, and
usage_metadata is a TypedDict), so the original wording suggested
a richer return type than the API actually delivers.
The corrected wording focuses on what is actually true and
relevant: this client skips the JSON/SSE serialization layer that
Gateway adds for HTTP wire transmission, and yields stream event
payloads directly as Python data structures.
Addresses Copilot review feedback on PR #1974.
* test(backend): document none-id messages dedup limitation (#1969)
Add test_none_id_chunks_produce_duplicates_known_limitation to
TestStream that explicitly documents and asserts the current
behavior when an LLM provider emits AIMessageChunk with id=None
(vLLM, certain custom backends).
The cross-mode dedup machinery cannot record a None id in
streamed_ids (guarded by ``if msg_id:``), so the values snapshot's
reassembled AIMessage with a real id falls through and synthesizes
a duplicate AI text event. The test asserts len == 2 and locks
this as a known limitation rather than silently letting future
contributors hit it without context.
Why this is documented rather than fixed:
* Falling back to ``metadata.get("id")`` does not help — LangGraph's
messages-mode metadata never carries the message id.
* Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the
values snapshot uses the same fallback, which it does not.
* A real fix requires provider cooperation (always emit chunk ids)
or content-based dedup (false-positive risk), neither of which
belongs in this PR.
If a real fix lands, replace this test with a positive assertion
that dedup works for None-id chunks.
Addresses Copilot review feedback on PR #1974 (client.py:515).
* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)
- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)
Closes#1940
* docs: clarify deployment sizing guidance (#1963)
* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)
After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).
This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.
* fix(frontend): avoid using route new as thread id (#1967)
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(backend): remove dead getattr in _tool_message_event
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(config): add when_thinking_disabled support for model configs
Allow users to explicitly configure what parameters are sent to the
model when thinking is disabled, via a new `when_thinking_disabled`
field in model config. This mirrors the existing `when_thinking_enabled`
pattern and takes full precedence over the hardcoded disable behavior
when set. Backwards compatible — existing configs work unchanged.
Closes#1675
* fix(config): address copilot review — gate when_thinking_disabled independently
- Switch truthiness check to `is not None` so empty dict overrides work
- Restructure disable path so when_thinking_disabled is gated independently
of has_thinking_settings, allowing it to work without when_thinking_enabled
- Update test to reflect new behavior
* feat: implement full checkpoint rollback on user cancellation
- Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes
- Add _rollback_to_pre_run_checkpoint() function to restore thread state
- Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods
- Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes
- Remove obsolete TODO comment (Phase 2) as rollback is now complete
This resolves the TODO(Phase 2) comment and enables full thread state
restoration when a run is cancelled by the user.
* fix: address rollback review feedback
* fix: strengthen checkpoint rollback validation and error handling
- Validate restored_config structure and checkpoint_id before use
- Raise RuntimeError on malformed pending_writes instead of silent skip
- Normalize None checkpoint_ns to empty string instead of "None"
- Move delete_thread to only execute when pre_run_snapshot is None
- Add docstring noting non-atomic rollback as known limitation
This addresses review feedback on PR #1867 regarding data integrity
in the checkpoint rollback implementation.
* test: add comprehensive coverage for checkpoint rollback edge cases
- test_rollback_restores_snapshot_without_deleting_thread
- test_rollback_deletes_thread_when_no_snapshot_exists
- test_rollback_raises_when_restore_config_has_no_checkpoint_id
- test_rollback_normalizes_none_checkpoint_ns_to_root_namespace
- test_rollback_raises_on_malformed_pending_write_not_a_tuple
- test_rollback_raises_on_malformed_pending_write_non_string_channel
- test_rollback_propagates_aput_writes_failure
Covers all scenarios from PR #1867 review feedback.
* test: format rollback worker tests
* fix(sandbox): add startup reconciliation to prevent orphaned container leaks
Sandbox containers were never cleaned up when the managing process restarted,
because all lifecycle tracking lived in in-memory dictionaries. This adds
startup reconciliation that enumerates running containers via `docker ps` and
either destroys orphans (age > idle_timeout) or adopts them into the warm pool.
Closes#1972
* fix(sandbox): address Copilot review — adopt-all strategy, improved error handling
- Reconciliation now adopts all containers into warm pool unconditionally,
letting the idle checker decide cleanup. Avoids destroying containers
that another concurrent process may still be using.
- list_running() logs stderr on docker ps failure and catches
FileNotFoundError/OSError.
- Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP.
- E2E test docstring corrected to match actual coverage scope.
* fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene
- _reconcile_orphans(): merge check-and-insert into a single lock acquisition
per container to eliminate the TOCTOU window.
- list_running(): batch the per-container docker inspect into a single call.
Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect).
Parse port and created_at from the inspect JSON payload.
- Extract _parse_docker_timestamp() and _extract_host_port() as module-level
pure helpers and test them directly.
- Move datetime/json imports to module top level.
- _make_provider_for_reconciliation(): document the __new__ bypass and the
lockstep coupling to AioSandboxProvider.__init__.
- Add assertion that list_running() makes exactly ONE inspect call.
When a model config includes `reasoning_effort` as an extra YAML field
(ModelConfig uses `extra="allow"`), and the thinking-disabled code path
also injects `reasoning_effort="minimal"` into kwargs, the previous
`model_class(**kwargs, **model_settings_from_config)` call raises:
TypeError: got multiple values for keyword argument 'reasoning_effort'
Fix by merging the two dicts before instantiation, giving runtime kwargs
precedence over config values: `{**model_settings_from_config, **kwargs}`.
Fixes#1977
Co-authored-by: octo-patch <octo-patch@github.com>
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995)
Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON
strings instead of native arrays. Add defensive type checking in
_format_clarification_message() to deserialize string options before
iteration, preventing per-character rendering.
* fix(middleware): normalize options after JSON deserialization
Address Copilot review feedback:
- Add post-deserialization normalization so options is always a list
(handles json.loads returning a scalar string, dict, or None)
- Add test for JSON-encoded scalar string ("development")
- Fix test_json_string_with_mixed_types to use actual mixed types
* feat(community): add Exa search as community tool provider
Add Exa (exa.ai) as a new community search provider alongside Tavily,
Firecrawl, InfoQuest, and Jina AI. Exa is an AI-native search engine
with neural, keyword, and auto search types.
New files:
- community/exa/tools.py: web_search_tool and web_fetch_tool
- tests/test_exa_tools.py: 10 unit tests with mocked Exa client
Changes:
- pyproject.toml: add exa-py dependency
- config.example.yaml: add commented-out Exa configuration examples
Usage: set `use: deerflow.community.exa.tools:web_search_tool` in
config.yaml and provide EXA_API_KEY.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(community): address PR review comments for Exa tools
- Make _get_exa_client() accept tool_name param so web_fetch reads its own config
- Remove __init__.py to match namespace package pattern of other providers
- Add duplicate tool name warning in config.example.yaml
- Add regression tests for web_fetch config resolution
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update revision in uv.lock to 3
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(backend): make loop detection hash tool calls by stable keys
The loop detection middleware previously hashed full tool call arguments,
which made repeated calls look different when only non-essential argument
details changed. In particular, `read_file` calls with nearby line ranges
could bypass repetition detection even when the agent was effectively
reading the same file region again and again.
- Hash tool calls using stable keys instead of the full raw args payload
- Bucket `read_file` line ranges so nearby reads map to the same region key
- Prefer stable identifiers such as `path`, `url`, `query`, or `command`
before falling back to JSON serialization of args
- Keep hashing order-independent so the same tool call set produces the
same hash regardless of call order
Fixes#1905
* fix(backend): harden loop detection hash normalization
- Normalize and parse stringified tool args defensively
- Expand stable key derivation to include pattern, glob, and cmd
- Normalize reversed read_file ranges before bucketing
Fixes#1905
* fix(backend): harden loop detection tool format
* exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged.
---------
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* fix(subagents): add cooperative cancellation for subagent threads
Subagent tasks run inside ThreadPoolExecutor threads with their own
event loop (asyncio.run). When a user clicks stop, RunManager cancels
the parent asyncio.Task, but Future.cancel() cannot terminate a running
thread and asyncio.Event does not propagate across event loops. This
causes subagent threads to keep executing (writing files, calling LLMs)
even after the user explicitly stops the run.
Fix: add a threading.Event (cancel_event) to SubagentResult and check
it cooperatively in _aexecute()'s astream iteration loop. On cancel,
request_cancel_background_task() sets the event, and the thread exits
at the next iteration boundary.
Changes:
- executor.py: Add cancel_event field to SubagentResult, check it in
_aexecute loop, set it on timeout, add request_cancel_background_task
- task_tool.py: Call request_cancel_background_task on CancelledError
* fix(subagents): guard cancel status and add pre-check before astream
- Only overwrite status to FAILED when still RUNNING, preserving
TIMED_OUT set by the scheduler thread.
- Add cancel_event pre-check before entering the astream loop so
cancellation is detected immediately when already signalled.
* fix(subagents): guard status updates with lock to prevent race condition
Wrap the check-and-set on result.status in _aexecute with
_background_tasks_lock so the timeout handler in execute_async
cannot interleave between the read and write.
* fix(subagents): add dedicated CANCELLED status for user cancellation
Introduce SubagentStatus.CANCELLED to distinguish user-initiated
cancellation from actual execution failures. Update _aexecute,
task_tool polling, cleanup terminal-status sets, and test fixtures.
* test(subagents): add cancellation tests and fix timeout regression test
- Add dedicated TestCooperativeCancellation test class with 6 tests:
- Pre-set cancel_event prevents astream from starting
- Mid-stream cancel_event returns CANCELLED immediately
- request_cancel_background_task() sets cancel_event correctly
- request_cancel on nonexistent task is a no-op
- Real execute_async timeout does not overwrite CANCELLED (deterministic
threading.Event sync, no wall-clock sleeps)
- cleanup_background_task removes CANCELLED tasks
- Add task_tool cancellation coverage:
- test_cancellation_calls_request_cancel: assert CancelledError path
calls request_cancel_background_task(task_id)
- test_task_tool_returns_cancelled_message: assert CANCELLED polling
branch emits task_cancelled event and returns expected message
- Fix pre-existing test infrastructure issue: add deerflow.sandbox.security
to _MOCKED_MODULE_NAMES (fixes ModuleNotFoundError for all executor tests)
- Add RUNNING guard to timeout handler in executor.py to prevent
TIMED_OUT from overwriting CANCELLED status
- Add cooperative cancellation granularity comment documenting that
cancellation is only detected at astream iteration boundaries
---------
Co-authored-by: lulusiyuyu <lulusiyuyu@users.noreply.github.com>
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware
Add _validate_input() to reject malformed bash commands before regex
classification: empty commands, oversized commands (>10 000 chars), and
null bytes that could cause detection/execution layer inconsistency.
* fix(sandbox): address Copilot review — type guard, log truncation, reject reason
- Coerce None/non-string command to str before validation
- Truncate oversized commands in audit logs to prevent log amplification
- Propagate reject_reason through _pre_process() to block message
- Remove L2 label from comments and test class names
* fix(sandbox): isinstance type guard + async input sanitisation tests
Address review comments:
- Replace str() coercion with isinstance(raw_command, str) guard so
non-string truthy values (0, [], False) fall back to empty string
instead of passing validation as "0"/"[]"/"False".
- Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests
covering empty, null-byte, oversized, and None command via
awrap_tool_call path.
support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking.
Co-authored-by: NmanQAQ <normangyao@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
ls_tool was the only sandbox tool without output size limits, allowing
multi-MB results from large directories to blow up the model context
window. Add head-truncation (configurable via ls_output_max_chars,
default 20000) consistent with existing bash and read_file truncation.
Closes#1887
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(memory): case-insensitive fact deduplication and positive reinforcement detection
Two fixes to the memory system:
1. _fact_content_key() now lowercases content before comparison, preventing
semantically duplicate facts like "User prefers Python" and "user prefers
python" from being stored separately.
2. Adds detect_reinforcement() to MemoryMiddleware (closes#1719), mirroring
detect_correction(). When users signal approval ("yes exactly", "perfect",
"完全正确", etc.), the memory updater now receives reinforcement_detected=True
and injects a hint prompting the LLM to record confirmed preferences and
behaviors with high confidence.
Changes across the full signal path:
- memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement()
- queue.py: reinforcement_detected field in ConversationContext and add()
- updater.py: reinforcement_detected param in update_memory() and
update_memory_from_conversation(); builds reinforcement_hint alongside
the existing correction_hint
Tests: 11 new tests covering deduplication, hint injection, and signal
detection (Chinese + English patterns, window boundary, conflict with correction).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): address Copilot review comments on reinforcement detection
- Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants
- Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise
- Use casefold() instead of lower() for Unicode-aware fact deduplication
- Add missing test coverage for reinforcement_detected OR merge and forwarding in queue
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, the list endpoint always returned soul=null because
_agent_config_to_response() was called without include_soul=True.
This caused confusion since PUT /api/agents/{name} and GET /api/agents/{name}
both returned the soul content, but the list endpoint silently omitted it.
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline
- Add _clean_bold_title() to merge adjacent bold spans (** **) produced
by pymupdf4llm when bold text crosses span boundaries
- Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>**
headings common in academic papers; excludes pure-number table headers
and rows with more than 4 bold blocks
- When outline is empty, read first 5 non-empty lines of the .md as a
content preview and surface a grep hint in the agent context
- Update _format_file_entry to render the preview + grep hint instead of
silently omitting the outline section
- Add 3 new extract_outline tests and 2 new middleware tests (65 total)
* fix(uploads): address Copilot review comments on extract_outline regex
- Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII
titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded
- Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to
keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input
- Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex)
- Log debug message with exc_info when preview extraction fails
* fix: inject longTermBackground into memory prompt
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
* fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware
LangChain AIMessage.content can be str | list. When using providers that
return structured content blocks (e.g. Anthropic thinking mode, certain
OpenAI-compatible gateways), content is a list of dicts like
[{"type": "text", "text": "..."}].
The hard_limit branch in _apply() concatenated content with a string via
(last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises
TypeError when content is a non-empty list (list + str is invalid).
Add _append_text() static method that:
- Returns the text directly when content is None
- Appends a {"type": "text"} block when content is a list
- Falls back to string concatenation when content is a str
This is consistent with how other modules in the project already handle
list content (client.py._extract_text, memory_middleware, executor.py).
* test(middleware): add unit tests for _append_text and list content hard stop
Add regression tests to verify LoopDetectionMiddleware handles list-type
AIMessage.content correctly during hard stop:
- TestAppendText: unit tests for the new _append_text() static method
covering None, str, list (including empty list) content types
- TestHardStopWithListContent: integration tests verifying hard stop
works correctly with list content (Anthropic thinking mode), None
content, and str content
Requested by reviewer in PR #1823.
* fix(middleware): improve _append_text robustness and test isolation
- Add explicit isinstance(content, str) check with fallback for
unexpected types (coerce to str) to prevent TypeError on edge cases
- Deep-copy list content in _make_state() test helper to prevent
shared mutable references across test iterations
- Add test_unexpected_type_coerced_to_str: verify fallback for
non-str/list/None content types
- Add test_list_content_not_mutated_in_place: verify _append_text
does not modify the original list
* style: fix ruff format whitespace in test file
---------
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
* feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload
- Introduce pymupdf4llm as an optional PDF converter with better heading
detection and table preservation than MarkItDown
- Auto mode: prefer pymupdf4llm when installed; fall back to MarkItDown
when output is suspiciously sparse (image-based / scanned PDFs)
- Sparsity check uses chars-per-page (< 50 chars/page) rather than an
absolute threshold, correctly handling both short and long documents
- Large files (> 1 MB) are offloaded to asyncio.to_thread() to avoid
blocking the event loop (related: #1569)
- Add UploadsConfig with pdf_converter field (auto/pymupdf4llm/markitdown)
- Add pymupdf4llm as optional dependency: pip install deerflow-harness[pymupdf]
- Add 14 unit tests covering sparsity heuristic, routing logic, and async path
* fix(uploads): address Copilot review comments on PDF converter
- Fix docstring: MIN_CHARS_PYMUPDF -> _MIN_CHARS_PER_PAGE (typo)
- Fix file handle leak: wrap pymupdf.open in try/finally to ensure doc.close()
- Fix silent fallback gap: _convert_pdf_with_pymupdf4llm now catches all
conversion exceptions (not just ImportError), so encrypted/corrupt PDFs
fall back to MarkItDown instead of propagating
- Tighten type: pdf_converter field changed from str to Literal[auto|pymupdf4llm|markitdown]
- Normalize config value: _get_pdf_converter() strips and lowercases the raw
config string, warns and falls back to 'auto' on unknown values
* feat(uploads): inject document outline into agent context for converted files
Extract headings from converted .md files and inject them into the
<uploaded_files> context block so the agent can navigate large documents
by line number before reading.
- Add `extract_outline()` to `file_conversion.py`: recognises standard
Markdown headings (#/##/###) and SEC-style bold structural headings
(**ITEM N. BUSINESS**, **PART II**); caps at 50 entries; excludes
cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES)
- Add `_extract_outline_for_file()` helper in `uploads_middleware.py`:
looks for a sibling `.md` file produced by the conversion pipeline
- Update `UploadsMiddleware._create_files_message()` to render the outline
under each file entry with `L{line}: {title}` format and a `read_file`
prompt for range-based reading
- Tests: 10 new tests for `extract_outline()`, 4 new tests for outline
injection in `UploadsMiddleware`; existing test updated for new `outline`
field in `uploaded_files` state
Partially addresses #1647 (agent ignores uploaded files).
* fix(uploads): stream outline file reads and strip inline bold from heading titles
- Switch extract_outline() from read_text().splitlines() to open()+line iteration
so large converted documents are not loaded into memory on every agent turn;
exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion)
- Strip **...** wrapper from standard Markdown heading titles before appending
to outline so agent context stays clean (e.g. "## **Overview**" → "Overview")
(Copilot suggestion)
- Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py
to satisfy ruff CI lint
* fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES
When extract_outline() hits the cap it now appends a sentinel entry
{"truncated": True} instead of silently dropping the rest of the headings.
UploadsMiddleware reads the sentinel and renders a hint line:
... (showing first 50 headings; use `read_file` to explore further)
Without this the agent had no way to know the outline was incomplete and
would treat the first 50 headings as the full document structure.
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id
runtime.context does not always carry thread_id (depends on LangGraph
invocation path). ThreadDataMiddleware already falls back to
get_config().configurable.thread_id — apply the same pattern so
UploadsMiddleware can resolve the uploads directory and attach outlines
in all invocation paths.
* style: apply ruff format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
When MemoryStreamBridge queue reaches capacity, publish_end() previously
used the same 30s timeout + drop strategy as regular events. If the END
sentinel was dropped, subscribe() would loop forever waiting for it,
causing the SSE connection to hang indefinitely and leaking _queues and
_counters resources for that run_id.
Changes:
- publish_end() now evicts oldest regular events when queue is full to
guarantee END sentinel delivery — the sentinel is the only signal that
allows subscribers to terminate
- Added per-run drop counters (_dropped_counts) with dropped_count() and
dropped_total properties for observability
- cleanup() and close() now clear drop counters
- publish() logs total dropped count per run for easier debugging
Tests:
- test_end_sentinel_delivered_when_queue_full: verifies END arrives even
with a completely full queue
- test_end_sentinel_evicts_oldest_events: verifies eviction behavior
- test_end_sentinel_no_eviction_when_space_available: no side effects
when queue has room
- test_concurrent_tasks_end_sentinel: 4 concurrent producer/consumer
pairs all terminate properly
- test_dropped_count_tracking, test_dropped_total,
test_cleanup_clears_dropped_counts, test_close_clears_dropped_counts:
drop counter coverage
Closes#1689
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix: use SystemMessage+HumanMessage for follow-up question generation (fixes#1697)
Some models (e.g. MiniMax-M2.7) require the system prompt and user
content to be passed as separate message objects rather than a single
combined string. Invoking with a plain string sends everything as a
HumanMessage, which causes these models to ignore the generation
instructions and fail to produce valid follow-up questions.
* test: verify model is invoked with SystemMessage and HumanMessage
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
* fix(sandbox): URL路径被误判为不安全绝对路径 (#1385)
在本地沙箱模式下,bash工具对命令做绝对路径安全校验时,会把curl命令中的
HTTPS URL(如 https://example.com/api/v1/check)误识别为本地绝对路径并拦截。
根因:_ABSOLUTE_PATH_PATTERN 正则的负向后行断言 (?<![:\w]) 只排除了冒号和
单词字符,但 :// 中第二个斜杠前面是第一个斜杠(/),不在排除列表中,导致
//example.com/api/... 被匹配为绝对路径 /example.com/api/...。
修复:在负向后行断言中增加斜杠字符,改为 (?<![:\w/]),使得 :// 中的连续
斜杠不会触发绝对路径匹配。同时补充了URL相关的单元测试用例。
Signed-off-by: moose-lab <moose-lab@users.noreply.github.com>
* fix(sandbox): refine absolute path regex to preserve file:// defense-in-depth
Change lookbehind from (?<![:\w/]) to (?<![:\w])(?<!:/) so only the
second slash in :// sequences is excluded. This keeps URL paths from
false-positiving while still letting the regex detect /etc/passwd in
file:///etc/passwd. Also add explicit file:// URL blocking and tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Signed-off-by: moose-lab <moose-lab@users.noreply.github.com>
Co-authored-by: moose-lab <moose-lab@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: prevent concurrent subagent file write conflicts
Serialize same-path str_replace operations in sandbox tools
Guard AioSandbox write_file/update_file with the existing sandbox lock
Add regression tests for concurrent str_replace and append races
Verify with backend full tests and ruff lint checks
* fix(sandbox): Fix the concurrency issue of file operations on the same path in isolated sandboxes.
Ensure that different sandbox instances use independent locks for file operations on the same virtual path to avoid concurrency conflicts. Change the lock key from a single path to a composite key of (sandbox.id, path), and add tests to verify the concurrent safety of isolated sandboxes.
* feat(sandbox): Extract file operation lock logic to standalone module and fix concurrency issues
Extract file operation lock related logic from tools.py into a separate file_operation_lock.py module.
Fix data race issues during concurrent str_replace and write_file operations.
* feat(agent): 为AgentConfig添加skills字段并更新lead_agent系统提示
在AgentConfig中添加skills字段以支持配置agent可用技能
更新lead_agent的系统提示模板以包含可用技能信息
* fix: resolve agent skill configuration edge cases and add tests
* Update backend/packages/harness/deerflow/agents/lead_agent/prompt.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* refactor(agent): address PR review comments for skills configuration
- Add detailed docstring to `skills` field in `AgentConfig` to clarify the semantics of `None` vs `[]`.
- Add unit tests in `test_custom_agent.py` to verify `load_agent_config()` correctly parses omitted skills and explicit empty lists.
- Fix `test_make_lead_agent_empty_skills_passed_correctly` to include `agent_name` in the runtime config, ensuring it exercises the real code path.
* docs: 添加关于按代理过滤技能的配置说明
在配置示例文件和文档中添加说明,解释如何通过代理的config.yaml文件限制加载的技能
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(sandbox): truncate oversized bash and read_file tool outputs
Long tool outputs (large directory listings, multi-MB source files) can
overflow the model's context window. Two new configurable limits:
- bash_output_max_chars (default 20000): middle-truncates bash output,
preserving both head and tail so stderr at the end is not lost
- read_file_output_max_chars (default 50000): head-truncates file output
with a hint to use start_line/end_line for targeted reads
Both limits are enforced at the tool layer (sandbox/tools.py) rather
than middleware, so truncation is guaranteed regardless of call path.
Setting either limit to 0 disables truncation entirely.
Measured: read_file on a 250KB source file drops from 63,698 tokens to
19,927 tokens (69% reduction) with the default limit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(tests): remove unused pytest import and fix import sort order
* style: apply ruff format to sandbox/tools.py
* refactor(sandbox): address Copilot review feedback on truncation feature
- strict hard cap: while-loop ensures result (including marker) ≤ max_chars
- max_chars=0 now returns "" instead of original output
- get_app_config() wrapped in try/except with fallback to defaults
- sandbox_config.py: add ge=0 validation on truncation limit fields
- config.example.yaml: bump config_version 4→5
- tests: add len(result) <= max_chars assertions, edge-case (max=0, small
max, various sizes) tests; fix skipped-count test for strict hard cap
* refactor(sandbox): replace while-loop truncation with fixed marker budget
Use a pre-allocated constant (_MARKER_MAX_LEN) instead of a convergence
loop to ensure result <= max_chars. Simpler, safer, and skipped-char
count in the marker is now an exact predictable value.
* refactor(sandbox): compute marker budget dynamically instead of hardcoding
* fix(sandbox): make max_chars=0 disable truncation instead of returning empty string
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Feishu channel classified any slash-prefixed text (including absolute
paths such as /mnt/user-data/...) as a COMMAND, causing them to be
misrouted through the command pipeline instead of the chat pipeline.
Fix by introducing a shared KNOWN_CHANNEL_COMMANDS frozenset in
app/channels/commands.py — the single authoritative source for the set
of supported slash commands. Both the Feishu inbound parser and the
ChannelManager's unknown-command reply now derive from it, so adding
or removing a command requires only one edit.
Changes:
- app/channels/commands.py (new): defines KNOWN_CHANNEL_COMMANDS
- app/channels/feishu.py: replace local KNOWN_FEISHU_COMMANDS with the
shared constant; _is_feishu_command() now gates on it
- app/channels/manager.py: import KNOWN_CHANNEL_COMMANDS and use it in
the unknown-command fallback reply so the displayed list stays in sync
- tests/test_feishu_parser.py: parametrize over every entry in
KNOWN_CHANNEL_COMMANDS (each must yield msg_type=command) and add
parametrized chat cases for /unknown, absolute paths, etc.
Made with Cursor
Made-with: Cursor
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(gateway): prevent 400 error when client sends context with configurable
Fixes#1290
LangGraph >= 0.6.0 rejects requests that include both 'configurable' and
'context' in the run config. If the client (e.g. useStream hook) sends
a 'context' key, we now honour it and skip creating our own
'configurable' dict to avoid the conflict.
When no 'context' is provided, we fall back to the existing
'configurable' behaviour with thread_id.
* fix(gateway): address review feedback — warn on dual keys, fix runtime injection, add tests
- Log a warning when client sends both 'context' and 'configurable' so
it's no longer silently dropped (reviewer feedback)
- Ensure thread_id is available in config['context'] when present so
middlewares can find it there too
- Add test coverage for the context path, the both-keys-present case,
passthrough of other keys, and the no-config fallback
* style: ruff format services.py
---------
Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The langgraph-compat layer dropped the DeerFlow-specific `context` field
from run requests, causing agent config (subagent_enabled, is_plan_mode,
thinking_enabled, etc.) to fall back to defaults. Add `context` to
RunCreateRequest and merge allowlisted keys into config.configurable in
start_run, with existing configurable values taking precedence.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace sync requests with async httpx in Jina AI client
Replace synchronous `requests.post()` with `httpx.AsyncClient` in
JinaClient.crawl() and make web_fetch_tool async. This is part of the
planned async concurrency optimization for the agent hot path
(see docs/TODO.md).
* fix: address Copilot review feedback on async Jina client
- Short-circuit error strings in web_fetch_tool before passing to
ReadabilityExtractor, preventing misleading extraction results
- Log missing JINA_API_KEY warning only once per process to reduce
noise under concurrent async fetching
- Use logger.exception instead of logger.error in crawl exception
handler to preserve stack traces for debugging
- Add async web_fetch_tool tests and warn-once coverage
* fix: mock get_app_config in web_fetch_tool tests for CI
The web_fetch_tool tests failed in CI because get_app_config requires
a config.yaml file that isn't present in the test environment. Mock
the config loader to remove the filesystem dependency.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Docker's -v host:container syntax is ambiguous for Windows drive-letter
paths (e.g. D:/...) because ':' is both the drive separator and the
volume separator, causing mount failures on Windows hosts.
Introduce _format_container_mount() which uses '--mount type=bind,...'
for Docker (unambiguous on all platforms) and keeps '-v' for Apple
Container runtime which does not support the --mount flag yet.
Adds unit tests covering Windows paths, read-only mounts, and Apple
Container pass-through.
Made-with: Cursor
* fix(gateway): forward assistant_id as agent_name in build_run_config
Fixes#1644
When the LangGraph Platform-compatible /runs endpoint receives a custom
assistant_id (e.g. 'finalis'), the Gateway's build_run_config() silently
ignored it — configurable['agent_name'] was never set, so make_lead_agent
fell through to the default lead agent and SOUL.md was never loaded.
Root cause (introduced in #1403):
resolve_agent_factory() correctly falls back to make_lead_agent for all
assistant_id values, but build_run_config() had no assistant_id parameter
and never injected configurable['agent_name']. The full call chain:
POST /runs (assistant_id='finalis')
→ resolve_agent_factory('finalis') # returns make_lead_agent ✓
→ build_run_config(thread_id, ...) # no agent_name injected ✗
→ make_lead_agent(config)
→ cfg.get('agent_name') → None
→ load_agent_soul(None) → base SOUL.md (doesn't exist) → None
Fix:
- Add keyword-only parameter to build_run_config().
- When assistant_id is set and differs from 'lead_agent', inject it as
configurable['agent_name'] (matching the channel manager's existing
_resolve_run_params() logic for IM channels).
- Honour an explicit configurable['agent_name'] in the request body;
assistant_id mapping only fills the gap when it is absent.
- Remove stale log-only branch from resolve_agent_factory(); update
docstring to explain the factory/configurable split.
Tests added (test_gateway_services.py):
- Custom assistant_id injects configurable['agent_name']
- 'lead_agent' assistant_id does NOT inject agent_name
- None assistant_id does NOT inject agent_name
- Explicit configurable['agent_name'] in request is not overwritten
- resolve_agent_factory returns make_lead_agent for all inputs
* style: format with ruff
* fix: validate and normalize assistant_id to prevent path traversal
Addresses Copilot review: strip/lowercase/replace underscores and
reject names that don't match [a-z0-9-]+, consistent with
ChannelManager._normalize_custom_agent_name().
---------
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix(sandbox): serialize concurrent exec_command calls in AioSandbox
The AIO sandbox container maintains a single persistent shell session
that corrupts when multiple exec_command requests arrive concurrently
(e.g. when ToolNode issues parallel tool_calls). The corrupted session
returns 'ErrorObservation' strings as output, cascading into subsequent
commands.
Add a threading.Lock to AioSandbox to serialize shell commands. As a
secondary defense, detect ErrorObservation in output and retry with a
fresh session ID.
Fixes#1433
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(sandbox): address Copilot review findings
- Fix shell injection in list_dir: use shlex.quote(path) to escape
user-provided paths in the find command
- Narrow ErrorObservation retry condition from broad substring match
to the specific corruption signature to prevent false retries
- Improve test_lock_prevents_concurrent_execution: use threading.Barrier
to ensure all workers contend for the lock simultaneously
- Improve test_list_dir_uses_lock: assert lock.locked() is True during
exec_command to verify lock acquisition
* style: auto-format with ruff
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli
Implement all core LangGraph Platform API endpoints in the Gateway,
allowing it to fully replace the langgraph-cli dev server for local
development. This eliminates a heavyweight dependency and simplifies
the development stack.
Changes:
- Add runs lifecycle endpoints (create, stream, wait, cancel, join)
- Add threads CRUD and search endpoints
- Add assistants compatibility endpoints (search, get, graph, schemas)
- Add StreamBridge (in-memory pub/sub for SSE) and async provider
- Add RunManager with atomic create_or_reject (eliminates TOCTOU race)
- Add worker with interrupt/rollback cancel actions and runtime context injection
- Route /api/langgraph/* to Gateway in nginx config
- Skip langgraph-cli startup by default (SKIP_LANGGRAPH_SERVER=0 to restore)
- Add unit tests for RunManager, SSE format, and StreamBridge
* fix: drain bridge queue on client disconnect to prevent backpressure
When on_disconnect=continue, keep consuming events from the bridge
without yielding, so the worker is not blocked by a full queue.
Only on_disconnect=cancel breaks out immediately.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove pytest import
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Fix default stream_mode to ["values", "messages-tuple"]
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Remove unused if_exists field from ThreadCreateRequest
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: address review comments on gateway LangGraph API
- Mount runs.py router in app.py (missing include_router)
- Normalize interrupt_before/after "*" to node list before run_agent()
- Use entry.id for SSE event ID instead of counter
- Drain bridge queue on disconnect when on_disconnect=continue
- Reuse serialization helper in wait_run() for consistent wire format
- Reject unsupported multitask_strategy with 400
- Remove SKIP_LANGGRAPH_SERVER fallback, always use Gateway
* feat: extract app.state access into deps.py
Encapsulate read/write operations for singleton objects (RunManager,
StreamBridge, checkpointer) held in app.state into a shared utility,
reducing repeated access patterns across router modules.
* feat: extract deerflow.runtime.serialization module with tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace duplicated serialization with deerflow.runtime.serialization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: extract app/gateway/services.py with run lifecycle logic
Create a service layer that centralizes SSE formatting, input/config
normalization, and run lifecycle management. Router modules will delegate
to these functions instead of using private cross-imported helpers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: wire routers to use services layer, remove cross-module private imports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff formatting to refactored files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(runtime): support LangGraph dev server and add compat route
- Enable official LangGraph dev server for local development workflow
- Decouple runtime components from agents package for better separation
- Provide gateway-backed fallback route when dev server is skipped
- Simplify lifecycle management using context manager in gateway
* feat(runtime): add Store providers with auto-backend selection
- Add async_provider.py and provider.py under deerflow/runtime/store/
- Support memory, sqlite, postgres backends matching checkpointer config
- Integrate into FastAPI lifespan via AsyncExitStack in deps.py
- Replace hardcoded InMemoryStore with config-driven factory
* refactor(gateway): migrate thread management from checkpointer to Store and resolve multiple endpoint failures
- Add Store-backed CRUD helpers (_store_get, _store_put, _store_upsert)
- Replace checkpoint-scanning search with two-phase strategy:
phase 1 reads Store (O(threads)), phase 2 backfills from checkpointer
for legacy/LangGraph Server threads with lazy migration
- Extend Store record schema with values field for title persistence
- Sync thread title from checkpoint to Store after run completion
- Fix /threads/{id}/runs/{run_id}/stream 405 by accepting both
GET and POST methods; POST handles interrupt/rollback actions
- Fix /threads/{id}/state 500 by separating read_config and
write_config, adding checkpoint_ns to configurable, and
shallow-copying checkpoint/metadata before mutation
- Sync title to Store on state update for immediate search reflection
- Move _upsert_thread_in_store into services.py, remove duplicate logic
- Add _sync_thread_title_after_run: await run task, read final
checkpoint title, write back to Store record
- Spawn title sync as background task from start_run when Store exists
* refactor(runtime): deduplicate store and checkpointer provider logic
Extract _ensure_sqlite_parent_dir() helper into checkpointer/provider.py
and use it in all three places that previously inlined the same mkdir logic.
Consolidate duplicate error constants in store/async_provider.py by importing
from store/provider.py instead of redefining them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): move SQLite helpers to runtime/store, checkpointer imports from store
_resolve_sqlite_conn_str and _ensure_sqlite_parent_dir now live in
runtime/store/provider.py. agents/checkpointer/provider and
agents/checkpointer/async_provider import from there, reversing the
previous dependency direction (store → checkpointer becomes
checkpointer → store).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): extract SQLite helpers into runtime/store/_sqlite_utils.py
Move resolve_sqlite_conn_str and ensure_sqlite_parent_dir out of
checkpointer/provider.py into a dedicated _sqlite_utils module.
Functions are now public (no underscore prefix), making cross-module
imports semantically correct. All four provider files import from
the single shared location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): use adelete_thread to fully remove thread checkpoints on delete
AsyncSqliteSaver has no adelete method — the previous hasattr check
always evaluated to False, silently leaving all checkpoint rows in the
database. Switch to adelete_thread(thread_id) which deletes every
checkpoint and pending-write row for the thread across all namespaces
(including sub-graph checkpoints).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): remove dead bridge_cm/ckpt_cm code and fix StrEnum lint
app.py had unreachable code after the async-with lifespan refactor:
bridge_cm and ckpt_cm were referenced but never defined (F821), and
the channel service startup/shutdown was outside the langgraph_runtime
block so it never ran. Move channel service lifecycle inside the
async-with block where it belongs.
Replace str+Enum inheritance in RunStatus and DisconnectMode with
StrEnum as suggested by UP042.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: format with ruff
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: promote matched tools from deferred registry after tool_search returns schema
After tool_search returns a tool's full schema, the tool is promoted
(removed from the deferred registry) so DeferredToolFilterMiddleware
stops filtering it from bind_tools on subsequent LLM calls.
Without this, deferred tools are permanently filtered — the LLM gets
the schema from tool_search but can never invoke the tool because
the middleware keeps stripping it.
Fixes#1554
* test: add promote() and tool_search promotion tests
Tests cover:
- promote removes tools from registry
- promote nonexistent/empty is no-op
- search returns nothing after promote
- middleware passes promoted tools through
- tool_search auto-promotes matched tools (select + keyword)
* fix: address review — lint blank line + empty registry guard
- Add missing blank line between FakeRequest methods (E301)
- Use 'if not registry' to handle empty registries consistently
---------
Co-authored-by: d 🔹 <258577966+voidborne-d@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(sandbox): add SandboxAuditMiddleware for bash command security auditing
Addresses the LocalSandbox escape vector reported in #1224 where bash tool
calls can execute destructive commands against the host filesystem.
- Add SandboxAuditMiddleware with three-tier command classification:
- High-risk (block): rm -rf /, curl|bash, dd if=, mkfs, /etc/shadow access
- Medium-risk (warn): pip install, apt install, chmod 777
- Safe (pass): normal workspace operations
- Register middleware after GuardrailMiddleware in _build_runtime_middlewares,
applied to both lead agent and subagents
- Structured audit log via standard logger (visible in langgraph.log)
- Medium-risk commands execute but append a warning to the tool result,
allowing the LLM to self-correct without blocking legitimate workflows
- High-risk commands return an error ToolMessage without calling the handler,
so the agent loop continues gracefully
* fix(lint): sort imports in test_sandbox_audit_middleware
* refactor(sandbox-audit): address Copilot review feedback (3/5/6)
- Fix class docstring to match implementation: medium-risk commands are
executed with a warning appended (not rejected), and cwd anchoring note
removed (handled in a separate PR)
- Remove capsys.disabled() from benchmark test to avoid CI log noise;
keep assertions for recall/precision targets
- Remove misleading 'cwd fix' from test module docstring
* test(sandbox-audit): add async tests for awrap_tool_call
* fix(sandbox-audit): address Copilot review feedback (1/2)
- Narrow rm high-risk regex to only block truly destructive targets
(/, /*, ~, ~/*, /home, /root); legitimate workspace paths like
/mnt/user-data/ are no longer false-positived
- Handle list-typed ToolMessage content in _append_warn_to_result;
append a text block instead of str()-ing the list to avoid breaking
structured content normalization
* style: apply ruff format to sandbox_audit_middleware files
* fix(sandbox-audit): update benchmark comment to match assert-based implementation
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(task_tool): fallback to configurable thread_id when context is missing
task_tool only read thread_id from runtime.context, but when invoked
via LangGraph Server, thread_id lives in config.configurable instead.
Add the same fallback that ThreadDataMiddleware uses (PR #1237).
Fixes subagent execution failure: 'Thread ID is required in runtime
context or config.configurable'
* remove debug logging from task_tool
* fix(sandbox): anchor relative paths to thread workspace in local mode
In local sandbox mode, bash commands using relative paths were resolved
against the langgraph server process cwd (backend/) instead of the
per-thread workspace directory. This allowed relative-path writes to
escape the thread isolation boundary.
Root cause: validate_local_bash_command_paths and
replace_virtual_paths_in_command only process absolute paths (scanning
for '/' prefix). Relative paths pass through untouched and inherit the
process cwd at subprocess.run time.
Fix: after virtual path translation, prepend `cd {workspace} &&` to
anchor the shell's cwd to the thread-isolated workspace directory before
execution. shlex.quote() ensures paths with spaces or special characters
are handled safely.
This mirrors the approach used by OpenHands (fixed cwd at execution
layer) and is the correct fix for local mode where each subprocess.run
is an independent process with no persistent shell session.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(sandbox): extract _apply_cwd_prefix and add unit tests
Extract the workspace cd-prefix logic from bash_tool into a dedicated
_apply_cwd_prefix() helper so it can be unit-tested in isolation.
Add four tests covering: normal prefix, no thread_data, missing
workspace_path, and paths with spaces (shlex.quote).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* revert: remove unrelated configurable thread_id fallback from sandbox/tools.py
This change belongs in a separate PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: remove trailing whitespace in test_sandbox_tools_security
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: add Windows shell fallback for local sandbox
* fix: handle PowerShell execution on Windows
* fix: handle Windows local shell execution
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(security): disable host bash by default in local sandbox
* fix(security): address review feedback for local bash hardening
* fix(ci): sort live test imports for lint
* style: apply backend formatter
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat/bug-fix: copy the allowed path configurations in MCP filesystem tools to bash tool. With updated unit test
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(client): support custom middleware injection
Add support for custom middleware, allowing custom middleware list to be passed when initializing DeerFlowClient. These middleware will be injected after the default middleware when creating the agent, extending the agent's functionality.
* feat: inject custom middlewares before ClarificationMiddleware to preserve ordering
- Add `custom_middlewares` param to `_build_middlewares`
- Inject custom middlewares right before `ClarificationMiddleware` to keep it as the last in the chain
- Remove unsafe `.extend()` in `client.py`
- Update tests in `test_client.py` and `test_lead_agent_model_resolution.py` to assert correct injection ordering
* fix(channel): reject concurrent same-thread runs (#1465)
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(lint): sort imports in manager.py and test_channels.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(channel): widen _is_thread_busy_error to BaseException and downgrade busy log to warning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(oauth): inject billing header for non-Haiku model access
The Anthropic Messages API requires a billing identification block
in the system prompt when using Claude Code OAuth tokens (sk-ant-oat*)
to access non-Haiku models (Opus, Sonnet). Without it, the API returns
a generic 400 "Error" with no actionable detail.
This was discovered by intercepting Claude Code CLI requests — the CLI
injects an `x-anthropic-billing-header` text block as the first system
prompt entry on every request. Third-party consumers of the same OAuth
tokens must do the same.
Changes:
- Add `_apply_oauth_billing()` to `ClaudeChatModel` that prepends the
billing header block to the system prompt when `_is_oauth` is True
- Add `metadata.user_id` with device/session identifiers (required by
the API alongside the billing header)
- Called from `_get_request_payload()` before prompt caching runs
Verified with Claude Max OAuth tokens against all three model tiers:
- claude-opus-4-6: 200 OK
- claude-sonnet-4-6: 200 OK
- claude-haiku-4-5-20251001: 200 OK (was already working)
Closes#1245
* fix(oauth): address review feedback on billing header injection
- Make OAUTH_BILLING_HEADER configurable via ANTHROPIC_BILLING_HEADER env var
- Normalize billing block to always be first in system list (strip + reinsert)
- Guard metadata with isinstance check for non-dict values
- Replace os.uname() with socket.gethostname() for Windows compat
- Fix docstrings to say "all OAuth requests" instead of "non-Haiku"
- Move inline imports to module level (fixes ruff I001)
- Add 9 unit tests for _apply_oauth_billing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unit tests for skill frontmatter validation
Cover _validate_skill_frontmatter logic:
- Valid minimal and full-field skills
- Missing SKILL.md, missing frontmatter, invalid YAML
- Required field validation (name, description)
- Unexpected key rejection
- Name format: hyphen-case, no leading/trailing/consecutive hyphens
- Name and description length limits
- Angle bracket rejection in description
* test: fix unused variables flagged by ruff F841
Replace unused tuple elements with _ and add assertions on
msg/name return values in success-path tests.
* test: address review feedback on unused variables
* test: consolidate validation tests into single module
Move the UTF-8/windows-locale test from test_skills_router.py into
test_skills_validation.py and remove test_skills_router.py to eliminate
duplicated assertions and future maintenance drift.
* fix: match assertion strings to actual validation messages
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Allow per-agent environment variables to be declared in config.yaml under
acp_agents.<name>.env. Values prefixed with $ are resolved from the host
environment at invocation time, consistent with other config fields.
Passes None to spawn_agent_process when env is empty so the subprocess
inherits the parent environment unchanged.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix Windows backend test compatibility
* Preserve ACP path style on Windows
* Fix installer import ordering
* Address review comments for Windows fixes
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180)
When using Gemini with thinking enabled through an OpenAI-compatible gateway,
the API requires that fields on thinking content blocks are
preserved and echoed back verbatim in subsequent requests. Standard
silently drops these signatures when serializing
messages, causing HTTP 400 errors:
Changes:
- Add PatchedChatOpenAI adapter that re-injects signed thinking blocks into
request payloads, preserving the signature chain across multi-turn
conversations with tool calls.
- Support two LangChain storage patterns: additional_kwargs.thinking_blocks
and content list.
- Add 11 unit tests covering signed/unsigned blocks, storage patterns, edge
cases, and precedence rules.
- Update config.example.yaml with Gemini + thinking gateway example.
- Update CONFIGURATION.md with detailed guidance and error explanation.
Fixes: #1180
* Updated the patched_openai.py with thought_signature of function call
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* docs: fix inaccurate thought_signature description in CONFIGURATION.md (#1220)
* Initial plan
* docs: fix CONFIGURATION.md wording for thought_signature - tool-call objects, not thinking blocks
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/360f5226-4631-48a7-a050-189094af8ffe
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(harness): add tool-first ACP agent invocation (#37)
* feat(harness): add tool-first ACP agent invocation
* build(harness): make ACP dependency required
* fix(harness): address ACP review feedback
* feat(harness): decouple ACP agent workspace from thread data
ACP agents (codex, claude-code) previously used per-thread workspace
directories, causing path resolution complexity and coupling task
execution to DeerFlow's internal thread data layout. This change:
- Replace _resolve_cwd() with a fixed _get_work_dir() that always uses
{base_dir}/acp-workspace/, eliminating virtual path translation and
thread_id lookups
- Introduce /mnt/acp-workspace virtual path for lead agent read-only
access to ACP agent output files (same pattern as /mnt/skills)
- Add security guards: read-only validation, path traversal prevention,
command path allowlisting, and output masking for acp-workspace
- Update system prompt and tool description to guide LLM: send
self-contained tasks to ACP agents, copy results via /mnt/acp-workspace
- Add 11 new security tests for ACP workspace path handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(prompt): inject ACP section only when ACP agents are configured
The ACP agent guidance in the system prompt is now conditionally built
by _build_acp_section(), which checks get_acp_agents() and returns an
empty string when no ACP agents are configured. This avoids polluting
the prompt with irrelevant instructions for users who don't use ACP.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix lint
* fix(harness): address Copilot review comments on sandbox path handling and ACP tool
- local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/")
and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching
inside /mnt/skills-extra
- local_sandbox_provider: replace print() with logger.warning(..., exc_info=True)
- invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue;
move full prompt from INFO to DEBUG level (truncated to 200 chars)
- sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation;
remove misleading "read-only" language from validate_local_bash_command_paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry
P1.1 – ACP workspace thread isolation
- Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths
- `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses
`{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to
global workspace when thread_id is absent or invalid
- `_invoke` extracts thread_id from `RunnableConfig` via
`Annotated[RunnableConfig, InjectedToolArg]`
- `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`,
`_resolve_acp_workspace_path(path, thread_id)`, and all callers
(`replace_virtual_paths_in_command`, `mask_local_paths_in_output`,
`ls_tool`, `read_file_tool`) now resolve ACP paths per-thread
P1.2 – ACP permission guardrail
- New `auto_approve_permissions: bool = False` field in `ACPAgentConfig`
- `_build_permission_response(options, *, auto_approve: bool)` now
defaults to deny; only approves when `auto_approve=True`
- Document field in `config.example.yaml`
P2 – Deferred tool registry race condition
- Replace module-level `_registry` global with `contextvars.ContextVar`
- Each asyncio request context gets its own registry; worker threads
inherit the context automatically via `loop.run_in_executor`
- Expose `get_deferred_registry` / `set_deferred_registry` /
`reset_deferred_registry` helpers
Tests: 831 pass (57 for affected modules, 3 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(sandbox): mount /mnt/acp-workspace in docker sandbox container
The AioSandboxProvider was not mounting the ACP workspace into the
sandbox container, so /mnt/acp-workspace was inaccessible when the lead
agent tried to read ACP results in docker mode.
Changes:
- `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so
the directory exists before the sandbox container starts — required
for Docker volume mounts
- `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using
the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`)
- Update stale CLAUDE.md description (was "fixed global workspace")
Tests: `test_aio_sandbox_provider.py` (4 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): remove unused imports in test_aio_sandbox_provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix config
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unit tests for TodoMiddleware
Cover context-loss detection logic:
- _todos_in_messages and _reminder_in_messages helpers
- _format_todos formatting
- Reminder injection when write_todos truncated
- No-op when todos visible or reminder already present
- abefore_model async delegation
* test: fix event loop error in todo middleware async test
Use asyncio.run() instead of get_event_loop().run_until_complete()
to avoid RuntimeError on Python 3.12 where no default event loop
exists in the main thread.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* test: add unit tests for DanglingToolCallMiddleware
Cover message patching logic for dangling tool calls:
- No-op when all tool calls have responses
- Synthetic ToolMessage insertion at correct positions
- Mixed responded/dangling scenarios
- wrap_model_call and awrap_model_call integration
* test: fix async tests and strengthen override assertions
- Use @pytest.mark.anyio + async def instead of deprecated
asyncio.get_event_loop().run_until_complete() (fixes Py3.12 CI failure)
- Assert that override() receives the correct patched messages kwarg
in both wrap_model_call and awrap_model_call tests
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* refactor: extract shared skill installer and upload manager to harness
Move duplicated business logic from Gateway routers and Client into
shared harness modules, eliminating code duplication.
New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
list, delete, get_uploads_dir, ensure_uploads_dir)
Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill
Tests: 42 new (22 installer + 20 upload manager) + client hardening
* refactor: centralize upload URL construction and clean up installer
- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
into shared manager.py, eliminating 6 duplicated URL constructions across
Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
ZipFile() open with exception handling replaces is_file() + is_zipfile()
+ ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router
* fix: add input validation for thread_id and filename length
- Reject thread_id containing unsafe filesystem characters (only allow
alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id
* fix: address PR review — symlink safety, input validation coverage, error ordering
- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
middleware triggers extra LLM calls that add non-determinism to tests)
* chore: revert uv.lock to main — no dependency changes in this PR
* fix: use monkeypatch for global config in e2e fixture to prevent test pollution
The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.
Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.
* fix: address code review — URL encoding, API consistency, test isolation
- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
external contributors can override defaults
- Update tests to match new deduplicate_filename contract
* docs: rewrite RFC in English and add alternatives/breaking changes sections
* fix: address code review feedback on PR #1202
- Rename deduplicate_filename to claim_unique_filename to make
the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
path traversal detection — malformed input is 400, not 403
* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
LoopDetectionMiddleware injected SystemMessage mid-conversation to warn
about repetitive tool calls. This crashes Anthropic models because
langchain_anthropic's _format_messages() requires system messages to
appear only at the start of the conversation — interleaved system
messages raise 'Received multiple non-consecutive system messages'.
Switch the warning injection from SystemMessage to HumanMessage, which
works with all providers (Anthropic, OpenAI, Google, etc.).
Fixes#1299
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix(mcp): implement sync invocation wrapper for async MCP tools
Since DeerFlowClient streams synchronously, invoking async-only MCP tools
(loaded via langchain-mcp-adapters) resulted in a NotImplementedError.
This commit bridges the sync/async gap by dynamically injecting a `func`
wrapper into `StructuredTool` instances that only have a `coroutine`.
Key changes:
- Added `sync_wrapper` in `get_mcp_tools` to execute async tool calls.
- Handled nested event loops by delegating to a global `ThreadPoolExecutor`
when an event loop is already running, avoiding `RuntimeError`.
- Added detailed error logging within the wrapper for better transparency.
- Added comprehensive test coverage in `test_mcp_sync_wrapper.py` verifying
tool patching, event loop behavior, and exception propagation.
* refactor(mcp): extract sync wrapper to module level and fix test mocks
Addressed PR review comments:
- Extracted _make_sync_tool_wrapper to module level to avoid nested func definitions.
- Refactored tests to use the actual production helper instead of duplicating logic.
- Fixed AsyncMock patching for awaited dependencies in tests.
- Added atexit hook for graceful thread pool shutdown.
- Fixed PEP8 blank line formatting in tests.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(threads): clean up local thread data after thread deletion
Delete DeerFlow-managed thread directories after the web UI removes a LangGraph thread.
This keeps local thread data in sync with conversation deletion and adds regression coverage for the cleanup flow.
* fix(threads): address thread cleanup review feedback
Encode thread cleanup URLs in the web client, keep cache updates explicit when no thread search data is cached, and return a generic 500 response from the cleanup endpoint while documenting the sanitized error behavior.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add GuardrailMiddleware that evaluates every tool call before execution.
Three provider options: built-in AllowlistProvider (zero deps), OAP passport
providers (open standard), or custom providers loaded by class path.
- GuardrailProvider protocol with GuardrailRequest/Decision dataclasses
- GuardrailMiddleware (AgentMiddleware, position 5 in chain)
- AllowlistProvider for simple deny/allow by tool name
- GuardrailsConfig (Pydantic singleton, loaded from config.yaml)
- 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp
- Comprehensive docs at backend/docs/GUARDRAILS.md
Closes#1213
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add Claude Code OAuth and Codex CLI providers
Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.\n\nCarries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review.
* fix: harden CLI credential loading
Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests.
* refactor: tighten codex auth typing
Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized.
* fix: load Claude Code OAuth from Keychain
Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path.
* fix: require explicit Claude OAuth handoff
* style: format thread hooks reasoning request
* docs: document CLI-backed auth providers
* fix: address provider review feedback
* fix: harden provider edge cases
* Fix deferred tools, Codex message normalization, and local sandbox paths
* chore: narrow PR scope to OAuth providers
* chore: remove unrelated frontend changes
* chore: reapply OAuth branch frontend scope cleanup
* fix: preserve upload guards with reasoning effort wiring
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: normalize ToolMessage structured content in serialization
When models return ToolMessage content as a list of content blocks
(e.g. [{"type": "text", "text": "..."}]), the UI previously displayed
the raw Python repr string instead of the extracted text.
Replace str(msg.content) with the existing _extract_text() helper in
both _serialize_message() and stream() to properly normalize
list-of-blocks content to plain text.
Fixes#1149
Also fixes the same root cause as #1188 (characters displayed one per
line when tool response content is returned as structured blocks).
Added 11 regression tests covering string, list-of-blocks, mixed,
empty, and fallback content types.
* fix(memory): extract text from structured LLM responses in memory updater
When LLMs return response content as list of content blocks
(e.g. [{"type": "text", "text": "..."}]) instead of plain strings,
str() produces Python repr which breaks JSON parsing in the memory
updater. This caused memory updates to silently fail.
Changes:
- Add _extract_text() helper in updater.py for safe content normalization
- Use _extract_text() instead of str(response.content) in update_memory()
- Fix format_conversation_for_update() to handle plain strings in list content
- Fix subagent executor fallback path to extract text from list content
- Replace print() with structured logging (logger.info/warning/error)
- Add 13 regression tests covering _extract_text, format_conversation,
and update_memory with structured LLM responses
* fix: address Copilot review - defensive text extraction + logger.exception
- client.py _extract_text: use block.get('text') + isinstance check (prevent KeyError/TypeError)
- prompt.py format_conversation_for_update: same defensive check for dict text blocks
- executor.py: type-safe text extraction in both code paths, fallback to placeholder instead of str(raw_content)
- updater.py: use logger.exception() instead of logger.error() for traceback preservation
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: preserve chunked structured content without spurious newlines
* fix: restore backend unit test compatibility
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: track token usage per conversation turn
Add token usage tracking to the streaming API so consumers can monitor
cost per turn without additional API calls.
Changes:
1. _serialize_message now includes usage_metadata for AI messages in
values events, exposing input_tokens/output_tokens/total_tokens
from LangChain's native metadata.
2. stream() accumulates token usage across all AI messages in a turn
and emits the cumulative totals in the end event:
{usage: {input_tokens: N, output_tokens: N, total_tokens: N}}
3. Each messages-tuple AI event with text content now includes a
per-message usage_metadata field for granular tracking.
This enables the frontend to display token consumption per turn,
support cost-aware UX, and let users monitor API spending.
10 tests added covering serialization passthrough and cumulative
aggregation logic.
Co-Authored-By: OpenClaw <noreply@openclaw.ai>
* fix: address Copilot review - use Mapping access for usage_metadata
- Replace getattr(usage, 'input_tokens', 0) with usage.get('input_tokens', 0)
since LangChain usage_metadata is a dict, not an object
- Remove unused 'import pytest' (fixes Ruff F401)
- Add proper stream() integration tests for cumulative usage in end event
and per-message usage_metadata in messages-tuple events
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: OpenClaw <noreply@openclaw.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
This PR improves MiniMax Code Plan integration in DeerFlow by fixing three issues in the current flow: stream errors were not clearly surfaced in the UI, the frontend could not display the actual provider model ID, and MiniMax reasoning output could leak into final assistant content as inline <think>...</think>. The change adds a MiniMax-specific adapter, exposes real model IDs end-to-end, and adds a frontend fallback for historical messages.
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(feishu): support @bot message in topic groups
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(feishu): preserve rich-text formatting and add parser unit tests
* chore(test): remove unused import to fix ruff lint error
* style: auto-format imports to satisfy ruff
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(manager): add bootstrap command to initialize soul.md in correct place
* feat(channels): add /bootstrap command to IM channels
Add a `/bootstrap` command that routes to the chat handler with
`is_bootstrap: True` in the run context, allowing the agent to invoke
its setup/initialization flow (e.g. `setup_agent`).
- The text after `/bootstrap` is forwarded as the chat message; when
omitted a default "Initialize workspace" message is used.
- Feishu channels use the streaming path as with normal chat.
- No changes to ChannelStore — bootstrap is stateless and triggered
purely by the command.
- Update /help output to include /bootstrap.
- Add 5 tests covering: text/no-text variants, Feishu streaming path,
thread creation, and help text.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: accept copilot suggestion
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(gateway): remove generated markdown on upload delete
Keep thread upload storage consistent by deleting the generated markdown companion when the original convertible upload is removed.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(harness): allow agent read access to /mnt/skills in local sandbox
Skill files under /mnt/skills/ were blocked by the path validator,
preventing agents from reading skill definitions. This change:
- Refactors `resolve_local_tool_path` into `validate_local_tool_path`,
a pure security gate that no longer resolves paths (left to the sandbox)
- Permits read-only access to the skills container path (/mnt/skills by
default, configurable via config.skills.container_path)
- Blocks write access to skills paths (PermissionError)
- Allows /mnt/skills in bash command path validation
- Adds `LocalSandbox.update_path_mappings` and injects per-thread
user-data mappings into the sandbox so all virtual-path resolution
is handled uniformly by the sandbox layer
- Covers all new behaviour with tests
Fixes#1177
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(sandbox): unify all virtual path resolution in tools.py
Move skills path resolution from LocalSandbox into tools.py so that all
virtual-to-host path translation (user-data and skills) lives in one
layer. LocalSandbox becomes a pure execution layer that receives only
real host paths — no more path_mappings, _resolve_path, or reverse
resolve logic.
This addresses architecture feedback that path resolution was split
across two layers (tools.py for user-data, LocalSandbox for skills),
making the flow hard to follow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(sandbox): address Copilot review — cache-on-success and error path masking
- Replace @lru_cache with manual cache-on-success for _get_skills_container_path
and _get_skills_host_path so transient failures at startup don't permanently
disable skills access.
- Add _sanitize_error() helper that masks host filesystem paths in error
messages via mask_local_paths_in_output before returning them to the agent.
- Apply _sanitize_error() to all catch-all (Exception/OSError) handlers in
sandbox tool functions to prevent host path leakage in error output.
- Remove unused lru_cache import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tools): add tool_search for deferred MCP tool loading
When multiple MCP servers are enabled, total tool count can exceed 30-50,
causing context bloat and degraded tool selection accuracy. This adds a
deferred tool loading mechanism controlled by `tool_search.enabled` config.
- Add ToolSearchConfig with single `enabled` field
- Add DeferredToolRegistry with regex search (select:, +keyword, keyword)
- Add tool_search tool returning OpenAI-compatible function JSON
- Add DeferredToolFilterMiddleware to hide deferred schemas from bind_tools
- Add <available-deferred-tools> section to system prompt
- Enable MCP tool_name_prefix to prevent cross-server name collisions
- Add 34 unit tests covering registry, tool, prompt, and middleware
* fix: reset stale deferred registry and bump config_version
- Reset deferred registry upfront in get_available_tools() to prevent
stale tool entries when MCP servers are disabled between calls
- Bump config_version to 2 for new tool_search config field
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tests): mock get_app_config in prompt section tests for CI
CI has no config.yaml, causing TestDeferredToolsPromptSection to fail
with FileNotFoundError. Add autouse fixture to mock get_app_config.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(harness): normalize structured content for titles
Flatten structured LangChain message content before prompting the title model so list/block payloads don't leak Python reprs into generated thread titles.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(feishu): stream updates on a single card
* fix(feishu): ensure final message on stream error and warn on missing card ID
- Wrap streaming loop in try/except/finally so a is_final=True outbound
message is always published, even when the LangGraph stream breaks
mid-way. This prevents _running_card_ids memory leaks and ensures the
Feishu card shows a DONE reaction instead of hanging on "Working on it".
- Log a warning when _ensure_running_card gets no message_id back from
the Feishu reply API, making silent fallback to new-card behavior
visible in logs.
- Add test_handle_feishu_stream_error_still_sends_final to cover the
error path.
- Reformat service.py dict comprehension (ruff format, no logic change).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Avoid blocking inbound on Feishu card creation
---------
Co-authored-by: songyaolun <songyaolun@bytedance.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add LoopDetectionMiddleware to break repetitive tool call loops
Adds a new AgentMiddleware that detects when the agent is stuck calling
the same tools with the same arguments repeatedly, which currently runs
until the recursion limit kills the run.
Detection: per-thread sliding window of tool call hashes (name + args).
- Warn threshold (default 3): injects a "wrap up" system message
- Hard limit (default 5): strips tool_calls, forcing final text output
Includes 13 unit tests covering hashing, thresholds, window sliding,
reset, and edge cases.
Closes#1055
* fix: address PR #1056 review feedback for LoopDetectionMiddleware
- Remove unused imports (Awaitable, Callable, ModelCallResult,
ModelRequest, ModelResponse, AIMessage) from loop_detection_middleware
- Remove unused pytest import from test file
- Fix _hash_tool_calls sort key: sort by (name, serialized args) for
deterministic hashing when multiple calls share the same tool name
- Revert subagent_enabled default to False in agent.py to match
DeerFlowClient and channel defaults
- Remove unrelated SearxNG tools and Next.js rewrite changes from PR
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address 2nd round review feedback on PR #1056
- Inject loop warning only once per thread (prevents context bloat)
- Add threading.Lock for thread-safe history mutations
- Use runtime.context thread_id instead of workspace_path
- Add LRU eviction for per-thread history (max 100 threads)
- Add 5 new tests covering warn-once, LRU eviction, thread isolation,
fallback thread_id, and lock presence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve lint errors in loop detection middleware tests
Sort imports (I001) and remove unused _WARNING_MSG import (F401)
to fix ruff lint failures in CI.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add MiniMax as an OpenAI-compatible model provider
MiniMax offers high-performance LLMs (M2.5, M2.5-highspeed) with
204K context windows. This commit adds MiniMax as a selectable
provider in the configuration system.
Changes:
- Add MiniMax to SUPPORTED_MODELS with model definitions
- Add MiniMax provider configuration in conf/config.yaml
- Update documentation with MiniMax setup instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update README to remove MiniMax API details
Removed mention of MiniMax API usage and configuration examples.
---------
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: preserve conversation context in Telegram private chats
In private (1-on-1) chats, set topic_id=None so all messages map to a
single DeerFlow thread per chat instead of creating a new thread for
every message. Also fix _cmd_generic to use topic_id=None in private
chats so /new correctly targets the default thread.
Group chat behavior is unchanged (reply_to or msg_id as topic_id).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: preserve conversation context in Telegram private chats
Fixes#1101
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: mirror _on_text reply logic in _cmd_generic for group chats
_cmd_generic now prefers reply_to_message.message_id over msg_id in
group/supergroup chats, consistent with _on_text. This ensures commands
like /new and /status target the correct conversation thread when sent
as a reply in group chats.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(sandbox): harden local file access and mask host paths
- enforce local sandbox file tools to only accept /mnt/user-data paths
- add path traversal checks against thread workspace/uploads/outputs roots
- preserve requested virtual paths in tool error messages (no host path leaks)
- mask local absolute paths in bash output back to virtual sandbox paths
- update bash tool guidance to prefer thread-local venv + python -m pip
- add regression tests for path mapping, masking, and access restrictions
Fixes#968
* feat(sandbox): restrict risky absolute paths in local bash commands
- validate absolute path usage in local-mode bash commands
- allow only /mnt/user-data virtual paths for user data access
- keep a small allowlist for system executable/device paths
- return clear permission errors for unsafe command paths
- add regression tests for bash path validation rules
* test(sandbox): add success path test for resolve_local_tool_path (#992)
* Initial plan
* test(sandbox): add success path test for resolve_local_tool_path
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(sandbox): reject bare virtual root early with clear error in resolve_local_tool_path (#991)
* Initial plan
* fix(sandbox): reject bare virtual root early with clear error in resolve_local_tool_path
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* fix(gateway): ignore archive metadata wrappers
Treat top-level __MACOSX and dotfile entries as packaging metadata so valid .skill archives still resolve to their real skill directory.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(gateway): allow standard skill frontmatter metadata
Accept standard optional frontmatter fields during .skill installs so external skills with version, author, or compatibility metadata do not fail validation.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* docs: sync skill installer metadata behavior
Document the skill install allowlist so user-facing and backend contributor docs match the gateway validation contract.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(gateway): normalize suggestion response content
Handle list-style model content before JSON parsing so provider wrappers do not silently drop follow-up suggestions.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* docs: sync suggestions endpoint behavior
Document the rich-content normalization path so the README and backend gateway notes stay aligned with the current router contract.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(tracing): support LANGCHAIN_* env fallback for LangSmith config
- add backward-compatible env parsing in tracing_config.py
- support fallback keys:
LANGCHAIN_TRACING_V2 / LANGCHAIN_TRACING
LANGCHAIN_API_KEY
LANGCHAIN_PROJECT
LANGCHAIN_ENDPOINT
- keep LANGSMITH_* as preferred source when both are present
- add regression tests in test_tracing_config.py
* fix(tracing): correct LANGSMITH_* precedence over LANGCHAIN_* for enabled flag (#1067)
* Initial plan
* fix(tracing): use first-present-wins logic for enabled flag, add precedence docs and test
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* fix(subagents): cleanup background tasks after completion to prevent memory leak
Added cleanup_background_task() function to remove completed subagent results
from the global _background_tasks dict. Found a small issue: completed tasks
were never removed, causing memory to grow indefinitely with each subagent
execution.
Alternative approaches considered:
- Future + SubagentHandle pattern: Not chosen due to requiring refactoring
Chose the simple cleanup approach for minimal code changes while effectively
resolving the memory leak.
Changes:
- Add cleanup_background_task() in executor.py
- Call cleanup in all task_tool return paths (completed, failed, timed out)
* fix(subagents): prevent race condition in background task cleanup
Address Copilot review feedback on memory leak fix:
- Add terminal state check in cleanup_background_task() to only remove
tasks that are COMPLETED/FAILED/TIMED_OUT or have completed_at set
- Remove cleanup call from polling safety-timeout branch in task_tool
since the task may still be running
- Add comprehensive tests for cleanup behavior including:
- Verification that cleanup is called on terminal states
- Verification that cleanup is NOT called on polling timeout
- Tests for terminal state check logic in executor
This prevents KeyError when the background executor tries to update
a task that was prematurely removed from _background_tasks.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(checkpointer): return InMemorySaver instead of None when not configured (#1016)
* fix(checkpointer): also fix get_checkpointer() to return InMemorySaver
Make all three checkpointer functions consistent:
- make_checkpointer() (async) → InMemorySaver
- checkpointer_context() (sync) → InMemorySaver
- get_checkpointer() (sync singleton) → InMemorySaver
This ensures DeerFlowClient always has a valid checkpointer.
* fix: address CI failure and Copilot review feedback
- Fix import order in test_checkpointer_none_fix.py (I001 ruff error)
- Fix type annotation: _checkpointer should be Checkpointer | None
- Update docstring: change "None if not configured" to "InMemorySaver if not configured"
- Ensure app config is loaded before checking checkpointer config to prevent incorrect InMemorySaver fallback
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add IM channels system for Feishu, Slack, and Telegram integration
Bridge external messaging platforms to DeerFlow via LangGraph Server with
async message bus, thread management, and per-channel configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review comments on IM channels system
Fix topic_id handling in store remove/list_entries and manager commands,
correct Telegram reply threading, remove unused imports/variables, update
docstrings and docs to match implementation, and prevent config mutation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* update skill creator
* fix im reply text
* fix comments
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add checkpointer configuration to config.example.yaml
- Introduced a new section for checkpointer configuration to enable state persistence for the embedded DeerFlowClient.
- Documented supported types: memory, sqlite, and postgres, along with examples for each.
- Clarified that the LangGraph Server manages its own state persistence separately.
* refactor(checkpointer): streamline checkpointer initialization and logging
* fix(uv.lock): update revision and add new wheel URLs for brotlicffi package
* feat: add langchain-anthropic dependency and update related configurations
* Fix checkpointer lifecycle, docstring, and path resolution bugs from PR #1005 review (#4)
* Initial plan
* Address all review suggestions from PR #1005
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* Fix resolve_path to always return real Path; move SQLite special-string handling to callers
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* feat: u may ask
* chore: adjust code according to CR
* chore: adjust code according to CR
* ut: test for suggestions.py
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(subagent): support async MCP tools in subagent executor
SubagentExecutor.execute() was synchronous and could not handle async-only tools like MCP tools. This caused failures when trying to use MCP tools within subagents.
Changes:
- Add _aexecute() async method using agent.astream() for async execution
- Refactor execute() to use asyncio.run() wrapping _aexecute()
- This allows subagents to use async tools (MCP) within ThreadPoolExecutor
* test(subagent): add unit tests for executor async/sync paths
Add comprehensive tests covering:
- Async _aexecute() with success/error cases
- Sync execute() wrapper using asyncio.run()
- Async tool (MCP) support verification
- Thread pool execution safety
* fix(subagent): subagent-test-circular-depend
- Use session-scoped fixture with delayed import to handle circular dependencies
without affecting other test modules
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- replace with explicit runtime deps:
- regenerate after dependency changes
- make deterministic by patching
to avoid leaked global affecting expected paths
* feat(upload): implement optimistic UI for file uploads and enhance message handling
* feat(middleware): enhance file handling by collecting historical uploads from directory
* feat(thread-title): update page title handling for new threads and improve loading state
* feat(uploads-middleware): enhance file extraction by verifying file existence in uploads directory
* feat(thread-stream): update file path reference to use virtual_path for uploads
* feat(tests): add core behaviour tests for UploadsMiddleware
* feat(tests): remove unused pytest import from test_uploads_middleware_core_logic.py
* feat: enhance file upload handling and localization support
- Update UploadsMiddleware to validate filenames more robustly.
- Modify MessageListItem to parse uploaded files from raw content for backward compatibility.
- Add localization for uploading messages in English and Chinese.
- Introduce parseUploadedFiles utility to extract uploaded files from message content.
* fix(memory): prevent file upload events from persisting in long-term memory
Uploaded files are session-scoped and unavailable in future sessions.
Previously, upload interactions were recorded in memory, causing the
agent to search for non-existent files in subsequent conversations.
Changes:
- memory_middleware: skip human messages containing <uploaded_files>
and their paired AI responses from the memory queue
- updater: post-process generated memory to strip upload mentions
before saving to file
- prompt: instruct the memory LLM to ignore file upload events
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): address Copilot review feedback on upload filtering
- memory_middleware: strip <uploaded_files> block from human messages
instead of dropping the entire turn; only skip the turn (and paired
AI response) when nothing remains after stripping
- updater: narrow the upload-scrubbing regex to explicit upload events
(avoids false-positive removal of "User works with CSV files" etc.);
also filter upload-event facts from the facts array
- prompt: move `import re` to module scope; skip upload-only human
messages (empty after stripping) rather than appending "User: "
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): allow optional words between 'upload' and 'file' in scrub regex
The previous pattern required 'uploading file' with no intervening words,
so 'uploading a test file' was not matched and leaked into long-term memory.
Allow up to 3 modifier words between the verb and noun (e.g. 'uploading a
test file', 'uploaded the attachment').
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(memory): add unit tests for upload filtering in memory pipeline
Covers _filter_messages_for_memory and _strip_upload_mentions_from_memory
per Copilot review suggestion. 15 test cases verify:
- Upload-only turns (and paired AI responses) are excluded from memory queue
- User's real question is preserved when combined with an upload block
- Upload file paths are never present in filtered message content
- Intermediate tool messages are always excluded
- Multi-turn conversations: only the upload turn is dropped
- Multimodal (list-content) human messages are handled
- Upload-event sentences are removed from summaries and facts
- Legitimate file-related facts (CSV preferences, PDF exports) are preserved
- "uploading a test file" (words between verb and noun) is caught by regex
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add agent management functionality with creation, editing, and deletion
* feat: enhance agent creation and chat experience
- Added AgentWelcome component to display agent description on new thread creation.
- Improved agent name validation with availability check during agent creation.
- Updated NewAgentPage to handle agent creation flow more effectively, including enhanced error handling and user feedback.
- Refactored chat components to streamline message handling and improve user experience.
- Introduced new bootstrap skill for personalized onboarding conversations, including detailed conversation phases and a structured SOUL.md template.
- Updated localization files to reflect new features and error messages.
- General code cleanup and optimizations across various components and hooks.
* Refactor workspace layout and agent management components
- Updated WorkspaceLayout to use useLayoutEffect for sidebar state initialization.
- Removed unused AgentFormDialog and related edit functionality from AgentCard.
- Introduced ArtifactTrigger component to manage artifact visibility.
- Enhanced ChatBox to handle artifact selection and display.
- Improved message list rendering logic to avoid loading states.
- Updated localization files to remove deprecated keys and add new translations.
- Refined hooks for local settings and thread management to improve performance and clarity.
- Added temporal awareness guidelines to deep research skill documentation.
* feat: refactor chat components and introduce thread management hooks
* feat: improve artifact file detail preview logic and clean up console logs
* feat: refactor lead agent creation logic and improve logging details
* feat: validate agent name format and enhance error handling in agent setup
* feat: simplify thread search query by removing unnecessary metadata
* feat: update query key in useDeleteThread and useRenameThread for consistency
* feat: add isMock parameter to thread and artifact handling for improved testing
* fix: reorder import of setup_agent for consistency in builtins module
* feat: append mock parameter to thread links in CaseStudySection for testing purposes
* fix: update load_agent_soul calls to use cfg.name for improved clarity
* fix: update date format in apply_prompt_template for consistency
* feat: integrate isMock parameter into artifact content loading for enhanced testing
* docs: add license section to SKILL.md for clarity and attribution
* feat(agent): enhance model resolution and agent configuration handling
* chore: remove unused import of _resolve_model_name from agents
* feat(agent): remove unused field
* fix(agent): set default value for requested_model_name in _resolve_model_name function
* feat(agent): update get_available_tools call to handle optional agent_config and improve middleware function signature
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: Add reasoning effort configuration support
* Add `reasoning_effort` parameter to model config and agent initialization
* Support reasoning effort levels (minimal/low/medium/high) for Doubao/GPT-5 models
* Add UI controls in input box for reasoning effort selection
* Update doubao-seed-1.8 example config with reasoning effort support
Fixes & Cleanup:
* Ensure UTF-8 encoding for file operations
* Remove unused imports
* fix: set reasoning_effort to None for unsupported models
* fix: unit test error
* Update frontend/src/components/workspace/input-box.tsx
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
add oauth schema to MCP server config (extensions_config.json)
support client_credentials and refresh_token grants
implement token manager with caching and pre-expiry refresh
inject OAuth Authorization header for MCP tool discovery and tool calls
extend MCP gateway config models to read/write OAuth settings
update docs and examples for OAuth configuration
add unit tests for token fetch/cache and header injection
Validate that all dict-returning client methods conform to Gateway
Pydantic response models (ModelsListResponse, ModelResponse,
SkillsListResponse, SkillResponse, SkillInstallResponse,
McpConfigResponse, UploadResponse, MemoryConfigResponse,
MemoryStatusResponse). Pydantic ValidationError in CI catches
schema drift between client and Gateway with zero production coupling.
Also includes prior review fixes: enhanced client methods, expanded
unit tests (67→77), live integration test improvements, and updated
documentation.
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Add `DeerFlowClient` class that provides direct in-process access to
DeerFlow's agent and Gateway capabilities without requiring LangGraph
Server or Gateway API processes. This enables users to import and use
DeerFlow as a Python library.
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* fix: recover from stale model context after config model changes
* fix: fail fast on missing model config and expand model resolution tests
* fix: remove duplicate get_app_config imports
* fix: align model resolution tests with runtime imports
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove duplicate model resolution test case
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(subagents): make subagent timeout configurable via config.yaml
- Add SubagentsAppConfig supporting global and per-agent timeout_seconds
- Load subagents config section in AppConfig.from_file()
- Registry now applies config.yaml overrides without mutating builtin defaults
- Polling safety-net in task_tool is now dynamic (execution timeout + 60s buffer)
- Document subagents section in config.example.yaml
- Add make test command and enforce TDD policy in CLAUDE.md
- Add 38 unit tests covering config validation, timeout resolution, registry
override behavior, and polling timeout formula
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(subagents): add logging for subagent timeout config and execution
- Log loaded timeout config (global default + per-agent overrides) on startup
- Log debug message in registry when config.yaml overrides a builtin timeout
- Include timeout in executor's async execution start log
- Log effective timeout and polling limit when a task is dispatched
- Fix UnboundLocalError: move max_poll_count assignment before logger.info
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci(backend): add lint step and run all unit tests via Makefile
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix lint
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add AioSandboxProvider for Docker-based sandbox execution with
configurable container lifecycle, volume mounts, and port management
- Add TitleMiddleware to auto-generate thread titles after first
user-assistant exchange using LLM
- Add Claude Code documentation (CLAUDE.md, AGENTS.md)
- Extend SandboxConfig with Docker-specific options (image, port, mounts)
- Fix hardcoded mount path to use expanduser
- Add agent-sandbox and dotenv dependencies
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>