579e416459
643 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
881ff71252
|
fix(harness): preserve dynamic context across summarization (#2823) | ||
|
|
f76e4e35c8
|
fix title generation with dynamic context reminder (#2830) | ||
|
|
0d1053ca44
|
fix(uploads): add Windows support for safe symlink-protected uploads (#2794)
* fix(uploads): add Windows support for safe symlink-protected uploads * fix(uploads): update tests and translate comments; |
||
|
|
4063dd7157
|
feat(debug): print presented file paths with physical resolution (#2825)
Surface artifacts produced via the present_files tool in the CLI debug REPL so headless clients without a frontend (VS Code launch configs, etc.) can locate output files. Each turn prints newly added artifacts plus their resolved host path. Works for any source that goes through present_files — ACP agents, subagents, or sandbox writes. Co-authored-by: Claude Opus 4 <noreply@anthropic.com> |
||
|
|
7a3c58a733
|
Fix duplicate gateway upload filenames (#2789) | ||
|
|
1edc9d9fae
|
chore(deps): bump langchain-core from 1.3.2 to 1.3.3 in /backend (#2807)
Bumps [langchain-core](https://github.com/langchain-ai/langchain) from 1.3.2 to 1.3.3. - [Release notes](https://github.com/langchain-ai/langchain/releases) - [Commits](https://github.com/langchain-ai/langchain/compare/langchain-core==1.3.2...langchain-core==1.3.3) --- updated-dependencies: - dependency-name: langchain-core dependency-version: 1.3.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
7caf03e97c
|
fix(packaging): add postgres extra for store/checkpointer supportFix postgres extra install guidance (#2584)
* Fix postgres extra install guidance * Fix postgres install message lint * Format postgres install messages * Fix postgres install guidance and config docs |
||
|
|
c1b7f1d189
|
feat: static system prompt with DynamicContextMiddleware for prefix-cache optimization (#2801)
* feat(middleware): inject dynamic context via DynamicContextMiddleware
Move memory and current date out of the system prompt and into a
dedicated <system-reminder> HumanMessage injected once per session
(frozen-snapshot pattern) via a new DynamicContextMiddleware.
This keeps the system prompt byte-exact across all users and sessions,
enabling maximum Anthropic/Bedrock prefix-cache reuse.
Key design decisions:
- ID-swap technique: reminder takes the first HumanMessage's ID
(replacing it in-place via add_messages), original content gets a
derived `{id}__user` ID (appended after). Preserves correct ordering.
- hide_from_ui: True on reminder messages so frontend filters them out.
- Midnight crossing: date-update reminder injected before the current
turn's HumanMessage when the conversation spans midnight.
- INFO-level logging for production diagnostics.
Also adds prompt-caching breakpoint budget enforcement tests and
updates ClaudeChatModel docs to reference the new pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(token-usage): log input/output token detail breakdown in middleware
Extend the LLM token usage log line to include input_token_details and
output_token_details (cache_creation, cache_read, reasoning, audio, etc.)
when present. Adds tests covering Anthropic cache detail logging from
both usage_metadata and response_metadata.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: fix nginx
* fix(middleware): always inject date; gate memory on injection_enabled
Date injection is now unconditional — it is part of the static system
prompt replacement and should always be present. Memory injection
remains gated by `memory.injection_enabled` in the app config.
Previously the entire DynamicContextMiddleware was skipped when
injection_enabled was False, which also suppressed the date.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): format files and correct test assertions for token usage middleware
- ruff format dynamic_context_middleware.py and test_claude_provider_prompt_caching.py
- Remove unused pytest import from test_dynamic_context_middleware.py
- Fix two tests that asserted response_metadata fallback logic that
doesn't exist: replace with tests that match actual middleware behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(middleware): address Copilot review comments on DynamicContextMiddleware
- Use additional_kwargs flag for reminder detection instead of content
substring matching, so user messages containing '<system-reminder>'
are not mistakenly treated as injected reminders
- Generate stable UUID when original HumanMessage.id is None to prevent
ambiguous 'None__user' derived IDs and message collisions
- Downgrade per-turn no-op log to DEBUG; keep actual injection events at INFO
- Add two new tests: missing-id UUID fallback and user-text false-positive
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
109490da25
|
chore(deps): bump python-multipart from 0.0.26 to 0.0.27 in /backend (#2799)
Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.26 to 0.0.27. - [Release notes](https://github.com/Kludex/python-multipart/releases) - [Changelog](https://github.com/Kludex/python-multipart/blob/main/CHANGELOG.md) - [Commits](https://github.com/Kludex/python-multipart/compare/0.0.26...0.0.27) --- updated-dependencies: - dependency-name: python-multipart dependency-version: 0.0.27 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
14c0a32ee6
|
chore(deps): bump mako from 1.3.11 to 1.3.12 in /backend (#2798)
Bumps [mako](https://github.com/sqlalchemy/mako) from 1.3.11 to 1.3.12. - [Release notes](https://github.com/sqlalchemy/mako/releases) - [Changelog](https://github.com/sqlalchemy/mako/blob/main/CHANGES) - [Commits](https://github.com/sqlalchemy/mako/commits) --- updated-dependencies: - dependency-name: mako dependency-version: 1.3.12 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
2b1fcb3e43
|
fix(task): remove max_turns parameter from task tool interface (#2783)
* fix(task): remove max_turns parameter from task tool interface Subagents should always use their configured max_turns value. Exposing this parameter allowed callers to override the admin-configured limit, which is undesirable. The value is now exclusively driven by subagent config (per-agent overrides and global defaults in config.yaml). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
7de9b5828b
|
fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning (#2774)
* fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning
Add deerflow/tools/types.py with:
Runtime = ToolRuntime[dict[str, Any], ThreadState]
Replace every runtime: ToolRuntime[ContextT, ThreadState] and
runtime: ToolRuntime[dict[str, Any], ThreadState] annotation in
sandbox/tools.py, present_file_tool.py, task_tool.py, view_image_tool.py,
and skill_manage_tool.py with the new Runtime alias.
The unbound ContextT TypeVar (default None) caused
PydanticSerializationUnexpectedValue warnings on every tool call because
LangChain's BaseTool._parse_input calls model_dump() on the auto-generated
args_schema while DeerFlow passes a dict as runtime context.
Binding the context to dict[str, Any] aligns Pydantic's serialization
expectations with reality and removes the noise from all run modes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(tools): extend Runtime alias to setup_agent and update_agent tools
Replace bare ToolRuntime annotations in setup_agent_tool.py and
update_agent_tool.py with the shared Runtime alias introduced in the
previous commit, and add both tools to the Pydantic serialization
warning regression test (13 cases total).
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(tools): loosen Pydantic warning filter to avoid version-specific format
Replace the brittle "field_name='context'" substring check with a looser
"context" match so the assertion stays valid if Pydantic changes its
internal warning format across versions.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(tools): simplify warning filter and clean up docstring
Remove the "context" substring condition from the Pydantic warning
filter — asserting that no PydanticSerializationUnexpectedValue fires
at all is both simpler and more comprehensive, since the test payload
contains only the tool's own args plus runtime.
Also update the module docstring to remove the version-specific warning
format example that was inconsistent with the looser filter.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
37db689349
|
fix(events): serialize structured db event content (#2762) | ||
|
|
bd45cb2846
|
fix(sandbox): disable msys path conversion (#2766) | ||
|
|
5fd0e6ac89
|
fix(middleware): sync raw tool call metadata (#2757) | ||
|
|
daa3ffc29b
|
feat(loop-detection): make loop detection configurable with per-tool frequency overrides (#2711)
* Make loop detection configurable Expose LoopDetectionMiddleware thresholds through config.yaml while preserving existing defaults and allowing the middleware to be disabled. Refs bytedance/deer-flow#2517 * feat(loop-detection): add per-tool tool_freq_overrides to Phase 1 Adds ToolFreqOverride model and tool_freq_overrides field to LoopDetectionConfig, wires it through LoopDetectionMiddleware, and documents the option in config.example.yaml. Resolves the gap flagged in the #2586 review: without per-tool overrides, users hit by #2510/#2511 (RNA-seq workflows exceeding the bash hard limit) had no way to raise thresholds for one tool without loosening the global limit for every tool. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * docs(loop-detection): document tool_freq_overrides in LoopDetectionMiddleware docstring Add the missing Args entry for tool_freq_overrides, explaining the (warn, hard_limit) tuple structure and how per-tool thresholds supersede the global tool_freq_warn / tool_freq_hard_limit for named tools. Also run ruff format on the three files flagged by the lint check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(loop-detection): validate LoopDetectionMiddleware __init__ params eagerly Raise clear ValueError at construction time instead of crashing at unpack-time inside _track_and_check when bad values are passed: - tool_freq_overrides: must be 2-tuples of positive ints with hard_limit >= warn - scalar thresholds: warn_threshold, hard_limit, tool_freq_warn, tool_freq_hard_limit must be >= 1 and hard limits must >= their warn pairs - window_size, max_tracked_threads must be >= 1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): isolate credential loader directory-path test from real ~/.claude The test didn't monkeypatch HOME, so on any machine with real Claude Code credentials at ~/.claude/.credentials.json the function fell through to those credentials and the assertion failed. Adding HOME redirect ensures the default credential path doesn't exist during the test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style(test): add blank lines after import pytest in TestInitValidation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(loop-detection): collapse dual validation to LoopDetectionConfig Modifications - LoopDetectionMiddleware.__init__: stripped of all ValueError raises; becomes a plain field-assignment constructor. - LoopDetectionMiddleware.from_config: classmethod that builds the middleware from a Pydantic-validated LoopDetectionConfig and handles the ToolFreqOverride -> tuple[int, int] conversion. - agents/factory.py: SDK construction routed through LoopDetectionMiddleware.from_config(LoopDetectionConfig()) so the defaults path is Pydantic-validated too. - agents/lead_agent/agent.py: uses from_config instead of unpacking config fields by hand. - tests/test_loop_detection_middleware.py: deleted TestInitValidation (16 methods exercising the removed __init__ checks); added TestFromConfig (4 tests: scalar field mapping, override tuple conversion, empty overrides, behavioral smoke test). Result: one validation layer (Pydantic), zero duplication, no __new__ hacks. Both production construction sites flow through LoopDetectionConfig. Test results make test -> 2977 passed, 18 skipped, 0 failed (137s) make format -> All checks passed; 411 files left unchanged * feat(agents): make loop_detection configurable in create_deerflow_agent Adds a `loop_detection: bool | AgentMiddleware = True` field to RuntimeFeatures, mirroring the existing pattern used by `sandbox`, `memory`, and `vision`. SDK users can now disable LoopDetectionMiddleware or replace it with a custom instance built from their own LoopDetectionConfig — e.g. `LoopDetectionMiddleware.from_config(my_cfg)` — instead of being stuck with the hardcoded defaults previously installed by the SDK factory. The lead-agent path (which already reads AppConfig.loop_detection) is unchanged, and the default `True` preserves prior always-on behavior for all existing callers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: knight0940 <631532668@qq.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Amorend <142649913+knight0940@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
cef4224381
|
fix(skills): enforce allowed-tools metadata (#2626)
* fix(skills): parse allowed-tools frontmatter * fix(skills): validate allowed-tools metadata * fix(skills): add shared allowed-tools policy * fix(subagents): enforce skill allowed-tools * fix(agent): enforce skill allowed-tools * refactor(skills): dedupe TypeVar and reuse cached enabled skills - Drop redundant module-level TypeVar in tool_policy; rely on PEP 695 syntax. - Expose get_cached_enabled_skills() and have the lead agent reuse it instead of synchronously rescanning skills on every request. * fix(agent): expose config-scoped skill cache * fix(subagents): pass filtered tools explicitly * fix(skills): clean allowed-tools policy feedback |
||
|
|
2b0e62f679
|
[security] fix(auth): reject cross-site auth POSTs (#2740)
* fix(security): reject cross-site auth posts * fix(auth): align secure cookie proxy scheme handling --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
1336872b15
|
fix(channels): authenticate gateway command requests (#2742) | ||
|
|
4ead2c6b19
|
fix(config): reset config-backed singletons on hot reload (#2588)
* Fix stale config singletons on reload * fix(config): update checkpointer imports after runtime move * Fix config reload singleton mutation on validation failure --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
59c4a3f0a4
|
feat(agent): add custom-agent self-updates with user isolation (#2713)
* feat(agent): add update_agent tool for in-chat custom-agent self-updates (#2616) Custom agents had no built-in way to persist updates to their own SOUL.md / config.yaml from a normal chat — `setup_agent` was only bound during the bootstrap flow, so when the user asked the agent to refine its description or personality, the agent would shell out via bash/write_file and the edits landed in a temporary sandbox/tool workspace instead of `{base_dir}/agents/{agent_name}/`. Changes: - New `update_agent` builtin tool with partial-update semantics (only the fields you pass are written) and atomic temp-file + os.replace writes so a failed update never corrupts existing SOUL.md / config.yaml. - Lead agent now binds `update_agent` in the non-bootstrap path whenever `agent_name` is set in the runtime context. Default agent (no agent_name) and bootstrap flow are unchanged. - New `<self_update>` system-prompt section is injected for custom agents, instructing them to use `update_agent` — and explicitly NOT bash / write_file — to persist self-updates. - Tests: 11 new cases in `tests/test_update_agent_tool.py` covering validation (missing/invalid agent_name, unknown agent, no fields), partial updates (soul-only, description-only, skills=[] vs omitted), no-op detection, atomic-write safety, and AgentConfig round-tripping; plus 2 new cases in `tests/test_lead_agent_prompt.py` covering the self-update prompt section. - Docs: updated backend/CLAUDE.md builtin tools list and tools.mdx (en/zh) with the new tool description. * feat(agent): isolate custom agents per user Store custom agent definitions under the effective user, keep legacy agents readable until migration, and cover API/tool/migration behavior with tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: consistent write/delete targets & add --user-id to migration --------- Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e8675f266d
|
fix(loop-detection): keep tool-call pairing on warn injection (#2724) (#2725)
* fix(loop-detection): keep tool-call pairing on warn injection (#2724) * make format * fix(loop-detection): avoid IMMessage leak to downstream consumer * fix(channels): filter loop warning text from IM replies |
||
|
|
680187ddc2
|
fix: Supplement list_running in RemoteSandboxBackend (#2716)
* fix: Supplement list_running in RemoteSandboxBackend * fix * except requests.RequestException as exc: * fix |
||
|
|
028493bfd8
|
fix(docker):force ngix to resolve upstream names at request time (#2717)
* fix(docker):force ngix to resolve upstream names at request time * fix(docker): set resolver valid=0s to eliminate DNS cache window for request-time re-resolution Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/07bdb872-022f-4fd2-9fa8-d800a4ce34a7 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * Update DNS resolver valid time and add upstreams * fix the unit test error * Remove upstream server configurations from nginx.conf Removed upstream server configurations for gateway and frontend. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> |
||
|
|
8e48b7e85c
|
fix(channels): preserve clarification conversation history across follow-up turns (#2444)
* fix(channels): preserve clarification conversation history across follow-up turns Pin channel-triggered runs to the root checkpoint namespace and ensure thread_id is always present in configurable run config so follow-up replies resume the same conversation state. Add regression coverage to channel tests: assert checkpoint_ns/thread_id are passed in wait and stream paths add an integration-style clarification flow test that verifies the second user reply continues prior context instead of starting a new session This addresses history loss after ask_clarification interruptions (issue #2425). * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(channels): copy configurable dict before injecting run-scoped fields When configurable was already a plain dict, _resolve_run_params mutated it in place, leaking checkpoint_ns and thread_id back into the shared session config. Always copy via dict() before mutating to prevent cross-user or cross-channel config pollution. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
d02f762ab0
|
feat: refine token usage display modes (#2329)
* feat: refine token usage display modes * docs: clarify token usage accounting semantics * fix: avoid duplicate subtask debug keys * style: format token usage tests * chore: address token attribution review feedback * Update test_token_usage_middleware.py * Update test_token_usage_middleware.py * chore: simplify token attribution fallback * fix token usage metadata follow-up handling --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
82e7936d36
|
fix(docker): set UTF-8 locale to prevent ASCII encoding errors in minimal containers (#2707)
* fix(docker): set UTF-8 locale to prevent ASCII encoding errors in minimal containers * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
f80ac961ec
|
fix(harness): restore legacy skills path fallback (#2694) (#2696)
* fix(harness): restore legacy skills path fallback (#2694) * fix(format): make format * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
44ab21fc44
|
feat(community): add Serper web search provider (#2630)
* feat(community): add Serper web search provider Add a new community search provider backed by the Serper Google Search API (https://serper.dev). Serper returns real-time Google results via a simple JSON API and requires only an API key — no extra Python package. Changes: - backend/packages/harness/deerflow/community/serper/__init__.py - backend/packages/harness/deerflow/community/serper/tools.py Implements web_search_tool using httpx (already a project dependency). API key is read from config.yaml `api_key` field or SERPER_API_KEY env var. Follows the same interface / output shape as the existing ddg_search provider. Exposes max_results parameter (default 5) with config override logic. - backend/tests/test_serper_tools.py Unit tests covering API key resolution, config overrides, HTTP errors, empty results, and parameter passing. - config.example.yaml: add commented-out Serper example alongside other providers - .env.example: add SERPER_API_KEY placeholder Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix the lint error * Fix the lint error --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
e543bbf5d6
|
[security] fix(upload): reject symlinked upload destinations (#2623)
* fix: reject symlinked upload destinations * test: harden upload destination checks * fix: address PR feedback for #2623 * test: cover safe upload re-uploads * fix: preserve upload limit checks after rebase * fix(upload): stream safe HTTP upload writes |
||
|
|
ca3332f8bf
|
fix(gateway): return ISO 8601 timestamps from threads endpoints (#2599)
* fix(gateway): return ISO 8601 timestamps from threads endpoints (#2594) ThreadResponse documents created_at / updated_at as ISO timestamps, matching the LangGraph Platform schema (langgraph_sdk.schema.Thread exposes them as datetime, JSON-encoded as ISO 8601). The gateway threads router was instead emitting str(time.time()) — unix-second floats — breaking frontend new Date() parsing and producing a mixed ISO/unix wire format that also corrupted the search sort order. Centralize timestamp generation in deerflow.utils.time: - now_iso() — datetime.now(UTC).isoformat() - coerce_iso(x) — heals legacy unix-timestamp strings on read so the store converges to ISO without a one-shot migration threads.py: replace 6 time.time() call sites with now_iso(); wrap all read paths and Phase-2 checkpoint metadata with coerce_iso(); _store_upsert opportunistically heals legacy created_at on update; drop unused time import. thread_runs.py: reuse now_iso() instead of a private duplicate _now_iso(), preventing future drift between the two timestamp call sites. Tests: 9 unit tests for the helper; 5 integration tests pinning the ISO contract for create/get/patch/search and the legacy-healing path on the internal store upsert. Full suite: 2144 passed, 15 skipped, 0 failed. Closes #2594 * fix(gateway): coerce checkpoint metadata timestamps to ISO on read After the merge with main, three additional read paths in ``threads.py`` were still emitting raw ``str(metadata.get("created_at", ""))`` — ``get_thread_state``, ``update_thread_state``, and ``get_thread_history``. Same root cause as #2594: when the checkpoint metadata's ``created_at`` is a unix-second float (legacy data, or a checkpoint written by an older Gateway version), ``str(float)`` produces ``"1777252410.411327"`` and the frontend's ``new Date(...)`` returns ``Invalid Date``. The fix on the ``/threads/{id}`` GET path was already in place; these three sibling endpoints needed the same treatment. All four call sites now flow through ``coerce_iso``, so: - legacy float metadata heals to ISO on the way out, - ISO metadata passes through unchanged, - ``datetime`` instances (which the new ``coerce_iso`` branch handles explicitly) emit with the ``T`` separator instead of falling through to the space-separated ``str(datetime)`` form. Coverage added for the two endpoints not already pinned by the merge: - ``test_get_thread_state_returns_iso_for_legacy_checkpoint_metadata`` - ``test_get_thread_history_returns_iso_for_legacy_checkpoint_metadata`` Both pre-seed a checkpoint whose metadata carries the literal float from the issue body and assert the wire format is ISO. |
||
|
|
bb8b234d85
|
chroe(2585): keep polishing the code of codex token usage (#2689) | ||
|
|
17447fccbe
|
fix(runtime): make rollback restore checkpoint supersede newer checkpoints (#2582)
* Restore rollback checkpoints with fresh ids * Tighten rollback checkpoint tests and imports * Update test_run_worker_rollback.py --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
866d1ca409
|
Populate Codex usage metadata for token accounting (#2585) | ||
|
|
8ba01dfd83
|
refactor: thread app_config through lead and subagent task path (#2666)
* refactor: thread app config through lead prompt * fix: honor explicit app config across runtime paths * style: format subagent executor tests * fix: thread resolved app config and guard subagents-only fallback Address two PR review findings: 1. _create_summarization_middleware passed the original (possibly None) app_config into create_chat_model, forcing the model factory back to ambient get_app_config() and risking config drift between the middleware's resolved view and the model's view. Pass the resolved AppConfig instance through end-to-end. 2. get_available_subagent_names accepted Any-typed config and forwarded it to is_host_bash_allowed, which reads ``.sandbox``. A SubagentsAppConfig (also accepted upstream as a sum-type input) has no ``.sandbox`` attribute and would be silently treated as "no sandbox configured", incorrectly disabling the bash subagent. Guard on hasattr and fall back to ambient lookup otherwise. Adds regression tests for both paths. * chore: simplify hasattr guard and tighten regression tests - Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant. - Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent). - Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison. --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> |
||
|
|
189b82405c
|
fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination (#2685)
* fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination The agent_sandbox library's shell API defaults no_change_timeout to 120 seconds. When AioSandbox.execute_command() called exec_command() without this parameter, commands producing no output for 120s would return with NO_CHANGE_TIMEOUT status even though the script was still running. Pass no_change_timeout=600 to all exec_command calls (matching the client-level HTTP timeout) so long-running commands are not cut short. Fixes #2668 * test(sandbox): add assertions for no_change_timeout in execute_command and list_dir Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2f37bc72-0826-4443-a6ba-e5b78c22fb5a Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> |
||
|
|
487c1d939f
|
fix(subagents): use model override for tools and middleware (#2641)
* fix(subagents): use model override for tools and middleware * fix(config): resolve effective subagent model * fix(subagents): defer app config loading * fix(subagents): fully defer config.yaml load in executor __init__ The previous attempt only relocated the explicit get_app_config() call, but left resolve_subagent_model_name(...) running eagerly in __init__. That helper has its own internal get_app_config() fallback, which still fired when both app_config and parent_model were None and config.model == "inherit" — exactly the path unit tests hit, breaking 21 tests in CI with FileNotFoundError: config.yaml. Skip the eager resolve in __init__ when it would require loading the config file, and defer to _create_agent (which already has the app_config or get_app_config() fallback). |
||
|
|
c09c334544
|
fix(harness): resolve runtime paths from project root (#2642)
* fix(harness): resolve runtime paths from project root * docs(config): update * fix(config): address runtime path review feedback * test(config): fix skills path e2e root * test(config): cover legacy config fallback when project root lacks config files Verifies that when DEER_FLOW_PROJECT_ROOT is unset and cwd has no config.yaml/extensions_config.json, AppConfig and ExtensionsConfig fall back to the legacy backend/repo-root candidates — the backward-compat path requested in PR #2642 review. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
8939ccaed2
|
fix(uploads): enforce streaming upload limits in gateway (#2589)
* fix: enforce gateway upload limits * fix: acquire sandbox before upload writes * Fix upload limit config wiring * Sanitize upload size error filenames * test: call upload routes unwrapped * fix: guard upload limits endpoint --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
83938cf35a
|
fix(subagents): propagate user context across threaded execution (#2676) | ||
|
|
78633c69ac
|
fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2679)
* fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2677) When creating a custom agent via the web UI, SOUL.md was always written to the global base_dir/SOUL.md instead of agents/<name>/SOUL.md. Root cause: the bootstrap flow sends agent_name via body.context, but two layers were broken: 1. services.py only forwarded body.context keys into config["configurable"]; config["context"] was never populated. 2. worker.py constructed the parent Runtime with a hard-coded {thread_id, run_id} context, ignoring config["context"] entirely. After the langgraph >= 1.1.9 bump (#98a5b34f), ToolRuntime.context no longer falls back to configurable, so setup_agent's runtime.context.get("agent_name") returned None and the tool's silent agent_name=None -> base_dir fallback kicked in, overwriting the global SOUL.md. Fix: - services.py: extract merge_run_context_overrides() and write the whitelisted context keys into both configurable (legacy readers) and context (langgraph 1.1+ ToolRuntime consumers). - worker.py: extract _build_runtime_context() and merge config["context"] into the Runtime's context (without letting callers override thread_id/run_id). The base_dir fallback in setup_agent_tool.py is left in place because the IM /bootstrap channel command depends on it. That code path can be tightened in a follow-up. Adds regression tests covering both helpers. * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
8b61c94e1d
|
fix: keep lead agent graph factory signature compatible (#2678)
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> |
||
|
|
1ad1420e31
|
refactor(skills): Unified skill storage capability (#2613) | ||
|
|
eba3b9e18d
|
fix(config): unify log_level from config.yaml across Gateway and debug entry points (#2601)
Centralize log level parsing in `logging_level_from_config()` and application in `apply_logging_level()` within `deerflow.config.app_config`. - Gateway lifespan applies configured log level on startup - `debug.py` uses shared helpers instead of local duplicates - `apply_logging_level()` targets only `deerflow`/`app` logger hierarchies so third-party library verbosity is not affected; root handler levels are only lowered (never raised) to allow configured loggers through without suppressing third-party output; root logger level is not modified - Config field description updated to clarify scope - Tests save/restore global logging state to avoid test pollution Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
c0da278269
|
fix(memory): replace short-lived asyncio.run() with persistent event loop (#2627)
* fix(memory): replace short-lived asyncio.run() with persistent event loop to prevent zombie httpx connections The memory updater used asyncio.run() inside daemon threads, creating and destroying short-lived event loops on every update. Langchain providers (e.g. langchain-anthropic) cache httpx AsyncClient instances globally via @lru_cache, so SSL connections created on a loop that is subsequently destroyed become zombie connections in the shared pool. When the main agent's lead run later reuses one of these connections, httpx/anyio triggers RuntimeError: Event loop is closed during connection cleanup. Replace the ThreadPoolExecutor + asyncio.run() pattern with a _MemoryLoopRunner that maintains a single persistent event loop in a daemon thread for the process lifetime. Since the loop never closes, connections bound to it never become invalid. The _run_async_update_sync function now submits coroutines to this persistent loop via run_coroutine_threadsafe instead of creating throwaway loops. * update the code to address the review comments * Fix the review comments of 2615 P1 — user_id forwarded through sync path: Added user_id parameter to _prepare_update_prompt, _finalize_update, and _do_update_memory_sync, and forwarded it to get_memory_data(agent_name, user_id=user_id) and save(..., user_id=user_id). The update_memory() entry point now passes user_id through both the executor.submit path and the direct call path. Added TestUserIdForwarding with two regression tests (sync + async) verifying get_memory_data and save receive the correct user_id. P2 — aupdate_memory() delegates to sync: Replaced the model.ainvoke() call with asyncio.to_thread(self._do_update_memory_sync, ...). This eliminates the unsafe async provider client path entirely — all memory updater entry points now use the isolated sync model.invoke() path. Updated the test from asserting ainvoke is awaited to asserting invoke is called and ainvoke is not. Nit — duplicate comment removed: Removed the duplicated # Matches sentences... comment on line 230. * Chore(test): update the code of test_memory_updater --------- Co-authored-by: rayhpeng <rayhpeng@gmail.com> |
||
|
|
7dea1666ce
|
fix: avoid temporary event loops in async subagent execution (#2414)
* fix: avoid temporary event loops in async subagent execution * Rename isolated subagent loop globals * Harden isolated subagent loop shutdown and logging * Sort subagent executor imports * Format subagent executor * Remove isolated loop pool from subagent executor * Format subagent executor cleanup --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
38714b6ceb
|
refactor: thread app_config through middleware factories (#2652)
* refactor: thread app_config through middleware factories Continues the incremental config-refactor sequence (#2611 root, #2612 lead path) one layer deeper into the middleware factories. Two ambient lookups inside _build_runtime_middlewares are eliminated and the LLMErrorHandling band-aid removed: - _build_runtime_middlewares / build_lead_runtime_middlewares / build_subagent_runtime_middlewares now require app_config: AppConfig. - get_guardrails_config() inside the factory is replaced with app_config.guardrails (semantically identical — same default-factory GuardrailsConfig — verified by direct equality check). - LLMErrorHandlingMiddleware.__init__ now requires app_config and reads circuit_breaker fields directly. The class-level circuit_failure_threshold / circuit_recovery_timeout_sec defaults are removed along with the try/except (FileNotFoundError, RuntimeError): pass band-aid — the let-it-crash invariant the rest of the refactor enforces. Caller chain (already-resolved app_config sources): - _build_middlewares in lead_agent/agent.py: reorder so resolved_app_config = app_config or get_app_config() is computed BEFORE build_lead_runtime_middlewares is called, then passed as kwarg. - SubagentExecutor: optional app_config parameter (mirrors the lead-agent pattern); _create_agent does the same `or get_app_config()` fallback at agent-build time, so task_tool callers don't need to plumb app_config through yet (typed-context plumbing for tool runtimes is a separate refactor). Tests: - test_llm_error_handling_middleware: _make_app_config helper using AppConfig(sandbox=SandboxConfig(use="test")) — same minimal-config pattern conftest already uses. Three direct LLMErrorHandlingMiddleware() calls each followed by post-construction circuit_breaker mutation fold cleanly into _build_middleware(circuit_failure_threshold=..., circuit_recovery_timeout_sec=...). Verification: - tests/test_llm_error_handling_middleware.py — 14 passed - tests/test_subagent_executor.py — 28 passed - tests/test_tool_error_handling_middleware.py — 6 passed - tests/test_task_tool_core_logic.py — 18 passed (verifies task_tool unchanged behavior) - Full suite: 2697 passed, 3 skipped. The single intermittent failure in tests/test_client_e2e.py::test_tool_call_produces_events is pre-existing LLM flakiness (the test asserts the model decided to call a tool; reproduces 1/3 on unchanged main as well). * fix: address middleware app config review comments * fix: satisfy app config annotation lint * test: cover explicit app config middleware wiring --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> |
||
|
|
74081a85a6
|
[security] fix(sandbox): bind local Docker ports to loopback (#2633)
* fix(sandbox): bind local Docker ports to loopback * fix(sandbox): preserve IPv6 loopback Docker binds * fix(sandbox): log Docker bind host selection |
||
|
|
08afdcb907
|
feat(channels): add DingTalk channel integration (#2628)
* feat(channels): add DingTalk channel integration Add a new DingTalk messaging channel using the dingtalk-stream SDK with Stream Push (WebSocket), requiring no public IP. Supports both plain sampleMarkdown replies and optional AI Card streaming for a typewriter effect when card_template_id is configured. - Add DingTalkChannel implementation with token management, message routing, allowed_users filtering, and markdown adaptation - Register dingtalk in channel service registry and capability map - Propagate inbound metadata to outbound messages in ChannelManager for DingTalk sender context (sender_staff_id, conversation_type) - Add dingtalk-stream dependency to pyproject.toml - Add configuration examples in config.example.yaml and .env.example - Update all README translations with setup instructions - Add comprehensive test suite (test_dingtalk_channel.py) and metadata propagation test in test_channels.py - Update backend CLAUDE.md to document DingTalk channel Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(channels): address PR review feedback for DingTalk integration - Replace runtime mutation of CHANNEL_CAPABILITIES with a `supports_streaming` property on the Channel base class, overridden by DingTalkChannel, FeishuChannel, and WeComChannel - Store stream client reference and attempt graceful disconnect in stop(); guard _on_chatbot_message with _running check to prevent post-stop message processing - Use msg.chat_id as the primary routing key in send/send_file via a shared _resolve_routing helper, with metadata as fallback - Fix process() return type annotation from tuple[str, str] to tuple[int, str] to match AckMessage.STATUS_OK - Protect _incoming_messages with threading.Lock for cross-thread safety between the Stream Push thread and the asyncio loop - Re-add Docker Compose URL guidance removed during DingTalk setup docs addition in README.md - Fix incomplete sentence in README_zh.md (missing verb "启用") Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(docs): restore plain paragraph format for Docker Compose note Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(channels): fix isinstance TypeError and add file size guard in DingTalk channel Use tuple syntax for isinstance() type check to avoid runtime TypeError with PEP 604 union types. Add upload size limit (20MB) before reading files into memory. Narrow exception handlers to specific types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(channels): propagate markdown fallback errors and validate access token response - Re-raise exceptions in _send_markdown_fallback to prevent partial deliveries (files sent without accompanying text) - Validate _get_access_token response: reject non-dict bodies, empty tokens, and coerce invalid expireIn to a safe default - Add tests for both fixes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(channels): validate upload response and broaden send_file exception handling - Validate _upload_media JSON response: handle JSONDecodeError and non-dict payloads gracefully by returning None - Broaden send_file exception tuple to include TypeError and AttributeError for unexpected JSON shapes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(channels): fix streaming race on channel registration and slim outbound metadata - Register channel in service before calling start() to avoid race where background receiver publishes inbound before registration, causing manager to fall back to static CHANNEL_CAPABILITIES - Strip known-large metadata keys (raw_message, ref_msg) from outbound messages to prevent memory bloat from propagated inbound payloads Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update service.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update CLAUDE.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
0691c4dda3
|
fix(security): allow disabling API docs in production via GATEWAY_ENABLE_DOCS (#2651)
* fix(security): allow disabling API docs in production via GATEWAY_ENABLE_DOCS Expose /docs, /redoc, and /openapi.json only when GATEWAY_ENABLE_DOCS=true (default). Setting GATEWAY_ENABLE_DOCS=false disables all three endpoints, preventing unauthorized API surface discovery in production deployments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(security): add unit tests and docs for GATEWAY_ENABLE_DOCS Add 7 tests covering default behavior, env var parsing (case-insensitive, fail-closed), endpoint visibility, and health endpoint independence. Update CONFIGURATION.md and CLAUDE.md with the new toggle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style(security): apply ruff formatting to gateway app.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
11afd32459 | Fix the log Injection error of skills.py | ||
|
|
64f4dc1639 | fixed the CI build errors | ||
|
|
844ad8e528
|
Merge branch 'main' into release/2.0-rc | ||
|
|
395c14357b
|
chore(adpator):Adapt MindIE engine model and improve testing and fixes (#2523)
* feat(models): 适配 MindIE引擎的模型 * test: add unit tests for MindIEChatModel adapter and fix PR review comments * chore: update uv.lock with pytest-asyncio * build: add pytest-asyncio to test dependencies * fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex) * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * test_mindie_provider:format * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(mindie): prevent nested tool_call params from leaking into outer args * fixed by escaping XML entities in _fix_messages and unescaping during parse, with regression tests added. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
e82940c03d
|
refactor: thread release config through lead path (#2612)
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> |
||
|
|
6bd88fe14c
|
fix(sandbox): block host bash traversal escapes (#2560)
* fix(sandbox): block host bash traversal escapes Fixes #2535 * fix(sandbox): harden local bash path guards * fix(sandbox): avoid bash cd argument false positives * Fix the lint error Add function to resolve and validate user data path. * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
39c5da94f3
|
fix(sandbox): prevent local custom mount symlink escapes (#2558)
* fix(sandbox): prevent local custom mount symlink escapes Fixes #2506 * fix(sandbox): harden custom mount symlink handling * fix(sandbox): format internal symlink directory listings |
||
|
|
707ed328dd
|
fix(skills): scan skill archives before install (#2561)
* fix(skills): scan skill archives before install Fixes #2536 * fix(skills): scan archive support files before install * style(skills): format archive installer * fix(skills): address archive install review comments |
||
|
|
f7dfb88a30
|
fix(aio-sandbox): redact env values in container logs (#2562)
* fix(aio-sandbox): redact env values in container logs Fixes #2534 * fix(aio-sandbox): address env log review comments |
||
|
|
69649d8aae
|
Fix the issues when reviewing 2566 persistant part (#2604)
* Fix the code review command of journal & event store P0,P1 issues * Fix the code review command of journal & event store P2 issues * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/packages/harness/deerflow/runtime/journal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Refactor logger debug message formatting --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
4e4e4f92a0
|
fix(security): harden auth system and fix run journal logic bug (#2593)
* fix(security): harden auth system and fix run journal logic bug
- Fix inverted condition in RunJournal.on_chat_model_start that prevented
first human message capture (not messages → messages)
- Pre-hash passwords with SHA-256 before bcrypt to avoid silent 72-byte
truncation vulnerability
- Move load_dotenv() from module scope into get_auth_config() to prevent
import-time os.environ mutation breaking test isolation
- Return generic ‘Invalid token’ instead of exposing specific error
variants (expired, malformed, invalid_signature) to clients
- Make @require_auth independently enforce 401 instead of silently
passing through when AuthMiddleware is absent
- Rate-limit /setup-status endpoint with per-IP cooldown to mitigate
initialization-state information leak
- Document in-process rate limiter limitation for multi-worker deployments
* fix(security): return 429+Retry-After on setup-status rate limit, bound cooldown dict
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/070d0be8-99a5-46c8-85bb-6b81b5284021
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(security): add versioned password hashes with auto-migration on login
The SHA-256 pre-hash change silently broke verification for any existing
bcrypt-only password hashes. Introduce a <N>$ prefix scheme so hashes
are self-describing:
- v2 (current): bcrypt(b64(sha256(password))) with $ prefix
- v1 (legacy): plain bcrypt, prefixed $ or bare (no prefix)
verify_password auto-detects the version and falls back to v1 for older
hashes. LocalAuthProvider.authenticate() now rehashes legacy hashes to v2
on successful login via needs_rehash(), so existing users upgrade
transparently without a dedicated migration step.
* fix(auth): harden verify_password, best-effort rehash, update require_auth docstring, downgrade journal logging
- password.py: wrap bcrypt.checkpw in try/except → return False for malformed/corrupt hashes instead of crashing
- local_provider.py: wrap auto-rehash update_user() in try/except so transient DB errors don't fail valid logins
- authz.py: update require_auth docstring to reflect independent 401 enforcement
- journal.py: downgrade on_chat_model_start from INFO to DEBUG, log only metadata (batch_count, message_counts) instead of full serialized/messages content
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/48c5cf31-a4ab-418a-982a-6343c37bb299
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(auth): address code review - narrow ValueError catch, add rehash warning log, rename num_batches
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/48c5cf31-a4ab-418a-982a-6343c37bb299
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
|
||
|
|
af8c0cfb78
|
fix(harness): constrain view_image to thread data paths (#2557)
* fix(harness): constrain view_image to thread data paths Fixes #2530 * fix(harness): address view_image review findings * style(harness): format view_image changes * fix(harness): address view_image review comments |
||
|
|
b8bc4826d8
|
refactor: root release config in gateway runtime (#2611)
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> |
||
|
|
ed9ebfac4d | fix: enforce 'request' parameter requirement in require_auth decorator | ||
|
|
da174dfd4d | feat: implement process-local internal authentication for Gateway and enhance CSRF handling | ||
|
|
897dae5475 | fix the lint error of backend | ||
|
|
eba6c0eab2 | fix unit tests of test_upload_files and test_shutdown | ||
|
|
60754f0c50 | fix the unit tests error of agent provider | ||
|
|
ac18b9c424 | Apply the code reviewer suggestion of abstractmethod | ||
|
|
35ef8b7c13 | feat: add default database configuration for AppConfig and update example config | ||
|
|
7bf618de67 |
Refactor DeerFlow to use Gateway's LangGraph-compatible API
- Updated documentation and comments to reflect the transition from LangGraph Server to Gateway. - Changed default URLs in ChannelManager and tests to point to Gateway. - Removed references to LangGraph Server in deployment scripts and configurations. - Updated Nginx configuration to route API traffic to Gateway. - Adjusted frontend configurations to utilize Gateway's API. - Removed LangGraph service from Docker Compose files, consolidating services under Gateway. - Added regression tests to ensure Gateway integration works as expected. Co-authored-by: Copilot <copilot@github.com> |
||
|
|
653b7ae17a | Apply the code reviewer suggestion of abstractmethod | ||
|
|
16aedf459a
|
Potential fix for pull request finding 'Unused import'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> |
||
|
|
c5d57b4533
|
fix: resolve make dev and test-e2e errors (#2570) | ||
|
|
e4ff444a71 | Fixed the warning message of uv | ||
|
|
829e82a9af | fix the lint error in backend | ||
|
|
3b71e2d377 |
feat: add request parameter to generate_suggestions endpoint for enhanced context
Co-authored-by: Copilot <copilot@github.com> |
||
|
|
98a5b34f76 | fix: resolve merge conflict in pnpm-lock.yaml and clean up better-auth dependencies | ||
|
|
db5ad86381 |
feat: enhance chat history loading with new hooks and UI components (#2338)
* Refactor API fetch calls to use a unified fetch function; enhance chat history loading with new hooks and UI components - Replaced `fetchWithAuth` with a generic `fetch` function across various API modules for consistency. - Updated `useThreadStream` and `useThreadHistory` hooks to manage chat history loading, including loading states and pagination. - Introduced `LoadMoreHistoryIndicator` component for better user experience when loading more chat history. - Enhanced message handling in `MessageList` to accommodate new loading states and history management. - Added support for run messages in the thread context, improving the overall message handling logic. - Updated translations for loading indicators in English and Chinese. * Fix test assertions for run ordering in RunManager tests - Updated assertions in `test_list_by_thread` to reflect correct ordering of runs. - Modified `test_list_by_thread_is_stable_when_timestamps_tie` to ensure stable ordering when timestamps are tied. |
||
|
|
2e05f380c4 |
feat(persistence): per-user filesystem isolation, run-scoped APIs, and state/history simplification (#2153)
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930) * feat(persistence): add SQLAlchemy 2.0 async ORM scaffold Introduce a unified database configuration (DatabaseConfig) that controls both the LangGraph checkpointer and the DeerFlow application persistence layer from a single `database:` config section. New modules: - deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends - deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton - deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation Gateway integration initializes/tears down the persistence engine in the existing langgraph_runtime() context manager. Legacy checkpointer config is preserved for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add RunEventStore ABC + MemoryRunEventStore Phase 2-A prerequisite for event storage: adds the unified run event stream interface (RunEventStore) with an in-memory implementation, RunEventsConfig, gateway integration, and comprehensive tests (27 cases). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints Phase 2-B: run persistence + event storage + token tracking. - ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow - RunRepository implements RunStore ABC via SQLAlchemy ORM - ThreadMetaRepository with owner access control - DbRunEventStore with trace content truncation and cursor pagination - JsonlRunEventStore with per-run files and seq recovery from disk - RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events, accumulates token usage by caller type, buffers and flushes to store - RunManager now accepts optional RunStore for persistent backing - Worker creates RunJournal, writes human_message, injects callbacks - Gateway deps use factory functions (RunRepository when DB available) - New endpoints: messages, run messages, run events, token-usage - ThreadCreateRequest gains assistant_id field - 92 tests pass (33 new), zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add user feedback + follow-up run association Phase 2-C: feedback and follow-up tracking. - FeedbackRow ORM model (rating +1/-1, optional message_id, comment) - FeedbackRepository with CRUD, list_by_run/thread, aggregate stats - Feedback API endpoints: create, list, stats, delete - follow_up_to_run_id in RunCreateRequest (explicit or auto-detected from latest successful run on the thread) - Worker writes follow_up_to_run_id into human_message event metadata - Gateway deps: feedback_repo factory + getter - 17 new tests (14 FeedbackRepository + 3 follow-up association) - 109 total tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config - config.example.yaml: deprecate standalone checkpointer section, activate unified database:sqlite as default (drives both checkpointer + app data) - New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage including check_access owner logic, list_by_owner pagination - Extended test_run_repository.py (+4 tests) — completion preserves fields, list ordering desc, limit, owner_none returns all - Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false, middleware no ai_message, unknown caller tokens, convenience fields, tool_error, non-summarization custom event - Extended test_run_event_store.py (+7 tests) — DB batch seq continuity, make_run_event_store factory (memory/db/jsonl/fallback/unknown) - Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists, follow-up metadata, summarization in history, full DB-backed lifecycle - Fixed DB integration test to use proper fake objects (not MagicMock) for JSON-serializable metadata - 157 total Phase 2 tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * config: move default sqlite_dir to .deer-flow/data Keep SQLite databases alongside other DeerFlow-managed data (threads, memory) under the .deer-flow/ directory instead of a top-level ./data folder. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now() - Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM models. Add json_serializer=json.dumps(ensure_ascii=False) to all create_async_engine calls so non-ASCII text (Chinese etc.) is stored as-is in both SQLite and Postgres. - Change ORM datetime defaults from datetime.now(UTC) to datetime.now(), remove UTC imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): simplify deps.py with getter factory + inline repos - Replace 6 identical getter functions with _require() factory. - Inline 3 _make_*_repo() factories into langgraph_runtime(), call get_session_factory() once instead of 3 times. - Add thread_meta upsert in start_run (services.py). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(docker): add UV_EXTRAS build arg for optional dependencies Support installing optional dependency groups (e.g. postgres) at Docker build time via UV_EXTRAS build arg: UV_EXTRAS=postgres docker compose build Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(journal): fix flush, token tracking, and consolidate tests RunJournal fixes: - _flush_sync: retain events in buffer when no event loop instead of dropping them; worker's finally block flushes via async flush(). - on_llm_end: add tool_calls filter and caller=="lead_agent" guard for ai_message events; mark message IDs for dedup with record_llm_usage. - worker.py: persist completion data (tokens, message count) to RunStore in finally block. Model factory: - Auto-inject stream_usage=True for BaseChatOpenAI subclasses with custom api_base, so usage_metadata is populated in streaming responses. Test consolidation: - Delete test_phase2b_integration.py (redundant with existing tests). - Move DB-backed lifecycle test into test_run_journal.py. - Add tests for stream_usage injection in test_model_factory.py. - Clean up executor/task_tool dead journal references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): widen content type to str|dict in all store backends Allow event content to be a dict (for structured OpenAI-format messages) in addition to plain strings. Dict values are JSON-serialized for the DB backend and deserialized on read; memory and JSONL backends handle dicts natively. Trace truncation now serializes dicts to JSON before measuring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(events): use metadata flag instead of heuristic for dict content detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(converters): add LangChain-to-OpenAI message format converters Pure functions langchain_to_openai_message, langchain_to_openai_completion, langchain_messages_to_openai, and _infer_finish_reason for converting LangChain BaseMessage objects to OpenAI Chat Completions format, used by RunJournal for event storage. 15 unit tests added. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(converters): handle empty list content as null, clean up test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): human_message content uses OpenAI user message format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): ai_message uses OpenAI format, add ai_tool_call message event - ai_message content now uses {"role": "assistant", "content": "..."} format - New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls - ai_tool_call uses langchain_to_openai_message converter for consistent format - Both events include finish_reason in metadata ("stop" or "tool_calls") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add tool_result message event with OpenAI tool message format Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end, then emit a tool_result message event (role=tool, tool_call_id, content) after each successful tool completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): summary content uses OpenAI system message format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format Add on_chat_model_start to capture structured prompt messages as llm_request events. Replace llm_end trace events with llm_response using OpenAI Chat Completions format. Track llm_call_index to pair request/response events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add record_middleware method for middleware trace events Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(events): add full run sequence integration test for OpenAI content format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): align message events with checkpoint format and add middleware tag injection - Message events (ai_message, ai_tool_call, tool_result, human_message) now use BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages - on_tool_end extracts tool_call_id/name/status from ToolMessage objects - on_tool_error now emits tool_result message events with error status - record_middleware uses middleware:{tag} event_type and middleware category - Summarization custom events use middleware:summarize category - TitleMiddleware injects middleware:title tag via get_config() inheritance - SummarizationMiddleware model bound with middleware:summarize tag - Worker writes human_message using HumanMessage.model_dump() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): switch search endpoint to threads_meta table and sync title - POST /api/threads/search now queries threads_meta table directly, removing the two-phase Store + Checkpointer scan approach - Add ThreadMetaRepository.search() with metadata/status filters - Add ThreadMetaRepository.update_display_name() for title sync - Worker syncs checkpoint title to threads_meta.display_name on run completion - Map display_name to values.title in search response for API compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): history endpoint reads messages from event store - POST /api/threads/{thread_id}/history now combines two data sources: checkpointer for checkpoint_id, metadata, title, thread_data; event store for messages (complete history, not truncated by summarization) - Strip internal LangGraph metadata keys from response - Remove full channel_values serialization in favor of selective fields Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate optional-dependencies header in pyproject.toml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(middleware): pass tagged config to TitleMiddleware ainvoke call Without the config, the middleware:title tag was not injected, causing the LLM response to be recorded as a lead_agent ai_message in run_events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflict in .env.example Keep both DATABASE_URL (from persistence-scaffold) and WECOM credentials (from main) after the merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address review feedback on PR #1851 - Fix naive datetime.now() → datetime.now(UTC) in all ORM models - Fix seq race condition in DbRunEventStore.put() with FOR UPDATE and UNIQUE(thread_id, seq) constraint - Encapsulate _store access in RunManager.update_run_completion() - Deduplicate _store.put() logic in RunManager via _persist_to_store() - Add update_run_completion to RunStore ABC + MemoryRunStore - Wire follow_up_to_run_id through the full create path - Add error recovery to RunJournal._flush_sync() lost-event scenario - Add migration note for search_threads breaking change - Fix test_checkpointer_none_fix mock to set database=None Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update uv.lock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality Bug fixes: - Sanitize log params to prevent log injection (CodeQL) - Reset threads_meta.status to idle/error when run completes - Attach messages only to latest checkpoint in /history response - Write threads_meta on POST /threads so new threads appear in search Lint fixes: - Remove unused imports (journal.py, migrations/env.py, test_converters.py) - Convert lambda to named function (engine.py, Ruff E731) - Remove unused logger definitions in repos (Ruff F841) - Add logging to JSONL decode errors and empty except blocks - Separate assert side-effects in tests (CodeQL) - Remove unused local variables in tests (Ruff F841) - Fix max_trace_content truncation to use byte length, not char length Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: apply ruff format to persistence and runtime files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding 'Statement has no effect' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * refactor(runtime): introduce RunContext to reduce run_agent parameter bloat Extract checkpointer, store, event_store, run_events_config, thread_meta_repo, and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context() in deps.py to build the base context from app.state singletons. start_run() uses dataclasses.replace() to enrich per-run fields before passing ctx to run_agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): move sanitize_log_param to app/gateway/utils.py Extract the log-injection sanitizer from routers/threads.py into a shared utils module and rename to sanitize_log_param (public API). Eliminates the reverse service → router import in services.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: use SQL aggregation for feedback stats and thread token usage Replace Python-side counting in FeedbackRepository.aggregate_by_run with a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread abstract method with SQL GROUP BY implementation in RunRepository and Python fallback in MemoryRunStore. Simplify the thread_token_usage endpoint to delegate to the new method, eliminating the limit=10000 truncation risk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: annotate DbRunEventStore.put() as low-frequency path Add docstring clarifying that put() opens a per-call transaction with FOR UPDATE and should only be used for infrequent writes (currently just the initial human_message event). High-throughput callers should use put_batch() instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(threads): fall back to Store search when ThreadMetaRepository is unavailable When database.backend=memory (default) or no SQL session factory is configured, search_threads now queries the LangGraph Store instead of returning 503. Returns empty list if neither Store nor repo is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata Add ThreadMetaStore abstract base class with create/get/search/update/delete interface. ThreadMetaRepository (SQL) now inherits from it. New MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments. deps.py now always provides a non-None thread_meta_repo, eliminating all `if thread_meta_repo is not None` guards in services.py, worker.py, and routers/threads.py. search_threads no longer needs a Store fallback branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(history): read messages from checkpointer instead of RunEventStore The /history endpoint now reads messages directly from the checkpointer's channel_values (the authoritative source) instead of querying RunEventStore.list_messages(). The RunEventStore API is preserved for other consumers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address new Copilot review comments - feedback.py: validate thread_id/run_id before deleting feedback - jsonl.py: add path traversal protection with ID validation - run_repo.py: parse `before` to datetime for PostgreSQL compat - thread_meta_repo.py: fix pagination when metadata filter is active - database_config.py: use resolve_path for sqlite_dir consistency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Implement skill self-evolution and skill_manage flow (#1874) * chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> * fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir resolve_path() resolves relative to Paths.base_dir (.deer-flow), which double-nested the path to .deer-flow/.deer-flow/data/app.db. Use Path.resolve() (CWD-relative) instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Feature/feishu receive file (#1608) * feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). * style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling * fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915) Two production docker-compose.yaml bugs prevent `make up` from working: 1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH environment overrides. Added in |
||
|
|
00a90bbd3d | refactor: Remove init_token handling from admin initialization logic and related tests | ||
|
|
56d5fa3337 |
feat(persistence):Unified persistence layer with event store, feedback, and rebase cleanup (#2134)
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930) * feat(persistence): add SQLAlchemy 2.0 async ORM scaffold Introduce a unified database configuration (DatabaseConfig) that controls both the LangGraph checkpointer and the DeerFlow application persistence layer from a single `database:` config section. New modules: - deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends - deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton - deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation Gateway integration initializes/tears down the persistence engine in the existing langgraph_runtime() context manager. Legacy checkpointer config is preserved for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add RunEventStore ABC + MemoryRunEventStore Phase 2-A prerequisite for event storage: adds the unified run event stream interface (RunEventStore) with an in-memory implementation, RunEventsConfig, gateway integration, and comprehensive tests (27 cases). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints Phase 2-B: run persistence + event storage + token tracking. - ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow - RunRepository implements RunStore ABC via SQLAlchemy ORM - ThreadMetaRepository with owner access control - DbRunEventStore with trace content truncation and cursor pagination - JsonlRunEventStore with per-run files and seq recovery from disk - RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events, accumulates token usage by caller type, buffers and flushes to store - RunManager now accepts optional RunStore for persistent backing - Worker creates RunJournal, writes human_message, injects callbacks - Gateway deps use factory functions (RunRepository when DB available) - New endpoints: messages, run messages, run events, token-usage - ThreadCreateRequest gains assistant_id field - 92 tests pass (33 new), zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add user feedback + follow-up run association Phase 2-C: feedback and follow-up tracking. - FeedbackRow ORM model (rating +1/-1, optional message_id, comment) - FeedbackRepository with CRUD, list_by_run/thread, aggregate stats - Feedback API endpoints: create, list, stats, delete - follow_up_to_run_id in RunCreateRequest (explicit or auto-detected from latest successful run on the thread) - Worker writes follow_up_to_run_id into human_message event metadata - Gateway deps: feedback_repo factory + getter - 17 new tests (14 FeedbackRepository + 3 follow-up association) - 109 total tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config - config.example.yaml: deprecate standalone checkpointer section, activate unified database:sqlite as default (drives both checkpointer + app data) - New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage including check_access owner logic, list_by_owner pagination - Extended test_run_repository.py (+4 tests) — completion preserves fields, list ordering desc, limit, owner_none returns all - Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false, middleware no ai_message, unknown caller tokens, convenience fields, tool_error, non-summarization custom event - Extended test_run_event_store.py (+7 tests) — DB batch seq continuity, make_run_event_store factory (memory/db/jsonl/fallback/unknown) - Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists, follow-up metadata, summarization in history, full DB-backed lifecycle - Fixed DB integration test to use proper fake objects (not MagicMock) for JSON-serializable metadata - 157 total Phase 2 tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * config: move default sqlite_dir to .deer-flow/data Keep SQLite databases alongside other DeerFlow-managed data (threads, memory) under the .deer-flow/ directory instead of a top-level ./data folder. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now() - Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM models. Add json_serializer=json.dumps(ensure_ascii=False) to all create_async_engine calls so non-ASCII text (Chinese etc.) is stored as-is in both SQLite and Postgres. - Change ORM datetime defaults from datetime.now(UTC) to datetime.now(), remove UTC imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): simplify deps.py with getter factory + inline repos - Replace 6 identical getter functions with _require() factory. - Inline 3 _make_*_repo() factories into langgraph_runtime(), call get_session_factory() once instead of 3 times. - Add thread_meta upsert in start_run (services.py). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(docker): add UV_EXTRAS build arg for optional dependencies Support installing optional dependency groups (e.g. postgres) at Docker build time via UV_EXTRAS build arg: UV_EXTRAS=postgres docker compose build Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(journal): fix flush, token tracking, and consolidate tests RunJournal fixes: - _flush_sync: retain events in buffer when no event loop instead of dropping them; worker's finally block flushes via async flush(). - on_llm_end: add tool_calls filter and caller=="lead_agent" guard for ai_message events; mark message IDs for dedup with record_llm_usage. - worker.py: persist completion data (tokens, message count) to RunStore in finally block. Model factory: - Auto-inject stream_usage=True for BaseChatOpenAI subclasses with custom api_base, so usage_metadata is populated in streaming responses. Test consolidation: - Delete test_phase2b_integration.py (redundant with existing tests). - Move DB-backed lifecycle test into test_run_journal.py. - Add tests for stream_usage injection in test_model_factory.py. - Clean up executor/task_tool dead journal references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): widen content type to str|dict in all store backends Allow event content to be a dict (for structured OpenAI-format messages) in addition to plain strings. Dict values are JSON-serialized for the DB backend and deserialized on read; memory and JSONL backends handle dicts natively. Trace truncation now serializes dicts to JSON before measuring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(events): use metadata flag instead of heuristic for dict content detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(converters): add LangChain-to-OpenAI message format converters Pure functions langchain_to_openai_message, langchain_to_openai_completion, langchain_messages_to_openai, and _infer_finish_reason for converting LangChain BaseMessage objects to OpenAI Chat Completions format, used by RunJournal for event storage. 15 unit tests added. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(converters): handle empty list content as null, clean up test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): human_message content uses OpenAI user message format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): ai_message uses OpenAI format, add ai_tool_call message event - ai_message content now uses {"role": "assistant", "content": "..."} format - New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls - ai_tool_call uses langchain_to_openai_message converter for consistent format - Both events include finish_reason in metadata ("stop" or "tool_calls") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add tool_result message event with OpenAI tool message format Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end, then emit a tool_result message event (role=tool, tool_call_id, content) after each successful tool completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): summary content uses OpenAI system message format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format Add on_chat_model_start to capture structured prompt messages as llm_request events. Replace llm_end trace events with llm_response using OpenAI Chat Completions format. Track llm_call_index to pair request/response events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add record_middleware method for middleware trace events Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(events): add full run sequence integration test for OpenAI content format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): align message events with checkpoint format and add middleware tag injection - Message events (ai_message, ai_tool_call, tool_result, human_message) now use BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages - on_tool_end extracts tool_call_id/name/status from ToolMessage objects - on_tool_error now emits tool_result message events with error status - record_middleware uses middleware:{tag} event_type and middleware category - Summarization custom events use middleware:summarize category - TitleMiddleware injects middleware:title tag via get_config() inheritance - SummarizationMiddleware model bound with middleware:summarize tag - Worker writes human_message using HumanMessage.model_dump() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): switch search endpoint to threads_meta table and sync title - POST /api/threads/search now queries threads_meta table directly, removing the two-phase Store + Checkpointer scan approach - Add ThreadMetaRepository.search() with metadata/status filters - Add ThreadMetaRepository.update_display_name() for title sync - Worker syncs checkpoint title to threads_meta.display_name on run completion - Map display_name to values.title in search response for API compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): history endpoint reads messages from event store - POST /api/threads/{thread_id}/history now combines two data sources: checkpointer for checkpoint_id, metadata, title, thread_data; event store for messages (complete history, not truncated by summarization) - Strip internal LangGraph metadata keys from response - Remove full channel_values serialization in favor of selective fields Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate optional-dependencies header in pyproject.toml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(middleware): pass tagged config to TitleMiddleware ainvoke call Without the config, the middleware:title tag was not injected, causing the LLM response to be recorded as a lead_agent ai_message in run_events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflict in .env.example Keep both DATABASE_URL (from persistence-scaffold) and WECOM credentials (from main) after the merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address review feedback on PR #1851 - Fix naive datetime.now() → datetime.now(UTC) in all ORM models - Fix seq race condition in DbRunEventStore.put() with FOR UPDATE and UNIQUE(thread_id, seq) constraint - Encapsulate _store access in RunManager.update_run_completion() - Deduplicate _store.put() logic in RunManager via _persist_to_store() - Add update_run_completion to RunStore ABC + MemoryRunStore - Wire follow_up_to_run_id through the full create path - Add error recovery to RunJournal._flush_sync() lost-event scenario - Add migration note for search_threads breaking change - Fix test_checkpointer_none_fix mock to set database=None Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update uv.lock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality Bug fixes: - Sanitize log params to prevent log injection (CodeQL) - Reset threads_meta.status to idle/error when run completes - Attach messages only to latest checkpoint in /history response - Write threads_meta on POST /threads so new threads appear in search Lint fixes: - Remove unused imports (journal.py, migrations/env.py, test_converters.py) - Convert lambda to named function (engine.py, Ruff E731) - Remove unused logger definitions in repos (Ruff F841) - Add logging to JSONL decode errors and empty except blocks - Separate assert side-effects in tests (CodeQL) - Remove unused local variables in tests (Ruff F841) - Fix max_trace_content truncation to use byte length, not char length Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: apply ruff format to persistence and runtime files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding 'Statement has no effect' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * refactor(runtime): introduce RunContext to reduce run_agent parameter bloat Extract checkpointer, store, event_store, run_events_config, thread_meta_repo, and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context() in deps.py to build the base context from app.state singletons. start_run() uses dataclasses.replace() to enrich per-run fields before passing ctx to run_agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): move sanitize_log_param to app/gateway/utils.py Extract the log-injection sanitizer from routers/threads.py into a shared utils module and rename to sanitize_log_param (public API). Eliminates the reverse service → router import in services.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: use SQL aggregation for feedback stats and thread token usage Replace Python-side counting in FeedbackRepository.aggregate_by_run with a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread abstract method with SQL GROUP BY implementation in RunRepository and Python fallback in MemoryRunStore. Simplify the thread_token_usage endpoint to delegate to the new method, eliminating the limit=10000 truncation risk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: annotate DbRunEventStore.put() as low-frequency path Add docstring clarifying that put() opens a per-call transaction with FOR UPDATE and should only be used for infrequent writes (currently just the initial human_message event). High-throughput callers should use put_batch() instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(threads): fall back to Store search when ThreadMetaRepository is unavailable When database.backend=memory (default) or no SQL session factory is configured, search_threads now queries the LangGraph Store instead of returning 503. Returns empty list if neither Store nor repo is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata Add ThreadMetaStore abstract base class with create/get/search/update/delete interface. ThreadMetaRepository (SQL) now inherits from it. New MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments. deps.py now always provides a non-None thread_meta_repo, eliminating all `if thread_meta_repo is not None` guards in services.py, worker.py, and routers/threads.py. search_threads no longer needs a Store fallback branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(history): read messages from checkpointer instead of RunEventStore The /history endpoint now reads messages directly from the checkpointer's channel_values (the authoritative source) instead of querying RunEventStore.list_messages(). The RunEventStore API is preserved for other consumers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address new Copilot review comments - feedback.py: validate thread_id/run_id before deleting feedback - jsonl.py: add path traversal protection with ID validation - run_repo.py: parse `before` to datetime for PostgreSQL compat - thread_meta_repo.py: fix pagination when metadata filter is active - database_config.py: use resolve_path for sqlite_dir consistency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Implement skill self-evolution and skill_manage flow (#1874) * chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> * fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir resolve_path() resolves relative to Paths.base_dir (.deer-flow), which double-nested the path to .deer-flow/.deer-flow/data/app.db. Use Path.resolve() (CWD-relative) instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Feature/feishu receive file (#1608) * feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). * style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling * fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915) Two production docker-compose.yaml bugs prevent `make up` from working: 1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH environment overrides. Added in |
||
|
|
7ff9077074 | feat(dependencies): add langchain-ollama and ollama packages with optional dependencies | ||
|
|
848ace98cb |
feat: replace auto-admin creation with secure interactive first-boot setup (#2063)
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930) * feat(persistence): add SQLAlchemy 2.0 async ORM scaffold Introduce a unified database configuration (DatabaseConfig) that controls both the LangGraph checkpointer and the DeerFlow application persistence layer from a single `database:` config section. New modules: - deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends - deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton - deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation Gateway integration initializes/tears down the persistence engine in the existing langgraph_runtime() context manager. Legacy checkpointer config is preserved for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add RunEventStore ABC + MemoryRunEventStore Phase 2-A prerequisite for event storage: adds the unified run event stream interface (RunEventStore) with an in-memory implementation, RunEventsConfig, gateway integration, and comprehensive tests (27 cases). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints Phase 2-B: run persistence + event storage + token tracking. - ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow - RunRepository implements RunStore ABC via SQLAlchemy ORM - ThreadMetaRepository with owner access control - DbRunEventStore with trace content truncation and cursor pagination - JsonlRunEventStore with per-run files and seq recovery from disk - RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events, accumulates token usage by caller type, buffers and flushes to store - RunManager now accepts optional RunStore for persistent backing - Worker creates RunJournal, writes human_message, injects callbacks - Gateway deps use factory functions (RunRepository when DB available) - New endpoints: messages, run messages, run events, token-usage - ThreadCreateRequest gains assistant_id field - 92 tests pass (33 new), zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add user feedback + follow-up run association Phase 2-C: feedback and follow-up tracking. - FeedbackRow ORM model (rating +1/-1, optional message_id, comment) - FeedbackRepository with CRUD, list_by_run/thread, aggregate stats - Feedback API endpoints: create, list, stats, delete - follow_up_to_run_id in RunCreateRequest (explicit or auto-detected from latest successful run on the thread) - Worker writes follow_up_to_run_id into human_message event metadata - Gateway deps: feedback_repo factory + getter - 17 new tests (14 FeedbackRepository + 3 follow-up association) - 109 total tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config - config.example.yaml: deprecate standalone checkpointer section, activate unified database:sqlite as default (drives both checkpointer + app data) - New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage including check_access owner logic, list_by_owner pagination - Extended test_run_repository.py (+4 tests) — completion preserves fields, list ordering desc, limit, owner_none returns all - Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false, middleware no ai_message, unknown caller tokens, convenience fields, tool_error, non-summarization custom event - Extended test_run_event_store.py (+7 tests) — DB batch seq continuity, make_run_event_store factory (memory/db/jsonl/fallback/unknown) - Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists, follow-up metadata, summarization in history, full DB-backed lifecycle - Fixed DB integration test to use proper fake objects (not MagicMock) for JSON-serializable metadata - 157 total Phase 2 tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * config: move default sqlite_dir to .deer-flow/data Keep SQLite databases alongside other DeerFlow-managed data (threads, memory) under the .deer-flow/ directory instead of a top-level ./data folder. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now() - Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM models. Add json_serializer=json.dumps(ensure_ascii=False) to all create_async_engine calls so non-ASCII text (Chinese etc.) is stored as-is in both SQLite and Postgres. - Change ORM datetime defaults from datetime.now(UTC) to datetime.now(), remove UTC imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): simplify deps.py with getter factory + inline repos - Replace 6 identical getter functions with _require() factory. - Inline 3 _make_*_repo() factories into langgraph_runtime(), call get_session_factory() once instead of 3 times. - Add thread_meta upsert in start_run (services.py). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(docker): add UV_EXTRAS build arg for optional dependencies Support installing optional dependency groups (e.g. postgres) at Docker build time via UV_EXTRAS build arg: UV_EXTRAS=postgres docker compose build Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(journal): fix flush, token tracking, and consolidate tests RunJournal fixes: - _flush_sync: retain events in buffer when no event loop instead of dropping them; worker's finally block flushes via async flush(). - on_llm_end: add tool_calls filter and caller=="lead_agent" guard for ai_message events; mark message IDs for dedup with record_llm_usage. - worker.py: persist completion data (tokens, message count) to RunStore in finally block. Model factory: - Auto-inject stream_usage=True for BaseChatOpenAI subclasses with custom api_base, so usage_metadata is populated in streaming responses. Test consolidation: - Delete test_phase2b_integration.py (redundant with existing tests). - Move DB-backed lifecycle test into test_run_journal.py. - Add tests for stream_usage injection in test_model_factory.py. - Clean up executor/task_tool dead journal references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): widen content type to str|dict in all store backends Allow event content to be a dict (for structured OpenAI-format messages) in addition to plain strings. Dict values are JSON-serialized for the DB backend and deserialized on read; memory and JSONL backends handle dicts natively. Trace truncation now serializes dicts to JSON before measuring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(events): use metadata flag instead of heuristic for dict content detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(converters): add LangChain-to-OpenAI message format converters Pure functions langchain_to_openai_message, langchain_to_openai_completion, langchain_messages_to_openai, and _infer_finish_reason for converting LangChain BaseMessage objects to OpenAI Chat Completions format, used by RunJournal for event storage. 15 unit tests added. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(converters): handle empty list content as null, clean up test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): human_message content uses OpenAI user message format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): ai_message uses OpenAI format, add ai_tool_call message event - ai_message content now uses {"role": "assistant", "content": "..."} format - New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls - ai_tool_call uses langchain_to_openai_message converter for consistent format - Both events include finish_reason in metadata ("stop" or "tool_calls") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add tool_result message event with OpenAI tool message format Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end, then emit a tool_result message event (role=tool, tool_call_id, content) after each successful tool completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): summary content uses OpenAI system message format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format Add on_chat_model_start to capture structured prompt messages as llm_request events. Replace llm_end trace events with llm_response using OpenAI Chat Completions format. Track llm_call_index to pair request/response events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add record_middleware method for middleware trace events Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(events): add full run sequence integration test for OpenAI content format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): align message events with checkpoint format and add middleware tag injection - Message events (ai_message, ai_tool_call, tool_result, human_message) now use BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages - on_tool_end extracts tool_call_id/name/status from ToolMessage objects - on_tool_error now emits tool_result message events with error status - record_middleware uses middleware:{tag} event_type and middleware category - Summarization custom events use middleware:summarize category - TitleMiddleware injects middleware:title tag via get_config() inheritance - SummarizationMiddleware model bound with middleware:summarize tag - Worker writes human_message using HumanMessage.model_dump() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): switch search endpoint to threads_meta table and sync title - POST /api/threads/search now queries threads_meta table directly, removing the two-phase Store + Checkpointer scan approach - Add ThreadMetaRepository.search() with metadata/status filters - Add ThreadMetaRepository.update_display_name() for title sync - Worker syncs checkpoint title to threads_meta.display_name on run completion - Map display_name to values.title in search response for API compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): history endpoint reads messages from event store - POST /api/threads/{thread_id}/history now combines two data sources: checkpointer for checkpoint_id, metadata, title, thread_data; event store for messages (complete history, not truncated by summarization) - Strip internal LangGraph metadata keys from response - Remove full channel_values serialization in favor of selective fields Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate optional-dependencies header in pyproject.toml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(middleware): pass tagged config to TitleMiddleware ainvoke call Without the config, the middleware:title tag was not injected, causing the LLM response to be recorded as a lead_agent ai_message in run_events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflict in .env.example Keep both DATABASE_URL (from persistence-scaffold) and WECOM credentials (from main) after the merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address review feedback on PR #1851 - Fix naive datetime.now() → datetime.now(UTC) in all ORM models - Fix seq race condition in DbRunEventStore.put() with FOR UPDATE and UNIQUE(thread_id, seq) constraint - Encapsulate _store access in RunManager.update_run_completion() - Deduplicate _store.put() logic in RunManager via _persist_to_store() - Add update_run_completion to RunStore ABC + MemoryRunStore - Wire follow_up_to_run_id through the full create path - Add error recovery to RunJournal._flush_sync() lost-event scenario - Add migration note for search_threads breaking change - Fix test_checkpointer_none_fix mock to set database=None Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update uv.lock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality Bug fixes: - Sanitize log params to prevent log injection (CodeQL) - Reset threads_meta.status to idle/error when run completes - Attach messages only to latest checkpoint in /history response - Write threads_meta on POST /threads so new threads appear in search Lint fixes: - Remove unused imports (journal.py, migrations/env.py, test_converters.py) - Convert lambda to named function (engine.py, Ruff E731) - Remove unused logger definitions in repos (Ruff F841) - Add logging to JSONL decode errors and empty except blocks - Separate assert side-effects in tests (CodeQL) - Remove unused local variables in tests (Ruff F841) - Fix max_trace_content truncation to use byte length, not char length Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: apply ruff format to persistence and runtime files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding 'Statement has no effect' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * refactor(runtime): introduce RunContext to reduce run_agent parameter bloat Extract checkpointer, store, event_store, run_events_config, thread_meta_repo, and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context() in deps.py to build the base context from app.state singletons. start_run() uses dataclasses.replace() to enrich per-run fields before passing ctx to run_agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): move sanitize_log_param to app/gateway/utils.py Extract the log-injection sanitizer from routers/threads.py into a shared utils module and rename to sanitize_log_param (public API). Eliminates the reverse service → router import in services.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: use SQL aggregation for feedback stats and thread token usage Replace Python-side counting in FeedbackRepository.aggregate_by_run with a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread abstract method with SQL GROUP BY implementation in RunRepository and Python fallback in MemoryRunStore. Simplify the thread_token_usage endpoint to delegate to the new method, eliminating the limit=10000 truncation risk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: annotate DbRunEventStore.put() as low-frequency path Add docstring clarifying that put() opens a per-call transaction with FOR UPDATE and should only be used for infrequent writes (currently just the initial human_message event). High-throughput callers should use put_batch() instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(threads): fall back to Store search when ThreadMetaRepository is unavailable When database.backend=memory (default) or no SQL session factory is configured, search_threads now queries the LangGraph Store instead of returning 503. Returns empty list if neither Store nor repo is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata Add ThreadMetaStore abstract base class with create/get/search/update/delete interface. ThreadMetaRepository (SQL) now inherits from it. New MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments. deps.py now always provides a non-None thread_meta_repo, eliminating all `if thread_meta_repo is not None` guards in services.py, worker.py, and routers/threads.py. search_threads no longer needs a Store fallback branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(history): read messages from checkpointer instead of RunEventStore The /history endpoint now reads messages directly from the checkpointer's channel_values (the authoritative source) instead of querying RunEventStore.list_messages(). The RunEventStore API is preserved for other consumers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address new Copilot review comments - feedback.py: validate thread_id/run_id before deleting feedback - jsonl.py: add path traversal protection with ID validation - run_repo.py: parse `before` to datetime for PostgreSQL compat - thread_meta_repo.py: fix pagination when metadata filter is active - database_config.py: use resolve_path for sqlite_dir consistency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Implement skill self-evolution and skill_manage flow (#1874) * chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> * fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir resolve_path() resolves relative to Paths.base_dir (.deer-flow), which double-nested the path to .deer-flow/.deer-flow/data/app.db. Use Path.resolve() (CWD-relative) instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Feature/feishu receive file (#1608) * feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). * style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling * fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915) Two production docker-compose.yaml bugs prevent `make up` from working: 1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH environment overrides. Added in |
||
|
|
94eee95fe0 |
feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)
* feat(auth): introduce backend auth module Port RFC-001 authentication core from PR #1728: - JWT token handling (create_access_token, decode_token, TokenPayload) - Password hashing (bcrypt) with verify_password - SQLite UserRepository with base interface - Provider Factory pattern (LocalAuthProvider) - CLI reset_admin tool - Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse) Deps: - bcrypt>=4.0.0 - pyjwt>=2.9.0 - email-validator>=2.0.0 - backend/uv.toml pins public PyPI index Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py). Scope note: authz.py, test_auth.py, and test_auth_type_system.py are deferred to commit 2 because they depend on middleware and deps wiring that is not yet in place. Commit 1 stays "pure new files only" as the spec mandates. * feat(auth): wire auth end-to-end (middleware + frontend replacement) Backend: - Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth - Port authz decorator (owner_filter_key defaults to 'owner_id') - Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add _ensure_admin_user lifespan hook, _migrate_orphaned_threads helper, register auth router - Merge deps.py: add get_local_provider, get_current_user_from_request, get_optional_user_from_request; keep get_current_user as thin str|None adapter for feedback router - langgraph.json: add auth path pointing to langgraph_auth.py:auth - Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth (both metadata write and LangGraph filter dict) + test fixtures Frontend: - Delete better-auth library and api catch-all route - Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET, BETTER_AUTH_GITHUB_*) from env.js - Port frontend/src/core/auth/* (AuthProvider, gateway-config, proxy-policy, server-side getServerSideUser, types) - Port frontend/src/core/api/fetcher.ts - Port (auth)/layout, (auth)/login, (auth)/setup pages - Rewrite workspace/layout.tsx as server component that calls getServerSideUser and wraps in AuthProvider - Port workspace/workspace-content.tsx for the client-side sidebar logic Tests: - Port 5 auth test files (test_auth, test_auth_middleware, test_auth_type_system, test_ensure_admin, test_langgraph_auth) - 176 auth tests PASS After this commit: login/logout/registration flow works, but persistence layer does not yet filter by owner_id. Commit 4 closes that gap. * feat(auth): account settings page + i18n - Port account-settings-page.tsx (change password, change email, logout) - Wire into settings-dialog.tsx as new "account" section with UserIcon, rendered first in the section list - Add i18n keys: - en-US/zh-CN: settings.sections.account ("Account" / "账号") - en-US/zh-CN: button.logout ("Log out" / "退出登录") - types.ts: matching type declarations * feat(auth): enforce owner_id across 2.0-rc persistence layer Add request-scoped contextvar-based owner filtering to threads_meta, runs, run_events, and feedback repositories. Router code is unchanged — isolation is enforced at the storage layer so that any caller that forgets to pass owner_id still gets filtered results, and new routes cannot accidentally leak data. Core infrastructure ------------------- - deerflow/runtime/user_context.py (new): - ContextVar[CurrentUser | None] with default None - runtime_checkable CurrentUser Protocol (structural subtype with .id) - set/reset/get/require helpers - AUTO sentinel + resolve_owner_id(value, method_name) for sentinel three-state resolution: AUTO reads contextvar, explicit str overrides, explicit None bypasses the filter (for migration/CLI) Repository changes ------------------ - ThreadMetaRepository: create/get/search/update_*/delete gain owner_id=AUTO kwarg; read paths filter by owner, writes stamp it, mutations check ownership before applying - RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg - FeedbackRepository: create/get/list_by_run/list_by_thread/delete gain owner_id=AUTO kwarg - DbRunEventStore: list_messages/list_events/list_messages_by_run/ count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO kwarg. Write paths (put/put_batch) read contextvar softly: when a request-scoped user is available, owner_id is stamped; background worker writes without a user context pass None which is valid (orphan row to be bound by migration) Schema ------ - persistence/models/run_event.py: RunEventRow.owner_id = Mapped[ str | None] = mapped_column(String(64), nullable=True, index=True) - No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all picks up the new column automatically Middleware ---------- - auth_middleware.py: after cookie check, call get_optional_user_from_ request to load the real User, stamp it into request.state.user AND the contextvar via set_current_user, reset in a try/finally. Public paths and unauthenticated requests continue without contextvar, and @require_auth handles the strict 401 path Test infrastructure ------------------- - tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context sets a default SimpleNamespace(id="test-user-autouse") on every test unless marked @pytest.mark.no_auto_user. Keeps existing 20+ persistence tests passing without modification - pyproject.toml [tool.pytest.ini_options]: register no_auto_user marker so pytest does not emit warnings for opt-out tests - tests/test_user_context.py: 6 tests covering three-state semantics, Protocol duck typing, and require/optional APIs - tests/test_thread_meta_repo.py: one test updated to pass owner_id= None explicitly where it was previously relying on the old default Test results ------------ - test_user_context.py: 6 passed - test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127 - test_run_event_store / test_run_repository / test_thread_meta_repo / test_feedback: 92 passed - Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky integration tests unrelated to auth), 1 skipped * feat(auth): extend orphan migration to 2.0-rc persistence tables _ensure_admin_user now runs a three-step pipeline on every boot: Step 1 (fatal): admin user exists / is created / password is reset Step 2 (non-fatal): LangGraph store orphan threads → admin Step 3 (non-fatal): SQL persistence tables → admin - threads_meta - runs - run_events - feedback Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's original philosophy: admin creation failure blocks startup (the system is unusable without an admin), whereas migration failures log a warning and let the service proceed (a partial migration is recoverable; a missing admin is not). Key helpers ----------- - _iter_store_items(store, namespace, *, page_size=500): async generator that cursor-paginates across LangGraph store pages. Fixes PR #1728's hardcoded limit=1000 bug that would silently lose orphans beyond the first page. - _migrate_orphaned_threads(store, admin_user_id): Rewritten to use _iter_store_items. Returns the migrated count so the caller can log it; raises only on unhandled exceptions. - _migrate_orphan_sql_tables(admin_user_id): Imports the 4 ORM models lazily, grabs the shared session factory, runs one UPDATE per table in a single transaction, commits once. No-op when no persistence backend is configured (in-memory dev). Tests: test_ensure_admin.py (8 passed) * test(auth): port AUTH test plan docs + lint/format pass - Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728 - Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md (4 occurrences from the original PR doc) - ruff auto-fix UP037 in sentinel type annotations: drop quotes around "str | None | _AutoSentinel" now that from __future__ import annotations makes them implicit string forms - ruff format: 2 files (app/gateway/app.py, runtime/user_context.py) Note on test coverage additions: - conftest.py autouse fixture was already added in commit 4 (had to be co-located with the repository changes to keep pre-existing persistence tests passing) - cross-user isolation E2E tests (test_owner_isolation.py) deferred — enforcement is already proven by the 98-test repository suite via the autouse fixture + explicit _AUTO sentinel exercises - New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are manual-QA test cases rather than pytest code, and the spec-level coverage is already met by test_user_context.py + the 98-test repository suite. Final test results: - Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin, test_user_context): 186 passed - Persistence suite (test_run_event_store, test_run_repository, test_thread_meta_repo, test_feedback): 98 passed - Lint: ruff check + ruff format both clean * test(auth): add cross-user isolation test suite 10 tests exercising the storage-layer owner filter by manually switching the user_context contextvar between two users. Verifies the safety invariant: After a repository write with owner_id=A, a subsequent read with owner_id=B must not return the row, and vice versa. Covers all 4 tables that own user-scoped data: TC-API-17 threads_meta — read, search, update, delete cross-user TC-API-18 runs — get, list_by_thread, delete cross-user TC-API-19 run_events — list_messages, list_events, count_messages, delete_by_thread (CRITICAL: raw conversation content leak vector) TC-API-20 feedback — get, list_by_run, delete cross-user Plus two meta-tests verifying the sentinel pattern itself: - AUTO + unset contextvar raises RuntimeError - explicit owner_id=None bypasses the filter (migration escape hatch) Architecture note ----------------- These tests bypass the HTTP layer by design. The full chain (cookie → middleware → contextvar → repository) is covered piecewise: - test_auth_middleware.py: middleware sets contextvar from cookies - test_owner_isolation.py: repositories enforce isolation when contextvar is set to different users Together they prove the end-to-end safety property without the ceremony of spinning up a full TestClient + in-memory DB for every router endpoint. Tests pass: 231 (full auth + persistence + isolation suite) Lint: clean * refactor(auth): migrate user repository to SQLAlchemy ORM Move the users table into the shared persistence engine so auth matches the pattern of threads_meta, runs, run_events, and feedback — one engine, one session factory, one schema init codepath. New files --------- - persistence/user/__init__.py, persistence/user/model.py: UserRow ORM class with partial unique index on (oauth_provider, oauth_id) - Registered in persistence/models/__init__.py so Base.metadata.create_all() picks it up Modified -------- - auth/repositories/sqlite.py: rewritten as async SQLAlchemy, identical constructor pattern to the other four repositories (def __init__(self, session_factory) + self._sf = session_factory) - auth/config.py: drop users_db_path field — storage is configured through config.database like every other table - deps.py/get_local_provider: construct SQLiteUserRepository with the shared session factory, fail fast if engine is not initialised - tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to use the shared engine (init_engine + close_engine in a tempdir) - tests/test_auth_type_system.py: add per-test autouse fixture that spins up a scratch engine and resets deps._cached_* singletons * refactor(auth): remove SQL orphan migration (unused in supported scenarios) The _migrate_orphan_sql_tables helper existed to bind NULL owner_id rows in threads_meta, runs, run_events, and feedback to the admin on first boot. But in every supported upgrade path, it's a no-op: 1. Fresh install: create_all builds fresh tables, no legacy rows 2. No-auth → with-auth (no existing persistence DB): persistence tables are created fresh by create_all, no legacy rows 3. No-auth → with-auth (has existing persistence DB from #1930): NOT a supported upgrade path — "有 DB 到有 DB" schema evolution is out of scope; users wipe DB or run manual ALTER So the SQL orphan migration never has anything to do in the supported matrix. Delete the function, simplify _ensure_admin_user from a 3-step pipeline to a 2-step one (admin creation + LangGraph store orphan migration only). LangGraph store orphan migration stays: it serves the real "no-auth → with-auth" upgrade path where a user's existing LangGraph thread metadata has no owner_id field and needs to be stamped with the newly-created admin's id. Tests: 284 passed (auth + persistence + isolation) Lint: clean * security(auth): write initial admin password to 0600 file instead of logs CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that logged the auto-generated admin password to stdout via logger.info(). Production log aggregators (ELK/Splunk/etc) would have captured those cleartext secrets. Replace with a shared helper that writes to .deer-flow/admin_initial_credentials.txt with mode 0600, and log only the path. New file -------- - app/gateway/auth/credential_file.py: write_initial_credentials() helper. Takes email, password, and a "initial"/"reset" label. Creates .deer-flow/ if missing, writes a header comment plus the email+password, chmods 0o600, returns the absolute Path. Modified -------- - app/gateway/app.py: both _ensure_admin_user paths (fresh creation + needs_setup password reset) now write to file and log the path - app/gateway/auth/reset_admin.py: rewritten to use the shared ORM repo (SQLiteUserRepository with session_factory) and the credential_file helper. The previous implementation was broken after the earlier ORM refactor — it still imported _get_users_conn and constructed SQLiteUserRepository() without a session factory. No tests changed — the three password-log sites are all exercised via existing test_ensure_admin.py which checks that startup succeeds, not that a specific string appears in logs. CodeQL alerts 272, 283, 284: all resolved. * security(auth): strict JWT validation in middleware (fix junk cookie bypass) AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with 401. The previous middleware behaviour was "presence-only": check that some access_token cookie exists, then pass through. In combination with my Task-12 decision to skip @require_auth decorators on routes, this created a gap where a request with any cookie-shaped string (e.g. access_token=not-a-jwt) would bypass authentication on routes that do not touch the repository (/api/models, /api/mcp/config, /api/memory, /api/skills, …). Fix: middleware now calls get_current_user_from_request() strictly and catches the resulting HTTPException to render a 401 with the proper fine-grained error code (token_invalid, token_expired, user_not_found, …). On success it stamps request.state.user and the contextvar so repository-layer owner filters work downstream. The 4 old "_with_cookie_passes" tests in test_auth_middleware.py were written for the presence-only behaviour; they asserted that a junk cookie would make the handler return 200. They are renamed to "_with_junk_cookie_rejected" and their assertions flipped to 401. The negative path (no cookie → 401 not_authenticated) is unchanged. Verified: no cookie → 401 not_authenticated junk cookie → 401 token_invalid (the fixed bug) expired cookie → 401 token_expired Tests: 284 passed (auth + persistence + isolation) Lint: clean * security(auth): wire @require_permission(owner_check=True) on isolation routes Apply the require_permission decorator to all 28 routes that take a {thread_id} path parameter. Combined with the strict middleware (previous commit), this gives the double-layer protection that AUTH_TEST_PLAN test 7.5.9 documents: Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk cookies and stamps request.state.user Layer 2 (@require_permission with owner_check=True): per-resource ownership verification via ThreadMetaStore.check_access — returns 404 if a different user owns the thread The decorator's owner_check branch is rewritten to use the SQL thread_meta_repo (the 2.0-rc persistence layer) instead of the LangGraph store path that PR #1728 used (_store_get / get_store in routers/threads.py). The inject_record convenience is dropped — no caller in 2.0 needs the LangGraph blob, and the SQL repo has a different shape. Routes decorated (28 total): - threads.py: delete, patch, get, get-state, post-state, post-history - thread_runs.py: post-runs, post-runs-stream, post-runs-wait, list_runs, get_run, cancel_run, join_run, stream_existing_run, list_thread_messages, list_run_messages, list_run_events, thread_token_usage - feedback.py: create, list, stats, delete - uploads.py: upload (added Request param), list, delete - artifacts.py: get_artifact - suggestions.py: generate (renamed body parameter to avoid conflict with FastAPI Request) Test fixes: - test_suggestions_router.py: bypass the decorator via __wrapped__ (the unit tests cover parsing logic, not auth — no point spinning up a thread_meta_repo just to test JSON unwrapping) - test_auth_middleware.py 4 fake-cookie tests: already updated in the previous commit (745bf432) Tests: 293 passed (auth + persistence + isolation + suggestions) Lint: clean * security(auth): defense-in-depth fixes from release validation pass Eight findings caught while running the AUTH_TEST_PLAN end-to-end against the deployed sg_dev stack. Each is a pre-condition for shipping release/2.0-rc that the previous PRs missed. Backend hardening - routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES whitelist (CIDR/IP allowlist). Without nginx in front, the previous code honored arbitrary X-Real-IP, letting an attacker rotate the header to fully bypass the per-IP login lockout. - routers/auth.py: 36-entry common-password blocklist via Pydantic field_validator on RegisterRequest + ChangePasswordRequest. The shared _validate_strong_password helper keeps the constraint in one place. - routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip server-reserved metadata keys (owner_id, user_id) via Pydantic field_validator so a forged value can never round-trip back to other clients reading the same thread. The actual ownership invariant stays on the threads_meta row; this closes the metadata-blob echo gap. - authz.py + thread_meta/sql.py: require_permission gains a require_existing flag plumbed through check_access(require_existing=True). Destructive routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing thread_meta row as 404 instead of "untracked legacy thread, allow", closing the cross-user delete-idempotence gap where any user could successfully DELETE another user's deleted thread. - repositories/sqlite.py + base.py: update_user raises UserNotFoundError on a vanished row instead of silently returning the input. Concurrent delete during password reset can no longer look like a successful update. - runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to str at the contextvar boundary so SQLAlchemy String(64) columns can bind it. The whole 2.0-rc isolation pipeline was previously broken end-to-end (POST /api/threads → 500 "type 'UUID' is not supported"). - persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL, synchronous=NORMAL, foreign_keys=ON on every new SQLite connection. TC-UPG-06 in the test plan expects WAL; previous code shipped with the default 'delete' journal. - auth_middleware.py: stamp request.state.auth = AuthContext(...) so @require_permission's short-circuit fires; previously every isolation request did a duplicate JWT decode + users SELECT. Also unifies the 401 payload through AuthErrorResponse(...).model_dump(). - app.py: _ensure_admin_user restructure removes the noqa F821 scoping bug where 'password' was referenced outside the branch that defined it. New _announce_credentials helper absorbs the duplicate log block in the fresh-admin and reset-admin branches. * fix(frontend+nginx): rollout CSRF on every state-changing client path The frontend was 100% broken in gateway-pro mode for any user trying to open a specific chat thread. Three cumulative bugs each silently masked the next. LangGraph SDK CSRF gap (api-client.ts) - The Client constructor took only apiUrl, no defaultHeaders, no fetch interceptor. The SDK's internal fetch never sent X-CSRF-Token, so every state-changing /api/langgraph-compat/* call (runs/stream, threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and got 403 before reaching the auth check. UI symptom: empty thread page with no error message; the SPA's hooks swallowed the rejection. - Fix: pass an onRequest hook that injects X-CSRF-Token from the csrf_token cookie per request. Reading the cookie per call (not at construction time) handles login / logout / password-change cookie rotation transparently. The SDK's prepareFetchOptions calls onRequest for both regular requests AND streaming/SSE/reconnect, so the same hook covers runs.stream and runs.joinStream. Raw fetch CSRF gap (7 files) - Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup + account-settings change-password). The other 7 routed through raw fetch() with no header — suggestions, memory, agents, mcp, skills, uploads, and the local thread cleanup hook all 403'd silently. - Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper. Convert all 7 raw fetch() callers to fetchWithAuth so the contract is centrally enforced. api-client.ts and fetcher.ts share readCsrfCookie + STATE_CHANGING_METHODS to avoid drift. nginx routing + buffering (nginx.local.conf) - The auth feature shipped without updating the nginx config: per-API explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs. The frontend's client-side fetches to /api/v1/auth/login/local 404'd from the Next.js side because nginx routed /api/* to the frontend. - Fix: add catch-all `location /api/` that proxies to the gateway. nginx longest-prefix matching keeps the explicit blocks (/api/models, /api/threads regex, /api/langgraph/, ...) winning for their paths. - Fix: disable proxy_buffering + proxy_request_buffering for the frontend `location /` block. Without it, nginx tries to spool large Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError. * test(auth): release-validation test infra and new coverage Test fixtures and unit tests added during the validation pass. Router test helpers (NEW: tests/_router_auth_helpers.py) - make_authed_test_app(): builds a FastAPI test app with a stub middleware that stamps request.state.user + request.state.auth and a permissive thread_meta_repo mock. TestClient-based router tests (test_artifacts_router, test_threads_router) use it instead of bare FastAPI() so the new @require_permission(owner_check=True) decorators short-circuit cleanly. - call_unwrapped(): walks the __wrapped__ chain to invoke the underlying handler without going through the authz wrappers. Direct-call tests (test_uploads_router) use it. Typed with ParamSpec so the wrapped signature flows through. Backend test additions - test_auth.py: 7 tests for the new _get_client_ip trust model (no proxy / trusted proxy / untrusted peer / XFF rejection / invalid CIDR / no client). 5 tests for the password blocklist (literal, case-insensitive, strong password accepted, change-password binding, short-password length-check still fires before blocklist). test_update_user_raises_when_row_concurrently_deleted: closes a shipped-without-coverage gap on the new UserNotFoundError contract. - test_thread_meta_repo.py: 4 tests for check_access(require_existing=True) — strict missing-row denial, strict owner match, strict owner mismatch, strict null-owner still allowed (shared rows survive the tightening). - test_ensure_admin.py: 3 tests for _migrate_orphaned_threads / _iter_store_items pagination, covering the TC-UPG-02 upgrade story end-to-end via mock store. Closes the gap where the cursor pagination was untested even though the previous PR rewrote it. - test_threads_router.py: 5 tests for _strip_reserved_metadata (owner_id removal, user_id removal, safe-keys passthrough, empty input, both-stripped). - test_auth_type_system.py: replace "password123" fixtures with Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist doesn't reject the test data. * docs(auth): refresh TC-DOCKER-05 + document Docker validation gap - AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation ("admin password visible in docker logs") was stale after the simplify pass that moved credentials to a 0600 file. The grep "Password:" check would have silently failed and given a false sense of coverage. New expectation matches the actual file-based path: 0600 file in DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep asserts no leaked password in container logs. - NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation host has no Docker daemon installed. The doc maps each Docker case to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01, TC-API-02 etc.) so the gap is auditable, and includes pre-flight reproduction steps for whoever has Docker available. --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> |
||
|
|
d8ecaf46c9 |
feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate
unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
middleware no ai_message, unknown caller tokens, convenience fields,
tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
models. Add json_serializer=json.dumps(ensure_ascii=False) to all
create_async_engine calls so non-ASCII text (Chinese etc.) is stored
as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly,
removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources:
checkpointer for checkpoint_id, metadata, title, thread_data;
event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: annotate DbRunEventStore.put() as low-frequency path
Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable
When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata
Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.
deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(history): read messages from checkpointer instead of RunEventStore
The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address new Copilot review comments
- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir
resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in
|
||
|
|
9dc25987e0
|
fix(channles):update the logger for the channel config (#2524)
* fix(channles):update the logger for the channel config * fix(channels): normalize credential values and add tests for disabled-but-configured warning Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/dfc0a566-aa59-49f9-a74d-610292fb0a63 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * fix the backend lint error --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> |
||
|
|
410f0c48b5
|
fix(channels): accept single slack allowed user (#2481)
* fix(channels): accept single slack allowed user * docs: address Slack allowed_users review notes * ci: rerun backend unit tests * docs: clarify Slack allowed_users config --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
1f59e945af
|
fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (#2449)
* fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (fixes #2448) The previous _apply_prompt_caching() attached cache_control to every text block in the system prompt, every content block in the last N messages, and the last tool definition. In multi-turn conversations with structured content blocks this easily exceeded the 4-breakpoint hard limit enforced by both the Anthropic API and AWS Bedrock, producing a 400 Bad Request (or a silent "No generations found in stream" when streaming). Fix: collect all candidate blocks in document order, then apply cache_control only to the last MAX_CACHE_BREAKPOINTS (4) of them. Later breakpoints cover a larger prefix and therefore yield better cache hit rates, making this the optimal placement strategy as well as the safe one. Adds 13 unit tests covering the budget cap, edge cases, and correct last-candidate placement. * docs: clarify _apply_prompt_caching docstring includes tool definitions Per Copilot review: the implementation also caches the last tool definition (see the candidates list at lines 202-205), so the docstring summary should explicitly mention tools alongside system and recent messages. * Fix the lint error * style: fix ruff format check for test_claude_provider_prompt_caching.py Add the missing blank line before the 'Edge cases' section comment so that ruff format --check passes in CI. --------- Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
f394c0d8c8
|
feat(mcp): support custom tool interceptors via extensions_config.json (#2451)
* feat(mcp): support custom tool interceptors via extensions_config.json
Add a generic extension point for registering custom MCP tool
interceptors through `extensions_config.json`. This allows downstream
projects to inject per-request header manipulation, auth context
propagation, or other cross-cutting concerns without modifying
DeerFlow source code.
Interceptors are declared as Python callable paths in a new
`mcpInterceptors` array field and loaded via the existing
`resolve_variable` reflection mechanism:
```json
{
"mcpInterceptors": [
"my_package.mcp.auth:build_auth_interceptor"
]
}
```
Each entry must resolve to a no-arg builder function that returns an
async interceptor compatible with `MultiServerMCPClient`'s
`tool_interceptors` interface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(mcp): add unit tests for custom tool interceptors
Cover all branches of the mcpInterceptors loading logic:
- valid interceptor loaded and appended to tool_interceptors
- multiple interceptors loaded in declaration order
- builder returning None is skipped
- resolve_variable ImportError logged and skipped
- builder raising exception logged and skipped
- absent mcpInterceptors field is safe (no-op)
- custom interceptors coexist with OAuth interceptor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(mcp): validate mcpInterceptors type and fix lint warnings
Address review feedback:
1. Validate mcpInterceptors config value before iterating:
- Accept a single string and normalize to [string]
- Ignore None silently
- Log warning and skip for non-list/non-string types
2. Fix ruff F841 lint errors in tests:
- Rename _make_mock_env to _make_patches, embed mock_client
- Remove unused `as mock_cls` bindings where not needed
- Extract _get_interceptors() helper to reduce repetition
3. Add two new test cases for type validation:
- test_mcp_interceptors_single_string_is_normalized
- test_mcp_interceptors_invalid_type_logs_warning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): validate interceptor return type and fix import mock path
Address review feedback:
1. Validate builder return type with callable() check:
- callable interceptor → append to tool_interceptors
- None → silently skip (builder opted out)
- non-callable → log warning with type name and skip
2. Fix test mock path: resolve_variable is a top-level import in
tools.py, so mock deerflow.mcp.tools.resolve_variable instead of
deerflow.reflection.resolve_variable to correctly intercept calls.
3. Add test_custom_interceptor_non_callable_return_logs_warning to
cover the new non-callable validation branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(mcp): add mcpInterceptors example and documentation
- Add mcpInterceptors field to extensions_config.example.json
- Add "Custom Tool Interceptors" section to MCP_SERVER.md with
configuration format, example interceptor code, and edge case
behavior notes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: IECspace <IECspace@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
||
|
|
950821cb9b
|
fix: use subprocess instead of os.system in local_backend.py (#2494)
* fix: use subprocess instead of os.system in local_backend.py The sandbox backend and skill evaluation scripts use subprocess * fixing the failing test --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
2bb1a2dfa2
|
feat(models): Provider for MindIE model engine (#2483)
* feat(models): 适配 MindIE引擎的模型 * test: add unit tests for MindIEChatModel adapter and fix PR review comments * chore: update uv.lock with pytest-asyncio * build: add pytest-asyncio to test dependencies * fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex) --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
b970993425
|
fix: read lead agent options from context (#2515)
* fix: read lead agent options from context * fix: validate runtime context config |
||
|
|
ec8a8cae38
|
fix: gate deferred MCP tool execution (#2513)
* fix: gate deferred MCP tool execution * style: format deferred tool middleware * fix: address deferred tool review feedback |
||
|
|
d78ed5c8f2
|
fix: inherit subagent skill allowlists (#2514) | ||
|
|
f9ff3a698d
|
fix(middleware): avoid rescuing non-skill tool outputs during summarization (#2458)
* fix(middelware): narrow skill rescue to skill-related tool outputs * fix(summarization): address skill rescue review feedback * fix: wire summarization skill rescue config * fix: remove dead skill tool helper * fix(lint): fix format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
3a61126824
|
fix: keep debug.py interactive terminal free from background log noise (#2466)
* fix(debug): keep terminal clean by redirecting all logs to file - Redirect all logs to debug.log file to prevent background task logs from interfering with interactive terminal prompts - Honor AppConfig.log_level setting instead of hard-coding to INFO - Make logging setup idempotent by clearing pre-existing handlers - Defer deerflow imports until after logging is configured to ensure import-time side effects are captured in debug.log - Display active log level in startup banner - Add prompt_toolkit installation tip for enhanced readline support Made-with: Cursor * attaching the file handler before importing/calling get_app_config() Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
11f557a2c6
|
feat(trace):Add run_name to the trace info for system agents. (#2492)
* feat(trace): Add `run_name` to the trace info for suggestions and memory. before(in langsmith): CodexChatModel CodexChatModel lead_agent after: suggest_agent memory_agent lead_agent feat(trace): Add `run_name` to the trace info for suggestions and memory. before(in langsmith): CodexChatModel CodexChatModel lead_agent after: suggest_agent memory_agent lead_agent * feat(trace): Add `run_name` to the trace info for system agents. before(in langsmith): CodexChatModel CodexChatModel CodexChatModel CodexChatModel lead_agent after: suggest_agent title_agent security_agent memory_agent lead_agent * chore(code format):code format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
e8572b9d0c
|
fix(jina): log transient failures at WARNING without traceback (#2484) (#2485)
The exception handler in JinaClient.crawl used logger.exception, which emits an ERROR-level record with the full httpx/httpcore/anyio traceback for every transient network failure (timeout, connection refused). Other search/crawl providers in the project log the same class of recoverable failures as a single line. One offline/slow-network session could produce dozens of multi-frame ERROR stack traces, drowning out real problems. Switch to logger.warning with a concise message that includes the exception type and its str, matching the style used elsewhere for recoverable transient failures (aio_sandbox, ddg, etc.). The exception type now also surfaces into the returned "Error: ..." string so callers retain diagnostic signal. Adds a regression test that asserts the log record is WARNING, carries no exc_info, and includes the exception class name. Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
80a7446fd6 | fix(backend): fix the unit test error in backend | ||
|
|
cd12821134 | fix(backend): Updated the uv.lock with new added dependency | ||
|
|
30d619de08
|
feat(subagents): support per-subagent skill loading and custom subagent types (#2253)
* feat(subagents): support per-subagent skill loading and custom subagent types (#2230) Add per-subagent skill configuration and custom subagent type registration, aligned with Codex's role-based config layering and per-session skill injection. Backend: - SubagentConfig gains `skills` field (None=all, []=none, list=whitelist) - New CustomSubagentConfig for user-defined subagent types in config.yaml - SubagentsAppConfig gains `custom_agents` section and `get_skills_for()` - Registry resolves custom agents with three-layer config precedence - SubagentExecutor loads skills per-session as conversation items (Codex pattern) - task_tool no longer appends skills to system_prompt - Lead agent system prompt dynamically lists all registered subagent types - setup_agent tool accepts optional skills parameter - Gateway agents API transparently passes skills in CRUD operations Frontend: - Agent/CreateAgentRequest/UpdateAgentRequest types include skills field - Agent card displays skills as badges alongside tool_groups Config: - config.example.yaml documents custom_agents and per-agent skills override Tests: - 40 new tests covering all skill config, custom agents, and registry logic - Existing tests updated for new get_skills_prompt_section signature Closes #2230 * fix: address review feedback on skills PR - Remove stale get_skills_prompt_section monkeypatches from test_task_tool_core_logic.py (task_tool no longer imports this function after skill injection moved to executor) - Add key prefixes (tg:/sk:) to agent-card badges to prevent React key collisions between tool_groups and skills * fix(ci): resolve lint and test failures - Format agent-card.tsx with prettier (lint-frontend) - Remove stale "Skills Appendix" system_prompt assertion — skills are now loaded per-session by SubagentExecutor, not appended to system_prompt * fix(ci): sort imports in test_subagent_skills_config.py (ruff I001) * fix(ci): use nullish coalescing in agent-card badge condition (eslint) * fix: address review feedback on skills PR - Use model_fields_set in AgentUpdateRequest to distinguish "field omitted" from "explicitly set to null" — fixes skills=None ambiguity where None means "inherit all" but was treated as "don't change" - Move lazy import of get_subagent_config outside loop in _build_available_subagents_description to avoid repeated import overhead --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
4e72410154
|
fix(gateway): bound lifespan shutdown hooks to prevent worker hang under uvicorn reload (#2331)
* fix(gateway): bound lifespan shutdown hooks to prevent worker hang Gateway worker can hang indefinitely in `uvicorn --reload` mode with the listening socket still bound — all /api/* requests return 504, and SIGKILL is the only recovery. Root cause (py-spy dump from a reproduction showed 16+ stacked frames of signal_handler -> Event.set -> threading.Lock.__enter__ on the main thread): CPython's `threading.Event` uses `Condition(Lock())` where the inner Lock is non-reentrant. uvicorn's BaseReload signal handler calls `should_exit.set()` directly from signal context; if a second signal (SIGTERM/SIGHUP from the reload supervisor, or watchfiles-triggered reload) arrives while the first handler holds the Lock, the reentrant call deadlocks on itself. The reload supervisor keeps sending those signals only when the worker fails to exit promptly. DeerFlow's lifespan currently awaits `stop_channel_service()` with no timeout; if a channel's `stop()` stalls (e.g. Feishu/Slack WebSocket waiting for an ack), the worker can't exit, the supervisor keeps signaling, and the deadlock becomes reachable. This is a defense-in-depth fix — it does not repair the upstream uvicorn/CPython issue, but it ensures DeerFlow's lifespan exits within a bounded window so the supervisor has no reason to keep firing signals. No behavior change on the happy path. Wraps the shutdown hook in `asyncio.wait_for(timeout=5.0)` and logs a warning on timeout before proceeding to worker exit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update backend/app/gateway/app.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * style: apply make format (ruff) to test assertions Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
c42ae3af79
|
feat: add optional prompt-toolkit support to debug.py (#2461)
* feat: add optional prompt-toolkit support to debug.py Use PromptSession.prompt_async() for arrow-key navigation and input history when prompt-toolkit is available, falling back to plain input() with a helpful install tip otherwise. Made-with: Cursor * fix: handle EOFError gracefully in debug.py Catch EOFError alongside KeyboardInterrupt so that Ctrl-D exits cleanly instead of printing a traceback. Made-with: Cursor |
||
|
|
b90f219bd1
|
fix(skills): validate bundled SKILL.md front-matter in CI (fixes #2443) (#2457)
* fix(skills): validate bundled SKILL.md front-matter in CI (fixes #2443) Adds a parametrized backend test that runs `_validate_skill_frontmatter` against every bundled SKILL.md under `skills/public/`, so a broken front-matter fails CI with a per-skill error message instead of surfacing as a runtime gateway-load warning. The new test caught two pre-existing breakages on `main` and fixes them: * `bootstrap/SKILL.md`: the unquoted description had a second `:` mid-line ("Also trigger for updates: ..."), which YAML parses as a nested mapping ("mapping values are not allowed here"). Rewrites the description as a folded scalar (`>-`), which preserves the original wording (including the embedded colon, double quotes, and apostrophes) without further escaping. This complements PR #2436 (single-file colon→hyphen patch) with a more general convention that survives future edits. * `chart-visualization/SKILL.md`: used `dependency:` which is not in `ALLOWED_FRONTMATTER_PROPERTIES`. Renamed to `compatibility:`, the documented field for "Required tools, dependencies" per skill-creator. No code reads `dependency` (verified by grep across backend/). * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
c43c803f66
|
fix: remove mismatched context param in debug.py to suppress Pydantic warning (#2446)
* fix: remove mismatched context param in debug.py to suppress Pydantic warning
The ainvoke call passed context={"thread_id": ...} but the agent graph
has no context_schema (ContextT defaults to None), causing a
PydanticSerializationUnexpectedValue warning on every invocation.
Align with the production run_agent path by injecting context via
Runtime into configurable["__pregel_runtime"] instead.
Closes #2445
Made-with: Cursor
* refactor: derive runtime thread_id from config to avoid duplication
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Made-with: Cursor
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
||
|
|
dbd777fe62
|
chore(deps): bump python-dotenv from 1.2.1 to 1.2.2 in /backend (#2440)
Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.2.1 to 1.2.2. - [Release notes](https://github.com/theskumar/python-dotenv/releases) - [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md) - [Commits](https://github.com/theskumar/python-dotenv/compare/v1.2.1...v1.2.2) --- updated-dependencies: - dependency-name: python-dotenv dependency-version: 1.2.2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
1ca2621285
|
chore(deps): bump lxml from 6.0.2 to 6.1.0 in /backend (#2427)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.2 to 6.1.0. - [Release notes](https://github.com/lxml/lxml/releases) - [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt) - [Commits](https://github.com/lxml/lxml/compare/lxml-6.0.2...lxml-6.1.0) --- updated-dependencies: - dependency-name: lxml dependency-version: 6.1.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
5ba1dacf25
|
fix: rename present_file to present_files in docs and prompts (#2393)
The tool is registered as `present_files` (plural) in present_file_tool.py, but four references in documentation and prompt strings incorrectly used the singular form `present_file`. This could cause confusion and potentially lead to incorrect tool invocations. Changed files: - backend/docs/GUARDRAILS.md - backend/docs/ARCHITECTURE.md - backend/packages/harness/deerflow/agents/lead_agent/prompt.py (2 occurrences) |
||
|
|
6dce26a52e
|
fix: resolve tool duplication and skill parser YAML inconsistencies (#1803) (#2107)
* Refactor tests for SKILL.md parser Updated tests for SKILL.md parser to handle quoted names and descriptions correctly. Added new tests for parsing plain and single-quoted names, and ensured multi-line descriptions are processed properly. * Implement tool name validation and deduplication Add tool name mismatch warning and deduplication logic * Refactor skill file parsing and error handling * Add tests for tool name deduplication Added tests for tool name deduplication in get_available_tools(). Ensured that duplicates are not returned, the first occurrence is kept, and warnings are logged for skipped duplicates. * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Update minimal config to include tools list * Update test for nonexistent skill file Ensure the test for nonexistent files checks for None. * Refactor tool loading and add skill management support Refactor tool loading logic to include skill management tools based on configuration and clean up comments. * Enhance code comments for tool loading logic Added comments to clarify the purpose of various code sections related to tool loading and configuration. * Fix assertion for duplicate tool name warning * Fix indentation issues in tools.py * Fix the lint error of test_tool_deduplication * Fix the lint error of tools.py * Fix the lint error * Fix the lint error * make format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
fc94e90f6c
|
fix(setup-agent): prevent data loss when setup fails on existing agen… (#2254)
* fix(setup-agent): prevent data loss when setup fails on existing agent directory Record whether the agent directory pre-existed before mkdir, and only run shutil.rmtree cleanup when the directory was newly created during this call. Previously, any failure would delete the entire directory including pre-existing SOUL.md and config.yaml. * fix: address PR review — init variables before try, remove unused result * style: fix ruff I001 import block formatting in test file * style: add missing blank lines between top-level definitions in test file |
||
|
|
c99865f53d
|
fix(token-usage): enable stream usage for openai-compatible models (#2217)
* fix(token-usage): enable stream usage for openai-compatible models * fix(token-usage): narrow stream_usage default to ChatOpenAI |
||
|
|
a62ca5dd47
|
fix: Catch httpx.ReadError in the error handling (#2309)
* fix: Catch httpx.ReadError in the error handling * fix |
||
|
|
f514e35a36
|
fix(backend): make clarification messages idempotent (#2350) (#2351) | ||
|
|
80e210f5bb
|
[security] fix(uploads): require explicit opt-in for host-side document conversion (#2332)
* fix: disable host-side upload conversion by default * fix: address PR review comments on upload conversion gate |
||
|
|
5656f90792
|
chore(deps-dev): bump pytest from 9.0.2 to 9.0.3 in /backend (#2349)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.2 to 9.0.3. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3) --- updated-dependencies: - dependency-name: pytest dependency-version: 9.0.3 dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
55474011c9
|
fix(subagent): inherit parent agent's tool_groups in task_tool (#2305)
* fix(subagent): inherit parent agent's tool_groups in task_tool
When a custom agent defines tool_groups (e.g. [file:read, file:write, bash]),
the restriction is correctly applied to the lead agent. However, when the lead
agent delegates work to a subagent via the task tool, get_available_tools() is
called without the groups parameter, causing the subagent to receive ALL tools
(including web_search, web_fetch, image_search, etc.) regardless of the parent
agent's configuration.
This fix propagates tool_groups through run metadata so that task_tool passes
the same group filter when building the subagent's tool set.
Changes:
- agent.py: include tool_groups in run metadata
- task_tool.py: read tool_groups from metadata and pass to get_available_tools()
* fix: initialize metadata before conditional block and update tests for tool_groups propagation
- Initialize metadata = {} before the 'if runtime is not None' block to
avoid Ruff F821 (possibly-undefined variable) and simplify the
parent_tool_groups expression.
- Update existing test assertion to expect groups=None in
get_available_tools call signature.
- Add 3 new test cases:
- test_task_tool_propagates_tool_groups_to_subagent
- test_task_tool_no_tool_groups_passes_none
- test_task_tool_runtime_none_passes_groups_none
|
||
|
|
24fe5fbd8c
|
fix(mcp): prevent RuntimeError from escaping except block in get_cach… (#2252)
* fix(mcp): prevent RuntimeError from escaping except block in get_cached_mcp_tools When `asyncio.get_event_loop()` raises RuntimeError and the fallback `asyncio.run()` also fails, the exception escapes unhandled because Python does not route exceptions raised inside an `except` block to sibling `except` clauses. Wrap the fallback call in its own try/except so failures are logged and the function returns [] as intended. * fix: use logger.exception to preserve stack traces on MCP init failure |
||
|
|
aa6098e6a4
|
chore(deps): bump langsmith from 0.6.4 to 0.7.31 in /backend (#2291)
Bumps [langsmith](https://github.com/langchain-ai/langsmith-sdk) from 0.6.4 to 0.7.31. - [Release notes](https://github.com/langchain-ai/langsmith-sdk/releases) - [Commits](https://github.com/langchain-ai/langsmith-sdk/compare/v0.6.4...v0.7.31) --- updated-dependencies: - dependency-name: langsmith dependency-version: 0.7.31 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
ca1b7d5f48
|
fix(sandbox): add missing path masking in ls_tool output (#2317)
ls_tool was the only file-system tool that did not call mask_local_paths_in_output() before returning its result, causing host absolute paths (e.g. /Users/.../backend/.deer-flow/knowledge-base/...) to leak to the LLM instead of the expected virtual paths (/mnt/knowledge-base/...). This patch: - Adds the mask_local_paths_in_output() call to ls_tool, consistent with bash_tool, glob_tool and grep_tool. - Initialises thread_data = None before the is_local_sandbox branch (same pattern as glob_tool) so the variable is always in scope. - Adds three new tests covering user-data path masking, skills path masking and the empty-directory edge case. |
||
|
|
898f4e8ac2
|
fix: Memory update system has cache corruption, data loss, and thread-safety bugs (#2251)
* fix(memory): cache corruption, thread-safety, and caller mutation bugs
Bug 1 (updater.py): deep-copy current_memory before passing to
_apply_updates() so a subsequent save() failure cannot leave a
partially-mutated object in the storage cache.
Bug 3 (storage.py): add _cache_lock (threading.Lock) to
FileMemoryStorage and acquire it around every read/write of
_memory_cache, fixing concurrent-access races between the background
timer thread and HTTP reload calls.
Bug 4 (storage.py): replace in-place mutation
memory_data["lastUpdated"] = ...
with a shallow copy
memory_data = {**memory_data, "lastUpdated": ...}
so save() no longer silently modifies the caller's dict.
Regression tests added for all three bugs in test_memory_storage.py
and test_memory_updater.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: format test_memory_updater.py with ruff
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: remove stale bug-number labels from code comments and docstrings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
259a6844bf
|
chore(deps): bump python-multipart from 0.0.22 to 0.0.26 in /backend (#2282)
Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.22 to 0.0.26. - [Release notes](https://github.com/Kludex/python-multipart/releases) - [Changelog](https://github.com/Kludex/python-multipart/blob/master/CHANGELOG.md) - [Commits](https://github.com/Kludex/python-multipart/compare/0.0.22...0.0.26) --- updated-dependencies: - dependency-name: python-multipart dependency-version: 0.0.26 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
a664d2f5c4
|
fix(checkpointer): create parent directory before opening SQLite in sync provider (#2272)
* fix(checkpointer): create parent directory before opening SQLite in sync provider
The sync checkpointer factory (_sync_checkpointer_cm) opens a SQLite
connection without first ensuring the parent directory exists. The async
provider and both store providers already call ensure_sqlite_parent_dir(),
but this call was missing from the sync path.
When the deer-flow harness package is used from an external virtualenv
(where the .deer-flow directory is not pre-created), the missing parent
directory causes:
sqlite3.OperationalError: unable to open database file
Add the missing ensure_sqlite_parent_dir() call in the sync SQLite
branch, consistent with the async provider, and add a regression test.
Closes #2259
* style: fix ruff format + add call-order assertion for ensure_parent_dir
- Fix formatting in test_checkpointer.py (ruff format)
- Add test_sqlite_ensure_parent_dir_before_connect to verify
ensure_sqlite_parent_dir is called before from_conn_string
(addresses Copilot review suggestion)
---------
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
|
||
|
|
105db00987
|
feat: show token usage per assistant response (#2270)
* feat: show token usage per assistant response * fix: align client models response with token usage * fix: address token usage review feedback * docs: clarify token usage config example --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
2176b2bbfc
|
fix: validate bootstrap agent names before filesystem writes (#2274)
* fix: validate bootstrap agent names before filesystem writes * fix: tighten bootstrap agent-name validation |
||
|
|
8e3591312a
|
test: add unit tests for ViewImageMiddleware (#2256)
* test: add unit tests for ViewImageMiddleware - Add 33 test cases covering all 7 internal methods plus sync/async before_model hooks - Cover normal path, edge cases (missing keys, empty base64, stale ToolMessages before assistant turn), and deduplication logic - Related to Q2 Roadmap #1669 * test: add unit tests for ViewImageMiddleware Add 35 test cases covering all internal methods, before_model hooks, and edge cases (missing attrs, list-content dedup, stale ToolMessages). Related to #1669 |
||
|
|
692f79452d
|
fix(gateway): forward agent_name and is_bootstrap from context to configurable (#2242)
The frontend sends agent_name and is_bootstrap via the context field in run requests, but services.py only forwards a hardcoded whitelist of keys (_CONTEXT_CONFIGURABLE_KEYS) into the agent's configurable dict. Since agent_name was missing, custom agents never received their name — make_lead_agent always fell back to the default lead agent, skipping SOUL.md, per-agent config and skill filtering. Similarly, is_bootstrap was dropped, so the bootstrap creation flow could never activate the setup_agent tool path. Add both keys to the whitelist so they reach make_lead_agent. Fixes #2222 Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com> |
||
|
|
8760937439
|
fix(memory): use asyncio.to_thread for blocking file I/O in aupdate_memory (#2220)
* fix(memory): use asyncio.to_thread for blocking file I/O in aupdate_memory
`_finalize_update` performs synchronous blocking operations (os.mkdir,
file open/write/rename/stat) that were called directly from the async
`aupdate_memory` method, causing `BlockingError` from blockbuster when
running under an ASGI server. Wrap the call with `asyncio.to_thread` to
offload all blocking I/O to a thread pool.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): use unique temp filename to prevent concurrent write collision
`file_path.with_suffix(".tmp")` produces a fixed path — concurrent saves
for the same agent (now possible after wrapping _finalize_update in
asyncio.to_thread) would clobber the same temp file. Use a UUID-suffixed
temp file so each write is isolated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): also offload _prepare_update_prompt to thread pool
FileMemoryStorage.load() inside _prepare_update_prompt performs
synchronous stat() and file read, blocking the event loop just like
_finalize_update did. Wrap _prepare_update_prompt in asyncio.to_thread
for the same reason.
The async path now has no blocking file I/O on the event loop:
to_thread(_prepare_update_prompt) → await model.ainvoke() → to_thread(_finalize_update)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
4ba3167f48
|
feat: flush memory before summarization (#2176)
* feat: flush memory before summarization * fix: keep agent-scoped memory on summarization flush * fix: harden summarization hook plumbing * fix: address summarization review feedback * style: format memory middleware |
||
|
|
e4f896e90d
|
fix(todo-middleware): prevent premature agent exit with incomplete todos (#2135)
* fix(todo-middleware): prevent premature agent exit with incomplete todos When plan mode is active (is_plan_mode=True), the agent occasionally exits the loop and outputs a final response while todo items are still incomplete. This happens because the routing edge only checks for tool_calls, not todo completion state. Fixes #2112 Add an after_model override to TodoMiddleware with @hook_config(can_jump_to=["model"]). When the model produces a response with no tool calls but there are still incomplete todos, the middleware injects a todo_completion_reminder HumanMessage and returns jump_to=model to force another model turn. A cap of 2 reminders prevents infinite loops when the agent cannot make further progress. Also adds _completion_reminder_count() helper and 14 new unit tests covering all edge cases of the new after_model / aafter_model logic. * Remove unnecessary blank line in test file * Fix runtime argument annotation in before_model * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
07fc25d285
|
feat: switch memory updater to async LLM calls (#2138)
* docs: mark memory updater async migration as completed - Update TODO.md to mark the replacement of sync model.invoke() with async model.ainvoke() in title_middleware and memory updater as completed using [x] format Addresses #2131 * feat: switch memory updater to async LLM calls - Add async aupdate_memory() method using await model.ainvoke() - Convert sync update_memory() to use async wrapper - Add _run_async_update_sync() for nested loop context handling - Maintain backward compatibility with existing sync API - Add ThreadPoolExecutor for async execution from sync contexts Addresses #2131 * test: add tests for async memory updater - Add test_async_update_memory_uses_ainvoke() to verify async path - Convert existing tests to use AsyncMock and ainvoke assertions - Add test_sync_update_memory_wrapper_works_in_running_loop() - Update all model mocks to use async await patterns Addresses #2131 * fix: apply ruff formatting to memory updater - Format multi-line expressions to single line - Ensure code style consistency with project standards - Fix lint issues caught by GitHub Actions * test: add comprehensive tests for async memory updater - Add test_async_update_memory_uses_ainvoke() to verify async path - Convert existing tests to use AsyncMock and ainvoke assertions - Add test_sync_update_memory_wrapper_works_in_running_loop() - Update all model mocks to use async await patterns - Ensure backward compatibility with sync API * fix: satisfy ruff formatting in memory updater test --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
55bc09ac33
|
fix(backend): fix uploads for mounted sandbox providers (#2199)
* fix uploads for mounted sandbox providers * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
c43a45ea40
|
chore(deps): bump pillow from 12.1.1 to 12.2.0 in /backend (#2206)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 12.1.1 to 12.2.0. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/12.1.1...12.2.0) --- updated-dependencies: - dependency-name: pillow dependency-version: 12.2.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
9cf7153b1d
|
fix(check): windows pnpm version detection in check script (#2189)
* fix: resolve Windows pnpm detection in check script * style: format check script regression test * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: resolve corepack fallback on windows --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
c91785dd68
|
fix(title): strip <think> tags from title model responses and assistant context (#1927)
* fix(title): strip <think> tags from title model responses and assistant context Reasoning models (e.g. minimax M2.7, DeepSeek-R1) emit <think>...</think> blocks before their actual output. When such a model is used as the title model (or as the main agent), the raw thinking content leaked into the thread title stored in state, so the chat list showed the internal monologue instead of a meaningful title. Fixes #1884 - Add `_strip_think_tags()` helper using a regex to remove all <think>...</think> blocks - Apply it in `_parse_title()` so the title model response is always clean - Apply it to the assistant message in `_build_title_prompt()` so thinking content from the first AI turn is not fed back to the title model - Add four new unit tests covering: stripping in parse, think-only response, assistant prompt stripping, and end-to-end async flow with think tags * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
053e18e1a6
|
fix(skills): avoid blocking custom skill deletion on readonly history writes (#2197) | ||
|
|
a7e7c6d667
|
fix: disable custom-agent management API by default (#2161)
* fix: disable custom-agent management API by default * style: format agents API hardening files * fix: address review feedback for agents API hardening * fix: add missing disabled API coverage |
||
|
|
f4c17c66ce
|
fix(middleware): fix present_files thread id fallback (#2181)
* fix present files thread id fallback * fix: resolve present_files thread id from runtime config |
||
|
|
1df389b9d0
|
fix: wrap blocking readability call with asyncio.to_thread in web_fetch (#2157)
* fix: wrap blocking readability call with asyncio.to_thread in web_fetch The readability extractor internally spawns a Node.js subprocess via readabilipy, which blocks the async event loop and causes a BlockingError when web_fetch is invoked inside LangGraph's async runtime. Wrap the synchronous extract_article call with asyncio.to_thread to offload it to a thread pool, unblocking the event loop. Note: community/infoquest/tools.py has the same latent issue and should be addressed in a follow-up PR. Closes #2152 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: verify web_fetch offloads extraction via asyncio.to_thread Add a regression test that monkeypatches asyncio.to_thread to confirm readability extraction is offloaded to a worker thread, preventing future refactors from reintroducing the blocking call. Addresses Copilot review feedback on #2157. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
5db71cb68c
|
fix(middleware): repair dangling tool-call history after loop interru… (#2035)
* fix(middleware): repair dangling tool-call history after loop interruption (#2029) * docs(backend): fix middleware chain ordering --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> |
||
|
|
4d4ddb3d3f
|
feat(llm): introduce lightweight circuit breaker to prevent rate-limit bans and resource exhaustion (#2095) | ||
|
|
979a461af5
|
docs: move completed async migration to Completed Features (#2146)
- Move time.sleep() -> asyncio.sleep() from Planned to Completed Features - Clean up duplicate entries in TODO.md Ensures completed async optimizations are properly tracked. |
||
|
|
ac04f2704f
|
feat(subagents): allow model override per subagent in config.yaml (#2064)
* feat(subagents): allow model override per subagent in config.yaml Wire the existing SubagentConfig.model field to config.yaml so users can assign different models to different subagent types. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(subagents): cover model override in SubagentsAppConfig + registry Addresses review feedback on #2064: - registry.py: update stale inline comment — the block now applies timeout, max_turns AND model overrides, not just timeout. - test_subagent_timeout_config.py: add coverage for model override resolution across SubagentOverrideConfig, SubagentsAppConfig (get_model_for + load), and registry.get_subagent_config: - per-agent model override is applied to registry-returned config - omitted `model` keeps the builtin value - explicit `model: null` in config.yaml is equivalent to omission - model override on one agent does not affect other agents - model override preserves all other fields (name, description, timeout_seconds, max_turns) - model override does not mutate BUILTIN_SUBAGENTS Copilot's suggestion (3) "setting model to 'inherit' forces inheritance" is skipped intentionally: there is no 'inherit' sentinel in the current implementation — model is `str | None`, and None already means "inherit from parent". Adding a sentinel would be a new feature, not test coverage for this PR. Tests run locally: 51 passed (37 existing + 14 new / expanded). * test(subagents): reject empty-string model at config load time Addresses WillemJiang's review comment on #2064 (empty-string edge case): - subagents_config.py: add `min_length=1` to the `model` field on SubagentOverrideConfig. `model: ""` in config.yaml would otherwise bypass the `is not None` check and reach create_chat_model(name="") as a confusing runtime error. This is symmetric with the existing `ge=1` guards on timeout_seconds / max_turns, so the validation style stays consistent across all three override fields. - test_subagent_timeout_config.py: add test_rejects_empty_model mirroring the existing test_rejects_zero / test_rejects_negative cases; update the docstring on test_model_accepts_any_string (now test_model_accepts_any_non_empty_string) to reflect the new guard. Not addressing the first comment (validating `model` against the `models:` section at load time) in this PR. `SubagentsAppConfig` is scoped to the `subagents:` block and cannot see the sibling `models:` section, so proper cross-section validation needs a second pass or a structural change that is out of scope here — and the current behavior is consistent with how timeout_seconds / max_turns work today. Happy to track this as a follow-up issue covering cross-section validation uniformly for all three fields. Tests run locally: 52 passed in this file; 1847 passed, 18 skipped across the full backend suite. Ruff check + format clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
c4d273a68a
|
feat(channels): add Discord channel integration (#1806)
* feat(channels): add Discord channel integration Add a Discord bot channel following the existing Telegram/Slack pattern. The bot listens for messages, creates conversation threads, and relays responses back to Discord with 2000-char message splitting. - DiscordChannel extends Channel base class - Lazy imports discord.py with install hint - Thread-based conversations (each Discord thread maps to a DeerFlow thread) - Allowed guilds filter for access control - File attachment support via discord.File - Registered in service.py and manager.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(channels): address Copilot review suggestions for Discord integration - Disable @everyone/@here mentions via AllowedMentions.none() - Add 10s timeout to client close to prevent shutdown hangs - Log publish_inbound errors via future callback instead of silently dropping - Open file handle on caller thread to avoid cross-thread ownership issues - Notify user in channel when thread creation fails Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(discord): resolve lint errors in Discord channel - Replace asyncio.TimeoutError with builtin TimeoutError (UP041) - Remove extraneous f-string prefix (F541) - Apply ruff format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(tests): remove fake langgraph_sdk shim from test_discord_channel The module-level sys.modules.setdefault shim installed a fake langgraph_sdk.errors.ConflictError during pytest collection. Because pytest imports all test modules before running them, test_channels.py then imported the fake ConflictError instead of the real one. In test_handle_feishu_stream_conflict_sends_busy_message, the test constructs ConflictError(message, response=..., body=...). The fake only subclasses Exception (which takes no kwargs), so the construction raised TypeError. The manager's _is_thread_busy_error check then saw a TypeError instead of a ConflictError and fell through to the generic 'An error occurred' message. langgraph_sdk is a real dependency, so the shim is unnecessary. Removing it makes both test files import the same real ConflictError and the full suite pass (1773 passed, 15 skipped). --------- Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
dc50a7fdfb
|
fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox (#1935)
* fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox In LocalSandbox mode, read_file and write_file now transform container paths in file content, matching the path handling behavior of bash tool. - write_file: resolves virtual paths in content to system paths before writing, so scripts with /mnt/user-data paths work when executed - read_file: reverse-resolves system paths back to virtual paths in returned content for consistency This fixes scenarios where agents write Python scripts with virtual paths, then execute them via bash tool expecting the paths to work. Fixes #1778 * fix(sandbox): address Copilot review — dedicated content resolver + forward-slash safety + tests - Extract _resolve_paths_in_content() separate from _resolve_paths_in_command() to decouple file-content path resolution from shell-command parsing - Normalize resolved paths to forward slashes to avoid Windows backslash escape issues in source files (e.g. \U in Python string literals) - Add 4 focused tests: write resolves content, forward-slash guarantee, read reverse-resolves content, and write→read roundtrip * style: fix ruff lint — remove extraneous f-string prefix * fix(sandbox): only reverse-resolve paths in agent-written files read_file previously applied _reverse_resolve_paths_in_output to ALL file content, which could silently rewrite paths in user uploads and external tool output (Willem Jiang review on #1935). Now tracks files written through write_file in _agent_written_paths. Only those files get reverse-resolved on read. Non-agent files are returned as-is. --------- Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com> |
||
|
|
5b633449f8
|
fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware (#1988)
* fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware The existing hash-based loop detection only catches identical tool call sets. When the agent calls the same tool type (e.g. read_file) on many different files, each call produces a unique hash and bypasses detection. This causes the agent to exhaust recursion_limit, consuming 150K-225K tokens per failed run. Add a second detection layer that tracks cumulative call counts per tool type per thread. Warns at 30 calls (configurable) and forces stop at 50. The hard stop message now uses the actual returned message instead of a hardcoded constant, so both hash-based and frequency-based stops produce accurate diagnostics. Also fix _apply() to use the warning message returned by _track_and_check() for hard stops, instead of always using _HARD_STOP_MSG. Closes #1987 * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(lint): remove unused imports and fix line length - Remove unused _TOOL_FREQ_HARD_STOP_MSG and _TOOL_FREQ_WARNING_MSG imports from test file (F401) - Break long _TOOL_FREQ_WARNING_MSG string to fit within 240 char limit (E501) * style: apply ruff format * test: add LRU eviction and per-thread reset coverage for frequency state Address review feedback from @WillemJiang: - Verify _tool_freq and _tool_freq_warned are cleaned on LRU eviction - Add test for reset(thread_id=...) clearing only the target thread's frequency state while leaving others intact * fix(makefile): route Windows shell-script targets through Git Bash (#2060) --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Asish Kumar <87874775+officialasishkumar@users.noreply.github.com> |
||
|
|
02569136df
|
fix(sandbox): improve sandbox security and preserve multimodal content (#2114)
* fix: improve sandbox security and preserve multimodal content * Add unit test modifications for test_injects_uploaded_files_tag_into_list_content * format updated_content * Add regression tests for multimodal upload content and host bash default safety |
||
|
|
19030928e0
|
chore(deps): bump langchain-core from 1.2.17 to 1.2.28 in /backend (#2109)
Bumps [langchain-core](https://github.com/langchain-ai/langchain) from 1.2.17 to 1.2.28. - [Release notes](https://github.com/langchain-ai/langchain/releases) - [Commits](https://github.com/langchain-ai/langchain/compare/langchain-core==1.2.17...langchain-core==1.2.28) --- updated-dependencies: - dependency-name: langchain-core dependency-version: 1.2.28 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
fe2595a05c
|
Update CMD to run uvicorn with --no-sync option (#2100) | ||
|
|
718dddde75
|
fix(sandbox): prevent memory leak in file operation locks using WeakValueDictionary (#2096)
* fix(sandbox): prevent memory leak in file operation locks using WeakValueDictionary * lint: fix lint issue in sandbox tools security |
||
|
|
fa96acdf4b
|
feat: add WeChat channel integration (#1869)
* feat: add WeChat channel integration * fix(backend): recover stale channel threads and align upload artifact handling * refactor(wechat): reduce scope and restore QR bootstrap * fix(backend): sort manager imports for Ruff lint * fix(tests): add missing patch import in test_channels.py * Update backend/app/channels/wechat.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/app/channels/manager.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(wechat): streamline allowed file extensions initialization and clean up test file --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
90299e2710
|
feat(provisioner): add optional PVC support for sandbox volumes (#2020)
* feat(provisioner): add optional PVC support for sandbox volumes (#1978) Add SKILLS_PVC_NAME and USERDATA_PVC_NAME env vars to allow sandbox Pods to use PersistentVolumeClaims instead of hostPath volumes. This prevents data loss in production when pods are rescheduled across nodes. When USERDATA_PVC_NAME is set, a subPath of threads/{thread_id}/user-data is used so a single PVC can serve multiple threads. Falls back to hostPath when the new env vars are not set, preserving backward compatibility. * add unit test for provisioner pvc volumes * refactor: extract shared provisioner_module fixture to conftest.py Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e7ccf708-c6ba-40e4-844a-b526bdb249dd Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: JeffJiang <for-eleven@hotmail.com> |
||
|
|
b1aabe88b8
|
fix(backend): stream DeerFlowClient AI text as token deltas (#1969) (#1974)
* fix(backend): stream DeerFlowClient AI text as token deltas (#1969) DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values", "custom"] which only delivers full-state snapshots at graph-node boundaries, so AI replies were dumped as a single messages-tuple event per node instead of streaming token-by-token. `client.stream("hello")` looked identical to `client.chat("hello")` — the bug reported in #1969. Subscribe to "messages" mode as well, forward AIMessageChunk deltas as messages-tuple events with delta semantics (consumers accumulate by id), and dedup the values-snapshot path so it does not re-synthesize AI text that was already streamed. Introduce a per-id usage_metadata counter so the final AIMessage in the values snapshot and the final "messages" chunk — which carry the same cumulative usage — are not double-counted. chat() now accumulates per-id deltas and returns the last message's full accumulated text. Non-streaming mock sources (single event per id) are a degenerate case of the same logic, keeping existing callers and tests backward compatible. Verified end-to-end against a real LLM: a 15-number count emits 35 messages-tuple events with BPE subword boundaries clearly visible ("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across the window, end-event usage matches the values-snapshot usage exactly (not doubled). tests/test_client_live.py::TestLiveStreaming passes. New unit tests: - test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3 delta events with correct content/id/usage, values-snapshot does not duplicate, usage counted once. - test_chat_accumulates_streamed_deltas: chat() rebuilds full text from deltas. - test_messages_mode_tool_message: ToolMessage delivered via messages mode is not duplicated by the values-snapshot synthesis path. The stream() docstring now documents why this client does not reuse Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw LangChain objects vs serialized dicts, single caller vs HTTP fan-out). Fixes #1969 * refactor(backend): simplify DeerFlowClient streaming helpers (#1969) Post-review cleanup for the token-level streaming fix. No behavior change for correct inputs; one efficiency regression fixed. Fix: chat() O(n²) accumulator ----------------------------- `chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`, which is O(n) per concat → O(n²) total over a streamed response. At ~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks it costs roughly 100-300 ms of pure copying. Switched to `dict[str, list[str]]` + `"".join()` once at return. Cleanup ------- - Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`, and `_tool_message_event` static helpers. The messages-mode and values-mode branches previously repeated four inline dict literals each; they now call the same builders. - `StreamEvent.type` is now typed as `Literal["values", "messages-tuple", "custom", "end"]` via a `StreamEventType` alias. Makes the closed set explicit and catches typos at type-check time. - Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`, `.tool_calls`, `.id` all have default values on the base class, so the `getattr(..., None)` fallbacks were dead code. Removed from the hot path. - `_account_usage` parameter type loosened to `Any` so that LangChain's `UsageMetadata` TypedDict is accepted under strict type checking. - Trimmed narrating comments on `seen_ids` / `streamed_ids` / the values-synthesis skip block; kept the non-obvious ones that document the cross-mode dedup invariant. Net diff: -15 lines. All 132 unit tests + harness boundary test still pass; ruff check and ruff format pass. * docs(backend): add STREAMING.md design note (#1969) Dedicated design document for the token-level streaming architecture, prompted by the bug investigation in #1969. Contents: - Why two parallel streaming paths exist (Gateway HTTP/async vs DeerFlowClient sync/in-process) and why they cannot be merged. - LangGraph's three-layer mode naming (Graph "messages" vs Platform SDK "messages-tuple" vs HTTP SSE) and why a shared string constant would be harmful. - Gateway path: run_agent + StreamBridge + sse_consumer with a sequence diagram. - DeerFlowClient path: sync generator + direct yield, delta semantics, chat() accumulator. - Why the three id sets (seen_ids / streamed_ids / counted_usage_ids) each carry an independent invariant and cannot be collapsed. - End-to-end sequence for a real conversation turn. - Lessons from #1969: why mock-based tests missed the bug, why BPE subword boundaries in live output are the strongest correctness signal, and the regression test that locks it in. - Source code location index. Also: - Link from backend/CLAUDE.md Embedded Client section. - Link from backend/docs/README.md under Feature Documentation. * test(backend): add refactor regression guards for stream() (#1969) Three new tests in TestStream that lock the contract introduced by PR #1974 so any future refactor (sync->async migration, sharing a core with Gateway's run_agent, dedup strategy change) cannot silently change behavior. - test_dedup_requires_messages_before_values_invariant: canary that documents the order-dependence of cross-mode dedup. streamed_ids is populated only by the messages branch, so values-before-messages for the same id produces duplicate AI text events. Real LangGraph never inverts this order, but a refactor that does (or that makes dedup idempotent) must update this test deliberately. - test_messages_mode_golden_event_sequence: locks the *exact* event sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1 end) for a canonical streaming turn. List equality gives a clear diff on any drift in order, type, or payload shape. - test_chat_accumulates_in_linear_time: perf canary for the O(n^2) fix in commit 1f11ba10. 10,000 single-char chunks must accumulate in under 1s; the threshold is wide enough to pass on slow CI but tight enough to fail if buffer = buffer + delta is restored. All three tests pass alongside the existing 12 TestStream tests (15/15). ruff check + ruff format clean. * docs(backend): clarify stream() docstring on JSON serialization (#1969) Replace the misleading "raw LangChain objects (AIMessage, usage_metadata as dataclasses), not dicts" claim in the "Why not reuse Gateway's run_agent?" section. The implementation already yields plain Python dicts (StreamEvent.data is dict, and usage_metadata is a TypedDict), so the original wording suggested a richer return type than the API actually delivers. The corrected wording focuses on what is actually true and relevant: this client skips the JSON/SSE serialization layer that Gateway adds for HTTP wire transmission, and yields stream event payloads directly as Python data structures. Addresses Copilot review feedback on PR #1974. * test(backend): document none-id messages dedup limitation (#1969) Add test_none_id_chunks_produce_duplicates_known_limitation to TestStream that explicitly documents and asserts the current behavior when an LLM provider emits AIMessageChunk with id=None (vLLM, certain custom backends). The cross-mode dedup machinery cannot record a None id in streamed_ids (guarded by ``if msg_id:``), so the values snapshot's reassembled AIMessage with a real id falls through and synthesizes a duplicate AI text event. The test asserts len == 2 and locks this as a known limitation rather than silently letting future contributors hit it without context. Why this is documented rather than fixed: * Falling back to ``metadata.get("id")`` does not help — LangGraph's messages-mode metadata never carries the message id. * Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the values snapshot uses the same fallback, which it does not. * A real fix requires provider cooperation (always emit chunk ids) or content-based dedup (false-positive risk), neither of which belongs in this PR. If a real fix lands, replace this test with a positive assertion that dedup works for None-id chunks. Addresses Copilot review feedback on PR #1974 (client.py:515). * fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942) - Fix `font-norma` typo to `font-normal` in message-list subtask count - Fix dark mode `--border` using reddish hue (22.216) instead of neutral - Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground` - Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground` - Add missing `font-sans` to welcome description `<pre>` for consistency - Make case-study-section padding responsive (`px-4 md:px-20`) Closes #1940 * docs: clarify deployment sizing guidance (#1963) * fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960) After history.replaceState updates the URL from /chats/new to /chats/{UUID}, Next.js useParams does not update because replaceState bypasses the router. The useEffect in useThreadChat would then set threadIdFromPath ('new') as the threadId, causing the LangGraph SDK to call POST /threads/new/history which returns HTTP 422 (Invalid thread ID: must be a UUID). This fix adds a guard to skip the threadId update when threadIdFromPath is the literal string 'new', preserving the already-correct UUID that was set when the thread was created. * fix(frontend): avoid using route new as thread id (#1967) Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> * Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965) * Fix event loop conflict in SubagentExecutor.execute() When SubagentExecutor.execute() is called from within an already-running event loop (e.g., when the parent agent uses async/await), calling asyncio.run() creates a new event loop that conflicts with asyncio primitives (like httpx.AsyncClient) that were created in and bound to the parent loop. This fix detects if we're already in a running event loop, and if so, runs the subagent in a separate thread with its own isolated event loop to avoid conflicts. Fixes: sub-task cards not appearing in Ultra mode when using async parent agents Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(subagent): harden isolated event loop execution --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(backend): remove dead getattr in _tool_message_event --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com> Co-authored-by: 13ernkastel <LennonCMJ@live.com> Co-authored-by: siwuai <458372151@qq.com> Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com> Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
eef0a6e2da
|
feat(dx): Setup Wizard + doctor command — closes #2030 (#2034) | ||
|
|
b107444878
|
docs(api): document recursion_limit for LangGraph API runs (#1929)
The /api/langgraph/* endpoints proxy straight to the LangGraph server, so clients inherit LangGraph's native recursion_limit default of 25 instead of the 100 that build_run_config sets for the Gateway and IM channel paths. 25 is too low for plan-mode or subagent runs and reliably triggers GraphRecursionError on the lead agent's final synthesis step after subagents return. Set recursion_limit: 100 in the Create Run example and the cURL snippet, and add a short note explaining the discrepancy so users following the docs don't hit the 25-step ceiling as a surprise. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
133ffe7174
|
feat(models): add langchain-ollama for native Ollama thinking support (#2062)
Add langchain-ollama as an optional dependency and provide ChatOllama config examples, enabling proper thinking/reasoning content preservation for local Ollama models. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
194bab4691
|
feat(config): add when_thinking_disabled support for model configs (#1970)
* feat(config): add when_thinking_disabled support for model configs Allow users to explicitly configure what parameters are sent to the model when thinking is disabled, via a new `when_thinking_disabled` field in model config. This mirrors the existing `when_thinking_enabled` pattern and takes full precedence over the hardcoded disable behavior when set. Backwards compatible — existing configs work unchanged. Closes #1675 * fix(config): address copilot review — gate when_thinking_disabled independently - Switch truthiness check to `is not None` so empty dict overrides work - Restructure disable path so when_thinking_disabled is gated independently of has_thinking_settings, allowing it to work without when_thinking_enabled - Update test to reflect new behavior |
||
|
|
35f141fc48
|
feat: implement full checkpoint rollback on user cancellation (#1867)
* feat: implement full checkpoint rollback on user cancellation - Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes - Add _rollback_to_pre_run_checkpoint() function to restore thread state - Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods - Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes - Remove obsolete TODO comment (Phase 2) as rollback is now complete This resolves the TODO(Phase 2) comment and enables full thread state restoration when a run is cancelled by the user. * fix: address rollback review feedback * fix: strengthen checkpoint rollback validation and error handling - Validate restored_config structure and checkpoint_id before use - Raise RuntimeError on malformed pending_writes instead of silent skip - Normalize None checkpoint_ns to empty string instead of "None" - Move delete_thread to only execute when pre_run_snapshot is None - Add docstring noting non-atomic rollback as known limitation This addresses review feedback on PR #1867 regarding data integrity in the checkpoint rollback implementation. * test: add comprehensive coverage for checkpoint rollback edge cases - test_rollback_restores_snapshot_without_deleting_thread - test_rollback_deletes_thread_when_no_snapshot_exists - test_rollback_raises_when_restore_config_has_no_checkpoint_id - test_rollback_normalizes_none_checkpoint_ns_to_root_namespace - test_rollback_raises_on_malformed_pending_write_not_a_tuple - test_rollback_raises_on_malformed_pending_write_non_string_channel - test_rollback_propagates_aput_writes_failure Covers all scenarios from PR #1867 review feedback. * test: format rollback worker tests |
||
|
|
0b6fa8b9e1
|
fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976)
* fix(sandbox): add startup reconciliation to prevent orphaned container leaks Sandbox containers were never cleaned up when the managing process restarted, because all lifecycle tracking lived in in-memory dictionaries. This adds startup reconciliation that enumerates running containers via `docker ps` and either destroys orphans (age > idle_timeout) or adopts them into the warm pool. Closes #1972 * fix(sandbox): address Copilot review — adopt-all strategy, improved error handling - Reconciliation now adopts all containers into warm pool unconditionally, letting the idle checker decide cleanup. Avoids destroying containers that another concurrent process may still be using. - list_running() logs stderr on docker ps failure and catches FileNotFoundError/OSError. - Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP. - E2E test docstring corrected to match actual coverage scope. * fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene - _reconcile_orphans(): merge check-and-insert into a single lock acquisition per container to eliminate the TOCTOU window. - list_running(): batch the per-container docker inspect into a single call. Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect). Parse port and created_at from the inspect JSON payload. - Extract _parse_docker_timestamp() and _extract_host_port() as module-level pure helpers and test them directly. - Move datetime/json imports to module top level. - _make_provider_for_reconciliation(): document the __new__ bypass and the lockstep coupling to AioSandboxProvider.__init__. - Add assertion that list_running() makes exactly ONE inspect call. |
||
|
|
563383c60f
|
fix(agent): file-io path guidance in agent prompts (#2019)
* fix(prompt): guide workspace-relative file io * Clarify bash agent file IO path guidance |
||
|
|
1b74d84590
|
fix: resolve missing serialized kwargs in PatchedChatDeepSeek (#2025)
* add tests * fix ci * fix ci |
||
|
|
616caa92b1
|
fix(models): resolve duplicate keyword argument error when reasoning_effort appears in both config and kwargs (#2017)
When a model config includes `reasoning_effort` as an extra YAML field
(ModelConfig uses `extra="allow"`), and the thinking-disabled code path
also injects `reasoning_effort="minimal"` into kwargs, the previous
`model_class(**kwargs, **model_settings_from_config)` call raises:
TypeError: got multiple values for keyword argument 'reasoning_effort'
Fix by merging the two dicts before instantiation, giving runtime kwargs
precedence over config values: `{**model_settings_from_config, **kwargs}`.
Fixes #1977
Co-authored-by: octo-patch <octo-patch@github.com>
|
||
|
|
31a3c9a3de
|
feat(client): add thread query methods list_threads and get_thread (#1609)
* feat(client): add thread query methods `list_threads` and `get_thread` Implemented two public API methods in `DeerFlowClient` to query threads using the underlying `checkpointer`. * Update backend/packages/harness/deerflow/client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/packages/harness/deerflow/client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/tests/test_client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update backend/packages/harness/deerflow/client.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(deerflow): Fix possible KeyError issue when sorting threads * fix unit test --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
ad6d934a5f
|
fix(middleware): handle string-serialized options in ClarificationMiddleware (#1997)
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995) Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON strings instead of native arrays. Add defensive type checking in _format_clarification_message() to deserialize string options before iteration, preventing per-character rendering. * fix(middleware): normalize options after JSON deserialization Address Copilot review feedback: - Add post-deserialization normalization so options is always a list (handles json.loads returning a scalar string, dict, or None) - Add test for JSON-encoded scalar string ("development") - Fix test_json_string_with_mixed_types to use actual mixed types |
||
|
|
5350b2fb24
|
feat(community): add Exa search as community tool provider (#1357)
* feat(community): add Exa search as community tool provider Add Exa (exa.ai) as a new community search provider alongside Tavily, Firecrawl, InfoQuest, and Jina AI. Exa is an AI-native search engine with neural, keyword, and auto search types. New files: - community/exa/tools.py: web_search_tool and web_fetch_tool - tests/test_exa_tools.py: 10 unit tests with mocked Exa client Changes: - pyproject.toml: add exa-py dependency - config.example.yaml: add commented-out Exa configuration examples Usage: set `use: deerflow.community.exa.tools:web_search_tool` in config.yaml and provide EXA_API_KEY. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(community): address PR review comments for Exa tools - Make _get_exa_client() accept tool_name param so web_fetch reads its own config - Remove __init__.py to match namespace package pattern of other providers - Add duplicate tool name warning in config.example.yaml - Add regression tests for web_fetch config resolution Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update revision in uv.lock to 3 --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
29817c3b34
|
fix(backend): use timezone-aware UTC in memory modules (fix pytest DeprecationWarnings) (#1992)
* fix(backend): use timezone-aware UTC in memory modules Replace datetime.utcnow() with datetime.now(timezone.utc) and a shared utc_now_iso_z() helper so persisted ISO timestamps keep the trailing Z suffix without triggering Python 3.12+ deprecation warnings. Made-with: Cursor * refactor(backend): use removesuffix for utc_now_iso_z suffix Makes the +00:00 -> Z transform explicit for the trailing offset only (Copilot review on PR #1992). Made-with: Cursor * style(backend): satisfy ruff UP017 with datetime.UTC in memory queue Made-with: Cursor --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
e5b149068c
|
Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)
* Fix event loop conflict in SubagentExecutor.execute() When SubagentExecutor.execute() is called from within an already-running event loop (e.g., when the parent agent uses async/await), calling asyncio.run() creates a new event loop that conflicts with asyncio primitives (like httpx.AsyncClient) that were created in and bound to the parent loop. This fix detects if we're already in a running event loop, and if so, runs the subagent in a separate thread with its own isolated event loop to avoid conflicts. Fixes: sub-task cards not appearing in Ultra mode when using async parent agents Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(subagent): harden isolated event loop execution --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|
|
0948c7a4e1
|
fix(provider): preserve streamed Codex output when response.completed.output is empty (#1928)
* fix: preserve streamed Codex output items * fix: prefer completed Codex output over streamed placeholders |
||
|
|
c3170f22da
|
fix(backend): make loop detection hash tool calls by stable keys (#1911)
* fix(backend): make loop detection hash tool calls by stable keys The loop detection middleware previously hashed full tool call arguments, which made repeated calls look different when only non-essential argument details changed. In particular, `read_file` calls with nearby line ranges could bypass repetition detection even when the agent was effectively reading the same file region again and again. - Hash tool calls using stable keys instead of the full raw args payload - Bucket `read_file` line ranges so nearby reads map to the same region key - Prefer stable identifiers such as `path`, `url`, `query`, or `command` before falling back to JSON serialization of args - Keep hashing order-independent so the same tool call set produces the same hash regardless of call order Fixes #1905 * fix(backend): harden loop detection hash normalization - Normalize and parse stringified tool args defensively - Expand stable key derivation to include pattern, glob, and cmd - Normalize reversed read_file ranges before bucketing Fixes #1905 * fix(backend): harden loop detection tool format * exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged. --------- Co-authored-by: JeffJiang <for-eleven@hotmail.com> |
||
|
|
3b3e8e1b0b
|
feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881)
* fix(sandbox): strengthen regex coverage in SandboxAuditMiddleware
Expand high-risk patterns from 6 to 13 and medium-risk from 4 to 6,
closing several bypass vectors identified by cross-referencing Claude
Code's BashSecurity validator chain against DeerFlow's threat model.
High-risk additions:
- Generalised pipe-to-sh (replaces narrow curl|sh rule)
- Targeted command substitution ($() / backtick with dangerous executables)
- base64 decode piped to execution
- Overwrite system binaries (/usr/bin/, /bin/, /sbin/)
- Overwrite shell startup files (~/.bashrc, ~/.profile, etc.)
- /proc/*/environ leakage
- LD_PRELOAD / LD_LIBRARY_PATH hijack
- /dev/tcp/ bash built-in networking
Medium-risk additions:
- sudo/su (no-op under Docker root, warn only)
- PATH= modification (long attack chain, warn only)
Design decisions:
- Command substitution uses targeted matching (curl/wget/bash/sh/python/
ruby/perl/base64) rather than blanket block to avoid false positives
on safe usage like $(date) or `whoami`.
- Skipped encoding/obfuscation checks (hex, octal, Unicode homoglyphs)
as ROI is low in Docker sandbox — LLMs don't generate encoded commands
and container isolation bounds the blast radius.
- Merged pip/pip3 into single pip3? pattern.
* feat(sandbox): compound command splitting and fork bomb detection
Split compound bash commands (&&, ||, ;) into sub-commands and classify
each independently — prevents dangerous commands hidden after safe
prefixes (e.g. "cd /workspace && rm -rf /") from bypassing detection.
- Add _split_compound_command() with shlex quote-aware splitting
- Add fork bomb detection patterns (classic and while-loop variants)
- Most severe verdict wins; block short-circuits
- 15 new tests covering compound commands, splitting, and fork bombs
* test(sandbox): add async tests for fork bomb and compound commands
Cover awrap_tool_call path for fork bomb detection (3 variants) and
compound command splitting (block/warn/pass scenarios).
* fix(sandbox): address Copilot review — no-whitespace operators, >>/etc/, whole-command scan
- _split_compound_command: replace shlex-based implementation with a
character-by-character quote/escape-aware scanner. shlex.split only
separates '&&' / '||' / ';' when they are surrounded by whitespace,
so payloads like 'rm -rf /&&echo ok' or 'safe;rm -rf /' bypassed the
previous splitter and therefore the per-sub-command classifier.
- _HIGH_RISK_PATTERNS: change r'>\s*/etc/' to r'>+\s*/etc/' so append
redirection ('>>/etc/hosts') is also blocked.
- _classify_command: run a whole-command high-risk scan *before*
splitting. Structural attacks like 'while true; do bash & done'
span multiple shell statements — splitting on ';' destroys the
pattern context, so the raw command must be scanned first.
- tests: add no-whitespace operator cases to TestSplitCompoundCommand
and test_compound_command_classification to lock in the bypass fix.
|
||
|
|
f0dd8cb0d2
|
fix(subagents): add cooperative cancellation for subagent threads (#1873)
* fix(subagents): add cooperative cancellation for subagent threads
Subagent tasks run inside ThreadPoolExecutor threads with their own
event loop (asyncio.run). When a user clicks stop, RunManager cancels
the parent asyncio.Task, but Future.cancel() cannot terminate a running
thread and asyncio.Event does not propagate across event loops. This
causes subagent threads to keep executing (writing files, calling LLMs)
even after the user explicitly stops the run.
Fix: add a threading.Event (cancel_event) to SubagentResult and check
it cooperatively in _aexecute()'s astream iteration loop. On cancel,
request_cancel_background_task() sets the event, and the thread exits
at the next iteration boundary.
Changes:
- executor.py: Add cancel_event field to SubagentResult, check it in
_aexecute loop, set it on timeout, add request_cancel_background_task
- task_tool.py: Call request_cancel_background_task on CancelledError
* fix(subagents): guard cancel status and add pre-check before astream
- Only overwrite status to FAILED when still RUNNING, preserving
TIMED_OUT set by the scheduler thread.
- Add cancel_event pre-check before entering the astream loop so
cancellation is detected immediately when already signalled.
* fix(subagents): guard status updates with lock to prevent race condition
Wrap the check-and-set on result.status in _aexecute with
_background_tasks_lock so the timeout handler in execute_async
cannot interleave between the read and write.
* fix(subagents): add dedicated CANCELLED status for user cancellation
Introduce SubagentStatus.CANCELLED to distinguish user-initiated
cancellation from actual execution failures. Update _aexecute,
task_tool polling, cleanup terminal-status sets, and test fixtures.
* test(subagents): add cancellation tests and fix timeout regression test
- Add dedicated TestCooperativeCancellation test class with 6 tests:
- Pre-set cancel_event prevents astream from starting
- Mid-stream cancel_event returns CANCELLED immediately
- request_cancel_background_task() sets cancel_event correctly
- request_cancel on nonexistent task is a no-op
- Real execute_async timeout does not overwrite CANCELLED (deterministic
threading.Event sync, no wall-clock sleeps)
- cleanup_background_task removes CANCELLED tasks
- Add task_tool cancellation coverage:
- test_cancellation_calls_request_cancel: assert CancelledError path
calls request_cancel_background_task(task_id)
- test_task_tool_returns_cancelled_message: assert CANCELLED polling
branch emits task_cancelled event and returns expected message
- Fix pre-existing test infrastructure issue: add deerflow.sandbox.security
to _MOCKED_MODULE_NAMES (fixes ModuleNotFoundError for all executor tests)
- Add RUNNING guard to timeout handler in executor.py to prevent
TIMED_OUT from overwriting CANCELLED status
- Add cooperative cancellation granularity comment documenting that
cancellation is only detected at astream iteration boundaries
---------
Co-authored-by: lulusiyuyu <lulusiyuyu@users.noreply.github.com>
|
||
|
|
7643a46fca
|
fix(skill): make skill prompt cache refresh nonblocking (#1924)
* fix: make skill prompt cache refresh nonblocking * fix: harden skills prompt cache refresh * chore: add timeout to skills cache warm-up |
||
|
|
c4da0e8ca9
|
Move async SQLite mkdir off the event loop (#1921)
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com> |
||
|
|
88e535269e
|
Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). * style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling |
||
|
|
888f7bfb9d
|
Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
055e4df049
|
fix(sandbox): add input sanitisation guard to SandboxAuditMiddleware (#1872)
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware Add _validate_input() to reject malformed bash commands before regex classification: empty commands, oversized commands (>10 000 chars), and null bytes that could cause detection/execution layer inconsistency. * fix(sandbox): address Copilot review — type guard, log truncation, reject reason - Coerce None/non-string command to str before validation - Truncate oversized commands in audit logs to prevent log amplification - Propagate reject_reason through _pre_process() to block message - Remove L2 label from comments and test class names * fix(sandbox): isinstance type guard + async input sanitisation tests Address review comments: - Replace str() coercion with isinstance(raw_command, str) guard so non-string truthy values (0, [], False) fall back to empty string instead of passing validation as "0"/"[]"/"False". - Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests covering empty, null-byte, oversized, and None command via awrap_tool_call path. |
||
|
|
1ced6e977c
|
fix(backend): preserve viewed image reducer metadata (#1900)
Fix concurrent viewed_images state updates for multi-image input by preserving the reducer metadata in the vision middleware state schema. |
||
|
|
dd30e609f7
|
feat(models): add vLLM provider support (#1860)
support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking. Co-authored-by: NmanQAQ <normangyao@qq.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
5fd2c581f6
|
fix: add output truncation to ls_tool to prevent context window overflow (#1896)
ls_tool was the only sandbox tool without output size limits, allowing multi-MB results from large directories to blow up the model context window. Add head-truncation (configurable via ls_output_max_chars, default 20000) consistent with existing bash and read_file truncation. Closes #1887 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
7c68dd4ad4
|
Fix(#1702): stream resume run (#1858)
* fix: repair stream resume run metadata # Conflicts: # backend/packages/harness/deerflow/runtime/stream_bridge/memory.py # frontend/src/core/threads/hooks.ts * fix(stream): repair resumable replay validation --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
29575c32f9
|
fix: expose custom events from DeerFlowClient.stream() (#1827)
* fix: expose custom client stream events Signed-off-by: suyua9 <1521777066@qq.com> * fix(client): normalize streamed custom mode values * test(client): satisfy backend ruff import ordering --------- Signed-off-by: suyua9 <1521777066@qq.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
ca2fb95ee6
|
feat: unified serve.sh with gateway mode support (#1847) | ||
|
|
117fa9b05d
|
fix(channels): normalize slack allowed user ids (#1802)
* fix(channels): normalize slack allowed user ids * style(channels): apply backend formatter --------- Co-authored-by: haimingZZ <15558128926@qq.com> Co-authored-by: suyua9 <1521777066@qq.com> |
||
|
|
8049785de6
|
fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804)
* fix(memory): case-insensitive fact deduplication and positive reinforcement detection Two fixes to the memory system: 1. _fact_content_key() now lowercases content before comparison, preventing semantically duplicate facts like "User prefers Python" and "user prefers python" from being stored separately. 2. Adds detect_reinforcement() to MemoryMiddleware (closes #1719), mirroring detect_correction(). When users signal approval ("yes exactly", "perfect", "完全正确", etc.), the memory updater now receives reinforcement_detected=True and injects a hint prompting the LLM to record confirmed preferences and behaviors with high confidence. Changes across the full signal path: - memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement() - queue.py: reinforcement_detected field in ConversationContext and add() - updater.py: reinforcement_detected param in update_memory() and update_memory_from_conversation(); builds reinforcement_hint alongside the existing correction_hint Tests: 11 new tests covering deduplication, hint injection, and signal detection (Chinese + English patterns, window boundary, conflict with correction). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(memory): address Copilot review comments on reinforcement detection - Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants - Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise - Use casefold() instead of lower() for Unicode-aware fact deduplication - Add missing test coverage for reinforcement_detected OR merge and forwarding in queue --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|
|
9ca68ffaaa
|
fix: preserve virtual path separator style (#1828)
* fix: preserve virtual path separator style * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
0ffe5a73c1
|
chroe(config):Increase subagent max-turn limits (#1852) | ||
|
|
d3b59a7931
|
docs: fix some broken links (#1864)
* Rename BACKEND_TODO.md to TODO.md in documentation * Update MCP Setup Guide link in CONTRIBUTING.md * Update reference to config.yaml path in documentation * Fix config file path in TITLE_GENERATION_IMPLEMENTATION.md Updated the path to the example config file in the documentation. |
||
|
|
e5416b539a
|
fix(docker): use multi-stage build to remove build-essential from runtime image (#1846)
* fix(docker): use multi-stage build to remove build-essential from runtime image The build-essential toolchain (~200 MB) was only needed for compiling native Python extensions during `uv sync` but remained in the final image, increasing size and attack surface. Split the Dockerfile into a builder stage (with build-essential) and a clean runtime stage that copies only the compiled artifacts, Node.js, Docker CLI, and uv. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(docker): add dev stage and pin docker:cli per review feedback Address Copilot review comments: - Add a `dev` build stage (FROM builder) that retains build-essential so startup-time `uv sync` in dev containers can compile from source - Update docker-compose-dev.yaml to use `target: dev` for gateway and langgraph services - Keep the clean runtime stage (no build-essential) as the default final stage for production builds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
72d4347adb
|
fix(sandbox): guard against None runtime.context in sandbox tool helpers (#1853)
sandbox_from_runtime() and ensure_sandbox_initialized() write sandbox_id into runtime.context after acquiring a sandbox. When lazy_init=True and no context is supplied to the graph run, runtime.context is None (the LangGraph default), causing a TypeError on the assignment. Add `if runtime.context is not None` guards at all three write sites. Reads already had equivalent guards (e.g. `runtime.context.get(...) if runtime.context else None`); this brings writes into line. |
||
|
|
a283d4a02d
|
fix: include soul field in GET /api/agents list response (fixes #1819) (#1863)
Previously, the list endpoint always returned soul=null because
_agent_config_to_response() was called without include_soul=True.
This caused confusion since PUT /api/agents/{name} and GET /api/agents/{name}
both returned the soul content, but the list endpoint silently omitted it.
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
|
||
|
|
5f8dac66e6
|
chore(deps): update uv.lock (#1848)
Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
2a150f5d4a
|
fix: unblock concurrent threads and workspace hydration (#1839)
* fix: unblock concurrent threads and workspace hydration * fix: restore async title generation * fix: address PR review feedback * style: format lead agent prompt |
||
|
|
163121d327
|
fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline (#1838)
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents Add workflow guidance to the <uploaded_files> context block so the agent knows to use grep and glob (added in #1784) alongside read_file when working with uploaded documents, rather than falling back to web search. This is the final piece of the three-PR PDF agentic search pipeline: - PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings - PR2 (#1738): document outline injected into agent context with line numbers - PR3 (this): agent guided to use outline + grep + read_file workflow * feat(uploads): add file-first priority and fallback guidance to uploaded_files context * fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline - Add _clean_bold_title() to merge adjacent bold spans (** **) produced by pymupdf4llm when bold text crosses span boundaries - Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>** headings common in academic papers; excludes pure-number table headers and rows with more than 4 bold blocks - When outline is empty, read first 5 non-empty lines of the .md as a content preview and surface a grep hint in the agent context - Update _format_file_entry to render the preview + grep hint instead of silently omitting the outline section - Add 3 new extract_outline tests and 2 new middleware tests (65 total) * fix(uploads): address Copilot review comments on extract_outline regex - Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded - Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input - Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex) - Log debug message with exc_info when preview extraction fails |
||
|
|
19809800f1
|
feat: support wecom channel (#1390)
* feat: support wecom channel * fix: sending file to client Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * test: add unit tests for wecom channel Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * docs: add example configs and setup docs Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * revert pypi default index setting Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * revert: keeping codes in harness untouched Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * fix: format issue Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> * fix: resolve Copilot comments Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> --------- Signed-off-by: fengxusong <7008971+fengxsong@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
bbd0866374
|
feat(uploads): guide agent using agentic search for uploaded documents (#1816)
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents Add workflow guidance to the <uploaded_files> context block so the agent knows to use grep and glob (added in #1784) alongside read_file when working with uploaded documents, rather than falling back to web search. This is the final piece of the three-PR PDF agentic search pipeline: - PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings - PR2 (#1738): document outline injected into agent context with line numbers - PR3 (this): agent guided to use outline + grep + read_file workflow * feat(uploads): add file-first priority and fallback guidance to uploaded_files context |
||
|
|
db82b59254
|
fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware (#1823)
* fix: inject longTermBackground into memory prompt
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
* fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware
LangChain AIMessage.content can be str | list. When using providers that
return structured content blocks (e.g. Anthropic thinking mode, certain
OpenAI-compatible gateways), content is a list of dicts like
[{"type": "text", "text": "..."}].
The hard_limit branch in _apply() concatenated content with a string via
(last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises
TypeError when content is a non-empty list (list + str is invalid).
Add _append_text() static method that:
- Returns the text directly when content is None
- Appends a {"type": "text"} block when content is a list
- Falls back to string concatenation when content is a str
This is consistent with how other modules in the project already handle
list content (client.py._extract_text, memory_middleware, executor.py).
* test(middleware): add unit tests for _append_text and list content hard stop
Add regression tests to verify LoopDetectionMiddleware handles list-type
AIMessage.content correctly during hard stop:
- TestAppendText: unit tests for the new _append_text() static method
covering None, str, list (including empty list) content types
- TestHardStopWithListContent: integration tests verifying hard stop
works correctly with list content (Anthropic thinking mode), None
content, and str content
Requested by reviewer in PR #1823.
* fix(middleware): improve _append_text robustness and test isolation
- Add explicit isinstance(content, str) check with fallback for
unexpected types (coerce to str) to prevent TypeError on edge cases
- Deep-copy list content in _make_state() test helper to prevent
shared mutable references across test iterations
- Add test_unexpected_type_coerced_to_str: verify fallback for
non-str/list/None content types
- Add test_list_content_not_mutated_in_place: verify _append_text
does not modify the original list
* style: fix ruff format whitespace in test file
---------
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
|
||
|
|
ddfc988bef
|
feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload (#1727)
* feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload - Introduce pymupdf4llm as an optional PDF converter with better heading detection and table preservation than MarkItDown - Auto mode: prefer pymupdf4llm when installed; fall back to MarkItDown when output is suspiciously sparse (image-based / scanned PDFs) - Sparsity check uses chars-per-page (< 50 chars/page) rather than an absolute threshold, correctly handling both short and long documents - Large files (> 1 MB) are offloaded to asyncio.to_thread() to avoid blocking the event loop (related: #1569) - Add UploadsConfig with pdf_converter field (auto/pymupdf4llm/markitdown) - Add pymupdf4llm as optional dependency: pip install deerflow-harness[pymupdf] - Add 14 unit tests covering sparsity heuristic, routing logic, and async path * fix(uploads): address Copilot review comments on PDF converter - Fix docstring: MIN_CHARS_PYMUPDF -> _MIN_CHARS_PER_PAGE (typo) - Fix file handle leak: wrap pymupdf.open in try/finally to ensure doc.close() - Fix silent fallback gap: _convert_pdf_with_pymupdf4llm now catches all conversion exceptions (not just ImportError), so encrypted/corrupt PDFs fall back to MarkItDown instead of propagating - Tighten type: pdf_converter field changed from str to Literal[auto|pymupdf4llm|markitdown] - Normalize config value: _get_pdf_converter() strips and lowercases the raw config string, warns and falls back to 'auto' on unknown values |
||
|
|
5ff230eafd
|
feat(uploads): inject document outline into agent context for converted files (#1738)
* feat(uploads): inject document outline into agent context for converted files
Extract headings from converted .md files and inject them into the
<uploaded_files> context block so the agent can navigate large documents
by line number before reading.
- Add `extract_outline()` to `file_conversion.py`: recognises standard
Markdown headings (#/##/###) and SEC-style bold structural headings
(**ITEM N. BUSINESS**, **PART II**); caps at 50 entries; excludes
cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES)
- Add `_extract_outline_for_file()` helper in `uploads_middleware.py`:
looks for a sibling `.md` file produced by the conversion pipeline
- Update `UploadsMiddleware._create_files_message()` to render the outline
under each file entry with `L{line}: {title}` format and a `read_file`
prompt for range-based reading
- Tests: 10 new tests for `extract_outline()`, 4 new tests for outline
injection in `UploadsMiddleware`; existing test updated for new `outline`
field in `uploaded_files` state
Partially addresses #1647 (agent ignores uploaded files).
* fix(uploads): stream outline file reads and strip inline bold from heading titles
- Switch extract_outline() from read_text().splitlines() to open()+line iteration
so large converted documents are not loaded into memory on every agent turn;
exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion)
- Strip **...** wrapper from standard Markdown heading titles before appending
to outline so agent context stays clean (e.g. "## **Overview**" → "Overview")
(Copilot suggestion)
- Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py
to satisfy ruff CI lint
* fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES
When extract_outline() hits the cap it now appends a sentinel entry
{"truncated": True} instead of silently dropping the rest of the headings.
UploadsMiddleware reads the sentinel and renders a hint line:
... (showing first 50 headings; use `read_file` to explore further)
Without this the agent had no way to know the outline was incomplete and
would treat the first 50 headings as the full document structure.
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id
runtime.context does not always carry thread_id (depends on LangGraph
invocation path). ThreadDataMiddleware already falls back to
get_config().configurable.thread_id — apply the same pattern so
UploadsMiddleware can resolve the uploads directory and attach outlines
in all invocation paths.
* style: apply ruff format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
|
||
|
|
46d0c329c1
|
fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id (#1814)
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id runtime.context does not always carry thread_id depending on the LangGraph invocation path. When absent, uploads_dir resolved to None and the entire outline/historical-files attachment was silently skipped. Apply the same fallback pattern already used by ThreadDataMiddleware: try get_config().configurable.thread_id, with a RuntimeError guard for test environments where get_config() is called outside a runnable context. Discovered via live integration testing (curl against local LangGraph). Unit tests inject uploads_dir directly and would not catch this. * style: apply ruff format to uploads_middleware.py |
||
|
|
a2aba23962
|
fix: replace the offline link in the lead_agent prompt (#1800) |