Commit Graph

197 Commits

Author SHA1 Message Date
d 🔹
e8572b9d0c
fix(jina): log transient failures at WARNING without traceback (#2484) (#2485)
The exception handler in JinaClient.crawl used logger.exception, which
emits an ERROR-level record with the full httpx/httpcore/anyio traceback
for every transient network failure (timeout, connection refused). Other
search/crawl providers in the project log the same class of recoverable
failures as a single line. One offline/slow-network session could produce
dozens of multi-frame ERROR stack traces, drowning out real problems.

Switch to logger.warning with a concise message that includes the
exception type and its str, matching the style used elsewhere for
recoverable transient failures (aio_sandbox, ddg, etc.). The exception
type now also surfaces into the returned "Error: ..." string so callers
retain diagnostic signal.

Adds a regression test that asserts the log record is WARNING, carries
no exc_info, and includes the exception class name.

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-24 16:00:14 +08:00
Xinmin Zeng
30d619de08
feat(subagents): support per-subagent skill loading and custom subagent types (#2253)
* feat(subagents): support per-subagent skill loading and custom subagent types (#2230)

Add per-subagent skill configuration and custom subagent type registration,
aligned with Codex's role-based config layering and per-session skill injection.

Backend:
- SubagentConfig gains `skills` field (None=all, []=none, list=whitelist)
- New CustomSubagentConfig for user-defined subagent types in config.yaml
- SubagentsAppConfig gains `custom_agents` section and `get_skills_for()`
- Registry resolves custom agents with three-layer config precedence
- SubagentExecutor loads skills per-session as conversation items (Codex pattern)
- task_tool no longer appends skills to system_prompt
- Lead agent system prompt dynamically lists all registered subagent types
- setup_agent tool accepts optional skills parameter
- Gateway agents API transparently passes skills in CRUD operations

Frontend:
- Agent/CreateAgentRequest/UpdateAgentRequest types include skills field
- Agent card displays skills as badges alongside tool_groups

Config:
- config.example.yaml documents custom_agents and per-agent skills override

Tests:
- 40 new tests covering all skill config, custom agents, and registry logic
- Existing tests updated for new get_skills_prompt_section signature

Closes #2230

* fix: address review feedback on skills PR

- Remove stale get_skills_prompt_section monkeypatches from test_task_tool_core_logic.py
  (task_tool no longer imports this function after skill injection moved to executor)
- Add key prefixes (tg:/sk:) to agent-card badges to prevent React key collisions
  between tool_groups and skills

* fix(ci): resolve lint and test failures

- Format agent-card.tsx with prettier (lint-frontend)
- Remove stale "Skills Appendix" system_prompt assertion — skills are now
  loaded per-session by SubagentExecutor, not appended to system_prompt

* fix(ci): sort imports in test_subagent_skills_config.py (ruff I001)

* fix(ci): use nullish coalescing in agent-card badge condition (eslint)

* fix: address review feedback on skills PR

- Use model_fields_set in AgentUpdateRequest to distinguish "field omitted"
  from "explicitly set to null" — fixes skills=None ambiguity where None
  means "inherit all" but was treated as "don't change"
- Move lazy import of get_subagent_config outside loop in
  _build_available_subagents_description to avoid repeated import overhead

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-23 23:59:47 +08:00
Shawn Jasper
5ba1dacf25
fix: rename present_file to present_files in docs and prompts (#2393)
The tool is registered as `present_files` (plural) in present_file_tool.py,
but four references in documentation and prompt strings incorrectly used the
singular form `present_file`. This could cause confusion and potentially
lead to incorrect tool invocations.

Changed files:
- backend/docs/GUARDRAILS.md
- backend/docs/ARCHITECTURE.md
- backend/packages/harness/deerflow/agents/lead_agent/prompt.py (2 occurrences)
2026-04-21 16:10:14 +08:00
Ansel
6dce26a52e
fix: resolve tool duplication and skill parser YAML inconsistencies (#1803) (#2107)
* Refactor tests for SKILL.md parser

Updated tests for SKILL.md parser to handle quoted names and descriptions correctly. Added new tests for parsing plain and single-quoted names, and ensured multi-line descriptions are processed properly.

* Implement tool name validation and deduplication

Add tool name mismatch warning and deduplication logic

* Refactor skill file parsing and error handling

* Add tests for tool name deduplication

Added tests for tool name deduplication in get_available_tools(). Ensured that duplicates are not returned, the first occurrence is kept, and warnings are logged for skipped duplicates.

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Update minimal config to include tools list

* Update test for nonexistent skill file

Ensure the test for nonexistent files checks for None.

* Refactor tool loading and add skill management support

Refactor tool loading logic to include skill management tools based on configuration and clean up comments.

* Enhance code comments for tool loading logic

Added comments to clarify the purpose of various code sections related to tool loading and configuration.

* Fix assertion for duplicate tool name warning

* Fix indentation issues in tools.py

* Fix the lint error of test_tool_deduplication

* Fix the lint error of tools.py

* Fix the lint error

* Fix the lint error

* make format

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-20 20:25:03 +08:00
imhaoran
fc94e90f6c
fix(setup-agent): prevent data loss when setup fails on existing agen… (#2254)
* fix(setup-agent): prevent data loss when setup fails on existing agent directory

Record whether the agent directory pre-existed before mkdir, and only
run shutil.rmtree cleanup when the directory was newly created during
this call. Previously, any failure would delete the entire directory
including pre-existing SOUL.md and config.yaml.

* fix: address PR review — init variables before try, remove unused result

* style: fix ruff I001 import block formatting in test file

* style: add missing blank lines between top-level definitions in test file
2026-04-20 20:17:30 +08:00
Admire
c99865f53d
fix(token-usage): enable stream usage for openai-compatible models (#2217)
* fix(token-usage): enable stream usage for openai-compatible models

* fix(token-usage): narrow stream_usage default to ChatOpenAI
2026-04-19 22:42:55 +08:00
Xun
a62ca5dd47
fix: Catch httpx.ReadError in the error handling (#2309)
* fix: Catch httpx.ReadError in the error handling

* fix
2026-04-19 22:30:22 +08:00
Nan Gao
f514e35a36
fix(backend): make clarification messages idempotent (#2350) (#2351) 2026-04-19 22:00:58 +08:00
Hinotobi
80e210f5bb
[security] fix(uploads): require explicit opt-in for host-side document conversion (#2332)
* fix: disable host-side upload conversion by default

* fix: address PR review comments on upload conversion gate
2026-04-18 22:47:42 +08:00
Shawn Jasper
55474011c9
fix(subagent): inherit parent agent's tool_groups in task_tool (#2305)
* fix(subagent): inherit parent agent's tool_groups in task_tool

When a custom agent defines tool_groups (e.g. [file:read, file:write, bash]),
the restriction is correctly applied to the lead agent. However, when the lead
agent delegates work to a subagent via the task tool, get_available_tools() is
called without the groups parameter, causing the subagent to receive ALL tools
(including web_search, web_fetch, image_search, etc.) regardless of the parent
agent's configuration.

This fix propagates tool_groups through run metadata so that task_tool passes
the same group filter when building the subagent's tool set.

Changes:
- agent.py: include tool_groups in run metadata
- task_tool.py: read tool_groups from metadata and pass to get_available_tools()

* fix: initialize metadata before conditional block and update tests for tool_groups propagation

- Initialize metadata = {} before the 'if runtime is not None' block to
  avoid Ruff F821 (possibly-undefined variable) and simplify the
  parent_tool_groups expression.
- Update existing test assertion to expect groups=None in
  get_available_tools call signature.
- Add 3 new test cases:
  - test_task_tool_propagates_tool_groups_to_subagent
  - test_task_tool_no_tool_groups_passes_none
  - test_task_tool_runtime_none_passes_groups_none
2026-04-18 22:17:37 +08:00
imhaoran
24fe5fbd8c
fix(mcp): prevent RuntimeError from escaping except block in get_cach… (#2252)
* fix(mcp): prevent RuntimeError from escaping except block in get_cached_mcp_tools

When `asyncio.get_event_loop()` raises RuntimeError and the fallback
`asyncio.run()` also fails, the exception escapes unhandled because
Python does not route exceptions raised inside an `except` block to
sibling `except` clauses. Wrap the fallback call in its own try/except
so failures are logged and the function returns [] as intended.

* fix: use logger.exception to preserve stack traces on MCP init failure
2026-04-18 21:07:30 +08:00
Shawn Jasper
ca1b7d5f48
fix(sandbox): add missing path masking in ls_tool output (#2317)
ls_tool was the only file-system tool that did not call
mask_local_paths_in_output() before returning its result, causing host
absolute paths (e.g. /Users/.../backend/.deer-flow/knowledge-base/...)
to leak to the LLM instead of the expected virtual paths
(/mnt/knowledge-base/...).

This patch:
- Adds the mask_local_paths_in_output() call to ls_tool, consistent
  with bash_tool, glob_tool and grep_tool.
- Initialises thread_data = None before the is_local_sandbox branch
  (same pattern as glob_tool) so the variable is always in scope.
- Adds three new tests covering user-data path masking, skills path
  masking and the empty-directory edge case.
2026-04-18 08:46:59 +08:00
DanielWalnut
898f4e8ac2
fix: Memory update system has cache corruption, data loss, and thread-safety bugs (#2251)
* fix(memory): cache corruption, thread-safety, and caller mutation bugs

Bug 1 (updater.py): deep-copy current_memory before passing to
_apply_updates() so a subsequent save() failure cannot leave a
partially-mutated object in the storage cache.

Bug 3 (storage.py): add _cache_lock (threading.Lock) to
FileMemoryStorage and acquire it around every read/write of
_memory_cache, fixing concurrent-access races between the background
timer thread and HTTP reload calls.

Bug 4 (storage.py): replace in-place mutation
  memory_data["lastUpdated"] = ...
with a shallow copy
  memory_data = {**memory_data, "lastUpdated": ...}
so save() no longer silently modifies the caller's dict.

Regression tests added for all three bugs in test_memory_storage.py
and test_memory_updater.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: format test_memory_updater.py with ruff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: remove stale bug-number labels from code comments and docstrings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 12:00:31 +08:00
d 🔹
a664d2f5c4
fix(checkpointer): create parent directory before opening SQLite in sync provider (#2272)
* fix(checkpointer): create parent directory before opening SQLite in sync provider

The sync checkpointer factory (_sync_checkpointer_cm) opens a SQLite
connection without first ensuring the parent directory exists.  The async
provider and both store providers already call ensure_sqlite_parent_dir(),
but this call was missing from the sync path.

When the deer-flow harness package is used from an external virtualenv
(where the .deer-flow directory is not pre-created), the missing parent
directory causes:

    sqlite3.OperationalError: unable to open database file

Add the missing ensure_sqlite_parent_dir() call in the sync SQLite
branch, consistent with the async provider, and add a regression test.

Closes #2259

* style: fix ruff format + add call-order assertion for ensure_parent_dir

- Fix formatting in test_checkpointer.py (ruff format)
- Add test_sqlite_ensure_parent_dir_before_connect to verify
  ensure_sqlite_parent_dir is called before from_conn_string
  (addresses Copilot review suggestion)

---------

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
2026-04-16 09:06:38 +08:00
YuJitang
105db00987
feat: show token usage per assistant response (#2270)
* feat: show token usage per assistant response

* fix: align client models response with token usage

* fix: address token usage review feedback

* docs: clarify token usage config example

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-16 08:56:49 +08:00
Hinotobi
2176b2bbfc
fix: validate bootstrap agent names before filesystem writes (#2274)
* fix: validate bootstrap agent names before filesystem writes

* fix: tighten bootstrap agent-name validation
2026-04-16 08:36:42 +08:00
DanielWalnut
8760937439
fix(memory): use asyncio.to_thread for blocking file I/O in aupdate_memory (#2220)
* fix(memory): use asyncio.to_thread for blocking file I/O in aupdate_memory

`_finalize_update` performs synchronous blocking operations (os.mkdir,
file open/write/rename/stat) that were called directly from the async
`aupdate_memory` method, causing `BlockingError` from blockbuster when
running under an ASGI server. Wrap the call with `asyncio.to_thread` to
offload all blocking I/O to a thread pool.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(memory): use unique temp filename to prevent concurrent write collision

`file_path.with_suffix(".tmp")` produces a fixed path — concurrent saves
for the same agent (now possible after wrapping _finalize_update in
asyncio.to_thread) would clobber the same temp file. Use a UUID-suffixed
temp file so each write is isolated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(memory): also offload _prepare_update_prompt to thread pool

FileMemoryStorage.load() inside _prepare_update_prompt performs
synchronous stat() and file read, blocking the event loop just like
_finalize_update did. Wrap _prepare_update_prompt in asyncio.to_thread
for the same reason.

The async path now has no blocking file I/O on the event loop:
  to_thread(_prepare_update_prompt) → await model.ainvoke() → to_thread(_finalize_update)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 16:41:54 +08:00
DanielWalnut
4ba3167f48
feat: flush memory before summarization (#2176)
* feat: flush memory before summarization

* fix: keep agent-scoped memory on summarization flush

* fix: harden summarization hook plumbing

* fix: address summarization review feedback

* style: format memory middleware
2026-04-14 15:01:06 +08:00
Octopus
e4f896e90d
fix(todo-middleware): prevent premature agent exit with incomplete todos (#2135)
* fix(todo-middleware): prevent premature agent exit with incomplete todos

When plan mode is active (is_plan_mode=True), the agent occasionally
exits the loop and outputs a final response while todo items are still
incomplete. This happens because the routing edge only checks for
tool_calls, not todo completion state.

Fixes #2112

Add an after_model override to TodoMiddleware with
@hook_config(can_jump_to=["model"]). When the model produces a
response with no tool calls but there are still incomplete todos, the
middleware injects a todo_completion_reminder HumanMessage and returns
jump_to=model to force another model turn. A cap of 2 reminders
prevents infinite loops when the agent cannot make further progress.

Also adds _completion_reminder_count() helper and 14 new unit tests
covering all edge cases of the new after_model / aafter_model logic.

* Remove unnecessary blank line in test file

* Fix runtime argument annotation in before_model

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-14 11:11:26 +08:00
luo jiyin
07fc25d285
feat: switch memory updater to async LLM calls (#2138)
* docs: mark memory updater async migration as completed

- Update TODO.md to mark the replacement of sync model.invoke()
  with async model.ainvoke() in title_middleware and memory updater
  as completed using [x] format

Addresses #2131

* feat: switch memory updater to async LLM calls

- Add async aupdate_memory() method using await model.ainvoke()
- Convert sync update_memory() to use async wrapper
- Add _run_async_update_sync() for nested loop context handling
- Maintain backward compatibility with existing sync API
- Add ThreadPoolExecutor for async execution from sync contexts

Addresses #2131

* test: add tests for async memory updater

- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns

Addresses #2131

* fix: apply ruff formatting to memory updater

- Format multi-line expressions to single line
- Ensure code style consistency with project standards
- Fix lint issues caught by GitHub Actions

* test: add comprehensive tests for async memory updater

- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns
- Ensure backward compatibility with sync API

* fix: satisfy ruff formatting in memory updater test

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-14 11:10:42 +08:00
Nan Gao
55bc09ac33
fix(backend): fix uploads for mounted sandbox providers (#2199)
* fix uploads for mounted sandbox providers

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-14 10:44:31 +08:00
Octopus
c91785dd68
fix(title): strip <think> tags from title model responses and assistant context (#1927)
* fix(title): strip <think> tags from title model responses and assistant context

Reasoning models (e.g. minimax M2.7, DeepSeek-R1) emit <think>...</think>
blocks before their actual output. When such a model is used as the title
model (or as the main agent), the raw thinking content leaked into the thread
title stored in state, so the chat list showed the internal monologue instead
of a meaningful title.

Fixes #1884

- Add `_strip_think_tags()` helper using a regex to remove all <think>...</think> blocks
- Apply it in `_parse_title()` so the title model response is always clean
- Apply it to the assistant message in `_build_title_prompt()` so thinking
  content from the first AI turn is not fed back to the title model
- Add four new unit tests covering: stripping in parse, think-only response,
  assistant prompt stripping, and end-to-end async flow with think tags

* Fix the lint error

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-14 09:51:39 +08:00
Hinotobi
a7e7c6d667
fix: disable custom-agent management API by default (#2161)
* fix: disable custom-agent management API by default

* style: format agents API hardening files

* fix: address review feedback for agents API hardening

* fix: add missing disabled API coverage
2026-04-14 00:03:38 +08:00
Nan Gao
f4c17c66ce
fix(middleware): fix present_files thread id fallback (#2181)
* fix present files thread id fallback

* fix: resolve present_files thread id from runtime config
2026-04-13 22:59:13 +08:00
lesliewangwyc-dev
1df389b9d0
fix: wrap blocking readability call with asyncio.to_thread in web_fetch (#2157)
* fix: wrap blocking readability call with asyncio.to_thread in web_fetch

The readability extractor internally spawns a Node.js subprocess via
readabilipy, which blocks the async event loop and causes a
BlockingError when web_fetch is invoked inside LangGraph's async
runtime.

Wrap the synchronous extract_article call with asyncio.to_thread to
offload it to a thread pool, unblocking the event loop.

Note: community/infoquest/tools.py has the same latent issue and
should be addressed in a follow-up PR.

Closes #2152

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: verify web_fetch offloads extraction via asyncio.to_thread

Add a regression test that monkeypatches asyncio.to_thread to confirm
readability extraction is offloaded to a worker thread, preventing
future refactors from reintroducing the blocking call.

Addresses Copilot review feedback on #2157.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-13 21:15:24 +08:00
5db71cb68c
fix(middleware): repair dangling tool-call history after loop interru… (#2035)
* fix(middleware): repair dangling tool-call history after loop interruption (#2029)

* docs(backend): fix middleware chain ordering

---------

Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
2026-04-12 19:11:22 +08:00
Jin
4d4ddb3d3f
feat(llm): introduce lightweight circuit breaker to prevent rate-limit bans and resource exhaustion (#2095) 2026-04-12 17:48:40 +08:00
Javen Fang
ac04f2704f
feat(subagents): allow model override per subagent in config.yaml (#2064)
* feat(subagents): allow model override per subagent in config.yaml

Wire the existing SubagentConfig.model field to config.yaml so users
can assign different models to different subagent types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(subagents): cover model override in SubagentsAppConfig + registry

Addresses review feedback on #2064:

- registry.py: update stale inline comment — the block now applies
  timeout, max_turns AND model overrides, not just timeout.
- test_subagent_timeout_config.py: add coverage for model override
  resolution across SubagentOverrideConfig, SubagentsAppConfig
  (get_model_for + load), and registry.get_subagent_config:
  - per-agent model override is applied to registry-returned config
  - omitted `model` keeps the builtin value
  - explicit `model: null` in config.yaml is equivalent to omission
  - model override on one agent does not affect other agents
  - model override preserves all other fields (name, description,
    timeout_seconds, max_turns)
  - model override does not mutate BUILTIN_SUBAGENTS

Copilot's suggestion (3) "setting model to 'inherit' forces inheritance"
is skipped intentionally: there is no 'inherit' sentinel in the current
implementation — model is `str | None`, and None already means
"inherit from parent". Adding a sentinel would be a new feature, not
test coverage for this PR.

Tests run locally: 51 passed (37 existing + 14 new / expanded).

* test(subagents): reject empty-string model at config load time

Addresses WillemJiang's review comment on #2064 (empty-string edge case):

- subagents_config.py: add `min_length=1` to the `model` field on
  SubagentOverrideConfig. `model: ""` in config.yaml would otherwise
  bypass the `is not None` check and reach create_chat_model(name="")
  as a confusing runtime error. This is symmetric with the existing
  `ge=1` guards on timeout_seconds / max_turns, so the validation style
  stays consistent across all three override fields.
- test_subagent_timeout_config.py: add test_rejects_empty_model
  mirroring the existing test_rejects_zero / test_rejects_negative
  cases; update the docstring on test_model_accepts_any_string (now
  test_model_accepts_any_non_empty_string) to reflect the new guard.

Not addressing the first comment (validating `model` against the
`models:` section at load time) in this PR. `SubagentsAppConfig` is
scoped to the `subagents:` block and cannot see the sibling `models:`
section, so proper cross-section validation needs a second pass or a
structural change that is out of scope here — and the current behavior
is consistent with how timeout_seconds / max_turns work today. Happy to
track this as a follow-up issue covering cross-section validation
uniformly for all three fields.

Tests run locally: 52 passed in this file; 1847 passed, 18 skipped
across the full backend suite. Ruff check + format clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:40:21 +08:00
Jason
dc50a7fdfb
fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox (#1935)
* fix(sandbox): resolve paths in read_file/write_file content for LocalSandbox

In LocalSandbox mode, read_file and write_file now transform
container paths in file content, matching the path handling
behavior of bash tool.

- write_file: resolves virtual paths in content to system paths
  before writing, so scripts with /mnt/user-data paths work
  when executed
- read_file: reverse-resolves system paths back to virtual
  paths in returned content for consistency

This fixes scenarios where agents write Python scripts with
virtual paths, then execute them via bash tool expecting the
paths to work.

Fixes #1778

* fix(sandbox): address Copilot review — dedicated content resolver + forward-slash safety + tests

- Extract _resolve_paths_in_content() separate from _resolve_paths_in_command()
  to decouple file-content path resolution from shell-command parsing
- Normalize resolved paths to forward slashes to avoid Windows backslash
  escape issues in source files (e.g. \U in Python string literals)
- Add 4 focused tests: write resolves content, forward-slash guarantee,
  read reverse-resolves content, and write→read roundtrip

* style: fix ruff lint — remove extraneous f-string prefix

* fix(sandbox): only reverse-resolve paths in agent-written files

read_file previously applied _reverse_resolve_paths_in_output to ALL
file content, which could silently rewrite paths in user uploads and
external tool output (Willem Jiang review on #1935).

Now tracks files written through write_file in _agent_written_paths.
Only those files get reverse-resolved on read. Non-agent files are
returned as-is.

---------

Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
2026-04-11 17:41:36 +08:00
ZHANG Ning
5b633449f8
fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware (#1988)
* fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware

The existing hash-based loop detection only catches identical tool call
sets. When the agent calls the same tool type (e.g. read_file) on many
different files, each call produces a unique hash and bypasses detection.
This causes the agent to exhaust recursion_limit, consuming 150K-225K
tokens per failed run.

Add a second detection layer that tracks cumulative call counts per tool
type per thread. Warns at 30 calls (configurable) and forces stop at 50.
The hard stop message now uses the actual returned message instead of a
hardcoded constant, so both hash-based and frequency-based stops produce
accurate diagnostics.

Also fix _apply() to use the warning message returned by
_track_and_check() for hard stops, instead of always using _HARD_STOP_MSG.

Closes #1987

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(lint): remove unused imports and fix line length

- Remove unused _TOOL_FREQ_HARD_STOP_MSG and _TOOL_FREQ_WARNING_MSG
  imports from test file (F401)
- Break long _TOOL_FREQ_WARNING_MSG string to fit within 240 char limit (E501)

* style: apply ruff format

* test: add LRU eviction and per-thread reset coverage for frequency state

Address review feedback from @WillemJiang:
- Verify _tool_freq and _tool_freq_warned are cleaned on LRU eviction
- Add test for reset(thread_id=...) clearing only the target thread's
  frequency state while leaving others intact

* fix(makefile): route Windows shell-script targets through Git Bash (#2060)

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Asish Kumar <87874775+officialasishkumar@users.noreply.github.com>
2026-04-11 17:33:27 +08:00
yorick
02569136df
fix(sandbox): improve sandbox security and preserve multimodal content (#2114)
* fix: improve sandbox security and preserve multimodal content

* Add unit test modifications for test_injects_uploaded_files_tag_into_list_content

* format updated_content

* Add regression tests for multimodal upload content and host bash default safety
2026-04-11 16:52:10 +08:00
Jin
718dddde75
fix(sandbox): prevent memory leak in file operation locks using WeakValueDictionary (#2096)
* fix(sandbox): prevent memory leak in file operation locks using WeakValueDictionary

* lint: fix lint issue in sandbox tools security
2026-04-10 22:55:53 +08:00
greatmengqi
b1aabe88b8
fix(backend): stream DeerFlowClient AI text as token deltas (#1969) (#1974)
* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)

DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values",
"custom"] which only delivers full-state snapshots at graph-node
boundaries, so AI replies were dumped as a single messages-tuple event
per node instead of streaming token-by-token. `client.stream("hello")`
looked identical to `client.chat("hello")` — the bug reported in #1969.

Subscribe to "messages" mode as well, forward AIMessageChunk deltas as
messages-tuple events with delta semantics (consumers accumulate by id),
and dedup the values-snapshot path so it does not re-synthesize AI
text that was already streamed. Introduce a per-id usage_metadata
counter so the final AIMessage in the values snapshot and the final
"messages" chunk — which carry the same cumulative usage — are not
double-counted.

chat() now accumulates per-id deltas and returns the last message's
full accumulated text. Non-streaming mock sources (single event per id)
are a degenerate case of the same logic, keeping existing callers and
tests backward compatible.

Verified end-to-end against a real LLM: a 15-number count emits 35
messages-tuple events with BPE subword boundaries clearly visible
("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across
the window, end-event usage matches the values-snapshot usage exactly
(not doubled). tests/test_client_live.py::TestLiveStreaming passes.

New unit tests:
- test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3
  delta events with correct content/id/usage, values-snapshot does not
  duplicate, usage counted once.
- test_chat_accumulates_streamed_deltas: chat() rebuilds full text
  from deltas.
- test_messages_mode_tool_message: ToolMessage delivered via messages
  mode is not duplicated by the values-snapshot synthesis path.

The stream() docstring now documents why this client does not reuse
Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw
LangChain objects vs serialized dicts, single caller vs HTTP fan-out).

Fixes #1969

* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)

Post-review cleanup for the token-level streaming fix. No behavior
change for correct inputs; one efficiency regression fixed.

Fix: chat() O(n²) accumulator
-----------------------------
`chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`,
which is O(n) per concat → O(n²) total over a streamed response. At
~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks
it costs roughly 100-300 ms of pure copying. Switched to
`dict[str, list[str]]` + `"".join()` once at return.

Cleanup
-------
- Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`,
  and `_tool_message_event` static helpers. The messages-mode and
  values-mode branches previously repeated four inline dict literals each;
  they now call the same builders.
- `StreamEvent.type` is now typed as `Literal["values", "messages-tuple",
  "custom", "end"]` via a `StreamEventType` alias. Makes the closed set
  explicit and catches typos at type-check time.
- Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`,
  `.tool_calls`, `.id` all have default values on the base class, so the
  `getattr(..., None)` fallbacks were dead code. Removed from the hot
  path.
- `_account_usage` parameter type loosened to `Any` so that LangChain's
  `UsageMetadata` TypedDict is accepted under strict type checking.
- Trimmed narrating comments on `seen_ids` / `streamed_ids` / the
  values-synthesis skip block; kept the non-obvious ones that document
  the cross-mode dedup invariant.

Net diff: -15 lines. All 132 unit tests + harness boundary test still
pass; ruff check and ruff format pass.

* docs(backend): add STREAMING.md design note (#1969)

Dedicated design document for the token-level streaming architecture,
prompted by the bug investigation in #1969.

Contents:
- Why two parallel streaming paths exist (Gateway HTTP/async vs
  DeerFlowClient sync/in-process) and why they cannot be merged.
- LangGraph's three-layer mode naming (Graph "messages" vs Platform
  SDK "messages-tuple" vs HTTP SSE) and why a shared string constant
  would be harmful.
- Gateway path: run_agent + StreamBridge + sse_consumer with a
  sequence diagram.
- DeerFlowClient path: sync generator + direct yield, delta semantics,
  chat() accumulator.
- Why the three id sets (seen_ids / streamed_ids / counted_usage_ids)
  each carry an independent invariant and cannot be collapsed.
- End-to-end sequence for a real conversation turn.
- Lessons from #1969: why mock-based tests missed the bug, why
  BPE subword boundaries in live output are the strongest
  correctness signal, and the regression test that locks it in.
- Source code location index.

Also:
- Link from backend/CLAUDE.md Embedded Client section.
- Link from backend/docs/README.md under Feature Documentation.

* test(backend): add refactor regression guards for stream() (#1969)

Three new tests in TestStream that lock the contract introduced by
PR #1974 so any future refactor (sync->async migration, sharing a
core with Gateway's run_agent, dedup strategy change) cannot
silently change behavior.

- test_dedup_requires_messages_before_values_invariant: canary that
  documents the order-dependence of cross-mode dedup. streamed_ids
  is populated only by the messages branch, so values-before-messages
  for the same id produces duplicate AI text events. Real LangGraph
  never inverts this order, but a refactor that does (or that makes
  dedup idempotent) must update this test deliberately.

- test_messages_mode_golden_event_sequence: locks the *exact* event
  sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1
  end) for a canonical streaming turn. List equality gives a clear
  diff on any drift in order, type, or payload shape.

- test_chat_accumulates_in_linear_time: perf canary for the O(n^2)
  fix in commit 1f11ba10. 10,000 single-char chunks must accumulate
  in under 1s; the threshold is wide enough to pass on slow CI but
  tight enough to fail if buffer = buffer + delta is restored.

All three tests pass alongside the existing 12 TestStream tests
(15/15). ruff check + ruff format clean.

* docs(backend): clarify stream() docstring on JSON serialization (#1969)

Replace the misleading "raw LangChain objects (AIMessage,
usage_metadata as dataclasses), not dicts" claim in the
"Why not reuse Gateway's run_agent?" section. The implementation
already yields plain Python dicts (StreamEvent.data is dict, and
usage_metadata is a TypedDict), so the original wording suggested
a richer return type than the API actually delivers.

The corrected wording focuses on what is actually true and
relevant: this client skips the JSON/SSE serialization layer that
Gateway adds for HTTP wire transmission, and yields stream event
payloads directly as Python data structures.

Addresses Copilot review feedback on PR #1974.

* test(backend): document none-id messages dedup limitation (#1969)

Add test_none_id_chunks_produce_duplicates_known_limitation to
TestStream that explicitly documents and asserts the current
behavior when an LLM provider emits AIMessageChunk with id=None
(vLLM, certain custom backends).

The cross-mode dedup machinery cannot record a None id in
streamed_ids (guarded by ``if msg_id:``), so the values snapshot's
reassembled AIMessage with a real id falls through and synthesizes
a duplicate AI text event. The test asserts len == 2 and locks
this as a known limitation rather than silently letting future
contributors hit it without context.

Why this is documented rather than fixed:
* Falling back to ``metadata.get("id")`` does not help — LangGraph's
  messages-mode metadata never carries the message id.
* Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the
  values snapshot uses the same fallback, which it does not.
* A real fix requires provider cooperation (always emit chunk ids)
  or content-based dedup (false-positive risk), neither of which
  belongs in this PR.

If a real fix lands, replace this test with a positive assertion
that dedup works for None-id chunks.

Addresses Copilot review feedback on PR #1974 (client.py:515).

* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)

- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)

Closes #1940

* docs: clarify deployment sizing guidance (#1963)

* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)

After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).

This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.

* fix(frontend): avoid using route new as thread id (#1967)

Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>

* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)

* Fix event loop conflict in SubagentExecutor.execute()

When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.

This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.

Fixes: sub-task cards not appearing in Ultra mode when using async parent agents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(subagent): harden isolated event loop execution

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(backend): remove dead getattr in _tool_message_event

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-10 18:16:38 +08:00
DanielWalnut
eef0a6e2da
feat(dx): Setup Wizard + doctor command — closes #2030 (#2034) 2026-04-10 17:43:39 +08:00
shivam johri
194bab4691
feat(config): add when_thinking_disabled support for model configs (#1970)
* feat(config): add when_thinking_disabled support for model configs

Allow users to explicitly configure what parameters are sent to the
model when thinking is disabled, via a new `when_thinking_disabled`
field in model config. This mirrors the existing `when_thinking_enabled`
pattern and takes full precedence over the hardcoded disable behavior
when set. Backwards compatible — existing configs work unchanged.

Closes #1675

* fix(config): address copilot review — gate when_thinking_disabled independently

- Switch truthiness check to `is not None` so empty dict overrides work
- Restructure disable path so when_thinking_disabled is gated independently
  of has_thinking_settings, allowing it to work without when_thinking_enabled
- Update test to reflect new behavior
2026-04-09 18:49:00 +08:00
luo jiyin
35f141fc48
feat: implement full checkpoint rollback on user cancellation (#1867)
* feat: implement full checkpoint rollback on user cancellation

- Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes
- Add _rollback_to_pre_run_checkpoint() function to restore thread state
- Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods
- Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes
- Remove obsolete TODO comment (Phase 2) as rollback is now complete

This resolves the TODO(Phase 2) comment and enables full thread state
restoration when a run is cancelled by the user.

* fix: address rollback review feedback

* fix: strengthen checkpoint rollback validation and error handling

- Validate restored_config structure and checkpoint_id before use
- Raise RuntimeError on malformed pending_writes instead of silent skip
- Normalize None checkpoint_ns to empty string instead of "None"
- Move delete_thread to only execute when pre_run_snapshot is None
- Add docstring noting non-atomic rollback as known limitation

This addresses review feedback on PR #1867 regarding data integrity
in the checkpoint rollback implementation.

* test: add comprehensive coverage for checkpoint rollback edge cases

- test_rollback_restores_snapshot_without_deleting_thread
- test_rollback_deletes_thread_when_no_snapshot_exists
- test_rollback_raises_when_restore_config_has_no_checkpoint_id
- test_rollback_normalizes_none_checkpoint_ns_to_root_namespace
- test_rollback_raises_on_malformed_pending_write_not_a_tuple
- test_rollback_raises_on_malformed_pending_write_non_string_channel
- test_rollback_propagates_aput_writes_failure

Covers all scenarios from PR #1867 review feedback.

* test: format rollback worker tests
2026-04-09 17:56:36 +08:00
Xinmin Zeng
0b6fa8b9e1
fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976)
* fix(sandbox): add startup reconciliation to prevent orphaned container leaks

Sandbox containers were never cleaned up when the managing process restarted,
because all lifecycle tracking lived in in-memory dictionaries. This adds
startup reconciliation that enumerates running containers via `docker ps` and
either destroys orphans (age > idle_timeout) or adopts them into the warm pool.

Closes #1972

* fix(sandbox): address Copilot review — adopt-all strategy, improved error handling

- Reconciliation now adopts all containers into warm pool unconditionally,
  letting the idle checker decide cleanup. Avoids destroying containers
  that another concurrent process may still be using.
- list_running() logs stderr on docker ps failure and catches
  FileNotFoundError/OSError.
- Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP.
- E2E test docstring corrected to match actual coverage scope.

* fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene

- _reconcile_orphans(): merge check-and-insert into a single lock acquisition
  per container to eliminate the TOCTOU window.
- list_running(): batch the per-container docker inspect into a single call.
  Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect).
  Parse port and created_at from the inspect JSON payload.
- Extract _parse_docker_timestamp() and _extract_host_port() as module-level
  pure helpers and test them directly.
- Move datetime/json imports to module top level.
- _make_provider_for_reconciliation(): document the __new__ bypass and the
  lockstep coupling to AioSandboxProvider.__init__.
- Add assertion that list_running() makes exactly ONE inspect call.
2026-04-09 17:21:23 +08:00
Admire
563383c60f
fix(agent): file-io path guidance in agent prompts (#2019)
* fix(prompt): guide workspace-relative file io

* Clarify bash agent file IO path guidance
2026-04-09 16:12:34 +08:00
Xun
1b74d84590
fix: resolve missing serialized kwargs in PatchedChatDeepSeek (#2025)
* add tests

* fix ci

* fix ci
2026-04-09 16:07:16 +08:00
Octopus
616caa92b1
fix(models): resolve duplicate keyword argument error when reasoning_effort appears in both config and kwargs (#2017)
When a model config includes `reasoning_effort` as an extra YAML field
(ModelConfig uses `extra="allow"`), and the thinking-disabled code path
also injects `reasoning_effort="minimal"` into kwargs, the previous
`model_class(**kwargs, **model_settings_from_config)` call raises:

  TypeError: got multiple values for keyword argument 'reasoning_effort'

Fix by merging the two dicts before instantiation, giving runtime kwargs
precedence over config values: `{**model_settings_from_config, **kwargs}`.

Fixes #1977

Co-authored-by: octo-patch <octo-patch@github.com>
2026-04-09 15:09:39 +08:00
knukn
31a3c9a3de
feat(client): add thread query methods list_threads and get_thread (#1609)
* feat(client): add thread query methods `list_threads` and `get_thread`

Implemented two public API methods in `DeerFlowClient` to query threads using the underlying `checkpointer`.

* Update backend/packages/harness/deerflow/client.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update backend/packages/harness/deerflow/client.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update backend/tests/test_client.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update backend/packages/harness/deerflow/client.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(deerflow): Fix possible KeyError issue when sorting threads

* fix unit test

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-09 15:00:22 +08:00
Xinmin Zeng
ad6d934a5f
fix(middleware): handle string-serialized options in ClarificationMiddleware (#1997)
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995)

Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON
strings instead of native arrays. Add defensive type checking in
_format_clarification_message() to deserialize string options before
iteration, preventing per-character rendering.

* fix(middleware): normalize options after JSON deserialization

Address Copilot review feedback:
- Add post-deserialization normalization so options is always a list
  (handles json.loads returning a scalar string, dict, or None)
- Add test for JSON-encoded scalar string ("development")
- Fix test_json_string_with_mixed_types to use actual mixed types
2026-04-08 21:04:20 +08:00
hung_ng__
5350b2fb24
feat(community): add Exa search as community tool provider (#1357)
* feat(community): add Exa search as community tool provider

Add Exa (exa.ai) as a new community search provider alongside Tavily,
Firecrawl, InfoQuest, and Jina AI. Exa is an AI-native search engine
with neural, keyword, and auto search types.

New files:
- community/exa/tools.py: web_search_tool and web_fetch_tool
- tests/test_exa_tools.py: 10 unit tests with mocked Exa client

Changes:
- pyproject.toml: add exa-py dependency
- config.example.yaml: add commented-out Exa configuration examples

Usage: set `use: deerflow.community.exa.tools:web_search_tool` in
config.yaml and provide EXA_API_KEY.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(community): address PR review comments for Exa tools

- Make _get_exa_client() accept tool_name param so web_fetch reads its own config
- Remove __init__.py to match namespace package pattern of other providers
- Add duplicate tool name warning in config.example.yaml
- Add regression tests for web_fetch config resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update revision in uv.lock to 3

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-08 17:13:39 +08:00
Gao Mingfei
29817c3b34
fix(backend): use timezone-aware UTC in memory modules (fix pytest DeprecationWarnings) (#1992)
* fix(backend): use timezone-aware UTC in memory modules

Replace datetime.utcnow() with datetime.now(timezone.utc) and a shared
utc_now_iso_z() helper so persisted ISO timestamps keep the trailing Z
suffix without triggering Python 3.12+ deprecation warnings.

Made-with: Cursor

* refactor(backend): use removesuffix for utc_now_iso_z suffix

Makes the +00:00 -> Z transform explicit for the trailing offset only
(Copilot review on PR #1992).

Made-with: Cursor

* style(backend): satisfy ruff UP017 with datetime.UTC in memory queue

Made-with: Cursor

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-08 16:28:00 +08:00
Saber
e5b149068c
Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)
* Fix event loop conflict in SubagentExecutor.execute()

When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.

This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.

Fixes: sub-task cards not appearing in Ultra mode when using async parent agents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(subagent): harden isolated event loop execution

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:46:06 +08:00
Async23
0948c7a4e1
fix(provider): preserve streamed Codex output when response.completed.output is empty (#1928)
* fix: preserve streamed Codex output items

* fix: prefer completed Codex output over streamed placeholders
2026-04-07 18:21:22 +08:00
koppx
c3170f22da
fix(backend): make loop detection hash tool calls by stable keys (#1911)
* fix(backend): make loop detection hash tool calls by stable keys

The loop detection middleware previously hashed full tool call arguments,
which made repeated calls look different when only non-essential argument
details changed. In particular, `read_file` calls with nearby line ranges
could bypass repetition detection even when the agent was effectively
reading the same file region again and again.

- Hash tool calls using stable keys instead of the full raw args payload
- Bucket `read_file` line ranges so nearby reads map to the same region key
- Prefer stable identifiers such as `path`, `url`, `query`, or `command`
  before falling back to JSON serialization of args
- Keep hashing order-independent so the same tool call set produces the
  same hash regardless of call order

Fixes #1905

* fix(backend): harden loop detection hash normalization

- Normalize and parse stringified tool args defensively
- Expand stable key derivation to include pattern, glob, and cmd
- Normalize reversed read_file ranges before bucketing

Fixes #1905

* fix(backend): harden loop detection tool format

* exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged.

---------

Co-authored-by: JeffJiang <for-eleven@hotmail.com>
2026-04-07 17:46:33 +08:00
KKK
3b3e8e1b0b
feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881)
* fix(sandbox): strengthen regex coverage in SandboxAuditMiddleware

Expand high-risk patterns from 6 to 13 and medium-risk from 4 to 6,
closing several bypass vectors identified by cross-referencing Claude
Code's BashSecurity validator chain against DeerFlow's threat model.

High-risk additions:
- Generalised pipe-to-sh (replaces narrow curl|sh rule)
- Targeted command substitution ($() / backtick with dangerous executables)
- base64 decode piped to execution
- Overwrite system binaries (/usr/bin/, /bin/, /sbin/)
- Overwrite shell startup files (~/.bashrc, ~/.profile, etc.)
- /proc/*/environ leakage
- LD_PRELOAD / LD_LIBRARY_PATH hijack
- /dev/tcp/ bash built-in networking

Medium-risk additions:
- sudo/su (no-op under Docker root, warn only)
- PATH= modification (long attack chain, warn only)

Design decisions:
- Command substitution uses targeted matching (curl/wget/bash/sh/python/
  ruby/perl/base64) rather than blanket block to avoid false positives
  on safe usage like $(date) or `whoami`.
- Skipped encoding/obfuscation checks (hex, octal, Unicode homoglyphs)
  as ROI is low in Docker sandbox — LLMs don't generate encoded commands
  and container isolation bounds the blast radius.
- Merged pip/pip3 into single pip3? pattern.

* feat(sandbox): compound command splitting and fork bomb detection

Split compound bash commands (&&, ||, ;) into sub-commands and classify
each independently — prevents dangerous commands hidden after safe
prefixes (e.g. "cd /workspace && rm -rf /") from bypassing detection.

- Add _split_compound_command() with shlex quote-aware splitting
- Add fork bomb detection patterns (classic and while-loop variants)
- Most severe verdict wins; block short-circuits
- 15 new tests covering compound commands, splitting, and fork bombs

* test(sandbox): add async tests for fork bomb and compound commands

Cover awrap_tool_call path for fork bomb detection (3 variants) and
compound command splitting (block/warn/pass scenarios).

* fix(sandbox): address Copilot review — no-whitespace operators, >>/etc/, whole-command scan

- _split_compound_command: replace shlex-based implementation with a
  character-by-character quote/escape-aware scanner. shlex.split only
  separates '&&' / '||' / ';' when they are surrounded by whitespace,
  so payloads like 'rm -rf /&&echo ok' or 'safe;rm -rf /' bypassed the
  previous splitter and therefore the per-sub-command classifier.
- _HIGH_RISK_PATTERNS: change r'>\s*/etc/' to r'>+\s*/etc/' so append
  redirection ('>>/etc/hosts') is also blocked.
- _classify_command: run a whole-command high-risk scan *before*
  splitting. Structural attacks like 'while true; do bash & done'
  span multiple shell statements — splitting on ';' destroys the
  pattern context, so the raw command must be scanned first.
- tests: add no-whitespace operator cases to TestSplitCompoundCommand
  and test_compound_command_classification to lock in the bypass fix.
2026-04-07 17:15:24 +08:00
lulusiyuyu
f0dd8cb0d2
fix(subagents): add cooperative cancellation for subagent threads (#1873)
* fix(subagents): add cooperative cancellation for subagent threads

Subagent tasks run inside ThreadPoolExecutor threads with their own
event loop (asyncio.run). When a user clicks stop, RunManager cancels
the parent asyncio.Task, but Future.cancel() cannot terminate a running
thread and asyncio.Event does not propagate across event loops. This
causes subagent threads to keep executing (writing files, calling LLMs)
even after the user explicitly stops the run.

Fix: add a threading.Event (cancel_event) to SubagentResult and check
it cooperatively in _aexecute()'s astream iteration loop. On cancel,
request_cancel_background_task() sets the event, and the thread exits
at the next iteration boundary.

Changes:
- executor.py: Add cancel_event field to SubagentResult, check it in
  _aexecute loop, set it on timeout, add request_cancel_background_task
- task_tool.py: Call request_cancel_background_task on CancelledError

* fix(subagents): guard cancel status and add pre-check before astream

- Only overwrite status to FAILED when still RUNNING, preserving
  TIMED_OUT set by the scheduler thread.
- Add cancel_event pre-check before entering the astream loop so
  cancellation is detected immediately when already signalled.

* fix(subagents): guard status updates with lock to prevent race condition

Wrap the check-and-set on result.status in _aexecute with
_background_tasks_lock so the timeout handler in execute_async
cannot interleave between the read and write.

* fix(subagents): add dedicated CANCELLED status for user cancellation

Introduce SubagentStatus.CANCELLED to distinguish user-initiated
cancellation from actual execution failures.  Update _aexecute,
task_tool polling, cleanup terminal-status sets, and test fixtures.

* test(subagents): add cancellation tests and fix timeout regression test

- Add dedicated TestCooperativeCancellation test class with 6 tests:
  - Pre-set cancel_event prevents astream from starting
  - Mid-stream cancel_event returns CANCELLED immediately
  - request_cancel_background_task() sets cancel_event correctly
  - request_cancel on nonexistent task is a no-op
  - Real execute_async timeout does not overwrite CANCELLED (deterministic
    threading.Event sync, no wall-clock sleeps)
  - cleanup_background_task removes CANCELLED tasks

- Add task_tool cancellation coverage:
  - test_cancellation_calls_request_cancel: assert CancelledError path
    calls request_cancel_background_task(task_id)
  - test_task_tool_returns_cancelled_message: assert CANCELLED polling
    branch emits task_cancelled event and returns expected message

- Fix pre-existing test infrastructure issue: add deerflow.sandbox.security
  to _MOCKED_MODULE_NAMES (fixes ModuleNotFoundError for all executor tests)

- Add RUNNING guard to timeout handler in executor.py to prevent
  TIMED_OUT from overwriting CANCELLED status

- Add cooperative cancellation granularity comment documenting that
  cancellation is only detected at astream iteration boundaries

---------

Co-authored-by: lulusiyuyu <lulusiyuyu@users.noreply.github.com>
2026-04-07 11:12:25 +08:00
DanielWalnut
7643a46fca
fix(skill): make skill prompt cache refresh nonblocking (#1924)
* fix: make skill prompt cache refresh nonblocking

* fix: harden skills prompt cache refresh

* chore: add timeout to skills cache warm-up
2026-04-07 10:50:34 +08:00
Markus Corazzione
c4da0e8ca9
Move async SQLite mkdir off the event loop (#1921)
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
2026-04-07 10:47:20 +08:00
DanielWalnut
888f7bfb9d
Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory

* Add skill_manage self-evolution flow

* Fix CI regressions for skill_manage

* Address PR review feedback for skill evolution

* fix(skill-evolution): preserve history on delete

* fix(skill-evolution): tighten scanner fallbacks

* docs: add skill_manage e2e evidence screenshot

* fix(skill-manage): avoid blocking fs ops in session runtime

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-06 22:07:11 +08:00
KKK
055e4df049
fix(sandbox): add input sanitisation guard to SandboxAuditMiddleware (#1872)
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware

Add _validate_input() to reject malformed bash commands before regex
classification: empty commands, oversized commands (>10 000 chars), and
null bytes that could cause detection/execution layer inconsistency.

* fix(sandbox): address Copilot review — type guard, log truncation, reject reason

- Coerce None/non-string command to str before validation
- Truncate oversized commands in audit logs to prevent log amplification
- Propagate reject_reason through _pre_process() to block message
- Remove L2 label from comments and test class names

* fix(sandbox): isinstance type guard + async input sanitisation tests

Address review comments:
- Replace str() coercion with isinstance(raw_command, str) guard so
  non-string truthy values (0, [], False) fall back to empty string
  instead of passing validation as "0"/"[]"/"False".
- Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests
  covering empty, null-byte, oversized, and None command via
  awrap_tool_call path.
2026-04-06 17:21:58 +08:00
Zhou
1ced6e977c
fix(backend): preserve viewed image reducer metadata (#1900)
Fix concurrent viewed_images state updates for multi-image input by preserving the reducer metadata in the vision middleware state schema.
2026-04-06 16:47:19 +08:00
NmanQAQ
dd30e609f7
feat(models): add vLLM provider support (#1860)
support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking.

Co-authored-by: NmanQAQ <normangyao@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-06 15:18:34 +08:00
yangzheli
5fd2c581f6
fix: add output truncation to ls_tool to prevent context window overflow (#1896)
ls_tool was the only sandbox tool without output size limits, allowing
multi-MB results from large directories to blow up the model context
window. Add head-truncation (configurable via ls_output_max_chars,
default 20000) consistent with existing bash and read_file truncation.

Closes #1887

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 15:09:57 +08:00
7c68dd4ad4
Fix(#1702): stream resume run (#1858)
* fix: repair stream resume run metadata

# Conflicts:
#	backend/packages/harness/deerflow/runtime/stream_bridge/memory.py
#	frontend/src/core/threads/hooks.ts

* fix(stream): repair resumable replay validation

---------

Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-06 14:51:10 +08:00
suyua9
29575c32f9
fix: expose custom events from DeerFlowClient.stream() (#1827)
* fix: expose custom client stream events

Signed-off-by: suyua9 <1521777066@qq.com>

* fix(client): normalize streamed custom mode values

* test(client): satisfy backend ruff import ordering

---------

Signed-off-by: suyua9 <1521777066@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-06 10:09:39 +08:00
thefoolgy
8049785de6
fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804)
* fix(memory): case-insensitive fact deduplication and positive reinforcement detection

Two fixes to the memory system:

1. _fact_content_key() now lowercases content before comparison, preventing
   semantically duplicate facts like "User prefers Python" and "user prefers
   python" from being stored separately.

2. Adds detect_reinforcement() to MemoryMiddleware (closes #1719), mirroring
   detect_correction(). When users signal approval ("yes exactly", "perfect",
   "完全正确", etc.), the memory updater now receives reinforcement_detected=True
   and injects a hint prompting the LLM to record confirmed preferences and
   behaviors with high confidence.

   Changes across the full signal path:
   - memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement()
   - queue.py: reinforcement_detected field in ConversationContext and add()
   - updater.py: reinforcement_detected param in update_memory() and
     update_memory_from_conversation(); builds reinforcement_hint alongside
     the existing correction_hint

Tests: 11 new tests covering deduplication, hint injection, and signal
detection (Chinese + English patterns, window boundary, conflict with correction).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(memory): address Copilot review comments on reinforcement detection

- Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants
- Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise
- Use casefold() instead of lower() for Unicode-aware fact deduplication
- Add missing test coverage for reinforcement_detected OR merge and forwarding in queue

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:23:00 +08:00
Evan Wu
9ca68ffaaa
fix: preserve virtual path separator style (#1828)
* fix: preserve virtual path separator style

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-05 15:52:22 +08:00
Markus Corazzione
0ffe5a73c1
chroe(config):Increase subagent max-turn limits (#1852) 2026-04-05 15:41:00 +08:00
SHIYAO ZHANG
72d4347adb
fix(sandbox): guard against None runtime.context in sandbox tool helpers (#1853)
sandbox_from_runtime() and ensure_sandbox_initialized() write
sandbox_id into runtime.context after acquiring a sandbox. When
lazy_init=True and no context is supplied to the graph run,
runtime.context is None (the LangGraph default), causing a TypeError
on the assignment.

Add `if runtime.context is not None` guards at all three write sites.
Reads already had equivalent guards (e.g. `runtime.context.get(...) if
runtime.context else None`); this brings writes into line.
2026-04-05 10:58:38 +08:00
DanielWalnut
2a150f5d4a
fix: unblock concurrent threads and workspace hydration (#1839)
* fix: unblock concurrent threads and workspace hydration

* fix: restore async title generation

* fix: address PR review feedback

* style: format lead agent prompt
2026-04-04 21:19:35 +08:00
SHIYAO ZHANG
163121d327
fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline (#1838)
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents

Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.

This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this):  agent guided to use outline + grep + read_file workflow

* feat(uploads): add file-first priority and fallback guidance to uploaded_files context

* fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline

- Add _clean_bold_title() to merge adjacent bold spans (** **) produced
  by pymupdf4llm when bold text crosses span boundaries
- Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>**
  headings common in academic papers; excludes pure-number table headers
  and rows with more than 4 bold blocks
- When outline is empty, read first 5 non-empty lines of the .md as a
  content preview and surface a grep hint in the agent context
- Update _format_file_entry to render the preview + grep hint instead of
  silently omitting the outline section
- Add 3 new extract_outline tests and 2 new middleware tests (65 total)

* fix(uploads): address Copilot review comments on extract_outline regex

- Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII
  titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded
- Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to
  keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input
- Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex)
- Log debug message with exc_info when preview extraction fails
2026-04-04 14:25:08 +08:00
SHIYAO ZHANG
bbd0866374
feat(uploads): guide agent using agentic search for uploaded documents (#1816)
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents

Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.

This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this):  agent guided to use outline + grep + read_file workflow

* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
2026-04-04 11:08:31 +08:00
ppyt
db82b59254
fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware (#1823)
* fix: inject longTermBackground into memory prompt

The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.

The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.

Add the missing field handling and a regression test.

* fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware

LangChain AIMessage.content can be str | list. When using providers that
return structured content blocks (e.g. Anthropic thinking mode, certain
OpenAI-compatible gateways), content is a list of dicts like
[{"type": "text", "text": "..."}].

The hard_limit branch in _apply() concatenated content with a string via
(last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises
TypeError when content is a non-empty list (list + str is invalid).

Add _append_text() static method that:
- Returns the text directly when content is None
- Appends a {"type": "text"} block when content is a list
- Falls back to string concatenation when content is a str

This is consistent with how other modules in the project already handle
list content (client.py._extract_text, memory_middleware, executor.py).

* test(middleware): add unit tests for _append_text and list content hard stop

Add regression tests to verify LoopDetectionMiddleware handles list-type
AIMessage.content correctly during hard stop:

- TestAppendText: unit tests for the new _append_text() static method
  covering None, str, list (including empty list) content types
- TestHardStopWithListContent: integration tests verifying hard stop
  works correctly with list content (Anthropic thinking mode), None
  content, and str content

Requested by reviewer in PR #1823.

* fix(middleware): improve _append_text robustness and test isolation

- Add explicit isinstance(content, str) check with fallback for
  unexpected types (coerce to str) to prevent TypeError on edge cases
- Deep-copy list content in _make_state() test helper to prevent
  shared mutable references across test iterations
- Add test_unexpected_type_coerced_to_str: verify fallback for
  non-str/list/None content types
- Add test_list_content_not_mutated_in_place: verify _append_text
  does not modify the original list

* style: fix ruff format whitespace in test file

---------

Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
2026-04-04 10:38:22 +08:00
SHIYAO ZHANG
ddfc988bef
feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload (#1727)
* feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload

- Introduce pymupdf4llm as an optional PDF converter with better heading
  detection and table preservation than MarkItDown
- Auto mode: prefer pymupdf4llm when installed; fall back to MarkItDown
  when output is suspiciously sparse (image-based / scanned PDFs)
- Sparsity check uses chars-per-page (< 50 chars/page) rather than an
  absolute threshold, correctly handling both short and long documents
- Large files (> 1 MB) are offloaded to asyncio.to_thread() to avoid
  blocking the event loop (related: #1569)
- Add UploadsConfig with pdf_converter field (auto/pymupdf4llm/markitdown)
- Add pymupdf4llm as optional dependency: pip install deerflow-harness[pymupdf]
- Add 14 unit tests covering sparsity heuristic, routing logic, and async path

* fix(uploads): address Copilot review comments on PDF converter

- Fix docstring: MIN_CHARS_PYMUPDF -> _MIN_CHARS_PER_PAGE (typo)
- Fix file handle leak: wrap pymupdf.open in try/finally to ensure doc.close()
- Fix silent fallback gap: _convert_pdf_with_pymupdf4llm now catches all
  conversion exceptions (not just ImportError), so encrypted/corrupt PDFs
  fall back to MarkItDown instead of propagating
- Tighten type: pdf_converter field changed from str to Literal[auto|pymupdf4llm|markitdown]
- Normalize config value: _get_pdf_converter() strips and lowercases the raw
  config string, warns and falls back to 'auto' on unknown values
2026-04-03 21:59:45 +08:00
SHIYAO ZHANG
5ff230eafd
feat(uploads): inject document outline into agent context for converted files (#1738)
* feat(uploads): inject document outline into agent context for converted files

Extract headings from converted .md files and inject them into the
<uploaded_files> context block so the agent can navigate large documents
by line number before reading.

- Add `extract_outline()` to `file_conversion.py`: recognises standard
  Markdown headings (#/##/###) and SEC-style bold structural headings
  (**ITEM N. BUSINESS**, **PART II**); caps at 50 entries; excludes
  cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES)
- Add `_extract_outline_for_file()` helper in `uploads_middleware.py`:
  looks for a sibling `.md` file produced by the conversion pipeline
- Update `UploadsMiddleware._create_files_message()` to render the outline
  under each file entry with `L{line}: {title}` format and a `read_file`
  prompt for range-based reading
- Tests: 10 new tests for `extract_outline()`, 4 new tests for outline
  injection in `UploadsMiddleware`; existing test updated for new `outline`
  field in `uploaded_files` state

Partially addresses #1647 (agent ignores uploaded files).

* fix(uploads): stream outline file reads and strip inline bold from heading titles

- Switch extract_outline() from read_text().splitlines() to open()+line iteration
  so large converted documents are not loaded into memory on every agent turn;
  exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion)
- Strip **...** wrapper from standard Markdown heading titles before appending
  to outline so agent context stays clean (e.g. "## **Overview**" → "Overview")
  (Copilot suggestion)
- Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py
  to satisfy ruff CI lint

* fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES

When extract_outline() hits the cap it now appends a sentinel entry
{"truncated": True} instead of silently dropping the rest of the headings.
UploadsMiddleware reads the sentinel and renders a hint line:

  ... (showing first 50 headings; use `read_file` to explore further)

Without this the agent had no way to know the outline was incomplete and
would treat the first 50 headings as the full document structure.

* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id

runtime.context does not always carry thread_id (depends on LangGraph
invocation path). ThreadDataMiddleware already falls back to
get_config().configurable.thread_id — apply the same pattern so
UploadsMiddleware can resolve the uploads directory and attach outlines
in all invocation paths.

* style: apply ruff format

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-03 20:52:47 +08:00
SHIYAO ZHANG
46d0c329c1
fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id (#1814)
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id

runtime.context does not always carry thread_id depending on the
LangGraph invocation path. When absent, uploads_dir resolved to None
and the entire outline/historical-files attachment was silently skipped.

Apply the same fallback pattern already used by ThreadDataMiddleware:
try get_config().configurable.thread_id, with a RuntimeError guard for
test environments where get_config() is called outside a runnable context.

Discovered via live integration testing (curl against local LangGraph).
Unit tests inject uploads_dir directly and would not catch this.

* style: apply ruff format to uploads_middleware.py
2026-04-03 20:26:21 +08:00
Rain120
a2aba23962
fix: replace the offline link in the lead_agent prompt (#1800) 2026-04-03 20:19:23 +08:00
d 🔹
6dbdd4674f
fix: guarantee END sentinel delivery when stream bridge queue is full (#1695)
When MemoryStreamBridge queue reaches capacity, publish_end() previously
used the same 30s timeout + drop strategy as regular events. If the END
sentinel was dropped, subscribe() would loop forever waiting for it,
causing the SSE connection to hang indefinitely and leaking _queues and
_counters resources for that run_id.

Changes:
- publish_end() now evicts oldest regular events when queue is full to
  guarantee END sentinel delivery — the sentinel is the only signal that
  allows subscribers to terminate
- Added per-run drop counters (_dropped_counts) with dropped_count() and
  dropped_total properties for observability
- cleanup() and close() now clear drop counters
- publish() logs total dropped count per run for easier debugging

Tests:
- test_end_sentinel_delivered_when_queue_full: verifies END arrives even
  with a completely full queue
- test_end_sentinel_evicts_oldest_events: verifies eviction behavior
- test_end_sentinel_no_eviction_when_space_available: no side effects
  when queue has room
- test_concurrent_tasks_end_sentinel: 4 concurrent producer/consumer
  pairs all terminate properly
- test_dropped_count_tracking, test_dropped_total,
  test_cleanup_clears_dropped_counts, test_close_clears_dropped_counts:
  drop counter coverage

Closes #1689

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
2026-04-03 20:12:30 +08:00
finallylly
1694c616ef
feat(sandbox): add read-only support for local sandbox path mappings (#1808) 2026-04-03 19:46:22 +08:00
DanielWalnut
c6cdf200ce
feat(sandbox): add built-in grep and glob tools (#1784)
* feat(sandbox): add grep and glob tools

* refactor(aio-sandbox): use native file search APIs

* fix(sandbox): address review issues in grep/glob tools

- aio_sandbox: use should_ignore_path() instead of should_ignore_name()
  for include_dirs=True branch to filter nested ignored paths correctly
- aio_sandbox: add early exit when max_results reached in glob loop
- aio_sandbox: guard entry.path.startswith(path) before stripping prefix
- aio_sandbox: validate regex locally before sending to remote API
- search: skip lines exceeding max_line_chars to prevent ReDoS
- search: remove resolve() syscall in os.walk loop
- tools: avoid double get_thread_data() call in glob_tool/grep_tool
- tests: add 6 new cases covering the above code paths
- tests: patch get_app_config in truncation test to isolate config

* Fix sandbox grep/glob review feedback

* Remove unrelated Langfuse RFC from PR
2026-04-03 16:03:06 +08:00
Admire
48565664e0
fix ACP mcpServers payload (#1735)
* fix ACP mcpServers payload

* Handle invalid ACP MCP config
2026-04-03 15:28:56 +08:00
knukn
76fad8b08d
feat(client): add available_skills parameter to DeerFlowClient (#1779)
* feat(client): add `available_skills` parameter to DeerFlowClient for dynamic runtime skill filtering

* Update backend/packages/harness/deerflow/client.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(client): include `agent_name` and `available_skills` in agent config cache key

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-03 11:22:58 +08:00
ppyt
5664b9d413
fix: inject longTermBackground into memory prompt (#1734)
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.

The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.

Add the missing field handling and a regression test.

Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
2026-04-03 11:21:58 +08:00
greatmengqi
8128a3bc57
fix: enable DanglingToolCallMiddleware for subagents (#1766) 2026-04-02 18:56:18 +08:00
moose-lab
f56d0b4869
fix(sandbox): exclude URL paths from absolute path validation (#1385) (#1419)
* fix(sandbox): URL路径被误判为不安全绝对路径 (#1385)

在本地沙箱模式下,bash工具对命令做绝对路径安全校验时,会把curl命令中的
HTTPS URL(如 https://example.com/api/v1/check)误识别为本地绝对路径并拦截。

根因:_ABSOLUTE_PATH_PATTERN 正则的负向后行断言 (?<![:\w]) 只排除了冒号和
单词字符,但 :// 中第二个斜杠前面是第一个斜杠(/),不在排除列表中,导致
//example.com/api/... 被匹配为绝对路径 /example.com/api/...。

修复:在负向后行断言中增加斜杠字符,改为 (?<![:\w/]),使得 :// 中的连续
斜杠不会触发绝对路径匹配。同时补充了URL相关的单元测试用例。

Signed-off-by: moose-lab <moose-lab@users.noreply.github.com>

* fix(sandbox): refine absolute path regex to preserve file:// defense-in-depth

Change lookbehind from (?<![:\w/]) to (?<![:\w])(?<!:/) so only the
second slash in :// sequences is excluded. This keeps URL paths from
false-positiving while still letting the regex detect /etc/passwd in
file:///etc/passwd. Also add explicit file:// URL blocking and tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: moose-lab <moose-lab@users.noreply.github.com>
Co-authored-by: moose-lab <moose-lab@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-02 16:09:14 +08:00
Varian_米泽
a2cb38f62b
fix: prevent concurrent subagent file write conflicts in sandbox tools (#1714)
* fix: prevent concurrent subagent file write conflicts

Serialize same-path str_replace operations in sandbox tools

Guard AioSandbox write_file/update_file with the existing sandbox lock

Add regression tests for concurrent str_replace and append races

Verify with backend full tests and ruff lint checks

* fix(sandbox): Fix the concurrency issue of file operations on the same path in isolated sandboxes.

Ensure that different sandbox instances use independent locks for file operations on the same virtual path to avoid concurrency conflicts. Change the lock key from a single path to a composite key of (sandbox.id, path), and add tests to verify the concurrent safety of isolated sandboxes.

* feat(sandbox): Extract file operation lock logic to standalone module and fix concurrency issues

Extract file operation lock related logic from tools.py into a separate file_operation_lock.py module.
Fix data race issues during concurrent str_replace and write_file operations.
2026-04-02 15:39:41 +08:00
knukn
f8fb8d6fb1
feat/per agent skill filter (#1650)
* feat(agent): 为AgentConfig添加skills字段并更新lead_agent系统提示

在AgentConfig中添加skills字段以支持配置agent可用技能
更新lead_agent的系统提示模板以包含可用技能信息

* fix: resolve agent skill configuration edge cases and add tests

* Update backend/packages/harness/deerflow/agents/lead_agent/prompt.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor(agent): address PR review comments for skills configuration

- Add detailed docstring to `skills` field in `AgentConfig` to clarify the semantics of `None` vs `[]`.
- Add unit tests in `test_custom_agent.py` to verify `load_agent_config()` correctly parses omitted skills and explicit empty lists.
- Fix `test_make_lead_agent_empty_skills_passed_correctly` to include `agent_name` in the runtime config, ensuring it exercises the real code path.

* docs: 添加关于按代理过滤技能的配置说明

在配置示例文件和文档中添加说明,解释如何通过代理的config.yaml文件限制加载的技能

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-02 15:02:09 +08:00
totoyang
2d1f90d5dc
feat(tracing): add optional Langfuse support (#1717)
* feat(tracing): add optional Langfuse support

* Fix tracing fail-fast behavior for explicitly enabled providers

* fix(lint)
2026-04-02 13:06:10 +08:00
3a672b39c7
Fix/1681 llm call retry handling (#1683)
* fix(runtime): handle llm call errors gracefully

* fix(runtime): preserve graph control flow in llm retry middleware

---------

Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
2026-04-02 10:12:17 +08:00
SHIYAO ZHANG
df5339b5d0
feat(sandbox): truncate oversized bash and read_file tool outputs (#1677)
* feat(sandbox): truncate oversized bash and read_file tool outputs

Long tool outputs (large directory listings, multi-MB source files) can
overflow the model's context window. Two new configurable limits:

- bash_output_max_chars (default 20000): middle-truncates bash output,
  preserving both head and tail so stderr at the end is not lost
- read_file_output_max_chars (default 50000): head-truncates file output
  with a hint to use start_line/end_line for targeted reads

Both limits are enforced at the tool layer (sandbox/tools.py) rather
than middleware, so truncation is guaranteed regardless of call path.
Setting either limit to 0 disables truncation entirely.

Measured: read_file on a 250KB source file drops from 63,698 tokens to
19,927 tokens (69% reduction) with the default limit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): remove unused pytest import and fix import sort order

* style: apply ruff format to sandbox/tools.py

* refactor(sandbox): address Copilot review feedback on truncation feature

- strict hard cap: while-loop ensures result (including marker) ≤ max_chars
- max_chars=0 now returns "" instead of original output
- get_app_config() wrapped in try/except with fallback to defaults
- sandbox_config.py: add ge=0 validation on truncation limit fields
- config.example.yaml: bump config_version 4→5
- tests: add len(result) <= max_chars assertions, edge-case (max=0, small
  max, various sizes) tests; fix skipped-count test for strict hard cap

* refactor(sandbox): replace while-loop truncation with fixed marker budget

Use a pre-allocated constant (_MARKER_MAX_LEN) instead of a convergence
loop to ensure result <= max_chars. Simpler, safer, and skipped-char
count in the marker is now an exact predictable value.

* refactor(sandbox): compute marker budget dynamically instead of hardcoding

* fix(sandbox): make max_chars=0 disable truncation instead of returning empty string

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
2026-04-02 09:22:41 +08:00
Jason
1fb5acee39
fix(gateway): prevent 400 error when client sends context with configurable (#1660)
* fix(gateway): prevent 400 error when client sends context with configurable

Fixes #1290

LangGraph >= 0.6.0 rejects requests that include both 'configurable' and
'context' in the run config. If the client (e.g. useStream hook) sends
a 'context' key, we now honour it and skip creating our own
'configurable' dict to avoid the conflict.

When no 'context' is provided, we fall back to the existing
'configurable' behaviour with thread_id.

* fix(gateway): address review feedback — warn on dual keys, fix runtime injection, add tests

- Log a warning when client sends both 'context' and 'configurable' so
  it's no longer silently dropped (reviewer feedback)
- Ensure thread_id is available in config['context'] when present so
  middlewares can find it there too
- Add test coverage for the context path, the both-keys-present case,
  passthrough of other keys, and the no-config fallback

* style: ruff format services.py

---------

Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-01 23:21:32 +08:00
Alian
e97c8c9943
fix(skills): support parsing multiline YAML strings in SKILL.md frontmatter (#1703)
* fix(skills): support parsing multiline YAML strings in SKILL.md frontmatter

* test(skills): add tests for multiline YAML descriptions
2026-04-01 23:08:30 +08:00
Shengyuan Wang
2f3744f807
refactor: replace sync requests with async httpx in Jina AI client (#1603)
* refactor: replace sync requests with async httpx in Jina AI client

Replace synchronous `requests.post()` with `httpx.AsyncClient` in
JinaClient.crawl() and make web_fetch_tool async. This is part of the
planned async concurrency optimization for the agent hot path
(see docs/TODO.md).

* fix: address Copilot review feedback on async Jina client

- Short-circuit error strings in web_fetch_tool before passing to
  ReadabilityExtractor, preventing misleading extraction results
- Log missing JINA_API_KEY warning only once per process to reduce
  noise under concurrent async fetching
- Use logger.exception instead of logger.error in crawl exception
  handler to preserve stack traces for debugging
- Add async web_fetch_tool tests and warn-once coverage

* fix: mock get_app_config in web_fetch_tool tests for CI

The web_fetch_tool tests failed in CI because get_app_config requires
a config.yaml file that isn't present in the test environment. Mock
the config loader to remove the filesystem dependency.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-01 17:02:39 +08:00
AochenShen99
0cdecf7b30
feat(memory): structured reflection + correction detection in MemoryMiddleware (#1620) (#1668)
* feat(memory): add structured reflection and correction detection

* fix(memory): align sourceError schema and prompt guidance

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-01 16:45:29 +08:00
LYU Yichen
3e461d9d08
fix: use safe docker bind mount syntax for sandbox mounts (#1655)
Docker's -v host:container syntax is ambiguous for Windows drive-letter
paths (e.g. D:/...) because ':' is both the drive separator and the
volume separator, causing mount failures on Windows hosts.

Introduce _format_container_mount() which uses '--mount type=bind,...'
for Docker (unambiguous on all platforms) and keeps '-v' for Apple
Container runtime which does not support the --mount flag yet.

Adds unit tests covering Windows paths, read-only mounts, and Apple
Container pass-through.

Made-with: Cursor
2026-04-01 11:42:12 +08:00
Matt Van Horn
a3bfea631c
fix(sandbox): serialize concurrent exec_command calls in AioSandbox (#1435)
* fix(sandbox): serialize concurrent exec_command calls in AioSandbox

The AIO sandbox container maintains a single persistent shell session
that corrupts when multiple exec_command requests arrive concurrently
(e.g. when ToolNode issues parallel tool_calls). The corrupted session
returns 'ErrorObservation' strings as output, cascading into subsequent
commands.

Add a threading.Lock to AioSandbox to serialize shell commands. As a
secondary defense, detect ErrorObservation in output and retry with a
fresh session ID.

Fixes #1433

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(sandbox): address Copilot review findings

- Fix shell injection in list_dir: use shlex.quote(path) to escape
  user-provided paths in the find command
- Narrow ErrorObservation retry condition from broad substring match
  to the specific corruption signature to prevent false retries
- Improve test_lock_prevents_concurrent_execution: use threading.Barrier
  to ensure all workers contend for the lock simultaneously
- Improve test_list_dir_uses_lock: assert lock.locked() is True during
  exec_command to verify lock acquisition

* style: auto-format with ruff

---------

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:33:35 +08:00
Admire
aae59a8ba8
fix: surface configured sandbox mounts to agents (#1638)
* fix: surface configured sandbox mounts to agents

* fix: address PR review feedback

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-31 22:22:30 +08:00
Admire
3ff15423d6
fix Windows Docker sandbox path mounting (#1634)
* fix windows docker sandbox paths

* fix windows sandbox mount validation

* fix backend checks for windows sandbox path PR
2026-03-31 22:19:27 +08:00
SHIYAO ZHANG
c2f7be37b3
fix(tools): move sandbox.tools import in view_image_tool to break circular import (#1674)
view_image_tool.py had a top-level import of deerflow.sandbox.tools, which
created a circular dependency chain:

  sandbox.tools
    -> deerflow.agents.thread_state (triggers agents/__init__.py)
      -> agents/factory.py
        -> tools/builtins/__init__.py
          -> view_image_tool.py
            -> deerflow.sandbox.tools  <-- circular!

This caused ImportError when any test directly imported sandbox.tools,
making test_sandbox_tools_security.py fail to collect since #1522.

Fix: move the sandbox.tools import inside the view_image_tool function body.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 22:05:23 +08:00
Admire
9a557751d6
feat: support memory import and export (#1521)
* feat: support memory import and export

* fix(memory): address review feedback

* style: format memory settings page

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-30 17:25:47 +08:00
rayhpeng
34e835bc33
feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403)
* feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli

Implement all core LangGraph Platform API endpoints in the Gateway,
allowing it to fully replace the langgraph-cli dev server for local
development. This eliminates a heavyweight dependency and simplifies
the development stack.

Changes:
- Add runs lifecycle endpoints (create, stream, wait, cancel, join)
- Add threads CRUD and search endpoints
- Add assistants compatibility endpoints (search, get, graph, schemas)
- Add StreamBridge (in-memory pub/sub for SSE) and async provider
- Add RunManager with atomic create_or_reject (eliminates TOCTOU race)
- Add worker with interrupt/rollback cancel actions and runtime context injection
- Route /api/langgraph/* to Gateway in nginx config
- Skip langgraph-cli startup by default (SKIP_LANGGRAPH_SERVER=0 to restore)
- Add unit tests for RunManager, SSE format, and StreamBridge

* fix: drain bridge queue on client disconnect to prevent backpressure

When on_disconnect=continue, keep consuming events from the bridge
without yielding, so the worker is not blocked by a full queue.
Only on_disconnect=cancel breaks out immediately.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: remove pytest import

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: Fix default stream_mode to ["values", "messages-tuple"]

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: Remove unused if_exists field from ThreadCreateRequest

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: address review comments on gateway LangGraph API

- Mount runs.py router in app.py (missing include_router)
- Normalize interrupt_before/after "*" to node list before run_agent()
- Use entry.id for SSE event ID instead of counter
- Drain bridge queue on disconnect when on_disconnect=continue
- Reuse serialization helper in wait_run() for consistent wire format
- Reject unsupported multitask_strategy with 400
- Remove SKIP_LANGGRAPH_SERVER fallback, always use Gateway

* feat: extract app.state access into deps.py

Encapsulate read/write operations for singleton objects (RunManager,
StreamBridge, checkpointer) held in app.state into a shared utility,
reducing repeated access patterns across router modules.

* feat: extract deerflow.runtime.serialization module with tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace duplicated serialization with deerflow.runtime.serialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: extract app/gateway/services.py with run lifecycle logic

Create a service layer that centralizes SSE formatting, input/config
normalization, and run lifecycle management. Router modules will delegate
to these functions instead of using private cross-imported helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: wire routers to use services layer, remove cross-module private imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply ruff formatting to refactored files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(runtime): support LangGraph dev server and add compat route

- Enable official LangGraph dev server for local development workflow
- Decouple runtime components from agents package for better separation
- Provide gateway-backed fallback route when dev server is skipped
- Simplify lifecycle management using context manager in gateway

* feat(runtime): add Store providers with auto-backend selection

- Add async_provider.py and provider.py under deerflow/runtime/store/
- Support memory, sqlite, postgres backends matching checkpointer config
- Integrate into FastAPI lifespan via AsyncExitStack in deps.py
- Replace hardcoded InMemoryStore with config-driven factory

* refactor(gateway): migrate thread management from checkpointer to Store and resolve multiple endpoint failures

- Add Store-backed CRUD helpers (_store_get, _store_put, _store_upsert)
- Replace checkpoint-scanning search with two-phase strategy:
  phase 1 reads Store (O(threads)), phase 2 backfills from checkpointer
  for legacy/LangGraph Server threads with lazy migration
- Extend Store record schema with values field for title persistence
- Sync thread title from checkpoint to Store after run completion
- Fix /threads/{id}/runs/{run_id}/stream 405 by accepting both
  GET and POST methods; POST handles interrupt/rollback actions
- Fix /threads/{id}/state 500 by separating read_config and
  write_config, adding checkpoint_ns to configurable, and
  shallow-copying checkpoint/metadata before mutation
- Sync title to Store on state update for immediate search reflection
- Move _upsert_thread_in_store into services.py, remove duplicate logic
- Add _sync_thread_title_after_run: await run task, read final
  checkpoint title, write back to Store record
- Spawn title sync as background task from start_run when Store exists

* refactor(runtime): deduplicate store and checkpointer provider logic

Extract _ensure_sqlite_parent_dir() helper into checkpointer/provider.py
and use it in all three places that previously inlined the same mkdir logic.
Consolidate duplicate error constants in store/async_provider.py by importing
from store/provider.py instead of redefining them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(runtime): move SQLite helpers to runtime/store, checkpointer imports from store

_resolve_sqlite_conn_str and _ensure_sqlite_parent_dir now live in
runtime/store/provider.py. agents/checkpointer/provider and
agents/checkpointer/async_provider import from there, reversing the
previous dependency direction (store → checkpointer becomes
checkpointer → store).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(runtime): extract SQLite helpers into runtime/store/_sqlite_utils.py

Move resolve_sqlite_conn_str and ensure_sqlite_parent_dir out of
checkpointer/provider.py into a dedicated _sqlite_utils module.
Functions are now public (no underscore prefix), making cross-module
imports semantically correct. All four provider files import from
the single shared location.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gateway): use adelete_thread to fully remove thread checkpoints on delete

AsyncSqliteSaver has no adelete method — the previous hasattr check
always evaluated to False, silently leaving all checkpoint rows in the
database. Switch to adelete_thread(thread_id) which deletes every
checkpoint and pending-write row for the thread across all namespaces
(including sub-graph checkpoints).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gateway): remove dead bridge_cm/ckpt_cm code and fix StrEnum lint

app.py had unreachable code after the async-with lifespan refactor:
bridge_cm and ckpt_cm were referenced but never defined (F821), and
the channel service startup/shutdown was outside the langgraph_runtime
block so it never ran. Move channel service lifecycle inside the
async-with block where it belongs.

Replace str+Enum inheritance in RunStatus and DisconnectMode with
StrEnum as suggested by UP042.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: format with ruff

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-30 16:02:23 +08:00
d 🔹
9bcdba6038
fix: promote deferred tools after tool_search returns schema (#1570)
* fix: promote matched tools from deferred registry after tool_search returns schema

After tool_search returns a tool's full schema, the tool is promoted
(removed from the deferred registry) so DeferredToolFilterMiddleware
stops filtering it from bind_tools on subsequent LLM calls.

Without this, deferred tools are permanently filtered — the LLM gets
the schema from tool_search but can never invoke the tool because
the middleware keeps stripping it.

Fixes #1554

* test: add promote() and tool_search promotion tests

Tests cover:
- promote removes tools from registry
- promote nonexistent/empty is no-op
- search returns nothing after promote
- middleware passes promoted tools through
- tool_search auto-promotes matched tools (select + keyword)

* fix: address review — lint blank line + empty registry guard

- Add missing blank line between FakeRequest methods (E301)
- Use 'if not registry' to handle empty registries consistently

---------

Co-authored-by: d 🔹 <258577966+voidborne-d@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-30 11:23:15 +08:00
SHIYAO ZHANG
9aa3ff7c48
feat(sandbox): add SandboxAuditMiddleware for bash command security auditing (#1532)
* feat(sandbox): add SandboxAuditMiddleware for bash command security auditing

Addresses the LocalSandbox escape vector reported in #1224 where bash tool
calls can execute destructive commands against the host filesystem.

- Add SandboxAuditMiddleware with three-tier command classification:
  - High-risk (block): rm -rf /, curl|bash, dd if=, mkfs, /etc/shadow access
  - Medium-risk (warn): pip install, apt install, chmod 777
  - Safe (pass): normal workspace operations
- Register middleware after GuardrailMiddleware in _build_runtime_middlewares,
  applied to both lead agent and subagents
- Structured audit log via standard logger (visible in langgraph.log)
- Medium-risk commands execute but append a warning to the tool result,
  allowing the LLM to self-correct without blocking legitimate workflows
- High-risk commands return an error ToolMessage without calling the handler,
  so the agent loop continues gracefully

* fix(lint): sort imports in test_sandbox_audit_middleware

* refactor(sandbox-audit): address Copilot review feedback (3/5/6)

- Fix class docstring to match implementation: medium-risk commands are
  executed with a warning appended (not rejected), and cwd anchoring note
  removed (handled in a separate PR)
- Remove capsys.disabled() from benchmark test to avoid CI log noise;
  keep assertions for recall/precision targets
- Remove misleading 'cwd fix' from test module docstring

* test(sandbox-audit): add async tests for awrap_tool_call

* fix(sandbox-audit): address Copilot review feedback (1/2)

- Narrow rm high-risk regex to only block truly destructive targets
  (/, /*, ~, ~/*, /home, /root); legitimate workspace paths like
  /mnt/user-data/ are no longer false-positived
- Handle list-typed ToolMessage content in _append_warn_to_result;
  append a text block instead of str()-ing the list to avoid breaking
  structured content normalization

* style: apply ruff format to sandbox_audit_middleware files

* fix(sandbox-audit): update benchmark comment to match assert-based implementation

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-30 07:48:31 +08:00
Markus Corazzione
5ceb19f6f6
fix(oauth): Harden Claude OAuth cache-control handling (#1583) 2026-03-30 07:41:18 +08:00
Admire
fc7de7fffe
feat: support manual add and edit for memory facts (#1538)
* feat: support manual add and edit for memory facts

* fix: restore memory updater save helper

* fix: address memory fact review feedback

* fix: remove duplicate memory fact edit action

* docs: simplify memory fact review setup

* docs: relax memory review startup instructions

* fix: clear rebase marker in memory settings page

* fix: address memory fact review and format issues

* fix: address memory fact review feedback

* refactor: make memory fact updates explicit patch semantics

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-29 23:53:23 +08:00
SHIYAO ZHANG
cdb2a3a017
fix(sandbox): anchor relative paths to thread workspace in local mode (#1522)
* fix(task_tool): fallback to configurable thread_id when context is missing

task_tool only read thread_id from runtime.context, but when invoked
via LangGraph Server, thread_id lives in config.configurable instead.
Add the same fallback that ThreadDataMiddleware uses (PR #1237).

Fixes subagent execution failure: 'Thread ID is required in runtime
context or config.configurable'

* remove debug logging from task_tool

* fix(sandbox): anchor relative paths to thread workspace in local mode

In local sandbox mode, bash commands using relative paths were resolved
against the langgraph server process cwd (backend/) instead of the
per-thread workspace directory. This allowed relative-path writes to
escape the thread isolation boundary.

Root cause: validate_local_bash_command_paths and
replace_virtual_paths_in_command only process absolute paths (scanning
for '/' prefix). Relative paths pass through untouched and inherit the
process cwd at subprocess.run time.

Fix: after virtual path translation, prepend `cd {workspace} &&` to
anchor the shell's cwd to the thread-isolated workspace directory before
execution. shlex.quote() ensures paths with spaces or special characters
are handled safely.

This mirrors the approach used by OpenHands (fixed cwd at execution
layer) and is the correct fix for local mode where each subprocess.run
is an independent process with no persistent shell session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(sandbox): extract _apply_cwd_prefix and add unit tests

Extract the workspace cd-prefix logic from bash_tool into a dedicated
_apply_cwd_prefix() helper so it can be unit-tested in isolation.
Add four tests covering: normal prefix, no thread_data, missing
workspace_path, and paths with spaces (shlex.quote).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert: remove unrelated configurable thread_id fallback from sandbox/tools.py

This change belongs in a separate PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: remove trailing whitespace in test_sandbox_tools_security

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-29 23:21:06 +08:00
Admire
68c9e09a7a
fix: add Windows shell fallback for local sandbox (#1505)
* fix: add Windows shell fallback for local sandbox

* fix: handle PowerShell execution on Windows

* fix: handle Windows local shell execution

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-29 21:31:29 +08:00