* fix(oauth): inject billing header for non-Haiku model access
The Anthropic Messages API requires a billing identification block
in the system prompt when using Claude Code OAuth tokens (sk-ant-oat*)
to access non-Haiku models (Opus, Sonnet). Without it, the API returns
a generic 400 "Error" with no actionable detail.
This was discovered by intercepting Claude Code CLI requests — the CLI
injects an `x-anthropic-billing-header` text block as the first system
prompt entry on every request. Third-party consumers of the same OAuth
tokens must do the same.
Changes:
- Add `_apply_oauth_billing()` to `ClaudeChatModel` that prepends the
billing header block to the system prompt when `_is_oauth` is True
- Add `metadata.user_id` with device/session identifiers (required by
the API alongside the billing header)
- Called from `_get_request_payload()` before prompt caching runs
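The prepend step listed above might look roughly like the following. This is a minimal sketch, not the actual `_apply_oauth_billing()` implementation: the function name `apply_oauth_billing`, the payload shape, and the placeholder header text are assumptions made for illustration.

```python
from typing import Any

# Placeholder only: the real billing header value is not given in this message.
BILLING_TEXT = "x-anthropic-billing-header: <placeholder>"


def apply_oauth_billing(payload: dict[str, Any], billing_text: str = BILLING_TEXT) -> dict[str, Any]:
    """Return a copy of payload whose system prompt starts with the billing block."""
    system = payload.get("system")
    if not system:
        blocks = []
    elif isinstance(system, str):
        # Normalize a plain-string system prompt into a list of text blocks.
        blocks = [{"type": "text", "text": system}]
    else:
        blocks = list(system)

    billing_block = {"type": "text", "text": billing_text}
    # Drop any existing copy so the block is never duplicated, then put it first.
    blocks = [b for b in blocks if b != billing_block]
    return {**payload, "system": [billing_block, *blocks]}
```

Returning a new dict instead of mutating the input keeps the helper safe to call more than once on the same payload.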
Verified with Claude Max OAuth tokens against all three model tiers:
- claude-opus-4-6: 200 OK
- claude-sonnet-4-6: 200 OK
- claude-haiku-4-5-20251001: 200 OK (was already working)
Closes #1245
* fix(oauth): address review feedback on billing header injection
- Make OAUTH_BILLING_HEADER configurable via ANTHROPIC_BILLING_HEADER env var
- Normalize billing block to always be first in system list (strip + reinsert)
- Guard metadata with isinstance check for non-dict values
- Replace os.uname() with socket.gethostname() for Windows compat
- Fix docstrings to say "all OAuth requests" instead of "non-Haiku"
- Move inline imports to module level (fixes ruff I001)
- Add 9 unit tests for _apply_oauth_billing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unit tests for skill frontmatter validation
Cover _validate_skill_frontmatter logic:
- Valid minimal and full-field skills
- Missing SKILL.md, missing frontmatter, invalid YAML
- Required field validation (name, description)
- Unexpected key rejection
- Name format: hyphen-case, no leading/trailing/consecutive hyphens
- Name and description length limits
- Angle bracket rejection in description
* test: fix unused variables flagged by ruff F841
Replace unused tuple elements with _ and add assertions on
msg/name return values in success-path tests.
* test: address review feedback on unused variables
* test: consolidate validation tests into single module
Move the UTF-8/windows-locale test from test_skills_router.py into
test_skills_validation.py and remove test_skills_router.py to eliminate
duplicated assertions and future maintenance drift.
* fix: match assertion strings to actual validation messages
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Allow per-agent environment variables to be declared in config.yaml under
acp_agents.<name>.env. Values prefixed with $ are resolved from the host
environment at invocation time, consistent with other config fields.
Passes None to spawn_agent_process when env is empty so the subprocess
inherits the parent environment unchanged.
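A rough sketch of the resolution rule described above, under stated assumptions: the helper name `resolve_agent_env` is hypothetical, and resolving an unset host variable to an empty string is a choice made here, not something this message specifies.

```python
import os
from typing import Mapping, Optional


def resolve_agent_env(declared: Optional[Mapping[str, str]]) -> Optional[dict[str, str]]:
    """Resolve $-prefixed values from the host environment.

    Returns None when nothing is declared, which signals "inherit the
    parent environment unchanged" to the caller.
    """
    if not declared:
        return None
    resolved = {}
    for key, value in declared.items():
        if isinstance(value, str) and value.startswith("$"):
            # "$FOO" means: look up FOO on the host at invocation time.
            # Assumption: an unset host variable becomes an empty string.
            resolved[key] = os.environ.get(value[1:], "")
        else:
            resolved[key] = str(value)
    return resolved
```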
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix Windows backend test compatibility
* Preserve ACP path style on Windows
* Fix installer import ordering
* Address review comments for Windows fixes
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(LLM): fix Gemini thinking + tool calls via OpenAI gateway (#1180)
When using Gemini with thinking enabled through an OpenAI-compatible gateway,
the API requires that signature fields on thinking content blocks are
preserved and echoed back verbatim in subsequent requests. Standard
message serialization silently drops these signatures, causing HTTP 400
errors.
Changes:
- Add PatchedChatOpenAI adapter that re-injects signed thinking blocks into
request payloads, preserving the signature chain across multi-turn
conversations with tool calls.
- Support two LangChain storage patterns: additional_kwargs.thinking_blocks
and content list.
- Add 11 unit tests covering signed/unsigned blocks, storage patterns, edge
cases, and precedence rules.
- Update config.example.yaml with Gemini + thinking gateway example.
- Update CONFIGURATION.md with detailed guidance and error explanation.
Fixes: #1180
* Update patched_openai.py to handle thought_signature on function calls
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* docs: fix inaccurate thought_signature description in CONFIGURATION.md (#1220)
* Initial plan
* docs: fix CONFIGURATION.md wording for thought_signature - tool-call objects, not thinking blocks
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/360f5226-4631-48a7-a050-189094af8ffe
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
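One way such a boundary scan could work is sketched below. It uses the `ast` module to find imports, which is an assumption; the actual `test_harness_boundary.py` may use a different technique, and the helper name `find_app_imports` is hypothetical.

```python
import ast
from pathlib import Path


def find_app_imports(root: Path) -> list[str]:
    """Return "file:line" entries for every import of the app package under root."""
    violations = []
    for py_file in sorted(root.rglob("*.py")):
        tree = ast.parse(py_file.read_text(encoding="utf-8"), filename=str(py_file))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                # level == 0 restricts the check to absolute imports.
                names = [node.module or ""]
            else:
                continue
            # Match "app" and "app.something", but not e.g. "application".
            if any(n == "app" or n.startswith("app.") for n in names):
                violations.append(f"{py_file}:{node.lineno}")
    return violations
```

A test would then assert that `find_app_imports(Path("packages/harness/deerflow"))` returns an empty list.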
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
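The type coercion fix can be illustrated with a small sketch. The helper names `coerce_version` and `is_outdated` are hypothetical, and falling back to 0 for missing or unparseable values is an assumption, not the behavior of `_check_config_version()`.

```python
def coerce_version(raw: object, default: int = 0) -> int:
    """Coerce a YAML-loaded version value to int, falling back to default."""
    if raw is None or isinstance(raw, bool):
        return default
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default


def is_outdated(user_raw: object, example_raw: object) -> bool:
    """Compare versions as ints so that "1" (str) and 1 (int) behave the same."""
    return coerce_version(user_raw) < coerce_version(example_raw)
```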
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(harness): add tool-first ACP agent invocation (#37)
* feat(harness): add tool-first ACP agent invocation
* build(harness): make ACP dependency required
* fix(harness): address ACP review feedback
* feat(harness): decouple ACP agent workspace from thread data
ACP agents (codex, claude-code) previously used per-thread workspace
directories, causing path resolution complexity and coupling task
execution to DeerFlow's internal thread data layout. This change:
- Replace _resolve_cwd() with a fixed _get_work_dir() that always uses
{base_dir}/acp-workspace/, eliminating virtual path translation and
thread_id lookups
- Introduce /mnt/acp-workspace virtual path for lead agent read-only
access to ACP agent output files (same pattern as /mnt/skills)
- Add security guards: read-only validation, path traversal prevention,
command path allowlisting, and output masking for acp-workspace
- Update system prompt and tool description to guide LLM: send
self-contained tasks to ACP agents, copy results via /mnt/acp-workspace
- Add 11 new security tests for ACP workspace path handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(prompt): inject ACP section only when ACP agents are configured
The ACP agent guidance in the system prompt is now conditionally built
by _build_acp_section(), which checks get_acp_agents() and returns an
empty string when no ACP agents are configured. This avoids polluting
the prompt with irrelevant instructions for users who don't use ACP.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix lint
* fix(harness): address Copilot review comments on sandbox path handling and ACP tool
- local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/")
and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching
inside /mnt/skills-extra
- local_sandbox_provider: replace print() with logger.warning(..., exc_info=True)
- invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue;
move full prompt from INFO to DEBUG level (truncated to 200 chars)
- sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation;
remove misleading "read-only" language from validate_local_bash_command_paths
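The path-segment boundary fix can be sketched as follows. The function names are hypothetical and the lookahead character class is an assumption about which characters may continue a path segment; the real `_resolve_path` and `_resolve_paths_in_command` may differ.

```python
import re


def is_under(path: str, prefix: str) -> bool:
    """True only if path equals prefix or lies below it on a segment boundary."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def replace_prefix_in_command(command: str, virtual: str, host: str) -> str:
    """Replace virtual with host unless it is the start of a longer segment name."""
    # Negative lookahead: skip matches followed by a character that would
    # continue the same path segment (e.g. "/mnt/skills-extra").
    pattern = re.compile(re.escape(virtual) + r"(?![\w.-])")
    # A function replacement avoids backslash interpretation in host paths.
    return pattern.sub(lambda _match: host, command)
```

A plain `startswith(prefix)` check would wrongly treat `/mnt/skills-extra` as being inside `/mnt/skills`, which is the bug class described above.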
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry
P1.1 – ACP workspace thread isolation
- Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths
- `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses
`{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to
global workspace when thread_id is absent or invalid
- `_invoke` extracts thread_id from `RunnableConfig` via
`Annotated[RunnableConfig, InjectedToolArg]`
- `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`,
`_resolve_acp_workspace_path(path, thread_id)`, and all callers
(`replace_virtual_paths_in_command`, `mask_local_paths_in_output`,
`ls_tool`, `read_file_tool`) now resolve ACP paths per-thread
P1.2 – ACP permission guardrail
- New `auto_approve_permissions: bool = False` field in `ACPAgentConfig`
- `_build_permission_response(options, *, auto_approve: bool)` now
defaults to deny; only approves when `auto_approve=True`
- Document field in `config.example.yaml`
P2 – Deferred tool registry race condition
- Replace module-level `_registry` global with `contextvars.ContextVar`
- Each asyncio request context gets its own registry; worker threads
inherit the context automatically via `loop.run_in_executor`
- Expose `get_deferred_registry` / `set_deferred_registry` /
`reset_deferred_registry` helpers
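A minimal sketch of the `ContextVar` pattern, assuming a plain dict stands in for the registry object. The helper names mirror the ones listed above, but their bodies here are illustrative only.

```python
import asyncio
import contextvars
from typing import Optional

# Each asyncio task runs in its own copy of the context, so a value set in
# one task is invisible to the others.
_registry_var: contextvars.ContextVar[Optional[dict]] = contextvars.ContextVar(
    "deferred_registry", default=None
)


def get_deferred_registry() -> Optional[dict]:
    return _registry_var.get()


def set_deferred_registry(registry: dict) -> None:
    _registry_var.set(registry)


def reset_deferred_registry() -> None:
    _registry_var.set(None)


async def handle_request(name: str) -> str:
    set_deferred_registry({"owner": name})
    await asyncio.sleep(0)  # yield so the other task runs in between
    return get_deferred_registry()["owner"]


async def main() -> list:
    return await asyncio.gather(handle_request("a"), handle_request("b"))
```

With a module-level global, the second request would overwrite the first one's registry during the `await`; with a `ContextVar`, each task reads back its own value.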
Tests: 831 pass (57 for affected modules, 3 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(sandbox): mount /mnt/acp-workspace in docker sandbox container
The AioSandboxProvider was not mounting the ACP workspace into the
sandbox container, so /mnt/acp-workspace was inaccessible when the lead
agent tried to read ACP results in docker mode.
Changes:
- `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so
the directory exists before the sandbox container starts — required
for Docker volume mounts
- `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using
the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`)
- Update stale CLAUDE.md description (was "fixed global workspace")
Tests: `test_aio_sandbox_provider.py` (4 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): remove unused imports in test_aio_sandbox_provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix config
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unit tests for TodoMiddleware
Cover context-loss detection logic:
- _todos_in_messages and _reminder_in_messages helpers
- _format_todos formatting
- Reminder injection when write_todos is truncated
- No-op when todos visible or reminder already present
- abefore_model async delegation
* test: fix event loop error in todo middleware async test
Use asyncio.run() instead of get_event_loop().run_until_complete()
to avoid RuntimeError on Python 3.12 where no default event loop
exists in the main thread.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* test: add unit tests for DanglingToolCallMiddleware
Cover message patching logic for dangling tool calls:
- No-op when all tool calls have responses
- Synthetic ToolMessage insertion at correct positions
- Mixed responded/dangling scenarios
- wrap_model_call and awrap_model_call integration
* test: fix async tests and strengthen override assertions
- Use @pytest.mark.anyio + async def instead of deprecated
asyncio.get_event_loop().run_until_complete() (fixes Py3.12 CI failure)
- Assert that override() receives the correct patched messages kwarg
in both wrap_model_call and awrap_model_call tests
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* refactor: extract shared skill installer and upload manager to harness
Move duplicated business logic from Gateway routers and Client into
shared harness modules, eliminating code duplication.
New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
list, delete, get_uploads_dir, ensure_uploads_dir)
Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill
Tests: 42 new (22 installer + 20 upload manager) + client hardening
* refactor: centralize upload URL construction and clean up installer
- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
into shared manager.py, eliminating 6 duplicated URL constructions across
Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
ZipFile() open with exception handling replaces is_file() + is_zipfile()
+ ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router
* fix: add input validation for thread_id and filename length
- Reject thread_id containing unsafe filesystem characters (only allow
alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id
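The two checks above could be sketched like this. The regex and helper names are assumptions based on the description; explicitly rejecting `.` and `..` is an extra precaution added here, not something this message states.

```python
import re

# Only alphanumerics, hyphens, underscores, and dots are allowed.
_SAFE_THREAD_ID = re.compile(r"[A-Za-z0-9._-]+")
MAX_FILENAME_BYTES = 255


def validate_thread_id(thread_id: str) -> str:
    """Raise ValueError for thread IDs that are unsafe as directory names."""
    if not isinstance(thread_id, str) or not _SAFE_THREAD_ID.fullmatch(thread_id):
        raise ValueError(f"invalid thread_id: {thread_id!r}")
    if thread_id in {".", ".."}:
        # Assumption: these pass the character check but are never valid names.
        raise ValueError(f"invalid thread_id: {thread_id!r}")
    return thread_id


def check_filename_length(name: str) -> str:
    """Raise ValueError when the encoded filename exceeds 255 bytes."""
    if len(name.encode("utf-8")) > MAX_FILENAME_BYTES:
        raise ValueError("filename exceeds 255 bytes")
    return name
```

Because both helpers raise `ValueError`, a router can map that single exception type to an HTTP 400 response.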
* fix: address PR review — symlink safety, input validation coverage, error ordering
- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
middleware triggers extra LLM calls that add non-determinism to tests)
* chore: revert uv.lock to main — no dependency changes in this PR
* fix: use monkeypatch for global config in e2e fixture to prevent test pollution
The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.
Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.
* fix: address code review — URL encoding, API consistency, test isolation
- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
external contributors can override defaults
- Update tests to match new deduplicate_filename contract
* docs: rewrite RFC in English and add alternatives/breaking changes sections
* fix: address code review feedback on PR #1202
- Rename deduplicate_filename to claim_unique_filename to make
the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
path traversal detection — malformed input is 400, not 403
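A sketch of the renamed helper and the new exception type, under stated assumptions: the `_1`, `_2` suffix scheme and the `resolve_inside` helper are hypothetical and only illustrate the contract described above.

```python
from pathlib import Path


class PathTraversalError(ValueError):
    """Malformed input that tries to escape a base directory (maps to 400)."""


def claim_unique_filename(name: str, seen: set[str]) -> str:
    """Pick a name not in seen and record it in seen (mutates the set in place)."""
    stem, suffix = Path(name).stem, Path(name).suffix
    candidate, counter = name, 1
    while candidate in seen:
        # Assumption: collisions get a numeric suffix before the extension.
        candidate = f"{stem}_{counter}{suffix}"
        counter += 1
    seen.add(candidate)
    return candidate


def resolve_inside(base: Path, name: str) -> Path:
    """Resolve name under base, rejecting paths that land outside base."""
    base = base.resolve()
    target = (base / name).resolve()
    if not target.is_relative_to(base):  # requires Python 3.9+
        raise PathTraversalError(f"path escapes base directory: {name!r}")
    return target
```

Subclassing `ValueError` means existing `except ValueError` handlers that return 400 also cover traversal attempts.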
* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
LoopDetectionMiddleware injected SystemMessage mid-conversation to warn
about repetitive tool calls. This crashes Anthropic models because
langchain_anthropic's _format_messages() requires system messages to
appear only at the start of the conversation — interleaved system
messages raise 'Received multiple non-consecutive system messages'.
Switch the warning injection from SystemMessage to HumanMessage, which
works with all providers (Anthropic, OpenAI, Google, etc.).
Fixes #1299
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix(mcp): implement sync invocation wrapper for async MCP tools
Since DeerFlowClient streams synchronously, invoking async-only MCP tools
(loaded via langchain-mcp-adapters) resulted in a NotImplementedError.
This commit bridges the sync/async gap by dynamically injecting a `func`
wrapper into `StructuredTool` instances that only have a `coroutine`.
Key changes:
- Added `sync_wrapper` in `get_mcp_tools` to execute async tool calls.
- Handled nested event loops by delegating to a global `ThreadPoolExecutor`
when an event loop is already running, avoiding `RuntimeError`.
- Added detailed error logging within the wrapper for better transparency.
- Added comprehensive test coverage in `test_mcp_sync_wrapper.py` verifying
tool patching, event loop behavior, and exception propagation.
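The sync/async bridge described above could look roughly like this. It is a sketch only: `make_sync_wrapper` is a hypothetical name, the pool size is arbitrary, and the real wrapper in `get_mcp_tools` may handle logging and shutdown differently.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Coroutine

# Shared pool used only when the caller is already inside a running event loop.
_executor = ThreadPoolExecutor(max_workers=4)


def make_sync_wrapper(coro_fn: Callable[..., Coroutine[Any, Any, Any]]) -> Callable[..., Any]:
    """Build a blocking function that runs coro_fn to completion."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running in this thread, so asyncio.run() is safe.
            return asyncio.run(coro_fn(*args, **kwargs))
        # A loop is already running here; asyncio.run() would raise, so run
        # the coroutine on a worker thread with its own event loop instead.
        future = _executor.submit(lambda: asyncio.run(coro_fn(*args, **kwargs)))
        return future.result()

    return wrapper
```

Note that `future.result()` blocks the calling thread, so when invoked from inside a running loop this trades responsiveness of that loop for compatibility with synchronous callers.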
* refactor(mcp): extract sync wrapper to module level and fix test mocks
Addressed PR review comments:
- Extracted _make_sync_tool_wrapper to module level to avoid nested func definitions.
- Refactored tests to use the actual production helper instead of duplicating logic.
- Fixed AsyncMock patching for awaited dependencies in tests.
- Added atexit hook for graceful thread pool shutdown.
- Fixed PEP8 blank line formatting in tests.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(threads): clean up local thread data after thread deletion
Delete DeerFlow-managed thread directories after the web UI removes a LangGraph thread.
This keeps local thread data in sync with conversation deletion and adds regression coverage for the cleanup flow.
* fix(threads): address thread cleanup review feedback
Encode thread cleanup URLs in the web client, keep cache updates explicit when no thread search data is cached, and return a generic 500 response from the cleanup endpoint while documenting the sanitized error behavior.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add GuardrailMiddleware that evaluates every tool call before execution.
Three provider options: built-in AllowlistProvider (zero deps), OAP passport
providers (open standard), or custom providers loaded by class path.
- GuardrailProvider protocol with GuardrailRequest/Decision dataclasses
- GuardrailMiddleware (AgentMiddleware, position 5 in chain)
- AllowlistProvider for simple deny/allow by tool name
- GuardrailsConfig (Pydantic singleton, loaded from config.yaml)
- 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp
- Comprehensive docs at backend/docs/GUARDRAILS.md
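To make the allow/deny and fail-closed ideas concrete, here is a simplified sketch. The dataclass fields, the `evaluate` method name, and the `check_tool_call` helper are all assumptions for illustration; the actual protocol is documented in backend/docs/GUARDRAILS.md.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class GuardrailRequest:
    tool_name: str
    tool_args: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class GuardrailDecision:
    allowed: bool
    reason: str = ""


class AllowlistProvider:
    """Allow or deny tool calls purely by tool name."""

    def __init__(self, allowed: Optional[Iterable[str]] = None, denied: Iterable[str] = ()):
        # allowed=None means "no allowlist configured", not "allow nothing".
        self.allowed = None if allowed is None else set(allowed)
        self.denied = set(denied)

    def evaluate(self, request: GuardrailRequest) -> GuardrailDecision:
        if request.tool_name in self.denied:
            return GuardrailDecision(False, f"tool '{request.tool_name}' is denied")
        if self.allowed is not None and request.tool_name not in self.allowed:
            return GuardrailDecision(False, f"tool '{request.tool_name}' is not in the allowlist")
        return GuardrailDecision(True)


def check_tool_call(provider: Any, request: GuardrailRequest, fail_closed: bool = True) -> GuardrailDecision:
    """Evaluate a request; on provider errors, deny (fail-closed) or allow (fail-open)."""
    try:
        return provider.evaluate(request)
    except Exception as exc:
        if fail_closed:
            return GuardrailDecision(False, f"guardrail error: {exc}")
        return GuardrailDecision(True, "guardrail error ignored (fail-open)")
```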
Closes #1213
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add Claude Code OAuth and Codex CLI providers
Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.
Carries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review.
* fix: harden CLI credential loading
Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests.
* refactor: tighten codex auth typing
Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized.
* fix: load Claude Code OAuth from Keychain
Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path.
* fix: require explicit Claude OAuth handoff
* style: format thread hooks reasoning request
* docs: document CLI-backed auth providers
* fix: address provider review feedback
* fix: harden provider edge cases
* Fix deferred tools, Codex message normalization, and local sandbox paths
* chore: narrow PR scope to OAuth providers
* chore: remove unrelated frontend changes
* chore: reapply OAuth branch frontend scope cleanup
* fix: preserve upload guards with reasoning effort wiring
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: normalize ToolMessage structured content in serialization
When models return ToolMessage content as a list of content blocks
(e.g. [{"type": "text", "text": "..."}]), the UI previously displayed
the raw Python repr string instead of the extracted text.
Replace str(msg.content) with the existing _extract_text() helper in
both _serialize_message() and stream() to properly normalize
list-of-blocks content to plain text.
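The normalization can be illustrated with a simplified stand-in for `_extract_text()`. This is not the project's helper: the name `extract_text`, the choice to join parts without a separator, and the `str()` fallback are assumptions.

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Flatten string or list-of-blocks message content into plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict):
                # Use .get() plus a type check so blocks without a "text"
                # key (e.g. images) are skipped instead of raising.
                text = block.get("text")
                if isinstance(text, str):
                    parts.append(text)
        return "".join(parts)
    # Fallback for unexpected content types.
    return str(content)
```

By contrast, `str([{"type": "text", "text": "hi"}])` yields the Python repr of the list, which is the raw string the UI was displaying.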
Fixes #1149
Also fixes the same root cause as #1188 (characters displayed one per
line when tool response content is returned as structured blocks).
Added 11 regression tests covering string, list-of-blocks, mixed,
empty, and fallback content types.
* fix(memory): extract text from structured LLM responses in memory updater
When LLMs return response content as list of content blocks
(e.g. [{"type": "text", "text": "..."}]) instead of plain strings,
str() produces Python repr which breaks JSON parsing in the memory
updater. This caused memory updates to silently fail.
Changes:
- Add _extract_text() helper in updater.py for safe content normalization
- Use _extract_text() instead of str(response.content) in update_memory()
- Fix format_conversation_for_update() to handle plain strings in list content
- Fix subagent executor fallback path to extract text from list content
- Replace print() with structured logging (logger.info/warning/error)
- Add 13 regression tests covering _extract_text, format_conversation,
and update_memory with structured LLM responses
* fix: address Copilot review - defensive text extraction + logger.exception
- client.py _extract_text: use block.get('text') + isinstance check (prevent KeyError/TypeError)
- prompt.py format_conversation_for_update: same defensive check for dict text blocks
- executor.py: type-safe text extraction in both code paths, fallback to placeholder instead of str(raw_content)
- updater.py: use logger.exception() instead of logger.error() for traceback preservation
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: preserve chunked structured content without spurious newlines
* fix: restore backend unit test compatibility
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: track token usage per conversation turn
Add token usage tracking to the streaming API so consumers can monitor
cost per turn without additional API calls.
Changes:
1. _serialize_message now includes usage_metadata for AI messages in
values events, exposing input_tokens/output_tokens/total_tokens
from LangChain's native metadata.
2. stream() accumulates token usage across all AI messages in a turn
and emits the cumulative totals in the end event:
{usage: {input_tokens: N, output_tokens: N, total_tokens: N}}
3. Each messages-tuple AI event with text content now includes a
per-message usage_metadata field for granular tracking.
This enables the frontend to display token consumption per turn,
support cost-aware UX, and let users monitor API spending.
10 tests added covering serialization passthrough and cumulative
aggregation logic.
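The cumulative aggregation can be sketched as below. Representing messages as plain dicts is an assumption made to keep the example self-contained; the real `stream()` works on LangChain message objects.

```python
from typing import Any, Iterable, Mapping

USAGE_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def accumulate_usage(messages: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Sum usage_metadata across all AI messages in one turn."""
    totals = {key: 0 for key in USAGE_KEYS}
    for message in messages:
        if message.get("type") != "ai":
            continue
        # usage_metadata is a mapping, so use .get() rather than getattr().
        usage = message.get("usage_metadata") or {}
        for key in USAGE_KEYS:
            totals[key] += int(usage.get(key, 0) or 0)
    return totals
```

The result has the same shape as the `usage` object emitted in the end event.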
Co-Authored-By: OpenClaw <noreply@openclaw.ai>
* fix: address Copilot review - use Mapping access for usage_metadata
- Replace getattr(usage, 'input_tokens', 0) with usage.get('input_tokens', 0)
since LangChain usage_metadata is a dict, not an object
- Remove unused 'import pytest' (fixes Ruff F401)
- Add proper stream() integration tests for cumulative usage in end event
and per-message usage_metadata in messages-tuple events
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: OpenClaw <noreply@openclaw.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
This PR improves MiniMax Code Plan integration in DeerFlow by fixing three issues in the current flow: stream errors were not clearly surfaced in the UI, the frontend could not display the actual provider model ID, and MiniMax reasoning output could leak into final assistant content as inline <think>...</think>. The change adds a MiniMax-specific adapter, exposes real model IDs end-to-end, and adds a frontend fallback for historical messages.
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(feishu): support @bot message in topic groups
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(feishu): preserve rich-text formatting and add parser unit tests
* chore(test): remove unused import to fix ruff lint error
* style: auto-format imports to satisfy ruff
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(manager): add bootstrap command to initialize soul.md in correct place
* feat(channels): add /bootstrap command to IM channels
Add a `/bootstrap` command that routes to the chat handler with
`is_bootstrap: True` in the run context, allowing the agent to invoke
its setup/initialization flow (e.g. `setup_agent`).
- The text after `/bootstrap` is forwarded as the chat message; when
omitted a default "Initialize workspace" message is used.
- Feishu channels use the streaming path as with normal chat.
- No changes to ChannelStore — bootstrap is stateless and triggered
purely by the command.
- Update /help output to include /bootstrap.
- Add 5 tests covering: text/no-text variants, Feishu streaming path,
thread creation, and help text.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: accept copilot suggestion
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(gateway): remove generated markdown on upload delete
Keep thread upload storage consistent by deleting the generated markdown companion when the original convertible upload is removed.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(harness): allow agent read access to /mnt/skills in local sandbox
Skill files under /mnt/skills/ were blocked by the path validator,
preventing agents from reading skill definitions. This change:
- Refactors `resolve_local_tool_path` into `validate_local_tool_path`,
a pure security gate that no longer resolves paths (left to the sandbox)
- Permits read-only access to the skills container path (/mnt/skills by
default, configurable via config.skills.container_path)
- Blocks write access to skills paths (PermissionError)
- Allows /mnt/skills in bash command path validation
- Adds `LocalSandbox.update_path_mappings` and injects per-thread
user-data mappings into the sandbox so all virtual-path resolution
is handled uniformly by the sandbox layer
- Covers all new behaviour with tests
Fixes #1177
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(sandbox): unify all virtual path resolution in tools.py
Move skills path resolution from LocalSandbox into tools.py so that all
virtual-to-host path translation (user-data and skills) lives in one
layer. LocalSandbox becomes a pure execution layer that receives only
real host paths — no more path_mappings, _resolve_path, or reverse
resolve logic.
This addresses architecture feedback that path resolution was split
across two layers (tools.py for user-data, LocalSandbox for skills),
making the flow hard to follow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(sandbox): address Copilot review — cache-on-success and error path masking
- Replace @lru_cache with manual cache-on-success for _get_skills_container_path
and _get_skills_host_path so transient failures at startup don't permanently
disable skills access.
- Add _sanitize_error() helper that masks host filesystem paths in error
messages via mask_local_paths_in_output before returning them to the agent.
- Apply _sanitize_error() to all catch-all (Exception/OSError) handlers in
sandbox tool functions to prevent host path leakage in error output.
- Remove unused lru_cache import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
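A minimal sketch of the cache-on-success idea described above, assuming the lookup signals failure either by raising or by returning None (a None result is exactly what `@lru_cache` would have memoized forever). The helper name `cache_on_success` is hypothetical.

```python
from typing import Callable, Optional


def cache_on_success(loader: Callable[[], Optional[str]]) -> Callable[[], Optional[str]]:
    """Wrap loader so that only a successful (non-None) result is cached."""
    cached: Optional[str] = None

    def wrapper() -> Optional[str]:
        nonlocal cached
        if cached is not None:
            return cached
        try:
            value = loader()
        except Exception:
            return None  # transient failure: nothing cached, retried next call
        if value is not None:
            cached = value
        return value

    return wrapper
```

A loader that fails once at startup and succeeds afterwards is retried on the second call, and every later call is served from the cache.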
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tools): add tool_search for deferred MCP tool loading
When multiple MCP servers are enabled, total tool count can exceed 30-50,
causing context bloat and degraded tool selection accuracy. This adds a
deferred tool loading mechanism controlled by `tool_search.enabled` config.
- Add ToolSearchConfig with single `enabled` field
- Add DeferredToolRegistry with regex search (select:, +keyword, keyword)
- Add tool_search tool returning OpenAI-compatible function JSON
- Add DeferredToolFilterMiddleware to hide deferred schemas from bind_tools
- Add <available-deferred-tools> section to system prompt
- Enable MCP tool_name_prefix to prevent cross-server name collisions
- Add 34 unit tests covering registry, tool, prompt, and middleware
* fix: reset stale deferred registry and bump config_version
- Reset deferred registry upfront in get_available_tools() to prevent
stale tool entries when MCP servers are disabled between calls
- Bump config_version to 2 for new tool_search config field
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tests): mock get_app_config in prompt section tests for CI
CI has no config.yaml, causing TestDeferredToolsPromptSection to fail
with FileNotFoundError. Add autouse fixture to mock get_app_config.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(harness): normalize structured content for titles
Flatten structured LangChain message content before prompting the title model so list/block payloads don't leak Python reprs into generated thread titles.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
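One way to flatten structured message content before building the title prompt is sketched below. It assumes content is either a string or a list of strings and `{"type": "text", "text": ...}` style blocks; the function name `flatten_content` is hypothetical and the real helper may handle more block types.

```python
from typing import Any


def flatten_content(content: Any) -> str:
    """Reduce string or list/block message content to plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and isinstance(block.get("text"), str):
                parts.append(block["text"])
            # Non-text blocks (images, tool payloads) are skipped so their
            # Python repr never reaches the title prompt.
        return "\n".join(part for part in parts if part)
    return str(content)
```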
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
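The boundary check described above could be implemented along these lines. This is a sketch that inspects one source string with `ast`; the real test_harness_boundary.py walks the package directory and may use a different detection method. `find_app_imports` is a hypothetical name.

```python
import ast


def find_app_imports(source: str) -> list:
    """Return descriptions of absolute imports that target the app layer."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "app" or alias.name.startswith("app."):
                    offenders.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            # level == 0 means an absolute import; relative imports stay
            # inside the harness package and are not flagged here.
            if node.level == 0 and (module == "app" or module.startswith("app.")):
                offenders.append(f"from {module} import ...")
    return offenders
```

Matching on the parsed module name rather than on raw text avoids false positives such as `import application`.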
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3).
After the harness split, the file lives at packages/harness/deerflow/skills/,
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
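The type coercion fix can be illustrated with a small comparison helper. This is a sketch only: `is_config_outdated` is a hypothetical name, and treating a missing version as 0 is an assumption rather than documented behaviour of `_check_config_version()`.

```python
def is_config_outdated(user_version, example_version) -> bool:
    """Compare versions after coercing both to int.

    YAML may yield "1" (str) or 1 (int); comparing a str with an int
    using "<" raises TypeError, hence the explicit int() calls.
    """
    return int(user_version or 0) < int(example_version or 0)
```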
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(feishu): stream updates on a single card
* fix(feishu): ensure final message on stream error and warn on missing card ID
- Wrap streaming loop in try/except/finally so an is_final=True outbound
message is always published, even when the LangGraph stream breaks
mid-way. This prevents _running_card_ids memory leaks and ensures the
Feishu card shows a DONE reaction instead of hanging on "Working on it".
- Log a warning when _ensure_running_card gets no message_id back from
the Feishu reply API, making silent fallback to new-card behavior
visible in logs.
- Add test_handle_feishu_stream_error_still_sends_final to cover the
error path.
- Reformat service.py dict comprehension (ruff format, no logic change).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Avoid blocking inbound on Feishu card creation
---------
Co-authored-by: songyaolun <songyaolun@bytedance.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add LoopDetectionMiddleware to break repetitive tool call loops
Adds a new AgentMiddleware that detects when the agent is stuck calling
the same tools with the same arguments repeatedly. Today such a loop
continues until the recursion limit kills the run.
Detection: per-thread sliding window of tool call hashes (name + args).
- Warn threshold (default 3): injects a "wrap up" system message
- Hard limit (default 5): strips tool_calls, forcing final text output
Includes 13 unit tests covering hashing, thresholds, window sliding,
reset, and edge cases.
Closes #1055
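The detection scheme above can be sketched as follows. The thresholds match the stated defaults, but the window size, the class name `LoopDetector`, and the "ok"/"warn"/"stop" return values are assumptions made for illustration; the real middleware injects a message or strips tool_calls instead of returning a label.

```python
import hashlib
import json
from collections import deque

WARN_THRESHOLD = 3
HARD_LIMIT = 5
WINDOW_SIZE = 20  # assumed value; the commit does not state the window size


def hash_tool_calls(tool_calls: list) -> str:
    """Hash a batch of tool calls by (name, serialized args), order-independent."""
    normalized = sorted(
        (call["name"], json.dumps(call.get("args", {}), sort_keys=True))
        for call in tool_calls
    )
    return hashlib.sha256(json.dumps(normalized).encode("utf-8")).hexdigest()


class LoopDetector:
    """Per-thread sliding window of tool call hashes."""

    def __init__(self) -> None:
        self._history = {}

    def record(self, thread_id: str, tool_calls: list) -> str:
        window = self._history.setdefault(thread_id, deque(maxlen=WINDOW_SIZE))
        digest = hash_tool_calls(tool_calls)
        window.append(digest)
        repeats = window.count(digest)
        if repeats >= HARD_LIMIT:
            return "stop"  # caller would strip tool_calls here
        if repeats >= WARN_THRESHOLD:
            return "warn"  # caller would inject the wrap-up message
        return "ok"
```

Because the deque has a fixed maximum length, old hashes fall out of the window, so a call repeated only occasionally over a long run is not flagged.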
* fix: address PR #1056 review feedback for LoopDetectionMiddleware
- Remove unused imports (Awaitable, Callable, ModelCallResult,
ModelRequest, ModelResponse, AIMessage) from loop_detection_middleware
- Remove unused pytest import from test file
- Fix _hash_tool_calls sort key: sort by (name, serialized args) for
deterministic hashing when multiple calls share the same tool name
- Revert subagent_enabled default to False in agent.py to match
DeerFlowClient and channel defaults
- Remove unrelated SearxNG tools and Next.js rewrite changes from PR
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address 2nd round review feedback on PR #1056
- Inject loop warning only once per thread (prevents context bloat)
- Add threading.Lock for thread-safe history mutations
- Use runtime.context thread_id instead of workspace_path
- Add LRU eviction for per-thread history (max 100 threads)
- Add 5 new tests covering warn-once, LRU eviction, thread isolation,
fallback thread_id, and lock presence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve lint errors in loop detection middleware tests
Sort imports (I001) and remove unused _WARNING_MSG import (F401)
to fix ruff lint failures in CI.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add MiniMax as an OpenAI-compatible model provider
MiniMax offers high-performance LLMs (M2.5, M2.5-highspeed) with
204K context windows. This commit adds MiniMax as a selectable
provider in the configuration system.
Changes:
- Add MiniMax to SUPPORTED_MODELS with model definitions
- Add MiniMax provider configuration in conf/config.yaml
- Update documentation with MiniMax setup instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update README to remove MiniMax API details
Removed mention of MiniMax API usage and configuration examples.
---------
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: preserve conversation context in Telegram private chats
In private (1-on-1) chats, set topic_id=None so all messages map to a
single DeerFlow thread per chat instead of creating a new thread for
every message. Also fix _cmd_generic to use topic_id=None in private
chats so /new correctly targets the default thread.
Group chat behavior is unchanged (reply_to or msg_id as topic_id).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
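The topic mapping described above amounts to a small decision function. The sketch below assumes Telegram's chat type strings ("private", "group", "supergroup") and uses a hypothetical helper name, `resolve_topic_id`; the channel code may structure this differently.

```python
from typing import Optional


def resolve_topic_id(chat_type: str, msg_id: int, reply_to_id: Optional[int]) -> Optional[str]:
    """Pick the topic_id that keys the DeerFlow thread for a message."""
    if chat_type == "private":
        # One thread per private chat: every message shares topic_id=None.
        return None
    # Group chats: a reply continues the replied-to topic, otherwise the
    # message starts its own topic.
    return str(reply_to_id if reply_to_id is not None else msg_id)
```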
* fix: preserve conversation context in Telegram private chats
Fixes #1101
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: mirror _on_text reply logic in _cmd_generic for group chats
_cmd_generic now prefers reply_to_message.message_id over msg_id in
group/supergroup chats, consistent with _on_text. This ensures commands
like /new and /status target the correct conversation thread when sent
as a reply in group chats.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(sandbox): harden local file access and mask host paths
- enforce local sandbox file tools to only accept /mnt/user-data paths
- add path traversal checks against thread workspace/uploads/outputs roots
- preserve requested virtual paths in tool error messages (no host path leaks)
- mask local absolute paths in bash output back to virtual sandbox paths
- update bash tool guidance to prefer thread-local venv + python -m pip
- add regression tests for path mapping, masking, and access restrictions
Fixes #968
* feat(sandbox): restrict risky absolute paths in local bash commands
- validate absolute path usage in local-mode bash commands
- allow only /mnt/user-data virtual paths for user data access
- keep a small allowlist for system executable/device paths
- return clear permission errors for unsafe command paths
- add regression tests for bash path validation rules
* test(sandbox): add success path test for resolve_local_tool_path (#992)
* Initial plan
* test(sandbox): add success path test for resolve_local_tool_path
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(sandbox): reject bare virtual root early with clear error in resolve_local_tool_path (#991)
* Initial plan
* fix(sandbox): reject bare virtual root early with clear error in resolve_local_tool_path
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* fix(gateway): ignore archive metadata wrappers
Treat top-level __MACOSX and dotfile entries as packaging metadata so valid .skill archives still resolve to their real skill directory.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
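A rough sketch of the wrapper filtering for the archive fix above, assuming a .skill archive is expected to contain exactly one real top-level entry (the skill directory). The function names are hypothetical and the gateway's actual resolution logic may differ.

```python
from typing import Iterable, Optional


def real_top_level_entries(names: Iterable[str]) -> set:
    """Collect top-level archive entries, ignoring packaging metadata."""
    tops = set()
    for name in names:
        first = name.split("/", 1)[0]
        # __MACOSX and dotfiles (.DS_Store etc.) are added by archivers.
        if first and first != "__MACOSX" and not first.startswith("."):
            tops.add(first)
    return tops


def resolve_skill_dir(names: Iterable[str]) -> Optional[str]:
    """Return the single real top-level entry, or None if ambiguous."""
    tops = real_top_level_entries(names)
    return next(iter(tops)) if len(tops) == 1 else None
```

With `zipfile`, the `names` argument would typically come from `ZipFile.namelist()`.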
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(gateway): allow standard skill frontmatter metadata
Accept standard optional frontmatter fields during .skill installs so external skills with version, author, or compatibility metadata do not fail validation.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* docs: sync skill installer metadata behavior
Document the skill install allowlist so user-facing and backend contributor docs match the gateway validation contract.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(gateway): normalize suggestion response content
Handle list-style model content before JSON parsing so provider wrappers do not silently drop follow-up suggestions.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* docs: sync suggestions endpoint behavior
Document the rich-content normalization path so the README and backend gateway notes stay aligned with the current router contract.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(tracing): support LANGCHAIN_* env fallback for LangSmith config
- add backward-compatible env parsing in tracing_config.py
- support fallback keys:
LANGCHAIN_TRACING_V2 / LANGCHAIN_TRACING
LANGCHAIN_API_KEY
LANGCHAIN_PROJECT
LANGCHAIN_ENDPOINT
- keep LANGSMITH_* as preferred source when both are present
- add regression tests in test_tracing_config.py
* fix(tracing): correct LANGSMITH_* precedence over LANGCHAIN_* for enabled flag (#1067)
* Initial plan
* fix(tracing): use first-present-wins logic for enabled flag, add precedence docs and test
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* fix(subagents): cleanup background tasks after completion to prevent memory leak
Add a cleanup_background_task() function that removes completed subagent
results from the global _background_tasks dict. Previously, completed tasks
were never removed, so memory grew with every subagent execution.
Alternative approaches considered:
- Future + SubagentHandle pattern: Not chosen due to requiring refactoring
Chose the simple cleanup approach for minimal code changes while effectively
resolving the memory leak.
Changes:
- Add cleanup_background_task() in executor.py
- Call cleanup in all task_tool return paths (completed, failed, timed out)
* fix(subagents): prevent race condition in background task cleanup
Address Copilot review feedback on memory leak fix:
- Add terminal state check in cleanup_background_task() to only remove
tasks that are COMPLETED/FAILED/TIMED_OUT or have completed_at set
- Remove cleanup call from polling safety-timeout branch in task_tool
since the task may still be running
- Add comprehensive tests for cleanup behavior including:
- Verification that cleanup is called on terminal states
- Verification that cleanup is NOT called on polling timeout
- Tests for terminal state check logic in executor
This prevents KeyError when the background executor tries to update
a task that was prematurely removed from _background_tasks.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(checkpointer): return InMemorySaver instead of None when not configured (#1016)
* fix(checkpointer): also fix get_checkpointer() to return InMemorySaver
Make all three checkpointer functions consistent:
- make_checkpointer() (async) → InMemorySaver
- checkpointer_context() (sync) → InMemorySaver
- get_checkpointer() (sync singleton) → InMemorySaver
This ensures DeerFlowClient always has a valid checkpointer.
* fix: address CI failure and Copilot review feedback
- Fix import order in test_checkpointer_none_fix.py (I001 ruff error)
- Fix type annotation: _checkpointer should be Checkpointer | None
- Update docstring: change "None if not configured" to "InMemorySaver if not configured"
- Ensure app config is loaded before checking checkpointer config to prevent incorrect InMemorySaver fallback
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add IM channels system for Feishu, Slack, and Telegram integration
Bridge external messaging platforms to DeerFlow via LangGraph Server with
async message bus, thread management, and per-channel configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review comments on IM channels system
Fix topic_id handling in store remove/list_entries and manager commands,
correct Telegram reply threading, remove unused imports/variables, update
docstrings and docs to match implementation, and prevent config mutation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* update skill creator
* fix im reply text
* fix comments
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add checkpointer configuration to config.example.yaml
- Introduced a new section for checkpointer configuration to enable state persistence for the embedded DeerFlowClient.
- Documented supported types: memory, sqlite, and postgres, along with examples for each.
- Clarified that the LangGraph Server manages its own state persistence separately.
* refactor(checkpointer): streamline checkpointer initialization and logging
* fix(uv.lock): update revision and add new wheel URLs for brotlicffi package
* feat: add langchain-anthropic dependency and update related configurations
* Fix checkpointer lifecycle, docstring, and path resolution bugs from PR #1005 review (#4)
* Initial plan
* Address all review suggestions from PR #1005
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* Fix resolve_path to always return real Path; move SQLite special-string handling to callers
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: foreleven <4785594+foreleven@users.noreply.github.com>
* feat: u may ask
* chore: adjust code according to CR
* chore: adjust code according to CR
* ut: test for suggestions.py
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(subagent): support async MCP tools in subagent executor
SubagentExecutor.execute() was synchronous and could not handle async-only tools like MCP tools. This caused failures when trying to use MCP tools within subagents.
Changes:
- Add _aexecute() async method using agent.astream() for async execution
- Refactor execute() to use asyncio.run() wrapping _aexecute()
- This allows subagents to use async tools (MCP) within ThreadPoolExecutor
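A minimal sketch of the sync-over-async wrapper described above. `SketchExecutor` is a stand-in, not the real `SubagentExecutor`, and the awaited sleep stands in for an async-only MCP tool call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class SketchExecutor:
    """Stand-in for an executor with an async core and a sync facade."""

    async def _aexecute(self, task: str) -> str:
        await asyncio.sleep(0)  # placeholder for awaiting an async tool
        return f"done: {task}"

    def execute(self, task: str) -> str:
        # asyncio.run() creates a fresh event loop in the calling thread,
        # which works in a pool worker because no loop is running there.
        return asyncio.run(self._aexecute(task))


def run_in_pool(tasks: list) -> list:
    """Run the sync facade from worker threads."""
    executor = SketchExecutor()
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(executor.execute, tasks))
```

Note that `asyncio.run()` raises RuntimeError if called from a thread that already has a running event loop, so this pattern suits worker threads rather than code already inside a coroutine.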
* test(subagent): add unit tests for executor async/sync paths
Add comprehensive tests covering:
- Async _aexecute() with success/error cases
- Sync execute() wrapper using asyncio.run()
- Async tool (MCP) support verification
- Thread pool execution safety
* fix(subagent): subagent-test-circular-depend
- Use session-scoped fixture with delayed import to handle circular dependencies
without affecting other test modules
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- replace with explicit runtime deps:
- regenerate after dependency changes
- make deterministic by patching
to avoid leaked global affecting expected paths
* feat(upload): implement optimistic UI for file uploads and enhance message handling
* feat(middleware): enhance file handling by collecting historical uploads from directory
* feat(thread-title): update page title handling for new threads and improve loading state
* feat(uploads-middleware): enhance file extraction by verifying file existence in uploads directory
* feat(thread-stream): update file path reference to use virtual_path for uploads
* feat(tests): add core behaviour tests for UploadsMiddleware
* feat(tests): remove unused pytest import from test_uploads_middleware_core_logic.py
* feat: enhance file upload handling and localization support
- Update UploadsMiddleware to validate filenames more robustly.
- Modify MessageListItem to parse uploaded files from raw content for backward compatibility.
- Add localization for uploading messages in English and Chinese.
- Introduce parseUploadedFiles utility to extract uploaded files from message content.
* fix(memory): prevent file upload events from persisting in long-term memory
Uploaded files are session-scoped and unavailable in future sessions.
Previously, upload interactions were recorded in memory, causing the
agent to search for non-existent files in subsequent conversations.
Changes:
- memory_middleware: skip human messages containing <uploaded_files>
and their paired AI responses from the memory queue
- updater: post-process generated memory to strip upload mentions
before saving to file
- prompt: instruct the memory LLM to ignore file upload events
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): address Copilot review feedback on upload filtering
- memory_middleware: strip <uploaded_files> block from human messages
instead of dropping the entire turn; only skip the turn (and paired
AI response) when nothing remains after stripping
- updater: narrow the upload-scrubbing regex to explicit upload events
(avoids false-positive removal of "User works with CSV files" etc.);
also filter upload-event facts from the facts array
- prompt: move `import re` to module scope; skip upload-only human
messages (empty after stripping) rather than appending "User: "
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): allow optional words between 'upload' and 'file' in scrub regex
The previous pattern required 'uploading file' with no intervening words,
so 'uploading a test file' was not matched and leaked into long-term memory.
Allow up to 3 modifier words between the verb and noun (e.g. 'uploading a
test file', 'uploaded the attachment').
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
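A pattern with the behaviour described above might look like the following. The actual regex is not shown in the commit, so the verb forms and noun list here are assumptions; only the "up to 3 modifier words" rule comes from the message.

```python
import re

# upload/uploaded/uploading/uploads, then 0-3 modifier words, then a noun.
UPLOAD_EVENT = re.compile(
    r"\bupload(?:ed|ing|s)?\s+(?:\w+\s+){0,3}(?:files?|attachments?|documents?)\b",
    re.IGNORECASE,
)


def mentions_upload(sentence: str) -> bool:
    """Return True if the sentence describes an explicit upload event."""
    return UPLOAD_EVENT.search(sentence) is not None
```

Requiring an upload verb keeps legitimate facts such as "User works with CSV files" out of the scrub.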
* test(memory): add unit tests for upload filtering in memory pipeline
Covers _filter_messages_for_memory and _strip_upload_mentions_from_memory
per Copilot review suggestion. 15 test cases verify:
- Upload-only turns (and paired AI responses) are excluded from memory queue
- User's real question is preserved when combined with an upload block
- Upload file paths are never present in filtered message content
- Intermediate tool messages are always excluded
- Multi-turn conversations: only the upload turn is dropped
- Multimodal (list-content) human messages are handled
- Upload-event sentences are removed from summaries and facts
- Legitimate file-related facts (CSV preferences, PDF exports) are preserved
- "uploading a test file" (words between verb and noun) is caught by regex
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add agent management functionality with creation, editing, and deletion
* feat: enhance agent creation and chat experience
- Added AgentWelcome component to display agent description on new thread creation.
- Improved agent name validation with availability check during agent creation.
- Updated NewAgentPage to handle agent creation flow more effectively, including enhanced error handling and user feedback.
- Refactored chat components to streamline message handling and improve user experience.
- Introduced new bootstrap skill for personalized onboarding conversations, including detailed conversation phases and a structured SOUL.md template.
- Updated localization files to reflect new features and error messages.
- General code cleanup and optimizations across various components and hooks.
* Refactor workspace layout and agent management components
- Updated WorkspaceLayout to use useLayoutEffect for sidebar state initialization.
- Removed unused AgentFormDialog and related edit functionality from AgentCard.
- Introduced ArtifactTrigger component to manage artifact visibility.
- Enhanced ChatBox to handle artifact selection and display.
- Improved message list rendering logic to avoid loading states.
- Updated localization files to remove deprecated keys and add new translations.
- Refined hooks for local settings and thread management to improve performance and clarity.
- Added temporal awareness guidelines to deep research skill documentation.
* feat: refactor chat components and introduce thread management hooks
* feat: improve artifact file detail preview logic and clean up console logs
* feat: refactor lead agent creation logic and improve logging details
* feat: validate agent name format and enhance error handling in agent setup
* feat: simplify thread search query by removing unnecessary metadata
* feat: update query key in useDeleteThread and useRenameThread for consistency
* feat: add isMock parameter to thread and artifact handling for improved testing
* fix: reorder import of setup_agent for consistency in builtins module
* feat: append mock parameter to thread links in CaseStudySection for testing purposes
* fix: update load_agent_soul calls to use cfg.name for improved clarity
* fix: update date format in apply_prompt_template for consistency
* feat: integrate isMock parameter into artifact content loading for enhanced testing
* docs: add license section to SKILL.md for clarity and attribution
* feat(agent): enhance model resolution and agent configuration handling
* chore: remove unused import of _resolve_model_name from agents
* feat(agent): remove unused field
* fix(agent): set default value for requested_model_name in _resolve_model_name function
* feat(agent): update get_available_tools call to handle optional agent_config and improve middleware function signature
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: Add reasoning effort configuration support
* Add `reasoning_effort` parameter to model config and agent initialization
* Support reasoning effort levels (minimal/low/medium/high) for Doubao/GPT-5 models
* Add UI controls in input box for reasoning effort selection
* Update doubao-seed-1.8 example config with reasoning effort support
Fixes & Cleanup:
* Ensure UTF-8 encoding for file operations
* Remove unused imports
* fix: set reasoning_effort to None for unsupported models
* fix: unit test error
* Update frontend/src/components/workspace/input-box.tsx
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
add oauth schema to MCP server config (extensions_config.json)
support client_credentials and refresh_token grants
implement token manager with caching and pre-expiry refresh
inject OAuth Authorization header for MCP tool discovery and tool calls
extend MCP gateway config models to read/write OAuth settings
update docs and examples for OAuth configuration
add unit tests for token fetch/cache and header injection
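Token caching with pre-expiry refresh, as listed above, can be sketched as follows. `TokenCache`, its parameters, and the `(token, expires_in_seconds)` fetch contract are hypothetical; the real token manager also has to handle the client_credentials and refresh_token grants, which are left out here.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # returns (token, expires_in_seconds)
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or expiry is within the margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

Injecting the clock makes the refresh timing testable without sleeping.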
Validate that all dict-returning client methods conform to Gateway
Pydantic response models (ModelsListResponse, ModelResponse,
SkillsListResponse, SkillResponse, SkillInstallResponse,
McpConfigResponse, UploadResponse, MemoryConfigResponse,
MemoryStatusResponse). Pydantic ValidationError in CI catches
schema drift between client and Gateway with zero production coupling.
Also includes prior review fixes: enhanced client methods, expanded
unit tests (67→77), live integration test improvements, and updated
documentation.
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Add `DeerFlowClient` class that provides direct in-process access to
DeerFlow's agent and Gateway capabilities without requiring LangGraph
Server or Gateway API processes. This enables users to import and use
DeerFlow as a Python library.
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* fix: recover from stale model context after config model changes
* fix: fail fast on missing model config and expand model resolution tests
* fix: remove duplicate get_app_config imports
* fix: align model resolution tests with runtime imports
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove duplicate model resolution test case
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(subagents): make subagent timeout configurable via config.yaml
- Add SubagentsAppConfig supporting global and per-agent timeout_seconds
- Load subagents config section in AppConfig.from_file()
- Registry now applies config.yaml overrides without mutating builtin defaults
- Polling safety-net in task_tool is now dynamic (execution timeout + 60s buffer)
- Document subagents section in config.example.yaml
- Add make test command and enforce TDD policy in CLAUDE.md
- Add 38 unit tests covering config validation, timeout resolution, registry
override behavior, and polling timeout formula
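The timeout resolution and polling formula above can be sketched as follows. Only the "execution timeout + 60s buffer" rule is taken from this commit message; the precedence order (per-agent override, then global value, then builtin default), the 5-second poll interval, the `SubagentSpec` type, the `resolve` helper, and the builtin entry are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional

POLL_BUFFER_SECONDS = 60   # from the commit message: execution timeout + 60s buffer
POLL_INTERVAL_SECONDS = 5  # assumption: the real polling interval is not stated


@dataclass(frozen=True)
class SubagentSpec:
    name: str
    timeout_seconds: int


# Hypothetical builtin default; frozen so overrides cannot mutate it.
BUILTIN = {"general-purpose": SubagentSpec("general-purpose", 900)}


def resolve(name: str, global_timeout: Optional[int], per_agent: dict[str, int]) -> SubagentSpec:
    """Apply config overrides by returning a copy, leaving BUILTIN untouched."""
    spec = BUILTIN[name]
    # Assumed precedence: per-agent value, then global value, then builtin.
    timeout = per_agent.get(name, global_timeout)
    if timeout is None:
        return spec
    return replace(spec, timeout_seconds=timeout)


def max_poll_count(timeout_seconds: int) -> int:
    """Number of polls covering the execution timeout plus the safety buffer."""
    total = timeout_seconds + POLL_BUFFER_SECONDS
    return -(-total // POLL_INTERVAL_SECONDS)  # ceiling division


per_agent_spec = resolve("general-purpose", 600, {"general-purpose": 120})
global_spec = resolve("general-purpose", 600, {})
builtin_spec = resolve("general-purpose", None, {})
polls = max_poll_count(per_agent_spec.timeout_seconds)
```

Returning a copy via `dataclasses.replace` is one simple way to keep builtin defaults immutable while still honoring config.yaml overrides.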
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(subagents): add logging for subagent timeout config and execution
- Log loaded timeout config (global default + per-agent overrides) on startup
- Log debug message in registry when config.yaml overrides a builtin timeout
- Include timeout in executor's async execution start log
- Log effective timeout and polling limit when a task is dispatched
- Fix UnboundLocalError: move max_poll_count assignment before logger.info
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci(backend): add lint step and run all unit tests via Makefile
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix lint
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add AioSandboxProvider for Docker-based sandbox execution with
configurable container lifecycle, volume mounts, and port management
- Add TitleMiddleware to auto-generate thread titles after the first
user-assistant exchange using an LLM
- Add Claude Code documentation (CLAUDE.md, AGENTS.md)
- Extend SandboxConfig with Docker-specific options (image, port, mounts)
- Fix hardcoded mount path to use expanduser
- Add agent-sandbox and dotenv dependencies
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>