8955b3222a
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
aa015462a7
|
feat(im): Add user-owned IM channel connections (#3487)
* Add user-owned IM channel connections
* Fix dev startup and channel connect popup
* Use async channel connect flow
* Harden dev service daemon startup
* Support local IM channel connections
* Align IM connections with local channels
* Fix safe user id digest algorithm
* Address Copilot IM channel feedback
* Address IM channel review comments
* Support all integrated IM channel connections
* Format additional channel connection tests
* Keep unavailable channel connect buttons clickable
* Fix IM channel provider icons
* Add runtime setup for enabled IM channels
* Guard global shortcut key handling
* Keep configured IM channels editable
* Avoid password autofill for channel secrets
* Make channel threads visible to connection owners
* Persist IM runtime config locally
* Allow disconnecting runtime IM channels
* Route no-auth channel sessions to local user
* Use default user for auth-disabled local mode
* Show IM channel source on threads
* Prefill IM channel runtime config
* Reflect IM channel runtime health
* Ignore Feishu message read events
* Ignore Feishu non-content message events
* Let setup wizard enable IM channels
* Fix frontend formatting after merge
* Stabilize backend tests without local config
* Isolate channel runtime config tests
* Address channel connection review comments
* Use sha256 user buckets with legacy migration
* Ensure runtime IM channels are ready after restart
* Persist disconnected IM channel state
* Address channel connection review comments
* Address channel connection review findings
Frontend connect flow:
- Open the runtime-config dialog only when a provider still needs
credentials; configured providers go straight to the connect flow, so
the binding-code/deep-link path is reachable from the UI again.
- After saving credentials, continue into the connect flow when a user
binding is still required (multi-user mode) instead of stopping at a
"Connected" toast.
- Extract shared provider-state helpers to core/channels/provider-state
and add unit + e2e coverage for the direct-connect and
configure-then-connect paths.
Provider status semantics:
- Report connection_status from the user's newest connection row;
with no binding it is not_connected, except in auth-disabled local
mode where a configured running channel is effectively connected.
Concurrency and event-loop correctness:
- Offload ChannelRuntimeConfigStore construction and writes, channel
service construction, and Slack connection replies to threads; add a
tests/blocking_io/ anchor for the runtime-config handlers.
- Consume binding codes with a conditional UPDATE so a code can only be
used once under concurrent workers; retry upsert_connection as an
update when a concurrent insert wins the unique constraint.
- Serialize ensure_channel_ready per channel so concurrent provider
polls cannot double-start a channel worker.
Config and migration hardening:
- Stop mutating the get_app_config()-cached Telegram provider config;
the runtime store now owns the UI-entered bot username.
- Register channel_connections in STARTUP_ONLY_FIELDS with the
standardized startup-only Field description.
- Match the legacy unsafe-id bucket by recomputing its exact SHA-1 name
so another user's same-prefix bucket can never be migrated.
- Remove the unused Telegram process_webhook_update path and document
src/core/channels in the frontend docs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Address PR review comments on authz scoping and channel runtime
Security (review feedback from ShenAC-SAC):
- Scope internal-token callers to the connection owner carried in
X-DeerFlow-Owner-User-Id instead of bypassing owner checks outright,
in both require_permission(owner_check=True) and the stateless run
endpoints. Internal callers keep access to their own and
shared/legacy threads, and may claim a default-owned channel thread
for its real owner, but a leaked internal token no longer grants
cross-user thread access.
- Require admin privileges for POST/DELETE /api/channels/{provider}/
runtime-config: runtime credentials and channel workers are
instance-wide shared state (same model as the MCP config API).
Read-only provider listing stays available to all users.
Performance (review feedback from willem-bd):
- Skip the redundant thread channel-metadata PATCH after the first
successful backfill per thread.
- Reuse the per-connection Slack WebClient until its token changes
instead of constructing one per outbound message.
- Reconcile channel readiness for all providers concurrently in
GET /api/channels/providers.
Also resolve the code-quality unused-import flag in the blocking-io
anchor by pre-importing the channel service via importlib.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Fix prettier formatting in provider-state test
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Reconcile UI runtime channel config with config reload on restart
Main now reloads a channel's config.yaml entry on restart_channel()
(#3514, issue #3497). Adapt the user-owned connection flow to coexist:
- configure_channel() restarts with reload_config=False — the caller
just supplied the authoritative config (browser-entered credentials
that are never written to config.yaml), so a file reload must not
clobber it with the stale on-disk entry.
- _load_channel_config() re-applies the UI runtime-store overlay used
at startup, so an operator-triggered restart keeps browser-entered
credentials for channels without a config.yaml entry and does not
resurrect a channel disconnected from the UI.
- Offload the reload's disk IO (config.yaml + runtime store) with
asyncio.to_thread, matching the blocking-IO policy on this branch.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
|
||
|
|
3ae82dc663
|
fix(mcp): add auth interceptor with channel user_id and keep header propagation to mcp tools (#3294)
* 修复channel中的user_id传递到interceptor中的bug, mcp可通过header传递user_id到mcp工具 Co-authored-by: Cursor <cursoragent@cursor.com> * fix(channel,mcp,gateway): normalize channel user_id and add regression tests Normalize external channel user ids into filesystem-safe runtime context while preserving raw channel_user_id, and document gateway user_id propagation semantics. Add regression coverage for channel user_id context mapping, gateway user_id precedence/internal-role behavior, and MCP interceptor header forwarding via meta.headers. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(auth,mcp): harden user id normalization and header handling Increase sanitized user-id digest suffix to 16 hex chars, replace internal system role magic string with a shared constant, and harden MCP header forwarding with Mapping type checks. Add regression tests for empty channel user_id handling, unsupported header types, and updated digest length behavior. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: zhongli <335302680@qq.com> Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
9c03a71a07
|
fix(gateway): preserve message additional_kwargs in normalize_input (#3132) (#3136)
* fix(gateway): preserve message additional_kwargs in normalize_input (#3132) The gateway's hand-rolled dict→message coercion only forwarded `content` and collapsed every role to `HumanMessage`, silently dropping the frontend's `additional_kwargs.files` payload (along with `id`, `name`, and ai/system/tool roles). Effect on issue #3132: - `UploadsMiddleware` saw no `files` on the last human message, so the just-uploaded file got bucketed under "previous messages" while the current turn was reported as `(empty)`. - The persisted human message had no `files`, so the attachment chip on the message disappeared the moment the optimistic UI cleared. Delegate the conversion to `langchain_core.messages.utils.convert_to_messages` so `additional_kwargs`, `id`, `name`, and non-human roles round-trip unchanged. * fix(gateway): convert malformed-message ValueError into HTTP 400 normalize_input now sits at the request boundary, so a malformed input.messages[N] dict (missing role/type/content, unsupported role, etc.) should surface as 400 with the offending index — not bubble out of FastAPI as 500. Per Copilot review on #3136. |
||
|
|
923f516deb
|
feat(trace):LangGraph -> lead_agent and set custom agent_name to run_name (#3101)
* feat(trace):LangGraph -> lead_agent and set user custom agent name to run_name * feat(trace):follow github copilot suggest * feat(trace):Refactor run_name resolution and improve test coverage |
||
|
|
1c96a6afc8
|
fix: keep new agent bootstrap in user scope (#2784) | ||
|
|
78633c69ac
|
fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2679)
* fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2677) When creating a custom agent via the web UI, SOUL.md was always written to the global base_dir/SOUL.md instead of agents/<name>/SOUL.md. Root cause: the bootstrap flow sends agent_name via body.context, but two layers were broken: 1. services.py only forwarded body.context keys into config["configurable"]; config["context"] was never populated. 2. worker.py constructed the parent Runtime with a hard-coded {thread_id, run_id} context, ignoring config["context"] entirely. After the langgraph >= 1.1.9 bump (#98a5b34f), ToolRuntime.context no longer falls back to configurable, so setup_agent's runtime.context.get("agent_name") returned None and the tool's silent agent_name=None -> base_dir fallback kicked in, overwriting the global SOUL.md. Fix: - services.py: extract merge_run_context_overrides() and write the whitelisted context keys into both configurable (legacy readers) and context (langgraph 1.1+ ToolRuntime consumers). - worker.py: extract _build_runtime_context() and merge config["context"] into the Runtime's context (without letting callers override thread_id/run_id). The base_dir fallback in setup_agent_tool.py is left in place because the IM /bootstrap channel command depends on it. That code path can be tightened in a follow-up. Adds regression tests covering both helpers. * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
b970993425
|
fix: read lead agent options from context (#2515)
* fix: read lead agent options from context * fix: validate runtime context config |
||
|
|
1fb5acee39
|
fix(gateway): prevent 400 error when client sends context with configurable (#1660)
* fix(gateway): prevent 400 error when client sends context with configurable Fixes #1290 LangGraph >= 0.6.0 rejects requests that include both 'configurable' and 'context' in the run config. If the client (e.g. useStream hook) sends a 'context' key, we now honour it and skip creating our own 'configurable' dict to avoid the conflict. When no 'context' is provided, we fall back to the existing 'configurable' behaviour with thread_id. * fix(gateway): address review feedback — warn on dual keys, fix runtime injection, add tests - Log a warning when client sends both 'context' and 'configurable' so it's no longer silently dropped (reviewer feedback) - Ensure thread_id is available in config['context'] when present so middlewares can find it there too - Add test coverage for the context path, the both-keys-present case, passthrough of other keys, and the no-config fallback * style: ruff format services.py --------- Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
c2ff59a5b1
|
fix(gateway): merge context field into configurable for langgraph-compat runs (#1699) (#1707)
The langgraph-compat layer dropped the DeerFlow-specific `context` field from run requests, causing agent config (subagent_enabled, is_plan_mode, thinking_enabled, etc.) to fall back to defaults. Add `context` to RunCreateRequest and merge allowlisted keys into config.configurable in start_run, with existing configurable values taking precedence. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
6ff60f2af1
|
fix(gateway): forward assistant_id as agent_name in build_run_config (#1667)
* fix(gateway): forward assistant_id as agent_name in build_run_config Fixes #1644 When the LangGraph Platform-compatible /runs endpoint receives a custom assistant_id (e.g. 'finalis'), the Gateway's build_run_config() silently ignored it — configurable['agent_name'] was never set, so make_lead_agent fell through to the default lead agent and SOUL.md was never loaded. Root cause (introduced in #1403): resolve_agent_factory() correctly falls back to make_lead_agent for all assistant_id values, but build_run_config() had no assistant_id parameter and never injected configurable['agent_name']. The full call chain: POST /runs (assistant_id='finalis') → resolve_agent_factory('finalis') # returns make_lead_agent ✓ → build_run_config(thread_id, ...) # no agent_name injected ✗ → make_lead_agent(config) → cfg.get('agent_name') → None → load_agent_soul(None) → base SOUL.md (doesn't exist) → None Fix: - Add keyword-only parameter to build_run_config(). - When assistant_id is set and differs from 'lead_agent', inject it as configurable['agent_name'] (matching the channel manager's existing _resolve_run_params() logic for IM channels). - Honour an explicit configurable['agent_name'] in the request body; assistant_id mapping only fills the gap when it is absent. - Remove stale log-only branch from resolve_agent_factory(); update docstring to explain the factory/configurable split. Tests added (test_gateway_services.py): - Custom assistant_id injects configurable['agent_name'] - 'lead_agent' assistant_id does NOT inject agent_name - None assistant_id does NOT inject agent_name - Explicit configurable['agent_name'] in request is not overwritten - resolve_agent_factory returns make_lead_agent for all inputs * style: format with ruff * fix: validate and normalize assistant_id to prevent path traversal Addresses Copilot review: strip/lowercase/replace underscores and reject names that don't match [a-z0-9-]+, consistent with ChannelManager._normalize_custom_agent_name(). --------- Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com> |
||
|
|
34e835bc33
|
feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403)
* feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli
Implement all core LangGraph Platform API endpoints in the Gateway,
allowing it to fully replace the langgraph-cli dev server for local
development. This eliminates a heavyweight dependency and simplifies
the development stack.
Changes:
- Add runs lifecycle endpoints (create, stream, wait, cancel, join)
- Add threads CRUD and search endpoints
- Add assistants compatibility endpoints (search, get, graph, schemas)
- Add StreamBridge (in-memory pub/sub for SSE) and async provider
- Add RunManager with atomic create_or_reject (eliminates TOCTOU race)
- Add worker with interrupt/rollback cancel actions and runtime context injection
- Route /api/langgraph/* to Gateway in nginx config
- Skip langgraph-cli startup by default (SKIP_LANGGRAPH_SERVER=0 to restore)
- Add unit tests for RunManager, SSE format, and StreamBridge
* fix: drain bridge queue on client disconnect to prevent backpressure
When on_disconnect=continue, keep consuming events from the bridge
without yielding, so the worker is not blocked by a full queue.
Only on_disconnect=cancel breaks out immediately.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove pytest import
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Fix default stream_mode to ["values", "messages-tuple"]
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Remove unused if_exists field from ThreadCreateRequest
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: address review comments on gateway LangGraph API
- Mount runs.py router in app.py (missing include_router)
- Normalize interrupt_before/after "*" to node list before run_agent()
- Use entry.id for SSE event ID instead of counter
- Drain bridge queue on disconnect when on_disconnect=continue
- Reuse serialization helper in wait_run() for consistent wire format
- Reject unsupported multitask_strategy with 400
- Remove SKIP_LANGGRAPH_SERVER fallback, always use Gateway
* feat: extract app.state access into deps.py
Encapsulate read/write operations for singleton objects (RunManager,
StreamBridge, checkpointer) held in app.state into a shared utility,
reducing repeated access patterns across router modules.
* feat: extract deerflow.runtime.serialization module with tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace duplicated serialization with deerflow.runtime.serialization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: extract app/gateway/services.py with run lifecycle logic
Create a service layer that centralizes SSE formatting, input/config
normalization, and run lifecycle management. Router modules will delegate
to these functions instead of using private cross-imported helpers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: wire routers to use services layer, remove cross-module private imports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff formatting to refactored files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(runtime): support LangGraph dev server and add compat route
- Enable official LangGraph dev server for local development workflow
- Decouple runtime components from agents package for better separation
- Provide gateway-backed fallback route when dev server is skipped
- Simplify lifecycle management using context manager in gateway
* feat(runtime): add Store providers with auto-backend selection
- Add async_provider.py and provider.py under deerflow/runtime/store/
- Support memory, sqlite, postgres backends matching checkpointer config
- Integrate into FastAPI lifespan via AsyncExitStack in deps.py
- Replace hardcoded InMemoryStore with config-driven factory
* refactor(gateway): migrate thread management from checkpointer to Store and resolve multiple endpoint failures
- Add Store-backed CRUD helpers (_store_get, _store_put, _store_upsert)
- Replace checkpoint-scanning search with two-phase strategy:
phase 1 reads Store (O(threads)), phase 2 backfills from checkpointer
for legacy/LangGraph Server threads with lazy migration
- Extend Store record schema with values field for title persistence
- Sync thread title from checkpoint to Store after run completion
- Fix /threads/{id}/runs/{run_id}/stream 405 by accepting both
GET and POST methods; POST handles interrupt/rollback actions
- Fix /threads/{id}/state 500 by separating read_config and
write_config, adding checkpoint_ns to configurable, and
shallow-copying checkpoint/metadata before mutation
- Sync title to Store on state update for immediate search reflection
- Move _upsert_thread_in_store into services.py, remove duplicate logic
- Add _sync_thread_title_after_run: await run task, read final
checkpoint title, write back to Store record
- Spawn title sync as background task from start_run when Store exists
* refactor(runtime): deduplicate store and checkpointer provider logic
Extract _ensure_sqlite_parent_dir() helper into checkpointer/provider.py
and use it in all three places that previously inlined the same mkdir logic.
Consolidate duplicate error constants in store/async_provider.py by importing
from store/provider.py instead of redefining them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): move SQLite helpers to runtime/store, checkpointer imports from store
_resolve_sqlite_conn_str and _ensure_sqlite_parent_dir now live in
runtime/store/provider.py. agents/checkpointer/provider and
agents/checkpointer/async_provider import from there, reversing the
previous dependency direction (store → checkpointer becomes
checkpointer → store).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): extract SQLite helpers into runtime/store/_sqlite_utils.py
Move resolve_sqlite_conn_str and ensure_sqlite_parent_dir out of
checkpointer/provider.py into a dedicated _sqlite_utils module.
Functions are now public (no underscore prefix), making cross-module
imports semantically correct. All four provider files import from
the single shared location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): use adelete_thread to fully remove thread checkpoints on delete
AsyncSqliteSaver has no adelete method — the previous hasattr check
always evaluated to False, silently leaving all checkpoint rows in the
database. Switch to adelete_thread(thread_id) which deletes every
checkpoint and pending-write row for the thread across all namespaces
(including sub-graph checkpoints).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): remove dead bridge_cm/ckpt_cm code and fix StrEnum lint
app.py had unreachable code after the async-with lifespan refactor:
bridge_cm and ckpt_cm were referenced but never defined (F821), and
the channel service startup/shutdown was outside the langgraph_runtime
block so it never ran. Move channel service lifecycle inside the
async-with block where it belongs.
Replace str+Enum inheritance in RunStatus and DisconnectMode with
StrEnum as suggested by UP042.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: format with ruff
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
|