deerflow2

greatmengqi b1aabe88b8 fix(backend): stream DeerFlowClient AI text as token deltas (#1969) (#1974)		2026-04-10 18:16:38 +08:00

* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)

  DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values", "custom"], which only delivers full-state snapshots at graph-node boundaries, so AI replies were dumped as a single messages-tuple event per node instead of streaming token-by-token. `client.stream("hello")` looked identical to `client.chat("hello")`, which is the bug reported in #1969.

  Subscribe to "messages" mode as well, forward AIMessageChunk deltas as messages-tuple events with delta semantics (consumers accumulate by id), and dedup the values-snapshot path so it does not re-synthesize AI text that was already streamed. Introduce a per-id usage_metadata counter so the final AIMessage in the values snapshot and the final "messages" chunk, which carry the same cumulative usage, are not double-counted.

  chat() now accumulates per-id deltas and returns the last message's full accumulated text. Non-streaming mock sources (single event per id) are a degenerate case of the same logic, keeping existing callers and tests backward compatible.

  Verified end-to-end against a real LLM: a 15-number count emits 35 messages-tuple events with BPE subword boundaries clearly visible ("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across the window, end-event usage matches the values-snapshot usage exactly (not doubled). tests/test_client_live.py::TestLiveStreaming passes.

  New unit tests:
  - test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3 delta events with correct content/id/usage, values-snapshot does not duplicate, usage counted once.
  - test_chat_accumulates_streamed_deltas: chat() rebuilds full text from deltas.
  - test_messages_mode_tool_message: ToolMessage delivered via messages mode is not duplicated by the values-snapshot synthesis path.

  The stream() docstring now documents why this client does not reuse Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw LangChain objects vs serialized dicts, single caller vs HTTP fan-out).

  Fixes #1969

* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)

  Post-review cleanup for the token-level streaming fix. No behavior change for correct inputs; one efficiency regression fixed.

  Fix: chat() O(n²) accumulator
  -----------------------------
  `chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`, which is O(n) per concat → O(n²) total over a streamed response. At ~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks it costs roughly 100-300 ms of pure copying. Switched to `dict[str, list[str]]` + `"".join()` once at return.

  Cleanup
  -------
  - Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`, and `_tool_message_event` static helpers. The messages-mode and values-mode branches previously repeated four inline dict literals each; they now call the same builders.
  - `StreamEvent.type` is now typed as `Literal["values", "messages-tuple", "custom", "end"]` via a `StreamEventType` alias. Makes the closed set explicit and catches typos at type-check time.
  - Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`, `.tool_calls`, `.id` all have default values on the base class, so the `getattr(..., None)` fallbacks were dead code. Removed from the hot path.
  - `_account_usage` parameter type loosened to `Any` so that LangChain's `UsageMetadata` TypedDict is accepted under strict type checking.
  - Trimmed narrating comments on `seen_ids` / `streamed_ids` / the values-synthesis skip block; kept the non-obvious ones that document the cross-mode dedup invariant.

  Net diff: -15 lines. All 132 unit tests + harness boundary test still pass; ruff check and ruff format pass.

* docs(backend): add STREAMING.md design note (#1969)

  Dedicated design document for the token-level streaming architecture, prompted by the bug investigation in #1969.

  Contents:
  - Why two parallel streaming paths exist (Gateway HTTP/async vs DeerFlowClient sync/in-process) and why they cannot be merged.
  - LangGraph's three-layer mode naming (Graph "messages" vs Platform SDK "messages-tuple" vs HTTP SSE) and why a shared string constant would be harmful.
  - Gateway path: run_agent + StreamBridge + sse_consumer with a sequence diagram.
  - DeerFlowClient path: sync generator + direct yield, delta semantics, chat() accumulator.
  - Why the three id sets (seen_ids / streamed_ids / counted_usage_ids) each carry an independent invariant and cannot be collapsed.
  - End-to-end sequence for a real conversation turn.
  - Lessons from #1969: why mock-based tests missed the bug, why BPE subword boundaries in live output are the strongest correctness signal, and the regression test that locks it in.
  - Source code location index.

  Also:
  - Link from backend/CLAUDE.md Embedded Client section.
  - Link from backend/docs/README.md under Feature Documentation.

* test(backend): add refactor regression guards for stream() (#1969)

  Three new tests in TestStream that lock the contract introduced by PR #1974 so any future refactor (sync->async migration, sharing a core with Gateway's run_agent, dedup strategy change) cannot silently change behavior.

  - test_dedup_requires_messages_before_values_invariant: canary that documents the order-dependence of cross-mode dedup. streamed_ids is populated only by the messages branch, so values-before-messages for the same id produces duplicate AI text events. Real LangGraph never inverts this order, but a refactor that does (or that makes dedup idempotent) must update this test deliberately.
  - test_messages_mode_golden_event_sequence: locks the exact event sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1 end) for a canonical streaming turn. List equality gives a clear diff on any drift in order, type, or payload shape.
  - test_chat_accumulates_in_linear_time: perf canary for the O(n^2) fix in commit 1f11ba10. 10,000 single-char chunks must accumulate in under 1s; the threshold is wide enough to pass on slow CI but tight enough to fail if buffer = buffer + delta is restored.

  All three tests pass alongside the existing 12 TestStream tests (15/15). ruff check + ruff format clean.

* docs(backend): clarify stream() docstring on JSON serialization (#1969)

  Replace the misleading "raw LangChain objects (AIMessage, usage_metadata as dataclasses), not dicts" claim in the "Why not reuse Gateway's run_agent?" section. The implementation already yields plain Python dicts (StreamEvent.data is dict, and usage_metadata is a TypedDict), so the original wording suggested a richer return type than the API actually delivers.

  The corrected wording focuses on what is actually true and relevant: this client skips the JSON/SSE serialization layer that Gateway adds for HTTP wire transmission, and yields stream event payloads directly as Python data structures.

  Addresses Copilot review feedback on PR #1974.

* test(backend): document none-id messages dedup limitation (#1969)

  Add test_none_id_chunks_produce_duplicates_known_limitation to TestStream that explicitly documents and asserts the current behavior when an LLM provider emits AIMessageChunk with id=None (vLLM, certain custom backends).

  The cross-mode dedup machinery cannot record a None id in streamed_ids (guarded by ``if msg_id:``), so the values snapshot's reassembled AIMessage with a real id falls through and synthesizes a duplicate AI text event. The test asserts len == 2 and locks this as a known limitation rather than silently letting future contributors hit it without context.

  Why this is documented rather than fixed:
    * Falling back to ``metadata.get("id")`` does not help: LangGraph's messages-mode metadata never carries the message id.
    * Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the values snapshot uses the same fallback, which it does not.
    * A real fix requires provider cooperation (always emit chunk ids) or content-based dedup (false-positive risk), neither of which belongs in this PR.

  If a real fix lands, replace this test with a positive assertion that dedup works for None-id chunks.

  Addresses Copilot review feedback on PR #1974 (client.py:515).

* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)

  - Fix `font-norma` typo to `font-normal` in message-list subtask count
  - Fix dark mode `--border` using reddish hue (22.216) instead of neutral
  - Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
  - Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
  - Add missing `font-sans` to welcome description `<pre>` for consistency
  - Make case-study-section padding responsive (`px-4 md:px-20`)

  Closes #1940

* docs: clarify deployment sizing guidance (#1963)

* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)

  After history.replaceState updates the URL from /chats/new to /chats/{UUID}, Next.js useParams does not update because replaceState bypasses the router. The useEffect in useThreadChat would then set threadIdFromPath ('new') as the threadId, causing the LangGraph SDK to call POST /threads/new/history which returns HTTP 422 (Invalid thread ID: must be a UUID).

  This fix adds a guard to skip the threadId update when threadIdFromPath is the literal string 'new', preserving the already-correct UUID that was set when the thread was created.

* fix(frontend): avoid using route new as thread id (#1967)

  Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>

* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)

  * Fix event loop conflict in SubagentExecutor.execute()

    When SubagentExecutor.execute() is called from within an already-running event loop (e.g., when the parent agent uses async/await), calling asyncio.run() creates a new event loop that conflicts with asyncio primitives (like httpx.AsyncClient) that were created in and bound to the parent loop.

    This fix detects if we're already in a running event loop, and if so, runs the subagent in a separate thread with its own isolated event loop to avoid conflicts.

    Fixes: sub-task cards not appearing in Ultra mode when using async parent agents

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

  * fix(subagent): harden isolated event loop execution

  ---------

  Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(backend): remove dead getattr in _tool_message_event

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
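The commit message above describes delta semantics (consumers accumulate messages-tuple text by message id) and the switch from repeated string concatenation to a list of parts joined once. A minimal sketch of that consumer-side pattern follows. It is not the DeerFlowClient code: the function name `accumulate_deltas` and the plain-dict event shape (`type`, `data.id`, `data.content`) are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Iterable


def accumulate_deltas(events: Iterable[dict]) -> dict[str, str]:
    """Rebuild the full text per message id from delta events.

    Hypothetical event shape: {"type": ..., "data": {"id": ..., "content": ...}}.
    """
    # One list of parts per id; appending is amortized O(1), so the whole
    # stream is processed in linear time instead of O(n^2) string copying.
    buffers: dict[str, list[str]] = defaultdict(list)
    for event in events:
        if event.get("type") != "messages-tuple":
            continue  # values / custom / end events carry no text delta here
        data = event.get("data", {})
        msg_id = data.get("id")
        delta = data.get("content", "")
        if msg_id and delta:
            buffers[msg_id].append(delta)
    # Join each buffer exactly once at the end.
    return {msg_id: "".join(parts) for msg_id, parts in buffers.items()}


events = [
    {"type": "messages-tuple", "data": {"id": "ai-1", "content": "ele"}},
    {"type": "messages-tuple", "data": {"id": "ai-1", "content": "ven"}},
    {"type": "values", "data": {}},
    {"type": "end", "data": {}},
]
print(accumulate_deltas(events))  # {'ai-1': 'eleven'}
```

Chunks without an id are skipped in this sketch, which mirrors the `if msg_id:` guard mentioned in the commit message but says nothing about how the real client handles them.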
..
conftest.py	feat(dx): Setup Wizard + doctor command — closes #2030 (#2034 )	2026-04-10 17:43:39 +08:00
test_acp_config.py	feat(acp): add env field to ACPAgentConfig for subprocess env injection (#1447)	2026-03-27 20:03:30 +08:00
test_aio_sandbox.py	fix: prevent concurrent subagent file write conflicts in sandbox tools (#1714)	2026-04-02 15:39:41 +08:00
test_aio_sandbox_local_backend.py	fix: use safe docker bind mount syntax for sandbox mounts (#1655)	2026-04-01 11:42:12 +08:00
test_aio_sandbox_provider.py	fix Windows Docker sandbox path mounting (#1634)	2026-03-31 22:19:27 +08:00
test_app_config_reload.py	fix(config): reload AppConfig when config path or mtime changes (#1239)	2026-03-22 20:34:01 +08:00
test_artifacts_router.py	fix(gateway): enforce safe download for active artifact MIME types to mitigate stored XSS (#1389)	2026-03-26 17:44:25 +08:00
test_channel_file_attachments.py	Feature/feishu receive file (#1608)	2026-04-06 22:14:12 +08:00
test_channels.py	Feature/feishu receive file (#1608)	2026-04-06 22:14:12 +08:00
test_checkpointer.py	Move async SQLite mkdir off the event loop (#1921)	2026-04-07 10:47:20 +08:00
test_checkpointer_none_fix.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_clarification_middleware.py	fix(middleware): handle string-serialized options in ClarificationMiddleware (#1997)	2026-04-08 21:04:20 +08:00
test_claude_provider_oauth_billing.py	fix(oauth): Harden Claude OAuth cache-control handling (#1583)	2026-03-30 07:41:18 +08:00
test_cli_auth_providers.py	fix(provider): preserve streamed Codex output when response.completed.output is empty (#1928)	2026-04-07 18:21:22 +08:00
test_client.py	fix(backend): stream DeerFlowClient AI text as token deltas (#1969) (#1974)	2026-04-10 18:16:38 +08:00
test_client_e2e.py	[Security] Address critical host-shell escape in LocalSandboxProvider (#1547)	2026-03-29 21:03:58 +08:00
test_client_live.py	[Security] Address critical host-shell escape in LocalSandboxProvider (#1547)	2026-03-29 21:03:58 +08:00
test_codex_provider.py	fix: resolve missing serialized kwargs in PatchedChatDeepSeek (#2025)	2026-04-09 16:07:16 +08:00
test_config_version.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_create_deerflow_agent.py	fix(backend): preserve viewed image reducer metadata (#1900)	2026-04-06 16:47:19 +08:00
test_create_deerflow_agent_live.py	feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203)	2026-03-29 15:31:18 +08:00
test_credential_loader.py	feat: add Claude Code OAuth and Codex CLI as LLM providers (#1166)	2026-03-22 22:39:50 +08:00
test_custom_agent.py	fix: include soul field in GET /api/agents list response (fixes #1819) (#1863)	2026-04-05 10:49:58 +08:00
test_dangling_tool_call_middleware.py	test: add unit tests for DanglingToolCallMiddleware (#1305)	2026-03-26 00:20:08 +08:00
test_docker_sandbox_mode_detection.py	fix Windows Docker sandbox path mounting (#1634)	2026-03-31 22:19:27 +08:00
test_doctor.py	feat(dx): Setup Wizard + doctor command — closes #2030 (#2034 )	2026-04-10 17:43:39 +08:00
test_exa_tools.py	feat(community): add Exa search as community tool provider (#1357)	2026-04-08 17:13:39 +08:00
test_feishu_parser.py	Feature/feishu receive file (#1608)	2026-04-06 22:14:12 +08:00
test_file_conversion.py	fix(uploads): handle split-bold headings and artefacts in extract_outline (#1838)	2026-04-04 14:25:08 +08:00
test_firecrawl_tools.py	feat(dx): Setup Wizard + doctor command — closes #2030 (#2034 )	2026-04-10 17:43:39 +08:00
test_gateway_services.py	fix(gateway): prevent 400 error when client sends context with configurable (#1660)	2026-04-01 23:21:32 +08:00
test_guardrail_middleware.py	feat(guardrails): add pre-tool-call authorization middleware with pluggable providers (#1240)	2026-03-23 18:07:33 +08:00
test_harness_boundary.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_infoquest_client.py	feat(harness): integration ACP agent tool (#1344)	2026-03-26 14:20:18 +08:00
test_invoke_acp_agent_tool.py	fix ACP mcpServers payload (#1735)	2026-04-03 15:28:56 +08:00
test_jina_client.py	refactor: replace sync requests with async httpx in Jina AI client (#1603)	2026-04-01 17:02:39 +08:00
test_lead_agent_model_resolution.py	ci: enforce code formatting checks for backend and frontend (#1536)	2026-03-29 15:34:38 +08:00
test_lead_agent_prompt.py	fix(agent): file-io path guidance in agent prompts (#2019)	2026-04-09 16:12:34 +08:00
test_lead_agent_skills.py	fix(skill): make skill prompt cache refresh nonblocking (#1924)	2026-04-07 10:50:34 +08:00
test_llm_error_handling_middleware.py	Fix/1681 llm call retry handling (#1683)	2026-04-02 10:12:17 +08:00
test_local_bash_tool_loading.py	[Security] Address critical host-shell escape in LocalSandboxProvider (#1547)	2026-03-29 21:03:58 +08:00
test_local_sandbox_encoding.py	fix: add Windows shell fallback for local sandbox (#1505)	2026-03-29 21:31:29 +08:00
test_local_sandbox_provider_mounts.py	feat(sandbox): add read-only support for local sandbox path mappings (#1808)	2026-04-03 19:46:22 +08:00
test_loop_detection_middleware.py	fix(backend): make loop detection hash tool calls by stable keys (#1911)	2026-04-07 17:46:33 +08:00
test_mcp_client_config.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_mcp_oauth.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_mcp_sync_wrapper.py	feat(harness): integration ACP agent tool (#1344)	2026-03-26 14:20:18 +08:00
test_memory_prompt_injection.py	fix: inject longTermBackground into memory prompt (#1734)	2026-04-03 11:21:58 +08:00
test_memory_queue.py	fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804)	2026-04-05 16:23:00 +08:00
test_memory_router.py	feat(memory): structured reflection + correction detection in MemoryMiddleware (#1620) (#1668)	2026-04-01 16:45:29 +08:00
test_memory_storage.py	ci: enforce code formatting checks for backend and frontend (#1536)	2026-03-29 15:34:38 +08:00
test_memory_updater.py	fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804)	2026-04-05 16:23:00 +08:00
test_memory_upload_filtering.py	fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804)	2026-04-05 16:23:00 +08:00
test_model_config.py	feat(codex): support explicit OpenAI Responses API config (#1235)	2026-03-22 20:39:26 +08:00
test_model_factory.py	feat(config): add when_thinking_disabled support for model configs (#1970)	2026-04-09 18:49:00 +08:00
test_patched_deepseek.py	fix: resolve missing serialized kwargs in PatchedChatDeepSeek (#2025)	2026-04-09 16:07:16 +08:00
test_patched_minimax.py	fix: improve MiniMax code plan integration (#1169)	2026-03-20 17:18:59 +08:00
test_patched_openai.py	fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180) (#1205)	2026-03-26 15:07:05 +08:00
test_present_file_tool_core_logic.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_provisioner_kubeconfig.py	feat(subagents): make subagent timeout configurable via config.yaml (#897)	2026-02-25 08:39:29 +08:00
test_readability.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_reflection_resolvers.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_run_manager.py	fix: surface configured sandbox mounts to agents (#1638)	2026-03-31 22:22:30 +08:00
test_run_worker_rollback.py	feat: implement full checkpoint rollback on user cancellation (#1867)	2026-04-09 17:56:36 +08:00
test_sandbox_audit_middleware.py	feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881)	2026-04-07 17:15:24 +08:00
test_sandbox_orphan_reconciliation.py	fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976)	2026-04-09 17:21:23 +08:00
test_sandbox_orphan_reconciliation_e2e.py	fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976)	2026-04-09 17:21:23 +08:00
test_sandbox_search_tools.py	feat(sandbox): add built-in grep and glob tools (#1784)	2026-04-03 16:03:06 +08:00
test_sandbox_tools_security.py	fix: preserve virtual path separator style (#1828)	2026-04-05 15:52:22 +08:00
test_security_scanner.py	Implement skill self-evolution and skill_manage flow (#1874)	2026-04-06 22:07:11 +08:00
test_serialization.py	feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403)	2026-03-30 16:02:23 +08:00
test_serialize_message_content.py	feat(harness): integration ACP agent tool (#1344)	2026-03-26 14:20:18 +08:00
test_setup_wizard.py	feat(dx): Setup Wizard + doctor command — closes #2030 (#2034 )	2026-04-10 17:43:39 +08:00
test_skill_manage_tool.py	fix(skill): make skill prompt cache refresh nonblocking (#1924)	2026-04-07 10:50:34 +08:00
test_skills_archive_root.py	refactor: extract shared skill installer and upload manager to harness (#1202)	2026-03-25 16:28:33 +08:00
test_skills_custom_router.py	fix(skill): make skill prompt cache refresh nonblocking (#1924)	2026-04-07 10:50:34 +08:00
test_skills_installer.py	Fix Windows backend test compatibility (#1384)	2026-03-26 17:39:16 +08:00
test_skills_loader.py	Implement skill self-evolution and skill_manage flow (#1874)	2026-04-06 22:07:11 +08:00
test_skills_parser.py	fix(skills): support parsing multiline YAML strings in SKILL.md frontmatter (#1703)	2026-04-01 23:08:30 +08:00
test_skills_validation.py	test: add unit tests for skill frontmatter validation (#1309)	2026-03-27 20:20:31 +08:00
test_sse_format.py	feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403)	2026-03-30 16:02:23 +08:00
test_stream_bridge.py	Fix(#1702): stream resume run (#1858)	2026-04-06 14:51:10 +08:00
test_subagent_executor.py	Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)	2026-04-08 11:46:06 +08:00
test_subagent_limit_middleware.py	test: add unit tests for SubagentLimitMiddleware (#1306)	2026-03-25 10:20:16 +08:00
test_subagent_prompt_security.py	fix(agent): file-io path guidance in agent prompts (#2019)	2026-04-09 16:12:34 +08:00
test_subagent_timeout_config.py	chroe(config):Increase subagent max-turn limits (#1852)	2026-04-05 15:41:00 +08:00
test_suggestions_router.py	fix: unblock concurrent threads and workspace hydration (#1839)	2026-04-04 21:19:35 +08:00
test_task_tool_core_logic.py	fix(subagents): add cooperative cancellation for subagent threads (#1873)	2026-04-07 11:12:25 +08:00
test_thread_data_middleware.py	Fix Windows backend test compatibility (#1384)	2026-03-26 17:39:16 +08:00
test_threads_router.py	fix(threads): clean up local thread data after thread deletion (#1262)	2026-03-24 00:36:08 +08:00
test_title_generation.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_title_middleware_core_logic.py	fix: unblock concurrent threads and workspace hydration (#1839)	2026-04-04 21:19:35 +08:00
test_todo_middleware.py	test: add unit tests for TodoMiddleware (#1307)	2026-03-26 00:20:50 +08:00
test_token_usage.py	feat(harness): integration ACP agent tool (#1344)	2026-03-26 14:20:18 +08:00
test_tool_error_handling_middleware.py	refactor: split backend into harness (deerflow.) and app (app.) (#1131)	2026-03-14 22:55:52 +08:00
test_tool_output_truncation.py	fix: add output truncation to ls_tool to prevent context window overflow (#1896)	2026-04-06 15:09:57 +08:00
test_tool_search.py	fix: promote deferred tools after tool_search returns schema (#1570)	2026-03-30 11:23:15 +08:00
test_tracing_config.py	feat(tracing): add optional Langfuse support (#1717)	2026-04-02 13:06:10 +08:00
test_tracing_factory.py	feat(tracing): add optional Langfuse support (#1717)	2026-04-02 13:06:10 +08:00
test_uploads_manager.py	Fix Windows backend test compatibility (#1384)	2026-03-26 17:39:16 +08:00
test_uploads_middleware_core_logic.py	fix(uploads): handle split-bold headings and artefacts in extract_outline (#1838)	2026-04-04 14:25:08 +08:00
test_uploads_router.py	fix(sandbox): Relax upload permissions for aio sandbox sync (#1409)	2026-03-27 17:37:44 +08:00
test_vllm_provider.py	feat(models): add vLLM provider support (#1860)	2026-04-06 15:18:34 +08:00