deerflow2/backend/docs
Ryker_Feng 0bbbbc06f4
feat(community): add Serper Google Images provider for image_search (#3575)
* feat(community): add Serper Google Images provider for image_search

Add a Serper-backed `image_search` tool alongside the existing Serper
`web_search` provider, so users with a SERPER_API_KEY can pull Google
Images results as reference images for downstream image generation.

- Share request/response handling between web_search and image_search
  via `_serper_post` / `_response_items`, with bounded `max_results`
  (capped at 10) and query normalization.
- Add a best-effort SSRF guard (`_safe_public_url`) that rejects
  non-http(s), localhost and private/non-global IP image URLs; filtered
  entries are dropped and never consume the result limit.
- doctor: flag literal `api_key` values in config as a warning and steer
  users toward `.env` + `$SERPER_API_KEY`.
- Docs/config: document the Serper image_search provider and SERPER_API_KEY,
  and discourage committing literal keys to config.yaml.
- Tests: cover the provider end-to-end (100% line coverage on tools.py)
  and the doctor literal-key warning path.
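
A minimal sketch of the filtering behavior described in the bullets above, assuming plain standard-library checks. The names `is_safe_public_url` and `filter_images`, the `imageUrl` field, and the lower bound of 1 on `max_results` are illustrative assumptions; this is not the code in `tools.py`.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_public_url(url: str) -> bool:
    """Hypothetical best-effort check; not the project's `_safe_public_url`."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parts.scheme not in ("http", "https"):
        return False
    if not host or host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: treated as a hostname. DNS is not resolved here,
        # which is why the check is only best-effort.
        return True
    return ip.is_global


def filter_images(items, max_results=10):
    """Drop unsafe entries first, so they never count toward the limit."""
    limit = max(1, min(int(max_results), 10))  # cap at 10; floor of 1 is assumed
    kept = []
    for item in items:
        if is_safe_public_url(item.get("imageUrl", "")):
            kept.append(item)
        if len(kept) >= limit:
            break
    return kept
```

Because filtering happens before counting, a request for two results still returns two public images even when a private URL sits between them in the raw response.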

* fix(community): block obfuscated IPv4 literals in Serper image SSRF guard

The image_search SSRF guard only rejected dotted-decimal IP literals; encoded
forms such as decimal (http://2130706433/), hex (0x7f000001) and octal
(0177.0.0.1) raised ValueError in ip_address() and were allowed through, even
though many HTTP clients resolve them to private addresses like 127.0.0.1.

Add _decode_ipv4() to permissively decode these inet_aton-style encodings and
apply the same is_global check; hostnames that do not decode to an IP (e.g.
cafe.com) are still treated as hosts and left to fetch-time re-validation.

Addresses PR review feedback. Tests cover decimal/hex/octal loopback and
private encodings plus non-IP edge cases; tools.py stays at 100% line coverage.
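
The decoding described above can be sketched roughly as follows. This is an illustration of inet_aton-style parsing under stated assumptions, not the project's `_decode_ipv4`; the names `decode_ipv4` and `_parse_part` are hypothetical.

```python
import ipaddress

# Digits permitted for each numeric base that inet_aton-style parsing accepts.
_DIGITS = {8: "01234567", 10: "0123456789", 16: "0123456789abcdef"}


def _parse_part(text: str) -> int:
    """Parse one dotted component as hex (0x...), octal (0...) or decimal."""
    if text[:2].lower() == "0x":
        digits, base = text[2:], 16
    elif len(text) > 1 and text[0] == "0":
        digits, base = text[1:], 8
    else:
        digits, base = text, 10
    digits = digits.lower()
    if not digits or any(c not in _DIGITS[base] for c in digits):
        raise ValueError(f"not a numeric component: {text!r}")
    return int(digits, base)


def decode_ipv4(host: str):
    """Return an IPv4Address for inet_aton-style hosts, else None."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    try:
        values = [_parse_part(p) for p in parts]
    except ValueError:
        return None  # e.g. "cafe.com": a hostname, not an encoded IP
    *head, last = values
    if any(v > 0xFF for v in head):
        return None
    # The final component fills all remaining bytes (so "127.1" is valid).
    if last >= 1 << (8 * (4 - len(head))):
        return None
    number = last
    for i, v in enumerate(head):
        number |= v << (8 * (3 - i))
    return ipaddress.IPv4Address(number)
```

A caller would then apply the same `is_global` test to the decoded address as to a dotted-decimal literal, and keep treating a `None` result as an ordinary hostname.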

* test(community): cover IPv4-mapped IPv6 URL filtering

* fix(community): address Serper image search review feedback

- Block trailing-dot hostname SSRF bypass (localhost./127.0.0.1.) in
  _safe_public_url by stripping the FQDN root label before checks.
- Keep a filtered image/thumbnail URL empty instead of collapsing onto
  its counterpart, preserving the high-res/preview contract.
- Evaluate the SSRF guard once per field rather than twice.
- Treat a null-typed organic/images field as "no results" rather than a
  malformed payload.
- doctor.py: when a config $VAR is unset, fall through to the default env
  var before reporting it as not set.
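
As a rough illustration of the doctor behavior in the last bullet, together with the literal-key warning from the first commit, the resolution order might look like the sketch below. The function name `check_api_key`, its return shape, and the status strings are assumptions made for this example; `doctor.py` itself may be structured differently.

```python
import os


def check_api_key(configured, default_env="SERPER_API_KEY", env=None):
    """Hypothetical helper: return (status, detail) for a config api_key value."""
    env = os.environ if env is None else env
    if configured and not configured.startswith("$"):
        # A literal key in config is flagged instead of being used silently.
        return ("warn", f"literal api_key in config; move it to .env and use ${default_env}")
    if configured:
        name = configured[1:]  # strip the leading "$"
        if env.get(name):
            return ("ok", name)
    # The referenced $VAR is unset (or nothing was configured):
    # fall through to the default env var before reporting a failure.
    if env.get(default_env):
        return ("ok", default_env)
    return ("fail", f"{default_env} is not set")
```

With this ordering, a config that references an unset `$MY_KEY` still passes when `SERPER_API_KEY` is exported, and only reports "not set" when both are missing.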
2026-06-18 07:36:35 +08:00
API.md fix: add MCP tools cache reset endpoint (#3602) 2026-06-16 23:20:20 +08:00
APPLE_CONTAINER.md Fix command syntax for container image pull (#1349) 2026-03-26 00:14:08 +08:00
ARCHITECTURE.md fix: add MCP tools cache reset endpoint (#3602) 2026-06-16 23:20:20 +08:00
AUTH_DESIGN.md docs: document auth design and user isolation (#2913) 2026-05-12 23:07:11 +08:00
AUTH_TEST_DOCKER_GAP.md docs: clean gateway runtime transition remnants (#3334) 2026-06-02 10:03:28 +08:00
AUTH_TEST_PLAN.md docs: clean standalone LangGraph server remnants (#3301) 2026-05-29 11:36:45 +08:00
AUTH_UPGRADE.md docs: clean gateway runtime transition remnants (#3334) 2026-06-02 10:03:28 +08:00
AUTO_TITLE_GENERATION.md docs: fix some broken links (#1864) 2026-04-05 15:35:42 +08:00
BLOCKING_IO_DETECTION.md feat(skill): add blocking-io-guard — SOP skill for blocking-IO triage and runtime anchors (#3503) 2026-06-12 10:20:38 +08:00
CONFIGURATION.md feat(community): add Serper Google Images provider for image_search (#3575) 2026-06-18 07:36:35 +08:00
FILE_UPLOAD.md fix(uploads): enforce streaming upload limits in gateway (#2589) 2026-05-01 20:19:30 +08:00
GUARDRAILS.md fix: rename present_file to present_files in docs and prompts (#2393) 2026-04-21 16:10:14 +08:00
IM_CHANNEL_CONNECTIONS.md fix(channels): require bound identity for user-owned IM messages (#3578) 2026-06-16 23:04:39 +08:00
MCP_SERVER.md docs: discourage MCP filesystem workspace config (#3141) 2026-05-22 09:19:23 +08:00
MEMORY_IMPROVEMENTS_SUMMARY.md refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
MEMORY_IMPROVEMENTS.md feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) (#3465) 2026-06-10 23:26:15 +08:00
MEMORY_SETTINGS_REVIEW.md feat: support manual add and edit for memory facts (#1538) 2026-03-29 23:53:23 +08:00
memory-settings-sample.json feat: support manual add and edit for memory facts (#1538) 2026-03-29 23:53:23 +08:00
middleware-execution-flow.md feat(loop-detection): defer warning injection (#2752) 2026-05-21 14:36:07 +08:00
PATH_EXAMPLES.md refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
plan_mode_usage.md refactor(lead-agent): make build_middlewares public to drop the last cross-module private import (#3458) 2026-06-09 11:56:28 +08:00
README.md chore: add sandbox memory profiling tools (#3249) 2026-06-03 22:02:27 +08:00
REPLAY_E2E.md fix(replay-e2e): key fixtures by caller and conversation (#3453) 2026-06-09 21:58:31 +08:00
rfc-create-deerflow-agent.md feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203) 2026-03-29 15:31:18 +08:00
rfc-extract-shared-modules.md refactor: extract shared skill installer and upload manager to harness (#1202) 2026-03-25 16:28:33 +08:00
rfc-grep-glob-tools.md feat(sandbox): add built-in grep and glob tools (#1784) 2026-04-03 16:03:06 +08:00
SANDBOX_MEMORY_PROFILING.md chore: add sandbox memory profiling tools (#3249) 2026-06-03 22:02:27 +08:00
SETUP.md fix(harness): resolve runtime paths from project root (#2642) 2026-05-01 22:19:50 +08:00
STREAMING.md fix(backend): stream DeerFlowClient AI text as token deltas (#1969) (#1974) 2026-04-10 18:16:38 +08:00
summarization.md fix(middleware): avoid rescuing non-skill tool outputs during summarization (#2458) 2026-04-24 21:19:46 +08:00
task_tool_improvements.md refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
TITLE_GENERATION_IMPLEMENTATION.md feat(persistence):Unified persistence layer with event store, feedback, and rebase cleanup (#2134) 2026-04-26 11:09:55 +08:00
TODO.md docs: clean standalone LangGraph server remnants (#3301) 2026-05-29 11:36:45 +08:00

Documentation

This directory contains detailed documentation for the DeerFlow backend.

| Document | Description |
|----------|-------------|
| ARCHITECTURE.md | System architecture overview |
| API.md | Complete API reference |
| AUTH_DESIGN.md | User authentication, CSRF, and per-user isolation design |
| CONFIGURATION.md | Configuration options |
| SETUP.md | Quick setup guide |

Feature Documentation

| Document | Description |
|----------|-------------|
| STREAMING.md | Token-level streaming design: Gateway vs DeerFlowClient paths, stream_mode semantics, per-id dedup |
| FILE_UPLOAD.md | File upload functionality |
| PATH_EXAMPLES.md | Path types and usage examples |
| SANDBOX_MEMORY_PROFILING.md | Sandbox memory baseline and runtime comparison guide |
| summarization.md | Context summarization feature |
| plan_mode_usage.md | Plan mode with TodoList |
| AUTO_TITLE_GENERATION.md | Automatic title generation |

Development

| Document | Description |
|----------|-------------|
| TODO.md | Planned features and known issues |

Getting Started

  1. New to DeerFlow? Start with SETUP.md for quick installation
  2. Configuring the system? See CONFIGURATION.md
  3. Understanding the architecture? Read ARCHITECTURE.md
  4. Building integrations? Check API.md for API reference

Document Organization

docs/
├── README.md                  # This file
├── ARCHITECTURE.md            # System architecture
├── API.md                     # API reference
├── AUTH_DESIGN.md             # User authentication and isolation design
├── CONFIGURATION.md           # Configuration guide
├── SETUP.md                   # Setup instructions
├── FILE_UPLOAD.md             # File upload feature
├── PATH_EXAMPLES.md           # Path usage examples
├── summarization.md           # Summarization feature
├── plan_mode_usage.md         # Plan mode feature
├── STREAMING.md               # Token-level streaming design
├── AUTO_TITLE_GENERATION.md   # Title generation
├── TITLE_GENERATION_IMPLEMENTATION.md  # Title implementation details
└── TODO.md                    # Roadmap and issues