deerflow2/backend/packages/harness/deerflow
zengxi 330a2ff8c5
feat(community): add SearXNG and Browserless web search/fetch tools (#3451)
* feat(community): add SearXNG and Browserless web search/fetch tools

- SearXNG web_search: privacy-focused meta search engine integration
  with configurable base_url via config.yaml tool settings
- Browserless web_fetch: headless browser page fetching with
  readability article extraction
- Both tools are fully configurable through tool config section
- No external API keys required for basic operation

* fix: address PR review feedback and add unit tests

- Guard config.model_extra against None values (review #1, #2)
- Coerce max_results to int when reading from config (review #2)
- Fix web_fetch_tool to use direct HTTP fetch instead of reusing
  the web_search client config (review #3)
- Fix misleading docstring for SearxngClient.fetch (review #4)
- Remove unused target_url variable to pass Ruff lint (review #5)
- Normalize bool config values with _normalize_bool helper to
  handle env-resolved string values correctly (review #6)
- Add unit tests for both SearXNG and Browserless client classes
  and their tool functions with mocked httpx (review #7, #8)

* fix: convert to async httpx to avoid blocking I/O on event loop

- Replace httpx.Client with httpx.AsyncClient in both client classes
- Convert tool functions to async def
- Wrap readability_extractor calls in asyncio.to_thread()
- Update all tests to use pytest.mark.asyncio and async mocks
- Fix import sorting to pass Ruff lint

* fix(browserless): replace deprecated waitUntil with waitForEvent

The Browserless API has deprecated the waitUntil parameter.
Replace with waitForEvent which accepts values like 'networkidle'.
Default is empty (no wait), configurable via config.yaml.

* fix(browserless): remove deprecated gotoTimeout and bestAttempt params

The Browserless /content API does not accept gotoTimeout or bestAttempt
as top-level payload keys. These were being sent unconditionally,
causing 400 Bad Request errors on current Browserless versions.

Changes:
- Remove goto_timeout_ms parameter and 'gotoTimeout' from payload
- Remove best_attempt parameter and 'bestAttempt' from payload
- Remove _normalize_bool helper (no longer needed)
- Remove goto_timeout_ms and best_attempt config reading in tools.py
- Add tests for waitForSelector and reject params
- Verify no deprecated params are sent in test_fetch_html_success

* refactor(searxng): remove web_fetch_tool, decouple from web_search config

SearXNG is a search engine — it should only provide web_search_tool.
The web_fetch responsibility belongs to Browserless (headless Chrome)
or Jina AI, not SearXNG.

Changes:
- Remove web_fetch_tool from SearXNG tools.py and __init__.py
- Remove SearxngClient.fetch() method (no longer needed)
- Remove unused asyncio/readability imports from SearXNG tools.py
- Add test for max_results string-to-int coercion from config
- Add test for search with categories parameter
- Add test for httpx.RequestError handling
- Apply ruff format fixes to browserless_client.py and test files
2026-06-12 09:45:26 +08:00
..
agents feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) (#3465) 2026-06-10 23:26:15 +08:00
community feat(community): add SearXNG and Browserless web search/fetch tools (#3451) 2026-06-12 09:45:26 +08:00
config fix(agents): require config.yaml in resolve_agent_dir to skip memory-only directories (#3390) (#3481) 2026-06-10 23:57:17 +08:00
guardrails feat(guardrails): add pre-tool-call authorization middleware with pluggable providers (#1240) 2026-03-23 18:07:33 +08:00
mcp fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379) (#3392) 2026-06-07 21:37:30 +08:00
models feat(models): add StepFun reasoning model adapter (#3461) 2026-06-09 18:01:43 +08:00
persistence fix: harden run finalization persistence (#3155) 2026-05-23 00:09:06 +08:00
reflection refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
runtime fix runtime journal run lifecycle events (#3470) 2026-06-10 08:33:29 +08:00
sandbox fix(sandbox): persist lazily-acquired sandbox state via Command (#3464) 2026-06-11 17:50:36 +08:00
skills fix(skills): harden slash skill activation across chat channels (#3466) 2026-06-09 23:07:17 +08:00
subagents feat(subagents): extend deferred MCP tool loading to subagents (#3432) 2026-06-08 23:17:22 +08:00
tools feat(subagents): extend deferred MCP tool loading to subagents (#3432) 2026-06-08 23:17:22 +08:00
tracing fix(tracing): propagate session_id and user_id into Langfuse traces (#2944) 2026-05-21 16:49:31 +08:00
uploads fix upload file size contract (#3408) 2026-06-06 15:12:17 +08:00
utils fix(skills): harden slash skill activation across chat channels (#3466) 2026-06-09 23:07:17 +08:00
__init__.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
client.py feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) (#3465) 2026-06-10 23:26:15 +08:00