deerflow2/backend/packages/harness/deerflow/agents/middlewares
SHIYAO ZHANG 2e635634e2 fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline (#1838)
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents

Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.

This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this):  agent guided to use outline + grep + read_file workflow

* feat(uploads): add file-first priority and fallback guidance to uploaded_files context

* fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline

- Add _clean_bold_title() to merge adjacent bold spans (** **) produced
  by pymupdf4llm when bold text crosses span boundaries
- Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>**
  headings common in academic papers; excludes pure-number table headers
  and rows with more than 4 bold blocks
- When outline is empty, read first 5 non-empty lines of the .md as a
  content preview and surface a grep hint in the agent context
- Update _format_file_entry to render the preview + grep hint instead of
  silently omitting the outline section
- Add 3 new extract_outline tests and 2 new middleware tests (65 total)

* fix(uploads): address Copilot review comments on extract_outline regex

- Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII
  titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded
- Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to
  keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input
- Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex)
- Log debug message with exc_info when preview extraction fails
2026-04-04 14:25:08 +08:00
..
__init__.py feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203) 2026-03-29 15:31:18 +08:00
clarification_middleware.py fix: replace print() with logging across harness package (#1282) 2026-03-27 23:15:35 +08:00
dangling_tool_call_middleware.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
deferred_tool_filter_middleware.py feat(tools): add tool_search for deferred MCP tool loading (#1176) 2026-03-17 20:43:55 +08:00
llm_error_handling_middleware.py Fix/1681 llm call retry handling (#1683) 2026-04-02 10:12:17 +08:00
loop_detection_middleware.py fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware (#1823) 2026-04-04 10:38:22 +08:00
memory_middleware.py feat(memory): structured reflection + correction detection in MemoryMiddleware (#1620) (#1668) 2026-04-01 16:45:29 +08:00
sandbox_audit_middleware.py feat(sandbox): add SandboxAuditMiddleware for bash command security auditing (#1532) 2026-03-30 07:48:31 +08:00
subagent_limit_middleware.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
thread_data_middleware.py fix: replace print() with logging across harness package (#1282) 2026-03-27 23:15:35 +08:00
title_middleware.py fix: add sync after_model to TitleMiddleware (#1190) 2026-03-19 15:46:31 +08:00
todo_middleware.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
token_usage_middleware.py feat: add configurable log level and token usage tracking (#1301) 2026-03-25 08:13:26 +08:00
tool_error_handling_middleware.py fix: enable DanglingToolCallMiddleware for subagents (#1766) 2026-04-02 18:56:18 +08:00
uploads_middleware.py fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline (#1838) 2026-04-04 14:25:08 +08:00
view_image_middleware.py fix: replace print() with logging across harness package (#1282) 2026-03-27 23:15:35 +08:00