deerflow2/backend/packages/harness/deerflow/agents/middlewares
SHIYAO ZHANG 87f41d3ae8 feat(uploads): inject document outline into agent context for converted files (#1738)
* feat(uploads): inject document outline into agent context for converted files

Extract headings from converted .md files and inject them into the
<uploaded_files> context block so the agent can navigate large documents
by line number before reading.

- Add `extract_outline()` to `file_conversion.py`: recognises standard
  Markdown headings (#/##/###) and SEC-style bold structural headings
  (**ITEM N. BUSINESS**, **PART II**); caps at 50 entries; excludes
  cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES)
- Add `_extract_outline_for_file()` helper in `uploads_middleware.py`:
  looks for a sibling `.md` file produced by the conversion pipeline
- Update `UploadsMiddleware._create_files_message()` to render the outline
  under each file entry with `L{line}: {title}` format and a `read_file`
  prompt for range-based reading
- Tests: 10 new tests for `extract_outline()`, 4 new tests for outline
  injection in `UploadsMiddleware`; existing test updated for new `outline`
  field in `uploaded_files` state

Partially addresses #1647 (agent ignores uploaded files).

* fix(uploads): stream outline file reads and strip inline bold from heading titles

- Switch extract_outline() from read_text().splitlines() to open()+line iteration
  so large converted documents are not loaded into memory on every agent turn;
  exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion)
- Strip **...** wrapper from standard Markdown heading titles before appending
  to outline so agent context stays clean (e.g. "## **Overview**" → "Overview")
  (Copilot suggestion)
- Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py
  to satisfy ruff CI lint

* fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES

When extract_outline() hits the cap it now appends a sentinel entry
{"truncated": True} instead of silently dropping the rest of the headings.
UploadsMiddleware reads the sentinel and renders a hint line:

  ... (showing first 50 headings; use `read_file` to explore further)

Without this the agent had no way to know the outline was incomplete and
would treat the first 50 headings as the full document structure.

* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id

runtime.context does not always carry thread_id (depends on LangGraph
invocation path). ThreadDataMiddleware already falls back to
get_config().configurable.thread_id — apply the same pattern so
UploadsMiddleware can resolve the uploads directory and attach outlines
in all invocation paths.

* style: apply ruff format

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-03 20:52:47 +08:00
..
__init__.py feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203) 2026-03-29 15:31:18 +08:00
clarification_middleware.py fix: replace print() with logging across harness package (#1282) 2026-03-27 23:15:35 +08:00
dangling_tool_call_middleware.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
deferred_tool_filter_middleware.py feat(tools): add tool_search for deferred MCP tool loading (#1176) 2026-03-17 20:43:55 +08:00
llm_error_handling_middleware.py Fix/1681 llm call retry handling (#1683) 2026-04-02 10:12:17 +08:00
loop_detection_middleware.py feat(harness): integration ACP agent tool (#1344) 2026-03-26 14:20:18 +08:00
memory_middleware.py feat(memory): structured reflection + correction detection in MemoryMiddleware (#1620) (#1668) 2026-04-01 16:45:29 +08:00
sandbox_audit_middleware.py feat(sandbox): add SandboxAuditMiddleware for bash command security auditing (#1532) 2026-03-30 07:48:31 +08:00
subagent_limit_middleware.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
thread_data_middleware.py fix: replace print() with logging across harness package (#1282) 2026-03-27 23:15:35 +08:00
title_middleware.py fix: add sync after_model to TitleMiddleware (#1190) 2026-03-19 15:46:31 +08:00
todo_middleware.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
token_usage_middleware.py feat: add configurable log level and token usage tracking (#1301) 2026-03-25 08:13:26 +08:00
tool_error_handling_middleware.py fix: enable DanglingToolCallMiddleware for subagents (#1766) 2026-04-02 18:56:18 +08:00
uploads_middleware.py feat(uploads): inject document outline into agent context for converted files (#1738) 2026-04-03 20:52:47 +08:00
view_image_middleware.py fix: replace print() with logging across harness package (#1282) 2026-03-27 23:15:35 +08:00