# Phase 07: Archiving Post-Acceptance Phase 06 Patches (mention/upload Semantics and Attachment Preview Reuse) - Research
**Researched:** 2026-04-15
**Domain:** Frontend/backend mention/upload semantics convergence, attachment preview component reuse, memory cleanup, and verification archiving
**Confidence:** HIGH
## User Constraints (from CONTEXT.md)
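No `*-CONTEXT.md` exists under the `07-phase-06-mention-upload` directory, so there are no Locked Decisions/Discretion/Deferred items to copy verbatim. [VERIFIED: codebase grep `.planning/phases/07-phase-06-mention-upload/*-CONTEXT.md`]
The hard constraints derived from this objective are: the workaround changes already accepted in Phase 06 are formally brought into Phase 07, and the scope must cover unified mention/upload semantics, attachment preview reuse, memory cleanup, and a verifiable commit path. [VERIFIED: user objective]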
## Summary
The key code-level patches from Phase 06 have already landed in the repository: the frontend sends uploads + mentions through a single `additional_kwargs.files` envelope; the backend `UploadsMiddleware` distinguishes `ref_kind=mention` and injects `<mentioned_files>` separately; and `new_files` no longer wrongly absorbs mentions. [VERIFIED: codebase grep `frontend/src/core/threads/hooks.ts`, `frontend/src/core/threads/submit-files.ts`, `backend/.../uploads_middleware.py`]
The memory side also has a cleanup chain: `MemoryMiddleware` strips `<uploaded_files>/<mentioned_files>` before enqueueing, and `MemoryUpdater` removes upload-event sentences and facts before persisting. The corresponding regression tests exist and pass locally. [VERIFIED: codebase grep `backend/.../memory_middleware.py`, `backend/.../memory/updater.py`, `backend/tests/test_memory_upload_filtering.py`; VERIFIED: test run `uv run pytest -q tests/test_memory_upload_filtering.py`]
The core of Phase 07 is not "building new features" but "closing the archiving and verification loop": unify the terminology contract, fix the attachment preview reuse boundary, repair E2E selector drift, and sync the UAT/Validation/Requirements document status, so that there is an auditable commit path. [VERIFIED: codebase grep `.planning/phases/06-/06-VERIFICATION.md`, `.planning/phases/06-/06-UAT.md`, `.planning/REQUIREMENTS.md`; VERIFIED: test run `pnpm -s test:e2e --grep "DF-INPUT-007|DF-INPUT-008|DF-INPUT-009"`]
**Primary recommendation:** Execute Phase 07 in four stages, `docs/contract-fix -> test-fix -> re-verify -> archive`, and do not expand the feature surface any further. [VERIFIED: repo state + phase goal]
## Project Constraints (from CLAUDE.md)
No `CLAUDE.md` exists in the project root, so there are no additional project-level mandatory constraints. [VERIFIED: filesystem check `test -f CLAUDE.md`]
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `@radix-ui/react-dropdown-menu` | repo: `^2.1.16`; npm latest: `2.1.16` (2025-08-13) | mention candidate panel (keyboard/focus/positioning) | Already implemented in the input box and consistent with the existing shadcn setup, which avoids a forked custom popover. [VERIFIED: codebase grep `frontend/src/components/workspace/input-box.tsx`; VERIFIED: npm registry `npm view @radix-ui/react-dropdown-menu version time`] |
| `sonner` | repo: `^2.0.7`; npm latest: `2.0.7` (2025-08-02) | stale/limit notifications | Existing error prompts already use toast semantics, which keeps soft-failure behavior consistent. [VERIFIED: codebase grep `toast.error` in `hooks.ts`/`input-box.tsx`; VERIFIED: npm registry `npm view sonner version time`] |
| `PromptInputAttachment` (internal component) | repo internal | Thumbnail preview of attachments/references in the input area | The reference preview already reuses this component; it is the reuse baseline Phase 07 should lock in. [VERIFIED: codebase grep `frontend/src/components/workspace/input-box.tsx`, `frontend/src/components/ai-elements/prompt-input.tsx`] |
| `UploadsMiddleware` + `MemoryMiddleware` (internal middleware) | repo internal | upload/mention injection and pre-enqueue memory cleanup | Semantic layering is in place: `uploaded_files` and `mentioned_files` are separated, and memory filtering has two lines of defense. [VERIFIED: codebase grep `backend/.../uploads_middleware.py`, `backend/.../memory_middleware.py`, `backend/.../memory/updater.py`] |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| `@playwright/test` | repo: `^1.48.0`; CLI: `1.48.0` | Frontend @-reference regression | Verify that DF-INPUT-007/008/009 agree with the testid contract. [VERIFIED: `frontend/package.json`; VERIFIED: command `pnpm exec playwright --version`] |
| `pytest` via `uv run` | backend dev: `pytest>=8.0.0` | Backend middleware/memory regression | Use `uv run pytest` when no global `pytest` is available on the machine. [VERIFIED: `backend/pyproject.toml`; VERIFIED: env check `command -v pytest`; VERIFIED: test run] |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
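| `DropdownMenu` | Custom absolutely positioned popover | A custom layer is more prone to drifting from focus management and E2E selectors. [VERIFIED: historical phase docs + current selector mismatch] |
| `PromptInputAttachment` reuse | New mention-only preview component | Would duplicate the remove/image-thumbnail behavior and increase UI behavior divergence. [VERIFIED: code comparison in `input-box.tsx` + `prompt-input.tsx`] |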
**Installation:**
```bash
cd frontend && pnpm install
cd backend && uv sync
```
## Architecture Patterns
### Recommended Project Structure
```text
frontend/src/components/workspace/input-box.tsx # mention candidates + reference preview
frontend/src/core/threads/submit-files.ts # files envelope normalization
frontend/src/core/threads/hooks.ts # send path + stale soft failure
backend/packages/harness/.../uploads_middleware.py # uploaded/mentioned semantic split
backend/packages/harness/.../memory_middleware.py # strip tags before enqueueing
backend/packages/harness/.../memory/updater.py # clean upload events before persisting
backend/tests/test_uploads_middleware_core_logic.py # mention/upload backend regression
backend/tests/test_memory_upload_filtering.py # memory cleanup regression
frontend/tests/e2e/input-and-compose.spec.ts # DF-INPUT-007/008/009
```
[VERIFIED: codebase grep]
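### Pattern 1: Single Submission Envelope + Semantic Discriminator Fields
**What:** Everything goes through `additional_kwargs.files`, with `ref_kind/ref_source` distinguishing mention from upload. [VERIFIED: `submit-files.ts`, `hooks.ts`, `uploads_middleware.py`]
**When to use:** All message-level file context (uploads/references) should follow this pattern. [VERIFIED: current implementation]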
**Example:**
```typescript
// Source: frontend/src/core/threads/submit-files.ts
referenceFiles.push({
  filename: reference.filename,
  size: reference.size ?? 0,
  path: reference.path,
  status: "uploaded",
  ref_kind: "mention",
  ref_source: reference.ref_source,
});
```
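### Pattern 2: Reuse `PromptInputAttachment` for Input-Area Previews
**What:** Reference previews and upload attachment previews use the same rendering component. [VERIFIED: `input-box.tsx` + `prompt-input.tsx`]
**When to use:** The preview bar at the top of the input area (including image thumbnails and the remove action). [VERIFIED: current UI structure]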
**Example:**
```tsx
// Source: frontend/src/components/workspace/input-box.tsx
<PromptInputAttachment
  data={{ type: "file", id: `reference:${reference.ref_source}:${reference.path ?? reference.filename}`, filename, mediaType, url }}
  onRemove={() => onRemoveReference(reference)}
/>
```
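### Pattern 3: Two-Layer Memory Cleanup
**What:** Strip tags before enqueueing + clean sentences/facts before persisting. [VERIFIED: `memory_middleware.py`, `updater.py`]
**When to use:** Any middleware chain that writes session-transient file paths into context. [VERIFIED: existing middleware design]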
**Example:**
```python
# Source: backend/packages/harness/deerflow/agents/middlewares/memory_middleware.py
stripped = _UPLOAD_BLOCK_RE.sub("", content_str).strip()
```
### Anti-Patterns to Avoid
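- **Opening a parallel field (e.g. `mentions`):** Breaks the existing `additional_kwargs.files` consumption chain. [VERIFIED: `hooks.ts`, `message-list-item.tsx`]
- **Letting mentions enter `new_files`:** References get misclassified as uploads from the current turn, polluting `<uploaded_files>`. [VERIFIED: `uploads_middleware.py` tests]
- **E2E depending on a nonexistent testid:** `reference-chip-remove` currently has no implementation, which causes false-red regressions. [VERIFIED: grep `reference-chip-remove` only in test files]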
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
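| mention candidate popover | Custom positioning/focus layer | `DropdownMenu*` component family | Avoids divergence in keyboard focus and dismissal timing. [VERIFIED: `input-box.tsx`] |
| Reference thumbnail preview | A new chip/thumbnail implementation | `PromptInputAttachment` | Already covers both image and file rendering plus the remove interaction. [VERIFIED: `prompt-input.tsx`] |
| memory upload cleanup | Single-point string replacement | `memory_middleware` + `updater` two-layer filtering | If one layer misses something, the other still catches it. [VERIFIED: code + `test_memory_upload_filtering.py`] |
**Key insight:** The value of Phase 07 lies in "closing out", not "expanding scope". Any newly reinvented wheel would reintroduce the inconsistencies Phase 06 already resolved. [VERIFIED: phase artifacts + current code]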
## Common Pitfalls
### Pitfall 1: Test selector drift causing false regression reports
**What goes wrong:** The E2E assertion on `reference-chip-remove` fails, but the feature is not necessarily broken. [VERIFIED: test run output]
**Why it happens:** After the preview component was reused, the remove button's testid was not aligned with the old test cases. [VERIFIED: grep results]
**How to avoid:** Add a stable selector to the reused component, or update the test cases to query by aria-label. [ASSUMED]
**Warning signs:** `DF-INPUT-007` fails in isolation while `reference-chip` is still visible. [VERIFIED: test run output]
### Pitfall 2: mention/upload semantic regression
**What goes wrong:** mentions are counted as `uploaded_files`. [VERIFIED: historical issue + tests]
**Why it happens:** `_files_from_kwargs` does not filter `ref_kind=mention`. [VERIFIED: `uploads_middleware.py`]
**How to avoid:** Keep the filter and guard it with a mixed-list test. [VERIFIED: `test_uploads_middleware_core_logic.py`]
**Warning signs:** Entries with source=mention appear in `<uploaded_files>`. [VERIFIED: middleware behavior]
### Pitfall 3: Session-transient file paths written into long-term memory
**What goes wrong:** Later sessions repeatedly look up old paths that no longer exist. [VERIFIED: `updater.py` docstring/comments]
**Why it happens:** Upload tags/sentences are not stripped in the memory pipeline. [VERIFIED: `memory_middleware.py`, `updater.py`]
**How to avoid:** Keep the two-layer cleanup and run `test_memory_upload_filtering.py`. [VERIFIED: test pass]
**Warning signs:** `/mnt/user-data/uploads/` appears in memory facts. [VERIFIED: regex intent]
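The warning sign for Pitfall 3 can be turned into a quick check. The following is a minimal sketch, not the logic in `updater.py`: the function name `find_leaked_facts` and the assumed fact shape (a dict with a `content` string) are hypothetical and only illustrate scanning memory facts for the `/mnt/user-data/uploads/` path.

```python
# Hypothetical helper: the fact shape ({"content": str}) is an assumption,
# not the repository's actual memory schema.
UPLOAD_PATH_MARKER = "/mnt/user-data/uploads/"


def find_leaked_facts(facts: list[dict]) -> list[str]:
    """Return the text of every fact that still mentions an upload path."""
    leaked = []
    for fact in facts:
        text = str(fact.get("content", ""))
        if UPLOAD_PATH_MARKER in text:
            leaked.append(text)
    return leaked


sample_facts = [
    {"content": "User prefers concise answers"},
    {"content": "User uploaded /mnt/user-data/uploads/report.pdf"},
]
leaked = find_leaked_facts(sample_facts)
```

A non-empty result would suggest that one of the two cleanup layers let a transient path through.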
## Code Examples
### Splitting mention from upload (backend)
```python
# Source: backend/packages/harness/deerflow/agents/middlewares/uploads_middleware.py
if f.get("ref_kind") == "mention":
    continue
```
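To make the split concrete, here is a minimal, self-contained sketch of partitioning a files envelope by `ref_kind`. It is an illustration under assumptions, not the code in `uploads_middleware.py`: the function name `split_files` is hypothetical, and the sample `ref_source` value is a placeholder.

```python
from typing import Any


def split_files(
    files: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition a files envelope into (uploads, mentions) by ``ref_kind``."""
    uploads: list[dict[str, Any]] = []
    mentions: list[dict[str, Any]] = []
    for f in files:
        if not isinstance(f, dict):
            continue  # ignore malformed entries
        if f.get("ref_kind") == "mention":
            mentions.append(f)  # referenced file, never a new upload
        else:
            uploads.append(f)
    return uploads, mentions


# Mixed list: one real upload and one mention of an existing file.
envelope = [
    {"filename": "report.pdf", "path": "/mnt/user-data/uploads/report.pdf", "status": "uploaded"},
    {
        "filename": "chart.png",
        "path": "/mnt/user-data/uploads/chart.png",
        "status": "uploaded",
        "ref_kind": "mention",
        "ref_source": "upload",  # placeholder value (assumption)
    },
]
uploads, mentions = split_files(envelope)
```

A mixed list like `envelope` is the kind of input a regression test can use to confirm that mentions never land in the upload bucket.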
### Building the single files envelope (frontend)
```typescript
// Source: frontend/src/core/threads/hooks.ts
const { files: filesForSubmit, staleCount } = buildFilesForSubmit(
  uploadedFileInfo,
  normalizedReferences,
);
```
### Stripping memory tags (middleware)
```python
# Source: backend/packages/harness/deerflow/agents/middlewares/memory_middleware.py
_UPLOAD_BLOCK_RE = re.compile(
    r"<(?:uploaded_files|mentioned_files)>[\s\S]*?</(?:uploaded_files|mentioned_files)>\n*",
    re.IGNORECASE,
)
```
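As a quick illustration of what this kind of pattern does to a message, here is a minimal sketch. The names `UPLOAD_BLOCK_RE` and `strip_file_blocks` and the sample message are hypothetical; the pattern is a standalone copy written for the example, not an import from the middleware.

```python
import re

# Standalone copy of the tag-block pattern, for illustration only.
UPLOAD_BLOCK_RE = re.compile(
    r"<(?:uploaded_files|mentioned_files)>[\s\S]*?</(?:uploaded_files|mentioned_files)>\n*",
    re.IGNORECASE,
)


def strip_file_blocks(content: str) -> str:
    """Remove <uploaded_files>/<mentioned_files> blocks and trim whitespace."""
    return UPLOAD_BLOCK_RE.sub("", content).strip()


message = (
    "<uploaded_files>\n- report.pdf\n</uploaded_files>\n\n"
    "<mentioned_files>\n- chart.png\n</mentioned_files>\n"
    "Please summarize the report."
)
cleaned = strip_file_blocks(message)
```

The non-greedy `[\s\S]*?` keeps each match confined to one block, so text between two blocks would survive.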
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
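| mention and upload handled in the same pool | `ref_kind/ref_source` explicitly distinguished and injected as separate blocks | Late Phase 06 (2026-04-15) | Eliminates the "reference treated as upload" side effect. [VERIFIED: git log + middleware code] |
| memory relied only on prompt constraints to avoid recording uploads | middleware + updater two-layer code filtering | Already in the current working tree | Reduces long-term memory pollution. [VERIFIED: `memory_middleware.py`, `updater.py`, tests] |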
**Deprecated/outdated:**
- Judging Phase 06 completion solely from document status (out-of-sync status leads to misjudgment). [VERIFIED: status differences between `06-VERIFICATION.md` and `06-UAT.md`/`REQUIREMENTS.md`]
## Assumptions Log
| # | Claim | Section | Risk if Wrong |
|---|-------|---------|---------------|
| A1 | Adding a `data-testid` or switching to aria assertions is enough to stabilize DF-INPUT-007 | Common Pitfalls | Deeper UI structure changes may be needed. |
## Open Questions
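1. **Should Phase 07 "change code", or only "archive docs + fix tests"?**
   - What we know: The main semantics and memory code paths are already in place. [VERIFIED: code + tests]
   - What's unclear: Whether you accept fixing only the test contract and closing the documentation loop, without touching the feature implementation again.
   - Recommendation: Lock in a "minimal change" principle first, to keep Phase 07 from introducing further behavior drift. [ASSUMED]
2. **Should E2E assertions switch to accessibility semantics?**
   - What we know: The `reference-chip-remove` testid is currently missing. [VERIFIED: grep + test output]
   - What's unclear: Whether the team prefers stable testids or aria text assertions.
   - Recommendation: For stability across refactors, prefer aria; for minimal change, add the testid. [ASSUMED]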
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| Node.js | frontend tests/tooling | ✓ | v24.14.0 | — |
| pnpm | frontend scripts | ✓ | 10.32.1 | `npm` (not recommended; lockfile mismatch) |
| Playwright CLI | DF-INPUT E2E | ✓ | 1.48.0 | — |
| Python | backend tests | ✓ | 3.12.3 | — |
| uv | backend test runner | ✓ | 0.10.10 | — |
| pytest (global) | backend tests | ✗ | — | `uv run pytest` |
[VERIFIED: local command checks]
**Missing dependencies with no fallback:**
- None. [VERIFIED: local checks]
**Missing dependencies with fallback:**
- Global `pytest` is missing; use `uv run pytest`. [VERIFIED: local checks + successful runs]
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | Node test runner + Playwright + pytest (via uv) |
| Config file | `frontend/playwright.config.ts`, `backend/pyproject.toml` |
| Quick run command | `cd frontend && node --test src/core/threads/hooks.test.ts` |
| Full suite command | `cd backend && uv run pytest -q tests/test_uploads_middleware_core_logic.py tests/test_memory_upload_filtering.py && cd ../frontend && pnpm -s test:e2e --grep "DF-INPUT-007|DF-INPUT-008|DF-INPUT-009"` |
[VERIFIED: codebase files + executed commands]
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| P7-SEM-01 | mentions are not counted as new uploads | unit | `cd backend && uv run pytest -q tests/test_uploads_middleware_core_logic.py -k "mention or files_from_kwargs"` | ✅ |
| P7-MEM-01 | memory does not retain upload events | unit | `cd backend && uv run pytest -q tests/test_memory_upload_filtering.py` | ✅ |
| P7-UI-01 | @-candidate/reference chip interaction is stable | e2e | `cd frontend && pnpm -s test:e2e --grep "DF-INPUT-007|DF-INPUT-008|DF-INPUT-009"` | ✅ (currently has failures) |
| P7-DOC-01 | Acceptance status documents form a closed loop | docs check | `rg -n "ATREF-01|ATREF-02|ATREF-03|ATREF-04|status:" .planning/REQUIREMENTS.md .planning/phases/06-/06-UAT.md .planning/phases/06-/06-VALIDATION.md` | ✅ |
### Sampling Rate
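- **Per task commit:** Run the corresponding minimal command (frontend unit or backend targeted pytest). [VERIFIED: commit guide + current tests]
- **Per wave merge:** Run both backend tests + the three frontend E2E cases. [VERIFIED: current phase scope]
- **Phase gate:** Enter verify-work only after all three test categories are green and document status is synced. [VERIFIED: verification gaps]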
### Wave 0 Gaps
- [ ] `frontend/tests/e2e/input-and-compose.spec.ts` is not aligned with the component selector contract (`reference-chip-remove`). [VERIFIED: test failure + grep]
- [ ] `.planning/phases/06-/06-UAT.md` status has not been updated to the latest results. [VERIFIED: file content]
- [ ] `ATREF-01..04` in `.planning/REQUIREMENTS.md` are still Pending. [VERIFIED: file content]
## Security Domain
### Applicable ASVS Categories
| ASVS Category | Applies | Standard Control |
|---------------|---------|-----------------|
| V2 Authentication | no | This phase adds no new auth surface. [VERIFIED: scope] |
| V3 Session Management | no | The session mechanism is unchanged. [VERIFIED: scope] |
| V4 Access Control | yes | mention candidates are restricted to the current thread's data sources. [VERIFIED: `input-box.tsx` + phase docs] |
| V5 Input Validation | yes | Backend `_files_from_kwargs` validates filename/path. [VERIFIED: `uploads_middleware.py`] |
| V6 Cryptography | no | No changes to cryptographic implementation. [VERIFIED: scope] |
### Known Threat Patterns for this phase stack
| Pattern | STRIDE | Standard Mitigation |
|---------|--------|---------------------|
| Cross-thread file reference leakage | Information Disclosure | Candidates are drawn only from the current thread's artifacts/uploads. [VERIFIED: `input-box.tsx`] |
| Forged `additional_kwargs.files` injection | Tampering | Backend validates the basename and the `/mnt/user-data/` prefix. [VERIFIED: `uploads_middleware.py`] |
| Temporary paths leaking via memory | Information Disclosure | middleware + updater two-layer filtering of upload tags and sentences. [VERIFIED: memory code + tests] |
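The Tampering mitigation above (basename plus `/mnt/user-data/` prefix validation) can be sketched as follows. This is an illustrative approximation only, not the checks in `_files_from_kwargs`: the function name `is_safe_file_entry`, the path normalization step, and the choice to treat `path` as optional are all assumptions.

```python
import posixpath

ALLOWED_PREFIX = "/mnt/user-data/"


def is_safe_file_entry(entry: object) -> bool:
    """Hypothetical validator for one item of ``additional_kwargs.files``."""
    if not isinstance(entry, dict):
        return False
    filename = entry.get("filename")
    if not isinstance(filename, str) or not filename:
        return False
    # Basename check: reject names that carry directory components.
    if posixpath.basename(filename) != filename or "\\" in filename:
        return False
    if filename in (".", ".."):
        return False
    path = entry.get("path")
    if path is None:
        return True  # assumption: path is optional in the envelope
    if not isinstance(path, str):
        return False
    # Normalize first so ".." segments cannot escape the allowed prefix.
    return posixpath.normpath(path).startswith(ALLOWED_PREFIX)


ok = is_safe_file_entry(
    {"filename": "a.txt", "path": "/mnt/user-data/uploads/a.txt"}
)
escaped = is_safe_file_entry(
    {"filename": "a.txt", "path": "/mnt/user-data/../etc/passwd"}
)
```

Normalizing before the prefix comparison is a design choice of this sketch; whether the real middleware does the same has not been verified here.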
## Sources
### Primary (HIGH confidence)
- This repository's code: `frontend/src/components/workspace/input-box.tsx`, `frontend/src/components/ai-elements/prompt-input.tsx`, `frontend/src/core/threads/hooks.ts`, `frontend/src/core/threads/submit-files.ts`. [VERIFIED: codebase grep]
- This repository's code: `backend/packages/harness/deerflow/agents/middlewares/uploads_middleware.py`, `memory_middleware.py`, `memory/updater.py`. [VERIFIED: codebase grep]
- Local execution results: `node --test`, `uv run pytest`, `pnpm test:e2e --grep ...`. [VERIFIED: command output]
- npm registry: versions and release dates of `@radix-ui/react-dropdown-menu` and `sonner`. [VERIFIED: npm view]
### Secondary (MEDIUM confidence)
- Cross-comparison of status across `.planning/phases/06-/06-VERIFICATION.md`, `06-UAT.md`, `06-VALIDATION.md`, and `.planning/REQUIREMENTS.md`. [VERIFIED: local docs]
### Tertiary (LOW confidence)
- None.
## Metadata
**Confidence breakdown:**
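- Standard stack: HIGH - Based on the current repository dependencies and a direct npm registry check.
- Architecture: HIGH - Key paths all have code and test evidence.
- Pitfalls: MEDIUM - Partly currently observed failures, partly experience-based anti-regression advice.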
**Research date:** 2026-04-15
**Valid until:** 2026-05-15 (30 days)