Commit Graph

9 Commits

Author SHA1 Message Date
Willem Jiang 99a44f1350 feat(eval): add report quality evaluation module and UI integration (#776)
* feat(eval): add report quality evaluation module

Addresses issue #773 - How to evaluate generated report quality objectively.

This module provides two evaluation approaches:
1. Automated metrics (no LLM required):
   - Citation count and source diversity
   - Word count compliance per report style
   - Section structure validation
   - Image inclusion tracking

2. LLM-as-Judge evaluation:
   - Factual accuracy scoring
   - Completeness assessment
   - Coherence evaluation
   - Relevance and citation quality checks

The combined evaluator provides a final score (1-10) and letter grade (A+ to F).

Files added:
- src/eval/__init__.py
- src/eval/metrics.py
- src/eval/llm_judge.py
- src/eval/evaluator.py
- tests/unit/eval/test_metrics.py
- tests/unit/eval/test_evaluator.py

* feat(eval): integrate report evaluation with web UI

This commit adds the web UI integration for the evaluation module:

Backend:
- Add EvaluateReportRequest/Response models in src/server/eval_request.py
- Add /api/report/evaluate endpoint to src/server/app.py

Frontend:
- Add evaluateReport API function in web/src/core/api/evaluate.ts
- Create EvaluationDialog component with grade badge, metrics display,
  and optional LLM deep evaluation
- Add evaluation button (graduation cap icon) to research-block.tsx toolbar
- Add i18n translations for English and Chinese

The evaluation UI allows users to:
1. View quick metrics-only evaluation (instant)
2. Optionally run deep LLM-based evaluation for detailed analysis
3. See grade (A+ to F), score (1-10), and metric breakdown

* feat(eval): improve evaluation reliability and add LLM judge tests

- Extract MAX_REPORT_LENGTH constant in llm_judge.py for maintainability
- Add comprehensive unit tests for LLMJudge class (parse_response,
  calculate_weighted_score, evaluate with mocked LLM)
- Pass reportStyle prop to EvaluationDialog for accurate evaluation criteria
- Add researchQueries store map to reliably associate queries with research
- Add getResearchQuery helper to retrieve query by researchId
- Remove unused imports in test_metrics.py

* fix(eval): use resolveServiceURL for evaluate API endpoint

The evaluateReport function was using a relative URL '/api/report/evaluate'
which sent requests to the Next.js server instead of the FastAPI backend.
Changed to use resolveServiceURL() consistent with other API functions.

* fix: improve type accuracy and React hooks in evaluation components

- Fix get_word_count_target return type from Optional[Dict] to Dict since it always returns a value via default fallback
- Fix useEffect dependency issue in EvaluationDialog using useRef to prevent unwanted re-evaluations
- Add aria-label to GradeBadge for screen reader accessibility
2025-12-25 21:55:48 +08:00
Jiahe Wu d1ce339090 feat(web): add multi-format report export (Markdown, HTML, PDF, Word,… (#756)
* feat(web): add multi-format report export (Markdown, HTML, PDF, Word, Image)

* fix: correct import order for docx (lint error)

* fix(web): address Copilot review comments for multi-format export

- Add i18n support for dropdown menu items (en/zh)

- Add DOMPurify for HTML sanitization (XSS protection)

- Fix async handling for canvas.toBlob with Promise wrapper

- Add toast notifications for export errors

- Fix Tooltip + DropdownMenuTrigger nesting (accessibility)

- Ensure container cleanup in finally block

* fix(web): enhance markdown parsing for PDF and Word export

- Add list support (bullet and numbered) for PDF export
- Add parseInlineMarkdown helper for Word export to handle bold, italic, code, links
- Add list support for Word export (bullet and numbered)
- Address Copilot review comments from PR #756

* fix(web): address PR review feedback for multi-format export

- Extract PDF formatting magic numbers into PDF_CONSTANTS

- Add Tooltip wrapper for download dropdown button

- Reduce triggerDownload cleanup timeout from 1000ms to 100ms

- Use marked.Lexer.lexInline for robust markdown parsing

- Add console.warn for image export cleanup errors

- Add numbering config for Word document ordered lists

- Fix CSS class typo: px-5pb-20 -> px-5 pb-20

- Remove unreachable dead code in parseInlineMarkdown

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2025-12-16 09:06:24 +08:00
Willem Jiang 452ecd77b9 fix: prevent DOM error when removing temporary download link (#675) (#676)
Add defensive checks before removeChild to prevent 'Failed to execute removeChild' error when the element has already been removed from DOM. Wrap URL.revokeObjectURL in finally block to ensure proper resource cleanup.
2025-10-31 22:30:34 +08:00
johnny0120 5e1b2dfb22 feat: add i18n support and add Chinese (#372)
* feat: add i18n support and add Chinese

* fix: resolve conflicts

* Update en.json with cancle settings

* Update zh.json with settngs cancle

---------

Co-authored-by: johnny0120 <15564476+johnny0120@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
2025-07-12 15:18:28 +08:00
Muharrem Okutan 446901ec0b feat: added report download button (#78) 2025-06-11 09:50:48 +08:00
JeffJiang c99513d604 fix: report editor styles (#163)
* fix: report editor styles
2025-05-15 15:18:01 +08:00
JeffJiang 0f9ac7cd23 Check the output links are hallucinations from AI (#139)
* feat: check output links if a hallucination from AI
2025-05-15 10:39:53 +08:00
Henry Li 8109de3b73 fix: allow the first activity to be reporting (#8) 2025-05-09 10:32:49 +08:00
Li Xin fdfc607747 refactor: extract `components` folder 2025-05-02 10:43:14 +08:00