deerflow2/skills/public
KKK 654354c624
test(skills): add evaluation + trigger analysis for systematic-literature-review (#2061)
* test(skills): add trigger eval set for systematic-literature-review skill

20 eval queries (10 should-trigger, 10 should-not-trigger) for use with
skill-creator's run_eval.py. Includes real-world SLR queries contributed
by @VANDRANKI (issue #1862 author) and edge cases for routing
disambiguation with academic-paper-review.

* test(skills): add grader expectations for SLR skill evaluation

5 eval cases with 39 expectations covering:
- Standard SLR flow (APA/BibTeX/IEEE format selection)
- Keyword extraction and search behavior
- Subagent dispatch for metadata extraction
- Report structure (themes, convergences, gaps, per-paper annotations)
- Negative case: single-paper routing to academic-paper-review
- Edge case: implicit SLR without explicit keywords

* refactor(skills): shorten SLR description for better trigger rate

Reduce description from 833 to 344 chars. Key changes:
- Lead with "systematic literature review" as primary trigger phrase
- Strengthen single-paper exclusion: "Not for single-paper tasks"
- Remove verbose example patterns that didn't improve routing

Tested with run_eval.py (10 runs/query):
- False positive "best paper on RL": 67% → 20% (improved)
- True positive explicit SLR query: ~30% (unchanged)

Low recall is a routing-layer limitation, not a description issue; see the
PR description for the full analysis.

* Potential fix for pull request finding

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-10 18:02:45 +08:00
academic-paper-review feat(skills): add academic-paper-review, code-documentation, and newsletter-generation skills (#1861) 2026-04-05 10:19:35 +08:00
bootstrap feat(agent):Supports custom agent and chat experience with refactoring (#957) 2026-03-03 21:32:01 +08:00
chart-visualization feat(agent):Supports custom agent and chat experience with refactoring (#957) 2026-03-03 21:32:01 +08:00
claude-to-deerflow feat: add claude-to-deerflow skill for DeerFlow API integration (#1024) 2026-03-08 22:06:24 +08:00
code-documentation feat(skills): add academic-paper-review, code-documentation, and newsletter-generation skills (#1861) 2026-04-05 10:19:35 +08:00
consulting-analysis fix(skill): enhance data authenticity protocols and clarify reporting guidelines (#905) 2026-02-25 22:25:23 +08:00
data-analysis fix: use subprocess instead of os.system in analyze.py (#1289) 2026-03-24 20:42:03 +08:00
deep-research feat(agent):Supports custom agent and chat experience with refactoring (#957) 2026-03-03 21:32:01 +08:00
find-skills feat: add find-skills skill for discovering agent skills 2026-02-01 23:54:08 +08:00
frontend-design refactor: refine skills 2026-01-21 21:22:56 +08:00
github-deep-research feat: Support gitHub PAT configuration for higher github API accessing rate. (#1374) 2026-03-27 09:54:14 +08:00
image-generation fix: issue 1138 windows encoding (#1139) 2026-03-16 16:53:12 +08:00
newsletter-generation feat(skills): add academic-paper-review, code-documentation, and newsletter-generation skills (#1861) 2026-04-05 10:19:35 +08:00
podcast-generation fix: add error handling for podcast generation failures (#1257) 2026-03-24 00:20:12 +08:00
ppt-generation fix: issue 1138 windows encoding (#1139) 2026-03-16 16:53:12 +08:00
skill-creator fix: issue 1138 windows encoding (#1139) 2026-03-16 16:53:12 +08:00
surprise-me docs: update description for surprise-me skill to enhance clarity 2026-02-07 10:51:43 +08:00
systematic-literature-review test(skills): add evaluation + trigger analysis for systematic-literature-review (#2061) 2026-04-10 18:02:45 +08:00
vercel-deploy-claimable feat: use list of links 2026-02-02 13:25:21 +08:00
video-generation fix: issue 1138 windows encoding (#1139) 2026-03-16 16:53:12 +08:00
web-design-guidelines fix: fix skill md path 2026-01-20 21:10:05 +08:00