How Many Papers Should I Screen?
Data-driven estimates of screening workload and how AI tools can reduce it significantly.
The Short Answer
In a traditional systematic review, you screen 100% of your search results. That typically means 1,000 to well over 10,000 papers at the title/abstract stage. With AI-assisted screening, you may only need to review 20–50% of records to find 95%+ of the relevant studies.
At a glance:
- Typical search results per review (median across domains)
- Average inclusion rate at the title/abstract stage
- Final studies included (median for published reviews)
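To make the short answer concrete, here is a minimal back-of-envelope sketch. The 35% AI screening fraction is an illustrative assumption in the middle of the 20–50% range quoted above, not a measured figure:

```python
# Back-of-envelope screening load. The 35% AI fraction is an
# illustrative assumption, not a guarantee.

def papers_to_screen(total_results: int, ai_fraction: float = 0.35) -> dict:
    """Title/abstract screening load, traditional vs AI-prioritized."""
    return {
        "manual": total_results,                        # screen everything
        "with_ai": round(total_results * ai_fraction),  # prioritized subset
    }

print(papers_to_screen(4_000))  # e.g. a mid-sized clinical search
```

Swap in your own search result count to estimate your load.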
Expected Screening Volume by Field
| Research Area | Typical Search Results | Inclusion Rate | With AI (est.) |
|---|---|---|---|
| Clinical Medicine | 2,000–8,000 | 2–5% | Screen 30–40% |
| Psychology | 1,500–5,000 | 3–8% | Screen 25–35% |
| Education | 1,000–4,000 | 5–10% | Screen 25–40% |
| Environmental Science | 2,000–10,000 | 1–3% | Screen 15–30% |
| Computer Science | 1,000–3,000 | 5–15% | Screen 30–50% |
Estimates based on meta-research and the SYNERGY benchmarking dataset. Actual numbers vary widely.
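The table's ranges can be turned into a rough estimate of how many eligible studies a search might surface. All numbers below are midpoints of the illustrative ranges above, so treat the output as a sketch, not a prediction:

```python
# Illustrative only: expected eligible studies per field, using the
# midpoints of the ranges from the table above.

fields = {
    # field: (midpoint of typical search results, midpoint of inclusion rate)
    "Clinical Medicine":     (5_000, 0.035),
    "Psychology":            (3_250, 0.055),
    "Education":             (2_500, 0.075),
    "Environmental Science": (6_000, 0.020),
    "Computer Science":      (2_000, 0.100),
}

eligible = {f: round(n * r) for f, (n, r) in fields.items()}
for field, count in eligible.items():
    print(f"{field}: ~{count} studies likely eligible at title/abstract")
```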
How Long Will Screening Take?
Screening typically takes between 30 seconds and 2 minutes per paper. Here's what that means for different dataset sizes:
| Papers | Manual (100%) | With AI (~30%) | Time Saved |
|---|---|---|---|
| 1,000 | ~17 hours | ~5 hours | 12 hours |
| 3,000 | ~50 hours | ~15 hours | 35 hours |
| 5,000 | ~83 hours | ~25 hours | 58 hours |
| 10,000 | ~167 hours | ~50 hours | 117 hours |
Based on 1 minute average per paper. AI percentage assumes 95%+ recall using active learning with stopping strategies.
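The table above can be reproduced in a few lines, which also makes it easy to plug in your own dataset size or screening speed (the 1 minute/paper and 30% figures are the same assumptions stated in the note):

```python
# Reproduces the time-saved table: 1 minute per paper on average,
# with AI screening ~30% of records.

def screening_hours(papers: int, fraction: float = 1.0,
                    minutes_per_paper: float = 1.0) -> float:
    return papers * fraction * minutes_per_paper / 60

for n in (1_000, 3_000, 5_000, 10_000):
    manual = screening_hours(n)
    with_ai = screening_hours(n, fraction=0.30)
    print(f"{n:>6} papers: manual ~{manual:.0f} h, "
          f"AI ~{with_ai:.0f} h, saved ~{manual - with_ai:.0f} h")
```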
Factors That Affect Screening Volume
Search Sensitivity vs. Specificity
Broader searches find more relevant papers but also more noise. A highly sensitive search for a Cochrane review might return 10,000+ results, versus around 2,000 for a narrower, more specific search.
Number of Databases
Searching 5 databases rather than 2 will roughly double your results, with cross-database duplicates typically accounting for ~20–40% of the pooled records.
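A quick sketch of that arithmetic, assuming a 30% duplicate rate (the midpoint of the range above) and hypothetical per-database hit counts:

```python
# Hypothetical hit counts from five databases; assumes ~30% of the
# pooled records are duplicates (midpoint of the 20–40% range).

def unique_records(hits_per_db: list[int], dup_rate: float = 0.30) -> int:
    return round(sum(hits_per_db) * (1 - dup_rate))

print(unique_records([1_200, 900, 800, 600, 500]))  # five databases
```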
Topic Popularity
Hot topics (COVID-19, AI, mental health) generate far more results than niche topics.
Inclusion Criteria Specificity
Very specific eligibility criteria lead to lower inclusion rates, meaning more papers to screen per included study.
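A useful rule of thumb follows from this: the number of papers screened per included study is roughly the inverse of the inclusion rate. For example (rates are illustrative):

```python
# "Number needed to screen": roughly 1 / inclusion_rate papers
# per included study (illustrative rates).

for rate in (0.02, 0.05, 0.10):
    print(f"{rate:.0%} inclusion rate → "
          f"~{round(1 / rate)} papers screened per included study")
```

So a review with a 2% inclusion rate screens about 50 papers for every study it keeps.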
How to Reduce Screening Workload
- Active learning prioritizes likely-relevant papers, reducing workload by 50–80%
- Clear inclusion/exclusion criteria speed decisions from 2 minutes to 30 seconds per paper
- Evidence-based stopping rules tell you when it's safe to stop screening
- Better search terms and filters reduce noise without losing relevant papers
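Stopping rules come in several forms; one of the simplest is "stop after N consecutive irrelevant records" in the AI-prioritized order. A minimal sketch of that generic heuristic (this is not Lumina's actual stopping strategy):

```python
# Simple stopping heuristic: stop once the last `window` screening
# decisions were all irrelevant. Generic sketch, not a specific
# tool's algorithm.

def should_stop(labels: list[bool], window: int = 100) -> bool:
    """labels: screening decisions so far, True = relevant."""
    if len(labels) < window:
        return False                  # not enough evidence yet
    return not any(labels[-window:])  # no relevant hits in the window

decisions = [True] * 50 + [False] * 100
print(should_stop(decisions))  # True: 100 irrelevant papers in a row
```

Published stopping criteria are more sophisticated (many estimate recall statistically), but the idea is the same: the signal to stop is a long run of irrelevant records after prioritization.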
Screen 70% Fewer Papers with AI
Upload your dataset and let Lumina's active learning find relevant papers first. With stopping strategies, screen only what you need.