Automated Academic PDF Organization & Search, Powered by AI
Published in SoftwareX (Elsevier) · Science Citation Index Expanded (SCI-E)
Overview • Screenshots • Features • Pipeline • Quick Start • Docs • Citation
LitOrganizer is a free, open-source tool that automatically organizes academic PDF collections. It extracts metadata via DOI lookup, queries multiple academic APIs, and leverages Google Gemini AI as an intelligent fallback. It then renames files using citation standards, categorizes them, and provides full-text search through a modern web interface.
The Problem:
Researchers accumulate hundreds of PDFs with cryptic filenames.
The Solution:
LitOrganizer automatically renames them to APA 7th edition citation-style names.
Automatically detects DOIs in PDF text and queries 7+ academic APIs simultaneously for accurate metadata.
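The DOI detection step can be illustrated with a minimal sketch. It assumes the page text has already been extracted, and it uses a commonly recommended DOI pattern; the name `find_doi` and the exact expression are illustrative assumptions, not necessarily what LitOrganizer uses internally.

```python
import re
from typing import Optional

# A widely used pattern for modern DOIs (prefix "10.", 4-9 digits, slash, suffix).
# Assumption: LitOrganizer's own expression may differ.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string found in `text`, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Sentence punctuation often trails a DOI in running text, so trim it.
    # Limitation: a DOI that legitimately ends in ")" would be shortened.
    return match.group(0).rstrip(".,;:)")


if __name__ == "__main__":
    sample = "Available at https://doi.org/10.1016/j.softx.2025.102198."
    print(find_doi(sample))
```

Trimming trailing punctuation is a pragmatic choice for text scraped from PDFs, where a DOI frequently ends a sentence or sits inside parentheses.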
When DOI extraction fails, Gemini AI reads the PDF content and extracts the title, authors, and year, then validates them via Crossref. A real-time AI status panel shows extraction progress.
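One plausible way to validate AI-extracted metadata against a Crossref record is a fuzzy title comparison plus a year check. The sketch below assumes the candidate record is a Crossref work item (a dict with a `title` list and `issued.date-parts`); the function names and the 0.9 threshold are hypothetical choices for illustration, not LitOrganizer's documented behavior.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def is_validated(ai_title: str, ai_year: int, crossref_item: dict,
                 threshold: float = 0.9) -> bool:
    """Return True if the AI-extracted title and year agree with the record.

    `threshold` is an assumed similarity cut-off, chosen for illustration.
    """
    titles = crossref_item.get("title") or [""]
    ratio = SequenceMatcher(
        None, normalize(ai_title), normalize(titles[0])
    ).ratio()

    # Crossref stores dates as nested lists, e.g. {"date-parts": [[2025, 6]]}.
    date_parts = crossref_item.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None

    return ratio >= threshold and year == ai_year
```

The check is deliberately conservative: a record with no publication year is treated as unvalidated rather than accepted on title similarity alone.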
Files are renamed using APA 7th edition format.
Automatic folder categorization: journal · author · year · subject
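As a rough illustration of citation-style naming, the sketch below applies the APA 7th edition author rule (one author, two authors joined by "&", three or more as "et al."). The filename layout `Author (Year) - Title.pdf` and the function names are hypothetical; the exact pattern LitOrganizer produces is not shown here.

```python
import re
from typing import List


def apa_author_label(surnames: List[str]) -> str:
    """Format surnames following the APA 7th edition in-text citation rule."""
    if not surnames:
        return "Unknown"
    if len(surnames) == 1:
        return surnames[0]
    if len(surnames) == 2:
        return f"{surnames[0]} & {surnames[1]}"
    return f"{surnames[0]} et al."


def citation_filename(surnames: List[str], year: int, title: str,
                      max_title: int = 60) -> str:
    """Build a filesystem-safe name; the layout is an assumed example."""
    # Drop characters that Windows does not allow in filenames.
    safe_title = re.sub(r'[<>:"/\\|?*]', "", title)
    safe_title = re.sub(r"\s+", " ", safe_title).strip()[:max_title].rstrip()
    return f"{apa_author_label(surnames)} ({year}) - {safe_title}.pdf"


if __name__ == "__main__":
    print(citation_filename(["Kara", "Sahin", "Dirsehan"], 2025,
                            "LitOrganizer: Automating the process"))
```

Truncating the title keeps paths within typical operating system limits, at the cost of occasionally cutting a title mid-word.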
Search across your entire PDF collection, with regex keyword search and export of results to Word or Excel.
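A regex search over a collection can be sketched as follows. The example assumes each PDF has already been converted to a list of page strings (for instance with a text extractor such as PyMuPDF); `Hit` and `search_pages` are hypothetical names used only for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    filename: str
    page: int      # 1-based page number
    snippet: str   # matched text with surrounding context


def search_pages(pages_by_file: Dict[str, List[str]], pattern: str,
                 context: int = 40) -> List[Hit]:
    """Return every case-insensitive regex match with a short context window."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits: List[Hit] = []
    for filename, pages in pages_by_file.items():
        for page_no, text in enumerate(pages, start=1):
            for match in regex.finditer(text):
                start = max(match.start() - context, 0)
                end = min(match.end() + context, len(text))
                hits.append(Hit(filename, page_no, text[start:end].strip()))
    return hits


if __name__ == "__main__":
    corpus = {"a.pdf": ["Intro.", "We apply machine learning to reviews."]}
    for hit in search_pages(corpus, r"machine\s+learning"):
        print(hit)
```

Returning structured hits rather than printed lines makes it straightforward to hand the results to an exporter later.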
LitOrganizer uses a multi-stage pipeline to extract metadata and name your PDF files:
flowchart LR
A["PDF File"] --> B{"DOI Found?"}
B -- Yes --> C["Query Academic APIs"]
C --> D["Named Article/"]
B -- No --> E{"Gemini AI\nEnabled?"}
E -- Yes --> F["AI Extraction\n(Title, Authors, Year)"]
F --> G{"Validated via\nCrossref?"}
G -- Yes --> D
G -- No --> H["AI Named Content/\n(if separate folder)"]
E -- No --> I["Unnamed Article/"]
G -- Fail --> I
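The branching in the flowchart can be expressed as a small routing function. This is a sketch under stated assumptions: the "No" edge is read as "metadata extracted but not confirmed by Crossref", the "Fail" edge as "nothing usable extracted", and unconfirmed AI names are assumed to land in Named Article/ when no separate folder is configured. `CrossrefCheck` and `target_folder` are hypothetical names.

```python
from enum import Enum


class CrossrefCheck(Enum):
    YES = "yes"    # AI metadata confirmed by Crossref
    NO = "no"      # AI metadata extracted but not confirmed (assumed meaning)
    FAIL = "fail"  # AI extraction produced nothing usable (assumed meaning)


def target_folder(doi_found: bool,
                  gemini_enabled: bool = False,
                  check: CrossrefCheck = CrossrefCheck.FAIL,
                  separate_ai_folder: bool = False) -> str:
    """Map the flowchart's decisions to an output folder name."""
    if doi_found:
        return "Named Article"
    if not gemini_enabled:
        return "Unnamed Article"
    if check is CrossrefCheck.YES:
        return "Named Article"
    if check is CrossrefCheck.NO:
        # Assumption: without a separate folder, AI-named files join Named Article/.
        return "AI Named Content" if separate_ai_folder else "Named Article"
    return "Unnamed Article"
```

Keeping the routing in one pure function makes each path of the pipeline easy to test without touching the filesystem.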
Output directory structure:
your_pdf_folder/
├── Named Article/       ← DOI + API verified or Gemini AI validated
├── AI Named Content/    ← Gemini AI named (optional separate folder)
├── Unnamed Article/     ← No metadata found
└── backups/             ← Original file backups (if enabled)
The launcher scripts handle everything automatically: the Python check, virtual environment, dependencies, and server startup.
Windows
- Download or clone the repository
- Double-click start_litorganizer.bat
- Browser opens automatically at http://localhost:5000
macOS
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh "Start LitOrganizer.command"
Option A: Double-click Start LitOrganizer.command in Finder
Option B: Run ./start_litorganizer.sh in Terminal
Note: If downloaded as ZIP, remove quarantine first:
xattr -cr .
Linux
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh
./start_litorganizer.sh
Manual Installation
# Clone & setup
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
# Create & activate virtual environment
python3 -m venv .venv
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows
# Install & run
pip install -r requirements.txt
python litorganizer.py
Command Line Mode
python litorganizer.py -d /path/to/pdfs --create-references
Run python litorganizer.py --help for all available options.
API settings can be managed on the Settings page or by editing config/api_keys.json.
| API | Status | Requires |
|---|---|---|
| Crossref | Enabled | None |
| OpenAlex | Enabled | |
| DataCite | Enabled | None |
| Europe PMC | Enabled | None |
| Semantic Scholar | Enabled | None |
| Scopus | Optional | API Key |
| Unpaywall | Optional | |
| Google Gemini AI | Optional | API Key |
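Querying several metadata sources at once can be sketched with a thread pool. The example below covers three of the sources in the table and uses their public DOI lookup URLs as I understand them; the `fetch` callable is injected so the sketch stays independent of any HTTP library, and all names (`ENDPOINTS`, `build_urls`, `query_all`) are hypothetical rather than taken from LitOrganizer's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional
from urllib.parse import quote

# Public DOI lookup endpoints (assumed; check each provider's documentation).
ENDPOINTS = {
    "crossref": "https://api.crossref.org/works/{doi}",
    "openalex": "https://api.openalex.org/works/https://doi.org/{doi}",
    "datacite": "https://api.datacite.org/dois/{doi}",
}


def build_urls(doi: str) -> Dict[str, str]:
    """Return one lookup URL per source for the given DOI."""
    encoded = quote(doi, safe="/")
    return {name: tpl.format(doi=encoded) for name, tpl in ENDPOINTS.items()}


def query_all(doi: str,
              fetch: Callable[[str], Optional[dict]]) -> Dict[str, Optional[dict]]:
    """Call `fetch` for every source concurrently; failures become None."""
    urls = build_urls(doi)
    results: Dict[str, Optional[dict]] = {}
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = {name: pool.submit(fetch, url) for name, url in urls.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # One unavailable source should not abort the whole lookup.
                results[name] = None
    return results
```

In practice `fetch` might wrap an HTTP GET with a timeout and JSON decoding; isolating errors per source lets the remaining APIs still contribute metadata.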
Enable Gemini AI
- Open the Settings page in LitOrganizer
- Toggle Google Gemini Flash on
- Enter your free API key from Google AI Studio
- Save. Gemini AI will then be used as a fallback when DOI extraction fails
For detailed usage instructions, see the User Guide, which covers:
| Topic | Description |
|---|---|
| Naming Pipeline | How metadata is extracted and files are renamed |
| Gemini AI Setup | Configuration and usage of the AI fallback |
| Keyword Search | Regex examples and export options |
| Output Structure | How files are organized into folders |
| API Reference | Available APIs and configuration |
In-App Guide: After launching, click Guide in the navigation menu for interactive documentation.
| Layer | Technologies |
|---|---|
| Backend | Python · Flask · Flask-SocketIO · PyMuPDF · pdfplumber |
| AI | Google Gemini Flash 2.0 API |
| Frontend | Tailwind CSS · Socket.IO Client · SVG Progress Rings · Native OS Dialog |
| Data Export | pandas · openpyxl · python-docx |
- Modern web interface with real-time updates
- DOI fallback with Crossref title search
- Google Gemini AI integration
- Native OS folder picker
- Built-in usage guide
- Full-text search with Word/Excel export
- Batch export in BibTeX / RIS format
- Docker support
- Dark mode
If you use LitOrganizer in your research, please cite:
Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews. SoftwareX, 30, 102198. https://doi.org/10.1016/j.softx.2025.102198
BibTeX
@article{sahin2025litorganizer,
  title     = {LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews},
  author    = {Şahin, Alperen and Kara, Burak Can and Dirsehan, Taşkın},
  journal   = {SoftwareX},
  volume    = {30},
  pages     = {102198},
  year      = {2025},
  publisher = {Elsevier},
  doi       = {10.1016/j.softx.2025.102198}
}
APA 7th Edition
Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data
extraction and organization for scientific literature reviews. SoftwareX, 30, 102198.
https://doi.org/10.1016/j.softx.2025.102198
RIS
TY  - JOUR
TI  - LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews
AU  - Şahin, Alperen
AU  - Kara, Burak Can
AU  - Dirsehan, Taşkın
JO  - SoftwareX
VL  - 30
SP  - 102198
PY  - 2025
SN  - 2352-7110
DO  - 10.1016/j.softx.2025.102198
UR  - https://www.sciencedirect.com/science/article/pii/S2352711025001657
ER  -
v2.0.0: AI-Powered Web Application (Latest)
Major Release: Complete redesign from PyQt5 desktop app to Flask + Socket.IO web application with Google Gemini AI integration.
Added:
- Google Gemini AI integration with real-time status panel
- Modern web interface with Tailwind CSS
- WebSocket-powered live progress tracking with circular progress rings
- Native OS folder picker with quick access shortcuts
- Multi-stage DOI fallback pipeline
- Global activity panel & completion modal
- Comprehensive usage guide page
- Search export to Word/Excel with highlights
Fixed:
- Backup system file copy scope issue
- Cross-platform path separator in "Open Folder"
- Statistics persistence across page navigation
- Progress ring synchronization
Changed:
- Architecture: PyQt5 → Flask + Socket.IO
- Default AI-named files go to Named Article/ (configurable)
- Native OS dialog replaces drag-and-drop zone
- Python requirement broadened to 3.10+
Removed:
- PyQt5 desktop GUI & modules/gui/ directory
- --gui CLI argument
- Drag & drop directory selection
- Heuristic regex-based content extraction
v1.x β Desktop Application (Legacy)
- PyQt5-based desktop GUI with tabbed interface
- Basic progress bar
- Local-only operation
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch: git checkout -b feature/AmazingFeature
3. Commit your changes: git commit -m 'Add AmazingFeature'
4. Push to the branch: git push origin feature/AmazingFeature
5. Open a Pull Request



