LitOrganizer (bcankara/LitOrganizer)

LitOrganizer is a tool for researchers, academics, and students that automatically organizes PDF literature collections. It extracts metadata from academic papers, renames files according to citation standards, sorts them into a logical directory structure, and provides full-text search.

Automated Academic PDF Organization & Search — Powered by AI

Published in SoftwareX (Elsevier) · Science Citation Index Expanded (SCI-E)


📌 What is LitOrganizer?

LitOrganizer is a free, open-source tool that automatically organizes academic PDF collections. It extracts metadata via DOI lookup, queries multiple academic APIs, and leverages Google Gemini AI as an intelligent fallback — then renames files using citation standards, categorizes them, and provides full-text search through a modern web interface.

The Problem: Researchers accumulate hundreds of PDFs with cryptic filenames like 1234567.pdf, paper_final_v3.pdf, or download(2).pdf. Finding the right paper becomes a nightmare.

The Solution: LitOrganizer automatically renames them to (Smith, 2024) - Machine Learning in Healthcare.pdf and organizes them into folders by journal, author, or year.

📸 Screenshots

Screenshots (images not reproduced here):

PDF Processing: real-time progress with Gemini AI panel
Statistics Dashboard: performance and accuracy analytics
Processing Complete: summary with success rate
Full-Text Search: search across all PDFs with export

✨ Key Features

🔍 Smart Metadata Extraction

Automatically detects DOIs from PDF text and queries 7+ academic APIs simultaneously for accurate metadata:

Crossref · OpenAlex · DataCite · Europe PMC · Semantic Scholar · Scopus · Unpaywall
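As a rough illustration of the DOI detection step, the sketch below finds the first DOI-like string in extracted PDF text and builds a Crossref REST API lookup URL for it. The regular expression and the function names `find_doi` and `crossref_url` are assumptions made for this example and are not taken from the LitOrganizer source.

```python
import re
from typing import Optional
from urllib.parse import quote

# Assumed pattern: "10." + 4-9 digit registrant code + "/" + suffix characters.
# LitOrganizer may use a different expression.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the surrounding sentence,
    # not to the DOI itself.
    return match.group(0).rstrip(".,;)")


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a single work."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


page_text = "Available at https://doi.org/10.1016/j.softx.2025.102198."
doi = find_doi(page_text)
print(doi)
print(crossref_url(doi))
```

Stripping trailing punctuation is a heuristic: it can truncate the rare DOI that legitimately ends in a closing parenthesis.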

🤖 Google Gemini AI Fallback

When DOI extraction fails, Gemini AI reads the PDF content and extracts title, authors, and year — then validates via Crossref.

Real-time AI status panel shows extraction progress.
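The README does not say how an AI-extracted record is validated against Crossref. One plausible approach, sketched here purely as an assumption, is to compare the AI-extracted title with the title Crossref returns and accept the match above a similarity threshold. The helper names and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def titles_match(ai_title: str, crossref_title: str, threshold: float = 0.9) -> bool:
    """Return True when two titles are similar enough to be the same paper.

    The threshold is an illustrative value, not one taken from LitOrganizer.
    """
    ratio = SequenceMatcher(None, normalize(ai_title), normalize(crossref_title)).ratio()
    return ratio >= threshold


print(titles_match(
    "LitOrganizer: Automating the Process of Data Extraction",
    "LitOrganizer: automating the process of data extraction",
))
```

A stricter check could also compare the publication year and the first author's surname before accepting the record.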

📝 Citation-Based Renaming

Files are renamed using a pattern modeled on APA 7th edition in-text citations:

(Author, Year) - Title.pdf

Automatic folder categorization: journal · author · year · subject
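A minimal sketch of how the `(Author, Year) - Title.pdf` pattern could be produced from extracted metadata. The function name, the character filter, and the title length limit are assumptions for illustration; LitOrganizer's own sanitization rules may differ.

```python
import re


def citation_filename(surname: str, year: int, title: str, max_title_len: int = 120) -> str:
    """Build a "(Author, Year) - Title.pdf" filename from metadata."""
    # Drop characters that are not allowed in Windows filenames.
    clean = re.sub(r'[<>:"/\\|?*]', "", title)
    # Collapse runs of whitespace left behind by the removal.
    clean = re.sub(r"\s+", " ", clean).strip()
    # Keep the name reasonably short and avoid a trailing dot or space.
    clean = clean[:max_title_len].rstrip(" .")
    return f"({surname}, {year}) - {clean}.pdf"


print(citation_filename("Smith", 2024, "Machine Learning in Healthcare"))
# (Smith, 2024) - Machine Learning in Healthcare.pdf
```

Filtering for the most restrictive platform (Windows) keeps the resulting names portable across the three operating systems the launcher scripts target.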

🔎 Full-Text Search

Search across your entire PDF collection with:

Exact match & regex support
Sentence-level context highlighting
Export results to Word or Excel
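The search options listed above can be sketched as follows: split the extracted text into sentences, then keep each sentence that contains the query, treated either as a literal string or as a regular expression. The function name, the naive sentence splitter, and case-insensitive matching are assumptions, not LitOrganizer's actual implementation.

```python
import re
from typing import List


def search_sentences(text: str, query: str, use_regex: bool = False) -> List[str]:
    """Return every sentence in text that matches query."""
    # Escape the query for exact matching; use it unchanged for regex mode.
    pattern = re.compile(query if use_regex else re.escape(query), re.IGNORECASE)
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if pattern.search(s)]


sample = "Deep learning improves diagnosis. We used 120 patients. Results were robust."
print(search_sentences(sample, "diagnosis"))
print(search_sentences(sample, r"\d+ patients", use_regex=True))
```

The simple splitter breaks on abbreviations such as "et al." or "Fig. 2", so a production version would need a more careful sentence tokenizer.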

📊 Real-Time Web Interface

WebSocket-powered live progress with animated rings
Native OS folder picker dialog
Statistics dashboard with performance metrics

📋 Reference Generation

Auto-generated bibliography of all processed papers
Publication analytics by author, journal & year
Detailed error diagnostics for problematic files
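Publication analytics of this kind reduce to counting processed records by a metadata field. The sketch below assumes each processed paper is available as a dictionary with keys such as `journal` and `year`; that record layout and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def publication_counts(records: Iterable[Dict], field: str) -> Counter:
    """Count records per value of field, skipping records that lack it."""
    return Counter(r[field] for r in records if r.get(field))


records = [
    {"journal": "SoftwareX", "year": 2025},
    {"journal": "SoftwareX", "year": 2024},
    {"journal": "Nature", "year": 2025},
    {"year": 2023},  # journal unknown
]
print(publication_counts(records, "journal").most_common(1))
print(publication_counts(records, "year")[2025])
```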

🔬 How It Works

LitOrganizer uses a multi-stage pipeline to extract metadata and name your PDF files:

flowchart LR
    A["📄 PDF File"] --> B{"DOI Found?"}
    B -- Yes --> C["🔗 Query Academic APIs"]
    C --> D["✅ Named Article/"]
    B -- No --> E{"Gemini AI\nEnabled?"}
    E -- Yes --> F["🤖 AI Extraction\n(Title, Authors, Year)"]
    F --> G{"Validated via\nCrossref?"}
    G -- Yes --> D
    G -- No --> H["📁 AI Named Content/\n(if separate folder)"]
    E -- No --> I["❓ Unnamed Article/"]
    G -- Fail --> I

Output directory structure:

your_pdf_folder/
├── Named Article/          ← DOI + API verified or Gemini AI validated
├── AI Named Content/       ← Gemini AI named (optional separate folder)
├── Unnamed Article/        ← No metadata found
└── backups/                ← Original file backups (if enabled)
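The flowchart and folder layout above can be read as a small decision function. The sketch below is one interpretation of that diagram, not the project's code: the function name and parameters are hypothetical, and the "Fail" edge is taken to mean that AI extraction itself returned nothing usable.

```python
def destination_folder(
    doi_resolved: bool,
    gemini_enabled: bool,
    ai_extracted: bool,
    crossref_validated: bool,
    separate_ai_folder: bool,
) -> str:
    """Pick the output folder for one PDF, following the flowchart."""
    # A DOI that resolves through the academic APIs gives verified metadata.
    if doi_resolved:
        return "Named Article"
    # Without the AI fallback, or if it extracts nothing, the file stays unnamed.
    if not gemini_enabled or not ai_extracted:
        return "Unnamed Article"
    # AI metadata confirmed by Crossref is treated like verified metadata.
    if crossref_validated:
        return "Named Article"
    # Unvalidated AI names go to their own folder only if that option is on.
    return "AI Named Content" if separate_ai_folder else "Named Article"


print(destination_folder(False, True, True, False, True))
```

The last branch reflects the changelog note that AI-named files go to Named Article/ by default, with the separate folder being configurable.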

🚀 Quick Start

The launcher scripts handle everything automatically — Python check, virtual environment, dependencies, and server startup.

🪟 Windows

Download or clone the repository
Double-click start_litorganizer.bat
Browser opens automatically at http://localhost:5000

🍎 macOS

git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh "Start LitOrganizer.command"

Option A: Double-click Start LitOrganizer.command in Finder
Option B: Run ./start_litorganizer.sh in Terminal

Note: If downloaded as ZIP, remove quarantine first: xattr -cr .

🐧 Linux

git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh
./start_litorganizer.sh

🛠 Manual Installation

# Clone & setup
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer

# Create & activate virtual environment
python3 -m venv .venv
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows

# Install & run
pip install -r requirements.txt
python litorganizer.py

⌨️ Command Line Mode

python litorganizer.py -d /path/to/pdfs --create-references

Run python litorganizer.py --help for all available options.

⚙️ Configuration

API settings can be managed on the Settings page or by editing config/api_keys.json.

API	Status	Requires
Crossref	✅ Enabled	—
OpenAlex	✅ Enabled	Email
DataCite	✅ Enabled	—
Europe PMC	✅ Enabled	—
Semantic Scholar	✅ Enabled	—
Scopus	⬚ Optional	API Key
Unpaywall	⬚ Optional	Email
Google Gemini AI	⬚ Optional	API Key

🤖 Enable Gemini AI

Open the Settings page in LitOrganizer
Toggle Google Gemini Flash on
Enter your free API key from Google AI Studio
Save — Gemini AI will be used as fallback when DOI extraction fails

📖 Documentation

For detailed usage instructions, see the User Guide which covers:

Topic	Description
🔄 Naming Pipeline	How metadata is extracted and files are renamed
🤖 Gemini AI Setup	Configuration and usage of the AI fallback
🔎 Keyword Search	Regex examples and export options
📁 Output Structure	How files are organized into folders
⚙️ API Reference	Available APIs and configuration

💡 In-App Guide: After launching, click Guide in the navigation menu for interactive documentation.

🛠️ Tech Stack

Layer	Technologies
Backend	Python · Flask · Flask-SocketIO · PyMuPDF · pdfplumber
AI	Google Gemini Flash 2.0 API
Frontend	Tailwind CSS · Socket.IO Client · SVG Progress Rings · Native OS Dialog
Data Export	pandas · openpyxl · python-docx

🗺️ Roadmap

📄 Citation

If you use LitOrganizer in your research, please cite:

Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews. SoftwareX, 30, 102198. https://doi.org/10.1016/j.softx.2025.102198

BibTeX

@article{sahin2025litorganizer,
  title     = {LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews},
  author    = {Şahin, Alperen and Kara, Burak Can and Dirsehan, Taşkın},
  journal   = {SoftwareX},
  volume    = {30},
  pages     = {102198},
  year      = {2025},
  publisher = {Elsevier},
  doi       = {10.1016/j.softx.2025.102198}
}

APA 7th Edition

Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data
extraction and organization for scientific literature reviews. SoftwareX, 30, 102198.
https://doi.org/10.1016/j.softx.2025.102198

RIS

TY  - JOUR
TI  - LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews
AU  - Şahin, Alperen
AU  - Kara, Burak Can
AU  - Dirsehan, Taşkın
JO  - SoftwareX
VL  - 30
SP  - 102198
PY  - 2025
SN  - 2352-7110
DO  - 10.1016/j.softx.2025.102198
UR  - https://www.sciencedirect.com/science/article/pii/S2352711025001657
ER  -

📋 Changelog

v2.0.0 — AI-Powered Web Application (Latest)

Major Release: Complete redesign from PyQt5 desktop app to Flask + Socket.IO web application with Google Gemini AI integration.

✅ Added

Google Gemini AI integration with real-time status panel
Modern web interface with Tailwind CSS
WebSocket-powered live progress tracking with circular progress rings
Native OS folder picker with quick access shortcuts
Multi-stage DOI fallback pipeline
Global activity panel & completion modal
Comprehensive usage guide page
Search export to Word/Excel with highlights

🔧 Fixed

Backup system file copy scope issue
Cross-platform path separator in "Open Folder"
Statistics persistence across page navigation
Progress ring synchronization

🔄 Changed

Architecture: PyQt5 → Flask + Socket.IO
Default AI-named files go to Named Article/ (configurable)
Native OS dialog replaces drag-and-drop zone
Python requirement broadened to 3.10+

🗑️ Removed

PyQt5 desktop GUI & modules/gui/ directory
--gui CLI argument
Drag & drop directory selection
Heuristic regex-based content extraction

v1.x — Desktop Application (Legacy)

PyQt5-based desktop GUI with tabbed interface
Basic progress bar
Local-only operation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch    →  git checkout -b feature/AmazingFeature
3. Commit your changes           →  git commit -m 'Add AmazingFeature'
4. Push to the branch            →  git push origin feature/AmazingFeature
5. Open a Pull Request

📬 Contact & Support

Made with ❤️ for the academic community
