
bcankara/LitOrganizer



Automated Academic PDF Organization & Search, Powered by AI


License: MIT




Published in SoftwareX (Elsevier) · Science Citation Index Expanded (SCI-E)


Overview • Screenshots • Features • Pipeline • Quick Start • Docs • Citation


📌 What is LitOrganizer?

LitOrganizer is a free, open-source tool that automatically organizes academic PDF collections. It extracts metadata via DOI lookup, queries multiple academic APIs, and falls back to Google Gemini AI when no DOI is found. It then renames files using citation standards, categorizes them, and provides full-text search through a web interface.

The Problem: Researchers accumulate hundreds of PDFs with cryptic filenames like 1234567.pdf, paper_final_v3.pdf, or download(2).pdf. Finding the right paper becomes a nightmare.

The Solution: LitOrganizer automatically renames them to (Smith, 2024) - Machine Learning in Healthcare.pdf and organizes them into folders by journal, author, or year.


📸 Screenshots

PDF Processing: Real-time progress with Gemini AI panel
Statistics Dashboard: Performance & accuracy analytics
Processing Complete: Summary with success rate
Full-Text Search: Search across all PDFs with export

✨ Key Features

🔍 Smart Metadata Extraction

Automatically detects DOIs from PDF text and queries 7+ academic APIs simultaneously for accurate metadata:

Crossref · OpenAlex · DataCite · Europe PMC · Semantic Scholar · Scopus · Unpaywall
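
DOI detection of the kind described above can be sketched with a regular expression. This is a minimal illustration, not LitOrganizer's actual code: the pattern follows Crossref's general guidance for modern DOIs, and the name `find_doi` is hypothetical.

```python
import re
from typing import Optional

# Pattern based on Crossref's general guidance for modern DOIs.
# It is an assumption for this sketch, not LitOrganizer's own pattern.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Sentence punctuation often trails a DOI in running text; strip it.
    return match.group(0).rstrip(".,;:)")
```

Stripping trailing punctuation is a trade-off: it cleans up DOIs that end a sentence, but it would also truncate the rare DOI that legitimately ends in a closing parenthesis.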

🤖 Google Gemini AI Fallback

When DOI extraction fails, Gemini AI reads the PDF content and extracts the title, authors, and year. The result is then validated via Crossref.

Real-time AI status panel shows extraction progress.
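
One plausible way to validate an AI-extracted title against Crossref is to search the Crossref REST API by title and accept the top hit only if the two titles are nearly identical. The sketch below covers the URL construction and the comparison, and leaves the HTTP request out. The helper names and the 0.9 threshold are assumptions for illustration; the source does not describe how LitOrganizer performs this check.

```python
from difflib import SequenceMatcher
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(title: str, rows: int = 1) -> str:
    """Build a Crossref bibliographic search URL for a candidate title."""
    params = {"query.bibliographic": title, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def _normalize(title: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    kept = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(kept.split())


def titles_match(ai_title: str, crossref_title: str, threshold: float = 0.9) -> bool:
    """Accept the Crossref record only if both titles are nearly identical."""
    ratio = SequenceMatcher(None, _normalize(ai_title), _normalize(crossref_title)).ratio()
    return ratio >= threshold
```

A high threshold favors leaving a file unvalidated over attaching the wrong record's metadata to it.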

📝 Citation-Based Renaming

Files are renamed using APA 7th edition format:

(Author, Year) - Title.pdf

Automatic folder categorization: journal · author · year · subject
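
The naming scheme can be illustrated with a small helper that builds the file name and removes characters that file systems reject. The function name, the character set, and the 120-character title limit are assumptions for this sketch rather than LitOrganizer's actual rules.

```python
import re

# Characters that Windows forbids in file names (a superset of what
# macOS and Linux forbid), plus ASCII control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def citation_filename(first_author: str, year: int, title: str, max_title: int = 120) -> str:
    """Build '(Author, Year) - Title.pdf' with a filesystem-safe title."""
    safe_title = _ILLEGAL.sub(" ", title)
    safe_title = " ".join(safe_title.split())  # collapse runs of whitespace
    safe_title = safe_title[:max_title].rstrip(" .")
    safe_author = " ".join(_ILLEGAL.sub(" ", first_author).split())
    return f"({safe_author}, {year}) - {safe_title}.pdf"
```

For example, `citation_filename("Smith", 2024, "Machine Learning in Healthcare")` yields `(Smith, 2024) - Machine Learning in Healthcare.pdf`.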

🔎 Full-Text Search

Search across your entire PDF collection with:

  • Exact match & regex support
  • Sentence-level context highlighting
  • Export results to Word or Excel
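
Sentence-level search with optional regex support can be sketched as follows. The sentence splitter is deliberately naive, the `[[ ]]` markers stand in for whatever highlighting the interface applies, and `search_sentences` is a hypothetical name, so treat this as an illustration of the idea and not the tool's implementation.

```python
import re
from typing import List

# Naive sentence splitter: break after ., !, or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def search_sentences(text: str, query: str, use_regex: bool = False) -> List[str]:
    """Return each sentence containing the query, with matches wrapped in [[ ]]."""
    pattern = re.compile(query if use_regex else re.escape(query), re.IGNORECASE)
    hits = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if pattern.search(sentence):
            hits.append(pattern.sub(lambda m: f"[[{m.group(0)}]]", sentence))
    return hits
```

Escaping the query in exact-match mode keeps characters such as `.` or `(` in a search term from being interpreted as regex syntax.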

📊 Real-Time Web Interface

  • WebSocket-powered live progress with animated rings
  • Native OS folder picker dialog
  • Statistics dashboard with performance metrics

📋 Reference Generation

  • Auto-generated bibliography of all processed papers
  • Publication analytics by author, journal & year
  • Detailed error diagnostics for problematic files

🔬 How It Works

LitOrganizer uses a multi-stage pipeline to extract metadata and name your PDF files:

flowchart LR
    A["📄 PDF File"] --> B{"DOI Found?"}
    B -- Yes --> C["🔗 Query Academic APIs"]
    C --> D["✅ Named Article/"]
    B -- No --> E{"Gemini AI\nEnabled?"}
    E -- Yes --> F["🤖 AI Extraction\n(Title, Authors, Year)"]
    F --> G{"Validated via\nCrossref?"}
    G -- Yes --> D
    G -- No --> H["📁 AI Named Content/\n(if separate folder)"]
    E -- No --> I["❓ Unnamed Article/"]
    G -- Fail --> I

Output directory structure:

your_pdf_folder/
├── Named Article/          ← DOI + API verified or Gemini AI validated
├── AI Named Content/       ← Gemini AI named (optional separate folder)
├── Unnamed Article/        ← No metadata found
└── backups/                ← Original file backups (if enabled)
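
The routing logic in the pipeline can be summarized as a small decision function. This is a sketch under stated assumptions: the parameter names are hypothetical, the "Fail" branch is read as the AI extraction step producing no usable metadata, and unvalidated AI results are assumed to go to Named Article/ unless the separate-folder option is on.

```python
from enum import Enum


class Folder(str, Enum):
    NAMED = "Named Article"
    AI_NAMED = "AI Named Content"
    UNNAMED = "Unnamed Article"


def choose_folder(
    doi_metadata_found: bool,
    gemini_enabled: bool = False,
    ai_extracted: bool = False,
    crossref_validated: bool = False,
    separate_ai_folder: bool = False,
) -> Folder:
    """Map the outcome of each pipeline stage to an output folder."""
    if doi_metadata_found:
        return Folder.NAMED
    # No DOI metadata: without a usable AI result the file stays unnamed.
    if not gemini_enabled or not ai_extracted:
        return Folder.UNNAMED
    if crossref_validated:
        return Folder.NAMED
    # AI-named but not validated: destination depends on the folder option.
    return Folder.AI_NAMED if separate_ai_folder else Folder.NAMED
```

Keeping the decision in one pure function makes each branch easy to test without touching any files.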

🚀 Quick Start

The launcher scripts handle everything automatically: Python check, virtual environment, dependencies, and server startup.

🪟 Windows
  1. Download or clone the repository
  2. Double-click start_litorganizer.bat
  3. Browser opens automatically at http://localhost:5000

🍎 macOS
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh "Start LitOrganizer.command"

Option A: Double-click Start LitOrganizer.command in Finder
Option B: Run ./start_litorganizer.sh in Terminal

Note: If downloaded as ZIP, remove quarantine first: xattr -cr .

🐧 Linux
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer
chmod +x start_litorganizer.sh
./start_litorganizer.sh

🛠 Manual Installation
# Clone & setup
git clone https://github.com/bcankara/LitOrganizer.git
cd LitOrganizer

# Create & activate virtual environment
python3 -m venv .venv
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows

# Install & run
pip install -r requirements.txt
python litorganizer.py
⌨️ Command Line Mode
python litorganizer.py -d /path/to/pdfs --create-references

Run python litorganizer.py --help for all available options.


⚙️ Configuration

API settings can be managed on the Settings page or by editing config/api_keys.json.

| API | Status | Requires |
|---|---|---|
| Crossref | ✅ Enabled | None |
| OpenAlex | ✅ Enabled | Email |
| DataCite | ✅ Enabled | None |
| Europe PMC | ✅ Enabled | None |
| Semantic Scholar | ✅ Enabled | None |
| Scopus | ⬚ Optional | API Key |
| Unpaywall | ⬚ Optional | Email |
| Google Gemini AI | ⬚ Optional | API Key |

🤖 Enable Gemini AI
  1. Open the Settings page in LitOrganizer
  2. Toggle Google Gemini Flash on
  3. Enter your free API key from Google AI Studio
  4. Save. Gemini AI is then used as a fallback when DOI extraction fails

📖 Documentation

For detailed usage instructions, see the User Guide which covers:

| Topic | Description |
|---|---|
| 🔄 Naming Pipeline | How metadata is extracted and files are renamed |
| 🤖 Gemini AI Setup | Configuration and usage of the AI fallback |
| 🔎 Keyword Search | Regex examples and export options |
| 📁 Output Structure | How files are organized into folders |
| ⚙️ API Reference | Available APIs and configuration |

💡 In-App Guide: After launching, click Guide in the navigation menu for interactive documentation.


🛠️ Tech Stack

| Layer | Technologies |
|---|---|
| Backend | Python · Flask · Flask-SocketIO · PyMuPDF · pdfplumber |
| AI | Google Gemini Flash 2.0 API |
| Frontend | Tailwind CSS · Socket.IO Client · SVG Progress Rings · Native OS Dialog |
| Data Export | pandas · openpyxl · python-docx |

🗺️ Roadmap

  • Modern web interface with real-time updates
  • DOI fallback with Crossref title search
  • Google Gemini AI integration
  • Native OS folder picker
  • Built-in usage guide
  • Full-text search with Word/Excel export
  • Batch export in BibTeX / RIS format
  • Docker support
  • Dark mode

📄 Citation

If you use LitOrganizer in your research, please cite:

Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews. SoftwareX, 30, 102198. https://doi.org/10.1016/j.softx.2025.102198

BibTeX
@article{sahin2025litorganizer,
  title     = {LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews},
  author    = {Şahin, Alperen and Kara, Burak Can and Dirsehan, Taşkın},
  journal   = {SoftwareX},
  volume    = {30},
  pages     = {102198},
  year      = {2025},
  publisher = {Elsevier},
  doi       = {10.1016/j.softx.2025.102198}
}
APA 7th Edition
Şahin, A., Kara, B. C., & Dirsehan, T. (2025). LitOrganizer: Automating the process of data
extraction and organization for scientific literature reviews. SoftwareX, 30, 102198.
https://doi.org/10.1016/j.softx.2025.102198
RIS
TY  - JOUR
TI  - LitOrganizer: Automating the process of data extraction and organization for scientific literature reviews
AU  - Şahin, Alperen
AU  - Kara, Burak Can
AU  - Dirsehan, Taşkın
JO  - SoftwareX
VL  - 30
SP  - 102198
PY  - 2025
SN  - 2352-7110
DO  - 10.1016/j.softx.2025.102198
UR  - https://www.sciencedirect.com/science/article/pii/S2352711025001657
ER  -

📋 Changelog

v2.0.0: AI-Powered Web Application (Latest)

Major Release: Complete redesign from PyQt5 desktop app to Flask + Socket.IO web application with Google Gemini AI integration.

✅ Added

  • Google Gemini AI integration with real-time status panel
  • Modern web interface with Tailwind CSS
  • WebSocket-powered live progress tracking with circular progress rings
  • Native OS folder picker with quick access shortcuts
  • Multi-stage DOI fallback pipeline
  • Global activity panel & completion modal
  • Comprehensive usage guide page
  • Search export to Word/Excel with highlights

🔧 Fixed

  • Backup system file copy scope issue
  • Cross-platform path separator in "Open Folder"
  • Statistics persistence across page navigation
  • Progress ring synchronization

🔄 Changed

  • Architecture: PyQt5 → Flask + Socket.IO
  • Default AI-named files go to Named Article/ (configurable)
  • Native OS dialog replaces drag-and-drop zone
  • Python requirement broadened to 3.10+

🗑️ Removed

  • PyQt5 desktop GUI & modules/gui/ directory
  • --gui CLI argument
  • Drag & drop directory selection
  • Heuristic regex-based content extraction

v1.x: Desktop Application (Legacy)
  • PyQt5-based desktop GUI with tabbed interface
  • Basic progress bar
  • Local-only operation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch    →  git checkout -b feature/AmazingFeature
3. Commit your changes           →  git commit -m 'Add AmazingFeature'
4. Push to the branch            →  git push origin feature/AmazingFeature
5. Open a Pull Request

📬 Contact & Support

Issues Discussions



Made with ❤️ for the academic community

About

LitOrganizer is a powerful tool designed for researchers, academics, and students to organize their PDF literature collections automatically. It extracts metadata from academic papers, renames files according to citation standards, categorizes them into a logical directory structure, and provides powerful search capabilities.
