
barakharyati/GithubForensicTest

GithubForensicTest

A forensic security tool to scan GitHub Pull Request diffs for suspicious patterns. Download and analyze PR diffs at scale to detect potential security issues, secrets, and malicious code injections.

⚑ Built with fast vibe coding for vulnerability research πŸ”¬

Features

  • 🔍 Regex Pattern Search - Search PR diffs using powerful regex patterns
  • ⚡ Parallel Downloads - Download multiple PRs simultaneously (10x faster)
  • 🚀 Ripgrep Integration - Uses ripgrep for 10-100x faster searching (with Python fallback)
  • 🔑 Multi-Token Support - Rotate through multiple GitHub tokens to handle rate limits
  • 📊 Live Progress - Real-time progress bar with ETA
  • 📁 Organized Output - Each scan creates a timestamped folder with all results
  • 📝 Detailed Logging - Full logs for debugging and audit trails
  • 💾 Multiple Export Formats - Results in JSON and CSV
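
The parallel download feature can be pictured with a thread pool. The following is only a sketch under stated assumptions, not the tool's actual code: `download_all` and `fake_fetch` are hypothetical names, and the real HTTP request is replaced by a stand-in function so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def download_all(
    pr_numbers: Iterable[int],
    fetch_diff: Callable[[int], str],
    max_workers: int = 10,
) -> tuple[dict[int, str], dict[int, Exception]]:
    """Fetch many PR diffs concurrently; collect results and failures separately."""
    diffs: dict[int, str] = {}
    errors: dict[int, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its PR number so results can be labeled.
        future_to_pr = {pool.submit(fetch_diff, n): n for n in pr_numbers}
        for future in as_completed(future_to_pr):
            pr = future_to_pr[future]
            try:
                diffs[pr] = future.result()
            except Exception as exc:  # one failed PR should not stop the scan
                errors[pr] = exc
    return diffs, errors


def fake_fetch(pr_number: int) -> str:
    """Hypothetical stand-in for the real HTTP download of one PR diff."""
    if pr_number == 3:
        raise RuntimeError("simulated failure")
    return f"diff for PR {pr_number}"


diffs, errors = download_all([1, 2, 3], fake_fetch, max_workers=2)
```

Because the work is I/O-bound, threads are a reasonable fit here; keeping failures in a separate dictionary lets a scan report which PRs could not be downloaded.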

Setup

1. Clone the repository

git clone <repo-url>
cd GithubForensicTest

2. Create virtual environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Configure GitHub tokens

Create a .env file with your GitHub Personal Access Tokens:

cp example.env .env

Edit .env and add your tokens (comma-separated for multiple tokens):

GITHUB_TOKENS=ghp_token1,ghp_token2,ghp_token3

Generate tokens at: https://github.com/settings/tokens

Required scopes:

  • public_repo - for public repositories
  • repo - for private repositories
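
As an illustration of the comma-separated GITHUB_TOKENS format from step 4, a value like the one above could be split as follows. This is a sketch, not the tool's own loader: `parse_tokens` is a hypothetical helper, and it assumes the .env value has already been read into a string (for instance from the process environment).

```python
def parse_tokens(raw: str) -> list[str]:
    """Split a comma-separated token string into a clean list of tokens."""
    # Strip surrounding whitespace and drop empty entries (e.g. a trailing comma).
    tokens = [part.strip() for part in raw.split(",") if part.strip()]
    if not tokens:
        raise ValueError("GITHUB_TOKENS is empty; add at least one token to .env")
    return tokens


tokens = parse_tokens("ghp_token1, ghp_token2,ghp_token3,")
```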

5. Configure target repository and patterns

Edit config.json:

{
    "repo": "owner/repo-name",
    "search_patterns": [
        "password",
        "api[_-]?key",
        "secret"
    ],
    "pr_state": "all",
    "per_page": 100
}

Field            Description
repo             GitHub repository in owner/repo format
search_patterns  Array of regex patterns to search for
pr_state         "all", "open", or "closed"
per_page         PRs per API request (max 100)
max_workers      Parallel download threads (default: 10)
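
To make the field rules concrete, here is a minimal sketch of how a config like this could be loaded and validated. `load_config` and `DEFAULTS` are hypothetical names; the `max_workers` default of 10 comes from the table above, while the defaults for `pr_state` and `per_page` are assumptions made for the example.

```python
import json
import re

# max_workers default is documented; the other two defaults are assumed.
DEFAULTS = {"pr_state": "all", "per_page": 100, "max_workers": 10}


def load_config(text: str) -> dict:
    """Parse config JSON, apply defaults, and validate each documented field."""
    config = {**DEFAULTS, **json.loads(text)}
    if not re.fullmatch(r"[^/\s]+/[^/\s]+", config.get("repo", "")):
        raise ValueError("repo must be in owner/repo format")
    if config["pr_state"] not in {"all", "open", "closed"}:
        raise ValueError('pr_state must be "all", "open", or "closed"')
    if not 1 <= config["per_page"] <= 100:
        raise ValueError("per_page must be between 1 and 100")
    # Compile patterns up front so an invalid regex fails before any download.
    config["compiled_patterns"] = [
        re.compile(p) for p in config.get("search_patterns", [])
    ]
    return config


config = load_config('{"repo": "owner/repo-name", "search_patterns": ["api[_-]?key"]}')
```

Compiling the patterns during loading means a typo in a regex surfaces immediately rather than after a long download phase.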

Optional: Install ripgrep for faster searching

# macOS
brew install ripgrep

# Ubuntu/Debian
apt install ripgrep

# Windows
choco install ripgrep

If ripgrep is not installed, the tool falls back to Python's regex engine, which is slower but still works.
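
The fallback behavior might look roughly like the sketch below. The function names are hypothetical and the tool's real implementation may differ; the example assumes diffs are stored as `*.diff` files in one folder, detects the `rg` binary with `shutil.which`, and reads ripgrep's `--json` output. Note that ripgrep and Python use different regex engines, so unusual patterns may not behave identically in both paths.

```python
import json
import re
import shutil
import subprocess
import tempfile
from pathlib import Path

Match = tuple[str, int, str]  # (file name, line number, line text)


def search_with_ripgrep(rg: str, pattern: str, diff_dir: str) -> list[Match]:
    """Run ripgrep and parse its JSON Lines output."""
    proc = subprocess.run(
        [rg, "--json", "--glob", "*.diff", "-e", pattern, str(diff_dir)],
        capture_output=True, text=True, encoding="utf-8", errors="replace",
    )
    if proc.returncode not in (0, 1):  # 0 = matches found, 1 = no matches
        raise RuntimeError(proc.stderr.strip())
    matches = []
    for line in proc.stdout.splitlines():
        event = json.loads(line)
        if event["type"] != "match":
            continue
        data = event["data"]
        name = Path(data["path"].get("text", "")).name
        text = data["lines"].get("text", "").rstrip("\n")
        matches.append((name, data["line_number"], text))
    return sorted(matches)


def search_with_python(pattern: str, diff_dir: str) -> list[Match]:
    """Pure-Python fallback: scan each .diff file line by line."""
    regex = re.compile(pattern)
    matches = []
    for path in Path(diff_dir).glob("*.diff"):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    matches.append((path.name, lineno, line.rstrip("\n")))
    return sorted(matches)


def search_diffs(pattern: str, diff_dir: str) -> list[Match]:
    """Prefer ripgrep when it is on PATH, otherwise use the Python fallback."""
    rg = shutil.which("rg")
    if rg:
        return search_with_ripgrep(rg, pattern, diff_dir)
    return search_with_python(pattern, diff_dir)


def demo(search=search_diffs) -> list[Match]:
    """Write one sample diff to a temporary folder and search it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "PR_1.diff").write_text("+password = 'x'\n+ok\n", encoding="utf-8")
        return search(r"password\s*=", tmp)
```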

Usage

python github_forensic_test.py

Output Structure

Each scan creates a timestamped folder:

scans/
└── PR/
    └── owner-repo_20251228_143052/
        ├── diffs/              # Downloaded PR diff files
        │   ├── PR_1234.diff
        │   ├── PR_5678.diff
        │   └── ...
        ├── scan.log            # Detailed log file
        ├── results.json        # Full results with metadata
        ├── results.csv         # CSV for spreadsheet viewing
        └── config_used.json    # Config snapshot for this run

Example Patterns

Security scanning

{
    "search_patterns": [
        "password\\s*=",
        "api[_-]?key\\s*=",
        "secret\\s*=",
        "token\\s*=",
        "credential",
        "BEGIN (RSA|DSA|EC|OPENSSH) PRIVATE KEY"
    ]
}

Finding specific code patterns

{
    "search_patterns": [
        "eval\\(",
        "exec\\(",
        "subprocess\\.call",
        "os\\.system"
    ]
}
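
Before running a full scan, it can help to try patterns against a few sample lines. The sketch below shows how the JSON escaping works: `\\s` and `\\(` in the config file become `\s` and `\(` once the JSON is parsed. The `scan` function and the sample diff text are made up for illustration, and the example scans every diff line, which may not match what the tool does.

```python
import json
import re

# Raw string, so the backslashes reach the JSON parser exactly as written in config.json.
config_text = r'''
{
    "search_patterns": [
        "password\\s*=",
        "eval\\(",
        "BEGIN (RSA|DSA|EC|OPENSSH) PRIVATE KEY"
    ]
}
'''

patterns = json.loads(config_text)["search_patterns"]
compiled = [re.compile(p) for p in patterns]


def scan(diff_text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern) for every pattern hit in the diff text."""
    hits = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        for regex in compiled:
            if regex.search(line):
                hits.append((lineno, regex.pattern))
    return hits


sample_diff = "\n".join([
    '+password = "hunter2"',
    "+result = eval(user_input)",
    " print('ok')",
    "+-----BEGIN RSA PRIVATE KEY-----",
])
hits = scan(sample_diff)
```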

Rate Limits

GitHub API rate limits:

  • Authenticated: 5,000 requests/hour per token
  • Unauthenticated: 60 requests/hour

The tool automatically rotates through multiple tokens when rate limits are reached. For large repositories, use multiple tokens.
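
One way to picture token rotation is a small helper that remembers when each token becomes usable again. This is a sketch under assumptions, not the tool's actual logic: `TokenRotator` is a hypothetical class, and in practice the reset time would come from GitHub's `X-RateLimit-Reset` response header (an epoch timestamp) once `X-RateLimit-Remaining` reaches 0.

```python
import time
from typing import Optional


class TokenRotator:
    """Hand out the first token that is not currently rate limited."""

    def __init__(self, tokens: list[str]) -> None:
        if not tokens:
            raise ValueError("at least one token is required")
        self._tokens = list(tokens)
        # Epoch seconds at which each token may be used again (0 = usable now).
        self._reset_at = {token: 0.0 for token in self._tokens}
        self._index = 0

    def current(self, now: Optional[float] = None) -> Optional[str]:
        """Return a usable token, or None if every token is exhausted."""
        now = time.time() if now is None else now
        for offset in range(len(self._tokens)):
            i = (self._index + offset) % len(self._tokens)
            token = self._tokens[i]
            if self._reset_at[token] <= now:
                self._index = i  # keep using this token until it runs out
                return token
        return None

    def mark_exhausted(self, token: str, reset_at: float) -> None:
        """Record that a token hit its limit and when the limit resets."""
        self._reset_at[token] = reset_at


rotator = TokenRotator(["ghp_token1", "ghp_token2"])
first = rotator.current(now=0)
rotator.mark_exhausted(first, reset_at=3600)
second = rotator.current(now=1)
```

When `current` returns None, a caller could sleep until the earliest reset time instead of failing the scan.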

License

MIT
