A forensic security tool to scan GitHub Pull Request diffs for suspicious patterns. Download and analyze PR diffs at scale to detect potential security issues, secrets, and malicious code injections.
Built with fast vibe coding for vulnerability research.
- Regex Pattern Search - Search PR diffs using powerful regex patterns
- Parallel Downloads - Download multiple PRs simultaneously (about 10x faster than sequential downloads)
- Ripgrep Integration - Uses ripgrep for 10-100x faster searching than the Python regex fallback
- Multi-Token Support - Rotate through multiple GitHub tokens to handle rate limits
- Live Progress - Real-time progress bar with ETA
- Organized Output - Each scan creates a timestamped folder with all results
- Detailed Logging - Full logs for debugging and audit trails
- Multiple Export Formats - Results in JSON and CSV
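The parallel-download idea can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. This is a minimal illustration, not the tool's actual code: `download_all` and the injected `fetch_diff` callable are hypothetical names. In practice `fetch_diff` would request a PR's diff from the GitHub REST API, for example `GET /repos/{owner}/{repo}/pulls/{number}` with the `Accept: application/vnd.github.diff` header.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def download_all(
    pr_numbers: Iterable[int],
    fetch_diff: Callable[[int], str],
    out_dir: Path,
    max_workers: int = 10,
) -> tuple[dict[int, Path], dict[int, Exception]]:
    """Fetch PR diffs concurrently and save each as PR_<number>.diff.

    Hypothetical helper. Returns (saved, failed): saved maps PR number to
    file path, failed maps PR number to the exception raised fetching it.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: dict[int, Path] = {}
    failed: dict[int, Exception] = {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per PR; threads suit this I/O-bound workload.
        futures = {pool.submit(fetch_diff, n): n for n in pr_numbers}
        for future in as_completed(futures):
            number = futures[future]
            try:
                diff_text = future.result()
            except Exception as exc:  # record the failure, keep going
                failed[number] = exc
                continue
            path = out_dir / f"PR_{number}.diff"
            path.write_text(diff_text, encoding="utf-8")
            saved[number] = path
    return saved, failed
```

Threads fit this job because each download spends most of its time waiting on the network, and a single failed PR lands in `failed` instead of aborting the whole batch.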
```bash
git clone <repo-url>
cd GithubForensicTest
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Create a `.env` file with your GitHub Personal Access Tokens:

```bash
cp example.env .env
```

Edit `.env` and add your tokens (comma-separated for multiple tokens):

```
GITHUB_TOKENS=ghp_token1,ghp_token2,ghp_token3
```
Generate tokens at: https://github.com/settings/tokens
Required scopes:
- `public_repo` for public repositories
- `repo` for private repositories
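A minimal sketch of reading `GITHUB_TOKENS` as a comma-separated list using only the standard library. `read_env_file` and `load_tokens` are hypothetical helpers, and the simple parser ignores `.env` quoting rules; the tool's own loader may differ.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no quoting rules)."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_tokens(env_path: Path = Path(".env")) -> list[str]:
    """Return tokens from GITHUB_TOKENS, preferring the real environment."""
    raw = os.environ.get("GITHUB_TOKENS") or read_env_file(env_path).get(
        "GITHUB_TOKENS", ""
    )
    # Split on commas and drop empty entries and surrounding whitespace.
    return [t.strip() for t in raw.split(",") if t.strip()]
```

Checking the real environment first lets CI systems inject tokens without writing a `.env` file to disk.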
Edit `config.json`:

```json
{
  "repo": "owner/repo-name",
  "search_patterns": [
    "password",
    "api[_-]?key",
    "secret"
  ],
  "pr_state": "all",
  "per_page": 100
}
```

| Field | Description |
|---|---|
| `repo` | GitHub repository in `owner/repo` format |
| `search_patterns` | Array of regex patterns to search for |
| `pr_state` | `"all"`, `"open"`, or `"closed"` |
| `per_page` | PRs per API request (max 100) |
| `max_workers` | Parallel download threads (default: 10) |
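The fields above can be checked before a scan starts. The sketch below is hypothetical (`load_config` is not the tool's function): only the `max_workers` default of 10 comes from the table, while the `pr_state` and `per_page` defaults are assumptions. Compiling every pattern up front surfaces a malformed regex (`re.error`) before any diffs are downloaded.

```python
import json
import re
from pathlib import Path

VALID_STATES = {"all", "open", "closed"}


def load_config(path: Path) -> dict:
    """Load config.json, apply defaults, and validate each field.

    Hypothetical helper; pr_state/per_page defaults are assumptions.
    """
    config = json.loads(path.read_text(encoding="utf-8"))

    owner, sep, name = config.get("repo", "").partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError("repo must be in owner/repo format")

    if config.setdefault("pr_state", "all") not in VALID_STATES:
        raise ValueError(f"pr_state must be one of {sorted(VALID_STATES)}")

    per_page = config.setdefault("per_page", 100)
    if not isinstance(per_page, int) or not 1 <= per_page <= 100:
        raise ValueError("per_page must be an integer between 1 and 100")

    config.setdefault("max_workers", 10)  # documented default

    # Compile every pattern now so a typo fails fast, not mid-scan.
    config["compiled_patterns"] = [
        re.compile(p) for p in config.get("search_patterns", [])
    ]
    return config
```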
Install ripgrep (optional, recommended for faster searching):

```bash
# macOS
brew install ripgrep
# Ubuntu/Debian
sudo apt install ripgrep
# Windows
choco install ripgrep
```

If ripgrep is not installed, the tool falls back to Python regex, which is slower but still works.
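Here is one way a ripgrep-first search with a Python fallback might look, assuming diffs are saved as `*.diff` files. `search_diffs` is a hypothetical name; the flags used (`--no-heading`, `--with-filename`, `--line-number`, `--null`, `--color never`, `--glob`) are standard ripgrep options. Note that ripgrep's Rust regex syntax is not identical to Python's `re` (for example, look-around is unsupported unless ripgrep runs with `--pcre2`), so a pattern can behave differently depending on which path runs.

```python
from __future__ import annotations

import re
import shutil
import subprocess
from pathlib import Path


def search_diffs(
    pattern: str, diff_dir: Path, use_ripgrep: bool = True
) -> list[tuple[str, int, str]]:
    """Return sorted (file name, line number, line text) tuples for every match.

    Uses ripgrep when it is on PATH (and use_ripgrep is True),
    otherwise scans *.diff files with Python's re module.
    """
    rg = shutil.which("rg") if use_ripgrep else None
    hits: list[tuple[str, int, str]] = []

    if rg:
        # --null puts a NUL byte after each path, so paths containing ':'
        # (e.g. Windows drive letters) still parse correctly.
        proc = subprocess.run(
            [rg, "--no-heading", "--with-filename", "--line-number", "--null",
             "--color", "never", "--glob", "*.diff", "-e", pattern, str(diff_dir)],
            capture_output=True, text=True,
        )
        if proc.returncode not in (0, 1):  # exit code 1 means "no matches"
            raise RuntimeError(f"ripgrep failed: {proc.stderr.strip()}")
        for line in proc.stdout.splitlines():
            path, rest = line.split("\0", 1)
            lineno, text = rest.split(":", 1)
            hits.append((Path(path).name, int(lineno), text))
        return sorted(hits)

    # Fallback: slower, but needs nothing beyond the standard library.
    regex = re.compile(pattern)
    for path in sorted(diff_dir.glob("*.diff")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, text in enumerate(handle, start=1):
                if regex.search(text):
                    hits.append((path.name, lineno, text.rstrip("\n")))
    return sorted(hits)
```

Returning the same `(file, line, text)` tuples from both branches keeps the rest of the pipeline unaware of which engine ran.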
Run the scanner:

```bash
python github_forensic_test.py
```

Each scan creates a timestamped folder:

```
scans/
└── PR/
    └── owner-repo_20251228_143052/
        ├── diffs/              # Downloaded PR diff files
        │   ├── PR_1234.diff
        │   ├── PR_5678.diff
        │   └── ...
        ├── scan.log            # Detailed log file
        ├── results.json        # Full results with metadata
        ├── results.csv         # CSV for spreadsheet viewing
        └── config_used.json    # Config snapshot for this run
```
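A hypothetical sketch of producing that layout. The folder naming follows the example above (`owner-repo_YYYYMMDD_HHMMSS`), but `create_scan_dir`, `write_results`, the JSON metadata keys, and the CSV columns are illustrative assumptions, not the tool's documented schema.

```python
from __future__ import annotations

import csv
import json
from datetime import datetime
from pathlib import Path

# Assumed CSV columns; the real results.csv may use different ones.
CSV_FIELDS = ["pr_number", "pattern", "line_number", "line"]


def create_scan_dir(repo: str, base: Path = Path("scans") / "PR") -> Path:
    """Create <base>/<owner>-<repo>_<YYYYMMDD_HHMMSS>/ with a diffs/ subfolder."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    scan_dir = base / f"{repo.replace('/', '-')}_{stamp}"
    (scan_dir / "diffs").mkdir(parents=True, exist_ok=False)
    return scan_dir


def write_results(scan_dir: Path, repo: str, matches: list[dict]) -> None:
    """Write results.json (with metadata) and results.csv (one row per match)."""
    payload = {
        "repo": repo,
        "generated_at": datetime.now().isoformat(timespec="seconds"),
        "total_matches": len(matches),
        "matches": matches,
    }
    (scan_dir / "results.json").write_text(
        json.dumps(payload, indent=2), encoding="utf-8"
    )
    # newline="" lets the csv module control line endings itself.
    with (scan_dir / "results.csv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=CSV_FIELDS)
        writer.writeheader()
        writer.writerows(matches)
```

Using `exist_ok=False` means a name collision raises instead of silently mixing two runs into one folder.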
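Example patterns for finding hardcoded secrets:

```json
{
  "search_patterns": [
    "password\\s*=",
    "api[_-]?key\\s*=",
    "secret\\s*=",
    "token\\s*=",
    "credential",
    "BEGIN (RSA|DSA|EC|OPENSSH) PRIVATE KEY"
  ]
}
```

Example patterns for finding dangerous code execution calls:

```json
{
  "search_patterns": [
    "eval\\(",
    "exec\\(",
    "subprocess\\.call",
    "os\\.system"
  ]
}
```

GitHub API rate limits: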
- Authenticated: 5,000 requests/hour per token
- Unauthenticated: 60 requests/hour
The tool automatically rotates through multiple tokens when rate limits are reached. For large repositories, use multiple tokens.
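Rotation can be driven by GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers (the latter is a UTC epoch timestamp in seconds). The class below is a hypothetical round-robin sketch, not the tool's implementation; it reads headers from any mapping keyed with GitHub's spelling. For scale, three tokens at 5,000 requests/hour each give a combined budget of 15,000 requests/hour.

```python
from __future__ import annotations

import itertools
from typing import Mapping


class TokenRotator:
    """Round-robin over tokens, skipping any whose quota is exhausted.

    Hypothetical sketch: call record(token, headers) after each API
    response so the rotator can track GitHub's X-RateLimit-* headers.
    """

    def __init__(self, tokens: list[str]) -> None:
        if not tokens:
            raise ValueError("at least one token is required")
        self._tokens = tokens
        self._cycle = itertools.cycle(tokens)
        self._reset_at: dict[str, float] = {}  # token -> reset epoch seconds

    def record(self, token: str, headers: Mapping[str, str]) -> None:
        """Mark a token as exhausted when GitHub reports 0 remaining calls."""
        if int(headers.get("X-RateLimit-Remaining", "1")) == 0:
            self._reset_at[token] = float(headers.get("X-RateLimit-Reset", "0"))

    def next_token(self, now: float) -> str:
        """Return the next usable token, or raise if all are exhausted."""
        for _ in range(len(self._tokens)):
            token = next(self._cycle)
            if self._reset_at.get(token, 0.0) <= now:
                self._reset_at.pop(token, None)  # quota has reset
                return token
        soonest = min(self._reset_at.values())
        raise RuntimeError(f"all tokens rate-limited until epoch {soonest:.0f}")
```

Passing `now` explicitly (for example `time.time()`) keeps the class easy to test and lets the caller decide whether to sleep until `soonest` or stop the scan.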
License: MIT