AI-Powered Web Scraping and Database Population Platform
SyncEngine is an intelligent platform that scrapes websites and populates user databases. It analyzes database structures, maps website patterns, and uses AI to suggest field mappings, with support for both manual (JSON staging) and automatic sync modes.
- LLM Integration: OpenAI/Claude powered intelligent field mapping suggestions
- Structure Analysis: Automatic detection of website data patterns
- Pagination Detection: Smart detection of query params, path-based, and button pagination
- Manual Mode: Data staged as JSON for review before committing to database
- Auto Mode: Direct insertion into user's database with scheduled jobs
- Hybrid Scraper: Puppeteer (browser) + Cheerio (HTTP) for maximum compatibility
- Rate Limiting: Configurable delays and concurrency controls
- Authentication: Support for cookies, headers, and basic auth
- Grid View: Card-based overview of all assignments with status tracking
- Extraction Rules: Visual editor with CSS selector support
- Real-time Progress: Track pages processed, rows extracted
- Process Logs: Detailed logging for debugging
- JSON Preview: Review staged data before committing
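The extraction rules and field mappings above might be modeled along these lines. This is a sketch only: the interface and field names are hypothetical, not SyncEngine's actual schema.

```typescript
// Hypothetical shape of an extraction rule: a CSS selector per field,
// mapped to a destination database column. Illustrative only.
interface ExtractionRule {
  field: string;      // name of the extracted field
  selector: string;   // CSS selector applied to each scraped item
  attribute?: string; // read an attribute instead of text content
  column: string;     // destination database column
}

const rules: ExtractionRule[] = [
  { field: "title", selector: ".product-card h2", column: "name" },
  { field: "price", selector: ".product-card .price", column: "price" },
  { field: "url", selector: ".product-card a", attribute: "href", column: "source_url" },
];

console.log(rules.length); // 3
```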
- SQLite: File-based databases with easy setup (recommended for development)
- PostgreSQL, MySQL, SQL Server: Full enterprise database connectors
- Schema analysis and table discovery
- Secure credential encryption (AES-256-GCM)
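Credential encryption along these lines can be sketched with Node's built-in `crypto` module. The key derivation, salt, and payload layout below are assumptions for illustration, not SyncEngine's actual implementation:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Derive a 32-byte key from ENCRYPTION_KEY (illustrative; salt is a placeholder)
const key = scryptSync("your-32-character-encryption-key", "example-salt", 32);

function encrypt(plaintext: string): string {
  const iv = randomBytes(12); // standard 96-bit GCM nonce
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Pack iv + auth tag + ciphertext so decryption is self-contained
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decrypt(payload: string): string {
  const buf = Buffer.from(payload, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ciphertext = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM authenticates as well as encrypts
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

console.log(decrypt(encrypt("db-password"))); // db-password
```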
- Frontend: Next.js 16 (App Router), React 19, TypeScript
- UI Components: Shadcn UI, Tailwind CSS
- Database: SQLite with Prisma ORM (config), SQLite/PostgreSQL/MySQL/SQL Server (user databases)
- SQLite Driver: better-sqlite3 for high-performance SQLite connections
- Web Scraping: Puppeteer, Cheerio
- AI: OpenAI API
- Scheduling: node-cron
- State Management: Zustand
- Node.js 18+
- npm or yarn
- OpenAI API key (for AI features)
- Clone the repository:

```bash
git clone <repository-url>
cd syncengine
```

- Install dependencies:

```bash
npm install
```

- Set up environment variables:

```bash
cp .env.example .env
```

Edit `.env` with your configuration:

```env
DATABASE_URL="file:./dev.db"
NEXT_PUBLIC_APP_NAME="SyncEngine"
NEXT_PUBLIC_APP_URL="http://localhost:3000"
JWT_SECRET="your-super-secret-jwt-key"
ENCRYPTION_KEY="your-32-character-encryption-key"

# AI Service (OpenAI)
OPENAI_API_KEY="sk-..."
OPENAI_MODEL="gpt-4o"
```

- Set up the database:

```bash
npx prisma generate
npx prisma db push
npm run db:seed
```

- Create sample SQLite databases (optional but recommended):

```bash
npx tsx scripts/create-sample-dbs.ts
```

This creates two sample databases in `./data/`:

- `products.db`: E-commerce catalog with categories, products, and reviews
- `customers.db`: Customer data with orders and order items

- Start the development server:

```bash
npm run dev
```

Open http://localhost:3000 in your browser.
- Log in with OTP authentication:
  - Enter email: `admin@syncengine.local`
  - Check the console for the OTP code (or configure SMTP in Settings)
  - Enter the 6-digit OTP code
Tip: After logging in, start by adding a Data Source (SQLite recommended) and a Web Source, then create an Assignment to link them together.
```
syncengine/
├── data/                        # Sample SQLite databases
│   ├── products.db              # E-commerce product catalog
│   └── customers.db             # Customer and orders data
├── prisma/
│   └── schema.prisma            # Database schema
├── scripts/
│   └── create-sample-dbs.ts     # Script to create sample databases
├── src/
│   ├── app/
│   │   ├── api/                 # API routes
│   │   │   ├── data-sources/    # Database source endpoints
│   │   │   ├── web-sources/     # Web source endpoints
│   │   │   ├── assignments/     # Assignment endpoints
│   │   │   ├── extraction-jobs/ # Job endpoints
│   │   │   └── logs/            # Process log endpoints
│   │   └── (dashboard)/         # Protected pages
│   │       ├── data-sources/    # Database source management
│   │       ├── web-sources/     # Web source management
│   │       ├── assignments/     # Assignment management
│   │       ├── extraction-jobs/ # Job monitoring
│   │       └── logs/            # Process logs
│   ├── components/
│   │   ├── assignments/         # Assignment components
│   │   │   └── assignment-card.tsx
│   │   └── ui/                  # Shadcn UI components
│   ├── lib/
│   │   ├── services/
│   │   │   ├── database-connector.ts  # Multi-DB connector (SQLite, PostgreSQL, MySQL, MSSQL)
│   │   │   ├── web-scraper.ts         # Hybrid scraper
│   │   │   ├── ai-mapper.ts           # AI mapping service
│   │   │   ├── extraction-executor.ts # Extraction engine
│   │   │   └── scheduler.ts           # Cron scheduling
│   │   ├── api.ts               # API client
│   │   └── db.ts                # Prisma client
│   └── types/
│       └── index.ts             # TypeScript definitions
└── output/                      # Staged JSON files (gitignored)
```
- Add a Data Source (`/data-sources/new`)
  - Select SQLite and choose one of the sample databases (`products.db` or `customers.db`)
  - Test the connection to verify it works
- Add a Web Source (`/web-sources/new`)
  - Enter a website URL to scrape
  - Configure the scraper type (HTTP for static sites, Browser for JS-heavy sites)
  - Analyze the website structure
- Create an Assignment (`/assignments/new`)
  - Link the web source to a data source table
  - Choose a sync mode: Manual (review first) or Auto (direct insert)
  - Configure the extraction schedule
- Configure Mappings (`/assignments/:id`)
  - Use AI suggestions to map website fields to database columns
  - Run sample tests to verify extraction
  - Trigger a full extraction when ready
- Monitor Jobs (`/extraction-jobs`)
  - Track extraction progress in real time
  - Review staged data (Manual mode)
  - Commit to the database or cancel
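In Manual mode, staged output lands in `./output/` as JSON for review before commit. A staged file might look roughly like this (the field names and structure here are hypothetical, for illustration only):

```json
{
  "assignmentId": "example-assignment-id",
  "extractedAt": "2025-01-01T12:00:00Z",
  "rows": [
    {
      "name": "Sample Product",
      "price": 19.99,
      "source_url": "https://example.com/products/1"
    }
  ]
}
```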
The project includes two pre-built SQLite databases for testing:
**products.db**

| Table | Description | Rows |
|---|---|---|
| `categories` | Product categories | 5 |
| `products` | Product catalog with prices, SKUs | 12 |
| `product_reviews` | Customer reviews with ratings | 6 |
**customers.db**

| Table | Description | Rows |
|---|---|---|
| `customers` | Customer contact information | 8 |
| `orders` | Order records with statuses | 8 |
| `order_items` | Individual order line items | 12 |
To regenerate the sample databases:

```bash
npx tsx scripts/create-sample-dbs.ts
```

Configure database connections:

- SQLite: Select a `.db` file (easiest for development)
- PostgreSQL/MySQL/MSSQL: Standard connection settings with SSL support
- Test connections and discover table schemas
Configure websites to scrape data from:
- Base URL and authentication
- Scraper type (HTTP, Browser, or Hybrid)
- Rate limiting settings
- Pagination pattern detection
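Pagination pattern detection (query-param vs path-based, as listed above) can be sketched roughly as follows. The heuristics and parameter names here are illustrative assumptions, not SyncEngine's actual detection logic:

```typescript
// Classify a "next page" link as query-param or path-based pagination.
// Illustrative sketch; real-world heuristics are more involved.
type Pagination =
  | { kind: "query"; param: string }
  | { kind: "path"; prefix: string }
  | { kind: "unknown" };

function detectPagination(nextLink: string): Pagination {
  const url = new URL(nextLink, "https://example.com");
  // e.g. ?page=2, ?p=2, ?offset=20
  for (const [param, value] of url.searchParams) {
    if (/^(page|p|offset|start)$/i.test(param) && /^\d+$/.test(value)) {
      return { kind: "query", param };
    }
  }
  // e.g. /products/page/2
  const match = url.pathname.match(/^(.*\/(?:page|p))\/\d+\/?$/i);
  if (match) return { kind: "path", prefix: match[1] };
  return { kind: "unknown" };
}

console.log(detectPagination("/products?page=2").kind); // query
console.log(detectPagination("/blog/page/3").kind);     // path
```

Button-based pagination (the third pattern the feature list mentions) generally needs the browser scraper, since it requires clicking through rendered DOM.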
Link a web source to a database table:
- Map extracted fields to database columns
- Configure sync mode (Manual or Auto)
- Set extraction schedules
- Visual mapping editor with AI suggestions
Track extraction executions:
- View real-time progress
- Monitor rows extracted/inserted
- Review staged JSON data (Manual mode)
- Commit to database with one click
Comprehensive logging of all activities:
- Debug, Info, Warning, Error levels
- URL and row context
- Filterable log viewer
- `GET /api/web-sources` - List all web sources
- `POST /api/web-sources` - Create a new web source
- `GET /api/web-sources/:id` - Get web source details
- `PUT /api/web-sources/:id` - Update a web source
- `DELETE /api/web-sources/:id` - Delete a web source
- `POST /api/web-sources/:id/analyze` - AI-analyze website structure
- `POST /api/web-sources/:id/test-scrape` - Test-scrape a sample page

- `GET /api/assignments` - List all assignments
- `POST /api/assignments` - Create a new assignment
- `GET /api/assignments/:id` - Get assignment details
- `PUT /api/assignments/:id` - Update an assignment
- `DELETE /api/assignments/:id` - Delete an assignment
- `POST /api/assignments/:id/suggest-mapping` - AI-suggest mappings
- `POST /api/assignments/:id/sample-test` - Run a sample extraction
- `PUT /api/assignments/:id/mappings` - Update extraction rules
- `POST /api/assignments/:id/run` - Trigger extraction

- `GET /api/extraction-jobs` - List all jobs
- `GET /api/extraction-jobs/:id` - Get job details
- `POST /api/extraction-jobs/:id/commit` - Commit staged data
- `POST /api/extraction-jobs/:id/cancel` - Cancel a running job
- `GET /api/extraction-jobs/:id/staged-data` - Get staged JSON
- `GET /api/extraction-jobs/:id/logs` - Get job logs

- `GET /api/logs` - List logs (filterable)
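The `:id` placeholders in the endpoint paths above can be filled with a small helper like this sketch (the helper and the localhost base URL are illustrative, not part of SyncEngine's API client):

```typescript
// Substitute :param placeholders in an endpoint path and resolve it
// against a base URL. Hypothetical helper for illustration.
function endpointUrl(
  path: string,
  params: Record<string, string>,
  base = "http://localhost:3000"
): string {
  const resolved = path.replace(/:([A-Za-z]+)/g, (_, name) =>
    encodeURIComponent(params[name])
  );
  return new URL(resolved, base).toString();
}

console.log(endpointUrl("/api/assignments/:id/run", { id: "42" }));
// http://localhost:3000/api/assignments/42/run
```

A `POST` to that URL (e.g. via `fetch` with `{ method: "POST" }`) would trigger the extraction, per the endpoint list above.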
| Role | Access |
|---|---|
| Admin | Full access: All features, user management, settings |
| Supervisor | Limited access: Web sources, assignments, jobs, logs |
- User enters email on login page
- System sends 6-digit OTP (valid for 10 minutes)
- User enters OTP to authenticate
- Session lasts 120 minutes
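The OTP flow above (6-digit code, 10-minute validity) can be sketched like this. The function names and in-memory shape are assumptions for illustration; the real implementation lives in the app's API routes:

```typescript
import { randomInt } from "node:crypto";

const OTP_TTL_MS = 10 * 60 * 1000; // valid for 10 minutes

// Issue a cryptographically random, zero-padded 6-digit code
function issueOtp(): { code: string; expiresAt: number } {
  const code = randomInt(0, 1_000_000).toString().padStart(6, "0");
  return { code, expiresAt: Date.now() + OTP_TTL_MS };
}

// Accept the code only if it matches and has not expired
function verifyOtp(otp: { code: string; expiresAt: number }, entered: string): boolean {
  return Date.now() < otp.expiresAt && otp.code === entered;
}

const otp = issueOtp();
console.log(otp.code.length);         // 6
console.log(verifyOtp(otp, otp.code)); // true
```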
```bash
npm run dev          # Start the development server
npm run build        # Build for production
npm run db:generate  # Generate Prisma client
npm run db:push      # Push schema to database
npm run db:seed      # Seed initial data
npm run db:studio    # Open Prisma Studio
npm run db:reset     # Reset and seed database
```

- SQLite database connector with file selection
- Sample databases for testing
- Cloud storage integration (Azure Blob, S3, GCS)
- Oracle database connector
- Incremental extraction with change detection
- API key authentication
- Rate limiting
- Multi-tenant support
- Webhook notifications
- Export/import assignment configurations
- Browser extension for selector picking
License: MIT
Contributions are welcome! Please read our contributing guidelines before submitting a pull request.