PicTacticAgent

AI-powered cover image generation tool that reduces the randomness of text-to-image output by running generation through a LangGraph StateGraph workflow that scores candidates and iterates.

One-click Install for Coding Agents

If you use Claude Code or a similar coding agent, point it at the installation guide and it will handle the setup automatically. Once installed, PicTacticAgent works as an AI image generation Skill: any coding agent can generate and edit images directly via the CLI:

# Let AI read the install doc and complete all configuration
@docs/skill-installation.md Follow the guide to install PicTacticAgent Skill

After installation, simply tell the AI "generate a cyberpunk-style cover image" and you're good to go.

Overview

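PicTacticAgent implements a closed-loop "Generate → Evaluate → Iterate" pipeline as a LangGraph StateGraph workflow. Each round generates a batch of candidate images, scores them automatically, and feeds the evaluation back into the next round's prompt; the loop stops once the quality threshold or the round limit is reached, and the top-scoring images are returned.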

Key Features

  • StateGraph Workflow — Deterministic LangGraph StateGraph workflow with modular node-based design
  • Iterative Generation — Configurable 1-10 rounds, 1-10 images per round, auto-stop when quality threshold is met
  • Auto Evaluation — GPT-4o Vision-based 6-dimension scoring (prompt match, visual appeal, layout, detail quality, technical completeness, portrait consistency)
  • Image Editing — AI-powered editing of existing images with text instructions
  • Template Style Analysis — Upload reference images to auto-extract style features and enhance prompts
  • Prompt Template Library — Save, manage, and reuse prompt templates with category filtering
  • Real-time Progress — WebSocket push for generation progress, cancel anytime
  • User Authentication — Email registration/login, JWT access & refresh tokens
  • OAuth Login — GitHub / Google third-party login support
  • Email Verification & Password Reset — Full verification and recovery flow
  • i18n Support — Chinese/English bilingual interface
  • Dark Theme — Default dark UI with theme toggle
  • CLI Tool — Generate/edit images from the command line, no Web server needed
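
The six evaluation dimensions listed under Auto Evaluation can be combined into a single ranking score in many ways. The following is a minimal sketch, assuming each dimension is scored on a 0-1 scale and weighted equally; the identifiers `DIMENSIONS`, `overall_score`, and `rank_images` are hypothetical and not taken from the project code.

```python
from statistics import fmean

# Dimension names follow the feature list above. Equal weighting and the
# 0-1 scale are assumptions made for this sketch.
DIMENSIONS = (
    "prompt_match",
    "visual_appeal",
    "layout",
    "detail_quality",
    "technical_completeness",
    "portrait_consistency",
)


def overall_score(scores: dict[str, float]) -> float:
    """Average the six dimension scores into one overall score."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return fmean(scores[d] for d in DIMENSIONS)


def rank_images(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (image_id, overall_score) pairs, best first."""
    scored = [(image_id, overall_score(s)) for image_id, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A weighted mean (for example, emphasizing prompt match) would be a drop-in replacement for `fmean` if some dimensions matter more for a given use case.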

Tech Stack

| Layer | Technology |
| --- | --- |
| Agent Framework | LangChain + LangGraph (StateGraph) |
| Backend | FastAPI + Uvicorn |
| Database | SQLite + SQLAlchemy (async) + aiosqlite |
| Frontend | Vite 7 + React 19 + TailwindCSS 4 |
| Image Generation | Gemini API / Jiekouai / Antigravity (OpenAI-compatible) |
| Image Evaluation | GPT-4o Vision |
| CLI | Typer + Rich |
| Package Management | uv (Python) + npm (Frontend) |

Requirements

  • Python 3.11+
  • uv (recommended) or pip
  • Node.js 18+

Quick Start

1. Clone the Repository

git clone https://github.com/NanmiCoder/PicTacticAgent.git
cd PicTacticAgent

2. Configure Environment Variables

cp .env.example .env

Edit the .env file and configure API keys for your chosen provider:

We recommend Jiekouai, a proxy service where one API key covers both image generation and LLM evaluation. Sign up and link your GitHub account for $3 free trial credits.

# Option 1: Jiekouai (Recommended)
DEFAULT_PROVIDER=jiekouai
JIEKOUAI_API_KEY=your-jiekouai-api-key
JIEKOUAI_API_BASE_URL=https://api.jiekou.ai/v3
JIEKOUAI_LLM_BASE_URL=https://api.jiekou.ai/openai
JIEKOUAI_LLM_MODEL=gpt-5-mini

# Option 2: Official Gemini
DEFAULT_PROVIDER=gemini
GEMINI_API_KEY=your-google-gemini-api-key
GEMINI_MODEL=gemini-3-pro-image-preview
GEMINI_LLM_MODEL=gemini-3-flash-preview

# Option 3: Antigravity (Text-to-image only)
DEFAULT_PROVIDER=antigravity
ANTIGRAVITY_API_KEY=your-antigravity-api-key
ANTIGRAVITY_API_BASE_URL=http://127.0.0.1:8045/v1

3. Start the Services

Option A: One-click Start (Recommended)

./start.sh

Automatically installs dependencies and starts both frontend and backend.

Option B: Manual Start

  1. Install backend dependencies and start:
uv sync
uv run uvicorn backend.src.pictactic.api.app:app --reload --port 8019
  2. Install frontend dependencies and start (new terminal):
cd frontend
npm install
npm run dev

Option C: Docker Deployment

cd docker
docker-compose up -d

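Access URLs (with the default ports used elsewhere in this README):

  • Frontend: http://localhost:3019
  • Backend API: http://localhost:8019
  • API docs: http://localhost:8019/docs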

Usage

Web Interface

  1. Register an account or log in with OAuth
  2. Describe the cover image you want in the input box
  3. (Optional) Upload 1-5 reference images as style templates
  4. (Optional) Adjust aspect ratio, image size, rounds, images per round, quality threshold, etc.
  5. Click "Generate" to start, watch progress in real-time
  6. Select the best images from results, download or save as template

CLI

Generate images directly from the terminal without starting the Web server:

# Generate images (prompt must come AFTER all options)
uv run pictactic generate "a tech-inspired cover image"

# Generate with parameters
uv run pictactic generate \
  --rounds 3 --images 5 --top-k 3 --threshold 0.7 \
  --aspect-ratio 16:9 --size 2K --provider gemini \
  "a tech-inspired cover image"

# Edit an existing image
uv run pictactic edit --source ./image.png "change the background to dark blue"

# JSON output (for scripting)
uv run pictactic generate --format json "your prompt"

# List available providers
uv run pictactic providers

# Check task status (requires backend)
uv run pictactic status <task_id>

Configuration

| Parameter | Description | Default | Range |
| --- | --- | --- | --- |
| Max Rounds | Maximum generation iterations | 3 | 1-10 |
| Images per Round | Candidate images per round | 5 | 1-10 |
| Quality Threshold | Score threshold to stop iterating | 0.7 | 0.5-1.0 |
| Top-K | Number of final output images | 3 | 1-10 |
| Aspect Ratio | Image aspect ratio | 16:9 | 1:1, 16:9, 9:16, 4:3, 3:4 |
| Image Size | Output image size | 2K | 1K, 2K, 4K |
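
The defaults and ranges in the table above can be expressed as a small validated settings object. This is an illustrative sketch only: the class name `GenerationConfig` and its field names are assumptions, and the project's own configuration (pydantic-settings, per the project structure) may look different.

```python
from dataclasses import dataclass

ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4"}
IMAGE_SIZES = {"1K", "2K", "4K"}


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container mirroring the documented defaults and ranges."""

    max_rounds: int = 3
    images_per_round: int = 5
    quality_threshold: float = 0.7
    top_k: int = 3
    aspect_ratio: str = "16:9"
    image_size: str = "2K"

    def __post_init__(self) -> None:
        # Reject values outside the ranges listed in the table.
        if not 1 <= self.max_rounds <= 10:
            raise ValueError("max_rounds must be in 1-10")
        if not 1 <= self.images_per_round <= 10:
            raise ValueError("images_per_round must be in 1-10")
        if not 0.5 <= self.quality_threshold <= 1.0:
            raise ValueError("quality_threshold must be in 0.5-1.0")
        if not 1 <= self.top_k <= 10:
            raise ValueError("top_k must be in 1-10")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        if self.image_size not in IMAGE_SIZES:
            raise ValueError(f"image_size must be one of {sorted(IMAGE_SIZES)}")
```

Calling `GenerationConfig()` yields the documented defaults, while something like `GenerationConfig(max_rounds=11)` raises a `ValueError` before any generation work starts.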

Project Structure

PicTacticAgent/
├── backend/src/pictactic/     # Python backend
│   ├── agents/                # LangGraph StateGraph workflow
│   │   ├── workflow.py        # Workflow definition & orchestration
│   │   ├── conditions.py      # Conditional edge functions
│   │   ├── state.py           # GenerationState shared state
│   │   ├── prompts.py         # LLM prompt templates
│   │   ├── nodes/             # Workflow nodes
│   │   │   ├── analyze_node.py    # Template analysis node
│   │   │   ├── enhance_node.py    # Prompt enhancement node
│   │   │   ├── generate_node.py   # Image generation node
│   │   │   ├── evaluate_node.py   # Image evaluation node
│   │   │   ├── prepare_next_node.py # Next round preparation
│   │   │   └── finalize_node.py   # Final output node (Top-K)
│   │   └── tools/             # Node utility functions
│   ├── api/                   # FastAPI application
│   │   ├── app.py             # Main app (CORS, middleware)
│   │   ├── routes/            # API routes
│   │   │   ├── auth.py        # Authentication routes
│   │   │   ├── generation.py  # Generation routes
│   │   │   ├── templates.py   # Template management routes
│   │   │   └── health.py      # Health check
│   │   ├── dependencies.py    # Auth dependency injection
│   │   └── websocket.py       # WebSocket real-time progress
│   ├── cli/                   # CLI tool
│   │   ├── main.py            # Typer entry point
│   │   ├── generate.py        # generate command
│   │   ├── edit.py            # edit command
│   │   ├── providers.py       # providers command
│   │   ├── status.py          # status command
│   │   └── output.py          # Output formatting (text/json/quiet)
│   ├── providers/             # Image generation providers
│   │   ├── base.py            # ImageProvider ABC + data models
│   │   ├── gemini.py          # Gemini official SDK
│   │   ├── jiekouai.py        # Jiekouai reverse proxy
│   │   └── antigravity.py     # Antigravity (OpenAI-compatible)
│   ├── core/                  # Core configuration
│   │   └── config.py          # pydantic-settings config
│   ├── db/                    # Database layer
│   │   ├── engine.py          # SQLAlchemy async engine
│   │   ├── models.py          # Data models
│   │   ├── repository.py      # Task repository
│   │   └── template_repository.py # Template repository
│   ├── services/              # Business logic services
│   │   ├── auth_service.py    # Auth logic (JWT + bcrypt)
│   │   ├── generation_service.py  # Generation task management
│   │   ├── template_service.py    # Template management
│   │   ├── email_service.py   # Email sending
│   │   └── oauth_client.py    # OAuth client
│   ├── i18n/                  # Internationalization
│   └── models/                # Pydantic data models
│
├── frontend/src/              # React 19 SPA
│   ├── components/            # React components
│   │   ├── auth/              # Auth components (AuthShell, OAuthButtons)
│   │   ├── gallery/           # Image gallery (ImageCard, ImageModal, MasonryGrid, ImageEditDialog)
│   │   ├── generation/        # Generation panel (PromptInput, ConfigPanel, FloatingControlPanel, ProgressDisplay)
│   │   ├── templates/         # Template components (TemplateCard, TemplateList, TemplateDialog)
│   │   ├── layout/            # Layout (Header, TaskSidebar, TaskHeader)
│   │   ├── dialogs/           # Dialogs (SettingsDialog)
│   │   └── ui/                # Shared UI (Select, Toaster)
│   ├── hooks/                 # Custom hooks
│   │   ├── useGeneration.js   # Generation task state
│   │   ├── useAuth.jsx        # Auth state
│   │   ├── useLocale.jsx      # i18n
│   │   ├── useTheme.js        # Theme toggle
│   │   ├── useTemplates.js    # Template management
│   │   ├── useTaskHistory.js  # Task history
│   │   └── useImageEdit.js    # Image editing
│   ├── lib/                   # Utilities & API client
│   ├── pages/                 # Pages
│   │   ├── LandingPage.jsx    # Landing page
│   │   ├── LoginPage.jsx      # Login
│   │   ├── RegisterPage.jsx   # Register
│   │   ├── OAuthCallback.jsx  # OAuth callback
│   │   ├── VerifyEmailPage.jsx # Email verification
│   │   ├── ForgotPasswordPage.jsx # Forgot password
│   │   ├── ResetPasswordPage.jsx  # Reset password
│   │   └── TemplatesPage.jsx  # Template library page
│   └── locales/               # i18n language packs (zh/en)
│
├── docker/                    # Docker configuration
│   ├── docker-compose.yml
│   ├── Dockerfile
│   └── Dockerfile.frontend
│
├── docs/                      # Documentation
│   ├── PRD.md                 # Product requirements
│   ├── TECHNICAL_DESIGN.md    # Technical design
│   └── screenshots/           # Screenshots
│
├── tests/                     # Tests
│   ├── unit/                  # Unit tests
│   ├── integration/           # Integration tests
│   └── e2e/                   # End-to-end tests
│
├── .env.example               # Environment variable template
├── start.sh                   # One-click start script
├── pyproject.toml             # Python project config
└── README.md                  # This file

Workflow

User Input
    │
    ▼
[check_template] ─── Has reference ──→ [analyze_node]
    │                                        │
    │ No reference                           │
    ▼                                        ▼
[enhance_node] ◄────────────────────────────┘
    │
    ▼
[generate_node] ◄────────────────────────┐
    │                                    │
    ▼                                    │
[evaluate_node]                          │
    │                                    │
    ▼                                    │
[should_continue] ── Continue ──→ [prepare_next]
    │
    │ Done
    ▼
[finalize_node]
    │
    ▼
Output Top-K Best Images

Node Description

| Node | Function |
| --- | --- |
| check_template | Check for reference images, decide if analysis is needed |
| analyze_node | Analyze reference image style, layout, color features |
| enhance_node | Enhance prompt based on analysis and evaluation feedback |
| generate_node | Concurrent image generation API calls with streaming progress |
| evaluate_node | 6-dimension image quality evaluation and ranking |
| should_continue | Check if quality threshold or round limit is reached |
| prepare_next | Prepare next iteration (extract feedback, increment round) |
| finalize_node | Output Top-K final results |
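
The control flow of the generate/evaluate loop can be sketched in plain Python, without LangGraph, to make the stopping rule concrete. This is a simplified illustration under stated assumptions: the state keys, the `generate` and `evaluate` callables, and the choice to compare the best score so far against the threshold are all hypothetical, and the template analysis and prompt enhancement nodes are omitted.

```python
from typing import Callable


def should_continue(state: dict) -> str:
    """Route to 'done' when the threshold is met or the round limit is hit."""
    if state["best_score"] >= state["quality_threshold"]:
        return "done"
    if state["round"] >= state["max_rounds"]:
        return "done"
    return "continue"


def run_workflow(
    prompt: str,
    generate: Callable[[str, int, int], list[str]],
    evaluate: Callable[[str], float],
    *,
    max_rounds: int = 3,
    images_per_round: int = 5,
    quality_threshold: float = 0.7,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    state = {
        "round": 1,
        "max_rounds": max_rounds,
        "quality_threshold": quality_threshold,
        "best_score": 0.0,
        "scored": [],
    }
    while True:
        # generate_node: produce one batch of candidates for this round.
        images = generate(prompt, images_per_round, state["round"])
        # evaluate_node: score every candidate and remember the best score.
        state["scored"].extend((image, evaluate(image)) for image in images)
        state["best_score"] = max(score for _, score in state["scored"])
        if should_continue(state) == "done":
            break
        # prepare_next: move on to the next round.
        state["round"] += 1
    # finalize_node: keep the Top-K candidates across all rounds.
    ranked = sorted(state["scored"], key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

In a LangGraph version, `should_continue` would typically serve as the routing function of a conditional edge after the evaluation node, with its return values mapped to the next node names.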

API Endpoints

Generation

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/generation/ | Create generation task |
| GET | /api/v1/generation/{task_id} | Get task status and progress |
| GET | /api/v1/generation/{task_id}/result | Get full generation result |
| POST | /api/v1/generation/{task_id}/images/{image_id}/edit | Edit a generated image |
| POST | /api/v1/generation/{task_id}/cancel | Cancel task |
| DELETE | /api/v1/generation/{task_id} | Delete task |
| GET | /api/v1/generation/ | List tasks (paginated) |
| GET | /api/v1/generation/history/list | Task history (paginated) |
| GET | /api/v1/generation/provider/capabilities | Get provider capabilities |
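
As a rough illustration of calling the generation endpoints from a script, the sketch below builds the requests with the standard library only. The request body field `prompt` and the `Authorization: Bearer` header are assumptions (the README documents the routes and JWT tokens, but not the request schema); check http://localhost:8019/docs for the actual contract.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8019/api/v1"


def create_task_request(prompt: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /generation/ (body schema is assumed)."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generation/",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def task_status_request(task_id: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for /generation/{task_id}."""
    return urllib.request.Request(
        f"{base_url}/generation/{task_id}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )
```

With the backend running, a request can be sent with `urllib.request.urlopen(req)` and the response parsed with `json.load`; the response fields are not specified here and should be read from the API documentation.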

Authentication

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/auth/register | Email registration |
| POST | /api/v1/auth/login | Email login |
| POST | /api/v1/auth/logout | Logout |
| POST | /api/v1/auth/refresh | Refresh token |
| GET | /api/v1/auth/me | Get current user |
| PUT | /api/v1/auth/profile | Update user profile |
| POST | /api/v1/auth/verify-email | Verify email |
| POST | /api/v1/auth/forgot-password | Request password reset |
| POST | /api/v1/auth/reset-password | Reset password |
| GET | /api/v1/auth/oauth/{provider}/authorize | OAuth authorization URL |
| POST | /api/v1/auth/oauth/{provider}/callback | OAuth callback |

Template Management

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/templates/ | Create template |
| GET | /api/v1/templates/ | List templates (search, category, paginated) |
| GET | /api/v1/templates/{template_id} | Get template |
| PUT | /api/v1/templates/{template_id} | Update template |
| DELETE | /api/v1/templates/{template_id} | Delete template |
| POST | /api/v1/templates/{template_id}/generate | Generate from template |

WebSocket

| Endpoint | Description |
| --- | --- |
| ws://localhost:8019/ws/progress/{task_id} | Real-time generation progress |
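
A progress listener for this endpoint might look like the sketch below. It assumes the third-party `websockets` package is installed and simply prints each raw message, because the message format is not documented here; `progress_url` and `watch_progress` are hypothetical helper names.

```python
def progress_url(task_id: str, host: str = "localhost:8019") -> str:
    """Build the progress WebSocket URL for a task."""
    return f"ws://{host}/ws/progress/{task_id}"


async def watch_progress(task_id: str) -> None:
    """Print every progress message until the server closes the connection."""
    # Imported lazily so the URL helper works without the dependency.
    import websockets  # third-party: pip install websockets

    async with websockets.connect(progress_url(task_id)) as ws:
        async for message in ws:
            # The payload schema is not specified in this README; print it raw.
            print(message)
```

To run it against a live backend, call `asyncio.run(watch_progress(task_id))` with a task ID returned by the generation endpoint.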

Full API documentation: http://localhost:8019/docs

Image Generation Providers

| Provider | Modes | Description |
| --- | --- | --- |
| Gemini | Text-to-image + Image editing | Google Gemini official SDK |
| Jiekouai | Text-to-image + Image editing | Gemini API reverse proxy |
| Antigravity | Text-to-image only | OpenAI-compatible reverse proxy |

Switch via DEFAULT_PROVIDER environment variable, or specify provider per request.

When using a text-to-image-only provider, the frontend automatically hides the image editing mode.
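
The provider table translates naturally into a capability lookup, which is the kind of check a client could use to decide whether to offer the editing mode. The mapping below is transcribed from the table; the names `CAPABILITIES`, `supports`, and `available_modes` are illustrative and do not reflect the response format of the /provider/capabilities endpoint.

```python
# Modes per provider, taken from the provider table above.
CAPABILITIES: dict[str, frozenset[str]] = {
    "gemini": frozenset({"text_to_image", "image_edit"}),
    "jiekouai": frozenset({"text_to_image", "image_edit"}),
    "antigravity": frozenset({"text_to_image"}),
}


def supports(provider: str, mode: str) -> bool:
    """Return True if the provider offers the given mode."""
    try:
        return mode in CAPABILITIES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


def available_modes(provider: str) -> list[str]:
    """List the modes a UI could show for this provider, sorted for stability."""
    if provider not in CAPABILITIES:
        raise ValueError(f"unknown provider: {provider}")
    return sorted(CAPABILITIES[provider])
```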

Environment Variables

| Variable | Required | Description | Default |
| --- | --- | --- | --- |
| DEFAULT_PROVIDER | | Default image generation provider | gemini |
| GEMINI_API_KEY | * | Gemini API key | - |
| GEMINI_MODEL | | Gemini image generation model | gemini-3-pro-image-preview |
| GEMINI_LLM_MODEL | | Gemini evaluation LLM model | gemini-3-flash-preview |
| JIEKOUAI_API_KEY | * | Jiekouai API key | - |
| JIEKOUAI_API_BASE_URL | | Jiekouai API base URL | https://api.jiekou.ai/v3 |
| JIEKOUAI_LLM_BASE_URL | | Jiekouai LLM base URL | https://api.jiekou.ai/openai |
| JIEKOUAI_LLM_MODEL | | Jiekouai LLM model | gpt-5-mini |
| ANTIGRAVITY_API_KEY | * | Antigravity API key | - |
| ANTIGRAVITY_API_BASE_URL | | Antigravity API base URL | http://127.0.0.1:8045/v1 |
| ANTIGRAVITY_MODEL | | Antigravity model | gemini-3-pro-image |
| DEFAULT_ASPECT_RATIO | | Default aspect ratio | 16:9 |
| DEFAULT_IMAGE_SIZE | | Default image size | 2K |
| DEFAULT_IMAGES_PER_ROUND | | Images per round | 5 |
| DEFAULT_MAX_ROUNDS | | Max rounds | 3 |
| DEFAULT_QUALITY_THRESHOLD | | Quality threshold | 0.7 |
| STORAGE_TYPE | | Storage type | local |
| STORAGE_PATH | | Local storage path | ./storage/images |
| FRONTEND_URL | | Frontend URL (CORS) | http://localhost:3019 |
| DB_PATH | | SQLite database path | ./data/pictactic.db |
| VISION_LLM_MODEL | | Vision model for evaluation | gpt-4o |
| JWT_SECRET_KEY | | JWT signing key (change in production) | change-this-... |
| JWT_ACCESS_TOKEN_EXPIRE_MINUTES | | Access token expiry (minutes) | 30 |
| JWT_REFRESH_TOKEN_EXPIRE_DAYS | | Refresh token expiry (days) | 30 |
| GITHUB_CLIENT_ID | | GitHub OAuth Client ID | - |
| GITHUB_CLIENT_SECRET | | GitHub OAuth Client Secret | - |
| GOOGLE_CLIENT_ID | | Google OAuth Client ID | - |
| GOOGLE_CLIENT_SECRET | | Google OAuth Client Secret | - |
| SMTP_HOST | | SMTP mail host | - |
| SMTP_PORT | | SMTP port | 587 |
| SMTP_USER | | SMTP username | - |
| SMTP_PASSWORD | | SMTP password | - |

* Required when using the corresponding provider
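
The "required when using the corresponding provider" rule can be checked up front with a few lines of Python. This is a hedged sketch, not the project's own validation: `check_provider_env` is a hypothetical helper, and it only inspects variables that are already present in the process environment (it does not load the .env file).

```python
import os
from collections.abc import Mapping

# Which API key each provider needs, per the table above.
REQUIRED_KEY = {
    "gemini": "GEMINI_API_KEY",
    "jiekouai": "JIEKOUAI_API_KEY",
    "antigravity": "ANTIGRAVITY_API_KEY",
}


def check_provider_env(env: Mapping[str, str] = os.environ) -> str:
    """Return the selected provider, or raise if its API key is missing."""
    provider = env.get("DEFAULT_PROVIDER", "gemini")  # documented default
    if provider not in REQUIRED_KEY:
        raise ValueError(f"unknown DEFAULT_PROVIDER: {provider}")
    key_name = REQUIRED_KEY[provider]
    if not env.get(key_name):
        raise RuntimeError(f"{key_name} must be set when DEFAULT_PROVIDER={provider}")
    return provider
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.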

Development

Testing (346+ tests)

uv run pytest                    # Run all tests
uv run pytest tests/unit/        # Unit tests
uv run pytest tests/integration/ # Integration tests
uv run pytest tests/test_cli/    # CLI tests

Code Quality

uv run ruff check .              # Lint
uv run ruff format .             # Format
uv run mypy backend/             # Type check
cd frontend && npm run lint      # Frontend ESLint

FAQ

Q: Image quality is inconsistent?

Adjust the configuration:

  • Increase max_rounds for more iterations
  • Increase images_per_round for more candidates per round
  • Raise quality_threshold for stricter selection

Q: How do I use my own API proxy?

Edit the API base URLs in .env:

GEMINI_API_BASE_URL=https://your-proxy.com/v3
OPENAI_API_BASE_URL=https://your-proxy.com/openai

Q: Why does the CLI prompt argument throw an error?

Typer limitation: the prompt argument must come after all options:

# Correct
uv run pictactic generate --format json "your prompt"

# Wrong (throws "Missing argument 'PROMPT'")
uv run pictactic generate "your prompt" --format json
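
When driving the CLI from a script, the ordering rule is easy to enforce by always appending the prompt last. The helper below is a small sketch of that idea; `build_generate_argv` and `run_generate` are hypothetical names, and the option names passed in should be ones the CLI documents (such as `--format` or `--top-k`).

```python
import subprocess


def build_generate_argv(prompt: str, **options: object) -> list[str]:
    """Build the argument list with every option placed before the prompt."""
    argv = ["uv", "run", "pictactic", "generate"]
    for name, value in options.items():
        # Python keyword top_k becomes the CLI flag --top-k.
        argv += [f"--{name.replace('_', '-')}", str(value)]
    argv.append(prompt)  # the prompt must come after all options
    return argv


def run_generate(prompt: str, **options: object) -> str:
    """Run the command and return its raw stdout (requires the project to be installed)."""
    completed = subprocess.run(
        build_generate_argv(prompt, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `build_generate_argv("your prompt", format="json")` yields the same argument order as the "Correct" command above.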

License

MIT License

Contributing

Issues and Pull Requests are welcome!
