PicTacticAgent

English | 中文

AI-powered cover image generation tool that reduces the randomness of text-to-image models: a LangGraph StateGraph workflow generates several candidates, scores them, and iterates until the results meet a quality threshold.


Screenshots: Login (Email / GitHub / Google), Workspace (History + Config Panel), Generation Result (AI Scoring + Ranking), Template Library (Filter + One-click Generate)

One-click Install for Coding Agents

If you're using Claude Code or similar Coding Agents, let the AI read the installation guide and it will handle the setup automatically. Once installed, it becomes an AI image generation Skill — any Coding Agent can generate and edit images directly via CLI:

# Let AI read the install doc and complete all configuration
@docs/skill-installation.md Follow the guide to install PicTacticAgent Skill

After installation, simply tell the AI "generate a cyberpunk-style cover image" and you're good to go.

Overview

PicTacticAgent uses a LangGraph StateGraph workflow to run a closed "Generate → Evaluate → Iterate" loop. Each round produces a batch of candidate images, scores them, and feeds the evaluation back into the next round, so the best images are selected automatically instead of by manual re-rolling.
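The loop can be pictured with a minimal Python sketch. This is not the project's implementation: `run_loop`, `generate_image`, and `score_image` are hypothetical names, the two callables stand in for the provider call and the vision-model evaluation, and the stop rule ("the best score so far meets the threshold") is an assumption.

```python
from itertools import count
from typing import Callable


def run_loop(
    prompt: str,
    generate_image: Callable[[str], str],
    score_image: Callable[[str], float],
    max_rounds: int = 3,
    images_per_round: int = 5,
    threshold: float = 0.7,
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Generate -> evaluate -> iterate, then return the top_k (score, image) pairs."""
    scored: list[tuple[float, str]] = []
    for _ in range(max_rounds):
        # Generate one batch of candidates, then score each of them.
        batch = [generate_image(prompt) for _ in range(images_per_round)]
        scored.extend((score_image(image), image) for image in batch)
        # Assumed stop rule: the best candidate so far meets the threshold.
        if max(score for score, _ in scored) >= threshold:
            break
    # Keep the highest-scoring candidates across all rounds.
    return sorted(scored, reverse=True)[:top_k]


# Usage with placeholder callables (no real API calls are made).
ids = count()
fake_scores = iter([0.41, 0.55, 0.62, 0.48, 0.73, 0.66])
best = run_loop(
    "a tech-inspired cover image",
    generate_image=lambda prompt: f"candidate-{next(ids)}.png",
    score_image=lambda image: next(fake_scores),
    images_per_round=3,
    top_k=2,
)
print(best)
```

With these placeholder scores the loop stops after the second round, because one candidate in that batch reaches the 0.7 threshold.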

Key Features

StateGraph Workflow — Deterministic LangGraph StateGraph workflow with modular node-based design
Iterative Generation — Configurable 1-10 rounds, 1-10 images per round, auto-stop when quality threshold is met
Auto Evaluation — GPT-4o Vision-based 6-dimension scoring (prompt match, visual appeal, layout, detail quality, technical completeness, portrait consistency)
Image Editing — AI-powered editing of existing images with text instructions
Template Style Analysis — Upload reference images to auto-extract style features and enhance prompts
Prompt Template Library — Save, manage, and reuse prompt templates with category filtering
Real-time Progress — WebSocket push for generation progress, cancel anytime
User Authentication — Email registration/login, JWT access & refresh tokens
OAuth Login — GitHub / Google third-party login support
Email Verification & Password Reset — Full verification and recovery flow
i18n Support — Chinese/English bilingual interface
Dark Theme — Default dark UI with theme toggle
CLI Tool — Generate/edit images from the command line, no Web server needed
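As an illustration of the six-dimension scoring listed under Auto Evaluation, the sketch below combines per-dimension scores into one number and ranks candidates. The dimension keys mirror the names in the feature list, but the equal weighting, the 0 to 1 scale, and the function names are assumptions made for this example; the project's actual aggregation may differ.

```python
from statistics import fmean

# Keys derived from the six dimensions named in the feature list.
DIMENSIONS = (
    "prompt_match",
    "visual_appeal",
    "layout",
    "detail_quality",
    "technical_completeness",
    "portrait_consistency",
)


def overall_score(scores: dict[str, float]) -> float:
    """Average the six dimension scores (each assumed to lie in [0, 1])."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    # Equal weights are an assumption of this sketch.
    return fmean(scores[name] for name in DIMENSIONS)


def rank(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (image name, overall score) pairs, best first."""
    ranked = [(name, overall_score(scores)) for name, scores in candidates.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "a.png": dict.fromkeys(DIMENSIONS, 0.5),
        "b.png": {**dict.fromkeys(DIMENSIONS, 0.5), "layout": 1.0},
    }
    for name, score in rank(candidates):
        print(f"{name}: {score:.2f}")
```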

Tech Stack

Layer	Technology
Agent Framework	LangChain + LangGraph (StateGraph)
Backend	FastAPI + Uvicorn
Database	SQLite + SQLAlchemy (async) + aiosqlite
Frontend	Vite 7 + React 19 + TailwindCSS 4
Image Generation	Gemini API / Jiekouai / Antigravity (OpenAI-compatible)
Image Evaluation	GPT-4o Vision
CLI	Typer + Rich
Package Management	uv (Python) + npm (Frontend)

Requirements

Python 3.11+
uv (recommended) or pip
Node.js 18+

Quick Start

1. Clone the Repository

git clone https://github.com/NanmiCoder/PicTacticAgent.git
cd PicTacticAgent

2. Configure Environment Variables

cp .env.example .env

Edit the .env file and configure API keys for your chosen provider:

We recommend Jiekouai, a proxy service where one API key covers both image generation and LLM evaluation. Sign up and link your GitHub account for $3 free trial credits.

# Option 1: Jiekouai (Recommended)
DEFAULT_PROVIDER=jiekouai
JIEKOUAI_API_KEY=your-jiekouai-api-key
JIEKOUAI_API_BASE_URL=https://api.jiekou.ai/v3
JIEKOUAI_LLM_BASE_URL=https://api.jiekou.ai/openai
JIEKOUAI_LLM_MODEL=gpt-5-mini

# Option 2: Official Gemini
DEFAULT_PROVIDER=gemini
GEMINI_API_KEY=your-google-gemini-api-key
GEMINI_MODEL=gemini-3-pro-image-preview
GEMINI_LLM_MODEL=gemini-3-flash-preview

# Option 3: Antigravity (Text-to-image only)
DEFAULT_PROVIDER=antigravity
ANTIGRAVITY_API_KEY=your-antigravity-api-key
ANTIGRAVITY_API_BASE_URL=http://127.0.0.1:8045/v1
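Only one of the three options should be active at a time, and each provider needs its own API key. A small standalone check like the following can catch a mismatch before starting the services. It is a sketch, not part of the project (which loads settings through pydantic-settings); the function name `check_provider_env` is hypothetical, while the variable names and the `gemini` default come from the Environment Variables table.

```python
import os
from collections.abc import Mapping

# API key variable required by each provider.
REQUIRED_KEY = {
    "gemini": "GEMINI_API_KEY",
    "jiekouai": "JIEKOUAI_API_KEY",
    "antigravity": "ANTIGRAVITY_API_KEY",
}


def check_provider_env(env: Mapping[str, str] = os.environ) -> str:
    """Return the selected provider, or raise if its API key is missing."""
    provider = env.get("DEFAULT_PROVIDER", "gemini")
    if provider not in REQUIRED_KEY:
        raise ValueError(f"unknown provider: {provider!r}")
    key_name = REQUIRED_KEY[provider]
    if not env.get(key_name):
        raise ValueError(f"{key_name} must be set when DEFAULT_PROVIDER={provider}")
    return provider


# Example with an explicit mapping instead of the real environment.
print(check_provider_env({
    "DEFAULT_PROVIDER": "jiekouai",
    "JIEKOUAI_API_KEY": "your-jiekouai-api-key",
}))
```

Calling `check_provider_env()` with no argument checks the current process environment. It does not read the `.env` file itself.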

3. Start the Services

Option A: One-click Start (Recommended)

./start.sh

Automatically installs dependencies and starts both frontend and backend.

Option B: Manual Start

Install backend dependencies and start:

uv sync
uv run uvicorn backend.src.pictactic.api.app:app --reload --port 8019

Install frontend dependencies and start (new terminal):

cd frontend
npm install
npm run dev

Option C: Docker Deployment

cd docker
docker-compose up -d

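Access URLs (defaults used elsewhere in this README; adjust them if you changed the ports):

Frontend: http://localhost:3019
Backend API: http://localhost:8019
API docs: http://localhost:8019/docs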

Usage

Web Interface

1. Register an account or log in with OAuth
2. Describe the cover image you want in the input box
3. (Optional) Upload 1-5 reference images as style templates
4. (Optional) Adjust aspect ratio, image size, rounds, images per round, quality threshold, etc.
5. Click "Generate" to start and watch progress in real time
6. Select the best images from the results, then download them or save them as a template

CLI

Generate images directly from the terminal without starting the Web server:

# Generate images (prompt must come AFTER all options)
uv run pictactic generate "a tech-inspired cover image"

# Generate with parameters
uv run pictactic generate \
  --rounds 3 --images 5 --top-k 3 --threshold 0.7 \
  --aspect-ratio 16:9 --size 2K --provider gemini \
  "a tech-inspired cover image"

# Edit an existing image
uv run pictactic edit --source ./image.png "change the background to dark blue"

# JSON output (for scripting)
uv run pictactic generate --format json "your prompt"

# List available providers
uv run pictactic providers

# Check task status (requires backend)
uv run pictactic status <task_id>
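When driving the CLI from a script, it helps to build the argument list programmatically so the prompt always ends up last. The helper below is a hypothetical sketch (the name `build_generate_command` is not part of the project); it only uses the flags shown in the examples above.

```python
import shlex


def build_generate_command(prompt: str, **options: object) -> list[str]:
    """Build a `pictactic generate` argv with the prompt as the final item."""
    argv = ["uv", "run", "pictactic", "generate"]
    for name, value in options.items():
        # rounds=3 -> --rounds 3, aspect_ratio="16:9" -> --aspect-ratio 16:9
        argv += [f"--{name.replace('_', '-')}", str(value)]
    # The prompt must come after all options.
    argv.append(prompt)
    return argv


cmd = build_generate_command(
    "a tech-inspired cover image",
    rounds=3,
    images=5,
    format="json",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. With `--format json`, the standard output should be parseable with `json.loads`, although the exact fields of that JSON are not documented here.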

Configuration

Parameter	Description	Default	Range
Max Rounds	Maximum generation iterations	3	1-10
Images per Round	Candidate images per round	5	1-10
Quality Threshold	Score threshold to stop iterating	0.7	0.5-1.0
Top-K	Number of final output images	3	1-10
Aspect Ratio	Image aspect ratio	16:9	1:1, 16:9, 9:16, 4:3, 3:4
Image Size	Output image size	2K	1K, 2K, 4K
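The defaults and ranges in the table can be captured in a small validated structure. This is a sketch for illustration only: the class name `GenerationConfig` and its field names are assumptions and may not match the project's own Pydantic models.

```python
from dataclasses import dataclass

ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4"}
IMAGE_SIZES = {"1K", "2K", "4K"}


@dataclass(frozen=True)
class GenerationConfig:
    """Defaults and allowed ranges taken from the Configuration table."""

    max_rounds: int = 3
    images_per_round: int = 5
    quality_threshold: float = 0.7
    top_k: int = 3
    aspect_ratio: str = "16:9"
    image_size: str = "2K"

    def __post_init__(self) -> None:
        for name in ("max_rounds", "images_per_round", "top_k"):
            if not 1 <= getattr(self, name) <= 10:
                raise ValueError(f"{name} must be between 1 and 10")
        if not 0.5 <= self.quality_threshold <= 1.0:
            raise ValueError("quality_threshold must be between 0.5 and 1.0")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.image_size not in IMAGE_SIZES:
            raise ValueError(f"unsupported image size: {self.image_size}")

    @property
    def max_images(self) -> int:
        """Upper bound on generated images if no round meets the threshold."""
        return self.max_rounds * self.images_per_round


print(GenerationConfig().max_images)
```

The `max_images` property makes the cost trade-off explicit: with the defaults, a run generates at most 3 × 5 = 15 candidate images.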

Project Structure

PicTacticAgent/
├── backend/src/pictactic/     # Python backend
│   ├── agents/                # LangGraph StateGraph workflow
│   │   ├── workflow.py        # Workflow definition & orchestration
│   │   ├── conditions.py      # Conditional edge functions
│   │   ├── state.py           # GenerationState shared state
│   │   ├── prompts.py         # LLM prompt templates
│   │   ├── nodes/             # Workflow nodes
│   │   │   ├── analyze_node.py    # Template analysis node
│   │   │   ├── enhance_node.py    # Prompt enhancement node
│   │   │   ├── generate_node.py   # Image generation node
│   │   │   ├── evaluate_node.py   # Image evaluation node
│   │   │   ├── prepare_next_node.py # Next round preparation
│   │   │   └── finalize_node.py   # Final output node (Top-K)
│   │   └── tools/             # Node utility functions
│   ├── api/                   # FastAPI application
│   │   ├── app.py             # Main app (CORS, middleware)
│   │   ├── routes/            # API routes
│   │   │   ├── auth.py        # Authentication routes
│   │   │   ├── generation.py  # Generation routes
│   │   │   ├── templates.py   # Template management routes
│   │   │   └── health.py      # Health check
│   │   ├── dependencies.py    # Auth dependency injection
│   │   └── websocket.py       # WebSocket real-time progress
│   ├── cli/                   # CLI tool
│   │   ├── main.py            # Typer entry point
│   │   ├── generate.py        # generate command
│   │   ├── edit.py            # edit command
│   │   ├── providers.py       # providers command
│   │   ├── status.py          # status command
│   │   └── output.py          # Output formatting (text/json/quiet)
│   ├── providers/             # Image generation providers
│   │   ├── base.py            # ImageProvider ABC + data models
│   │   ├── gemini.py          # Gemini official SDK
│   │   ├── jiekouai.py        # Jiekouai reverse proxy
│   │   └── antigravity.py     # Antigravity (OpenAI-compatible)
│   ├── core/                  # Core configuration
│   │   └── config.py          # pydantic-settings config
│   ├── db/                    # Database layer
│   │   ├── engine.py          # SQLAlchemy async engine
│   │   ├── models.py          # Data models
│   │   ├── repository.py      # Task repository
│   │   └── template_repository.py # Template repository
│   ├── services/              # Business logic services
│   │   ├── auth_service.py    # Auth logic (JWT + bcrypt)
│   │   ├── generation_service.py  # Generation task management
│   │   ├── template_service.py    # Template management
│   │   ├── email_service.py   # Email sending
│   │   └── oauth_client.py    # OAuth client
│   ├── i18n/                  # Internationalization
│   └── models/                # Pydantic data models
│
├── frontend/src/              # React 19 SPA
│   ├── components/            # React components
│   │   ├── auth/              # Auth components (AuthShell, OAuthButtons)
│   │   ├── gallery/           # Image gallery (ImageCard, ImageModal, MasonryGrid, ImageEditDialog)
│   │   ├── generation/        # Generation panel (PromptInput, ConfigPanel, FloatingControlPanel, ProgressDisplay)
│   │   ├── templates/         # Template components (TemplateCard, TemplateList, TemplateDialog)
│   │   ├── layout/            # Layout (Header, TaskSidebar, TaskHeader)
│   │   ├── dialogs/           # Dialogs (SettingsDialog)
│   │   └── ui/                # Shared UI (Select, Toaster)
│   ├── hooks/                 # Custom hooks
│   │   ├── useGeneration.js   # Generation task state
│   │   ├── useAuth.jsx        # Auth state
│   │   ├── useLocale.jsx      # i18n
│   │   ├── useTheme.js        # Theme toggle
│   │   ├── useTemplates.js    # Template management
│   │   ├── useTaskHistory.js  # Task history
│   │   └── useImageEdit.js    # Image editing
│   ├── lib/                   # Utilities & API client
│   ├── pages/                 # Pages
│   │   ├── LandingPage.jsx    # Landing page
│   │   ├── LoginPage.jsx      # Login
│   │   ├── RegisterPage.jsx   # Register
│   │   ├── OAuthCallback.jsx  # OAuth callback
│   │   ├── VerifyEmailPage.jsx # Email verification
│   │   ├── ForgotPasswordPage.jsx # Forgot password
│   │   ├── ResetPasswordPage.jsx  # Reset password
│   │   └── TemplatesPage.jsx  # Template library page
│   └── locales/               # i18n language packs (zh/en)
│
├── docker/                    # Docker configuration
│   ├── docker-compose.yml
│   ├── Dockerfile
│   └── Dockerfile.frontend
│
├── docs/                      # Documentation
│   ├── PRD.md                 # Product requirements
│   ├── TECHNICAL_DESIGN.md    # Technical design
│   └── screenshots/           # Screenshots
│
├── tests/                     # Tests
│   ├── unit/                  # Unit tests
│   ├── integration/           # Integration tests
│   └── e2e/                   # End-to-end tests
│
├── .env.example               # Environment variable template
├── start.sh                   # One-click start script
├── pyproject.toml             # Python project config
└── README.md                  # This file

Workflow

User Input
    │
    ▼
[check_template] ─── Has reference ──→ [analyze_node]
    │                                        │
    │ No reference                           │
    ▼                                        ▼
[enhance_node] ◄────────────────────────────┘
    │
    ▼
[generate_node] ←──────────────────────┐
    │                                    │
    ▼                                    │
[evaluate_node]                          │
    │                                    │
    ▼                                    │
[should_continue] ── Continue ──→ [prepare_next]
    │
    │ Done
    │
    ▼
[finalize_node]
    │
    ▼
Output Top-K Best Images

Node Description

Node	Function
`check_template`	Check for reference images, decide if analysis is needed
`analyze_node`	Analyze reference image style, layout, color features
`enhance_node`	Enhance prompt based on analysis and evaluation feedback
`generate_node`	Concurrent image generation API calls with streaming progress
`evaluate_node`	6-dimension image quality evaluation and ranking
`should_continue`	Check if quality threshold or round limit is reached
`prepare_next`	Prepare next iteration (extract feedback, increment round)
`finalize_node`	Output Top-K final results
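The two decision points, `check_template` and `should_continue`, are routing functions: they inspect the shared state and return the name of the next node. A minimal sketch in plain Python follows. The state fields shown here are hypothetical (the real `GenerationState` in `state.py` may use different names), and the exact stop rule is an assumption based on the table above.

```python
from typing import TypedDict


class GenerationState(TypedDict):
    # Hypothetical fields for illustration only.
    reference_images: list[str]
    current_round: int
    max_rounds: int
    best_score: float
    quality_threshold: float


def check_template(state: GenerationState) -> str:
    """Route to style analysis only when reference images were uploaded."""
    return "analyze_node" if state["reference_images"] else "enhance_node"


def should_continue(state: GenerationState) -> str:
    """Stop when the quality threshold or the round limit is reached."""
    if state["best_score"] >= state["quality_threshold"]:
        return "finalize_node"
    if state["current_round"] >= state["max_rounds"]:
        return "finalize_node"
    return "prepare_next"


state: GenerationState = {
    "reference_images": [],
    "current_round": 1,
    "max_rounds": 3,
    "best_score": 0.62,
    "quality_threshold": 0.7,
}
print(check_template(state), should_continue(state))
```

In LangGraph, functions of this shape are typically attached to a graph with `StateGraph.add_conditional_edges`, which maps each returned name to a target node.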

API Endpoints

Generation

Method	Endpoint	Description
POST	`/api/v1/generation/`	Create generation task
GET	`/api/v1/generation/{task_id}`	Get task status and progress
GET	`/api/v1/generation/{task_id}/result`	Get full generation result
POST	`/api/v1/generation/{task_id}/images/{image_id}/edit`	Edit a generated image
POST	`/api/v1/generation/{task_id}/cancel`	Cancel task
DELETE	`/api/v1/generation/{task_id}`	Delete task
GET	`/api/v1/generation/`	List tasks (paginated)
GET	`/api/v1/generation/history/list`	Task history (paginated)
GET	`/api/v1/generation/provider/capabilities`	Get provider capabilities
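A client can talk to these endpoints with any HTTP library. The sketch below builds the requests with the Python standard library without sending them. Several details are assumptions rather than documented behavior: the request body is shown with a single `prompt` field, and the JWT access token is assumed to travel in an `Authorization: Bearer` header. Check http://localhost:8019/docs for the real schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8019"


def _headers(token: str) -> dict[str, str]:
    # Assumption: the access token is sent as a Bearer token.
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


def create_task_request(prompt: str, token: str) -> urllib.request.Request:
    """Build the POST request that creates a generation task."""
    # Assumption: the body carries the prompt under a "prompt" key.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/generation/",
        data=body,
        method="POST",
        headers=_headers(token),
    )


def task_status_request(task_id: str, token: str) -> urllib.request.Request:
    """Build the GET request that reads task status and progress."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/generation/{task_id}",
        method="GET",
        headers=_headers(token),
    )


request = create_task_request("a tech-inspired cover image", "your-access-token")
print(request.get_method(), request.full_url)
```

To send a request, pass it to `urllib.request.urlopen(request)` while the backend is running and decode the response body as JSON.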

Authentication

Method	Endpoint	Description
POST	`/api/v1/auth/register`	Email registration
POST	`/api/v1/auth/login`	Email login
POST	`/api/v1/auth/logout`	Logout
POST	`/api/v1/auth/refresh`	Refresh token
GET	`/api/v1/auth/me`	Get current user
PUT	`/api/v1/auth/profile`	Update user profile
POST	`/api/v1/auth/verify-email`	Verify email
POST	`/api/v1/auth/forgot-password`	Request password reset
POST	`/api/v1/auth/reset-password`	Reset password
GET	`/api/v1/auth/oauth/{provider}/authorize`	OAuth authorization URL
POST	`/api/v1/auth/oauth/{provider}/callback`	OAuth callback

Template Management

Method	Endpoint	Description
POST	`/api/v1/templates/`	Create template
GET	`/api/v1/templates/`	List templates (search, category, paginated)
GET	`/api/v1/templates/{template_id}`	Get template
PUT	`/api/v1/templates/{template_id}`	Update template
DELETE	`/api/v1/templates/{template_id}`	Delete template
POST	`/api/v1/templates/{template_id}/generate`	Generate from template

WebSocket

Endpoint	Description
`ws://localhost:8019/ws/progress/{task_id}`	Real-time generation progress

Full API documentation: http://localhost:8019/docs

Image Generation Providers

Provider	Modes	Description
Gemini	Text-to-image + Image editing	Google Gemini official SDK
Jiekouai	Text-to-image + Image editing	Gemini API reverse proxy
Antigravity	Text-to-image only	OpenAI-compatible reverse proxy

Switch providers via the DEFAULT_PROVIDER environment variable, or specify a provider per request.

When using a text-to-image-only provider, the frontend automatically hides the image editing mode.
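One way to model this capability split is an abstract base class with a flag that callers can query. The sketch below borrows the `ImageProvider` name from `providers/base.py`, but everything else (method names, the `supports_editing` flag, the demo subclasses) is hypothetical and not taken from the project's code.

```python
from abc import ABC, abstractmethod


class ImageProvider(ABC):
    """Hypothetical provider interface for illustration."""

    name: str = "base"
    supports_editing: bool = False

    @abstractmethod
    def generate(self, prompt: str) -> bytes:
        """Return image bytes for a text prompt."""

    def edit(self, image: bytes, instruction: str) -> bytes:
        # Text-to-image-only providers inherit this default.
        raise NotImplementedError(f"{self.name} does not support image editing")


class TextOnlyDemo(ImageProvider):
    name = "text-only-demo"

    def generate(self, prompt: str) -> bytes:
        return b"image:" + prompt.encode("utf-8")


class EditingDemo(ImageProvider):
    name = "editing-demo"
    supports_editing = True

    def generate(self, prompt: str) -> bytes:
        return b"image:" + prompt.encode("utf-8")

    def edit(self, image: bytes, instruction: str) -> bytes:
        return image + b"|" + instruction.encode("utf-8")


def available_modes(provider: ImageProvider) -> list[str]:
    """List the modes a UI could offer for this provider."""
    modes = ["text-to-image"]
    if provider.supports_editing:
        modes.append("image-editing")
    return modes


print(available_modes(TextOnlyDemo()), available_modes(EditingDemo()))
```

A frontend can then call something like the provider capabilities endpoint and hide the editing mode whenever `image-editing` is absent.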

Environment Variables

Variable	Required	Description	Default
`DEFAULT_PROVIDER`		Default image generation provider	`gemini`
`GEMINI_API_KEY`	*	Gemini API key	-
`GEMINI_MODEL`		Gemini image generation model	`gemini-3-pro-image-preview`
`GEMINI_LLM_MODEL`		Gemini evaluation LLM model	`gemini-3-flash-preview`
`JIEKOUAI_API_KEY`	*	Jiekouai API key	-
`JIEKOUAI_API_BASE_URL`		Jiekouai API base URL	`https://api.jiekou.ai/v3`
`JIEKOUAI_LLM_BASE_URL`		Jiekouai LLM base URL	`https://api.jiekou.ai/openai`
`JIEKOUAI_LLM_MODEL`		Jiekouai LLM model	`gpt-5-mini`
`ANTIGRAVITY_API_KEY`	*	Antigravity API key	-
`ANTIGRAVITY_API_BASE_URL`		Antigravity API base URL	`http://127.0.0.1:8045/v1`
`ANTIGRAVITY_MODEL`		Antigravity model	`gemini-3-pro-image`
`DEFAULT_ASPECT_RATIO`		Default aspect ratio	`16:9`
`DEFAULT_IMAGE_SIZE`		Default image size	`2K`
`DEFAULT_IMAGES_PER_ROUND`		Images per round	`5`
`DEFAULT_MAX_ROUNDS`		Max rounds	`3`
`DEFAULT_QUALITY_THRESHOLD`		Quality threshold	`0.7`
`STORAGE_TYPE`		Storage type	`local`
`STORAGE_PATH`		Local storage path	`./storage/images`
`FRONTEND_URL`		Frontend URL (CORS)	`http://localhost:3019`
`DB_PATH`		SQLite database path	`./data/pictactic.db`
`VISION_LLM_MODEL`		Vision model for evaluation	`gpt-4o`
`JWT_SECRET_KEY`		JWT signing key (change in production)	`change-this-...`
`JWT_ACCESS_TOKEN_EXPIRE_MINUTES`		Access token expiry (minutes)	`30`
`JWT_REFRESH_TOKEN_EXPIRE_DAYS`		Refresh token expiry (days)	`30`
`GITHUB_CLIENT_ID`		GitHub OAuth Client ID	-
`GITHUB_CLIENT_SECRET`		GitHub OAuth Client Secret	-
`GOOGLE_CLIENT_ID`		Google OAuth Client ID	-
`GOOGLE_CLIENT_SECRET`		Google OAuth Client Secret	-
`SMTP_HOST`		SMTP mail host	-
`SMTP_PORT`		SMTP port	`587`
`SMTP_USER`		SMTP username	-
`SMTP_PASSWORD`		SMTP password	-

* Required when using the corresponding provider

Development

Testing (346+ tests)

uv run pytest                    # Run all tests
uv run pytest tests/unit/        # Unit tests
uv run pytest tests/integration/ # Integration tests
uv run pytest tests/test_cli/    # CLI tests

Code Quality

uv run ruff check .              # Lint
uv run ruff format .             # Format
uv run mypy backend/             # Type check
cd frontend && npm run lint      # Frontend ESLint

FAQ

Q: Image quality is inconsistent?

Adjust the configuration:

Increase max_rounds for more iterations
Increase images_per_round for more candidates per round
Raise quality_threshold for stricter selection

Q: How to use my own API proxy?

Edit the API base URLs in .env:

GEMINI_API_BASE_URL=https://your-proxy.com/v3
OPENAI_API_BASE_URL=https://your-proxy.com/openai

Q: Why does the CLI prompt argument throw an error?

Typer limitation: the prompt argument must come after all options:

# Correct
uv run pictactic generate --format json "your prompt"

# Wrong (throws "Missing argument 'PROMPT'")
uv run pictactic generate "your prompt" --format json

License

MIT License

Contributing

Issues and Pull Requests are welcome!
