SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

SwarmAgentic delivers the first fully automated pipeline for agentic system generation, jointly optimizing from-scratch agent creation, role specialization, and collaboration through language-driven particle swarm optimization (PSO).

Figure 1. Overview of the SwarmAgentic pipeline. Starting from only a task description and objective, the system autonomously generates and optimizes multi-agent architectures through language-driven particle swarm evolution. Each iteration involves execution, LLM-based failure analysis, and targeted refinement, yielding a fully functional and collaboratively optimized system built from scratch.

Overview

Fully automated multi-agent system discovery with language-guided PSO updates.
Self-optimizing agent behaviors and collaboration without predefined templates.
Supports diverse domains, including creative writing, travel planning, scheduling, and reasoning.
Modular design enables rapid evaluation and domain adaptation.

How It Works

SwarmAgentic iteratively discovers agent teams by:

Generating roles and collaboration plans directly from the task specification.
Executing each candidate team on sampled tasks while logging role interactions and scores.
Diagnosing failures with LLM-guided analysis (for example failure_identify.py) to surface missing skills or coordination gaps.
Updating the team via particle swarm optimization, adjusting prompts, and topology.
Selecting the best-performing particle and exporting artifacts (save.jsonl) for reuse or evaluation.

Key modules

Module	Location	Purpose
PSO driver	`*/pso.py`	Runs the optimization loop, logging, and checkpointing.
Failure analysis	`{domain}/prompt/failure_identify.py`	Generates targeted repair plans from execution traces.
Domain runtime	`{domain}/func.py`	Implements task-specific forward execution and tool usage.
Prompts	`{domain}/prompt/`	Stores task instructions, role schemas, and guardrails.
Evaluation	`{domain}/test.py`	Scores generations, writes metrics, and surfaces issues.
data	`{domain}/data/`	Houses datasets.

Setup

conda create -n swarm python=3.11
conda activate swarm
pip install -r requirements.txt

# provide your OpenAI API key
export OPENAI_API_KEY="YOUR KEY HERE"

Running SwarmAgentic

All commands assume you start inside the target domain directory.

Train + evaluate (TravelPlanner)

conda activate swarm
cd travelplanner/swarm
python pso.py --max_iteration 5 --dataset data/train_45.jsonl --ref_info data/train_ref_info.jsonl
python test.py --particle_idx -1

Evaluate an existing particle

cd travelplanner/swarm
python test.py --particle_idx -1

Train (Creative Writing)

cd creative_writing
python pso.py --max_iteration 10 --dataset data/data_100_random_text.txt

Common flags

For pso.py:

--max_iteration: maximum number of PSO iterations.
--settings: temperature settings for each particle (e.g., --settings [0.2 0.4 0.6 0.8 1.0]).
--model: model name to use (default: gpt-4o-mini).
--max_workers: maximum number of worker threads (default: None).
--save_dir: directory to save results.
For TravelPlanner only:
- --resume: restarts optimization from an existing checkpoint.
- --state_idx: chooses which saved state to resume from (-1 for the latest).

For test.py:

--particle_idx: chooses which saved particle to evaluate (-1 grabs the latest entry in save.jsonl).
--model: model name to use for execution (default: gpt-4o-mini).
--eval_model: model name to use for evaluation (default: gpt-4o-mini).

Data and Evaluation

Data layout: Each domain ships with a data/ directory containing train, and test splits. File formats are typically plain text or JSONL with tab-separated fields for prompts and expected responses.
Preparing new data: Drop cleaned files into the domain data/ folder.
Evaluation artifacts: test.py writes metrics to results/<run_name>/metrics.json and stores generations for manual review. During PSO, aggregate scores and trace summaries accumulate in save.jsonl.
Interpreting scores: Creative tasks report 1-10 coherence ratings, planning tasks report constraint satisfaction rates, and reasoning tasks report exact-match accuracy.

Extending to New Domains

You can easily adapt SwarmAgentic to new domains by following these steps:

1. Define Your Task (`prompt/base.py`)

Define the high-level task description that will guide the multi-agent team generation:

TASK_MINI = '''Your domain-specific task description here...'''

FUNCTION_DESCRIPTION = """
Description of what the forward function should do for your task.
The function signature should be 'def forward(team):'.
Define what the function should return.
"""

Example: For creative writing, TASK_MINI describes the narrative coherence task with specific constraints.

2. Implement Domain-Specific Evaluation (`eval.py`)

Create an evaluate() function that scores agent outputs. You'll also need to format task instances in pso.py:

def evaluate(llm, task, response):
    """Evaluate the quality of the agent's response.
    
    Returns:
        (score, problem): score is the evaluation metric,
                         problem describes issues if score is not perfect
    """
    # Implement your domain-specific evaluation logic
    # For creative writing: coherence scoring (1-10)
    # For reasoning tasks: correctness checking (0/1)
    # For planning tasks: constraint satisfaction

The framework will automatically generate specialized multi-agent teams for your task and optimize team structure and coordination through PSO.

3. Register the Domain Pipeline (`pso.py`)

Load prompts and evaluators defined above.
Configure PSO hyperparameters (max_iters, population_size, velocity damping) to match the complexity of your task.
Ensure pso.py exports the artifacts required by test.py (save.jsonl).

4. Prepare Your Dataset

Place your training and test datasets under data/ with consistent formatting (plain text or JSONL).

5. Run Training and Evaluation

Train with PSO: python pso.py --max_iteration 10 --dataset data/your_train.jsonl
After training completes, PSO saves all particles to save.jsonl. Choose the best particle and evaluate: python test.py --particle_idx -1

Safety Considerations

Warning

The code in this repository involves executing untrusted model-generated code. We strongly advise users to be aware of this safety concern. While it is highly unlikely that model-generated code will perform overtly malicious actions in our current settings and with the models we use, such code may still act destructively due to limitations in model capability or alignment. By using this repository, you acknowledge and accept these risks.

Run experiments inside sandboxed containers or restricted user accounts to reduce blast radius.
Scope API keys with least privilege and rotate them periodically.
Inspect artifacts in results/ and logs/ before deploying agent teams in production environments.

Repository Structure

.
├── creative_writing/
│   ├── data/
│   ├── prompt/
│   └── pso.py
├── mgsm/
│   ├── data/
│   ├── prompt/
│   └── pso.py
├── natural_plan/
│   ├── _calendar/
│   ├── _meeting/
│   └── _trip/
├── travelplanner/
│   ├── agents/
│   ├── evaluation/
│   └── swarm/
│     ├── pso.py
├── misc/
│   └── main.png
├── README.md
└── requirements.txt

Add new domain folders following this template to reuse the PSO drivers and evaluation tooling.

Citing

If you find this project useful, please consider citing:

@misc{zhang2025swarmagenticfullyautomatedagentic,
  title={SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence},
  author={Yao Zhang and Chenyang Lin and Shijie Tang and Haokun Chen and Shijie Zhou and Yunpu Ma and Volker Tresp},
  year={2025},
  eprint={2506.15672},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2506.15672},
}

License

SwarmAgentic is released under the Apache 2.0 License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

Table of Contents

Overview

How It Works

Setup

Running SwarmAgentic

Data and Evaluation

Extending to New Domains

1. Define Your Task (`prompt/base.py`)

2. Implement Domain-Specific Evaluation (`eval.py`)

3. Register the Domain Pipeline (`pso.py`)

4. Prepare Your Dataset

5. Run Training and Evaluation

Safety Considerations

Repository Structure

Citing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
creative_writing		creative_writing
mgsm		mgsm
misc		misc
natural_plan		natural_plan
travelplanner		travelplanner
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

License

H2nryHe/SwarmAgenticCode

Folders and files

Latest commit

History

Repository files navigation

SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

Table of Contents

Overview

How It Works

Setup

Running SwarmAgentic

Data and Evaluation

Extending to New Domains

1. Define Your Task (prompt/base.py)

2. Implement Domain-Specific Evaluation (eval.py)

3. Register the Domain Pipeline (pso.py)

4. Prepare Your Dataset

5. Run Training and Evaluation

Safety Considerations

Repository Structure

Citing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

1. Define Your Task (`prompt/base.py`)

2. Implement Domain-Specific Evaluation (`eval.py`)

3. Register the Domain Pipeline (`pso.py`)

Packages