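FactualBench is a large-scale Chinese factual QA dataset introduced in the EMNLP 2025 Findings paper "Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization". The dataset contains 180,504 QA pairs (177,504 for training and 3,000 for testing) spanning 20 named domains plus an "others" category.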
Compared to earlier versions, we further refined the test split by:
- Deduplicating questions that overlap with the training set
- Removing low-quality or ambiguous samples
- Applying time constraints to time-sensitive questions.
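The deduplication step above can be illustrated with a minimal sketch. It is not the procedure used to build the released split: the exact-match-after-normalization rule and the helper names `normalize_question` and `drop_train_overlap` are assumptions made for illustration only.

```python
import unicodedata


def normalize_question(text: str) -> str:
    """NFKC-normalize, lowercase, and keep only letters and digits.

    Assumption: two questions count as duplicates when they are identical
    after this normalization. The real pipeline may use a looser criterion.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum())


def drop_train_overlap(test_questions, train_questions):
    """Return the test questions whose normalized form is absent from training."""
    seen = {normalize_question(q) for q in train_questions}
    return [q for q in test_questions if normalize_question(q) not in seen]
```

Normalizing before comparison means that differences in punctuation width or spacing (for example `?` versus `?`) do not hide an overlap.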
.
├── FactualBench_train_v2.jsonl # Training split
├── FactualBench_test_v2.jsonl # Test split
├── evaluation_prompt.txt # Prompt for model-based evaluation
└── evaluate.py # Script to compute accuracy
We adopt a model-based judgment strategy for evaluation, and gpt-4-0125 is used as the automatic evaluator in our paper.
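Once the judge model has labeled each response, accuracy is a simple ratio. The sketch below is not the logic of `evaluate.py`, whose interface is not described here; it assumes the verdicts have already been reduced to hypothetical `(domain, is_correct)` pairs.

```python
from collections import defaultdict


def accuracy_by_domain(judgments):
    """Compute overall and per-domain accuracy.

    judgments: iterable of (domain, is_correct) pairs, where is_correct is
    the judge model's verdict as a bool (a hypothetical input format).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for domain, is_correct in judgments:
        totals[domain] += 1
        correct[domain] += int(is_correct)
    per_domain = {d: correct[d] / totals[d] for d in totals}
    n = sum(totals.values())
    overall = sum(correct.values()) / n if n else 0.0
    return overall, per_domain
```

Reporting per-domain accuracy alongside the overall figure is useful here because the test split is roughly balanced across domains while the training split is not.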
| Domain | Chinese name | Test | Training | Total |
|---|---|---|---|---|
| film & entertainment | 影视娱乐 | 191 | 54,433 | 54,624 |
| education & training | 教育培训 | 147 | 3,702 | 3,849 |
| physics, chemistry & mathematics & biology | 数理化生 | 178 | 9,171 | 9,349 |
| history & traditional culture | 历史国学 | 186 | 18,086 | 18,272 |
| biography | 人物百科 | 190 | 11,829 | 12,019 |
| politics & law | 政治法律 | 155 | 6,354 | 6,509 |
| economics & management | 经济管理 | 141 | 4,537 | 4,678 |
| computer science | 计算机科学 | 146 | 6,247 | 6,393 |
| medical | 医学 | 128 | 7,057 | 7,185 |
| sociology & humanity | 社会人文 | 187 | 8,494 | 8,681 |
| agriculture, forestry & fisheries & allied industries | 农林牧渔 | 138 | 3,725 | 3,863 |
| astronomy & geography | 天文地理 | 151 | 3,887 | 4,038 |
| sports & tourism | 运动旅游 | 143 | 4,867 | 5,010 |
| digital & automotive | 数码汽车 | 159 | 3,881 | 4,040 |
| industrial engineering | 工业工程 | 149 | 3,279 | 3,428 |
| military & war | 军武战争 | 142 | 2,568 | 2,710 |
| slang & memes | 网词网梗 | 104 | 529 | 633 |
| work & life | 工作生活 | 131 | 5,849 | 5,980 |
| high technology | 高新科技 | 112 | 310 | 422 |
| religion & culture | 信仰文化 | 122 | 508 | 630 |
| others | 其他 | - | 18,191 | 18,191 |
| Total | - | 3,000 | 177,504 | 180,504 |
Each sample in FactualBench consists of a question, a standard answer, three wrong answers, and a domain.
| Field | Content |
|---|---|
| Question | 第一台微波量子放大器是在哪一年制成的? In which year was the first microwave quantum amplifier made? |
| Standard Answer | 第一台微波量子放大器是在1954年制成的。 The first microwave quantum amplifier was made in 1954. |
| Wrong Answer | 第一台微波量子放大器是在1958年制成的。 The first microwave quantum amplifier was made in 1958. |
| Wrong Answer | 第一台微波量子放大器是在1960年制成的。 The first microwave quantum amplifier was made in 1960. |
| Wrong Answer | 第一台微波量子放大器是在1962年制成的。 The first microwave quantum amplifier was made in 1962. |
| Domain | 高新科技 high technology |
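A sample with this structure could be read from one line of a `.jsonl` file roughly as follows. The key names `question`, `standard_answer`, `wrong_answers`, and `domain` are assumptions chosen for illustration; check the released files for the actual field names before using this.

```python
import json
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    standard_answer: str
    wrong_answers: list
    domain: str


def parse_sample(line: str) -> Sample:
    """Parse one JSONL line into a Sample.

    The key names used here are hypothetical and may differ from the
    released FactualBench files.
    """
    record = json.loads(line)
    wrong = list(record["wrong_answers"])
    if len(wrong) != 3:
        raise ValueError(f"expected 3 wrong answers, got {len(wrong)}")
    return Sample(
        question=record["question"],
        standard_answer=record["standard_answer"],
        wrong_answers=wrong,
        domain=record["domain"],
    )
```

Validating the number of wrong answers at load time catches malformed records early, before they reach training or evaluation code.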
- The dataset is constructed from a publicly available Internet encyclopedia (Baidu Baike).
- It may contain references to individuals, locations, or medical and physiological concepts that are publicly known.
- The data is collected strictly for research purposes and without any intent to violate privacy or safety policies.
⚠️ Despite quality control efforts, the dataset may still contain inaccuracies or outdated facts (knowledge cutoff: 2025). FactualBench should not be treated as an authoritative knowledge base.
If you find this dataset useful, please cite:
@inproceedings{zhang-etal-2025-exploring-generalizability,
title = "Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization",
author = "Zhang, Siyuan and
Zhang, Yichi and
Dong, Yinpeng and
Su, Hang",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.211/",
doi = "10.18653/v1/2025.findings-emnlp.211",
pages = "3936--3968",
ISBN = "979-8-89176-335-7"
}