
ToMMeR – Efficient Entity Mention Detection from Large Language Models

Victor Morand¹, Nadi Tomeh², Josiane Mothe³, Benjamin Piwowarski¹

¹ Sorbonne Université, CNRS, ISIR, F-75005 Paris, France
² LIPN, Université Sorbonne Paris Nord, UMR7030 CNRS
³ IRIT, Université de Toulouse, UMR5505 CNRS, F-31400 Toulouse, France


[Figure: ToMMeR architecture]

ToMMeR is a lightweight probing model that extracts emergent mention-detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a wide set of 13 NER benchmarks.

Abstract

Identifying which text spans refer to entities - mention detection - is both foundational for information extraction and a known performance bottleneck. We introduce ToMMeR, a lightweight model (<300K parameters) probing mention detection capabilities from early LLM layers. Across 13 NER benchmarks, ToMMeR achieves 93% recall zero-shot, with over 90% precision when evaluated with an LLM as a judge, showing that ToMMeR rarely produces spurious predictions despite its high recall. Cross-model analysis reveals that diverse architectures (14M-15B parameters) converge on similar mention boundaries (DICE >75%), confirming that mention detection emerges naturally from language modeling. When extended with span classification heads, ToMMeR achieves near-SOTA NER performance (80-87% F1 on standard benchmarks). Our work provides evidence that structured entity representations exist in early transformer layers and can be efficiently recovered with minimal parameters.
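To make the probing idea concrete, here is a minimal illustrative sketch of a low-rank bilinear span scorer over frozen hidden states. This is a hypothetical stand-in, not the actual ToMMeR architecture: the class name, the two-projection design, and the dimensions are assumptions chosen so that, at hidden size 2048 and rank 64, the probe has 2 × 2048 × 64 ≈ 262K parameters, consistent with the paper's "<300K parameters" claim.

```python
import torch
import torch.nn as nn

class LowRankSpanProbe(nn.Module):
    """Hypothetical low-rank bilinear probe (NOT the actual ToMMeR model):
    scores every (start, end) token pair from frozen LLM hidden states."""

    def __init__(self, hidden_dim: int, rank: int = 64):
        super().__init__()
        self.start_proj = nn.Linear(hidden_dim, rank, bias=False)
        self.end_proj = nn.Linear(hidden_dim, rank, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim) from an early LLM layer
        s = self.start_proj(hidden)   # (batch, seq_len, rank)
        e = self.end_proj(hidden)     # (batch, seq_len, rank)
        # score[b, i, j] = sigmoid(<s_i, e_j>): one score per candidate span
        return torch.sigmoid(torch.einsum("bir,bjr->bij", s, e))

probe = LowRankSpanProbe(hidden_dim=2048, rank=64)
hidden = torch.randn(1, 6, 2048)
print(probe(hidden).shape)  # torch.Size([1, 6, 6])
```

Because the backbone stays frozen and only the two projections are trained, such a probe can only succeed if the early-layer representations already encode mention structure, which is the paper's central claim.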

Installation

Using Pip

pip install git+https://github.com/VictorMorand/llm2ner.git

Local install for Dev

Using uv

We suggest using uv, a very fast Python package manager. The following commands clone the repository and install it, with all its dependencies, into a ready-to-use .venv in a few minutes.

git clone https://github.com/VictorMorand/llm2ner.git
cd llm2ner
uv sync

Usage

Raw inference

import llm2ner
from llm2ner import ToMMeR

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L3_R64")
# load backbone LLM, optionally cutting unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

# Raw inference
text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")

# tokenize to shape (1, seq_len)
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)

# output raw span scores
output = tommer.forward(tokens, llm)  # (batch_size, seq_len, seq_len)
print(f"Raw output shape: {output.shape}")

# use the chosen decoding strategy to infer entities
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e+1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")

>>> Input text: Large language models are awesome
>>> Raw Output shape: torch.Size([1, 6, 6])
>>> Predicted entities: ['Large language models']
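The raw output is a (batch, seq_len, seq_len) score matrix where entry [start, end] scores the candidate span from token start to token end. To illustrate how such a matrix can be turned into a flat entity list, here is a minimal, self-contained sketch of threshold-then-greedy decoding; the function name and overlap-resolution details are our own illustration, independent of the library's actual `infer_entities` implementation.

```python
import torch

def decode_spans(scores: torch.Tensor, threshold: float = 0.5):
    """Extract non-overlapping (start, end) spans whose score exceeds `threshold`.

    `scores` has shape (seq_len, seq_len); entry [start, end] is the score of
    the span covering tokens start..end (only end >= start is valid).
    """
    seq_len = scores.shape[0]
    candidates = []
    for start in range(seq_len):
        for end in range(start, seq_len):
            if scores[start, end] > threshold:
                candidates.append((start, end, scores[start, end].item()))
    # greedy selection: keep highest-scoring spans, drop any that overlap them
    candidates.sort(key=lambda c: c[2], reverse=True)
    selected = []
    for start, end, score in candidates:
        if all(end < s or start > e for s, e, _ in selected):
            selected.append((start, end, score))
    return sorted((s, e) for s, e, _ in selected)

scores = torch.zeros(5, 5)
scores[0, 2] = 0.9   # span over tokens 0..2
scores[1, 2] = 0.6   # weaker span overlapping the one above
scores[4, 4] = 0.7   # single-token span
print(decode_spans(scores))  # [(0, 2), (4, 4)]
```

The weaker overlapping span (1, 2) is discarded in favor of the stronger (0, 2), which mirrors the flat (non-nested) segmentation a "greedy" strategy would produce.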

HTML output

We also provide plotting utilities that output HTML for display in notebooks or web apps.

import llm2ner
from llm2ner import ToMMeR

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L3_R64")
# load backbone LLM, optionally cutting unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER). "

# fancy interactive output
outputs = llm2ner.plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 0.5
    show_attn=True,
)

Running experiments

We use Experimaestro to launch and monitor experiments. The following command trains a ToMMeR model on the specified dataset:

uv run experimaestro run-experiment experiments/trainTokenMatching

Acknowledgements

We depend on several key packages:

  • experimaestro-python for experiment management.
  • transformer-lens for wrapping LLMs in a generic HookedTransformer class with a unified nomenclature for placing hooks. It is built on top of the Hugging Face transformers library.

Citation

If you find this work useful, please cite the associated paper:

@misc{morand2025tommerefficiententity,
      title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models}, 
      author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
      year={2025},
      eprint={2510.19410},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.19410}, 
}
