# CueWords

A research project that analyzes gender agreement patterns in fine-tuned language models using context mixing and value patching interpretability techniques.
This project investigates how language models learn and maintain gender agreement patterns through:
- Data Creation: Generating gender agreement datasets from Wikipedia biographies
- Fine-tuning: Training various transformer models (BERT, RoBERTa, GPT-2) on gender agreement tasks
- Context Mixing Analysis: Measuring how well models maintain gender consistency across different contexts
- Value Patching: Analyzing and modifying model representations to understand gender encoding
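As an illustration of the value-patching idea, here is a minimal sketch using PyTorch forward hooks on a toy model: a hidden state is captured from one run and substituted into another. The hook-based approach and all names here are illustrative, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

# Toy two-layer network standing in for a transformer's layer stack.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

source = torch.randn(1, 4)  # run whose hidden state we capture
target = torch.randn(1, 4)  # run whose hidden state we overwrite

captured = {}

def capture_hook(module, inputs, output):
    # Record the hidden state produced during the source run.
    captured["h"] = output.detach()

def patch_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return captured["h"]

layer = model[0]  # patch the representation after the first layer

handle = layer.register_forward_hook(capture_hook)
out_source = model(source)
handle.remove()

handle = layer.register_forward_hook(patch_hook)
out_patched = model(target)
handle.remove()

# Patching the full hidden state makes the target run reproduce the
# source run's output, since everything downstream is deterministic.
print(torch.allclose(out_patched, out_source))  # True
```

In real value-patching experiments the same mechanism is applied to individual attention-head value vectors rather than a whole layer output, and the change in the model's prediction measures how much that component encodes.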
## Project Structure

```
CueWords/
├── data_creation/    # Dataset generation scripts
├── fine_tuning/      # Model training and fine-tuning
├── context_mixing/   # Context mixing analysis tools
├── value_patching/   # Value patching implementation
├── data/             # Generated datasets
```
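For illustration, the data-creation step might turn biography fields into cloze-style agreement examples along these lines. The template, field names, and output schema below are assumptions for the sketch, not the repository's actual format:

```python
def make_example(name, gender, occupation):
    """Build a cloze-style gender-agreement example from biography fields.

    Hypothetical helper: the real dataset is generated from WikiBio
    biographies, not from hand-supplied fields like these.
    """
    pronoun = {"male": "he", "female": "she"}[gender]
    context = f"{name} is a {occupation}."
    cloze = "In 2003, [MASK] retired."  # the model must recover the pronoun
    return {"context": context, "cloze": cloze, "target": pronoun}

example = make_example("Marie Curie", "female", "physicist")
print(example["target"])  # she
```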
## Features

- Multi-Model Support: Works with BERT, RoBERTa, and GPT-2 architectures
- Gender Agreement Dataset: Automatically generated from WikiBio dataset
- Context Mixing Toolkit: Integrated analysis tools for measuring model consistency
- Value Patching: Advanced techniques for analyzing and modifying model representations
- Comprehensive Evaluation: Multiple metrics and analysis approaches
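A consistency metric of the kind mentioned above can be as simple as the fraction of predicted pronouns that agree with the entity's gender across contexts. This is a hypothetical sketch of such a metric, not the toolkit's actual scoring code:

```python
def consistency_score(predictions):
    """Fraction of pronoun predictions matching the gold gender.

    predictions: list of (gold_gender, predicted_pronoun) pairs.
    The data format is an assumption, not the repository's schema.
    """
    pronoun_gender = {"he": "male", "him": "male", "his": "male",
                      "she": "female", "her": "female", "hers": "female"}
    correct = sum(1 for gold, pron in predictions
                  if pronoun_gender.get(pron) == gold)
    return correct / len(predictions)

preds = [("female", "she"), ("female", "her"),
         ("female", "he"), ("male", "his")]
print(consistency_score(preds))  # 0.75
```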
## Installation

- Clone the repository:

```
git clone https://github.com/hamid-amir/CueWords
cd CueWords
```

- Install dependencies:

```
pip install -r requirements.txt
```

- Download the SpaCy model:

```
python -m spacy download en_core_web_sm
```

## Usage

Run the complete pipeline:

```
./runAll.sh
```

Or run individual components:

- Generate the dataset:

```
python3 data_creation/gender_agreement.py
```

- Fine-tune models:

```
python3 fine_tuning/train.py
```

- Calculate context mixing scores:

```
python3 context_mixing/main.py
```

This project is designed for researchers studying:
- Gender bias in language models
- Model interpretability and analysis
- Context mixing and consistency in NLP
- Value patching and representation analysis
- Fine-tuning effects on model behavior
## Citation

If you use this code in your research, please cite the following paper and acknowledge this repository:
```bibtex
@inproceedings{amirzadeh-etal-2024-language,
    title = "How Language Models Prioritize Contextual Grammatical Cues?",
    author = "Amirzadeh, Hamidreza and
      Alishahi, Afra and
      Mohebbi, Hosein",
    booktitle = "Proceedings of the 7th BlackboxNLP Workshop at EMNLP 2024: Analyzing and Interpreting Neural Networks for NLP",
    month = nov,
    year = "2024",
    address = "Miami, Florida, US",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.blackboxnlp-1.21/",
    doi = "10.18653/v1/2024.blackboxnlp-1.21",
    pages = "315--336",
}
```

## License

MIT
## Contact

For questions or issues, please open an issue on the repository.