
CueWords: Gender Agreement Analysis in Language Models

A research project that analyzes gender agreement patterns in fine-tuned language models using context mixing and value patching interpretability techniques.

Overview

This project investigates how language models learn and maintain gender agreement patterns through:

  • Data Creation: Generating gender agreement datasets from Wikipedia biographies
  • Fine-tuning: Training various transformer models (BERT, RoBERTa, GPT-2) on gender agreement tasks
  • Context Mixing Analysis: Measuring how well models maintain gender consistency across different contexts
  • Value Patching: Analyzing and modifying model representations to understand gender encoding
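
The context-mixing idea above can be sketched numerically. The toy function below is an illustrative score, not the repository's actual implementation: given one attention matrix, it measures how much of a target token's attention mass falls on the gender cue tokens. The function name and toy values are assumptions for illustration.

```python
import numpy as np

def cue_mixing_score(attention, target_idx, cue_indices):
    """Illustrative context-mixing score: the share of the target token's
    attention mass that falls on gender cue tokens.
    `attention` is a (seq_len, seq_len) matrix for one layer/head."""
    row = attention[target_idx]
    row = row / row.sum()  # normalize in case weights aren't a distribution
    return float(row[cue_indices].sum())

# Toy 4-token example: tokens 0 and 2 are gender cues, token 3 is the
# target (e.g., a masked pronoun).
attn = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.4, 0.1, 0.4, 0.1],
])
score = cue_mixing_score(attn, target_idx=3, cue_indices=[0, 2])  # 0.8
```

A score near 1.0 would mean the target token draws almost all of its attention from the cue words; scores across layers and heads can then be compared before and after fine-tuning.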

Project Structure

CueWords/
├── data_creation/          # Dataset generation scripts
├── fine_tuning/            # Model training and fine-tuning
├── context_mixing/         # Context mixing analysis tools
├── value_patching/         # Value patching implementation
├── data/                   # Generated datasets

Key Features

  • Multi-Model Support: Works with BERT, RoBERTa, and GPT-2 architectures
  • Gender Agreement Dataset: Automatically generated from WikiBio dataset
  • Context Mixing Toolkit: Integrated analysis tools for measuring model consistency
  • Value Patching: Advanced techniques for analyzing and modifying model representations
  • Comprehensive Evaluation: Multiple metrics and analysis approaches
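
One simple evaluation of the kind listed above can be sketched as follows. This is an assumed metric for illustration, not necessarily one the repository implements: an example counts as correct when the model assigns higher probability to the gender-consistent pronoun than to the inconsistent one.

```python
def agreement_accuracy(examples):
    """Hypothetical gender-agreement metric: fraction of examples where
    the model prefers the gender-consistent pronoun.
    Each example is a (p_consistent, p_inconsistent) probability pair."""
    correct = sum(1 for p_good, p_bad in examples if p_good > p_bad)
    return correct / len(examples)

# Toy model outputs for four examples: three prefer the consistent pronoun.
preds = [(0.80, 0.10), (0.30, 0.60), (0.55, 0.45), (0.90, 0.05)]
acc = agreement_accuracy(preds)  # 0.75
```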

Installation

  1. Clone the repository:
git clone https://github.com/hamid-amir/CueWords
cd CueWords
  2. Install dependencies:
pip install -r requirements.txt
  3. Download the spaCy model:
python -m spacy download en_core_web_sm

Quick Start

Run the complete pipeline:

./runAll.sh

Or run individual components:

  1. Generate the dataset:
python3 data_creation/gender_agreement.py
  2. Fine-tune models:
python3 fine_tuning/train.py
  3. Calculate context mixing scores:
python3 context_mixing/main.py
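
To give a concrete sense of what the dataset-generation step produces, here is a hypothetical example record with a sanity check. The field names and the sentence are assumptions for illustration, not the repository's actual schema: each example pairs a sentence containing gender cue words with a masked pronoun whose correct filler must agree with those cues.

```python
# Hypothetical example record; field names are assumed, not the repo's schema.
example = {
    "sentence": "The actress began [MASK] career in 1990.",
    "cue_words": ["actress"],
    "cue_indices": [1],
    "target": "her",       # gender-consistent filler
    "distractor": "his",   # gender-inconsistent filler
}

def is_well_formed(ex):
    """Basic structural sanity checks on an example record."""
    return (
        "[MASK]" in ex["sentence"]
        and len(ex["cue_words"]) == len(ex["cue_indices"])
        and ex["target"] != ex["distractor"]
    )
```

A fine-tuned masked language model is then scored on whether it prefers the target over the distractor at the masked position.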

Research Applications

This project is designed for researchers studying:

  • Gender bias in language models
  • Model interpretability and analysis
  • Context mixing and consistency in NLP
  • Value patching and representation analysis
  • Fine-tuning effects on model behavior

If you use this code in your research, please cite the following paper and acknowledge this repository:

@inproceedings{amirzadeh-etal-2024-language,
  title     = "How Language Models Prioritize Contextual Grammatical Cues?",
  author    = "Amirzadeh, Hamidreza and
               Alishahi, Afra and
               Mohebbi, Hosein",
  booktitle = "Proceedings of the 7th BlackboxNLP Workshop at EMNLP 2024: Analyzing and Interpreting Neural Networks for NLP",
  month     = nov,
  year      = "2024",
  address   = "Miami, Florida, US",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2024.blackboxnlp-1.21/",
  doi       = "10.18653/v1/2024.blackboxnlp-1.21",
  pages     = "315--336",
}

License

MIT

Contact

For questions or issues, please open an issue on the repository.
