activation-engineering

Here are 5 public repositories matching this topic...

ZFancy / awesome-activation-engineering

A curated list of resources for activation engineering.

control concept transparent ai-safety interpretability large-language-models llm llm-aligment activation-engineering concept-rep concept-activation-vector

Updated Oct 2, 2025

bassrehab / steering-vectors-agents

Runtime control of LLM agent behaviors through activation steering vectors. More calibrated than prompting.

machine-learning transformers pytorch steering-behaviors ai-safety interpretability langchain llm-agents activation-engineering steering-vectors contrastive-activation-addition

Updated Dec 19, 2025
Python

G-Art / matrix_steering_vector_research

Iterative Sparse Matrix Steering: Closed-Form Subspace Alignment for Multi-Layer LLM Control (No SGD required).

pytorch alignment interpretability llm activation-engineering steering-vectors

Updated Jan 5, 2026
Jupyter Notebook

Jason-Wang313 / RISER

A closed-loop control system for large language models that steers internal activation states in real time to prevent mode collapse and toxicity.

reinforcement-learning pytorch control-theory ai-safety riser mechanistic-interpretability llm-steering activation-engineering

Updated Feb 1, 2026
Python

SolomonB14D3 / knowledge-fidelity

Behavioral auditing toolkit for LLMs: rho-audit measures factual accuracy, bias, sycophancy, toxicity, and reasoning via teacher-forced confidence probes. SVD compression with knowledge preservation. Steering vectors for runtime behavioral control. 12-model merge audit across SLERP/TIES/DARE-TIES/Linear.

transformers pytorch svd interpretability confidence bias-detection truthfulness model-merging sycophancy llm-compression mergekit activation-engineering model-auditing steering-vectors rho-audit behavioral-evaluation

Updated Feb 25, 2026
Python
