Research Notes: Interpretability and Alignment

Ryan Mulligan

Research and systems notes on interpretability, alignment, and agent infrastructure.

I build research‑grade systems that make model behavior inspectable and testable. My current focus is the introspection gap: cases where a model acts on an internal state it won't admit to in its self‑reports.
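To make that concrete, here is a minimal sketch of how one might check for a gap: query the same model with a task prompt and with a self‑report prompt about the same state, then compare the two answers. The prompts, the yes/no scoring rule, and the model‑calling function are placeholders for illustration, not my actual experimental setup.

```python
from typing import Callable

def introspection_gap(
    query_model: Callable[[str], str],
    task_prompt: str,
    report_prompt: str,
) -> dict:
    """Compare what the model does with what it says about its own state."""
    behavior = query_model(task_prompt)        # does the output act on the state?
    self_report = query_model(report_prompt)   # does the model admit to the state?
    acts_on_state = "yes" in behavior.lower()      # placeholder scoring rule
    admits_state = "yes" in self_report.lower()    # placeholder scoring rule
    return {
        "acts_on_state": acts_on_state,
        "admits_state": admits_state,
        "gap": acts_on_state and not admits_state,
    }

# Usage, with any model-calling function of your own:
# result = introspection_gap(my_model_fn, task_prompt, report_prompt)
```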

If you work on interpretability, evaluations, or alignment, I’d love feedback or collaboration.

Research Highlights

Suppression reversal

Introspection Gap

Framing flips self‑report while behavior stays stable.

Read the post

Scaling comparison

Scale Effects

Suppression weakens at 14B compared to 8B baselines.

Read the post

Probe comparison

Probe Sensitivity

Probe choice changes the measured signal.

Read the post

New here?

Start with the introspection gap explainer, then the scale effects and probe sensitivity posts.
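If you want a concrete sense of the probe sensitivity point before reading the post: two standard probes fit on the same activations can report different amounts of signal. The sketch below uses synthetic data, scikit-learn's LogisticRegression, and a difference-of-means direction; it is an illustration of the idea, not the code behind the post.

```python
# Minimal sketch of probe sensitivity: two common probes fit on the same
# activations can report different accuracies. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 64
labels = rng.integers(0, 2, size=n)
# Synthetic "activations": class-dependent mean shift plus anisotropic noise.
noise_scale = np.linspace(0.5, 3.0, d)
acts = labels[:, None] * 0.4 + rng.normal(0, noise_scale, size=(n, d))

train, test = slice(0, 1500), slice(1500, n)

# Probe 1: logistic-regression probe.
lr = LogisticRegression(max_iter=1000).fit(acts[train], labels[train])
lr_acc = lr.score(acts[test], labels[test])

# Probe 2: difference-of-means direction probe.
direction = (acts[train][labels[train] == 1].mean(0)
             - acts[train][labels[train] == 0].mean(0))
threshold = np.median(acts[train] @ direction)
scores = acts[test] @ direction
md_acc = ((scores > threshold).astype(int) == labels[test]).mean()

print(f"logistic-regression probe accuracy: {lr_acc:.3f}")
print(f"difference-of-means probe accuracy: {md_acc:.3f}")
```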

Support and Hiring

I am open to research sponsorships, collaborations, and hiring conversations. If you want to support this work, GitHub Sponsors is the easiest option.

If you want to explore a role, the best way to reach me is email.

Contact

Recent Posts