Research Notes: Interpretability and Alignment

Ryan Mulligan

Research and systems notes on interpretability, alignment, and agent infrastructure.

I build research‑grade systems that make model behavior inspectable and testable. My current focus is the introspection gap: cases where a model acts on an internal state it won't admit to in its self‑reports.
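To make that concrete, here is a minimal sketch of how one might check for a gap: query the same model with a task prompt and with a self‑report prompt about the same state, then compare the two answers. The prompts, the yes/no scoring rule, and the model‑calling function are placeholders for illustration, not my actual experimental setup.

```python
from typing import Callable

def introspection_gap(
    query_model: Callable[[str], str],
    task_prompt: str,
    report_prompt: str,
) -> dict:
    """Compare what the model does with what it says about its own state."""
    behavior = query_model(task_prompt)        # does the output act on the state?
    self_report = query_model(report_prompt)   # does the model admit to the state?
    acts_on_state = "yes" in behavior.lower()      # placeholder scoring rule
    admits_state = "yes" in self_report.lower()    # placeholder scoring rule
    return {
        "acts_on_state": acts_on_state,
        "admits_state": admits_state,
        "gap": acts_on_state and not admits_state,
    }

# Usage, with any model-calling function of your own:
# result = introspection_gap(my_model_fn, task_prompt, report_prompt)
```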

If you work on interpretability, evaluations, or alignment, I’d love feedback or collaboration.

Research Highlights

Suppression reversal

Introspection Gap

Framing flips self‑report while behavior stays stable.

Read the post

Scaling comparison

Scale Effects

Suppression weakens at 14B compared to 8B baselines.

Read the post

Probe comparison

Probe Sensitivity

Probe choice changes the measured signal.

Read the post

New here?

Start with the introspection gap explainer, then the scale effects and probe sensitivity posts.
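If you want a concrete sense of the probe sensitivity point before reading the post: two standard probes fit on the same activations can report different amounts of signal. The sketch below uses synthetic data, scikit-learn's LogisticRegression, and a difference-of-means direction; it is an illustration of the idea, not the code behind the post.

```python
# Minimal sketch of probe sensitivity: two common probes fit on the same
# activations can report different accuracies. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 64
labels = rng.integers(0, 2, size=n)
# Synthetic "activations": class-dependent mean shift plus anisotropic noise.
noise_scale = np.linspace(0.5, 3.0, d)
acts = labels[:, None] * 0.4 + rng.normal(0, noise_scale, size=(n, d))

train, test = slice(0, 1500), slice(1500, n)

# Probe 1: logistic-regression probe.
lr = LogisticRegression(max_iter=1000).fit(acts[train], labels[train])
lr_acc = lr.score(acts[test], labels[test])

# Probe 2: difference-of-means direction probe.
direction = (acts[train][labels[train] == 1].mean(0)
             - acts[train][labels[train] == 0].mean(0))
threshold = np.median(acts[train] @ direction)
scores = acts[test] @ direction
md_acc = ((scores > threshold).astype(int) == labels[test]).mean()

print(f"logistic-regression probe accuracy: {lr_acc:.3f}")
print(f"difference-of-means probe accuracy: {md_acc:.3f}")
```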

Support and Hiring

I am open to research sponsorships, collaborations, and hiring conversations. If you want to support this work, GitHub Sponsors is the easiest option.

If you want to explore a role, the best way to reach me is email.

Contact

Recent Posts