2KINGSDEV — Independent AI Research Lab

Current Research

Contrastive Behavioral Topology Scanning: Per-Head Attribution and Intervention-Based Analysis of Behavioral Structure in RLHF Transformers

M. Cray, C. Schmidt · April 2026

We introduce CBTS, an inference-only, per-head attribution and intervention framework that maps behavioral architecture in RLHF-trained transformers across 131 behavioral and 19 structural directions. The framework is validated on five RLHF-trained transformer configurations (Qwen-2.5 at 3B, 7B, and 14B; Phi-3.5-Mini; Llama-3.2-3B).

The paper documents a benchmark-insensitive failure mode in current safety evaluation methodology, characterizes multi-gate safety architecture at per-head resolution, and demonstrates cross-dimensional intervention specificity (69× on-target to off-target ratio).
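One way to read an on-target to off-target ratio is sketched below. This is an illustration only, not the paper's definition: it assumes the ratio is the absolute effect along the targeted direction divided by the mean absolute effect along all other measured directions. The function name, the direction names, and the numbers are hypothetical.

```python
from statistics import mean


def specificity_ratio(effects, target):
    """Return the on-target effect divided by the mean off-target effect.

    `effects` maps a direction name to the change measured along that
    direction after an intervention. This definition is an assumption
    made for illustration; the paper may define the ratio differently.
    """
    off_target = [abs(v) for name, v in effects.items() if name != target]
    if not off_target:
        raise ValueError("need at least one off-target direction")
    off_mean = mean(off_target)
    if off_mean == 0:
        raise ValueError("off-target effect is zero; ratio is undefined")
    return abs(effects[target]) / off_mean


# Hypothetical measurements, not taken from the paper.
example_effects = {"refusal": 2.0, "formality": 0.05, "verbosity": 0.15}
ratio = specificity_ratio(example_effects, "refusal")
print(f"on-target / off-target ratio: {ratio:.1f}")
```

Under this reading, a larger ratio means the intervention moves the targeted behavior while leaving the other measured directions comparatively untouched.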

Read on Zenodo → DOI: 10.5281/zenodo.19665052

arXiv submission pending endorsement.

What We Study

Behavioral control in language models is implemented through structured internal mechanisms that can be characterized, measured, and modified. Current evaluation methodology treats these mechanisms as a black box; current interpretability research focuses heavily on single behavioral dimensions.

Our work extends direction-based mechanistic analysis to a 131-direction behavioral catalog at per-head, sub-layer resolution. The framework is general: safety is the first rigorous demonstration because it supplies clean quantitative scoring, but the same methodology applies to creative expression, reasoning style, aesthetic preference, and other behavioral dimensions we are actively investigating.
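For readers unfamiliar with direction-based analysis, the sketch below shows one common form of it. It is a minimal illustration under stated assumptions, not the CBTS implementation: it assumes a behavioral direction is the normalized difference of mean activations between two contrastive prompt sets, and that a head's attribution is the projection of its output vector onto that direction. All function names and the toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def contrastive_direction(positive, negative):
    """Unit vector pointing from the negative-set mean to the positive-set mean.

    Assumption: difference-of-means is used here as a stand-in for however
    the paper derives its directions.
    """
    diff = [p - q for p, q in zip(mean_vector(positive), mean_vector(negative))]
    norm = sum(x * x for x in diff) ** 0.5
    if norm == 0:
        raise ValueError("the two sets have identical means")
    return [x / norm for x in diff]


def head_attribution(head_outputs, direction):
    """Project each head's output vector onto the direction.

    `head_outputs` maps a (layer, head) pair to that head's output vector.
    """
    return {
        head: sum(a * b for a, b in zip(vec, direction))
        for head, vec in head_outputs.items()
    }


# Toy activations, hypothetical and two-dimensional for readability.
positive = [[1.0, 0.0], [3.0, 0.0]]
negative = [[0.0, 0.0], [0.0, 0.0]]
direction = contrastive_direction(positive, negative)

head_outputs = {(0, 0): [0.5, 2.0], (0, 1): [-1.0, 0.3]}
scores = head_attribution(head_outputs, direction)
```

In this toy case head (0, 0) writes along the direction and head (0, 1) writes against it; repeating the projection for every direction in a catalog would give a per-head score table.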

Why It Matters

The ability to characterize and verify model behavior without relying on the developer's own evaluation methodology is a public good. The benchmark-insensitive failure mode we document suggests that aggregate safety metrics may be systematically inadequate at frontier scales. We publish our findings openly because we judge the defensive value of open publication to exceed that of restricting access.

What's Next

›Scale validation to frontier-size models
›Extend the framework beyond safety to creative and reasoning dimensions
›Release open-source tooling for reproducible per-head behavioral analysis
›Build cross-lab benchmarks for behavioral integrity verification

Applied Systems

Alongside our research, we build and deploy AI systems for operating businesses. These systems serve as practical tools and as live deployments where research insights can be tested against real-world usage patterns.

GRACE

Voice AI receptionist

Deployed at The Metalsmiths since 2025. Handles appointment scheduling, customer intake, and team identification by voice.

Built on ElevenLabs, Twilio, Supabase.

QUOTE FORGE

Intelligent estimating platform

Reduces quote generation time from 3 hours to 15 minutes. Deployed in service-business operations.

Built on FastAPI, React, PostgreSQL.

For inquiries about custom applied systems: hello@2kingsdev.ai

About 2KINGSDEV

2KINGSDEV LLC is an independent research lab founded in 2026 by Michael Cray and Christopher Schmidt. We publish research openly on Zenodo and arXiv. Our work is supported by revenue from applied systems and by research funding.

Based in the Pacific Northwest. Incorporated in Washington State.

Active patent filings in mechanistic interpretability and behavioral analysis methodology.

2KINGSDEV is an independent research lab studying the behavioral architecture of RLHF-trained language models.
