Tagged: lessons

1 post

May 26, 2026

Five layers deep

We shipped a destructive-op gate and thought safety was done. Over the next six hours of red-teaming we found five more deletion paths, each surfaced while testing the previous fix. Then we ran a six-probe adversarial sweep to confirm the chain holds. Here's the full audit, the sharpening pass, the validation, and what we learned about how safety thinking generalizes.

safety · merlin · agent-loop · lessons · post-mortem