Practical software security from an engineer's perspective — secrets handling, threat modelling, least privilege, input validation, prompt injection, sandboxing, and the AI-specific attack surfaces that change the threat model. Each post focuses on how to think about risk before it bites in production: which mitigations actually move the needle, which ones are theatre, and how to design systems so a single bug doesn't cascade into catastrophic failure.
The coverage spans the boring-but-essential (rotating credentials, locking down server access, sanitising user input) and the AI-era unknowns (prompt-injected agents, untrusted tool outputs, exfiltrating data through innocent-looking model responses). Written for engineers shipping code, not security consultants writing reports — every recommendation is something you can apply in your next pull request.
1 post below, newest first.
Securing AI Agents from Doing Bad Things
Show notes for AI Explained Part 31 — sandboxing, permission scoping, instruction hierarchy, and the metrics that tell you whether your agent is safe to ship.
Subjects that frequently appear alongside #security. Click through to see every post on each one.
How LLMs actually work — tokenization, embeddings, RAG, fine-tuning, agents — explained for engineers who ship production code, not papers.
How autonomous AI agents reason, plan, use tools, and stay aligned with your intent — the ReAct loop, agentic RAG, and multi-agent orchestration.
The AI Explained series: short, focused episodes on individual AI building blocks — transformers, attention, tokenization, memory, tool use, multi-agent systems, and more.
Large language models — how they think, why they fail, what RAG fixes, and how to evaluate them. The fundamentals every engineer building on top of an LLM should internalise.