🐳 Module 09: Secure Container Orchestration & Sandboxing

Welcome to Module 09. In this section, you will learn the architecture of Agentic Isolation. You will move beyond "running Docker" to understand how kernel namespaces and cgroups isolate and limit a process, and how hardware-level sandboxing (MicroVMs) and user-space kernels keep agent-generated code safely contained.

🏛️ 1. Architectural Deep Dive: The Shared Kernel Vulnerability

Standard Docker containers are not "Virtual Machines." They are isolated processes sharing the same host Linux kernel.

Namespaces & Cgroups

Namespaces: Provide the illusion of a private system. They isolate the Process ID (PID) tree, Network stack, and Mount points. However, the container still makes direct syscalls to the host kernel.
Cgroups (Control Groups): These act as the "Resource Police." They enforce limits on CPU time, RAM, disk I/O, and process count. Without a strict process-count (PIDs) limit, a recursive agent loop could behave like a fork bomb, exhaust the host's process table, and crash your entire Linux workspace.
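To make the "Resource Police" concrete, the following is a minimal sketch of how a process could read its own limits from inside a container. It assumes a cgroup v2 host, where limits are exposed as files such as memory.max and pids.max under /sys/fs/cgroup; the helper names are illustrative and not part of the labs.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumption: cgroup v2 unified hierarchy mounted at the usual location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_limit(raw: str) -> Optional[int]:
    """Convert a cgroup v2 limit value to an int; the literal 'max' means unlimited."""
    value = raw.strip()
    return None if value == "max" else int(value)


def read_limit(name: str, root: Path = CGROUP_ROOT) -> Optional[int]:
    """Read one limit file (e.g. memory.max); None if unlimited or unreadable."""
    try:
        return parse_limit((root / name).read_text())
    except OSError:
        # File missing (for example on a cgroup v1 host) or not readable.
        return None


def describe_limits(root: Path = CGROUP_ROOT) -> Dict[str, Optional[int]]:
    """Collect the memory and process-count limits that apply to this process."""
    return {
        "memory_bytes": read_limit("memory.max", root),
        "max_processes": read_limit("pids.max", root),
    }
```

Calling describe_limits() inside a container started with a memory limit and a PIDs limit should report both values; a None entry means no limit was found, which is the situation the fork-bomb warning above describes.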

The Attack Surface

If an agent generates code that exploits a kernel vulnerability (e.g., a "Dirty COW" style exploit, CVE-2016-5195), it can "break out" of the container and gain root access to your host machine. This is why production agent security requires an extra layer between the code and the host kernel: a MicroVM or a user-space kernel.

📊 2. Structured Tradeoff Matrix: Isolation Runtimes

Runtime	Isolation Level	Latency (Boot)	Resource Overhead	Primary Production Bottleneck
runc (Standard Docker)	Process-Level	< 100ms	Minimal	Shared kernel allows breakout exploits.
runsc (gVisor)	User-Space Kernel	200ms - 500ms	Moderate	High syscall overhead (Slow I/O).
Firecracker	MicroVM	100ms - 1s	High	Requires KVM (bare metal or nested virtualization).
Wasm (WebAssembly)	Instruction-Level	< 1ms	Ultra-Low	Extremely limited library support (No `pip`).

🛠️ 3. Step-by-Step Mechanics Breakdown

Pattern: Multi-Stage Lean Runtimes

In Lab 1, we use python:3.11-slim and uv.

Layer Minimization: Every RUN instruction creates an image layer, and files deleted in a later layer still occupy space in the earlier one. We therefore chain commands with && and clean /var/lib/apt/lists/ in the same RUN instruction to keep the image small.
UV Pre-compilation: We use uv to resolve and install dependencies in seconds. By using uv pip install --system, we skip creating a virtualenv, which is redundant inside a container that is already an isolated environment.
The appuser Pattern: We never run agents as root. We create a low-privilege appuser so that even if the agent is compromised, it cannot modify system binaries.
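The three points above can be sketched as a small Python helper that renders a Dockerfile. This is a single-stage illustration under stated assumptions, not the Lab 1 file itself: requirements.txt, agent.py, and the apt package list are hypothetical placeholders, and uv is installed with pip for simplicity.

```python
from typing import Sequence


def render_dockerfile(apt_packages: Sequence[str], script: str = "agent.py") -> str:
    """Render a lean Dockerfile: one apt layer, uv system install, non-root user."""
    lines = ["FROM python:3.11-slim"]
    if apt_packages:
        pkgs = " ".join(apt_packages)
        # Install and clean up in ONE RUN so the apt lists never persist in a layer.
        lines.append(
            "RUN apt-get update"
            f" && apt-get install -y --no-install-recommends {pkgs}"
            " && rm -rf /var/lib/apt/lists/*"
        )
    lines += [
        "RUN pip install --no-cache-dir uv",
        # requirements.txt is a hypothetical dependency file in the build context.
        "COPY requirements.txt /tmp/requirements.txt",
        # --system installs into the image's Python instead of a virtualenv.
        "RUN uv pip install --system --no-cache -r /tmp/requirements.txt",
        # Create the low-privilege user and drop root for everything that follows.
        "RUN useradd --create-home appuser",
        "USER appuser",
        "WORKDIR /home/appuser",
        f"COPY --chown=appuser:appuser {script} .",
        f'CMD ["python", "{script}"]',
    ]
    return "\n".join(lines) + "\n"
```

For example, print(render_dockerfile(["git"])) emits a file in which USER appuser comes after all installation steps, so system packages are installed as root but the agent itself never runs as root.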

Pattern: The "Ephemeral Sandbox" Loop

In Lab 3, we implement the --rm and -m 128m pattern.

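Rationale: Agents generate "disposable" code. With --rm, Docker removes the container and its writable filesystem layer as soon as it exits, so temporary files and malicious artifacts do not accumulate on the host. Anything written to a bind mount or named volume persists, so mount only what the agent needs.
Resource Constraint: The -m 128m flag sets a hard cgroup memory limit. If the process tries to use more, the kernel's OOM killer terminates it with SIGKILL (exit code 137), protecting your host from LLM-driven memory leaks.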
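A minimal sketch of the ephemeral loop, driving the Docker CLI from Python. The --rm and -m flags come from the pattern above; the read-only mount path /sandbox, the --network none flag, and the function names are assumptions added for illustration and may differ from Lab 3.

```python
import subprocess
from pathlib import Path
from typing import List


def build_sandbox_cmd(
    script: Path, image: str = "python:3.11-slim", memory: str = "128m"
) -> List[str]:
    """Build the docker run argv for one disposable execution of `script`."""
    script = script.absolute()
    return [
        "docker", "run",
        "--rm",                 # remove the container and its writable layer on exit
        "-m", memory,           # hard memory limit enforced through cgroups
        "--network", "none",    # assumption: generated code needs no network access
        "-v", f"{script}:/sandbox/{script.name}:ro",  # code is mounted read-only
        image,
        "python", f"/sandbox/{script.name}",
    ]


def run_in_sandbox(script: Path) -> subprocess.CompletedProcess:
    """Run the script once and capture its output; requires a local Docker daemon."""
    return subprocess.run(build_sandbox_cmd(script), capture_output=True, text=True)
```

A caller would check result.returncode after run_in_sandbox(...): 0 means the script finished, while 137 indicates the process was killed with SIGKILL, most likely by the memory limit.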

🛡️ 4. Failure Mode Analysis: Sandbox Breaking Points

Failure Mode	Error/Log Signature	Root Cause	Code-Level Mitigation
Memory Exhaustion	`OOMKilled` (Exit Code 137)	Agent code allocated too much RAM.	Increase `-m` limit or optimize the Python script.
Privilege Denied	`PermissionError: [Errno 13]` or `OSError: [Errno 30] Read-only file system`	Agent lacked write permission on the path (Errno 13) or tried to write to a read-only (`:ro`) mount (Errno 30).	Use `:rw` for data folders and `:ro` for code.
Network Isolation	`Failed to establish connection`	Container network is set to `none`, or the `bridge` network is misconfigured.	Use `docker network inspect` to verify CIDR ranges.
Syscall Block	`Operation not permitted`	gVisor blocked a dangerous kernel call.	This is working as intended; log the attempt.
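The table above can be turned into a small triage helper. This is a sketch under assumptions: the function name and labels are illustrative, and the matching relies only on the exit code and the log signatures listed in the table (plus Errno 30, which is what Python reports for a write to a read-only filesystem).

```python
from typing import Optional


def classify_failure(exit_code: int, stderr: str) -> Optional[str]:
    """Map a finished sandbox run to one of the failure modes in the table."""
    if exit_code == 0:
        return None  # clean exit, nothing to classify
    if exit_code == 137:
        # 128 + 9 (SIGKILL). Usually the OOM killer; confirm with docker inspect.
        return "memory_exhaustion"
    if "PermissionError: [Errno 13]" in stderr or "[Errno 30]" in stderr:
        # Errno 13: missing write permission. Errno 30: read-only filesystem.
        return "privilege_denied"
    if "Failed to establish" in stderr:
        return "network_isolation"
    if "Operation not permitted" in stderr:
        # Expected when the sandbox runtime refuses a syscall; log it, do not retry.
        return "syscall_block"
    return "unknown"
```

Because exit code 137 only proves the process received SIGKILL, treat "memory_exhaustion" as a first guess and verify it against the container's OOMKilled state when that matters.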

🧪 5. Runtime Verification: What to Observe

When executing the labs, monitor these security signals in your terminal:

Process Masking: Inside the container, run ps aux.
- Observation: You should see only the container's own processes, with your script as PID 1 (plus the ps command itself). You should not see any processes from your host Linux machine.
Resource Pressure: Run docker stats in a separate tmux pane while your agent is running.
- Observation: Watch the MEM USAGE / LIMIT percentage. If the container tries to exceed its limit, the kernel OOM killer will hard-terminate it (exit code 137).
Runtime Audit: Run docker inspect [container_id] | grep -i runtime.
- Observation: Ensure it says "Runtime": "runsc" (if using gVisor) or "runc" (standard), confirming your security boundary is active.
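The runtime audit can also be scripted instead of grepping. The sketch below parses the JSON printed by docker inspect and reads the HostConfig.Runtime, HostConfig.Memory, and State.OOMKilled fields; the function names are illustrative, and the JSON string is supplied by the caller rather than fetched here.

```python
import json
from typing import Any, Dict


def summarize_inspect(inspect_json: str) -> Dict[str, Any]:
    """Extract the security-relevant fields from `docker inspect <container>` output."""
    # docker inspect prints a JSON array with one object per inspected container.
    container = json.loads(inspect_json)[0]
    host_config = container.get("HostConfig", {})
    state = container.get("State", {})
    return {
        "runtime": host_config.get("Runtime"),            # e.g. "runc" or "runsc"
        "memory_limit_bytes": host_config.get("Memory"),  # 0 means no limit was set
        "oom_killed": state.get("OOMKilled"),
    }


def uses_runtime(inspect_json: str, expected: str = "runsc") -> bool:
    """Return True only if the container was started with the expected runtime."""
    return summarize_inspect(inspect_json)["runtime"] == expected
```

A typical use is to capture the output of docker inspect for the running agent container, pass it to uses_runtime(...), and refuse to execute generated code when the check fails.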

Next Step: Proceed to Module 10: Cloud Ops & Serverless GPUs to learn how to scale these containers to the cloud.
