Miri, Valgrind, and Sanitizers: Verifying Unsafe Code 🔴

What you'll learn:

  • Miri as a MIR interpreter: what it catches (aliasing, UB, leaks) and what it can't (FFI, syscalls)
  • Valgrind memcheck, Helgrind (data races), Callgrind (profiling), and Massif (heap)
  • LLVM sanitizers: ASan, MSan, TSan, LSan with nightly -Zbuild-std
  • cargo-fuzz for crash discovery and loom for concurrency model checking
  • A decision tree for choosing the right verification tool

Cross-references: Code Coverage (coverage finds untested paths, Miri verifies the tested ones) · no_std & Features (no_std code often requires unsafe that Miri can verify) · CI/CD Pipeline (Miri job in the pipeline)

Safe Rust guarantees memory safety and data-race freedom at compile time. But the moment you write unsafe (for FFI, hand-rolled data structures, or performance tricks), those guarantees become your responsibility. This chapter covers the tools that verify your unsafe code actually upholds the safety contracts it claims.

Miri: An Interpreter for Unsafe Rust

Miri is an interpreter for Rust's Mid-level Intermediate Representation (MIR). Instead of compiling to machine code, Miri executes your program step-by-step with exhaustive checks for undefined behavior at every operation.

# Install Miri (nightly-only component)
rustup +nightly component add miri

# Run your test suite under Miri
cargo +nightly miri test

# Run a specific binary under Miri
cargo +nightly miri run

# Run a specific test
cargo +nightly miri test -- test_name

How Miri works:

Source → rustc → MIR → Miri interprets MIR
                        │
                        ├─ Tracks every pointer's provenance
                        ├─ Validates every memory access
                        ├─ Checks alignment at every deref
                        ├─ Detects use-after-free
                        ├─ Detects data races (with threads)
                        └─ Enforces Stacked Borrows / Tree Borrows rules
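
As a concrete target for these checks, here is a minimal sketch of the kind of function worth running under Miri: it hands out two mutable sub-slices built from one raw pointer. The name split_halves is hypothetical and not taken from the project; the standard library's split_at_mut does the same job safely.

```rust
/// Hypothetical helper (illustration only): splits a slice into two
/// non-overlapping mutable halves using raw pointers.
pub fn split_halves(data: &mut [u32]) -> (&mut [u32], &mut [u32]) {
    let len = data.len();
    let mid = len / 2;
    let ptr = data.as_mut_ptr();
    // SAFETY: `ptr` is valid for `len` elements; [0, mid) and [mid, len)
    // are both in bounds and disjoint, so the two &mut slices never alias.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn halves_are_disjoint() {
        let mut v = [1u32, 2, 3, 4, 5];
        let (a, b) = split_halves(&mut v);
        a[0] = 10;
        b[0] = 30;
        assert_eq!(v, [10, 2, 30, 4, 5]);
    }
}
```

Running the test with cargo +nightly miri test exercises the provenance, bounds, and aliasing checks listed in the diagram. If the two ranges overlapped by even one element, a native test run could still pass, while Miri would be expected to report the aliasing once that shared element is accessed through both slices.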

What Miri Catches (and What It Cannot)

Miri detects:

| Category | Example | Would Crash at Runtime? |
|---|---|---|
| Out-of-bounds access | ptr.add(100).read() past allocation | Sometimes (depends on page layout) |
| Use after free | Reading a dropped Box through raw pointer | Sometimes (depends on allocator) |
| Double free | Calling drop_in_place twice | Usually |
| Unaligned access | (ptr as *const u32).read() on odd address | On some architectures |
| Invalid values | transmute::<u8, bool>(2) | Silently wrong |
| Dangling references | &*ptr where ptr is freed | No (silent corruption) |
| Data races | Two threads, one writing, no synchronization | Intermittent, hard to reproduce |
| Stacked Borrows violation | Aliasing &mut references | No (silent corruption) |
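
To make the "Unaligned access" row concrete, the sketch below shows one sound way to read a u32 from an arbitrary byte offset. The function name read_u32_le_at is hypothetical. Casting the pointer and calling .read() would be the UB from the table; read_unaligned is the standard-library call that lifts the alignment requirement.

```rust
/// Hypothetical helper (illustration only): reads a little-endian u32
/// starting at an arbitrary, possibly odd, byte offset.
pub fn read_u32_le_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None; // would read past the end of the slice
    }
    // SAFETY: offset..offset + 4 is in bounds (checked above), and
    // read_unaligned places no alignment requirement on the pointer.
    let raw = unsafe { bytes.as_ptr().add(offset).cast::<u32>().read_unaligned() };
    // The bytes are stored little-endian; convert to the host's byte order.
    Some(u32::from_le(raw))
}
```

In code that does not need raw pointers at all, u32::from_le_bytes on a 4-byte sub-slice gives the same result with no unsafe.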

Miri does NOT detect:

| Limitation | Why |
|---|---|
| Logic bugs | Miri checks memory safety, not correctness |
| Deadlocks and livelocks in general | Miri reports a deadlock only if the interleaving it happens to execute runs into one; it does not search for them |
| Performance issues | Interpretation is 10-100× slower than native |
| OS/hardware interaction | Miri emulates only a limited set of OS APIs and no device I/O |
| FFI calls | Can't interpret C code (only Rust MIR) |
| Exhaustive path coverage | Only tests the paths your test suite reaches |

A concrete example: catching unsound code that "works" in practice:

#[cfg(test)]
mod tests {
    #[test]
    fn test_miri_catches_ub() {
        // Reading through a stale pointer often "works" in native builds,
        // but it is undefined behavior.
        let mut v = vec![1, 2, 3];
        let _stale = v.as_ptr();

        // Push may reallocate, invalidating the pointer taken above
        v.push(4);

        // ❌ UB: _stale may be dangling after reallocation.
        // Under Miri the old buffer is a separate, freed allocation,
        // so the dereference is flagged even if a native run looks fine.
        // let _val = unsafe { *_stale };
        // Miri reports an error along the lines of:
        //   "pointer to alloc1234 was dereferenced after this
        //    allocation got freed"
        // (the exact wording varies between Miri versions)

        // ✅ Correct: get a fresh pointer after mutation
        let ptr = v.as_ptr();
        let val = unsafe { *ptr };
        assert_eq!(val, 1);
    }
}

Running Miri on a Real Crate

Practical Miri workflow for a crate with unsafe:

# Step 1: Run all tests under Miri
cargo +nightly miri test 2>&1 | tee miri_output.txt

# Step 2: If Miri reports errors, isolate them
cargo +nightly miri test -- failing_test_name

# Step 3: Use Miri's backtrace for diagnosis
MIRIFLAGS="-Zmiri-backtrace=full" cargo +nightly miri test

# Step 4: Choose a borrow model
# Stacked Borrows (default, stricter):
cargo +nightly miri test

# Tree Borrows (experimental, more permissive):
MIRIFLAGS="-Zmiri-tree-borrows" cargo +nightly miri test

Miri flags for common scenarios:

# Disable isolation (allow file system access, env vars)
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test

# Memory leak detection is ON by default in Miri.
# To suppress leak errors (e.g., for intentional leaks):
# MIRIFLAGS="-Zmiri-ignore-leaks" cargo +nightly miri test

# Seed the RNG for reproducible results with randomized tests
MIRIFLAGS="-Zmiri-seed=42" cargo +nightly miri test

# Enable strict provenance checking
MIRIFLAGS="-Zmiri-strict-provenance" cargo +nightly miri test

# Multiple flags
MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-backtrace=full -Zmiri-strict-provenance" \
    cargo +nightly miri test
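
Because leak checking is on by default, an Rc reference cycle surfaces as a Miri error when the test binary exits. The sketch below shows the usual way to avoid such a cycle, a Weak back-pointer from child to parent. Node, new_node, and link are illustrative names, not project code.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical tree node (illustration only).
pub struct Node {
    /// Weak back-pointer: does not keep the parent alive, so no cycle.
    pub parent: RefCell<Weak<Node>>,
    /// Strong pointers: a parent owns its children.
    pub children: RefCell<Vec<Rc<Node>>>,
}

pub fn new_node() -> Rc<Node> {
    Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

/// Attaches `child` under `parent` without creating a strong cycle.
pub fn link(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}
```

If the parent field held an Rc<Node> instead, parent and child would keep each other alive and neither would ever be freed, which is the kind of leak Miri reports unless -Zmiri-ignore-leaks is set.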

Miri in CI:

# .github/workflows/miri.yml
name: Miri
on: [push, pull_request]

jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with:
          components: miri

      - name: Run Miri
        run: cargo miri test --workspace
        env:
          MIRIFLAGS: "-Zmiri-backtrace=full"
          # Leak checking is on by default.
          # Skip tests that use system calls Miri can't handle
          # (file I/O, networking, etc.)

Performance note: Miri is 10-100× slower than native execution. A test suite that runs in 5 seconds natively may take 5 minutes under Miri. In CI, run Miri on a focused subset: crates with unsafe code only.
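
One common way to keep Miri runs short without maintaining a separate test suite is to scale workloads with cfg!(miri) and skip the heaviest tests with #[cfg_attr(miri, ignore)]. The sketch below assumes a hypothetical stress test; the function names and iteration counts are placeholders.

```rust
/// Hypothetical knob (illustration only): fewer iterations under Miri,
/// where every operation is interpreted.
pub fn stress_iterations() -> usize {
    if cfg!(miri) { 100 } else { 100_000 }
}

/// Pushes `n` items and pops them again, returning how many came back.
pub fn push_pop_roundtrip(n: usize) -> usize {
    let mut stack = Vec::new();
    for i in 0..n {
        stack.push(i);
    }
    let mut popped = 0;
    while stack.pop().is_some() {
        popped += 1;
    }
    popped
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn roundtrip_scaled_for_miri() {
        let n = stress_iterations();
        assert_eq!(push_pop_roundtrip(n), n);
    }

    #[test]
    #[cfg_attr(miri, ignore)] // too slow to interpret; still runs natively
    fn roundtrip_large() {
        assert_eq!(push_pop_roundtrip(1_000_000), 1_000_000);
    }
}
```

The scaled test still covers the same code paths under Miri, just with less repetition, while the ignored test keeps its full size for native runs.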

Valgrind and Its Rust Integration

Valgrind is the classic C/C++ memory checker. It works on compiled Rust binaries too, checking for memory errors at the machine-code level.

# Install Valgrind
sudo apt install valgrind  # Debian/Ubuntu
sudo dnf install valgrind  # Fedora

# Build with debug info (Valgrind needs symbols)
cargo build --tests
# For a release build with symbols, first add this to Cargo.toml:
#   [profile.release]
#   debug = true
# then build with:
# cargo build --release

# Run a specific test binary under Valgrind
valgrind --tool=memcheck \
    --leak-check=full \
    --show-leak-kinds=all \
    --track-origins=yes \
    ./target/debug/deps/my_crate-abc123 --test-threads=1

# Run the main binary
valgrind --tool=memcheck \
    --leak-check=full \
    --error-exitcode=1 \
    ./target/debug/diag_tool --run-diagnostics

Valgrind tools beyond memcheck:

| Tool | Command | What It Detects |
|---|---|---|
| Memcheck | --tool=memcheck | Memory leaks, use-after-free, buffer overflows |
| Helgrind | --tool=helgrind | Data races and lock-order violations |
| DRD | --tool=drd | Data races (different detection algorithm) |
| Callgrind | --tool=callgrind | Instruction-count profiling with call-graph attribution |
| Massif | --tool=massif | Heap memory profiling over time |
| Cachegrind | --tool=cachegrind | Cache miss analysis |

Using Callgrind for instruction-level profiling:

# Record instruction counts (more stable than wall-clock time)
valgrind --tool=callgrind \
    --callgrind-out-file=callgrind.out \
    ./target/release/diag_tool --run-diagnostics

# Visualize with KCachegrind
kcachegrind callgrind.out
# or the text-based alternative:
callgrind_annotate callgrind.out | head -100

Miri vs Valgrind: when to use which:

| Aspect | Miri | Valgrind |
|---|---|---|
| Checks Rust-specific UB | ✅ Stacked/Tree Borrows | ❌ Not aware of Rust rules |
| Checks C FFI code | ❌ Can't interpret C | ✅ Checks all machine code |
| Needs nightly | ✅ Yes | ❌ No |
| Speed | 10-100× slower | 10-50× slower |
| Platform | Any (interprets MIR) | Linux, macOS (runs native code) |
| Data race detection | ✅ Yes | ✅ Yes (Helgrind/DRD) |
| Leak detection | ✅ Yes | ✅ Yes (more thorough) |
| False positives | Very rare | Occasional (especially with allocators) |

Use both:

  • Miri for pure-Rust unsafe code (Stacked Borrows, provenance)
  • Valgrind for FFI-heavy code and whole-program leak analysis

AddressSanitizer, MemorySanitizer, ThreadSanitizer

LLVM sanitizers are compile-time instrumentation passes that insert runtime checks. They're faster than Valgrind (2-5× overhead vs 10-50×) and catch different classes of bugs.

# Required: install Rust source for rebuilding std with sanitizer instrumentation
rustup component add rust-src --toolchain nightly

# AddressSanitizer (ASan): buffer overflows, use-after-free, stack buffer overflows
RUSTFLAGS="-Zsanitizer=address" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# MemorySanitizer (MSan): uninitialized memory reads
RUSTFLAGS="-Zsanitizer=memory" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# ThreadSanitizer (TSan): data races
RUSTFLAGS="-Zsanitizer=thread" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# LeakSanitizer (LSan): memory leaks (included in ASan by default)
RUSTFLAGS="-Zsanitizer=leak" \
    cargo +nightly test --target x86_64-unknown-linux-gnu

Note: MSan and TSan need -Zbuild-std so that the standard library is instrumented too; uninstrumented code leads to false reports. Use it with ASan as well so that std itself is checked. LSan does not need it.

Sanitizer comparison:

| Sanitizer | Overhead | Catches | Nightly? | -Zbuild-std? |
|---|---|---|---|---|
| ASan | 2× memory, 2× CPU | Buffer overflow, use-after-free, stack buffer overflow | Yes | Yes |
| MSan | 3× memory, 3× CPU | Uninitialized reads | Yes | Yes |
| TSan | 5-10× memory, 5× CPU | Data races | Yes | Yes |
| LSan | Minimal | Memory leaks | Yes | No |

Practical example: catching a data race with TSan:

use std::cell::UnsafeCell;
use std::sync::Arc;
use std::thread;

// UnsafeCell is !Sync, so thread::spawn rejects Arc<UnsafeCell<u64>>.
// This wrapper opts in by hand, which is exactly the unsound step.
struct RacyCell(UnsafeCell<u64>);

// SAFETY: UNSOUND. Nothing synchronizes access; this impl exists only
// to demonstrate the race.
unsafe impl Sync for RacyCell {}

fn racy_counter() -> u64 {
    // ❌ UB: unsynchronized shared mutable state
    let data = Arc::new(RacyCell(UnsafeCell::new(0u64)));
    let mut handles = vec![];

    for _ in 0..4 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // SAFETY: UNSOUND, data race!
                unsafe {
                    *data.0.get() += 1;
                }
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    // Value should be 4000 but may be anything due to race
    unsafe { *data.0.get() }
}

// Both Miri and TSan catch this (exact wording varies by version):
// Miri:  "Data race detected between (1) ... and (2) ..."
// TSan:  "WARNING: ThreadSanitizer: data race"
//
// Fix: use AtomicU64 or Mutex<u64>
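
A minimal sketch of the AtomicU64 fix named in the last comment. The function name atomic_counter is hypothetical; the thread and iteration counts mirror the racy version.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Same workload as the racy version, but every increment is atomic.
pub fn atomic_counter() -> u64 {
    let data = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // fetch_add is a single atomic read-modify-write, so
                    // no increment is lost. Relaxed suffices here because
                    // only the counter value itself is shared.
                    data.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // join() synchronizes with each thread's exit, so every increment
    // is visible to this load.
    data.load(Ordering::Relaxed)
}
```

No unsafe is left in this version, so neither Miri nor TSan should have anything to report for it.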

cargo-fuzz: Coverage-Guided Fuzzing (finds crashes in parsers and decoders):

# Install
cargo install cargo-fuzz

# Initialize a fuzz target
cargo fuzz init
cargo fuzz add parse_gpu_csv

// fuzz/fuzz_targets/parse_gpu_csv.rs
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    if let Ok(s) = std::str::from_utf8(data) {
        // The fuzzer generates millions of inputs looking for panics/crashes.
        let _ = diag_tool::parse_gpu_csv(s);
    }
});

# Run the fuzzer (runs until interrupted or crash found)
cargo +nightly fuzz run parse_gpu_csv -- -max_total_time=300  # 5 minutes

# Minimize a crash
cargo +nightly fuzz tmin parse_gpu_csv artifacts/parse_gpu_csv/crash-...

When to fuzz: Any function that parses untrusted or semi-trusted input (sensor output, config files, network data, JSON/CSV). Fuzzing has found real bugs in many widely used Rust crates; the rust-fuzz trophy case lists examples, including regex and image.
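
What makes a parser hold up under fuzzing is mostly that malformed input becomes a value rather than a panic. The sketch below is a hypothetical two-column row parser written in that style. GpuRow and parse_row are invented for illustration and are not the project's parse_gpu_csv.

```rust
/// Hypothetical record type (illustration only).
#[derive(Debug, PartialEq)]
pub struct GpuRow {
    pub index: u32,
    pub temp_c: f32,
}

/// Parses "index,temperature". Returns None on any malformed input
/// instead of indexing or unwrapping, so arbitrary bytes cannot panic it.
pub fn parse_row(line: &str) -> Option<GpuRow> {
    let mut fields = line.split(',');
    let index = fields.next()?.trim().parse().ok()?;
    let temp_c = fields.next()?.trim().parse().ok()?;
    if fields.next().is_some() {
        return None; // trailing columns are rejected rather than ignored
    }
    Some(GpuRow { index, temp_c })
}
```

A fuzz target would call parse_row on each generated string and discard the result; any panic, overflow, or hang the fuzzer finds is then a bug in the parser rather than in the harness.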

loom: Concurrency Model Checker (exhaustively tests atomic orderings):

[dev-dependencies]
loom = "0.7"

// Compiled only when the crate is built with --cfg loom,
// e.g. RUSTFLAGS="--cfg loom" cargo test
#[cfg(loom)]
mod tests {
    use loom::sync::atomic::{AtomicUsize, Ordering};
    use loom::thread;

    #[test]
    fn test_counter_is_atomic() {
        loom::model(|| {
            let counter = loom::sync::Arc::new(AtomicUsize::new(0));
            let c1 = counter.clone();
            let c2 = counter.clone();

            let t1 = thread::spawn(move || { c1.fetch_add(1, Ordering::SeqCst); });
            let t2 = thread::spawn(move || { c2.fetch_add(1, Ordering::SeqCst); });

            t1.join().unwrap();
            t2.join().unwrap();

            // loom explores ALL possible thread interleavings
            assert_eq!(counter.load(Ordering::SeqCst), 2);
        });
    }
}

When to use loom: When you have lock-free data structures or custom synchronization primitives. Loom exhaustively explores thread interleavings; it's a model checker, not a stress test. Not needed for Mutex/RwLock-based code.
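
For loom to see the atomics in your own data structure, the structure has to use loom's types when built with --cfg loom and the std types otherwise. A minimal sketch of that import switch follows; IdAllocator is a hypothetical example type, not project code.

```rust
// Swap in loom's instrumented atomics only when built with `--cfg loom`.
#[cfg(loom)]
use loom::sync::atomic::{AtomicUsize, Ordering};
#[cfg(not(loom))]
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical lock-free ID allocator (illustration only).
pub struct IdAllocator {
    next: AtomicUsize,
}

impl IdAllocator {
    pub fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Returns a unique ID; fetch_add makes the read-modify-write atomic.
    pub fn alloc(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

A regular build uses std and pays nothing for the switch, while a loom test can wrap calls to alloc from several loom threads in loom::model and assert that no two threads receive the same ID.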

When to Use Which Tool

Decision tree for unsafe verification:

Is the code pure Rust (no FFI)?
├─ Yes → Use Miri (catches Rust-specific UB, Stacked Borrows)
│        Also run ASan in CI for defense-in-depth
└─ No (calls C/C++ code via FFI)
   ├─ Memory safety concerns?
   │  └─ Yes → Use Valgrind memcheck AND ASan
   ├─ Concurrency concerns?
   │  └─ Yes → Use TSan (faster) or Helgrind (more thorough)
   └─ Memory leak concerns?
      └─ Yes → Use Valgrind --leak-check=full

Recommended CI matrix:

# Run all tools in parallel for fast feedback
jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: miri }
      - run: cargo miri test --workspace

  asan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: rust-src }  # required by -Zbuild-std
      - run: |
          RUSTFLAGS="-Zsanitizer=address" \
          cargo test -Zbuild-std --target x86_64-unknown-linux-gnu

  valgrind:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: sudo apt-get update && sudo apt-get install -y valgrind
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo build --tests
      - run: |
          # Skip .d files and shared objects (e.g. proc-macro libraries); only test binaries remain
          for test_bin in $(find target/debug/deps -maxdepth 1 -executable -type f ! -name '*.d' ! -name '*.so'); do
            valgrind --error-exitcode=1 --leak-check=full "$test_bin" --test-threads=1
          done

Application: Zero Unsafe, and When You'll Need It

The project contains zero unsafe blocks across 90K+ lines of Rust. This is a remarkable achievement for a systems-level diagnostics tool and demonstrates that safe Rust is sufficient for:

  • IPMI communication (via std::process::Command to ipmitool)
  • GPU queries (via std::process::Command to accel-query)
  • PCIe topology parsing (pure JSON/text parsing)
  • SEL record management (pure data structures)
  • DER report generation (JSON serialization)

When will the project need unsafe?

The likely triggers for introducing unsafe:

| Scenario | Why unsafe | Recommended Verification |
|---|---|---|
| Direct ioctl-based IPMI | libc::ioctl() bypasses ipmitool subprocess | Valgrind + ASan (Miri for the pure-Rust parts only, since it cannot execute the ioctl) |
| Direct GPU driver queries | accel-mgmt FFI instead of accel-query parsing | Valgrind (C library) |
| Memory-mapped PCIe config | mmap for direct config-space reads | ASan + Valgrind |
| Lock-free SEL buffer | AtomicPtr for concurrent event collection | Miri + TSan |
| Embedded/no_std variant | Raw pointer manipulation for bare-metal | Miri |

Preparation: Before introducing unsafe, add the verification tools to CI:

# Cargo.toml: add a feature flag for unsafe optimizations
[features]
default = []
direct-ipmi = []     # Enable direct ioctl IPMI instead of ipmitool subprocess
direct-accel-api = []     # Enable accel-mgmt FFI instead of accel-query parsing

// src/ipmi.rs: gated behind a feature flag
#[cfg(feature = "direct-ipmi")]
mod direct {
    //! Direct IPMI device access via /dev/ipmi0 ioctl.
    //!
    //! # Safety
    //! This module uses `unsafe` for ioctl system calls.
    //! Verified with: Miri (where possible), Valgrind memcheck, ASan.

    use std::os::unix::io::RawFd;

    // ... unsafe ioctl implementation ...
}

#[cfg(not(feature = "direct-ipmi"))]
mod subprocess {
    //! IPMI via ipmitool subprocess (default, fully safe).
    // ... current implementation ...
}

Key insight: Keep unsafe behind feature flags so it can be verified independently. In CI, test with the feature enabled: cargo +nightly miri test --features direct-ipmi covers the parts Miri can execute, and Valgrind or ASan covers the paths that reach ioctl or other FFI. The safe default build is unaffected.

cargo-careful: Extra UB Checks at Near-Native Speed

cargo-careful runs your code with extra standard library checks enabled, catching some undefined behavior that normal builds ignore. It needs a nightly toolchain but avoids Miri's 10-100× slowdown:

# Install (requires nightly, but runs your code at near-native speed)
cargo install cargo-careful

# Run tests with extra UB checks (catches uninitialized memory, invalid values)
cargo +nightly careful test

# Run a binary with extra checks
cargo +nightly careful run -- --run-diagnostics

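What cargo-careful catches that normal builds don't (it builds the standard library with debug assertions, so the checks fire where your code calls into std):

  • mem::zeroed() and the deprecated mem::uninitialized() on types that do not permit that initialization
  • Invalid arguments to unchecked constructors such as NonNull::new_unchecked
  • Unaligned or null pointers passed to copy, copy_nonoverlapping, and write_bytes
  • copy_nonoverlapping with overlapping ranges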
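
The last item is easy to hit when shifting data within a single buffer. The sketch below, with a hypothetical shift_left_by_one helper, shows the sound variant: the source and destination ranges overlap, so it uses ptr::copy (memmove semantics) rather than copy_nonoverlapping.

```rust
/// Hypothetical helper (illustration only): moves every element one slot
/// to the left; the last element keeps its old value.
pub fn shift_left_by_one(buf: &mut [u8]) {
    if buf.len() < 2 {
        return; // nothing to move
    }
    let n = buf.len() - 1;
    let p = buf.as_mut_ptr();
    // SAFETY: the source [1, len) and destination [0, len - 1) are both in
    // bounds. They overlap, which ptr::copy permits and
    // ptr::copy_nonoverlapping does not.
    unsafe {
        std::ptr::copy(p.add(1), p, n);
    }
}
```

In safe code, buf.copy_within(1.., 0) expresses the same move without raw pointers.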

Where it fits in the verification ladder:

Ordered by runtime overhead, lowest first:

| Tool | Overhead | Scope |
|---|---|---|
| cargo test | baseline | Safe Rust only |
| cargo careful test | ~1.5× | Catches some UB |
| ASan | 2× | FFI + Rust |
| Valgrind | 10-50× | FFI + Rust |
| Miri | 10-100× | Pure Rust |

Recommendation: Add cargo +nightly careful test to CI as a fast safety check. It runs at near-native speed (unlike Miri) and catches real bugs that safe Rust abstractions mask.

Troubleshooting Miri and Sanitizers

| Symptom | Cause | Fix |
|---|---|---|
| Miri does not support FFI | Miri is a Rust interpreter; it can't execute C code | Use Valgrind or ASan for FFI code instead |
| error: unsupported operation: can't call foreign function | Miri hit an extern "C" call | Provide a mock under #[cfg(miri)], or skip the test with #[cfg_attr(miri, ignore)] |
| Stacked Borrows violation | Aliasing rule violation, even if code "works" | Miri is correct; refactor to avoid aliasing &mut with & |
| Sanitizer says DEADLYSIGNAL | The process received a fatal signal, typically SIGSEGV from a wild, null, or far out-of-bounds pointer | Check array indexing, slice operations, and pointer arithmetic |
| LeakSanitizer: detected memory leaks | Box::leak(), forget(), or missing drop() | Intentional: suppress with __lsan_disable(); unintentional: fix the leak |
| Miri is extremely slow | Miri interprets instead of compiling, so it runs 10-100× slower | Run only on --lib tests or tag specific tests with #[cfg_attr(miri, ignore)] for slow ones |
| TSan: false positive with atomics | TSan doesn't understand Rust's atomic ordering model perfectly | Add TSAN_OPTIONS=suppressions=tsan.supp with specific suppressions |

Try It Yourself

  1. Trigger a Miri UB detection: Write an unsafe function that creates two &mut references to the same i32 (aliasing violation). Run cargo +nightly miri test and observe the "Stacked Borrows" error. Fix it with UnsafeCell or separate allocations.

  2. Run ASan on a deliberate bug: Create a test that does unsafe out-of-bounds array access. Build with RUSTFLAGS="-Zsanitizer=address" and observe ASan's report. Note how it pinpoints the exact line.

  3. Benchmark Miri overhead: Time cargo test --lib vs cargo +nightly miri test --lib on the same test suite. Calculate the slowdown factor. Based on this, decide which tests to run under Miri in CI and which to skip with #[cfg_attr(miri, ignore)].

Safety Verification Decision Tree

flowchart TD
    START["Have unsafe code?"] -->|No| SAFE["Safe Rust: no\nverification needed"]
    START -->|Yes| KIND{"What kind?"}
    
    KIND -->|"Pure Rust unsafe"| MIRI["Miri\nMIR interpreter\ncatches aliasing, UB, leaks"]
    KIND -->|"FFI / C interop"| VALGRIND["Valgrind memcheck\nor ASan"]
    KIND -->|"Concurrent unsafe"| CONC{"Lock-free?"}
    
    CONC -->|"Atomics/lock-free"| LOOM["loom\nModel checker for atomics"]
    CONC -->|"Mutex/shared state"| TSAN["TSan or\nMiri data race detection"]
    
    MIRI --> CI_MIRI["CI: cargo +nightly miri test"]
    VALGRIND --> CI_VALGRIND["CI: valgrind --leak-check=full"]
    
    style SAFE fill:#91e5a3,color:#000
    style MIRI fill:#e3f2fd,color:#000
    style VALGRIND fill:#ffd43b,color:#000
    style LOOM fill:#ff6b6b,color:#000
    style TSAN fill:#ffd43b,color:#000

🏋️ Exercises

🟡 Exercise 1: Trigger a Miri UB Detection

Write an unsafe function that creates two &mut references to the same i32 (aliasing violation). Run cargo +nightly miri test and observe the Stacked Borrows error. Fix it.

<details> <summary>Solution</summary>
#[cfg(test)]
mod tests {
    #[test]
    fn aliasing_ub() {
        let mut x: i32 = 42;
        let ptr = &mut x as *mut i32;
        unsafe {
            // BUG: two live &mut references to the same location
            let a = &mut *ptr;
            let b = &mut *ptr; // creating b invalidates a
            *b += 1;
            *a += 1; // Miri: Stacked Borrows violation (a is used after b was created)
        }
    }
}

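Fix: never keep two live &mut references to the same location. Use separate allocations, or route all access through one pointer at a time, such as the one UnsafeCell::get returns: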

use std::cell::UnsafeCell;

#[test]
fn no_aliasing_ub() {
    let x = UnsafeCell::new(42);
    unsafe {
        let a = &mut *x.get();
        *a = 100;
    }
}
</details>

🔴 Exercise 2: ASan Out-of-Bounds Detection

Create a test with unsafe out-of-bounds array access. Build with RUSTFLAGS="-Zsanitizer=address" on nightly and observe ASan's report.

<details> <summary>Solution</summary>
#[test]
fn oob_access() {
    let arr = [1u8, 2, 3, 4, 5];
    let ptr = arr.as_ptr();
    unsafe {
        let _val = *ptr.add(10); // Out of bounds!
    }
}

RUSTFLAGS="-Zsanitizer=address" cargo +nightly test -Zbuild-std \
  --target x86_64-unknown-linux-gnu -- oob_access
# ASan report: stack-buffer-overflow at <exact address>
</details>

Key Takeaways

  • Miri is the tool for pure-Rust unsafe: it catches aliasing violations, use-after-free, and leaks in code that compiles and passes its tests
  • Valgrind is the tool for FFI/C interop: it works on the final binary without recompilation
  • Sanitizers (ASan, TSan, MSan) require nightly but cost far less than Valgrind (roughly 2-5× overhead), which makes them practical for large test suites
  • loom is purpose-built for verifying lock-free concurrent data structures
  • Run Miri in CI on every push; run sanitizers on a nightly schedule to avoid slowing the main pipeline