๐Ÿฆ€/๐Ÿ—๏ธ/7. Release Profiles and Binary Size

Release Profiles and Binary Size ๐ŸŸก

What you'll learn:

  • Release profile anatomy: LTO, codegen-units, panic strategy, strip, opt-level
  • Thin vs Fat vs Cross-Language LTO trade-offs
  • Binary size analysis with cargo-bloat
  • Dependency trimming with cargo-udeps and cargo-machete

Cross-references: Compile-Time Tools โ€” the other half of optimization ยท Benchmarking โ€” measure runtime before you optimize ยท Dependencies โ€” trimming deps reduces both size and compile time

The default cargo build --release is already good. But for production deployment โ€” especially single-binary tools deployed to thousands of servers โ€” there's a significant gap between "good" and "optimized." This chapter covers the profile knobs and the tools to measure binary size.

Release Profile Anatomy

Cargo profiles control how rustc compiles your code. The defaults are conservative โ€” designed for broad compatibility, not maximum performance:

# Cargo.toml โ€” Cargo's built-in defaults (what you get if you specify nothing)

[profile.release]
opt-level = 3        # Optimization level (0=none, 1=basic, 2=good, 3=aggressive)
lto = false          # Link-time optimization OFF
codegen-units = 16   # Parallel compilation units (faster compile, less optimization)
panic = "unwind"     # Stack unwinding on panic (larger binary, catch_unwind works)
strip = "none"       # Keep all symbols and debug info
overflow-checks = false  # No integer overflow checks in release
debug = false        # No debug info in release

Production-optimized profile (what the project already uses):

[profile.release]
lto = true           # Full cross-crate optimization
codegen-units = 1    # Single codegen unit โ€” maximum optimization opportunity
panic = "abort"      # No unwinding overhead โ€” smaller, faster
strip = true         # Remove all symbols โ€” smaller binary

The impact of each setting:

SettingDefault โ†’ OptimizedBinary SizeRuntime SpeedCompile Time
lto = false โ†’ trueโ€”-10 to -20%+5 to +20%2-5ร— slower
codegen-units = 16 โ†’ 1โ€”-5 to -10%+5 to +10%1.5-2ร— slower
panic = "unwind" โ†’ "abort"โ€”-5 to -10%NegligibleNegligible
strip = "none" โ†’ trueโ€”-50 to -70%NoneNone
opt-level = 3 โ†’ "s"โ€”-10 to -30%-5 to -10%Similar
opt-level = 3 โ†’ "z"โ€”-15 to -40%-10 to -20%Similar

Additional profile tweaks:

[profile.release]
# All of the above, plus:
overflow-checks = true    # Keep overflow checks even in release (safety > speed)
debug = "line-tables-only" # Minimal debug info for backtraces without full DWARF
rpath = false             # Don't embed runtime library paths
incremental = false       # Disable incremental compilation (cleaner builds)

# For size-optimized builds (embedded, WASM):
# opt-level = "z"         # Optimize for size aggressively
# strip = "symbols"       # Strip symbols but keep debug sections

Per-crate profile overrides โ€” optimize hot crates, leave others alone:

# Dev builds: optimize dependencies but not your code (fast recompile)
[profile.dev.package."*"]
opt-level = 2          # Optimize all dependencies in dev mode

# Release builds: override specific crate optimization
[profile.release.package.serde_json]
opt-level = 3          # Maximum optimization for JSON parsing
codegen-units = 1

# Test profile: match release behavior for accurate integration tests
[profile.test]
opt-level = 1          # Some optimization to avoid timeout in slow tests

LTO in Depth โ€” Thin vs Fat vs Cross-Language

Link-Time Optimization lets LLVM optimize across crate boundaries โ€” inlining functions from serde_json into your parsing code, removing dead code from regex, etc. Without LTO, each crate is a separate optimization island.

[profile.release]
# Option 1: Fat LTO (default when lto = true)
lto = true
# All code merged into one LLVM module โ†’ maximum optimization
# Slowest compile, smallest/fastest binary

# Option 2: Thin LTO
lto = "thin"
# Each crate stays separate but LLVM does cross-module optimization
# Faster compile than fat LTO, nearly as good optimization
# Best trade-off for most projects

# Option 3: No LTO
lto = false
# Only intra-crate optimization
# Fastest compile, larger binary

# Option 4: Off (explicit)
lto = "off"
# Same as false

Fat LTO vs Thin LTO:

AspectFat LTO (true)Thin LTO ("thin")
Optimization qualityBest~95% of fat
Compile timeSlow (all code in one module)Moderate (parallel modules)
Memory usageHigh (all LLVM IR in memory)Lower (streaming)
ParallelismNone (single module)Good (per-module)
Recommended forFinal release buildsCI builds, development

Cross-language LTO โ€” optimize across Rust and C boundaries:

[profile.release]
lto = true

# Cargo.toml โ€” for crates using the cc crate
[build-dependencies]
cc = "1.0"
// build.rs โ€” enable cross-language (linker-plugin) LTO
fn main() {
    // The cc crate respects CFLAGS from the environment.
    // For cross-language LTO, compile C code with:
    //   -flto=thin -O2
    cc::Build::new()
        .file("csrc/fast_parser.c")
        .flag("-flto=thin")
        .opt_level(2)
        .compile("fast_parser");
}
# Enable linker-plugin LTO (requires compatible LLD or gold linker)
RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" \
    cargo build --release

Cross-language LTO allows LLVM to inline C functions into Rust callers and vice versa. This is most impactful for FFI-heavy code where small C functions are called frequently (e.g., IPMI ioctl wrappers).

Binary Size Analysis with cargo-bloat

cargo-bloat answers: "What functions and crates are taking up the most space in my binary?"

# Install
cargo install cargo-bloat

# Show largest functions
cargo bloat --release -n 20
# Output:
#  File  .text     Size          Crate    Name
#  2.8%   5.1%  78.5KiB  serde_json       serde_json::de::Deserializer::parse_...
#  2.1%   3.8%  58.2KiB  regex_syntax     regex_syntax::ast::parse::ParserI::p...
#  1.5%   2.7%  42.1KiB  accel_diag         accel_diag::vendor::parse_smi_output
#  ...

# Show by crate (which dependencies are biggest)
cargo bloat --release --crates
# Output:
#  File  .text     Size Crate
# 12.3%  22.1%  340KiB serde_json
#  8.7%  15.6%  240KiB regex
#  6.2%  11.1%  170KiB std
#  5.1%   9.2%  141KiB accel_diag
#  ...

# Compare two builds (before/after optimization)
cargo bloat --release --crates > before.txt
# ... make changes ...
cargo bloat --release --crates > after.txt
diff before.txt after.txt

Common bloat sources and fixes:

Bloat SourceTypical SizeFix
regex (full engine)200-400 KBUse regex-lite if you don't need Unicode
serde_json (full)200-350 KBConsider simd-json or sonic-rs if perf matters
Generics monomorphizationVariesUse dyn Trait at API boundaries
Formatting machinery (Display, Debug)50-150 KB#[derive(Debug)] on large enums adds up
Panic message strings20-80 KBpanic = "abort" removes unwinding, strip removes strings
Unused featuresVariesDisable default features: serde = { version = "1", default-features = false }

Trimming Dependencies with cargo-udeps

cargo-udeps finds dependencies declared in Cargo.toml that your code doesn't actually use:

# Install (requires nightly)
cargo install cargo-udeps

# Find unused dependencies
cargo +nightly udeps --workspace
# Output:
# unused dependencies:
# `diag_tool v0.1.0`
# โ””โ”€โ”€ "tempfile" (dev-dependency)
#
# `accel_diag v0.1.0`  
# โ””โ”€โ”€ "once_cell"    โ† was needed before LazyLock, now dead

Every unused dependency:

  • Increases compile time
  • Increases binary size
  • Adds supply chain risk
  • Adds potential license complications

Alternative: cargo-machete โ€” faster, heuristic-based approach:

cargo install cargo-machete
cargo machete
# Faster but may have false positives (heuristic, not compilation-based)

Size Optimization Decision Tree

flowchart TD
    START["Binary too large?"] --> STRIP{"strip = true?"}
    STRIP -->|"No"| DO_STRIP["Add strip = true\n-50 to -70% size"]
    STRIP -->|"Yes"| LTO{"LTO enabled?"}
    LTO -->|"No"| DO_LTO["Add lto = true\ncodegen-units = 1"]
    LTO -->|"Yes"| BLOAT["Run cargo-bloat\n--crates"]
    BLOAT --> BIG_DEP{"Large dependency?"}
    BIG_DEP -->|"Yes"| REPLACE["Replace with lighter\nalternative or disable\ndefault features"]
    BIG_DEP -->|"No"| UDEPS["cargo-udeps\nRemove unused deps"]
    UDEPS --> OPT_LEVEL{"Need smaller?"}
    OPT_LEVEL -->|"Yes"| SIZE_OPT["opt-level = 's' or 'z'"]
    
    style DO_STRIP fill:#91e5a3,color:#000
    style DO_LTO fill:#e3f2fd,color:#000
    style REPLACE fill:#ffd43b,color:#000
    style SIZE_OPT fill:#ff6b6b,color:#000

๐Ÿ‹๏ธ Exercises

๐ŸŸข Exercise 1: Measure LTO Impact

Build a project with default release settings, then with lto = true + codegen-units = 1 + strip = true. Compare binary size and compile time.

<details> <summary>Solution</summary>
# Default release
cargo build --release
ls -lh target/release/my-binary
time cargo build --release  # Note time

# Optimized release โ€” add to Cargo.toml:
# [profile.release]
# lto = true
# codegen-units = 1
# strip = true
# panic = "abort"

cargo clean
cargo build --release
ls -lh target/release/my-binary  # Typically 30-50% smaller
time cargo build --release       # Typically 2-3ร— slower to compile
</details>

๐ŸŸก Exercise 2: Find Your Biggest Crate

Run cargo bloat --release --crates on a project. Identify the largest dependency. Can you reduce it by disabling default features or switching to a lighter alternative?

<details> <summary>Solution</summary>
cargo install cargo-bloat
cargo bloat --release --crates
# Output:
#  File  .text     Size Crate
# 12.3%  22.1%  340KiB serde_json
#  8.7%  15.6%  240KiB regex

# For regex โ€” try regex-lite if you don't need Unicode:
# regex-lite = "0.1"  # ~10ร— smaller than full regex

# For serde โ€” disable default features if you don't need std:
# serde = { version = "1", default-features = false, features = ["derive"] }

cargo bloat --release --crates  # Compare after changes
</details>

Key Takeaways

  • lto = true + codegen-units = 1 + strip = true + panic = "abort" is the production release profile
  • Thin LTO (lto = "thin") gives 80% of Fat LTO's benefit at a fraction of the compile cost
  • cargo-bloat --crates tells you exactly which dependencies are eating binary space
  • cargo-udeps and cargo-machete find dead dependencies that waste compile time and binary size
  • Per-crate profile overrides let you optimize hot crates without slowing the whole build