hyperfine — CLI Benchmarking & Command Timing
This tutorial covers hyperfine's key concepts, practical examples, and best practices.
hyperfine is a command-line benchmarking tool that runs shell commands multiple times, performs statistical analysis, supports warm-up runs for cache-hot measurements, and reports the mean, standard deviation, and range for each command, plus the relative speed when several commands are compared.
What You'll Learn
How to benchmark any CLI command with hyperfine, compare multiple tools and approaches, use parameterized benchmarks to test different inputs, export results to JSON and Markdown for reports, and integrate benchmarking into your development workflow.
Why hyperfine Matters
Optimizing code or choosing between tools requires data. The naive approach, running time command once, produces unreliable results because of caching, thermal throttling, and system noise. By default hyperfine runs each command at least 10 times (more for fast commands, until about 3 seconds of measurements have accumulated), can perform warm-up runs that are excluded from the results, and calculates the mean and standard deviation so the numbers are statistically meaningful. DodaZIP's compression team uses hyperfine to benchmark every code change against the previous commit, ensuring performance never regresses.
Learning Path
mise/asdf → hyperfine (you are here) → Difftastic, procs & bottom
Installation
# macOS
brew install hyperfine
# Ubuntu/Debian
sudo apt install hyperfine
# Fedora
sudo dnf install hyperfine
# Arch
sudo pacman -S hyperfine
# Download the .deb package (Debian/Ubuntu, amd64)
curl -LO https://github.com/sharkdp/hyperfine/releases/download/v1.18.0/hyperfine_1.18.0_amd64.deb
sudo dpkg -i hyperfine_1.18.0_amd64.deb
# Cargo
cargo install hyperfine
Basic Usage
# Benchmark a single command
hyperfine "sleep 1"
# With custom run count
hyperfine -r 50 "ls -la"
# Show output (not hidden)
hyperfine --show-output "echo hello"
Expected output:
Benchmark 1: sleep 1
  Time (mean ± σ):      1.002 s ±  0.003 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.000 s …  1.010 s    10 runs
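The mean and standard deviation in this output are ordinary summary statistics over the timed runs. As a rough sketch of what hyperfine automates, the bash functions below time a command repeatedly and summarize the results with awk. The names time_runs and mean_stddev are made up for this illustration, and the use of the sample (n-1) standard deviation is an assumption. hyperfine's own measurement is more careful; for example, it also corrects for shell startup time.

```shell
#!/usr/bin/env bash
# Illustrative helpers, not part of hyperfine.

# Run a command N times; print each run's wall-clock seconds on its own line.
time_runs() {
  local n=$1 i=0
  shift
  local TIMEFORMAT='%R'   # make bash's `time` keyword print only real time
  while [ "$i" -lt "$n" ]; do
    # The command's own output is discarded; only the timing line is kept.
    { time "$@" >/dev/null 2>&1; } 2>&1
    i=$((i + 1))
  done
}

# Read one number per line; print "mean stddev" (sample standard deviation).
mean_stddev() {
  awk '{ sum += $1; sumsq += $1 * $1; n++ }
       END {
         if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
         mean = sum / n
         var = (sumsq - n * mean * mean) / (n - 1)
         if (var < 0) var = 0   # guard against floating-point rounding
         printf "%.3f %.3f\n", mean, sqrt(var)
       }'
}
```

For example, time_runs 10 sleep 0.1 | mean_stddev prints two numbers: the mean and the standard deviation in seconds.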
Comparing Multiple Commands
# Compare two tools
hyperfine "rg 'function' src/" "grep -r 'function' src/"
# Compare three or more
hyperfine "python3 script.py" "node script.js" "deno script.js"
# Compare with custom names
hyperfine --command-name "ripgrep" "rg 'function' src/" \
--command-name "grep" "grep -r 'function' src/"
Expected output:
Benchmark 1: ripgrep
  Time (mean ± σ):      23.1 ms ±   1.2 ms    [User: 18.5 ms, System: 4.2 ms]
  Range (min … max):    21.5 ms …  27.8 ms    100 runs
Benchmark 2: grep
  Time (mean ± σ):     456.2 ms ±  12.3 ms    [User: 412.3 ms, System: 43.5 ms]
  Range (min … max):   441.0 ms … 478.5 ms    10 runs
Summary
  ripgrep ran
   19.75 ± 1.16 times faster than grep
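The "times faster" figure in the summary is the ratio of the two means, and its ± value combines both standard deviations. A minimal sketch, assuming standard first-order error propagation for a quotient, which is how hyperfine's relative speed appears to be computed. The function name speed_ratio is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of hyperfine.
# Arguments: fast_mean fast_sd slow_mean slow_sd (all in the same unit).
# Prints "ratio uncertainty".
speed_ratio() {
  awk -v fm="$1" -v fs="$2" -v sm="$3" -v ss="$4" 'BEGIN {
    ratio = sm / fm
    # Relative errors of the two means add in quadrature.
    err = ratio * sqrt((fs / fm) ^ 2 + (ss / sm) ^ 2)
    printf "%.2f %.2f\n", ratio, err
  }'
}

# Means and standard deviations taken from the sample output above (ms).
speed_ratio 23.1 1.2 456.2 12.3
```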
Warm-Up Runs
Filesystem caching heavily affects benchmark results. Use warm-up runs to measure cache-hot conditions, or --prepare to reset state before each timed run:
# 3 warm-up runs before measurements
hyperfine -w 3 "rg 'function' src/"
# Combine with --prepare to run a command between each run
# Clear caches between runs (requires root)
hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
"rg 'function' src/"
Parameterized Benchmarks
Test multiple parameter values in a single run:
# Test with different parameters
hyperfine --parameter-list size 10,100,1000 \
"python3 -c \"import time; time.sleep({size}/1000)\""
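# With multiple parameters (each --parameter-list takes a name, then comma-separated values)
# grep needs -r to search a directory, so the flag is part of its value
hyperfine --parameter-list tool "rg,grep -r,ag" \
--parameter-list dir src,lib \
"{tool} 'function' {dir}/"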
# Export results with parameters
hyperfine --parameter-list num 1,2,4,8 \
--export-json results.json \
"python3 worker.py --threads {num}"
Expected parameterized output (illustrative; real measurements also include Python's interpreter startup time, so each value will be somewhat higher):
Benchmark 1: python3 -c "import time; time.sleep(10/1000)"
  Time (mean ± σ):      10.1 ms ±   0.3 ms
Benchmark 2: python3 -c "import time; time.sleep(100/1000)"
  Time (mean ± σ):     100.2 ms ±   0.5 ms
Benchmark 3: python3 -c "import time; time.sleep(1000/1000)"
  Time (mean ± σ):    1001.0 ms ±   2.1 ms
Exporting Results
# Export to JSON
hyperfine --export-json benchmark.json \
"rg 'TODO' src/" "grep -r 'TODO' src/"
# Export to Markdown
hyperfine --export-markdown benchmark.md \
"rg 'TODO' src/" "grep -r 'TODO' src/"
# Export to CSV
hyperfine --export-csv benchmark.csv \
"rg 'TODO' src/" "grep -r 'TODO' src/"
JSON Output Structure
{
"results": [
{
"command": "rg 'TODO' src/",
"mean": 0.0231,
"stddev": 0.0012,
"median": 0.0228,
"user": 0.0185,
"system": 0.0042,
"min": 0.0215,
"max": 0.0278,
"times": [0.0215, 0.0220, ...],
"runs": 100
}
]
}
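Because the export is plain JSON, it is easy to post-process. A minimal sketch, assuming jq and awk are installed and that the file has the structure shown above. summarize_bench is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of hyperfine.
# Print one tab-separated line per benchmark: command, mean, stddev (in ms).
# Times in the export are in seconds; stddev may be null for a single run.
summarize_bench() {
  jq -r '.results[]
         | [.command, (.mean * 1000), ((.stddev // 0) * 1000)]
         | @tsv' "$1" |
    awk -F '\t' '{ printf "%s\t%.1f ms\t%.1f ms\n", $1, $2, $3 }'
}
```

For example, summarize_bench benchmark.json prints one line per benchmarked command.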
Advanced Options
# Override shell
hyperfine --shell bash "echo hello"
hyperfine --shell fish "echo hello"
# Cap each run's duration (hyperfine has no per-run time limit option; this uses coreutils timeout)
hyperfine --ignore-failure "timeout 30 slow-command"
# Bound the number of runs for slow commands
hyperfine --min-runs 5 --max-runs 10 "slow-command"
# Style output
hyperfine --style color "command"
hyperfine --style basic "command"
hyperfine --style nocolor "command"
# Ignore failures (non-zero exit codes)
hyperfine --ignore-failure "command-that-may-fail"
Real-World Examples
Comparing Compression Tools
hyperfine --prepare "rm -f test.tar.gz test.tar.zst test.tar.xz" \
--parameter-list compressor gzip,zstd,xz \
"{compressor} -k test.tar"
Testing Shell Startup Time
hyperfine --runs 30 \
"zsh -i -c exit" \
"bash -i -c exit" \
"fish -i -c exit"
# Compare normal startup with startup that skips user config files (-f), e.g. to see what Oh My Zsh costs
hyperfine "zsh -i -c exit" "zsh -f -i -c exit"
Benchmarking File Search
hyperfine --warmup 5 \
"fd -e js -t f 'test' src/" \
"find src/ -name '*test*.js' -type f"
# With a parameterized search directory
hyperfine --parameter-list path src,lib,tests \
"fd -e js . '{path}'"
Database Query Benchmark
hyperfine --prepare "psql -c 'SELECT 1'" \
--runs 20 \
"psql -c 'SELECT COUNT(*) FROM users'"
Build Time Comparison
hyperfine --prepare "cargo clean" \
--runs 3 \
"cargo build" \
"cargo build --release"
Integration with CI/CD
#!/bin/bash
# ci-benchmark.sh — Run on every PR
# Benchmark the current build vs main
hyperfine --export-json bench.json \
"./build/new-binary" \
"./build/old-binary"
# Parse JSON and check for regression
MEAN_NEW=$(jq '.results[0].mean' bench.json)
MEAN_OLD=$(jq '.results[1].mean' bench.json)
if (( $(echo "$MEAN_NEW > $MEAN_OLD * 1.05" | bc -l) )); then
echo "Performance regression detected!"
echo "New: ${MEAN_NEW}s, Old: ${MEAN_OLD}s"
exit 1
fi
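The threshold comparison in the script can also be done with awk, which avoids the dependency on bc and makes the tolerance a parameter. A minimal sketch: is_regression is a hypothetical helper, and the 5 percent default simply mirrors the script above.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of hyperfine.
# Succeeds (exit status 0) if new_mean exceeds old_mean by more than pct percent.
# Arguments: new_mean old_mean [pct, default 5]
is_regression() {
  awk -v new="$1" -v old="$2" -v pct="${3:-5}" \
    'BEGIN { exit !(new > old * (1 + pct / 100)) }'
}

# Usage in the CI script above (MEAN_NEW and MEAN_OLD come from jq):
#   if is_regression "$MEAN_NEW" "$MEAN_OLD" 5; then
#     echo "Performance regression detected!"
#     exit 1
#   fi
```

A fixed percentage can sit inside normal run-to-run noise, so the tolerance is worth tuning against the standard deviation that hyperfine reports.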
Common Errors
1. "Results are unreliable" Warning
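hyperfine warns that results may be unreliable when it detects statistical outliers, when the first run is much slower than the rest, or when a command finishes in only a few milliseconds. Increase the run count with -r, add warm-up runs with -w, and make sure the system is idle: close background processes and avoid thermal throttling.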
2. Command With Pipes or Redirection Fails
Wrap complex commands in quotes. Use single quotes for the outer command to avoid shell expansion: hyperfine 'rg "pattern" src/ | head -5'.
3. Out-of-Memory During Benchmark
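Some commands use more memory than expected, and hyperfine does not limit memory itself. Reduce the run count with -r, or set a limit inside the benchmarked command, for example hyperfine 'ulimit -v 4000000; command' on systems whose shell supports ulimit -v.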
4. --prepare Conflicts With Benchmarked Command
The --prepare command runs between each benchmark iteration. It should not interfere with the benchmarked command's output or state.
5. Export File Overwritten
--export-json, --export-markdown, and --export-csv overwrite existing files without warning. Use unique filenames or timestamp them.
6. Parameterized Results Hard to Parse
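Use --export-json and process the file with jq. Each entry in the JSON output includes the parameter values used for that benchmark.
7. Very Slow Commands Not Completing
If a command takes minutes, the default of at least 10 runs can take very long, and hyperfine waits for every run to finish. Lower the count with --runs 3 (or --max-runs), and wrap the command in timeout to cap each run, for example hyperfine --runs 3 --ignore-failure "timeout 30 slow-command".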
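For the overwritten-export problem (error 5), one option is to put a timestamp in the filename. A minimal sketch; export_name is a hypothetical helper, not a hyperfine feature.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of hyperfine.
# Build a filename of the form PREFIX-YYYYMMDD-HHMMSS.EXT.
# Arguments: [prefix, default "bench"] [extension, default "json"]
export_name() {
  local prefix=${1:-bench}
  local ext=${2:-json}
  printf '%s-%s.%s\n' "$prefix" "$(date +%Y%m%d-%H%M%S)" "$ext"
}

# Usage:
#   hyperfine --export-json "$(export_name bench json)" "command"
```

Two exports started within the same second would still collide, so add another distinguishing part (such as a commit hash) if that can happen.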
Practice Questions
1. How is hyperfine different from the time command?
time runs the command once. hyperfine runs it multiple times (at least 10 by default), provides statistical analysis (mean, stddev, min, max), supports warm-up runs, and supports parameterized comparisons.
2. What does the -w flag do?
-w specifies warm-up runs — runs that are not counted in the results but warm up the filesystem cache.
3. How do you compare the performance of two commands?
Pass both commands as arguments: hyperfine "rg pattern" "grep -r pattern". hyperfine runs both and shows the speed ratio.
4. How do you export benchmark results in JSON format?
hyperfine --export-json results.json "command" — the JSON contains mean, stddev, min, max, and raw times for each command.
5. What causes the "Results are unreliable" warning?
High variance across runs, typically flagged as statistical outliers or a much slower first run. Increase the run count, add warm-up runs, close background processes, and ensure consistent system state.
Challenge: Create a benchmark script that: (1) compares ripgrep vs grep searching for a pattern in a large codebase with and without warm caches, (2) compares Node.js vs Python vs Rust for a simple CPU-bound computation (prime number calculation), (3) compares gzip vs zstd vs xz compression ratio and speed on a 1GB tar file, (4) exports all results to a Markdown report with parameterized benchmarks showing how performance scales with input size.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-24.