sed and awk: Text Processing Power Tools
In this tutorial, you'll learn to use sed and awk for stream editing and text processing, including search and replace, field extraction, report generation, and combining both tools for complex text transformations.
Why sed and awk Matter
sed and awk are two of the most widely used text-processing tools on Unix, and both ship by default on virtually every Linux and macOS system. sed excels at stream editing: find and replace, line deletion, and text transformations. awk adds column-based processing, arithmetic, and report generation. Together, they handle most text manipulation tasks without writing a full program.
By the end of this guide, you will use sed for search/replace and filtering, awk for field extraction and reporting, and combine both for complex pipelines.
What are sed and awk?
sed (stream editor) reads input line by line and applies editing commands. It is ideal for simple transformations. awk is a pattern-scanning and processing language. It splits each line into fields and supports variables, conditionals, and loops.
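To make the contrast concrete, here is a minimal sketch on made-up input (the names and numbers are illustrative only). sed edits each line as a string, while awk addresses individual fields and can compute on them.

```shell
# Sample input: two whitespace-separated records (illustrative data).
input='alice 30
bob 25'

# sed treats each line as text and rewrites the part that matches.
sed_out=$(printf '%s\n' "$input" | sed 's/alice/ALICE/')

# awk splits each line into fields and can do arithmetic on them.
awk_out=$(printf '%s\n' "$input" | awk '{print $1, $2 + 1}')

printf '%s\n' "$sed_out"
printf '%s\n' "$awk_out"
```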
flowchart LR
    A[Input Text] --> B[sed]
    A --> C[awk]
    B --> D[Search & Replace]
    B --> E[Line Deletion]
    B --> F[Text Insertion]
    C --> G[Field Extraction]
    C --> H[Report Generation]
    C --> I[Data Aggregation]
    D --> J[Output Text]
sed Fundamentals
Basic Syntax
sed 'command' file.txt
sed -i 'command' file.txt # In-place edit (GNU sed; BSD/macOS sed requires -i '')
Search and Replace
# Replace first occurrence on each line
sed 's/old/new/' file.txt
# Replace all occurrences (global)
sed 's/old/new/g' file.txt
# Replace only on lines matching a pattern
sed '/pattern/s/old/new/' file.txt
# Replace all occurrences, ignoring case (the i flag is a GNU sed extension)
sed 's/old/new/gi' file.txt
Expected Output
$ echo "foo bar foo baz" | sed 's/foo/FIXED/'
FIXED bar foo baz
$ echo "foo bar foo baz" | sed 's/foo/FIXED/g'
FIXED bar FIXED baz
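The difference between an unscoped and a pattern-scoped substitution is easier to see side by side. The sketch below uses placeholder text (the `id=` labels are invented for the example).

```shell
# Illustrative input; the words are placeholders.
text='id=1 old old
id=2 old old'

# Without the g flag, only the first occurrence on each line changes.
first=$(printf '%s\n' "$text" | sed 's/old/new/')

# With an address, every occurrence changes, but only on lines matching id=2.
scoped=$(printf '%s\n' "$text" | sed '/id=2/s/old/new/g')

printf '%s\n---\n%s\n' "$first" "$scoped"
```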
Line Operations
# Delete lines containing a pattern
sed '/pattern/d' file.txt
# Delete empty lines
sed '/^$/d' file.txt
# Delete line 5
sed '5d' file.txt
# Delete lines 10-20
sed '10,20d' file.txt
# Print specific lines
sed -n '5,10p' file.txt # Print lines 5-10
# Insert before a line (one-line form is GNU sed; POSIX sed needs a newline after the backslash)
sed '5i\This is inserted before line 5' file.txt
# Append after a line (same GNU one-line form)
sed '5a\This is appended after line 5' file.txt
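A quick way to check line addresses is to run them against numbered input. This sketch generates ten throwaway lines with awk and applies a print range and a delete range to them.

```shell
# Generate ten numbered lines to operate on.
lines=$(awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }')

# Keep only lines 3-5 (-n suppresses default output, p prints the range).
range=$(printf '%s\n' "$lines" | sed -n '3,5p')

# Delete lines 2-9, leaving only the first and last line.
ends=$(printf '%s\n' "$lines" | sed '2,9d')

printf '%s\n---\n%s\n' "$range" "$ends"
```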
Advanced sed
# Multiple commands
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
# Write matches to another file
sed -n '/ERROR/w errors.txt' log.txt
# In-place edit with backup
sed -i.bak 's/foo/bar/g' config.conf
# Replace only on lines 3-8
sed '3,8s/old/new/g' file.txt
# Use different delimiter
sed 's|/path/to/old|/path/to/new|g' paths.txt
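Multiple `-e` expressions and an alternative delimiter combine naturally. The following sketch rewrites a made-up command line (the paths are hypothetical) in a single pass.

```shell
# Hypothetical input line containing slash-heavy paths.
path_line='cp /usr/local/bin/tool /usr/local/bin/tool.bak'

# Two edits in one pass; using | as the delimiter avoids escaping every /.
rewritten=$(printf '%s\n' "$path_line" \
  | sed -e 's|/usr/local/bin|/opt/bin|g' -e 's/^cp/mv/')

printf '%s\n' "$rewritten"
```

The expressions run in order on each line, so the second one sees the result of the first.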
Practical sed Examples
# Remove trailing whitespace
sed -i 's/[[:space:]]*$//' file.txt
# Convert each tab to a space (\t in the pattern is a GNU sed extension)
sed -i 's/\t/ /g' file.txt
# Uppercase first letter of each word (\b and \u are GNU sed extensions)
sed 's/\b\(.\)/\u\1/g' file.txt
# Comment out lines with "DEBUG"
sed '/DEBUG/s/^/# /' config.conf
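Some escapes used above, such as `\t`, are not part of POSIX sed. A portable sketch, assuming you want four spaces per tab (that width is an arbitrary choice for this example), passes a literal tab character in through a shell variable instead.

```shell
# Capture a literal tab so the sed script does not rely on the \t escape.
tab=$(printf '\t')

# Sample line: a tab between two words and trailing spaces (illustrative).
sample=$(printf 'key\tvalue   ')

# Strip trailing whitespace, then turn each tab into four spaces.
cleaned=$(printf '%s\n' "$sample" \
  | sed -e 's/[[:space:]]*$//' -e "s/$tab/    /g")

printf '[%s]\n' "$cleaned"
```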
awk Fundamentals
By default, awk splits each input line into fields separated by whitespace. $1 is the first field, $2 the second, and $0 is the entire line.
Field Extraction
# Print first field of each line
awk '{print $1}' file.txt
# Print first and third fields
awk '{print $1, $3}' file.txt
# Print with custom separator (CSV)
awk -F, '{print $1, $2}' data.csv
# Print formatted output
awk '{printf "%-10s %s\n", $1, $3}' file.txt
Expected Output
$ cat data.txt
Alice 30 Developer
Bob 25 Designer
Charlie 35 Manager
$ awk '{print $1, $3}' data.txt
Alice Developer
Bob Designer
Charlie Manager
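The same extraction works on comma-separated input once the separator is set with `-F`. This sketch uses a two-row sample and `printf` to pad the first column to eight characters.

```shell
# Illustrative CSV rows without a header.
csv='Alice,30,Developer
Bob,25,Designer'

# -F, sets the input field separator; %-8s left-aligns the name in 8 columns.
table=$(printf '%s\n' "$csv" | awk -F, '{printf "%-8s %s\n", $1, $3}')

printf '%s\n' "$table"
```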
Patterns in awk
# Print lines matching a pattern
awk '/ERROR/ {print}' log.txt
# Print lines where field matches
awk '$1 == "Alice" {print}' data.txt
# Numeric comparison
awk '$2 > 30 {print $1, "is over 30"}' data.txt
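# Range pattern: print from a line matching /BEGIN/ through the next line matching /END/
# (these are regexes matching literal text, not awk's special BEGIN and END blocks)
awk '/BEGIN/,/END/ {print}' file.txt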
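Here is a small sketch of a numeric comparison and a range pattern, run against the same three records used earlier in this guide.

```shell
data='Alice 30 Developer
Bob 25 Designer
Charlie 35 Manager'

# Numeric comparison on the second field.
over30=$(printf '%s\n' "$data" | awk '$2 > 30 {print $1}')

# Range pattern: from the first line matching /Bob/ through the next matching /Charlie/.
range=$(printf '%s\n' "$data" | awk '/Bob/,/Charlie/ {print $1}')

printf '%s\n---\n%s\n' "$over30" "$range"
```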
awk Variables
# Built-in variables
awk '{print NR, NF, $0}' file.txt
# NR = line number, NF = number of fields
# Field separator as variable
awk -v FS=',' '{print $1}' data.csv
# Custom variables
awk -v min=30 '$2 > min {print $1}' data.txt
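To see the built-in and custom variables together, this sketch labels each input line with its record number and field count. The `prefix` variable is a made-up name passed in with `-v`.

```shell
# NR is the record number, NF the field count; prefix is supplied from the shell.
out=$(printf 'a b c\nd e\n' \
  | awk -v prefix='row' '{print prefix NR ": " NF " fields"}')

printf '%s\n' "$out"
```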
awk Calculations
# Sum a column
awk '{sum += $2} END {print "Total:", sum}' sales.txt
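# Average (guarded so empty input does not divide by zero)
awk '{sum += $2; count++} END {if (count > 0) print "Average:", sum/count}' data.txt
# Count matches (count + 0 prints 0 instead of an empty string when nothing matches)
awk '/ERROR/ {count++} END {print count + 0, "errors found"}' log.txt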
# Min and max
awk 'NR == 1 {min = $2; max = $2} $2 < min {min = $2} $2 > max {max = $2} END {print "Min:", min, "Max:", max}' data.txt
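The individual calculations can be folded into a single pass. The sketch below wraps them in a shell function, `summarize` (a name invented for this example), that reads two-column input from stdin and handles empty input explicitly.

```shell
# summarize: read "label value" lines on stdin, print sum, average, min, max.
summarize() {
  awk '
    NR == 1 { min = max = $2 + 0 }          # seed min/max from the first record
    {
      v = $2 + 0                            # force numeric interpretation
      sum += v
      if (v < min) min = v
      if (v > max) max = v
    }
    END {
      if (NR > 0)
        printf "sum=%d avg=%.1f min=%d max=%d\n", sum, sum / NR, min, max
      else
        print "no data"                     # avoids dividing by zero
    }'
}

printf 'north 120\nsouth 80\neast 100\n' | summarize
```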
awk Functions
# String functions
awk '{print toupper($1), length($1)}' names.txt
# Substring
awk '{print substr($1, 1, 3)}' names.txt
# Split
awk '{split($0, arr, ","); print arr[1]}' data.csv
# Math functions
awk '{print sqrt($2), int($2)}' numbers.txt
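The string functions compose well. As a sketch, the following splits a made-up `first,last` record, capitalizes the first name, and reports the length of the last name along with the number of pieces `split` produced.

```shell
out=$(printf 'alice,smith\n' | awk '{
  n = split($0, parts, ",")   # n is the number of pieces
  first = toupper(substr(parts[1], 1, 1)) substr(parts[1], 2)
  print first, length(parts[2]), n
}')

printf '%s\n' "$out"
```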
Combining sed and awk
# Pipe: sed to clean, awk to process
sed 's/[^a-zA-Z0-9 ]//g' dirty.txt | awk '{print $1, $NF}'
# awk to filter, sed to modify
awk '$2 > 100' data.txt | sed 's/^/HIGH: /'
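# Complex pipeline: top 10 (client IP, request path) pairs
# Assumes Common/Combined Log Format; once sed strips the [timestamp], the path moves from $7 to $5
sed 's/\[.*\]//' access.log \
| awk '{print $1, $5}' \
| sort \
| uniq -c \
| sort -rn \
| head -10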
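A sketch of this kind of pipeline on three made-up log lines, assuming Common Log Format, shows how the stages fit together. Note that removing the bracketed timestamp with sed shifts the field numbers: the request path is field 7 in the raw line but field 5 afterwards.

```shell
# Three invented log lines in Common Log Format.
log='10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /about.html HTTP/1.1" 200 256
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "GET /index.html HTTP/1.1" 200 512'

# Strip the timestamp, keep IP and path, count pairs, show the most frequent.
top=$(printf '%s\n' "$log" \
  | sed 's/\[.*\]//' \
  | awk '{print $1, $5}' \
  | sort | uniq -c | sort -rn | head -1 \
  | awk '{print $1, $2, $3}')   # normalize the padding uniq -c adds

printf '%s\n' "$top"
```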
Real-World Examples
Log File Analysis
# Count error types in a log file
awk '/ERROR/ {errors[$NF]++} END {for (e in errors) print e, errors[e]}' application.log
# Find top 10 IP addresses accessing a server
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# Average response time per endpoint
awk '{endpoints[$7] += $NF; counts[$7]++} END {for (e in endpoints) print e, endpoints[e]/counts[e]}' access.log
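The per-endpoint average depends on the log layout. This sketch assumes the response time is the last field of each line, which is not part of the standard Common Log Format, and uses invented sample lines. Because `for (key in array)` visits keys in an unspecified order, the output is piped through `sort`.

```shell
# Invented log lines; the trailing number is an assumed response time in seconds.
log='10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 0.10
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 0.30
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "GET /a HTTP/1.1" 200 0.30'

averages=$(printf '%s\n' "$log" | awk '
  { total[$7] += $NF; hits[$7]++ }     # $7 is the path in the unmodified line
  END { for (path in total) printf "%s %.2f\n", path, total[path] / hits[path] }
' | sort)

printf '%s\n' "$averages"
```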
CSV Processing
# Convert CSV to tab-separated (\t in the replacement is a GNU sed extension; quoted fields containing commas are not handled)
sed 's/,/\t/g' data.csv
# Sum a column in CSV (with header)
awk -F, 'NR > 1 {sum += $3} END {print sum}' data.csv
# Extract specific columns from CSV
awk -F, '{print $1, $4}' data.csv | column -t
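As a sketch on a tiny invented CSV, the following sums the third column while skipping the header, and converts the header row to tab-separated form using awk's `OFS` rather than a sed escape. Like the commands above, it does not handle quoted fields that contain commas.

```shell
csv='name,region,amount
widget,north,10
gadget,south,32'

# Skip the header (NR > 1) and total the third column.
total=$(printf '%s\n' "$csv" | awk -F, 'NR > 1 {sum += $3} END {print sum + 0}')

# Assigning $1 = $1 makes awk rebuild the record with OFS between fields.
tsv_header=$(printf '%s\n' "$csv" | awk -F, -v OFS='\t' '{$1 = $1; print}' | head -1)

printf '%s\n%s\n' "$total" "$tsv_header"
```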
Configuration File Editing
# Uncomment a line in a config file
sed -i '/^# server_host/s/^# //' config.conf
# Update a configuration value
sed -i 's/^max_connections = .*/max_connections = 200/' config.conf
# Add a line after a match
sed -i '/^\[database\]/a host = localhost' config.conf
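Because `-i` behaves differently in GNU and BSD sed, a portable alternative is to write to a temporary file and move it over the original. The sketch below does this in a throwaway directory so no real configuration is touched. The file contents and the `example.internal` host are invented for the example.

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d)
conf="$workdir/config.conf"
printf '%s\n' '# server_host = example.internal' 'max_connections = 50' > "$conf"

# Portable "in-place" edit: write to a temp file, then replace the original.
sed -e '/^# server_host/s/^# //' \
    -e 's/^max_connections = .*/max_connections = 200/' \
    "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"

result=$(cat "$conf")
rm -rf "$workdir"
printf '%s\n' "$result"
```

The `&&` ensures the original is only replaced if sed succeeds.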
Common Errors
| Problem | Cause | Fix |
|---|---|---|
| sed: -e expression #1, char 1: unknown command | Wrong command syntax | sed commands start with a letter: s for substitute, d for delete |
| sed: extra characters after command | Trailing characters | Ensure no extra spaces after the closing / |
| awk: division by zero | Empty field causes division | Check for zero: if ($2 != 0) ... |
| awk: fatal: cannot open file | File not found | Verify file path and permissions |
| sed in-place not working on macOS | BSD sed uses different syntax | Use sed -i '' 's/old/new/g' file.txt (empty string for no backup) |
Practice Questions
1. How do you replace all occurrences of "foo" with "bar" using sed?
sed 's/foo/bar/g' file.txt
2. What does $1 represent in awk?
The first field of the current record (line).
3. How do you print lines 10-20 of a file with sed?
sed -n '10,20p' file.txt
4. What is the difference between NR and NF in awk?
NR is the current record (line) number. NF is the number of fields in the current record.
5. How do you sum a column of numbers in awk?
awk '{sum += $1} END {print sum}' file.txt
Challenge
Write a sed+awk pipeline that processes a web server access log and generates a report showing: the top 10 IP addresses by request count, the number of 404 errors by URL, and the average response time per HTTP method (GET, POST, etc.).
Real-World Task
Use awk to analyze a system log file. Extract all ERROR and WARNING messages, group them by component name, count occurrences of each, and generate a summary report. Use sed to sanitize the input (remove timestamps, anonymize IP addresses). The final output should be a formatted table showing component, error count, and severity level.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.