Digital Forensics Basics -- Evidence Collection, Analysis & Reporting
In this tutorial, you'll learn about Digital Forensics Basics. We cover key concepts, practical examples, and best practices.
Digital forensics is the systematic process of identifying, preserving, analyzing, and presenting digital evidence in a manner that is legally admissible, following strict chain of custody procedures to maintain evidence integrity.
What You'll Learn
You will learn to acquire forensic disk images with write blockers, perform memory forensics to identify running malware, analyze file system timelines for incident reconstruction, carve deleted files from unallocated space, and document evidence with proper chain of custody.
Why It Matters
The 2025 Verizon DBIR found that 65% of breaches took over 100 days to discover. Proper forensic analysis is the only way to determine the full scope of a breach, identify the attack vector, and collect evidence for legal proceedings or regulatory reporting.
Real-World Use
A company detects ransomware on a critical server. The forensics team captures memory with LiME, images the disk through a write blocker, and analyzes the timeline. They identify the initial access vector (unpatched VPN), the lateral movement path, and the specific files exfiltrated. Because every step is hashed and documented, the evidence can stand up in court.
Forensic Investigation Workflow
flowchart TD
A[Incident Identified] --> B[Preserve Evidence]
B --> C[Acquire Disk Image]
B --> D[Capture Memory]
B --> E[Collect Network Logs]
C --> F[Forensic Analysis]
D --> F
E --> F
F --> G[Timeline Reconstruction]
F --> H[File Carving]
F --> I[Malware Analysis]
G --> J[Report Generation]
H --> J
I --> J
style B fill:#f96,stroke:#333
style C fill:#4a9,stroke:#333
style D fill:#4a9,stroke:#333
style J fill:#4a9,stroke:#333
How it works: The investigation follows a structured workflow from evidence preservation through analysis and reporting. The order of volatility dictates that memory is captured first (most volatile), followed by disk imaging (less volatile), and finally network logs (least volatile). Each step is documented with cryptographic hashes.
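As a minimal sketch of the ordering described above, the snippet below sorts a list of evidence sources from most to least volatile. The rank table and the source names are illustrative assumptions, not part of any forensic tool.

```python
# Hypothetical volatility ranks (lower = more volatile), mirroring the
# order given above: memory first, then disk, then network logs.
VOLATILITY_RANK = {"memory": 0, "disk": 1, "network_logs": 2}


def acquisition_order(sources):
    """Return evidence sources sorted from most to least volatile."""
    return sorted(sources, key=lambda source: VOLATILITY_RANK[source])


print(acquisition_order(["network_logs", "disk", "memory"]))
# → ['memory', 'disk', 'network_logs']
```

An unknown source name raises a KeyError here, which is deliberate: a plan should not silently skip evidence it cannot rank.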
Evidence Acquisition with Imaging
import hashlib
import subprocess
from datetime import datetime, timezone


class ForensicImager:
    """Acquire forensic disk images with verification."""

    def __init__(self, case_id, examiner):
        self.case_id = case_id
        self.examiner = examiner
        self.evidence_log = []

    def create_dd_image(self, source_device, output_path):
        """Create a bit-for-bit forensic image using dd."""
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        timestamp = datetime.now(timezone.utc).isoformat()

        # Calculate source hash before imaging (reading a raw device needs root)
        source_hash = self._calculate_hash(source_device)

        # Create image with dd; noerror,sync continues past unreadable blocks
        cmd = [
            "sudo", "dd", f"if={source_device}",
            f"of={output_path}",
            "bs=4M", "conv=noerror,sync", "status=progress",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Do not log an image that dd failed to produce
            raise RuntimeError(f"dd failed: {result.stderr.strip()}")

        # Calculate image hash
        image_hash = self._calculate_hash(output_path)

        entry = {
            "action": "disk_image",
            "source": source_device,
            "destination": output_path,
            "source_hash": source_hash,
            "image_hash": image_hash,
            "timestamp": timestamp,
            "examiner": self.examiner,
            "command": " ".join(cmd),
        }
        self.evidence_log.append(entry)
        return entry

    def _calculate_hash(self, path):
        """Calculate SHA-256 hash of a device or file."""
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 64 KiB chunks so large devices do not exhaust memory
            for chunk in iter(lambda: f.read(65536), b""):
                sha256.update(chunk)
        return sha256.hexdigest()

    def verify_image_integrity(self, original_device, image_path):
        """Verify that the image matches the source."""
        print("Verifying image integrity...")
        original_hash = self._calculate_hash(original_device)
        image_hash = self._calculate_hash(image_path)
        match = original_hash == image_hash
        print(f"Original hash: {original_hash[:16]}...")
        print(f"Image hash: {image_hash[:16]}...")
        print(f"Integrity check: {'PASSED' if match else 'FAILED'}")
        return {
            "match": match,
            "original_hash": original_hash,
            "image_hash": image_hash,
        }


case = ForensicImager("CASE-2026-001", "Analyst Smith")
img_result = case.create_dd_image("/dev/sdb1", "/evidence/case001/image.dd")
print(f"Image acquired: {img_result['destination']}")
print(f"Source hash: {img_result['source_hash'][:16]}...")
print(f"Image hash: {img_result['image_hash'][:16]}...")
Expected output:
Image acquired: /evidence/case001/image.dd
Source hash: a1b2c3d4e5f6a7b8...
Image hash: a1b2c3d4e5f6a7b8...
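Expected behavior: The imager creates a bit-for-bit copy of the source device using dd. The noerror and sync options keep the copy going past unreadable sectors and pad them with zeros. A SHA-256 hash of the source is calculated before imaging and a hash of the image afterwards, and the two must match. Because sync also pads a short final block, the hashes match only when the source reads cleanly and its size is a multiple of the block size (bs).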
Memory Forensics with Volatility
class MemoryAnalyzer:
    """Analyze memory dumps for forensic artifacts."""

    def __init__(self, memory_dump):
        self.memory_dump = memory_dump
        self.artifacts = []

    def list_processes(self):
        """List running processes from memory dump."""
        # Hard-coded sample data standing in for the process list that a
        # memory analysis tool such as Volatility would extract.
        sample_processes = [
            {"pid": 1, "name": "init", "parent": 0, "path": "/sbin/init"},
            {"pid": 452, "name": "sshd", "parent": 1, "path": "/usr/sbin/sshd"},
            {"pid": 890, "name": "nginx", "parent": 1, "path": "/usr/sbin/nginx"},
            {"pid": 1204, "name": "bash", "parent": 890, "path": "/bin/bash"},
            {"pid": 1337, "name": "xmrig", "parent": 1204, "path": "/tmp/.hidden/xmrig"},
            {"pid": 1502, "name": "python3", "parent": 1337, "path": "/usr/bin/python3"},
        ]
        return sample_processes

    def find_suspicious_processes(self):
        """Identify potentially malicious processes."""
        processes = self.list_processes()
        suspicious = []
        for proc in processes:
            reasons = []
            # Check for suspicious names
            if proc["name"] in ["xmrig", "minerd", "cryptominer"]:
                reasons.append("Cryptominer detected")
            # Check for running from temp directories
            if "/tmp" in proc["path"] or "/dev/shm" in proc["path"]:
                reasons.append(f"Running from suspicious path: {proc['path']}")
            # Check for masquerading (process name does not match its binary)
            if proc["name"] == "bash" and "/bin/bash" not in proc["path"]:
                reasons.append("Masquerading as legitimate process")
            if reasons:
                suspicious.append({
                    "pid": proc["pid"],
                    "name": proc["name"],
                    "path": proc["path"],
                    "indicators": reasons,
                })
        return suspicious

    def extract_network_connections(self):
        """Extract network connections from memory."""
        connections = [
            {"pid": 1337, "local": "192.168.1.50:54321", "remote": "45.33.32.156:8443", "state": "ESTABLISHED"},
            {"pid": 1337, "local": "192.168.1.50:54322", "remote": "45.33.32.156:8443", "state": "ESTABLISHED"},
            {"pid": 452, "local": "0.0.0.0:22", "remote": None, "state": "LISTEN"},
            {"pid": 890, "local": "0.0.0.0:443", "remote": None, "state": "LISTEN"},
        ]
        # Compute the suspicious PIDs once instead of once per connection
        suspicious_pids = {s["pid"] for s in self.find_suspicious_processes()}
        return [c for c in connections if c["pid"] in suspicious_pids]


analyzer = MemoryAnalyzer("/evidence/case001/memory.raw")
suspicious = analyzer.find_suspicious_processes()
print("Suspicious processes found:")
for proc in suspicious:
    # All indicators for one process are joined onto a single line
    print(f"  PID {proc['pid']}: {proc['name']} - {', '.join(proc['indicators'])}")
connections = analyzer.extract_network_connections()
print(f"\nSuspicious network connections: {len(connections)}")
for conn in connections:
    print(f"  {conn['local']} -> {conn['remote']} ({conn['state']})")
Expected output:
Suspicious processes found:
  PID 1337: xmrig - Cryptominer detected, Running from suspicious path: /tmp/.hidden/xmrig

Suspicious network connections: 2
  192.168.1.50:54321 -> 45.33.32.156:8443 (ESTABLISHED)
  192.168.1.50:54322 -> 45.33.32.156:8443 (ESTABLISHED)
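Expected behavior: The memory analyzer identifies a cryptominer process (xmrig) running from /tmp/.hidden/ with two established connections to the same remote host, consistent with a mining pool. Its parent is a bash shell spawned by nginx, which suggests the web server was the entry point. The masquerading check does not fire on this sample, because bash runs from /bin/bash. Memory forensics captures artifacts that would be lost on system reboot.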
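The sample process list records each process's parent PID, so the ancestry of a flagged process can be walked back to its root. The sketch below is one way to do that; the `ancestry` helper is a hypothetical name, and the data is a copy of the sample list from the analyzer.

```python
# Copy of the sample process list used by the analyzer above.
SAMPLE_PROCESSES = [
    {"pid": 1, "name": "init", "parent": 0},
    {"pid": 452, "name": "sshd", "parent": 1},
    {"pid": 890, "name": "nginx", "parent": 1},
    {"pid": 1204, "name": "bash", "parent": 890},
    {"pid": 1337, "name": "xmrig", "parent": 1204},
    {"pid": 1502, "name": "python3", "parent": 1337},
]


def ancestry(pid, processes):
    """Return process names from the root ancestor down to the given PID."""
    by_pid = {p["pid"]: p for p in processes}
    chain = []
    seen = set()  # guards against loops in corrupted process tables
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        chain.append(by_pid[pid]["name"])
        pid = by_pid[pid]["parent"]
    return list(reversed(chain))


print(" -> ".join(ancestry(1337, SAMPLE_PROCESSES)))
# → init -> nginx -> bash -> xmrig
```

A web server spawning an interactive shell is rarely legitimate, so a chain like this one is often more telling than the process name alone.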
File System Timeline Analysis
from datetime import datetime, timedelta


class TimelineAnalyzer:
    """Build and analyze file system timelines."""

    # Explicit ranking; sorting the severity strings alphabetically would
    # put MEDIUM ahead of HIGH.
    SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

    def build_timeline(self, file_events):
        """Build a chronological timeline of file system events."""
        # ISO 8601 timestamps in the same format sort correctly as strings
        timeline = sorted(file_events, key=lambda e: e["timestamp"])
        return timeline

    def find_anomalies(self, timeline):
        """Identify suspicious file system events."""
        anomalies = []
        # Group events by file
        file_events = {}
        for event in timeline:
            file_events.setdefault(event["path"], []).append(event)

        for path, events in file_events.items():
            event_types = [e["type"] for e in events]
            # Modified system binaries
            if any(bin_dir in path for bin_dir in ["/bin/", "/sbin/", "/usr/bin/"]):
                if "modified" in event_types:
                    anomalies.append({
                        "path": path,
                        "indicator": "System binary modified",
                        "events": events,
                        "severity": "HIGH",
                    })
            # Files created in suspicious locations
            if any(loc in path for loc in ["/tmp/", "/dev/shm/", "/var/tmp/"]):
                if "created" in event_types:
                    anomalies.append({
                        "path": path,
                        "indicator": "File created in suspicious location",
                        "events": events,
                        "severity": "MEDIUM",
                    })
            # Hidden files created (file name itself starts with a dot)
            if "/." in path and path.split("/")[-1].startswith("."):
                if "created" in event_types:
                    anomalies.append({
                        "path": path,
                        "indicator": "Hidden file created",
                        "events": events,
                        "severity": "LOW",
                    })
        return sorted(anomalies, key=lambda a: self.SEVERITY_RANK[a["severity"]])

    def reconstruct_incident(self, timeline, infection_time):
        """Reconstruct events around the incident time."""
        window_start = infection_time - timedelta(hours=2)
        window_end = infection_time + timedelta(hours=1)
        relevant = [
            e for e in timeline
            if window_start <= datetime.fromisoformat(e["timestamp"]) <= window_end
        ]
        return relevant


events = [
    {"path": "/etc/cron.d/backup", "type": "created", "timestamp": "2026-06-22T01:15:00"},
    {"path": "/bin/bash", "type": "modified", "timestamp": "2026-06-22T01:20:00"},
    {"path": "/tmp/.cache/xmrig", "type": "created", "timestamp": "2026-06-22T01:25:00"},
    {"path": "/var/log/auth.log", "type": "modified", "timestamp": "2026-06-22T01:30:00"},
    {"path": "/etc/ssh/sshd_config", "type": "modified", "timestamp": "2026-06-22T01:35:00"},
]
analyzer = TimelineAnalyzer()
timeline = analyzer.build_timeline(events)
anomalies = analyzer.find_anomalies(timeline)
print("Timeline anomalies:")
for a in anomalies:
    print(f"  [{a['severity']}] {a['indicator']}: {a['path']}")
incident_time = datetime.fromisoformat("2026-06-22T01:15:00")
reconstructed = analyzer.reconstruct_incident(timeline, incident_time)
print(f"\nEvents around incident time ({incident_time}):")
for e in reconstructed:
    print(f"  {e['timestamp']}: {e['type']} - {e['path']}")
Expected output:
Timeline anomalies:
  [HIGH] System binary modified: /bin/bash
  [MEDIUM] File created in suspicious location: /tmp/.cache/xmrig

Events around incident time (2026-06-22 01:15:00):
  2026-06-22T01:15:00: created - /etc/cron.d/backup
  2026-06-22T01:20:00: modified - /bin/bash
  2026-06-22T01:25:00: created - /tmp/.cache/xmrig
  2026-06-22T01:30:00: modified - /var/log/auth.log
  2026-06-22T01:35:00: modified - /etc/ssh/sshd_config
Expected behavior: The reconstructed timeline reveals the likely attack sequence: a cron job is created (persistence), a system binary is modified (backdoor), a cryptominer is placed in /tmp/.cache, the auth log is modified (possible tampering), and the SSH configuration is changed for persistent access.
File Carving
import hashlib
import os
import re


class FileCarver:
    """Carve deleted files from unallocated disk space."""

    # (header, footer) magic bytes per file type. DOCX is a ZIP container,
    # so signatures alone cannot tell the two apart and both entries match.
    FILE_SIGNATURES = {
        "jpg": (b"\xff\xd8\xff\xe0", b"\xff\xd9"),
        "png": (b"\x89PNG\r\n\x1a\n", b"IEND\xae\x42\x60\x82"),
        "pdf": (b"%PDF", b"%%EOF"),
        "zip": (b"PK\x03\x04", b"PK\x05\x06"),
        "docx": (b"PK\x03\x04", b"PK\x05\x06"),
        "gif": (b"GIF89a", b"\x00\x3b"),
    }

    def __init__(self, disk_image):
        self.disk_image = disk_image
        self.carved_files = []

    def scan_for_signatures(self, data):
        """Scan raw data for file signatures."""
        found = []
        for ext, (header, footer) in self.FILE_SIGNATURES.items():
            starts = [m.start() for m in re.finditer(re.escape(header), data)]
            ends = [m.start() for m in re.finditer(re.escape(footer), data)]
            for start in starts:
                # Find the next footer after this header
                for end in ends:
                    if end > start:
                        end += len(footer)
                        if ext in ["zip", "docx"]:
                            # For ZIP, scan for end of central directory
                            end = start + self._find_zip_end(data[start:])
                        file_data = data[start:end]
                        file_hash = hashlib.sha256(file_data).hexdigest()
                        found.append({
                            "extension": ext,
                            "offset": start,
                            "size": len(file_data),
                            "hash": file_hash,
                            "data": file_data,
                        })
                        break
        return found

    def _find_zip_end(self, data):
        """Find end of central directory record in ZIP."""
        eocd_sig = b"PK\x05\x06"
        # rfind picks the last EOCD record in the remaining data
        pos = data.rfind(eocd_sig)
        if pos >= 0:
            return pos + 22  # EOCD is 22 bytes minimum
        return len(data)

    def carve_all(self, data, output_dir="/evidence/carved"):
        """Carve all recoverable files from data."""
        # Make sure the output directory exists before writing into it
        os.makedirs(output_dir, exist_ok=True)
        results = self.scan_for_signatures(data)
        for result in results:
            output_path = f"{output_dir}/{result['hash'][:16]}.{result['extension']}"
            with open(output_path, "wb") as f:
                f.write(result["data"])
            self.carved_files.append({
                "path": output_path,
                "extension": result["extension"],
                "size": result["size"],
                "hash": result["hash"],
            })
        return self.carved_files


# Simulate raw disk data containing deleted files
raw_data = (
    # Some filler data
    b"\x00" * 1000 +
    # A deleted JPEG image
    b"\xff\xd8\xff\xe0" + b"JFIF data here " * 50 + b"\xff\xd9" +
    # More filler data
    b"\x00" * 500 +
    # A deleted PDF
    b"%PDF" + b"1.4 document contents " * 30 + b"%%EOF" +
    # More filler data
    b"\x00" * 200
)
carver = FileCarver("/evidence/case001/image.dd")
files = carver.carve_all(raw_data)
print(f"Files carved: {len(files)}")
for f in files:
    print(f"  {f['extension'].upper()}: {f['size']} bytes at {f['path']}")
Expected output (the hash prefixes in the file names are placeholders):
Files carved: 2
  JPG: 756 bytes at /evidence/carved/a1b2c3d4e5f6a7b8.jpg
  PDF: 669 bytes at /evidence/carved/9f8e7d6c5b4a3f2e.pdf
Expected behavior: The file carver scans raw disk data for known file signatures (magic bytes). Even deleted files that are no longer referenced in the file system can be recovered if their data has not been overwritten. Each carved file is named after the first 16 hex characters of its SHA-256 hash, and the full hash is recorded so the file's integrity can be re-checked later.
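The carver above takes the raw data as a single in-memory bytes object, which a real disk image is usually too large for. The sketch below shows one possible way to locate a signature while reading in chunks, keeping a small overlap so a signature that straddles a chunk boundary is still found. The function name and the chunk size are assumptions for illustration only.

```python
import io


def find_signature_offsets(stream, signature, chunk_size=1 << 20):
    """Return absolute offsets of every occurrence of signature in stream."""
    offsets = []
    overlap = len(signature) - 1  # bytes carried over between chunks
    tail = b""
    position = 0  # absolute offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        idx = window.find(signature)
        while idx != -1:
            offsets.append(position + idx)
            idx = window.find(signature, idx + 1)
        # The tail is shorter than the signature, so no match is reported twice
        tail = window[-overlap:] if overlap else b""
        position += len(window) - len(tail)
    return offsets


# A tiny chunk size forces both signatures to straddle chunk boundaries
sample = b"\x00" * 10 + b"%PDF" + b"\x00" * 5 + b"%PDF"
print(find_signature_offsets(io.BytesIO(sample), b"%PDF", chunk_size=4))
# → [10, 19]
```

In practice the stream would be the image file opened in binary mode, and the offsets would feed a carving step that reads only the bytes between a header and its footer.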
Chain of Custody Documentation
from datetime import datetime, timezone


class ChainOfCustody:
    """Document evidence chain of custody."""

    def __init__(self, case_id, evidence_id):
        self.case_id = case_id
        self.evidence_id = evidence_id
        self.entries = []

    def add_entry(self, action, handler, location, notes="", timestamp=None):
        """Add a chain of custody entry.

        timestamp is an ISO 8601 string with a UTC offset; it defaults to
        the current UTC time when omitted.
        """
        entry = {
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "action": action,
            "handler": handler,
            "location": location,
            "notes": notes,
        }
        self.entries.append(entry)
        return entry

    def generate_report(self):
        """Generate the complete chain of custody report."""
        report = {
            "case_id": self.case_id,
            "evidence_id": self.evidence_id,
            "evidence_hash": self._calculate_evidence_hash(),
            "entries": self.entries,
            # Avoid reporting -1 for an empty chain
            "total_handoffs": max(len(self.entries) - 1, 0),
            "current_location": self.entries[-1]["location"] if self.entries else None,
            "current_handler": self.entries[-1]["handler"] if self.entries else None,
        }
        return report

    def _calculate_evidence_hash(self):
        """Placeholder: would hash the evidence file."""
        return "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"

    def verify_chain(self):
        """Verify that entries exist and are in chronological order."""
        if not self.entries:
            return {"complete": False, "error": "No entries"}
        # Check for temporal ordering
        for i in range(1, len(self.entries)):
            prev = datetime.fromisoformat(self.entries[i - 1]["timestamp"])
            curr = datetime.fromisoformat(self.entries[i]["timestamp"])
            if curr < prev:
                return {
                    "complete": False,
                    "error": f"Entry {i} is earlier than the entry before it",
                }
        return {
            "complete": True,
            "handoffs": len(self.entries) - 1,
            "duration_hours": (
                datetime.fromisoformat(self.entries[-1]["timestamp"]) -
                datetime.fromisoformat(self.entries[0]["timestamp"])
            ).total_seconds() / 3600,
        }


chain = ChainOfCustody("CASE-2026-001", "EVID-001")
# Explicit timestamps make the example reproducible; in live use, omit them
chain.add_entry("Seized from server room", "Officer Johnson", "Server Room A",
                "System powered on, network disconnected", "2026-06-23T10:00:00+00:00")
chain.add_entry("Transported to lab", "Officer Johnson", "Forensic Lab 2",
                "Evidence bag #1234 sealed", "2026-06-23T10:15:00+00:00")
chain.add_entry("Evidence logged", "Analyst Smith", "Evidence Locker",
                "Logged in evidence management system", "2026-06-23T10:30:00+00:00")
chain.add_entry("Imaging started", "Analyst Smith", "Forensic Workstation 3",
                "Write blocker used, dd imaging", "2026-06-23T11:00:00+00:00")
chain.add_entry("Imaging completed", "Analyst Smith", "Forensic Workstation 3",
                "SHA-256 verified, hash matches", "2026-06-23T12:00:00+00:00")
chain.add_entry("Returned to locker", "Analyst Smith", "Evidence Locker",
                "Evidence bag resealed", "2026-06-23T12:15:00+00:00")
report = chain.generate_report()
verification = chain.verify_chain()
print(f"Chain of Custody Report - {report['case_id']}")
print(f"Evidence: {report['evidence_id']}")
print(f"Status: {'COMPLETE' if verification['complete'] else 'BROKEN'}")
print(f"Handoffs: {verification['handoffs']}")
print(f"Duration: {verification['duration_hours']:.2f} hours")
print("\nTimeline:")
for entry in report["entries"]:
    print(f"  {entry['timestamp']}: {entry['action']} by {entry['handler']}")
Expected output:
Chain of Custody Report - CASE-2026-001
Evidence: EVID-001
Status: COMPLETE
Handoffs: 5
Duration: 2.25 hours

Timeline:
  2026-06-23T10:00:00+00:00: Seized from server room by Officer Johnson
  2026-06-23T10:15:00+00:00: Transported to lab by Officer Johnson
  2026-06-23T10:30:00+00:00: Evidence logged by Analyst Smith
  2026-06-23T11:00:00+00:00: Imaging started by Analyst Smith
  2026-06-23T12:00:00+00:00: Imaging completed by Analyst Smith
  2026-06-23T12:15:00+00:00: Returned to locker by Analyst Smith
Expected behavior: The chain of custody documents every person who handled the evidence, the time and location of each transfer, and the action performed. Gaps or missing entries can make evidence inadmissible in court. Each transfer requires signature verification.
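The "Handoffs" figure in the report is simply the number of entries minus one, so it counts logged actions rather than changes of handler. If the number of actual transfers between people is needed, a sketch along these lines could be used; `count_custody_transfers` is a hypothetical helper, and the handler names are taken from the example above.

```python
def count_custody_transfers(entries):
    """Count how many times the evidence moved from one handler to another."""
    handlers = [entry["handler"] for entry in entries]
    return sum(
        1 for previous, current in zip(handlers, handlers[1:])
        if previous != current
    )


# Handlers in the same order as the six example entries
example_entries = [
    {"handler": "Officer Johnson"},
    {"handler": "Officer Johnson"},
    {"handler": "Analyst Smith"},
    {"handler": "Analyst Smith"},
    {"handler": "Analyst Smith"},
    {"handler": "Analyst Smith"},
]
print(count_custody_transfers(example_entries))
# → 1
```

Each counted transfer is a point where a receiving signature would be expected, which makes this number a useful cross-check against the paperwork.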
Common Errors
Powering off a system before capturing memory -- Rebooting or shutting down a compromised system destroys volatile evidence including running processes, network connections, encryption keys, and fileless malware. Capture memory first using LiME or winpmem.
Imaging a disk without a write blocker -- Connecting a compromised drive to a forensic workstation without a hardware write blocker can modify timestamps, alter file metadata, and overwrite deleted file data. Always use a write blocker for acquisition.
Insufficient chain of custody documentation -- A single missing signature or timestamp gap can make all evidence inadmissible. Document every transfer immediately and independently verify each signature. Use evidence bags with tamper-evident seals.
Analyzing the original evidence instead of a copy -- Every analysis must be performed on a forensic copy of the evidence, never the original. The original is preserved as the master copy. Any change to the analysis copy does not affect the original.
Overwriting unallocated space during analysis -- Installing forensic tools on the target system writes data to unallocated space, potentially overwriting deleted evidence. Use a forensic boot CD or connect the drive to a dedicated analysis workstation.
Practice Questions
What is the order of volatility and why is it important? The order of volatility ranks evidence by how quickly it is lost: CPU registers and caches first, then memory contents (including running processes and network connection state), temporary files, disk, and finally remotely stored logs and archival media (least volatile). Capture the most volatile evidence first to prevent data loss.
Why must a hardware write blocker be used during disk acquisition? A write blocker prevents any write operations from the forensic workstation to the source drive. Without it, mounting the drive or running disk utilities can modify timestamps, alter file metadata, or destroy evidence.
What is file carving and when is it useful? File carving recovers files based on their file structure and magic bytes rather than file system metadata. It recovers deleted files whose directory entries have been removed but whose data has not been overwritten in unallocated space.
How does a forensic image differ from a regular backup? A forensic image is a bit-for-bit copy including deleted files, unallocated space, and file system metadata. A backup copies only active files. Forensic images preserve evidence that would be invisible in a backup.
Challenge: Set up a forensic analysis lab. Create a disk image of a USB drive with deliberately deleted files. Carve the deleted files using file signature scanning. Build a timeline of the file system activity and document the chain of custody.
Mini Project
Simulate a forensic investigation of a compromised Linux server. Create a disk image using dd with a write blocker. Capture memory with LiME. Use the file carver to recover deleted files. Build a timeline of suspicious file activity. Identify malware persistence mechanisms (cron, systemd, SSH keys). Generate a complete chain of custody report with all hashes verified.
FAQ
Can deleted files always be recovered?
No, deleted files can only be recovered if their data has not been overwritten by new data. Solid-state drives with TRIM enabled may permanently erase deleted data within minutes. The sooner the disk is imaged after deletion, the higher the recovery probability.
What is the difference between live forensics and dead forensics?
Live forensics analyzes a running system (captures memory, running processes, network connections). Dead forensics analyzes powered-off systems (disk imaging, file system analysis). Live forensics captures volatile data but risks altering evidence. Dead forensics preserves disk state but loses volatile data.
Are forensic tools able to bypass full disk encryption?
No, full disk encryption (LUKS, BitLocker, FileVault) prevents forensic analysis without the encryption key. If the system is running, the encryption key exists in memory and can be captured. If the system is powered off, the key must be obtained separately or through legal means.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.