Digital Forensics Basics -- Evidence Collection, Analysis & Reporting

DodaTech Updated 2026-06-23 13 min read

This tutorial covers the basics of digital forensics: acquiring evidence, analyzing memory and file system timelines, carving deleted files, and documenting chain of custody, with practical examples and best practices.

Digital forensics is the systematic process of identifying, preserving, analyzing, and presenting digital evidence in a manner that is legally admissible, following strict chain of custody procedures to maintain evidence integrity.

What You'll Learn

You will learn to acquire forensic disk images with write blockers, perform memory forensics to identify running malware, analyze file system timelines for incident reconstruction, carve deleted files from unallocated space, and document evidence with proper chain of custody.

Why It Matters

The 2025 Verizon DBIR found that 65% of breaches took over 100 days to discover. Proper forensic analysis is the only way to determine the full scope of a breach, identify the attack vector, and collect evidence for legal proceedings or regulatory reporting.

Real-World Use

A company detects ransomware on a critical server. The forensics team captures memory with LiME, images the disk through a write blocker, and analyzes the timeline. They identify the initial access vector (an unpatched VPN), the lateral movement path, and the specific files exfiltrated. Because every step is hashed and documented, the evidence can stand up in court.

Forensic Investigation Workflow

flowchart TD
    A[Incident Identified] --> B[Preserve Evidence]
    B --> C[Acquire Disk Image]
    B --> D[Capture Memory]
    B --> E[Collect Network Logs]
    C --> F[Forensic Analysis]
    D --> F
    E --> F
    F --> G[Timeline Reconstruction]
    F --> H[File Carving]
    F --> I[Malware Analysis]
    G --> J[Report Generation]
    H --> J
    I --> J
    style B fill:#f96,stroke:#333
    style C fill:#4a9,stroke:#333
    style D fill:#4a9,stroke:#333
    style J fill:#4a9,stroke:#333

How it works: The investigation follows a structured workflow from evidence preservation through analysis and reporting. The order of volatility dictates that memory is captured first (most volatile), followed by disk imaging (less volatile), and finally network logs (least volatile). Each step is documented with cryptographic hashes.
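
The collection order described above can be sketched as a simple sort. The rank values and source names here are illustrative assumptions for this tutorial, not a standard scale.

```python
# Hypothetical volatility ranks: lower number = lost sooner = collect first.
VOLATILITY_RANK = {
    "memory": 1,
    "disk": 2,
    "network_logs": 3,
}

def collection_order(sources):
    """Return evidence sources sorted from most to least volatile."""
    return sorted(sources, key=lambda s: VOLATILITY_RANK[s])

plan = collection_order(["network_logs", "disk", "memory"])
for step, source in enumerate(plan, start=1):
    print(f"{step}. {source}")
```

Keeping the ranks in one table makes it easy to add further sources (for example swap space or temporary files) without touching the sorting logic.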

Evidence Acquisition with Imaging

import hashlib
import subprocess
import json
from datetime import datetime

class ForensicImager:
    """Acquire forensic disk images with verification."""

    def __init__(self, case_id, examiner):
        self.case_id = case_id
        self.examiner = examiner
        self.evidence_log = []

    def create_dd_image(self, source_device, output_path):
        """Create a bit-for-bit forensic image using dd."""
        timestamp = datetime.utcnow().isoformat()

        # Calculate source hash before imaging
        source_hash = self._calculate_hash(source_device)

        # Create image with dd
        cmd = [
            "sudo", "dd", f"if={source_device}",
            f"of={output_path}",
            "bs=4M", "conv=noerror,sync", "status=progress",
        ]
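        # check=True raises CalledProcessError if dd exits non-zero, so a
        # failed acquisition is not hashed and logged as if it succeeded
        subprocess.run(cmd, capture_output=True, text=True, check=True)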

        # Calculate image hash
        image_hash = self._calculate_hash(output_path)

        entry = {
            "action": "disk_image",
            "source": source_device,
            "destination": output_path,
            "source_hash": source_hash,
            "image_hash": image_hash,
            "timestamp": timestamp,
            "examiner": self.examiner,
            "command": " ".join(cmd),
        }
        self.evidence_log.append(entry)
        return entry

    def _calculate_hash(self, path):
        """Calculate SHA-256 hash of a device or file."""
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha256.update(chunk)
        return sha256.hexdigest()

    def verify_image_integrity(self, original_device, image_path):
        """Verify that the image matches the source."""
        print("Verifying image integrity...")
        original_hash = self._calculate_hash(original_device)
        image_hash = self._calculate_hash(image_path)

        match = original_hash == image_hash
        print(f"Original hash: {original_hash[:16]}...")
        print(f"Image hash:    {image_hash[:16]}...")
        print(f"Integrity check: {'PASSED' if match else 'FAILED'}")

        return {
            "match": match,
            "original_hash": original_hash,
            "image_hash": image_hash,
        }

case = ForensicImager("CASE-2026-001", "Analyst Smith")
img_result = case.create_dd_image("/dev/sdb1", "/evidence/case001/image.dd")
print(f"Image acquired: {img_result['destination']}")
print(f"Source hash: {img_result['source_hash'][:16]}...")
print(f"Image hash: {img_result['image_hash'][:16]}...")

Expected output:

Image acquired: /evidence/case001/image.dd
Source hash: a1b2c3d4e5f6a7b8...
Image hash: a1b2c3d4e5f6a7b8...

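Expected behavior: The imager creates a bit-for-bit copy of the source device using dd. The noerror option keeps dd running after a read error, and sync pads each short or failed read with zeros so that later data stays at the correct offset. SHA-256 hashes are calculated before and after imaging, and on a healthy device whose size is a multiple of the block size the image hash matches the source hash. If sync had to pad anything (bad sectors, or a final partial block), the image will not hash identically to the source, so record both hashes and the reason for the difference. The hashes shown above are placeholders.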
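
One way to see why padding matters for hash comparison is to simulate it in memory. The sketch below is not part of the imager. It assumes a small stand-in block size and a fake "device" held in a bytes object, and it mimics what dd's sync conversion does to a short final block.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()

BLOCK_SIZE = 4096  # stand-in for dd's bs= value (the tutorial uses 4M)

# Simulated source device whose size is not a multiple of the block size
source = b"\xaa" * (BLOCK_SIZE * 2 + 100)

# Mimic conv=sync: the short final block is padded with NUL bytes
padding = (-len(source)) % BLOCK_SIZE
image = source + b"\x00" * padding

print(f"source bytes: {len(source)}, image bytes: {len(image)}")
print("whole image matches source:", sha256_hex(source) == sha256_hex(image))
print("image trimmed to source size matches:",
      sha256_hex(source) == sha256_hex(image[:len(source)]))
```

The padded image hashes differently from the source, while the same image trimmed to the source length matches. When padding is expected, comparing only the first len(source) bytes is one option, provided the trimming is documented in the case notes.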

Memory Forensics with Volatility

import json

class MemoryAnalyzer:
    """Analyze memory dumps for forensic artifacts."""

    def __init__(self, memory_dump):
        self.memory_dump = memory_dump
        self.artifacts = []

    def list_processes(self):
        """List running processes from memory dump."""
        sample_processes = [
            {"pid": 1, "name": "init", "parent": 0, "path": "/sbin/init"},
            {"pid": 452, "name": "sshd", "parent": 1, "path": "/usr/sbin/sshd"},
            {"pid": 890, "name": "nginx", "parent": 1, "path": "/usr/sbin/nginx"},
            {"pid": 1204, "name": "bash", "parent": 890, "path": "/bin/bash"},
            {"pid": 1337, "name": "xmrig", "parent": 1204, "path": "/tmp/.hidden/xmrig"},
            {"pid": 1502, "name": "python3", "parent": 1337, "path": "/usr/bin/python3"},
        ]
        return sample_processes

    def find_suspicious_processes(self):
        """Identify potentially malicious processes."""
        processes = self.list_processes()
        suspicious = []

        for proc in processes:
            reasons = []

            # Check for suspicious names
            if proc["name"] in ["xmrig", "minerd", "cryptominer"]:
                reasons.append("Cryptominer detected")

            # Check for running from temp directories
            if "/tmp" in proc["path"] or "/dev/shm" in proc["path"]:
                reasons.append(f"Running from suspicious path: {proc['path']}")

            # Check for hidden processes (name mismatch with binary)
            if proc["name"] == "bash" and "/bin/bash" not in proc["path"]:
                reasons.append("Masquerading as legitimate process")

            if reasons:
                suspicious.append({
                    "pid": proc["pid"],
                    "name": proc["name"],
                    "path": proc["path"],
                    "indicators": reasons,
                })

        return suspicious

    def extract_network_connections(self):
        """Extract network connections from memory."""
        connections = [
            {"pid": 1337, "local": "192.168.1.50:54321", "remote": "45.33.32.156:8443", "state": "ESTABLISHED"},
            {"pid": 1337, "local": "192.168.1.50:54322", "remote": "45.33.32.156:8443", "state": "ESTABLISHED"},
            {"pid": 452, "local": "0.0.0.0:22", "remote": None, "state": "LISTEN"},
            {"pid": 890, "local": "0.0.0.0:443", "remote": None, "state": "LISTEN"},
        ]

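        # Compute the suspicious PIDs once instead of re-running the
        # process scan for every connection
        suspicious_pids = {s["pid"] for s in self.find_suspicious_processes()}
        suspicious_connections = [
            c for c in connections if c["pid"] in suspicious_pids
        ]
        return suspicious_connections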

analyzer = MemoryAnalyzer("/evidence/case001/memory.raw")

suspicious = analyzer.find_suspicious_processes()
print("Suspicious processes found:")
for proc in suspicious:
    print(f"  PID {proc['pid']}: {proc['name']} - {', '.join(proc['indicators'])}")

connections = analyzer.extract_network_connections()
print(f"\nSuspicious network connections: {len(connections)}")
for conn in connections:
    print(f"  {conn['local']} -> {conn['remote']} ({conn['state']})")

Expected output:

Suspicious processes found:
  PID 1337: xmrig - Cryptominer detected, Running from suspicious path: /tmp/.hidden/xmrig

Suspicious network connections: 2
  192.168.1.50:54321 -> 45.33.32.156:8443 (ESTABLISHED)
  192.168.1.50:54322 -> 45.33.32.156:8443 (ESTABLISHED)

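Expected behavior: The memory analyzer flags a cryptominer process (xmrig) running from /tmp/.hidden/ with two established connections to the same remote host on port 8443, consistent with mining-pool traffic. Both indicators are reported on one line because they belong to the same PID. The masquerading check does not fire on this sample data; the more telling detail is the lineage, since xmrig was started from a bash shell (PID 1204) whose parent is nginx (PID 890), which points to the web server as the likely entry point. The process list and connections here are hard-coded samples; in a real case they would come from a memory analysis framework such as Volatility. Memory forensics captures artifacts that would be lost on system reboot.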
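
Parent-child relationships are often as informative as process names. The helper below is a minimal sketch, not part of MemoryAnalyzer: process_lineage is a hypothetical function that walks the parent field of the same sample process records to show how a suspicious process was started.

```python
def process_lineage(processes, pid):
    """Walk parent links from pid toward the root and return the chain of names."""
    by_pid = {p["pid"]: p for p in processes}
    chain = []
    seen = set()  # guards against cycles in corrupted process data
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        proc = by_pid[pid]
        chain.append(f"{proc['name']}({proc['pid']})")
        pid = proc["parent"]
    return chain

# Same sample data as list_processes(), reduced to the fields used here
sample = [
    {"pid": 1, "name": "init", "parent": 0},
    {"pid": 452, "name": "sshd", "parent": 1},
    {"pid": 890, "name": "nginx", "parent": 1},
    {"pid": 1204, "name": "bash", "parent": 890},
    {"pid": 1337, "name": "xmrig", "parent": 1204},
    {"pid": 1502, "name": "python3", "parent": 1337},
]

print(" <- ".join(process_lineage(sample, 1337)))
# xmrig(1337) <- bash(1204) <- nginx(890) <- init(1)
```

A web server spawning an interactive shell is rarely legitimate, so a chain like this is a useful lead even when the malware has been renamed.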

File System Timeline Analysis

from datetime import datetime, timedelta

class TimelineAnalyzer:
    """Build and analyze file system timelines."""

    def build_timeline(self, file_events):
        """Build a chronological timeline of file system events."""
        timeline = sorted(file_events, key=lambda e: e["timestamp"])
        return timeline

    def find_anomalies(self, timeline):
        """Identify suspicious file system events."""
        anomalies = []

        # Group events by file
        file_events = {}
        for event in timeline:
            path = event["path"]
            if path not in file_events:
                file_events[path] = []
            file_events[path].append(event)

        for path, events in file_events.items():
            event_types = [e["type"] for e in events]

            # Modified system binaries
            if any(prefix in path for prefix in ["/bin/", "/sbin/", "/usr/bin/"]):
                if "modified" in event_types:
                    anomalies.append({
                        "path": path,
                        "indicator": "System binary modified",
                        "events": events,
                        "severity": "HIGH",
                    })

            # Files created in suspicious locations
            if any(loc in path for loc in ["/tmp/", "/dev/shm/", "/var/tmp/"]):
                if "created" in event_types:
                    anomalies.append({
                        "path": path,
                        "indicator": "File created in suspicious location",
                        "events": events,
                        "severity": "MEDIUM",
                    })

            # Hidden files created
            if "/." in path and path.split("/")[-1].startswith("."):
                if "created" in event_types:
                    anomalies.append({
                        "path": path,
                        "indicator": "Hidden file created",
                        "events": events,
                        "severity": "LOW",
                    })

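        # Rank severities explicitly; sorting the strings alphabetically
        # would put MEDIUM ahead of HIGH
        severity_rank = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
        return sorted(anomalies, key=lambda a: severity_rank[a["severity"]])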

    def reconstruct_incident(self, timeline, infection_time):
        """Reconstruct events around the incident time."""
        window_start = infection_time - timedelta(hours=2)
        window_end = infection_time + timedelta(hours=1)

        relevant = [
            e for e in timeline
            if window_start <= datetime.fromisoformat(e["timestamp"]) <= window_end
        ]

        return relevant

events = [
    {"path": "/etc/cron.d/backup", "type": "created", "timestamp": "2026-06-22T01:15:00"},
    {"path": "/bin/bash", "type": "modified", "timestamp": "2026-06-22T01:20:00"},
    {"path": "/tmp/.cache/xmrig", "type": "created", "timestamp": "2026-06-22T01:25:00"},
    {"path": "/var/log/auth.log", "type": "modified", "timestamp": "2026-06-22T01:30:00"},
    {"path": "/etc/ssh/sshd_config", "type": "modified", "timestamp": "2026-06-22T01:35:00"},
]

analyzer = TimelineAnalyzer()
timeline = analyzer.build_timeline(events)

anomalies = analyzer.find_anomalies(timeline)
print("Timeline anomalies:")
for a in anomalies:
    print(f"  [{a['severity']}] {a['indicator']}: {a['path']}")

incident_time = datetime.fromisoformat("2026-06-22T01:15:00")
reconstructed = analyzer.reconstruct_incident(timeline, incident_time)
print(f"\nEvents around incident time ({incident_time}):")
for e in reconstructed:
    print(f"  {e['timestamp']}: {e['type']} - {e['path']}")

Expected output:

Timeline anomalies:
  [HIGH] System binary modified: /bin/bash
  [MEDIUM] File created in suspicious location: /tmp/.cache/xmrig

Events around incident time (2026-06-22 01:15:00):
  2026-06-22T01:15:00: created - /etc/cron.d/backup
  2026-06-22T01:20:00: modified - /bin/bash
  2026-06-22T01:25:00: created - /tmp/.cache/xmrig
  2026-06-22T01:30:00: modified - /var/log/auth.log
  2026-06-22T01:35:00: modified - /etc/ssh/sshd_config

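Expected behavior: The timeline reveals the likely attack sequence: a cron job is created (persistence), a system binary is modified (possible backdoor), a cryptominer is placed in /tmp/.cache, the auth log is modified (routine logging also touches this file, so check its contents for tampering), and the SSH configuration is changed for persistent access. Only the /bin/bash and /tmp/.cache/xmrig events match the rules in find_anomalies; the others surface through the time-window reconstruction.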
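
The pacing of events can also be informative: tightly spaced changes often indicate a scripted attack rather than manual activity. The sketch below assumes the same event format as above; event_gaps is a hypothetical helper, not part of TimelineAnalyzer.

```python
from datetime import datetime

def event_gaps(timeline):
    """Yield (minutes_since_previous_event, event) pairs for a sorted timeline."""
    previous = None
    for event in timeline:
        current = datetime.fromisoformat(event["timestamp"])
        gap = 0.0 if previous is None else (current - previous).total_seconds() / 60
        yield gap, event
        previous = current

events = [
    {"path": "/etc/cron.d/backup", "type": "created", "timestamp": "2026-06-22T01:15:00"},
    {"path": "/bin/bash", "type": "modified", "timestamp": "2026-06-22T01:20:00"},
    {"path": "/tmp/.cache/xmrig", "type": "created", "timestamp": "2026-06-22T01:25:00"},
    {"path": "/var/log/auth.log", "type": "modified", "timestamp": "2026-06-22T01:30:00"},
    {"path": "/etc/ssh/sshd_config", "type": "modified", "timestamp": "2026-06-22T01:35:00"},
]

for gap, e in event_gaps(events):
    print(f"+{gap:>4.0f} min  {e['type']:<8} {e['path']}")
```

In this sample every event follows the previous one by exactly five minutes, a regularity that would be worth noting in the report.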

File Carving

import re
import hashlib

class FileCarver:
    """Carve deleted files from unallocated disk space."""

    FILE_SIGNATURES = {
        "jpg": (b"\xff\xd8\xff\xe0", b"\xff\xd9"),
        "png": (b"\x89PNG\r\n\x1a\n", b"IEND\xae\x42\x60\x82"),
        "pdf": (b"%PDF", b"%%EOF"),
        "zip": (b"PK\x03\x04", b"PK\x05\x06"),
        "docx": (b"PK\x03\x04", b"PK\x05\x06"),
        "gif": (b"GIF89a", b"\x00\x3b"),
    }

    def __init__(self, disk_image):
        self.disk_image = disk_image
        self.carved_files = []

    def scan_for_signatures(self, data):
        """Scan raw data for file signatures."""
        found = []
        for ext, (header, footer) in self.FILE_SIGNATURES.items():
            starts = [m.start() for m in re.finditer(re.escape(header), data)]
            ends = [m.start() for m in re.finditer(re.escape(footer), data)]

            for start in starts:
                # Find the next footer after this header
                for end in ends:
                    if end > start:
                        end += len(footer)
                        if ext in ["zip", "docx"]:
                            # For ZIP, scan for end of central directory
                            end = start + self._find_zip_end(data[start:])
                        file_data = data[start:end]
                        file_hash = hashlib.sha256(file_data).hexdigest()
                        found.append({
                            "extension": ext,
                            "offset": start,
                            "size": len(file_data),
                            "hash": file_hash,
                            "data": file_data,
                        })
                        break
        return found

    def _find_zip_end(self, data):
        """Find end of central directory record in ZIP."""
        eocd_sig = b"PK\x05\x06"
        pos = data.rfind(eocd_sig)
        if pos >= 0:
            return pos + 22  # EOCD is 22 bytes minimum
        return len(data)

    def carve_all(self, data):
        """Carve all recoverable files from data."""
        results = self.scan_for_signatures(data)
        for result in results:
            output_path = f"/evidence/carved/{result['hash'][:16]}.{result['extension']}"
            with open(output_path, "wb") as f:
                f.write(result["data"])
            self.carved_files.append({
                "path": output_path,
                "extension": result["extension"],
                "size": result["size"],
                "hash": result["hash"],
            })
        return self.carved_files

# Simulate raw disk data containing deleted files
raw_data = (
    # Some random data
    b"\x00" * 1000 +
    # A deleted JPEG image
    b"\xff\xd8\xff\xe0" + b"JFIF data here " * 50 + b"\xff\xd9" +
    # More random data
    b"\x00" * 500 +
    # A deleted PDF
    b"%PDF" + b"1.4 document contents " * 30 + b"%%EOF" +
    # More random data
    b"\x00" * 200
)

carver = FileCarver("/evidence/case001/image.dd")
files = carver.carve_all(raw_data)

print(f"Files carved: {len(files)}")
for f in files:
    print(f"  {f['extension'].upper()}: {f['size']} bytes at {f['path']}")

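Expected output (the 16-character hash prefixes in the filenames are placeholders; the real values depend on the carved bytes):

Files carved: 2
  JPG: 756 bytes at /evidence/carved/a1b2c3d4e5f6a7b8.jpg
  PDF: 669 bytes at /evidence/carved/9f8e7d6c5b4a3f2e.pdf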

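Expected behavior: The file carver scans raw disk data for known file signatures (magic bytes). Even deleted files that are no longer referenced in the file system can be recovered if their data has not been overwritten. Each carved file is named after the first 16 hex characters of its SHA-256 hash, and the full hash is recorded so the file can be verified later. The output directory /evidence/carved/ must exist before carve_all is called.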
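
The carver above works on data already loaded into memory, which does not scale to real disk images. A minimal sketch of chunked scanning follows, assuming a binary stream and a hypothetical helper named find_signature_offsets. It keeps a small overlap between reads so a signature that straddles a chunk boundary is still found.

```python
import io

def find_signature_offsets(stream, signature, chunk_size=1 << 20):
    """Return absolute offsets of every occurrence of signature in a binary stream."""
    overlap = len(signature) - 1
    offsets = []
    tail = b""  # trailing bytes of the previous window, too short to hold a match
    base = 0    # absolute offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        pos = window.find(signature)
        while pos != -1:
            offsets.append(base + pos)
            pos = window.find(signature, pos + 1)
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
    return offsets

# The first %PDF header starts at offset 10 and straddles the 12-byte chunk boundary
data = b"\x00" * 10 + b"%PDF" + b"\x00" * 7 + b"%PDF"
offsets = find_signature_offsets(io.BytesIO(data), b"%PDF", chunk_size=12)
print(offsets)  # [10, 21]
```

Because the carried-over tail is one byte shorter than the signature, no match can fit entirely inside it, so matches are never reported twice.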

Chain of Custody Documentation

import json
from datetime import datetime

class ChainOfCustody:
    """Document evidence chain of custody."""

    def __init__(self, case_id, evidence_id):
        self.case_id = case_id
        self.evidence_id = evidence_id
        self.entries = []

    def add_entry(self, action, handler, location, notes=""):
        """Add a chain of custody entry."""
        entry = {
            "timestamp": datetime.utcnow().isoformat(),
            "action": action,
            "handler": handler,
            "location": location,
            "notes": notes,
        }
        self.entries.append(entry)
        return entry

    def generate_report(self):
        """Generate the complete chain of custody report."""
        report = {
            "case_id": self.case_id,
            "evidence_id": self.evidence_id,
            "evidence_hash": self._calculate_evidence_hash(),
            "entries": self.entries,
            "total_handoffs": len(self.entries) - 1,
            "current_location": self.entries[-1]["location"] if self.entries else None,
            "current_handler": self.entries[-1]["handler"] if self.entries else None,
        }
        return report

    def _calculate_evidence_hash(self):
        """Placeholder: would hash the evidence file."""
        return "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"

    def verify_chain(self):
        """Verify chain has no gaps."""
        if not self.entries:
            return {"complete": False, "error": "No entries"}

        # Check for temporal ordering
        for i in range(1, len(self.entries)):
            prev = datetime.fromisoformat(self.entries[i - 1]["timestamp"])
            curr = datetime.fromisoformat(self.entries[i]["timestamp"])
            if curr < prev:
                return {
                    "complete": False,
                    "error": f"Timestamp out of order at entry {i}",
                }

        return {
            "complete": True,
            "handoffs": len(self.entries) - 1,
            "duration_hours": (
                datetime.fromisoformat(self.entries[-1]["timestamp"]) -
                datetime.fromisoformat(self.entries[0]["timestamp"])
            ).total_seconds() / 3600,
        }

chain = ChainOfCustody("CASE-2026-001", "EVID-001")
chain.add_entry("Seized from server room", "Officer Johnson", "Server Room A", "System powered on, network disconnected")
chain.add_entry("Transported to lab", "Officer Johnson", "Forensic Lab 2", "Evidence bag #1234 sealed")
chain.add_entry("Evidence logged", "Analyst Smith", "Evidence Locker", "Logged in evidence management system")
chain.add_entry("Imaging started", "Analyst Smith", "Forensic Workstation 3", "Write blocker used, dd imaging")
chain.add_entry("Imaging completed", "Analyst Smith", "Forensic Workstation 3", "SHA-256 verified, hash matches")
chain.add_entry("Returned to locker", "Analyst Smith", "Evidence Locker", "Evidence bag resealed")

report = chain.generate_report()
verification = chain.verify_chain()

print(f"Chain of Custody Report - {report['case_id']}")
print(f"Evidence: {report['evidence_id']}")
print(f"Status: {'COMPLETE' if verification['complete'] else 'BROKEN'}")
print(f"Handoffs: {verification['handoffs']}")
print(f"Duration: {verification['duration_hours']:.1f} hours")
print("\nTimeline:")
for entry in report["entries"]:
    print(f"  {entry['timestamp']}: {entry['action']} by {entry['handler']}")

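Expected output (illustrative timestamps; as written, add_entry stamps each entry with the current time, so a real run shows six near-identical timestamps and a duration of 0.0 hours):

Chain of Custody Report - CASE-2026-001
Evidence: EVID-001
Status: COMPLETE
Handoffs: 5
Duration: 2.2 hours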

Timeline:
  2026-06-23T10:00:00: Seized from server room by Officer Johnson
  2026-06-23T10:15:00: Transported to lab by Officer Johnson
  2026-06-23T10:30:00: Evidence logged by Analyst Smith
  2026-06-23T11:00:00: Imaging started by Analyst Smith
  2026-06-23T12:00:00: Imaging completed by Analyst Smith
  2026-06-23T12:15:00: Returned to locker by Analyst Smith

Expected behavior: The chain of custody documents every person who handled the evidence, the time and location of each transfer, and the action performed. Gaps or missing entries can make evidence inadmissible in court. Each transfer requires signature verification.
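
A custody log is more convincing if later edits to it are detectable. One common approach, sketched here under the assumption that entries are plain dictionaries, is to chain each entry's hash to the previous one. The function names are hypothetical and not part of ChainOfCustody.

```python
import hashlib
import json

def entry_digest(entry, previous_digest):
    """SHA-256 over the previous digest plus the entry's canonical JSON."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_digest.encode("ascii") + payload).hexdigest()

def seal_entries(entries):
    """Return the list of chained digests, one per entry."""
    digests = []
    previous = "0" * 64  # arbitrary starting value for the first entry
    for entry in entries:
        previous = entry_digest(entry, previous)
        digests.append(previous)
    return digests

def first_tampered_index(entries, digests):
    """Return the index of the first entry whose digest no longer matches, or None."""
    for i, (expected, actual) in enumerate(zip(digests, seal_entries(entries))):
        if expected != actual:
            return i
    return None

log = [
    {"action": "Seized from server room", "handler": "Officer Johnson"},
    {"action": "Transported to lab", "handler": "Officer Johnson"},
    {"action": "Evidence logged", "handler": "Analyst Smith"},
]
sealed = seal_entries(log)
print(first_tampered_index(log, sealed))  # None

log[1]["handler"] = "Someone Else"        # simulate a retroactive edit
print(first_tampered_index(log, sealed))  # 1
```

Because each digest depends on all earlier entries, changing one record invalidates every digest from that point on. The sealed digests would need to be stored somewhere the log's editors cannot rewrite.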

Common Errors

  1. Powering off a system before capturing memory -- Rebooting or shutting down a compromised system destroys volatile evidence including running processes, network connections, encryption keys, and fileless malware. Capture memory first using LiME or winpmem.

  2. Imaging a disk without a write blocker -- Connecting a compromised drive to a forensic workstation without a hardware write blocker can modify timestamps, alter file metadata, and overwrite deleted file data. Always use a write blocker for acquisition.

  3. Insufficient chain of custody documentation -- A single missing signature or unexplained timestamp gap gives the opposing side grounds to challenge the evidence and can lead to it being excluded. Document every transfer immediately and independently verify each signature. Use evidence bags with tamper-evident seals.

  4. Analyzing the original evidence instead of a copy -- Every analysis must be performed on a forensic copy of the evidence, never the original. The original is preserved as the master copy. Any change to the analysis copy does not affect the original.

  5. Overwriting unallocated space during analysis -- Installing forensic tools on the target system writes data to unallocated space, potentially overwriting deleted evidence. Use a forensic boot CD or connect the drive to a dedicated analysis workstation.

Practice Questions

  1. What is the order of volatility and why is it important? The order of volatility ranks evidence by how quickly it is lost: memory (most volatile), network connections, running processes, temporary files, and disk (least volatile). Capture the most volatile evidence first to prevent data loss.

  2. Why must a hardware write blocker be used during disk acquisition? A write blocker prevents any write operations from the forensic workstation to the source drive. Without it, mounting the drive or running disk utilities can modify timestamps, alter file metadata, or destroy evidence.

  3. What is file carving and when is it useful? File carving recovers files based on their file structure and magic bytes rather than file system metadata. It recovers deleted files whose directory entries have been removed but whose data has not been overwritten in unallocated space.

  4. How does a forensic image differ from a regular backup? A forensic image is a bit-for-bit copy including deleted files, unallocated space, and file system metadata. A backup copies only active files. Forensic images preserve evidence that would be invisible in a backup.

  5. Challenge: Set up a forensic analysis lab. Create a disk image of a USB drive with deliberately deleted files. Carve the deleted files using file signature scanning. Build a timeline of the file system activity and document the chain of custody.

Mini Project

Simulate a forensic investigation of a compromised Linux server. Capture memory with LiME first, then create a disk image using dd through a write blocker. Use the file carver to recover deleted files. Build a timeline of suspicious file activity. Identify malware persistence mechanisms (cron, systemd, SSH keys). Generate a complete chain of custody report with all hashes verified.

FAQ

Can deleted files always be recovered?

No, deleted files can only be recovered if their data has not been overwritten by new data. Solid-state drives with TRIM enabled may permanently erase deleted data within minutes. The sooner the disk is imaged after deletion, the higher the recovery probability.

What is the difference between live forensics and dead forensics?

Live forensics analyzes a running system (captures memory, running processes, network connections). Dead forensics analyzes powered-off systems (disk imaging, file system analysis). Live forensics captures volatile data but risks altering evidence. Dead forensics preserves disk state but loses volatile data.

Are forensic tools able to bypass full disk encryption?

No, full disk encryption (LUKS, BitLocker, FileVault) prevents forensic analysis without the encryption key. If the system is running, the encryption key exists in memory and can be captured. If the system is powered off, the key must be obtained separately or through legal means.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
