Input Validation for APIs — Complete Injection Prevention Guide
This tutorial covers input validation for APIs: the key concepts, practical Python examples, and best practices.
Input validation is the process of verifying that user-supplied data conforms to expected formats, types, and ranges before it is processed. It is a first line of defense against injection attacks such as SQL injection and command injection, and it works best alongside parameterized queries and context-aware output encoding.
What You'll Learn
You'll learn input validation techniques including whitelist validation, schema validation, sanitization, and secure parsing.
Why It Matters
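Injection is a long-standing entry in the OWASP Top 10: it ranked first in the 2017 edition and third in 2021, where the category also covers XSS. Proper input validation shrinks the attack surface for SQLi, XSS, and command injection, although it does not replace parameterized queries or output encoding.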
Real-World Use
An e-commerce API validates product IDs as integers between 1 and 100,000. An attacker sending "1; DROP TABLE products" is rejected immediately because the input is not a valid integer.
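The product ID check described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `parse_product_id` and the six-character length cap are assumptions made for the example.

```python
def parse_product_id(raw: str) -> int:
    """Return raw as an int between 1 and 100,000, or raise ValueError."""
    # Accept only plain ASCII digits. This rejects signs, whitespace, and
    # payloads such as "1; DROP TABLE products" before any conversion.
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("product ID must be a whole number")
    # Assumed cap: 100000 has six digits, so longer input is never valid.
    if len(raw) > 6:
        raise ValueError("product ID is too long")
    value = int(raw)
    if not 1 <= value <= 100_000:
        raise ValueError("product ID out of range")
    return value


for candidate in ["42", "1; DROP TABLE products"]:
    try:
        print(candidate, "->", parse_product_id(candidate))
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```

The injection payload never reaches the database layer because it fails the digit check first.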
flowchart LR
A[Raw Input] --> B{Type Check}
B -->|Valid| C{Length Check}
B -->|Invalid| D[Reject]
C -->|Valid| E{Pattern Match}
C -->|Invalid| D
E -->|Match| F[Sanitize]
E -->|No Match| D
F --> G[Safe Input]
D --> H[400 Bad Request]
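The flow above can be sketched in Python as a chain of checks that either returns a safe value or raises. The names `RejectedInput` and `validate_display_name`, the 30-character limit, and the allowed character set are all assumptions chosen for illustration; an API layer would translate the exception into a 400 Bad Request.

```python
import html
import re

MAX_LENGTH = 30  # assumed limit for this example
ALLOWED = re.compile(r"[A-Za-z0-9 &'.\-]+")  # assumed character whitelist


class RejectedInput(ValueError):
    """Raised when any check fails; maps to 400 Bad Request."""


def validate_display_name(raw: object) -> str:
    # 1. Type check
    if not isinstance(raw, str):
        raise RejectedInput("expected a string")
    # 2. Length check
    if not 1 <= len(raw) <= MAX_LENGTH:
        raise RejectedInput("length out of range")
    # 3. Pattern match against the whitelist
    if ALLOWED.fullmatch(raw) is None:
        raise RejectedInput("contains disallowed characters")
    # 4. Sanitize: trim whitespace and escape HTML metacharacters
    return html.escape(raw.strip())


print(validate_display_name("Tom & Jerry"))  # Tom &amp; Jerry
```

Ordering the cheap checks first means malformed input is rejected before any regex work is done.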
Teacher's Mindset
Input validation is like checking IDs at an airport security checkpoint. You verify the name matches the ticket (format), check expiration (validity), and scan for prohibited items (sanitization) before letting passengers through.
Implementing Input Validation
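import re

from flask import Flask, jsonify, request

app = Flask(__name__)


def validate_username(username: str) -> bool:
    """Allow 1-30 characters drawn from letters, digits, and underscore."""
    if not isinstance(username, str) or not username or len(username) > 30:
        return False
    # fullmatch avoids the trailing-newline loophole of "$" with re.match
    return re.fullmatch(r"[a-zA-Z0-9_]+", username) is not None


def validate_email(email: str) -> bool:
    """Basic shape check; 254 is the practical maximum address length."""
    if not isinstance(email, str) or not email or len(email) > 254:
        return False
    pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    return re.fullmatch(pattern, email) is not None


@app.route("/api/users", methods=["POST"])
def create_user():
    # silent=True returns None instead of raising on a missing or malformed body
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        return jsonify({"error": "Request body must be a JSON object"}), 400
    if not validate_username(data.get("username", "")):
        return jsonify({"error": "Invalid username"}), 400
    if not validate_email(data.get("email", "")):
        return jsonify({"error": "Invalid email"}), 400
    return jsonify({"message": "User created"}), 201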
# Schema validation with Pydantic (v2 API; EmailStr needs the email-validator package)
from pydantic import BaseModel, EmailStr, Field, field_validator


class CreateUserRequest(BaseModel):
    username: str = Field(..., min_length=3, max_length=30, pattern=r"^[a-zA-Z0-9_]+$")
    email: EmailStr
    age: int = Field(..., ge=0, le=150)
    bio: str = Field("", max_length=500)

    # Custom rule applied after the built-in length and pattern checks
    @field_validator("username")
    @classmethod
    def username_must_be_valid(cls, v: str) -> str:
        if "admin" in v.lower():
            raise ValueError("Username cannot contain 'admin'")
        return v


user = CreateUserRequest(
    username="john_doe",
    email="john@example.com",
    age=25,
)
print(user.model_dump_json())
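# Input sanitization
import html
import re


def sanitize_input(value: str) -> str:
    """Trim whitespace, drop NUL bytes, then HTML-escape for display."""
    value = value.strip().replace("\0", "")
    # Escape last so the entities html.escape produces are not altered afterwards
    return html.escape(value)


def sanitize_filename(filename: str) -> str:
    """Keep only word characters, hyphens, and dots; cap at 255 characters."""
    filename = re.sub(r"[^\w.-]", "", filename)
    # Drop leading dots so names such as ".." or ".htaccess" cannot be produced
    return filename.lstrip(".")[:255]


user_input = "<script>alert('xss')</script>"
print(sanitize_input(user_input))
# &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;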
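The sanitization helpers above escape HTML, but they do not make a value safe to splice into SQL text. Below is a minimal sketch of pairing validation with a bound query parameter, using the standard-library `sqlite3` module. The in-memory `products` table, the function name `find_product_name`, and the six-digit cap are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (id, name) VALUES (1, 'widget')")


def find_product_name(conn: sqlite3.Connection, raw_id: str):
    """Validate raw_id, then look it up with a bound parameter."""
    # Validation: accept only 1-6 ASCII digits (an assumed limit).
    if not (raw_id.isascii() and raw_id.isdigit() and len(raw_id) <= 6):
        raise ValueError("product ID must be a whole number")
    # The ? placeholder sends the value as data, never as SQL text.
    row = conn.execute(
        "SELECT name FROM products WHERE id = ?", (int(raw_id),)
    ).fetchone()
    return row[0] if row else None


print(find_product_name(conn, "1"))  # widget
```

Validation rejects malformed input early, and the placeholder ensures that even a value that slipped through would be treated as data rather than as part of the statement.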
Common Mistakes
| Mistake | Why It's Wrong | Fix |
|---|---|---|
| Blacklist instead of whitelist | Attackers bypass with new patterns | Always whitelist allowed characters and patterns |
| Validating only on frontend | Attackers call APIs directly | Always validate server-side |
| Not validating all input fields | One unvalidated field becomes the attack vector | Validate every field, including optional ones |
| Accepting unexpected fields | Mass assignment attacks | Strip unknown fields from the request |
| Using eval or exec on user input | Remote code execution | Never execute user input as code |
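To illustrate the "Accepting unexpected fields" row, here is a small standard-library sketch of a field whitelist. The set `ALLOWED_FIELDS` and the function `filter_fields` are hypothetical names for this example; a schema library can enforce the same rule declaratively.

```python
ALLOWED_FIELDS = {"username", "email", "bio"}  # hypothetical whitelist


def filter_fields(payload: dict, strict: bool = True) -> dict:
    """Reject (strict) or strip (non-strict) keys outside the whitelist."""
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown and strict:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    # Only whitelisted keys survive, so a client cannot set e.g. is_admin.
    return {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}


request_body = {"username": "john_doe", "is_admin": True}
print(filter_fields(request_body, strict=False))  # {'username': 'john_doe'}
```

Rejecting the request outright is the stricter option and surfaces client bugs early; stripping is more lenient but silently ignores the extra data.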
Practice Questions
- Why is whitelist validation better than blacklist?
- What is mass assignment and how do you prevent it?
- Why should validation be server-side even with client-side validation?
- What is the difference between validation and sanitization?
- How does input length validation prevent buffer overflow?
Challenge
Build a comprehensive input validation middleware using Pydantic. Support nested JSON validation, custom validators, and detailed error messages. Test with injection payloads.
Mini Project
Create a user registration endpoint with Pydantic validation. Include username (alphanumeric, 3-30 chars), email (valid format), age (13-120), password (8+ chars, 1 uppercase, 1 number). Test with valid and invalid inputs.
What's Next
Learn about output encoding to prevent XSS when returning user-controlled data.