Terraform Production Best Practices & Patterns
Terraform production best practices combine state isolation, modular design, automated testing, security controls, and operational workflows that enable teams to manage infrastructure reliably at scale.
What You'll Learn
In this tutorial, you will learn production patterns for Terraform: state architecture, module design, team workflows, cost management, and operational practices such as drift detection and change management.
Why It Matters
What works for a single developer breaks at team scale. Production Terraform requires deliberate design decisions around state isolation, module versioning, CI/CD integration, and access controls to prevent outages and enable collaboration.
Real-World Use
DodaTech's platform team manages over 500 Terraform resources across 30 environments. Durga Antivirus Pro follows these patterns to achieve sub-minute plan times, zero state corruption incidents, and full audit compliance.
State Architecture
Environment Isolation
Each environment gets its own state file:
# dev/backend.hcl
bucket = "dodatech-terraform-state"
key = "dev/infrastructure/terraform.tfstate"
region = "us-east-1"
# staging/backend.hcl
bucket = "dodatech-terraform-state"
key = "staging/infrastructure/terraform.tfstate"
region = "us-east-1"
# prod/backend.hcl
bucket = "dodatech-terraform-state"
key = "prod/infrastructure/terraform.tfstate"
region = "us-east-1"
Expected output: Three separate state files. Dev changes never affect prod state. Each environment can be modified independently.
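The per-environment files above only take effect if the backend block itself is left partially configured. A minimal sketch, assuming an S3 backend (suggested by the bucket, key, and region settings) and a file named backend.tf, which is not shown in the original:

```hcl
# backend.tf (assumed file name, shared by every environment)
# Partial configuration: bucket, key, and region are deliberately omitted
# and supplied at init time from the per-environment backend.hcl file:
#   terraform init -backend-config=dev/backend.hcl
terraform {
  backend "s3" {}
}
```

When switching a working directory from one environment to another, running terraform init -reconfigure -backend-config=staging/backend.hcl tells Terraform to use the new backend settings without trying to migrate the existing state.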
Service Isolation
Separate state files per service or domain:
terraform/
  network/
    backend.hcl   -> key = "prod/network/terraform.tfstate"
    main.tf
  database/
    backend.hcl   -> key = "prod/database/terraform.tfstate"
    main.tf
  compute/
    backend.hcl   -> key = "prod/compute/terraform.tfstate"
    main.tf
Expected output: A networking change does not require loading database or compute state. Plan times stay fast, and blast radius is contained.
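Separate states still need a way to share values such as subnet IDs. One common option is the terraform_remote_state data source, sketched here for the database stack. The output name private_subnet_ids and the subnet group name are hypothetical; the network stack would have to declare that output for this to work.

```hcl
# database/main.tf
# Read-only view of the outputs published by the network stack.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "dodatech-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Hypothetical consumer: assumes the network stack declares an output
# named "private_subnet_ids" containing a list of subnet IDs.
resource "aws_db_subnet_group" "main" {
  name       = "prod-database"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

Only root-level outputs of the network stack are visible this way, so the outputs act as the interface between the two states.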
Module Design Patterns
Stable Module Registry
Publish modules with semantic versioning:
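module "vpc" {
  # Git sources are pinned with ?ref=; the version argument is only
  # supported for modules installed from a registry.
  # The tag name v2.1.0 assumes releases are tagged with a "v" prefix.
  source = "github.com/dodatech/terraform-aws-vpc?ref=v2.1.0"

  name = "production-vpc"
  cidr = "10.0.0.0/16"
}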
Expected output: Module versions are pinned. Updates are explicit version bumps reviewed in pull requests.
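The version argument is honored only for modules installed from a module registry. A sketch of the registry form, assuming a hypothetical private registry address (the hostname and namespace below are illustrative, not taken from this tutorial):

```hcl
module "vpc" {
  # Hypothetical address in the form <hostname>/<namespace>/<name>/<provider>.
  source = "app.terraform.io/dodatech/vpc/aws"

  # Pessimistic constraint: accepts 2.1.x patch releases, rejects 2.2.0 and later.
  version = "~> 2.1.0"

  name = "production-vpc"
  cidr = "10.0.0.0/16"
}
```

An exact pin such as version = "2.1.0" is stricter. The ~> form trades a little reproducibility for automatic patch updates, so which one fits depends on how much the team trusts the module's release process.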
Module Contract
Every module defines a clear contract:
# modules/rds/variables.tf
variable "name" {
  description = "Database identifier"
  type        = string
}

variable "engine" {
  description = "Database engine (postgres, mysql, mariadb)"
  type        = string

  validation {
    condition     = contains(["postgres", "mysql", "mariadb"], var.engine)
    error_message = "Engine must be postgres, mysql, or mariadb."
  }
}

variable "instance_class" {
  description = "RDS instance class"
  type        = string
}

variable "storage_gb" {
  description = "Allocated storage in GB"
  type        = number
  default     = 100
}
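Inputs are one half of the contract; outputs are the other. A possible outputs file for the same module, assuming its main.tf declares a resource addressed as aws_db_instance.this (that resource name is an assumption, since the module body is not shown):

```hcl
# modules/rds/outputs.tf
# Assumes main.tf in this module declares: resource "aws_db_instance" "this"
output "endpoint" {
  description = "Connection endpoint in address:port form"
  value       = aws_db_instance.this.endpoint
}

output "port" {
  description = "Port the database listens on"
  value       = aws_db_instance.this.port
}

output "arn" {
  description = "ARN of the database instance"
  value       = aws_db_instance.this.arn
}
```

Consumers then depend only on these names, so the module's internals can change without breaking callers as long as the outputs stay stable.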
Team Workflows
Branch Strategy
# Git branch workflow
main # Production branch, apply on merge
├── staging # Staging branch, auto-apply
├── dev # Dev branch, auto-apply
└── feature/* # Feature branches, plan-only
Expected output: Feature branches run plan-only in CI. Merging to dev or staging triggers auto-apply. PRs to main require review and manual approval.
Pull Request Template
## Terraform Plan Summary
- Resources to add: X
- Resources to change: Y
- Resources to destroy: Z
- Estimated cost impact: $X/month
## Review Checklist
- [ ] Plan output reviewed
- [ ] No destructive changes without justification
- [ ] Security scan passed (tfsec)
- [ ] Formatting check passed
- [ ] Module versions pinned
Cost Management
Cost Estimation in CI
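- name: Infracost
  run: |
    # Assumes an earlier step generated infracost-base.json from the base branch:
    #   infracost breakdown --path . --format json --out-file infracost-base.json
    infracost diff --path . \
      --compare-to infracost-base.json \
      --show-skipped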
Expected output: Every pull request shows the estimated monthly cost impact. Teams catch cost regressions before deployment.
Tagging for Cost Allocation
locals {
  cost_tags = {
    CostCenter  = var.cost_center
    Project     = var.project_name
    Environment = var.environment
    Owner       = var.team_name
    Provisioner = "Terraform"
  }
}
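A tag map only helps if it reaches every resource. One way to apply it, sketched under the assumption that the AWS provider is in use (as the other examples suggest), is the provider's default_tags block:

```hcl
provider "aws" {
  # Region assumed to match the backend examples.
  region = "us-east-1"

  # Tags declared here are merged into the resources this provider manages
  # that support tagging, so individual resources need not repeat them.
  default_tags {
    tags = local.cost_tags
  }
}
```

Tags set directly on a resource override a default tag with the same key, which leaves room for per-resource exceptions.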
Operational Excellence
Drift Detection
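# Scheduled drift check
# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
terraform plan -no-color -detailed-exitcode
status=$?
if [ "$status" -eq 2 ]; then
  echo "Drift detected in $ENVIRONMENT"
  # Trigger notification
elif [ "$status" -eq 1 ]; then
  echo "Plan failed in $ENVIRONMENT" >&2
fi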
Expected output: A scheduled job runs terraform plan and reports any differences between configuration and real-world infrastructure.
Change Management
# Prevent accidental destroy of critical resources
resource "aws_instance" "bastion" {
  # ... other arguments omitted

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_db_instance" "main" {
  # ... other arguments omitted

  lifecycle {
    prevent_destroy = true
  }
}
Common Mistakes
1. Monolithic State Files
A single state file for all resources causes slow plans, large blast radius, and team conflicts.
2. Unversioned Modules
Using source = "./modules/vpc" without versioning means any change affects all consumers. Use Git tags or registry versions.
3. No Drift Detection
Infrastructure changed outside Terraform creates drift. Automated drift detection catches and reports it.
4. Skipping Pre-Apply Plan Review
Applying to production without reviewing the plan risks missing destructive changes. Always require plan approval.
5. No Rollback Strategy
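Terraform does not roll back a failed apply automatically, so a partial apply can leave infrastructure in a mixed state and extend an outage. Store state history, keep previous configurations in version control, and practice recovery procedures.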
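One way to store state history, sketched here under assumptions, is to enable object versioning on the S3 bucket that holds the state files. The sketch assumes the bucket is managed from a separate bootstrap configuration; the resource label terraform_state is illustrative.

```hcl
# Bootstrap configuration (assumed to live apart from the stacks that use the bucket).
resource "aws_s3_bucket" "terraform_state" {
  bucket = "dodatech-terraform-state"

  # Guard the bucket itself against accidental deletion.
  lifecycle {
    prevent_destroy = true
  }
}

# Keep every previous version of each state object so an earlier
# state file can be restored after a bad apply.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```

Restoring an older state version is only part of recovery: the configuration must also be reverted to the matching commit so that the next plan agrees with the restored state.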
Practice Questions
1. Why should state files be separated by environment and service? Environment isolation prevents changes from affecting other environments. Service isolation keeps plans fast and limits blast radius.
2. What is a module contract? A module's defined interface of input variables and outputs, with validation ensuring correct usage.
3. How do you detect configuration drift? Run terraform plan on a schedule and alert on unexpected differences between configuration and actual infrastructure.
4. Why pin module versions in production? Version pinning prevents unexpected changes from module updates and ensures reproducible deployments.
5. Challenge: Design a Terraform directory structure for a three-environment (dev, staging, prod), three-service (network, database, compute) deployment with separate state files, module registry, and CI/CD workflow.
Mini Project: Production Terraform Repository
Create a complete Terraform repository structure with separate directories per environment, remote backends per state file, a module registry with semantic versioning, a GitHub Actions CI pipeline with plan and apply stages, drift detection, and Infracost cost estimation.
What's Next
Apply Terraform production best practices to your infrastructure, and study Terraform Troubleshooting to handle common errors and operational issues.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.