Terraform Production Best Practices & Patterns

DodaTech 5 min read

Terraform production best practices combine state isolation, modular design, automated testing, security controls, and operational workflows that enable teams to manage infrastructure reliably at scale.

What You'll Learn

In this tutorial, you will learn production patterns for Terraform including state architecture, module design, team workflows, cost management, incident response, and operational excellence.

Why It Matters

What works for a single developer breaks at team scale. Production Terraform requires deliberate design decisions around state isolation, module versioning, CI/CD integration, and access controls to prevent outages and enable collaboration.

Real-World Use

DodaTech's platform team manages over 500 Terraform resources across 30 environments. Durga Antivirus Pro follows these patterns to achieve sub-minute plan times, zero state corruption incidents, and full audit compliance.

State Architecture

Environment Isolation

Each environment gets its own state file:

# dev/backend.hcl
bucket = "dodatech-terraform-state"
key    = "dev/infrastructure/terraform.tfstate"
region = "us-east-1"

# staging/backend.hcl
bucket = "dodatech-terraform-state"
key    = "staging/infrastructure/terraform.tfstate"
region = "us-east-1"

# prod/backend.hcl
bucket = "dodatech-terraform-state"
key    = "prod/infrastructure/terraform.tfstate"
region = "us-east-1"

Expected output: Three separate state files. Dev changes never affect prod state. Each environment can be modified independently.
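
The per-environment files above only take effect if the root configuration declares a matching, mostly empty backend block. A minimal sketch of that partial configuration follows; the `required_version` constraint and the `encrypt` setting are assumptions, not part of the original setup:

```hcl
# main.tf (shared by every environment)
terraform {
  required_version = ">= 1.5.0" # assumed minimum; adjust to your toolchain

  # Partial backend configuration: bucket, key, and region are supplied
  # by the per-environment backend.hcl file passed to `terraform init`.
  backend "s3" {
    encrypt = true # assumption: state should be encrypted at rest
  }
}

# Selecting an environment at init time:
#   terraform init -backend-config=dev/backend.hcl
#   terraform init -reconfigure -backend-config=prod/backend.hcl
```

Keeping the backend block nearly empty means the same code is promoted through every environment, and only the small `backend.hcl` file differs.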

Service Isolation

Separate state files per service or domain:

terraform/
  network/
    backend.hcl  -> key = "prod/network/terraform.tfstate"
    main.tf
  database/
    backend.hcl  -> key = "prod/database/terraform.tfstate"
    main.tf
  compute/
    backend.hcl  -> key = "prod/compute/terraform.tfstate"
    main.tf

Expected output: A networking change does not require loading database or compute state. Plan times stay fast, and blast radius is contained.
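
Splitting state by service raises the question of how one service reads another's values. One common approach is the `terraform_remote_state` data source, sketched below. The output name `private_subnet_id`, the variable `ami_id`, and the instance type are hypothetical; the network configuration would need to declare that output for this to work:

```hcl
# compute/data.tf
variable "ami_id" {
  description = "AMI for the application instance (hypothetical input)"
  type        = string
}

# Read-only view of the outputs stored in the network state file.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "dodatech-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro" # illustrative size

  # Assumes network/ declares: output "private_subnet_id" { ... }
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

Only root-level outputs are exposed this way, so the network configuration controls exactly what other services can depend on.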

Module Design Patterns

Stable Module Registry

Publish modules with semantic versioning:

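module "vpc" {
  # Git sources are pinned with ?ref=<tag>; the version argument is
  # only valid for registry sources. The tag name v2.1.0 is assumed.
  source = "github.com/dodatech/terraform-aws-vpc?ref=v2.1.0"

  name = "production-vpc"
  cidr = "10.0.0.0/16"
}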

Expected output: Module versions are pinned. Updates are explicit version bumps reviewed in pull requests.
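
When modules are served from a module registry rather than fetched straight from Git, the `version` argument accepts a semantic version constraint. The sketch below assumes a hypothetical private registry address; substitute the host and namespace your organization actually uses:

```hcl
module "vpc" {
  # Hypothetical private registry address: <host>/<namespace>/<name>/<provider>
  source  = "app.terraform.io/dodatech/vpc/aws"
  version = "~> 2.1" # accepts any 2.x release at or above 2.1, never 3.0

  name = "production-vpc"
  cidr = "10.0.0.0/16"
}
```

A pessimistic constraint like `~> 2.1` picks up compatible releases on `terraform init -upgrade`; an exact version such as `2.1.0` is stricter and makes every bump an explicit pull request.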

Module Contract

Every module defines a clear contract:

# modules/rds/variables.tf
variable "name" {
  description = "Database identifier"
  type        = string
}

variable "engine" {
  description = "Database engine (postgres, mysql, mariadb)"
  type        = string
  validation {
    condition     = contains(["postgres", "mysql", "mariadb"], var.engine)
    error_message = "Engine must be postgres, mysql, or mariadb."
  }
}

variable "instance_class" {
  description = "RDS instance class"
  type        = string
}

variable "storage_gb" {
  description = "Allocated storage in GB"
  type        = number
  default     = 100
}
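
The other half of the contract is the module's outputs. The following sketch assumes the module's `main.tf` declares a resource named `aws_db_instance.this`; that resource name and the choice of outputs are illustrative:

```hcl
# modules/rds/outputs.tf
# Assumes modules/rds/main.tf declares: resource "aws_db_instance" "this"
output "endpoint" {
  description = "Connection endpoint in address:port form"
  value       = aws_db_instance.this.endpoint
}

output "port" {
  description = "Port the database listens on"
  value       = aws_db_instance.this.port
}

output "arn" {
  description = "ARN of the database instance"
  value       = aws_db_instance.this.arn
}
```

A caller that passes an unsupported value such as `engine = "oracle"` is rejected by the validation rule with the error message above before any resource is created.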

Team Workflows

Branch Strategy

# Git branch workflow
main                  # Production branch, apply on merge
├── staging           # Staging branch, auto-apply
├── dev               # Dev branch, auto-apply
└── feature/*         # Feature branches, plan-only

Expected output: Feature branches run plan-only in CI. Merging to dev or staging triggers auto-apply. PRs to main require review and manual approval.

Pull Request Template

## Terraform Plan Summary
- Resources to add: X
- Resources to change: Y
- Resources to destroy: Z
- Estimated cost impact: $X/month

## Review Checklist
- [ ] Plan output reviewed
- [ ] No destructive changes without justification
- [ ] Security scan passed (tfsec)
- [ ] Formatting check passed
- [ ] Module versions pinned

Cost Management

Cost Estimation in CI

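- name: Infracost
  run: |
    # `infracost diff` reports the cost delta; `breakdown` has no diff format
    terraform plan -out=tfplan.binary
    terraform show -json tfplan.binary > plan.json
    infracost diff --path plan.json --show-skipped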

Expected output: Every pull request shows the estimated monthly cost impact. Teams catch cost regressions before deployment.

Tagging for Cost Allocation

locals {
  cost_tags = {
    CostCenter   = var.cost_center
    Project      = var.project_name
    Environment  = var.environment
    Owner        = var.team_name
    Provisioner  = "Terraform"
  }
}
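
Defining the tag map does not attach it to anything by itself. One way to apply it everywhere is the AWS provider's `default_tags` block, sketched here with the region assumed from the backend examples:

```hcl
provider "aws" {
  region = "us-east-1" # assumed to match the state bucket region

  # Every taggable resource created through this provider inherits
  # these tags unless the resource overrides a key explicitly.
  default_tags {
    tags = local.cost_tags
  }
}
```

Some resources do not inherit provider-level tags (Auto Scaling groups are a known case), so it is worth spot-checking the plan output for the resource types you rely on.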

Operational Excellence

Drift Detection

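# Scheduled drift check
terraform plan -no-color -input=false -detailed-exitcode
status=$?
if [ "$status" -eq 2 ]; then
  # Exit code 2: plan succeeded and found differences
  echo "Drift detected in $ENVIRONMENT"
  # Trigger notification
elif [ "$status" -ne 0 ]; then
  # Exit code 1: the plan itself failed
  echo "terraform plan failed in $ENVIRONMENT" >&2
  exit "$status"
fi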

Expected output: A scheduled job runs terraform plan and reports any differences between configuration and real-world infrastructure.

Change Management

# Prevent accidental destroy of critical resources
resource "aws_instance" "bastion" {
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_db_instance" "main" {
  lifecycle {
    prevent_destroy = true
  }
}

Common Mistakes

1. Monolithic State Files

A single state file for all resources causes slow plans, large blast radius, and team conflicts.

2. Unversioned Modules

Using source = "./modules/vpc" without versioning means any change affects all consumers. Use Git tags or registry versions.

3. No Drift Detection

Changes made outside Terraform (console edits, scripts, other tools) create drift between configuration and real infrastructure. Automated drift detection catches and reports it.

4. Skipping Pre-Apply Plan Review

Applying to production without reviewing the plan risks missing destructive changes. Always require plan approval.

5. No Rollback Strategy

Terraform has no built-in rollback, so a failed apply with no recovery plan can cause extended outages. Keep state history (for example, versioning on the state bucket) and practice recovery procedures.
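
One way to keep state history is to enable versioning on the bucket that holds the state files. The sketch below assumes the bucket is managed from a separate bootstrap configuration, not from a configuration whose state lives in that same bucket:

```hcl
# bootstrap/state-bucket.tf (hypothetical location)
resource "aws_s3_bucket_versioning" "state" {
  bucket = "dodatech-terraform-state"

  versioning_configuration {
    status = "Enabled" # every state write keeps the previous object version
  }
}
```

With versioning on, an earlier state file can be restored from a previous object version, which is the kind of recovery procedure worth rehearsing before it is needed.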

Practice Questions

1. Why should state files be separated by environment and service? Environment isolation prevents changes from affecting other environments. Service isolation keeps plans fast and limits blast radius.

2. What is a module contract? A module's defined interface of input variables and outputs, with validation ensuring correct usage.

3. How do you detect configuration drift? Run terraform plan on a schedule and alert on unexpected differences between configuration and actual infrastructure.

4. Why pin module versions in production? Version pinning prevents unexpected changes from module updates and ensures reproducible deployments.

5. Challenge: Design a Terraform directory structure for a three-environment (dev, staging, prod), three-service (network, database, compute) deployment with separate state files, module registry, and CI/CD workflow.

Mini Project: Production Terraform Repository

Create a complete Terraform repository structure with separate directories per environment, remote backends per state file, a module registry with semantic versioning, a GitHub Actions CI pipeline with plan and apply stages, drift detection, and Infracost cost estimation.

What's Next

Apply Terraform production best practices to your infrastructure, and study Terraform Troubleshooting to handle common errors and operational issues.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
