Terraform Best Practices and Patterns: Production-Grade Infrastructure Code

DodaTech 5 min read

Terraform best practices and production patterns combine state architecture, module design, security controls, team workflows, cost management, and code conventions that enable reliable infrastructure management at any scale.

What You'll Learn

In this tutorial, you will learn industry-standard Terraform best practices including state architecture patterns, module design principles, naming conventions, security controls, cost optimization, team collaboration workflows, code review processes, and operational runbooks.

Why It Matters

What works for a single developer fails at team scale. Monolithic state files cause slow plans and wide blast radius. Unversioned modules create unpredictable changes. Missing security controls expose infrastructure. These patterns prevent the most common production Terraform failures.

Real-World Use

DodaTech's platform team manages 500 Terraform resources across 30 environments using these patterns. Durga Antivirus Pro achieves sub-minute plan times, zero state corruption, full audit compliance, and automated cost tracking through these production practices.

State Architecture Patterns

graph TD
    subgraph "State Isolation Strategy"
        ENV[Environment Layer]
        ENV --> DEV[dev/terraform.tfstate]
        ENV --> STG[staging/terraform.tfstate]
        ENV --> PRD[production/terraform.tfstate]
        
        SVC[Service Layer]
        SVC --> NET[network/terraform.tfstate]
        SVC --> DB[database/terraform.tfstate]
        SVC --> CMP[compute/terraform.tfstate]
        
        REG[Registry Layer]
        REG --> S3[S3 Backend]
        REG --> DDB[DynamoDB Locks]
        REG --> KMS[KMS Encryption]
    end
    style DEV fill:#50c878,color:#fff
    style PRD fill:#e74c3c,color:#fff
    style NET fill:#4a90d9,color:#fff
    style DB fill:#ff9900,color:#fff

State File Structure

terraform-state/
  aws-account-123456789012/
    dev/
      network/terraform.tfstate
      database/terraform.tfstate
      compute/terraform.tfstate
    staging/
      network/terraform.tfstate
      database/terraform.tfstate
      compute/terraform.tfstate
    production/
      network/terraform.tfstate
      database/terraform.tfstate
      compute/terraform.tfstate
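
One way to map this layout onto a backend is to give each environment and service pair its own key. The sketch below is illustrative only: the bucket name terraform-state, the lock table terraform-locks, the KMS alias ARN, the region, and the file paths are hypothetical placeholders, not values from DodaTech's setup. The second block shows how another configuration (for example compute) might read the network state's outputs, assuming that state exports vpc_id.

```hcl
# environments/production/network/backend.tf (hypothetical path)
terraform {
  backend "s3" {
    bucket         = "terraform-state" # placeholder bucket name
    key            = "aws-account-123456789012/production/network/terraform.tfstate"
    region         = "us-east-1" # assumed region
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:alias/terraform-state" # placeholder
    dynamodb_table = "terraform-locks" # placeholder lock table
  }
}

# environments/production/compute/remote-state.tf (hypothetical path)
# Reads outputs published by the network state without sharing a state file.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "terraform-state"
    key    = "aws-account-123456789012/production/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Example reference, assuming the network state defines a vpc_id output:
#   data.terraform_remote_state.network.outputs.vpc_id
```

Because each key is a separate state, a plan in compute only locks and refreshes the compute resources, which is what keeps plan times and blast radius small.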

Module Design Patterns

Standard Module Interface

# modules/vpc/variables.tf
variable "name" {
  description = "Resource name prefix"
  type        = string

  validation {
    condition     = length(var.name) > 2
    error_message = "Name must be at least 3 characters."
  }
}

variable "environment" {
  description = "Deployment environment (dev, staging, production)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

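variable "tags" {
  description = "Additional resource tags"
  type        = map(string)
  default     = {}
}

# Declared here because main.tf reads var.cidr_block
variable "cidr_block" {
  description = "IPv4 CIDR block for the VPC"
  type        = string
}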

# modules/vpc/outputs.tf
output "vpc_id" {
  description = "The VPC identifier"
  value       = aws_vpc.this.id
}

output "public_subnet_ids" {
  description = "Public subnet identifiers"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "Private subnet identifiers"
  value       = aws_subnet.private[*].id
}

output "vpc_cidr" {
  description = "The VPC CIDR block"
  value       = aws_vpc.this.cidr_block
}

# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name        = "${var.name}-vpc"
    Environment = var.environment
    ManagedBy   = "Terraform"
  })
}
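
The outputs above reference aws_subnet.public and aws_subnet.private, which the excerpt does not define. A minimal sketch of how they might look inside the same module follows. The azs variable and the /24 subnet sizing via cidrsubnet are assumptions made for illustration (they presume a /16 VPC CIDR), not part of the original module.

```hcl
# modules/vpc/subnets.tf (hypothetical file in the same module)
variable "azs" {
  description = "Availability zones to spread subnets across (assumed input)"
  type        = list(string)
}

resource "aws_subnet" "public" {
  count = length(var.azs)

  vpc_id                  = aws_vpc.this.id
  availability_zone       = var.azs[count.index]
  cidr_block              = cidrsubnet(var.cidr_block, 8, count.index)
  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name        = "${var.name}-public-${count.index}"
    Environment = var.environment
  })
}

resource "aws_subnet" "private" {
  count = length(var.azs)

  vpc_id            = aws_vpc.this.id
  availability_zone = var.azs[count.index]
  # Offset by 100 so private ranges never overlap the public ones
  cidr_block = cidrsubnet(var.cidr_block, 8, count.index + 100)

  tags = merge(var.tags, {
    Name        = "${var.name}-private-${count.index}"
    Environment = var.environment
  })
}
```

Using count keeps the splat expressions in outputs.tf (aws_subnet.public[*].id) valid as written.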

Module Composition with Data Flow

# production/main.tf
module "network" {
  source = "github.com/dodatech/terraform-aws-vpc?ref=v2.1.0"

  name        = "production"
  environment = "production"
  cidr_block  = "10.0.0.0/16"
  tags        = { CostCenter = "platform" }
}

module "database" {
  source = "github.com/dodatech/terraform-aws-rds?ref=v1.3.0"

  name        = "main"
  environment = "production"
  vpc_id      = module.network.vpc_id
  subnet_ids  = module.network.private_subnet_ids
  tags        = { CostCenter = "platform" }
}

module "application" {
  source = "github.com/dodatech/terraform-aws-ecs?ref=v3.0.1"

  name        = "api"
  environment = "production"
  vpc_id      = module.network.vpc_id
  subnet_ids  = module.network.public_subnet_ids
  db_endpoint = module.database.endpoint
  tags        = { CostCenter = "platform" }
}

Expected output: Terraform infers the order from the references between modules, so it provisions the network first, the database second, and the application third. Every module receives validated, documented inputs, and the same CostCenter tag is applied to each.
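
To keep the repeated tags argument from drifting between modules, the shared map can be defined once. This is a small sketch under the assumption that every module accepts a tags input like the VPC module shown earlier; the local name common_tags is hypothetical.

```hcl
# production/locals.tf (hypothetical file)
locals {
  # Single source of truth for tags shared by every module in this environment
  common_tags = {
    CostCenter = "platform"
  }
}

module "network" {
  source = "github.com/dodatech/terraform-aws-vpc?ref=v2.1.0"

  name        = "production"
  environment = "production"
  cidr_block  = "10.0.0.0/16"
  tags        = local.common_tags
}
```

The database and application modules would take tags = local.common_tags in the same way, so a new tag is added in one place.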

Naming Conventions

Resource Naming Standards

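# Consistent naming: {environment}-{service}-{resource}
# (other required arguments are omitted for brevity)
resource "aws_s3_bucket" "logs" {
  # S3 bucket names are globally unique, so an organization prefix is added
  bucket = "dodatech-${var.environment}-logs"
}

resource "aws_db_instance" "main" {
  identifier = "${var.environment}-${var.service}-db"
}

resource "aws_security_group" "web" {
  # AWS rejects security group names that begin with "sg-", so use a suffix
  name = "${var.environment}-${var.service}-web-sg"
}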
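
The shared part of the pattern can be computed once so individual resources cannot drift from it. The following is a sketch only; the local name name_prefix is a hypothetical choice, and the resource is trimmed to its name argument.

```hcl
variable "environment" {
  type = string
}

variable "service" {
  type = string
}

locals {
  # {environment}-{service}, reused by every resource name below
  name_prefix = "${var.environment}-${var.service}"
}

resource "aws_security_group" "web" {
  name = "${local.name_prefix}-web-sg"
}
```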

Directory Structure Standard

terraform/
  environments/
    dev/
    staging/
    production/
  modules/
    networking/
    database/
    compute/
    monitoring/
  global/
    iam/
    route53/
  scripts/
  tests/

Security Patterns

Least-Privilege IAM

# iam-patterns.tf
data "aws_iam_policy_document" "terraform_execution" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:Describe*",
      "ec2:Create*",
      "ec2:Delete*",
      "s3:Get*",
      "s3:Put*",
      "s3:List*",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem"
    ]
    # "*" keeps the example short; scope this to specific ARNs in real policies
    resources = ["*"]
  }
}

# Prevent destruction of critical resources
resource "aws_db_instance" "critical" {
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket" "state" {
  lifecycle {
    prevent_destroy = true
  }
}
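
The execution policy above allows its actions on every resource. A narrower variant for the state backend alone might look like the sketch below. The two ARN variables are hypothetical inputs, and the action list simply mirrors the S3 and DynamoDB actions already used in this section, so a real backend may need a few more (for example dynamodb:DescribeTable).

```hcl
variable "state_bucket_arn" {
  description = "ARN of the S3 bucket holding Terraform state (assumed input)"
  type        = string
}

variable "lock_table_arn" {
  description = "ARN of the DynamoDB lock table (assumed input)"
  type        = string
}

data "aws_iam_policy_document" "state_backend" {
  # Listing applies to the bucket itself
  statement {
    sid       = "ListStateBucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = [var.state_bucket_arn]
  }

  # Reading and writing applies to the objects inside the bucket
  statement {
    sid       = "ReadWriteStateObjects"
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["${var.state_bucket_arn}/*"]
  }

  # Lock entries live in a single DynamoDB table
  statement {
    sid    = "StateLocking"
    effect = "Allow"
    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = [var.lock_table_arn]
  }
}
```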

Secrets Management

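# secrets.tf
# Generates the password (requires the hashicorp/random provider).
# The value is also stored in Terraform state, so keep the state encrypted.
resource "random_password" "db" {
  length  = 32
  special = true
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "${var.environment}-db-password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db.result
}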

Operational Patterns

Drift Detection

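#!/bin/bash
# drift-detection.sh
ENVIRONMENTS=("dev" "staging" "production")

for ENV in "${ENVIRONMENTS[@]}"; do
  # Run in a subshell so the working directory resets for each environment
  (
    cd "environments/$ENV" || exit 1
    terraform init -input=false -backend-config=backend.hcl >/dev/null || exit 1
    terraform plan -no-color -input=false -detailed-exitcode
  )

  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
  case $? in
    0) echo "$ENV: No drift detected" ;;
    2) echo "ALERT: Drift detected in $ENV" | slack-notify ;;
    *) echo "$ENV: Error running plan" ;;
  esac
done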

Cost Estimation

# .github/workflows/infracost.yml
- name: Run Infracost
  uses: infracost/actions/setup@v3
  with:
    api-key: ${{ secrets.INFRACOST_API_KEY }}

- name: Generate Cost Estimate
  run: |
    infracost breakdown --path . \
      --format=table \
      --show-skipped \
      --terraform-plan-flags="-out=plan.tfplan"

Common Mistakes

1. Monolithic State Files

A single state file for everything causes slow plans (five minutes or more) and team-wide lock contention, and any mistake can touch every resource. Split state by environment and service.

2. Unversioned Module Sources

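A local path such as source = "./modules/vpc" carries no version, so any module change reaches every consumer on the next plan. Publish shared modules from a tagged repository and pin each consumer to a release with ?ref=.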

3. Ignoring Terraform Lock File

The .terraform.lock.hcl file records the exact provider versions and checksums selected by terraform init. Commit it to Git, and update it with terraform init -upgrade rather than deleting it.

4. No Pre-Apply Plan Review

Applying without reviewing the plan risks letting destructive changes through unnoticed. Always require plan approval for production.

5. Missing Lifecycle Rules

Resources like databases and state buckets need prevent_destroy to guard against accidental deletion.
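
Besides prevent_destroy, this tutorial's practice questions name create_before_destroy and ignore_changes. The sketch below shows one plausible use of each. The variables vpc_id, subnet_ids, and launch_template_id and the sizing numbers are hypothetical and exist only to make the fragment self-contained.

```hcl
variable "environment" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

resource "aws_security_group" "web" {
  # name_prefix lets the replacement exist alongside the old group
  name_prefix = "${var.environment}-web-"
  vpc_id      = var.vpc_id

  lifecycle {
    # Build the new group before removing the old one to avoid a gap
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name_prefix         = "${var.environment}-app-"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Scaling policies change desired_capacity at runtime; do not revert it
    ignore_changes = [desired_capacity]
  }
}
```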

Practice Questions

1. What is the recommended Terraform state architecture for production? Separate state files per environment and per service, stored in an encrypted S3 backend with DynamoDB locking.

2. Why should modules use semantic versioning? Versioning ensures consumers opt into changes explicitly. Breaking changes are communicated through major version bumps.

3. What three lifecycle rules should every production Terraform configuration include? prevent_destroy on critical resources, create_before_destroy on replacements, and ignore_changes on auto-managed attributes.

4. Challenge: Design a complete Terraform repository structure for a multi-service, multi-environment deployment with module versioning, remote state, CI/CD pipelines, drift detection, and cost estimation -- then implement the state backend infrastructure.

Mini Project: Production Terraform Repository

Create a complete Terraform repository with: separate directories per environment (dev, staging, production), separate state files per environment and service, a module registry with semantic versioning, a CI/CD pipeline with plan-only PR checks and approval gates, drift detection as a cron job, Infracost cost estimation, and Sentinel policy enforcement.

What's Next

Apply Terraform best practices to your infrastructure code, then study Production Patterns for advanced operational strategies. Explore DevOps workflows for team-scale infrastructure management.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.