infrastructure-as-code-guide.md 59248 bytes | 2021-05-01 12:00

Infrastructure as Code Guide

This is a collection of notes covering Infrastructure as Code (IaC) principles, Terraform fundamentals, and automation best practices.

Infrastructure as Code Fundamentals

What is Infrastructure as Code?

Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.

Key Benefits:

  • Version Control: Track infrastructure changes over time
  • Reproducibility: Create identical environments consistently
  • Automation: Reduce manual errors and deployment time
  • Documentation: Infrastructure becomes self-documenting
  • Cost Management: Better visibility into resource usage

IaC Principles

  1. Declarative: Describe the desired state, not the steps to achieve it
  2. Idempotent: Running the same configuration multiple times produces the same result
  3. Immutable: Replace infrastructure components rather than modifying them
  4. Version Controlled: All infrastructure definitions should be in source control
  5. Testable: Infrastructure should be validated before deployment

Terraform

Terraform makes it easy to describe your desired infrastructure as code. It takes care of invoking the appropriate APIs to turn your description of infrastructure into actual running resources.

Core Concepts

Configuration Language

Terraform uses HashiCorp Configuration Language (HCL), which is declarative and describes an intended goal rather than the steps to reach that goal.

Basic Syntax:

resource "aws_vpc" "main" {
  cidr_block = var.base_cidr_block
}

<BLOCK TYPE> "<BLOCK LABEL>" "<BLOCK LABEL>" {
  # Block body
  <IDENTIFIER> = <EXPRESSION> # Argument
}
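
The resource example above reads var.base_cidr_block without showing where it comes from. A minimal sketch of the matching declaration follows; the description and default value are assumptions added for illustration, not part of the original notes.

```hcl
# Input variable consumed by aws_vpc.main above.
# The default is an illustrative assumption.
variable "base_cidr_block" {
  description = "Base CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}
```

Mapped onto the template: variable is the block type, "base_cidr_block" is its single label, and description, type, and default are arguments in the block body.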

Blocks

Blocks are containers for other content and usually represent the configuration of some kind of object like a resource.

Common Block Types:

  • resource: Infrastructure objects
  • data: Read-only information
  • provider: Plugin configurations
  • variable: Input parameters
  • output: Return values
  • module: Reusable configurations

Getting Started with Terraform

1. Provider Configuration

Specify the cloud provider and authentication:

# Configure the AWS Provider
provider "aws" {
  region = "us-west-2"
  
  # Authentication via environment variables:
  # AWS_ACCESS_KEY_ID
  # AWS_SECRET_ACCESS_KEY
  # Or use AWS profiles/IAM roles
}

# Alternative provider examples
provider "google" {
  project = "my-project-id"
  region  = "us-central1"
}

provider "azurerm" {
  features {}
}
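
Provider blocks configure a plugin but do not say which plugin version to install. A common companion, sketched here under assumed version constraints, is a terraform block that pins both the CLI and the provider so that terraform init resolves the same versions on every machine.

```hcl
terraform {
  # Minimum Terraform CLI version (illustrative)
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint; use the major version you have tested
    }
  }
}
```

The "~>" operator allows only the rightmost version component to increase, so "~> 5.0" accepts any 5.x release but not 6.0.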

2. Resource Declaration

Define the infrastructure components you want to create:

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "main-vpc"
    Environment = "production"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-igw"
  }
}

# Subnet
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-west-2a"
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet"
    Type = "public"
  }
}

# Security Group
resource "aws_security_group" "web" {
  name_prefix = "web-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "web-security-group"
  }
}

# EC2 Instance
resource "aws_instance" "web" {
  # AMI IDs are region-specific: confirm this ID exists in the provider's region
  ami                    = "ami-0c02fb55956c7d316" # Amazon Linux 2
  instance_type          = "t3.micro"
  subnet_id              = aws_subnet.public.id
  vpc_security_group_ids = [aws_security_group.web.id]

  user_data = <<-EOF
              #!/bin/bash
              yum update -y
              yum install -y httpd
              systemctl start httpd
              systemctl enable httpd
              echo "<h1>Hello from Terraform!</h1>" > /var/www/html/index.html
              EOF

  tags = {
    Name = "web-server"
  }
}
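
The resources above create an Internet Gateway and a subnet but no route between them. A new VPC's main route table only carries the local route, so the web server would not be reachable from the internet as written. A minimal sketch of the missing pieces follows; the resource names "public" are chosen here for illustration.

```hcl
# Route table that sends non-local traffic to the Internet Gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-route-table"
  }
}

# Attach the route table to the public subnet
resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}
```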

Variables and Outputs

Input Variables

Make your configurations flexible and reusable:

# variables.tf
variable "environment" {
  description = "Environment name"
  type        = string
  default     = "dev"
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
  
  validation {
    condition = contains([
      "t3.micro", "t3.small", "t3.medium"
    ], var.instance_type)
    error_message = "Instance type must be t3.micro, t3.small, or t3.medium."
  }
}

variable "allowed_cidr_blocks" {
  description = "CIDR blocks allowed to access the instance"
  type        = list(string)
  default     = ["0.0.0.0/0"]
}

variable "tags" {
  description = "Default tags to apply to resources"
  type        = map(string)
  default = {
    Terraform = "true"
    Owner     = "infrastructure-team"
  }
}

Using Variables:

resource "aws_instance" "web" {
  instance_type = var.instance_type
  
  tags = merge(var.tags, {
    Name        = "${var.environment}-web-server"
    Environment = var.environment
  })
}
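
Defaults are only a fallback. One way to supply real values is a terraform.tfvars file, which Terraform loads automatically from the working directory. The values below are illustrative assumptions.

```hcl
# terraform.tfvars
environment   = "staging"
instance_type = "t3.small"

# 203.0.113.0/24 is a documentation range; replace it with your own
allowed_cidr_blocks = ["203.0.113.0/24"]

tags = {
  Terraform = "true"
  Owner     = "infrastructure-team"
  Project   = "web-app"
}
```

The same values can be passed with -var on the command line or through TF_VAR_<name> environment variables, for example TF_VAR_environment=staging.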

Output Values

Return information about your infrastructure:

# outputs.tf
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "instance_public_ip" {
  description = "Public IP of the web server"
  value       = aws_instance.web.public_ip
}

output "instance_dns" {
  description = "Public DNS name of the web server"
  value       = aws_instance.web.public_dns
  sensitive   = false
}

output "database_endpoint" {
  description = "Database endpoint"
  value       = aws_db_instance.main.endpoint
  sensitive   = true
}

Data Sources

Query existing infrastructure or external data:

# Get latest Amazon Linux AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Get current AWS region
data "aws_region" "current" {}

# Get current AWS account ID
data "aws_caller_identity" "current" {}

# Use data sources in resources
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
  
  tags = {
    Name      = "web-server"
    Region    = data.aws_region.current.name
    AccountId = data.aws_caller_identity.current.account_id
  }
}

Modules

Create reusable, composable infrastructure components:

Module Structure

modules/
└── vpc/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── README.md

Module Definition (modules/vpc/main.tf)

resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = var.enable_dns_hostnames
  enable_dns_support   = var.enable_dns_support

  tags = merge(var.tags, {
    Name = var.name
  })
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id

  tags = merge(var.tags, {
    Name = "${var.name}-igw"
  })
}

resource "aws_subnet" "public" {
  count = length(var.public_subnet_cidrs)

  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.name}-public-${count.index + 1}"
    Type = "public"
  })
}

Module Variables (modules/vpc/variables.tf)

variable "name" {
  description = "Name prefix for VPC resources"
  type        = string
}

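variable "cidr_block" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

# Referenced by aws_vpc.this in main.tf
variable "enable_dns_hostnames" {
  description = "Enable DNS hostnames in the VPC"
  type        = bool
  default     = true
}

# Referenced by aws_vpc.this in main.tf
variable "enable_dns_support" {
  description = "Enable DNS support in the VPC"
  type        = bool
  default     = true
}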

variable "public_subnet_cidrs" {
  description = "CIDR blocks for public subnets"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "availability_zones" {
  description = "Availability zones for subnets"
  type        = list(string)
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}

Module Outputs (modules/vpc/outputs.tf)

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.this.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets"
  value       = aws_subnet.public[*].id
}

output "internet_gateway_id" {
  description = "ID of the Internet Gateway"
  value       = aws_internet_gateway.this.id
}

Using Modules

module "vpc" {
  source = "./modules/vpc"

  name               = "production"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b"]
  
  public_subnet_cidrs = [
    "10.0.1.0/24",
    "10.0.2.0/24"
  ]

  tags = {
    Environment = "production"
    Project     = "web-app"
  }
}

# Use module outputs
resource "aws_instance" "web" {
  subnet_id = module.vpc.public_subnet_ids[0]
  # ... other configuration
}
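
Modules do not have to be local. As an alternative to the hand-written module above (not in addition to it, since both use the name "vpc"), here is a sketch that pulls a community VPC module from the public Terraform Registry. The version constraint is an assumption; check the registry page for current releases and the full input list.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumed constraint

  name = "production"
  cidr = "10.0.0.0/16"

  azs            = ["us-west-2a", "us-west-2b"]
  public_subnets = ["10.0.1.0/24", "10.0.2.0/24"]

  tags = {
    Environment = "production"
    Project     = "web-app"
  }
}
```

Pinning version matters more for registry modules than for local ones, because a new upstream release can otherwise change the plan without any change to your own code.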

State Management

Local State

By default, Terraform stores state locally in a terraform.tfstate file in the working directory. The commands below create, update, and inspect it:

# Initialize Terraform
terraform init

# Plan changes
terraform plan

# Apply changes
terraform apply

# Show current state
terraform show

# List resources in state
terraform state list

Remote State

For team environments, use remote state storage:

# Configure S3 backend
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-west-2"
    
    # DynamoDB table for state locking
    # (recent Terraform releases also offer S3-native locking via use_lockfile)
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
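
The backend block assumes the bucket and lock table already exist. A minimal sketch of creating them follows, normally applied once from a separate bootstrap configuration that keeps its own state. It assumes AWS provider v4 or later, where versioning is a separate resource; the resource names are chosen here for illustration.

```hcl
# State bucket; bucket names are globally unique, so this one is a placeholder
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state"
}

# Keep old state versions so a bad write can be rolled back
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table; the S3 backend expects a string partition key named LockID
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```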

State Commands

# Import existing resource
terraform import aws_instance.web i-1234567890abcdef0

# Remove resource from state (doesn't destroy)
terraform state rm aws_instance.web

# Move resource in state
terraform state mv aws_instance.web aws_instance.app

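# Refresh state from real infrastructure
# (terraform refresh is deprecated since v0.15.4; -refresh-only shows the drift for review first)
terraform apply -refresh-only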
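
Newer Terraform releases can express several of these state operations declaratively, so they go through plan and code review like any other change. The blocks below are independent sketches: add only the one you need. The addresses aws_instance.legacy and aws_instance.retired are hypothetical, and the version notes reflect when each block type was introduced.

```hcl
# Declarative form of `terraform import` (Terraform 1.5+)
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}

# Declarative form of `terraform state mv` (Terraform 1.1+)
# The resource block for aws_instance.app must exist in the configuration.
moved {
  from = aws_instance.legacy
  to   = aws_instance.app
}

# Declarative form of `terraform state rm` (Terraform 1.7+)
# The resource block for aws_instance.retired must already be deleted.
removed {
  from = aws_instance.retired

  lifecycle {
    destroy = false # forget the resource without destroying it
  }
}
```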

Terraform Workflow

1. Development Workflow

# 1. Initialize working directory
terraform init

# 2. Format and validate configuration
terraform fmt
terraform validate

# 3. Plan changes
terraform plan -out=tfplan

# 4. Apply changes
terraform apply tfplan

# 5. Clean up (when needed)
terraform destroy

2. CI/CD Integration

# GitHub Actions example
name: Terraform

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  terraform:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: 1.0.0
    
    - name: Terraform Init
      run: terraform init
    
    - name: Terraform Format Check
      run: terraform fmt -check
    
    - name: Terraform Validate
      run: terraform validate
    
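    - name: Terraform Plan
      run: terraform plan -input=false -out=tfplan
      
    - name: Terraform Apply
      # Apply the saved plan, and only for pushes to main (never for pull requests)
      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
      run: terraform apply -input=false tfplan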

Advanced Terraform Concepts

Workspaces

Manage multiple environments with the same configuration:

# Create and switch to workspace
terraform workspace new staging
terraform workspace new production

# List workspaces
terraform workspace list

# Switch workspace
terraform workspace select production

# Use workspace in configuration
resource "aws_instance" "web" {
  instance_type = terraform.workspace == "production" ? "t3.medium" : "t3.micro"
  
  tags = {
    Environment = terraform.workspace
  }
}
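
A two-way conditional stops scaling once there are more than two workspaces. One possible alternative, sketched here with assumed sizes, is a map keyed by workspace name with a fallback. It reuses the data.aws_ami.amazon_linux data source defined earlier in these notes.

```hcl
locals {
  # Per-workspace settings; the keys must match workspace names
  instance_types = {
    default    = "t3.micro"
    staging    = "t3.small"
    production = "t3.medium"
  }

  # Fall back to t3.micro for any workspace not listed above
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = local.instance_type

  tags = {
    Environment = terraform.workspace
  }
}
```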

Provisioners

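Execute scripts on resources. HashiCorp documents provisioners as a last resort; prefer user_data or a configuration management tool where possible: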

resource "aws_instance" "web" {
  # ... other configuration

  # Remote exec provisioner
  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo yum install -y httpd",
      "sudo systemctl start httpd"
    ]

    connection {
      type        = "ssh"
      user        = "ec2-user"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }

  # Local exec provisioner
  provisioner "local-exec" {
    command = "echo Instance ${self.id} created at ${timestamp()}"
  }
}

Dynamic Blocks

Generate repeated nested blocks:

resource "aws_security_group" "web" {
  name_prefix = "web-"

  dynamic "ingress" {
    for_each = var.ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}

variable "ingress_ports" {
  type    = list(number)
  default = [80, 443, 22]
}
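
A list of port numbers forces every rule to share the same source range, which is why port 22 above ends up open to 0.0.0.0/0. A sketch of the same technique over a list of objects follows, so each rule carries its own CIDR blocks. The variable ingress_rules and the resource name web_restricted are hypothetical.

```hcl
variable "ingress_rules" {
  description = "Ingress rules as a port plus the CIDR blocks allowed to reach it"
  type = list(object({
    port        = number
    cidr_blocks = list(string)
  }))
  default = [
    { port = 80, cidr_blocks = ["0.0.0.0/0"] },
    { port = 443, cidr_blocks = ["0.0.0.0/0"] },
    { port = 22, cidr_blocks = ["10.0.0.0/16"] }, # SSH only from inside the VPC
  ]
}

resource "aws_security_group" "web_restricted" {
  name_prefix = "web-"
  vpc_id      = aws_vpc.main.id

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = ingress.value.cidr_blocks
    }
  }
}
```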

Best Practices

1. Code Organization

terraform/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
├── modules/
│   ├── vpc/
│   ├── security/
│   └── compute/
├── shared/
│   ├── variables.tf
│   └── outputs.tf
└── scripts/
    ├── deploy.sh
    └── validate.sh

2. Security Best Practices

Sensitive Data Management:

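# Supply secrets through environment variables (e.g. TF_VAR_database_password), never through committed .tfvars files.
# Note: sensitive = true hides the value in CLI output only; it is still stored in plain text in the state file.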
variable "database_password" {
  description = "Database password"
  type        = string
  sensitive   = true
}

# Mark outputs as sensitive
output "database_password" {
  value     = random_password.db_password.result
  sensitive = true
}

# Use AWS Secrets Manager
resource "aws_secretsmanager_secret" "db_password" {
  name = "database-password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}
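
The output and the secret version above both read random_password.db_password, which is not declared in these notes. A minimal sketch of that resource follows, assuming the hashicorp/random provider is available; the length and character set are illustrative choices.

```hcl
# Generates the password referenced by the output and secret version above
resource "random_password" "db_password" {
  length  = 24
  special = true

  # Limit special characters to ones most database engines accept
  override_special = "!#$%&*()-_=+[]{}<>:?"
}
```

The generated value is written to the state file, which is one more reason to keep state in an encrypted, access-controlled backend.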

Resource Tagging:

locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "terraform"
    Owner       = var.team_email
    CostCenter  = var.cost_center
  }
}

resource "aws_instance" "web" {
  # ... other configuration
  tags = local.common_tags
}
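
Passing local.common_tags to every resource by hand is easy to forget. The AWS provider can apply a tag set to every taggable resource it manages through default_tags (available since provider version 3.38). A sketch reusing the local defined above:

```hcl
provider "aws" {
  region = "us-west-2"

  # Applied to every taggable resource this provider creates
  default_tags {
    tags = local.common_tags
  }
}
```

Tags set directly on a resource are merged with these defaults, and the resource-level value wins when a key appears in both.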

3. Performance Optimization

Use Data Sources Efficiently:

# Give the data source result a short local name
# (Terraform reads each data source once per run either way)
locals {
  availability_zones = data.aws_availability_zones.available.names
}

data "aws_availability_zones" "available" {
  state = "available"
}

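Prefer for_each for Stable Addresses:

# Use for_each instead of count when possible: instances are keyed by name,
# so removing one entry does not shift indexes and force unrelated replacements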
resource "aws_subnet" "private" {
  for_each = var.private_subnets

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az

  tags = {
    Name = each.key
  }
}
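
The resource above iterates over var.private_subnets, which is not declared in these notes. A sketch of a declaration with the shape the resource expects (each.key as the name, each.value.cidr and each.value.az as attributes) follows; the keys and default values are assumptions.

```hcl
variable "private_subnets" {
  description = "Private subnets keyed by name"
  type = map(object({
    cidr = string
    az   = string
  }))
  default = {
    "private-a" = { cidr = "10.0.11.0/24", az = "us-west-2a" }
    "private-b" = { cidr = "10.0.12.0/24", az = "us-west-2b" }
  }
}
```

Each subnet is then addressed by key, for example aws_subnet.private["private-a"], so deleting "private-b" from the map leaves "private-a" untouched.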

Troubleshooting

Common Issues

State Lock:

# Force unlock (use with caution)
terraform force-unlock LOCK_ID

Import Existing Resources:

# Import resource to state
terraform import aws_instance.web i-1234567890abcdef0

Debug Mode:

# Enable detailed logging
export TF_LOG=DEBUG
terraform plan

Validate Configuration:

# Check formatting, validate, and detect pending changes
terraform fmt -check
terraform validate
# Exit code: 0 = no changes, 1 = error, 2 = changes present
terraform plan -detailed-exitcode

Infrastructure Testing

1. Terraform Validate

terraform validate

2. Unit Tests with Terratest

// test/terraform_test.go
package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestTerraformVPC(t *testing.T) {
    terraformOptions := &terraform.Options{
        TerraformDir: "../",
        Vars: map[string]interface{}{
            "environment": "test",
        },
    }

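    // Destroy the real resources created by this test, even if an assertion fails
    defer terraform.Destroy(t, terraformOptions)
    // Run terraform init and terraform apply against real infrastructure
    terraform.InitAndApply(t, terraformOptions)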

    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    assert.NotEmpty(t, vpcId)
}
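
Terratest applies real infrastructure, which is slow and costs money. For cheaper checks, Terraform 1.6 and later ship a native test runner driven by .tftest.hcl files. The sketch below assumes the root configuration contains the aws_vpc.main resource and the environment variable shown earlier; the file name and run label are hypothetical.

```hcl
# tests/vpc.tftest.hcl
variables {
  environment = "test"
}

run "vpc_cidr_is_expected" {
  # Plan only: nothing is created
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block should be 10.0.0.0/16."
  }
}
```

Run it with terraform test from the configuration directory. A plan-only run still needs provider credentials unless the provider is mocked.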

3. Policy as Code

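# Using Sentinel (Terraform Cloud/Enterprise)
# resource_changes is provided by the v2 plan import
import "tfplan/v2" as tfplan

# Every aws_instance in the plan must use an approved instance type
main = rule {
    all tfplan.resource_changes as _, rc {
        rc.type is not "aws_instance" or
        rc.change.after.instance_type in ["t3.micro", "t3.small"]
    }
}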

Cost Management

1. Resource Tagging for Cost Allocation

locals {
  cost_tags = {
    CostCenter  = var.cost_center
    Project     = var.project
    Environment = var.environment
    Owner       = var.owner
  }
}

2. Right-sizing Resources

variable "instance_types" {
  type = map(string)
  default = {
    dev        = "t3.micro"
    staging    = "t3.small"
    production = "t3.medium"
  }
}

resource "aws_instance" "web" {
  instance_type = var.instance_types[var.environment]
}
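
The map lookup above fails with an invalid index error if var.environment is not one of the three keys. One way to surface that earlier, with a clearer message, is a validation rule on the variable. This sketch would replace the earlier declaration of environment rather than sit beside it.

```hcl
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}
```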

3. Scheduled Resources

# Auto-scaling schedule for non-production
resource "aws_autoscaling_schedule" "scale_down" {
  count = var.environment != "production" ? 1 : 0
  
  scheduled_action_name  = "scale-down"
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
  recurrence             = "0 18 * * MON-FRI" # 18:00 on weekdays; UTC unless time_zone is set
  autoscaling_group_name = aws_autoscaling_group.web.name
}
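
A scale-down action on its own leaves the group at zero until someone intervenes. A sketch of the matching morning scale-up follows; the sizes and the 08:00 start are assumptions to adapt to the group's normal capacity.

```hcl
# Bring non-production capacity back before the working day
resource "aws_autoscaling_schedule" "scale_up" {
  count = var.environment != "production" ? 1 : 0

  scheduled_action_name  = "scale-up"
  min_size               = 1
  max_size               = 2
  desired_capacity       = 1
  recurrence             = "0 8 * * MON-FRI"
  autoscaling_group_name = aws_autoscaling_group.web.name
}
```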

These notes cover the core concepts and practices of Infrastructure as Code with Terraform: configuration, modules, state, workflow, testing, and cost management.