Terraform module debug

Lessons learned

The Terraform module change I deployed seemed to have valid syntax: the module initialized properly with terraform init, and terraform plan produced a valid plan without raising any errors or syntax issues. When I executed the plan with terraform apply, it returned the following error:

Error: Error launching source instance: VPCIdNotSpecified: No default VPC for this user
        status code: 400, request id: xxx

The error led me to a bug report on the Terraform GitHub issues page, which stated:

I believe the Instance actually derives it's VPC from the subnet, 
can you verify that specifying a subnet_id in a VPC works as designed?

In my setup I was not using a default VPC. I created VPCs with modules and used their outputs, the subnets, as input values for other resources and modules. I had just changed a string variable that passed a single subnet_id into a list-of-strings variable containing several subnet IDs. I blamed the error on this change and started looking for a way to see the values of the variable I had just created.

Ansible has the -vvvv flag to raise verbosity, which often shows the values filled into variables, and I hoped to find something similar in Terraform. On Google I found options to raise Terraform's log level, but those seemed aimed at Terraform crash logs rather than at showing variable values. My Google-fu was failing me that day, and I could not find a way to debug Terraform modules.
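For context, the wiring looked roughly like the sketch below. The subnets module and its private_subnet_ids output are the ones queried later in the console; the ec2 module name, the source paths and the old single-value variable are hypothetical stand-ins, and the ec2 module's other required inputs are left out.

```hcl
# Hypothetical root-module wiring: module names and source paths are
# illustrative; only module.subnets.private_subnet_ids comes from the setup.
module "subnets" {
  source = "./modules/subnets"
  # ... inputs for the VPC/subnet module ...
}

module "ec2" {
  source = "./modules/ec2-instance"

  # Before the change (hypothetical variable name): a single subnet ID.
  # subnet_id = module.subnets.private_subnet_ids[0]

  # After the change: the whole list of subnet IDs.
  subnet_ids = module.subnets.private_subnet_ids
}
```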

After a while I asked the following question in the Terraform Gitter channel:

Hi all, how do you debug things in Terraform?

Of course I explained the scenario I was working on, and the channel fell silent. After waiting a weekend I asked again and received the following answer:

I use the terraform console to check interpolation behavior. It looks like you’re having a problem with the actual value you’re using for your resource arguments.

So I tried the Terraform console to see what results I would get. Unfortunately, it only returned an error:

echo "module.subnets.private_subnet_ids" | terraform console -var-file=variables.tfvars
Error: Result depends on values that cannot be determined until after "terraform apply".

Perhaps I used the Terraform console wrong, my syntax was incorrect, or I simply didn’t understand the Terraform console. It was not working and I did not get any further. So I decided to debug the way I would in Go: forward the values inside the module to outputs.
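The console error itself says why it failed: the expression depends on values that only exist after terraform apply. A workaround I did not try at the time, so treat it as a suggestion: give the console literal stand-in values and test the expression logic on its own. For example, the round-robin subnet selection the module below uses for extra_nic (element with count.index % length(...)) can be checked at the terraform console prompt like this; the subnet IDs are made up.

```hcl
# Typed at the `terraform console` prompt. "subnet-aaa" and "subnet-bbb"
# are made-up stand-ins for IDs that only exist after apply; i plays the
# role of count.index.
[for i in range(4) : element(["subnet-aaa", "subnet-bbb"], i % 2)]
```

This should evaluate to a list alternating between the two stand-in IDs, which confirms the distribution logic without touching any real infrastructure.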

Terraform code

The problematic Terraform module:

locals {
  additional_ips_count        = var.associate_public_ip_address && var.instance_enabled && var.additional_ips_count > 0 ? var.additional_ips_count : 0
}

resource "aws_instance" "default" {
  count                       = var.instance_count
  ami                         = data.aws_ami.info.id
  availability_zone           = var.availability_zone
  instance_type               = var.instance_type
  ebs_optimized               = var.ebs_optimized
  disable_api_termination     = var.disable_api_termination
  user_data                   = var.user_data
  iam_instance_profile        = join("", aws_iam_instance_profile.default.*.name)
  associate_public_ip_address = var.associate_public_ip_address
  key_name                    = var.ssh_key_pair
  monitoring                  = var.monitoring
  private_ip                  = concat(var.private_ips, [""])[min(length(var.private_ips), count.index)]
  source_dest_check           = var.source_dest_check

  vpc_security_group_ids      = compact(
    concat(
      [
        var.create_default_security_group ? join("", aws_security_group.default.*.id) : ""
      ],
      var.security_groups
    )
  )

  root_block_device {
    volume_type               = var.root_volume_type
    volume_size               = var.root_volume_size
    iops                      = var.root_iops
    delete_on_termination     = var.delete_on_termination
  }
}

resource "aws_network_interface" "extra_nic" {
  count                       = local.additional_ips_count * var.instance_count
  subnet_id                   = element(var.subnet_ids, (count.index % length(var.subnet_ids)))

  security_groups             = compact(
    concat(
      [
        var.create_default_security_group ? join("", aws_security_group.default.*.id) : ""
      ],
      var.security_groups
    )
  )

  depends_on                  = [aws_instance.default]
}

resource "aws_network_interface_attachment" "extra_nic" {
  count                = local.additional_ips_count * var.instance_count
  instance_id          = aws_instance.default.*.id[count.index % var.instance_count]
  network_interface_id = aws_network_interface.extra_nic.*.id[count.index]
  device_index         = 1 + count.index
  depends_on           = [aws_instance.default]
}

And the variable I suspected of causing the issue was:

variable "subnet_ids" {
  type        = list(string)
  description = "List of subnet IDs created in this network"
}

Alert Terraform users reading this can probably already see what I was missing. The error message did not tell me, however, and since I had just changed the subnet variable into a list-of-strings variable named subnet_ids, the error was not helping at all. To get more information, I added output blocks:

output "subnet_ids" {
  description = "Lists all the subnet IDs passed to the module."
  value       = var.subnet_ids
}
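The same trick works for values the module derives from its inputs, not only for the inputs themselves. A minimal sketch, using a hypothetical debug_extra_nic_subnets name, that mirrors the subnet selection of the extra_nic resource above and exposes the result as an output:

```hcl
# Hypothetical debug aid inside the module: repeat the expression that
# aws_network_interface.extra_nic uses to pick a subnet for each NIC.
locals {
  debug_extra_nic_subnets = [
    for i in range(local.additional_ips_count * var.instance_count) : element(var.subnet_ids, i % length(var.subnet_ids))
  ]
}

# Expose the computed list so it can be inspected with terraform output.
output "debug_extra_nic_subnets" {
  description = "Subnet each extra NIC would be placed in (debug only)."
  value       = local.debug_extra_nic_subnets
}
```

Like any module output, it has to be re-exported from the root module before terraform output shows it, and it is worth removing again once the debugging is done.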

Exposed through the root module (as master_subnet_ids), the output was:

master_subnet_ids = [
  "subnet-0844242cd93939299",
  "subnet-04d2d883388848c11",
]

The terraform output command now gave me the list of subnet IDs I was searching for, and there was nothing wrong with it. This made me realize that the values passed to the list variable were correct and that the issue was not with the input: something else was wrong. So I decided to use the aws_instance resource directly, outside the module, and got the same error as before. After reading up on the documentation I found that the subnet_id argument was missing from the aws_instance resource. The module did use var.subnet_ids, but only for the extra network interfaces. Without a subnet_id and without a default VPC, EC2 has no VPC to launch the instance into, hence VPCIdNotSpecified. A silly mistake; I could probably have found it sooner, but apparently this was one of those days.
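A minimal sketch of the fix, assuming the instances should be spread round-robin over the list in the same way the module already spreads the extra NICs. Only the relevant arguments are shown; the rest of the resource stays as in the module above.

```hcl
resource "aws_instance" "default" {
  count         = var.instance_count
  ami           = data.aws_ami.info.id
  instance_type = var.instance_type

  # The missing argument. Without it, and without a default VPC, the
  # launch fails with VPCIdNotSpecified. Round-robin placement over the
  # list is an assumption, copied from the extra_nic resource.
  subnet_id = element(var.subnet_ids, count.index % length(var.subnet_ids))

  # ... remaining arguments unchanged from the module above ...
}
```

Because the subnet determines the availability zone, var.availability_zone presumably has to be left unset or match the zone of the chosen subnet, otherwise the launch is likely to be rejected.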

Lessons learned

The main lesson should be to RTFM. But who reads the manual anyway? What I took away from this experience is that Terraform offers far less than Ansible when it comes to verbosity and debugging. You do, however, have the option to add debugging to modules yourself, for example with output blocks. It will not tell you what caused the issue, but it will tell you what works and give you a direction to go from there.
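Another cheap self-added check is a validation block on the input variable. This is a sketch I did not use at the time: it requires Terraform 0.13 or later, and the "subnet-" prefix check is an assumption about what valid IDs look like. It would not have caught this particular bug, since the variable was fine, but it moves a whole class of input mistakes from apply time to plan time.

```hcl
variable "subnet_ids" {
  type        = list(string)
  description = "List of subnet IDs created in this network"

  # Fails during plan when the list is empty or contains an entry that
  # does not start with "subnet-". can() turns a regex mismatch (an
  # error) into false, so the filter collects the offending entries.
  validation {
    condition     = length(var.subnet_ids) > 0 && length([for id in var.subnet_ids : id if !can(regex("^subnet-", id))]) == 0
    error_message = "The subnet_ids variable must contain at least one ID, and every ID must start with \"subnet-\"."
  }
}
```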