The Terraform module change I deployed seemed syntactically valid: the module initialized properly with terraform init, and terraform plan created a valid plan without raising any errors or syntax issues. When I started to execute the plan with terraform apply, it returned the following error:
Error: Error launching source instance: VPCIdNotSpecified: No default VPC for this user
status code: 400, request id: xxx
This error pointed me towards a bug report on the Terraform GitHub issues page, which stated:
I believe the Instance actually derives it's VPC from the subnet,
can you verify that specifying a subnet_id in a VPC works as designed?
In my setup I was not using a default VPC. I was creating VPCs with modules and using their subnet outputs as input values for other resources and modules.
I had just changed a string variable that passed a single subnet_id into a string list variable containing a list of subnet IDs. I blamed the error on this change and started looking for a way to see the contents of the variable I had just created. Ansible has the -vvvv flag to raise verbosity, which often shows the values of filled-in variables. I hoped to find a similar way to raise verbosity in Terraform.
On Google I found options to raise the level of Terraform's logs, but these were for Terraform crash logs. My Google-fu was failing me that day and I could not find a way to debug Terraform modules.
After a while I decided to raise the following question on the Terraform Gitter:
Hi all, how do you debug things in Terraform?
Of course I explained the scenario I was working on. The channel fell silent, so after waiting a weekend I asked again and received the following answer:
I use the terraform console to check interpolation behavior. It looks like you’re having a problem with the actual value you’re using for your resource arguments.
This made me try the Terraform console and see what results I would get. Unfortunately this would not give any result:
echo "module.subnets.private_subnet_ids" |terraform console -var-file=variables.tfvars
Error: Result depends on values that cannot be determined until after "terraform apply".
Perhaps I used the Terraform console wrong, perhaps my syntax was incorrect, or perhaps I simply didn’t understand the Terraform console. Either way it was not working and I did not get any further. So I decided to start debugging as if I were writing Go: forward the values inside the module to an output.
Terraform code
The problematic Terraform module:
locals {
  additional_ips_count = var.associate_public_ip_address && var.instance_enabled && var.additional_ips_count > 0 ? var.additional_ips_count : 0
}

resource "aws_instance" "default" {
  count                       = var.instance_count
  ami                         = data.aws_ami.info.id
  availability_zone           = var.availability_zone
  instance_type               = var.instance_type
  ebs_optimized               = var.ebs_optimized
  disable_api_termination     = var.disable_api_termination
  user_data                   = var.user_data
  iam_instance_profile        = join("", aws_iam_instance_profile.default.*.name)
  associate_public_ip_address = var.associate_public_ip_address
  key_name                    = var.ssh_key_pair
  monitoring                  = var.monitoring
  private_ip                  = concat(var.private_ips, [""])[min(length(var.private_ips), count.index)]
  source_dest_check           = var.source_dest_check
  vpc_security_group_ids = compact(
    concat(
      [
        var.create_default_security_group ? join("", aws_security_group.default.*.id) : ""
      ],
      var.security_groups
    )
  )

  root_block_device {
    volume_type           = var.root_volume_type
    volume_size           = var.root_volume_size
    iops                  = var.root_iops
    delete_on_termination = var.delete_on_termination
  }
}

resource "aws_network_interface" "extra_nic" {
  count     = local.additional_ips_count * var.instance_count
  subnet_id = element(var.subnet_ids, (count.index % length(var.subnet_ids)))
  security_groups = compact(
    concat(
      [
        var.create_default_security_group ? join("", aws_security_group.default.*.id) : ""
      ],
      var.security_groups
    )
  )
  depends_on = [aws_instance.default]
}

resource "aws_network_interface_attachment" "extra_nic" {
  count                = local.additional_ips_count * var.instance_count
  instance_id          = aws_instance.default.*.id[count.index % var.instance_count]
  network_interface_id = aws_network_interface.extra_nic.*.id[count.index]
  device_index         = 1 + count.index
  depends_on           = [aws_instance.default]
}
And the variable that caused the issues was:
variable "subnet_ids" {
  type        = list(string)
  description = "List of subnet IDs created in this network"
}
Alert Terraform users reading this can probably already see that I was missing something. The error, however, did not tell me what. Since I had just changed the subnet variable into a string list variable named subnet_ids and the error was not helping at all, I decided to add output blocks to get some more information:
output "subnet_ids" {
  description = "Lists all the subnet IDs passed to the module."
  value       = var.subnet_ids
}
The output was this:
master_subnet_ids = [
"subnet-0844242cd93939299",
"subnet-04d2d883388848c11",
]
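One detail worth spelling out: terraform output prints the outputs of the root module, so an output declared inside a module has to be forwarded from the root configuration before it shows up there. A minimal sketch of that forwarding follows. The module name master and the source path are placeholders chosen for illustration, not taken from the actual configuration; only the output name master_subnet_ids matches the listing above.

```hcl
# Root configuration (for example main.tf).
# The module name "master" and the source path are hypothetical placeholders.
module "master" {
  source     = "./modules/instance" # hypothetical path
  subnet_ids = module.subnets.private_subnet_ids
  # ... other required variables omitted
}

# Forward the module's debug output so `terraform output` can print it.
output "master_subnet_ids" {
  description = "Subnet IDs as received by the instance module."
  value       = module.master.subnet_ids
}
```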
Using the terraform output command now gave me the list of subnet IDs I was searching for, and there was nothing wrong with it. This made me realize that the values passed to the string list variable were correct and the issue was not with the input. Something else was wrong, but I now knew it was not the variable itself. So I decided to use the EC2 resource (aws_instance) directly, and got the same error as before. After reading up on the documentation I found out that the subnet_id argument was missing from the EC2 resource. Without a subnet_id, AWS tries to launch the instance in the default VPC, which my account did not have, hence the VPCIdNotSpecified error. A silly mistake; I could probably have spent less time finding it, but apparently this was one of those days.
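For illustration, the fix amounts to giving aws_instance a subnet_id. The sketch below shows only the relevant arguments. It assumes the instances should be spread over var.subnet_ids in turn, the same way the extra network interfaces already are; that distribution is an assumption made for this example, not necessarily the right choice for every setup.

```hcl
resource "aws_instance" "default" {
  count         = var.instance_count
  ami           = data.aws_ami.info.id
  instance_type = var.instance_type

  # The missing argument: without it AWS falls back to the default VPC.
  # element() wraps around when the index exceeds the list length, so
  # instances are assigned to the subnets in turn (assumed distribution,
  # for illustration only).
  subnet_id = element(var.subnet_ids, count.index)

  # ... remaining arguments unchanged from the module above
}
```

Since a subnet lives in a single availability zone, var.availability_zone would most likely have to agree with the chosen subnet or be left unset.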
Lessons learned
The main lesson learned should be to RTFM. But who reads the manual anyway? What I took away from this experience is that Terraform is far less sophisticated than Ansible when it comes to verbosity and debugging. You do, however, have the option to add debugging to modules yourself. It will not tell you what caused the issue, but it will tell you what works and give you a direction to go from there.
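The same output trick works for derived values, not just raw inputs. As a rough sketch, the module above could expose its local and the subnet picked for each extra network interface. The output names with the debug_ prefix are hypothetical and exist only for this example; the expressions are copied from the module's own resources.

```hcl
# Hypothetical debug outputs, placed inside the module next to its resources.
output "debug_additional_ips_count" {
  description = "Effective number of extra IPs after the enable flags are applied."
  value       = local.additional_ips_count
}

output "debug_nic_subnet_ids" {
  description = "Subnet selected for each extra network interface, in count order."
  # Mirrors the subnet_id expression of aws_network_interface.extra_nic.
  value = [
    for i in range(local.additional_ips_count * var.instance_count) :
    element(var.subnet_ids, i % length(var.subnet_ids))
  ]
}
```

Like any module output, these only appear in terraform output once the root configuration forwards them.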