1.9.1.1. 有条件创建
Terraform 被设计成声明式而非命令式的语言,例如它没有常见的 if 条件语句,后来才加上了由 count 和 for_each 实现的循环语句(但循环的次数必须是在 plan 阶段就能够确定的,无法根据其他 resource 的输出动态决定)。
有时候我们需要根据某种条件来判断是否创建一个资源。虽然我们无法使用 if 来完成条件判断,但我们还有 count 和 for_each 可以帮助我们完成这个目标。
我们以 UCloud 为例,假如我们正在编写一个旨在被复用的模块,模块的逻辑要创建一台虚拟机,我们的代码可以是这样的:
data "ucloud_vpcs" "default" {
  name_regex = "^Default"
}

data "ucloud_images" "centos" {
  name_regex = "^CentOS 7"
}

resource "ucloud_instance" "web" {
  availability_zone = "cn-bj2-02"
  image_id          = data.ucloud_images.centos.images[0].id
  instance_type     = "n-basic-2"
}

output "uhost_id" {
  value = ucloud_instance.web.id
}
非常简单。但是如果我们想进一步,让模块的调用者决定创建的主机是否要搭配一个弹性公网 IP 该怎么办?
我们可以在上面的代码后面接上这样的代码:
variable "allocate_public_ip" {
  description = "Decide whether to allocate a public ip and bind it to the host"
  type        = bool
  default     = false
}

resource "ucloud_eip" "public_ip" {
  count         = var.allocate_public_ip ? 1 : 0
  name          = "public_ip_for_${ucloud_instance.web.name}"
  internet_type = "bgp"
}

resource "ucloud_eip_association" "public_ip_binding" {
  count       = var.allocate_public_ip ? 1 : 0
  eip_id      = ucloud_eip.public_ip[0].id
  resource_id = ucloud_instance.web.id
}
我们首先创建了名为 allocate_public_ip 的输入变量,然后在编写弹性 IP 相关资源代码的时候都声明了 count 参数,其值使用条件表达式,根据 allocate_public_ip 这个输入变量的值决定是 1 还是 0,这实际上就实现了按条件创建资源。
需要注意的是,由于我们使用了 count,所以现在弹性 IP 相关的资源实际上是多实例资源类型的。我们在 ucloud_eip_association.public_ip_binding 中引用 ucloud_eip.public_ip 时,还是要加上访问下标。由于 ucloud_eip_association.public_ip_binding 与 ucloud_eip.public_ip 实际上是同生同死的,所以在这里它们之间的引用还比较简单;如果是其他没有声明 count 的资源引用它们的话,还要针对 allocate_public_ip 为 false 时 ucloud_eip.public_ip 实际为空做相应处理,比如在 output 中:
output "public_ip" {
  value = join("", ucloud_eip.public_ip[*].public_ip)
}
使用 join 函数,即使没有创建弹性 IP 也能返回空字符串。或者我们也可以用条件表达式:
output "public_ip" {
  value = length(ucloud_eip.public_ip[*].public_ip) > 0 ? ucloud_eip.public_ip[0].public_ip : ""
}
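另外,如果使用的是较新版本的 Terraform(0.15 及以上),也可以考虑用内建的 one 函数表达同样的意图。下面是一个简单的示意(注意它在没有创建弹性 IP 时返回的是 null 而不是空字符串,是否合适取决于调用方):

output "public_ip" {
  # one() 在列表为空时返回 null,恰好有一个元素时返回该元素
  value = one(ucloud_eip.public_ip[*].public_ip)
}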
1.9.2.1. 依赖反转
Terraform 编排的基础设施对象彼此之间可能互相存在依赖关系,有时我们在编写一些旨在重用的模块时,模块内定义的资源可能本身需要依赖其他一些资源,这些资源可能已经存在,也可能有待创建。
举一个例子,假设我们编写了一个模块,定义了在 UCloud 上同一个 VPC 中的两台服务器;第一台服务器部署了一个 Web 应用,它被分配在一个 DMZ 子网里;第二台服务器部署了一个数据库,它被分配在一个内网子网里。现在的问题是,在我们编写模块时,我们并没有关于 VPC 和子网的任何信息,我们甚至连服务器应该部署在哪个可用区都不知道。VPC 和子网可能已经存在,也可以有待创建。
我们可以定义这样的一个模块代码:
terraform {
  required_providers {
    ucloud = {
      source  = "ucloud/ucloud"
      version = "~>1.22.0"
    }
  }
}

variable "network_config" {
  type = object({
    vpc_id = string
    web_app_config = object({
      az        = string
      subnet_id = string
    })
    db_config = object({
      az        = string
      subnet_id = string
    })
  })
}

data "ucloud_images" "web_app" {
  name_regex = "^WebApp"
}

data "ucloud_images" "mysql" {
  name_regex = "^MySql 5.7"
}

resource "ucloud_instance" "web_app" {
  availability_zone = var.network_config.web_app_config.az
  image_id          = data.ucloud_images.web_app.images[0].id
  instance_type     = "n-basic-2"
  vpc_id            = var.network_config.vpc_id
  subnet_id         = var.network_config.web_app_config.subnet_id
}

resource "ucloud_instance" "mysql" {
  availability_zone = var.network_config.db_config.az
  image_id          = data.ucloud_images.mysql.images[0].id
  instance_type     = "n-basic-2"
  vpc_id            = var.network_config.vpc_id
  subnet_id         = var.network_config.db_config.subnet_id
}
在代码中我们把依赖的网络参数定义为一个复杂类型,一个强类型对象结构。这样的话模块代码就不用再关注网络层究竟是查询而来的还是创建的,模块中只定义了抽象的网络层定义,其具体实现由调用者从外部注入,从而实现了依赖反转。
如果调用者需要创建网络层,那么代码可以是这样的(假设我们把前面编写的模块保存在 ./machine 目录下而成为一个内嵌模块):
resource "ucloud_vpc" "vpc" {
  cidr_blocks = ["192.168.0.0/16"]
}

resource "ucloud_subnet" "dmz" {
  cidr_block = "192.168.0.0/24"
  vpc_id     = ucloud_vpc.vpc.id
}

resource "ucloud_subnet" "db" {
  cidr_block = "192.168.1.0/24"
  vpc_id     = ucloud_vpc.vpc.id
}

module "machine" {
  source = "./machine"
  network_config = {
    vpc_id = ucloud_vpc.vpc.id
    web_app_config = {
      az        = "cn-bj2-02"
      subnet_id = ucloud_subnet.dmz.id
    }
    db_config = {
      az        = "cn-bj2-02"
      subnet_id = ucloud_subnet.db.id
    }
  }
}
或者我们想使用现存的网络来托管服务器:
data "ucloud_vpcs" "vpc" {
  name_regex = "^AVeryImportantVpc"
}

data "ucloud_subnets" "dmz_subnet" {
  vpc_id     = data.ucloud_vpcs.vpc.vpcs[0].id
  name_regex = "^DMZ"
}

data "ucloud_subnets" "db_subnet" {
  vpc_id     = data.ucloud_vpcs.vpc.vpcs[0].id
  name_regex = "^DataBase"
}

module "machine" {
  source = "./machine"
  network_config = {
    vpc_id = data.ucloud_vpcs.vpc.vpcs[0].id
    web_app_config = {
      az        = "cn-bj2-02"
      subnet_id = data.ucloud_subnets.dmz_subnet.subnets[0].id
    }
    db_config = {
      az        = "cn-bj2-02"
      subnet_id = data.ucloud_subnets.db_subnet.subnets[0].id
    }
  }
}
由于模块代码中对网络层的定义是抽象的,并没有指定必须是 resource 或是 data,所以模块的调用者可以自己决定如何构造模块的依赖层,再作为参数注入模块。
1.9.3.1. 多可用区分布
这是一个相当常见的小技巧。多数公有云为了高可用性,都在单一区域内提供了多可用区的设计。一个可用区是一个逻辑上的数据中心,单个可用区可能由于各种自然灾害、网络故障而导致不可用,所以在公有云上部署高可用应用时应时刻考虑跨可用区设计。
假如我们想要创建 N 台不同的云主机实例,在 Terraform 0.12 之前的版本中,我们只能用 count 配合取模运算来达成这个目的:
variable "az" {
  type = list(string)
  default = [
    "cn-bj2-03",
    "cn-bj2-04",
  ]
}

variable "instance_count" {
  type    = number
  default = 4
}

data "ucloud_images" "centos" {
  name_regex = "^CentOS 7"
}

resource "ucloud_instance" "web" {
  count             = var.instance_count
  availability_zone = var.az[count.index % length(var.az)]
  image_id          = data.ucloud_images.centos.images[0].id
  instance_type     = "n-standard-1"
  charge_type       = "dynamic"
  name              = "${var.az[count.index % length(var.az)]}-${floor(count.index / length(var.az))}"
}
简单来说,就是在使用 count 创建多实例资源时,用 var.az[count.index % length(var.az)] 循环使用每个可用区,使得机器尽可能均匀分布在各个可用区。
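以 4 台主机、2 个可用区的默认值为例,count.index 与可用区、主机名的对应关系大致如下(按上述表达式推算的示意):

count.index = 0  ->  az = cn-bj2-03,name = cn-bj2-03-0
count.index = 1  ->  az = cn-bj2-04,name = cn-bj2-04-0
count.index = 2  ->  az = cn-bj2-03,name = cn-bj2-03-1
count.index = 3  ->  az = cn-bj2-04,name = cn-bj2-04-1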
$ terraform apply -auto-approve
data.ucloud_images.centos: Refreshing state...
ucloud_instance.web[2]: Creating...
ucloud_instance.web[0]: Creating...
ucloud_instance.web[1]: Creating...
ucloud_instance.web[3]: Creating...
ucloud_instance.web[2]: Still creating... [10s elapsed]
ucloud_instance.web[0]: Still creating... [10s elapsed]
ucloud_instance.web[1]: Still creating... [10s elapsed]
ucloud_instance.web[3]: Still creating... [10s elapsed]
ucloud_instance.web[2]: Still creating... [20s elapsed]
ucloud_instance.web[0]: Still creating... [20s elapsed]
ucloud_instance.web[1]: Still creating... [20s elapsed]
ucloud_instance.web[3]: Still creating... [20s elapsed]
ucloud_instance.web[2]: Creation complete after 22s [id=uhost-txa2owrp]
ucloud_instance.web[3]: Creation complete after 24s [id=uhost-v3qxdbju]
ucloud_instance.web[1]: Creation complete after 26s [id=uhost-td3x545p]
ucloud_instance.web[0]: Still creating... [30s elapsed]
ucloud_instance.web[0]: Still creating... [40s elapsed]
ucloud_instance.web[0]: Creation complete after 43s [id=uhost-scq1prqj]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
我们可以看一下创建的主机信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 $ terraform show data "ucloud_images" "centos" { id = "475496684" ids = [ "uimage-22noyd" , "uimage-3p0wg0" , "uimage-4keil1" , "uimage-aqvo5l" , "uimage-f1chxn" , "uimage-hq5elw" , "uimage-rkn1v2" , ] images = [ { availability_zone = "cn-bj2-02" create_time = "2019-04-23T17:39:46+08:00" description = "" features = [ "NetEnhanced" , "HotPlug" , ] id = "uimage-rkn1v2" name = "CentOS 7.0 64位" os_name = "CentOS 7.0 64位" os_type = "linux" size = 20 status = "Available" type = "base" }, { availability_zone = "cn-bj2-02" create_time = "2019-04-16T21:05:03+08:00" description = "" features = [ "NetEnhanced" , "HotPlug" , ] id = "uimage-f1chxn" name = "CentOS 7.2 64位" os_name = "CentOS 7.2 64位" os_type = "linux" size = 20 status = "Available" type = "base" }, { availability_zone = "cn-bj2-02" create_time = "2019-09-09T11:40:31+08:00" description = " " features = [ "NetEnhanced" , "HotPlug" , ] id = "uimage-aqvo5l" name = "CentOS 7.4 64位" os_name = "CentOS 7.4 64位" os_type = "linux" size = 20 status = "Available" type = "base" }, { availability_zone = "cn-bj2-02" create_time = "2020-05-07T17:40:42+08:00" description = "" features = [ "NetEnhanced" , "HotPlug" , "CloudInit" , ] id = "uimage-hq5elw" name = "CentOS 7.6 64位" os_name = "CentOS 7.6 64位" os_type = "linux" size = 20 status = "Available" type = "base" }, { availability_zone = "cn-bj2-02" create_time = "2019-04-16T21:05:05+08:00" description = "" features = [ "NetEnhanced" , "HotPlug" , ] id = "uimage-3p0wg0" name = "CentOS 7.3 64位" os_name = "CentOS 7.3 64位" os_type = "linux" size = 20 status = "Available" type = "base" }, { availability_zone = "cn-bj2-02" create_time = "2019-04-16T21:05:02+08:00" description = "" features = [ "NetEnhanced" , "HotPlug" , ] id = "uimage-4keil1" name = "CentOS 7.1 64位" os_name = "CentOS 7.1 64位" os_type = "linux" size = 20 status = "Available" type = "base" }, { availability_zone = "cn-bj2-02" create_time = "2019-04-16T21:04:53+08:00" description = "" features = [ "NetEnhanced" , "HotPlug" , ] id = "uimage-22noyd" name = "CentOS 7.5 64位" os_name = "CentOS 7.5 64位" os_type = "linux" size = 20 status = "Available" type = "base" }, ] most_recent = false name_regex = "^CentOS 7" total_count = 7 } resource "ucloud_instance" "web" { auto_renew = true availability_zone = "cn-bj2-04" boot_disk_size = 20 boot_disk_type = "local_normal" charge_type = "dynamic" cpu = 1 cpu_platform = "Intel/Broadwell" create_time = "2020-11-28T23:09:04+08:00" disk_set = [ { id = 
"df06380a-00e1-42df-8c07-eec67d817f97" is_boot = true size = 20 type = "local_normal" }, ] expire_time = "2020-11-29T00:09:06+08:00" id = "uhost-td3x545p" image_id = "uimage-dhe5m2" instance_type = "n-standard-1" ip_set = [ { internet_type = "Private" ip = "10.9.44.37" }, ] memory = 4 name = "cn-bj2-04-0" private_ip = "10.9.44.37" root_password = (sensitive value) security_group = "firewall-juhsrlvr" status = "Running" subnet_id = "subnet-dtu3dgpr" tag = "Default" vpc_id = "uvnet-f1c3jq2b" } resource "ucloud_instance" "web" { auto_renew = true availability_zone = "cn-bj2-03" boot_disk_size = 20 boot_disk_type = "local_normal" charge_type = "dynamic" cpu = 1 cpu_platform = "Intel/IvyBridge" create_time = "2020-11-28T23:09:01+08:00" disk_set = [ { id = "1d7f07c9-7342-431b-85bb-d3ee0022063d" is_boot = true size = 20 type = "local_normal" }, ] expire_time = "2020-11-29T00:09:02+08:00" id = "uhost-txa2owrp" image_id = "uimage-pxplaj" instance_type = "n-standard-1" ip_set = [ { internet_type = "Private" ip = "10.9.45.234" }, ] memory = 4 name = "cn-bj2-03-1" private_ip = "10.9.45.234" root_password = (sensitive value) security_group = "firewall-juhsrlvr" status = "Running" subnet_id = "subnet-dtu3dgpr" tag = "Default" vpc_id = "uvnet-f1c3jq2b" } resource "ucloud_instance" "web" { auto_renew = true availability_zone = "cn-bj2-04" boot_disk_size = 20 boot_disk_type = "local_normal" charge_type = "dynamic" cpu = 1 cpu_platform = "Intel/Broadwell" create_time = "2020-11-28T23:09:04+08:00" disk_set = [ { id = "31e2cad6-79a1-4475-a9f5-2c5c95605b18" is_boot = true size = 20 type = "local_normal" }, ] expire_time = "2020-11-29T00:09:04+08:00" id = "uhost-v3qxdbju" image_id = "uimage-dhe5m2" instance_type = "n-standard-1" ip_set = [ { internet_type = "Private" ip = "10.9.85.40" }, ] memory = 4 name = "cn-bj2-04-1" private_ip = "10.9.85.40" root_password = (sensitive value) security_group = "firewall-juhsrlvr" status = "Running" subnet_id = "subnet-dtu3dgpr" tag = "Default" vpc_id = "uvnet-f1c3jq2b" } resource "ucloud_instance" "web" { auto_renew = true availability_zone = "cn-bj2-03" boot_disk_size = 20 boot_disk_type = "local_normal" charge_type = "dynamic" cpu = 1 cpu_platform = "Intel/IvyBridge" create_time = "2020-11-28T23:09:04+08:00" disk_set = [ { id = "da27595d-9645-4883-bf95-87b9076ab7e4" is_boot = true size = 20 type = "local_normal" }, ] expire_time = "2020-11-29T00:09:04+08:00" id = "uhost-scq1prqj" image_id = "uimage-pxplaj" instance_type = "n-standard-1" ip_set = [ { internet_type = "Private" ip = "10.9.107.152" }, ] memory = 4 name = "cn-bj2-03-0" private_ip = "10.9.107.152" root_password = (sensitive value) security_group = "firewall-juhsrlvr" status = "Running" subnet_id = "subnet-dtu3dgpr" tag = "Default" vpc_id = "uvnet-f1c3jq2b" }
可以看到,主机的确是均匀地分散在两个可用区了。
但是这样做在调整可用区时会发生大问题,例如:
variable "az" {
  type = list(string)
  default = [
    "cn-bj2-03",
    # "cn-bj2-04",
  ]
}
我们禁用了 cn-bj2-04 可用区,按道理我们期待的变更计划应该是将两台原本属于 cn-bj2-04 的主机删除,在 cn-bj2-03 可用区新增两台主机。让我们看看会发生什么:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 $ terraform plan Refreshing Terraform state in-memory prior to plan... The refreshed state will be used to calculate this plan, but will not be persisted to local or remote state storage. data.ucloud_images.centos: Refreshing state... [id =475496684] ucloud_instance.web[0]: Refreshing state... [id =uhost-scq1prqj] ucloud_instance.web[3]: Refreshing state... [id =uhost-v3qxdbju] ucloud_instance.web[2]: Refreshing state... [id =uhost-txa2owrp] ucloud_instance.web[1]: Refreshing state... [id =uhost-td3x545p] ------------------------------------------------------------------------ An execution plan has been generated and is shown below. Resource actions are indicated with the following symbols: ~ update in-place -/+ destroy and then create replacement Terraform will perform the following actions: -/+ resource "ucloud_instance" "web" { ~ auto_renew = true -> (known after apply) ~ availability_zone = "cn-bj2-04" -> "cn-bj2-03" ~ boot_disk_size = 20 -> (known after apply) ~ boot_disk_type = "local_normal" -> (known after apply) charge_type = "dynamic" ~ cpu = 1 -> (known after apply) ~ cpu_platform = "Intel/Broadwell" -> (known after apply) ~ create_time = "2020-11-28T23:09:04+08:00" -> (known after apply) + data_disk_size = (known after apply) + data_disk_type = (known after apply) ~ disk_set = [ - { - id = "df06380a-00e1-42df-8c07-eec67d817f97" - is_boot = true - size = 20 - type = "local_normal" }, ] -> (known after apply) ~ expire_time = "2020-11-29T00:09:06+08:00" -> (known after apply) ~ id = "uhost-td3x545p" -> (known after apply) ~ image_id = "uimage-dhe5m2" -> "uimage-rkn1v2" instance_type = "n-standard-1" ~ ip_set = [ - { - internet_type = "Private" - ip = "10.9.44.37" }, ] -> (known after apply) + isolation_group = (known after apply) ~ memory = 4 -> (known after apply) ~ name = "cn-bj2-04-0" -> "cn-bj2-03-1" ~ private_ip = "10.9.44.37" -> (known after apply) + remark = (known after apply) ~ root_password = (sensitive value) ~ security_group = "firewall-juhsrlvr" -> (known after apply) ~ status = "Running" -> (known after apply) ~ subnet_id = "subnet-dtu3dgpr" -> (known after apply) tag = "Default" ~ vpc_id = "uvnet-f1c3jq2b" -> (known after apply) } ~ resource "ucloud_instance" "web" { auto_renew = true availability_zone = "cn-bj2-03" boot_disk_size = 20 boot_disk_type = "local_normal" charge_type = "dynamic" cpu = 1 cpu_platform = "Intel/IvyBridge" create_time = "2020-11-28T23:09:01+08:00" disk_set = [ { id = "1d7f07c9-7342-431b-85bb-d3ee0022063d" is_boot = true size = 20 type = "local_normal" }, ] expire_time = "2020-11-29T00:09:02+08:00" id = "uhost-txa2owrp" image_id = "uimage-pxplaj" instance_type = "n-standard-1" ip_set = [ { internet_type = "Private" ip = "10.9.45.234" }, ] memory = 4 ~ name = "cn-bj2-03-1" -> "cn-bj2-03-2" private_ip = "10.9.45.234" root_password = (sensitive value) security_group = "firewall-juhsrlvr" status = "Running" subnet_id = "subnet-dtu3dgpr" tag = "Default" vpc_id = "uvnet-f1c3jq2b" } -/+ resource "ucloud_instance" "web" { ~ auto_renew = true -> (known after 
apply) ~ availability_zone = "cn-bj2-04" -> "cn-bj2-03" ~ boot_disk_size = 20 -> (known after apply) ~ boot_disk_type = "local_normal" -> (known after apply) charge_type = "dynamic" ~ cpu = 1 -> (known after apply) ~ cpu_platform = "Intel/Broadwell" -> (known after apply) ~ create_time = "2020-11-28T23:09:04+08:00" -> (known after apply) + data_disk_size = (known after apply) + data_disk_type = (known after apply) ~ disk_set = [ - { - id = "31e2cad6-79a1-4475-a9f5-2c5c95605b18" - is_boot = true - size = 20 - type = "local_normal" }, ] -> (known after apply) ~ expire_time = "2020-11-29T00:09:04+08:00" -> (known after apply) ~ id = "uhost-v3qxdbju" -> (known after apply) ~ image_id = "uimage-dhe5m2" -> "uimage-rkn1v2" instance_type = "n-standard-1" ~ ip_set = [ - { - internet_type = "Private" - ip = "10.9.85.40" }, ] -> (known after apply) + isolation_group = (known after apply) ~ memory = 4 -> (known after apply) ~ name = "cn-bj2-04-1" -> "cn-bj2-03-3" ~ private_ip = "10.9.85.40" -> (known after apply) + remark = (known after apply) ~ root_password = (sensitive value) ~ security_group = "firewall-juhsrlvr" -> (known after apply) ~ status = "Running" -> (known after apply) ~ subnet_id = "subnet-dtu3dgpr" -> (known after apply) tag = "Default" ~ vpc_id = "uvnet-f1c3jq2b" -> (known after apply) } Plan: 2 to add, 1 to change, 2 to destroy. ------------------------------------------------------------------------ Note: You didn't specify an "-out" parameter to save this plan, so Terraform can' t guarantee that exactly these actions will be performed if "terraform apply" is subsequently run.
变更计划与期望略有不同。我们仔细看细节:
  ~ resource "ucloud_instance" "web" {
        auto_renew        = true
        availability_zone = "cn-bj2-03"
        boot_disk_size    = 20
        boot_disk_type    = "local_normal"
        charge_type       = "dynamic"
        cpu               = 1
        cpu_platform      = "Intel/IvyBridge"
        create_time       = "2020-11-28T23:09:01+08:00"
        disk_set          = [
            {
                id      = "1d7f07c9-7342-431b-85bb-d3ee0022063d"
                is_boot = true
                size    = 20
                type    = "local_normal"
            },
        ]
        expire_time       = "2020-11-29T00:09:02+08:00"
        id                = "uhost-txa2owrp"
        image_id          = "uimage-pxplaj"
        instance_type     = "n-standard-1"
        ip_set            = [
            {
                internet_type = "Private"
                ip            = "10.9.45.234"
            },
        ]
        memory            = 4
      ~ name              = "cn-bj2-03-1" -> "cn-bj2-03-2"
        private_ip        = "10.9.45.234"
        root_password     = (sensitive value)
        security_group    = "firewall-juhsrlvr"
        status            = "Running"
        subnet_id         = "subnet-dtu3dgpr"
        tag               = "Default"
        vpc_id            = "uvnet-f1c3jq2b"
    }
原本名为 cn-bj2-03-1 的主机被更名为 cn-bj2-03-2 了;而原本属于 cn-bj2-04 的第一台主机的变更计划是:
-/+ resource "ucloud_instance" "web" {
      ~ auto_renew        = true -> (known after apply)
      ~ availability_zone = "cn-bj2-04" -> "cn-bj2-03"
      ~ boot_disk_size    = 20 -> (known after apply)
      ~ boot_disk_type    = "local_normal" -> (known after apply)
        charge_type       = "dynamic"
      ~ cpu               = 1 -> (known after apply)
      ~ cpu_platform      = "Intel/Broadwell" -> (known after apply)
      ~ create_time       = "2020-11-28T23:09:04+08:00" -> (known after apply)
      + data_disk_size    = (known after apply)
      + data_disk_type    = (known after apply)
      ~ disk_set          = [
          - {
              - id      = "df06380a-00e1-42df-8c07-eec67d817f97"
              - is_boot = true
              - size    = 20
              - type    = "local_normal"
            },
        ] -> (known after apply)
      ~ expire_time       = "2020-11-29T00:09:06+08:00" -> (known after apply)
      ~ id                = "uhost-td3x545p" -> (known after apply)
      ~ image_id          = "uimage-dhe5m2" -> "uimage-rkn1v2"
        instance_type     = "n-standard-1"
      ~ ip_set            = [
          - {
              - internet_type = "Private"
              - ip            = "10.9.44.37"
            },
        ] -> (known after apply)
      + isolation_group   = (known after apply)
      ~ memory            = 4 -> (known after apply)
      ~ name              = "cn-bj2-04-0" -> "cn-bj2-03-1"
      ~ private_ip        = "10.9.44.37" -> (known after apply)
      + remark            = (known after apply)
      ~ root_password     = (sensitive value)
      ~ security_group    = "firewall-juhsrlvr" -> (known after apply)
      ~ status            = "Running" -> (known after apply)
      ~ subnet_id         = "subnet-dtu3dgpr" -> (known after apply)
        tag               = "Default"
      ~ vpc_id            = "uvnet-f1c3jq2b" -> (known after apply)
    }
它的名字从 cn-bj2-04-0 变成了 cn-bj2-03-1。
仔细想想,这实际上是一个比较低效的变更计划。原本属于 cn-bj2-03 的两台主机应该不做任何变更,只需要删除 cn-bj2-04 的主机,再补充两台 cn-bj2-03 的主机即可。出现这种结果是因为我们使用的是 count,而 count 只看元素在列表中的序号。当我们删除一个可用区时,实际上会引起主机序号的重大变化,导致出现大量低效的变更。这就是我们在讲 count 与 for_each 时强调过的:如果创建的资源实例彼此之间几乎完全一致,那么用 count 比较合适;否则,使用 for_each 会更加安全。
让我们尝试使用 for_each 改写这段逻辑:
variable "az" {
  type = list(string)
  default = [
    "cn-bj2-03",
    "cn-bj2-04",
  ]
}

variable "instance_count" {
  type    = number
  default = 4
}

locals {
  instance_names = [for i in range(var.instance_count) : "${var.az[i % length(var.az)]}-${floor(i / length(var.az))}"]
}

data "ucloud_images" "centos" {
  name_regex = "^CentOS 7"
}

resource "ucloud_instance" "web" {
  for_each          = toset(local.instance_names)
  name              = each.value
  availability_zone = var.az[index(local.instance_names, each.value) % length(var.az)]
  image_id          = data.ucloud_images.centos.images[0].id
  instance_type     = "n-standard-1"
  charge_type       = "dynamic"
}
为了生成主机独一无二的名字,我们首先用 range 函数生成了一个序号集合,比如目标主机数是 4,那么 range(4) 的结果就是 [0, 1, 2, 3];然后我们通过取模运算使得名字前缀在可用区列表之间循环递增,最后用 floor(i/length(var.az)) 计算出当前序号对应的是所在可用区内的第几台。例如 4 号主机(序号为 3)落在第二个可用区,是该可用区的第二台,生成的名字就是 cn-bj2-04-1。
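如果想验证这个表达式,可以在 terraform console 中直接对它求值,下面的输出是按上述默认值推算的示意:

> [for i in range(4) : "${var.az[i % length(var.az)]}-${floor(i / length(var.az))}"]
[
  "cn-bj2-03-0",
  "cn-bj2-04-0",
  "cn-bj2-03-1",
  "cn-bj2-04-1",
]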
执行结果是:
$ terraform apply -auto-approve
data.ucloud_images.centos: Refreshing state...
ucloud_instance.web["cn-bj2-03-1"]: Creating...
ucloud_instance.web["cn-bj2-03-0"]: Creating...
ucloud_instance.web["cn-bj2-04-0"]: Creating...
ucloud_instance.web["cn-bj2-04-1"]: Creating...
ucloud_instance.web["cn-bj2-03-1"]: Still creating... [10s elapsed]
ucloud_instance.web["cn-bj2-03-0"]: Still creating... [10s elapsed]
ucloud_instance.web["cn-bj2-04-0"]: Still creating... [10s elapsed]
ucloud_instance.web["cn-bj2-04-1"]: Still creating... [10s elapsed]
ucloud_instance.web["cn-bj2-03-1"]: Still creating... [20s elapsed]
ucloud_instance.web["cn-bj2-03-0"]: Still creating... [20s elapsed]
ucloud_instance.web["cn-bj2-04-0"]: Still creating... [20s elapsed]
ucloud_instance.web["cn-bj2-04-1"]: Still creating... [20s elapsed]
ucloud_instance.web["cn-bj2-04-1"]: Creation complete after 21s [id=uhost-fjci1i4o]
ucloud_instance.web["cn-bj2-04-0"]: Creation complete after 23s [id=uhost-bkkhmref]
ucloud_instance.web["cn-bj2-03-1"]: Creation complete after 26s [id=uhost-amosgdaa]
ucloud_instance.web["cn-bj2-03-0"]: Still creating... [30s elapsed]
ucloud_instance.web["cn-bj2-03-0"]: Still creating... [40s elapsed]
ucloud_instance.web["cn-bj2-03-0"]: Creation complete after 45s [id=uhost-kltudgnf]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
如果我们去掉一个可用区:
variable "az" {
  type = list(string)
  default = [
    "cn-bj2-03",
    # "cn-bj2-04",
  ]
}
我们可以检查一下执行计划:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 $ terraform plan Refreshing Terraform state in-memory prior to plan... The refreshed state will be used to calculate this plan, but will not be persisted to local or remote state storage. data.ucloud_images.centos: Refreshing state... [id =475496684] ucloud_instance.web["cn-bj2-03-1" ]: Refreshing state... [id =uhost-amosgdaa] ucloud_instance.web["cn-bj2-04-0" ]: Refreshing state... [id =uhost-bkkhmref] ucloud_instance.web["cn-bj2-03-0" ]: Refreshing state... [id =uhost-kltudgnf] ucloud_instance.web["cn-bj2-04-1" ]: Refreshing state... [id =uhost-fjci1i4o] ------------------------------------------------------------------------ An execution plan has been generated and is shown below. Resource actions are indicated with the following symbols: + create - destroy Terraform will perform the following actions: + resource "ucloud_instance" "web" { + auto_renew = (known after apply) + availability_zone = "cn-bj2-03" + boot_disk_size = (known after apply) + boot_disk_type = (known after apply) + charge_type = "dynamic" + cpu = (known after apply) + cpu_platform = (known after apply) + create_time = (known after apply) + data_disk_size = (known after apply) + data_disk_type = (known after apply) + disk_set = (known after apply) + expire_time = (known after apply) + id = (known after apply) + image_id = "uimage-rkn1v2" + instance_type = "n-standard-1" + ip_set = (known after apply) + isolation_group = (known after apply) + memory = (known after apply) + name = "cn-bj2-03-2" + private_ip = (known after apply) + remark = (known after apply) + root_password = (sensitive value) + security_group = (known after apply) + status = (known after apply) + subnet_id = (known after apply) + tag = "Default" + vpc_id = (known after apply) } + resource "ucloud_instance" "web" { + auto_renew = (known after apply) + availability_zone = "cn-bj2-03" + boot_disk_size = (known after apply) + boot_disk_type = (known after apply) + charge_type = "dynamic" + cpu = (known after apply) + cpu_platform = (known after apply) + create_time = (known after apply) + data_disk_size = (known after apply) + data_disk_type = (known after apply) + disk_set = (known after apply) + expire_time = (known after apply) + id = (known after apply) + image_id = "uimage-rkn1v2" + instance_type = "n-standard-1" + ip_set = (known after apply) + isolation_group = (known after apply) + memory = (known after apply) + name = "cn-bj2-03-3" + private_ip = (known after apply) + remark = (known after apply) + root_password = (sensitive value) + security_group = (known after apply) + status = (known after apply) + subnet_id = (known after apply) + tag = "Default" + vpc_id = (known after apply) } - resource "ucloud_instance" "web" { - auto_renew = true -> null - availability_zone = "cn-bj2-04" -> null - boot_disk_size = 20 -> null - boot_disk_type = "local_normal" -> null - charge_type = "dynamic" -> null - cpu = 1 -> null - cpu_platform = "Intel/Broadwell" -> null - create_time = "2020-11-28T22:35:53+08:00" 
-> null - disk_set = [ - { - id = "b214d840-ffec-4958-a3da-3580846fd2a3" - is_boot = true - size = 20 - type = "local_normal" }, ] -> null - expire_time = "2020-11-28T23:35:53+08:00" -> null - id = "uhost-bkkhmref" -> null - image_id = "uimage-dhe5m2" -> null - instance_type = "n-standard-1" -> null - ip_set = [ - { - internet_type = "Private" - ip = "10.9.48.82" }, ] -> null - memory = 4 -> null - name = "cn-bj2-04-0" -> null - private_ip = "10.9.48.82" -> null - root_password = (sensitive value) - security_group = "firewall-juhsrlvr" -> null - status = "Running" -> null - subnet_id = "subnet-dtu3dgpr" -> null - tag = "Default" -> null - vpc_id = "uvnet-f1c3jq2b" -> null } - resource "ucloud_instance" "web" { - auto_renew = true -> null - availability_zone = "cn-bj2-04" -> null - boot_disk_size = 20 -> null - boot_disk_type = "local_normal" -> null - charge_type = "dynamic" -> null - cpu = 1 -> null - cpu_platform = "Intel/Broadwell" -> null - create_time = "2020-11-28T22:35:53+08:00" -> null - disk_set = [ - { - id = "6a3f274f-e072-4a46-90f8-edc7dbaa27f7" - is_boot = true - size = 20 - type = "local_normal" }, ] -> null - expire_time = "2020-11-28T23:35:53+08:00" -> null - id = "uhost-fjci1i4o" -> null - image_id = "uimage-dhe5m2" -> null - instance_type = "n-standard-1" -> null - ip_set = [ - { - internet_type = "Private" - ip = "10.9.176.28" }, ] -> null - memory = 4 -> null - name = "cn-bj2-04-1" -> null - private_ip = "10.9.176.28" -> null - root_password = (sensitive value) - security_group = "firewall-juhsrlvr" -> null - status = "Running" -> null - subnet_id = "subnet-dtu3dgpr" -> null - tag = "Default" -> null - vpc_id = "uvnet-f1c3jq2b" -> null } Plan: 2 to add, 0 to change, 2 to destroy. ------------------------------------------------------------------------ Note: You didn't specify an "-out" parameter to save this plan, so Terraform can' t guarantee that exactly these actions will be performed if "terraform apply" is subsequently run.
可以看到,原来属于 cn-bj2-03 的两台主机原封不动,Terraform 删除了属于 cn-bj2-04 的两台主机,并在 cn-bj2-03 可用区新增两台主机。
1.9.4.1. provisioner 与 user_data
我们在介绍资源时介绍了预置器 provisioner。同时,不少公有云厂商的虚拟机都提供了 cloud-init 功能,可以让我们在虚拟机实例第一次启动时执行一段自定义的脚本来完成一些初始化操作。例如我们在《Terraform 初步体验》一章里举的例子,在 UCloud 主机第一次启动时我们通过 user_data 调用 yum 安装并配置了 nginx 服务。预置器与 cloud-init 都可以用于初始化虚拟机,那么我们应该用哪一种呢?
首先要指出的是,provisioner 的官方文档里明确指出,由于预置器内部的行为 Terraform 无法感知,无法将它执行的变更纳入到声明式的代码管理中,所以预置器应被作为最后的手段使用。也就是说,如果 cloud-init 能够满足我们的要求,那么我们应该优先使用 cloud-init。
但是仍然存在一些 cloud-init 无法满足的场景。一个最常见的情况是,我们要在 cloud-init 当中格式化卷,后续的所有操作都必须在主机成功格式化并挂载卷之后才能顺利进行。但是以 aws_instance 为例,它的创建并不会等待 user_data 代码执行完成,只要虚拟机创建成功开始启动,Terraform 就会认为资源创建完成,从而继续后续的创建了。
解决这个问题目前来看还是只能依靠预置器。我们以一段 UCloud 云主机代码为例:
resource "ucloud_instance" "web" {
  availability_zone = "cn-bj2-03"
  image_id          = data.ucloud_images.centos.images[0].id
  instance_type     = "n-standard-1"
  charge_type       = "dynamic"

  network_interface {
    eip_internet_type = "bgp"
    eip_charge_mode   = "traffic"
    eip_bandwidth     = 1
  }

  delete_eips_with_instance = true
  root_password             = var.root_password

  provisioner "remote-exec" {
    connection {
      type     = "ssh"
      host     = [for ipset in self.ip_set : ipset.ip if ipset.internet_type == "BGP"][0]
      user     = "root"
      password = var.root_password
      timeout  = "1h"
    }
    inline = [
      "sleep 1h"
    ]
  }
}
我们在资源声明中附加了一个 remote-exec 类型的预置器,它的 host 取值使用了 self.ip_set。self 在当前上下文中指代 provisioner 所属的 ucloud_instance.web,ip_set 是 ucloud_instance 的一个输出属性,内含云主机的内网 IP 以及绑定的弹性公网 IP 信息。我们用一个 for 表达式过滤出弹性公网 IP 地址,然后使用 ssh 连接。预置器执行的脚本代码很简单,就是休眠一小时。如果我们执行这段代码:
$ terraform apply -auto-approve
data.ucloud_images.centos: Refreshing state...
ucloud_instance.web: Creating...
ucloud_instance.web: Still creating... [10s elapsed]
ucloud_instance.web: Still creating... [20s elapsed]
ucloud_instance.web: Provisioning with 'remote-exec'...
ucloud_instance.web (remote-exec): Connecting to remote host via SSH...
ucloud_instance.web (remote-exec):   Host: 106.75.87.148
ucloud_instance.web (remote-exec):   User: root
ucloud_instance.web (remote-exec):   Password: true
ucloud_instance.web (remote-exec):   Private key: false
ucloud_instance.web (remote-exec):   Certificate: false
ucloud_instance.web (remote-exec):   SSH Agent: true
ucloud_instance.web (remote-exec):   Checking Host Key: false
ucloud_instance.web: Still creating... [30s elapsed]
ucloud_instance.web (remote-exec): Connecting to remote host via SSH...
ucloud_instance.web (remote-exec):   Host: 106.75.87.148
ucloud_instance.web (remote-exec):   User: root
ucloud_instance.web (remote-exec):   Password: true
ucloud_instance.web (remote-exec):   Private key: false
ucloud_instance.web (remote-exec):   Certificate: false
ucloud_instance.web (remote-exec):   SSH Agent: true
ucloud_instance.web (remote-exec):   Checking Host Key: false
ucloud_instance.web: Still creating... [40s elapsed]
ucloud_instance.web (remote-exec): Connecting to remote host via SSH...
ucloud_instance.web (remote-exec):   Host: 106.75.87.148
ucloud_instance.web (remote-exec):   User: root
ucloud_instance.web (remote-exec):   Password: true
ucloud_instance.web (remote-exec):   Private key: false
ucloud_instance.web (remote-exec):   Certificate: false
ucloud_instance.web (remote-exec):   SSH Agent: true
ucloud_instance.web (remote-exec):   Checking Host Key: false
ucloud_instance.web (remote-exec): Connected!
ucloud_instance.web: Still creating... [50s elapsed]
ucloud_instance.web: Still creating... [1m0s elapsed]
ucloud_instance.web: Still creating... [1m10s elapsed]
ucloud_instance.web: Still creating... [1m20s elapsed]
ucloud_instance.web: Still creating... [1m30s elapsed]
ucloud_instance.web: Still creating... [1m40s elapsed]
...
不出所料的话,该过程会持续一小时。也就是说,无论预置器脚本中执行的操作耗时多长,ucloud_instance 的创建都会等待它完成,或是触发超时。
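实际场景中,inline 里执行的当然不会是 sleep,而更可能是等待初始化真正完成的命令。下面是一个示意(假设镜像内置了 cloud-init,cloud-init status --wait 会一直阻塞到所有初始化模块执行结束):

  provisioner "remote-exec" {
    connection {
      type     = "ssh"
      host     = [for ipset in self.ip_set : ipset.ip if ipset.internet_type == "BGP"][0]
      user     = "root"
      password = var.root_password
    }
    inline = [
      # 阻塞等待 cloud-init(也就是 user_data)执行完毕后再返回
      "cloud-init status --wait"
    ]
  }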
我们能在这里使用这种方法的前提是,UCloud 云主机的资源定义允许我们在定义资源时声明 network_interface 属性,直接绑定一个公网 IP。如果我们使用的云厂商 Provider 无法让我们在创建主机时绑定公网 IP,而是必须事后绑定弹性 IP 呢?又或者,初始化脚本必须在云主机成功挂载了云盘之后才能成功运行呢?这种情况下我们还有最后的武器,就是 null_resource。
null_resource 可能是 Terraform 体系中最“不 Terraform”的存在,它就是我们用来在 Terraform 这样一个声明式世界里干各种命令式脏活的工具。null_resource 本身是一个空的 resource,只有一个名为 triggers 的参数,以及作为输出属性的 id。
我们看下这个例子:
data "ucloud_images" "centos" {
  name_regex = "^CentOS 7"
}

resource "ucloud_eip" "eip" {
  internet_type = "bgp"
  bandwidth     = 1
  charge_mode   = "traffic"
}

resource "ucloud_disk" "data_disk" {
  availability_zone = "cn-bj2-03"
  disk_size         = 10
  charge_type       = "dynamic"
  disk_type         = "data_disk"
}

resource "ucloud_instance" "web" {
  availability_zone = "cn-bj2-03"
  image_id          = data.ucloud_images.centos.images[0].id
  instance_type     = "n-standard-1"
  charge_type       = "dynamic"
  root_password     = var.root_password
}

resource "ucloud_eip_association" "eip_association" {
  eip_id      = ucloud_eip.eip.id
  resource_id = ucloud_instance.web.id
}

resource "ucloud_disk_attachment" "data_disk" {
  availability_zone = "cn-bj2-03"
  disk_id           = ucloud_disk.data_disk.id
  instance_id       = ucloud_instance.web.id
}

resource "null_resource" "web_init" {
  depends_on = [
    ucloud_eip_association.eip_association,
    ucloud_disk_attachment.data_disk
  ]

  provisioner "remote-exec" {
    connection {
      type     = "ssh"
      host     = ucloud_eip.eip.public_ip
      user     = "root"
      password = var.root_password
    }
    inline = [
      "echo hello"
    ]
  }
}
我们假设需要远程执行的操作必须在云盘挂载成功以后才可以运行,那么我们可以声明一个 null_resource,把 provisioner 声明放在那里,并通过显式声明 depends_on 确保它的执行一定在云盘挂载结束以后。
另外,这个例子里我们运行的脚本非常简单。考虑一种更加复杂一些的场景:我们运行的脚本是从文件读取的,我们希望在文件内容发生变化时能够重新在服务器上运行该脚本,这时我们可以使用 null_resource 的 triggers 参数:
resource "null_resource" "web_init" {
  depends_on = [
    ucloud_eip_association.eip_association,
    ucloud_disk_attachment.data_disk
  ]

  triggers = {
    script_hash = filemd5("${path.module}/init.sh")
  }

  provisioner "remote-exec" {
    connection {
      type     = "ssh"
      host     = ucloud_eip.eip.public_ip
      user     = "root"
      password = var.root_password
    }
    script = "${path.module}/init.sh"
  }
}
现在 provisioner 运行的脚本是通过 script 参数传入的脚本文件路径,而我们通过 filemd5 函数把文件内容的哈希值传入了 triggers。triggers 会在值发生改变时触发 null_resource 的重建,这样脚本只要发生些许变化,都会导致重新执行。
官方文档上还给出了对于 triggers 的另一个妙用:
resource "aws_instance" "cluster" {
  count = 3
  # ...
}

resource "null_resource" "cluster" {
  # Changes to any instance of the cluster requires re-provisioning
  triggers = {
    cluster_instance_ids = "${join(",", aws_instance.cluster.*.id)}"
  }

  # Bootstrap script can run on any instance of the cluster
  # So we just choose the first in this case
  connection {
    host = "${element(aws_instance.cluster.*.public_ip, 0)}"
  }

  provisioner "remote-exec" {
    # Bootstrap script called with private_ip of each node in the cluster
    inline = [
      "bootstrap-cluster.sh ${join(" ", aws_instance.cluster.*.private_ip)}",
    ]
  }
}
这个例子里,我们需要所有 AWS 主机的内网 IP 参与才能够成功初始化集群(可能是类似 Kafka 或是 RabbitMQ 这样的应用),需要把集群节点的 IP 写入配置文件。如何确保未来机器数量发生调整以后,机器上的配置文件始终能够获得完整的集群内网 IP 信息?这里使用 triggers 就可以轻松完成目标。
另外,在绝大多数生产环境中,服务器都不允许拥有独立的公网 IP,或是禁止通过对外提供服务的公网 IP 直接连接 ssh。这时一般我们会在集群中配置一台堡垒机,通过堡垒机进行跳转连接。可以阅读官方文档中关于通过堡垒机使用 SSH 的说明获取详细信息,在此不再赘述。
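作为参考,connection 块本身就支持堡垒机相关的参数。如果预置器声明在 ucloud_instance 资源内部,连接部分大致可以写成下面这样(堡垒机地址、用户与密钥路径均为假设值):

  provisioner "remote-exec" {
    connection {
      type     = "ssh"
      host     = self.private_ip
      user     = "root"
      password = var.root_password

      # 通过堡垒机跳转连接目标主机,以下取值仅为示意
      bastion_host        = "bastion.example.com"
      bastion_user        = "jump"
      bastion_private_key = file("~/.ssh/id_rsa")
    }
    inline = ["echo hello"]
  }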
1.9.5.1. destroy-provisioner 中使用变量
我们可以在定义一个 provisioner 块时设置 when 为 destroy,这样资源在销毁之前会首先执行 provisioner,可以帮助我们执行一些析构逻辑。但是如果我们在 Destroy-Time Provisioner 中引用了变量的话,比如这样的代码:
resource "aws_volume_attachment" "attachement_myservice" {
  count       = "${length(var.network_myservice_subnet_ids)}"
  device_name = "/dev/xvdg"
  volume_id   = "${element(aws_ebs_volume.ebs_myservice.*.id, count.index)}"
  instance_id = "${element(aws_instance.myservice.*.id, count.index)}"

  provisioner "local-exec" {
    command = "aws ec2 stop-instances --instance-ids ${element(aws_instance.myservice.*.id, count.index)} --region ${var.region} && sleep 30"
    when    = "destroy"
  }
}
那么我们会看见这样的报错信息:
│ Error: Invalid reference from destroy provisioner
│
│ Destroy-time provisioners and their connection configurations may only reference attributes of the related resource, via 'self', 'count.index', or 'each.key'.
│
│ References to other resources during the destroy phase can cause dependency cycles and interact poorly with create_before_destroy.
从 0.12 开始,Terraform 会对在 Destroy-Time Provisioner 中引用除 self、count.index、each.key 以外的变量发出警告;从 0.13 开始则会直接报错。
1.9.5.1.1. 解决方法
目前官方推荐的做法是把需要引用的变量值通过 triggers “捕获”一下再引用,例如:
resource "null_resource" "foo" {
  triggers = {
    interpreter = var.local_exec_interpreter
  }

  provisioner "local-exec" {
    when        = destroy
    interpreter = self.triggers.interpreter
    ...
  }
}
通过这种方法就可以避免这个问题。
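把这个思路套回到前面 aws_volume_attachment 的例子上,一种可能的改写方式是把析构逻辑挪到一个 null_resource 上,由 triggers 捕获析构时需要的值(以下仅为思路示意,未针对原场景完整验证):

resource "null_resource" "stop_instance_on_destroy" {
  count = length(var.network_myservice_subnet_ids)

  # 确保销毁顺序:先执行本资源的析构逻辑,再销毁对应的卷挂载
  depends_on = [aws_volume_attachment.attachement_myservice]

  # 在创建时把析构阶段需要引用的值“捕获”进 triggers
  triggers = {
    instance_id = element(aws_instance.myservice.*.id, count.index)
    region      = var.region
  }

  provisioner "local-exec" {
    when    = destroy
    command = "aws ec2 stop-instances --instance-ids ${self.triggers.instance_id} --region ${self.triggers.region} && sleep 30"
  }
}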
1.9.6.1. 利用 null_resource 的 triggers 触发其他资源更新
社区有人提了一个 Terraform 问题,他写了这样一段 Terraform 代码:
resource "azurerm_key_vault_secret" "service_bus_connection_string" {
  name         = "service-bus-connection-string"
  value        = azurerm_servicebus_topic_authorization_rule.mysb.primary_connection_string
  key_vault_id = azurerm_key_vault.main.id
}

resource "azurerm_function_app" "main" {
  name                       = "myfn"
  location                   = azurerm_resource_group.main.location
  resource_group_name        = azurerm_resource_group.main.name
  app_service_plan_id        = azurerm_app_service_plan.main.id
  enable_builtin_logging     = true
  https_only                 = true
  os_type                    = "linux"
  storage_account_name       = azurerm_storage_account.main.name
  storage_account_access_key = azurerm_storage_account.main.primary_access_key
  version                    = "~3"

  app_settings = {
    AzureWebJobsServiceBus = "@Microsoft.KeyVault(SecretUri=${azurerm_key_vault_secret.service_bus_connection_string.id})"
  }
}
意思大概是他把一段含有机密信息的连接字符串保存在 Azure KeyVault 服务中,然后创建了一个 Azure Faas 函数,通过 KeyVault 机密引用地址传递该机密。
1.9.6.1.1. 问题描述
这位老兄发现,如果他修改了机密的内容,也就是 azurerm_key_vault_secret 声明里 value = azurerm_servicebus_topic_authorization_rule.mysb.primary_connection_string 这一段的值,KeyVault 保存的机密内容的确会正确更新,但 Azure Function 读取到的还是旧的机密引用地址,也就是这段代码中得到的 KeyVault 机密引用地址没有更新:
  app_settings = {
    AzureWebJobsServiceBus = "@Microsoft.KeyVault(SecretUri=${azurerm_key_vault_secret.service_bus_connection_string.id})"
  }
更加奇怪的是,这之后他什么都没有做,只是重新再执行一次 terraform apply,该引用地址又被正确更新了?!
1.9.6.1.2. 问题原因
因为 KeyVault Secret 被设计成是不可变的,所以更新 azurerm_key_vault_secret 的 value 会导致资源被重新创建。Terraform 官网上的相关文档中对该参数的定义如下:
value - (Required) Specifies the value of the Key Vault Secret.
在 Terraform 中,一个参数被标记为 Required 意味着它是必填项;而某个参数发生变化时是否会触发资源的重建,则由 Provider 中对该资源的定义决定。对 azurerm_key_vault_secret 的 value 来说,它的作用类似数据库记录的主键:主键不同的记录被认定是两条不同的记录,修改主键可以看作是删除旧记录、重建一条新记录。因此修改 value 后,该 azurerm_key_vault_secret 会被重新创建,它的 id 也会发生变化。
那么为什么在 azurerm_key_vault_secret 被重新创建之后,我们会发现 azurerm_function_app 中引用的 id 没有变化呢?
Terraform 的工作流含有 Plan 和 Apply 两个主要阶段。Plan 阶段首先会分析 Terraform 代码,调用 terraform refresh(可以用参数跳过该步骤)读取资源在云端目前的最新状态,再结合 State 文件中记录的状态,把代码、云端、State 三者对比出一个执行计划,使得最终产生的云端状态能够符合当前代码描述的状态。
就这个场景而言,Terraform 能够意识到 azurerm_key_vault_secret 的参数发生了变化,这会导致某种程度的更新,但它无法意识到这个更新会导致 azurerm_key_vault_secret 的 id 发生变化,进而导致 azurerm_function_app 也必须进行更新,所以就发生了他第一次执行 terraform apply 后看到的情况。
当他第二次执行 terraform apply 时,Terraform 记录的 State 文件里,azurerm_key_vault_secret 的 id 和 azurerm_function_app 里使用的 id 已经对不上了,这时 Terraform 会再生成一个更新 azurerm_function_app 的 Plan,执行后一切恢复正常。
有没有办法让 azurerm_function_app 能在第一次生成 Plan 时就感知到这个变更?
1.9.6.1.3. 巧用 null_resource 的 triggers
HashiCorp 提供了一个非常常用的内建 Provider —— null。其中最知名的资源就是 null_resource 了,一般它都是和 provisioner 搭配出现,可以用来在某些资源创建完成后执行一些自定义脚本等等。但是它还有一个很有用的参数:
The triggers argument allows specifying an arbitrary set of values that, when changed, will cause the resource to be replaced.
triggers 参数可以用来指向一些值,只要这些值的内容发生了变动,就会导致 null_resource 资源被重新创建,从而生成一个新的 id。
1.9.6.1.4. 一个小实验
我们尝试构建一个简单的实验环境来验证一下,首先是这样一段代码:
resource "azurerm_key_vault_secret" "example" {
  name         = "secret-sauce"
  value        = "szechuan"
  key_vault_id = azurerm_key_vault.example.id
}

resource "local_file" "output" {
  filename = "${path.module}/output.txt"
  content  = azurerm_key_vault_secret.example.id
}
我们创建一个 azurerm_key_vault_secret,然后把它的 id 输出到一个文件里。随后我们复制一份该文件,比如叫 output.bak 好了。接着我们把 azurerm_key_vault_secret 的 value 修改成一个新的值,执行 terraform apply 以后,我们会发现 output.txt 与 output.bak 的内容完全一样,说明 value 的更新并没有触发 local_file 的更新。
随后我们把代码改成这样:
resource "azurerm_key_vault_secret" "example" {
  name         = "secret-sauce"
  value        = "szechuan2"
  key_vault_id = azurerm_key_vault.example.id
}

resource "null_resource" "example" {
  triggers = {
    trigger = azurerm_key_vault_secret.example.value
  }
}

resource "local_file" "output" {
  filename = "${path.module}/output.txt"
  content  = null_resource.example.id == null_resource.example.id ? azurerm_key_vault_secret.example.id : ""
}
我们在代码中插入了一个 null_resource,并设置 triggers 的内容,盯住 azurerm_key_vault_secret.example.value。在 value 发生变化时,null_resource 的 id 也会发生变化。
然后我们在 local_file 的代码中,把 content 的赋值改成了这样一个三目表达式:null_resource.example.id == null_resource.example.id ? azurerm_key_vault_secret.example.id : ""。这个表达式里 null_resource.example.id 实际上是不起作用的,自己等于自己的永真条件会导致仍然使用 azurerm_key_vault_secret.example.id 作为值;但是由于掺入了 null_resource.example.id,Terraform 在第一次计算 Plan 时就能感知到 local_file 的内容发生了变化,从而使得我们可以一次 terraform apply 搞定。
1.9.7.1. 利用 null_resource 搭配 replace_triggered_by 更新无法从服务端读取内容的属性
下面是我曾经处理过的一个提问,提问者写了这样一段 Terraform 代码:
resource "azurerm_container_group" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name
  ip_address_type     = "Private"
  network_profile_id  = azurerm_network_profile.this.id
  os_type             = "Linux"

  container {
    name     = "someName"
    image    = "someImage"
    cpu      = "0.5"
    memory   = "0.5"
    commands = ["some", "commands"]

    ports {
      port     = 53
      protocol = "UDP"
    }

    volume {
      mount_path = "/app/conf"
      name       = "someName"
      read_only  = true
      secret = {
        Corefile = base64encode(someContent)
      }
    }
  }

  tags = var.tags
}
结果每次执行 apply 操作时,都会发现 Terraform 试图重建这个容器:
# module.dns_forwarder.azurerm_container_group.this must be replaced
-/+ resource "azurerm_container_group" "this" {
      ~ exposed_port        = [
          - {
              - port     = 53
              - protocol = "UDP"
            },
        ] -> (known after apply)
      + fqdn                = (known after apply)
      ~ id                  = "/subscriptions/<mySubId>/resourceGroups/<myRgName>/providers/Microsoft.ContainerInstance/containerGroups/<myContainerGroupName>" -> (known after apply)
      ~ ip_address          = "someIp" -> (known after apply)
        name                = "someName"
      - tags                = {} -> null
        # (6 unchanged attributes hidden)

      ~ container {
          - environment_variables        = {} -> null
            name                         = "someName"
          - secure_environment_variables = (sensitive value)
            # (4 unchanged attributes hidden)

          ~ volume {
                name   = "someName"
              ~ secret = (sensitive value) # forces replacement
                # (3 unchanged attributes hidden)
            }

            # (1 unchanged block hidden)
        }
    }
这个问题的原因是,API 在读取容器信息时不会返回 volume 的 secret 数据。这其实是一个还挺合理的设定,机密数据的确不应该可以直接从 API 返回,但这就导致 Terraform 每次制定变更计划时都会试图重新设置这个值(因为它会理解成服务端这个值被修改成了空),而容器是不可变的,要修改容器的任何配置都会导致容器被重建。
有没有办法能够避免这种问题?经验告诉我们,可以使用 ignore_changes 让 Terraform 忽略这个属性的变更来避免重建,但如果 secret 真的变了怎么办?
我们可以这样干:第一,在 azurerm_container_group 中添加这样一段 lifecycle 块:
  lifecycle {
    ignore_changes       = [container[0].volume[0].secret]
    replace_triggered_by = [null_resource.secret_trigger.id]
  }
这会忽略 secret 的变化,但我们同时声明了一个 replace_triggered_by,在 null_resource.secret_trigger.id 的值发生变化时,可以删除重建 azurerm_container_group 实例。
其次,我们把 secret 的内容提取到一个 local 里,这时 azurerm_container_group 的 volume 看起来大概是这样的:
    volume {
      mount_path = "/app/conf"
      name       = "somename"
      read_only  = true
      secret = {
        Corefile = local.secret
      }
    }
local.secret 存放着使用的机密数据。这时我们再定义一个 null_resource 充当触发器:
locals {
  secret = base64encode("abcdefg")
}

resource "null_resource" "secret_trigger" {
  triggers = {
    trigger = local.secret
  }
}
这样,在机密数据真的发生变化的时候,triggers 会触发 null_resource 的重建,导致 null_resource.secret_trigger.id 发生变化,进而触发 azurerm_container_group 的重建。
1.9.8.1. 创建资源的条件依赖另一个资源的输出时怎么办
我们在“有条件创建”一节当中介绍了如何通过判断用户的输入参数来决定是否要创建某个资源。让我们来看一下这样一个 Module 的例子:
variable "vpc_id" {
  type    = string
  default = null
}

resource "ucloud_vpc" "vpc" {
  count       = var.vpc_id == null ? 1 : 0
  cidr_blocks = ["10.0.0.0/16"]
  name        = "vpc"
}

resource "ucloud_subnet" "subnet" {
  cidr_block = "10.0.0.0/24"
  vpc_id     = var.vpc_id == null ? ucloud_vpc.vpc[0].id : var.vpc_id
}
我们想在 Module 中创建一个 ucloud_subnet,用户可以输入一个 vpc_id 配置给它,也可以不输入,这时 Module 会创建一个 ucloud_vpc 来用。
假如我们使用这个模块,并且不传入 vpc_id:
module "vpc" {
  source = "./vpc"
}
这段代码生成的 Plan 内容如下:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.vpc.ucloud_subnet.subnet will be created
  + resource "ucloud_subnet" "subnet" {
      + cidr_block  = "10.0.0.0/24"
      + create_time = (known after apply)
      + id          = (known after apply)
      + name        = (known after apply)
      + remark      = (known after apply)
      + tag         = "Default"
      + vpc_id      = (known after apply)
    }

  # module.vpc.ucloud_vpc.vpc[0] will be created
  + resource "ucloud_vpc" "vpc" {
      + cidr_blocks  = [
          + "10.0.0.0/16",
        ]
      + create_time  = (known after apply)
      + id           = (known after apply)
      + name         = "vpc"
      + network_info = (known after apply)
      + remark       = (known after apply)
      + tag          = "Default"
      + update_time  = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.
完全符合预期。假如我们希望由模块的调用者来创建 Vpc 的话:
resource "ucloud_vpc" "vpc" {
  cidr_blocks = ["10.0.0.0/16"]
  name        = "vpc"
}

module "vpc" {
  source = "./vpc"
  vpc_id = ucloud_vpc.vpc.id
}
这时我们执行 terraform plan 的话,会得到这样的结果:
╷
│ Error: Invalid count argument
│
│   on vpc/main.tf line 16, in resource "ucloud_vpc" "vpc":
│   16:   count = var.vpc_id == null ? 1 : 0
│
│ The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this,
│ use the -target argument to first apply only the resources that the count depends on.
╵
Terraform 试图向我们抱怨,我们在 count 参数的表达式里使用了一个必须在 apply 阶段才能知道的值,所以它无法在 plan 阶段就计算出 count 的值。它建议我们先用 terraform apply 命令搭配 -target 参数把 Vpc 先创建出来,消除后续计算 Plan 时尚不知晓的值,以此来解决这个问题。
这当然是一种很麻烦的方法,所以我们在设计 Module 时就要考虑到这种问题。有一种很简单的方法可以解决这个问题:
variable "vpc" {
  type = object({
    id = string
  })
  default = null
}

resource "ucloud_vpc" "vpc" {
  count       = var.vpc == null ? 1 : 0
  cidr_blocks = ["10.0.0.0/16"]
  name        = "vpc"
}

resource "ucloud_subnet" "subnet" {
  cidr_block = "10.0.0.0/24"
  vpc_id     = var.vpc == null ? ucloud_vpc.vpc[0].id : var.vpc.id
}
我们把用来判断创建条件的输入变量类型改成了 object,调用 Module 时改为传入一个 vpc 对象。
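调用端的代码大致会是这样(仅为示意,传入的 vpc 对象只需要包含 id 字段):

resource "ucloud_vpc" "vpc" {
  cidr_blocks = ["10.0.0.0/16"]
  name        = "vpc"
}

module "vpc" {
  source = "./vpc"
  vpc = {
    id = ucloud_vpc.vpc.id
  }
}

这时重新计算 Plan,结果就变成了: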
Terraform will perform the following actions:

  # ucloud_vpc.vpc will be created
  + resource "ucloud_vpc" "vpc" {
      + cidr_blocks  = [
          + "10.0.0.0/16",
        ]
      + create_time  = (known after apply)
      + id           = (known after apply)
      + name         = "vpc"
      + network_info = (known after apply)
      + remark       = (known after apply)
      + tag          = "Default"
      + update_time  = (known after apply)
    }

  # module.vpc.ucloud_subnet.subnet will be created
  + resource "ucloud_subnet" "subnet" {
      + cidr_block  = "10.0.0.0/24"
      + create_time = (known after apply)
      + id          = (known after apply)
      + name        = (known after apply)
      + remark      = (known after apply)
      + tag         = "Default"
      + vpc_id      = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.
这次成功计算出了 Plan。请注意虽然这个 Plan 仍然是创建两个资源,但 ucloud_vpc 资源并不是 Module 创建的。
这个方法的原理是:虽然 var.vpc.id 仍然是一个只有在 apply 阶段才能知道的值,但 var.vpc 本身是一个在 plan 阶段就可以知道的值,可以直接判断它是否为 null,所以该方法可以绕过这个限制。
1.9.9.1. 利用 create_before_destroy 调整资源 Update 的执行顺序
最近处理了一个问题,有人写了这样一段代码:
provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}

resource "azurerm_resource_group" "rg" {
  location = "eastus"
  name     = "example"
}

locals {
  environments = toset(["one", "two", "three"])
}

resource "azurerm_public_ip" "lb" {
  for_each            = local.environments
  name                = "frontend-lb-${each.key}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  ip_version          = "IPv4"
  sku                 = "Standard"
  zones               = [1, 2, 3]
}

resource "azurerm_lb" "this" {
  name                = "azurelb"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "Standard"

  dynamic "frontend_ip_configuration" {
    for_each = local.environments
    content {
      name                 = frontend_ip_configuration.key
      public_ip_address_id = azurerm_public_ip.lb[frontend_ip_configuration.key].id
    }
  }
}
当他从 local.environments 中删除一个元素,然后执行 terraform apply 时,他遇到了下面的问题:
│ Error: deleting Public Ip Address: (Name "azurelb" / Resource Group "example"): network.PublicIPAddressesClient#Delete: Failure sending request: StatusCode=400 -- Original Error: Code="PublicIPAddressCannotBeDeleted" Message="Public IP address /subscriptions/subscription-id/resourceGroups/resource-group/providers/Microsoft.Network/publicIPAddresses/one can not be deleted since it is still allocated to resource /subscriptions/subscription-id/resourceGroups/resource-group/providers/Microsoft.Network/loadBalancers/azurelb/frontendIPConfigurations/one. In order to delete the public IP, disassociate/detach the Public IP address from the resource. To learn how to do this, see aka.ms/deletepublicip." Details=[]
这其实是一个还挺常见的问题:azurerm_lb.this 依赖于 azurerm_public_ip.lb[index],正确的变更顺序应该是先更新 azurerm_lb.this,再删除 azurerm_public_ip.lb 中的成员;但是 Terraform 默认的执行顺序会首先尝试执行删除操作,这时因为该 IP 仍然被 LoadBalancer 使用着,所以会引发一个错误。
解决方法是给 azurerm_public_ip.lb 添加一个 create_before_destroy:
resource "azurerm_public_ip" "lb" {
  for_each            = local.environments
  name                = "frontend-lb-${each.key}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  ip_version          = "IPv4"
  sku                 = "Standard"
  zones               = [1, 2, 3]

  lifecycle {
    create_before_destroy = true
  }
}
create_before_destroy 虽然名字看起来只与 Create 有关,但实际上它也会把 Update 与 Create 一起调整顺序。声明该参数后,azurerm_public_ip.lb 的 Delete 实际上被推迟到 azurerm_lb.this 的 Update 之后再执行,该问题得解。
如果团队使用 Terraform 作为变更管理和部署管道的核心工具,可能需要以某种自动化方式编排 Terraform 的运行,以确保运行之间的一致性,并提供其他有趣的功能,例如与版本控制系统钩子的集成。
Terraform 的自动化可以有多种形式,并且程度不同。一些团队继续在本地运行 Terraform,但使用脚本代码来准备一致的工作目录来运行 Terraform,而另一些团队则完全在 Jenkins 等 CI 工具中运行 Terraform。
本篇涵盖了实现此类自动化时应考虑的一些事项,既确保 Terraform 的安全运行,又适应 Terraform 工作流程中当前需要仔细注意的一些限制。它假设 Terraform 将在非交互式环境中运行,无法在终端提示输入。对于脚本代码来说不一定如此,但在 CI 工具中运行时通常如此。
在自动化流程中运行 Terraform 时,重点通常是核心的 plan/apply 循环。此时使用 Terraform 命令行的流程大体如下:
1. 初始化 Terraform 工作目录。
2. 针对当前代码,为产生变化的资源计算变更计划。
3. 让操作员审查计划,以确保其可接受。
4. 应用计划描述的更改。
步骤 1、2 和 4 可以使用熟悉的 Terraform 命令以及一些附加选项来执行:
terraform init -input=false 初始化工作目录。
terraform plan -out=tfplan -input=false 创建计划文件并将其保存到名为 tfplan 的本地文件。
terraform apply -input=false tfplan 执行存储在文件 tfplan 中的计划。
-input=false 参数告诉 Terraform 不应尝试提示输入,而是要求由配置文件或命令行提供所有必要的值。因此,可能需要在 terraform plan 上使用 -var 和 -var-file 参数来指定所有传统上在交互式使用下手动输入的变量值。
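例如(其中的 production.tfvars 只是一个假设的变量文件名):

terraform plan -out=tfplan -input=false -var-file=production.tfvars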
强烈建议使用支持远程状态的 Backend,这样 Terraform 可以自动把状态持久化保存,后续运行时再找回并更新状态。选择支持状态锁定的 Backend 还能为 Terraform 的并发运行提供竞争安全保障。
默认情况下,一些 Terraform 命令会提示用户下一步可能执行的步骤,通常包括具体的下一步要运行的命令。
自动化工具通常会封装正在运行的命令的具体细节,只提供抽象的步骤,这时 Terraform 输出的此类消息反而令人困惑,且无法操作,如果它们无意中鼓励用户完全绕过自动化工具,则可能还是有害的。
当环境变量 TF_IN_AUTOMATION 被设置为任何非空值时,Terraform 会对其输出进行一些细微调整,不再强调要运行的特定命令。所做的具体更改会随着时间的推移而变化,但一般来说,Terraform 发现该变量时,会认为存在某种包装了 Terraform 的应用程序,由它来帮助用户进行下一步。
为了降低复杂性,该功能主要针对 Terraform 主要的工作流程命令实现。无论该变量为何值,其他辅助命令仍可能会产生命令行建议。
1.9.10.1.3. 在不同的机器上运行 plan 和 apply
在 CI 工具中运行时,可能很难或无法确保 plan 和 apply 命令在同一台计算机上的同一目录中运行,并且所有的文件都保持相同。
在不同的机器上运行 plan 和 apply 需要一些额外的步骤来确保正确的行为。稳健的策略如下:
plan 完成后,将整个工作目录(包括 init 期间创建的 .terraform 子目录)归档,并将其存放在 apply 阶段可以访问到的位置。常见的选择是作为所选 CI 工具中的“Build Artifact”。
在运行 apply 之前,获取上一步中创建的存档并将其解压到相同的绝对路径。这会重新创建 plan 后出现的所有内容,避免 plan 步骤期间创建的本地文件带来的奇怪问题。
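下面是一个非常简化的命令示意(其中 /workspace/infra 的路径与产物文件名均为假设,实际做法取决于所用的 CI 工具):

cd /workspace/infra && terraform plan -out=tfplan -input=false
tar -czf workdir.tar.gz -C /workspace/infra .    # plan 阶段结束后归档整个工作目录(含 .terraform)
mkdir -p /workspace/infra && tar -xzf workdir.tar.gz -C /workspace/infra    # apply 阶段在相同绝对路径还原
cd /workspace/infra && terraform apply -input=false tfplan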
Terraform 目前为此类自动化系统设置了一些必须满足的前提条件:
保存的计划文件可以包含子模块的绝对路径以及代码中引用的其他数据文件。因此,必须确保在相同的绝对路径中还原保存的工作目录。这通常是通过在某种隔离中运行 Terraform 来实现的,例如可以控制文件系统布局的 Docker 容器。
Terraform 假设该计划将在与其创建时相同的操作系统和 CPU 架构上 Apply。例如,这意味着无法在 Windows 计算机上创建计划,然后将其应用到 Linux 服务器上。
Terraform 期望用于生成计划的 Provider 程序插件在应用计划时可用且相同,以确保正确执行计划。如果在创建和应用计划之间升级 Terraform 或任何插件,将会产生错误。
Terraform 无法自动检测用于创建计划的凭据是否拥有对将要应用该计划的那些资源的访问权限。如果在不同步骤使用不同的凭据(例如,使用只读凭据生成计划),那么必须确保这两套凭据在其所属服务的帐户上保持一致。
警告:计划文件包含代码的完整副本、计划所要应用的状态数据以及传递给 terraform plan 的所有变量。如果其中包含任何敏感数据,则包含计划文件的工作目录存档应受到相应保护。对于 Provider 使用的身份验证凭据,建议尽可能使用环境变量,因为这些变量不会被包含在计划中,也不会由 Terraform 以任何其他方式保存到磁盘。
1.9.10.1.4. 交互式审批计划
自动化 Terraform 工作流程的另一个挑战是需要在 plan 和 apply 之间加入交互式审批步骤。为了稳健地实现这一点,重要的是要确保一次只能有一个计划未完成,或者让两个步骤相互衔接,由批准动作把足够的信息传递到 apply 步骤,以确保应用的是被批准的那个计划,而不是后来新生成的其他计划。
不同的 CI 工具以不同的方式解决这个问题,但通常这是通过构建管道功能实现的,其中可以按顺序应用不同的步骤,后面的步骤可以访问前面步骤生成的数据。
推荐的方法是一次只允许一个计划处于未应用状态。应用计划时,针对同一状态生成的任何其他现有计划都会失效,因为现在必须相对于新状态重新计算它们。通过强制计划按顺序获得批准(或驳回),可以避免这种情况。
1.9.10.1.5. 自动批准计划
虽然强烈建议对生产环境应用计划前要进行人工审查,但有时在预生产或开发环境中部署时需要采取更自动化的方法。
如果不需要手动批准,可以使用更简单的命令序列:
terraform init -input=false
terraform apply -input=false -auto-approve
apply 命令的这个变体会隐式地创建一个新计划,然后立即应用它。-auto-approve 选项告诉 Terraform 在应用计划之前不需要对计划进行交互式批准。
警告 :当 Terraform 有权对基础设施进行破坏性更改时,始终建议对计划进行人工审查,除非在发生意外更改时可以容忍停机。仅对非关键基础设施使用自动批准。
terraform plan 可以用来对 Terraform 配置的有效性进行某些有限的验证,而不影响实际的基础设施。尽管 plan 命令会更新状态以匹配实际资源,从而确保计划的准确性,但更新后的状态文件并不会被持久保存,因此可以安全地使用该命令来生成仅为了帮助代码审查而创建的“一次性”计划。
实现此类工作流程时,可以在相关代码审查工具(例如,Github Pull Request)中使用钩子,为每个正在审查的新提交触发 CI 工具。在这种情况下,Terraform 可以按如下方式运行:
terraform plan -input=false
与在“主”工作流程中一样,可能需要根据情况设置 -var 或 -var-file。在这种情况下不使用 -out 选项,因为为代码审查目的而生成的计划永远不会被应用。相反,一旦合并更改,就可以从主版本控制分支创建并应用新计划。
警告:请注意,通过输入变量或环境变量将敏感数据传递给 Terraform,会使任何能够提交 PR 的人都可以看到它们。因此在开源项目,或是并非所有贡献者都应该直接接触到凭据等机密信息的私有项目上,必须谨慎使用这部分流程。
1.9.10.1.7. 多环境部署
Terraform 的自动化通常会被用来创建数个相同的配置,比如为预发布、测试或多租户基础设施等场景生成平行的环境。这种情况下的自动化可以帮助确保为每个环境使用正确的设置,并且在每次操作之前正确配置工作目录。
多环境编排最有趣的两个命令是 terraform init 和 terraform workspace。前者可以与其他参数一起使用,以针对环境之间的差异定制 Backend 配置;后者可用于在同一个 Backend 中存储的同一套配置的多个状态之间安全切换。
如果可能,建议对所有环境使用单一的 Backend 配置,并使用 terraform workspace 命令在工作空间之间切换:
terraform init -input=false
terraform workspace select QA
在此使用模型中,Backend 存储中使用固定的命名方案,以允许多个状态共存,而无需任何进一步的配置。
或者,自动化工具可以将环境变量 TF_WORKSPACE 设置为现有工作空间的名称,这将覆盖使用 terraform workspace select 命令所做的任何选择。建议仅在非交互式场景中使用此环境变量,因为在本地 shell 环境中,很容易忘记设置了该变量,从而把变更应用到错误的状态上。
在一些更复杂的情况下,不可能跨环境共享相同的 Backend 配置。例如,环境可能运行在完全独立的不同帐户的服务里,因此需要对 Backend 本身使用不同的凭据或端点。在这种情况下,可以通过 terraform init 的 -backend-config 选项覆盖 Backend 配置设置。
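例如(其中的 qa.backend.tfvars 是一个假设的、存放该环境 Backend 配置的文件):

terraform init -input=false -backend-config=qa.backend.tfvars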
1.9.10.1.8. 预先安装的插件
在默认使用情况下,terraform init 会自动下载并安装代码中使用的所有 Provider 程序插件,并将它们放置在 .terraform 目录的子目录中。这为简单的情况提供了更简单的工作流程,并允许每段代码使用不同版本的插件。
在自动化环境中,可能需要禁用此行为,而是提供一组已安装在运行 Terraform 的系统上的固定插件。这样就避免了每次执行时重新下载插件的开销,并允许系统管理员控制可以使用哪些插件。
要使用此机制,请在运行 Terraform 的系统上的某个位置创建一个目录,并将插件的可执行文件放入其中。已发布的插件文件可在 releases.hashicorp.com 上下载。请务必下载适合目标操作系统和体系结构的文件。
提取必要的插件后,新插件目录的内容将如下所示:
$ ls -lah /usr/lib/custom-terraform-plugins
-rwxrwxr-x 1 user user 84M Jun 13 15:13 terraform-provider-aws-v1.0.0-x3
-rwxrwxr-x 1 user user 84M Jun 13 15:15 terraform-provider-rundeck-v2.3.0-x3
-rwxrwxr-x 1 user user 84M Jun 13 15:15 terraform-provider-mysql-v1.2.0-x3
文件名末尾的版本信息很重要,它使得 Terraform 可以推断每个插件的版本号。可以安装同一 Provider 程序插件的多个版本,Terraform 将使用与 Terraform 代码中的 Provider 程序版本约束相匹配的最新版本。
填充此目录后,可以使用 terraform init 的 -plugin-dir 选项跳过常规的自动下载和插件发现行为:
terraform init -input=false -plugin-dir=/usr/lib/custom-terraform-plugins
使用该组参数时,只有给定目录中的插件可以被使用。这使系统管理员可以对执行环境进行强力控制,但另一方面,它会阻止使用尚未安装到本地插件目录中的较新插件版本。哪种方法更合适将取决于每个组织内的特定情况。
还可以通过与配置放在一起的 terraform.d/plugins/OS_ARCH 目录提前安装插件,Terraform 在自动下载其他插件之前会先搜索该目录。-get-plugins=false 参数可禁止 Terraform 自动下载其他插件。