Infrastructure as Code (IaC) is a methodology that uses code (typically written in declarative languages such as Terraform) to configure, provision, and manage cloud infrastructure. IaC enables organizations to automate and rapidly repeat deployments, cutting down on time and cost. However, IaC development is not a small undertaking: it requires skilled resources, substantial development time, and investment to write good-quality code that implements cloud best practices and incorporates the necessary security and compliance requirements.

Large Language Models (LLMs) can generate contextually relevant code recommendations from natural language prompts. In most cases, base LLMs trained for instruction/response tasks (and for generating code suggestions specifically) do a good job of producing context-aware code suggestions. Widely used coding LLMs from providers such as OpenAI or Meta can generate Terraform scripts from descriptions of the required infrastructure and within an existing context. However, that by itself does not always produce deployable, production-grade code. Developers still need to double-check the generated code for syntactic and semantic accuracy, and the generated code does not reliably capture the security and compliance requirements of the deployment.

So, does generating IaC by prompting pre-trained base LLMs produce the desired output? Generally, the answer is no: for better output, LLMs need to be fine-tuned to generate accurate IaC. In this blog, we address this issue, using Terraform as our technology of choice.

Exploring Better Accuracy and Specificity

While base LLMs trained for coding tasks can generate Infrastructure as Code (IaC) suggestions from detailed and relevant prompts, they regularly suffer from hallucinations, a phenomenon where LLMs produce irrelevant (or simply wrong) output.

Then there is the additional requirement of specificity. One might want the generated Terraform infrastructure to follow certain patterns, coding standards, and syntactic validations, or to include specific documentation around each generated resource. Base LLMs trained to produce Terraform IaC suggestions are unlikely to satisfy these specific requirements accurately.

LLM hallucinations can be mitigated using Reflexion, a technique in which the output of one LLM is reviewed and critiqued by a second one. Another technique that helps with code accuracy and with meeting specific coding requirements is fine-tuning, the process of adjusting an LLM's parameters to optimize its output.
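
A minimal sketch of such a generate/review loop is shown below. The generate_terraform and review_terraform helpers are hypothetical placeholders for calls to whichever LLMs are used, and the "LGTM" acceptance signal is an assumed convention, not part of any specific library.

def generate_terraform(prompt: str) -> str:
    # Hypothetical helper: call the code-generation LLM of your choice here
    raise NotImplementedError

def review_terraform(code: str, requirements: str) -> str:
    # Hypothetical helper: ask a second LLM to critique the generated code
    raise NotImplementedError

def reflexion_loop(prompt: str, requirements: str, max_rounds: int = 3) -> str:
    # Generate, have a second model review, and regenerate using the feedback
    code = generate_terraform(prompt)
    for _ in range(max_rounds):
        feedback = review_terraform(code, requirements)
        if feedback.strip().upper() == "LGTM":  # reviewer accepts the code
            break
        code = generate_terraform(prompt + "\n\nReviewer feedback:\n" + feedback)
    return code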

What is LLM Fine-Tuning?

A primary requirement for fine-tuning is an extensive dataset (for example, 18,000-20,000 IaC examples) where each example shows the LLM what kind of output is expected for a given instruction/input scenario. The base LLM is trained on this dataset to tune its parameters (weights) to produce optimized output. Typically, a small number of parameters is tuned at a time and the process is executed over multiple epochs, which keeps resource usage, training time, and cost low.

Training Methodology and Techniques

Typically, training an LLM to produce accurate outputs that meet specific coding requirements involves two stages:

  • A pre-training phase on billions of tokens (e.g., including an extensive number of Terraform code examples) to obtain a base foundation model (these are already widely available, e.g., the Llama 7B models)
  • Fine-tuning this base model to specialize it for a specific downstream task (e.g., annotating each generated Terraform resource with a specific description).

Parameter-Efficient Fine-Tuning (PEFT) is a technique that keeps the base model untouched, adds additional parameter layers (called “adapters”) on top of it, and trains only these new layers. In the LoRA (Low-Rank Adaptation) technique, the weights of the base model are frozen and training is restricted to small low-rank matrices injected alongside the existing weights.
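
As a minimal sketch of what this looks like in practice (assuming the Hugging Face transformers and peft libraries; the model name and LoRA hyperparameters below are illustrative choices, not a recommendation):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the frozen base model (model name is an illustrative choice)
base_model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# LoRA configuration: only the small low-rank adapter matrices are trained
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters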

Process for Fine-Tuning LLMs

Any meaningful fine-tuning of a code LLM involves two primary steps: preparing the dataset and training the model on it.

Dataset Preparation

Each example in a typical extensive dataset (e.g., 18,000-20,000 IaC examples) takes the form of an instruction/response pair (with an optional input field). Take the example of Terraform code generation where each generated resource carries a comment that says “This resource is generated using PiCloud tool” and a “created_by = PiCloud” tag. A data example for this task may look like this:

### Instruction:

Create an AWS EC2 instance with instance type t2.micro. Annotate this resource with a comment "This resource is generated using PiCloud tool" and add a tag "created_by = PiCloud".

### Response:

# This resource is generated using PiCloud tool
resource "aws_instance" "my_instance" {
  instance_type = "t2.micro"
  ...
  tags = {
    created_by = "PiCloud"
  }
}
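
In practice, such examples are usually serialized into a machine-readable training file; the sketch below assumes a JSON Lines file with Alpaca-style instruction/input/output fields, which is one common layout rather than a required one. The file name is illustrative.

import json

# One illustrative training record in the instruction/input/output format
record = {
    "instruction": (
        "Create an AWS EC2 instance with instance type t2.micro. Annotate this "
        "resource with a comment \"This resource is generated using PiCloud tool\" "
        "and add a tag \"created_by = PiCloud\""
    ),
    "input": "",
    "output": (
        "# This resource is generated using PiCloud tool\n"
        "resource \"aws_instance\" \"my_instance\" {\n"
        "  instance_type = \"t2.micro\"\n"
        "  tags = {\n"
        "    created_by = \"PiCloud\"\n"
        "  }\n"
        "}"
    ),
}

# Append the record to a JSONL training file
with open("terraform_finetune.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")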

Training the Model

Code LLMs are trained on GPUs (e.g., NVIDIA instances) with appropriate RAM/storage configurations, over multiple epochs. Ideally, a small number of parameters is tuned at a time to keep training time and cost low.
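
A minimal training sketch, assuming the Hugging Face transformers, datasets, and peft libraries, the JSONL file and LoRA-wrapped model from the earlier sketches, and illustrative (not prescriptive) hyperparameters:

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

# Load the JSONL dataset prepared earlier and tokenize each instruction/response pair
dataset = load_dataset("json", data_files="terraform_finetune.jsonl", split="train")

def tokenize_example(example):
    text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize_example, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="terraform-lora",
    num_train_epochs=3,                # multiple passes over the dataset
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    logging_steps=50,
)

trainer = Trainer(
    model=model,                       # the LoRA-wrapped model from the PEFT sketch above
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("terraform-lora-adapter")  # saves only the small adapter weights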

In Conclusion

A fine-tuned LLM for IaC can produce code suggestions that implement the required coding standards, best practices, and other security/compliance requirements, resulting in accurate code, faster creation of cloud infrastructure, fulfillment of the necessary compliance, and savings in time and cost. A good dataset with well-documented examples can train an LLM to achieve better code accuracy, and tuning a small number of parameters iteratively reduces fine-tuning time and cost.

To learn more about our cloud and infrastructure offerings, reach out to us.

Author’s Profile

Sandip Dey

Principal Architect, Accelerite Business Unit

sandip_dey@persistent.com

LinkedIn

Sandip Dey works as a Principal Architect in the Accelerite Business Unit at Persistent. He is responsible for developing Cloud and GenAI-based innovation solutions for SASVA.