“Mona Lisa, Mona Lisa, men have named you; you’re so like the lady with the mystic smile.”
Last time, I finished a set of articles about using AI coding tools. My conclusion was that the tools can help greatly but that, right now, using them to generate whole programs is pushing the envelope. As helpers, as pseudo-pair-programmers, the tools can offer great value and make you more productive. However, it’s essential to understand that this process works because of the interaction between humans and AI. Without that steering and a continuous mix of rejection and acceptance, the AI can’t learn how to respond correctly.
In this article, I want to explore how this interaction, the need to train and retrain our AIs, and our need to get business value from AI will shape development and how our CI/CD processes function.
While generative AI can accelerate development, it also introduces unique precision, security, and governance challenges, particularly within CI/CD pipelines. I’ll explore how DevOps teams can adapt pipelines to harness AI’s potential while mitigating its risks. Along the way, I’ll call out some practical strategies, tools, and processes to make your pipeline AI-ready.
Incorporating AI into your business might seem primarily a technical challenge, but the consequences for the business are far-ranging and far from ‘just’ technical.
The Mona Lisa Analogy
In this article, I will use image-generating AIs to explain my points. Generated images make the process, and its pros and cons, easier to understand intuitively. Regardless of the output form, these comments apply to any generative AI and probably to most AI usage.
Here, we have several images of the Mona Lisa. The first, on the top left, is an image of the actual painting (though digitally enhanced, since the original is quite dark). The others are all AI-generated, produced by various image-generation tools from this simple prompt:
“Create an image of the Mona Lisa that is as accurate as possible.”
Perfect Results?
How surprised are you at the outcome? Did you expect a perfect rendition or a dismal failure? At first glance, the results seem promising. The enigmatic smile? Check. The timeless allure? Sort of. But look closer, and you’ll notice the flaws: an asymmetrical background here, an odd hand position there. A cat. These imperfections reflect a core truth about AI: its outputs depend heavily on the quality of its training data and the specificity of its instructions.
Good enough can be misleading
Without well-curated inputs and iterative refinement, AI may produce “good enough” results but not the precise outcomes your application demands.
For well-known images like the Mona Lisa in particular, this test is a bit of a cheat. Later, we’ll look at a related prompt, and you’ll see the difference in more detail. The ‘cheat’ is that, by definition, the AI must have been trained on images of the Mona Lisa and had sufficient metadata to strongly associate those pictures with the name. It’s entirely possible that the Mona Lisa was part of a specific training set to ensure the AI could generate well-known images with some accuracy.
DevOps Considerations
This analogy underscores a key challenge for DevOps teams: incorporating AI effectively into CI/CD pipelines requires careful control over inputs, outputs, and feedback loops. Apparent success can hide multiple inaccuracies.
Let’s flip this around a bit to see what I mean.
Imagining AI
If you squint hard enough at an image-generating AI, you can consider it an enormously compressed image database: a database where each image is incomplete, but also where almost all possible images – even the ones you didn’t add – exist. Each image was added with metadata: labels that describe what’s in the image. The prompt is (also squinting hard) like SQL.
Imagine that you train this database to respond to your SQL by making requests and then scoring the responses: get the right image, +1; get the wrong image, -1. You get the idea. Now, also understand that except for quite specific cases, the database will always give you a slightly different answer.
The lossiness of the data, coupled with the training to respond to your prompt, means that the output is biased and selective. You’ll mostly get answers that work, but never exactly what you wanted. If there are a million images of a cat in the database, you’ll get a cat. Which one depends on prior training and related metadata. Ask again, and you’ll get a different cat.
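To make the score-and-feedback idea concrete, here’s a toy Python sketch. The generate() function is a hypothetical stand-in for a real model, which adjusts internal weights rather than a counter, but the loop shape is the same: request, score, feed the score back.

```python
# Toy sketch of the request/score/feedback loop described above.
# generate() is a hypothetical stand-in for a real image model, which
# returns a slightly different answer each time it's asked.
import random

def generate(prompt: str) -> str:
    # A real model samples from learned distributions; we just pick.
    return random.choice(["a tabby cat", "a white cat", "the Mona Lisa"])

def score(output: str, expected: str) -> int:
    # +1 for the right image, -1 for the wrong one, as described above.
    return 1 if expected in output else -1

feedback = 0
for _ in range(10):
    result = generate("the Mona Lisa")
    feedback += score(result, "Mona Lisa")

print(f"cumulative feedback after 10 requests: {feedback}")
```

In real training, that feedback adjusts the model itself, which is why the same prompt drifts toward ‘answers that work’ rather than one fixed image.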
Why AI in CI/CD Is Different
Traditional CI/CD systems are designed for deterministic codebases where outputs are predictable. AI, however, operates probabilistically. This introduces unique challenges, such as:
- Non-deterministic outputs: Two runs of the same AI model might yield slightly different results.
- Model drift: AI models may “drift” over time as new data retraining alters their behaviour.
- Complex dependencies: AI systems often require extensive external dependencies, from training data to pre-trained models.
To address these challenges, CI/CD pipelines must evolve from static, code-focused workflows to dynamic systems that account for AI’s iterative nature.
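Because two runs may legitimately differ, pipeline assertions need tolerances rather than exact matches. Here’s a minimal sketch, assuming your outputs can be reduced to numeric quality scores; the baseline and tolerance values are illustrative:

```python
# Sketch of tolerance-based checks for non-deterministic outputs.
# Exact equality between runs is the wrong test; score each run and
# assert it stays within an agreed band to catch drift early.
import math

def within_tolerance(score: float, baseline: float, tol: float = 0.05) -> bool:
    # Pass if this run's quality score is close enough to the baseline.
    return math.isclose(score, baseline, abs_tol=tol)

baseline_quality = 0.87   # agreed reference score for the current model
run_quality = 0.85        # e.g., a similarity score from this pipeline run

assert within_tolerance(run_quality, baseline_quality), \
    "model output drifted outside the accepted quality band"
```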
Inherent challenges
Our pretend database certainly can’t provide you with an image of the Mona Lisa unless it has been given at least one with the Mona Lisa label. Without that labelling, it’s impossible to create an accurate representation. Can you describe the Mona Lisa sufficiently so that someone, some AI, could draw it? You might get close using an iterative process, but not at the start and never completely.
As an example, I put this to a few AIs, using a prompt that describes the painting without naming it:
“Create a highly detailed image of a Renaissance portrait featuring a woman with a gentle, enigmatic smile and calm expression. She is seated with her hands gracefully folded, wearing a dark dress with delicate, semi-transparent fabric draped over her shoulders. The background is a soft, dreamlike landscape with winding paths and distant mountains, evoking a sense of depth and serenity. The lighting is gentle, emphasising her face and hands, while shadows subtly enhance the contours of her expression. Render the portrait with a timeless, classic style that reflects the artistry of the Renaissance period.”
Here’s what they produced:
As you can see, there’s some similarity, but the results are nowhere near the original. Without the right training and data, whether that means more data or better prompts, the AI can flounder and be useless.
Accuracy in AIs is essential but challenging to maintain
Good data and the proper training are required for the AI to respond with the output you need. However, ‘accuracy’ is whatever you want it to mean. Simplistically, you can imagine an AI generating countless variations of the Mona Lisa while the training process scores each attempt against the actual image and feeds the result back, tuning the AI until it meets the required ‘accuracy’ level. Unless constantly retested, the model can also drift away from this level of accuracy as other training data pulls it in different directions for other images. A common cause of ‘hallucinations’ (aka the cat) is this bleed-through during training.
I seriously glossed over how AI image generators work. There’s much more going on than I described, but the intent is to make you think differently about training, data, and accuracy. There’s a lot to consider.
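To make that scoring step concrete, here’s a sketch of an automated accuracy gate that compares a generated image against a reference, assuming scikit-image is available. SSIM is just one candidate metric, the file paths are illustrative, and a real pipeline would likely combine several checks; none of them will catch every stray cat.

```python
# Sketch of an image accuracy gate using structural similarity (SSIM),
# assuming scikit-image is installed. File paths are illustrative.
from skimage import io
from skimage.metrics import structural_similarity as ssim
from skimage.transform import resize

def accuracy_score(reference_path: str, generated_path: str) -> float:
    ref = io.imread(reference_path, as_gray=True)
    gen = io.imread(generated_path, as_gray=True)
    gen = resize(gen, ref.shape)       # align dimensions before comparing
    return ssim(ref, gen, data_range=1.0)  # 1.0 = identical, lower = further away

# Example pipeline gate:
# if accuracy_score("mona_lisa.png", "generated.png") < 0.8:
#     raise RuntimeError("generated image is below the accuracy threshold")
```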
New elements
Trust
When using a pre-trained model, you rely on a third party to have trained the AI appropriately. Since the model functions as a near-literal black box, analyzing the binary to determine its trustworthiness is futile; it’s safe to assume we will never be able to examine AI that way. Whether the model is good or bad, complete or incomplete, biased or unbiased is ultimately a subjective judgment, and not one you can read out of the model itself.
How does the AI access your data? How does the data flow? Understanding the likely access patterns the AI will use is equally important. If a third-party service hosts the AI, data flows out of your systems, making it critical to ensure you can control the AI’s access. Will it need to access information about individuals or specific accounts? What if an unusual prompt causes the AI to read every record in your database? These considerations are critical for production systems: you must design, test, and manage these scenarios as part of your CI/CD process.
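As a rough sketch of what ‘controlling the AI’s access’ can mean in practice, the wrapper below caps what an AI-driven query may read. The fetch_records() helper, table names, and row cap are hypothetical; the point is that the AI never receives an unbounded cursor over your data, and every access is logged.

```python
# Hypothetical sketch: a hard boundary between the AI and the database.
# The table names, the row cap, and fetch_records() are illustrative.
import logging

MAX_ROWS = 100                           # a prompt can never raise this cap
ALLOWED_TABLES = {"products", "faq"}     # never customer or payment tables

def fetch_records(table: str, query: str, limit: int = MAX_ROWS) -> list:
    if table not in ALLOWED_TABLES:
        raise PermissionError(f"AI access to table '{table}' is not permitted")
    capped = min(limit, MAX_ROWS)
    logging.info("AI data access: table=%s limit=%d query=%r",
                 table, capped, query)
    # ... execute the query with LIMIT `capped` and return the rows ...
    return []
```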
Unexpected Actions
What else can the AI do, and can you limit it? AI chatbots have frequently been manipulated to perform tasks beyond their intended scope. Whether through prompt engineering that bypasses set instructions, or through organizations failing to realize that an AI trained to answer questions about their data can also generate code, write songs, or even promise to give away all your goods for free, the risks are clear. Granting direct access to an AI chatbot without safeguards is a recipe for disaster.
While AI testing might address some of these issues, large LLM providers emphasize that automation has limits. Take the example of the Mona Lisa images: AI testing can compare a test image against generated output and score it in various ways. However, no matter how you code for artefacts or ‘hallucinations,’ there’s always a chance that an image, such as one featuring a white cat, slips through undetected.
Using an adversarial AI to handle testing, such as ensuring images are safe for work, doesn’t solve the problem; it simply shifts the challenge to another AI. The only real solution is to involve humans in the quality assurance process. Human oversight is essential, whether as final testers or as reviewers sampling questions and answers within the system. Building an effective bias detection and correction process is no longer optional; it’s a requirement for your CI/CD pipeline.
Legislation
Governments are increasingly focused on the use and misuse of AI, introducing new laws and adapting existing ones to regulate its commercial applications. The term ‘responsible’ is at the heart of much of this legislation. Being responsible means ensuring the ethical use of AI and accepting accountability for its misuse.
For CI/CD systems, this translates to a need for command and control. You must provide evidence of AI use, training practices, security measures, provenance, and bias control. Creating, deploying, and running an AI must be fully auditable.
Adapting Your CI/CD Pipeline for AI
1. Accuracy: Defining Success
In AI, “accuracy” is context-dependent. Consider the Mona Lisa analogy: how close is “close enough”? For CI/CD pipelines, defining what constitutes a “successful” AI output is crucial. This might include:
- Establishing measurable quality metrics (e.g., BLEU scores for text, F1 scores for classification tasks).
- Automating feedback loops where outputs are scored against ground truth data.
Incorporate tools like TensorFlow Extended (TFX) or MLflow to evaluate AI model accuracy during pipeline execution.
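As an illustration, here’s a minimal sketch of recording an evaluation run with MLflow so the pipeline can gate on it. The evaluate_model() stub, metric names, version string, and threshold are placeholders for your own evaluation step.

```python
# Minimal sketch of an accuracy gate recorded via MLflow. The
# evaluate_model() stub and threshold are illustrative placeholders.
import mlflow

ACCURACY_THRESHOLD = 0.85

def evaluate_model() -> float:
    # Stand-in for real evaluation, e.g. F1 on a held-out test set.
    return 0.91

with mlflow.start_run(run_name="model-eval"):
    accuracy = evaluate_model()
    mlflow.log_param("model_version", "v1.3.0")   # illustrative version pin
    mlflow.log_metric("accuracy", accuracy)
    if accuracy < ACCURACY_THRESHOLD:
        raise RuntimeError("model failed the accuracy gate; blocking deploy")
```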
2. Training Data: The Foundation of Precision
Training data is the bedrock of AI success. Poor data leads to poor models, just as low-quality testing data undermines pipeline reliability.
Best Practices for DevOps:
- Curate datasets: Regularly audit and clean training data to remove biases and inaccuracies.
- Version datasets: Use version control tools (e.g., DVC) to track changes in training data and ensure reproducibility.
- Automate data validation: Integrate data quality checks into the pipeline to catch errors before they propagate (see the sketch after this list).
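Here’s a sketch of such a check, assuming a CSV manifest of labelled training records with image_path and label columns (both names are illustrative). The rules will differ for your schema; the point is to fail fast before bad data reaches training.

```python
# Sketch of a pipeline data-validation step for a CSV manifest with
# "image_path" and "label" columns (illustrative schema).
import csv
import os

REQUIRED_COLUMNS = {"image_path", "label"}

def validate_training_data(manifest_path: str) -> None:
    with open(manifest_path, newline="") as f:
        reader = csv.DictReader(f)
        if not REQUIRED_COLUMNS.issubset(reader.fieldnames or []):
            raise ValueError(f"manifest is missing columns: {REQUIRED_COLUMNS}")
        for line_no, row in enumerate(reader, start=2):
            if not row["label"].strip():
                raise ValueError(f"line {line_no}: empty label")
            if not os.path.exists(row["image_path"]):
                raise ValueError(f"line {line_no}: missing file {row['image_path']}")
```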
3. Security: Protecting Data and Models
AI introduces new vectors for security vulnerabilities, including poisoned models, data leakage, and dependency risks.
Mitigation Strategies:
- Model provenance: Use trusted sources for pre-trained models and verify their origins (a checksum sketch follows this list).
- Dependency scanning: Treat AI libraries like any other software dependency—scan them for vulnerabilities.
- Isolated testing: Run model updates in isolated environments to test for regressions or malicious behaviour.
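A simple, concrete form of provenance checking is to record a digest when you first vet a model and verify it on every pipeline run. The digest below is a placeholder:

```python
# Sketch of verifying a pre-trained model file against a digest recorded
# when the model was first vetted. The digest here is a placeholder.
import hashlib

TRUSTED_SHA256 = "0000000000000000placeholder"   # recorded at vetting time

def verify_model(model_path: str) -> None:
    sha = hashlib.sha256()
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha.update(chunk)
    if sha.hexdigest() != TRUSTED_SHA256:
        raise RuntimeError("model file does not match its trusted digest")
```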
4. Automation and Human Oversight
While CI/CD pipelines rely on automation, integrating AI demands human involvement for critical validation steps, especially for bias detection.
Hybrid Approach:
- Automate initial validation using adversarial testing tools to simulate edge cases.
- Incorporate human reviewers to audit AI outputs periodically, focusing on high-risk areas.
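One way to make this hybrid approach operational is to route a random sample of outputs to human reviewers, weighted toward high-risk cases. The sampling rates and the high_risk flag below are illustrative stand-ins for your own review tooling:

```python
# Sketch of sampling AI outputs for human review, biased toward
# high-risk cases. The rates and the "high_risk" flag are illustrative.
import random

BASE_SAMPLE_RATE = 0.02    # review 2% of ordinary outputs
HIGH_RISK_RATE = 0.50      # review half of flagged outputs

def needs_human_review(output: dict) -> bool:
    rate = HIGH_RISK_RATE if output.get("high_risk") else BASE_SAMPLE_RATE
    return random.random() < rate

# Example: queue flagged outputs for a reviewer.
for item in [{"text": "answer A"}, {"text": "answer B", "high_risk": True}]:
    if needs_human_review(item):
        print("send to review queue:", item["text"])
```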
5. Legislation: The Compliance Factor
Emerging laws and regulations are shaping how AI is developed and deployed. For CI/CD systems, this means:
- Auditability: Maintain logs of training data, model updates, and outputs for regulatory compliance.
- Bias control: Implement processes to identify and mitigate bias in AI models.
Checklist for Compliance:
- Document training data sources.
- Track model versioning and changes.
- Establish processes for handling user data securely.
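To make that checklist auditable, each model build can append a record tying a deployment back to its data and training run. A sketch, with illustrative field names and values:

```python
# Sketch of an append-only audit record per model build. Field names
# and values are illustrative; the goal is traceability for compliance.
import json
import time

def write_audit_record(log_path: str, model_version: str,
                       data_sources: list, trained_by: str) -> None:
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "model_version": model_version,
        "data_sources": data_sources,
        "trained_by": trained_by,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")   # JSON Lines, append-only

write_audit_record("audit.jsonl", "v1.3.0",
                   ["dataset-2024-05 (sha256: placeholder)"], "pipeline-bot")
```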
Realigning CI/CD for AI Workflows
The Mona Lisa analogy reframes how we approach CI/CD pipelines. Integrating AI goes beyond deploying models; it requires building a resilient system that guarantees precision, security, and compliance at every stage. Here’s how to begin:
Immediate Actions for DevOps Teams
- Assess readiness: Audit your current CI/CD system for AI compatibility.
- Define success: Establish metrics and thresholds for AI accuracy and performance.
- Integrate tools: Leverage MLOps platforms like Kubeflow to manage AI workflows.
- Educate your team: Train DevOps engineers on AI-specific challenges and solutions.
Long-Term Strategies
- Embrace iterative development: Regularly retrain and validate AI models to ensure long-term reliability.
- Focus on interoperability: Favor open standards and modular pipelines to avoid vendor lock-in.
- Stay informed: Monitor regulatory developments and adapt processes as needed.
Conclusion: The Future of AI and CI/CD
AI is revolutionizing software development, testing, and deployment, challenging DevOps teams to adopt new tools and mindsets. By embracing AI’s iterative and probabilistic nature while embedding precision, security, and compliance into every pipeline stage, you can future-proof your workflows for this transformative era.
As Da Vinci might have said, “Art is never finished, only abandoned.” The same holds for AI in CI/CD—continuous refinement and grasping the probabilistic nettle are the keys to mastering this new frontier.