When AI Actually Works in IT Operations — And Why Most Teams Are Doing It Wrong

June 25, 2026

There’s a version of AI adoption in IT ops that looks something like this: someone buys a Copilot license, attaches it to ServiceNow, and waits for the magic to happen. Three months later, the tickets are still messy, the on-call engineer is still waking up at 2 am, and nobody can explain why the AI recommended closing an incident that was still actively on fire.

We’ve seen this pattern more times than we can count. It’s not a technology problem; it’s an approach problem.

The Gap Between “AI-Capable” and “AI-Useful”

Infrastructure and operations teams aren’t short on tools. Most have 150% of the tools they need. They have observability stacks, ITSM platforms, runbook documentation, and CI/CD pipelines generating telemetry constantly. The problem is that nobody has connected those data sources to an AI workflow in a way that actually reflects how the team works.

One of our clients, a regional financial services firm managing roughly 400 VMs across a hybrid AWS and on-prem footprint, came to us after spending six months trying to get Azure OpenAI to summarize their PagerDuty alerts in a useful way. The summaries were technically accurate but operationally useless. They described what happened with perfect clarity but said nothing about why it happened, or about what the on-call engineer should do about it at midnight.

The root cause wasn’t the model. It was that nobody had connected the alert data to the runbooks, the CMDB topology, the historical incident record, or the knowledge base. The AI was operating without context. It was like asking a new hire to triage production issues on their first day without showing them the architecture diagram.
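To make the missing-context problem concrete, here is a minimal Python sketch of joining an alert to those sources before the model sees it. Every name in it is hypothetical: Alert, IncidentContext, build_context, and the three dictionaries standing in for the CMDB, the runbook store, and the incident history. In a real environment those lookups would be calls to your own ITSM and observability systems.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for the CMDB, runbook store, and incident history.
CMDB_DEPENDENCIES = {"payments-api": ["payments-db", "auth-service"]}
RUNBOOKS = {"payments-api": "If 503s come from the ALB, check target group health first."}
PAST_INCIDENTS = {"payments-api": ["Connection pool exhaustion after a deploy"]}


@dataclass
class Alert:
    service: str
    summary: str


@dataclass
class IncidentContext:
    alert: Alert
    dependencies: list = field(default_factory=list)
    runbook: str = ""
    history: list = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the context as plain text to send alongside the alert."""
        return "\n".join([
            f"Alert: {self.alert.summary} (service: {self.alert.service})",
            f"Depends on: {', '.join(self.dependencies) or 'unknown'}",
            f"Runbook: {self.runbook or 'none found'}",
            f"Similar past incidents: {'; '.join(self.history) or 'none found'}",
        ])


def build_context(alert: Alert) -> IncidentContext:
    """Look up everything known about the alerting service."""
    return IncidentContext(
        alert=alert,
        dependencies=CMDB_DEPENDENCIES.get(alert.service, []),
        runbook=RUNBOOKS.get(alert.service, ""),
        history=PAST_INCIDENTS.get(alert.service, []),
    )


if __name__ == "__main__":
    print(build_context(Alert("payments-api", "HTTP 503 rate above threshold")).to_prompt())
```

The explicit "unknown" and "none found" fallbacks are a deliberate choice in this sketch: they make a gap in the context visible to both the model and the engineer instead of silently omitting it.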

What the Three-Day Workshop Actually Looks Like

Our AI Enablement Workshop is a three-day, hands-on engagement. Not a training. Not a slide deck. We come in, work directly with your tools and your data, and leave with working prototypes and a prioritized roadmap.

Day one is mostly listening. We sit with the platform engineering team, the SRE leads, and often someone from security. We ask a lot of questions about where the pain actually lives – not the official answer, but the real one. Alert fatigue is almost always on the list. So is the runbook problem. Most teams have documentation that’s six months out of date (sometimes years out of date) and written in a way that only the person who wrote it can interpret quickly under pressure.

We also audit data readiness. This is where most AI projects quietly fall apart. A Tier 2 incident with three lines of description and no attached logs is not a useful training signal. We look at what’s actually in your ITSM system, what observability data can be queried, and whether the runbooks are in a format that a retrieval-augmented AI can actually use.
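A data readiness audit can start as something very small. The Python sketch below flags tickets that are too thin to serve as AI context. The field names (description, attachments) and the four-line threshold are assumptions for illustration, not the schema of any particular ITSM platform.

```python
def readiness_issues(ticket: dict, min_description_lines: int = 4) -> list:
    """Return the reasons a ticket is weak AI context (an empty list means usable)."""
    issues = []
    description = ticket.get("description") or ""
    # Count only non-blank lines so padding does not inflate the score.
    lines = [ln for ln in description.splitlines() if ln.strip()]
    if len(lines) < min_description_lines:
        issues.append("description too short")
    if not ticket.get("attachments"):
        issues.append("no logs attached")
    return issues


def readiness_rate(tickets: list) -> float:
    """Fraction of tickets with no readiness issues."""
    if not tickets:
        return 0.0
    usable = sum(1 for t in tickets if not readiness_issues(t))
    return usable / len(tickets)
```

Running readiness_rate over an export of recent incidents gives a rough first number for how much of the historical record is usable as it stands.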

Day two is where we build. Typically, we develop two or three use cases. The specific mix depends on what the team needs most. The most common starting point is an Incident Copilot: an AI-powered assistant that, when a P1 fires, automatically pulls the relevant alerts, correlates them with recent change activity, surfaces the most applicable runbook section, and produces a structured incident summary that goes directly into the ticket.

For one manufacturing client, the Incident Copilot cut the average time-to-context (the point at which the on-call engineer actually understands what they’re dealing with) from 18 minutes to under 4. That’s not a vendor benchmark. That’s a number their SRE lead pulled from incident records after four weeks in production.
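The change-correlation step of an Incident Copilot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in any client engagement: the dictionary fields (service, fired_at, deployed_at, id), the function names, and the two-hour lookback window are all hypothetical.

```python
from datetime import datetime, timedelta


def recent_changes(alert, changes, window=timedelta(hours=2)):
    """Changes to the alerting service deployed within `window` before the alert, newest first."""
    matches = [
        c for c in changes
        if c["service"] == alert["service"]
        # Keep only changes that landed before the alert and inside the lookback window.
        and timedelta(0) <= alert["fired_at"] - c["deployed_at"] <= window
    ]
    return sorted(matches, key=lambda c: c["deployed_at"], reverse=True)


def incident_summary(alert, changes, runbook_section):
    """Assemble the structured fields that would be written into the ticket."""
    return {
        "service": alert["service"],
        "alert": alert["summary"],
        "suspect_changes": [c["id"] for c in recent_changes(alert, changes)],
        "runbook_section": runbook_section,
    }
```

Matching on the exact service name is the simplest possible correlation. A production version would likely also walk the CMDB topology so that a change to an upstream dependency shows up as a suspect.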

We also commonly build a Runbook Assistant: a natural language interface over internal operational documentation. Engineers type a question in plain English, like “what do we do when the payments service is throwing 503s from the ALB?”, and get a synthesized, sourced answer from the actual runbooks, not a hallucinated guess. The key engineering challenges here are retrieval quality and chunking strategy. Getting them wrong is what produces the confident-sounding wrong answers that erode trust in AI tools. Building habits around token optimization also helps keep these tools cost-efficient.

Day three is architecture and roadmap. Where does this go in production? How does it connect to Teams or Slack for alert delivery? What are the human approval controls? What does the security and governance boundary look like, and who owns the ongoing prompt maintenance?

Frankly Speaking

Not every team is ready for day two on day one. We’ve walked into environments where the runbooks are in a SharePoint folder that hasn’t been touched in two years, or where the observability data is rich but structured in a way that requires significant preprocessing before it’s useful as AI context. That’s fine, since part of what the workshop produces is clarity on what has to happen before scaling, not just what can be built right now.

We also don’t recommend automating things that shouldn’t be automated yet. Ticket enrichment – i.e. automatically classifying, tagging, and augmenting incoming tickets before a human reviews them – is a very different risk profile from automated ticket resolution. We help teams think through that boundary clearly, and we’re direct when we think a client is moving faster than their governance posture supports.
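One way to make that boundary explicit is to encode it, so that enrichment runs freely while resolution waits for a human. The Python sketch below is a hypothetical illustration: the action names, the Risk categories, and the decide function are assumptions, not part of any ITSM product.

```python
from enum import Enum
from typing import Optional


class Risk(Enum):
    ENRICH = "enrich"    # adds tags or notes; a human still reviews the ticket
    RESOLVE = "resolve"  # changes ticket state or touches production


# Hypothetical mapping of AI-proposed actions to risk categories.
ACTION_RISK = {
    "classify": Risk.ENRICH,
    "tag": Risk.ENRICH,
    "attach_summary": Risk.ENRICH,
    "close_ticket": Risk.RESOLVE,
    "restart_service": Risk.RESOLVE,
}


def decide(action: str, approved_by: Optional[str] = None) -> str:
    """Return 'run', 'needs_approval', or 'blocked' for a proposed AI action."""
    risk = ACTION_RISK.get(action)
    if risk is None:
        return "blocked"  # unknown actions are denied by default
    if risk is Risk.ENRICH:
        return "run"
    # Resolution actions run only once a named human has approved them.
    return "run" if approved_by else "needs_approval"
```

Denying unknown actions by default means a new capability has to be classified deliberately before the AI can use it, which keeps the governance decision with people.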

What You Walk Away With

After three days, teams have working prototypes built against their own data, a reusable codebase and prompt library, an architecture blueprint for production deployment, and a 30/60/90-day roadmap with use cases prioritized by effort versus operational impact.

More practically, they have a team that has actually built something with AI, and not just watched a demo. They have a concrete sense of what AI can and cannot do in their environment.

That last part matters more than people think. The biggest barrier to AI adoption in infrastructure operations isn’t budget or tooling. It’s the lack of a shared mental model across the team for what AI is actually good at and how to work with it. The workshop builds that model by doing, not by describing.

Keyva delivers the AI Enablement Workshop for Infrastructure and Operations as a fixed-price, three-day engagement. If you’re evaluating where AI can have the most immediate impact for your ops team, reach out at info@keyvatech.com.

Anuj Tuli, Chief Technology Officer

Anuj specializes in developing and delivering vendor-agnostic solutions that avoid the “rip-and-replace” of existing IT investments. He has worked on Cloud Automation, DevOps, Cloud Readiness Assessments, and Migration projects for healthcare, banking, ISP, telecommunications, government and other sectors. He leads the development and management of Cloud Automation IP (intellectual property) and related professional services.

During his career, he has held multiple roles in the Cloud, Automation, and DevOps domains. With certifications in AWS, VMware, HPE, BMC and ITIL, Anuj offers a hands-on perspective on these technologies.

Like what you read? Follow Anuj on LinkedIn at https://www.linkedin.com/in/anujtuli/
