We’ve been practicing agile for twenty years since the manifesto was published
and transitioning to DevOps cultures for about ten of them. And IT Ops
predates these best practices going way back to the days of mainframes in
pre-web data centers.
So why aren’t we aligned yet?
IT is hard – really hard. We’re trying to improve customer experiences,
develop digital products, automate workflows, become more data-driven, and
bring AI, IoT, and other emerging technologies to production. At the same
time, business leaders expect near-100 percent reliable systems, the bad guys
are creating tougher security issues, and tracking all the dependencies across
hybrid clouds, microservices, and integrations isn’t getting any easier.
In StarCIO’s latest benchmark report
on how AIOps is the operating platform for digital transformation, it wasn’t
surprising that the top IT Ops KPIs were decreasing incidents’ mean time to
recovery (MTTR) and improving the availability, uptime, and performance of
business services. Seeing these KPIs at the top wasn’t shocking because IT
departments, whether viewed through an operational, DevOps, or agile lens,
must adopt a first-things-first attitude and provide reliable systems before
prioritizing, measuring, and improving other performance indicators.
But after those two KPIs, the report captures respondents’ top three metrics
from a list of ten with no clear consensus on which ones are most important
for IT organizations driving digital transformation. That list includes
ops-centric KPIs such as reducing the number of bridge calls, dev ones such as
reducing defect escape rates, and DevOps priorities such as increasing
The lack of consensus isn’t surprising, as I advise IT leaders to pick KPIs
most aligned to the business drivers and IT execution gaps. So, for example,
if you have many end-users reporting defects after application releases, then
measuring and improving defect escape rates might be a good choice.
That being said, I want to share three of the KPIs that I believe help drive
an aligned IT department focused on delivering business outcomes. CIOs want
customer centricity, speed, and innovation from agile methodologies, DevOps’
automation and quality, and IT Ops’ reliability and performance. Can we have
our cake and eat it too?
Here are my three:
1. All IT Departments Should Measure Customer Satisfaction (CSat)
Focusing IT starts with becoming more customer-centric, whether you are
developing applications, testing them, or in the NOC improving their
reliability. There’s a message from customers, stakeholders, and leaders about
their pain points and opportunities for improvement in every low score.
Coming up with a measurement process isn’t trivial as this CSat score must
have a larger scope than measuring end-user satisfaction with the request they
put into the ITSM ticketing system. Do stakeholders have high regard for IT’s
services? Is IT improving their experience and efficiency in getting their
Figuring out how to translate these problem statements into prioritized
improvements is still challenging, which is one way AIOps can help. Having
centralized event data across the full stack and using it as an
open operational hub
provides the lens leading to data-driven decisions on where to make technology
2. Address the Root Causes of High Change Failure Rates
Speed without guard rails and safety can lead to disastrous crashes – but
stagnation and creating bureaucracy-driven change processes that slow the
delivery of innovation, new capabilities, and improvements can lead to
Whether you are agile, DevOps, or IT Ops-centric, we’re all trying to deliver
positive business outcomes through
transformation management. And change failure rate is the first indicative KPI of how well IT
performs in delivering business outcomes. When change failure rates are high,
IT has to slow down and fix things, while business stakeholders lose trust in
IT. And that’s just the start of impacts because change failures can lead to
outages, security issues, and other major incidents.
A measurement is only as good as its ability to lead to action. Using an
AIOps platform to improve root cause analysis
by correlating incidents to the changes that caused them is a best practice
for identifying systemic causes and helping reduce change failure rates.
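
To make the idea of correlating incidents to changes concrete, here is a minimal sketch in Python. It is not how any particular AIOps platform works. It assumes a simple, hypothetical rule: a change counts as failed if an incident starts on the same service within 24 hours of the deployment. The `Change` and `Incident` records, the field names, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    change_id: str
    service: str
    deployed_at: datetime


@dataclass
class Incident:
    incident_id: str
    service: str
    started_at: datetime


def failed_changes(changes, incidents, window=timedelta(hours=24)):
    """Return IDs of changes followed by an incident on the same service.

    Assumption: "same service, within `window` after deployment" is only a
    stand-in for real root cause correlation.
    """
    failed = set()
    for change in changes:
        for incident in incidents:
            delay = incident.started_at - change.deployed_at
            if incident.service == change.service and timedelta(0) <= delay <= window:
                failed.add(change.change_id)
                break
    return failed


def change_failure_rate(changes, incidents, window=timedelta(hours=24)):
    """Share of changes that were followed by a correlated incident."""
    if not changes:
        return 0.0
    return len(failed_changes(changes, incidents, window)) / len(changes)


# Hypothetical sample data: four changes, two incidents.
changes = [
    Change("CHG-1", "checkout", datetime(2024, 1, 8, 9, 0)),
    Change("CHG-2", "search", datetime(2024, 1, 8, 10, 0)),
    Change("CHG-3", "checkout", datetime(2024, 1, 10, 9, 0)),
    Change("CHG-4", "billing", datetime(2024, 1, 11, 9, 0)),
]
incidents = [
    Incident("INC-1", "checkout", datetime(2024, 1, 8, 11, 30)),  # 2.5 hours after CHG-1
    Incident("INC-2", "search", datetime(2024, 1, 12, 10, 0)),    # 4 days after CHG-2
]

print(failed_changes(changes, incidents))       # {'CHG-1'}
print(change_failure_rate(changes, incidents))  # 0.25
```

Grouping the failed changes by service, team, or change type is one way to look for the systemic causes behind the rate rather than just the number itself.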
3. Reduce Time to Resolve Incidents, Escalations, and Other Operational
The third leg of the alignment stool comes from freeing up more time for
everyone working in IT to focus on solving problems, releasing improvements,
delivering innovations, and achieving transformational business outcomes.
What’s holding us back? In many cases, it’s the distractions to our time,
focus, and energy – sucked away when we respond to incidents of all priority
levels, perform root cause analysis, or fix issues that we potentially could
Here’s my formula:
D(t) = Sum(X*It + Y*Lt + Z*Bt)
- D(t) is the total time distracted by the IT department from
- It is the total time applied to resolve all incidents
- Lt represents level one (L1) escalation times and the time applied by people
outside of the NOC and incident management teams to resolve incidents and
address problem root causes
- Bt represents the time spent on bridge calls and war rooms
- X, Y, and Z are weighting factors
Leaders can include other factors such as time to complete administrative
tasks and other similar distractions.
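
As a worked example, the formula can be sketched in a few lines of Python. The post does not say what the sum runs over, so this sketch assumes it is a sum across reporting periods such as weeks. The `PeriodHours` record, the hour figures, and the weights are hypothetical values chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class PeriodHours:
    incident_hours: float    # It: time applied to resolve incidents
    escalation_hours: float  # Lt: L1 escalations and time from people outside the NOC
    bridge_hours: float      # Bt: time on bridge calls and war rooms


def distraction_time(periods, x=1.0, y=1.0, z=1.0):
    """D(t) = Sum(X*It + Y*Lt + Z*Bt), summed over the given periods (an assumption)."""
    return sum(
        x * p.incident_hours + y * p.escalation_hours + z * p.bridge_hours
        for p in periods
    )


# Hypothetical data for two weeks, in hours.
weeks = [
    PeriodHours(incident_hours=40.0, escalation_hours=12.0, bridge_hours=6.0),
    PeriodHours(incident_hours=30.0, escalation_hours=8.0, bridge_hours=4.0),
]

# Example weights: escalations count double and bridge calls triple, on the
# assumption that they pull in more people per hour.
total = distraction_time(weeks, x=1.0, y=2.0, z=3.0)
print(total)  # 140.0  (week 1: 40 + 24 + 18 = 82, week 2: 30 + 16 + 12 = 58)
```

Tracking this number period over period matters more than its absolute value, since the weights are a judgment call for each organization.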
Reduce D(t), and you’re likely to improve CSat by reducing the number of
issues, decreasing MTTR, and increasing time applied to delivering business
So, how does one reduce all the distractions when we’re
implementing digital transformation
and increasing our delivery velocities?
Like many business areas, it comes down to having centralized and enriched
operational data to improve decision-making and leveraging machine learning
and automation to drive operational improvements. In IT, that’s the primary
AIOps platforms and why I
believe AIOps is the digital operations platform.
There’s more to read and learn in our
AIOps benchmark report.
This post is brought to you by BigPanda.
The views and opinions expressed herein are those of the author and do not
necessarily represent the views and opinions of BigPanda.