Jason Walker, CTO of IT systems management provider BigPanda, talks about how the COVID-19 crisis might be the catalyst needed to trigger an investment in IT Ops’ people, processes, and technology: one that should have been made years ago, and one that can provide the agility and scalability needed for the future.
The global pandemic has forced entire industries to shift to remote, distributed work. For example, employees who would typically log onto an on-premises network to handle secure information must now implement their own security protocols from their living rooms. Teams that used to work side by side in a network operations center (NOC), relying on a wall of dashboards for situational awareness, now run operations from their home offices. These aren’t the only IT concerns in our new work paradigm.
Operations personnel must be engaged 24/7 to monitor, triage, communicate, and manage incidents across all services as their organizations transition to a remote workforce. As IT teams work to accommodate the new reality, it’s becoming clear that existing tools weren’t built to handle this kind of workload. Corporate VPNs, SaaS applications, and legacy on-premises and homegrown tools and systems are all being stretched to meet new demands from the business. Some of these tools and systems will encounter performance problems or fail outright, leading to business disruption.
Here are some ways CIOs and their IT Operations teams can adapt to the immediate challenges this crisis is creating, evolve their organizations to survive in the short term, and better position themselves for the long term.
1: Mission First, but People Always
You can’t run your services, or your IT Ops, SRE, and DevOps functions, without the people who do the work. These people are under enormous pressure to keep your operations running smoothly and your remote teams operating efficiently, all while dealing with the impact of the crisis on their personal lives.
As the situation evolves, business leaders need to rise above the noise and point the way. This boils down to communicating two messages: “Keep calm, we are going to be alright” and “here is what we are going to do to improve our situation.” Both are necessary. Taking care of your people should be the main consideration for leadership, and helping teams remain focused while looking beyond the horizon can help with that task.
2: Focus on the Big Picture
All of your online services need to stay up and running, because when they aren’t, your business stops. Too often, service status visibility remains siloed within an IT Operations team, and only when issues become critical are parts of that awareness pulled or pushed throughout the organization. This is especially frustrating during a crisis, because IT Operations teams often have the best, and least-leveraged, situational awareness in a company. They field your services’ alerts and user reports of problems in real time, and know how to use that information to assess, prioritize, diagnose, and resolve incidents across networks, infrastructure, applications, and services.
This bird’s-eye view of the company gives IT Ops teams an understanding of the health of employee- and customer-facing services from which leadership and individual contributors alike can benefit. To mitigate the impacts of downtime quickly, and simultaneously plan for future challenges, business leaders should ensure that big-picture understanding pervades the entire organization. From high-level periodic reports, through service-status dashboards, to the very tactical live incident status page, leaders must work to create a shared awareness across teams, so they can make decisions and focus effort in a way that’s aligned to a common ground truth.
3: The Only Constant is Change, so Learn to Do it Well
IT environments have grown exponentially more complex over the years, introducing new layers of dependencies that aren’t well understood. At the same time, the combined change velocity across those layers has ramped up significantly, and managing those changes effectively across distributed teams is difficult. The work-from-home shift will affect teams differently: some teams will slow down while others speed up to respond to business demands. Regardless, teams will still need to ensure that application updates, database maintenance, server operating system updates, security fixes, and network configuration changes happen regularly.
Leaders must ensure collective awareness of these changes and their associated risk levels. They may want to establish a centralized change-awareness process or even build a change-information hub as a resource for distributed teams. Doing so can help teams deconflict their changes remotely and minimize the risk to the business. And when a service goes down as a result of a change, the information will be there to quickly identify the causal change and roll it back or mitigate its impact.
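The article doesn’t prescribe how a change-information hub should work. As a minimal sketch, assuming every team records its changes in one shared log with a service name, a timestamp, and a risk label, the lookup below narrows that log to candidates for the causal change behind an incident. `ChangeRecord`, `candidate_causal_changes`, the dependency map, and the six-hour lookback are all hypothetical illustrations, not any product’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    """One entry in a hypothetical shared change log."""
    service: str
    description: str
    applied_at: datetime
    risk: str  # free-form label, e.g. "low", "medium", "high"


def candidate_causal_changes(changes, incident_service, incident_time,
                             dependencies=None, lookback=timedelta(hours=6)):
    """Return changes that could plausibly have caused an incident.

    A change is a candidate if it touched the failing service or one of its
    direct dependencies (per the hypothetical `dependencies` map) and was
    applied within `lookback` before the incident. Newest changes come first,
    since they are usually the first rollback candidates to check.
    """
    dependencies = dependencies or {}
    in_scope = {incident_service, *dependencies.get(incident_service, ())}
    window_start = incident_time - lookback
    hits = [
        c for c in changes
        if c.service in in_scope and window_start <= c.applied_at <= incident_time
    ]
    return sorted(hits, key=lambda c: c.applied_at, reverse=True)


if __name__ == "__main__":
    # Illustrative data only; service names and times are made up.
    log = [
        ChangeRecord("vpn-gateway", "raise session limit", datetime(2020, 4, 1, 9, 0), "medium"),
        ChangeRecord("billing-db", "index rebuild", datetime(2020, 4, 1, 11, 30), "high"),
        ChangeRecord("billing-api", "deploy v2.3", datetime(2020, 4, 1, 12, 15), "low"),
        ChangeRecord("billing-api", "deploy v2.2", datetime(2020, 3, 30, 16, 0), "low"),
    ]
    deps = {"billing-api": ["billing-db"]}
    incident = datetime(2020, 4, 1, 12, 40)
    for c in candidate_causal_changes(log, "billing-api", incident, deps):
        print(c.applied_at, c.service, c.description, c.risk)
```

With the sample log, the billing-api deploy and the billing-db index rebuild come back as candidates; the VPN change is out of scope and the older deploy falls outside the lookback window. How useful the result is depends on how complete the shared change log and dependency map are.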
4: Use Metrics Proactively to Orient, Observe, Decide, and Act
It’s difficult right now to feel proactive. The sudden change to remote working and the subsequent actions to adapt have likely felt reactive and uncertain. The good news is your IT Operations team already has a number of metrics and KPIs it uses to report on processes, service performance, and even teams.
These metrics are usually fairly stable within a certain range under regular service conditions. But these aren’t regular conditions, so many of these metrics may be drifting away from their normal baselines. Changes in usage patterns might present business opportunities or cause capacity issues. Shifts in team activity and development velocity might indicate technical problems in adapting to remote work, or communication issues. Now is the time to start looking for outliers and trends that help define and quantify how service usage and team performance are changing. If CIOs can identify and act on those trends, they can help the business emerge from the crisis healthy, or even stronger than it was before.
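One simple way to start looking for outliers is to compare each new data point against a trailing baseline. The sketch below is a plain z-score check, one technique among many and not a method the article specifies; the function name `flag_outliers`, the 14-point baseline, the threshold of 3, and the VPN session figures are all hypothetical choices for illustration.

```python
import math
from statistics import mean, stdev


def flag_outliers(series, baseline_len=14, threshold=3.0):
    """Flag points that deviate sharply from a trailing baseline.

    For each point after the first `baseline_len` values, compare it with the
    mean and sample standard deviation of the `baseline_len` points before it
    (`baseline_len` must be at least 2). A point is flagged when its z-score
    exceeds `threshold` in absolute value.
    Returns a list of (index, value, z_score) tuples.
    """
    flagged = []
    for i in range(baseline_len, len(series)):
        window = series[i - baseline_len:i]
        mu, sigma = mean(window), stdev(window)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is treated as an outlier.
            if series[i] != mu:
                flagged.append((i, series[i], math.copysign(math.inf, series[i] - mu)))
            continue
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, series[i], z))
    return flagged


if __name__ == "__main__":
    # Hypothetical daily peak VPN sessions: two steady weeks, then a jump.
    sessions = [980, 1010, 995, 1005, 990, 1000, 1015,
                985, 1002, 998, 1008, 992, 1001, 999, 2500]
    for i, value, z in flag_outliers(sessions):
        print(f"day {i}: {value} sessions (z = {z:.1f})")
```

Because the baseline window rolls forward, a sustained shift gradually becomes part of the baseline and stops being flagged. That suits spotting sudden changes; quantifying a longer-term trend calls for a different comparison, such as against a fixed pre-crisis period.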
5: Consolidate Tooling
These days, many CIOs face mounting solution sprawl. It seems every process, from data visualization to ticketing to project management, has its own tool. In some cases, organizations purchase multiple solutions to serve the same need, which encourages silos and degrades communication and interoperability between teams. Supporting all of these solutions requires more time and resources, and makes it more difficult to maintain high availability across an ever-increasing surface area.
This crisis has caused organizations to reconsider how they operate, so it’s a good time to explore consolidating tooling. Examine the cost of each solution compared to the effectiveness it offers, and perform your due diligence to understand which of the industry’s offerings are truly best-in-class. If you can find places where redundant tooling is creating roadblocks or excessive costs, you can unify your teams on fewer tools, reduce your supported surface area, and reclaim some organizational agility for today and tomorrow.
6: Invest Intelligently
We know that legacy approaches and manual incident response processes aren’t equipped to handle the speed at which today’s IT environment moves, and today’s crisis is exposing gaps in many organizations’ IT Operations capabilities. If anything, the challenges business leaders are encountering should underscore how critical IT Operations is to providing actionable information to your teams while filtering out the noise. This crisis might be the catalyst needed to trigger an investment in IT Ops’ people, processes, and technology that should have been made years ago, and that can provide the agility and scalability needed for the future.