How to Stop Playing the Wi-Fi Blame Game


Automated analysis of client transaction behavior across the network reveals that poor device performance isn’t always a Wi-Fi problem.

The Wi-Fi blame game is in full force. As enterprise access networks are littered with wireless-connected IoT devices, smartphones and cloud-based applications, finding and fixing problems that impact device performance and security has never been harder.

Wi-Fi is unjustly charged as the culprit for almost every user network problem. But as new, full-stack infrastructure analytics tools emerge, IT staff are proving that it’s not always Wi-Fi’s fault. These systems automatically analyze and correlate every client transaction on the network to pinpoint the precise cause of device performance and security problems.

Wi-Fi is generally more complex and difficult to manage than other parts of the network. Delivering pervasive coverage with strong, consistent RF signals that yield higher data rates requires visibility into the invisible. Unlike the wired network, Wi-Fi must deal with constantly moving devices and an ever-changing RF environment, contending with interference, attenuation, obstacles and other amorphous attributes of the unlicensed band.

However, what’s often overlooked is that wireless depends on the wired network, on IP network services such as ARP, DNS, DHCP and AAA, and on application health and solid WAN connectivity to deliver the best possible user performance. If anything along this client network journey goes wrong, it’s up to IT staff to prove that it isn’t always a Wi-Fi problem.

Today that’s easier said than done.

Manual Network Data Analysis Has Become Problematic

Identifying the root cause of any individual device or systemic network incident often involves analyzing the behavior of client transactions across the entire network. This means IT staff, often different people responsible for different parts of the network, must constantly monitor the health of network transactions across every layer of the infrastructure. This is typically a manual process that requires examining disparate data gathered from logs, packet captures, and vendor management tools. And as network traffic explodes, this has become a daunting task for IT professionals – resulting in “tool overload”.

While a myriad of vendor monitoring systems exist, they typically examine only a single infrastructure element or portion of the network. Because network connections require interdependencies between different layers, connection context is often lost or non-existent. In turn, IT staff must correlate transaction behavior of clients across the network to pinpoint where a problem is hiding and the impact of the problem across tens or even hundreds of thousands of end devices.

A recent analysis of an IoT problem on a major enterprise network illustrates the point.

A Tale of Misconnecting IoT Robots

A global manufacturer of plastics and other engineered products had purchased and deployed autonomous guided vehicles (AGVs), so-called IoT robots, within its massive warehouse operations. Wi-Fi connected autonomous guided vehicles are one of the newest IoT systems to find their way onto enterprise access networks. This is not a small IT environment. Over 1,500 access points across some 30 VLANs service a network with more than 14,000 clients connecting at any given time.

The newly deployed AGVs used Wi-Fi to connect to the network but would randomly stop working. For more than 6 months, the IT organization saw these systems fall off the Wi-Fi network. And these are not inexpensive devices.

The AGVs are designed to constantly check in with their command-and-control server multiple times a second. If a robot misses a certain number of “check-ins”, it freezes on the spot. Resetting the IoT robot then required manual intervention on the warehouse floor. The device behavior was clearly abnormal. But where was the problem hiding?
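The freeze-after-missed-check-ins behavior can be pictured with a minimal sketch. The vendor's actual logic is not described here, so the class name, the 250 ms interval and the threshold of four misses are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckInWatchdog:
    """Illustrative only: models one AGV that freezes after too many missed check-ins."""
    interval_s: float = 0.25   # assumed check-in period (not from the vendor)
    max_missed: int = 4        # assumed number of misses tolerated before freezing
    last_seen_s: float = 0.0
    frozen: bool = False

    def check_in(self, now_s: float) -> None:
        # A successful check-in refreshes the timer, unless the robot is already frozen.
        if not self.frozen:
            self.last_seen_s = now_s

    def missed(self, now_s: float) -> int:
        # Number of whole check-in intervals that have elapsed without contact.
        return int((now_s - self.last_seen_s) // self.interval_s)

    def poll(self, now_s: float) -> bool:
        # Freeze once the miss threshold is reached; stay frozen until reset.
        if not self.frozen and self.missed(now_s) >= self.max_missed:
            self.frozen = True
        return self.frozen

    def reset(self, now_s: float) -> None:
        # Stands in for the manual intervention on the warehouse floor.
        self.frozen = False
        self.last_seen_s = now_s
```

With these assumed values, any network stall of about one second is enough to freeze the robot, whichever layer of the network caused the stall.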

It’s a Wi-Fi problem. Or is it?

In their frustration, the network team approached the AGV vendor for assistance. The response was the typical one: “It’s the Wi-Fi network.” Wrong answer, but quantitative proof was needed.

The wireless network had recently been upgraded and was functioning flawlessly with the exception of a particular SSID to which these IoT robots were connected. And because these IoT devices were directly tied to line of business goals, time was of the essence. The traditional means of triaging the problem would simply take too long, negatively impacting revenue and productivity.

To identify the root cause of the problem, network engineers needed to analyze in real time each network transaction that the robots were performing, and to understand the relationship between those transactions, to get to the heart of the problem. This could take weeks of scouring through logs, raw packets and other data from a variety of disparate vendor tools. Instead, IT engineers turned to an AI-based analytics platform that was divorced from any infrastructure vendor.

The software-only platform could be quickly deployed to ingest and automate the analysis of packet data spanned from switches, wireless metrics from WLAN controllers, syslog data from DNS, DHCP and other network services, WAN flows and application response times. By doing this, IT engineers identified the issue in short order. And guess what? It wasn’t a Wi-Fi issue.
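To make the idea of correlation concrete, here is a minimal sketch that merges time-sorted events from several sources into one timeline per client. It is not the platform's implementation. The tuple layout, the source names and the function name `client_timelines` are assumptions for illustration.

```python
import heapq
from collections import defaultdict


def client_timelines(*sources):
    """Merge time-sorted event lists into one ordered timeline per client.

    Each event is a hypothetical tuple: (timestamp_s, client_mac, source, detail).
    Every input list must already be sorted by timestamp.
    """
    timelines = defaultdict(list)
    for ts, mac, source, detail in heapq.merge(*sources, key=lambda e: e[0]):
        timelines[mac].append((ts, source, detail))
    return dict(timelines)


# Made-up sample events from three separate tools.
wlan = [(10.0, "aa:bb", "wlan", "associated")]
dhcp = [(10.1, "aa:bb", "dhcp", "DISCOVER"), (11.4, "aa:bb", "dhcp", "ACK")]
app = [(10.5, "cc:dd", "app", "slow response")]

for ts, source, detail in client_timelines(wlan, dhcp, app)["aa:bb"]:
    print(ts, source, detail)
```

Seen side by side, the sample events for one client show a clean association followed by a slow address assignment, which is the kind of context that is lost when each tool is read in isolation.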

An excessive number of DHCP problems were occurring on the SSID to which the AGVs were connected. A closer look revealed that a DHCP snooping setting on a core router was holding the DHCP packets for too long before forwarding to the appropriate destination.
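A rough sketch of how slow DHCP exchanges might be surfaced per SSID follows. The record layout, the sample values, the 0.5-second threshold and the function names are assumptions for illustration, not data from this incident.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (ssid, transaction_id, dhcp_message_type, timestamp_s)
events = [
    ("corp", 1, "DISCOVER", 0.000), ("corp", 1, "ACK", 0.040),
    ("corp", 2, "DISCOVER", 1.000), ("corp", 2, "ACK", 1.060),
    ("agv", 3, "DISCOVER", 2.000), ("agv", 3, "ACK", 3.200),
    ("agv", 4, "DISCOVER", 5.000), ("agv", 4, "ACK", 6.800),
    ("agv", 5, "DISCOVER", 7.000),  # never completed, so it is ignored below
]


def dhcp_latency_by_ssid(events):
    """Return the median DISCOVER-to-ACK time in seconds for each SSID."""
    started = {}
    latencies = defaultdict(list)
    for ssid, xid, msg, ts in sorted(events, key=lambda e: e[3]):
        key = (ssid, xid)
        if msg == "DISCOVER":
            started.setdefault(key, ts)   # keep the first DISCOVER of a transaction
        elif msg == "ACK" and key in started:
            latencies[ssid].append(ts - started.pop(key))
    return {ssid: median(values) for ssid, values in latencies.items()}


def slow_ssids(events, threshold_s=0.5):
    """List SSIDs whose median DHCP completion time exceeds the assumed threshold."""
    by_ssid = dhcp_latency_by_ssid(events)
    return sorted(ssid for ssid, m in by_ssid.items() if m > threshold_s)


print(slow_ssids(events))
```

Grouping by SSID is the design choice that matters here: a network-wide average would bury a problem that affects only one SSID.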

DHCP snooping is a layer 2 security technology built into the operating system of a capable network switch that drops DHCP traffic determined to be unacceptable. The fundamental use case for DHCP snooping is to prevent unauthorized (rogue) DHCP servers offering IP addresses to DHCP clients.

DHCP snooping filters DHCP messages that arrive from untrusted sources. In an enterprise network, devices under administrative control are considered trusted sources. These devices include the switches, routers and servers in the network.
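The core filtering rule can be sketched in a few lines. This is a simplified model rather than any switch vendor's implementation, and the port names and the function name `snooping_allows` are hypothetical.

```python
# Message types that only a DHCP server should send.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}


def snooping_allows(msg_type, ingress_port, trusted_ports):
    """Simplified DHCP snooping rule.

    Server-originated messages are dropped unless they arrive on a trusted port.
    Client messages such as DISCOVER and REQUEST are allowed in this sketch.
    """
    if msg_type in SERVER_MESSAGES and ingress_port not in trusted_ports:
        return False
    return True


trusted = {"uplink-1"}  # hypothetical port facing the legitimate DHCP server
print(snooping_allows("OFFER", "access-7", trusted))     # rogue server on an access port
print(snooping_allows("OFFER", "uplink-1", trusted))     # legitimate server
print(snooping_allows("DISCOVER", "access-7", trusted))  # ordinary client request
```

Real implementations perform additional checks, and every DHCP packet must be inspected before it is forwarded, which is why a snooping setting can add latency to the exchange.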

Once a DHCP snooping setting was changed on the core routers, affected devices saw performance improve by more than 36% as DHCP protocol latency fell (see figure below). Consequently, the intermittent connectivity problems of the AGVs disappeared.

This was the precise data that the IT staff needed to prove that this incident wasn’t a Wi-Fi issue. The screenshot shows device performance before and after the DHCP setting was removed from the core router.

Automating Network Data Analysis No Longer Optional

While Wi-Fi continues to be a pain for IT staff, it’s simply not the source of every network problem. Siloed infrastructure management solutions were never designed to analyze and compare the sheer volume of data running across today’s enterprise infrastructure to solve these sorts of issues.

Moving towards intelligent, purpose-built analytics platforms is an essential first step. This helps organizations to reduce the cost and glut of vendor tools they must buy, learn and use to do their jobs.

With these new AI-based platforms, IT can now automate the analysis and correlation of huge volumes of traffic, getting faster answers to complex network questions along with the context on performance and security problems that can cut remediation times in half.