Truth and Metrics: A Perfect Pair for Tracking App Changes


Engineers want to know exactly how feature changes affect their organization, so gathering as much feedback and data as possible is key. Measurement that combines feature changes with performance metrics can help democratize data, so that every stakeholder throughout the organization has access to the same information about the feature change and its impact.

Sometimes, we assume that development teams focus on changes to features and functionality in the user interface because conversion rates and user engagement are normally tracked as the most critical business metrics. But server-side innovations, such as machine learning or performance improvements, can affect the user’s experience as well. In both cases, the goal remains to provide a stable user experience both before and after any change while tracking its impact.

When any of these changes happen, it is natural to want to know their impact. When were changes made? Who was exposed to the changes? Did the last change decrease the application’s performance? Did it make error rates increase? Did the change perhaps have a negative effect on other applications?

When these changes are rolled out progressively with feature flags (an increasingly common practice), the most reliable way to understand how the changes affected an application is to segment measurements by feature flag, comparing the users who received a change with those who did not. That may sound like common sense, but it’s not yet common practice. Why? Manually filtering or grouping metrics to align with feature rollouts (which are, by design, often changing) takes too much time. Feeling they don’t have the time to examine and measure what’s happening, teams are left to rely on assumptions, which is risky and often far from the truth.
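To make the idea of measurement that follows the feature flag concrete, here is a minimal sketch in Python. The exposure map, the event tuples, and the function name `metrics_by_treatment` are all hypothetical and invented for illustration; a real system would pull this data from its flag service and its telemetry pipeline.

```python
from collections import defaultdict

# Hypothetical exposure log: which treatment of the flag each user received.
exposures = {"u1": "on", "u2": "off", "u3": "on", "u4": "off"}

# Hypothetical request events: (user_id, latency_ms, is_error).
events = [
    ("u1", 120, False),
    ("u1", 340, True),
    ("u2", 110, False),
    ("u3", 150, False),
    ("u4", 105, False),
    ("u4", 115, False),
]


def metrics_by_treatment(exposures, events):
    """Group request metrics by the flag treatment each user was served."""
    buckets = defaultdict(lambda: {"requests": 0, "errors": 0, "latency_total": 0})
    for user_id, latency_ms, is_error in events:
        treatment = exposures.get(user_id)
        if treatment is None:
            continue  # user never evaluated the flag; leave out of the comparison
        bucket = buckets[treatment]
        bucket["requests"] += 1
        bucket["errors"] += int(is_error)
        bucket["latency_total"] += latency_ms
    return {
        treatment: {
            "requests": b["requests"],
            "error_rate": b["errors"] / b["requests"],
            "mean_latency_ms": b["latency_total"] / b["requests"],
        }
        for treatment, b in buckets.items()
    }


summary = metrics_by_treatment(exposures, events)
for treatment, stats in sorted(summary.items()):
    print(treatment, stats)
```

Because the grouping key comes from the exposure log rather than from a hand-built dashboard filter, the comparison keeps up with the rollout as the exposed audience changes.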

Feature flags used in conjunction with segmented experimentation make it possible to track the impact of each change individually. When developers can see the impact of any kind of application change on business KPIs, that is true experimentation.

If you’re not already measuring backend changes this way, there are some important things to consider:

Two Steps Forward with One Click to Roll Back

Feature flags empower dev teams to perform testing during phased rollouts. Rolling out changes slowly, to small groups of users, means less risk and less chance of a negative user experience. But taking this approach also means you won’t always see much change in metrics. Everything will appear fine on your application performance management (APM) dashboard, so you’ll increase exposure to more users. By the time a large enough swath of the user base is exposed to a feature to see movement in the APM dashboard, too many users could also have been exposed to the problem, which essentially defeats the purpose of a phased rollout.
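A little arithmetic shows why the whole-population dashboard stays quiet. The sketch below assumes, purely for illustration, a baseline error rate of 1% and a change that doubles it to 2% for exposed users; the function name `aggregate_error_rate` is likewise hypothetical.

```python
def aggregate_error_rate(baseline_rate, treated_rate, exposed_fraction):
    """Blended error rate that a whole-population dashboard would show."""
    return (1 - exposed_fraction) * baseline_rate + exposed_fraction * treated_rate


baseline = 0.010  # assumed error rate without the change
treated = 0.020   # assumed error rate with the change (doubled)

for exposed in (0.01, 0.05, 0.25, 0.50):
    blended = aggregate_error_rate(baseline, treated, exposed)
    print(
        f"{exposed:>4.0%} exposed: dashboard shows {blended:.3%}, "
        f"exposed users see {treated:.1%}"
    )
```

Under these assumptions, with 5% of users exposed the blended rate moves only from 1.00% to 1.05%, which is easy to mistake for noise, even though every exposed user is seeing twice as many errors.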

It makes sense, then, to measure with feature changes in mind. Developers should evaluate application metrics at every phase of the rollout to see how the change is performing for the specific group exposed and, if necessary, to stop any issues before they affect the rest of the user base.
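One way to evaluate each phase is a simple guardrail that compares the exposed group with everyone else before exposure is increased. The following is a sketch only: it uses a one-sided two-proportion z-test, and the function name, the returned labels, and the threshold of 2.0 are assumptions chosen for illustration, not a recommendation from any particular tool.

```python
import math


def error_rate_guardrail(control_errors, control_total,
                         treated_errors, treated_total,
                         z_threshold=2.0):
    """Return 'rollback', 'hold', or 'expand' for one rollout phase.

    One-sided two-proportion z-test: is the treated error rate higher
    than the control rate by more than noise would plausibly explain?
    """
    if control_total == 0 or treated_total == 0:
        return "hold"  # one group has no data yet, so nothing to compare
    p_control = control_errors / control_total
    p_treated = treated_errors / treated_total
    pooled = (control_errors + treated_errors) / (control_total + treated_total)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_total + 1 / treated_total)
    )
    if std_err == 0:
        return "expand"  # both groups have identical rates; no difference to act on
    z = (p_treated - p_control) / std_err
    return "rollback" if z > z_threshold else "expand"


# Hypothetical phase: 500 exposed requests with 20 errors,
# compared against 10,000 control requests with 100 errors.
decision = error_rate_guardrail(100, 10_000, 20, 500)
print(decision)
```

The same check can be repeated at each exposure step, so a problem is caught while only the current group is affected.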

Uncovering Problems the Hard Way

Sometimes, the desired result of a change is simply to ensure that KPIs are not affected: that the change doesn’t increase error rates or make pages load more slowly, for example. But care is needed, especially if changes are continuously deployed without measuring KPIs, because performance can quite possibly slip by a few percentage points without anyone noticing.

The fact is, when there’s a performance problem and no clear indication of which specific bottleneck is the culprit, the application dies a slow and painful death. Using measurement to discover these small performance bottlenecks may not stop the release of more changes, but it will help a team identify and escalate issues earlier and with more precision.

For example, an online bank used a phased rollout for new features. Even with only a small group of users exposed, the bank saw many more database calls, which could affect performance. Because they kept a close eye on measurements like these, the bank was able to quickly address the problem with a cache as it scaled up, before the problem became a reason to roll back.

Finding the Unexpected Advantage

It’s important to look at metrics for business KPIs and other projects within the context of specific features or server-side changes. Not only can you identify accidental or unintended results, but you could also uncover surprising benefits or advantages.

An illustration of this is an e-commerce company that uses company revenue, a business KPI, as an indication of positive or negative effects coming from an application update. That’s because the company knows that application features can affect user experience, which can have a direct impact on the company’s top line. This awareness extends to company executives, who review results if revenue jumps by a certain percentage. No pressure there.

Evaluations After the Fact

While it’s beneficial to review engineering changes during a phased rollout, sometimes a postmortem is also helpful. In those cases, the team can look at the full impact of an application change, review how long it lasted, and see what steps were taken to address any issues. Elite operational teams working on complex systems will want to monitor metrics tied to features over time, across different areas of the world, different accounts, and so forth. Even development teams responsible for their own code may need to continue monitoring feature stats after the features are released. This is precisely why it’s critical for companies to keep a central log of changes, including information about exactly what was released, when it was released, and to which users. That log becomes the single source of truth around each change.
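As a rough sketch of what such a central change log might hold, the example below records what was released, when, and to which users, and answers the postmortem question "what was live at the time of the incident?" The class names, field names, flag name, and dates are all hypothetical, chosen only to mirror the three pieces of information described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    flag: str              # what was released
    changed_at: datetime   # when it was released
    audience: str          # to which users (segment or percentage)
    changed_by: str = ""


@dataclass
class ChangeLog:
    records: list = field(default_factory=list)

    def record(self, entry):
        self.records.append(entry)

    def active_at(self, flag, moment):
        """Most recent change to `flag` at or before `moment`, or None."""
        candidates = [
            r for r in self.records
            if r.flag == flag and r.changed_at <= moment
        ]
        return max(candidates, key=lambda r: r.changed_at, default=None)


log = ChangeLog()
log.record(ChangeRecord(
    "new-checkout", datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc),
    "5% of users", "alice"))
log.record(ChangeRecord(
    "new-checkout", datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc),
    "25% of users", "alice"))

incident = datetime(2024, 1, 9, 14, 30, tzinfo=timezone.utc)
state = log.active_at("new-checkout", incident)
print(state.audience if state else "flag not yet released")
```

Keeping the records immutable and timestamped means the log can be replayed later to line metrics up against each rollout step.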