Empowering Website Management Teams Through Efficient Performance Monitoring
By Mehdi Daoudi, Catchpoint
One of the overarching challenges for any website management team is delivering exceptional Web experiences to their end users in the face of the vast complexity that is the modern Internet. Between expansive infrastructure needs, reliance on multiple technology providers, an ever-increasing array of third-party tags, and the different challenges presented by various network limitations around the globe, ensuring consistently superior Web performance is a never-ending uphill battle.
Fortunately, there are plenty of strategies that organizations can use to combat these obstacles, helping to promote strong end user experiences while also respecting the time of website management teams and empowering them with precisely the right information to do their job:
1) Test at all hours of the day
Obviously, preventing end-users from experiencing performance issues on their Web page requires organizations to catch these issues early. To do that, round-the-clock testing is necessary.
This is what synthetic (or active) monitoring accomplishes: it tests the site’s infrastructure from a controlled environment, simulating transaction paths to determine how quickly end users are able to load a page, and alerts website IT operations teams about any problems that are detected within their own infrastructure.
However, to gain the truest possible picture of site performance, the advance warnings offered by synthetic monitoring should also be supplemented with real user measurement (RUM) data. By collecting data from real end-user site visits, website operators can learn what those users are doing once they enter the site, as well as the speed of critical conversion paths. Even more importantly, website management teams can glean what impact the site’s speed has on user engagement and conversions. After all, there’s no point in having a fast homepage if the experience slows way down once end users get into the site.
One drawback of RUM, however, is that it does not provide insight into when downtime happens, or to whom it happens. Downtime is also no longer just about outages that impact all end-users -- due to problems with isolated systems, it can affect only certain users in the form of micro (or partial) outages. Hence the need for a joint strategy of synthetic monitoring and RUM.
2) Focus on what matters
Given the complexity involved in rendering modern websites to end-users, it becomes necessary to implement a comprehensive monitoring strategy covering all critical end-user geographies, and for all of the different systems that make up a site. However, the old adage about “too much of a good thing” also comes into play here, because while organizations want to keep an eye on all systems which can ultimately impact their web performance, the vast array of elements and services can also lead to a deluge of data that leaves website management teams struggling to keep their heads above water. When it comes to pinpointing and fixing a specific problem with a website, it’s akin to trying to find a needle in a haystack, if there are mountains of data to sift through.
To overcome this dilemma, it’s important to focus on items that are actionable. If there’s nothing that can be done about an issue (e.g. mobile network degradations in a particular geography), then it’s probably best not to wake people up with alerts about it, but rather just keep an eye on the issue or have quick access to it in case user complaints do arise.
Even more important, however, is to use a monitoring tool that provides extensive analytics and charting capabilities which can isolate the different systems that make up a website. In other words, organizations can see directly how their overall site performance fares, both with and without a third-party service. This helps them determine if having that service is worth any possible performance degradation. In addition, organizations can see the impact of different optimization techniques overall on a site, through the establishment of baselines and compilation of historical trends.
To aid this effort, it’s important to be able to store data over the long term, rather than for just a few months. By saving data for 2-3 years, website management teams can compare trends from a month-over-month or even a year-over-year perspective. This not only helps these teams identify problems that might occur only once or twice annually, but also understand and prepare for seasonal impacts on their web traffic.
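As a rough illustration of the year-over-year comparison described above, the following Python sketch computes the percentage change in median page load time for the same calendar month in two consecutive years. The function name, the tuple layout of the samples, and the choice of the median as the summary statistic are all assumptions made for this example; they do not describe any particular monitoring product.

```python
from statistics import median


def year_over_year_change(samples, month, year):
    """Percentage change in median load time versus the same month a year earlier.

    `samples` is a hypothetical list of (year, month, load_time_ms) tuples,
    standing in for whatever long-term store a monitoring tool provides.
    Returns None when either period has no data to compare.
    """
    current = [ms for (y, m, ms) in samples if y == year and m == month]
    previous = [ms for (y, m, ms) in samples if y == year - 1 and m == month]
    if not current or not previous:
        return None
    previous_median = median(previous)
    return (median(current) - previous_median) / previous_median * 100.0


# Illustrative, made-up measurements for two Novembers.
history = [
    (2013, 11, 2000), (2013, 11, 2200), (2013, 11, 2400),
    (2014, 11, 2500), (2014, 11, 2750), (2014, 11, 3000),
]
change = year_over_year_change(history, month=11, year=2014)
```

A comparison like this is only possible when the earlier year's data is still available, which is the practical argument for retaining measurements for multiple years.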
3) Screen your guests
A performance outage caused by a failure within one’s own infrastructure is one thing. But a performance issue caused by third-party technology vendors or third-party tags is completely different. Knowing that it was avoidable makes the ramifications that much harder to stomach.
With the number of third-party tags exploding across the Web, as well as more and more specialized technology vendors like DNS management, CDNs, site accelerators, etc., website management teams are under increasing stress to monitor their third parties in order to ensure that they don’t negatively impact the end users’ experience. Much of this stress can be alleviated by putting these vendors and tags through a thorough vetting process before incorporating them into a site. Website operators must set clear baselines that every third-party provider must meet before they can be put on the page, and ensure they maintain those minimum standards with clear-cut SLAs once they’re there.
By monitoring both first- and third-party systems, organizations can better understand overall performance before and after a tag is placed on their site, which better enables them to work with third-party providers to optimize performance. Furthermore, establishing a contingency plan for failures can allow website management teams to quickly remove a malfunctioning tag until the matter is resolved.
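The before-and-after comparison described above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the function name, the use of median load times, and the idea of a fixed "added milliseconds" budget are all hypothetical choices for illustration, not a prescribed vetting standard.

```python
from statistics import median


def vet_third_party_tag(baseline_ms, with_tag_ms, max_added_ms):
    """Check a third-party tag against a performance budget.

    `baseline_ms` and `with_tag_ms` are hypothetical lists of page load
    times (in milliseconds) measured without and with the tag in place.
    Returns (passes, added_ms), where added_ms is the increase in median
    load time attributed to the tag.
    """
    added_ms = median(with_tag_ms) - median(baseline_ms)
    return added_ms <= max_added_ms, added_ms


# Illustrative, made-up measurements.
without_tag = [1000, 1100, 1200]
with_tag = [1300, 1400, 1500]

passes, added = vet_third_party_tag(without_tag, with_tag, max_added_ms=200)
```

Running the same check on an ongoing basis, rather than only at vetting time, gives a simple way to confirm that a provider continues to meet the agreed standard.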
4) Reduce false alarms
Is there anything more frustrating than getting woken up in the middle of the night for an urgent matter that eventually is revealed to be a false alarm? It’s one of the biggest complaints that website management teams have, and such alarms often come as a result of problems with the network rather than the website itself, or with a third party as opposed to a first party. What’s needed is an alerting system that can differentiate between these cases and verified failed tests before raising the alarm.
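One possible way to express that kind of filtering is sketched below in Python. It assumes, purely for illustration, that each test result is already labeled with the testing node that produced it and with a failure scope ("first_party", "third_party" or "network"), and that a failure counts as verified only when at least two distinct nodes report it. The class, field names and threshold are hypothetical and are not taken from any specific alerting system.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    node: str                # hypothetical name of the testing location
    ok: bool                 # True if the test passed
    failure_scope: str = ""  # assumed labels: "first_party", "third_party", "network"


def should_alert(results, min_failing_nodes=2):
    """Raise the alarm only for first-party failures seen from several nodes.

    Failures attributed to the network or to a third party are ignored here,
    as is a failure reported by a single node, since either is more likely
    to be a false alarm than a verified problem with the site itself.
    """
    failing_nodes = {
        r.node for r in results
        if not r.ok and r.failure_scope == "first_party"
    }
    return len(failing_nodes) >= min_failing_nodes


# Illustrative, made-up results.
network_blip = [
    CheckResult("new-york", ok=False, failure_scope="network"),
    CheckResult("london", ok=True),
]
verified_failure = [
    CheckResult("new-york", ok=False, failure_scope="first_party"),
    CheckResult("london", ok=False, failure_scope="first_party"),
]
```

In this sketch `should_alert(network_blip)` is false and `should_alert(verified_failure)` is true. Suppressed failures could still be logged so they remain available if user complaints do arise.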
The end goal of all of this is to enable website management teams to give their sites’ end users the best Web experience possible, while also narrowly focusing the time and effort spent on performance optimization. By maintaining a performance monitoring initiative in which testing is done around the clock, unnecessary data is filtered out, and plans are put in place to fix any problems that do arise with third-party site elements, website management teams will be able to do their jobs more efficiently, and ultimately save the company the most valuable resource of all: time.
Mehdi Daoudi is the co-founder and CEO of Catchpoint Systems, where he combines in-depth expertise in developing and operating large on-demand software platforms with hands-on experience in application monitoring, performance management and IT operations practices.