The Who, Why and How of Online Data

by Linc Wonham 05 Dec, 2011

Just about every business today has an online presence. Whether they are selling products, advertising, publishing content or using the Web as a brand-building and customer-engagement channel, companies want to get better at doing what they do online - and do it better than their competition.

For advertisers, this means increasing clickthrough rates and responses to online offers. Retailers want to increase sales and customer loyalty. Digital marketing companies that serve brands and content providers are seeking to demonstrate the return on investment their services provide, and so on.

This, of course, is where data analytics comes in. It's no secret that online activities generate lots of data. Mined properly, online data has the potential to offer a treasure trove of insight about how to drive up sales, optimize ad performance, leverage social networks and more. There are a number of ways that companies can approach online analysis, however, so it's important to think carefully about what questions you need to ask and what technologies are going to help you do the best job of answering them.

Aggregate reporting is the tip of the iceberg
There are a number of fee-based and free tools that focus specifically on Web analytics. Many of these solutions provide aggregate reporting on website visitor actions and "clicks." In short, they tend to focus solely on the "what," as in, What content is being viewed most often by site visitors? What ads were clicked on? What did traffic look like the day of the special offer?

While keeping track of everything that happens on a website is certainly useful, it provides a "page-based" view of online data. This can help analysts understand things like which pages are of greatest interest to the most people or identify overall traffic trends, but it doesn't tell much about who the visitors are, why they are there in the first place and how to most effectively engage with them. For example, Who are my most frequent visitors? Why do certain visitors browse but never buy? How can I predict what content will generate more ad responses?

Getting detailed insights into online behaviors such as these requires a more involved form of analysis.
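To make the difference between the two views concrete, here is a minimal Python sketch. The event rows, visitor IDs and page paths are all hypothetical, invented for illustration; a real clickstream would be far larger and would need sessionization and cleanup first.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream rows: (visitor_id, page, action)
events = [
    ("v1", "/home", "view"),
    ("v1", "/product/42", "view"),
    ("v1", "/checkout", "purchase"),
    ("v2", "/home", "view"),
    ("v2", "/product/42", "view"),
    ("v2", "/product/42", "view"),
    ("v2", "/pricing", "view"),
    ("v3", "/product/42", "view"),
]

# Page-based ("what") view: how many times each page was viewed.
page_views = Counter(page for _, page, action in events if action == "view")

# Behavior-centric ("who" and "why") view: group activity by visitor.
by_visitor = defaultdict(list)
for visitor, page, action in events:
    by_visitor[visitor].append((page, action))

# Who are my most frequent visitors?
activity_counts = {v: len(rows) for v, rows in by_visitor.items()}
most_frequent = max(activity_counts, key=activity_counts.get)

# Which visitors browse but never buy?
browsers_not_buyers = sorted(
    v for v, rows in by_visitor.items()
    if all(action != "purchase" for _, action in rows)
)

print(page_views.most_common(1))  # the most-viewed page
print(most_frequent, browsers_not_buyers)
```

The page counter answers only the "what" questions. The per-visitor grouping is what makes the "who" and "why" questions answerable at all, and it requires keeping the detailed, visitor-level rows rather than just the totals.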

Uncovering the insight that lies beneath
For a more "behavior-centric" view of online data, analysts need access to detailed data and the appropriate tools. They may also want to combine Web data with information from other sources such as CRM, sales and other back-office systems to get a more complete picture of who consumers are as individuals, what their histories are and how they prefer to engage. This kind of intelligent behavioral analysis drives the most value for several reasons.

One, it opens the door to far more personalized content and targeted marketing. Two, it helps companies segment online consumers to understand how to most effectively communicate with them. And three, it also allows for predictive analytics so that businesses can understand how changes to the website might be received, what types of advertising campaigns will resonate, and what offers might be compelling to which groups of customers.
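As a rough illustration of combining Web data with a back-office source and then segmenting, the sketch below joins hypothetical session data to a hypothetical CRM extract. The field names, customer IDs and segment thresholds are assumptions chosen for the example, not recommendations.

```python
from collections import defaultdict

# Hypothetical CRM extract keyed by customer id.
crm = {
    "c1": {"region": "West", "lifetime_orders": 5},
    "c2": {"region": "East", "lifetime_orders": 0},
    "c3": {"region": "West", "lifetime_orders": 1},
}

# Hypothetical Web sessions: (customer_id, pages_viewed)
web_sessions = [("c1", 12), ("c2", 7), ("c3", 2), ("c2", 9), ("c4", 3)]


def segment(lifetime_orders, pages_viewed):
    """Assign a segment using illustrative, made-up thresholds."""
    if lifetime_orders >= 3:
        return "loyal buyer"
    if pages_viewed >= 10:
        return "engaged browser"
    return "casual visitor"


# Total pages viewed per customer across all sessions.
pages_by_customer = defaultdict(int)
for customer_id, pages in web_sessions:
    pages_by_customer[customer_id] += pages

# Join Web activity to CRM history and label each matched customer.
segments = {}
for customer_id, pages in pages_by_customer.items():
    profile = crm.get(customer_id)
    if profile is None:
        continue  # no CRM record; a real pipeline would track these separately
    segments[customer_id] = segment(profile["lifetime_orders"], pages)

print(segments)
```

Neither source alone could produce these labels: the Web data has no purchase history, and the CRM data has no browsing activity.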

But getting deeper behavioral insights can be difficult, and traditional tools are ill-suited for rapid-fire analysis. So, what's the best approach?

The new analytics paradigm: flexible, fast and efficient
To extract insight fast enough for it to be useful, load speed and query performance are critical variables. All that data needs somewhere to live, too, and with a need to keep sufficient data history for trending analysis, storage is also an important consideration. At the same time, the tools deployed must be powerful enough to generate deep, behavior-centric intelligence.

Building your own Web analytics data platform can take you beyond the realm of simple aggregate reporting, and the good news is that this is no longer a difficult undertaking thanks to newer database tools and technologies that can be deployed with minimal time, effort and investment.

Here are some key capabilities to look for: 

Fast, ad-hoc query capability. Because most off-the-shelf Web reporting tools have already pre-defined what can be analyzed, they don't offer much flexibility when the questions you need to ask of your data are constantly changing. For behavior-centric analysis, users need to be able to drill down into the data and get fast responses regardless of whether the queries are routinely executed or are created on the fly.

Support for diverse data sources. Fully understanding online behavior demands a single view of consumer demographics, history and activities across diverse channels and information silos. These sources may include traditional clickstream data, but they should also encompass relevant data from social media channels and mobile environments, as well as information from CRM systems and other back-office sources. Keep in mind that data must be properly integrated and transformed before running queries, often requiring tools that support ELT (extract, load and transform) capabilities.

Efficient ways to handle Big Data. Increasing online data volumes are bumping up against the ability of most organizations to store and analyze it all. Continuing to throw more servers at the problem creates massive infrastructure footprints that are extremely costly to scale, house, power and maintain. A software-focused approach to more efficient storage can go a long way towards reining in hardware costs.

Simple administration. High performance shouldn't require an army of database administrators to manually tune the database by creating indexes or projections or partitioning data. So look for technology that is self-tuning and self-managing.

Getting deeper insights into the online behaviors of website visitors can be critical to a company's competitive advantage. Today's most successful online businesses are looking towards these emerging approaches and solutions that will provide the flexibility, speed, efficiency and in-depth analytic capabilities to take them beyond simple aggregate reporting.

See below to learn how one company is using behavior-centric intelligence to gain a better understanding of what drives their customers and their own campaigns:

Behavior-centric Analysis in Action
For LiveRail, a leading provider of video advertising technology, getting deeper insight into the online behavior of content consumers was critical to the company's competitive advantage. Unlike simple online display advertising that primarily tracks impressions served and user clicks, LiveRail monitors more than a dozen engagement metrics including the percentage of content viewed, pause/resume actions and muting - giving its clients better intelligence about the performance of their campaigns.

Clients can use this intelligence to more effectively match ads with viewer demographics, increase clickthroughs and ultimately drive higher response rates. With a growing roster of customers, however, LiveRail was challenged to manage increasingly large data volumes while giving clients real-time access to the information for ad-hoc analysis and reporting.

LiveRail chose two complementary technologies to address these challenges: Apache Hadoop Hive and Infobright's analytic database. As an open source distributed system for data storage, Apache Hadoop helps LiveRail better manage the huge volume of cookie-level data that they capture. Summary data is created as well, and loaded into Infobright to enable LiveRail customers to conduct ad-hoc analysis and use the solution's robust reporting capabilities within LiveRail's platform. Customers can also schedule time to access the cookie-level data stored in Hadoop Hive if needed.
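The idea of rolling detailed records up into summary data can be sketched in a few lines of Python. This is not LiveRail's actual pipeline, which the article does not describe in detail; the record layout, ad IDs and metric names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical cookie-level records: (cookie_id, ad_id, pct_viewed, muted)
raw_events = [
    ("ck1", "adA", 100, False),
    ("ck2", "adA", 50, True),
    ("ck3", "adB", 25, False),
    ("ck1", "adB", 75, False),
]

# Accumulate per-ad totals from the detailed records.
totals = defaultdict(lambda: {"impressions": 0, "pct_sum": 0, "mutes": 0})
for _cookie_id, ad_id, pct_viewed, muted in raw_events:
    t = totals[ad_id]
    t["impressions"] += 1
    t["pct_sum"] += pct_viewed
    t["mutes"] += int(muted)

# Summary rows: one small record per ad, suitable for ad-hoc reporting.
summary = {
    ad_id: {
        "impressions": t["impressions"],
        "avg_pct_viewed": t["pct_sum"] / t["impressions"],
        "mute_rate": t["mutes"] / t["impressions"],
    }
    for ad_id, t in totals.items()
}

print(summary)
```

The summary is much smaller than the raw data and answers most reporting questions quickly, while the cookie-level records stay available for the occasional deeper query.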

Columnar databases are increasingly being used for analytic applications. As the name implies, columnar databases store data column-by-column rather than row-by-row, enabling the delivery of faster query responses against large amounts of data. Most analytic queries only involve a subset of the columns in a table, so a columnar database has to retrieve much less data to answer a query than a row database, which must retrieve all the columns for each row.
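A toy Python model shows why the layout matters. The table, its column names and the "values read" counter are illustrative assumptions; real databases read pages or blocks from disk rather than individual values, but the proportion is the point.

```python
# Hypothetical table in a row-oriented layout: one tuple per row.
column_names = ("day", "ad_id", "country", "impressions", "clicks")
rows = [
    ("2011-12-01", "adA", "US", 120, 4),
    ("2011-12-01", "adB", "UK", 80, 1),
    ("2011-12-02", "adA", "US", 200, 9),
]

# The same table in a column-oriented layout: one list per column.
columns = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def total_clicks_row_store(rows):
    """Sum clicks from the row layout; every row is fetched in full."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)  # all five columns come along
        total += row[4]
    return total, values_read


def total_clicks_column_store(columns):
    """Sum clicks from the column layout; only one column is fetched."""
    clicks = columns["clicks"]
    return sum(clicks), len(clicks)


print(total_clicks_row_store(rows))        # same total, more values read
print(total_clicks_column_store(columns))  # same total, fewer values read
```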

This simple pivot in perspective - looking down rather than across - has profound implications for analytic speed. Columnar databases are also simpler to use, so ad-hoc and complex queries can be set up without data partitions or indexes. Finally, most columnar databases provide data compression, which in addition to further speeding queries also means less storage hardware and lower costs. In LiveRail's case, using these two technologies in tandem resulted in better and faster service for its clients.
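One reason column storage compresses well is that values within a single column tend to repeat. The sketch below uses run-length encoding, a simple and widely known technique, purely as an illustration; the article does not say which compression methods any particular product uses, and the sample column is made up.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of identical adjacent values into (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical "country" column with long runs of repeated values.
country = ["US"] * 5 + ["UK"] * 3 + ["US"] * 2

encoded = rle_encode(country)
print(len(country), "values stored as", len(encoded), "pairs")
```

Stored row by row, the same country values would be interleaved with dates, IDs and counts, and the runs that make this kind of encoding effective would not exist.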

The challenges that LiveRail has had to overcome to provide its client base with timely, actionable, behavior-centric intelligence parallel those faced by a number of modern businesses operating in the online arena. Like LiveRail, these businesses will look towards next-generation approaches and solutions that provide the flexibility, speed, efficiency and in-depth analytic capabilities that take them beyond simple aggregate reporting.

About the author: Susan Davis is vice president of product management at Infobright, an open-source analytic database provider being used by enterprises, SaaS and software companies to provide rapid access to critical business data.