A data-driven method for consolidating website content
How can we use data to identify and score the quality of web content in a comprehensive and efficient way?
Problem statement
As operations expanded over time, the client's web content had grown organically across more than one website. Members were finding it hard to locate relevant information, leading to increased calls into the customer hub, while the websites themselves sat on an outdated tech stack that made major changes difficult.
Background
The client was a well-established professional body operating in AUS and NZ. Over the years, their digital content had expanded into three separate websites: one serving AUS, one serving NZ, and the beginnings of a consolidated AUS/NZ site.
We were brought in to assist the client with consolidating all three sites into a single, modern website. This included the following key tasks:
Content audit of the three websites
Recommendations on what content to keep as-is, remove, or keep and uplift
Full website rebuild into a single site with a robust foundation
Refresh of logo, branding and design system to launch alongside the new site
My role was Lead Data Analyst within a full-service team of web designers and developers, content strategists, graphic designers and management consultants.
Approach
(1) Gather the data needed
The client provided access to the Google Analytics (GA) accounts that tracked data for the three websites. I exported a page-level dataset to start analysing the web pages across all sites, and loaded it into Microsoft SQL Server to better manage the large number of records and transform the data.
To complement what GA could track, I also used Screaming Frog (an SEO web crawler) to understand how content was accessible to search engines. Running this tool produced another page-level dataset, which I stitched together with the GA data.
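At its core, the stitching was a join on the page URL. Below is a minimal sketch of that step; the table and column names are assumptions for illustration, not the client's actual schema.

```sql
-- Assumed staging tables: dbo.ga_pages (Google Analytics export) and
-- dbo.sf_pages (Screaming Frog crawl export), both keyed on a normalised page URL.
SELECT
    ga.page_url,
    ga.pageviews_6m,               -- traffic over the six-month analysis window
    sf.word_count,                 -- from the crawl
    sf.crawl_depth,                -- clicks from the home page
    sf.status_code
INTO dbo.page_master               -- combined page-level dataset used for scoring
FROM dbo.ga_pages AS ga
LEFT JOIN dbo.sf_pages AS sf
    ON ga.page_url = sf.page_url;  -- keep GA pages even if the crawler missed them
```

A FULL OUTER JOIN variant of the same query would also surface pages the crawler found but GA never recorded traffic for, which is useful for spotting dead weight.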
(2) Determine how to score content quality at scale
My data export identified over 300,000 pages' worth of content. Far too many for us to sift through manually!
Our content strategist and I devised a set of metrics to quantify the quality and usefulness of each page.
Metrics included aspects such as the following (a sketch of how they can be blended into a score follows the list):
Traffic to the page over the last 6 months > proxy for popularity or usefulness
Count of words on the page > proxy for comprehensiveness
Page depth, i.e. how many clicks it takes to reach the page > proxy for how easy or hard the content is to find
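As an illustration of how metrics like these can be combined into a single score, the sketch below ranks each metric as a percentile and applies weights. The weights, table and column names are illustrative assumptions rather than the exact formula we used.

```sql
-- Hypothetical scoring pass over the combined page-level table.
-- Percentile ranks put the metrics on a common 0-1 scale before weighting.
WITH ranked AS (
    SELECT
        page_url,
        PERCENT_RANK() OVER (ORDER BY pageviews_6m)     AS traffic_pct,
        PERCENT_RANK() OVER (ORDER BY word_count)       AS words_pct,
        PERCENT_RANK() OVER (ORDER BY crawl_depth DESC) AS depth_pct   -- shallower pages rank higher
    FROM dbo.page_master
)
SELECT
    page_url,
    ROUND(0.5 * traffic_pct + 0.3 * words_pct + 0.2 * depth_pct, 3) AS quality_score
FROM ranked
ORDER BY quality_score DESC;
```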
Through the data cleaning process, I also identified potential duplicate pages and content – both across and within sites – based on page name and location.
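Duplicate candidates can be surfaced with a simple grouping once a derived page name and a site label sit alongside each URL. A hedged sketch, with column names assumed:

```sql
-- Group pages by their derived name (e.g. last URL segment or page title)
-- and flag names that appear more than once, within or across sites.
SELECT
    page_name,
    COUNT(*)                   AS copies,
    COUNT(DISTINCT site)       AS sites_found_on,
    STRING_AGG(page_url, ', ') AS candidate_urls   -- handed over for manual review
FROM dbo.page_master
GROUP BY page_name
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```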
(3) Categorise content to add context
Creating a quantitative score was only one part of the picture; we also needed to understand what the content was about by categorising pages into key themes.
In this step I started by navigating the websites myself, getting acquainted with the existing IA (information architecture) and how the client currently organised and conveyed their business and services to customers.
I complemented this first-hand research by going back to our wider team to discuss what they had learnt from the client in meetings, and by asking either the client or the team to clarify how important any given theme was.
The UX designers also ran a card-sorting exercise with our team and the wider stakeholder steering committee to compare how the content was currently organised with where people would expect to find it.
Some content themes that came out were:
Legal pages, such as terms and conditions, privacy statements
Educational pages, such as learning how to do XYZ
Informational pages, such as how to apply to courses offered by the client
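The theming itself came out of the qualitative work above rather than from code, but once the themes were agreed, labels could be applied back to the page dataset at scale with simple rules over URL paths. The patterns below are hypothetical and purely illustrative.

```sql
-- Illustrative rule-based tagging; real URL patterns would come from the agreed IA.
SELECT
    page_url,
    CASE
        WHEN page_url LIKE '%/terms%'   OR page_url LIKE '%/privacy%' THEN 'Legal'
        WHEN page_url LIKE '%/learn%'   OR page_url LIKE '%/guides%'  THEN 'Educational'
        WHEN page_url LIKE '%/courses%' OR page_url LIKE '%/apply%'   THEN 'Informational'
        ELSE 'Unclassified'   -- left for manual review
    END AS content_theme
FROM dbo.page_master;
```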
(4) Visualise data and present a story
Now that we had a much better understanding of the content ecosystem across all three websites, having applied both quantitative and qualitative layers of analysis, it was time to synthesise our findings into a digestible and persuasive story. We needed to show the client enough of our process that they felt comfortable with our resulting recommendations – such as why we would remove certain pages versus combining others.
I created a waterfall chart to show how we started with 300,000 pages and, by the end of our analysis, were recommending keeping or consolidating content into ~2,000 pages.
Then I created a custom network visual using D3.js to provide examples of where the hubs of content were (high traffic, linked to many other pages) and where pieces of content were sitting orphaned (pages not connected to any others, with reasonable or low traffic).
Together, these visuals showed the client just how unwieldy it would be if we simply moved the content of all three sites into the new, redesigned site, along with examples of how we identified quality content and areas to clean up based on themes (the hubs).
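The hub and orphan classification behind the network visual can be derived from the crawler's internal-link data. Here is a sketch under an assumed link table and an illustrative threshold:

```sql
-- Assumed table dbo.sf_links (source_url, target_url) from the Screaming Frog
-- internal-link export. Hubs have many inbound links; orphans have none.
WITH inlinks AS (
    SELECT target_url AS page_url, COUNT(*) AS inbound_links
    FROM dbo.sf_links
    GROUP BY target_url
)
SELECT
    p.page_url,
    p.pageviews_6m,
    COALESCE(i.inbound_links, 0) AS inbound_links,
    CASE
        WHEN COALESCE(i.inbound_links, 0) = 0 THEN 'Orphan'
        WHEN i.inbound_links >= 50            THEN 'Hub'      -- threshold is illustrative
        ELSE 'Standard'
    END AS node_type
FROM dbo.page_master AS p
LEFT JOIN inlinks AS i
    ON p.page_url = i.page_url;
```

A query along these lines produces the node list that a D3.js force layout can then render.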
Outcome
A fast-tracked content audit – from what would have been months of manual work to four weeks of structured analysis.
My analysis was a key asset in bringing the client along on the journey, and it was taken up by our content strategists to execute; working with the client's content team, they wrote up the content for the new site.
The project ran using agile methods, and the analysis above was delivered over the first two sprints (four weeks) of the project:
The first sprint allowed me to get acquainted with the available data and the client's needs,
The second sprint encompassed the content scoring and visuals.
This saved the client the months of work it would have taken to run the audit manually. It also produced a higher-quality result, given the comprehensiveness of running the same process across all existing websites and content.