A Guide to Web Analytics

The digital visitor rate for the M.E. Grenander Department of Special Collections & Archives is usually 90 times greater than the in-person visitor rate. This means that we never physically meet the majority of our researchers, yet they are still reading and interacting with our materials. How do we know that we are serving their needs? Through a mixture of understanding our audience and web analytics. Knowledge of web analytics is not a typical requirement for an archivist, but when your archive has a digital presence, particularly one that is more frequented than your physical location, you may need a general understanding of it to meet user expectations.

Let’s look at some of the most common and helpful analytics areas for archives, as they appear in Matomo, the analytics platform we use:

It should be noted that all archival staff computers are excluded from our tracking metrics.

Web Structure

  • Visit: A visit begins when a user arrives on the site and includes all the time they spend and every action they take until they leave. If the same user comes, leaves, and returns several times in a day, each visit is counted separately in this metric.

    • Unique visits: If the same user comes, leaves, and returns several times in a day, they are counted only once for that day. (Matomo reports this metric as “unique visitors.”)

  • Pageviews: Each time a visitor loads a webpage, it counts as a pageview. If a user reloads or refreshes a page multiple times, each load is counted, and if a user backtracks to a page they have already visited, it is counted again. Bot attacks can inflate this number, because bots reload the challenge page over and over in an attempt to gain entry.

    • Unique pageviews: Reloads, refreshes, and repeat views of the same page within a visit are not counted; each page is counted only once per visit.

  • Visit duration: The total time a user spends on the website during a visit. It is not typically measured page by page for individual users; instead, it is calculated as the total time from the initial page load to exit.

    • Average Visit Duration: The average duration of all visits over a period of time (days, weeks, months, etc.) and/or for a specific webpage, depending on the view. Bot attacks also lower this average, because bots load the website, load the verification page a few times, and then leave, all in under a minute. Many very short visits pull the average down.

      • Note that visit durations on our site vary widely, so the average visit duration can be misleading without proper context. Some users arrive from a search engine, find their answer, and leave immediately; others spend hours on the site, sifting through multiple digital objects and finding aids. Such a wide range of user types can skew the data.

  • Bounce Rate: A user who leaves the website after viewing only one page is considered to have “bounced.” The bounce rate is the percentage of visits that bounce out of all visits. This is a standard marketing term that doesn’t necessarily apply to archives. In marketing, the aim is to keep users on the site as long as possible to increase the amount of money spent. In archives, we want users to find the information they are looking for, whether that is on one page or spread across many. What we don’t want is for users to leave because they find the website frustrating or unhelpful, which is why we monitor the bounce rate to ensure that it stays within acceptable bounds.

    • Note that bots will inflate this number because they never get past the challenge page. 

  • Actions: Pageviews, downloads, outlinks, and internal site searches all count as actions on the site. We look at the total number of actions individuals take in one visit as well as the maximum number of actions taken in one visit (within a designated period of time). Currently, the highest number of actions we have seen in one visit (by an actual patron, not a bot) is 321, over 1 hour and 3 minutes.

    • Note that bots have also inflated these numbers, especially before we implemented the turnstile, because each page reload counted as a new action.

    • Average actions: The average number of actions per visit for a chosen period of time. There is no need to set a target for average actions, but the figure should be monitored for sudden or dramatic increases or decreases.
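To make the differences between these metrics concrete, here is a minimal Python sketch that computes them from a raw list of page hits for one day. It is not how Matomo works internally: it assumes a hypothetical `Hit` record with an anonymized visitor ID, ends a visit after 30 minutes of inactivity (Matomo’s default setting), measures duration from the first to the last hit, and treats every hit as a pageview action.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median

# Assumption: a visit ends after 30 minutes of inactivity (Matomo's default).
VISIT_TIMEOUT = timedelta(minutes=30)


@dataclass
class Hit:
    visitor_id: str  # hypothetical anonymized visitor identifier
    time: datetime
    url: str


def sessionize(hits):
    """Group one day's hits into visits, starting a new visit after VISIT_TIMEOUT of inactivity."""
    visits = []
    last_seen = {}  # visitor_id -> (time of last hit, index of that visitor's current visit)
    for hit in sorted(hits, key=lambda h: h.time):
        prev = last_seen.get(hit.visitor_id)
        if prev is None or hit.time - prev[0] > VISIT_TIMEOUT:
            visits.append([hit])
            index = len(visits) - 1
        else:
            index = prev[1]
            visits[index].append(hit)
        last_seen[hit.visitor_id] = (hit.time, index)
    return visits


def summarize(visits):
    """Compute the Web Structure metrics for one day of visits."""
    durations = [(v[-1].time - v[0].time).total_seconds() for v in visits]
    return {
        "visits": len(visits),
        "unique_visitors": len({v[0].visitor_id for v in visits}),
        "pageviews": sum(len(v) for v in visits),  # every load counts
        "unique_pageviews": sum(len({h.url for h in v}) for v in visits),  # once per visit
        "avg_duration_s": mean(durations),
        "median_duration_s": median(durations),
        "bounce_rate": sum(len(v) == 1 for v in visits) / len(visits),
        "max_actions": max(len(v) for v in visits),
    }


t0 = datetime(2024, 1, 1, 9, 0)
hits = [
    Hit("a", t0, "/"),
    Hit("a", t0 + timedelta(minutes=1), "/"),  # reload: a pageview, not a unique one
    Hit("a", t0 + timedelta(minutes=5), "/collections"),
    Hit("a", t0 + timedelta(hours=2), "/"),  # idle > 30 min: a second visit
    Hit("b", t0, "/"),  # one-page visit: a bounce
]
stats = summarize(sessionize(hits))
```

With this toy data, visitor “a” produces two visits, so there are 3 visits but only 2 unique visitors, and the mean duration (100 seconds) sits far above the median (0 seconds). That gap illustrates why the average visit duration needs context when a site serves both quick lookups and long research sessions; the median is included here only for comparison.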

User Analytics (“Visitors”)

  • IP address: Each user’s IP address is recorded automatically as part of their visitor profile. This information is not shared with any outside parties or sold to any advertising companies. We can use IP addresses to see where users are visiting the site from, and we have a specific breakdown showing whether users visited from a campus IP address.

  • Locations: Our metrics show where users are visiting from. Since digitizing more content, we have seen an increase in international visitors. Knowing this helps the department understand the range of our reach and how much local traffic we generate.

  • Device Type: This metric shows what type of device the user accessed the website with. As you might expect, most users access the site from a desktop (in Matomo, “Desktop” covers both desktop computers and laptops), but smartphones are our second most common device, which reinforces the need for the site to be mobile responsive and accessible on smaller screens. When the website is updated, we always test on both a desktop and a laptop.

  • Visit Logs: Instead of overall sums and averages, the visit logs show all of the actions taken by each user: time spent, number of actions taken, IP address, location, method of entrance to the site, software, device type, and whether any goals were met during the session. We often scan the logs daily to find organic patterns and actions that the aggregate reports don’t show.

  • Not often used:

    • Real Time Map: Shows visitor geolocation over a recent period (commonly the last 7 hours). A world map displays circles at users’ locations; the larger the circle, the more pageviews from that location. Hovering over a circle shows the location, device type, software, page of access, time since access, entry method, number of pageviews, and the user’s local time. Clicking a location on the map zooms into the user’s country or continent.

    • Software: Shows the user’s operating system (such as Windows, Mac, Linux, etc.), the user’s browser (such as Chrome, Safari, Firefox, etc.), and the most frequently used combinations, which are called “configurations” (for example, GNU/Linux/Chrome/800x600).

    • Segments: We have created segments to narrow our audiences and, primarily, to exclude bots from our main statistics. We have been trying to create a segment for users connected to the UAlbany Wi-Fi, but so far it has not been consistent enough to be useful. The Segment tab under “Visitors” lets you view all the segments, and their trends, at once. To facet by segment across multiple metrics, click the “All Visits” tab at the top of any page, then click the segment you wish to view.

    • Screenshot of Matomo Segments tab with eight segments displayed.

       

      • EXCLUDE Bots: I almost always have this segment selected. Our turnstile shows bots the challenge page but does not let them past it. We cannot simply exclude the challenge page, because it is the first page every user, bot or human, sees. Instead, this segment excludes all visits with fewer than 2 actions (that is, visits that only loaded the challenge page). Some bots will still get through (those that refresh the page more than once), but far fewer.

  • Not used:  

    • Times: Local times of users as well as the number of visits per local time (e.g., “7 PM is the most frequent time for visitors in their respective time zones”).

    • User IDs: If our website had a login option or registered users, the unique identifiers would be tracked here. 

    • Cohorts: A group of users who have the same acquisition date (the date a user started visiting your website). This Cohort analysis section allows you to view and compare different cohorts’ behavior over time to understand how well you retain and engage visitors.
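The logic of the EXCLUDE Bots segment, together with a campus-IP breakdown, can be sketched in Python as a simple filter over visits. This is an illustration rather than Matomo’s segment engine; the `Visit` record and the campus network ranges are hypothetical (the ranges shown are reserved documentation addresses, not UAlbany’s real ones).

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical campus ranges: these are reserved documentation networks (RFC 5737),
# not UAlbany's real address space.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


@dataclass
class Visit:
    ip: str
    actions: int  # pageviews, downloads, outlinks, and site searches in the visit


def exclude_bots(visits, min_actions=2):
    """Mirror the EXCLUDE Bots rule: drop visits with fewer than min_actions actions."""
    return [v for v in visits if v.actions >= min_actions]


def is_campus(ip):
    """True if the address falls inside one of the (hypothetical) campus networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in CAMPUS_NETWORKS)


visits = [
    Visit("192.0.2.15", 1),  # loaded only the challenge page
    Visit("192.0.2.40", 6),
    Visit("203.0.113.7", 12),
    Visit("203.0.113.8", 3),
]
humans = exclude_bots(visits)
campus_share = sum(is_campus(v.ip) for v in humans) / len(humans)
```

Note how the filter changes the result: with the one-action visit included, the campus share would be 2 of 4 visits (50%) instead of 1 of 3.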

Behaviors

  • Page vs. Page Title: The Page is identified by its URL path. The Page Title comes from the website itself and is usually written in plain language.

    • Example: 

      • Page:

        • https://archives.albany.edu/description/catalog/ua500aspace_226bdc218ab62a9d5100c5f4690ebac1

      • Page Title: 

        • “Circular of the New York State Normal College for 1901, 1901”

  • Entry Page: Page that the user entered the website on. 

  • Exit Page: Page that the user exited the website on. 

  • Paths: The exact sequence of pages that individual users visit during their session. 

    • User Flows: The average or most common paths users take, which reveal flow patterns. They are a good way to identify common drop-off points or dead ends where users could be getting stuck.

    • Transitions: Transitions are similar to user paths, but instead of showing overall paths or patterns, they show a percentage breakdown of the next pages (or exits) from a given page, from most to least common. They also show the most common ways users arrive at the page, whether from another page, direct entry, search engines, etc. The interface lets you click into a next page and follow particular pathways.

  • Site Searches: All terms that users use to search the site internally (does not include search engine keywords that allow users to enter the site externally) and the number of times that the terms have been used in a specified time period.  

    • Unique site searches: A list of the distinct search terms used on the site. Each term appears once, with no count of how many times it was used.

    • Pages following a site search: We can also see a list of the most common pages that follow an internal site search, with further breakdowns for number of times clicked in the search results and number of pageviews.   

  • Downloads: Downloads track the downloadable material (PDFs) on our site: in this case finding aids, digital objects, and “about” documents (maps, forms, etc.). A separate report has been set up just for viewing these materials. We cannot currently see whether items have been downloaded in other formats, or how many times.

    • Unique downloads: List of individual items that have been downloaded, without counts as to how many times. 

  • Outlinks: The links users click to leave the site, that is, links outside the “archives.albany.edu” domain. There are ways, in the system settings, to designate other domains (such as “media.archives.albany.edu” for our digital objects) as part of our site. We should know which outlinks users take to leave the site. Some outlinks lead to supplementary resources, such as copyright information or LibGuides, which we hope users view alongside the website, or view and then return to the site.

    • Unique Outlinks: List of individual links that have been used, without counts of how many times.

  • Engagement: Compares returning and new visits in what Matomo calls the “Frequency Overview.” It lists the number of returning and new visits, average visit duration, average actions per visit, bounce rate, and total number of actions for returning and new visitors. It also shows the number of visits by visit number (first, second, third… all the way through 201+ visits) and by days since the last visit (new visit, 0 days, 1 day… all the way through 365+ days).

  • Not used: 

    • We do not utilize the “Events” or “Contents” tabs because they have to be defined and the parameters are more akin to ecommerce than archival use. 

      • Note: Without custom configuration, Events currently track only when video or audio is engaged on the site. They break down when the seek, pause, play, resume, and finish controls are used (called “event actions”), as well as the link to the material that was engaged.
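The Transitions report can be approximated in Python from a list of visit paths. The sketch below is a simplified illustration with hypothetical page paths, not Matomo’s implementation: for a chosen page it counts what visitors did next (another page or an exit) and converts the counts to percentages; it also tallies entry and exit pages.

```python
from collections import Counter

EXIT = "(exit)"


def transitions(paths, page):
    """Percentage breakdown of what visitors did immediately after viewing `page`."""
    following = Counter()
    for path in paths:
        for i, current in enumerate(path):
            if current == page:
                following[path[i + 1] if i + 1 < len(path) else EXIT] += 1
    total = sum(following.values())
    if total == 0:
        return {}  # the page was never viewed
    return {target: round(100 * n / total, 1) for target, n in following.most_common()}


def entry_exit(paths):
    """Count entry pages (first in each path) and exit pages (last in each path)."""
    return Counter(p[0] for p in paths), Counter(p[-1] for p in paths)


# Hypothetical visit paths, one list of page URLs per visit.
paths = [
    ["/", "/collections", "/description/catalog/ua500"],
    ["/", "/collections"],
    ["/", "/search?q=espy", "/collections"],
    ["/collections", "/"],
]
after_collections = transitions(paths, "/collections")
entries, exits = entry_exit(paths)
```

Here half of the views of “/collections” end the visit, which is the kind of signal that would prompt a closer look at that page as a possible dead end.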

Acquisition 

  • Acquisition refers to how users are finding and accessing our site. 

  • Channel Types:

    • Direct Entry: Refers to users that enter the URL of the website directly into their web browser. Direct entry is typically the predominant method of entry for our site. This can be inflated by bots that load webpages directly.  

    • Referrers

      • Search Engines: Our Google Search Console account (the Google platform that shows which search terms are used to find and access your site) is linked to Matomo, so we can see not only the search terms leading users to our site but also the pages they went to after entering the site and their subsequent paths. You can also see all of the different search engines that lead to our site (Google is by far the most common).

      • Websites: External websites that have driven traffic to the site. Wikipedia is our highest external web source by far. 

      • Social Networks: Social platforms, such as Reddit, Facebook, Bluesky, LinkedIn, GitHub, and more, that are driving traffic to the site.

      • AI Assistants: We do allow AI tools access to our site so that they can refer to information in our collections and even link to our website in their answers. While it does not account for a large percentage of our traffic, we like to keep the option open as it becomes more prevalent as a search technique for researchers.

  • Campaigns: Campaigns are designed for ecommerce sites and are not particularly helpful to archives. They let you analyze the visits associated with various tracking values, but those values are usually designed to encourage sales. “Campaigns” usually refer to digital advertising or marketing campaigns with specific goals, such as increasing traffic or sales for particular collections, times of year, or products. There is also a Campaign Builder, but we do not currently use that tab either.
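Matomo assigns each visit to a channel based on its referrer. As a rough illustration, the Python sketch below classifies a referrer URL in the same general way. The domain lists are short hypothetical samples (Matomo maintains its own, far longer lists), and a missing referrer is treated as direct entry.

```python
from urllib.parse import urlparse

# Short hypothetical samples; Matomo maintains its own, much longer lists.
AI_ASSISTANTS = {"chatgpt.com", "perplexity.ai", "gemini.google.com"}
SEARCH_ENGINES = {"google.com", "bing.com", "duckduckgo.com"}
SOCIAL_NETWORKS = {"reddit.com", "facebook.com", "bsky.app", "linkedin.com", "github.com"}


def host_matches(host, domains):
    """True if host is one of the domains or a subdomain of one (e.g. www.google.com)."""
    return any(host == d or host.endswith("." + d) for d in domains)


def channel(referrer):
    """Assign a referrer URL to a channel type; no referrer means direct entry."""
    if not referrer:
        return "Direct Entry"
    host = urlparse(referrer).hostname or ""
    # Check AI assistants first so gemini.google.com is not caught by the google.com rule.
    if host_matches(host, AI_ASSISTANTS):
        return "AI Assistants"
    if host_matches(host, SEARCH_ENGINES):
        return "Search Engines"
    if host_matches(host, SOCIAL_NETWORKS):
        return "Social Networks"
    return "Websites"
```

Under this scheme Wikipedia, which is not on any list, falls through to “Websites,” which is where it appears in our acquisition reports.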

Goals

  • Overall: Goals allow us to create and evaluate metrics that would otherwise go unnoticed, or be hard to determine, using only the standard reports. Most are based on loading a specific URL or completing a form. Some goals can be completed more than once per session by the same user.

  • Our current goals are: 

    • “Book a Meet” Form

      • Description: Determine when visitors visit the "Research consultation" page.  

      • Triggered when: Clicked on an external link “https://albany.libcal.com/appointments/mmcmullen”

    • Contact Form

      • Description: Determine when website visitors complete the contact form page. Will be cross-referenced with data from LibWizard.

      • Triggered when: Clicked on an external link “https://albany.libwizard.com/f/contactus?i_have_a_questi=Special%20Collections%20&%20Archives=”

    • Download Action

      • Description: Determine when users use one of the download options on our digital objects. This does not capture the PDF option below the digital object box.

      • Triggered when: URL equals “?download”

    • Download PDF

      • Description: Determine when visitors open a PDF (typically a finding aid) to download.

      • Triggered when: External link contains ‘pdf’

    • Materials Request Form

      • Description: Determine when website visitors complete the request materials form. Will be cross-referenced with data from LibWizard.

      • Triggered when: Internal URL contains “https://archives.albany.edu/reference/”

    • Open Digital Object in a New Tab

      • Description: Determine when visitors open a digital object in a new tab via the “media.archives.albany.edu” page.

      • Triggered when: Click to external website contains “media.archives.albany.edu”

    • Open Digital Objects Full Screen

      • Description: Determine when visitors open digital objects via the “media.archives.albany.edu” page in full screen or a new tab.

      • Triggered when: Visits page containing “media.archives.albany.edu”  

    • Visit Preservation Lab Page

      • Description: Though it is considered to be an outlink, this goal will see when visitors leave our site to visit the preservation lab page (hosted on the main Library site) as our site has the most direct path to the preservation lab page.

      • Triggered when: Click to external site contains “https://library.albany.edu/preservation”  

    • Visit Us Page

      • Description: Determine when visitors visit the "Make an Appointment" page.

      • Triggered when: Visit to a page contains “https://archives.albany.edu/reference/?visit”

  • Goals are flagged in the visit logs when completed, as well as in the Goals > Overview tab. Each completed goal is called a “conversion,” and the ratio of conversions to visits is the “conversion rate.”

  • Rarely Used: The Multi-Attribution (determines how much credit towards your Goals each of your marketing channels is actually responsible for) and Conversion Exports (allows you to report conversion data to supported ad platforms without needing to embed third-party tracking pixels).
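Goal matching and the conversion rate can be illustrated with a small Python sketch. The `Goal` class, the sample visits, and the choice of which goal may convert more than once per visit are hypothetical; the two patterns are taken from the goal list above. The rate here is the share of visits with at least one conversion, so it cannot exceed 100% even when a goal converts several times in one visit.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    name: str
    pattern: str
    match: str = "contains"  # "contains" or "exact"
    allow_multiple: bool = False  # may the goal convert more than once per visit?

    def matches(self, url):
        return url == self.pattern if self.match == "exact" else self.pattern in url


# Patterns from our goal list; the allow_multiple choices are hypothetical.
GOALS = [
    Goal("Download PDF", "pdf", allow_multiple=True),
    Goal("Visit Us Page", "https://archives.albany.edu/reference/?visit"),
]


def evaluate(goals, visits):
    """Return conversions per goal and the share of visits with at least one conversion."""
    conversions = {g.name: 0 for g in goals}
    converted_visits = 0
    for urls in visits:
        converted = False
        for goal in goals:
            matched = sum(goal.matches(u) for u in urls)
            if matched:
                conversions[goal.name] += matched if goal.allow_multiple else 1
                converted = True
        converted_visits += converted
    return conversions, converted_visits / len(visits)


# Hypothetical visits, each a list of URLs visited or clicked.
visits = [
    ["https://archives.albany.edu/", "https://archives.albany.edu/reference/?visit"],
    ["https://example.org/finding-aid-1.pdf", "https://example.org/finding-aid-2.pdf"],
    ["https://archives.albany.edu/"],
    ["https://archives.albany.edu/"],
]
conversions, conversion_rate = evaluate(GOALS, visits)
```

A loose pattern such as “contains pdf” will also match any URL that merely includes those letters, so it is worth spot-checking flagged conversions in the visit log.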

Funnels

  • This feature is not often used 

    • Funnels are used to determine where visitors drop off. Typically, they are used to increase revenue from existing traffic by improving sales and conversions (sometimes goals).

    • To build a funnel, you define the path you want users to take and then see how many users follow it. You can then adjust the pages to see whether more users follow the path and convert along it.
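The drop-off calculation behind a funnel can be sketched in Python. This is a simplified illustration rather than Matomo’s implementation: the step URLs and visits are hypothetical, and a visit only counts toward a step if it has already passed through every earlier step (other pages may come in between).

```python
def funnel(visits, steps):
    """Count visits reaching each step; steps must occur in order, other pages may come between."""
    reached = [0] * len(steps)
    for urls in visits:
        step = 0
        for url in urls:
            if step < len(steps) and url == steps[step]:
                step += 1
        for i in range(step):
            reached[i] += 1
    return reached


# Hypothetical funnel from the home page to the materials request page.
steps = ["/", "/collections", "/reference/"]
visits = [
    ["/", "/collections", "/reference/"],
    ["/", "/search", "/collections"],
    ["/", "/about"],
    ["/collections", "/reference/"],  # skipped the first step, so never enters the funnel
]
reached = funnel(visits, steps)
percent_of_entries = [round(100 * n / reached[0], 1) for n in reached]
```

Here 3 visits enter the funnel, 2 reach the collections page, and 1 reaches the request page; the largest relative drop-off (half of the remaining visits) is between the second and third steps, so that is where page adjustments would be tested first.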

Forms

  • This feature is not often used 

  • Matomo automatically detects forms on the website; all forms listed here were detected by Matomo.

  • It is difficult for us to add our own forms to this feature because they are hosted outside our main website. That’s why we combine our Matomo metrics with LibAnswers, and why our form completions are tracked under Goals.

Media

  • This feature is not often used 

  • Media refers to playing video and audio on the website. This doesn’t always work with our digital access system, the IIIF viewer. When it does work, it shows impressions and plays of video or audio and whether it was scrubbed or rewound. It also shows an audience log, a breakdown of those who interacted with the audio or video, and an audience map. When it works, these views also appear in the Visitors log.

A/B Tests

  • This feature is not often used 

  • Also frequently used in marketing, A/B testing allows you to compare different versions of the site and see which variation is more successful. A common use case would be testing two different landing pages for an advertising campaign to see which one garners more sales.

  • This could be helpful for us if we wanted to compare two different displays of the same collection page or objects etc.  
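If we did run an A/B test on a collection page, a common way to judge whether the difference between variants is more than noise is a two-proportion z-test. This is a standard statistical technique, not a description of Matomo’s own A/B reporting; the visit and conversion counts below are invented for illustration.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B; returns rates, z-score, two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return p_a, p_b, z, p_value


# Invented example: 1,000 visits to each version of a collection page,
# with 40 and 60 visits reaching a hypothetical goal.
p_a, p_b, z, p_value = two_proportion_z_test(40, 1000, 60, 1000)
```

A p-value below a chosen threshold (0.05 is a common convention) suggests the difference is unlikely to be chance alone; with modest archival traffic, a test may need to run for a long time before it reaches that point.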

Heatmaps

  • Heatmaps are a wonderful resource for understanding how users move across pages: scrolling, clicking, and hovering over information. They must be set up for individual pages; they are not created automatically.

    • To set them up, you choose the total number of page views you wish to capture (the sample limit) and a sample rate. For example, with a limit of 1,000 page views at a 10% sample rate, roughly one in ten visitors, chosen at random, is recorded, so about 10,000 page views are needed to reach the limit. The lower the percentage you select, the longer it will take to reach the limit, which spreads the captures over a longer period and helps randomize the data.

    • You can also ensure that users spend a minimum number of seconds on the page before they are recorded. This could be to prevent capturing bots (which we don’t need because of our turnstile) or users that are going to immediately bounce. We don’t currently use this parameter.  

  • Our most frequently used heatmaps are:

    • Main Page

    • A-Z Lists of Collections

    • Online Content- ESPY page

    • Rare Books Page

    • New York State Political Archive Collection Page

    • National Death Penalty Archive Collection Page

    • German and Jewish Emigre Collections

    • University Archives Collections Page

    • Mathes Children Literature Collections Page

    • Business, Literary, and Local History Manuscripts

    • Online Content- UAlbany History 

  • You will be able to facet heatmaps by Action: Click, Move, and Scroll.  You will also be able to facet by Device Type: Desktop, Mobile, and Tablet. Like other heatmaps, red indicates a larger presence and blue a lesser one with a range of volume in between. The top of the heatmap will also tell you the total number of captures so far.  

  • Note: The heatmap is drawn over a screenshot of the page as the first recorded user saw it, which can lead to some odd presentations. Sometimes the page doesn’t load with all of our branding; sometimes it contains the facets or search terms that user applied to land on that page.

  • Note: Heatmaps expire after 3 months so if you want to keep the data, you have to regularly export the reports by download. This will only download in Excel/data format, not the actual screens. You will have to screen record or snippet to save the visualization of the data. 
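The relationship between the sample limit and the sample rate is simple arithmetic: if each eligible page view is selected at random with probability r percent, reaching a limit of N captures takes roughly N * 100 / r page views. The Python sketch below computes that estimate and checks it with a random simulation. It illustrates the arithmetic only, under that assumption; Matomo’s own sampling code may differ, and the daily page view figure is hypothetical.

```python
import math
import random
from fractions import Fraction


def pageviews_needed(sample_limit, rate_percent):
    """Expected eligible page views before `sample_limit` captures at `rate_percent` %."""
    # Fraction(str(...)) keeps rates such as 0.5 exact, so ceil is not skewed by float error.
    return math.ceil(sample_limit * 100 / Fraction(str(rate_percent)))


def days_to_fill(sample_limit, rate_percent, daily_pageviews):
    """Rough number of days to reach the limit, given a (hypothetical) daily page view count."""
    return math.ceil(pageviews_needed(sample_limit, rate_percent) / daily_pageviews)


def simulate(sample_limit, rate_percent, seed=0):
    """Record each page view with probability rate_percent/100 until the limit is reached."""
    rng = random.Random(seed)
    captured = seen = 0
    while captured < sample_limit:
        seen += 1
        if rng.random() < rate_percent / 100:
            captured += 1
    return seen


print(pageviews_needed(1000, 10))  # 10000
print(days_to_fill(1000, 10, 400))  # 25
print(simulate(1000, 10))  # close to 10000; varies with the seed
```

The same arithmetic applies to session recordings, which use the same sample limit and sample rate settings.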

Session Recordings

  • Session recordings offer insights similar to heatmaps, but instead of general patterns, they record individual users’ sessions. These are anonymized and record what users see on their screens, without capturing the search terms (those are captured in the visit log). They also record actions: click, move, scroll, resize, form change, and change within the page. Activity is displayed as a differently colored line that moves across the page as the user does.

  • Session recordings will appear within the Visitor Log and can be replayed from there.  

    • Setup works the same way as for heatmaps: you choose the total number of page views to capture and a sample rate. For example, a limit of 1,000 at a 10% sample rate records roughly one in ten visitors, chosen at random, so it takes about 10,000 page views to reach the limit. The lower the percentage, the longer it takes to reach the limit, which helps randomize the data.

    • You can also require users to spend a minimum number of seconds on the page before they are recorded. This could prevent capturing bots (which we don’t need because of our turnstile) or users who are about to bounce. We don’t currently use this parameter.

  • Note: Media within our IIIF viewer is not captured in session recordings, because the viewer is a separate component embedded in the page.

  • Note: Session Recordings expire after 3 months so if you want to keep the data, you have to regularly export the reports by download. This will only download in Excel/data format, not the actual screens. You will have to screen record or snippet to save the visualization of the data. 

Custom Reports

  • Custom Reports allow you to highlight data that would normally go unnoticed or areas of interest more akin to archival practice than ecommerce. Reports let you take preexisting dimensions and metrics and put them alongside each other for consideration. You can preview the report before setting it up. After saving, it takes time to generate the report. 

  • You can also email these custom reports at intervals of your choice (we send them weekly, but you can also send them daily or monthly) as well as the standard reports.

  • Reports we’ve created so far (all exclude challenge pages):

    • APAP Collection Visitors: A report that tracks which “apap” collections (typically political, New York modern political, and death penalty collections) are being visited and which are the most popular. It looks at Page Titles and URLs to report the number of Visits, Total Actions in a Visit, and Hits (unique number of actions). It is sent weekly.

    • UA Collection Visitors: A report that tracks which “ua” (University Archives) collections are being visited and which are the most popular. It looks at Page Titles and URLs to report the number of Visits, Total Actions in a Visit, and Hits (unique number of actions). It is sent weekly.

    • GER Collection Visitors: A report that tracks which “ger” (German and Jewish Intellectual Émigré) collections are being visited and which are the most popular. It looks at Page Titles and URLs to report the number of Visits, Total Actions in a Visit, and Hits (unique number of actions). It is sent weekly.

    • MSS Collection Visitors: A report that tracks which “mss” (manuscript) collections are being visited and which are the most popular. It looks at Page Titles and URLs to report the number of Visits, Total Actions in a Visit, and Hits (unique number of actions). It is sent weekly.

    • Top Collections Page - All Collecting Areas: A report that identifies the highest-performing collections across all collecting areas. It looks at Page Titles and URLs to report the number of Visits, Total Actions in a Visit, and Hits (unique number of actions). It is sent weekly.

    • Media Interaction Report: A report that looks at how our digital materials are being interacted with (primarily downloaded). It looks at the Download URL and the number of Hits (unique number of actions). It captures PDFs and is sent weekly.

    • The Brothers Collection Tracking: A report that looks at how “The Brothers” collection (apap081) is interacted with since we digitized all copies of “The Albany Liberator”. It is not sent weekly, rather it accumulates data and is referenced when asked. 

    • Wikipedia Acquisition Tracking: A report that tracks how linking our finding aids to Wikipedia has increased traffic. It looks at the Visits and Total Actions in the visit for users who entered the site through Wikipedia (tracked through name and URL). It is not sent weekly, rather it accumulates data and is referenced when asked. 

    • Brightspace Acquisition Tracking: A report that tracks what pages users are going to if they come from Brightspace. It looks at the Page Title and URL and Visits and Total Actions in the visit for users who entered the site through Brightspace (tracked through URL). It is not sent weekly, rather it accumulates data and is referenced when asked. 
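The per-collection reports above group page-level data by collection identifier. A Python sketch of that grouping follows; it is not how Matomo builds custom reports. It assumes, hypothetically, that collection pages follow the /description/catalog/&lt;area&gt;&lt;number&gt; pattern seen in the ua500 example URL in the Behaviors section, and the visit and action counts are invented.

```python
import re
from collections import defaultdict

# Hypothetical URL pattern: assumes collection pages look like
# /description/catalog/<area><number>..., as in the ua500 example URL.
COLLECTION_RE = re.compile(r"/description/catalog/([a-z]+)(\d+)")


def collection_report(rows):
    """rows are (url, visits, actions) tuples; returns totals per collecting area and collection."""
    by_area = defaultdict(lambda: {"visits": 0, "actions": 0})
    by_collection = defaultdict(lambda: {"visits": 0, "actions": 0})
    for url, visits, actions in rows:
        match = COLLECTION_RE.search(url)
        if match is None:
            continue  # not a collection page, e.g. the home page or challenge page
        area, number = match.groups()
        # Note: summing page-level visits overcounts visits that viewed several pages
        # of the same collection; a production report would count distinct visits.
        for bucket in (by_area[area], by_collection[area + number]):
            bucket["visits"] += visits
            bucket["actions"] += actions
    return dict(by_area), dict(by_collection)


rows = [
    ("https://archives.albany.edu/description/catalog/apap081", 12, 40),
    ("https://archives.albany.edu/description/catalog/apap101", 2, 2),
    ("https://archives.albany.edu/description/catalog/ua500aspace_226bdc218ab62a9d5100c5f4690ebac1", 7, 9),
    ("https://archives.albany.edu/", 50, 60),
]
areas, collections = collection_report(rows)
top = max(collections, key=lambda c: collections[c]["visits"])
```

The area totals correspond to reports like APAP or UA Collection Visitors, while `top` corresponds to the Top Collections Page report.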

Crashes

  • This is a relatively new feature that identifies and summarizes the crashes that occurred on your website within the selected period. You can see the number of crashes over time, along with which crashes newly occurred, which disappeared, and which reappeared after an absence.

  • You have the option to ignore crashes, if they are a known or accepted issue, as well as merge similar crashes from the same source file to reduce noise in your data. When crashes are merged, reports will treat them as the same even if their messages differ. Note that only crashes from the same source file can be merged, and inline crashes cannot be merged.

  • You can view crashes by page URL. They also appear in the Visit Logs.
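The new, disappeared, and reappeared categories and the merge rule can be illustrated with set operations in Python. This is a hypothetical sketch of the logic described above, not Matomo’s code; the file names and error messages are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Crash:
    source: Optional[str]  # source file; None for inline scripts
    message: str


def crash_key(crash, merged_files):
    """Crashes from a source file we chose to merge share one key; inline crashes never merge."""
    if crash.source is not None and crash.source in merged_files:
        return (crash.source, None)
    return (crash.source, crash.message)


def keys(crashes, merged_files):
    return {crash_key(c, merged_files) for c in crashes}


def classify(history, current):
    """history: key sets from earlier periods (oldest first); current: keys in this period."""
    previous = history[-1] if history else set()
    ever_seen = set().union(*history)
    return {
        "new": current - ever_seen,
        "disappeared": previous - current,
        "reappeared": (current & ever_seen) - previous,
    }


merged = {"viewer.js"}  # hypothetical file chosen for merging
week1 = keys([Crash("viewer.js", "TypeError: a"), Crash("search.js", "x is undefined")], merged)
week2 = keys([Crash("viewer.js", "TypeError: b")], merged)  # merged with week 1's viewer.js crash
week3 = keys([Crash("search.js", "x is undefined"), Crash(None, "Script error")], merged)
status = classify([week1, week2], week3)
```

In this example the inline “Script error” is new, the merged viewer.js crash disappeared, and the search.js crash reappeared after being absent in week 2.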

AI Assistants

  • A new feature that gives greater insight into how AI bots are driving traffic to the website. We do not currently have AI chatbot tracking set up on the website.

  • AI Chatbots Overview: Provides insights into website traffic originating from AI chatbots such as ChatGPT and other large language model–based assistants. These reports track key metrics including the number of requests made by these bots, the pages and documents they access, and any errors encountered. They also offer detailed breakdowns showing which bots visit specific page URLs, helping you understand how AI chatbots interact with your content and identify opportunities to improve visibility and accessibility for AI-driven users.
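Since AI chatbot tracking is not yet set up on our site, it may help to see what such a report measures. The Python sketch below counts requests, errors, and pages per AI bot from server-log style records by matching user-agent tokens. It is a log-analysis illustration, not how Matomo’s AI Chatbots feature is implemented; the records are invented, and the token list (GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot) is only a partial sample of crawler names published by AI companies.

```python
from collections import Counter

# Substrings that identify some well-known AI crawlers and assistants in User-Agent headers.
# This is an illustrative, incomplete list.
AI_BOT_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def ai_bot_name(user_agent):
    """Return the matching bot token, or None for other traffic."""
    lowered = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def ai_bot_summary(records):
    """records are (user_agent, url, http_status) tuples, e.g. parsed from a server log."""
    requests, errors, pages = Counter(), Counter(), Counter()
    for user_agent, url, status in records:
        bot = ai_bot_name(user_agent)
        if bot is None:
            continue
        requests[bot] += 1
        pages[(bot, url)] += 1
        if status >= 400:
            errors[bot] += 1
    return requests, errors, pages


# Illustrative records; the user-agent strings are shortened examples.
records = [
    ("Mozilla/5.0 (compatible; GPTBot/1.1)", "/description/catalog/apap081", 200),
    ("Mozilla/5.0 (compatible; ClaudeBot/1.0)", "/", 200),
    ("Mozilla/5.0 (compatible; ClaudeBot/1.0)", "/missing-page", 404),
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0", "/", 200),
]
requests, errors, pages = ai_bot_summary(records)
```

The error count is the part most likely to be actionable: a bot that repeatedly hits missing pages may be following outdated links to our site.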