Simple and privacy respecting web analytics using NGINX and GoAccess

GoAccess is a log file parser that can transform your NGINX access logs into an analytics dashboard. It provides website stats while retaining user privacy.

December 28, 2023

Google Analytics is the de facto standard for web analytics. It provides many advanced features such as conversion tracking and real-time user counts. However, it's designed to track user activity across the web and can be considered an invasion of privacy. Furthermore, its use requires adding a blob of proprietary JavaScript to your website.

I would argue that for most website owners, the only metrics that are truly valuable are the number of users, the pages they visited, and where they came from. It just so happens that the information in the NGINX access log can be used to generate up-to-date activity dashboards. GoAccess is a log file parser that can accomplish this task. With a little configuration, it can provide great analytics with no user tracking and no added JavaScript.
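To make this concrete, here is a minimal sketch of what a single access log line already contains. It assumes NGINX's default combined log format. The sample line is made up (the address, path and referrer are placeholders), and summarize_line is a hypothetical helper, not part of GoAccess.

```shell
# One sample line in the "combined" format (all values are invented).
sample='203.0.113.7 - - [28/Dec/2023:10:15:32 +0000] "GET /blog/goaccess HTTP/1.1" 200 5120 "https://example.org/" "Mozilla/5.0"'

# Split on double quotes: field 1 holds the address and timestamp,
# field 2 the request line, field 4 the referrer, field 6 the user agent.
summarize_line() {
    printf '%s\n' "$1" | awk -F'"' '{
        split($1, head, " ")      # head[1] is the client address
        split($2, request, " ")   # request[2] is the requested path
        printf "visitor=%s page=%s referrer=%s\n", head[1], request[2], $4
    }'
}

summarize_line "$sample"
# prints: visitor=203.0.113.7 page=/blog/goaccess referrer=https://example.org/
```

GoAccess does this kind of parsing for every line and aggregates the results into panels, so there is no need to maintain scripts like this one.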

This article will explore how GoAccess can be used with NGINX on Ubuntu. The gist of it should be the same no matter the web server or Linux distribution used.

Installing GoAccess

On Ubuntu and Debian, GoAccess can be installed through the default repositories. Simply install the tool using apt.

sudo apt install goaccess

Once installed, the goaccess command should be available.

jamie@jmh:~$ goaccess --version
GoAccess - 1.5.5.
For more details visit: https://goaccess.io/
Copyright (C) 2009-2022 by Gerardo Orellana

Build configure arguments:
  --enable-utf8
  --enable-geoip=mmdb
  --with-openssl

Alternatively, you can download the latest version from a tar archive. Simply follow the installation instructions.

Generating a dashboard

By default, NGINX writes the access log for all sites to /var/log/nginx/access.log. If you've changed this path, you can simply use that instead.

To generate an HTML dashboard using the log file, use the following command:

goaccess -c /var/log/nginx/access.log -o dashboard.html --log-format=COMBINED

This creates a nice-looking dashboard (dashboard.html in this case) that can be viewed in a web browser, just like the GoAccess demo.

Example dashboard

You can tweak the generated panels using the GoAccess configuration file located at /etc/goaccess/goaccess.conf by changing the enable-panel and ignore-panel directives.
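The same panel selection can also be passed on the command line, which is handy for trying a layout before editing the configuration file. The sketch below is only an illustration: build_trimmed_dashboard is a hypothetical wrapper, and the three panels it hides are an arbitrary choice.

```shell
# Hypothetical wrapper: build the dashboard without a few panels that a
# small site may not need. The panel names are GoAccess panel identifiers.
build_trimmed_dashboard() {
    log_file=$1
    output=$2
    goaccess "$log_file" -o "$output" --log-format=COMBINED \
        --ignore-panel=KEYPHRASES \
        --ignore-panel=REMOTE_USER \
        --ignore-panel=GEO_LOCATION
}

# Usage:
# build_trimmed_dashboard /var/log/nginx/access.log dashboard.html
```

Once the result looks right, the equivalent ignore-panel lines can be moved into /etc/goaccess/goaccess.conf so that every run picks them up.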

Creating a password-protected page

Generating a file locally on the server isn't particularly useful by itself, so let's create a password-protected subdirectory on our site to view it.

Creating the web root

Let's create a root directory for our dashboard and change the owner to the web server user and group.

sudo mkdir /var/www/goaccess
sudo chown www-data:www-data /var/www/goaccess

Generating a .htpasswd file

In order to add a password to our directory, we'll need to generate an Apache .htpasswd file with our credentials.

You will need the apache2-utils package, which provides the htpasswd command. It can be installed using the following command:

sudo apt install apache2-utils

Let's create the file in the NGINX configuration directory, outside the web root, to keep it safe. You will be prompted for a password when running this command.

sudo htpasswd -c /etc/nginx/.htpasswd-goaccess [username]
sudo chown www-data:www-data /etc/nginx/.htpasswd-goaccess

This has created a password file at /etc/nginx/.htpasswd-goaccess.

Updating our host configuration

Update your site configuration file stored in /etc/nginx/sites-available with the following location block.

location ^~ /goaccess {
    alias /var/www/goaccess;
    index index.html;
    auth_basic "Login";
    auth_basic_user_file /etc/nginx/.htpasswd-goaccess;
}

This will allow us to navigate to https://website/goaccess to view the contents of the /var/www/goaccess/ directory. It's protected by basic HTTP authentication using the .htpasswd file we generated.
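Before relying on the new block, it is worth checking that NGINX accepts the configuration and that anonymous requests are really rejected. The following is a small sketch: requires_login is a hypothetical helper, and https://website/goaccess/ is the same placeholder URL used above.

```shell
# Hypothetical helper: succeed only if the URL answers 401 Unauthorized
# when no credentials are sent.
requires_login() {
    status=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    [ "$status" = "401" ]
}

# Usage, after validating and reloading the configuration with
#   sudo nginx -t && sudo systemctl reload nginx
#
# requires_login https://website/goaccess/ && echo "protected"
```

Any status other than 401 suggests that the location block is not being matched or that the site configuration was not reloaded.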

Automating the generation

At the moment, there is no dashboard at this location. Let's automate the generation with a cron job to get up-to-date analytics.

We should generate the file as www-data, the default web server user on Ubuntu, since it can read the logs and write to the web root.

sudo crontab -e -u www-data

Add the following line to generate the file every 10 minutes. Change the cron expression as required.

*/10 * * * * goaccess -c /var/log/nginx/access.log -o /var/www/goaccess/index.html --log-format=COMBINED

Now, we can access our analytics from the goaccess path on our web root. Keep in mind that the larger the log file, the longer it will take to process. For most sites, this should not be an issue as GoAccess is very fast.
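If processing ever takes long enough to matter, one option is to write the dashboard to a temporary file and rename it once GoAccess has finished, so that a visitor never loads a half-written page. This is only a sketch of that idea and not something the setup above requires. generate_dashboard is a hypothetical function, and the temporary name keeps the .html suffix because GoAccess picks its output format from the file extension.

```shell
# Hypothetical wrapper for the cron job: generate into a temporary file,
# then move it into place only if GoAccess succeeded.
generate_dashboard() {
    log_file=$1
    target=$2
    tmp="${target%.html}.tmp.html"   # e.g. index.html -> index.tmp.html
    if goaccess "$log_file" -o "$tmp" --log-format=COMBINED; then
        mv "$tmp" "$target"          # rename within the same directory
    else
        rm -f "$tmp"                 # leave the previous dashboard untouched
        return 1
    fi
}

# Usage:
# generate_dashboard /var/log/nginx/access.log /var/www/goaccess/index.html
```

To use it from cron, the function and its call could be saved in a script readable by www-data, with the crontab entry pointing at that script instead of calling goaccess directly.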

Increasing the depth

On Ubuntu, the access logs are rotated daily. This doesn't provide a lot of depth for analytics. By configuring logrotate, we can generate dashboards that cover a longer period.

By default, the logrotate configuration at /etc/logrotate.d/nginx looks like this:

/var/log/nginx/*.log {
        # replace daily with weekly or monthly
        daily
        missingok
        rotate 2
        compress
        delaycompress
        notifempty
        create 0640 www-data adm
        sharedscripts
        prerotate
                if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                        run-parts /etc/logrotate.d/httpd-prerotate; \
                fi \
        endscript
        postrotate
                invoke-rc.d nginx rotate >/dev/null 2>&1
        endscript
}

We can replace daily with monthly to generate a month's worth of history. Keep in mind that, in this case, the change affects all the NGINX log files. Depending on the size of these files, you may want to add another block just for the access logs.
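After editing the file, logrotate can report what it would do without touching any logs. The sketch below wraps its debug mode in a hypothetical helper named preview_rotation, and the default path is the one used in this article.

```shell
# Hypothetical helper: preview a logrotate configuration. The -d flag
# turns on debug mode, in which no log or state file is modified.
preview_rotation() {
    logrotate -d "${1:-/etc/logrotate.d/nginx}"
}

# Usage (as root, since the NGINX logs are not world-readable):
# preview_rotation
```

The output lists each matching log file along with whether it would be rotated, which makes syntax errors and unexpected matches easy to spot.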

Keeping previous months of analytics

With logrotate, we can archive previous months' analytics by making a copy of the latest HTML file. First, enable indexing on our GoAccess directory via the NGINX site config so that the files can be listed from the root.

location ^~ /goaccess {
    ...
    autoindex on;
    ...
}

Next, modify the command called by cron to generate a file called latest.html instead of index.html. This will allow us to list the files at the root instead of serving the index file.

*/10 * * * * goaccess -c /var/log/nginx/access.log -o /var/www/goaccess/latest.html --log-format=COMBINED

In our logrotate config, we can simply invoke a rotation of our GoAccess dashboard in the postrotate script. The resulting file will contain the month and the year in the name.

/var/log/nginx/*.log {
        ...
        postrotate
                invoke-rc.d nginx rotate >/dev/null 2>&1
                cp /var/www/goaccess/latest.html /var/www/goaccess/$(date +"%m-%Y" -d "1 day ago").html
        endscript
}
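The "1 day ago" part matters because a monthly rotation normally runs at the start of a new month, while the dashboard being archived covers the month that just ended. The sketch below reproduces the expression with the rotation date passed in explicitly, so the resulting file name can be checked. archive_name is a hypothetical helper and GNU date is assumed, as on Ubuntu.

```shell
# Hypothetical helper mirroring the postrotate expression, with the
# rotation date supplied as an argument instead of using today's date.
archive_name() {
    date -d "$1 1 day ago" +"%m-%Y.html"
}

archive_name 2024-01-01   # prints 12-2023.html
archive_name 2024-03-01   # prints 02-2024.html
```

One design note: names in the %m-%Y form do not sort chronologically in the directory listing, so a %Y-%m pattern may be preferable if many months are kept.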

Conclusion

In conclusion, GoAccess is a great tool to harness the analytics data we already have. It requires no invasive JavaScript on the client side and does not slow the website down. It cannot compete with the feature richness of, say, Google Analytics, but it provides more than enough information for most websites without any user tracking.

