Back in January I launched an open source product. It’s called Counterscale, and it’s a web analytics service to help you understand your website traffic.

Feature-wise, Counterscale isn’t that impressive. It does considerably less than Google Analytics or other commercial offerings. But what’s different about Counterscale – and why it may be of interest to you – is that it isn’t a managed SaaS product. It’s open source software purpose-built to self-host on Cloudflare’s developer cloud, in such a way that anyone can do it.

Deploying Counterscale is trivial. There are no database migrations to manage, and no Docker container you need to deploy on a VPS. It runs entirely inside a single Cloudflare Worker, which is deployed using a single terminal command. And because of how it’s architected on Cloudflare’s cloud, you don’t have to monitor its load or worry about whether the database will fall over. You can deploy it once and possibly never have to tinker with it again.

On top of all of this, you can run Counterscale and record over a million page views a month on as many properties as you want for $0 a month.

How is this even possible?

Built on Workers and Analytics Engine

Counterscale is deployed as a single Cloudflare Worker, which performs 3 different operations:

  1. It serves a JavaScript reporting snippet that you load on your website using a <script> include. This snippet records basic information about the website visitor, like the time of their visit, their browser name, operating system, etc., and transmits that data over HTTPS to the same Worker’s reporting endpoint. (NOTE: it does not use cookies or store any IPs.)

  2. It hosts a reporting endpoint (/collect), which ingests analytics data emitted from the snippet, processes it, and writes that data to the database – a new-ish Cloudflare product called Workers Analytics Engine (more on this below)

  3. Last, it hosts an entire dashboard UI (hosted at the root path) written in Remix that reads back that visitor data from Analytics Engine and assembles it into human-readable charts, tables, etc.
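As a rough illustration of how one Worker can play all three roles, here is a minimal TypeScript sketch of path-based routing. It is not Counterscale’s actual code: /collect comes from the description above, while /tracker.js and the routeFor helper are hypothetical names chosen for this example.

```typescript
// The three jobs the single Worker performs.
type Route = "snippet" | "collect" | "dashboard";

// Hypothetical path for the reporting snippet; the real path may differ.
const SNIPPET_PATH = "/tracker.js";
// Reporting endpoint named in the text.
const COLLECT_PATH = "/collect";

// Decide which job should handle a request, based only on its URL path.
function routeFor(pathname: string): Route {
  if (pathname === SNIPPET_PATH) return "snippet";
  if (pathname === COLLECT_PATH) return "collect";
  // Everything else, including the root path, falls through to the dashboard UI.
  return "dashboard";
}
```

In a real Worker, the fetch handler would presumably make a decision like this and then hand the request to the snippet, the ingestion code, or the Remix app.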

Workers Analytics Engine is really the heart of what makes Counterscale work. Here’s how Cloudflare describes it:

Workers Analytics Engine provides unlimited-cardinality analytics at scale, via a built-in API to write data points from Workers, and a SQL API to query that data.

Analytics Engine isn’t just a database. It’s a bunch of managed services Cloudflare has built on top of ClickHouse that allow you to write and query analytics-style data at tremendous scale.

You use Analytics Engine by calling a single magic function exposed to Workers called writeDataPoint. This function writes to a single table with predefined columns of blob (string) and double (number) types, where the position of each value in the blobs and doubles arrays dictates which column it is stored in.

Here’s an example of calling writeDataPoint from Counterscale’s source code:

analyticsEngine.writeDataPoint({
  indexes: [data.siteId || ""], // index based on site id
  blobs: [
    data.host || "", // blob1
    data.userAgent || "", // blob2
    data.path || "", // blob3
    data.country || "", // blob4
    data.referrer || "", // blob5
    data.browserName || "", // blob6
    data.deviceModel || "", // blob7
    data.siteId || "", // blob8
  ],
  doubles: [
    data.newVisitor || 0, // double1
    data.newSession || 0, // double2
  ],
});

This fixed schema is awkward but has a benefit in that there are no migrations to manage; the table and columns are there from the beginning.
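Because columns are identified only by position, it can help to keep the field order in one place and derive column names from it. The following is a minimal sketch of that idea, not something taken from Counterscale’s source: the BLOB_FIELDS and DOUBLE_FIELDS arrays and the columnFor helper are hypothetical names, and the field order simply mirrors the writeDataPoint call above.

```typescript
// Field order mirrors the blobs and doubles arrays in the writeDataPoint call above.
const BLOB_FIELDS = [
  "host",
  "userAgent",
  "path",
  "country",
  "referrer",
  "browserName",
  "deviceModel",
  "siteId",
] as const;
const DOUBLE_FIELDS = ["newVisitor", "newSession"] as const;

type BlobField = (typeof BLOB_FIELDS)[number];
type DoubleField = (typeof DOUBLE_FIELDS)[number];

// Return the column name (for example "blob3") that stores a given field.
// Columns are numbered from 1, so the array index is shifted by one.
function columnFor(field: BlobField | DoubleField): string {
  const blobIndex = (BLOB_FIELDS as readonly string[]).indexOf(field);
  if (blobIndex !== -1) return `blob${blobIndex + 1}`;
  const doubleIndex = (DOUBLE_FIELDS as readonly string[]).indexOf(field);
  return `double${doubleIndex + 1}`;
}
```

With a lookup like this, a query can refer to columnFor("path") instead of hard-coding blob3, so the write side and the read side are less likely to drift apart.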

Reading back the data is done by using a SQL-like API accessed over HTTP. For example, here’s how you can retrieve the number of hits grouped by path (blob3 in the writeDataPoint snippet above) in the last 24 hours using cURL:

curl -X POST "https://api.cloudflare.com/client/v4/accounts/<account_id>/analytics_engine/sql" \
   -H "Authorization: Bearer <api_token>" \
   -d "SELECT COUNT(),
          blob3 AS path
        FROM metricsDataset
        WHERE timestamp > NOW() - INTERVAL '1' DAY
        GROUP BY path"

You can perform some basic grouping and aggregation, but it’s not a fully featured SQL API, and not all the queries you’re used to are supported. On the other hand, the SQL API does expose some unique ClickHouse functions meant to make working with time-series data easier, like toStartOfInterval. Counterscale makes heavy use of these functions.
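To give a feel for what a time-bucketed query might look like from code, here is a minimal TypeScript sketch that builds the SQL text and the HTTP request for the same endpoint as the cURL example. It is an illustration rather than Counterscale’s implementation: hitsPerHourQuery, buildSqlRequest, and the SqlRequest shape are hypothetical names, and the exact toStartOfInterval syntax is my assumption, modeled on the INTERVAL syntax used above.

```typescript
// Minimal request shape, so the result can be passed to fetch(url, init).
interface SqlRequest {
  url: string;
  init: { method: "POST"; headers: { Authorization: string }; body: string };
}

// Build a query that counts hits per hour over the last `days` days.
function hitsPerHourQuery(dataset: string, days: number): string {
  // Only allow positive whole numbers, since the value is spliced into SQL.
  if (Math.floor(days) !== days || days <= 0) {
    throw new Error("days must be a positive integer");
  }
  return [
    "SELECT toStartOfInterval(timestamp, INTERVAL '1' HOUR) AS bucket,",
    "  COUNT() AS hits",
    `FROM ${dataset}`,
    `WHERE timestamp > NOW() - INTERVAL '${days}' DAY`,
    "GROUP BY bucket",
    "ORDER BY bucket",
  ].join("\n");
}

// Wrap a query in the URL, auth header, and body the SQL API expects.
function buildSqlRequest(accountId: string, apiToken: string, query: string): SqlRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: query,
    },
  };
}
```

Keeping the request as plain data means it can be inspected or unit-tested without touching the network; sending it is then a matter of calling fetch(request.url, request.init) with a real account ID and API token.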

Analytics Engine has some other limitations, most notably a 90-day retention window. But in exchange for these limitations, you get a basically free-to-run simple analytics service whose data you control end-to-end. I think that’s more than a fair deal.

So wait, it’s free?

For 99% of hypothetical users – probably.

As of June 2024, Cloudflare’s free plan allows for 100k Worker requests and 100k Analytics Engine writes per day.

Each page view invokes a Worker twice: once to serve the analytics script, and once to ingest the ensuing analytics event. This is followed by a single Analytics Engine write. So you could hypothetically handle tracking 50k page views per day on just Cloudflare’s free plan. (Visiting the dashboard also causes Worker requests and Analytics Engine reads, but unless you’re hitting the dashboard thousands of times per day, typical usage shouldn’t matter.)
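The arithmetic behind that 50k figure can be written out as a small TypeScript sketch. The limits are the June 2024 free-plan numbers quoted above; the PlanLimits type, the FREE_PLAN constant, the maxPageViewsPerDay helper, and the 30-day month are hypothetical choices made for this example.

```typescript
interface PlanLimits {
  workerRequestsPerDay: number;
  analyticsWritesPerDay: number;
}

// Free-plan limits as quoted in the text (June 2024).
const FREE_PLAN: PlanLimits = {
  workerRequestsPerDay: 100000,
  analyticsWritesPerDay: 100000,
};

// Each page view costs two Worker requests (script + collect) and one write.
// The tighter of the two limits determines the daily ceiling.
function maxPageViewsPerDay(
  limits: PlanLimits,
  requestsPerView: number = 2,
  writesPerView: number = 1,
): number {
  const byRequests = limits.workerRequestsPerDay / requestsPerView;
  const byWrites = limits.analyticsWritesPerDay / writesPerView;
  return Math.floor(Math.min(byRequests, byWrites));
}

const perDay = maxPageViewsPerDay(FREE_PLAN); // 50000
const perMonth = perDay * 30; // 1500000, assuming a 30-day month
```

Worker requests are the binding limit here, not Analytics Engine writes, and 50k page views a day works out to roughly 1.5 million a month, before subtracting whatever the dashboard itself consumes.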

To put this in perspective, Counterscale has been mentioned in several popular social posts, appeared in multiple high-circulation JavaScript newsletters, was mentioned on Cloudflare’s blog, and appeared on episode 761 of Syntax.fm. It has yet to eclipse 20k page views in a month.

Should I actually use this?

Right now Counterscale has pretty limited functionality, and it only has 90-day retention. While I think it’s operationally sound – I haven’t had a hiccup since January, even with intentional load testing – I don’t know that it should be used for anything besides hobby projects, personal blogs, or indie marketing websites. It still has a ways to go before it’s ready for serious commercial use.

But for my purposes, Counterscale has been successful. I now have decent observability into my website traffic, and I’ve paid basically zero for that privilege (not accounting for my time, of course). I also got to learn some light Remix and Tailwind in building the dashboard pages, and that’s worth something to me too.

The New Self-Hosted

Counterscale, to me, is an interesting illustration of how infrastructure products are evolving. Cloud providers aren’t just providing basic primitives for you to build on; they’re offering increasingly specialized stacks designed for specific use cases.

I guess you can think of this as “late stage” cloud computing. We went from “the cloud” letting you run VMs, to running containers, to running serverless code, and more recently even “serverless” databases. And now you can interop with a single black box cloud service that manages a queueing system and columnar database, with writes taking place at the edge (via Workers), and later retrieval via SQL. And all of it is available to use with commodity pricing.

IMO, these new cloud primitives are making it possible for typical software developers like myself to build increasingly elaborate web software without deep infrastructure expertise. And by the same token, they’re making it possible for average developers to self-host infrastructure-demanding OSS with some basic cloud admin and “npm run deploy”.

Imagine a future where self-hosting OSS products on the cloud is as simple as running desktop OSS programs like VLC and Blender: something you install once and update periodically, perhaps even automatically.

Dare to dream?