Tracking user behavior via funnels

In the previous post I explained a bit about my side project. I want to help people avoid surprising expirations of their SSL certificates. While at it, I hope to learn a bit about building a helpful website, and perhaps more.

A quick recap: I created a website that tells you when the SSL certificate of your server will expire. I also present the user with a bunch of relevant hostnames for casual browsing within the website, checking on other hostnames that may be of interest. No ads, no subscriptions, no sign-in, no emails. Just one thing: tell me what hostname you are after, and I’ll tell you when its SSL certificate is going to expire. Simple!

Just a bit more ‘building’…

Screenshot of www.haveibeenexpired.com/ssl/adrukh.medium.com, showing the SSL cert expiration and the domain expiration details.
That’s what you get on haveibeenexpired.com today

A few more technical bits to complete, just because I have my habits:

  • Logging. I’m using Papertrail as a Heroku addon, and dump a single line of log for every request served by my app. The User-Agent string, the requested URL, and some data on the response being sent all go into the log. This is a very blunt way to see what the app is handling, but I fully control the level of detail, noticing whatever errors occur and so on.
  • Uptime. I’m signing up with updown.io and having their service poke my app once a minute, so I can rest assured it is up and running. I expose a specific endpoint for this check, so that it doesn’t mess up my logging, and doesn’t masquerade as a user trying to get some value from my app. The free tier of updown.io can cover several apps at a high monitoring frequency, so check them out!
  • Security. I’m adding helmet to my app to gain a bunch of security best-practices for a web app. CSP is driving me up the wall a bit, but with some more time I get around it. Security is table-stakes, and gets harder to do well the longer you wait!
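The logging and uptime bits above can be sketched roughly like this, assuming an Express-style app (the function names and the /health path are illustrative, not the actual app code):

```javascript
// One log line per request, plus a dedicated health-check path that
// stays out of the logs. A sketch assuming Express-style middleware.
function requestLogLine(req, res) {
  return [
    new Date().toISOString(),
    req.method,
    req.url,
    res.statusCode,
    JSON.stringify(req.headers['user-agent'] || '-'),
  ].join(' ');
}

function requestLogger(req, res, next) {
  // Skip the uptime probe so it doesn't masquerade as a real user.
  if (req.url === '/health') return next();
  // Log once the response finishes, so the status code is final.
  res.on('finish', () => console.log(requestLogLine(req, res)));
  next();
}
```

The `finish` listener matters: logging before the handler runs would record a status code that hasn’t been set yet.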

Enough building, let’s see what the users think

My logs are showing me what people are searching for, and I notice one small thing. People don’t always act the way they are asked to! I know, big surprise here :) Specifically, some users would paste a whole URL (https://some.website.com/with/path) in the text field clearly marked with a ‘Hostname’ label. The nerve! :)

Noticing this, I have a ‘should have thought of that!’ moment, and implement some more delicate parsing of user input. After all, it’s better to assume the user is interested in the hostname of the URL they pasted, rather than telling the user ‘this is not what I asked for’. The difference between a consumer and an enterprise product, perhaps? Up to you to decide.
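A minimal sketch of this kind of forgiving parsing, using Node’s WHATWG URL parser (the function name is mine, not the app’s):

```javascript
// Accept either a bare hostname or a full pasted URL, and extract
// just the hostname. A sketch of the idea, not the app's exact code.
function extractHostname(input) {
  const trimmed = input.trim();
  // If there is no scheme, add one so the URL parser can do the work
  // of stripping paths, ports, and query strings.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed)
    ? trimmed
    : `https://${trimmed}`;
  try {
    return new URL(withScheme).hostname;
  } catch {
    return trimmed; // unparseable: hand the raw input to the lookup
  }
}
```

So a pasted `https://some.website.com/with/path` quietly becomes `some.website.com` instead of an error.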

I stall for a minute, thinking about other not-exactly-correct types of input that I can expect to see. For example, the app doesn’t support plain IP addresses, while these can be relevant to my users. I stop right there, deciding to act on actual signal (seeing IP addresses being used) rather than future-proofing the app for every use-case.

Paying tribute to the SEO gods

I then create a long-ish sitemap based on this guide, populating it with a few thousand pages, each with a different hostname. I submit it to Google Search Console, and… wait for the crawlers to come! Apparently, Google has a lot of work to do besides crawling my website, and here is the rate at which they go through my sitemap:

A graph of pages on my site indexed by Google
Crawling sitemap pages

You can see that they crawl some 1K pages once every 3–4 days.

You may wonder what the 21 errors are about! Here’s another piece of learning to share with you. Following my past experience, I thought that if a request came in with a valid hostname but my app failed to figure out the SSL certificate details, the page presenting the error message should be served with an HTTP 500 response status. While this makes sense in many cases, it glows red in Google Search Console, and probably downgrades my app’s search rating. I took this as a cue to step away from my past practice, and now every successfully rendered page is served with HTTP 200. So no more such errors, and hopefully Google will re-scan the errored pages and show a zero-error grade sometime soon!
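The status-code lesson can be illustrated with a small sketch (all names are hypothetical, and the lookup is stubbed): a page that renders an error message is still a successful render, so it gets a 200.

```javascript
// Sketch: decide a response for a results page. Even when the
// certificate lookup fails, the page itself renders fine, so we
// return HTTP 200 rather than 500 and keep crawlers happy.
function respondWith(lookup) {
  try {
    const cert = lookup();
    return { status: 200, body: `Expires: ${cert.expires}` };
  } catch (err) {
    // Lookup failed, but we still render a valid page.
    return { status: 200, body: `Error: ${err.message}` };
  }
}
```

A 5xx would still be the right answer if the app itself failed to render anything at all.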

Analytics and funnels

Heap Analytics helps me think differently about my app, and I come up with a few funnels to see what users come for, versus what they actually do on my site.

There are two main entry points to the app: either / for the homepage (i.e. a user searched for terms relevant to my app and clicked the link), or /ssl/some.host.name (i.e. a user clicked a link I posted to warn a major brand of an impending SSL certificate or domain registration expiration). It’s reasonable to assume the two types of users differ in their expectations. For example, the first type is more likely to have their own website in mind, wanting to know when their SSL certificate will expire. The second type is probably after browsing a few websites, popular or otherwise.

Here are the funnels I created:

Funnels rendered by HeapAnalytics, tracking user behavior on my site
Tracking user behavior on a site with basically one page

While this is not exactly professional on my part, I’m surprised at the level of detail available when tracking user behavior on a site with basically a single page. This complexity is most likely working against me, but I’m learning!

So my idea is to see how ‘deep’ users would go, checking one hostname after the other. Say a user lands on a page of a specific hostname following a link I posted on Twitter. How many such users will click on a related link? How many will visit a page for a different hostname? And again? You get the point.

While at it, I also want to compare the difference between the two types of users I described above — those starting from the homepage versus those starting from a results page of a specific hostname.

I don’t have any expectations for what ‘good’ or ‘bad’ looks like. Instead, I want to see what the current user behavior is, and call it ‘baseline’. Then, any change I make can be measured in impact compared to the baseline.

Main takeaways for the current baseline:

  • A user landing on the homepage is 55% likely to check on a specific hostname
  • A user landing on a specific hostname check is 32% likely to check on another hostname
  • Both types of users are almost equally likely to ‘click through’ 4 additional hostname pages — 9% of the users go that far
  • The means I took to make it easier to stay on the site (presenting users with popular or somehow related hostnames for single-click navigation) contribute very little: between 2% and 5% of users actually click on those

So the interesting part now is to decide what ‘needle’ I want to move upwards, and come up with an experiment or two to see what I can do. This is on my todo list, please comment if you have suggestions, and you’ll appear in the next blog post!

Automating the reach-out function

To support this without much hassle, I implemented two API endpoints in my app, coupled with a list of relevant hostnames, and I crawl them periodically. The API is not sufficiently stable for me to promote it as part of the service, but I’ll leave a couple of breadcrumbs here should you want to experiment with them by yourself. Just try https://www.haveibeenexpired.com/api/ssl/medium.com or https://www.haveibeenexpired.com/api/domain/medium.com and take it from there. Not promising anything stable about these endpoints, buyer beware :)

I’m now scraping my own app using these endpoints every other day or so, generating CSV files that I later process manually with the help of Google Spreadsheets. Interesting pieces of trivia come from these exercises, as does the ability to alert certain brands to the upcoming expiration of their assets.
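The CSV-generation step might look something like this; the field names are my guesses for illustration, not the API’s documented schema:

```javascript
// Turn scraped API responses into CSV rows for manual analysis in a
// spreadsheet. Field names (hostname, expires, daysLeft) are
// illustrative assumptions about the response shape.
function toCsv(rows) {
  const escape = (v) => `"${String(v).replace(/"/g, '""')}"`;
  const header = ['hostname', 'expires', 'daysLeft'].map(escape).join(',');
  const lines = rows.map((r) =>
    [r.hostname, r.expires, r.daysLeft].map(escape).join(',')
  );
  return [header, ...lines].join('\n');
}
```

Quoting every field (and doubling embedded quotes) keeps the output safe to paste straight into Google Spreadsheets.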

An interesting twist here is my ability to extend the list of relevant hostnames for crawling by relying on the additional hostnames stored in most of the SSL certificates that I find. Starting with around 6K hostnames, I am now scraping through 72K! I’m sure there’s more interesting data to extract here…
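Node’s tls module exposes a certificate’s Subject Alternative Name list as a comma-separated string (via `getPeerCertificate().subjectaltname`), so harvesting extra hostnames can be sketched like this — my assumption of the approach, not the actual code:

```javascript
// Extract extra hostnames from a certificate's Subject Alternative
// Name string, e.g. 'DNS:example.com, DNS:*.example.com, IP Address:1.2.3.4'.
// This is how a crawl list can grow from one hostname to many.
function hostnamesFromSan(subjectaltname) {
  return subjectaltname
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.startsWith('DNS:'))
    .map((entry) => entry.slice('DNS:'.length))
    .filter((name) => !name.startsWith('*.')); // skip wildcard entries
}
```

Wildcard entries are skipped here since `*.example.com` isn’t a concrete hostname you can check.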

I’m enjoying the manual work around this, not in a rush to fully automate it yet. I learn a lot from looking at the data, and I feel like there’s more value to extract from this scraping before I settle on something sufficient to be automated.

What did I learn?

  • Setting up the tech bits that give me peace of mind (monitoring, security) was a nice touch to leave the building stage aside for now. It helped me ‘turn the page’ in my mind to seeking user value with the existing offering. Don’t be afraid to ‘build just one more thing’ as long as you can see how it allows you to move forward to the next goal.
  • Seek out patterns in how your users interact with your app! Spotting that someone pasted a URL into the field where I expected a hostname won me a few extra users, instead of them giving up because the app didn’t understand what they wanted.
  • Analytics is a big deal! Many solutions, many different use-cases. Comparing the different alternatives out there was educational; I learnt what I can expect from analytics solutions. Having a reliable comparison benchmark in my own logs helped validate the analytics data I was seeing. There’s still a lot of ground for me to cover here; I’m sure I’m making rookie mistakes.

That’s it for now, thanks for reading through it all! Let’s see if I can keep a cadence of posting an update every other week :)

Mentoring VP Engineering folks. Previously VP Engineering and GM Israel at Snyk.