A Launch Story

I started this site as a wide-ranging research project. Once it was usable, I showed it to a few people and posted comments on Reddit and Hacker News.

The Reddit posts led to write-ups on The Next Web and Lifehacker.

Since then, several people have expressed interest in how this went and what I’ve learned. In this post, I’ll show you the stats from the launch and screenshots of the tools I’ve been using.

Prior to Dec 3 there was very little usage, so the charts start there. The first chart shows the total traffic since then:
Google Analytics traffic during launch

For scale, the peak usage was 3,371 sessions in one day, and the lowest was 294. A “page view” occurs every time someone selects a facet in the search UI. At the time of writing, my personal blog gets a little over 19,000 sessions a month, mostly from SEO, so I’m pleased with the results so far.

The next Google Analytics screenshot shows the breakdown by which site referred the traffic. The Next Web uses Facebook for commenting on their articles, which I suspect is the source of that traffic. Similarly, the traffic from Flipboard, Pocket, Tumblr, and Feedly is all effectively engineered by TNW.

You can also see here that someone submitted the site to Product Hunt, and it didn’t do well. A comparable utility called Class Central did well there, but site stability issues hurt me (more on that later). I suspect that design plays a big role as well.

A newspaper in Chennai also wrote up, in Tamil - note the traffic from

Google Analytics - Traffic by Referrer

Google Analytics has a view that groups traffic by “channel”, i.e. whether it came from a search engine, a regular website, or a social media site. This is a crude metric, since they conveniently don’t recognize DuckDuckGo as a search engine. They also treat YouTube and Stack Overflow as “social media.”
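To make the crudeness concrete, here is a minimal sketch of how a channel grouping like this could work. The lookup tables and the `classify_channel` function are my own illustration, not Google's actual rules; the point is only that any site missing from the search engine list silently falls through to "referral".

```python
from urllib.parse import urlparse

# Hypothetical lookup tables, for illustration only. Google's real
# channel rules are more involved than two sets of hostnames.
SEARCH_ENGINES = {"google.com", "bing.com", "yahoo.com"}
SOCIAL_SITES = {"facebook.com", "twitter.com", "youtube.com", "stackoverflow.com"}


def classify_channel(referrer):
    """Return a coarse channel name for a referrer URL."""
    if not referrer:
        return "direct"
    # hostname is None when the referrer has no scheme; treat that as unknown.
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in SEARCH_ENGINES:
        return "organic search"
    if host in SOCIAL_SITES:
        return "social"
    # Anything not in either table lands here, including search engines
    # that were never added to SEARCH_ENGINES.
    return "referral"


print(classify_channel("https://duckduckgo.com/?q=find+lectures"))
print(classify_channel("https://stackoverflow.com/questions/1"))
```

With tables like these, a DuckDuckGo visit is counted as a plain referral and a Stack Overflow visit as social, which matches the behavior described above.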

The interesting thing about this screenshot is that it shows the site receiving several hundred click-throughs from search engines during the period when it was most active. Prior to this, the site didn’t show up in any results. This made me wonder: what are these people searching for, and will it affect the rankings in Google?

Google Analytics - Channels view

It turns out that we can answer both questions. Google Webmaster Tools shows that most of these people were searching for “findlectures”:

Google Webmaster Tools - Screenshot

I’m more interested in how I rank for “find lectures”, because that’s something people would actually look for without knowing about the site. The top spot for that query is held by the Church of Christian Science, who deserve it, having sponsored public lectures since 1879.

Google Webmaster Tools lets you drill into the results for a term to find out how this affected my ranking. Here we can see that the ranking for this term did increase after a few days. Whether it will stay this way remains to be seen.

Search engine queries give you interesting insights into people’s minds - who searches for “bill clinton hobbies”?

Google Webmaster Tools - Screenshot

If you build something useful, ideally you naturally get links to what you’ve built. For instance, the next screenshot shows the traffic I get from Stack Overflow over time, which follows the ideal pattern, increasing slowly over several years:

Gary's blog - Stackoverflow

Google Analytics also has a “search” integration - if you tell it which URL parameter contains people’s queries, it gives you a nice report.

From these queries, it seems that many people using the site are software developers. This is a good thing, because if these people like what you build, they’re likely to refer friends and family. For free applications, there are many successful products that help with recruiting (e.g. Stack Overflow, LinkedIn ads) or continuing education (e.g., books, app academies).

There are some interesting rarer searches - speaker names come up a lot (e.g. Ayn Rand, Trump), as well as topics around sports and cars. “Football” is an interesting special case, because it’s impossible to tell which of the two sports a given person wants. When possible, I’m trying to hide highly nuanced cultural terms which could unfairly bias you for/against a speaker. In the most prominent instance, I’m hiding titles: “president” could be a university president or U.S. president, “bishop” means something different in a Roman Catholic church than in a Baptist one, UK universities have many titles that mean nothing to me, etc.

Top Searches on
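The search report above depends only on knowing which URL parameter holds the query. A rough sketch of the same idea, assuming the parameter is called `q` and using made-up `example.com` URLs (neither is taken from the real site):

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

QUERY_PARAM = "q"  # assumed parameter name, not confirmed by the site


def extract_query(url, param=QUERY_PARAM):
    """Return the normalized search term from a page URL, or None."""
    values = parse_qs(urlparse(url).query).get(param)
    if not values:
        return None
    term = values[0].strip().lower()
    return term or None


def top_searches(urls, n=10):
    """Count search terms across page URLs, most frequent first."""
    terms = (extract_query(u) for u in urls)
    return Counter(t for t in terms if t).most_common(n)


# Hypothetical page views, for illustration only.
page_views = [
    "https://example.com/?q=Python",
    "https://example.com/?q=python&page=2",
    "https://example.com/?q=Ayn+Rand",
    "https://example.com/about",
]
print(top_searches(page_views))
```

Lowercasing and trimming the term is a design choice here so that “Python” and “python” count as one search; a real report may or may not do this.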

For a while the site crashed periodically, so I monitored the real-time view of Google Analytics to see if it was running. This shows you how many people are on the site at a given time. For several hours after the Lifehacker article was posted, there were over a hundred concurrent users.

In the long run, I’d like to encourage people to discover talks they wouldn’t normally find on their own (there are some great historical lectures).

Anecdotally, I noticed that a lot of people clicked into topics on religion, philosophy, and spirituality. I think this is an area where the application can be useful. Some of these lectures are especially difficult to tag, but I think that the library style categorization system really shines. A well-designed tagging system can offer a neutral judgement on the value of the content, and lets you listen to speakers who you might not approach in the real world.

Google Analytics Real Time View

The next chart shows the top ten videos, which are almost entirely software development topics. I included tech talks to help me keep up with the state of the art in the field. This is a big point of differentiation from similar search engines, as they typically focus on full courses covering introductions to computer science. There is some guilt associated with not finishing a class, and introductory CS material is not that useful on its own.

Top Ten most clicked lectures

Google Analytics also has the ability to do custom reports if you send data to its JavaScript API. The free version “limits” you to 20 metrics and dimensions (basically rows/columns describing an event, like a search, click, etc).

I set up a report on what portion of search results had lectures that were playable in the search results, and how many talks were by known authors.

Number of search results with books or inline player
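For a sense of what feeds a custom report like this, here is a sketch of an event hit with one custom dimension and one custom metric. It assumes the parameter names of the Universal Analytics Measurement Protocol (`v`, `tid`, `cid`, `t`, `ec`, `ea`, `cd1`, `cm1`); the tracking ID is a placeholder, and the slot numbers, category and action names are hypothetical. The site itself uses the JavaScript API, so this only builds the payload and sends nothing.

```python
from urllib.parse import urlencode


def build_search_event(client_id, has_inline_player, known_author_count,
                       tracking_id="UA-XXXXX-Y"):
    """Build a URL-encoded event hit describing one search result page."""
    params = {
        "v": "1",              # protocol version
        "tid": tracking_id,    # placeholder property ID
        "cid": client_id,      # anonymous client identifier
        "t": "event",          # hit type
        "ec": "search",        # event category (hypothetical name)
        "ea": "results_shown", # event action (hypothetical name)
        # Custom dimension slot 1: were any results playable inline?
        "cd1": "yes" if has_inline_player else "no",
        # Custom metric slot 1: how many results had a known author?
        "cm1": str(known_author_count),
    }
    return urlencode(params)


print(build_search_event("555", True, 3))
```

Each dimension or metric uses one numbered slot, which is why the cap of 20 matters once you start describing searches and clicks in detail.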

The API for Google Analytics is so easy to use that unscrupulous people write scripts to send fraudulent referrer data to your analytics. These typically are lead pipelines to sites that helpfully offer you the opportunity to join a botnet. The most interesting recent example of this is “vote trump” spam, which still continues well after the election.

Consequently, you should be careful about trusting the analytics from any site. This spam is also regressive - the numbers for smaller sites will be off by far more than those for larger ones.
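A quick back-of-the-envelope calculation shows why the effect is regressive. The session counts below are hypothetical, chosen only to compare a small site with a large one receiving the same amount of spam:

```python
def inflation(real_sessions, spam_sessions):
    """Fraction by which reported sessions overstate real ones."""
    if real_sessions <= 0:
        raise ValueError("real_sessions must be positive")
    return spam_sessions / real_sessions


# Hypothetical numbers: the same 50 fake sessions a day hit both sites.
spam_per_day = 50
for label, real in [("small site", 300), ("large site", 30000)]:
    share = inflation(real, spam_per_day)
    print(f"{label}: reported traffic overstated by {share:.1%}")
```

The same absolute amount of spam is a rounding error for the large site and a double-digit distortion for the small one.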

Real user actions speak more clearly to usage - I’ve received a dozen emailed thank-you notes, one hand-written card, and five people have written in recommending video series (usually their own). Four or five people also wrote in to report stability issues during outages.

After doing a few in-person demos, I realized that for some people faceted navigation is overwhelming. Positioning this application as “search” is a little misleading, because results without queries are intentionally a high-quality random sample to aid discovery. More specifically, results try to give you an even distribution of non-technical topics, correct for gender, and filter out things that have major issues (too short, bad audio, lots of ums, etc).
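The idea of a filtered, evenly distributed random sample can be sketched roughly as follows. This is not the site's actual ranking code: the field names (`topic`, `minutes`, `bad_audio`), the length threshold, and the per-topic cap are all assumptions, and the gender correction is left out entirely.

```python
import random
from collections import defaultdict

MIN_MINUTES = 10  # hypothetical cutoff for "too short"


def discovery_sample(talks, per_topic=2, seed=None):
    """Pick up to per_topic random talks from each topic, after filtering."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for talk in talks:
        # Drop talks with major quality issues before sampling.
        if talk["minutes"] >= MIN_MINUTES and not talk["bad_audio"]:
            by_topic[talk["topic"]].append(talk)
    picked = []
    for topic in sorted(by_topic):
        candidates = by_topic[topic]
        # Capping each topic keeps a large category from dominating.
        picked.extend(rng.sample(candidates, min(per_topic, len(candidates))))
    rng.shuffle(picked)  # avoid presenting results grouped by topic
    return picked


# Hypothetical catalog, for illustration only.
catalog = [
    {"title": "History A", "topic": "history", "minutes": 45, "bad_audio": False},
    {"title": "History B", "topic": "history", "minutes": 50, "bad_audio": False},
    {"title": "History C", "topic": "history", "minutes": 60, "bad_audio": False},
    {"title": "Physics A", "topic": "physics", "minutes": 30, "bad_audio": False},
    {"title": "Physics B", "topic": "physics", "minutes": 3, "bad_audio": False},
]
for talk in discovery_sample(catalog, per_topic=2, seed=1):
    print(talk["topic"], "-", talk["title"])
```

Sampling per topic rather than over the whole catalog is what keeps a heavily represented topic from crowding out everything else.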

I decided to try making an email list to help the people who found this overwhelming and send out the best videos we’ve found. For now, I’m using Drip, but I’m considering migrating to AWeber.

I initially assumed that no one would sign up for this, but you can see that the percentage of people who sign up is quite high (about 10x higher than I anticipated).

Many lecture collections came from research that I and others at Wingspan did, finding videos for our lunch-and-learn program. My sister and I also share a spreadsheet containing 176 video ratings, so I have enough good videos to populate this for some time.

Drip - Overview Screenshot

The Next Web article was much more positive than the Lifehacker one, which I suspect is why its signups are so much higher.

It’s interesting that there are 0 signups from “Facebook”. I think this is a combination of those being people on phones, and being from the comments section of TNW. If this were from people writing a Facebook status that references the site, it would be much higher (among the first 20 subscribers, about half are my friends and family).

Drip - Signups by referrer

I set the emails up to be like a “course”, so that I can see how a small number of people react to the videos I select, before a large group sees them. Ideally, the initial sequence encourages people to trust my judgement on video choices (assuming I have any).

I notice that there is significant psychological pressure in emailing 300 people at once, which I imagine will be much higher if this list grows.

Drip also tracks metrics on what people click on - currently it’s not enough to be interesting, but I do notice that this seems to encourage people to return to the site.

Drip - Email Sequence

Overall I was hurt by site stability issues - it always takes time to stabilize an application when you set it up for the first time. In this case, any time an invalid HTTP request came through, the site would briefly crash. Sometimes, the site would come back up.
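The general fix for this class of problem is to treat bad input as a client error rather than letting the exception take the process down. The site runs on Node.js, so the Python below is only an illustration of the idea; the handler names are hypothetical, and malformed percent-encoding is just one assumed example of an invalid request.

```python
from urllib.parse import unquote


def handle_search(raw_query):
    """Hypothetical handler: decode the query and return (status, body)."""
    # errors="strict" makes malformed percent-encoding raise
    # UnicodeDecodeError instead of being silently replaced.
    query = unquote(raw_query, errors="strict")
    return 200, f"results for {query!r}"


def safe_handle(raw_query):
    """Wrap the handler so a malformed request yields a 400, not a crash."""
    try:
        return handle_search(raw_query)
    except ValueError:
        # UnicodeDecodeError is a subclass of ValueError, so it is caught here.
        return 400, "bad request"


print(safe_handle("ayn%20rand"))
print(safe_handle("%E0%A4"))  # truncated UTF-8 sequence
```

The important part is the boundary: one malformed request should cost one error response, not an outage for everyone else on the site.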

The most significant stability improvement came from migrating the site to Heroku, which took me just over an hour.

Currently, the Node.js server runs there. I also use Solr, which runs on the same server as my personal blog. This makes Solr effectively free (since I already pay Linode for my blog). Heroku and Linode are in different data centers in New Jersey, so the traffic between the two goes out over the public internet. This adds approximately 25 ms latency to every request.

There is a “solr” offering as an add-on to Heroku, but it requires your data to match a preset schema, and there is no way to upload your own data, as it’s designed to be an add-on to a Rails app.

One of the big selling points of Heroku is that you can drag a slider to choose how many nodes you want (see below). This means you get a free load balancer. They bill per node and per day - for a two server setup, this costs $50 / mo, but could go down to $7 on a low-RAM single node configuration. However, one can easily increase or decrease the cost per day based on anticipated usage, which would be very expensive to build yourself.

Heroku - server configuration
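Because billing is per node and per day, the cost of scaling up briefly is easy to estimate. This sketch assumes the $50 / mo two-server price splits evenly to $25 per node, a 30-day month, and simple linear proration; the cheaper $7 configuration is a different node type and is not modeled here.

```python
def monthly_cost(node_days, price_per_node_month=25.0, days_in_month=30):
    """Estimate a month's bill from (node_count, days) periods.

    Assumes linear per-day proration, which is a simplification.
    """
    total_node_days = sum(nodes * days for nodes, days in node_days)
    return round(total_node_days * price_per_node_month / days_in_month, 2)


# Two nodes all month, matching the $50 / mo figure.
print(monthly_cost([(2, 30)]))
# Hypothetical launch month: two nodes for 5 busy days, one node otherwise.
print(monthly_cost([(2, 5), (1, 25)]))
```

Running the second node only for the days around a launch costs a few dollars, which is the flexibility that would be expensive to build yourself.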

At one point I tried using a VM on DigitalOcean, but found that it would require a $20/mo server to have enough RAM for both Solr and the Node.js server.

One of the cool things you get on Heroku is application health reporting - here, you can see how much RAM the servers use under actual usage:

Heroku - metrics #1

You can also see throughput - there are options for alerting, although I haven’t explored that because I don’t want to get paged for this site :)

Heroku metrics #2

There are a ton of add-ons in the Heroku market. Below, you can see a Splunk-style search engine for logs. Heroku doesn’t retain these for long, so this is necessary.

Heroku - Logging

I noticed that iPhones have a particular tendency to send tons of HTTP requests to test what features your site supports, one of which you can see in the log above.

If you watch the logs for a web app in real time, you will see all kinds of strange things come through. As a taste, Google Analytics has a report of what browsers hit the site - it’s more varied than I ever imagined:

Google Analytics report of browser

I set up an application called Bugsnag, which captures JavaScript errors. Some of these browsers have unexpected JavaScript errors. Three quarters of the way down the screenshot, there is an error parsing the documentation comments in Lodash, which would make the site unusable for those people.

Bugsnag screenshot

There is a risk of spending too much time thinking about metrics. For me, the ideal valuable outcome of watching a talk is that it sinks into my mind and changes what decisions I make in the future, but this is difficult to measure. For a site like this, it’s only successful in “traffic” measures if people remember and return, and for now, it’s too early to see how that will play out.