This is a technical article about Thread Reader. For more context, see the whole series here.

Hardware and Server Stack

I have my own server from kimsufi.com (Canada based). It is an Intel i5 at 2.8GHz with 16GB of RAM and a 2TB hard drive. The network link is 100 Mbps.

I use a very basic setup, with Apache2, PHP and MySQL under Ubuntu. memcached provides a caching layer, although most of the content is cached on the hard drive.

I use Sphinx search to power the “related thread” feature as well as the search feature (obviously).

So yes, it is PHP + basic CSS (Bootstrap) + basic JavaScript (jQuery).

Notable

  • I’m using the Silex PHP micro framework and some Symfony components
  • PHP sessions are in memcached
  • Apache is configured to cache everything very aggressively
  • All this is backed by Cloudflare caching

Server Metrics as of May 2018

  • Average load is close to 0.3 and can approach 1 at 800+ visits/minute
  • The database is < 2GB and the whole hard drive cache + code is about 25GB
  • RAM is half used on a normal day, no swapping

Monitoring

To monitor the server I mostly use cron jobs that check for specific signals:

  • I check the size of the job queue
  • I check that the bot has said something in the last hour

If something is wrong, I get an email. I also have a basic alert mechanism from Kimsufi for when the server is down.
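The two checks listed above can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP the service actually runs on, with SQLite standing in for MySQL. The table names (`jobs`, `bot_replies`), the column names, and the queue threshold are all assumptions made for the example. Only the one-hour silence window comes from the text.

```python
import sqlite3
import time

MAX_QUEUE_SIZE = 500      # hypothetical threshold, not from the article
MAX_BOT_SILENCE = 3600    # one hour, as described in the text


def create_demo_schema(conn):
    """Create the two hypothetical tables used by this sketch."""
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, done INTEGER)")
    conn.execute("CREATE TABLE bot_replies (sent_at REAL)")


def check_health(conn, now=None):
    """Return a list of problems; an empty list means everything looks fine."""
    now = time.time() if now is None else now
    problems = []

    # Signal 1: the job queue should not grow beyond a threshold.
    pending = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE done = 0"
    ).fetchone()[0]
    if pending > MAX_QUEUE_SIZE:
        problems.append(f"job queue has {pending} pending entries")

    # Signal 2: the bot should have replied to someone within the last hour.
    last_reply = conn.execute(
        "SELECT MAX(sent_at) FROM bot_replies"
    ).fetchone()[0]
    if last_reply is None or now - last_reply > MAX_BOT_SILENCE:
        problems.append("bot has not said anything in the last hour")

    return problems
```

A cron entry would call `check_health` and send an email only when the returned list is not empty.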

On my laptop, I keep a terminal open on my server with a tail -f on the logs (I use SSH + mosh).

Logs include many things: payment hooks, Apache errors, bot mentions and replies, job outputs and more.
These logs are also accessible on the web via an admin page I have access to.

In case of emergency

If something goes wrong with performance, I use:

  • top to check CPU and RAM usage
  • iotop to check disk usage
  • iftop to check network usage

Most of the time it is not a performance issue but rather a bug triggered by some edge case.

For example, the last incident involved a thread that was unrolled and posted on Twitter, got something like 300 RTs, and was then deleted by its author. Every visitor arrived on the thread page, nothing was found in the database, and so the bot tried to unroll a thread that was no longer on Twitter (times 1,000 visits). It made a mess. Fixed now.
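The article does not say how this was fixed. One common way to defuse this kind of stampede is negative caching: remember for a while that a thread could not be fetched, so repeated visits do not trigger repeated unroll attempts. The sketch below is only an illustration of that idea in Python. `UnrollGuard`, `fetch_thread`, and the TTL value are hypothetical names and numbers, not part of Thread Reader.

```python
import time

NEGATIVE_TTL = 3600  # hypothetical: how long to remember that a thread is gone


class UnrollGuard:
    """Wrap a fetch function so that known-missing threads are not refetched."""

    def __init__(self, fetch_thread, clock=time.time):
        self._fetch = fetch_thread   # callable: thread_id -> thread or None
        self._clock = clock
        self._missing = {}           # thread_id -> time of the failed fetch

    def get(self, thread_id):
        failed_at = self._missing.get(thread_id)
        if failed_at is not None and self._clock() - failed_at < NEGATIVE_TTL:
            return None              # recently seen as deleted: skip the fetch

        thread = self._fetch(thread_id)
        if thread is None:
            self._missing[thread_id] = self._clock()
        else:
            self._missing.pop(thread_id, None)
        return thread
```

With this guard in place, a thousand visits to a deleted thread cost one fetch attempt per TTL window instead of one per visit.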

Web Admin interface

I built a minimal admin interface to monitor the service (not the server).

  • I can see the latest unrolls and trending pages
  • I can check user details and upgrade a user to Premium
  • I can perform actions on threads, such as forcing a refresh

Scripts and Crons

I have plenty of scripts and crons that run on the server:

21 3 * * * /threadreader/cron/CheckUser.php > /logs/cron.log
45 4 * * * /threadreader/cron/CheckThread.php > /logs/cron.log
0 8 * * * /threadreader/cron/UsageReporting.php > /logs/cron.log
0 8 * * * /threadreader/cron/MoneyReporting.php > /logs/cron.log
0 8 2 * * /threadreader/cron/MoneyReporting.php --period=month > /logs/cron.log
17 */3 * * * /threadreader/cron/StaticTrending.php > /logs/cron.log
10 * * * * /threadreader/cron/Health.php > /logs/cron.log

I have two more crons: a database backup and software updates, both running every day.

Then I have all these scripts that loop all the time, sometimes with a short sleep between iterations. They are started in a screen session that is configured to start automatically if the server reboots (which has never happened):

  • Check for bot mentions
  • Update existing threads
  • Check for deleted threads
  • Check for deleted users
  • Run jobs
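The "loop all the time with a bit of sleep" pattern can be sketched as below. This is a generic Python illustration, not the author's PHP scripts. The function name `run_forever`, its parameters, and the convention that a task returns a truthy value when it did some work are assumptions made for the example.

```python
import time


def run_forever(task, idle_sleep=5.0, max_iterations=None, sleep=time.sleep):
    """Call task() in a loop, sleeping only when it reports no work done.

    max_iterations exists so the loop can be bounded in tests; a real
    long-running script would leave it as None.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        try:
            did_work = task()
        except Exception as exc:
            # Log and carry on: one bad item should not kill the loop.
            print(f"task failed: {exc!r}")
            did_work = False
        if not did_work:
            sleep(idle_sleep)
        iterations += 1
    return iterations
```

Sleeping only when idle keeps latency low while there is work to do and avoids hammering the database or the Twitter API when there is none.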

Queue and job mechanism

Thread Reader needs a queuing mechanism for two reasons: long-running jobs (like thread archiving) and actions scheduled relative to the current time (DM/email alerts).

This one is so low tech that I’m almost ashamed to talk about it, but…

It is a MySQL table where I enqueue a command name with its arguments. I add the time when I want it to be executed and a return code to check later.
Then a script pulls entries one by one and executes them.
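A table-backed queue like the one described can be sketched in a few lines. The version below is an illustration in Python with SQLite standing in for MySQL. The table name, column names, JSON encoding of arguments, and the `handlers` dictionary are all assumptions, since the article does not show the actual schema or code. It also assumes a single worker, as in the text, so no row locking is attempted.

```python
import json
import sqlite3
import time

# Hypothetical schema: command, arguments, scheduled time, and a return code
# that stays NULL until the job has been executed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_queue (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    command     TEXT NOT NULL,
    arguments   TEXT NOT NULL,
    run_at      REAL NOT NULL,
    return_code INTEGER
)
"""


def enqueue(conn, command, arguments, run_at=None):
    """Add a job; run_at defaults to now, or pass a future timestamp."""
    run_at = time.time() if run_at is None else run_at
    cur = conn.execute(
        "INSERT INTO job_queue (command, arguments, run_at) VALUES (?, ?, ?)",
        (command, json.dumps(arguments), run_at),
    )
    conn.commit()
    return cur.lastrowid


def run_next(conn, handlers, now=None):
    """Execute the oldest due job. Return False if nothing is due."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT id, command, arguments FROM job_queue "
        "WHERE return_code IS NULL AND run_at <= ? "
        "ORDER BY run_at, id LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return False

    job_id, command, arguments = row
    try:
        result = handlers[command](*json.loads(arguments))
        code = 0 if result is None else int(result)
    except Exception:
        code = 1  # unknown command or failing handler
    conn.execute(
        "UPDATE job_queue SET return_code = ? WHERE id = ?", (code, job_id)
    )
    conn.commit()
    return True
```

Scheduling an alert for later is then just `enqueue(conn, "send_dm", [user_id], run_at=time.time() + 1800)`, where `send_dm` is a hypothetical handler name. The stored return code makes it easy to find failed jobs afterwards.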

Backup and security

  • SSH via public key and no root
  • Every port closed (via ufw) except ssh and http/s
  • Apache and PHP are configured with paranoia in mind

Thread Reader is a special case when it comes to data, as it does not have much content, and none of it is original.
Still, a full dump of the database is saved every day (cron script), encrypted, and synced to dropbox.com (true).
No other data is saved, as the rest is only caching.

External Services and Tools

For Thread Reader I only use a few external services:

  • Mailgun to send emails: I chose them because the registration was the most straightforward and the pricing was competitive.
  • Stripe to allow people to pay by credit card: They are the best for developers for sure.
  • PayPal so people can pay without giving any bank info: It is widely used even if it is a pain for developers.
  • Google Analytics and Google AdSense for analytics and ads.

Support and Twitter ecosystem

I use services to keep in touch with my users on Twitter:
I used TweetDeck, but I recently switched to (the ugly) Hootsuite because it lets you see your DM inbox separately from the outbox. This is very important because I use the same account for support and for the bot (it sends thousands of DMs per day).

Most of the support is done by Twitter and email (a Gmail inbox set up with a custom domain).

Development Software and Services

I have a private GitHub repository for the code, and I use the issue tracker and the project view to keep my TODOs. I use GitHub for deployment too: master is kept clean and gets pulled on production every 5 minutes.

For big updates that need a database change, I jump on the server and make the necessary changes manually. This is very rare as the service is quite well structured now.

To connect to my database I use Sequel Pro via SSH. It has some custom options to play nicely with the database (like not requesting full blobs).

Conclusion

As you can see nothing fancy, it is good old technology.
Boring working stuff.

I made a bunch of shell scripts to interact with Twitter API, but in general I am not in favor of making my own tools.

What I like in this setup is the performance: I can handle 1,000 visitors/minute without worrying about anything, and I’m quite sure it can go up to 5,000 before showing any sign of slowness.

All this for US$30/month