Public Cloud Saves The Day – Not That You’d Know

September 2, 2012

Reddit weathered its largest ever load spike – with a little help from the cloud.

The hue and cry when public cloud services fail – which they do with some frequency – is almost deafening. What we rarely hear about is when the public cloud works the way we hope it will.

To be fair, when Amazon takes a hit, it is an important event that begs to be over-analyzed. After all, how could anyone ever expect to possibly survive a day without harvesting digital crops, streaming an episode of Friends, making a cellphone pic look like crap, or checking into the local dive bar?! Internets are serious business!!*

However, pundits rarely give credit where credit is due, particularly when it comes to public cloud uptime. So let me now give public cloud its due.

Last week, we saw a great example when the popular community/forum website Reddit.com used cloud computing to successfully weather (well, mostly) the biggest peak load it has ever experienced.

When the Internet found out that US President Barack Obama was planning to host an ‘AMA’ (“Ask Me Anything”) on Reddit’s ‘IAmA’ subreddit (short for “I am a …”, an online Q&A forum with specialists, celebrities, and more), the massive response was perhaps predictable (to a degree).

In the first five minutes, there were 37 comments. By the ten-minute mark, redditors had made 278 comments. Within half an hour, that number had jumped to 5,266, and it was over 10,000 by the end of the first hour.

There were, in total, almost 3 million page views for this thread on the day. At its peak, the thread accounted for an unprecedented 30% of all visitors to Reddit and was pushing 48 MB of data per second to the Internet (and only compressed text, too: no movie files here), five to ten times the normal traffic for “an extremely popular submission”. This was the most traffic the site has ever seen.
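To put the bandwidth figure in perspective, here is a quick back-of-the-envelope calculation using only the numbers quoted above. The “sustained for an hour” line is a hypothetical what-if, not a reported figure; the post only gives the peak rate.

```python
# Figures quoted in the post
peak_mb_per_s = 48                      # peak outbound transfer for the AMA thread
multiplier_low, multiplier_high = 5, 10 # "five to ten times" normal popular-thread traffic

# Implied range for a normal "extremely popular submission"
normal_max_mb_per_s = peak_mb_per_s / multiplier_low   # 9.6 MB/s
normal_min_mb_per_s = peak_mb_per_s / multiplier_high  # 4.8 MB/s

# The same peak expressed in megabits per second (8 bits per byte)
peak_mbit_per_s = peak_mb_per_s * 8                    # 384 Mbit/s

# Hypothetical: data moved if the peak rate were held for a full hour
mb_per_hour_at_peak = peak_mb_per_s * 3600             # 172,800 MB, roughly 173 GB

print(f"normal popular thread: {normal_min_mb_per_s}-{normal_max_mb_per_s} MB/s")
print(f"peak: {peak_mbit_per_s} Mbit/s, {mb_per_hour_at_peak} MB/hour if sustained")
```

In other words, a single comment thread was briefly pushing several hundred megabits per second of plain text.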

Fortunately, Reddit is hosted on AWS, including EC2, S3, and EBS, so despite a record day for page views, the site remained (more or less) available throughout, because Reddit’s admins responded to the load spike with the on-demand elasticity and scalability of cloud computing:

In preparation for the IAMA, we initially added 30 dedicated servers (20%~ increase) just for the comment thread. This turned out not to be enough, so we added another 30 dedicated servers to the mix.
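The admins’ numbers also let us estimate the size of the fleet. Assuming the “20%~” figure is the increase relative to the existing server count (the quote does not say so explicitly), 30 extra servers implies a baseline of roughly 150, so the second batch of 30 took the total to about 210. Treat these as rough estimates derived from an approximate percentage, not reported figures.

```python
# Figures from the Reddit admins' quote
first_batch = 30
second_batch = 30
approx_increase = 0.20  # "20%~" increase from the first batch (approximate)

# Assumption: the 20% is relative to the pre-AMA server count
baseline_estimate = round(first_batch / approx_increase)  # about 150 servers

total_added = first_batch + second_batch                  # 60 servers
total_estimate = baseline_estimate + total_added          # about 210 servers
overall_increase = total_added / baseline_estimate        # about 0.4, i.e. ~40%

print(f"baseline ~{baseline_estimate}, total ~{total_estimate}, "
      f"increase ~{overall_increase:.0%}")
```

So, on these assumptions, Reddit grew its server count by something like 40% for a single comment thread, which is exactly the kind of short-lived jump that on-demand capacity makes practical.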

The new servers were completely dedicated to serving the AMA. Reddit’s systems are architected for the cloud and can isolate some infrastructure to support specific service features. In addition, Reddit uses scripting to automate the provisioning process, “to automatically take a base image to a server running reddit [sic] code in a few minutes.” As a result of this cloudbursting capability (yeah, I said it – wanna fight about it!?), Reddit stayed up and was able to handle a major load spike, the biggest in its history. Pretty impressive, right? Not that you’d know from the lack of coverage from the usual suspects.
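For readers wondering what “take a base image to a server running reddit code” can look like in practice, here is a minimal sketch using boto3’s EC2 `run_instances` call. This is a hypothetical illustration, not Reddit’s actual tooling (their scripts are in their own repository): the AMI ID, instance type, `role` tag, and the `/opt/app/deploy.sh` bootstrap script are all placeholders. The tag stands in for the idea of isolating a pool of servers for one feature, such as the AMA thread.

```python
"""Hypothetical sketch of image-based provisioning on EC2.

Assumptions (not from Reddit's code): the AMI ID, instance type, the "role"
tag, and the /opt/app/deploy.sh bootstrap script are placeholders.
"""


def build_launch_request(ami_id: str, count: int,
                         instance_type: str = "m1.large",
                         role: str = "app") -> dict:
    """Return keyword arguments for boto3's EC2 run_instances call."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Script run at first boot; the path is a hypothetical deploy step that
    # turns the generic base image into a server running the application.
    user_data = "#!/bin/bash\n/opt/app/deploy.sh\n"
    return {
        "ImageId": ami_id,          # the pre-built base image
        "InstanceType": instance_type,
        "MinCount": count,          # Min == Max: launch all of them or none
        "MaxCount": count,
        "UserData": user_data,      # boto3 base64-encodes this automatically
        "TagSpecifications": [{
            "ResourceType": "instance",
            # Tag the instances so a dedicated pool can be identified later
            "Tags": [{"Key": "role", "Value": role}],
        }],
    }


def launch(ami_id: str, count: int, role: str = "app") -> list:
    """Launch `count` instances from the base image (needs AWS credentials)."""
    import boto3  # imported lazily so build_launch_request works without boto3
    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(**build_launch_request(ami_id, count, role=role))
    return [inst["InstanceId"] for inst in resp["Instances"]]
```

A call such as `launch("ami-xxxxxxxx", 30, role="ama-comments")` (placeholder AMI ID) would request a batch like the admins’ first 30 servers; registering them with a load balancer would be a separate step not shown here.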

That is not to say this was all fluffy unicorn rainbows. It was not. I logged in several times only to get a Reddit timeout page, so availability was still a little sketchy. According to Reddit, this was caused by the freakishly high bandwidth overwhelming the Reddit load balancers, compounded by an issue with how the registration service interacted with Reddit’s CDN provider, Akamai.

Notably, the President was not immune from this availability impact, despite having a dedicated server allocated to him. The admins eventually had to give him “access to an internal server that didn’t go through the load balancers” to make sure he could answer the questions streaming in, suggesting that Reddit still runs at least some of its own servers (just like Netflix does).

Yet, despite these small (and quickly resolved) issues, imagine this same scenario without elastic scalability of pooled resources, and you have a much worse outcome – almost certainly, the site would have been completely unavailable until the load spike subsided, and possibly longer.

Of course, this same scenario could have played out with a private cloud. There is nothing here that could not have been duplicated with an on-premises cloud model, assuming the additional 60 servers were available in (or could be freed up and moved to) the resource pool. However, in this case, it was Amazon’s public cloud service that saved the day. This was not one of those occasions when Amazon fell over for two days, taking dozens of websites and services with it and generating an outcry that could be heard on other planets, and in other galaxies. No, on this occasion AWS definitely proved its worth, and demonstrated the value of the public cloud to Reddit and its thousands of users (and more).

To quote Reddit admin rram:

Everything we have runs on Amazon Web Services. Being in the cloud certainly helped us with quickly scaling.

It is just a pity we won’t see a hundred frothy news pieces on that.

P.S. If you want more technical details, you can check out the Reddit code on GitHub, and the AMA that the Reddit admins themselves held a few months back. Because Reddit is sorta awesome that way 🙂


*Fair disclosure – I do not play FarmVille, stream Friends from Netflix, post pictures to Instagram, or check in with Foursquare. But I do Reddit, and I know it is serious business! 😉


5 Responses to Public Cloud Saves The Day – Not That You’d Know

  1. October 16, 2012 at 12:02

    I completely agree with your point — people are extremely quick to bash the public cloud. I think they forget the advantages of scalability (which you mentioned here) and reliability (as if private resources never crash?).

    The bias is not occasional — rather it occurs with great frequency (for example, remember the AWS outage back in late June and the accompanying outcry of “see, the public cloud is dangerous” which could have just as easily been “see, even if a natural disaster occurs the public cloud provides complete service restoration in under 24 hours with no clean-up or repair costs to users”).

    It is nice to hear the other side.

    • October 16, 2012 at 14:04

      Thanks Aaron, I appreciate your comment.

      I totally agree – there is so much biased FUD around public cloud, especially when there is a big event like the AWS outage. I think we simply need to be more pragmatic, choose the right provider for each service, understand the possibilities, and manage the risks. There is a role for public and private cloud – and even for “legacy” infrastructure too.

      In the end I believe IT will need a mix of ‘all of the above’.

  2. September 12, 2012 at 08:23

    I found this post very interesting mainly due to the fact it spoke of how public cloud was used to handle massive amounts of traffic in the form of people commenting on Reddit. Without public cloud it would certainly not have gone so smoothly. In fact cloud computing as a whole has made life much simpler for society in general. Cloud computing services can offer businesses various benefits, helping them to reach their potential and maintain their competitive advantage. It can also add a significant amount of value to a business which will of course benefit them greatly while benefiting public society in general.

    • September 12, 2012 at 17:11

      Thanks for the comment Vincent. I could not agree more – the ability for public cloud to improve society is immense, and goes way past keeping a web forum up or even promoting democratic discussion.

      In our CloudViews Unplugged video series my colleague George Watt and I investigate new & cool cloud stories, and have talked about how the cloud is helping to predict weather, explore distant planets, reduce greenhouse gases, discover new lifeforms, and yes, even cure cancer!

      What an amazing time to be in technology!

      Andi.