95% Uptime
As we progress further into the whole Web 2.0 concept, an increasing level of importance will be placed on uptime. I thought the best way to start was to define "uptime". Google offers this:
"The amount of time a web site is available. The industry benchmark at this point in time for availability is 99.99%." - Google Definition
The question that nags me a little: does 99.99% make a good benchmark? I don’t believe it does; it’s more of a stretch target. Is it feasible for applications running on servers, sometimes extremely far away from their users, to be up nearly 100% of the time?
I feel that running full-blown applications over the Web is still a relatively "new" concept. If the industry benchmark is currently 99.99%, how long until it reaches 100%? It can’t be too far off… and when it hits, some SLAs are going to cause CEOs to lose many nights of sleep.
There is an article on C|Net about how Salesforce.com suffered its longest outage ever. It lasted 7.5 hours, so almost an entire business day. Will this affect SLAs? Not if Salesforce.com stays up from now until the end of the month with only regularly scheduled outages. Therein lies the ability to achieve this seemingly impossible 99.9% dream: it’s based on aggregate numbers. There are 8,760 hours in a year. To go above and beyond a common, "industry standard" SLA, a Web-based service would have to be up for the entire 8,760 hours. However, I’d guess that most will schedule roughly 2-3 hours per month (in non-peak times) for maintenance. At 2 hours a month, that’s 24 hours a year, which leaves 8,736 scheduled hours. If we knock that back to 99.9% of the time, we’re left with 8,727.264 hours.
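For anyone who wants to check my math, here is a rough Python sketch of the numbers above. It assumes 2 hours of maintenance a month (the low end of my guess), and the function names are just ones I made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def scheduled_hours(maintenance_hours_per_month):
    """Hours the service is supposed to be up, after planned maintenance."""
    return HOURS_PER_YEAR - 12 * maintenance_hours_per_month


def required_uptime_hours(scheduled, sla_percent):
    """Hours of actual uptime needed to meet the SLA percentage."""
    return scheduled * sla_percent / 100


# Assumption: 2 hours of maintenance per month.
scheduled = scheduled_hours(2)
required = required_uptime_hours(scheduled, 99.9)

print(scheduled)           # 8736
print(round(required, 3))  # 8727.264
```

Bump the maintenance figure to 3 hours a month and the scheduled total drops to 8,724, so the exact threshold depends on how much planned downtime you assume.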
How does this work for Salesforce.com? Well, they proclaim on their site they live up to the 99.9% figure:
"In the year 2000, salesforce.com achieved a practically unheard-of 99.9% scheduled uptime level." - Salesforce.com Quote
Let’s work with that 7.5 hour outage they had today. If we remove it from the 8,736 scheduled hours, we’re left with 8,728.5, which is still above the 8,727.264 needed to hit 99.9%. That works out to 99.914% uptime, which is still pretty damn awesome in my book.
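Here is the same check as a small Python sketch. It assumes the outage is measured against the 8,736 scheduled hours and that this 7.5 hour outage is the only unplanned downtime all year; the function name is hypothetical.

```python
def uptime_percent(scheduled_hours, outage_hours):
    """Uptime as a percentage of the scheduled (non-maintenance) hours."""
    return 100 * (scheduled_hours - outage_hours) / scheduled_hours


# Assumption: a single 7.5 hour outage over 8,736 scheduled hours.
pct = uptime_percent(8736, 7.5)

print(f"{pct:.3f}%")  # 99.914%
print(pct >= 99.9)    # True: the 99.9% figure still holds
```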
If you are still with me, you are a saint in my book. Thank you. Do I have a point? I hope so. It is simply that I don’t want exciting Web-based services to get hit in the press when they go down.
Press like that can be argued both ways: it can spin a buying decision positively or negatively. Look at Salesforce.com. They have an awesome reputation in the industry and are used by a ton of people every day. Did it piss folks off that they couldn’t get access to their leads and other data for over 7 hours? Of course. Does it mean that a business is going to make a huge investment and switch over to another service? I don’t think so. At the end of the day, it will only help a company like Salesforce.com, as it generates added buzz.
What if Basecamp started getting stories written about downtime it was experiencing because of a good thing (increased business / usage / etc…), and lost hard-earned revenue because of it? This is why I think having 99.99% as the industry target is somewhat of a lark. It should be more like 95%, since that’s when the actual hours start to show impact. It would take roughly 437 hours of outages in a year to knock Salesforce.com down to that level. Since 99.99% will always just "be there", floating in the consumer’s collective consciousness, it’s easy to set 95% as the more realistic target. If a company hovers between 97 and 98%, that’s awesome. If they are running down around 80%, you know there may be some issues to look into further.
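To put actual hours against those percentages, here is a quick Python sketch of how much downtime each target allows in a year. It assumes the same 8,736 scheduled hours I used earlier, and the function name is made up for illustration.

```python
SCHEDULED_HOURS = 8736  # assumption: 8,760 hours minus 24 hours of maintenance


def downtime_budget(target_percent, scheduled=SCHEDULED_HOURS):
    """Hours of unplanned downtime allowed per year at a given uptime target."""
    return scheduled * (100 - target_percent) / 100


for target in (99.99, 99.9, 98, 97, 95, 80):
    print(f"{target}% uptime allows {downtime_budget(target):.1f} hours down")
```

At 99.99% the allowance is under an hour for the whole year, while 95% leaves about 436.8 hours, which is why the two targets feel so different in practice.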
At the end of the day, there are much more important things a business can be judged on than its uptime. Look, hardware is awesome nowadays, the coding is getting better by the line, and the means by which it can be delivered and charged for are increasingly complex and creative. The focus needs to be on the right fit, and less on "99.99%". That’s just my opinion - but I don’t really know much.