“Only entropy comes easy,” said Anton Chekhov, who probably wasn’t thinking about this week’s widespread issues triggered by outages in a popular cloud services platform, but was still bang on the money.
Outages and service interruptions are inevitable. No system has perpetually perfect uptime. For customer service teams, outages are both a tough day at the office and an opportunity to stand out.
Several studies have shown that recovering well from a failure in service can lead to a higher customer satisfaction level than never having a failure at all — the “Service Recovery Paradox.”
Emily and I chat about how customer support teams can prepare for and respond to inevitable service interruptions, like those caused by the Feb. 28, 2017, AWS outage.
Building a full incident handling plan is a big project, but you can quickly make a significant impact on the customer experience by focusing on communicating with customers.
The elements of great outage communication
During the stress of a major service outage, it’s easy to forget that your customers are often in an even more difficult situation. They are impacted by the outage, but they also have far less information about what is happening. In many cases, they also have their own customers who are asking them for answers.
By being an accurate, clear and timely source of information, you can reduce their stress significantly. Communication during an outage should:
Inform the customer: Let them know what is happening and what that means for them.
Build their confidence: Let them know the situation is being taken seriously and actively worked on, so they can safely do other work in the meantime.
Make your communication accessible
Great communication starts by making sure your message can be received. Your artisanal, exquisitely handcrafted status message means nothing to people who never see it, so wherever you store your status updates, make sure your customers know where to look.
Link to your status page prominently in key locations like your Contact Us page, your support/operations Twitter account, and your help documentation.
During an incident, push out messages on your primary support channels, acknowledging the issue and linking people to the status page as the source of updates.
Keep your status page on separate infrastructure to minimize the risk of an incident taking down your service and status page at the same time.
12 guidelines for writing great status updates
In my own career, writing status updates during major incidents has given me some of my most nerve-wracking moments. You’re working under pressure, often with limited information, writing to an audience of justifiably upset people.
Do your future self a favor and plan for surprises — think ahead about the most common types of outages, and come up with some sample updates as a base to work from.
Write versions that will fit into an email, a status update and even a tweet, and put them into your outage action plan, preferably in one of those glass boxes with a tiny hammer to break them out when an emergency strikes.
- 1. Acknowledge the issue
When you know a significant number of your customers are impacted, get an initial message out. Nothing shakes customer confidence like a status page that is showing “all good!” when major problems are occurring.
- 2. Empathize
Show some genuine understanding for your customers, who have been at best delayed and perhaps much more heavily affected. Avoid clichés like “we apologize for any inconvenience” and go for something more specific and honest.
- 3. Be clear on the scope of the outage
It’s not always possible, but the more clearly you can define who is being affected, and in what ways, the easier you make it for your customer to understand if what they are seeing is the same issue you’re reporting on. If it’s a particular area of an application, or a geographic location, share that information.
- 4. Focus on customer impact
Describe issues in terms of how the customer is affected, rather than the internal cause. So “customers are unable to pay for goods” is better than “our payment gateway is down.”
- 5. Give alternatives where possible
If there are workarounds or backup options available that will work in the meantime, make those known.
- 6. Don’t lay blame; take responsibility
You’re still responsible for your customers’ experience, even if the fault lies with a third-party system you use (and sometimes you can even solve problems outside your domain).
- 7. But do give important context
Mentioning a third party can be useful information if it gives your customer a better picture of what’s happening and how that will affect them. “We’re in contact with our payment gateway, and once we know more from them we’ll update you here.”
- 8. Write to your audience’s technical level
Provide as much detail as will be helpful, but no more. Too much technical detail can be confusing and unhelpful if much of your audience won’t understand it.
- 9. Use consistent voice and tone
Communicating in a single voice makes the messaging clearer and builds up your customer’s confidence in you.
- 10. Don’t over-promise
It can be so tempting to say “we should be up in 5 minutes!” ... but outages can develop so quickly that it’s better to save specifics until your technical team has triple-confirmed them.
- 11. You can add personality, carefully
You don’t have to turn into a corporate robot when things are going wrong. As long as you’ve got honest, clear communication covered, a little empathetic gif sharing can help you connect with your customers.
- 12. Follow up regularly
Even if you don’t have new information to share, consistently updating your messages helps those affected know that you’re still working on it and that they haven’t been forgotten. Pick a cadence and stick to it, and don’t forget to sound the all-clear once the situation is resolved.
Example: Outage update language
To get started, here’s an example of how you might pre-write some responses. Use this as a template for your own prepared responses.
Scenario: Generic error messages throughout the app; some people are able to use the app, others are not. Widespread issues affect people in varied ways.
A good status update title:
“Some customers seeing error messages and unable to use your product.”
This describes the issue in the way your customer will experience it, and in the language they’d use if they contacted you.
A bad status update title:
“Database errors when connecting”
“Servers not available.”
These are too specific and require too much knowledge to be useful for most audiences.
Example status detail:
“Some customers are seeing intermittent error messages throughout their account. We’re aware of the issue and are working on it urgently. Incoming messages are being safely received and stored, but won’t show in your account until the problem is resolved. We recommend not sending any outgoing messages at the moment.
We’re really sorry to be holding you up today! Please know our engineering and operations teams are working hard to get everything up and running and we will update you right here in 15 minutes with the latest information.”
This covers what the customer is seeing, tells them what is affected, lets them know if they need to change their usage, and tells them when to expect the next update.
A few of our favorite outage tweets:
Writing empathetic, informative communications is tough. Writing them for public view in fewer than 140 characters is tougher. Hats off to the teams behind these tweets!
“We will be up and running once the Internet is feeling better. Please sit tight! 👍” (GIPHY, @GIPHY, February 28, 2017, pic.twitter.com/PyVa90UxVz)
Outages are stressful for customers and for the teams supporting them, but having a plan and some thought-out sample language can make things easier. Customers will always appreciate companies that communicate clearly during outages, even when the problem itself isn’t easily fixed.
Keep the conversation going: If you’ve written some helpful explanations, or you’ve seen a great example of well-written incident language, please share it with us in the comments below!