Salesforce was down.

Okay, not all of Salesforce — but enough of it and for long enough that everyone at least heard about the great NA14 failure from friends or newsfeeds, if they hadn’t experienced it directly in their work.

Do you know what you would do if your SaaS product went down? Do you have a protocol or process in place for when technical disaster hits? Your business may not be big enough to worry about press write-ups or internet memes about a product failure, and your failure may not garner its own hashtag, but the situation can be just as much of a crisis for you and your customers.

Such a scenario affects almost every team at the company, some more than others. And when all the various functional teams have to jump into action to address the same problem in different ways, it’s truly a test of organization, communication, and efficiency.

To help with this, here’s a quick game plan for each arm of the business, with some extra tips about how your business can handle failure aftermath and future prevention.

First things first

Ideally your dev team should notice any service disruptions before you hear from your customers. This requires you to have the right monitoring in place, but that’s not always the case, so…

When customers report a service issue, check your product on various devices, browsers, and connections before doing anything else. See if the issue is isolated, if it’s local rather than global. Then proceed accordingly.
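As a rough illustration of that first check, here is a minimal Python sketch. It assumes your product exposes some URL you can request, and that you collect one reachable/unreachable result per vantage point (office network, mobile connection, a colleague in another region, and so on). The function names `probe` and `classify` and the result labels are made up for this example, not part of any particular monitoring tool.

```python
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers successfully within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers network errors, HTTP error responses, and timeouts;
        # ValueError covers malformed URLs.
        return False


def classify(results: dict[str, bool]) -> str:
    """Summarize results gathered from several vantage points."""
    if not results:
        return "unknown"
    failures = [name for name, ok in results.items() if not ok]
    if not failures:
        return "all_ok"
    if len(failures) == len(results):
        return "global"
    return "isolated"
```

For example, `classify({"office wifi": False, "mobile": True})` returns `"isolated"`, which suggests looking at the local network before declaring a full outage. A script like this only sees one connection at a time, so it complements manual checks on real devices and browsers rather than replacing them.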

Tips around communication

Proactive. Proactive. Proactive.

Communicate early and often with both your customers and your teammates.

Take ownership.

Admit the failure. Don’t try to explain it away or pass off responsibility. Don’t layer your message with glossy jargon or marketing speak. That’s just more irritating to your customers, and this is a crucial moment to maintain their trust.

Honest and transparent.

This doesn’t mean giving every detail. It just means being clear about the situation (your product is not working) and sharing what you can about what you do and don’t know.

Stay consistent.

The best way to retain your customers’ trust throughout the resolution process is to provide a clear, consistent message. What you’re saying on Twitter should align with what Customer Success says on the phone. Otherwise you risk appearing disorganized (or incompetent) in the face of customers who are already questioning your product.

Don’t give false hopes.

To avoid setting false expectations and worsening the customer experience, it’s best not to give precise ETAs for fixes. Instead, stick to what you reasonably expect, such as “in the next few hours” or “by the end of the day.”

Game plans

CEO

Speak up.

A personal apology from the CEO can go a long way with customers. After all, the CEO is the face of the business. Salesforce CEO Marc Benioff responded to countless tweets about the NA14 failure, looping in his cofounder Parker Harris as well.

Put someone in charge.

Assign someone to lead all internal management of the issue. Communication and coordination run through this one person, keeping the resolution process streamlined and less chaotic.

Side-story: The “Incident Commander”

There are more parallels between a server outage and a fire-based emergency than you think. Image source: victorops.com

Hi, Ed here. Let me tell you about Incident Commanders.

An Incident Commander is traditionally a role assigned in an emergency response situation (e.g. fire, natural disaster) to coordinate all aspects of operations (see Wikipedia).

At a company I used to work at, we adapted the role for emergency response to issues with our software platform. The platform was mission-critical, and any downtime caused significant economic loss to the business and its clients.

As soon as a problem had been identified and raised as such, the CTO assigned an incident commander from a pool of volunteers. From this point, that person became responsible for:

Documenting and reporting the problem accurately
Acting as the central point of communication for all internal teams involved in the issue
Giving updates to the management team on progress towards resolving the issue
Recruiting any people required to work on the solution

Having this central role owning all communication meant that the people involved in fixing the problem could focus on doing just that – fixing the problem. It also sped up important decisions, particularly when teams who don’t traditionally communicate closely were required to collaborate.
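If you want a lightweight way to support this role, the responsibilities listed above map fairly naturally onto a small shared record. The Python sketch below is only an illustration under that assumption: the `Incident` class, its fields, and the report format are all hypothetical, and a shared document or chat channel can serve the same purpose.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    """A shared record kept by whoever coordinates the response."""

    summary: str
    commander: str
    responders: list[str] = field(default_factory=list)
    updates: list[tuple[datetime, str]] = field(default_factory=list)

    def recruit(self, person: str) -> None:
        """Add someone to the group working on the fix (no duplicates)."""
        if person not in self.responders:
            self.responders.append(person)

    def log(self, message: str) -> None:
        """Append a timestamped note so the problem is documented as it unfolds."""
        self.updates.append((datetime.now(timezone.utc), message))

    def management_update(self) -> str:
        """Build a short progress report from the most recent log entry."""
        latest = self.updates[-1][1] if self.updates else "No updates yet."
        return (
            f"[{self.summary}] lead: {self.commander}; "
            f"responders: {len(self.responders)}; latest: {latest}"
        )
```

Keeping the timeline in one place also gives you raw material for the follow-up review once the incident is over.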

Customer Success and Support

Be at the ready.

Stay available for incoming messages from customers, with a full understanding of the nature of the problem.

Constantly align with Dev.

There should be constant updates between CS and Dev, like a feedback loop. CS needs to let Dev know what customers are experiencing and observing, and Dev needs to provide CS with possible workarounds, updates, and any other information to relay to antsy customers.

Help customers through it.

Suggest other things the customer can do in the meantime. If there’s any part of the product that still offers value, remind the customer how they can use it.

Sales

If Sales reps rely on your live product during demos or scheduled calls with potential customers, they will need to postpone these.

The sales team should be kept informed, but otherwise doesn’t play much of a role in managing the situation.

Marketing

Take to social media.

The marketing team should alert followers about the situation and also respond to every incoming message.

It’s probably better to use an “operations” Twitter account to handle this, like the one we have at ChartMogul or the one Salesforce used throughout the NA14 outage. You don’t want to shout to the whole world that your product is down; keep that message contained to your existing customers. Push it to a more relevant channel, like a dedicated Twitter feed, and don’t interrupt your normal content.

A snapshot of the feed for @asksalesforce, Salesforce’s alternative Twitter account for support issues.

Help customers through it.

Just like customer support, relay any workarounds available. Suggest other things the customer can do in the meantime. If there’s any part of the product that still offers value, remind the customer how they can use it.

Pause paid advertising.

If the failure persists, consider pausing any paid campaigns that direct people to your product. For example, if there’s an issue with your sign-up page, you might want to hold off on any efforts that drive people to that page.

Product

Identify and prioritize a fix.

Determine the best course of action and coordinate with the engineering team to execute it. Do you want to run with a quick hot fix, or develop something longer term? Do you want to completely change a feature, or should you roll back a feature that you recently deployed?

Lead further investigation.

Assess the damage from the product failure. Was any data lost? If so, what and how much?

Engineering

The team’s mission here is pretty obvious: conduct triage to find the problem, work with Product to classify and prioritize it, and then execute the solution.

The Aftermath

Follow up with customers.

Do this both on social media and with anyone who contacted the business via phone or email. Salesforce did a thorough job of this once NA14 was back up and running.

Salesforce attempted to satisfy some unhappy customers by following up with extra-mile customer service.

They even followed up with customers who did not mention them directly but instead tweeted the hashtag #NA14.

Figure out if there needs to be compensation.

This depends on the scenario and the damage caused, but might particularly come up in conversation if your company is B2B. Business customers have to deal with their own aftermath from any service interruptions their own customers experienced.

Share what you’ve learned.

Gather everyone who was involved with the service disruption to share what they learned from the experience. Come out with a clear set of actions that need to be taken to prevent it from happening again.

Prevention and Preparedness

Here are a few things you can put in place now:

1. Status page

Create a place on your site where people can check the service status of your product. First and foremost, a status page serves as a resource for customers. Your business appears professional and (at least somewhat) in control. Then, because customers can continuously check the site for updates instead of calling you, there is some relief for your customer support team.

Salesforce continuously redirected customers to their status page, which is branded as “Salesforce Trust”.

It greets you with an overview of service and security.

And the actual system status page provides a timeline bar of service updates, as well as a more technical breakdown of which exact segments are experiencing trouble.
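To make the idea concrete, here is a minimal Python sketch of the data a status page like this might sit on: a current status per component plus a timeline of updates. Everything in it is an assumption for illustration (the `StatusPage` class, the three status labels, the component names, and the timestamps); it is not how Salesforce Trust or any hosted status-page service is built.

```python
import json
from dataclasses import dataclass, field

# Hypothetical status labels, ordered from best to worst.
SEVERITY = ["operational", "degraded", "outage"]


@dataclass
class StatusPage:
    components: dict[str, str] = field(default_factory=dict)
    timeline: list[dict[str, str]] = field(default_factory=list)

    def set_status(self, component: str, status: str, note: str, at: str) -> None:
        """Record a component's new status and add an entry to the timeline."""
        if status not in SEVERITY:
            raise ValueError(f"unknown status: {status}")
        self.components[component] = status
        self.timeline.append(
            {"at": at, "component": component, "status": status, "note": note}
        )

    def overall(self) -> str:
        """The overall status is the worst status of any component."""
        if not self.components:
            return "operational"
        return max(self.components.values(), key=SEVERITY.index)

    def to_json(self) -> str:
        """Serialize the page so it can be rendered or served elsewhere."""
        return json.dumps(
            {
                "overall": self.overall(),
                "components": self.components,
                "timeline": self.timeline,
            },
            indent=2,
        )
```

Calling `set_status("dashboard", "degraded", "Charts loading slowly", "2024-01-01T09:05Z")` updates the component breakdown and the timeline in one step, which keeps the two views from drifting apart. Whatever you use, host the status page separately from the product itself so it stays reachable during an outage.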

2. Operations Twitter account

As mentioned above, an operations- or support-centric Twitter account can help you communicate during product emergencies. Let your customers know about this channel by advertising the handle in the following places:

In the bio of your official Twitter account
On your Support page
As a footnote on Customer Support tickets or emails
On your status page

3. A crisis manager, or your version of an Incident Commander

That’s the “disaster preparedness” plan we have to offer, inspired by the events at Salesforce. Have you been in this situation? Any tips or advice that we didn’t think of? Let us know in the comments!

NEW on @ChartMogul: What to do when your product is down — https://t.co/dodSlBk6Vx #NA14 #SaaS #SOS pic.twitter.com/a2Se1kYLUh

— ChartMogul (@ChartMogul) May 12, 2016
