How to fix a website crash: 6 steps to restore & safeguard your site
Is your website crashing? Here are the 6 key steps you can take right now to fix a website crash, limit the damage & prevent it from happening again.
*If your website is down right now because of overwhelming traffic, we can help. Implement Queue-it’s virtual waiting room in under 30 minutes to regain control of your site performance.*
---
Your website may be crashing, but know this: you're not alone. Even internet giants like Amazon, H&M, Target, Twitter, and Walmart have seen their websites crash.
Websites can crash for several reasons:
| Error | Example |
|---|---|
| 1. Code errors | A typo at Amazon took a backbone of the internet offline in 2017. |
| 2. Domain name system (DNS) provider failures | A DDoS attack aimed at DNS provider Dyn cut off dozens of top websites in 2016. |
| 3. Web hosting provider issues | Danish construction workers dug up a fiber cable to a server center on Black Friday 2018, causing over 2,000 websites to go dark. |
| 4. Malicious attacks | The BBC website crashed for several hours due to a DDoS attack in 2015. |
| 5. Expired domain name | The Dallas Cowboys forgot to renew their domain name, causing their website to crash on the same day they fired their head coach in 2010. |
| 6. Website traffic surges | After Meghan Markle wore one of their dresses, women’s fashion retailer Goat experienced a sudden surge in traffic that crashed their site in 2018. |
But when your site is crashing, it’s little consolation to know that other sites do too. Time is money, and getting control of the situation is critical.
How long does it take to fix a crashed website?
How long a website crash takes to fix depends on the severity of the crash and why it occurred. But 65% of organizations report it takes at least an hour to fix a crash, while 29% say it can take more than three hours.
No time to waste, then. Let’s get into the 6 steps you need to take to understand why your website is down and how to get it back online quickly and safely.
Before the alarm bells go off, ensure there’s a problem with your website in the first place.
There are tons of free services out there to do this in a few clicks. Host Tracker’s tool is especially helpful as it returns status codes from all around the world.
A reported issue could always be an isolated case of poor internet connection. Or if your site was down briefly, the cache in a visitor’s browser could continue showing an error page, even if your site is back up and running.
Always verify the problem exists before trying to solve it.
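If you'd rather run a quick check yourself before reaching for an online tool, a minimal Python sketch might look like the one below. The function names (`classify_status`, `fetch_status`, `check_site`) are made up for this illustration, and the check only tells you what your own machine sees, so it complements rather than replaces a multi-location tool. The `Cache-Control` request header asks caches along the way for a fresh copy, though not every cache will honor it.

```python
import urllib.error
import urllib.request


def classify_status(status_code):
    """Map an HTTP status code (or None) to a rough verdict."""
    if status_code is None:
        return "unreachable"   # no HTTP response at all (DNS failure, timeout, refused)
    if 200 <= status_code < 400:
        return "up"            # success or redirect
    if 400 <= status_code < 500:
        return "client error"  # often a bad URL or blocked request, not an outage
    if 500 <= status_code < 600:
        return "server error"  # the site answered, but it is failing
    return "unexpected"


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code      # 4xx/5xx responses still carry a status code
    except OSError:
        return None            # URLError and timeouts are subclasses of OSError


def check_site(url, fetch=fetch_status):
    """Fetch url once and classify the result."""
    return classify_status(fetch(url))
```

Calling `check_site("https://www.example.com")` (a placeholder URL) returns one of the verdict strings above. If it says "up" while a visitor reports an error, a stale browser cache or a local connection problem becomes the more likely explanation.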
Get a preliminary overview of the situation. Which of the typical causes of website crashes listed above are most likely in your case?
If there is a risk of a data breach, or loss or corruption of data, take measures to mitigate that first.
How you do this depends on the type of attack and the system(s) affected. Usually you’ll want to isolate the system(s) accessed by the bad actors to prevent their attack from spreading. This could involve stopping the database or disconnecting breached user accounts.
After you contain the attack, you’ll need to eradicate it. Again, this depends on the type of attack, but it could involve blocking certain IP addresses or deleting affected files and restoring them from a backup.
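To make the IP-blocking idea concrete, here is a small Python sketch using the standard `ipaddress` module. The blocklist and the `is_blocked` helper are hypothetical, and the addresses are reserved documentation ranges rather than real attackers. In practice, blocking usually happens in a firewall, CDN, or web application firewall rather than in application code, so treat this as an illustration of the logic only.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# used here only as stand-ins for addresses seen in an attack.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # a whole range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_blocked(client_ip, blocked_networks=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocked_networks)
```

A request handler could call `is_blocked(request_ip)` and reject the request early when it returns `True`.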
If your website has experienced a data breach, you’ll want to follow steps specific to breaches to perform extra cleanup and stakeholder management.
RELATED: The Cost of Downtime: IT Outages, Brownouts & Your Bottom Line
Now it’s time to enact your escalation plan and notify the responsible contact people.
If you’re an online retailer, for example, you’d want to notify your internal IT, digital, and marketing departments, and contact your hosting provider and any applicable consultants or digital agencies you work with.
Quick, clear communication can make a world of difference in mitigating your crashing website.
Even if your website is crashing, there are several concrete steps you can take to give visitors a better experience.
For example, you could redirect to a landing page that provides relevant information and keeps visitors feeling like they’re still in your ecosystem (cute dogs certainly help alleviate some stress, too).
Another option is to move visitors to a virtual waiting room. There, you can send visitors real-time updates while you fix the website issues.
Once your website comes back online, you can provide transparent wait time information as visitors return to your site in a controlled, first-in-first-out manner.
If traffic peaks were the root cause of your crash, this could actually be the solution to your worries instead of a temporary band-aid.
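To show what "controlled, first-in-first-out" means in practice, here is a toy Python sketch. It is not Queue-it's implementation. The `WaitingRoom` class, its method names, and the idea of a fixed number of admissions per "tick" are all assumptions made for illustration.

```python
from collections import deque


class WaitingRoom:
    """Toy first-in-first-out waiting room (illustrative only)."""

    def __init__(self, admit_per_tick):
        # How many visitors the site can safely take in per time interval.
        self.admit_per_tick = admit_per_tick
        self._queue = deque()

    def join(self, visitor_id):
        """Add a visitor to the back of the line and return their position."""
        self._queue.append(visitor_id)
        return len(self._queue)

    def estimated_wait_ticks(self, position):
        """Ticks until the visitor at this 1-based position is admitted."""
        return -(-position // self.admit_per_tick)  # ceiling division

    def admit_next(self):
        """Release up to admit_per_tick visitors, oldest first."""
        count = min(self.admit_per_tick, len(self._queue))
        return [self._queue.popleft() for _ in range(count)]
```

With `WaitingRoom(admit_per_tick=2)`, the third visitor to join can be told they'll get in on the second tick. Knowing that position is what makes transparent wait time information possible.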
“With Queue-it, we could communicate with customers and give them assurance. And when their turn came, they could enter the online store and continue the shopping smoothly ... We were struggling to find a solution to improve UX, and Queue-it was the match we were looking for.”
ATSUMI MURAKAMI, CHIEF OF INNOVATION
Studies on the service recovery paradox have shown that recovering well from a failure has the potential to generate higher satisfaction than never failing to begin with.
But to do so, effective initial, ongoing, and post-downtime communication is absolutely essential. Clearly setting expectations, showing empathy for customers, and demonstrating earnest effort to fix the problem all factor into how customers perceive you handled the situation.
Hopefully you have a template prepared for such situations. But if not, there are fantastic resources that outline guidelines for great status updates. Remember that your goals in the communication are to inform your customers and build their confidence in you.
How should you communicate with your visitors? Here are the two main channels:
Have a status page that everyone can access. It doesn’t help if you’re pushing out communications on a page that no one can see. That’s why it makes sense to host your status page on separate infrastructure.
Queue-it's status page shows the availability of our services and website.
For example, during an August 2020 outage of G Suite products like Gmail, Google used its status page to keep users around the world informed in a centralized, controlled way.
Leverage your social media accounts to spread your outage communication, linking to your status page when applicable.
If you’re able to serve customers by phone, by email, or in-store, remind them of those opportunities. If your resources allow, take time to field and respond to complaints and questions on social media channels and email.
There’s no point in sending visitors to your site when it’s down. Your marketing team will need to be aware of the outage so they can pause any marketing campaigns (this again highlights why internal escalation plans are so important).
It could be they have a huge email or social media promotion planned that would just leave customers frustrated. What’s more, pausing paid ad campaigns ensures you’re not paying for ads that have no chance for ROI.
RELATED: Everything You Need To Know About Website Crashes [+Examples]
Your team will need to diagnose and treat the root cause of the website crash. Is it a code conflict with a new plugin you added to your site? Has traffic overwhelmed your payment and inventory bottlenecks, causing a cascading failure? If you have monitoring or logging set up, you’ll already be a step ahead in identifying the issue.
There’s no way to outline here exactly what steps you need to take, as that depends on many variables including your type of company, the root cause of the problem, your infrastructure setup, and what internal resources you have available. But do remember to continuously update your customer base using the channels outlined earlier.
Once your team has identified and resolved the issue(s) and your site is back up and running (congrats!), you’ll need to share the good news.
RELATED: The 8 Reasons Why Websites Crash & 20+ Site Crash Examples
You’re ready to communicate that your website is back up and running. But first, check a few things, especially if your website crashed because of overwhelming traffic.
If you’re using a CDN (if not, you really should), its cache normally removes a lot of strain from your web servers when people visit your site. When your system fails, this cache can be cleared.
What happens then when the site returns online? It will crash again. Visitors hit the site while no content is cached, and everything has to load from databases and render at the same time. So, pre-load your cache before the system goes back online, if possible. Implementing a virtual waiting room is another way to ensure traffic remains under your website’s thresholds.
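The pre-loading idea can be sketched in a few lines of Python: request your most important pages once, before reopening the site to everyone, so the cache has a chance to fill. The `warm_cache` and `fetch_status` names are hypothetical. Whether one request actually populates your CDN's cache depends on its configuration and on which edge locations receive the request, so treat this as a starting point rather than a guarantee.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except OSError:
        return None


def warm_cache(urls, fetch=fetch_status):
    """Request each URL once so the cache in front of the site can fill.

    Returns the URLs that did not answer with a 2xx status, so you can
    investigate them before sending real visitors back.
    """
    failed = []
    for url in urls:
        status = fetch(url)
        if status is None or not 200 <= status < 300:
            failed.append(url)
    return failed
```

You would pass in your highest-traffic pages (home page, top category and product pages) and hold off on announcing the site is back until the returned list is empty.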
“Queue-it’s virtual waiting room reacts instantaneously to our peaks before they impact the site experience. It lets us avoid creating a bunch of machines just to handle a 3-minute traffic peak, which saves us time and money.”
THIBAUD SIMOND, INFRASTRUCTURE MANAGER
After you communicate your website is up and running, you should write a post-mortem statement explaining what went wrong and apologizing to your customers. This statement shouldn’t shift blame or beat around the bush. It should get straight to the point.
Atlassian recommends using the following outline:
- Acknowledge the problem, empathize with those affected and apologize
- Explain what went wrong and why
- Explain what was done to fix the incident and what was done to prevent repeat incidents
- Acknowledge, empathize, and apologize once again
Remember, even the biggest companies have outages. If you handle the situation well, you’ll be able to bolster and regain the trust of your customers.
The 6 steps to handling a website crash are analogous to handling a person in medical distress. If you encountered someone having a medical emergency, you'd:
- Check to see if the person is ok (check that your website has actually crashed)
- Look for the problem and ensure it doesn't get worse (check for safety issues)
- Call emergency services for help (implement escalation plans)
- Perform first-aid, such as CPR (limit the damage)
- Let medical staff take over and provide treatment (resolve the issue)
- Tell the patient's loved ones what happened and that everything is ok (communicate the fix)
Now what should the patient do once the medical emergency is behind them?
They should take steps to prevent it from happening again. If your site crashed due to high traffic, for example, you should implement a virtual waiting room to protect it from usage spikes and ensure it doesn’t crash again.
If you’re looking to better understand and prevent future website crashes, here are three valuable posts just for you:
- How High Online Traffic Can Crash Your Website
- How to Prevent Website Crashes in 10 Simple Steps
- 11 Essential Steps to Build in Website Performance
(This post has been updated since it was originally written in 2019.)