1. Manage the client's expectations.
- Make sure the client knows from the start that critical incidents are a normal part of operating software.
- Disclose your backup plan and recovery process.
2. Assess the severity of the situation:
- What happened, who was affected and what's the impact of the issue?
3. Declare an incident:
- Does the issue have an impact and complexity that require a team effort?
4. Assign clear responsibilities for the team:
- Who will communicate with the client?
- Who will fix the issue?
- Who will work on the restoration?
5. Have a transparency policy:
- Notify the client and take responsibility as a team.
6. Define a recovery and data restoration plan:
- Identify the bug causing the incident and issue a hotfix.
- Identify the latest backup with valid data.
- Define the time-frame not covered by the backup.
- Retrace the state of the system during the time-frame not covered by the backup.
- Write data restoration scripts.
- Specify all commands and steps required for the restoration.
7. Execute the plan while providing rapid status updates:
- Test the restoration locally.
- Back up the data.
- Restore the lost data.
- Identify the data that could not be recovered.
- Disclose the lost data to the client.
8. Write an incident postmortem with the team:
- What happened?
- Why did the incident occur?
- What was the resolution, and how effective was it?
- What would the team do differently?
- What problems did the team encounter?
- What actions will be taken to make sure the incident doesn’t happen again?
9. Update existing practices:
- Take time after the incident to read the postmortem and update whatever practices need to change.
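
Parts of step 6 lend themselves to a small script. The following is a minimal sketch in Python, not a definitive implementation, of how a team might pick the latest backup with valid data and derive the time-frame that the backup does not cover. The `Backup` record, its `is_valid` flag, and both function names are hypothetical and introduced only for illustration. It also assumes that the uncovered time-frame runs from the moment the backup was taken until the hotfix is deployed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup record; real backup tooling will differ."""
    taken_at: datetime
    is_valid: bool  # assumed to be set after the backup's data was checked


def latest_valid_backup(
    backups: Iterable[Backup], incident_start: datetime
) -> Optional[Backup]:
    """Return the most recent valid backup taken before the incident began."""
    candidates = [
        b for b in backups if b.is_valid and b.taken_at < incident_start
    ]
    return max(candidates, key=lambda b: b.taken_at, default=None)


def uncovered_window(
    backups: Iterable[Backup],
    incident_start: datetime,
    hotfix_deployed: datetime,
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) of the time-frame not covered by the backup.

    Assumption: everything written between the chosen backup and the
    hotfix deployment has to be retraced by other means (logs, events).
    Returns None when no usable backup exists.
    """
    backup = latest_valid_backup(backups, incident_start)
    if backup is None:
        return None
    return backup.taken_at, hotfix_deployed
```

The resulting window is the period whose system state has to be retraced from other sources, such as application logs, before the restoration scripts are written. A result of `None` means no usable backup predates the incident, which should be disclosed to the client early.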