09.28.2022 - GovFTP Outage

11:15 AM EST 10.04.2022 - (RCA Update) The failed hardware is in transit to the vendor's lab for investigation. Once it is received, the vendor estimates 2-3 weeks to perform tests and review the results. The RCA should be available shortly after.

9:10 PM EST FTP Today update - All sites restored to full service.

 

8:30 PM EST Engineers are working to bring services online in a controlled manner to ensure data protection and consistency. The process will take anywhere from 40 minutes to 2 hours. Engineers will bring workloads online as quickly as can be done safely.

 

7:22 PM EST All recovery scripts have completed and the engineering teams are stepping through the startup processes manually on the storage system to ensure no errors are encountered. We estimate systems will begin coming back online within the next 15-20 minutes.

 

6:52 PM EST Databank update - The recovery scripts are currently running against the stable nodes to confirm data integrity and verify that all infrastructure errors have been addressed. The scripts need to run for approximately 20 more minutes before we can begin bringing the storage systems back online. Assuming no errors are found that must be addressed first, we estimate a 17:50 (Central) ETA to begin bringing the storage systems back online.

 

6:10 PM EST Databank update - The vendor engineering and development teams are finalizing adjustments to the recovery scripts to bring the system back up in a redundant but degraded state. One node is suspect and continues to display errors that could negatively affect recovery. The remaining stable nodes are projected to begin their startup processes in the next 20-25 minutes. Once the stable nodes are back online, we will work with you to verify the availability and stability of your environment while our engineering teams work to recover the remaining offline node.

 

5:19 PM EST Databank update - Databank engineering teams have identified and replaced the failed components. The teams are currently working through the logs, monitoring data, and errors that have been captured to ensure that the risk of data loss or corruption is minimized when the system is brought back online. This process may take additional time, but it limits the long-term data loss that could result from bringing the system online in an error state.

 

4:59 PM EST Databank update - Databank and vendor engineers are working to identify and resolve an issue with the storage system. Engineers were on-site to perform a routine replacement of a failed drive. Upon completion of the work, an unknown error condition occurred and caused the entire storage system to become unavailable.

 

4:23 PM EST The data center has acknowledged that it is currently experiencing an outage. I am waiting on an update detailing the issue and an ETA for restoration of services.

 

4:22 PM EST We have opened a case with Databank for a critical outage.

 

4:11 PM EST We are currently investigating connectivity to our Databank data center.