What happened
Between March 17th, 5:48 AM UTC and March 18th, 6:56 PM UTC (roughly 37 hours), the Asset Manager experienced an indexing outage. During this period, newly uploaded assets and a very small percentage of pre-existing assets did not appear in the Asset Manager UI. These assets were still uploaded successfully to the Origin and remained accessible via the Rendering API.
How it happened
Unannounced maintenance by our upstream provider caused a node failure in our Asset Manager infrastructure, which resulted in the temporary loss of indexed asset data.
What went wrong
Several things went wrong during this incident:
- Our service provider failed to notify us of the maintenance window
- Our Asset Manager infrastructure was not adequately provisioned to tolerate a node failure during the maintenance
- A large amount of data had to be restored, which severely prolonged the restoration process
What we are doing to address this incident
- We have migrated and optimized infrastructure configurations to better tolerate node failures
- We have created additional backups and introduced additional indexing layers to improve redundancy and resilience
- We are evaluating alternative upstream providers to reduce dependency risks