[{"data":1,"prerenderedAt":42},["Reactive",2],{"featurePages":3,"blogPosts-incident-report-february-4":4},[],{"slug":5,"id":6,"uuid":7,"title":8,"html":9,"comment_id":6,"feature_image":10,"featured":11,"visibility":12,"created_at":13,"updated_at":14,"published_at":15,"custom_excerpt":16,"codeinjection_head":16,"codeinjection_foot":16,"custom_template":16,"canonical_url":16,"authors":17,"tags":26,"primary_author":36,"primary_tag":37,"url":38,"excerpt":39,"reading_time":40,"access":41,"comments":11,"og_image":16,"og_title":16,"og_description":16,"twitter_image":16,"twitter_title":16,"twitter_description":16,"meta_title":16,"meta_description":16,"email_subject":16,"frontmatter":16,"feature_image_alt":16,"feature_image_caption":16},"incident-report-february-4","698dc19b4a1c2004e083a938","d73b3273-9661-4d3a-8690-8926f878ddcf","Incident Report - February 4, 2026","\u003Cp>Chatwoot Cloud experienced an incident on February 4 which lasted approximately 12 minutes, from 12:48pm to 01:00pm UTC. During this time, all Chatwoot Cloud users were unable to access the platform. No data was lost during this incident.\u003C/p>\u003Cp>Our sincerest apologies for the disruption. Reliability is the top priority for us at Chatwoot. 
We have identified the risks and have taken steps to mitigate such events in the future.\u003C/p>\u003Cp>\u003Cstrong>Timeline\u003C/strong>\u003C/p>\u003Cp>February 4, 2026\u003C/p>\u003Cul>\u003Cli>12:43 PM: Database instability began, connections started failing\u003C/li>\u003Cli>12:48 PM: Service disruption began, team started investigating\u003C/li>\u003Cli>12:58 PM: Root cause identified as storage exhaustion; storage capacity increase initiated\u003C/li>\u003Cli>1:00 PM: Storage scaling completed, service fully restored\u003C/li>\u003C/ul>\u003Cp>\u003Cem>All times are in Coordinated Universal Time (UTC)\u003C/em>\u003C/p>\u003Cp>\u003Cstrong>What happened\u003C/strong>\u003C/p>\u003Cp>We were preparing a PostgreSQL version upgrade using AWS RDS blue-green deployments. The deployment failed, but it remained in a pending state and was not cleaned up.\u003C/p>\u003Cp>RDS blue-green deployments rely on logical replication. When the failed deployment was left behind, it retained a replication slot on the primary database. That replication slot prevented PostgreSQL from recycling write-ahead log (WAL) files.\u003C/p>\u003Cp>As a result, WAL files continued accumulating over three days. We had roughly 1 TB of actual data, but an additional ~1 TB of WAL built up, pushing us to our 2 TB storage autoscaling limit.\u003C/p>\u003Cp>Once storage was fully exhausted, the database stopped accepting connections, which caused the service disruption. 
After identifying the root cause, we immediately increased the storage capacity and restored service.\u003C/p>\u003Cp>\u003Cstrong>Follow-Up Actions and Preventive Measures\u003C/strong>\u003C/p>\u003Cp>To prevent similar incidents, we are implementing the following changes:\u003C/p>\u003Cul>\u003Cli>\u003Cu>Proactive storage monitoring\u003C/u>: We are adding alerts at multiple storage utilization thresholds (60%, 75%, 90%) to catch capacity issues before they become critical.\u003C/li>\u003Cli>\u003Cu>Replication slot monitoring\u003C/u>: We are implementing monitoring for database replication slots to detect orphaned slots that could cause WAL accumulation.\u003C/li>\u003Cli>\u003Cu>Database maintenance runbooks\u003C/u>: We are creating detailed runbooks for database upgrade procedures with mandatory cleanup steps when deployments fail.\u003C/li>\u003Cli>\u003Cu>Infrastructure capacity review\u003C/u>: We are reviewing storage limits and autoscaling configurations across all production systems to ensure adequate headroom.\u003C/li>\u003C/ul>","https://www-internal-blog.chatwoot.com/content/images/2026/02/Introduction---Figma-Thumbnail.png",false,"public","2026-02-12T12:03:39.000+00:00","2026-02-23T20:08:11.000+00:00","2026-02-10T12:08:00.000+00:00",null,[18],{"id":19,"name":20,"slug":21,"profile_image":22,"cover_image":16,"bio":16,"website":23,"location":16,"facebook":16,"twitter":24,"meta_title":16,"meta_description":16,"url":25},"611a190f4b8f26503f72d6a7","Vishnu 
Narayanan","vishnu","//www.gravatar.com/avatar/87734580b02f2def07ea9e48cc089466?s=250&d=mm&r=x","https://vishnunarayanan.com","@v_shnu","https://www-internal-blog.chatwoot.com/author/vishnu/",[27],{"id":28,"name":29,"slug":30,"description":31,"feature_image":16,"visibility":12,"og_image":16,"og_title":16,"og_description":16,"twitter_image":16,"twitter_title":16,"twitter_description":16,"meta_title":32,"meta_description":33,"codeinjection_head":16,"codeinjection_foot":16,"canonical_url":34,"accent_color":16,"url":35},"6308cfbf730b190d1c2d7f42","Engineering","engineering","We're always experimenting with new features and complex solutions, as well as improving our existing offerings. Read about how we do it, directly from Chatwoot engineers.","Engineering Blog | Learnings, tips, stories from Chatwoot Engineers","We're always experimenting with new features, & complex solutions, & improving our existing offerings. Read about how we do it – written by our engineers.","https://www.chatwoot.com/tags/engineering","https://www-internal-blog.chatwoot.com/tag/engineering/",{"id":19,"name":20,"slug":21,"profile_image":22,"cover_image":16,"bio":16,"website":23,"location":16,"facebook":16,"twitter":24,"meta_title":16,"meta_description":16,"url":25},{"id":28,"name":29,"slug":30,"description":31,"feature_image":16,"visibility":12,"og_image":16,"og_title":16,"og_description":16,"twitter_image":16,"twitter_title":16,"twitter_description":16,"meta_title":32,"meta_description":33,"codeinjection_head":16,"codeinjection_foot":16,"canonical_url":34,"accent_color":16,"url":35},"https://www-internal-blog.chatwoot.com/incident-report-february-4/","Chatwoot Cloud experienced an incident on February 4 which lasted approximately 12 minutes, from 12:48pm to 01:00pm UTC. During this time, all Chatwoot Cloud users were unable to access the platform. No data was lost during this incident.\n\nOur sincerest apologies for the disruption. Reliability is the top priority for us at Chatwoot. 
We have identified the risks and have taken steps to mitigate such events in the future.\n\nTimeline\n\nFebruary 4, 2026\n\n * 12:43 PM: Database instability began, connectio",1,true,1775212115823]