Cloudflare blames massive internet outage on 'latent bug'

Tony Bark@pawb.social · 29 days ago

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · edit-2 29 days ago

Eager Eagle@lemmy.world · edit-2 29 days ago

thanks for illustrating the corpo speak

I hope the bug is fine

moseschrute@lemmy.world · edit-2 29 days ago

Nobody ever asks if the bug is ok

aeronmelon@lemmy.world · 29 days ago

Fun fact time:

That’s why they’re called computer bugs.

In 1947, the Harvard Mark II computer was malfunctioning. Engineers eventually found a dead moth wedged between two relay points, causing a short. Removing it fixed the problem. They saved the moth and it’s on display at a museum to this day.

The moth was not okay.

And to be fair, the word bug had been used to describe little problems and glitches before that incident, but this was the first case of a computer bug.

FauxLiving@lemmy.world · 29 days ago

If you want a technical breakdown that isn’t “lol AI bad”:

https://blog.cloudflare.com/18-november-2025-outage/

Basically, a permission change caused an automated query to return more data than was planned for. The query produced a configuration file with a large number of duplicate entries, which was pushed to production. The file's size exceeded the preallocated memory limit of a downstream system, which died because the error state raised by the oversized configuration file was unhandled. This caused a thread panic, leading to the 5xx errors.
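For anyone who wants to see the shape of that failure, here is a minimal Rust sketch. It is not Cloudflare's code: `MAX_FEATURES`, `ConfigError` and `load_features` are hypothetical names, and the limit of 200 is an arbitrary stand-in for the preallocated size. It shows a loader that drops duplicate entries and reports an oversized file as an error value rather than panicking.

```rust
use std::collections::HashSet;

/// Hypothetical hard cap, standing in for the preallocated limit.
const MAX_FEATURES: usize = 200;

/// Hypothetical error type for a config file that does not fit.
#[derive(Debug, PartialEq)]
enum ConfigError {
    TooManyFeatures { found: usize, limit: usize },
}

/// Deduplicate the entries (keeping first-seen order), then enforce the cap.
/// An oversized file comes back as `Err`, so the caller decides what to do.
fn load_features(raw: &[&str]) -> Result<Vec<String>, ConfigError> {
    let mut seen = HashSet::new();
    let features: Vec<String> = raw
        .iter()
        .filter(|name| seen.insert(**name)) // `insert` returns false for duplicates
        .map(|name| (*name).to_string())
        .collect();

    if features.len() > MAX_FEATURES {
        return Err(ConfigError::TooManyFeatures {
            found: features.len(),
            limit: MAX_FEATURES,
        });
    }
    Ok(features)
}
```

Calling `.unwrap()` on that `Result` would turn the `Err` case back into a panic, which is the crash-on-bad-input behaviour described above. Matching on the result instead keeps the process alive.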

It seems that CrowdStrike isn’t alone in the ‘A bad config file nearly kills the Internet’ club.

phutatorius@lemmy.zip · 25 days ago

‘A bad config file nearly kills the Internet’ club

There’s no such thing as bad data, only shitty code to create it or ingest it, and bad testing that failed to detect the shitty code. The overflow of the magic config-file size threw an exception, and there was no handler for that? Jeez Louise.

And as for unhandled exceptions, you’d think static analysis would have detected that.

FauxLiving@lemmy.world · 25 days ago

Someone should make a programming language like Rust, but that doesn’t crash.

/s

AldinTheMage@ttrpg.network · 29 days ago

So the actual outage comes down to pre-allocating memory, but not actually having error handling to gracefully fail if that limit is or will be exceeded… Bad day for whoever shows up on the git blame for that function
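One way "gracefully fail" could look, as a rough Rust sketch under assumptions: `ConfigHolder`, `try_update` and `MAX_ENTRIES` are made-up names, not anything from the postmortem. The idea is to reject a config that would exceed the limit and keep serving with the last known good one.

```rust
/// Hypothetical limit on entries, standing in for the preallocated size.
const MAX_ENTRIES: usize = 200;

/// Holds whichever config is currently in use.
struct ConfigHolder {
    active: Vec<String>,
}

impl ConfigHolder {
    /// Try to swap in a new config. If it is too large, leave the active
    /// config untouched and return a message describing the rejection.
    fn try_update(&mut self, candidate: Vec<String>) -> Result<(), String> {
        if candidate.len() > MAX_ENTRIES {
            return Err(format!(
                "rejected config: {} entries exceeds limit of {}",
                candidate.len(),
                MAX_ENTRIES
            ));
        }
        self.active = candidate;
        Ok(())
    }
}
```

The caller would log or alert on the `Err` and carry on with `active` as it was, so a bad push degrades freshness instead of availability.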

hue2hri19@lemmy.sdf.org · 29 days ago

This is the wrong take. Git blame only shows who wrote the line. What about the people who reviewed the code?

sugar_in_your_tea@sh.itjust.works · 28 days ago

If you have reasonable practices, git blame will show you the original ticket, a link to the code review, and relevant information about the change.

floquant@lemmy.dbzer0.com · edit-2 29 days ago

Plus the guys who are hired to ensure that systems don’t fail even under inexperienced or malicious employees, management who designs and enforces the whole system, etc… “one guy fucked up and needs to be fired” is just a toxic mentality that doesn’t actually address the chain of conditions that led to the situation

PiraHxCx@lemmy.ml · 29 days ago

I wonder if all recent outages aren’t just crappy AI coding

MagicShel@lemmy.zip · 29 days ago

Shitty code has been around far longer than AI. I should know, I wrote plenty of it.

floofloof@lemmy.ca · 29 days ago

They trained it on the work of people like you.

foo@feddit.uk · 29 days ago

But, AI can do the work of 10 of you humans, so it can write 10 times the bugs and deploy them to production 10 times faster. Especially if pesky testers stay out the way instead of finding some of the bugs.

AbidanYre@lemmy.world · 29 days ago

Humans are plenty capable of writing crappy code without needing to blame AI.

KazuyaDarklight@lemmy.world · 29 days ago

Absolutely, but it does feel like things have spiked a bit recently.

Cloudflare blames massive internet outage on 'latent bug' | TechCrunch