This is fitting. The top topic on Xitter right now is of course the global CrowdStrike/Windows clusterfuck. But the AI summary of the discussion is hilarious, b/c it summarizes a bunch of sarcastic posts and makes it sound like a positive (or at least can-do) story.
The machines, now inaccessible, are arguably more secure than before.
@sailor_sega_saturn And given enough time and enough scale even the most improbably weird things will eventually happen. Update file corrupted by a storage controller that flips a couple of bits at random after every 720 hours of uptime but only if it’s 23.682 seconds after the hour? Weirder shit has happened.
I once helped one of my company’s customers troubleshoot an issue where the same ridiculous edge-case error had happened three times over the course of a few years. At one point the actual sustaining developer we worked with was able to narrow it down to a specific bit that was getting flipped somehow, and pitched cosmic radiation as a plausible explanation, given how rarely this kind of thing impacted other customers.
It was at this point that we remembered that the customer was either a university with a nuclear physics lab or a hospital with a nuclear medicine program (can’t remember which now, ironically enough), and that the server rack lived right next to it.
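Narrowing corruption down to a specific bit, as the developer above did, usually amounts to XOR-diffing the bad artifact against a known-good copy. A minimal sketch of that idea in Python (the filenames here are hypothetical):

# Locate flipped bits by XOR-diffing a corrupted file against a known-good copy.
def find_flipped_bits(good_path, bad_path):
    """Yield (byte_offset, bit_index) for every bit that differs."""
    with open(good_path, "rb") as g, open(bad_path, "rb") as b:
        good, bad = g.read(), b.read()
    for offset, (x, y) in enumerate(zip(good, bad)):
        diff = x ^ y                    # non-zero iff this byte differs
        for bit in range(8):
            if diff & (1 << bit):
                yield offset, bit

for offset, bit in find_flipped_bits("payload.good", "payload.bad"):
    print(f"byte {offset:#x}, bit {bit} flipped")

A run that reports exactly one (offset, bit) pair, repeatedly, is the kind of evidence that makes “cosmic radiation” (or a nuclear medicine program next door) start to sound plausible.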
some twenty-four years ago i managed, amongst others, a company’s samba and print server (that was back when all the company’s servers were beige boxes with less memory and disk than the laptop i’m using to type this – and still they served a few hundred employees).
the machine developed a strange habit of hard-resetting itself, which we initially tracked to specific files being sent for printing; the behaviour was fully reproducible.
as it happened, it was a hardware fault somewhere between the mainboard and the integrated SCSI card; installing a separate SCSI card and reconnecting the disks and backup tape device fixed the problem. (i did not have the budget for a new server, no.)
establishing the actual cause took me fucking weeks.