After two major outages in as many weeks — including the CrowdStrike crash — alarm bells are ringing about the world's overreliance on Microsoft. Andrew Chan...
No. If everyone were on Linux and a third party introduced a breaking change, there would be similar problems.
The problem is that critical infrastructure isn’t treated like critical infrastructure. If something you rely on can go down due to a single point of failure, maybe don’t fucking use it?! Have backups, have systems that can replace those systems, have contingency! Slapping Windows onto a small machine and running some shitty Chromium app to work as a cash register is a fucking stupid idea when you consider that it is responsible for your whole income.
The problem was never Windows. It was companies that were too cheap to have contingency, because an event like this was considered extraordinary and not worth investing in.
Nope, that’s not how it works on Linux. Even if someone introduced the most heinous breaking change, people would just not update until things were fixed. In fact, an update is unlikely to do that in the first place, because things are tested before being pushed. If someone were running the latest of everything, say a Gentoo system with everything built from git, maybe that person would be affected, and they would have to roll back to an earlier version and keep going, for a total downtime of 1h tops. And that is if someone was using the most stupid setup possible in production.
The main reason why this will NEVER happen to a server running Linux is that updates are not automatic, i.e. they get triggered manually. If there’s an issue upstream you don’t update, and if you encounter one you roll back. The issue is not that Windows had a broken update; that can happen and it’s fine. The issue is when the OS forcefully installs that update and breaks your system without you doing anything.
And yeah, I know what I’m talking about, I worked as a software architect for a large website for a few years and now I work as a software engineer for the servers of one of the largest online games.
Edit: re-reading your post, I would like to ask how you would build this critical infrastructure with Windows, because however you answer, you would have been affected by this.
They should be, but I remember reading a lot of people saying that even in enterprise environments Microsoft reserved the right to push security updates that bypassed those rules.
Windows in many workplaces has updates locked down too, except in circumstances where critical security or vulnerability patches are pushed through.
The same is true for many servers that run Linux.
As someone that works on tier1 services for arguably the biggest tech company right now, that’s how it works in most of FAANG. Updates are gated, sure, but like with many things there’s a vetting process where some things that look super important and safe just slip through.
In regards to your edit, I guess every case is different, but if your entire business requires you to be able to use a machine 100% of the time, then you should have the means to use a different machine to continue transactions (ideally one with a known state that won’t change, or that has been tested in the last few months). If you need to log transactions and process them 24-48 hours later, do that on something that’s locked down hard, with printed/hard backups if necessary.
Ultimately, risk is always something you factor in. If you don’t care about 48 hours of downtime over several years, it’s not a huge concern. I’d probably argue that many companies lost more money during these days than they would have spent, in both money and people-hours, training staff on a contingency system to use in case of downtime.
Who determines which security updates are critical? In Windows’ case it’s ultimately Microsoft: if they say an update is critical, it will get installed on your machines whether you like it or not.
The update process in Linux needs to be triggered manually, so it’s a big difference. No one external to your company can say “that computer will get this new software NOW”, and that’s the point you’re missing.
In answer to the other edit: if all of those machines run Windows, they were all affected by the update, so having secondary or tertiary machines is pointless, because all of them failed at the same time when an external source decided to install new software on all your computers.
“the issue is when the OS forcefully installs that update and breaks your system without you doing anything.”
The CrowdStrike update was pushed out by their own software, I thought, not the Windows Update system?
Plus, CrowdStrike has caused similar issues with Linux systems before, so the solution is to just not use CrowdStrike and similar solutions on any OS.
“The issue is not that Windows had a broken update, that can happen and it’s fine, the issue is when the OS forcefully installs that update and breaks your system without you doing anything.”
I would have thought most businesses with Windows would do staged rollouts.
The problem wasn’t with an update Microsoft pushed out. It was due to an update by CrowdStrike which, iirc, ignored all settings for staged rollout (or there were no settings at all for that).
It’s not like anyone outside CrowdStrike chooses to have these updates installed. It happened automatically with no way of stopping it.
Yes, this specific problem wasn’t caused by Microsoft, but it was caused by the forced automatic update policy that CrowdStrike has, which is the same behavior Windows has. So while this time it wasn’t Microsoft, next time it could be. And while you can prevent this from happening on your Linux box by choosing software that doesn’t do this, it’s impossible to prevent it on a Windows box because the OS itself does it.
You absolutely can (and should) do staged rollouts for Windows updates.
Source: We do that at work. We have 3 different patch groups: 1 “bleeding edge”, 1 delayed by a day or two, and another one delayed by a bit more. This is so we can stop an update from rolling out to prod if dev breaks.
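To make the three-patch-group idea concrete, here is a minimal sketch in Python. It is not how WSUS or any real patch-management tool is configured; the group names, the exact delays (the post only says "a day or two" and "a bit more"), and the `should_install` helper are all assumptions made up for illustration. The only point it shows is the gating logic: a slower group installs an update once its delay has passed, and only if no faster group has reported breakage.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PatchGroup:
    name: str
    delay_days: int  # days to wait after the update is released


# Hypothetical groups and delays, loosely following the post above.
GROUPS = [
    PatchGroup("bleeding-edge", 0),
    PatchGroup("early", 2),
    PatchGroup("prod", 7),
]


def install_date(group: PatchGroup, released: date) -> date:
    """Earliest day on which `group` may install an update released on `released`."""
    return released + timedelta(days=group.delay_days)


def should_install(group: PatchGroup, released: date, today: date,
                   failed_groups: set[str]) -> bool:
    """True if `group` is due and no faster-moving group has reported a failure."""
    if today < install_date(group, released):
        return False
    faster = [g.name for g in GROUPS if g.delay_days < group.delay_days]
    return not any(name in failed_groups for name in faster)
```

With these assumed delays, a breakage reported by the "bleeding-edge" group at any point in the first week keeps the update away from "prod", which is the whole benefit of staging.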
Correct me if I’m wrong, but others have told me that Microsoft reserves the right to push security upgrades that bypass any policy set up by the network administrator.
It’s possible that there is a way to, for example, bypass a company’s WSUS server, but I don’t know of one and I couldn’t find any obvious way when searching.
Since the source is hearsay, I’m not really convinced, and if I were you I wouldn’t spread such information further until you find reliable sources.
I’m open to any information about it if anyone can find something reliable, like documentation or blog posts from MS employees.
Still, I highly doubt that it is used often at all, if it even exists. It would only be used in the absolute direst of times. I would also trust Microsoft much more in such a case than a third party like CrowdStrike.
I mean, this is sort of what the new NIS2 regulations try to achieve: make critical infrastructure producers and maintainers aware and force them to treat their infrastructure accordingly.
Windows updates don’t happen automatically in an Enterprise environment. They are tested and pushed out once the version is determined to be stable.
That is a wild assumption with two key flaws
Exactly, and since Windows is similar, therefore…
I’m not sure what you mean?
Maybe, I’m not sure about that.