SysAdmins often have this nightmare: running the dreadful and deadly command ‘rm -rf /’ as root. How horrifying!
If you didn’t know already, / represents the root directory. Running sudo rm -rf / deletes the root directory and all of its contents. In the Linux file hierarchy, root contains everything, so deleting it means your system is gone, forever.
No wonder this is compared to drunken driving in the Linux world.
Sh*t happens
But shit happens in the IT world. And apparently it happened to a hapless SysAdmin, Marco Marsala, who runs a web hosting company serving over 1,500 customers.
According to a question posted on Server Fault a few days back, Marsala ran a Bash script that contained the following command: rm -rf {foo}/{bar}. Because both variables were undefined, the command effectively became ‘rm -rf /’, and the inevitable happened.
In Marsala’s own words:
I run a small hosting provider with more or less 1535 customers and I use Ansible to automate some operations to be run on all servers. Last night I accidentally ran, on all servers, a Bash script with a
rm -rf {foo}/{bar}
with those variables undefined due to a bug in the code above this line. All servers got deleted and the offsite backups too because the remote storage was mounted just before by the same script (that is a backup maintenance script).
How I can recover from a
rm -rf /
now in a timely manner?
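The failure mode Marsala describes can be reproduced harmlessly. The sketch below assumes the script used ordinary shell expansions such as ${foo}/${bar} (the quoted command omits the dollar signs), and the variable names are only placeholders. It never calls rm; it only shows what path rm would have received, and two common Bash guards that would likely have stopped the script first.

```shell
#!/usr/bin/env bash
# Hypothetical variable names, mirroring the quoted command.
unset foo bar

# With both variables unset, the path collapses to a single slash.
target="${foo}/${bar}"
echo "rm -rf would receive: ${target}"

# Guard 1: ${var:?message} fails the expansion when var is unset or empty.
# It runs in a subshell here so the demonstration does not end this script.
if ( : "${foo:?foo is not set}" ) 2>/dev/null; then
    guard_result="passed"
else
    guard_result="blocked"
fi
echo "\${foo:?} with unset foo: ${guard_result}"

# Guard 2: 'set -u' turns any reference to an unset variable into an error.
# It does not catch variables that are set but empty.
if ( set -u; : "${foo}/${bar}" ) 2>/dev/null; then
    nounset_result="passed"
else
    nounset_result="blocked"
fi
echo "set -u with unset foo: ${nounset_result}"
```

In a real script the guard would sit on the destructive line itself, for example rm -rf "${foo:?}/${bar:?}", so that the script aborts before rm is ever invoked.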
Oh, poor guy!! What did you just do?
What next?
What next? This is what Marsala wanted to know. Is there a way to recover from ‘rm -rf /’?
The chances of recovering all the data after an rm -rf / are thin. No wonder the post started attracting sarcastic (but honest) comments like:
If you really don’t have any backups I am sorry to say but you just nuked your entire company
Another one went like:
You’re going out of business. You don’t need technical advice, you need to call your lawyer.
A few people suggested shutting everything down, not overwriting anything, and using data recovery tools to get at least some data back.
And it seems this worked to a large extent for Marsala, as he later mentioned that “luckily we recovered almost all data”.
Lessons to learn
Although some people are speculating that it’s a hoax, there are still a few lessons here for all of us.
- Back up everything. If it’s a professional server, keep multiple, offline backups
- Don’t take a random tool or script from the internet and run it directly on a production machine
- Have test machines identical to production for trying out new stuff without risking the production system
Anything to add to this scary incident?
shouldn’t it be obvious? foobar? is “file-or-object” / “buffer-address-register” even a logical bash command?
Whether it is a hoax or not, ‘Lessons to learn’, as enunciated by you and repeated by a lot of people in slightly different forms, holds good:
Backup everything. If it’s a professional server, have multiple, offline backups
Don’t use a random tool or script from the internet and use it on a production machine directly
Have test machines identical to that of production for testing out new stuff without risking the production system.
I have a PC for home use; I do not download any PPAs (except for GIMP); let the applications in the Ubuntu Software Centre be out of date.
Having been a sys admin in the past, I can understand what happened—sort of. Anytime I was going to run a script doing anything system-wide, I always did two complete backups–just in case. I know that for the company’s fiscal year-end I often worked all-night and day Saturday and Sunday to have everything ready for Monday morning. (Those were 4-bit machines with “giant” 12-inch “Winchester” drives.)
It’s a hoax / marketing stunt! see http://www.repubblica.it/tecnologia/2016/04/15/news/cancella_l_azienda_per_sbaglio_la_disavventura_tecnologica_di_marco_marsala-137693154/