Friday, February 5, 2010

Should a sys admin back up important data in any way he can?

SkyHi @ Friday, February 05, 2010

Recently, one of the main file servers at our company failed. It was using a 4-disk RAID array, but apparently 3 of the disks died, and all the data on the server has been lost.

Speaking to the sys admin, he says that he has been warning the upper management about the backup situation for months. He had been trying to get approval to buy an enterprise-level backup solution, but he never got the budget approved for it - because management thought it was over the top.

The sys admin is a dedicated, properly certified sys admin, whereas his managers are not IT-oriented.

His manager is asking why he didn't buy a cheap external drive and use it to back up the file server. The sys admin thinks that this is just a mickey-mouse solution that's suitable for use at home, but not for a professional IT company, which is why he did not do it.

It seems to me that the sys admin wants proper, text-book IT strategy that costs a lot more money, whereas the management (without a deep IT understanding) wants cheaper solutions that they think are adequate.

I'm wondering what other sys admins think. Was this sys admin correct in his actions? Or should he always make sure there is a backup of the important data, even if he believes that the cheaper way is not good enough?


Edit: based upon the answers, I'll add that the sys admin has an IT manager who would've known of the situation. He reports to the ultimate boss. I don't know if the manager ever reported the full situation to the boss. I think it is quite tough for the manager, as he is stuck in the middle, and he wants to be diplomatic with both sides.

=============================================================


I would agree that doing it right is the preferred method. But to stand by and do nothing is unprofessional. Was management informed that there was no backup in place? It is the admin's job to present the options, including costs and risks, to management. He presented his preferred option, and when it was denied, he did nothing. Not cool.

I would honestly say it is a failure on both parts.

The logistics of the situation might mean that he would have to take away time that he should have been spending doing other, immediate, important tasks.

However, ultimately, yes, he should have done something. As has been said repeatedly, a bunch of hard drives from here and there would have been better than nothing.

On the other hand, the entire purpose of management is to make sure that the people beneath you can do their jobs, and do. Thus, from a leadership point of view, the managers failed miserably and can be held equally responsible, if not more so.


=============================================================

If there are no backups, as far as I'm concerned it's the sysadmin's responsibility to:

1) Explicitly tell the higher-ups that there are NO backups, in no uncertain terms, so that they are aware of it

2) Back up the data anyway, any way he can

Frankly I would expect to be fired if this happened, because even if management are making my life hard, that's not an excuse, especially if they're still under the impression that they have something rather than nothing.


=============================================================

Unfortunately, companies skimping on backups is all too common. Most never change until they get burned and lose everything.

BUT

If you are employed to be the sysadmin you have to work with the tools that you have including your brain. No matter what management or anyone else says on good days, when the poop hits the fan everyone gets selective memory.

A mickey mouse backup is better than no backup at all.


=============================================================

It's damned if you do, damned if you don't. Frankly, if there was no money spent by management on a backup solution, then it's their fault. On the other hand, the admin should have been active in trying to work out a stopgap solution, rather than just sitting on his ass waiting for something to break (I don't think any sort of external drive solution is acceptable. You're never going to get a decent backup with that.) You can't just say, "Well I don't have what I want, so I'm not responsible" but you can say, "I've repeatedly tried to get you to do something and you've given me nothing and this is not my problem."

I was actually in a situation once (I wasn't even an ADMIN at this job) where I was working on a database, and made a backup before I changed it (which is S.O.P.), and I saved it to my own local machine, as I usually do whenever I can. Two days later they lost the RAID array, and ooops, it turned out there was no backup solution. They'd been backing up the database to the RAID array.

So I come in late on this, and I say, "Oh, I backed it up myself day before yesterday."

You know what the outcome was? I was censured for my bad backup solution. For a machine that I was in no way responsible for. And it wasn't because the backup I had was too old, it's because I'd only backed up the database I was working on, not every database.

So the problem is this: if you do a mickey mouse solution, if you do anything, and it's not quite good enough, you're going to get just as much hell as if you do nothing at all. If backups are explicitly your responsibility and there is no budget, you should try to cobble something together, but you better make damn sure it works, and you need to raise hell about it. Repeatedly. At every opportunity.

If it's not your responsibility, point out that there exists a problem, and absolutely, categorically, refuse to take responsibility for an unfunded mandate when they try to assign it to you. No one makes disaster recovery a priority until there is a disaster, and then they scapegoat everyone to try and make up for their own shortsightedness.


=============================================================

I'll add my voice to those saying that the admin should have implemented something here. He's badly at fault for not having done so. There's a part of me that would like to sympathise with his position, but in an ideal world backup and restore would take no time, always work, and never be needed. This isn't that world and even the best backup solution is going to have flaws that you'll need to accept and learn to work with.

Half-assed is better than no-assed. Even an el-cheapo USB HD would have gotten him out of the woods, and it would have given weight to his position when management were told that they can't get back data more than a day or two old. But it would still have saved his neck in this case.
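
For what it's worth, a stopgap along those lines could be as small as the sketch below. This is only an illustration, not what anyone in the thread actually ran: the function names, the dated-folder layout, and the `keep` retention count are all assumptions, and a file-count check is a very crude form of verification.

```python
import shutil
from datetime import date
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files under root, recursively."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def stopgap_backup(source: Path, backup_root: Path, keep: int = 7) -> Path:
    """Copy source into a dated folder under backup_root, then prune old copies.

    Hypothetical helper: the folder naming and the 'keep' policy are
    illustrative assumptions, not something described in the thread.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    target = backup_root / f"backup-{date.today().isoformat()}"
    # dirs_exist_ok lets a second run on the same day refresh the copy.
    shutil.copytree(source, target, dirs_exist_ok=True)

    # Crude sanity check: the copy should hold at least as many files.
    if count_files(target) < count_files(source):
        raise RuntimeError(f"backup at {target} looks incomplete")

    # ISO dates sort chronologically, so the oldest folders come first.
    snapshots = sorted(p for p in backup_root.glob("backup-*") if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

You would call it with something like `stopgap_backup(Path("/srv/files"), Path("/mnt/usb"))`, where both paths are placeholders. It does nothing about open files, permissions, or off-site copies, which is exactly the kind of limitation you would want to put in writing to management.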


==============================================================

Should a sys admin back up important data in any way he can?

I don't know that I would say you should make a backup under any conditions. There are some things you might be tempted to do that would possibly be illegal. For example, I would not back up health records over the network to my personal computer. I would not do something illegal just to have a backup.

OTOH to have at least some backup system in place I would accept a lot of compromises. Then whenever a compromise was made I would make a point to make sure my objections are clear and documented about why it was a bad compromise that will cause problems, be inadequate, or become less useful in the future.



=============================================================

To me it sounds like the sysadmin wanted all or nothing. It's nice to get all, but if you can't have it, should you accept nothing?

In my experience, the thing to do is evaluate all the possible options (not in too much depth) and draw up a few bullet points for each, indicating the pros and cons and the costs (both initial and ongoing). Include in this the "do nothing" option.

Then you allow the managers to decide what solution they choose. It would seem to me that there was probably more than one possible option for your sysadmin. Perhaps he only saw the one he really wanted though?


=============================================================

As a sysadmin I believe it is my responsibility to ensure the systems under my care are as secure and reliable as I can possibly make them. Backups fall under the reliability tags. Frustrating as it may be to have to argue with non-understanding senior staff (I think we've all been there at some time or other), we still should be doing our jobs as best we can.

When the backup system I inherited in my current position failed and management hesitated about spending the money on the system I wanted, I didn't leave the system without backups. Instead, I brought my personal external drive in and used that for a week or so. Despite my absolute abhorrence of using hard drives for backups, the fact remains that it was vastly preferable to having none at all.


=============================================================

If the sysadmin was unable to convince management of the importance of a good backup solution, the only way they will ever be convinced is via catastrophic data loss. But as a sysadmin it is your responsibility to educate management and users about the importance of things like backup, and to make sure they thoroughly understand the current state (in this case "no backups") and the potential consequences ("We lose a disk and your precious data is gone forever").

My personal opinion is that the admin kinda screwed up here: Ad-Hoc backups are a bad idea (you'll miss stuff, important data will be lost, if you're not around backups don't happen), but at the same time they should have been able to find a reliable "enterprise" backup solution within the company's budget.
Software like Bacula and Amanda is available for free, and both of those can work with removable USB media and CDs securely and reliably. Including the cost of media and server hardware you could have a good system for less than $2000 US - even cheaper if you recycle hardware for the server.

Now if management is also opposed to the admin spending TIME on getting backups working there's just no helping this company: As I said above sometimes the only way to teach people is catastrophic data loss, and if that's the case it sucks for the poor admin who has to take the blame for institutional stupidity.


=============================================================

My personal opinion is that it's my job as a sysadmin to inform and impress upon management the need for and the importance of having an adequate, appropriate backup solution, to request the necessary budget for it, and to explain the risks associated with not doing so. It's not my responsibility to go "outside" of the mandate of the management and just do whatever I think is right, regardless of how poor those management decisions are. It's not my responsibility to cobble together some half-baked, half-assed solution.

If I was an insurance agent and I told you it was important to have fire coverage in your homeowner's policy, and if I adequately explained the risk of not having fire coverage, and you declined said fire coverage, and your house burned down, whose responsibility is it? Should I have given you fire coverage anyway?

My opinion is that the sysadmin exercised due diligence in performing the duties of his job by bringing the matter to the attention of management, explaining the importance of having an appropriate backup solution, explaining the risks of not having it, and requesting the necessary budget for it. If he was rebuffed in his efforts, then the responsibility lies squarely on the shoulders of the management.

People make poor decisions all the time and bad things happen because of those poor decisions, that's a fact of life. I can't be responsible for every bad decision my boss makes, regardless of the risks associated with those decisions.


=============================================================

Did the same situation happen with the RAID array? As soon as one disk dies, you are in a situation where one more means data loss.. you better replace that drive immediately.

If I was in the sys admin's shoes, the instant the first drive went:

  1. Email manager with formal request to replace the drive, reminding that no backup system has been approved so this is a critical situation. Cite the prior requests for the backup system by attaching that email, or even better, the manager's response denying the request.
  2. If no response, re-send message, this time CC'ing your manager's manager.
  3. If still no response, well.. not much more you can do. Polish the resume and start looking for a better job.

If you get denied along the way, at least you have it in writing for when the shit hits the fan (Get it in writing/email, do not accept a verbal response. You need a paper trail here. If your manager refuses to write it, then go over his/her head, because that's just shady -- there is no legit reason not to write it down.)

The same process should have been followed for getting a backup system, though perhaps without escalation as quickly (or going over your manager's head at all). If none of the requests are in writing, well.. shit rolls downhill. At least it's a good life lesson.

If you don't lose your job over the situation, well, start making that request again, citing the disaster it caused last time your request was denied. If it's still denied, then you need to decide if that's an environment you want to work in, and it's worth the stress. If every morning you expect to walk into work finding a panic because data was lost, well, that's no way to live.



============================================================

The company is clearly looking for a scapegoat in this. The sys admin is quite right not to back up critical data to a removable device.

1) They are not reliable

2) They are not secure

Ultimately it lies with the managers for not ensuring a proper DR (Disaster recovery) solution was put in place.

Look at it this way, how much has this data loss cost the company? Suddenly I'm sure the "over the top" solution doesn't look so expensive.

Edit: yes, I concede that any backup is more reliable than none, but my original point remains: if this person has managers, the managers should have ensured the backup was in place. I'm not pardoning the sys admin of all blame here, but this is what the manager should be checking.

And what if the server failed and the data on the removable drives was irrecoverable for whatever reason? Having had this happen to me in the past, I can say USB drives are far from reliable. To some they can be used in a pinch, but the problem, as it appears in this case, is that management would have allowed removable-drive backups to be used in the long run.



REFERENCE

http://serverfault.com/questions/109793/should-a-sys-admin-backup-important-data-in-anyway-he-can-even-if-he-disagrees-w