We had a huge problem last Friday (on goddamn read-only Friday!) when someone crashed their car into a power pole across the street from our manufacturing plant.
We lost power instantly, but we have large APC UPSs on everything critical. So we ran down to the server room in the basement to try to shut everything down cleanly. We had about 5 minutes of battery before the UPSs cut out and took everything down with them, and we couldn't shut down even one server in that time.
My boss was in the middle of fixing some crap in the RSM database, and had just finished rebuilding a few Hyper-V virtual servers, when the power cut. We lost the RSM server, as well as a bunch of VMs that would no longer come back up, so we scrapped and rebuilt them. Luckily, we had an RSM-Test server set up that mirrored the RSM server, plus a solid RSM database backup from the previous day, so it was easy to migrate things over temporarily.
In the face of all of this, our power management plan is apparently completely inadequate. We have been talking about upgrading the UPSs for some time, but no one wanted to pay for it: it works for now, excuses, excuses. No more.
I have been tasked with roughly estimating the room's total power consumption in watts so we can pass it on to APC and get a preliminary estimate drawn up. The problem is that Googling the power consumption of various random equipment is proving to be quite the merry chase, since a lot of that information isn't readily available for anything beyond the major items like the servers themselves. Has anybody found or created a better approach to this? It doesn't have to be exact, but if it's off, I would prefer it to be high.
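One common way to get a rough number without chasing every spec sheet: inventory each device, use its power supply's nameplate rating wherever no measured figure is available (nameplate ratings overstate typical draw, so the total errs high), sum the watts, and add headroom. UPSs are rated in both watts and VA, so the total is also converted to VA using an assumed power factor. Below is a minimal Python sketch of that arithmetic; the device names, wattages, the 25% headroom, and the 0.9 power factor are all placeholder assumptions, not vendor figures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Device:
    name: str
    watts: float  # nameplate PSU rating (errs high) or a measured draw, in watts
    qty: int = 1


def total_watts(devices: list[Device]) -> float:
    """Sum of per-device watts times quantity."""
    return sum(d.watts * d.qty for d in devices)


def sized_load(devices: list[Device], headroom: float = 0.25) -> float:
    """Total watts plus fractional headroom for growth (0.25 means +25%)."""
    return total_watts(devices) * (1 + headroom)


def to_va(watts: float, power_factor: float = 0.9) -> float:
    """Convert watts to volt-amperes; the 0.9 power factor is a placeholder assumption."""
    return watts / power_factor


if __name__ == "__main__":
    # Placeholder inventory: names and wattages are hypothetical, not real specs.
    room = [
        Device("Hyper-V host", 750, qty=3),
        Device("Core switch", 150),
        Device("Access switch", 60, qty=4),
        Device("Firewall", 40),
    ]
    load_w = sized_load(room)
    print(f"Raw load:   {total_watts(room):.0f} W")
    print(f"With +25%:  {load_w:.0f} W")
    print(f"Approx. VA: {to_va(load_w):.0f} VA")
```

With these placeholder numbers it reports a raw load of 2680 W, 3350 W with headroom, and roughly 3722 VA. Measured figures, such as the load readout on the existing UPSs or a metered PDU, can replace nameplate values wherever they're available.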
Also, we will be adding an 80-100 TB video surveillance system as part of our plant security/quality control program. It is that large because the plant runs 24/7 and we must keep 18 MONTHS of footage on hand for reference. So we'll have to factor it into the power consumption plan somehow, while still leaving room for expansion.
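The surveillance storage can be folded into a power estimate by working out roughly how many drives 80-100 TB implies and multiplying by a per-drive figure. The sketch below assumes 16 TB drives, two parity drives (a RAID 6-style layout), about 10 W per drive, and about 200 W for the rest of the recorder chassis. All of these are hypothetical placeholders to replace with datasheet values for whatever hardware is actually quoted, and the sketch ignores filesystem overhead and the TB-vs-TiB difference.

```python
import math


def drives_needed(usable_tb: float, drive_tb: float, parity_drives: int = 2) -> int:
    """Data drives needed to hold usable_tb, plus parity drives.

    parity_drives=2 assumes a RAID 6-style layout. Filesystem overhead and
    TB-vs-TiB differences are ignored.
    """
    return math.ceil(usable_tb / drive_tb) + parity_drives


def nvr_watts(usable_tb: float, drive_tb: float, watts_per_drive: float,
              base_watts: float, parity_drives: int = 2) -> float:
    """Estimated recorder draw: chassis/base load plus per-drive load."""
    n = drives_needed(usable_tb, drive_tb, parity_drives)
    return base_watts + n * watts_per_drive


if __name__ == "__main__":
    # Hypothetical figures: 16 TB drives at ~10 W each, ~200 W for the chassis.
    for usable in (80, 100):
        n = drives_needed(usable, 16)
        w = nvr_watts(usable, 16, watts_per_drive=10, base_watts=200)
        print(f"{usable} TB usable -> {n} drives, ~{w:.0f} W")
```

With those assumptions, 80 TB works out to 7 drives and about 270 W, and 100 TB to 9 drives and about 290 W. Sizing to the 100 TB upper bound keeps the number on the high side. Drives also draw noticeably more while spinning up than at steady state, which matters if the recorder restarts while running on UPS power.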
There are a lot of unknown variables here, but if you could share your process or how you handled something similar, I would greatly appreciate it, r/sysadmin!