Common Storage-Related Challenges IT Managers Face
IT Managers can run into a plethora of issues within their infrastructures. Networking, hosts, and storage are just a few of the many pain points they face within their IT environment. All of the pieces must work together to create a balanced state in which bottlenecks are eliminated. Here are a few of the top issues that IT Managers face due to storage-related tasks:
The most basic daily task is most likely direct storage management. This can include simple tasks such as viewing analytics and making sure everything is operating properly within the infrastructure. However, storage administrators rarely log in to their management portal just to confirm that everything is running. Usually, a storage-related problem that needs to be addressed prompts this action. For example, if a host or volume is running low on capacity, the volume needs to be expanded to provide ample space for the hungry application. Ideally, the storage device in question has raised an alert that a capacity threshold was reached, and that alert is what initiates the needed action.
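The capacity-threshold alert described above can be illustrated with a minimal sketch. The `Volume` class, the `volumes_over_threshold` helper, the 80% threshold, and the sample volumes are all hypothetical and stand in for whatever a real storage array reports; this is not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical record of what a storage array might report per volume.
    name: str
    capacity_gb: float
    used_gb: float

    @property
    def used_fraction(self) -> float:
        # Fraction of the volume's capacity currently in use (0.0 to 1.0).
        return self.used_gb / self.capacity_gb


def volumes_over_threshold(volumes, threshold=0.80):
    """Return the volumes whose used fraction meets or exceeds the threshold."""
    return [v for v in volumes if v.used_fraction >= threshold]


# Illustrative sample data, not real measurements.
volumes = [
    Volume("sql-data", capacity_gb=2000, used_gb=1700),
    Volume("file-share", capacity_gb=4000, used_gb=1200),
]

for v in volumes_over_threshold(volumes):
    print(f"ALERT: {v.name} is {v.used_fraction:.0%} full")
```

With this sample data only `sql-data` is flagged, since it sits at 85% of its capacity while `file-share` is at 30%.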
Weekly, monthly, quarterly, or annual reports are another common requirement for IT Managers. Watching storage, tracking trends in IO, throughput, and growth, and preparing for peaks each add to an IT Manager's workload. Having a storage solution in place that can publish reports and display historical charts relieves much of that burden.
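As a rough illustration of what a growth report can answer, the sketch below fits a straight line to historical usage samples and estimates how many days remain before a volume fills. The function name `days_until_full` and the sample figures are hypothetical, and linear growth is an assumption; real workloads are often seasonal or bursty.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days from the last sample until capacity is reached.

    samples: list of (day_number, used_gb) pairs with at least two distinct days.
    Returns None when usage is flat or shrinking.
    Assumes growth is roughly linear, which is a simplification.
    """
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    slope = sxy / sxx  # GB of growth per day
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day

    # Day on which the fitted line crosses the capacity limit.
    full_day = (capacity_gb - intercept) / slope
    return full_day - samples[-1][0]


# Illustrative monthly samples: (day, used GB). Not real measurements.
history = [(0, 1000), (30, 1100), (60, 1200), (90, 1300)]
remaining = days_until_full(history, capacity_gb=2000)
print(f"Estimated days until full: {remaining:.0f}")
```

For this sample history the volume grows by 100 GB every 30 days, so the estimate comes out to about 210 days from the last sample.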
How a storage system handles SSDs truly matters today, as SSDs have come down in cost dramatically in the last few years and vendors are adding them on a whim. Traditional SANs and other units were designed for HDDs, not around the capabilities of SSDs, and can cause problems and headaches once in production. SSD thrashing from a poorly conceived caching algorithm, or poor tiering performance from a system designed around 15k and 7.2k drives, can lead to misuse of SSDs. Just because a system has SSDs doesn't mean that it can sustain low latency for prolonged periods of time.
Requests may come in for recovery as well. Typically, when we think of disaster recovery, we think of entire infrastructure recovery. Sometimes recovery can be as simple as a deleted file or a lost email that needs to be found. If an IT Manager is running a snapshot schedule on the storage device, a simple mount and copy of the data will allow for file recovery. If a catastrophic event occurs and a complete failover is needed, then a full backup may be required. This backup can take the form of simple replication to an off-site repository, a tape-based solution, or even cloud backup. Planning here typically revolves around the recovery point objective (RPO), the maximum acceptable amount of data loss measured in time, and the recovery time objective (RTO), the maximum acceptable time to restore service. Both should be defined before a backup solution is deployed. Once the backup is available for use, the IT Manager can then bring the company back to operating standards.
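To make the RPO and RTO discussion concrete, here is a minimal sketch that checks a proposed backup plan against both objectives. The function names and figures are hypothetical, and the restore estimate considers only data transfer time; it ignores failure detection, failover steps, and verification, all of which add to real recovery time.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    # Worst-case data loss is the time elapsed since the last recovery point,
    # which can be as long as the interval between snapshots or replications.
    return backup_interval <= rpo


def estimated_restore_time(data_gb: float, throughput_gb_per_hour: float) -> timedelta:
    # Simplifying assumption: restore time is dominated by copying the data.
    return timedelta(hours=data_gb / throughput_gb_per_hour)


def meets_rto(data_gb: float, throughput_gb_per_hour: float, rto: timedelta) -> bool:
    return estimated_restore_time(data_gb, throughput_gb_per_hour) <= rto


# Illustrative plan: hourly snapshots, 5 TB to restore at 500 GB per hour.
rpo_ok = meets_rpo(timedelta(hours=1), rpo=timedelta(hours=4))
rto_ok = meets_rto(5000, 500, rto=timedelta(hours=8))
print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this example the hourly snapshot schedule satisfies a four-hour RPO, but a ten-hour restore misses an eight-hour RTO, which would point toward faster restore paths or replication rather than more frequent snapshots.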
New buildouts also account for a great deal of an IT Manager's time. A new buildout could be the planning of a backup or off-site solution, a new facility that is being brought up, or the ingestion of technology from a merger or purchase. Any time a new buildout is happening, a lot of planning and time will be required for it to be successful. Quote gathering, which includes meeting with storage vendors, seeing product demos, and discussing the options available, will produce a plethora of options just on the storage side of the equation. After this, the multitude of connection types, host configurations, and backup options adds up. The list can go on depending on the scope of the project, and this can be a true drain on a storage administrator's time.
Finally, failures, whether physical or software related, can wreak havoc on an infrastructure. Depending on the storage array and type, a disk failure can mean a simple hot-swap of a component or, in some cases, can slow a storage array and bring it one step closer to data loss. If a double or triple drive failure occurs, hopefully things like hot spares are available and the RAID can rebuild itself automatically. Otherwise, that backup solution mentioned earlier will come in quite handy. Other things like power supply, NIC, CPU, memory, or complete controller failure can cause a storage array to go down, or in some cases become overloaded when failover occurs. If the system is running in an overloaded state, thresholds will be hit and latency will increase across the infrastructure. IT Administrators need to be prepared so that if a failure occurs, there is a plan in place to recover, fail over, or in the worst case revert to a stable point.
The storage solution in place must have sufficient management tools to reduce the time and complexity of typical tasks, built-in reporting and threshold monitors to alert in the event of impending catastrophes, integration with backup or the ability to replicate in the event of site loss, and of course meet the budgetary requirements within the scope of the plan.