The promise of mainframe technology is that systems management within these mega-machines is so well orchestrated that optimal resource allocation and the management of diverse workloads and applications on multiple operating systems are possible within a single computer frame. This is a major reason why mainframes continue to hold their own as preferred bastions of mission-critical applications in enterprises.
So what does this have to do with big data platforms like Hadoop, which most often run in distributed computing environments?
As a preferred platform for big data processing, Hadoop doesn’t have the same robust systems management tools as mainframes. Part of the reason is distributed computing’s heritage as a commodity platform that offered sites cheaper hardware than mainframes, but without many of the advanced systems management techniques that were and are available on mainframes. The trade-off for sites choosing distributed hardware was simply to add more servers to compensate for poorer optimization of system throughput and storage.
Now that many companies are well past the proof-of-concept stage in their big data strategy, and they’re deploying big data applications across their organizations, resource management is becoming more of an issue.
“What companies are finding is that everyone is vying for the same cluster of big data processing in the data center, which also means that IT must satisfy internal customers in a multi-tenant scenario when it comes to processing big data queries on popular software like Hadoop. These internal customers are also demanding performance SLAs (service level agreements),” said Sean Suchter, cofounder and CTO of Pepperdata, which provides software that optimizes Hadoop cluster performance.
Users want their own dedicated Hadoop clusters that they don’t have to share. However, for organizations that have made major investments in hundreds or even thousands of Hadoop nodes, these resources must be shared to meet all of the big data demands.
“This is where being able to effectively manage and optimize Hadoop clusters comes into play,” said Suchter, whose company offers a 72-hour Hadoop cluster health check to companies that register for it.
“We provide the health check program, which for a 72-hour period collects all Hadoop performance data on a production cluster and then produces a high-level diagnostics report with granular insights into common issues,” said Suchter.
Among these issues are problem users or jobs that consume large amounts of system resources, wasted cluster capacity that can be redeployed and optimized, and performance bottlenecks that need root cause analysis. The health check is helping companies get a handle on their Hadoop cluster resource management.
“As an example, one major telecom company had several hundred Hadoop cluster nodes processing IoT (Internet of Things) telemetry workloads,” said Suchter. “The data was continuously streaming and was ranked at a very high SLA priority level—but at the same time, the company was processing ad hoc Hadoop query jobs from various user departments, and developers were also testing software that was going against the same data in the same Hadoop cluster.”
The company wanted to find a way to rank the various jobs by priority so that the high-priority IoT work would get premium access to the cluster resources, with other jobs processing at a lower priority.
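To make the idea of ranking jobs by priority concrete, here is a minimal Python sketch. It is an illustration only, not how Pepperdata's software or Hadoop itself schedules work. The priority tiers, job names, slot counts, and the `schedule` function are all hypothetical and were chosen to mirror the three kinds of work described above.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority tiers (assumption): lower number is served first.
PRIORITY = {"iot_streaming": 0, "ad_hoc_query": 1, "dev_test": 2}

_order = count()  # tie-breaker so equal-priority jobs stay first-in, first-out


@dataclass(order=True)
class Job:
    priority: int
    seq: int
    name: str = field(compare=False)
    slots: int = field(compare=False)  # cluster slots the job needs


def make_job(name, kind, slots):
    """Build a job whose priority comes from its (hypothetical) workload kind."""
    return Job(PRIORITY[kind], next(_order), name, slots)


def schedule(jobs, free_slots):
    """Admit jobs in priority order while they fit in the remaining slots.

    Returns (admitted, waiting) lists of job names. A job that does not fit
    is left waiting, and smaller lower-priority jobs may still be admitted.
    """
    heap = list(jobs)
    heapq.heapify(heap)
    admitted, waiting = [], []
    while heap:
        job = heapq.heappop(heap)
        if job.slots <= free_slots:
            free_slots -= job.slots
            admitted.append(job.name)
        else:
            waiting.append(job.name)
    return admitted, waiting


# Made-up workload: submission order deliberately differs from priority order.
jobs = [
    make_job("nightly-report", "ad_hoc_query", 40),
    make_job("dev-regression", "dev_test", 30),
    make_job("telemetry-ingest", "iot_streaming", 60),
]
admitted, waiting = schedule(jobs, free_slots=100)
print("admitted:", admitted)
print("waiting:", waiting)
```

In this made-up run, the streaming job is admitted first even though it was submitted last, and the developer test job waits once the cluster is full. A production scheduler would also have to handle preemption, running jobs that finish, and per-tenant limits, none of which this sketch attempts.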
“Without the insights that the management software provided, the work would start backing up,” said Suchter. “In this case, the company gained 40% throughput and improved efficiency by reducing the amount of unused hardware by 30% to 50%.”
Does this solve every Hadoop performance issue? No, but it does put IT on the road to better managing a resource that is very expensive and in high demand. It can also lend greater credibility to commodity processing platforms by equipping them with the kinds of robust system management capabilities that mission-critical applications need.