The Insider's Guide to Building Distributed, Big Data Applications with Apache(R) Hadoop(TM) YARN

Apache Hadoop is central to the big data revolution. Now, its data processing has been completely overhauled: Apache(R) Hadoop(TM) YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache(R) Hadoop(TM) YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.

YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment. You'll find many examples drawn from their cutting-edge experience, first as Hadoop's earliest developers and implementers at Yahoo!, and now as Hortonworks developers moving the platform forward and helping customers succeed with it.
Coverage includes
* YARN's goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem
* Exploring YARN on a single node
* Administering YARN clusters and Capacity Scheduler
* Optimizing existing MapReduce code
* Developing a large-scale YARN clustered application
* Discovering new open source frameworks that run under YARN