Details about Learning Spark: Lightning-Fast Data Analytics
Most developers who grapple with big data are data engineers, data scientists, or machine learning engineers. This book is aimed at those professionals who are looking to use Spark to scale their applications to handle massive amounts of data.
In particular, data engineers will learn how to use Spark’s Structured APIs to perform complex data exploration and analysis on both batch and streaming data; use Spark SQL for interactive queries; use Spark’s built-in and external data sources to read, refine, and write data in different file formats as part of their extract, transform, and load (ETL) tasks; and build reliable data lakes with Spark and the open source Delta Lake table format.
For data scientists and machine learning engineers, Spark’s MLlib library offers many common algorithms to build distributed machine learning models. We will cover how to build pipelines with MLlib, best practices for distributed machine learning, how to use Spark to scale single-node models, and how to manage and deploy these models using the open source library MLflow.
While the book is focused on learning Spark as an analytical engine for diverse workloads, we will not cover all of the languages that Spark supports. Most of the examples in the chapters are written in Scala, Python, and SQL. Where necessary, we have infused a bit of Java. For those interested in learning Spark with R, we recommend Javier Luraschi, Kevin Kuo, and Edgar Ruiz’s Mastering Spark with R (O’Reilly).
Finally, because Spark is a distributed engine, building an understanding of Spark application concepts is critical. We will guide you through how your Spark application interacts with Spark’s distributed components and how execution is decomposed into parallel tasks on a cluster. We will also cover which deployment modes are supported and in what environments.