Advanced Analytics with Spark: Patterns for Learning from Data at Scale

CyberSecurity Summary

Episode  ·  21:16  ·  Jun 27, 2026

About

This episode explores how to use Apache Spark and the Scala programming language to perform complex data science tasks at scale. The book focuses on record linkage, a data cleansing process used to identify duplicate records within massive datasets. It introduces fundamental Spark concepts, such as Resilient Distributed Datasets (RDDs) and DataFrames, while emphasizing the importance of iterative analysis. Readers learn to manage the entire data pipeline, from initial preprocessing and schema inference to executing distributed computations on a cluster. Ultimately, the book serves as a practical manual for transitioning from exploratory analytics to building robust, production-ready data applications.

You can listen to and download our episodes for free on more than 10 different platforms: https://linktr.ee/cyber_security_summary

Get the book now from Amazon: https://www.amazon.com/Advanced-Analytics-Spark-Patterns-Learning/dp/1491912766?&linkCode=ll2&tag=cvthunderx-20&linkId=6f8d9eaa56d855b906418ee77d408b9c&language=en_US&ref_=as_li_ss_tl

Discover our free courses in tech and cybersecurity and start learning today: https://linktr.ee/cybercode_academy

© 2026 Spreaker (OG)