Big Data Project - Apache Spark and Java
1. Apache Spark introduction
2. Getting Started with Spark
3. Spark Dataframe basic operations
4. Spark Dataframe advanced operations
5. Spark SQL and other functionalities
6. Big data batching application
7. Deploy and cluster execution
8. Monitoring and performance fundamentals
Apache Spark introduction
Enter MapReduce
Spark arrives
Core components and architecture
Spark and the batch data processing model
Distributed processing model
Getting Started with Spark and Java
Spring Boot CLI application
Spark Dataframe basic operations
Transformation and action
Transformation (I): Map and Filter
Transformation (II): FlatMap and Distinct
Action (I): Count, Take and Collect
Action (II): Reduce and Aggregation (Max, Min, Mean)
Deep dive: Internals of Spark execution
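The key idea behind this chapter's split between transformations (Map, Filter, FlatMap, Distinct) and actions (Count, Take, Collect, Reduce) is laziness: transformations only describe a computation, and nothing runs until an action is invoked. As a rough local analogy only (this is `java.util.stream`, not the Spark API), intermediate stream operations behave like transformations and terminal operations like actions:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyVsEager {
    // Counts how many source elements actually flow through the pipeline.
    static final AtomicInteger touched = new AtomicInteger();

    static long runPipeline() {
        // Intermediate operations are lazy, like Spark transformations:
        // building this pipeline touches no data at all.
        Stream<Integer> pipeline = List.of(1, 2, 2, 3, 4).stream()
                .peek(x -> touched.incrementAndGet()) // records each element processed
                .map(x -> x * 10)      // ~ map transformation
                .filter(x -> x >= 20)  // ~ filter transformation
                .distinct();           // ~ distinct transformation

        // At this point touched.get() is still 0: nothing has executed yet.
        // The terminal operation (~ a Spark action such as count) forces
        // the whole pipeline to run over the data.
        return pipeline.count();
    }

    public static void main(String[] args) {
        long n = runPipeline();
        System.out.println("count=" + n + ", elements touched=" + touched.get());
    }
}
```

The analogy is deliberately loose: in Spark the pipeline is additionally compiled into a distributed plan of stages and tasks, which is what the "Internals of Spark execution" deep dive covers.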
Spark Dataframe advanced operations
Data partitioning and shuffling
Transformation (III): GroupBy and GroupByKey
Transformation (IV): Join
Transformation (V): Union, UnionByName, UnionAll and DropDuplicates
Sharing data in a cluster: Accumulators and Broadcast variables
UDFs: User-defined functions
Spark SQL and other functionalities
1. CSV
2. JSON Lines
3. JSON
4. Text
5. XML
6. Parquet
7. Delta table (Parquet with a transaction log)
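One reason this list treats JSON Lines and JSON as separate formats: a JSON Lines file holds one complete JSON document per line, so a distributed reader can split the file on newline boundaries, whereas a single JSON array must be parsed as a whole. A minimal illustration in plain Java (no Spark; the sample records are made up for the sketch):

```java
import java.util.List;

public class JsonLinesSplit {
    // JSON Lines: every line is a self-contained record, so the file can be
    // partitioned at newline offsets without parsing the whole document.
    static final String JSONL = """
            {"id":1,"name":"a"}
            {"id":2,"name":"b"}
            {"id":3,"name":"c"}""";

    static List<String> records(String jsonl) {
        return jsonl.lines().toList();
    }

    public static void main(String[] args) {
        System.out.println(records(JSONL).size() + " independent records");
        // A plain JSON array ( [ {...}, {...} ] ) has no such property: a reader
        // landing mid-file cannot find a record boundary without a full parse,
        // which is why single large JSON files split poorly across executors.
    }
}
```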
Big data batching application
1. The application architecture ecosystem
Management and scheduling tier
Logging and monitoring tier
2. Cloud architecture - AWS
Deploy and cluster execution
Monitoring and performance fundamentals