PySpark Video Tutorial Series
The Complete Series: Beginner to Advanced
Chapter 3: Reading and Writing DataFrames
- How to Read CSV File into DataFrame from Azure Blob Storage
- How to Read JSON File into DataFrame from Azure Blob Storage
- How to Use toJSON() - Convert DataFrame Rows to JSON Strings
- writeTo() Explained: Save, Append, Overwrite DataFrames to Tables
- How to Use the write() Function to Create a Single CSV File in Blob Storage from a DataFrame
- How to Write DataFrame to Parquet File in Azure Blob Storage
- How to Write DataFrame to JSON File in Azure Blob Storage
- How to Write DataFrame to Azure SQL Table Using JDBC write() Function
- How to Read Data from Azure SQL Table and Write to JSON File in Blob Storage
- Read Multiple CSV Files from Blob Storage and Write to Azure SQL Table with Filenames
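
The snippet below is a minimal sketch of this chapter's core read/write pattern. The storage account, container, and paths are hypothetical placeholders, and the Blob Storage credential configuration is omitted:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("io-demo").getOrCreate()

# Read a CSV from Blob Storage into a DataFrame (path is a placeholder).
df = spark.read.csv(
    "wasbs://container@account.blob.core.windows.net/in/data.csv",
    header=True,
    inferSchema=True,
)

# Write back out as Parquet; coalesce(1) produces a single output file.
(df.coalesce(1)
   .write.mode("overwrite")
   .parquet("wasbs://container@account.blob.core.windows.net/out/"))
```
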
Chapter 4: Advanced DataFrame Operations
- How to Use the display() Function in Databricks
- How to Sort DataFrames Using orderBy()
- limit() Function to Display Limited Rows
- How to Use describe() for DataFrame Statistics
- union() vs unionAll() | Understanding the Difference
- fillna() Function to Replace Null or Missing Values
- groupBy() Function | Group & Summarize DataFrames
- first(), head(), and tail() Functions | Retrieve Data Efficiently
- exceptAll() Function Explained | Subtract and Find Differences Between DataFrames
- How to Use intersect() and intersectAll() | Compare DataFrames Easily
- na Functions and isEmpty Explained with Examples
- How to Use cube() for GroupBy and Aggregations
- How to Aggregate Data Using agg() Function
- How to Use colRegex() for Column Selection
- How to Get Column Names Using the columns Attribute
- How to Use Pivot Function | Transform and Summarize Data Easily
- How to Perform Unpivot | Convert Columns to Rows Easily
- How to Use createTempView | Run SQL Queries on DataFrames
- How to Use createGlobalTempView | Share Views Across Sessions
- Melt Function Explained | Reshape & Unpivot DataFrames Step by Step
- How to Use dtypes | Get DataFrame Column Names and Types
- How to Use randomSplit | Split Your DataFrame into Train and Test Sets
- DataFrame.to Function | Schema Reconciliation & Column Reordering Made Easy
- take Function | Get First N Rows from DataFrame Fast
- How to Use summary() to Get Full DataFrame Statistics in One Command
- freqItems() Function | Identify Frequent Items in Columns Fast
- How to Use rollup() to Aggregate Data by Groups and Subtotals
- How to Use crosstab() to Analyze Relationships Between Columns
- How to Use unionByName() to Join DataFrames by Column Names
- How to Use sample() to Randomly Select Data
- How to Use withColumnRenamed() to Rename Columns
- How to Use withColumnsRenamed() to Rename Multiple Columns
- stat Function Tutorial | Perform Statistical Analysis on DataFrames
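
To preview how several of Chapter 4's operations compose, here is a minimal sketch on toy data (the column names and values are invented for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("A", 10), ("A", None), ("B", 7)], ["grp", "val"]
)

# fillna() replaces nulls, groupBy().agg() summarizes, orderBy() sorts.
(df.fillna({"val": 0})
   .groupBy("grp")
   .agg(F.sum("val").alias("total"))
   .orderBy(F.col("total").desc())
   .show())
```
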
Chapter 5: DataFrame Performance and Optimization
- How to Access a DataFrame Column in PySpark Using Dot Notation or Square Brackets
- How to Use approxQuantile() in PySpark | Quick Guide to Percentiles & Median
- How to Use Cache in PySpark to Improve Spark Performance
- checkpoint(): Improve Fault Tolerance & Speed in Spark Jobs | Tutorial for Beginners
- coalesce() Function Tutorial - Optimize Partitioning for Faster Spark Jobs
- repartition() Function Tutorial: Optimize Data Partitioning for Better Performance
- collect() Function Tutorial: Retrieve the Entire DataFrame to the Driver, with Examples
- How to Use hint() for Join Optimization – Broadcast, Shuffle, Merge
- localCheckpoint Explained | Improve Performance with localCheckpoint()
- How to Use persist() Function | Cache vs Persist Explained with Examples
- Optimize Your Data with repartitionByRange()
- unpersist() Explained | How to Free Cached Memory, with Examples
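
A minimal sketch of Chapter 5's caching and partitioning ideas, assuming a local SparkSession:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)

# cache() marks the DataFrame for in-memory reuse; the first action fills it.
df.cache()
df.count()

# repartition() can widen or shrink via a full shuffle;
# coalesce() only narrows, avoiding a shuffle.
print(df.repartition(8).rdd.getNumPartitions())  # 8
print(df.coalesce(1).rdd.getNumPartitions())     # 1

df.unpersist()  # release the cached blocks when finished
```
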
Chapter 6: DataFrame Analysis
- How to Use corr() in PySpark: Finding Correlation Between Columns
- cov() Tutorial | Covariance Analysis for Numerical Columns
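
A short sketch of both Chapter 6 functions on invented numeric data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1.0, 2.1), (2.0, 4.0), (3.0, 6.2)], ["x", "y"]
)

# Both are also reachable via df.stat, covered in Chapter 4.
print(df.corr("x", "y"))  # Pearson correlation coefficient
print(df.cov("x", "y"))   # sample covariance
```
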
Chapter 7: DataFrame Joins
- How to Use crossJoin() Function for Cartesian Product
- Joins Explained | Inner, Left, Right Join with Examples in PySpark DataFrames
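
A minimal sketch contrasting Chapter 7's join types, on hypothetical employee/department data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

emp = spark.createDataFrame(
    [(1, "Ana", 10), (2, "Bo", 20)], ["id", "name", "dept_id"]
)
dept = spark.createDataFrame([(10, "Sales")], ["dept_id", "dept"])

emp.join(dept, on="dept_id", how="inner").show()  # matching rows only
emp.join(dept, on="dept_id", how="left").show()   # keep every employee
emp.crossJoin(dept).show()                        # full Cartesian product
```
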
Chapter 8: RDDs in PySpark
- How to Convert PySpark DataFrame to RDD Using .rdd | PySpark RDD vs DataFrame
- What is RDD in PySpark? | A Beginner’s Guide to Apache Spark’s Core Data Structure
- Different Ways to Create an RDD in PySpark
- What Are RDD Partitions in PySpark? | How Spark Partitioning Works
- PySpark PairRDD Transformations | groupByKey, reduceByKey, sortByKey Explained with Real Data
- Understanding RDD Actions in PySpark - collect() vs count() vs reduce()
- RDD Persistence in PySpark Explained | MEMORY_ONLY vs MEMORY_AND_DISK with Examples
- Optimize Spark Shuffles: Internals of groupByKey vs reduceByKey
- Boost PySpark Performance with Broadcast Variables & Accumulators
- RDD vs DataFrame in PySpark – Key Differences with Real Examples
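
A small sketch of Chapter 8's RDD ideas; the key/value pairs are invented:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3)], numSlices=2)

# reduceByKey pre-aggregates within each partition before shuffling,
# which is why it generally beats groupByKey on wide data.
print(rdd.reduceByKey(lambda x, y: x + y).collect())  # [('a', 4), ('b', 2)]

# A DataFrame exposes its underlying RDD via .rdd.
print(spark.range(3).rdd.map(lambda row: row.id).collect())  # [0, 1, 2]
```
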
Chapter 9: Working with Functions
- PySpark foreach() Function Tutorial | Apply Custom Logic to Each Row in a DataFrame
- PySpark foreachPartition() Explained | Process DataFrame Partitions Efficiently with Examples
- How to Use call_function() in PySpark – Dynamically Apply SQL Functions in Your Code
- How to Use call_udf() in PySpark | Dynamically Apply UDFs in Real-Time
- Boost PySpark Performance with pandas_udf
- Create Custom Column Logic in PySpark Using udf() | Easy Guide with Real Examples
- How to Build User-Defined Table Functions (UDTFs) in PySpark | Split Rows
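
A minimal sketch of udf() and foreach() from the list above; the lambda logic is illustrative, not from the videos:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# udf() wraps arbitrary Python logic so it can run per row.
shout = F.udf(lambda s: s.upper() + "!", StringType())

df = spark.createDataFrame([("hello",), ("spark",)], ["word"])
df.withColumn("loud", shout("word")).show()

# foreach() applies a Python function to each Row for its side effects
# (output lands in executor logs on a cluster; prints locally in local mode).
df.foreach(lambda row: print(row.word))
```
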
Chapter 10: SQL and Metadata
- PySpark inputFiles() Function | How to Get Source File Paths from a DataFrame
- PySpark isLocal() Function: Check If DataFrame Operations Run Locally
- What is explain() in PySpark? | Spark Logical vs Physical Plan | Tutorial for Beginners
- PySpark offset() Function: How to Skip Rows in a Spark DataFrame
- Explore Databases and Tables in PySpark with spark.catalog | Guide to Metadata Management
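
A quick sketch touching Chapter 10's metadata and plan-inspection calls (offset() requires Spark 3.4+):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

df.explain()         # print the physical plan (pass mode="extended" for more)
print(df.isLocal())  # True if collect()/take() can run without executors
df.offset(3).show()  # skip the first three rows (Spark 3.4+)

# spark.catalog exposes metadata: databases, tables, functions.
for db in spark.catalog.listDatabases():
    print(db.name)
```
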
Chapter 11: Sorting and Data Types
- PySpark Sorting Explained | ASC vs DESC | Handling NULLs with asc_nulls_first & desc_nulls_last
- PySpark cast() vs astype() Explained | Convert String to Int, Float & Double in DataFrame
- Core Data Types in PySpark Explained - IntegerType, FloatType, DoubleType, DecimalType, StringType
- PySpark Complex Data Types Explained : ArrayType, MapType, StructType & StructField for Beginners
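
A minimal sketch of Chapter 11's casting and null-aware sorting on toy data:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2", None), ("10", "b"), ("1", "a")], ["num", "tag"]
)

# cast() (alias astype()) converts types; as strings, "10" < "2" sorts wrong.
df = df.withColumn("num_int", F.col("num").cast("int"))

# Control where nulls land when sorting.
df.orderBy(F.col("tag").asc_nulls_first()).show()
df.orderBy(F.col("num_int").desc()).show()
```
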
Chapter 12: Nulls and Complex Types
- PySpark Null & Comparison Functions: between(), isNull(), isin(), like(), rlike(), ilike()
- PySpark when() and otherwise() Explained | Apply If-Else Conditions to DataFrames
- PySpark String Functions Explained | contains(), startswith(), substr(), endswith()
- Working with Structs and Nested Fields in PySpark | getField(), getItem(), withField(), dropFields()
- PySpark Date and Timestamp Types - DateType, TimestampType, Interval Types
- PySpark Array Functions: array(), array_contains(), sort_array(), array_size()
- PySpark Set-Like Array Functions: arrays_overlap(), array_union(), flatten(), array_distinct()
- Advanced Array Manipulations in PySpark: slice(), concat(), element_at(), sequence()
- PySpark Map Functions: create_map(), map_keys(), map_concat(), map_values()
- Transforming Arrays and Maps in PySpark | Advanced Functions: transform(), filter(), zip_with()
- Flatten Arrays & Structs with explode(), inline(), and struct()
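
A short sketch combining when()/otherwise(), isNull(), and explode() from Chapter 12, on invented rows:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, ["a", "b"]), (2, None)], ["id", "tags"])

# when()/otherwise() is the DataFrame if-else; isNull() tests for nulls.
df = df.withColumn(
    "status", F.when(F.col("tags").isNull(), "missing").otherwise("present")
)

# explode() emits one row per array element (rows with null arrays drop out).
df.select("id", "status", F.explode("tags").alias("tag")).show()
```
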
Chapter 13: Dates and Times
- Top PySpark Built-in DataFrame Functions Explained | col(), lit(), when(), expr(), rand() & More
- Top PySpark Math Functions Explained | abs(), round(), log(), pow(), sin(), degrees() & More
- Getting Current Date & Time in PySpark | curdate(), now(), current_timestamp()
- Date Calculations in PySpark | Add, Subtract, Datediff, Months Between & More
- PySpark Date Formatting & Conversion Tutorial: to_date(), to_timestamp(), unix_timestamp()
- PySpark Date & Time Creation: make_date(), make_timestamp(), make_interval()
- PySpark Date and Time Extraction Tutorial: year(), hour(), dayofweek(), date_part()
- PySpark Date Truncation Functions: trunc(), date_trunc(), last_day()
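
A minimal sketch of Chapter 13's date arithmetic and extraction functions, using one invented date:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = (spark.createDataFrame([("2024-01-15",)], ["d"])
           .withColumn("d", F.to_date("d")))  # string -> DateType

df.select(
    F.current_date().alias("today"),
    F.date_add("d", 30).alias("plus_30"),
    F.datediff(F.current_date(), "d").alias("age_days"),
    F.year("d").alias("yr"),
    F.trunc("d", "month").alias("month_start"),
).show()
```
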
Chapter 14: String Manipulation
- How to Perform String Cleaning in PySpark | lower(), trim(), initcap() Explained with Real Data
- Substring Functions in PySpark: substr(), substring(), overlay(), left(), right()
- String Search in PySpark | contains, startswith, endswith, like, rlike, locate
- String Formatting in PySpark | concat_ws, format_number, printf, repeat, lpad, rpad Explained
- Split Strings in PySpark | split(str, pattern, limit) Function Explained with Examples
- split_part() in PySpark Explained | Split Strings by Delimiter and Extract a Specific Part
- Using find_in_set() in PySpark | Search String Position in a Delimited List
- Extract Data with regexp_extract() in PySpark | Regex Patterns Made Easy
- Clean & Transform Strings in PySpark Using regexp_replace() Replace Text with Regex
- Extract Substrings Easily in PySpark with regexp_substr()
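
A small sketch of Chapter 14's cleaning, splitting, and regex functions on one messy string:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("  Hello,Spark  ",)], ["raw"])

df.select(
    F.initcap(F.trim("raw")).alias("cleaned"),
    F.split(F.trim("raw"), ",").getItem(1).alias("second_token"),
    F.regexp_replace("raw", r"[^A-Za-z,]", "").alias("stripped"),
).show(truncate=False)
```
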
Chapter 15: JSON
- PySpark JSON Functions Explained | How to Parse, Transform & Extract JSON Fields in PySpark
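
A minimal sketch of JSON parsing with from_json() and serialization with to_json(); the schema and sample record are invented:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('{"name": "Ana", "age": 34}',)], ["js"])

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
])

# from_json() parses the string into a struct; read fields with dot notation.
parsed = df.withColumn("obj", F.from_json("js", schema))
parsed.select("obj.name", "obj.age").show()

# to_json() goes the other way, serializing a struct back to a JSON string.
parsed.select(F.to_json("obj").alias("round_trip")).show(truncate=False)
```
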
Chapter 16: Aggregations and Window Functions
- PySpark Aggregations: count(), count_distinct(), first(), last() Explained with Examples
- Statistical Aggregations in PySpark | avg(), mean(), median(), mode()
- Summarizing Data with Aggregate Functions in PySpark: sum(), sum_distinct(), bit_and()
- PySpark Window Functions | rank(), dense_rank(), lead(), lag(), ntile() | Real-World Demo
- Master PySpark Sorting: sort(), asc(), desc() Explained with Examples
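
A short sketch contrasting a grouped aggregate with a window rank from Chapter 16, on invented scores:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("Sales", "Ana", 100), ("Sales", "Bo", 90), ("HR", "Cy", 80)],
    ["dept", "name", "score"],
)

# A grouped aggregate collapses rows; a window function keeps every row.
df.groupBy("dept").agg(F.avg("score").alias("avg_score")).show()

w = Window.partitionBy("dept").orderBy(F.desc("score"))
df.withColumn("rank", F.dense_rank().over(w)).show()
```
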
Chapter 17: Additional DataFrame Functions
- PySpark toLocalIterator Explained: Efficiently Convert DataFrame to Iterator
- PySpark sequence Function Explained: Generate Sequences of Numbers and Dates
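
A minimal sketch of both Chapter 17 functions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# sequence() builds an inclusive range as an array column.
spark.range(1).select(
    F.sequence(F.lit(1), F.lit(5)).alias("nums")
).show()  # [1, 2, 3, 4, 5]

# toLocalIterator() streams rows to the driver partition by partition,
# avoiding the all-at-once memory cost of collect().
for row in spark.range(3).toLocalIterator():
    print(row.id)
```
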