PySpark Video Tutorial Series
The Complete Series: Beginner to Advanced
Chapter 3: Reading and Writing DataFrames
- How to Read CSV File into DataFrame from Azure Blob Storage
- How to Read JSON File into DataFrame from Azure Blob Storage
- How to Use toJSON() - Convert DataFrame Rows to JSON Strings
- writeTo() Explained: Save, Append, Overwrite DataFrames to Tables
- How to Use the write() Function to Create a Single CSV File in Blob Storage from a DataFrame
- How to Write DataFrame to Parquet File in Azure Blob Storage
- How to Write DataFrame to JSON File in Azure Blob Storage
- How to Write DataFrame to Azure SQL Table Using JDBC write() Function
- How to Read Data from Azure SQL Table and Write to JSON File in Blob Storage
- Read Multiple CSV Files from Blob Storage and Write to Azure SQL Table with Filenames
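
The snippet below is a minimal sketch of this chapter's core read/write pattern. The storage account, container, and paths are hypothetical placeholders, and the Blob Storage credential configuration is omitted:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("io-demo").getOrCreate()

# Read a CSV from Blob Storage into a DataFrame (path is a placeholder).
df = spark.read.csv(
    "wasbs://container@account.blob.core.windows.net/in/data.csv",
    header=True,
    inferSchema=True,
)

# Write back out as Parquet; coalesce(1) produces a single output file.
(df.coalesce(1)
   .write.mode("overwrite")
   .parquet("wasbs://container@account.blob.core.windows.net/out/"))
```
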
Chapter 4: Advanced DataFrame Operations
- How to Use the display() Function in Databricks
- How to Sort DataFrames Using orderBy()
- limit() Function to Display Limited Rows
- How to Use describe() for DataFrame Statistics
- union() vs unionAll() | Understanding the Difference
- fillna() Function to Replace Null or Missing Values
- groupBy() Function | Group & Summarize DataFrames
- first(), head(), and tail() Functions | Retrieve Data Efficiently
- exceptAll() Function Explained | Subtract and Find Differences Between DataFrames
- How to Use intersect() and intersectAll() | Compare DataFrames Easily
- na Functions and isEmpty Explained with Examples
- How to Use cube() for GroupBy and Aggregations
- How to Aggregate Data Using agg() Function
- How to Use colRegex() for Column Selection
- How to Get Column Names Using the columns Attribute
- How to Use Pivot Function | Transform and Summarize Data Easily
- How to Perform Unpivot | Convert Columns to Rows Easily
- How to Use createTempView | Run SQL Queries on DataFrames
- How to Use createGlobalTempView | Share Views Across Sessions
- Melt Function Explained | Reshape & Unpivot DataFrames Step by Step
- How to Use dtypes | Get DataFrame Column Names and Types
- How to Use randomSplit | Split Your DataFrame into Train and Test Sets
- DataFrame.to Function | Schema Reconciliation & Column Reordering Made Easy
- take Function | Get First N Rows from DataFrame Fast
- How to Use summary() to Get Full DataFrame Statistics in One Command
- freqItems() Function | Identify Frequent Items in Columns Fast
- How to Use rollup() to Aggregate Data by Groups and Subtotals
- How to Use crosstab() to Analyze Relationships Between Columns
- How to Use unionByName() to Join DataFrames by Column Names
- How to Use sample() to Randomly Select Data
- How to Use withColumnRenamed() to Rename Columns
- How to Use withColumnsRenamed() to Rename Multiple Columns
- stat Function Tutorial | Perform Statistical Analysis on DataFrames
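
To preview how several of Chapter 4's operations compose, here is a minimal sketch on toy data (the column names and values are invented for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("A", 10), ("A", None), ("B", 7)], ["grp", "val"]
)

# fillna() replaces nulls, groupBy().agg() summarizes, orderBy() sorts.
(df.fillna({"val": 0})
   .groupBy("grp")
   .agg(F.sum("val").alias("total"))
   .orderBy(F.col("total").desc())
   .show())
```
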
Chapter 5: DataFrame Performance and Optimization
- How to Access a DataFrame Column in PySpark Using Dot Notation or Square Brackets
- How to Use approxQuantile() in PySpark | Quick Guide to Percentiles & Median
- How to Use Cache in PySpark to Improve Spark Performance
- checkpoint(): Improve Fault Tolerance & Speed in Spark Jobs | Tutorial for Beginners
- coalesce() Function Tutorial - Optimize Partitioning for Faster Spark Jobs
- repartition() Function Tutorial: Optimize Data Partitioning for Better Performance
- collect() Function Tutorial: Retrieve the Entire DataFrame to the Driver, with Examples
- How to Use hint() for Join Optimization – Broadcast, Shuffle, Merge
- localCheckpoint Explained | Improve Performance with localCheckpoint()
- How to Use persist() Function | Cache vs Persist Explained with Examples
- Optimize Your Data with repartitionByRange()
- unpersist() Explained | How to Free Cached Memory, with Examples
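
A minimal sketch of Chapter 5's caching and partitioning ideas, assuming a local SparkSession:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)

# cache() marks the DataFrame for in-memory reuse; the first action fills it.
df.cache()
df.count()

# repartition() can widen or shrink via a full shuffle;
# coalesce() only narrows, avoiding a shuffle.
print(df.repartition(8).rdd.getNumPartitions())  # 8
print(df.coalesce(1).rdd.getNumPartitions())     # 1

df.unpersist()  # release the cached blocks when finished
```
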
Chapter 6: DataFrame Analysis
- How to Use corr() in PySpark: Finding Correlation Between Columns
- cov() Tutorial | Covariance Analysis for Numerical Columns
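
A short sketch of both Chapter 6 functions on invented numeric data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1.0, 2.1), (2.0, 4.0), (3.0, 6.2)], ["x", "y"]
)

# Both are also reachable via df.stat, covered in Chapter 4.
print(df.corr("x", "y"))  # Pearson correlation coefficient
print(df.cov("x", "y"))   # sample covariance
```
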
Chapter 7: DataFrame Joins
- How to Use crossJoin() Function for Cartesian Product
- Joins Explained | Inner, Left, Right Join with Examples in PySpark DataFrames
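
A minimal sketch contrasting Chapter 7's join types, on hypothetical employee/department data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

emp = spark.createDataFrame(
    [(1, "Ana", 10), (2, "Bo", 20)], ["id", "name", "dept_id"]
)
dept = spark.createDataFrame([(10, "Sales")], ["dept_id", "dept"])

emp.join(dept, on="dept_id", how="inner").show()  # matching rows only
emp.join(dept, on="dept_id", how="left").show()   # keep every employee
emp.crossJoin(dept).show()                        # full Cartesian product
```
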
Chapter 8: RDDs in PySpark
- How to Convert PySpark DataFrame to RDD Using .rdd | PySpark RDD vs DataFrame
- What is RDD in PySpark? | A Beginner’s Guide to Apache Spark’s Core Data Structure
- Different Ways to Create an RDD in PySpark
- What Are RDD Partitions in PySpark? | How Spark Partitioning Works
- PySpark PairRDD Transformations | groupByKey, reduceByKey, sortByKey Explained with Real Data
- Understanding RDD Actions in PySpark - collect() vs count() vs reduce()
- RDD Persistence in PySpark Explained | MEMORY_ONLY vs MEMORY_AND_DISK with Examples
- Optimize Spark Shuffles: Internals of groupByKey vs reduceByKey
- Boost PySpark Performance with Broadcast Variables & Accumulators
- RDD vs DataFrame in PySpark – Key Differences with Real Examples
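
A small sketch of Chapter 8's RDD ideas; the key/value pairs are invented:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3)], numSlices=2)

# reduceByKey pre-aggregates within each partition before shuffling,
# which is why it generally beats groupByKey on wide data.
print(rdd.reduceByKey(lambda x, y: x + y).collect())  # [('a', 4), ('b', 2)]

# A DataFrame exposes its underlying RDD via .rdd.
print(spark.range(3).rdd.map(lambda row: row.id).collect())  # [0, 1, 2]
```
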
Chapter 9: Working with Functions
- PySpark foreach() Function Tutorial | Apply Custom Logic to Each Row in a DataFrame
- PySpark foreachPartition() Explained | Process DataFrame Partitions Efficiently with Examples
- How to Use call_function() in PySpark – Dynamically Apply SQL Functions in Your Code
- How to Use call_udf() in PySpark | Dynamically Apply UDFs in Real-Time
- Boost PySpark Performance with pandas_udf
- Create Custom Column Logic in PySpark Using udf() | Easy Guide with Real Examples
- How to Build User-Defined Table Functions (UDTFs) in PySpark | Split Rows
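
A minimal sketch of udf() and foreach() from the list above; the lambda logic is illustrative, not from the videos:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# udf() wraps arbitrary Python logic so it can run per row.
shout = F.udf(lambda s: s.upper() + "!", StringType())

df = spark.createDataFrame([("hello",), ("spark",)], ["word"])
df.withColumn("loud", shout("word")).show()

# foreach() applies a Python function to each Row for its side effects
# (output lands in executor logs on a cluster; prints locally in local mode).
df.foreach(lambda row: print(row.word))
```
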
Chapter 10: SQL and Metadata
- PySpark inputFiles() Function | How to Get Source File Paths from a DataFrame
- PySpark isLocal() Function: Check If DataFrame Operations Run Locally
- What is explain() in PySpark? | Spark Logical vs Physical Plan | Tutorial for Beginners
- PySpark offset() Function: How to Skip Rows in a Spark DataFrame
- Explore Databases and Tables in PySpark with spark.catalog | Guide to Metadata Management
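
A quick sketch touching Chapter 10's metadata and plan-inspection calls (offset() requires Spark 3.4+):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

df.explain()         # print the physical plan (pass mode="extended" for more)
print(df.isLocal())  # True if collect()/take() can run without executors
df.offset(3).show()  # skip the first three rows (Spark 3.4+)

# spark.catalog exposes metadata: databases, tables, functions.
for db in spark.catalog.listDatabases():
    print(db.name)
```
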
Chapter 11: Sorting and Data Types
- PySpark Sorting Explained | ASC vs DESC | Handling NULLs with asc_nulls_first & desc_nulls_last
- PySpark cast() vs astype() Explained | Convert String to Int, Float & Double in DataFrame
- Core Data Types in PySpark Explained - IntegerType, FloatType, DoubleType, DecimalType, StringType
- PySpark Complex Data Types Explained : ArrayType, MapType, StructType & StructField for Beginners
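
A minimal sketch of Chapter 11's casting and null-aware sorting on toy data:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2", None), ("10", "b"), ("1", "a")], ["num", "tag"]
)

# cast() (alias astype()) converts types; as strings, "10" < "2" sorts wrong.
df = df.withColumn("num_int", F.col("num").cast("int"))

# Control where nulls land when sorting.
df.orderBy(F.col("tag").asc_nulls_first()).show()
df.orderBy(F.col("num_int").desc()).show()
```
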
Chapter 12: Nulls and Complex Types
- PySpark Null & Comparison Functions: between(), isNull(), isin(), like(), rlike(), ilike()
- PySpark when() and otherwise() Explained | Apply If-Else Conditions to DataFrames
- PySpark String Functions Explained | contains(), startswith(), substr(), endswith()
- Working with Structs and Nested Fields in PySpark | getField(), getItem(), withField(), dropFields()
- PySpark Date and Timestamp Types - DateType, TimestampType, Interval Types
- PySpark Array Functions: array(), array_contains(), sort_array(), array_size()
- PySpark Set-Like Array Functions: arrays_overlap(), array_union(), flatten(), array_distinct()
- Advanced Array Manipulations in PySpark: slice(), concat(), element_at(), sequence()
- PySpark Map Functions: create_map(), map_keys(), map_concat(), map_values()
- Transforming Arrays and Maps in PySpark | Advanced Functions: transform(), filter(), zip_with()
- Flatten Arrays & Structs with explode(), inline(), and struct()
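
A short sketch combining when()/otherwise(), isNull(), and explode() from Chapter 12, on invented rows:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, ["a", "b"]), (2, None)], ["id", "tags"])

# when()/otherwise() is the DataFrame if-else; isNull() tests for nulls.
df = df.withColumn(
    "status", F.when(F.col("tags").isNull(), "missing").otherwise("present")
)

# explode() emits one row per array element (rows with null arrays drop out).
df.select("id", "status", F.explode("tags").alias("tag")).show()
```
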
Chapter 13: Dates and Times
- Top PySpark Built-in DataFrame Functions Explained | col(), lit(), when(), expr(), rand() & More
- Top PySpark Math Functions Explained | abs(), round(), log(), pow(), sin(), degrees() & More
- Getting Current Date & Time in PySpark | curdate(), now(), current_timestamp()
- Date Calculations in PySpark | Add, Subtract, Datediff, Months Between & More
- PySpark Date Formatting & Conversion Tutorial: to_date(), to_timestamp(), unix_timestamp()
- PySpark Date & Time Creation: make_date(), make_timestamp(), make_interval()
- PySpark Date and Time Extraction Tutorial: year(), hour(), dayofweek(), date_part()
- PySpark Date Truncation Functions: trunc(), date_trunc(), last_day()
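
A minimal sketch of Chapter 13's date arithmetic and extraction functions, using one invented date:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = (spark.createDataFrame([("2024-01-15",)], ["d"])
           .withColumn("d", F.to_date("d")))  # string -> DateType

df.select(
    F.current_date().alias("today"),
    F.date_add("d", 30).alias("plus_30"),
    F.datediff(F.current_date(), "d").alias("age_days"),
    F.year("d").alias("yr"),
    F.trunc("d", "month").alias("month_start"),
).show()
```
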
Chapter 14: String Manipulation
- How to Perform String Cleaning in PySpark | lower(), trim(), initcap() Explained with Real Data
- Substring Functions in PySpark: substr(), substring(), overlay(), left(), right()
- String Search in PySpark | contains, startswith, endswith, like, rlike, locate
- String Formatting in PySpark | concat_ws, format_number, printf, repeat, lpad, rpad Explained
- Split Strings in PySpark | split(str, pattern, limit) Function Explained with Examples
- split_part() in PySpark Explained | Split Strings by Delimiter and Extract a Specific Part
- Using find_in_set() in PySpark | Search String Position in a Delimited List
- Extract Data with regexp_extract() in PySpark | Regex Patterns Made Easy
- Clean & Transform Strings in PySpark Using regexp_replace() Replace Text with Regex
- Extract Substrings Easily in PySpark with regexp_substr()
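
A small sketch of Chapter 14's cleaning, splitting, and regex functions on one messy string:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("  Hello,Spark  ",)], ["raw"])

df.select(
    F.initcap(F.trim("raw")).alias("cleaned"),
    F.split(F.trim("raw"), ",").getItem(1).alias("second_token"),
    F.regexp_replace("raw", r"[^A-Za-z,]", "").alias("stripped"),
).show(truncate=False)
```
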
Chapter 15: JSON
- PySpark JSON Functions Explained | How to Parse, Transform & Extract JSON Fields in PySpark
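
A minimal sketch of JSON parsing with from_json() and serialization with to_json(); the schema and sample record are invented:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('{"name": "Ana", "age": 34}',)], ["js"])

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
])

# from_json() parses the string into a struct; read fields with dot notation.
parsed = df.withColumn("obj", F.from_json("js", schema))
parsed.select("obj.name", "obj.age").show()

# to_json() goes the other way, serializing a struct back to a JSON string.
parsed.select(F.to_json("obj").alias("round_trip")).show(truncate=False)
```
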
Chapter 16: Aggregations and Window Functions
- PySpark Aggregations: count(), count_distinct(), first(), last() Explained with Examples
- Statistical Aggregations in PySpark | avg(), mean(), median(), mode()
- Summarizing Data with Aggregate Functions in PySpark: sum(), sum_distinct(), bit_and()
- PySpark Window Functions | rank(), dense_rank(), lead(), lag(), ntile() | Real-World Demo
- Master PySpark Sorting: sort(), asc(), desc() Explained with Examples
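
A short sketch contrasting a grouped aggregate with a window rank from Chapter 16, on invented scores:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("Sales", "Ana", 100), ("Sales", "Bo", 90), ("HR", "Cy", 80)],
    ["dept", "name", "score"],
)

# A grouped aggregate collapses rows; a window function keeps every row.
df.groupBy("dept").agg(F.avg("score").alias("avg_score")).show()

w = Window.partitionBy("dept").orderBy(F.desc("score"))
df.withColumn("rank", F.dense_rank().over(w)).show()
```
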
Chapter 17: Additional DataFrame Functions
- PySpark toLocalIterator Explained: Efficiently Convert DataFrame to Iterator
- PySpark sequence Function Explained: Generate Sequences of Numbers and Dates
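
A minimal sketch of both Chapter 17 functions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# sequence() builds an inclusive range as an array column.
spark.range(1).select(
    F.sequence(F.lit(1), F.lit(5)).alias("nums")
).show()  # [1, 2, 3, 4, 5]

# toLocalIterator() streams rows to the driver partition by partition,
# avoiding the all-at-once memory cost of collect().
for row in spark.range(3).toLocalIterator():
    print(row.id)
```
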