split_part() in PySpark Explained | Split Strings by Delimiter and Extract a Specific Part

split_part() in PySpark | Extract String Parts by Delimiter

In this tutorial, you'll learn how to use the split_part() function in PySpark to split a string column by a delimiter and extract one specific part, such as pulling the username from an email address or the ZIP code from a location string.
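To make the rules concrete before touching Spark, here is a rough pure-Python model of the behavior Spark documents for split_part: parts are numbered from 1, a negative number counts back from the end, an out-of-range number returns an empty string, 0 is an error, any null input gives null, and an empty delimiter leaves the string unsplit. The helper name split_part_model is hypothetical; it is only an approximation for illustration and is not part of PySpark.

```python
from typing import Optional


def split_part_model(src: Optional[str], delimiter: Optional[str],
                     part_num: Optional[int]) -> Optional[str]:
    """Hypothetical pure-Python approximation of Spark's split_part rules."""
    # Any null (None) input produces a null result.
    if src is None or delimiter is None or part_num is None:
        return None
    # Part numbers are 1-based; Spark raises an error for 0, so we do too.
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    # The delimiter is matched literally (not as a regex); an empty
    # delimiter leaves the whole string as a single part.
    parts = [src] if delimiter == "" else src.split(delimiter)
    # Negative numbers count backward from the end: -1 is the last part.
    index = part_num - 1 if part_num > 0 else len(parts) + part_num
    # Out-of-range part numbers return an empty string instead of failing.
    if index < 0 or index >= len(parts):
        return ""
    return parts[index]


print(split_part_model("john.doe@example.com", "@", 1))  # john.doe
print(split_part_model("NY-10001-USA", "-", -1))         # USA
print(repr(split_part_model("NY-10001-USA", "-", 4)))    # ''
```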

📘 Sample Data

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists

data = [
    ("john.doe@example.com", "NY-10001-USA"),
    ("alice.smith@domain.org", "CA-90001-USA"),
    ("bob99@company.net", "TX-73301-USA")
]

columns = ["email", "location"]

df = spark.createDataFrame(data, columns)
df.show(truncate=False)  # truncate=False prints full values instead of cutting them at 20 characters

Output:

+----------------------+------------+
|email                 |location    |
+----------------------+------------+
|john.doe@example.com  |NY-10001-USA|
|alice.smith@domain.org|CA-90001-USA|
|bob99@company.net     |TX-73301-USA|
+----------------------+------------+

📍 Extract Specific Parts Using split_part()

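split_part() is available in pyspark.sql.functions from PySpark 3.5.0. Its delimiter and part-number arguments must be Columns (a plain string is treated as a column name), so literal values are wrapped in lit(). Part numbers start at 1.

from pyspark.sql.functions import split_part, col, lit

# Split email on "@" and location on "-", keeping the requested 1-based part
df = df.withColumn("username", split_part(col("email"), lit("@"), lit(1))) \
       .withColumn("domain", split_part(col("email"), lit("@"), lit(2))) \
       .withColumn("state", split_part(col("location"), lit("-"), lit(1))) \
       .withColumn("zip", split_part(col("location"), lit("-"), lit(2))) \
       .withColumn("country", split_part(col("location"), lit("-"), lit(3)))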

df.select("email", "username", "domain", "location", "state", "zip", "country").show(truncate=False)

Output:

+----------------------+-----------+-----------+------------+-----+-----+-------+
|email                 |username   |domain     |location    |state|zip  |country|
+----------------------+-----------+-----------+------------+-----+-----+-------+
|john.doe@example.com  |john.doe   |example.com|NY-10001-USA|NY   |10001|USA    |
|alice.smith@domain.org|alice.smith|domain.org |CA-90001-USA|CA   |90001|USA    |
|bob99@company.net     |bob99      |company.net|TX-73301-USA|TX   |73301|USA    |
+----------------------+-----------+-----------+------------+-----+-----+-------+

Some of the content on this website was created with assistance from ChatGPT and Gemini.