Re: Show distinct column values in pyspark dataframe [message #1860708 is a reply to message #1860697] |
Mon, 28 August 2023 11:29 |
Ashley coder Messages: 11 Registered: October 2022 |
Junior Member |
To list all the unique values in a PySpark DataFrame column, select the column and call distinct() on the result: df.select('col').distinct(). This returns a new single-column DataFrame containing only the unique values. Note that distinct() is a DataFrame method. df['col'] returns a Column object, so df['col'].distinct() does not work.
For example, the following code lists the unique values in the col column of the df DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuse the active session or start one
df = spark.createDataFrame([('a', 1), ('b', 2), ('c', 3)], ['col', 'val'])
unique_values = df.select('col').distinct()  # DataFrame with one column, duplicates removed
unique_values.show()
This code first creates a DataFrame called df with two columns, col and val. The col column contains the values a, b, and c. The val column contains the values 1, 2, and 3.
Then the code calls df.select('col').distinct() to get a new DataFrame holding only the unique values of col. Finally, show() prints that DataFrame to the console. Using print(unique_values) would only print the DataFrame's schema summary, not its rows.
The output will look like the following (row order is not guaranteed):
+---+
|col|
+---+
|  a|
|  b|
|  c|
+---+
This means that the col column of the df DataFrame has three unique values: a, b, and c.