Show distinct column values in pyspark dataframe [message #1860697]
Sat, 26 August 2023 01:14
Eclipse User
With a PySpark DataFrame, how do I do the equivalent of Pandas' df['col'].unique()?
I want to list all the unique values in a PySpark DataFrame column.
Not the SQL way (registerTempTable, then a SQL query for the distinct values).
Also, I don't need groupBy followed by countDistinct; instead I want to check the distinct VALUES in that column.
Re: Show distinct column values in pyspark dataframe [message #1860708 is a reply to message #1860697]
Mon, 28 August 2023 07:29
Eclipse User
To list all the unique values in a PySpark DataFrame column, use df.select('col').distinct(). Note that distinct() is a DataFrame method, not a Column method, so df['col'].distinct() raises an error; you first select the column, then call distinct() on the resulting single-column DataFrame.
For example, the following code lists all the unique values in the col column of the df DataFrame:
df = spark.createDataFrame([('a', 1), ('b', 2), ('c', 3)], ['col', 'val'])
unique_values = df.select('col').distinct()
unique_values.show()
This code first creates a DataFrame called df with two columns, col and val. The col column contains the values a, b, and c; the val column contains the values 1, 2, and 3.
The code then calls df.select('col').distinct() to get a new DataFrame containing only the unique values of the col column, and finally displays them with show(). (Plain print(unique_values) would only print the DataFrame's schema, not its rows.) The output will look like the following, though row order is not guaranteed:
+---+
|col|
+---+
|  a|
|  b|
|  c|
+---+
This means that the col column of the df DataFrame has three unique values: a, b, and c.