
stackoverflow.com
https://stackoverflow.com/questions/33224740/best-…
Best way to get the max value in a Spark dataframe column
Remark: Spark is designed for distributed computing on big data. The example DataFrame is very small, so the relative speed ranking of the methods on real-life data may differ from the ranking seen in this small example. Slowest: Method_1, because .describe("A") calculates min, max, mean, stddev, and count (5 calculations over the whole column). Medium: Method_4, because .rdd (DF to RDD transformation) slows ...