marcelomedre t1_j5xxv7t wrote on January 26, 2023 at 9:25 AM

Reply to [D] Simple Questions Thread by AutoModerator

Hi, I have a question about k-means. I have a data frame with 100 variables left after removing the low-variance and highly correlated ones. I know that the data should be normalized for k-means, especially to remove the dependence on each variable's range, but when I normalize my data the algorithm does not separate the clusters properly. My variables fall into 3 ranges:

0 to 10^4;
-10^3 to 10^3;
0 to 10^3

I have at least 5 very specific clusters that I could characterize without scaling the data, but I am not comfortable with this procedure.

I couldn’t find a reasonable explanation for why the algorithm performs better on the non-scaled data than on the scaled data.
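
To make the range dependency concrete, here is a minimal sketch on made-up data, not my real data frame. It assumes three hypothetical variables (`wide`, `signed`, `narrow`) drawn uniformly from the three ranges above, and uses each variable's share of the total variance as a rough proxy for how much it drives the squared Euclidean distance that k-means minimizes. The names and the uniform distribution are assumptions for illustration only.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-ins for the three ranges in the post (not the real data).
ranges = {
    "wide": (0.0, 1e4),
    "signed": (-1e3, 1e3),
    "narrow": (0.0, 1e3),
}
n = 1000
data = {
    name: [random.uniform(lo, hi) for _ in range(n)]
    for name, (lo, hi) in ranges.items()
}


def zscore(values):
    """Standardize to zero mean and unit (population) variance."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def variance_share(columns):
    """Fraction of the total variance carried by each column.

    k-means minimizes squared Euclidean distance, so this share is a
    rough proxy for how much each variable influences the clustering.
    """
    var = {name: statistics.pvariance(vals) for name, vals in columns.items()}
    total = sum(var.values())
    return {name: v / total for name, v in var.items()}


raw_share = variance_share(data)
scaled_share = variance_share({name: zscore(vals) for name, vals in data.items()})

for name in ranges:
    print(f"{name:>7}: raw={raw_share[name]:.3f} scaled={scaled_share[name]:.3f}")
```

In this toy setup the widest variable carries almost all of the variance before scaling, while after z-scoring every variable carries an equal share. So scaled and non-scaled k-means are effectively looking at different variables, which is at least consistent with getting different clusters.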