marcelomedre

marcelomedre t1_j5xxv7t wrote

Hi, I have a question about k-means. I have a data frame with 100 variables after removing low variance and high correlated ones. I know that the data must be normalized for the kmeans, specially to remove the range dependency, but I am facing a problem that if I do normalize my data the algorithm is not properly separating the clusters. I have 3 variables ranges in my data:

  • 0-10^4;
  • -10^3 - 10^3;
  • 0 - 10^3

I have at least 5 very specific clusters that I could characterize by not scaling the data, but I am not comfortable with this procedure.

I couldn’t find a reasonable explanation with is the algorithm performing better in non-scaled data instead of the scaled one.

1