Hi, I have a question about k-means.
I have a data frame with 100 variables after removing low-variance and highly correlated ones.
I know that data should be normalized for k-means, especially to remove the dependence on each variable's range. My problem is that when I do normalize my data, the algorithm does not separate the clusters properly.
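To make the range dependency concrete, here is a minimal sketch with two made-up points and two features whose ranges are assumed to be 0 to 10^4 and 0 to 10^3 (the values are illustrative, not taken from the data described here). It computes how much of the squared Euclidean distance, the quantity standard k-means works with, comes from each feature, before and after dividing each feature by its assumed range width.

```python
def squared_contributions(a, b):
    """Per-feature contribution to the squared Euclidean distance between a and b."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


def share_of_first_feature(a, b):
    """Fraction of the squared distance that comes from feature 0."""
    parts = squared_contributions(a, b)
    return parts[0] / sum(parts)


# Hypothetical points: feature 0 assumed to span 0..10^4, feature 1 assumed to span 0..10^3.
p = [2000.0, 100.0]
q = [6000.0, 900.0]  # feature 0 moves 40% of its range, feature 1 moves 80% of its range

raw_share = share_of_first_feature(p, q)

# Divide each feature by its assumed range width (min-max style scaling with min = 0).
widths = [1e4, 1e3]
p_scaled = [x / w for x, w in zip(p, widths)]
q_scaled = [x / w for x, w in zip(q, widths)]
scaled_share = share_of_first_feature(p_scaled, q_scaled)

print(f"feature 0 share, raw:    {raw_share:.3f}")
print(f"feature 0 share, scaled: {scaled_share:.3f}")
```

On the raw values feature 0 accounts for about 96% of the squared distance even though feature 1 changed more relative to its own range; after scaling, feature 0 accounts for about 20%.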
My variables fall into three ranges:
0 to 10^4;
-10^3 to 10^3;
0 to 10^3
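For reference, the usual normalization before k-means is per-column z-scoring: subtract the column mean and divide by the column standard deviation. Below is a minimal pure-Python sketch, assuming a hypothetical three-column table whose columns roughly follow the three ranges listed above; in practice a library scaler such as scikit-learn's `StandardScaler` performs the same transformation.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and (population) standard deviation 1.

    rows: list of equal-length lists of numbers. Constant columns are mapped
    to 0 instead of being divided by a zero standard deviation.
    """
    scaled_columns = []
    for col in zip(*rows):
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        if std == 0:
            scaled_columns.append([0.0 for _ in col])
        else:
            scaled_columns.append([(x - mean) / std for x in col])
    # Transpose back to row-major order.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical rows: columns roughly span 0..10^4, -10^3..10^3 and 0..10^3.
data = [
    [9500.0, -800.0, 120.0],
    [300.0, 950.0, 880.0],
    [4700.0, 10.0, 450.0],
    [8100.0, -400.0, 60.0],
]

scaled = zscore_columns(data)
for row in scaled:
    print([round(v, 3) for v in row])
```

Because every column ends up with the same spread, each variable carries roughly equal weight in the distance after scaling, whether or not it carries much cluster structure.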
Without scaling the data, I get at least 5 well-defined clusters that I can characterize, but I am not comfortable with this procedure.
I couldn’t find a reasonable explanation for why the algorithm performs better on the unscaled data than on the scaled data.
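One thing that may be worth checking: the k-means objective (within-cluster sum of squares) is a sum over columns, so on unscaled data the columns with the largest variance dominate it. The sketch below uses a hypothetical helper, `variance_shares`, on made-up rows to report each column's share of the total variance. If one or two wide-range columns take nearly all of it, unscaled k-means is effectively clustering on those columns alone, which could be one reason it looks cleaner than k-means on all standardized columns.

```python
import statistics


def variance_shares(rows):
    """Return each column's share of the summed (population) variance.

    Standard k-means minimizes the within-cluster sum of squares, which adds
    up column by column, so columns with a large share dominate the result.
    """
    variances = [statistics.pvariance(col) for col in zip(*rows)]
    total = sum(variances)
    return [v / total for v in variances]


# Hypothetical rows with columns spanning roughly 0..10^4, -10^3..10^3 and 0..10^3.
data = [
    [9500.0, -800.0, 120.0],
    [300.0, 950.0, 880.0],
    [4700.0, 10.0, 450.0],
    [8100.0, -400.0, 60.0],
]

shares = variance_shares(data)
for i, share in enumerate(shares):
    print(f"column {i}: {share:.1%} of total variance")
```

Running the same check on the real 100 columns would show whether the unscaled clusters are driven by just a handful of variables.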
marcelomedre wrote (reply to [D] Simple Questions Thread by AutoModerator)