Submitted by AutoModerator t3_10cn8pw in MachineLearning
marcelomedre t1_j5xxv7t wrote
Hi, I have a question about k-means. I have a data frame with 100 variables left after removing the low-variance and highly correlated ones. I know the data should be normalized for k-means, especially to remove the dependence on feature ranges, but when I normalize my data the algorithm does not separate the clusters properly. The variables in my data fall into 3 ranges:
- 0 to 10^4;
- -10^3 to 10^3;
- 0 to 10^3
Without scaling, I can characterize at least 5 very specific clusters, but I am not comfortable skipping the scaling step.
I couldn't find a reasonable explanation for why the algorithm performs better on the unscaled data than on the scaled data.
trnka t1_j5z5e39 wrote
I've seen that before when the large-range features were the most important ones for the clusters I wanted. Leaving the data unscaled was essentially doing feature weighting, just implicitly through the scales.
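To make the implicit weighting concrete, here is a rough sketch with made-up data (the three feature ranges come from the question; the cluster layout, the noise levels, and the helper `distance_share` are assumptions for illustration only). k-means minimizes squared Euclidean distance, and each feature's share of the total pairwise squared distance is proportional to its variance, so on raw data the widest feature dominates, while after standardization every feature counts equally.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: only the widest feature carries cluster structure.
labels = rng.integers(0, 2, size=n)
wide = np.where(labels == 0, 2000.0, 8000.0) + rng.normal(0, 300, n)  # roughly 0 to 10^4
mid = rng.uniform(-1000, 1000, n)                                     # -10^3 to 10^3, pure noise
narrow = rng.uniform(0, 1000, n)                                      # 0 to 10^3, pure noise
X = np.column_stack([wide, mid, narrow])


def distance_share(data):
    """Share of the total pairwise squared Euclidean distance per feature.

    The sum over all pairs of (x_i - x_j)^2 is proportional to the
    feature's variance, so the shares are just normalized variances.
    """
    per_feature = data.var(axis=0)
    return per_feature / per_feature.sum()


# Raw data: the widest feature dominates the distance k-means sees.
raw_share = distance_share(X)

# Standardized data: every feature gets the same weight.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_share = distance_share(Z)

# Making the weighting explicit: multiplying the standardized features
# by the original standard deviations restores the raw-data shares.
weights = X.std(axis=0)
reweighted_share = distance_share(Z * weights)

print("raw:       ", raw_share.round(3))
print("scaled:    ", scaled_share.round(3))
print("reweighted:", reweighted_share.round(3))
```

If the unscaled clusters are the ones you want, one option is to standardize first and then apply explicit per-feature weights you can justify, rather than relying on whatever weights the raw units happen to give.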