mayermensch69 t1_izk7m7b wrote
Reply to [D] Simple Questions Thread by AutoModerator
I came across this approach to dialog evaluation: https://github.com/Shikib/fed
What I don't understand is how the (more or less) raw loss can be used as a metric, since it is not really bounded. It may work when directly comparing specific examples scored with this method, but how does one compare these scores to other metrics that have a fixed scale?
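To make the question concrete: one general way I could imagine comparing an unbounded score with a fixed-scale one is rank correlation, which only looks at the ordering of the examples and ignores the scale. I don't know whether the linked repo does this; the sketch below is plain Python, and the function names and numbers are made up for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: unbounded loss-based scores vs. ratings on a 1-5 scale.
loss_based_scores = [-3.2, 0.7, 12.5, 4.1, -0.4]
bounded_ratings = [1, 2, 5, 4, 3]

rho = spearman(loss_based_scores, bounded_ratings)

# Rescaling the unbounded scores leaves the rank correlation unchanged.
rho_rescaled = spearman([100 * s + 7 for s in loss_based_scores], bounded_ratings)
```

Since only ranks enter the computation, any monotonically increasing rescaling of the loss-based scores gives the same result. That says how well the two metrics agree on ordering, but it does not turn the raw loss into a score that is directly readable on the fixed scale.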