mayermensch69 t1_izk7m7b wrote on December 9, 2022 at 6:34 PM

Reply to [D] Simple Questions Thread by AutoModerator

I came across this approach to dialog evaluation: https://github.com/Shikib/fed

What I don't understand is how the (more or less) raw loss can be used as a metric, since it isn't really bounded. It may work for directly comparing specific examples scored with this method, but how does one compare these scores against other metrics that have a fixed scale?
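
To make the question concrete, here is a minimal sketch of two common ways an unbounded score could be put next to a fixed-scale one: comparing rankings (Spearman correlation, which ignores scale entirely) and standardizing across a dataset (z-scores). This is not taken from the linked repo; the numbers and the names `loss_scores` and `human_ratings` are made up purely for illustration.

```python
from statistics import mean, pstdev


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def z_scores(values):
    """Standardize scores to zero mean and unit variance over the dataset."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical data: unbounded loss-based scores vs. 1-5 human ratings.
loss_scores = [-3.2, 0.7, 12.5, -8.1, 4.4]
human_ratings = [2, 3, 5, 1, 4]

print("Spearman:", spearman(loss_scores, human_ratings))
print("z-scores:", [round(z, 2) for z in z_scores(loss_scores)])
```

The rank correlation stays the same under any monotonic rescaling of the loss, so it sidesteps the missing bounds, but it only says whether two metrics order examples similarly. The z-scores depend on the dataset they were computed over, so they are not comparable across datasets either.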