Submitted by op_prabhuomkar t3_10iqeuh in MachineLearning
kkchangisin t1_j5gcgbe wrote
Nice work! Triton already looks good but have you tried optimizing with the Triton Model Analyzer?
https://github.com/triton-inference-server/model_analyzer
For the models I serve with Triton, I've found that the model formats and configurations Model Analyzer recommends can dramatically improve performance, whether the goal is throughput, latency, or something else.
Hopefully I get some time soon to try it out myself!
Again, nice work!
op_prabhuomkar OP t1_j5i7oyj wrote
Thank you for the feedback. I'm looking forward to trying Triton's Model Analyzer, possibly with different batch sizes and also FP16! Let's see how that goes :)
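For anyone curious what that sweep might look like: Model Analyzer can be driven by a YAML config file. The sketch below is a minimal, hypothetical example — the model name `my_model` and the repository path are placeholders, and the exact option names should be double-checked against the Model Analyzer documentation:

```yaml
# Hypothetical Model Analyzer profile config (sketch only).
# The model name and repository path are placeholders.
model_repository: /path/to/model_repository
profile_models:
  - my_model
# Sweep a few client batch sizes and concurrency levels,
# as discussed above.
batch_sizes: [1, 4, 8, 16]
concurrency: [1, 2, 4]
```

This would then be passed to the `model-analyzer profile` subcommand (e.g. with its `-f`/config-file flag); FP16 itself is typically enabled in the model's own Triton configuration or at export time rather than in the analyzer config.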
kkchangisin t1_j5if8hc wrote
Depending on how much time I have there just might be a PR coming your way 😀…
Triton is really a bit of a hidden gem - the implementation and the toolkit surrounding it are pretty impressive!