Submitted by Vegetable-Skill-9700 t3_10qx9po in deeplearning

I am building UpTrain - an open-source ML diagnostic toolkit that recently received investment from Y Combinator.

As you know, no ML model is 100% accurate, and their accuracy often deteriorates over time 😣. Additionally, the black-box ⬛ nature of Large Language Models makes it challenging to identify and fix their problems.

The tool helps ML practitioners to:

  1. Understand how their models are performing in production
  2. Catch edge cases and outliers to help refine their models
  3. Define custom monitors to catch under-performing data points
  4. Retrain the model on those cases to improve its accuracy

You can check out the project here: https://github.com/uptrain-ai/uptrain. Would love to hear feedback from the community!

Comments

grigorij-dataplicity t1_j6wkv70 wrote

Hey, your tool looks great! My question is: how can it improve ChatGPT answers? If your tool really can do that, I think you can win the market.


Vegetable-Skill-9700 OP t1_j6zpscd wrote

Firstly, by measuring data drift and analyzing user behavior, UpTrain identifies which prompts/questions the model hasn't seen before, as well as cases where the user was unsatisfied with the model's output. It automatically collects those cases for the model to retrain upon.
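To make the idea concrete, here is a minimal sketch of that collection step in plain Python. Everything here is illustrative, not UpTrain's actual API: "unseen" prompts are crudely approximated by the fraction of out-of-vocabulary words (a real system would compare embeddings), and the log format and threshold are assumptions.

```python
# Illustrative sketch only - not UpTrain's API. "Unseen" is approximated
# by the out-of-vocabulary word fraction, a crude stand-in for real
# embedding-space drift detection.

def oov_fraction(prompt: str, train_vocab: set) -> float:
    """Fraction of words in the prompt that never appeared in training."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    return sum(w not in train_vocab for w in words) / len(words)

def collect_retraining_cases(logs, train_vocab, oov_threshold=0.5):
    """Keep prompts that look unseen or that the user marked unsatisfactory."""
    cases = []
    for prompt, user_satisfied in logs:
        if not user_satisfied or oov_fraction(prompt, train_vocab) > oov_threshold:
            cases.append(prompt)
    return cases

# Hypothetical production logs: (prompt, did the user seem satisfied?)
train_vocab = {"how", "do", "i", "reset", "my", "password"}
logs = [
    ("how do i reset my password", True),   # seen, satisfied -> skip
    ("yo this app is bussin fr fr", True),  # mostly unseen vocab -> collect
    ("how do i reset my password", False),  # user unsatisfied -> collect
]
flagged = collect_retraining_cases(logs, train_vocab)
```

The flagged prompts then become the retraining dataset for the next fine-tuning round.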

Secondly, you can use the package to define a custom rule and filter out relevant data sets to retrain ChatGPT for your use case.

Say you want to use LLM to write product descriptions for Nike shoes and have a database of Nike customer chats:
a) Rachel - I don't like these shoes. I want to return them. How do I do that?
b) Ross - These shoes are great! I love them. I wear them every day while practicing unagi.
c) Chandler - Are there any better shoes than Nike? 👟 😍
You probably want to filter out cases with positive sentiment or cases with lots of emojis. With UpTrain, you can easily define such a rule as a Python function and collect those cases.
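A rule like that could be sketched roughly as follows. This is a toy illustration, not UpTrain's actual interface: the keyword set is a stand-in for a real sentiment model, and the emoji regex covers only the common Unicode emoji ranges.

```python
import re

# Rough emoji detection over common Unicode emoji blocks (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF]")

# Toy stand-in for a real sentiment model: a few positive keywords.
POSITIVE_WORDS = {"great", "love", "awesome", "best"}

def is_relevant_for_retraining(chat: str) -> bool:
    """Select chats that are positive or emoji-heavy."""
    words = {w.strip(".,!?").lower() for w in chat.split()}
    has_positive = bool(words & POSITIVE_WORDS)
    emoji_count = len(EMOJI_RE.findall(chat))
    return has_positive or emoji_count >= 2

chats = [
    "I don't like these shoes. I want to return them. How do I do that?",
    "These shoes are great! I love them. I wear them every day.",
    "Are there any better shoes than Nike? \U0001F45F \U0001F60D",
]
selected = [c for c in chats if is_relevant_for_retraining(c)]
```

Applied to the three chats above, the rule keeps Ross's positive review and Chandler's emoji-heavy question while skipping Rachel's return request.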

I am working on an example highlighting how all the above can be done. It should be done in a week. Stay tuned!


uwu-dotcom t1_j6xvy86 wrote

I've never heard of NLP model accuracy deteriorating over time, and a Google search hasn't yielded anything relevant. Is there a source about this you could point me to?


Vegetable-Skill-9700 OP t1_j6zpmiy wrote

Hey, so this typically happens when there is a change in vocabulary. To share my own experience with this issue: we built a chatbot to answer product-onboarding queries, and a new marketing campaign brought a large influx of younger users. Their questions were generally accompanied by a lot of urban slang and emojis that our NLP model wasn't equipped to handle, causing its performance to deteriorate.
