Submitted by Vegetable-Skill-9700 t3_10qx9po in deeplearning

I am building UpTrain - an open-source ML diagnostic toolkit that recently received investment from Y Combinator.

As you know, no ML model is 100% accurate, and their accuracy often deteriorates over time 😣. Additionally, the black-box ⬛ nature of Large Language Models makes it challenging to identify and fix their problems.

The tool helps ML practitioners to:

  1. Understand how their models are performing in production
  2. Catch edge cases and outliers to help refine their models
  3. Define custom monitors to catch under-performing data points
  4. Retrain the model on those cases to improve its accuracy

You can check out the project here: https://github.com/uptrain-ai/uptrain. Would love to hear feedback from the community!

Comments

grigorij-dataplicity t1_j6wkv70 wrote

Hey, your tool looks great! My question is: how can it improve ChatGPT answers? If your tool really can do that, I think you can win the market.


Vegetable-Skill-9700 OP t1_j6zpscd wrote

Firstly, by measuring data drift and analyzing user behavior, UpTrain identifies which prompts/questions the model hasn't seen before, as well as cases where the user was unsatisfied with the model's output. It automatically collects those cases for the model to retrain upon.
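To make the idea concrete, here is a minimal sketch of that collection step in plain Python. Everything here is illustrative, not UpTrain's actual API: "unseen" prompts are crudely approximated by the fraction of out-of-vocabulary words (a real system would compare embeddings), and the log format and threshold are assumptions.

```python
# Illustrative sketch only - not UpTrain's API. "Unseen" is approximated
# by the out-of-vocabulary word fraction, a crude stand-in for real
# embedding-space drift detection.

def oov_fraction(prompt: str, train_vocab: set) -> float:
    """Fraction of words in the prompt that never appeared in training."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    return sum(w not in train_vocab for w in words) / len(words)

def collect_retraining_cases(logs, train_vocab, oov_threshold=0.5):
    """Keep prompts that look unseen or that the user marked unsatisfactory."""
    cases = []
    for prompt, user_satisfied in logs:
        if not user_satisfied or oov_fraction(prompt, train_vocab) > oov_threshold:
            cases.append(prompt)
    return cases

# Hypothetical production logs: (prompt, did the user seem satisfied?)
train_vocab = {"how", "do", "i", "reset", "my", "password"}
logs = [
    ("how do i reset my password", True),   # seen, satisfied -> skip
    ("yo this app is bussin fr fr", True),  # mostly unseen vocab -> collect
    ("how do i reset my password", False),  # user unsatisfied -> collect
]
flagged = collect_retraining_cases(logs, train_vocab)
```

The flagged prompts then become the retraining dataset for the next fine-tuning round.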

Secondly, you can use the package to define a custom rule and filter out relevant data sets to retrain ChatGPT for your use case.

Say you want to use LLM to write product descriptions for Nike shoes and have a database of Nike customer chats:
a) Rachel - I don't like these shoes. I want to return them. How do I do that?
b) Ross - These shoes are great! I love them. I wear them every day while practicing unagi.
c) Chandler - Are there any better shoes than Nike? 👟 😍
You probably want to filter out cases with positive sentiment or cases with lots of emojis. With UpTrain, you can easily define such a rule as a Python function and collect those cases.
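A rule like that could be sketched roughly as follows. This is a toy illustration, not UpTrain's actual interface: the keyword set is a stand-in for a real sentiment model, and the emoji regex covers only the common Unicode emoji ranges.

```python
import re

# Rough emoji detection over common Unicode emoji blocks (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF]")

# Toy stand-in for a real sentiment model: a few positive keywords.
POSITIVE_WORDS = {"great", "love", "awesome", "best"}

def is_relevant_for_retraining(chat: str) -> bool:
    """Select chats that are positive or emoji-heavy."""
    words = {w.strip(".,!?").lower() for w in chat.split()}
    has_positive = bool(words & POSITIVE_WORDS)
    emoji_count = len(EMOJI_RE.findall(chat))
    return has_positive or emoji_count >= 2

chats = [
    "I don't like these shoes. I want to return them. How do I do that?",
    "These shoes are great! I love them. I wear them every day.",
    "Are there any better shoes than Nike? \U0001F45F \U0001F60D",
]
selected = [c for c in chats if is_relevant_for_retraining(c)]
```

Applied to the three chats above, the rule keeps Ross's positive review and Chandler's emoji-heavy question while skipping Rachel's return request.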

I am working on an example highlighting how all the above can be done. It should be done in a week. Stay tuned!


uwu-dotcom t1_j6xvy86 wrote

I've never heard of NLP model accuracy deteriorating over time, and a Google search hasn't yielded anything relevant. Is there a source about this you could point me to?


Vegetable-Skill-9700 OP t1_j6zpmiy wrote

Hey, so this typically happens when there is a change in vocabulary. To share my own experience with this issue: we built a chatbot to answer product-onboarding queries, and a new marketing campaign brought a large influx of younger users. Their questions were generally accompanied by a lot of urban slang and emojis that our NLP model wasn't equipped to handle, causing its performance to deteriorate.
