Submitted by zalivom1s t3_11da7sq in singularity
Olivebuddiesforlife t1_ja7f0tu wrote
They have years' worth of data, and they're streets ahead.
Hawkent99 t1_ja915oq wrote
Streets ahead
Olivebuddiesforlife t1_ja9em7l wrote
Ya! That’s what I’m talking about! He gets it!
OutOfBananaException t1_jadwqrn wrote
What kind of data do you mean? I don't believe they have a high quantity of quality domestic text training data, and they have stated they don't want to use worldwide data. It's not clear how they plan to resolve this.
Olivebuddiesforlife t1_jaf19dw wrote
First, China's sample set is 1.4B people, and they have been training their AI at the enterprise level with cameras, image recognition and processing. There are huge farms of people, entire industries, that have been the AI models' human partners since 2017.
Second, the language model can work with the WeChat data, which is lots and lots of person-to-person interaction, as opposed to Western data, which does not include that, only general public interactions. Even counting private data, having everything consolidated on a single platform means a lot.
Third, TikTok data: it's one of the largest social media platforms, with large data sets covering language, culture and so on.
So I guess this adds the quality. And they don't want to expand to the West, which puts that decision in the understandable category.
There have been low-level chatbots in China, and so far they've focused on enterprise and public (read: government) use. They're venturing into private use, I guess.