sothatsit t1_j5hhb31 wrote
I’ve actually done some work on this, and the real issues are:
- You’d need a lot of text from other sources with people’s real names.
- You’d need the user to have written a lot of Reddit comments or posts.
- The user’s writing style would need to match between Reddit and your other source.
If you’re interested though, I made the following library for my Master’s thesis, which can be used for this: https://github.com/TycheLibrary/Tyche
However, it would need more work to get close to identifying thousands, never mind millions, of users.
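To make the three requirements above concrete, here is a minimal sketch of style matching, assuming character trigram frequencies compared by cosine similarity. It does not use Tyche and says nothing about its API; the function names (`char_ngrams`, `cosine_similarity`, `best_match`) and the idea of a `named_samples` dictionary are hypothetical, chosen only for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(anonymous_text, named_samples):
    """Return the name whose sample is most similar in style, plus all scores.

    named_samples is a hypothetical dict mapping a real name to text
    written under that name in some other source.
    """
    target = char_ngrams(anonymous_text)
    scores = {
        name: cosine_similarity(target, char_ngrams(sample))
        for name, sample in named_samples.items()
    }
    return max(scores, key=scores.get), scores
```

With only a few sentences per person the profiles are mostly noise, which is why both sides need a lot of text, and a single best match among thousands of candidates would need a far stronger model than this.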
Loquzofaricoalaphar OP t1_j5kljft wrote
That’s awesome, thanks for sharing, boss