First, do you mind sharing an example of different queries that return the same results? I have not been able to reproduce that (unless, of course, the queries are semantically similar, in which case that would be expected).
Also, of course exact search is far superior if you know the title of the paper you are looking for! In that regime, Google Scholar wins every time. However, semantic search might be better if you either a) can't remember the title but do remember some of the content or b) are simply looking to explore papers based on a handful of keywords.
Finally, the size of the database has no bearing on the quality of the embeddings, since I'm using a pretrained model from OpenAI. There is no notion of "popularity", except that the 10 papers with the highest cosine similarity to the query embedding are then ranked according to a citation score (if it's available).
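To make the ranking concrete, here is a rough sketch of the two steps described above: pick the 10 nearest papers by cosine similarity, then reorder only those by citation score. It is an illustration under assumptions, not the site's actual code. The `search` function, the paper dictionaries, and the `citation_score` field are hypothetical names, and putting papers without a score last is my own assumption about what "if it's available" could mean.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding, papers, k=10):
    """Return the k most similar papers, reordered by citation score.

    `papers` is a list of dicts with an "embedding" key and an optional
    "citation_score" key (hypothetical schema, for illustration only).
    """
    # Step 1: semantic retrieval. Popularity plays no role here.
    by_similarity = sorted(
        papers,
        key=lambda p: cosine_similarity(query_embedding, p["embedding"]),
        reverse=True,
    )
    top_k = by_similarity[:k]

    # Step 2: reorder only the retrieved papers by citation score.
    # Papers without a score go last (assumption). The sort is stable,
    # so ties keep their similarity order.
    return sorted(
        top_k,
        key=lambda p: (
            p.get("citation_score") is not None,
            p.get("citation_score") or 0.0,
        ),
        reverse=True,
    )
```

Note that a heavily cited paper that is not among the k nearest neighbours never shows up, which is why database size and popularity do not affect what gets retrieved.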
Might be in some cases, maybe not in others. Anecdotally, a query like "model using only attention mechanism site:arxiv.org" on Google doesn't bring up "Attention Is All You Need", while it does here. Aside from that, it might be a useful resource for finding similar papers based on an arXiv link.
universal_explainer OP t1_j3les1h wrote
Reply to comment by ml-research in [P] searchthearxiv.com: Semantic search across more than 250,000 ML papers on arXiv by universal_explainer
Are you talking about when inserting an arXiv link to find similar papers? In that case, it is important that the paper being referenced is already stored in the database. If it's a very recent paper (as in less than a week or two old), it won't work. This should be easy to fix, though, by simply scraping the abstract from arxiv.org and using it as the query.
If you're talking about searching for specific papers, I'd be interested to know the queries and the desired results. Feel free to post them here or in a DM 🙂
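As a rough idea of what the fix for very recent papers could look like: when the linked paper is not in the database, fetch its abstract and use that text as the query. The sketch below swaps plain HTML scraping for the public arXiv API (`export.arxiv.org/api/query`), which returns an Atom feed containing the abstract in a `summary` element. The function names are hypothetical, and error handling is kept minimal.

```python
import re
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
ARXIV_API = "http://export.arxiv.org/api/query"


def extract_arxiv_id(link):
    """Pull the arXiv identifier out of an abs/ or pdf/ link, or return None."""
    match = re.search(r"arxiv\.org/(?:abs|pdf)/([^?#\s]+)", link)
    if match is None:
        return None
    arxiv_id = match.group(1)
    if arxiv_id.endswith(".pdf"):
        arxiv_id = arxiv_id[: -len(".pdf")]
    return arxiv_id


def parse_abstract(atom_xml):
    """Return the abstract of the first entry in an Atom feed, or None."""
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", ATOM_NS)
    if entry is None:
        return None
    summary = entry.find("atom:summary", ATOM_NS)
    if summary is None or summary.text is None:
        return None
    # Collapse the line breaks and indentation used in the feed.
    return " ".join(summary.text.split())


def fetch_abstract(link, timeout=10):
    """Fetch the abstract for an arXiv link via the arXiv API (network call)."""
    arxiv_id = extract_arxiv_id(link)
    if arxiv_id is None:
        return None
    url = ARXIV_API + "?" + urllib.parse.urlencode({"id_list": arxiv_id})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_abstract(response.read())
```

The returned abstract would then be embedded and searched like any other free-text query, so the paper itself never needs to be stored in the database first.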