Submitted by Legitimate-Gold-8711 t3_10cpj1c in deeplearning

I used OCR to extract text from images, and I want to correct that text with a deep learning algorithm. I have a dataset of files containing the erroneous text and corresponding files containing the correct text. I want to train a model on this data so that, when I give it text with some wrong letters, it predicts the corrected text.

Can you suggest some algorithms I could use for this problem, and explain how to use them?

Would BERT work in this case?

https://preview.redd.it/5w2ikho1t8ca1.png?width=446&format=png&auto=webp&v=enabled&s=deee6f71fd819a1ee8a4c31669829bac0bc16b4f


Comments


shmollerup t1_j4htyot wrote

You could try something that works at the character level, like a sequence-to-sequence model, or maybe an RNN approach like char2vec. Both approaches should work pretty well if you have enough training data.
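For anyone wondering what "character level" means in practice, here is a minimal sketch of the data preparation step only: turning (noisy, clean) text pairs into fixed-length integer sequences that a sequence-to-sequence model could train on. The special tokens, the function names `build_vocab` and `encode`, and the toy pairs are all assumptions made for illustration; none of them come from the original post or from any particular library.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens; any seq2seq setup needs something similar.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map every character seen in the noisy or clean text to an integer id."""
    chars = sorted({ch for noisy, clean in pairs for ch in noisy + clean})
    tokens = [PAD, SOS, EOS, UNK] + chars
    return {tok: i for i, tok in enumerate(tokens)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode text as <sos> + character ids + <eos>, padded to max_len.

    Sequences longer than max_len are truncated, which drops the <eos> token.
    """
    ids = [vocab[SOS]] + [vocab.get(ch, vocab[UNK]) for ch in text] + [vocab[EOS]]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Toy (OCR output, ground truth) pairs, made up for this example.
pairs = [("he1lo w0rld", "hello world"), ("tbe cat", "the cat")]
vocab = build_vocab(pairs)
src = [encode(noisy, vocab, 16) for noisy, _ in pairs]   # encoder inputs
tgt = [encode(clean, vocab, 16) for _, clean in pairs]   # decoder targets
```

The `src` sequences would feed the encoder and the `tgt` sequences would be what the decoder learns to produce. The model itself (an RNN or Transformer encoder-decoder) is not shown here.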


thatoneboii t1_j4jf8h5 wrote

Do you absolutely need to use deep learning? There are plenty of much faster autocorrect implementations that use Levenshtein distance and other non-DL techniques, such as SymSpell or Norvig’s algorithm. DL is complicated, expensive, and requires tons of data to train on. I would stay away from it unless you’re doing it for your own enrichment or a school project.
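To give a rough idea of the non-DL route, here is a minimal sketch of dictionary-based correction using Levenshtein distance. It is not SymSpell or Norvig's corrector (both are considerably faster and also use word frequencies); it just snaps each word to the closest entry in a known word list. The function names, the `max_distance` threshold, and the tiny vocabulary are assumptions for illustration.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev: List[int] = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_word(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if nothing is close enough."""
    vocab = set(vocabulary)
    if not vocab or word in vocab:
        return word
    # Ties on distance are broken alphabetically so the result is deterministic.
    best = min(vocab, key=lambda v: (levenshtein(word, v), v))
    return best if levenshtein(word, best) <= max_distance else word


def correct_text(text: str, vocabulary: Iterable[str]) -> str:
    """Correct a whitespace-separated string word by word."""
    vocab = set(vocabulary)
    return " ".join(correct_word(w, vocab) for w in text.split())


# Hypothetical word list; a real one would come from the clean text files.
words = {"the", "quick", "brown", "fox"}
fixed = correct_text("tbe qu1ck brovvn fox", words)
```

This brute-force scan compares each word against the whole vocabulary, so it only makes sense for small word lists. The word list could be built from the correct-text files in the dataset.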
