
Tess-two (Tesseract OCR in Android) Shows Very Inaccurate Results

I use the following function to perform offline OCR using Tesseract OCR's Android fork, Tess-Two: private String startOCR(Uri imgUri) { try { ExifInterface exif = new E

Solution 1:

I ran some tests and have a few observations and conclusions that could improve your results.

  1. Try passing both lowercase and uppercase letters in your VAR_CHAR_WHITELIST parameter:

See my results for this input:

[Input image: a page of Portuguese text (image not preserved)]

a) Lowercase only:

Parameter:

baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "abcdefghijklmnopqrstuvwxyz1234567890',.?;/ ");

Result:

05 atenienses nnito, hdeleto e laicao, os principais acusadores de gocrates, nao defendiam apenas que o filosofo corrompia a juventude; eles lutavam tama bern pelas virtudes da tradigao poetica vinculada a liornero. nristofanes, um dos responsaveis, segundo socrates, dos preconceitos contra o filosofo, era outro grande defensor dessa virtude.

socrates, de certa forma, estava em guerra com a tradieao poetica grega. 0 metodo de socrates era o oposto a narrativa epica de tlornero. sua dialetica nao tinha nada de semideuses corn superpoderes 6

b) Uppercase and lowercase letters:

Parameter:

baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ1234567890',.?;/ ");

Result:

Os atenienses Anito, Meleto e Licao, os principais acusadores de Socrates, nao defendiam apenas que o filosofo corrompia a juventude; eles lutavam tama bern pelas virtudes da tradigao poetica vinculada a Homero. Aristofanes, um dos responsaveis, segundo socrates, dos preconceitos contra o filosofo, era outro grande defensor dessa virtude.

socrates, de certa forma, estava em guerra com a tradieao poetica grega. O metodo de socrates era o Oposto a narrativa epica de Homero. Sua dialetica nao tinha nada de semideuses corn superpoderes 6

PS: I ran this example with the Portuguese language data. Note that words requiring accented characters such as 'é', 'ó' and 'ç' were not recognized correctly, because those characters were not included in the whitelist.
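Building on the point above, a small helper can assemble a whitelist that always contains both letter cases plus any accented letters your language needs. This is a minimal sketch; the class name WhitelistBuilder is hypothetical, and the accented letters in the usage line below are my assumption of what Portuguese text typically needs, so adjust them to your input.

```java
// Hypothetical helper (not part of tess-two): builds a character whitelist
// containing both letter cases, digits, punctuation and extra letters.
final class WhitelistBuilder {
    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz";
    private static final String DIGITS = "1234567890";
    // Same punctuation (including the space) as the whitelist used above.
    private static final String PUNCTUATION = "',.?;/ ";

    private WhitelistBuilder() {}

    // Returns a-z, A-Z, digits, punctuation, plus extraLetters in both cases.
    // Duplicates are removed while keeping first-seen order.
    static String build(String extraLetters) {
        java.util.Locale root = java.util.Locale.ROOT;
        String all = LOWER + LOWER.toUpperCase(root) + DIGITS + PUNCTUATION
                + extraLetters + extraLetters.toUpperCase(root);
        StringBuilder out = new StringBuilder();
        all.codePoints().distinct().forEachOrdered(out::appendCodePoint);
        return out.toString();
    }
}
```

On Android you would then pass the result as before, e.g. baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, WhitelistBuilder.build("áàâãéêíóôõúç"));. Whitelisting an accented letter only helps if the traineddata you load (here, Portuguese) can actually produce it.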

I also tried it on your picture; the result improved, but only slightly:

Font 20; Which polrlrcran has caplured Ihe curve, summed up a growing mood. In a Ierocrous speech? 'Your iron industry is dead. dead as munon. Your coal yum mono greatly on the iron Vbur Ilk Mary is and. o Your woolen induslry is Why. Your canon Mr Wilding induslry. blmailf

So I checked how Tesseract binarized the image:

[Thresholded image produced by Tesseract (image not preserved)]
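For context on what that thresholded image is: binarization turns every pixel into black or white using a threshold. As far as I know, Tesseract's default thresholder is based on Otsu's method, which picks the threshold that best separates the gray-level histogram into two classes. The sketch below is a standalone illustration of a global Otsu threshold on 8-bit gray values, not Tesseract's actual code (which works on Leptonica images); the class name is hypothetical and it assumes dark text on a light background.

```java
// Hypothetical illustration of Otsu's global threshold on 8-bit gray values.
final class OtsuThreshold {
    private OtsuThreshold() {}

    // Returns the gray level t that maximizes the between-class variance.
    // Pixels with value <= t form the "dark" class.
    static int compute(int[] gray) {
        long[] hist = new long[256];
        for (int v : gray) {
            hist[v]++; // assumes every value is in 0..255
        }
        long total = gray.length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (double) i * hist[i];
        }
        double sumDark = 0;
        long weightDark = 0;
        double bestVariance = -1;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            weightDark += hist[t];
            sumDark += (double) t * hist[t];
            if (weightDark == 0) continue;     // no dark pixels yet
            long weightLight = total - weightDark;
            if (weightLight == 0) break;       // no light pixels left
            double meanDark = sumDark / weightDark;
            double meanLight = (sumAll - sumDark) / weightLight;
            double diff = meanDark - meanLight;
            double between = (double) weightDark * weightLight * diff * diff;
            if (between > bestVariance) {
                bestVariance = between;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    // Marks each pixel as dark (true, assumed text) or light (false).
    static boolean[] binarize(int[] gray, int threshold) {
        boolean[] dark = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) {
            dark[i] = gray[i] <= threshold;
        }
        return dark;
    }
}
```

Noise spreads the histogram out, so a single global threshold can turn speckles into black blobs or erase thin strokes, which is one way a noisy photo ends up partly illegible after binarization.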

Your image has so much noise that, when the API binarizes it, a large part of the picture becomes illegible. I suggest running it again without converting to grayscale first, and looking into ways to reduce the noise in your image.
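One common way to reduce that kind of noise before calling Tesseract is a median filter, which replaces each pixel with the median of its neighbourhood and removes isolated speckles while keeping edges fairly sharp. Below is a minimal 3x3 sketch that filters each ARGB channel separately, so it works without converting to grayscale. It assumes a row-major int[] of ARGB pixels (the layout Android's Bitmap.getPixels produces); the class name is hypothetical and this is not a tess-two API.

```java
// Hypothetical 3x3 median filter applied per ARGB channel.
final class MedianFilter {
    private MedianFilter() {}

    static int[] apply3x3(int[] argb, int width, int height) {
        int[] out = new int[argb.length];
        int[] window = new int[9];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int result = 0;
                // Channels: alpha (24), red (16), green (8), blue (0).
                for (int shift = 24; shift >= 0; shift -= 8) {
                    int n = 0;
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            // Clamp coordinates so edge pixels reuse their nearest neighbours.
                            int sx = Math.min(Math.max(x + dx, 0), width - 1);
                            int sy = Math.min(Math.max(y + dy, 0), height - 1);
                            window[n++] = (argb[sy * width + sx] >>> shift) & 0xFF;
                        }
                    }
                    java.util.Arrays.sort(window);
                    result |= window[4] << shift; // median of the 9 samples
                }
                out[y * width + x] = result;
            }
        }
        return out;
    }
}
```

On Android you could fill the array with bitmap.getPixels(pixels, 0, width, 0, 0, width, height), filter it, write it back with setPixels on a mutable bitmap, and then call baseApi.setImage(...) as usual. A larger window removes more noise but also starts to blur small characters.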

To help with debugging, you can retrieve the thresholded image that Tesseract actually used and save it:

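Bitmap thresholded = WriteFile.writeBitmap(baseApi.getThresholdedImage());
// writeBitmap converts the Pix to an Android Bitmap; save it with e.g. thresholded.compress(...)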

I hope this is useful! Thanks for sharing your issue.

Cheers!

Solution 2:

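Change the line options.inSampleSize = 4; to options.inSampleSize = 1; and run the OCR again. With inSampleSize = 4, BitmapFactory decodes the image at 1/4 of its width and height (1/16 of the pixels), which can throw away the fine detail Tesseract needs to recognize characters.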
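To see how much resolution is lost, note that the BitmapFactory.Options documentation says inSampleSize = 4 returns an image 1/4 the width and height of the original (1/16 of the pixels), that values of 1 or less are treated as 1, and that other values are rounded down to the nearest power of 2. The helper below only reproduces that arithmetic; the class and method names are hypothetical, and real decoders may round the final dimensions slightly differently.

```java
// Hypothetical helper mirroring the documented inSampleSize rules.
final class SampleSizeMath {
    private SampleSizeMath() {}

    // Values <= 1 behave like 1; others are rounded down to a power of 2.
    static int effectiveSampleSize(int requested) {
        if (requested <= 1) return 1;
        return Integer.highestOneBit(requested);
    }

    // Approximate decoded width or height for a given requested inSampleSize.
    static int approxDecodedDimension(int original, int requested) {
        return original / effectiveSampleSize(requested);
    }

    // Fraction of the original pixel count that survives decoding.
    static double pixelFraction(int requested) {
        int s = effectiveSampleSize(requested);
        return 1.0 / ((double) s * s);
    }
}
```

For example, a 3264 x 2448 photo decoded with inSampleSize = 4 comes out at roughly 816 x 612, so every character shrinks by the same factor before Tesseract ever sees it.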
