Image preprocessing before OCR

Getting accurate OCR results from real documents can be tough. So we need to do some preprocessing before we feed the image to the OCR engine.

Here are the steps I perform before running OCR.

Step 1: Proper dimensions

I am using Tesseract, so it is best to store the image at 300 DPI. If your image contains more than 300 words, it is better to make its dimensions around 2500 x 2500 pixels.

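from PIL import Image

image = Image.open(filename)
image = image.convert(mode='L')  # convert to grayscale
length_x, width_y = image.size  # width and height in pixels
factor = max(1, float(2500.0 / length_x))
if factor > 1:
    # Upscale so that the width becomes roughly 2500 px
    size = int(factor * length_x), int(factor * width_y)
    image = image.resize(size, Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
image.save("image.png", dpi=(300, 300))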

First, convert the image to grayscale for a better result, because Tesseract is trained on images that are close to binary.

Then I check whether the image width is less than 2500 pixels and, if so, upscale the image by the factor computed in the code above.
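To make the resize rule easier to check on its own, here is a minimal sketch of it as a small helper. The name upscaled_size and the target_width parameter are only illustrative, and I use round() instead of int() to avoid losing a pixel to floating-point error.

```python
def upscaled_size(width, height, target_width=2500):
    """Return the (width, height) to resize to so the width reaches target_width.

    Images that are already at least target_width wide keep their size.
    """
    factor = max(1.0, target_width / width)
    return round(factor * width), round(factor * height)


print(upscaled_size(1000, 1400))  # (2500, 3500)
print(upscaled_size(3000, 2000))  # (3000, 2000)
```

A 1000 px wide scan is scaled by 2.5, while an image that is already wider than 2500 px is left alone.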

Step 2: Threshold and Denoise

For OCR, the image should be thresholded (binarized) to get a good result. The following example calculates the threshold from the image itself. There are 4 methods (generic, mean, median, gaussian) you can experiment with.

Source: https://gist.github.com/ttchengab/293fc3ca782b20cf9b05c33f13583338

#Threshold
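The four method names match the options of scikit-image's threshold_local. Assuming that function is what is being used, a minimal sketch could look like the one below. The helper name binarize_local and the block_size and offset values are illustrative and are not taken from the gist.

```python
import numpy as np
from skimage.filters import threshold_local


def binarize_local(gray, block_size=35, offset=10, method="gaussian"):
    """Binarize a 2-D grayscale array using a per-pixel (local) threshold.

    block_size must be odd. method can be "gaussian", "mean" or "median" here;
    "generic" would additionally need a custom function passed through `param`.
    block_size and offset are illustrative defaults, tune them for your scans.
    """
    local_thresh = threshold_local(gray, block_size, method=method, offset=offset)
    # Pixels brighter than their neighbourhood threshold become white, the rest black.
    return np.where(gray > local_thresh, 255, 0).astype(np.uint8)


# Synthetic page: light background with one dark "stroke".
page = np.full((60, 60), 200, dtype=np.uint8)
page[25:35, 10:50] = 30
thresh = binarize_local(page, method="mean")
```

Because the threshold is computed per neighbourhood, this kind of approach usually copes better with uneven lighting than a single global threshold.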

Denoise

When your document has extra noise around the text, it can throw off the whole OCR prediction. So it is better to give the OCR a clean image. Here we do exactly that by filtering out small dotted regions.

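# Erosion followed by dilation
import cv2
import numpy as np

# Note: a 1x1 kernel leaves the image unchanged; try (2, 2) or (3, 3) for a visible effect
kernel = np.ones((1, 1), np.uint8)
ero = cv2.erode(thresh, kernel, iterations=1)
img_dilation = cv2.dilate(ero, kernel, iterations=1)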

Step 3: Deskew

The document may be rotated if it was not placed properly while scanning or when you took the photo. This can confuse the OCR system, and you may get no result at all if it cannot make sense of the image.

Special thanks to Stéphane Brunner, the creator of the deskew library.

Source: https://github.com/sbrunner/deskew

import numpy as np
from skimage import io
from skimage.color import rgb2gray
from skimage.transform import rotate

from deskew import determine_skew

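image = io.imread('input.png')
grayscale = rgb2gray(image)  # the skew is estimated on a grayscale copy
angle = determine_skew(grayscale)  # estimated skew angle in degrees
# rotate() returns floats in [0, 1], so scale back to 0-255 before saving
rotated = rotate(image, angle, resize=True) * 255
io.imsave('output.png', rotated.astype(np.uint8))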


Howdy & Welcome. I am a content creator, machine learning researcher, and consultant. consultancy: dlmade.ml or dlmadeblog@gmail.com
