In this article, I follow the techniques used in the Google Translate app, applied to the case of license plates, and I compare the performance of deep learning networks with what we could previously achieve with the Tesseract engine.

I first created a simple “plate annotation tool”

./annotate input_dir output.csv


in order to create a CSV file labelling the data, one line per character:

image_path,character,x,y,width,height,orientation
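
For example, a line could look like this (values are illustrative, not taken from the actual dataset):

images/plate_0042.png,A,134,56,18,32,0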


License plates are detected with a cascade classifier, and letters with the findContours method from OpenCV. I filter out contours that are too small compared to the license plate dimensions, as well as contours that do not contain enough black/dark pixels, because French license plates always use black characters.
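
As an illustration, here is a minimal Python/OpenCV sketch of this filtering step; the area and darkness thresholds are assumptions, not the exact values used by my tool:

import cv2
import numpy as np

def keep_letter_contours(plate_gray, min_area_ratio=0.01, min_dark_ratio=0.3):
    """Keep contours that are large enough and dark enough to be characters."""
    plate_area = plate_gray.shape[0] * plate_gray.shape[1]
    # Dark characters become white blobs after inverted thresholding.
    _, binary = cv2.threshold(plate_gray, 100, 255, cv2.THRESH_BINARY_INV)
    # [-2] keeps this line compatible with OpenCV 2.x, 3.x and 4.x return values.
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    letters = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h < min_area_ratio * plate_area:
            continue  # too small compared to the plate dimensions
        if np.mean(plate_gray[y:y + h, x:x + w] < 100) < min_dark_ratio:
            continue  # not enough dark pixels to be a black character
        letters.append((x, y, w, h))
    return letters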

I created a conversion tool

./extract file.csv output_dir --backend=[lmdb|leveldb|directory|tesseract]


to convert this CSV file into the respective input formats for Tesseract and Caffe training. The directory option exports the extracted letters into a tree of directories, one directory per letter, which is a convenient way for me to check for labelling errors.
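
For reference, the directory export boils down to something like this Python sketch (function and file naming are hypothetical; it assumes the CSV columns listed above):

import csv
import os
import cv2

def export_to_directories(csv_path, output_dir):
    """Crop each labelled letter and save it under a per-letter directory."""
    counts = {}
    with open(csv_path) as f:
        for image_path, char, x, y, w, h, _orientation in csv.reader(f):
            x, y, w, h = int(x), int(y), int(w), int(h)
            letter_dir = os.path.join(output_dir, char)
            os.makedirs(letter_dir, exist_ok=True)
            crop = cv2.imread(image_path)[y:y + h, x:x + w]
            counts[char] = counts.get(char, 0) + 1
            cv2.imwrite(os.path.join(letter_dir, '%s_%04d.png' % (char, counts[char])), crop)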

For Tesseract, this produces a TIFF image together with a .box file.
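
In a .box file, each line gives a character followed by the left, bottom, right and top coordinates of its bounding box (origin at the bottom-left corner of the image) and the page number, for example (coordinates made up):

A 42 28 68 62 0
B 74 28 99 62 0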

For Caffe, it populates an LMDB database that I can inspect in Python:
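
For instance, a quick inspection script could look like this ('letters_lmdb' being a placeholder for the actual database name):

import lmdb
import numpy as np
from caffe.proto import caffe_pb2

env = lmdb.open('letters_lmdb', readonly=True)  # placeholder database name
with env.begin() as txn:
    for key, value in txn.cursor():
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        # A Datum stores raw pixel bytes (channels, height, width) plus the label.
        img = np.frombuffer(datum.data, dtype=np.uint8).reshape(
            datum.channels, datum.height, datum.width)
        print(key, datum.label, img.shape)
        break  # only look at the first entry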

The training set is composed of 5000 letters, and the test set of 160 letters. For Tesseract, I had to use a training subset of 800 letters, otherwise training did not work properly.

For deep learning, I used a standard LeNet neural network with dropout layers.
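
For reference, such a network can be sketched with pycaffe's NetSpec; the layer sizes below follow Caffe's standard LeNet example plus dropout, not necessarily the exact network I trained:

from caffe import layers as L, params as P, NetSpec

def lenet_with_dropout(lmdb_path, batch_size):
    n = NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB,
                             source=lmdb_path,
                             transform_param=dict(scale=1. / 255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                            weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50,
                            weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500,
                           weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.drop1 = L.Dropout(n.relu1, dropout_ratio=0.5, in_place=True)
    # 36 outputs: 26 letters + 10 digits.
    n.ip2 = L.InnerProduct(n.drop1, num_output=36,
                           weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
    return n.to_proto()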

I trained both technologies; here are the results:

Technology                    Correct results (out of 160)
Tesseract, eng language       64
Tesseract, trained language   71
Caffe trained network (NN)    154

For Tesseract, I count a match when the letter returned with the best confidence equals the ground truth.
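
As an illustration of this criterion, here is a rough sketch using the pytesseract wrapper (an assumption; any Tesseract API exposing confidences would do):

import cv2
import pytesseract
from pytesseract import Output

def best_letter(letter_image, lang='eng'):
    """Return the character Tesseract reports with the highest confidence."""
    # --psm 10 treats the image as a single character (Tesseract 3 uses -psm 10).
    data = pytesseract.image_to_data(letter_image, lang=lang,
                                     config='--psm 10', output_type=Output.DICT)
    candidates = [(float(conf), text) for conf, text
                  in zip(data['conf'], data['text']) if text.strip()]
    return max(candidates)[1] if candidates else None

# match = (best_letter(cv2.imread('letter.png')) == ground_truth)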

Caffe gets 154 letters out of 160 right, i.e. about 96%. The six wrong matches are the following confused pairs:

M W
0 D
B 8
1 A
D 0
D Q


Given that we can now infer the letter/number schema of a license plate (LL-NNN-LL or NN-LLL-NN) with good precision, we reach in fact about 99% correctness, that is one wrong character every hundred letters, thanks to deep learning.
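
One way to exploit the schema is to restrict the classifier's output to the characters allowed at each position. A minimal sketch, assuming probs is the softmax output aligned with a classes list:

import numpy as np

DIGITS = list("0123456789")
LETTERS = [chr(c) for c in range(ord('A'), ord('Z') + 1)]

def constrained_prediction(probs, classes, allowed):
    """Pick the most probable class among the subset allowed at this position."""
    masked = np.where([c in allowed for c in classes], probs, -np.inf)
    return classes[int(np.argmax(masked))]

# Position 0 of an LL-NNN-LL plate must be a letter:
# constrained_prediction(probs, classes, set(LETTERS))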

There remains a group of letters (O-D-Q) that is still hard to distinguish under poor conditions.

Let’s try some optimizations: the previous results were computed on the grayscale version of the image. Let’s give the thresholded image a try:

cvtColor(image, gray_image, CV_BGR2GRAY);
threshold(gray_image, binary_image, 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU); // Otsu binarization, one possible choice