In this article, I apply the techniques used in the Google Translate app to license plates, and I compare the performance of deep learning networks with what we could previously achieve with the Tesseract engine.
I first created a simple “plate annotation tool”
./annotate input_dir output.csv
to create a text file labelling the data in CSV format, one line per character. License plates are detected with a cascade classifier, and letters with the findContours method from OpenCV. I filter out contours that are too small compared to the license plate dimensions, as well as contours that do not contain enough dark pixels, because French license plates are always written in black.
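The two filters above can be sketched in plain Python. This is only an illustration: the function name, the argument layout and all three thresholds are assumptions of mine, not the values used by the annotation tool. In practice the bounding box would typically come from OpenCV's boundingRect applied to each contour.

```python
def keep_contour(box, plate_size, gray_patch,
                 min_height_ratio=0.4, dark_level=100, min_dark_ratio=0.2):
    """Return True if a candidate contour looks like a plate character.

    box        -- (x, y, w, h) bounding box of the contour
    plate_size -- (plate_w, plate_h) of the detected plate
    gray_patch -- 2D list of 0-255 gray values cropped to the box
    All thresholds are illustrative guesses, not the tool's real values.
    """
    _, _, w, h = box
    _, plate_h = plate_size
    # Reject contours that are too small compared to the plate.
    if w * h == 0 or h < min_height_ratio * plate_h:
        return False
    # Reject contours without enough dark pixels (characters are black).
    pixels = [p for row in gray_patch for p in row]
    if not pixels:
        return False
    dark = sum(1 for p in pixels if p < dark_level)
    return dark / len(pixels) >= min_dark_ratio
```

A tall, mostly dark box passes, while a tiny speck or a bright patch is rejected.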
I created a conversion tool
./extract file.csv output_dir --backend=[lmdb|leveldb|directory|tesseract]
to convert this CSV file into the formats expected by Tesseract and Caffe for training. The directory option exports the extracts into a tree of directories, one directory per letter, which is a convenient way for me to check for labelling errors.
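To illustrate the directory layout, here is a minimal sketch of such an export. It assumes the character extracts already exist as individual image files and that each row is a hypothetical (image_path, label) pair; the real CSV and the real extract tool may work differently.

```python
import shutil
from pathlib import Path


def export_by_label(rows, output_dir):
    """Copy each extract into output_dir/<label>/.

    rows -- iterable of (image_path, label) pairs; this two-column
            layout is hypothetical, the real CSV may hold other fields.
    """
    output_dir = Path(output_dir)
    for image_path, label in rows:
        target_dir = output_dir / label
        # One directory per letter, created on first use.
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(image_path, target_dir / Path(image_path).name)
```

Browsing a single directory then shows every extract labelled with the same character, so a mislabelled image stands out immediately.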
For Tesseract, this produces a TIFF file with an accompanying .box file:
For Caffe, it populates an LMDB database that I can inspect in Python:
The training set contains 5000 letters and the test set 160 letters. For Tesseract, I had to restrict training to a subset of 800 letters; otherwise training did not work properly.
For deep learning, I used a standard LeNet neural network with dropout layers.
I trained both engines, and here are the results:
|Engine|Correct letters (out of 160)|
|---|---|
|Tesseract eng language|64|
|Tesseract trained language|71|
|Caffe trained (NN)|154|
For Tesseract, I count a match when the letter returned with the highest confidence equals the ground truth.
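This matching rule can be written in a few lines. The sketch below is independent of the Tesseract API: it assumes each glyph comes with a list of (character, confidence) candidates, and the sample values are made up for the example.

```python
def best_guess(candidates):
    """Return the character with the highest confidence.

    candidates -- list of (character, confidence) pairs for one glyph
    """
    return max(candidates, key=lambda c: c[1])[0]


def count_matches(predictions, truths):
    """Count glyphs whose best guess equals the ground truth."""
    return sum(best_guess(c) == t for c, t in zip(predictions, truths))


# Made-up candidates for three glyphs, for illustration only.
predictions = [
    [("D", 0.61), ("0", 0.35)],
    [("8", 0.48), ("B", 0.52)],
    [("W", 0.70), ("M", 0.20)],
]
truths = ["D", "B", "M"]
print(count_matches(predictions, truths))  # prints 2
```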
Caffe is about 96% right (154 of 160). The wrong matches are:
M W 0 D B 8 1 A D 0 D Q
Given that we can now infer the letter/number schema of a license plate (LL-NNN-LL or NN-LLL-NN) with good precision, the four confusions between a letter and a digit can be corrected, which leaves 2 errors out of 160. That is in fact 99% correctness, meaning about one wrong letter in every hundred, thanks to deep learning.
There remains a group of letters (O-D-Q) that is still hard to distinguish under poor conditions.
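Here is a minimal sketch of how the schema can remove letter/digit confusions: for each position, only the classes allowed by the schema compete. The function name, the score dictionaries and the fallback behaviour are assumptions for the example, and the schema itself is taken as an input rather than inferred.

```python
import string

LETTERS = set(string.ascii_uppercase)
DIGITS = set(string.digits)


def decode_with_schema(scores_per_char, schema):
    """Pick, for each character, the best-scoring class the schema allows.

    scores_per_char -- list of dicts mapping a character to its score
    schema          -- e.g. "LL-NNN-LL": L = letter, N = digit, - = separator
    """
    slots = [s for s in schema if s != "-"]
    if len(slots) != len(scores_per_char):
        raise ValueError("schema and scores have different lengths")
    decoded = []
    for slot, scores in zip(slots, scores_per_char):
        allowed = LETTERS if slot == "L" else DIGITS
        candidates = {c: s for c, s in scores.items() if c in allowed}
        if not candidates:
            # No allowed class was scored: fall back to the raw best guess.
            candidates = scores
        decoded.append(max(candidates, key=candidates.get))
    return "".join(decoded)
```

With the schema "LL-NNN-LL", a glyph scored {"D": 0.6, "0": 0.4} in a digit position is decoded as 0 even though D has the higher raw score.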
Let's try an optimization: the previous results were computed on the grayscale version of each image. Let's try the thresholded (black and white) image instead.
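To illustrate what thresholding does to a character extract, here is a minimal sketch of Otsu's method in plain Python. The choice of Otsu and the helper names are assumptions for the example, not necessarily the thresholding used for the results that follow.

```python
def otsu_threshold(pixels):
    """Return the gray level that best separates dark and bright pixels.

    pixels -- flat list of integer gray values in 0..255
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, 0.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        if weight_dark == 0:
            continue  # no pixel at or below this level yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        # Between-class variance (up to a constant factor).
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(pixels, level):
    """Map pixels at or below the level to 0 (ink) and the rest to 255."""
    return [0 if p <= level else 255 for p in pixels]
```

On an extract with dark ink on a bright plate, the threshold falls between the two groups of gray values, so the ink becomes pure black and the background pure white.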
The training file now has letters with a better background:
|Engine|Correct letters (out of 160)|
|---|---|
|Tesseract eng language|82|
|Tesseract trained language|93|
|Caffe trained (NN)|152|
Tesseract results are better on black and white images. Caffe results do not improve (152 versus 154), which comes from the fact that the first layers of the neural network already do the preprocessing.