Signature Verification with Decision Trees
An experiment with CART, C4.5, and Random Forest.
Have you ever developed a project without knowing why? Well, you aren’t alone. This project is a signature verifier built around decision trees. It was my assignment for the Computational Intelligence class project in my master’s degree. My first reaction was:
Why? My current project uses Deep Neural Networks on handwritten text, nothing to do with signatures, much less with decision trees.
I confess that when I started the master’s degree I had never used artificial intelligence before, so many questions came up along the way. I only fully understood this project at the end, and my conclusion is that a Neural Network is a better fit for this job (spoiler).
For code lovers ~ https://github.com/arthurflor23/signature-verification.
Hey, ho, let’s go
What is the concept of a decision tree? From Wikipedia [here]:
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.
Boring, I know, so... a decision tree is a set of decisions to be made, interconnected by conditions, and represented as a tree diagram. And what about the logic to construct it? Well, we can say the most important construction algorithms are: CART, C4.5, and Random Forest.
If it’s sunny and windless, I’ll probably play golf/tennis (the most common example for decision trees, of course with math, tables, and diagrams, as in Fig. 2).
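To make the golf example concrete, here is a tiny sklearn decision tree trained on made-up “play or not” weather data. The encoding and the data are mine, purely illustrative, not from the project:

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up weather data: each row is [sunny, windy] as 1/0 flags
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 0, 0, 0]  # 1 = play golf/tennis, 0 = stay home

# criterion="gini" gives CART-style splits
tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X, y)

print(tree.predict([[1, 0]]))  # sunny and windless -> [1], go play
```

The fitted tree just encodes the rule from the sentence above: sunny and windless leads to “play”, anything else to “stay home”.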
Dataset
The dataset used in this project was from SigComp2009, with 3,462 signatures divided into authentic and forged. You can find it here.
Preprocessing
Otsu thresholding was applied, followed by a minimum bounding box crop and a resize to 512x512, as in Fig. 4. Some interesting points I learned in the process:
- Initially, I didn’t apply the minimum bounding box, but this technique gave me a 10% increase in accuracy (obvious to anyone experienced);
- Resizing the image to 512x512 allowed me to use a CNN specific to signatures (we’ll get there).
Feature Extraction
The features of each image are the attributes for the conditions in the decision tree. Two approaches were tested:
- Hu Moments, which return a vector of 7 values;
- A Convolutional Neural Network, CNN (sigver_wiwd), which returns a vector of 2048 values.
Algorithms
To construct the decision trees, 3 algorithms were used:
- C4.5, my own implementation… this matters later;
- CART, from the sklearn implementation;
- Random Forest, from sklearn as well.
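The two sklearn classifiers can be set up as below. Note that sklearn’s `DecisionTreeClassifier` is an optimized CART; C4.5 has no sklearn implementation, which is why it had to be written from scratch. The dataset here is a synthetic stand-in for the 7-value Hu vectors, not the project’s real features:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in: 300 samples of 7 features (like Hu moment vectors)
X, y = make_classification(n_samples=300, n_features=7, random_state=0)

cart = DecisionTreeClassifier(criterion="gini", random_state=0)  # CART
forest = RandomForestClassifier(n_estimators=100, random_state=0)

cart.fit(X, y)
forest.fit(X, y)
```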
Experiment
With everything explained, the sklearn part was easy to implement. C4.5, on the other hand, was hard to write from scratch, and yep, it wasn’t necessary.
The experiment consisted of combining all the techniques, so:
- Hu moments + C4.5;
- Hu moments + CART;
- Hu moments + Random Forest;
- CNN + C4.5;
- CNN + CART;
- CNN + Random Forest.
All of this was executed 30 times to get the median of time and accuracy. Why 30? Magic number.
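The experiment loop can be sketched like this; every name here (`run_experiment`, the stand-in dataset and classifier) is a placeholder of mine, not the project’s actual code:

```python
import time
import statistics
from sklearn.model_selection import train_test_split

def run_experiment(X, y, model, runs=30):
    """Fit `model` on `runs` random splits; return median accuracy and fit time."""
    accs, times = [], []
    for seed in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed)
        start = time.time()
        model.fit(X_tr, y_tr)        # timing covers the tree construction
        times.append(time.time() - start)
        accs.append(model.score(X_te, y_te))
    return statistics.median(accs), statistics.median(times)

# Demo with placeholder data and classifier
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
med_acc, med_time = run_experiment(X, y, DecisionTreeClassifier(random_state=0), runs=5)
```

The median (rather than the mean) keeps one unlucky slow or inaccurate run from skewing the reported numbers.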
Results
The best result was CNN + Random Forest, with 86.40% accuracy (Fig. 5). Do you remember that I said a Neural Network is better? So, well, an NN can reach 99% (I saw that later in a dissertation project here). But, let’s go to the results.
And the time results (Fig. 6). It’s funny, ’cause I thought my C4.5 implementation was “ok”, but when I saw its time compared to the others… hahaha… 😢
To finish, a plot of a Decision Tree from the Random Forest (Fig. 7).
That’s all folks! Thank you for your time, dear reader. Now I can breathe a little after sharing this project (haha). See ya.