Efficient generation of comprehensive database for online arabic script recognition Conference Paper


  • The difficulties in segmenting cursive words into individual characters have shifted the focus of handwriting recognition research from segmentation-based approaches to segmentation-free (holistic) methods. However, maintaining and training the large number of prototypes (models) that represent the words in the dictionary makes the training process difficult and extremely expensive in computing resources. In this paper we present an efficient system that automatically generates prototypes for each word in a given dictionary using multiple appearances of each letter shape. Multiple appearances allow for many permutations of shapes for each word and thus complicate the search for the right prototype. To simplify training, reduce the number of maintained prototypes, and avoid overfitting, we use dimensionality reduction followed by clustering to reduce the size of these sets without affecting their ability to represent the wide variations in handwriting styles. A set of fonts is created by professional writers imitating all handwriting styles for each character in each position. These fonts are used to generate all shapes for writing each word-part in a comprehensive dictionary. Principal component analysis and k-means clustering are then applied to select the minimal number of shapes representing the wide variations in handwriting styles for a word-part. Experimental results using an online recognition system demonstrate the credibility of this process compared to manually generated databases.
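
The selection step described in the abstract (principal component analysis followed by k-means clustering) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes each generated word-part shape has already been converted to a fixed-length feature vector, and the function names (`pca_project`, `kmeans`, `select_representatives`) and parameters (`n_components`, `k`) are hypothetical. The abstract does not state how many components or clusters were used.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto their top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct samples.
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = Z[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the centers that are returned.
    dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers


def select_representatives(X, n_components, k, seed=0):
    """Return indices of the shapes closest to each cluster center.

    X is assumed to hold one fixed-length feature vector per generated
    shape of a single word-part (a hypothetical representation).
    """
    Z = pca_project(np.asarray(X, dtype=float), n_components)
    labels, centers = kmeans(Z, k, seed=seed)
    reps = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue  # skip clusters that ended up empty
        d = np.linalg.norm(Z[idx] - centers[j], axis=1)
        reps.append(int(idx[d.argmin()]))
    return sorted(reps)


if __name__ == "__main__":
    # Synthetic stand-in for generated shapes: 60 vectors of length 16.
    rng = np.random.default_rng(42)
    shapes = rng.normal(size=(60, 16))
    print(select_representatives(shapes, n_components=3, k=5))
```

Keeping an actual generated shape (the sample nearest each center) rather than the center itself means every retained prototype is a writable shape; whether the paper makes the same choice is not stated in the abstract.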

publication date

  • July 26, 2009