Keyword searching for Arabic handwritten documents Academic Article

abstract

  • In this paper we present a system for searching keywords in Arabic handwritten and historical documents using two algorithms, Dynamic Time Warping (DTW) and Hidden Markov Models (HMM). The HMM-based system provides satisfactory results when adequate training samples are available (which is not always possible for historical documents). The DTW algorithm with a slight modification provides better results even with a small set of training samples. The observation sequences for the matching algorithms are generated by extracting a set of geometric features that have already been shown to obtain good recognition rates for on-line Arabic handwriting. We have adopted the segmentation-free approach, i.e., continuous word-parts are used as the basic alphabet instead of the usual alphabet letters. The contours of the complete word-parts are used to represent the shapes of the compared word-parts. Additional strokes, such as dots and detached short segments, which are very common in Arabic scripts, are used via a rule-based system to improve the search algorithm and determine the final comparison decision. The search for a keyword is performed by searching for its word-parts, including the additional strokes, in the right order. The results for our modified DTW algorithm are very encouraging, even when using a small set of samples for training.
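
For readers unfamiliar with DTW, the matching step mentioned in the abstract can be illustrated with a minimal sketch of the classic (unmodified) algorithm. This is not the authors' modified variant, which the abstract does not specify. The function name `dtw_distance`, the use of plain Euclidean distance between feature vectors, and the toy one-dimensional features are all assumptions made for illustration only.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of equal-dimension feature vectors.

    Illustrative only: the paper's modified DTW and its geometric
    features are not reproduced here.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical feature sequences for a query word-part and a candidate.
query = [(0.0,), (1.0,), (2.0,)]
candidate = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(query, candidate))
```

In a keyword search of the kind described, such a cost would presumably be computed between the query word-part and each candidate word-part, with low-cost candidates passed on to the rule-based check of additional strokes.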

publication date

  • January 1, 2008

published in

  • The  Journal