Robust text and drawing segmentation algorithm for historical documents
Conference Paper

abstract

  • We present a method to segment historical document images into regions of different content. First, we separate text elements from non-text elements using a binarized version of the document image. Then, we refine the segmentation of the non-text regions into drawings, background and noise. At this stage, spatial and color features are used to ensure coherent regions in the final segmentation. Experiments show that the proposed approach achieves better segmentation quality than the other methods it is compared against. We evaluate segmentation quality on 252 pages of a historical manuscript, on which the proposed method achieves about 92% segmentation accuracy for drawings and 90% for text elements.
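The two-stage pipeline summarized in the abstract (text vs. non-text first, then non-text into drawings, background and noise) can be illustrated with a small sketch. It is not the authors' algorithm. It assumes a grayscale image given as rows of integers from 0 to 255. It also swaps in Otsu's global threshold for the binarization step, which the abstract does not specify. Connected-component height and area serve as crude stand-ins for the paper's spatial features, and color features are omitted entirely. The functions `otsu_threshold`, `components` and `segment`, along with the parameters `min_text_height`, `max_text_height` and `min_drawing_area`, are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical sketch, not the paper's method: Otsu thresholding and
# size-based rules are assumed stand-ins for the unspecified features.
BACKGROUND, TEXT, DRAWING, NOISE = 0, 1, 2, 3


def otsu_threshold(image):
    """Return the gray level t maximizing between-class variance (Otsu).

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_dark, weight_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def components(mask):
    """Yield the (row, col) pixels of each 4-connected True region in mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                yield pixels


def segment(image, min_text_height=2, max_text_height=8, min_drawing_area=20):
    """Label each pixel as BACKGROUND, TEXT, DRAWING or NOISE.

    The keyword parameters are hypothetical tuning knobs, not paper values.
    """
    t = otsu_threshold(image)
    ink = [[v <= t for v in row] for row in image]  # dark pixels count as ink
    labels = [[BACKGROUND] * len(row) for row in image]
    for pixels in components(ink):
        rows = [y for y, _ in pixels]
        height = max(rows) - min(rows) + 1
        if min_text_height <= height <= max_text_height:
            label = TEXT  # stage 1: glyph-sized component
        elif len(pixels) >= min_drawing_area:
            label = DRAWING  # stage 2: large non-text component
        else:
            label = NOISE  # stage 2: small non-text speck
        for y, x in pixels:
            labels[y][x] = label
    return labels


if __name__ == "__main__":
    img = [[255] * 20 for _ in range(20)]
    for y in range(12):
        for x in range(12):
            img[y][x] = 0  # large dark block: drawing-like
    for y in range(14, 19):
        for x in range(14, 17):
            img[y][x] = 0  # 5-pixel-tall blob: glyph-like
    img[0][19] = 0  # isolated speck
    seg = segment(img)
    print(seg[5][5], seg[16][15], seg[0][19], seg[19][0])
```

Running the file prints `2 1 3 0`. The large block is labeled a drawing, the 5-pixel-tall blob is labeled text, the isolated pixel is labeled noise, and the untouched corner is labeled background. Height alone is a weak text cue (a thin horizontal rule would pass as text), so a practical system would add richer descriptors such as component width, stroke density and color statistics.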

publication date

  • January 1, 2013