A coarse-to-fine approach for layout analysis of ancient manuscripts Conference Paper

abstract

  • Many applications along the manuscript analysis pipeline rely on the accuracy of pre-processing steps. Accurately detecting the main text area in ancient historical documents is of great importance for these applications. We propose a learning-free approach to detect the main text area in ancient manuscripts. First, we coarsely segment the main text area using a texture-based filter. Then, we refine the segmentation by formulating the problem as an energy minimization task and finding the minimum using graph cuts. The energy function is derived from properties of the text components, and it explicitly encourages spatial coherence of the segmented text regions. We evaluate the suggested method on a publicly available dataset of 38 historical document images. Experiments show that the suggested approach outperforms another state-of-the-art page segmentation method in terms of both segmentation quality and running time.
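
    The refinement step described in the abstract can be illustrated with a minimal sketch of binary labeling by graph cut. This is not the authors' implementation: the abstract does not give the energy terms, so the per-element costs (`cost0`, `cost1`), the smoothness weight `lam`, and the function names below are all hypothetical. The costs stand in for whatever a coarse texture-based stage would produce, and the pairwise term is a plain Potts penalty that rewards neighbouring elements for sharing a label.

```python
from collections import deque


def energy(labels, cost0, cost1, edges, lam):
    """Data term plus Potts smoothness term for a binary labeling."""
    data = sum(cost1[p] if labels[p] else cost0[p] for p in range(len(labels)))
    smooth = lam * sum(1 for p, q in edges if labels[p] != labels[q])
    return data + smooth


def min_cut_labels(cost0, cost1, edges, lam):
    """Minimize the energy above with an s-t minimum cut (Edmonds-Karp).

    cost0[p] / cost1[p]: hypothetical cost of labeling element p as
    background (0) / main text (1). edges: pairs of neighbouring elements.
    """
    n = len(cost0)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # make sure the reverse edge exists

    for p in range(n):
        add(s, p, cost0[p])  # cut when p ends up on the sink side (label 0)
        add(p, t, cost1[p])  # cut when p ends up on the source side (label 1)
    for p, q in edges:
        add(p, q, lam)
        add(q, p, lam)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no path left: the flow is maximal
        # Find the bottleneck capacity, then push that much flow.
        flow = float("inf")
        v = t
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Elements still reachable from the source get label 1 (main text).
    return [1 if p in parent else 0 for p in range(n)]


# Toy example: a chain of five elements where the middle one is a noisy
# outlier that the (made-up) coarse costs would label as background.
cost0 = [5, 5, 1, 5, 5]
cost1 = [1, 1, 3, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

coarse = min_cut_labels(cost0, cost1, edges, lam=0)   # no smoothness
refined = min_cut_labels(cost0, cost1, edges, lam=2)  # with smoothness
```

    With `lam=0` each element simply takes its cheaper label, so the outlier stays isolated. With a positive `lam` the cut absorbs it into the surrounding text region, which is the kind of spatial coherence the abstract says the energy function encourages.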

publication date

  • September 1, 2014