VML-HD: The historical Arabic documents dataset for recognition systems (Conference Paper)


  • In this paper we present a new database of handwritten Arabic script. It is based on five books written by different writers in the years 1088–1451. We took 680 pages from these five books and fully annotated them at the sub-word level. For each page, we manually drew bounding boxes around the individual sub-words and annotated each one with its sequence of characters. The database contains 121,636 sub-word appearances comprising 244,553 characters, drawn from a vocabulary of 1,731 sub-word forms. The database is described in detail and …

publication date

  • April 3, 2017