Gradual Learning of Deep Recurrent Neural Networks. Academic Article uri icon


  • Deep Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence tasks. However, deep RNNs are difficult to train and suffer from overfitting. We introduce a training method that trains the network gradually, and treats each layer individually, to achieve improved results in language modelling tasks. Training deep LSTM with Gradual Learning (GL) obtains perplexity of 61.7 on the Penn Treebank (PTB) corpus. As far as we know (as for the 20.05. 2017), GL improves the best state-of-the-art performance by a single LSTM/RHN model on the word-level PTB dataset. Subjects: Machine Learning (stat. ML); Learning (cs. LG) Cite as: arXiv: 1708.08863 [stat. ML](or arXiv: 1708.08863 v1 [stat. ML] for this version) Submission history From: Ziv Aharoni [view email][v1] Tue, 29 Aug 2017 16: 18: 44 GMT (509kb)

publication date

  • January 1, 2017