A study of accessible motifs and RNA folding complexity. Academic Article uri icon

abstract

  • mRNA molecules are folded in the cells and therefore many of their substrings may actually be inaccessible to protein and microRNA binding. The need to apply an accessability criterion to the task of genome-wide mRNA motif discovery raises the challenge of overcoming the core O(n 3) factor imposed by the time complexity of the currently best known algorithms for RNA secondary structure prediction [24, 25, 43]. We speed up the dynamic programming algorithms that are standard for RNA folding prediction. Our new approach significantly reduces the computations without sacrificing the optimality of the results, yielding an expected time complexity of O(n 2ψ(n)), where ψ(n) is shown to be constant on average under standard polymer folding models. Benchmark analysis confirms that in practice the runtime ratio between the previous approach and the new algorithm indeed grows linearly with increasing sequence size. The fast new RNA folding algorithm is utilized for genome-wide discovery of accessible cis-regulatory motifs in data sets of ribosomal densities and decay rates of S. cerevisiae genes and to the mining of exposed binding sites of tissue-specific microRNAs in A. Thaliana. Further details, including additional figures and proofs to all lemmas, can be found at: http://www.cs.tau.ac.il/~michaluz/QuadraticRNAFold.pdf

publication date

  • January 1, 2007