A Biclique Approach to Reference Anchored Gene Blocks and Its Applications to Pathogenicity Islands Conference Paper uri icon


  • We formalize a new problem variant in gene-block discovery, denoted Reference-Anchored Gene Blocks (RAGB). Given a query sequence Q of length n, representing the gene-array of a DNA element, a window size bound d on the length of a substring of interest in Q, and a set of target gene sequences \(\mathcal {T}=\{T_1 \ldots T_c\}\). Our objective is to identify gene-blocks in \(\mathcal {T}\) that are centered in a subset q of co-localized genes from Q, and contain genomes from \(\mathcal {T}\) in which the corresponding orthologs of the genes from q are also co-localized. We cast RAGB as a variant of a (colored) biclique problem in bipartite graphs, and analyze its parameterized complexity, as well as the parameterized complexity of other related problems. We give an \(O(nm+2^dnm/\lg m)\) time algorithm for the uncolored variant of our biclique problem, where m is the number of areas of interest that are parsed from the target sequences, and n and d are as defined above. Our algorithm can be adapted to compute all maximal bicliques in the graph within the same time complexity, and to handle edge-weights with a slight \(O(\lg d)\) increase to its time complexity. For the colored version of the problem, our algorithm has a time complexity of \(O(2^dnm)\). We implement the algorithm and exemplify its application to LEE, a well-known pathogenicity island from the e. coli genome harboring virulence genes. Our code and supplementary materials, including omitted proofs and figures, are available at https:// www. cs. bgu. ac. il/ ~negevcb/ RAGB/ .

publication date

  • January 1, 2016