The Quadron methodology, developed with large-scale gradient boosting machines trained on experimental G4-seq data, is detailed and validated in the publication cited below. To use the program or read the detailed instructions, please refer to its GitHub page (see the left panel). In its current state, the Quadron model assesses the actual G-quadruplex formation propensity of all canonical sequence motifs (four tracts of G3+) with an extended (12-nt) maximum loop size. Such initial putative quadruplex sequence (PQS) motifs, regardless of their stringency (7- or 12-nt maximum loop size), include many sequences (~50%) that do not normally form G4 structures. Quadron helps to pinpoint those and to focus on the sequences that are capable of forming G4s. Quadron also accounts for the crucial role of flanking sequences in modulating G4 stability by processing 50 nt of flanking sequence at both the 5’ and 3’ ends of each PQS. Quadron has the best performance (TPR, TNR, FPR, FNR, FDR) of all existing methodologies on this large subset of quadruplex sequences. Developments are ongoing to extend the methodology to non-canonical sequences (~35% of all observed G4 structures in the human genome) with bulges in G-tracts.
Aleksandr B. Sahakyan, Vicki S. Chambers, Giovanni Marsico, Tobias Santner, Marco Di Antonio and Shankar Balasubramanian, “Machine learning model for sequence-driven DNA G-quadruplex formation”, Scientific Reports, 2017, 7:14545. The article along with the Supporting Information document can be accessed via: http://doi.org/10.1038/s41598-017-14017-4
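As an illustration of the canonical PQS definition given above (four tracts of G3+ with a 12-nt maximum loop size, plus 50 nt of flanking sequence on each side), the following is a minimal Python sketch. It is not Quadron's code and does not run the Quadron model. The function names are hypothetical, the minimum loop length of 1 nt is an assumption (the text states only the maximum), and only the given strand is scanned.

```python
import re

FLANK = 50  # flank length per side, as stated in the description above


def build_pqs_regex(max_loop=12, min_tract=3):
    """Regex for a canonical PQS: four G-tracts separated by three loops.

    Assumption: loops are 1..max_loop nt long and may contain any base.
    """
    tract = "G{%d,}" % min_tract
    loop = "[ACGTN]{1,%d}" % max_loop
    return re.compile(tract + "(?:" + loop + tract + "){3}")


def find_pqs(seq, max_loop=12, flank=FLANK):
    """Return non-overlapping PQS matches with their 5' and 3' flanks."""
    seq = seq.upper()
    hits = []
    for m in build_pqs_regex(max_loop).finditer(seq):
        start, end = m.span()
        hits.append({
            "start": start,
            "end": end,
            "motif": m.group(0),
            # Flanks are truncated near the ends of the input sequence.
            "flank5": seq[max(0, start - flank):start],
            "flank3": seq[end:end + flank],
        })
    return hits


if __name__ == "__main__":
    example = "A" * 60 + "GGGTTAGGGTTAGGGTTAGGG" + "C" * 60
    for hit in find_pqs(example):
        print(hit["start"], hit["end"], hit["motif"])
```

A scan of this kind yields only candidate motifs. According to the description above, roughly half of such candidates do not normally form G4 structures, which is the gap the Quadron model is meant to address.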