The virus HIV-1 has a tiny genome. All of its nine genes fit on one single RNA molecule, and the organism’s entire library of genetic material consists of only 10 kilobases (for context, the human genome is around 3 million kilobases). But despite the virus’ small pool of genes, it is able to use a method called alternative splicing to produce many various proteins with different purposes. The RNA transcripts for these proteins are like individual words hidden in a wall of text, says Whitehead Institute Fellow Silvi Rouskin: “You cut and paste them [through alternative splicing], and then when you put them all together you have a sentence that makes sense.”
Since none of the HIV’s genes even encode the cellular machinery needed to “cut and paste” RNA — it hijacks its host’s materials for that — scientists are still working out exactly how every HIV molecule is able to control where it is spliced. Rouskin and others hypothesized that the conformation, or shape, of the RNA molecules might have something to do with this process. RNA sequences in the virus — even those with the exact same sequence of nucleotides — might curl and twist in different ways, leading to differences in how they are chopped up later to create transcripts for proteins. Now, in a study published May 6 in the journal Nature, Rouskin and coauthors suggest this hypothesis is correct — and introduce a new algorithm that can effectively identify and sort RNA molecules by shape.
To begin her investigation of HIV-1 RNA structures, Rouskin turned to a method she has developed over her past few years at Whitehead. The method, called DMS-MaPseq, involves tagging RNA molecules with tiny methyl groups. The methyl groups bind to unpaired bases along the RNA strand, which occur either on long straight stretches of exposed RNA, or in loops that form when complimentary sections bind to each other. These methyl groups can be detected because they lead to mutations when the RNA is reverse transcribed to DNA. Rouskin first introduced the technique in 2017 in a paper in Nature Methods.
In her new paper, Rouskin used DMS MaPseq to mark HIV-1 molecules with these mutations. Then, she and collaborators at Walter and Eliza Hall Institute of Medical Research and elsewhere designed an algorithm that uses sequencing data on where the mutations occurred to reveal the different ways the same RNA template can be shaped. For instance, if one base is mutated only half of the expected frequency, there are at least two shapes that the RNA sequence can assume — one conformation in which the base is exposed in a loop or open stretch, and another where it is securely bound to a complementary region on the RNA sequence.
Where older methods of determining RNA structure would assume every RNA molecule looked basically the same, Rouskin’s new algorithm considers the possibility that there might be many different conformations — and then sorts its results by what shape they are and the relative frequencies of each. This allows researchers to not just observe the frequency of known shapes, but also to discover new conformations. Another benefit of the algorithm, Rouskin says, is that unlike thermodynamic methods, which use mathematical models to calculate possible structures of RNA molecules, this algorithm can be used to analyze how they actually appear in living cells.
To validate the algorithm, Rouskin and her collaborators created their own RNA transcripts from a human gene to use as a test template. They picked a sequence that naturally assumes two different known conformations. They then mixed the two structures together and used DMS MaPseq to tag them with methyl groups and induced mutations. When they applied the new algorithm to sequencing data, it was able to correctly identify the two structures until the concentration of one fell to below 6 percent of the mixture.
Next, they returned to the HIV genome to see whether the method could be used both in vitro and on living viruses as they infected human cells. They first focused on a specific part of the HIV-1 RNA sequence which previous studies have shown to form structures with either four or five branching stems. When they tested the algorithm on a mixture of the two conformations, it was able to accurately gauge the relative prevalence of each. Another experiment on HIV-1 virus infecting human T cells revealed that the algorithm could also gauge the prevalence of the structures in vivo.
Then, they zeroed in on how the different structures might affect RNA splicing by examining one specific splice site. If the RNA strand was split at this site, it could go on to code for a protein called tat. If not, none of the protein could be made. The algorithm identified two conformations present in HIV-1-infected cells: one conformation left the splice site exposed for the host cell’s cutting machinery to snip the strand; the other hid it away in the molecule so the splicing molecules could not bind. When they tested whether mutations to the RNA that made the latter structure more prevalent, they observed a decrease in the amount of tat transcript the virus could produce. “Hiding or exposing those signals is a way for the virus to take control of it splicing,” Rouskin says.
The virus may also use alternate conformations to make sure that some molecules of RNA remain completely unspliced at all times — which ensures that there will be enough full copies of the viral genome to transmit as the virus replicates. This hypothesis will testable when sequencing technology allows researchers to analyze the structure of the entire HIV-1 RNA molecules at once (right now they must break the sequence into smaller chunks). If one RNA molecule can be observed hiding all of its splice sites at the same time, “that would be the killer,” Rouskin says. “We’re missing that one piece of data.”
The hypothesis does seem likely, though, based on the team’s findings on the heterogeneity of the HIV-1 genome as a whole: when they used their algorithm to assay the structures formed by HIV RNA, they found the HIV RNA was extremely variable, with at least two alternative structures for more than 90 percent of the sections of RNA that the team analyzed.
Being able to sort RNA molecules by their conformation also has applications for human RNA structure, says Rouskin. “What we're doing right now is basically taking those lessons that we've learned from viruses, and we're asking, ‘Is human RNA also doing the same thing?’” Rouskin says. “People have learned a lot of things from viruses and then realize the human cell is doing the same thing. We're very excited. And our preliminary results are suggesting that actually, yes, this is also a mechanism that human RNAs use too, to regulate their alternative splicing.”
Knowing more about this mechanism could one day be useful in a clinical setting, says Phil Tomezsko, a graduate student in the virology program at Harvard studying in Rouskin’s lab, and a co-first author on the paper. “There are some rare genetic diseases that can be either treated or potentially cured by changing splicing patterns of different genes,” he says. “More broadly than that, there are most likely lots of alternative splicing decisions that happen in complex diseases — and if we know that alternative RNA structure is a way that human splicing is regulated, it could open a lot of insight into different diseases that we don't really understand yet.”
The DREEM algorithm is also being used to study the novel coronavirus, Tomezsko says. “By applying this technique to study RNA structure in the SARS-CoV-2 replication cycle, we hope to find novel regulatory steps that could be targeted by antivirals,” he says.
Rouskin’s work on HIV-1 RNAs is supported, in part, by the National Institutes of Health, the Center of HIV-1 RNA Studies, the Smith Family Foundation, and the Burroughs Wellcome fund.