Humans and chimpanzees split from a common ancestor roughly 6 million years ago, when we set off down separate branches on the evolutionary tree of life. Humans continued to birth completely new genes after that split, some of which arose from regions of the genome long thought to be “junk,” a new study highlights.
In the new research, which was published Tuesday (Dec. 20) in the journal Cell Reports, scientists scoured the human genome for evidence of brand-new genes being “born.” Specifically, they looked for so-called de novo genes that don’t arise through the usual process, in which genes pick up letter changes, or mutations, as cells make copies of their DNA. This modified DNA gives rise to versions of proteins that differ from those made from the original version of the gene.
In contrast, de novo genes spontaneously arise from snippets of DNA that don’t code for proteins but may code for molecules that switch genes “on” and “off” or perform other functions in the cell. Thus, when de novo genes code for proteins, they’re essentially developing that code “from scratch,” rather than iterating on protein-coding DNA that already existed in the cell.
The new study revealed 155 of these made-from-scratch human genes that all code for tiny proteins, or microproteins, many of which contained fewer than 100 amino acids, the building blocks of proteins. “We found two that are strictly human-specific,” meaning they didn’t appear in any of the other animal genomes studied, first author Nikolaos Vakirlis, a junior investigator at the Alexander Fleming Biomedical Sciences Research Center in Athens, Greece, told Live Science. These two genes appeared after humans split from chimps.
Early data from lab dish experiments hint that at least 44 of these 155 puny proteins — including the two human-specific ones — may play important roles in cell growth, but this will need to be verified in future studies. “The question is whether that effect that we see at the cell culture level translates to something real at the organism level,” Vakirlis said.
Vakirlis and his team started their hunt for de novo genes in a publicly available data set. First released in 2020 and described in the journal Science, the data set contains information on hundreds of short snippets of DNA that code for microproteins. These DNA snippets are considered “noncanonical,” meaning their building blocks line up in unusual sequences not typically seen in protein-coding genes. The team behind the data set also ran experiments to see whether these microproteins fulfill important roles in cells and found that some seem to be key for cell growth, at least in lab dishes.
“Without that dataset, a study like the one we did would be impossible,” Vakirlis told Live Science. Historically, scientists considered such supershort DNA sequences and the teensy proteins they encode to be largely unimportant — insignificant in comparison with large, more familiar proteins, he noted. That notion has since been challenged, now that modern methods allow scientists to more easily study microproteins and their associated DNA, he said.
With the rich data set in hand, the team worked backward to estimate when each snippet of microprotein-coding DNA was first introduced to humans’ evolutionary lineage. To do so, they looked for the same DNA snippets in the genomes of 99 other vertebrate species, including chimps, gorillas, horses, alligators and platypuses. “We know the phylogenetic relationships between these animals; we know that human and chimp are closer together than human and gorilla, etcetera,” Vakirlis said.
Taking these relationships into account, the team used computational methods to roll back the evolutionary clock and determine which human ancestor first carried each microprotein-coding gene. They could then look back to earlier ancestors that didn’t carry the gene and see whether that gene likely originated de novo — from non-protein-coding sequences.
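To make that logic concrete, here is a minimal Python sketch of the general idea, not the study's actual pipeline. It assumes a simplified list of species ordered by how distantly they are related to humans, and it uses made-up presence calls purely for illustration. The function names, the species list and the input dictionaries are all hypothetical.

```python
# Toy sketch: date a gene by the most distant relative that still carries it.
# The species list and all presence calls are illustrative assumptions,
# not data or methods taken from the study.

# Species ordered from closest to most distant relative of humans.
LINEAGE = ["human", "chimpanzee", "gorilla", "horse", "alligator"]


def oldest_carrier(has_orf):
    """Return the most distant species in LINEAGE with an intact coding sequence.

    has_orf maps a species name to True if the microprotein-coding
    sequence is intact in that species. Closer species that lack it are
    treated as later losses, which is a simplification.
    """
    oldest = None
    for species in LINEAGE:
        if has_orf.get(species, False):
            oldest = species
    return oldest


def looks_de_novo(has_orf, has_region):
    """Flag a gene as a de novo candidate.

    A gene is a candidate if at least one species more distant than the
    oldest carrier still has the matching stretch of DNA (has_region)
    but no intact coding sequence in it.
    """
    oldest = oldest_carrier(has_orf)
    if oldest is None:
        return False
    outgroups = LINEAGE[LINEAGE.index(oldest) + 1:]
    return any(
        has_region.get(s, False) and not has_orf.get(s, False)
        for s in outgroups
    )


# Hypothetical example: coding sequence intact only in humans, while the
# chimpanzee genome has the matching region in a noncoding form.
human_only = {"human": True}
region_hits = {"chimpanzee": True, "gorilla": True}
print(oldest_carrier(human_only))              # oldest species carrying the gene
print(looks_de_novo(human_only, region_hits))  # whether it is a de novo candidate
```

A real analysis would work on a full phylogenetic tree and sequence alignments rather than a flat list, but the sketch captures the two questions the team asked: how far back does the gene go, and was the matching DNA noncoding before that?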
In addition, the team looked at data from most of the 100 species to see which genes are actually switched on in different animals and, therefore, are actively used to make proteins. “If it’s not expressed, it won’t do anything,” Vakirlis said.
Some of the 155 de novo genes in the human genome date back to the origin of mammals, while others appeared much more recently, the study suggests.
The research does have some limitations, however. For example, gene expression data weren’t available for all 100 species, which raises some uncertainty as to when each gene became active within the human lineage. There’s also some uncertainty as to whether the 44 genes flagged as important to cell function in petri dishes actually make a difference in living organisms, Vakirlis said.
On that point, though, there’s “probably a few false positives, but a lot more false negatives, if I had to guess,” he noted. In other words, there are likely some microproteins that appeared unimportant to cell growth in the initial lab dish studies but whose true functions have yet to be revealed — “which means there is much more to discover,” he said.