A team of Czech and German researchers sheds light on the mystery of de novo proteins

12. 04. 2023

Proteins do not usually arise from junk DNA, but when they do and take hold, they become part of the cell’s protein make-up. They are called de novo proteins because they emerge essentially from nothing, and little is known about them. In a new study published in Nature Ecology & Evolution, researchers from BIOCEV and their German colleagues characterised a large set of these proteins, helping to unravel the otherwise hard-to-decipher properties of these mysterious proteins.

The human genome is a tangle of DNA strands in which two kinds of sequence are interspersed: genes that code for functional proteins and sequences that do not, i.e. non-coding or junk DNA. Although non-coding DNA makes up most of the genome, its function is still understudied. Modern methods have revealed that even non-coding DNA can carry the information needed to synthesise proteins, the basic building blocks of the cells of living organisms. At first, such protein synthesis is unstable, serves no specific purpose, and is purely experimental.

If junk DNA does yield a protein that is either useful to the cell or at least a harmless experiment, the product can take hold, and the resulting so-called de novo protein becomes a permanent part of the cell’s protein make-up. While conventional evolutionary mechanisms work with pre-existing genetic material, de novo sequences emerge “from scratch” and may initially have no specific function. The mechanisms of their origin remain largely unknown, even though many of these proteins are linked to essential life functions and to pathological processes.

What are de novo proteins and how are they formed?

A team of researchers from the Faculty of Science at Charles University and the Czech Academy of Sciences, working at the BIOCEV Centre in Vestec near Prague and led by biochemist Klára Hlouchová, joined forces with the group of bioinformatician Erich Bornberg-Bauer from the University of Münster in Germany. The researchers compared 1,800 putative de novo proteins identified in the human and fly genomes with 1,800 completely random protein sequences. Because the random sequences have never been shaped by evolution, differences between the two sets can point to the properties that matter when de novo genes emerge. The study appears in the current issue of the prestigious journal Nature Ecology & Evolution.

While bioinformatic predictions of biophysical properties do not significantly distinguish between de novo protein sequences and their random counterparts, their experimental characterisation reveals noticeable differences.

De novo proteins exhibit higher solubility, which seems to allow them to integrate better into the cellular environment. “However, this means that sequences with higher structural disorder are preferentially selected during the emergence of de novo proteins,” notes study co-author Filip Buchel, a PhD student at the Faculty of Science of Charles University. “Higher structural content in proteins with no evolutionary history often goes hand in hand with a tendency to form insoluble aggregates that are toxic for cells.”

“All of this suggests that significant selection based on biophysical properties occurs at an early stage of de novo protein evolution,” adds Vyacheslav Tretyachenko, a former PhD student at the Faculty of Science of Charles University (now on a postdoctoral fellowship at the Weizmann Institute of Science in Israel).

This selection of new proteins based on their biophysical properties echoes the selection processes at the dawn of life on Earth that defined the basic properties of the earliest proteins, as the team led by Klára Hlouchová recently reported in the Journal of the American Chemical Society.

The large set of proteins required special methods

In this study, the researchers had to adapt traditional biochemical and biophysical characterisation methods so that they could work with the entire set of proteins at once and compare a large number of sequences in parallel. One of the key methods was mass spectrometry, which made it possible to detect individual sequences in the analysed samples. “Working on this project was very exciting because it allowed us to identify proteins that essentially do not exist. It was a proteomic conundrum of sorts,” adds Petr Novák from the Institute of Microbiology of the CAS. The study is the first experimental, rather than purely theoretical, characterisation of a large set of de novo proteins.

On the Czech side, the study was supported by the Primus and GAUK grants from Charles University.


Brennen Heames, Filip Buchel, Margaux Aubel, Vyacheslav Tretyachenko, Dmitry Loginov, Petr Novák, Andreas Lange, Erich Bornberg-Bauer, Klára Hlouchová. (2023) “Experimental characterization of de novo proteins and their unevolved random-sequence counterparts”. Nature Ecology & Evolution, DOI: 10.1038/s41559-023-02010-2


Contacts for Media

Markéta Růžičková
Public Relations Manager
+420 777 970 812

Eliška Zvolánková
+420 739 535 007

Martina Spěváčková
+420 733 697 112
