New method identifies protein-protein interactions on the basis of sequence alone (PNAS)

By Catherine Zandonella, Office of the Dean for Research

Researchers can now identify which proteins will interact just by looking at their sequences. Pictured are surface representations of a histidine kinase dimer (HK, top) and a response regulator (RR, bottom), two proteins that interact with each other to carry out cellular signaling functions. (Image based on work by Casino et al.; credit: Bitbol et al. 2016/PNAS.)

Genomic sequencing has provided an enormous amount of new information, but researchers haven’t always been able to use that data to understand living systems.

Now a group of researchers has used mathematical analysis to figure out whether two proteins interact with each other, just by looking at their sequences and without having to train their computer model using any known examples. The research, which was published online today in the journal Proceedings of the National Academy of Sciences, is a significant step forward because protein-protein interactions underlie a multitude of biological processes, from how bacteria sense their surroundings to how enzymes turn our food into cellular energy.

“We hadn’t dreamed we’d be able to address this,” said Ned Wingreen, Princeton University’s Howard A. Prior Professor in the Life Sciences, and a professor of molecular biology and the Lewis-Sigler Institute for Integrative Genomics, and a senior co-author of the study with Lucy Colwell of the University of Cambridge. “We can now figure out which protein families interact with which other protein families, just by looking at their sequences,” he said.

Although researchers have been able to use genomic analysis to obtain the sequences of amino acids that make up proteins, until now there has been no way to use those sequences to accurately predict protein-protein interactions. The main roadblock was that each cell can contain many similar proteins from the same family, called paralogs, and it wasn’t possible to predict which paralog from one protein family would interact with which paralog from another family. Instead, scientists have had to conduct extensive laboratory experiments, sorting through protein paralogs one by one to see which ones stick.

In the current paper, the researchers use a mathematical procedure, or algorithm, to examine the possible interactions among paralogs and identify pairs of proteins that interact. The method correctly predicted 93 percent of the protein-protein paralog pairs present in a dataset of more than 20,000 known paired protein sequences, without first being given any examples of correct pairs.

Interactions between proteins happen when two proteins come into physical contact and stick together via weak bonds. They may do this to form part of a larger piece of machinery used in cellular metabolism. Or two proteins might interact to pass a signal from the exterior of the cell to the DNA, to enable a bacterial organism to react to its environment.

When two proteins come together, some amino acids on one chain stick to the amino acids on the other chain. Each site on the chain contains one of 20 possible amino acids, yielding a very large number of possible amino-acid pairings. But not all such pairings are equally probable, because proteins that interact tend to evolve together over time, causing their sequences to be correlated.

The algorithm takes advantage of this correlation. It starts with two protein families, each with multiple paralogs in any given organism. The algorithm first pairs protein paralogs randomly within each organism and asks: do particular pairs of amino acids, one on each of the two proteins, occur much more or less often than expected by chance? Then, using this information, it asks, given an amino acid at a particular location on the first protein, which amino acids are especially favored at a particular location on the second protein, a technique known as direct coupling analysis. The algorithm in turn uses this information to calculate the strengths of interactions, or “interaction energies,” for all possible protein paralog pairs, and ranks them. It eliminates the unlikely pairings and then runs again using only the most likely protein pairs.
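
The overall flow of such an iterative pairing scheme can be sketched in a few lines of Python. The sketch below is only an illustration of the idea described above, not the authors' implementation: it scores couplings with simple log-odds statistics rather than the mean-field direct-coupling model used in the paper, matches paralogs greedily within each organism, omits the confidence filtering between rounds, and all function names and data structures are hypothetical.

```python
# Minimal, illustrative sketch of the iterative pairing idea (hypothetical names;
# NOT the authors' mean-field direct coupling analysis). Assumes all sequences in
# a family are pre-aligned to the same length and that each organism has the same
# number of paralogs in both families.

import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY-"             # 20 amino acids plus a gap symbol
A2I = {a: i for i, a in enumerate(AA)}

def coupling_scores(seqs_a, seqs_b, pseudocount=1.0):
    """Log-odds of joint amino-acid frequencies versus independence, for every
    (position in protein A, position in protein B) pair, estimated from the
    currently paired sequences."""
    La, Lb, q = len(seqs_a[0]), len(seqs_b[0]), len(AA)
    joint = np.full((La, Lb, q, q), pseudocount)
    for sa, sb in zip(seqs_a, seqs_b):
        for i, x in enumerate(sa):
            for j, y in enumerate(sb):
                joint[i, j, A2I[x], A2I[y]] += 1
    joint /= joint.sum(axis=(2, 3), keepdims=True)
    fa = joint.sum(axis=3, keepdims=True)        # marginals at positions of A
    fb = joint.sum(axis=2, keepdims=True)        # marginals at positions of B
    return np.log(joint / (fa * fb))             # >0 where co-occurrence is enriched

def interaction_energy(sa, sb, J):
    """Score a candidate pairing by summing couplings over all position pairs."""
    return sum(J[i, j, A2I[x], A2I[y]]
               for i, x in enumerate(sa) for j, y in enumerate(sb))

def iterate_pairings(family_a, family_b, n_rounds=5, seed=0):
    """family_a / family_b map organism -> list of aligned paralog sequences.
    Start from random within-organism pairings, then alternate between learning
    couplings from the current pairs and re-pairing paralogs in each organism
    by descending interaction energy."""
    rng = np.random.default_rng(seed)
    pairing = {org: list(zip(rng.permutation(family_a[org]), family_b[org]))
               for org in family_a}
    for _ in range(n_rounds):
        seqs_a = [a for org in pairing for a, _ in pairing[org]]
        seqs_b = [b for org in pairing for _, b in pairing[org]]
        J = coupling_scores(seqs_a, seqs_b)
        for org in pairing:
            remaining = list(family_b[org])
            new_pairs = []
            # Greedily give each A-paralog its best-scoring unused B-paralog.
            for sa in sorted(family_a[org],
                             key=lambda s: -max(interaction_energy(s, sb, J)
                                                for sb in remaining)):
                best = max(remaining, key=lambda sb: interaction_energy(sa, sb, J))
                new_pairs.append((sa, best))
                remaining.remove(best)
            pairing[org] = new_pairs
    return pairing
```

Under these assumptions, each round's statistics come only from the algorithm's own current pairing, which is what lets the procedure bootstrap itself without any known "gold standard" pairs.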

The most challenging part of identifying protein-protein pairs arises from the fact that proteins fold and kink into complicated shapes, bringing amino acids close to others that are far away in the sequence, and that amino acids can be correlated with one another through chains of interactions, not just when they are direct neighbors in three dimensions. Direct coupling analysis works surprisingly well at stripping away these indirect correlations and finding the true underlying couplings between amino acids that are actually in contact.

The work on the algorithm was initiated by Wingreen and Robert Dwyer, who earned his Ph.D. in the Department of Molecular Biology at Princeton in 2014, and was continued by first author Anne-Florence Bitbol, who was a postdoctoral researcher in the Lewis-Sigler Institute for Integrative Genomics and the Department of Physics at Princeton and is now a CNRS researcher at Université Pierre et Marie Curie – Paris 6. Bitbol was advised by Wingreen and Colwell, an expert in this kind of analysis who joined the collaboration while a member at the Institute for Advanced Study in Princeton, NJ, and is now a lecturer in chemistry at the University of Cambridge.

The researchers thought that the algorithm would only work accurately if it first “learned” what makes a good protein-protein pair by studying ones discovered in experiments. This required that the researchers give the algorithm some known protein pairs, or “gold standards,” against which to compare new sequences. The team used two well-studied families of proteins, histidine kinases and response regulators, which interact as part of a signaling system in bacteria.

But known examples are often scarce, and there are tens of millions of undiscovered protein-protein interactions in cells. So the team decided to see if they could reduce the amount of training they gave the algorithm. They gradually lowered the number of known histidine kinase-response regulator pairs that they fed into the algorithm, and were surprised to find that the algorithm continued to work. Finally, they ran the algorithm without giving it any such training pairs, and it still predicted new pairs with 93 percent accuracy.

“The fact that we didn’t need a gold standard was a big surprise,” Wingreen said.

Upon further exploration, Wingreen and colleagues figured out that the algorithm performed well because true protein-protein interactions are relatively rare. There are many pairings that simply don’t work, and the algorithm quickly learned not to include them in future attempts. In other words, there is only a small number of distinctive ways that protein-protein interactions can happen, and a vast number of ways that they cannot. Moreover, the few successful pairings were found to repeat with little variation across many organisms. This, it turns out, makes it relatively easy for the algorithm to reliably sort interactions from non-interactions.

Wingreen compared this observation – that correct pairs are more similar to one another than incorrect pairs are to each other – to the opening line of Leo Tolstoy’s Anna Karenina, which states, “All happy families are alike; each unhappy family is unhappy in its own way.”

The work was done using protein sequences from bacteria, and the researchers are now extending the technique to other organisms.

The approach has the potential to enhance the systematic study of biology, Wingreen said. “We know that living organisms are based on networks of interacting proteins,” he said. “Finally we can begin to use sequence data to explore these networks.”

The research was supported in part by the National Institutes of Health (Grant R01-GM082938) and the National Science Foundation (Grant PHY-1305525).


The paper, “Inferring interaction partners from protein sequences,” by Anne-Florence Bitbol, Robert S. Dwyer, Lucy J. Colwell and Ned S. Wingreen, was published in the Early Edition of the journal Proceedings of the National Academy of Sciences on September 23, 2016.
doi: 10.1073/pnas.1606762113

Ice cores reveal a slow decline in atmospheric oxygen over the last 800,000 years (Science)

Princeton University researchers used ice cores collected in Greenland (pictured here) and Antarctica to study 800,000 years of atmospheric oxygen. Image source: Stolper et al.

By Morgan Kelly, Office of Communications

Princeton University researchers have compiled 30 years of data to construct the first ice core-based record of atmospheric oxygen concentrations spanning the past 800,000 years, according to a paper published today in the journal Science.

The record shows that atmospheric oxygen has declined 0.7 percent relative to current atmospheric-oxygen concentrations, a reasonable pace by geological standards, the researchers said. During the past 100 years, however, atmospheric oxygen has declined by a comparatively speedy 0.1 percent because of the burning of fossil fuels, which consumes oxygen and produces carbon dioxide.
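
To put those two rates side by side (a back-of-the-envelope comparison using only the figures quoted above, not a number from the paper):

\[
\frac{0.7\%}{800{,}000\ \text{years}} \approx 0.00009\%\ \text{per century}
\qquad \text{versus} \qquad
0.1\%\ \text{per century today,}
\]

so the recent, fossil-fuel-driven decline is on the order of a thousand times faster than the long-term average pace recorded in the ice.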

Curiously, the decline in atmospheric oxygen over the past 800,000 years was not accompanied by any significant increase in the average amount of carbon dioxide in the atmosphere, though carbon dioxide concentrations do vary over individual ice age cycles. To explain this apparent paradox, the researchers called upon a theory for how the global carbon cycle, atmospheric carbon dioxide and Earth’s temperature are linked on geologic timescales.

“The planet has various processes that can keep carbon dioxide levels in check,” said first author Daniel Stolper, a postdoctoral research associate in Princeton’s Department of Geosciences. The researchers discuss a process known as silicate weathering in particular, wherein carbon dioxide reacts with exposed rock to produce, eventually, calcium carbonate minerals, which trap carbon dioxide in a solid form. As temperatures rise due to higher carbon dioxide in the atmosphere, silicate-weathering rates are hypothesized to increase and remove carbon dioxide from the atmosphere faster.
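
In its most schematic textbook form (the net reaction often used to summarize the process, shown here purely as an illustration rather than the detailed chemistry treated in the paper), silicate weathering converts a calcium silicate mineral and carbon dioxide into solid calcium carbonate plus silica:

\[
\mathrm{CaSiO_3} + \mathrm{CO_2} \;\longrightarrow\; \mathrm{CaCO_3} + \mathrm{SiO_2}
\]

Because the carbon ends up locked in the solid carbonate, the reaction acts as a long-term sink for atmospheric carbon dioxide, and it is expected to run faster in a warmer, higher-CO2 world.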

Researchers at Princeton University analyzed ice cores collected in Greenland (pictured here) and Antarctica to determine levels of atmospheric oxygen over the last 800,000 years. (Image: Stolper et al.)

Stolper and his co-authors suggest that the extra carbon dioxide emitted due to declining oxygen concentrations in the atmosphere stimulated silicate weathering, which stabilized carbon dioxide but allowed oxygen to continue to decline.

“The oxygen record is telling us there’s also a change in the amount of carbon dioxide [that was created when oxygen was removed] entering the atmosphere and ocean,” said co-author John Higgins, Princeton assistant professor of geosciences. “However, atmospheric carbon dioxide levels aren’t changing because the Earth has had time to respond via increased silicate-weathering rates.

“The Earth can take care of extra carbon dioxide when it has hundreds of thousands or millions of years to get its act together. In contrast, humankind is releasing carbon dioxide today so quickly that silicate weathering can’t possibly respond fast enough,” Higgins continued. “The Earth has these long processes that humankind has short-circuited.”

The researchers built their history of atmospheric oxygen using measured ratios of oxygen to nitrogen in air trapped in Antarctic ice. This method was established by co-author Michael Bender, professor of geosciences, emeritus, at Princeton.
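
As a rough illustration of how such ratio measurements translate into an oxygen history (a back-of-the-envelope sketch under stated assumptions, not the paper's analysis, which also corrects for effects such as gas loss during bubble trapping), the deviation of the trapped air's O2/N2 ratio from a modern-air standard can be converted to an approximate change in atmospheric oxygen by treating nitrogen as effectively constant:

```python
# Illustrative sketch only (not the paper's calculation). "per meg" means parts
# per million deviation of the O2/N2 ratio from a modern-air standard; if N2 is
# treated as constant, the fractional change in atmospheric O2 is roughly the
# fractional change in that ratio.

def o2_change_percent(delta_o2_n2_per_meg: float) -> float:
    return delta_o2_n2_per_meg * 1e-6 * 100.0   # per meg -> fraction -> percent

# Hypothetical example: air whose O2/N2 ratio sits 7,000 per meg below the
# modern standard implies roughly 0.7 percent less atmospheric oxygen.
print(o2_change_percent(-7000.0))               # ≈ -0.7
```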

Because oxygen is critical to many forms of life and geochemical processes, numerous models and indirect proxies for the oxygen content in the atmosphere have been developed over the years, but there was no consensus on whether oxygen concentrations were rising, falling or flat during the past million years (and before fossil fuel burning). The Princeton team analyzed the ice-core data to create a single account of how atmospheric oxygen has changed during the past 800,000 years.

“This record represents an important benchmark for the study of the history of atmospheric oxygen,” Higgins said. “Understanding the history of oxygen in Earth’s atmosphere is intimately connected to understanding the evolution of complex life. It’s one of these big, fundamental ongoing questions in Earth science.”


Daniel A. Stolper, Michael L. Bender, Gabrielle B. Dreyfus, Yuzhen Yan, and John A. Higgins. 2016. A Pleistocene ice core record of atmospheric oxygen concentrations. Science. Article published Sept. 22, 2016. DOI: 10.1126/science.aaf5445

The work was supported by a National Oceanic and Atmospheric Administration Climate and Global Change postdoctoral fellowship, and the National Science Foundation (grant no. ANT-1443263).