Promise in Unraveled Proteins
Okay, I know this is a long, dry read, but just think of the possibilities if they perfect this...
Deciphering The Message of Life's Assembly
Scientists See Promise In Unraveled Proteins
David Brown, MD, Washington Post Staff Writer
Living organisms put themselves together, all by themselves. No one reads the instructions and tucks tab A into slot B. But getting into the right shape can't happen just by chance. So where are the directions? And how do living things follow them?
These questions echo through all of biology: in the transformation of embryo to infant; in the complexity of organs; in the design of a single cell. The mystery of "self-assembly" extends all the way to the inanimate building materials of life -- the proteins.
Proteins -- hundreds of thousands of different ones -- are the biochemical molecules that make up cells, organs and organisms. Like the larger structures, proteins also put themselves together, in a process termed "folding." How they do that is called "the protein-folding problem," and it may be the most important unanswered question in the life sciences. First posed in 1935, it has been the undeciphered message, the unsolved equation, the unassembled jigsaw puzzle for three generations of scientists.
"At the interface between chemistry and biology, this is the major problem," said Frederic M. Richards, a chemist at Yale University who has been working on the problem for decades.
Now, the answers to the protein-folding question may be in sight. When it's complete -- and if it's right -- the solution will have practical consequences in medicine, drug development and agriculture. Its real significance, however, will lie elsewhere.
Solving the protein-folding problem isn't likely to create a revolution the way Watson and Crick's discovery of the structure of DNA transformed biology, or the way Einstein's theory of relativity transformed physics. But it will provide a profound new insight into life's basic units, and the evolutionary process that produced them.
The human body makes at least 50,000 different proteins, and possibly twice that many. They are the essential working parts of living matter. If a cell is thought of as a house, then proteins are just about everything in it. They are the furniture, the fixtures, the lumber.
Like those objects, each protein has a particular shape and function. The shapes and functions, in fact, are inextricably linked. Hemoglobin's shape lets it carry oxygen. Collagen's makes it a good connective tissue. Insulin fits in spaces like a key, enabling it to turn things on and off.
Two Days in the Life
Most proteins, however, are impermanent objects sojourning in the mostly watery world of living matter. The average protein survives only two days before it is broken down and its chemical parts are either recycled or excreted as waste. Consequently, cells must manufacture proteins around the clock, at a rate that is hard to imagine.
It takes less than a minute to make even the most complicated protein. A cell can make hundreds per second because it contains thousands of microscopic assembly sites, called "ribosomes," scattered throughout its interior space. Each cell manufactures tens of millions of protein molecules.
What's curious, though, is that despite their great variety, all proteins have a similar structure.
Proteins all begin as long, thin molecules that resemble strings of beads. Very few of them, however, do their work in that shape. Instead, they fold into globular or braided structures with distinct corners, bumps, grooves and planes that give them a three-dimensional identity. It's as if everything in your house were constructed from plastic, pop-together necklaces ingeniously combined, wrapped and twisted to make the stairs, windows and toasters.
The transformation happens quickly and spontaneously. It takes only a fraction of a second for a floppy chain of beads to fold into the shape it will keep for the rest of its working life.
How does that happen? How do the linear -- and, in some sense, one-dimensional -- structures of proteins carry the information that tells them to take on permanent three-dimensional shapes? Is it possible to study a protein chain and predict the folded shape it will take?
That is the protein-folding problem.
Two researchers at the Johns Hopkins University School of Medicine, George D. Rose and Rajgopal Srinivasan, have written a computer program that predicts how proteins will fold. The computer mimics nature in the sense that it doesn't possess intelligence. It doesn't know what it's looking for. It just applies a few basic rules of chemistry in a very particular way. And it comes up with the right answer six times out of seven.
The program isn't perfect, and Rose and Srinivasan haven't solved the protein-folding problem. But they are closer than anybody has been since the problem was first posed 60 years ago by chemist Linus Pauling.
"This is a very significant leap forward. It opens a whole new avenue of understanding," said Russell F. Doolittle, a professor of chemistry and biology at the University of California at San Diego.
Making Sense of Our Essence
The discovery comes at a fertile time in biology. Researchers around the world are engaged in the Human Genome Project, whose goal is to find and decode all 50,000-plus human genes. The great majority of genes encode instructions for making proteins. (As a general rule, there's a one-to-one correspondence between the two, with each gene carrying instructions for making one protein.) Using the latest gene-decoding techniques, scientists are "discovering" dozens of new proteins every week.
The problem is that very little -- and often nothing -- is known about these substances. Knowing what they look like in molecular detail will go a long way toward making sense of them. That's because once a protein's structure is known, it's often possible to deduce its function, or at least guess at it. That, in turn, can give a hint as to where to look for the protein in the living organism so that it can be studied in detail.
A solution to the protein-folding problem will also illuminate many "old" molecules.
Biologists know about the general function of thousands of proteins. They may know this one is an enzyme, that one a hormone, a third a piece of scaffolding. But that doesn't mean they know precisely how the substances work. Knowing a molecule's structure can shed light on the question. Both an ax and a handsaw can cut down a tree, but you have to get a look at the tools to say exactly how each does the job.
Three-dimensional information about proteins, however, is hard to come by. Getting it usually requires months of work, as well as specialized skills and equipment. If a fast, automated way of deducing a protein's shape existed, then scientists would get a peek at dozens of biological events that remain largely mysterious.
Such insights almost certainly would have practical effects on the development of drugs, most of which work by stimulating or blocking the action of proteins. Some researchers foresee the day when drugs are synthesized in the laboratory to interact with their targets in precisely calculated ways. Such "rationally designed" pharmaceuticals -- in theory, at least -- would have fewer side effects and greater potency than conventional ones.
A solution also would provide insights into the very heart of biology. Although proteins are the most abundant biological substances in the animal kingdom, researchers don't have a very good picture of what they're like in all their diversity. Contemporary protein scientists are a little like their brethren of the 16th century, who had heard rumors of exotic animals but didn't actually see them until voyages of discovery sent ships -- and collectors -- to distant lands.
It's hard to generalize about mammals or birds until you've seen many different kinds. Only then is it possible to make sense of similarities and differences. With enough samples in hand, however, it's possible to build systems of classification that bring order out of chaos and make further, subtler insights possible.
Proteins are evolution's humblest and most essential accomplishments. Their shapes are the oldest traces of the path life took from its inanimate beginning. Understanding the working parts of proteins -- and especially how the parts combined and changed, endowing their carriers with new capabilities -- will bring biologists as close as they're ever likely to get to evolution's footprints.
To understand what Rose and Srinivasan have done, you have to know a little about proteins.
These substances are made of compounds, called amino acids, that are chained together by a particular kind of linkage called a peptide bond. In the beads-on-a-string analogy, amino acids are the beads. Some proteins consist of strings of as few as a hundred amino acids. Others have thousands.
A Variety of Body and Chain
There are 20 amino acids available for stringing. Each has an identical eight-atom "body." Attached to the body is a tail -- in chemical parlance, a "side chain" -- that contains between one and 18 atoms, depending on which amino acid it is. Some side chains are short, others long. Some are linear, others circular. All contain hydrogen, most contain carbon, many contain oxygen, and a few contain nitrogen and sulfur.
This variety of molecular structure gives each amino acid a chemical "personality." Some prefer watery environments, others oily ones. Some have slightly positive or negative charges, and are capable of attracting or repelling their fellow molecules. Some are bulky, others petite.
This spectrum of behavior is one of the reasons tens of thousands of different proteins can be built from just 20 amino acids. But that isn't the only reason.
Just as important is the order in which amino acids are linked together to make the chain. This amino acid sequence is absolutely crucial to a protein's identity. Scrambling it destroys both structure and function. Sometimes changing even a single amino acid can make a big difference. An amino acid change at Position No. 6 in the 146-amino acid protein called beta-globin causes the devastating sickle cell anemia.
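A single-position change of this kind is easy to picture in code. The sketch below is only an illustration: the helper name `substitute` is hypothetical, and the eight-letter fragment is the start of mature beta-globin written in standard one-letter amino acid codes, where the sickle cell variant replaces glutamic acid (E) with valine (V) at Position No. 6.

```python
def substitute(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of `sequence` with the residue at 1-based `position` replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    index = position - 1  # convert the 1-based position to a 0-based index
    return sequence[:index] + new_residue + sequence[index + 1:]


# First eight residues of mature beta-globin (one-letter codes).
normal_start = "VHLTPEEK"
# Sickle cell variant: E (glutamic acid) -> V (valine) at position 6.
sickle_start = substitute(normal_start, 6, "V")
print(normal_start, "->", sickle_start)
```

The other 145 positions are untouched, yet the one swap puts an oil-loving amino acid where a charged, water-loving one used to be.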
More than 30 years ago, a chemist named Christian B. Anfinsen, working at the National Institutes of Health, proved that a protein's "knowledge" of how to fold is stored in its sequence of amino acids.
Proteins can be unraveled, or "denatured," by heat and certain chemicals. Anfinsen denatured a protein called ribonuclease, which contains 124 amino acids, and, through chemical analysis and experiments, he showed it had lost both its native shape and function. He then removed the denaturing substance and ribonuclease's function returned. Using laborious chemical analysis, supplemented by deductive reasoning, Anfinsen proved the protein must have regained its original shape as well. Anfinsen, who died in May at the age of 79, won a Nobel Prize in chemistry in 1972 for this work.
What Anfinsen's experiments didn't show, however, was how the information "drives" a chain of amino acids to fold into one and only one shape.
Seeking the State of Minimum Energy
In recent decades, most chemists have concluded that the answer lies in the concept of a protein's "energy state." Energy state, for complicated molecules, is a little like comfort for human beings.
Imagine you are camping on a beach. Before falling asleep, you roll around and try out many different positions. The goal is to find the best arrangement of the body's contour and the sand's contour. You try to put yourself into the shape that causes the least pain and permits the most relaxation of your muscles. When you find it, you stay put. You're at a low "energy state." You're stable.
Similarly, molecules orient themselves in their environment in order to be comfortable. Most proteins, for example, fold into roughly globular shapes, with the water-loving amino acids on the surface and the oil-loving ones inside. For every protein chain, there is theoretically some combination of twists, turns and bends that puts it in a minimum "energy state" -- its most comfortable and stable position.
The dominant strategy in solving the protein-folding problem has been to find an amino acid chain's state of minimum energy. According to this theory, the shape that yields the lowest energy state must be a protein's natural shape, or, as chemists call it, its "native conformation."
Rose and Srinivasan, however, took another tack. They focused on the process that chains of amino acids undergo as they fold, rather than the state at which they end up.
Specifically, the researchers theorized that as a protein folds, its amino acids are likely to be most affected by what's happening right around them in the chain. These "local" interactions occur early in the folding process. They quickly determine a chain's fate. They eliminate all kinds of twists and turns that might -- theoretically -- have been possible later.
In the string-of-beads metaphor, consider that the beads are capable of attracting, repelling or hooking one another. If you start bending a string of such beads, the interactions are more likely to happen first between beads at the bend (which are close to each other) than between beads far apart, say, near the ends of the string. These early "local" interactions, in fact, will determine which more distant interactions are possible, and which are impossible.
Rose and Srinivasan theorized that as a chain of amino acids becomes more bent, curled and looped, the number of additional bends, curls and loops that are still possible diminishes. Its folding "paths" become more and more limited, until ultimately there is only one -- and it leads inevitably to the native conformation.
This theory was the crucial insight. Rose and Srinivasan termed it "hierarchic condensation." It lies at the heart of the computer program.
The new theory doesn't overthrow the classical one. A protein's energy state is still of utmost importance. A folded protein always gets to a relatively low energy state. But because of the step-wise route it takes, the chain doesn't necessarily reach the absolute lowest -- the most "comfortable" -- energy state possible, which is where the classical theorists believe it ends up.
Simple Rules of Attraction, Repulsion
The computer program, however, had to contain more than simply a process for folding the amino acid chain. It also had to specify how the amino acids could interact.
In nature, there's a huge assortment of attractive and repulsive forces that atoms can exert on one another. But Rose and Srinivasan theorized that protein folding, no matter how it worked, had to be pretty simple. So they stripped the chemical rules of their program down to the bare minimum.
First, they specified that no two amino acids could occupy the same space at the same time. Second, they said that oil-loving amino acids would tend to bury themselves inside the folded protein, while water-loving amino acids would tend to be on the protein's surface, which is usually surrounded by water. The third rule was that the amino acids would be permitted to interact with one another through "hydrogen bonds," which are weak chemical attractions that, multiplied many-fold, help linear molecules twist or lie in a regular way.
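The three rules can be written down as a toy scoring function. What follows is a sketch under stated assumptions, not the researchers' program: the bead coordinates, the distance cutoffs, the set of "oil-loving" letters and every function name are hypothetical choices made for illustration.

```python
import math
from itertools import combinations

# Assumed set of oil-loving amino acids (one-letter codes), for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def has_clash(coords, min_separation=3.0):
    """Rule 1: no two beads may occupy the same space."""
    return any(math.dist(a, b) < min_separation for a, b in combinations(coords, 2))


def burial_energy(sequence, coords):
    """Rule 2: oil-loving beads score better near the centre, water-loving ones near the surface."""
    n = len(coords)
    centre = tuple(sum(point[axis] for point in coords) / n for axis in range(3))
    energy = 0.0
    for residue, point in zip(sequence, coords):
        d = math.dist(point, centre)
        energy += d if residue in HYDROPHOBIC else -d
    return energy


def hydrogen_bond_energy(coords, bond_pairs, max_length=3.5, reward=-1.0):
    """Rule 3: each listed pair that is close enough to bond lowers the energy."""
    return sum(reward for i, j in bond_pairs if math.dist(coords[i], coords[j]) <= max_length)


def toy_energy(sequence, coords, bond_pairs=()):
    """Combine the three rules; an overlapping shape is simply disallowed."""
    if has_clash(coords):
        return math.inf
    return burial_energy(sequence, coords) + hydrogen_bond_energy(coords, bond_pairs)


# Four beads on a line: the two inner positions are "buried", the two outer ones "exposed".
line = [(-6.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(toy_energy("KLLK", line), toy_energy("LKKL", line))
```

With the oil-loving leucines (L) inside, the score is lower than with them outside, which is the preference the second rule is meant to capture.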
With the one principle and the three rules, the researchers tried out their computer simulation.
To simplify things, they used 50-amino acid pieces of known proteins, instead of whole proteins. They fed the amino acid sequence -- Position No. 1 through Position No. 50 -- into LINUS, their computer program, and asked it to fold up the chain. They then compared the result to the actual structure, which was known from X-ray crystal analysis.
The simulation works like this:
LINUS rotates three consecutive amino acid beads in the chain by a random amount. This kinks the entire chain, sometimes a lot and sometimes a little. The computer then evaluates each amino acid's relationship to those near it -- specifically to the six on its left and the six on its right. Most of the time, these randomly generated shapes are physically impossible, and the computer moves on, rotating the next three beads. When an "allowable" shape arises, however, the program calculates its energy.
The computer keeps the chain in its previous shape unless the new shape it creates has a lower energy. When that happens, the new shape is adopted and the old one is rejected. This random trying out of poses goes on until the last three beads are rotated. The pose -- or conformation -- that remains at the end of a run is then stored in LINUS's memory.
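One pass of this try-a-pose, keep-it-only-if-better loop can be sketched as follows. This is a simplified stand-in, not the LINUS code: the chain is drawn in two dimensions, the "energy" is just a measure of compactness, the three-bead window is assumed to slide one bead at a time, and the whole chain is checked for overlaps rather than a neighborhood of six beads on each side. All names are hypothetical.

```python
import math
import random


def build_chain(angles):
    """Place unit-length bonds in 2D; each angle is a turn relative to the previous bond."""
    x = y = heading = 0.0
    points = [(x, y)]
    for turn in angles:
        heading += turn
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
    return points


def is_allowed(points, min_separation=0.9):
    """Reject physically impossible shapes in which non-adjacent beads overlap."""
    for i in range(len(points)):
        for j in range(i + 2, len(points)):
            if math.dist(points[i], points[j]) < min_separation:
                return False
    return True


def compactness_energy(points):
    """Stand-in energy: summed squared distance from the centre (lower means more compact)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points)


def one_sweep(angles, rng, max_turn=math.pi / 2):
    """Walk along the chain once, randomly rotating three consecutive beads at a time."""
    current = list(angles)
    current_energy = compactness_energy(build_chain(current))
    for start in range(len(current) - 2):
        trial = list(current)
        for k in range(start, start + 3):
            trial[k] += rng.uniform(-max_turn, max_turn)
        points = build_chain(trial)
        if not is_allowed(points):
            continue  # impossible shape: move on to the next three beads
        trial_energy = compactness_energy(points)
        if trial_energy < current_energy:  # keep the old shape unless the energy drops
            current, current_energy = trial, trial_energy
    return current, current_energy


rng = random.Random(0)          # fixed seed so the run is repeatable
start_angles = [0.0] * 20       # a fully extended chain
final_angles, final_energy = one_sweep(start_angles, rng)
print(compactness_energy(build_chain(start_angles)), "->", final_energy)
```

Because a new shape is adopted only when its energy is lower, the energy at the end of a sweep can never be higher than at the start.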
LINUS walks up and down the amino acid chain 6,000 times, creating 6,000 final poses. This is only a fraction of the possible conformations a chain of amino acids can take, but statistically it's a big enough sample to explore the "territory" of possibilities. LINUS then sits back and takes a look.
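To get a feel for how small a fraction 6,000 poses is, assume purely for illustration that each amino acid in a 50-bead chain can take just three local shapes. That figure is an assumption made here, not a number from the researchers.

```python
STATES_PER_RESIDUE = 3   # assumed, for illustration only
CHAIN_LENGTH = 50        # the 50-amino acid fragments described above
SAMPLED_POSES = 6_000    # final poses collected per round

total_conformations = STATES_PER_RESIDUE ** CHAIN_LENGTH
fraction_sampled = SAMPLED_POSES / total_conformations

print(f"possible conformations: {total_conformations:.2e}")
print(f"fraction sampled:       {fraction_sampled:.2e}")
```

Even under this deliberately modest assumption the chain has on the order of 10^23 possible shapes, so the sample has to be informative rather than exhaustive.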
Frozen Snippets in a Chain
The computer searches for amino acids that persistently group together, and that satisfy the oil-or-water and hydrogen bonding rules. It looks for patterns that keep coming up. If a given relationship appears in more than 70 percent of the "poses," that snippet of structure is "frozen" into the chain. It won't be allowed to change in any later steps of the computer program.
The whole process then repeats itself. However, two things are different this time. First, the chain has some parts that are not allowed to move freely -- the snippets locked in from the first round. The second difference is that the computer program assesses a larger neighborhood of amino acids. Instead of looking at stretches six beads long, it looks at stretches 12 beads long. This goes on and on, until the "viewing" interval is 48 amino acids, essentially the length of the whole 50-amino acid chain.
Each round greatly limits the possibilities of the next one. Bigger parts are assembled from smaller parts, leading to a structure of increasing complexity. In this way, the protein folds in a "hierarchical" manner.
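The freeze-and-widen cycle can be sketched in a few lines. The representation is an assumption made for illustration: each pose is a string with one letter per amino acid ("H" for helix, "E" for sheet, "T" for turn, "-" for no regular structure), and the neighborhood is assumed to double each round, which is consistent with the 6, 12 and 48 mentioned above. The function names are hypothetical.

```python
from collections import Counter


def frozen_positions(poses, threshold=0.70):
    """Return {position: state} where one structured state appears in more than
    `threshold` of the poses."""
    frozen = {}
    n_poses = len(poses)
    for position in range(len(poses[0])):
        state, count = Counter(pose[position] for pose in poses).most_common(1)[0]
        if state != "-" and count / n_poses > threshold:
            frozen[position] = state
    return frozen


def window_schedule(first=6, last=48):
    """Yield the neighbourhood size for each round, doubling every time."""
    size = first
    while size <= last:
        yield size
        size *= 2


# Four made-up final poses for a five-bead snippet.
example_poses = ["HHHT-", "HHHTE", "HHH-E", "HHET-"]
print(frozen_positions(example_poses))
print(list(window_schedule()))
```

In the example, positions 0 through 3 agree in at least three of the four poses and are locked in, while the last position, split evenly between two states, stays free for the next round.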
In practice, LINUS predicts two levels of protein structure.
Think of a folded protein as a telephone cord piled on a desk. You can see two kinds of shapes -- the coils and the bends and folds of those coils within the pile.
In proteins, the tightly wound coils are in a "secondary" conformation, and the less regular loops of the coiled cord are in a "tertiary" conformation. In proteins, there are several possible secondary conformations -- they're called "helix," "sheet" and "turn" -- and a nearly unlimited range of tertiary conformations. Each is essential to a protein's shape. LINUS comes up with both.
It takes the program four or five days of continuous computing to reach a final answer. LINUS predicts secondary conformation with 99 percent accuracy. The tertiary conformation was recognizable -- though not perfect -- in six of the seven proteins tried. (And the researchers think they know why the seventh one didn't work.)
Rose, a biophysical chemist, and Srinivasan, an organic chemist and the creator of the computer program, don't claim that proteins in nature go through the exact steps that LINUS puts them through. But they are reasonably sure that molecules fold in a hierarchical manner. It's really the only way to ensure that a chain of amino acids can achieve the same shape each time.
It's theoretically possible to make chains of amino acids that fold into more than one shape. It's even possible the shapes will be stable enough that the chain won't unfold.
Why Evolution Favors Parsimony
What would the living world be like if a protein could take many shapes?
The answer, almost certainly, is that the living world simply wouldn't be. Evolution could not have occurred. Things would have been just too complicated. It would be as if every Ford Taurus coming off an assembly line could -- by chance -- have five or six different sizes of spark plugs and tires and doors and radiator hoses. Such machines wouldn't work very well. In the case of living organisms, they wouldn't even come into being.
Nature favors "parsimony," which is to say, things and procedures that are no more complicated than they have to be. Proteins that fold into a variety of stable shapes violate the principle of parsimony.
On the other hand, chains of amino acids that just happen to fold into one particular shape or another are useful raw materials for evolution. That only occurs, Rose theorizes, in chains with a particular amino acid sequence that permits a folding process to occur in small, hierarchical steps. And, in fact, relatively few chains -- of the nearly infinite number that could be constructed -- have amino acid sequences that give the chain that property. Chains whose amino acid sequences allow them to fold into many shapes, or not do any folding at all, aren't useful. They are discarded by the side of the evolutionary road. They never become what we call proteins.
If the LINUS program proves to be accurate in further tests, scientists may be able to determine the three-dimensional structures of many proteins in a relatively short period of time. The shapes are known for fewer than a thousand proteins now. Increasing that number tenfold would give biologists a whole new look at this astonishing family of chemical substances.
Today, there is no consensus among biologists as to how many general types of proteins exist in the natural world. Some scientists believe there are thousands, others think there are fewer than a hundred. Having a large library of structures to study would make the guesses more educated.
More important, scientists might be able to detect structural relationships between seemingly unrelated proteins. This is an essential step in tracing a protein's genealogy back to common and extinct "ancestor" molecules in the primordial world.
Such relationships now are deduced largely by finding similarities either in the amino acid sequences or in the DNA that provides the genetic information for those sequences. Over the eons, however, both of those can -- and usually do -- change through mutation. It's often hard to tell, long after the birth of life, whether two present-day proteins are truly related or just similar by coincidence.
This problem is like the one linguists face when they try to determine the relationship between languages.
Over thousands of years, two languages that descended from a common, extinct ancestor may have lost most of their shared words, idioms and pronunciation. They may appear to be entirely distinct tongues. But if beneath their differences they share a common grammar -- a common working structure -- then a linguist may be able to conclude the languages are related.
Similarly, if two proteins with marginally similar DNA or amino acid sequences have nearly identical three-dimensional structures, then they are likely to be true relatives. When such connections are made, it becomes possible to map out a family tree of proteins, which in understanding evolution is as important as a family tree of plants or animals.
Biologists may eventually be able to recognize repeated substructures -- runs of helix, sheet and turn of given lengths -- that have mixed together in varying combinations to make up the vast variety of proteins we know today. These substructures will be the "elementary particles" of life, living fossils from the dawn of life.
"We will never by sequence comparison be able to see the earliest proteins," Russell Doolittle, an evolutionary biologist, said recently. "But by three-dimensional structure -- it's a three-dimensional world, after all -- we should be able to root many modern proteins back to the ancient ones."
Rose and Srinivasan's work on the protein-folding problem was published in June in the journal Proteins: Structure, Function and Genetics. It was financed in part by the National Institute of General Medical Sciences at the National Institutes of Health.