How do I read ENCODE genome data

Genome digital

Then I discovered a toad with quite extraordinary coloring. It looked as if it had fallen into the inkwell. Completely black. But its belly and the soles of its feet glowed scarlet.

The naturalists roamed the world with open eyes. There was a lot to discover - back in the 19th century. And today? There are supposed to be biologists who cannot distinguish a deer from a stag.

I carried the ugly animal to a pool of water, assuming I was doing it a great favor. But the toad couldn't swim. Without my help, it would have drowned.

Anyone who wants to understand nature needs information: knowledge of how an organism looks and functions. Today's researchers find this information in a single cell. They investigate and decipher the genetic make-up: the sequence, that is, the order of the individual genetic building blocks. In humans, three billion base pairs make up the entirety of the genetic information: the genome.

"We are trying to extract all kinds of information."

The center of modern biology is the American university city of Cambridge near Boston. Here Eric Lander founded the Broad Institute, today one of the largest facilities for genome research in the world. Lander:

"We try to get all kinds of information. We compare the genetic makeup of humans and monkeys, dogs, elephants and rabbits."

Eric Lander is someone who leads the way. He sets the trends for modern biology. He has influence. US President Barack Obama appointed him to his personal advisory team.

"There are biochemical differences between a human and a chimpanzee. Or a human and a dog. That is what genetics is about. We love to find such differences and then explain them."

I want to tell you about two remarkable lizards that I discovered on the island. One lives in the water, or not far from it. It has a flat tail and webbed toes. In the water, it moves swiftly and gracefully. On land it dozes lazily in the sun. The other lizard lives in the interior of the island. It has no webbed feet and its tail is round. It crawls slowly, its tail and belly dragging across the ground.


Fruit fly, roundworm, mouse and rat. Chimpanzee, gorilla, human, chicken and opossum. Everything that lives is being busily sequenced. And even extinction is no protection against having one's genetic make-up deciphered. The mammoth and the thylacine are in the works, the Neanderthal in any case.

In Hinxton, a small English village near Cambridge. On the Genome Campus at the Sanger Institute. Two people ensure that the machines work day and night. The devices are about the size of refrigerators. They always stand in pairs, one on top of the other. There are 75 of these sequencing machines in total. Ian Guthrie is the engineer on duty. He opens one of the machines.

"The DNA is prepared and passed to us in a plate which contains 394 samples."

The DNA samples are first prepared and pass through thin tubes onto small plastic plates. One plate for 394 samples. And then the hereditary molecule is read out letter by letter - fully automatically.

"Fairly simple in principle, but expensive to work correctly and sure."

In principle simple - but expensive because everything has to run absolutely correctly and reliably, says engineer Ian Guthrie.

The heyday of genome research began in the mid-1990s. After a few bacteria had been fully sequenced, the Human Genome Organisation (HUGO) decided to decipher the entire human genome. Fifteen years had been allotted for the mammoth project. But it went much faster, because the technology was continuously improved. Many crucial improvements came from small companies. The company "Illumina" is now one of the market leaders. David Bentley is its chief developer.

"The completion of the human genome by 2003 was achieved with a reliable, well-tested technology. Very precise - but also very expensive. In order to benefit from the genome data, we now want to sequence as many personal genomes as possible and then compare them with the reference genome. This was not financially feasible with conventional technology. New methods were needed - a completely new approach."

David Bentley envisions the sequencing of the future as reading the hereditary molecule directly, without the DNA first having to be dissolved and amplified in liquid as before. Bentley:

"With the new concept of DNA sequencing, we operate on a microscopic scale. Many biochemical reactions take place at the same time. We read 32 million DNA fragments in one pass. That is how we collect the information from a billion bases. A third of the human genome in three days with one machine. And everything takes place on a small slide."


Sequencing a single genome no longer costs billions of euros, as the Human Genome Project did, but it still costs around $100,000. As a distant goal, the US National Institutes of Health (NIH) have stated that one day sequencing a genome should cost only $1,000. Bentley:

"The next step would be sequencing a genome for about $10,000. Then you could start large research programs in which the genomes of many individuals would be sequenced. So there is still a long way to go to the $1,000 genome. But the pace of invention in this field is enormous. From today's perspective, an explosion is to be expected."

In January 2009, the California-based company "Complete Genomics" announced that it would use a new method to bring the price of a genome down to $5,000 by the summer. The race for the "genome for everyone" has begun.

I waited until the fertilized newt egg had divided into eight cells. I could see them clearly under the microscope. Then the big moment came. I plucked a hair from my temple and used it to tie off the embryo in the middle. So two embryos lay under my microscope. I was amazed when two fully functional organisms actually matured.

When the Human Genome Project began, the information in a human genome could only be held on mainframes. Today anyone could store their personal genetic information on the hard drive of a notebook. Interpreting the data, however, is still difficult. That is a task for human geneticists like Marcus Pembrey of University College London.
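Why a genome fits on a notebook can be estimated in a few lines. The sketch below assumes the most compact plain encoding, two bits for each of the four letters A, C, G and T, and compares it with one byte per letter as in a text file. Both encodings and the function names are assumptions for illustration; real file formats carry additional information and come out larger.

```python
import math

GENOME_BASES = 3_000_000_000  # three billion base pairs, as stated in the text
ALPHABET = "ACGT"             # the four DNA letters


def bits_per_base(alphabet: str) -> int:
    """Minimum whole number of bits needed to tell the letters apart."""
    return math.ceil(math.log2(len(alphabet)))


def storage_bytes(n_bases: int, bits: int) -> int:
    """Bytes needed for n_bases at `bits` bits each, rounded up."""
    return (n_bases * bits + 7) // 8


if __name__ == "__main__":
    packed = storage_bytes(GENOME_BASES, bits_per_base(ALPHABET))
    as_text = storage_bytes(GENOME_BASES, 8)  # one byte per letter
    print(f"packed:  {packed / 1e6:.0f} MB")
    print(f"as text: {as_text / 1e9:.0f} GB")
```

Under these assumptions the genome takes 750 megabytes in the compact encoding and 3 gigabytes as plain text.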

"Such a personal genome is a huge amount of data. It has to be interpreted and made understandable to the individual. What many do not know: the genome does not say whether someone is necessarily ill, but rather about genetic variants. Statistically, they increase that Risk of developing a certain disease like Alzheimer's. That means the risk increases slightly. Even a doubling of the risk says little. Nothing is predetermined. "

This information is of little benefit to the individual at first. Pembrey:

"Do not forget: genetic data in itself is not meaningful information. It is not the case that the doctor has your genetic material sequenced and then prescribes the right pills for you. The sequence only reveals that you have an increased risk of diabetes, for example. You still have to do a blood sugar test. That is a laboratory test. It tells me whether you are really diabetic. Both can be useful: for the prevention of diseases and for their treatment."

Scientists are not interested in the genetic make-up of individual people. They want to know as many genomes as possible in order to be able to compare: What distinguishes an asthmatic from a non-asthmatic, a diabetic from a healthy person? Other comparisons can already be made: What is the difference between a cancer cell and a healthy cell in the same person? Because not all cancer is the same. This information is being collected in a global cancer genome project. Eric Lander, the director of the Broad Institute in Cambridge, Massachusetts:

"We treat patients with lung cancer as if they all had the same disease. But in reality these are diseases with different genetic causes. It is the same with brain tumors, the glioblastomas. We pretend they are fundamentally the same. Yet we know that they are caused by different mutations."

The type and location of the mutations determine how aggressive a tumor is. They also tell the doctor which therapy promises particular success. The cancer genome project is still in its infancy. Eric Lander believes this project will change cancer medicine.

"If one of my relatives had cancer, I would use any information I could get. Even if it were only of limited significance. Better than nothing. And so I believe that it will soon be routine in rich countries for a cancer patient to have their tumor sequenced."

Thanks to the radiation, an unusual variety of forms appeared in Drosophila melanogaster. Some flies showed altered eyes or body colors, the bristles changed their texture. Numerous flies were deformed. Sometimes parts of the body turned up in unfamiliar places: legs instead of antennae, or wings instead of halteres. The Drosophila flies looked different, but they still mated.

Diabetes, Alzheimer's, rheumatism, depression, heart disease, allergies. These are complex diseases. Genes play a role, but so does personal lifestyle. How these factors interact is still largely unknown. The first step is to find the genes involved. The search is in full swing. Eric Lander:

"There has been a fantastic explosion in human genetics since it became possible to find at least the common genetic variants. They are found in 10, 20 or 30 percent of the population, and they increase the risk of certain diseases. A year or two ago it was impossible to track them down."

Eric Lander's research institute, the Broad Institute in Cambridge, originally specialized in genome sequencing. Today, so-called genotyping is at least as important. The researchers want to find gene variants that play a role in the development of diseases. It comes down to comparing specific positions in the genome: Which variant occurs more often in patients than in healthy people?
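The comparison described here, which variant occurs more often in patients than in healthy people, can be sketched as a small case-control calculation. The allele counts below are invented, and the odds ratio and the chi-square statistic are standard textbook measures chosen for illustration; the text does not say which statistics the Broad Institute uses.

```python
def allele_frequency(variant: int, reference: int) -> float:
    """Share of chromosomes carrying the variant allele."""
    return variant / (variant + reference)


def odds_ratio(case_var: int, case_ref: int,
               ctrl_var: int, ctrl_ref: int) -> float:
    """Odds of carrying the variant among patients relative to controls."""
    return (case_var * ctrl_ref) / (case_ref * ctrl_var)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]], no correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Invented counts of variant / reference alleles at one position.
    patients = (300, 700)
    healthy = (200, 800)
    print("frequency in patients:", allele_frequency(*patients))
    print("frequency in healthy: ", allele_frequency(*healthy))
    print("odds ratio:", round(odds_ratio(*patients, *healthy), 2))
    print("chi-square:", round(chi_square_2x2(*patients, *healthy), 2))
```

An odds ratio above 1 means the variant is more common among patients. Whether the difference is more than chance depends on the sample size, which is what the chi-square statistic takes into account.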

"My name is Stacey Gabriel and I direct the genetic analysis platform at the Broad Institute."

Stacey Gabriel heads the genotyping laboratory at the Broad Institute. She leads the way through a room with many smaller devices, hardly larger than conventional laser printers.

"Exactly what I always wanted to do. A great place. The whole floor here. Everything is genetic analysis. In my group: 50 employees, 15 of them computer specialists. Three companies offer the devices for gene analysis. We have devices from all three. What you see here costs seven to eight million dollars. Our throughput: almost 2,000 DNA samples a week. Every morning: reloading, 27 times. That takes three hours in this machine: washing and staining. And then analyzing. Hundreds of samples every day. Thousands every week. Seven days a week. Day and night. Even on weekends. So that the machines are working to capacity. This is what it sounds like when you determine a few dozen million gene variants. The series is named "Sponge Bob". Sandy. Squidward. Patrick. All Sponge Bob characters. Sorry. I really have to go now. My husband is waiting."

Numerous risk genes have been discovered in this way since 2007, for example for diabetes, high blood pressure, obesity and heart disease. More will follow. Eric Lander:

"It wasn't so long ago that it cost a million dollars to examine a million locations in a person's genome. Now it's only a few hundred dollars per person."

Do you prefer to run a marathon? Or are you a sprinter? Are you a late riser? Or is your motto: the early bird catches the worm? Your answers reveal a lot about your genes.

So it says on the website of the company "Twenty Three and me". It belongs to the Google group.

Twenty Three and me will help you explore the genes that underlie your personality. Discover the genetic basis of your optimal diet. Compare yourself with your friends and family members and find genetic similarities. A test for one person costs only $399.

Thousands of people have already ordered their personal gene profile from "Twenty Three and me". They want to find out more about themselves and their future: Does the profile give the all-clear or does it indicate a risk of disease? Geneticists advise against such gene profiles, because the data are often incomprehensible to laypeople, and counseling before or after the test is not part of the company's offer. German human geneticists demand: no genetic test without counseling. Among them is Klaus Zerres from the RWTH Aachen University Hospital:

"This area of application of tests is of course very problematic, because you take a single genetic test out of a complex structure that you have not yet understood, and you say: You have a slightly increased risk of disease. And you leave this information to the patient without him actually being able to do anything with it."

The gene profiles à la "Twenty Three and me" remain vague. They speak of increased risks or "general predisposition" - for a heart attack, for example. Zerres:

"The only question is: If I have a threefold increased risk, what do I do with it? And here we are in a completely different area, because health-conscious behavior is not the automatic consequence. We have long known such increased risks without any gene profile: people are too fat, they smoke, and they are probably at a much higher risk. Of course they know that, and yet it goes on. That means: recognizing risks and implementing health-conscious behavior are two entirely different things. So I would think: You should smoke less, lose weight, and then you can save yourself the genetic test, because it only provides a small piece of the mosaic."

Eric Lander:

"It's not mainly about telling individuals what personal genetic risks they carry. We want to understand the biological mechanisms behind a disease. We can learn a lot with these new genotyping technologies, but of course not everything. I think that we can get half to 70 percent of the important information from the genome. The results are inevitably incomplete. But let's start, I always say. And we have started. Technology will get better, and in five or ten years we will be able to read all the information from the genome anyway - with the help of sequencing."

It's easy with these new enzymes. A few drops on the DNA and I can just cut out a single gene - and put it somewhere else. Maybe I can make my bacteria glow green?

The hereditary molecule DNA is first and foremost a store of information. For example, it contains the blueprints for proteins. These are read out when needed: RNA copies are made and translated into proteins. A specific region carrying a protein blueprint is called a "gene" in biology. "One gene, one protein" was the principle for a long time. Large parts of the DNA were considered to have no function: "junk" DNA. Worthless garbage. In recent years, however, scientists have found more and more indications that "garbage" is out of the question. Information is also stored between the classic genes. The DNA can be active without producing a protein. Ewan Birney from the European Bioinformatics Institute in Cambridge, UK, coordinates an international project called "Encode".

"That was really a surprise. Because we saw that a lot more DNA was copied than we had expected. That didn't fit with the old concept of the 'gene'. According to that concept, the RNA is nothing more than an intermediate product, the messenger that carries the information from the DNA into the protein factory so that a protein is produced there."

In the Encode project, the scientists are investigating: Which regions are active? Which RNA is produced where? Birney:

"The Encode project proved that the classic messenger RNAs are actually made. But there are also many other RNA molecules. These come from the regions between the genes. Or even from inside a classic gene. Our view of the genome has changed completely. Lots of RNAs are made. And only a few are needed for protein production. We don't yet understand what the others do or whether they are in any way important."

Some of the newly discovered RNAs are broken down into tiny snippets, so-called microRNAs. They are known to play an important role in gene regulation. A new branch of research deals exclusively with these small RNA molecules. Basically, every stretch of DNA that is copied is a unit of inheritance, that is, a gene. DNA, RNA and proteins work together in complex ways. The interplay on different levels can only be understood with the help of computers. Birney:

"Computers are playing an increasingly important role in biology. You can even say that biology is an information science. It's about how information is conveyed and changed in our organism. Biology today is less an experimental science, a 'wet science', than a theoretical science, a 'dry science'."

This sequence is supposed to have something to do with asthma? That's not a gene at all. But the database says: many animal species have the same sequence segment. Must be important. Mouse and rat have it too. Even the nematode. Riddle upon riddle.

Jon Beckwith of Harvard Medical School in Boston is a pioneer of molecular biology. He became known at the end of the 1960s, when he managed to isolate a single gene. That was the beginning of genetic engineering. Jon Beckwith began to think about the consequences of his work. Today he is considered the moral authority of biology.

"When I started in biology, everyone knew everyone else. Today it's different. But we can't go back. Molecular biology was too successful. Nothing works today without robots and computers. A new industry has emerged. And research has become extremely expensive."

Major projects follow one another at ever shorter intervals. They are called "Human Genome Project", "HapMap", "Transcriptome", "Epigenome", "Cancer Genome Project" and so on. It's always about the same thing: collecting, storing and comparing data. Beckwith:

"I admit that the mega-projects have provided a tremendous amount of information that we can now use. But what we often see is that the people who started these projects with great promises don't bother to evaluate the information. They just move on to the next mega-project. I saw it myself once at a conference. Someone asked, 'An interesting gene. What is it good for?' And the speaker replied, 'That's not my job. Somebody else has to take care of it.' The problem: There are hardly any people who take care of it."

With this, Jon Beckwith takes a stand against the movers and shakers of today's biology. Also against Eric Lander, the director of the Broad Institute.

"The question is: How much money should we spend to get basic information? In my opinion, that should be two percent, or three or five percent, in any scientific discipline, in order to create a basic knowledge infrastructure. That is the data, the information that all other researchers need to advance biology. I think this is money well spent. Provided the information is really free and publicly available. This is the acid test: Can the information be used by all scientists for free? If so, all I can say is: a great investment."

Jon Beckwith fears that the really important questions of biology could slip out of focus. The mega-projects, as he calls them, lead to a flood of data. Beckwith:

"This is information for the sake of information. People come from other sciences, such as physics or mathematics. They generate this mega-information, but they are not trained biologists and are not interested in biological questions either."

Jon Beckwith of Harvard University in Boston represents an earlier generation of biologists. Eric Lander, the head of the Broad Institute, on the other hand, stands for the new biology.

"When adults play video games, you can see that they didn't grow up with them. For children and adolescents, video games are part of normal life, just like cell phones and text messages. The same goes for science. Anyone who grew up in a world of unlimited genomic information, anyone who has known both sides from the start, the laboratory experiment and computational biology, is a biologist, damn it, and doesn't separate the two. For them, these are two sides of the same coin. Something new is emerging. That might no longer be understandable to a biologist of the late 20th century. But that's what we call progress."

It's going to be a long night for the administrator. The whole system has been brought to its knees. Internet - nil. Wait. Drink a latte.