How the human genome folds in 3-D

Lieberman-Aiden wins 2010 Lemelson-MIT Student Prize

By Ana Lyons Mar. 9, 2010

Erez Lieberman-Aiden G invented a way to find out how the genome folds.
Lemelson-MIT Program

Peano curves: Discovered by Giuseppe Peano in 1890, Peano curves are a class of one-dimensional space-filling curves that densely fill higher-dimensional space. Lieberman-Aiden, winner of the Lemelson-MIT Student Prize, recently published a 3-D map of the genome that suggests long stretches of DNA fold into Peano-curve–like structures.
Leonid A. Mirny and Erez Lieberman-Aiden

Equilibrium globule model: The “equilibrium globule” is an older model that describes how the genome might exist in three dimensions. Unlike in the “fractal globule” model, here individual strands are highly entangled and regions nearby along the chain are far apart in 3-D. In this image, nearby regions on a chain of DNA are indicated using similar colors.

Fractal globule model: Lieberman-Aiden’s new technique for creating 3-D genomic maps (Hi-C) gave him evidence to theorize that the structure of the human genome forms a “fractal globule.” Here the genome forms super-dense, knot-free “globules of globules of globules,” which allows regions that are nearby in 1-D to also be nearby in 3-D. Nearby regions are indicated using similar colors.

CORRECTION TO THIS ARTICLE: This article made several conceptual errors regarding Lieberman-Aiden’s research on the fractal globule model. The article incorrectly stated that the evidence suggesting a fractal globule implies that “the genome separates into two clear compartments: one where stretches of DNA are known to be active, and another where DNA is inactive and stowed away for future use.” Instead, this compartmentalization of the genome is an observation that was made by the Hi-C team at a larger scale, and is unrelated to the presence of a fractal globule at the smaller scale. The article stated that “when unstretched onto its two-dimensional, double-helix form, the human genome spans nearly two meters in length,” which should read “when unstretched completely in one dimension.” The paragraph stating that the fractal globule can be reduced further to a Peano curve is also inaccurate. The fractal globule is itself a polymer analogue of the Peano curve: The fractal globule signature is seen both in active and inactive regions of the genome, not only “in order to store less often-used genes and pack them more densely” as the article suggests. The image credits for the fractal and equilibrium globule models were incorrect. These images should be credited to “Leonid Mirny and Maxim Imakaev” and not “Lenoid Mirny and Erez Lieberman-Aiden.” Mirny’s name was misspelled in these credits, appearing incorrectly as “Lenoid” instead of “Leonid.” The credit for the Peano curve graphic is correct.

Until recently, the process of how genomic DNA neatly folds itself into the nucleus of a cell — twisting and contorting into a work of astonishingly compact molecular origami — had perplexed biologists.

When unstretched completely in one dimension, the human genome spans nearly two meters in length, yet it must fit inside the cell nucleus, which is only a hundredth of a millimeter in diameter. How exactly the genome can compress into an unknown three-dimensional structure and retain some sort of underlying order, all while persisting tangle-free, remained a fundamental mystery in structural biology.

But last fall, Erez Lieberman-Aiden, a seventh-year graduate student at the Harvard-MIT Division of Health Sciences and Technology, developed a new technique for creating 3-D genomic maps called “Hi-C.” His results led him to theorize that the structure of the human genome follows a fractal-like pattern, forming super-dense, knot-free “globules of globules of globules” in order to overcome its troublesome spatial and entanglement problems.

For leading this groundbreaking research, Lieberman-Aiden was awarded this year’s $30,000 Lemelson-MIT 2010 Student Prize last Wednesday at a ceremony held in the Bartos Theater at the MIT Media Lab.

In contrast to the previous “equilibrium globule” model of the human genome, in which related regions often occur far apart in three dimensions and various components are highly entangled, Lieberman-Aiden’s “fractal globule” model suggests that the genome separates into two clear compartments: one where stretches of DNA are known to be active, and another where DNA is inactive and stowed away for future use.

Whether or not this organizational model will hold for other cell types, however, is currently unclear.

Lieberman-Aiden was also recognized by the Prize committee for his linguistics research (which appeared on the cover of Nature in 2007), for founding a new field of mathematical biology known as “evolutionary graph theory” (published in Nature in 2005), and for developing an electronic insole for diagnosing poor balance in the elderly (called the “iShoe”).

Other finalists were Barry M. Kudrowitz and Amos G. Winter, both current Ph.D. students in Mechanical Engineering.

According to the Lemelson Foundation, the $30,000 Lemelson-MIT Student Prize is awarded annually to “an MIT senior or graduate student who has created or improved a product or process, applied a technology in a new way, redesigned a system, or demonstrated remarkable inventiveness in other ways.”

Students apply to the competition in an intensive process that requires essays and letters of recommendation. A panel of ten MIT judges who are “alumni including scientists, technologists, engineers and entrepreneurs” then chooses the winner.

“I was very, very excited,” said Lieberman-Aiden.

Applications are now being accepted for the 2011 Lemelson-MIT Student Prize. Full details on the application process can be found here: http://web.mit.edu/invent/a-student.html

How Hi-C Works

To develop the “Hi-C” method, which constructs three-dimensional maps of entire genomes, Lieberman-Aiden worked with postdoctoral student Nynke van Berkum of UMass Medical School, and their advisors Eric S. Lander and Job Dekker. The team also collaborated with Leonid Mirny’s group (in the MIT Department of Physics and Harvard-MIT Division of Health Sciences and Technology) as well as graduate student Maksim V. Imakaev to simulate the dynamic behavior of the fractal globule.

“I’ve thought about [the idea] for quite some time,” Lieberman-Aiden said.

“Earlier in 2007, I saw a talk where I heard it took six months to figure out that two pieces in the genome were touching,” he said. “I remember thinking ‘gosh, that’s a really long time.’”

After seeing this talk, “I thought we could do better and take advantage of modern sequencing technology,” he said.

Based on these initial ideas, Lieberman-Aiden and his colleagues developed their “Hi-C” method, which uses formaldehyde to freeze linkages of DNA that are far apart in the linear genome, but adjacent to each other in 3-D.

The linked pieces of DNA are then marked with biotin, extracted, and mapped onto the reference copy of the human genome to determine which loci neighbor each other.

To complete the process, a computer cross-references neighboring gene pairs and assembles the genome’s 3-D portrait.
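The counting step at the heart of this process can be sketched in a few lines of Python. This is only an illustrative toy, not the team's actual pipeline: the function name `build_contact_map`, the bin size, and the example read pairs are all hypothetical, and each pair is assumed to have already been mapped to positions on a single chromosome.

```python
def build_contact_map(read_pairs, chrom_length, bin_size):
    """Count how often two genomic bins are found linked together.

    read_pairs: iterable of (pos_a, pos_b) base-pair coordinates, one per
    cross-linked pair of DNA pieces (hypothetical, already-mapped input).
    Returns a symmetric matrix (list of lists) of contact counts.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    contacts = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        contacts[i][j] += 1
        if i != j:
            contacts[j][i] += 1  # keep the matrix symmetric
    return contacts


# Toy data: made-up positions on a 4-megabase stretch, binned at 1 megabase.
pairs = [(100_000, 3_200_000), (250_000, 3_900_000), (1_500_000, 1_700_000)]
contact_map = build_contact_map(pairs, chrom_length=4_000_000, bin_size=1_000_000)
for row in contact_map:
    print(row)
```

In this toy map, a high count between bins that lie far apart along the sequence (bins 0 and 3 here) would suggest that the two regions sit close together in 3-D.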

In Lieberman-Aiden’s words, Hi-C is like “figuring out who is friends with who.”

“Imagine one day there is a security breach on Facebook, and all the pictures were now leaked to the public,” he said. From the leaked data, you can see if there are patterns where people show up in the same pictures.

“If people keep showing up in the same pictures over and over again, you can conclude that they’re probably friends,” he said.

“The same idea is behind the 3-D technique, but instead of determining friends, we’re determining who’s nearby in 3-D space,” he said. “We know the 1-D sequence of the human genome, so we can use this as a reference when we reconstruct what the 3-D architecture must be like.”

Lieberman-Aiden also posted an interpretive dance of how the technique works, which can be found on YouTube at: http://www.youtube.com/watch?v=06UouUmuEbw

Local biochemical vs. global spatial modifications

“A very interesting idea at the core [of this research] is that all cells have the same genome, but perform very different functions,” Lieberman-Aiden said. “There’s an incredible variety of functions among cells, despite them all having the same information.”

In the past, differing function “has all been associated with local biochemical modifications: biochemical changes at certain sites in the genome, making certain [information] get turned on and off,” he said.

For example, “by adding or subtracting methyl groups, you can introduce instructions saying things like ‘you should express this more,’ but these biochemical changes are all occurring locally.”

But “here we find that it’s actually spatial modifications that can influence expression,” on the “global scale,” he explained. “It’s a totally different type of modification.”

Lieberman-Aiden used an analogy, likening the genome to a newspaper.

When thinking of the genome, “imagine a paper with writing on it, maybe even a newspaper....maybe even The Tech,” he said.

If everything on the page you were reading were the same dull font, you’d start reading somewhere at random, with no idea of what was most important that day, he said.

Suppose you’d like to change how various things are emphasized. “One thing you can do [to emphasize what’s most important] is underline things, make boxes around words — make various local modifications.”

“These modifications would tell you ‘Ahh...I should pay attention to this,’” he said.

In a newspaper, these modifications might be the style of a headline or a box of color, and in the case of the genome, these would all be biochemical modifications.

But then say you realize there’s also another way to emphasize different things, which would affect the organization of the contents more globally: “Let me fold the paper in little ways and actually change what appears on the front page.”

Similarly, different types of cells fold their genomes differently depending on their function. “Depending who you are and what you want to read about, you might fold the newspaper in different ways,” he said.

For example, “if you’re trying to sell the paper at a newsstand, you might fold the paper in one way. But if you’re the president of the MIT origami club, you might decide to fold it into a crane instead.”

Similarly, if you put different sections in the front, you’ll get different newspapers, explains Lieberman-Aiden.

For example, “if you put business in front, you’ll have the Wall Street Journal.”

And in the case of the cell, “in doing these reconfigurations [of the genome], you can control what’s on and off and thereby change the function of the cell as a whole.”

“It’s another type of way to modify a sort of universal substrate, and the genome is basically doing the same thing….different ways you configure the genome could give you different functions or identities,” he said.

Human genome is organized like a library…made of ramen

Lieberman-Aiden and his team also zoomed in further, examining how the genome folds at the scale of a megabase, or one million of the genome’s biochemical ‘letters’. The question was: “How does this megabase fold up?”

To help think about this question, Lieberman-Aiden recalled an analogy that had been suggested to him by Leonid Mirny, Professor of Physics and Health Sciences and Technology. “A genome contains information, a library of information.”

In this case, “you can imagine that [the human genome] should therefore be organized like a library,” he said.

“How should you organize a library? Well, you want it to be compact: everything is in one place. You want it to be organized: books on similar topics should be physically near each other. And you want it to be accessible: when you find the book you want, it shouldn’t be behind glass; you should be able to pull it off the shelf, read through it, and then put it back the same way you found it.”

Knowing this, the next question he said one might ask is: “how might one design such a library?”

“It turns out the standard way that a polymer might fold is totally incoherent with that [ideal library]; it’d be dense, but it would be totally disorganized and completely knotted,” he explained.

And “because it’s highly knotted, the information isn’t at all accessible,” he said.

But his Hi-C data suggest that the genome forms an unknotted macroglobule, or what the team calls a “fractal globule,” which, interestingly, Lieberman-Aiden says is in many ways like a package of ramen noodles.

“It turns out actually that the fractal globule pretty deeply resembles the model of uncooked ramen noodles,” he said. “You can contrast this with the classic polymer structure, which is the arrangement that the noodles take once you’ve cooked them.”

“Turn up the heat, and the noodles are going to oscillate and wiggle...and in the process they’ll get deeply, deeply entangled,” he said.

According to Lieberman-Aiden, “this is similar to the classic polymer conformation,” called the “equilibrium globule model.”

In the ramen analog of the equilibrium globule model of the genome, “the most salient property was that if you stick a fork in them, you can’t pull apart one or two noodles: you end up pulling out a whole clump because they are so entangled.”

“The fractal globule model is more like the uncooked ramen, whereas the classic equilibrium model of condensed polymers is more like the cooked noodles,” he said.

“Space-filling fractal curves pack space very, very densely, but can do this without knotting,” he said.

If you want to access something from a fractal globule structure, “you can just pull out a little piece and stretch it out to examine it. When you’re finished, you can just crumple it back up, and put it back where it came from,” a property that makes such a structure especially advantageous for the genome.

One additional property of the “fractal globule” model, as in the case of ramen, is that “if something is nearby in 1-D, it will be nearby in physical space,” he said.

“This may be why genes that are related in function tend to cluster in 1-D; by doing so, they are actually forming a spatial cluster when they fold up in 3-D.”
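The idea that a densely packed curve can keep 1-D neighbors close together in space can be checked on a small example. The sketch below uses the Hilbert curve, a well-known space-filling curve in the same family as Peano's, drawn on a flat 16-by-16 grid. It is a 2-D stand-in chosen for illustration, not a model of the genome and not the team's simulation; the function name `hilbert_point` and the grid size are assumptions made for this example.

```python
def hilbert_point(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:  # flip the quadrant built so far
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate the quadrant built so far
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


order = 4                 # 16 x 16 grid, 256 cells (illustrative choice)
side = 1 << order
path = [hilbert_point(order, d) for d in range(side * side)]

# Dense packing: the curve visits every cell of the grid exactly once.
visits_every_cell = len(set(path)) == side * side

# Locality: each step along the curve lands in a neighboring cell.
steps = [abs(x1 - x0) + abs(y1 - y0)
         for (x0, y0), (x1, y1) in zip(path, path[1:])]
print(visits_every_cell, max(steps))
```

Because every step lands in an adjacent cell, points that are a few steps apart along the curve can never end up far apart on the grid, which is the 2-D counterpart of the "nearby in 1-D, nearby in physical space" property.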

Peano curves appear in genome

Lieberman-Aiden’s research showed that the human genome likely forms fractal-like structures, but that’s only the half of it.

These fractals can then be reduced down even further to “Peano curves” in order to store less often-used genes and pack them more densely — a type of curve which Lieberman-Aiden says has a particularly interesting history.

As the first person who discovered such a curve, back in 1890, Giuseppe Peano was motivated by mathematics of the time to construct what Lieberman-Aiden calls “an extremely, extremely peculiar curve.”

Lieberman-Aiden said the Peano curve is what’s known in mathematics as a “space-filling curve.”

“Even though it is one dimensional, it can fill space so densely that it resembles higher dimensional objects,” he said. The discovery of this type of curve “blew mathematicians’ minds, and it really messed with their ideas of dimension.”

But after “Peano constructed this thing, and it led to a lot of rethinking of basic questions in math, eventually the mathematical agenda moved on.”

“It never really occurred to anyone that any actual existing contour in the world would resemble this structure,” he said, “until a team of physicists, nearly 100 years later, suggested that the initial state of a condensing polymer might resemble a Peano curve. But the observational evidence was limited until now.”

So for Lieberman-Aiden, a trained mathematician, to discover that the human genome may actually incorporate these curves is especially exciting.

Influence of cross-training in math and physics

Before coming to MIT as a graduate student, Lieberman-Aiden studied mathematics, physics, and philosophy at Princeton as an undergrad.

When asked about the influence of this training on his interdisciplinary work in biology, he said that it doubtlessly contributed to his current views on research.

“I think that the analytic techniques you learn by doing math and physics are very powerful and really can help you,” he said.

“It really helps me usually when I’m analyzing data; sometimes I’m not really straining myself because I got really comfortable with thinking quantitatively as an undergrad.”

“Because of my background, that actually means that I have an extra gear or two,” he said. “If I find a problem where I think that there might be a good opportunity, I’ll use that extra gear. It also means that the extra exposure to mathematical and physical techniques and literature exposes me to ideas like the fractal globule.”

After completing graduate school, likely within the next year, Lieberman-Aiden said that he will continue his research on Hi-C on a Harvard Junior Fellowship at the Harvard Society of Fellows.