C Smolke
Caltech
This talk is more focused on synbio. There are many natural chemicals and materials with useful properties, and it would be great to be able to do things with them. Examples are taxol from the Pacific yew, codeine and morphine from opium poppies, butanol from Clostridium, spider silk, abalone shell, and rubber from the rubber tree. It is much more efficient to produce these useful chemicals inside a bacterium than to extract them from their natural source. These microbial factories are a useful application area for synbio. Intelligent therapeutics (IT) is another application area. In IT, two biomarkers together would (via other steps) produce a programmed output. You could link these programs to biosensors, or perform metabolic reprogramming, programmed growth and more. The ultimate goal is to be able to engineer systems. These systems generally need to interface with their environment.
Synbio *also* has circuitry, sensors and actuators, just like more traditional forms of engineering. Foundational technologies (synthesis) -> Engineering Frameworks (standardization and composition) -> Engineered Biological Systems (environment, health and medicine). An information processing control (IPC) molecule would have three functions, as mentioned earlier: a sensor, computation (processing information from the sensor and regulating the activity of the actuator), and an actuator. There are a variety of inputs for the sensor (small molecules, proteins, RNA, DNA, metal ions, temperature, pH, etc). The actuator could link to various mechanisms like transcription, translation, degradation, splicing, enzyme activity, complex formation, etc. Key engineering properties to think about are scalability, portability, utility, composability, and reliability.
What type of substrate should we build these IPC systems on? What about RNA synthetic biology? You'd go from RNA parts -> RNA devices -> engineered systems. Experimental frameworks provide general rules for assembling the parts into higher-order devices. Then you organize devices into systems, which use in silico design frameworks for programming quantitative device performance. Why RNA? The biology of functional RNAs is one reason: noncoding regulatory RNA pathways are very useful. Secondly, you can have RNA sensor elements (aptamers), which bind a wide range of ligands with high specificity and affinity. Thirdly, RNA is a very programmable molecule.
They've developed a number of modular frameworks for assembling RNA devices, and she then gave a good explanation of one of them. In this explanation, she mentioned that the transmitter can be modified to achieve the desired gate function. The remaining nodes (or points of integration) can be used to assemble devices that exhibit the desired information processing operations. A sensor + transmitter + actuator = device. The transmitter component for a buffer gate works via competitive binding between two strands. As the input increases in the cell, a particular conformation is favored and gene expression is turned on. An inverter gate is the exact opposite. They wanted to make sure these sorts of frameworks are modular. They can show this by using a different receptor for the sensor to make it responsive to a different molecule.
You can also build higher-order information processing devices using these simpler modular devices. For instance, you might want to separate a gradient of an input signal into discrete parts. Another example would be the processing of multiple inputs, or cooperativity of the inputs.
The first architecture they proposed (SI 1): signal integration within the 3' UTR - multiple devices in series. They can build AND and NOR gates, as well as bandpass signal filters and others. In the output signal filter device, devices result in shifts in basal expression levels and output swing. Independent function is supported by matches to predicted values - the two devices linked in tandem are acting independently.
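The logic of the tandem (SI 1) architecture can be sketched as a small Boolean model. This is only an illustration under stated assumptions: the function names are hypothetical, and the rule that the transcript is expressed only when every device in the series permits expression is my assumption about how independent devices in series combine, not something stated in the talk.

```python
def buffer_gate(input_present: bool) -> bool:
    """Buffer device: permits expression only when its input is present."""
    return input_present


def inverter_gate(input_present: bool) -> bool:
    """Inverter device: permits expression only when its input is absent."""
    return not input_present


def series_output(*device_states: bool) -> bool:
    """Assumption: devices in tandem act independently, and the gene is
    expressed only if every device in the series permits expression."""
    return all(device_states)


def and_gate(a: bool, b: bool) -> bool:
    """Two buffer devices in series, each responding to a different input."""
    return series_output(buffer_gate(a), buffer_gate(b))


def nor_gate(a: bool, b: bool) -> bool:
    """Two inverter devices in series, each responding to a different input."""
    return series_output(inverter_gate(a), inverter_gate(b))


def truth_table(gate):
    """Evaluate a two-input gate on all four input combinations."""
    return {(a, b): gate(a, b) for a in (False, True) for b in (False, True)}


if __name__ == "__main__":
    for name, gate in (("AND", and_gate), ("NOR", nor_gate)):
        print(name, truth_table(gate))
```

Real devices give graded rather than on/off outputs, so this only captures the qualitative gate behaviour.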
SI 2: a different type of architecture where signal integration is being performed at a single ribozyme core through both stems. You can make a NAND gate by coupling two inverter gates.
SI 3: Two sensor transmitter components are coupled onto a single ribozyme stem. This allows them to work in series. You can perform signal gain (cooperativity) as well as some gate types. With cooperativity, input A will modulate the second component which allows a second input A to bind to the second component.
Modularity of the actuator domain: using an shRNA switch - this exhibits similar properties to the ribozyme device.
How do we take these components and put them into real applications? One application is immune system therapies, where RNA-based systems offer the potential for tight, programmable regulation over target protein levels. She had a really nice example of how she used a series of ribozymes to tune T-cell proliferation with RNA signal filters. After you get the right response, you need to create stable cell lines. She showed this working in mice.
Personal Comments: A very clear, very interesting talk on her work. Thanks very much!
Wednesday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot - just let me know!
..of Saccharomyces cerevisiae
Gavin Sherlock
Afternoon Session, 1 September (11th MGED Meeting, 1-4 September, 2008)
The population structure in the presence of clonal interference is markedly different from that in a classic model. They needed a system for following the dynamics within a population. They use FACS and different-colored fluorescent cells to do this. They grew the cells in a chemostat, which is seeded with equal numbers of each of the 3 color types, and then measured the proportion of each color in the population over time. The experiment has been done 8 different times. One run, for example, shows expansions and contractions followed by one color becoming the majority. Fixation of a color is not necessarily indicative of fixation of an adaptive event (there can be multiple adaptive clones with the same color within a population).
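As a rough illustration of the measurement described above, the sketch below turns per-color cell counts at each time point into proportions and reports when one color becomes the majority. The color names, the counts, and the 0.5 majority threshold are all made up for the example; none of them come from the talk.

```python
def colour_proportions(counts):
    """Convert per-colour cell counts at one time point into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells counted at this time point")
    return {colour: n / total for colour, n in counts.items()}


def majority_colour(counts, threshold=0.5):
    """Return the colour whose proportion exceeds the threshold, else None."""
    proportions = colour_proportions(counts)
    colour, fraction = max(proportions.items(), key=lambda item: item[1])
    return colour if fraction > threshold else None


# Hypothetical FACS counts: seeded equally, then one colour expands.
timecourse = [
    {"red": 1000, "green": 1000, "yellow": 1000},
    {"red": 1400, "green": 900, "yellow": 700},
    {"red": 2600, "green": 250, "yellow": 150},
]

if __name__ == "__main__":
    for generation, counts in enumerate(timecourse):
        print(generation, colour_proportions(counts), majority_colour(counts))
```

As the notes point out, a color reaching majority says nothing by itself about which adaptive clone within that color is responsible.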
Using yeast tiling microarrays, they can identify the locations of nucleotide differences between the evolved and parent strains, and then sequence the candidate mutations. One of the mutations they found (in cox18), called Red 266, was discovered via decreased hybridization compared to the parent. Another example was one where there was a comparatively higher level of hybridization. Mutation history can help determine which strains come from which earlier parent strains with mutations of their own.
Clonal interference is important in adaptive evolution of yeast. Specifically, glucose transport and signalling through the Ras pathway were both affected. In future, they wish to directly determine which mutations are adaptive, find out how general these adaptations are, and discover the effects of the adaptive mutations. A final question: what fraction of the adaptive landscape have we explored? Will it be the same or different in the other 7 experiments?
These are just my notes and are not guaranteed to be correct.
Please feel free to let me know about any errors, which are all my
fault and not the fault of the speaker. :)
Barbara Wold, California Institute of Technology
Plenary Lecture, Morning Session 1 September (11th MGED Meeting, 1-4 September, 2008)
A direct way to characterize a transcriptome is to sequence a cDNA copy of the entire transcriptome and then calculate the density of reads mapping to any given locus in the genome. Ultra-high-throughput sequencing platforms have made it practical to do this genome-wide. They have done this for mouse mRNA and human tissues and cell lines at levels ranging from 20 to 100M reads per transcriptome. These RNASeq experiments detect RNA splice patterns, including alternative splicing events, by identifying sequence reads that cross known and theoretical splice junctions.
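One way to make "density of reads mapping to a locus" concrete is to normalise the read count by the length of the locus and by the sequencing depth. The sketch below uses reads per kilobase per million mapped reads; the notes do not say which normalisation the speaker uses, so treat that choice, the function name and the example numbers as assumptions.

```python
def read_density(read_count, locus_length_bp, total_mapped_reads):
    """Reads per kilobase of locus per million mapped reads.

    Assumption: this particular normalisation is illustrative; the talk
    notes only say that read density per locus is calculated.
    """
    if locus_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("locus length and total reads must be positive")
    kilobases = locus_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_of_reads


if __name__ == "__main__":
    # Hypothetical locus: 500 reads on a 2 kb transcript in a 20M-read run.
    print(read_density(500, 2_000, 20_000_000))
```

Dividing by length and depth lets loci of different sizes, and runs of different depths (such as the 20 to 100M reads mentioned above), be compared on one scale.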
The methods of the previous speaker tell you where specific parts are (starts, ends). The RNASeq technique is a broader way of looking at the transcriptome, and is a more brute-force method. However, you can still get really important data out of it. The main purpose of RNASeq is to be able to quantify RNAs, both relatively and absolutely. RNASeq is good at absolute numbers. It can also do transcript discovery and mapping, including revising gene models, splice isoforms, and RNA editing. Even in "boring" tissues like mouse liver or total mouse brain, you still come up with some robust newly-discovered transcripts using this technique that aren't quantitatively minor. There are limits to this technique, e.g. they're doing the work against known sets of genes (although they'd like to do the work de novo). They're happy to provide data to help with this. A final function is in genetics, specifically expressed SNPs and private mutations that wouldn't normally appear on SNP arrays.
Two features of the data: when they do comparisons of technical replicates, they correlate very nicely. Biological replication can then really be about the biology. Secondly, the map of the RNA transcripts had a very nice linear shape on a log scale.
Should you look at RNA that can map equally well to multiple sites? They looked at 25mer reads in mammalian genomes, considering those that can map equally well to 2-10 sites, inclusive, as well as the unique reads. 80% of the genome could be mapped uniquely, with 6% mapping to between 2 and 10 sites, and 14% to more than 10. In the myoblast transcriptome, the fraction that maps uniquely is smaller (69%), and this is something that happens generally. This is because there's lots of gene paralogy, and you'll get things that map to several sites due to a (recent or old) duplication. So if you ignore these multiread sequences, you risk missing important stuff entirely.
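The three mapping classes described above (unique, 2-10 sites, more than 10 sites) can be tallied with a few lines of code. This is a minimal sketch: it assumes you already have, for each read, the number of genomic sites it maps to equally well, and the class labels and example values are hypothetical.

```python
from collections import Counter

CLASSES = ("unique", "multi_2_10", "multi_gt_10")


def mapping_class(n_sites):
    """Classify a read by the number of sites it maps to equally well."""
    if n_sites < 1:
        raise ValueError("a mapped read must have at least one site")
    if n_sites == 1:
        return "unique"
    if n_sites <= 10:
        return "multi_2_10"
    return "multi_gt_10"


def class_fractions(site_counts):
    """Fraction of reads falling into each mapping class."""
    tally = Counter(mapping_class(n) for n in site_counts)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {name: tally.get(name, 0) / total for name in CLASSES}


if __name__ == "__main__":
    # Hypothetical per-read site counts, purely for illustration.
    print(class_fractions([1, 1, 1, 2, 10, 11, 50, 1]))
```

Keeping the 2-10 class separate rather than discarding it is what allows genes in paralogous families to show up as expressed at all.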
What are the kinds of genes that are multiread sensitive? Their example is an actin gene (EL4r1). If you just map unique reads, you miss everything that is in the exons, and the gene would therefore appear NOT to be expressed. RNASeq is really good at detecting alternative splicing. Really rare alternative splicing events may just be random events that are not intended, but which the system can tolerate - this should be taken into account.
They have discovered some candidate new genes: 161 in the brain, 95 in the muscle, and 77 in the liver, and some of these are overlaps between the three.
You need to include multireads to detect some true positives in ChIP: 5-10% of sites in the interactome are affected. Can you ID by ChIP essentially all sites predicted by FUNCTION assays? Yes, but strongly conditioned on good antibodies and good cells. Do you expect detectable function at every site with significant and reproducible in vivo occupancy? No - more data is needed: long-range cis-interaction in big genomes, 3-C signals in our data, etc. Is there significant ChIP at all instances of a high consensus motif match? For MyoD and Myogenin - NO!, as there are >1 million perfect motifs in the genome. Yes for big, well-specified motifs (NRSF), and the meaning of binding seen at some 1/2 of the sites is unclear.
(Chromatin Immunoprecipitation: ChIP.)
Dynamics and Complexity of the Coding and Non-Coding Transcriptome
Piero Carninci, RIKEN Omics Science Center
Keynote Lecture, Morning Session 1 September (11th MGED Meeting, 1-4 September, 2008)
They've been mapping the expressed part of the genome, aka the transcriptome. This will help us understand the genome output. There are many different issues with RNAs that are retrieved via standard methods: there are many different transcripts from a single gene, and different promoters. Each promoter will have different levels of activity. They use Cap Analysis of Gene Expression (CAGE). They are using MAGE-TAB and SDRF formats to store their data.
In the 1990s, people thought there were 70,000-100,000 protein-coding genes. Today, we expect that there are only about 20,000. Instead, there is a lot of complexity: post-translational modifications, many overlapping transcripts, multiple promoters, etc.
But what are the long non-coding RNAs (ncRNAs) doing? They are long stretches of sequence that are not conserved. However, their promoter sequences are often conserved. Perhaps the mechanisms of their action do not require long stretches of conservation in the gene. Most of the unknown RNA is polyA minus and nuclear. A large proportion of the long RNAs are cleaved (yielding short RNAs that are often conserved). These derived short RNAs map to the 5' ends (PASRs) and 3' ends (TASRs) of genes. Therefore the whole transcript is not conserved because it doesn't need to be: only those bits that are cleaved and used later on need to be conserved. Essentially, this means a large number of RNAs come from an individual locus.
The PolyA- CAGE signal is mostly nuclear and overlaps introns and TSSs, while the cytoplasmic signal is more on exons. 3' untranslated regions (UTRs) are also interesting: they start from a conserved promoter which has a conserved GGG section. There is also RNA that starts from the middle of a gene. It is more prevalent in the TATA-box, with sharp promoters. Mouse and humans have similar starting sites. There are also antisense RNAs. Most TU (72%) show antisense transcription. Are the sense-antisense RNAs co-expressed? Is there dynamic regulation? If you perturb antisense RNA, the sense will be overexpressed. It also seems that sense and antisense RNA aren't transcribed at the same time - that they might take turns (this is my impression from the slides, rather than something he said exactly). Sometimes sense-antisense pairs work in the cytoplasm (with the production of natural siRNA). One example is the beta-secretase-1 antisense, which increases the sense RNA (feed-forward loop), which is important in Alzheimer's.
You can even get RNA expression from repeats. Repeat elements can produce short RNAs, like natural siRNA. They have identified that 10-35% of the transcripts correspond to repeat elements. Surprisingly, they have dynamic tissue-specific behaviour / patterns. There is overrepresentation of repeats in the nucleus among polyA- RNAs, and there is compartment specificity.
There is a lot of promoter plasticity. A switch to PyPu will increase transcription, while the reverse decreases it. They're having a look at preferentially-expressed promoters (PEPs). These are promoters that have >30 tags and are statistically significant. On the distribution of PEPs in brain tissues: there are genes that have multiple-tissue-expressed PEPs. Different PEPs drive functional variability of the proteome. PEPs create more proteome diversity. They make use of THP-1 cells as a model cell. 46% of genes in THP-1 have alternative promoters. Of these, 18,245 are high-confidence promoters. 1909 of these are newly-discovered. CAGE identifies the active set of promoters, and more precisely defines the TSS position.
CAGE is not dependent on microarray design, and measures expression including ncRNAs. They have some bioinformatics tools freely available for the CAGE protocol, and have tried to simplify the CAGE protocol. Please contact him if you would like help in making your own CAGE library.