
Make list of figures/experiments to make

Notebook Entry with links

Richard's Suggestions

First, congratulations on your new paper draft! My understanding of your paper is that you have shown that you can identify a naked dsDNA sequence by comparing its unzipping force-index curve against a library of simulated curves. Very cool, and I want to thank your team for writing the paper in a way that I can understand. Here are my comments:

1. I think the major contribution of the paper is that you have an accurate simulation and matching method, which can easily account for elasticity of the tethering construct, viscosity, etc. I'd like to see more details (or at least know that you have the answers in your back pocket) on the information-recall side of the problem. See terms at the bottom of < http://en.wikipedia.org/wiki/Sensitiv... >. I'm no expert here, but here are my thoughts (and most of the answers don't necessarily belong in the paper):

- You matched experimental F-j curve vs simulated F-j curve. Why not interpret your experimental F-j curve into a sequence (with wild-card characters) and then do the matching in ASCII space? Is there a rough map between your match score and a BLAST similarity score? (But, since I'm no BLAST expert, I would need to do lots more research before talking about BLAST in the paper.)
- In the "Future Improvements" section, you discuss the general sequencing-by-unzipping problem. Imagine simulating all possible sequences from j=1200 to j=1700. In this space of 4^500 sequences, what are the characteristics of those close to your experimental sequence, and what is the minimum similarity that can be resolved?
- From the other side, in your matching problem, how much easier has the problem been made by considering only the 2700 restriction fragments instead of the 4^500 general genome? What if you don't restrict yourself to those restriction fragments, but allow any contiguous 500-bp sequence from yeast --- then how many false positives do you get? (In Figure 4, what is the outline that you would get by simulating random fragments from the 4^500 or yeast-sequence universe?)
- A related question is characterizing the sequences in the overlap of the Gaussians. Suppose when you unzip chromatin, you get only a naked-DNA signal over the linker DNA regions, and an un-simulated signal over the non-linker regions. Would you speculate on how that (change in signal) will affect your recall?
- Suppose you simulate all 2700 restriction fragments. What if you make another version of Figure 4 by scoring all simulated sequences versus each other (sim vs sim instead of expt vs sim)? (Also, by symmetry, what is the spectrum of scores obtained for individual OT runs --- both match and mismatch --- against each other, expt vs expt instead of expt vs sim?) Is the difference in distance between your blue and red peaks roughly equal to the difference in distance between correct and incorrect peaks for the 2700 x 2700 test? (If not, then you will have to explain what's special about your 32 sequences.)
- Do the 2700 fragments include the reverse-complement sequences?

2. In the excerpt of the match score formula, I see kT / C / stdev(force difference). Here are the implications that I read from the formula: (a) for a given match score, a 1-unit force difference on 4 indices (neighboring or separate) is worth a 2-unit force difference on 1 index; and (b) for a given match score, the standard deviation of the force difference scales with the temperature of the environment. On (a), I recognize that this is a draft and see that you may not want or need to refine the formula. On (b), are you making a statistical-mechanical statement (and I see no support for it elsewhere in the paper), or are you just dressing up the standard deviation of force in statistical-mechanical clothing, by adding a kT and making it unitless? For the purposes of your experiment, kT/C is constant, so that portion of the relationship cannot be tested. My thought is, if you're not making a statistical-mechanical statement, then your match score should be as simple as possible: m2 = stdev(force difference / scale force), where the scale force is kT/2/C. Do you actually get a better Gaussian when you take exp(-1/m2)? My own biases would reverse your scale, suggesting numbers close to 1 as a good match and numbers close to zero as a poor match --- maybe something that can be interpreted as a probability, like exp(-m2) / sum_{all sequences} exp(-m2).

3. The beginning of the paper reads to me like "peeing on the tree". In particular, I think mentioning Pol II is overreaching.

4. It really bothers me in Figure 2 that the colors are swapped between A and B. Also, next to Figure 4, in the last paragraph before "Future Improvements", you use the word "fell" instead of "felt". Also, I have not worked on the language very much, but the last line on page 8 is a little strange.

5. You mention viscosity but don't explicitly say whether your modeled force includes viscosity. I am going to guess that it does, because otherwise
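
A quick numerical sketch of the simplified score Richard proposes in his point 2. The function names and the toy curves are made up for illustration; real F-j curves would come from our simulation and the unzipping data:

```python
import numpy as np

def match_score_m2(f_expt, f_sim, scale_force=1.0):
    # Richard's simplified score: stdev of the force difference in units
    # of a scale force (he suggests kT/2/C); lower m2 = better match.
    return np.std((f_expt - f_sim) / scale_force)

def match_probabilities(f_expt, library):
    # His suggested reversed scale: softmax of -m2 over the library, so
    # scores sum to 1 and the best match sits closest to 1.
    m2 = np.array([match_score_m2(f_expt, f_sim) for f_sim in library])
    w = np.exp(-m2)
    return w / w.sum()

# Toy check with fake F-j curves (forces in pN).
rng = np.random.default_rng(0)
j = np.linspace(0, 10, 500)
truth = 15 + 3 * np.sin(j)                 # pretend this is the true fragment
wrong = 15 + 3 * np.sin(j + 1.5)           # a mismatched fragment
library = [truth, wrong]
expt = truth + rng.normal(0, 0.5, j.size)  # noisy "experimental" curve
print(match_probabilities(expt, library))  # weight concentrates on truth
```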

Sensitivity and Specificity

Wikipedia Article

Sensitivity - the ability of a test to detect true positives; 100% sensitivity means no false negatives

Specificity - the ability of a test to reject true negatives; 100% specificity means no false positives

So I guess in relation to SDM we need to evaluate the highest and lowest match scores; we may have done this for version 2
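
A minimal sketch of the two definitions, with hypothetical counts (not real results) for our 32 experimental curves scored against the 2700-fragment library:

```python
def sensitivity(tp, fn):
    # True positive rate: 100% sensitivity means no false negatives.
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: 100% specificity means no false positives.
    return tn / (tn + fp)

# Hypothetical numbers: say 30 of the 32 curves matched correctly, 2 were
# missed, and 8 of the 2668 wrong library entries scored as false hits.
print(sensitivity(tp=30, fn=2))    # 0.9375
print(specificity(tn=2660, fp=8))  # ~0.997
```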

Richard suggests simulating unzipping for all possible sequences from j = 1200 to j = 1700 and testing for false positives, etc.

how feasible is this? (that space holds 4^500 ≈ 10^301 sequences, so exhaustive simulation is out; at best we could sample it)

he asks, conversely, whether our matching problem has been made much easier because we restrict the library to just 2700 fragments instead of the 4^500 possible sequences. That number seems a bit excessive; Steve proposes generating something like 100,000 random sequences to insert into the library, maybe a mix of random sequences from different genomes (human, yeast, E. coli, other species of fungus, Drosophila, etc.) and other plasmids
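
A sketch of how the decoy padding could be generated. Uniform random bases here for simplicity; sampling windows from real genomes, as Steve describes, would be the more realistic variant:

```python
import numpy as np

def random_fragments(n, length=500, seed=1):
    # Decoy DNA fragments with uniform base composition, to pad the
    # simulation library and probe the false-positive rate.
    rng = np.random.default_rng(seed)
    bases = np.array(list("ACGT"))
    return ["".join(seq) for seq in bases[rng.integers(0, 4, (n, length))]]

decoys = random_fragments(100_000)
print(len(decoys), decoys[0][:50])
```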

test simulated sequences against each other

may have been done, but no data

also do unzipping data vs unzipping data (expt vs expt); post the data to FigShare?
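
A sketch of the all-pairs test (sim vs sim, or expt vs expt). The stdev-of-difference score below is a stand-in for whatever match score we settle on:

```python
import numpy as np

def all_pairs_scores(curves):
    # Score every curve against every other; the diagonal is the
    # self-match and the off-diagonal entries are the mismatch spectrum.
    n = len(curves)
    mat = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            mat[a, b] = np.std(curves[a] - curves[b])
    return mat

# The gap between diagonal and off-diagonal scores is the margin that
# separates correct from incorrect peaks (Richard's 2700 x 2700 test).
rng = np.random.default_rng(2)
curves = [rng.normal(15, 3, 500) for _ in range(4)]
print(all_pairs_scores(curves).round(2))
```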

add noise to simulated data incrementally to see how much noise the matching can handle
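
Something like this, assuming Gaussian noise and the same stdev-of-difference score (function name and noise levels are made up):

```python
import numpy as np

def noise_tolerance(library, target, sigmas=(0.5, 1.0, 2.0, 5.0, 10.0)):
    # Add Gaussian noise of increasing amplitude (pN) to library[target]
    # and check whether it still scores best against its own entry.
    # Returns the first noise level where matching fails, else None.
    rng = np.random.default_rng(0)
    for sigma in sigmas:
        noisy = library[target] + rng.normal(0.0, sigma, library[target].size)
        best = min(range(len(library)),
                   key=lambda i: np.std(noisy - library[i]))
        if best != target:
            return sigma
    return None

# Toy library of similar curves, so mismatches are actually close calls.
rng = np.random.default_rng(1)
base = 15 + 3 * np.sin(np.linspace(0, 10, 500))
library = [base + rng.normal(0, 0.3, 500) for _ in range(10)]
print(noise_tolerance(library, target=0))  # first failing noise level (pN)
```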

match discontinuous DNA unzipping to simulation library

i.e. histone unzipping will create "gaps" in the unzipping profile, so in one unzipping event you get regions of high force mixed with base-pairing unzipping

matching this to the library of simulations would be interesting
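
One way to do it: mask out the high-force regions and score only the naked-DNA stretches. The 25 pN ceiling below is a placeholder, not a measured value:

```python
import numpy as np

def masked_match_score(f_expt, f_sim, force_ceiling=25.0):
    # Compare only the naked-DNA-looking regions, skipping high-force
    # segments where a bound protein (e.g. a histone) interrupts the
    # base-pairing unzipping signal.
    mask = f_expt < force_ceiling
    if not mask.any():
        return np.inf  # nothing comparable in this trace
    return np.std(f_expt[mask] - f_sim[mask])
```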

add nearest neighbor energy values to unzipping simulation (lookup-table sketch below)

debatable whether it is worth adding it now
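
A lookup-table sketch of what this would involve. The numbers are the commonly quoted SantaLucia (1998) unified nearest-neighbor free energies, written from memory here and worth double-checking against the paper before use:

```python
# Nearest-neighbor stacking free energies at 37 C, kcal/mol
# (SantaLucia 1998 unified parameters; verify before using).
NN_DG37 = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}

def pairing_energy(seq):
    # Sum nearest-neighbor stacking energies along a sequence, as a
    # drop-in replacement for a per-base (AT/GC-only) energy model.
    return sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))

print(pairing_energy("ACGTACGT"))
```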

Link to public site with comments from Richard and responses by Steve

Improvements/Ideas after reading the paper

use simple Hamiltonian but allow for ssDNA extension instead of dsDNA (rough elasticity sketch below)

paper says "Further we ignored elastic energy from the dsDNA anchoring fragment used in the unzipping experiments", not sure what that means
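
A rough sketch of the ssDNA elastic term treated as a freely jointed chain. The Kuhn length and contour length per base are ballpark literature numbers, not fitted values:

```python
import numpy as np

KT = 4.114      # thermal energy near room temperature, pN*nm
B_SS = 1.5      # ssDNA Kuhn length, nm (rough literature value)
L_BASE = 0.56   # ssDNA contour length per base, nm (rough value)

def ssdna_extension(force, n_bases):
    # Freely-jointed-chain extension of the two released ssDNA strands
    # when n_bases base pairs are unzipped (2 * n_bases ssDNA bases):
    # x = L_contour * (coth(F*b/kT) - kT/(F*b)).
    x = force * B_SS / KT
    contour = 2 * n_bases * L_BASE
    return contour * (1.0 / np.tanh(x) - 1.0 / x)

print(ssdna_extension(15.0, 1000))  # nm of ssDNA released at 15 pN
```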

put ALL software on GitHub and link to it

Methods

Unzipping Data: put raw unzipping data on FigShare; put smoothed unzipping data on FigShare

Simulated Unzipping: link to the specific yeast genome used (the draft only states yeastgenome.org); note any new methods that differ from the original paper

Matching Algorithm: incorporate the new match score system

Results

unzipping fragments are 2000 bp in length regardless of whether there is an XhoI site in the fragment. In "real" life this wouldn't be the case, so would it be worth it to simulate unzipping for all actual XhoI fragments and match short fragments to the library?
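
An in-silico digest is cheap to try; sketch below (XhoI recognizes CTCGAG and cuts C^TCGAG; the helper name is made up):

```python
def xhoi_fragments(genome):
    # Cut a sequence at every XhoI site (CTCGAG) and return the resulting
    # variable-length fragments -- the "real life" library, instead of
    # fixed 2000-bp windows.
    site = "CTCGAG"
    frags, start = [], 0
    i = genome.find(site)
    while i != -1:
        frags.append(genome[start:i + 1])  # top-strand cut after the first C
        start = i + 1
        i = genome.find(site, start)
    frags.append(genome[start:])
    return frags

print(xhoi_fragments("AAACTCGAGTTTCTCGAGGGG"))
```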

add noise to simulated unzipping data to get "experimental" unzipping data to match; how much noise? how long does matching take?

increase window size for matching; this can probably be done if we add noise to simulated data

maybe we can quantify a successful match score: i.e. say anything below 0.4 is a match. This could help identify false positives and could display robustness better than just comparing the correct matches to the rest.
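
A sketch of the threshold-based classification (the 0.4 cutoff is the hypothetical number floated above; lower score = better match):

```python
import numpy as np

def classify_matches(scores, true_index, threshold=0.4):
    # Count library entries scoring below the cutoff. Returns whether the
    # true fragment was recovered and how many false positives came along.
    hits = [i for i, s in enumerate(scores) if s < threshold]
    true_positive = true_index in hits
    false_positives = len(hits) - (1 if true_positive else 0)
    return true_positive, false_positives

# e.g. fake scores for one experimental curve vs a 2700-entry library
rng = np.random.default_rng(3)
scores = rng.uniform(0.5, 2.0, 2700)
scores[42] = 0.12                    # pretend entry 42 is the true match
print(classify_matches(scores, 42))  # (True, 0)
```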

Other stuff to include

SNP

Alternative splicing stuff

sequence mutations