Title :

A model for determination of DNA sequence from experimental data of DNA double-strand unzipping kinetics

Area of research :

Life Sciences & Biotechnology

Focus area :

Theoretical Sciences

Principal Investigator :

Dr. Hemachander Subramanian, National Institute Of Technology (NIT) Durgapur, West Bengal

Timeline Start Year :

2023

Timeline End Year :

2026

Executive Summary :

Unzipping the two strands of DNA with an atomic force microscope, nanopores, or optical tweezers, techniques that have long been used to study DNA unzipping kinetics, can be used to determine nucleotide composition from the force signatures of unzipping. These force signatures are reproducible and highly sequence-dependent when unzipping is carried out under near-equilibrium conditions. My lab at NIT Durgapur has been advancing a theoretical/computational model of DNA unzipping based on Markov chain methods, with which we have shown that the rates of unzipping of DNA from the two ends of the double strand, calculated using the first passage time method, would be very different. Our theoretical observations, substantiated by independent experimental observations in the literature, provide a way forward in using unzipping kinetics data to determine DNA sequences. Within our model, the influence of a hydrogen bond between two nucleotides of the two strands of DNA on the kinetic barriers of its neighboring hydrogen bonds to the left and right is unequal, and depends on the type of nucleotides composing the hydrogen bond. This asymmetry results in distinct kinetics depending on the end from which the DNA is unzipped (or zipped), which has been abundantly documented in the literature. The theoretical idea behind this project is to utilize unzipping data from both ends of the double-stranded DNA for sequence prediction.
The central challenge in modeling DNA unzipping kinetics with a Markov chain model is handling the enormous transition matrices. The transition matrix for a five-nucleotide sequence has size 2^5 × 2^5, since each nucleotide can be in one of two states, bonded or unbonded. The matrices become intractable for sequences of length of the order of twenty (a matrix of size 2^20 × 2^20), and this size constraint precludes handling million-base-pair sequences altogether. The primary aim of this project is to find algorithms that would allow us to stitch together the unzipping kinetics of smaller sequences to model the unzipping of longer sequences, which implies approximating huge transition matrices with many smaller matrices while minimizing the difference in behavior between the two.
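To make the size problem concrete, the following is a minimal Python sketch of a Markov chain over all 2^N bonded/unbonded configurations, with a first passage time calculation from the fully bonded to the fully unbonded state. It is an illustration under stated assumptions and not the project's actual model: the function names, the rate constants (`k_open`, `k_close`), and the neighbor weights (`w_left`, `w_right`) are hypothetical, chosen only so that the left and right neighbors influence a bond's opening rate unequally.

```python
from itertools import product


def flip_rate(state, i, k_open=1.0, k_close=2.0, w_left=0.5, w_right=1.5):
    """Hypothetical rate for flipping bond i (1 = bonded, 0 = unbonded).

    Assumption: bonded neighbors slow the opening of bond i, and the left and
    right neighbors do so with unequal weights. All numbers are placeholders.
    """
    if state[i] == 0:
        return k_close  # an open bond re-closes at a fixed rate
    left = state[i - 1] if i > 0 else 0
    right = state[i + 1] if i < len(state) - 1 else 0
    return k_open / (1.0 + w_left * left + w_right * right)


def generator_matrix(n):
    """Dense rate matrix over all 2**n bond configurations."""
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for s in states:
        row = index[s]
        for i in range(n):
            flipped = s[:i] + (1 - s[i],) + s[i + 1:]
            rate = flip_rate(s, i)
            q[row][index[flipped]] = rate
            q[row][row] -= rate  # each row of a rate matrix sums to zero
    return states, q


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mean_unzipping_time(n):
    """Mean first passage time from all-bonded to all-unbonded."""
    states, q = generator_matrix(n)
    target = states.index((0,) * n)
    keep = [k for k in range(len(states)) if k != target]
    # Mean first passage times t satisfy (-Q restricted to non-target states) t = 1.
    a = [[-q[i][j] for j in keep] for i in keep]
    times = solve(a, [1.0] * len(keep))
    return times[keep.index(states.index((1,) * n))]


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, 2 ** n, mean_unzipping_time(n))
```

The dense matrix has 2^N rows and columns, so this approach is only usable for small N. At N = 20 a dense matrix would hold 2^40 entries, roughly 8 TiB at 8 bytes per entry, which is why the matrices cannot be handled directly at that length.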

Total Budget (INR):

6,60,000

Organizations involved