DNA is a polymer of the four nucleotides A, C, G, and T, which are joined through a backbone of alternating phosphate and deoxyribose sugar residues. These nitrogen-containing bases occur in complementary pairs as determined by their ability to form hydrogen bonds between them. A always pairs with T through two hydrogen bonds, and G always pairs with C through three hydrogen bonds. The spans of A:T and G:C hydrogen-bonded pairs are nearly identical, allowing them to bridge the sugar-phosphate chains uniformly. This structure, along with the molecule’s chemical stability, makes DNA the ideal genetic material. The bonding between complementary bases also provides a mechanism for the replication of DNA and the transmission of genetic information.
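The pairing rules above can be written down as a small lookup table. The following is a minimal sketch, not a standard library routine; the names `PAIR`, `H_BONDS`, `complement`, and `hydrogen_bonds` are illustrative, and the example sequence is hypothetical.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds per pair, as stated in the text: A:T has two, G:C has three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (same order)."""
    return "".join(PAIR[base] for base in strand.upper())


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds in the duplex formed by a strand and its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    seq = "ATGCGC"  # hypothetical example sequence
    print(complement(seq))      # TACGCG
    print(hydrogen_bonds(seq))  # 2 + 2 + 3 + 3 + 3 + 3 = 16
```

A G:C-rich duplex therefore carries more hydrogen bonds per unit length than an A:T-rich one of the same size.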

Chemical structure

In 1953 James D. Watson and Francis H.C. Crick proposed a three-dimensional structure for DNA based on low-resolution X-ray crystallographic data collected by biophysicists Rosalind Franklin and Maurice Wilkins and on Erwin Chargaff’s observation that, in naturally occurring DNA, the amount of T equals the amount of A and the amount of G equals the amount of C. Watson and Crick, who shared a Nobel Prize in 1962 for their efforts, postulated that two strands of polynucleotides coil around each other, forming a double helix. The two strands are complementary rather than identical in sequence, and they run in opposite directions as determined by the orientation of the 5′ to 3′ phosphodiester bonds. The sugar-phosphate chains run along the outside of the helix, and the bases lie on the inside, where they are linked to complementary bases on the other strand through hydrogen bonds.
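Because the strands run in opposite directions, the partner of a strand written 5′ to 3′ is its reverse complement, and counting bases over both strands reproduces Chargaff’s observation. The sketch below illustrates this; the function names and the example sequence are assumptions made for illustration.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3' like the input."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def duplex_composition(strand: str) -> Counter:
    """Count each base across both strands of the duplex."""
    return Counter(strand.upper()) + Counter(reverse_complement(strand))


if __name__ == "__main__":
    top = "ATGGC"  # hypothetical example strand
    print(reverse_complement(top))  # GCCAT
    counts = duplex_composition(top)
    # In any perfectly paired duplex, A equals T and G equals C.
    print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
```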

The double helical structure of normal DNA takes a right-handed form called the B-helix. The helix makes one complete turn approximately every 10 base pairs. B-DNA has two principal grooves, a wide major groove and a narrow minor groove. Many proteins interact in the space of the major groove, where they make sequence-specific contacts with the bases. In addition, a few proteins are known to make contacts via the minor groove.

Several structural variants of DNA are known. In A-DNA, which forms under conditions of high salt concentration and minimal water, the base pairs are tilted and displaced toward the minor groove. Left-handed Z-DNA forms most readily in strands that contain sequences with alternating purines and pyrimidines. DNA can form triple helices when two strands containing runs of pyrimidines interact with a third strand containing a run of purines.

B-DNA is generally depicted as a smooth helix; however, specific sequences of bases can distort the otherwise regular structure. For example, short tracts of A residues interspersed with short sections of general sequence result in a bent DNA molecule. Inverted base sequences, on the other hand, produce cruciform structures with four-way junctions that are similar to recombination intermediates. Most of these alternative DNA structures have only been characterized in the laboratory, and their cellular significance is unknown.

Biological structures

Naturally occurring DNA molecules can be circular or linear. The genomes of single-celled bacteria and archaea (collectively, the prokaryotes), as well as the genomes of mitochondria and chloroplasts (certain functional structures within the cell), are circular molecules. In addition, some bacteria and archaea have smaller circular DNA molecules called plasmids that typically contain only a few genes. Many plasmids are readily transmitted from one cell to another. For a typical bacterium, the genome that encodes all of the genes of the organism is a single contiguous circular molecule that contains a half million to five million base pairs. The genomes of most eukaryotes and some prokaryotes contain linear DNA molecules called chromosomes. Human DNA, for example, consists of 23 pairs of linear chromosomes containing three billion base pairs.

In all cells, DNA does not exist free in solution but rather as a protein-coated complex called chromatin. In prokaryotes, the loose coat of proteins on the DNA helps to shield the negative charge of the phosphodiester backbone. Chromatin also contains proteins that control gene expression and determine the characteristic shapes of chromosomes. In eukaryotes, a section of DNA between 140 and 200 base pairs long winds around a discrete set of eight positively charged proteins called histones, forming a spherical structure called the nucleosome. Additional sets of histones are wrapped by successive sections of DNA, forming a series of nucleosomes like beads on a string. Transcription and replication of DNA are more complicated in eukaryotes because the nucleosome complexes have to be at least partially disassembled for the processes to proceed effectively.

Most viruses contain linear genomes that typically are much shorter than cellular genomes and contain only the genes necessary for viral propagation. Bacterial viruses called bacteriophages (or phages) may contain both linear and circular forms of DNA. For instance, the genome of bacteriophage λ (lambda), which infects the bacterium Escherichia coli, contains 48,502 base pairs and can exist as a linear molecule packaged in a protein coat. The DNA of phage λ can also exist in a circular form (as described in the section Site-specific recombination) that is able to integrate into the circular genome of the host bacterial cell. Both circular and linear genomes are found among eukaryotic viruses, but they more commonly use RNA as the genetic material.

Biochemical properties

Denaturation

The strands of the DNA double helix are held together by hydrogen bonding interactions between the complementary base pairs. Heating DNA in solution easily breaks these hydrogen bonds, allowing the two strands to separate—a process called denaturation or melting. The two strands may reassociate when the solution cools, reforming the starting DNA duplex—a process called renaturation or hybridization. These processes form the basis of many important techniques for manipulating DNA. For example, a short piece of DNA called an oligonucleotide can be used to test whether a very long DNA sequence has the complementary sequence of the oligonucleotide embedded within it. Using hybridization, a single-stranded DNA molecule can capture complementary sequences from any source. Single strands from RNA can also reassociate. DNA and RNA single strands can form hybrid molecules that are even more stable than double-stranded DNA. These molecules form the basis of a technique that is used to purify and characterize messenger RNA (mRNA) molecules corresponding to single genes.
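The oligonucleotide test described above amounts to searching a long strand for the reverse complement of the probe. The following sketch assumes perfect base pairing only and ignores temperature, salt, and partial matches, all of which matter in a real hybridization; the function names and sequences are hypothetical.

```python
from typing import List

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the sequence that pairs with `strand`, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def hybridization_sites(target: str, probe: str) -> List[int]:
    """Return 0-based positions in single-stranded `target` where `probe`
    can pair perfectly along its whole length."""
    site = reverse_complement(probe)
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # allow overlapping sites
    return positions


if __name__ == "__main__":
    target = "AAGCTTGGATCCAAGCTA"  # hypothetical long strand
    probe = "AGCTT"                # hypothetical oligonucleotide
    print(hybridization_sites(target, probe))  # [0, 12]
```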

Ultraviolet absorption

DNA melting and reassociation can be monitored by measuring the absorption of ultraviolet (UV) light at a wavelength of 260 nm (nanometer; 1 nm = 10⁻⁹ meter). When DNA is in a double-stranded conformation, absorption is fairly weak, but when DNA is single-stranded, the unstacking of the bases leads to an enhancement of absorption called hyperchromicity. Therefore, the extent to which DNA is single-stranded or double-stranded can be determined by monitoring UV absorption.
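One simple way to turn an absorbance reading into an estimate of strand separation is to interpolate between the readings for fully double-stranded and fully single-stranded DNA. The sketch below assumes a two-state, linear relationship, which is an approximation not stated in the text; the function name and the absorbance values are hypothetical.

```python
def fraction_single_stranded(a260: float, a_duplex: float, a_single: float) -> float:
    """Estimate the single-stranded fraction from absorbance at 260 nm.

    Assumes absorbance rises linearly from the fully double-stranded
    baseline (a_duplex) to the fully single-stranded baseline (a_single).
    """
    if a_single <= a_duplex:
        raise ValueError("single-stranded baseline must exceed duplex baseline")
    fraction = (a260 - a_duplex) / (a_single - a_duplex)
    # Clamp so that measurement noise outside the baselines stays in [0, 1].
    return min(1.0, max(0.0, fraction))


if __name__ == "__main__":
    # Hypothetical readings: duplex baseline 1.0, melted baseline 1.4.
    print(round(fraction_single_stranded(1.2, 1.0, 1.4), 3))  # 0.5
```

Under this assumption, the temperature at which the estimate reaches 0.5 would serve as the midpoint of the melting transition.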

Chemical modification

After a DNA molecule has been assembled, it may be chemically modified—sometimes deliberately by special enzymes called DNA methyltransferases and sometimes accidentally by oxidation, ionizing radiation, or the action of chemical carcinogens. DNA can also be cleaved and degraded by enzymes called nucleases.

Methylation

Three types of natural methylation have been reported in DNA. Cytosine can be modified either on the ring to form 5-methylcytosine or on the exocyclic amino group to form N4-methylcytosine. Adenine may be modified to form N6-methyladenine. N4-methylcytosine and N6-methyladenine are found only in bacteria and archaea, whereas 5-methylcytosine is widely distributed. Special enzymes called DNA methyltransferases are responsible for this methylation; they recognize specific sequences within the DNA molecule so that only a subset of the bases is modified. Other methylations of the bases or of the deoxyribose are sometimes induced by carcinogens. These usually lead to mispairing of the bases during replication and have to be removed if they are not to become mutagenic.

Natural methylation has many cellular functions. In bacteria and archaea, methylation forms an essential part of the immune system by protecting DNA molecules from fragmentation by restriction endonucleases. In some organisms, methylation helps to eliminate incorrect base sequences introduced during DNA replication. By marking the parental strand with a methyl group, a cellular mechanism known as the mismatch repair system distinguishes between the newly replicated strand where the errors occur and the correct sequence on the template strand.

In higher eukaryotes, 5-methylcytosine controls many cellular phenomena by preventing DNA transcription. Methylation is also thought to signal imprinting, a process whereby some genes inherited from one parent are selectively inactivated. Correct methylation may also repress or activate key genes that control embryonic development. On the other hand, 5-methylcytosine is potentially mutagenic because its spontaneous deamination produces thymine, which converts C:G pairs to T:A pairs. In mammals, methylation takes place selectively within the dinucleotide sequence CG, a rare sequence, presumably because it has been lost by mutation. In many cancers, mutations are found in key genes at CG dinucleotides.
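The rarity of the CG dinucleotide can be quantified by comparing how often it occurs with how often it would occur if C and G were placed independently. The sketch below uses a commonly used observed-to-expected ratio; this measure is an assumption brought in for illustration and is not described in the text, and the function name and sequences are hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Return the ratio of observed CG dinucleotides to the number
    expected from the C and G content of the sequence.

    A value well below 1 indicates that CG is depleted.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cg_count = seq.count("CG")
    return cg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_observed_expected("ACCTGGAT"))            # 0.0 (no CG present)
    print(round(cpg_observed_expected("ACGTCGGA"), 3))  # 2.667
```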

Nucleases

Nucleases are enzymes that hydrolytically cleave the phosphodiester backbone of DNA. Endonucleases cleave in the middle of chains, while exonucleases operate selectively by degrading from the end of the chain. Nucleases that act on both single- and double-stranded DNA are known.

Restriction endonucleases are a special class that recognize and cleave specific sequences in DNA. Type II restriction endonucleases always cleave at or near their recognition sites. They produce small, well-defined fragments of DNA that help to characterize genes and genomes and that produce recombinant DNAs. Fragments of DNA produced by restriction endonucleases can be moved from one organism to another. In this way it has been possible to express proteins such as human insulin in bacteria.
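Cleavage at a recognition site can be sketched as a string operation. The example below uses the well-known enzyme EcoRI, which recognizes GAATTC and cuts after the first G; the enzyme choice is an illustration not taken from the text, the function name `digest` and the input sequence are hypothetical, and only one strand of a linear molecule is modeled.

```python
from typing import List


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[str]:
    """Cut one strand of a linear DNA at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts between G and AATTC).
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


if __name__ == "__main__":
    dna = "AAGAATTCTTGAATTCGG"  # hypothetical sequence with two sites
    print(digest(dna))  # ['AAG', 'AATTCTTG', 'AATTCGG']
```

Joining the fragments in order restores the input, which is a convenient sanity check on any digestion routine.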

Mutation

Chemical modification of DNA can lead to mutations in the genetic material. Anions such as bisulfite can deaminate cytosine to form uracil, changing the genetic message by causing C-to-T transitions. Exposure to acid causes the loss of purine residues, though specific enzymes exist in cells to repair these lesions. Exposure to UV light can cause adjacent pyrimidines to dimerize, while oxidative damage from free radicals or strong oxidizing agents can cause a variety of lesions that are mutagenic if not repaired. Halogens such as chlorine and bromine react directly with uracil, adenine, and guanine, giving substituted bases that are often mutagenic. Similarly, nitrous acid reacts with primary amine groups—for example, converting adenosine into inosine—which then leads to changes in base pairing and mutation. Many chemical mutagens, such as chlorinated hydrocarbons and nitrites, owe their toxicity to the production of halides and nitrous acid during their metabolism in the body.

Supercoiling

Circular DNA molecules such as those found in plasmids or bacterial chromosomes can adopt many different topologies. One is active supercoiling, which involves the cleavage of one DNA strand, its winding one or more turns around the complementary strand, and then the resealing of the molecule. Each complete rotation leads to the introduction of one supercoiled turn in the DNA, a process that can continue until the DNA is fully wound and collapses on itself in a tight ball. Reversal is also possible. Special enzymes called gyrases and topoisomerases catalyze the winding and relaxation of supercoiled DNA. In the linear chromosomes of eukaryotes, the DNA is usually tightly constrained at various points by proteins, allowing the intervening stretches to be supercoiled. This property is partially responsible for the great compaction of DNA that is necessary to fit it within the confines of the cell. The DNA in one human cell would have an extended length of between two and three metres, but it is packed very tightly so that it can fit within a human cell nucleus that is 10 micrometres in diameter.
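The scale of this compaction can be checked with rough arithmetic. The sketch below assumes a rise of 0.34 nm per base pair for B-DNA and a diploid human cell carrying two copies of the roughly three-billion-base-pair genome; neither figure is stated in this passage, and the function names are illustrative. The value of about 10 base pairs per helical turn comes from the section on chemical structure.

```python
RISE_PER_BP_NM = 0.34  # assumed B-DNA rise per base pair, in nanometres
BP_PER_TURN = 10       # approximate base pairs per helical turn (from the text)


def extended_length_m(base_pairs: int) -> float:
    """Length in metres of fully extended B-DNA with the given base pairs."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


def helical_turns(base_pairs: int) -> float:
    """Approximate number of turns of the double helix."""
    return base_pairs / BP_PER_TURN


if __name__ == "__main__":
    diploid_bp = 2 * 3_000_000_000        # assumption: two genome copies per cell
    length = extended_length_m(diploid_bp)
    nucleus_diameter_m = 10e-6            # 10 micrometres, from the text
    print(round(length, 2))               # 2.04 (metres)
    print(round(length / nucleus_diameter_m))  # roughly 204000-fold longer than the nucleus is wide
    print(helical_turns(5_000_000))       # 500000.0 turns in a 5-million-bp genome
```

Under these assumptions the estimate lands at the low end of the two-to-three-metre range quoted above.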

Sequence determination

Methods of DNA sequencing, which determines the order of bases in DNA, were pioneered in the 1970s by Frederick Sanger and Walter Gilbert, whose efforts won them a Nobel Prize in 1980. The Gilbert-Maxam method relied on the different chemical reactivities of the bases, while the Sanger method was based on enzymatic synthesis of DNA in vitro. Both methods aimed to measure the distance from a fixed point on DNA to each occurrence of a particular base—A, C, G, or T. DNA fragments obtained from a series of reactions were separated according to length in four “lanes” by gel electrophoresis. Each lane corresponded to a unique base, and the sequence could be read directly from the gel. The Sanger method later was automated using fluorescent dyes to label the DNA, and a single machine produced tens of thousands of DNA base sequences in a single run.
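Reading a four-lane gel can be sketched as sorting fragments by length and noting which lane each one falls in. The following is a simplified illustration, assuming each fragment length appears in exactly one lane; the function name `read_gel` and the lane data are hypothetical.

```python
from typing import Dict, List


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Reconstruct a base sequence from fragment lengths in four lanes.

    `lanes` maps each base to the lengths of the fragments that end at
    that base. Shorter fragments lie closer to the fixed starting point.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"length {length} appears in two lanes")
            base_at_length[length] = base
    # Reading from the shortest fragment to the longest gives the sequence.
    return "".join(base_at_length[length] for length in sorted(base_at_length))


if __name__ == "__main__":
    lanes = {"A": [1, 4], "C": [3], "G": [2, 6], "T": [5]}  # hypothetical gel
    print(read_gel(lanes))  # AGCATG
```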

The early DNA sequencing methods (sometimes also referred to as first-generation sequencing technologies) have been largely supplanted by next-generation sequencing technologies, also known as massively parallel or second-generation sequencing technologies. These newer approaches enable many DNA fragments (sometimes on the order of millions of fragments) to be sequenced at one time and are more cost-efficient and much faster than first-generation technologies. The utility of next-generation technologies was improved significantly by advances in bioinformatics that allowed for increased data storage and facilitated the analysis and manipulation of very large data sets.

Ribonucleic acid (RNA)

RNA is a single-stranded nucleic acid polymer of the four nucleotides A, C, G, and U joined through a backbone of alternating phosphate and ribose sugar residues. It is the first intermediate in converting the information from DNA into proteins essential for the working of a cell. Some RNAs also serve direct roles in cellular metabolism. RNA is made by copying the base sequence of a section of double-stranded DNA, called a gene, into a piece of single-stranded nucleic acid. This process, known as transcription (see below RNA metabolism), is catalyzed by an enzyme called RNA polymerase.
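Transcription can be sketched as copying a DNA template strand into its complementary RNA, with U taking the place of T. The example below is a minimal illustration that ignores promoters, terminators, and the enzyme itself; the function name and the sequence are hypothetical.

```python
# Pairing of DNA template bases with the RNA bases copied from them.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the RNA copied from a DNA template strand.

    Both the template and the returned RNA are written 5' to 3', so the
    template is read in reverse while the RNA is built.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_strand.upper()))


if __name__ == "__main__":
    template = "TACGGT"  # hypothetical template strand, 5' to 3'
    print(transcribe(template))  # ACCGUA
```

The RNA matches the non-template DNA strand (ACCGTA in this example) except that U replaces T.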

Chemical structure

Whereas DNA provides the genetic information for the cell and is inherently quite stable, RNA has many roles and is much more reactive chemically. RNA is sensitive to oxidizing agents such as periodate that lead to opening of the 3′-terminal ribose ring. The 2′-hydroxyl group on the ribose ring is a major cause of instability in RNA, because the presence of alkali leads to rapid cleavage of the phosphodiester bond linking ribose and phosphate groups. In general, this instability is not a significant problem for the cell, because RNA is constantly being synthesized and degraded.

Interactions between the nitrogen-containing bases differ in DNA and RNA. In DNA, which is usually double-stranded, the bases in one strand pair with complementary bases in a second DNA strand. In RNA, which is usually single-stranded, the bases pair with other bases within the same molecule, leading to complex three-dimensional structures. Occasionally, intermolecular RNA/RNA duplexes do form, but they form a right-handed A-type helix rather than the B-type DNA helix. Depending on the amount of salt present, either 11 or 12 base pairs are found in each turn of the helix. Helices between RNA and DNA molecules also form; these adopt the A-type conformation and are more stable than either RNA/RNA or DNA/DNA duplexes. Such hybrid duplexes are important species in biology, being formed when RNA polymerase transcribes DNA into mRNA for protein synthesis and when reverse transcriptase copies a viral RNA genome such as that of the human immunodeficiency virus (HIV).

Single-stranded RNAs are flexible molecules that form a variety of structures through internal base pairing and additional non-base pair interactions. They can form hairpin loops such as those found in transfer RNA (tRNA), as well as longer-range interactions involving both the bases and the phosphate residues of two or more nucleotides. This leads to compact three-dimensional structures. Most of these structures have been inferred from biochemical data, since few crystallographic images are available for RNA molecules. In some types of RNA, a large number of bases are modified after the RNA is transcribed. More than 90 different modifications have been documented, including extensive methylations and a wide variety of substitutions around the ring. In some cases these modifications are known to affect structure and are essential for function.
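Internal base pairing of the hairpin kind can be sketched as a search for a short stretch whose reverse complement appears further along the same molecule. The example below considers only A:U and G:C pairs and a fixed stem length, which is a strong simplification of real RNA folding; the function name, parameters, and sequence are hypothetical.

```python
from typing import Optional, Tuple

RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_hairpin(rna: str, stem: int = 4, min_loop: int = 3) -> Optional[Tuple[int, int]]:
    """Return (left_start, right_start) of the first pair of segments that
    could form a stem of `stem` base pairs, or None if there is none.

    The two arms must be separated by at least `min_loop` unpaired bases.
    """
    rna = rna.upper()
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left_arm = rna[i:i + stem]
        # The right arm must be the reverse complement of the left arm.
        right_arm = "".join(RNA_PAIR[base] for base in reversed(left_arm))
        j = rna.find(right_arm, i + stem + min_loop)
        if j != -1:
            return i, j
    return None


if __name__ == "__main__":
    rna = "GGCAUUCGUGCC"  # hypothetical: arm GGCA, loop UUCG, arm UGCC
    print(find_hairpin(rna))             # (0, 8)
    print(find_hairpin("AAAAAAAAAAAA"))  # None
```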

Types of RNA

Messenger RNA (mRNA)

Messenger RNA (mRNA) delivers the information encoded in one or more genes from the DNA to the ribosome, a specialized structure, or organelle, where that information is decoded into a protein. In prokaryotes, mRNAs contain an exact transcribed copy of the original DNA sequence with a terminal 5′-triphosphate group and a 3′-hydroxyl residue. In eukaryotes the mRNA molecules are more elaborate. The 5′-triphosphate residue is further esterified, forming a structure called a cap. At the 3′ ends, eukaryotic mRNAs typically contain long runs of adenosine residues (polyA) that are not encoded in the DNA but are added enzymatically after transcription.

Eukaryotic mRNA molecules are usually composed of small segments of the original gene and are generated by a process of cleavage and rejoining from an original precursor RNA (pre-mRNA) molecule, which is an exact copy of the gene (as described in the section Splicing). In general, prokaryotic mRNAs are degraded very rapidly, whereas the cap structure and the polyA tail of eukaryotic mRNAs greatly enhance their stability.
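The cleavage-and-rejoining step and the addition of a polyA tail can be sketched as string operations on the precursor RNA. The example below is an illustration only: the exon coordinates, sequences, and tail length are hypothetical, and the 5′ cap is not modeled.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exon segments of a precursor RNA.

    Each exon is a (start, end) pair of 0-based positions, with `end`
    excluded; everything between exons is discarded.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


def add_poly_a(mrna: str, tail_length: int = 20) -> str:
    """Append a run of adenosine residues that is not encoded in the gene."""
    return mrna + "A" * tail_length


if __name__ == "__main__":
    pre_mrna = "AUGGCCGUAAGUUUCAGGAAUAA"  # hypothetical precursor
    exons = [(0, 6), (17, 23)]            # hypothetical exon coordinates
    mature = splice(pre_mrna, exons)
    print(mature)                 # AUGGCCGAAUAA
    print(add_poly_a(mature, 5))  # the spliced RNA followed by AAAAA
```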