II Gene expression

II.1 Introduction

The regulation of gene expression is a very significant field of study in Biology. This primer introduces you to what we know about gene regulation. The concepts covered in this primer are used in several chapters of iThink Biology, including malaria, rotavirus, and cotton.

All the cells in our body contain the same set of genetic instructions for development, growth, metabolism, reproduction, and all other functions. This set of genetic instructions is made of DNA (deoxyribonucleic acid) in all known cellular organisms and in many viruses.

If all cells have the same set of instructions, how is it that we have so many different cell types? For example, neurons receive and send signals and have an elongated structure called the axon. Muscle cells pack together to form fibres, and intestinal cells have many hair-like villi on their surface. How can the same set of instructions give rise to such diverse cell morphology and function (see Figure II.1)?

Illustration of a variety of cells including muscle, blood, intestinal, liver, nerve, and stem cells.

Figure II.1 Various cell types.

Think of it this way: you and your friend are each given the same set of four instructions for manipulating the phrase ‘I think biology’.

  1. Replace ‘think’ with ‘love’.
  2. Replace ‘biology’ with ‘therefore’.
  3. Add ‘I am’ at the end.
  4. Replace the first ‘I’ with ‘You’.

Suppose you decide to follow only instructions 1 and 4, while your friend follows only instructions 2 and 3. How would the phrase read? You would end up with ‘You love biology’, and your friend would end up with ‘I think therefore I am’.

Cells do something similar to differentiate themselves from each other. Suppose a strand of DNA carries four instructions in a coded form. One cell activates genetic instructions 1 and 4, while another activates instructions 2 and 3. The end result is two different cells. Note that this analogy does not mean that DNA itself is being modified, but rather that the instructions that DNA provides are selectively followed, resulting in varying outcomes – in this case, different cell types.
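The phrase analogy can be played out in a few lines of Python. This is only an illustrative sketch: the names `INSTRUCTIONS`, `follow`, and `PHRASE` are made up for this example, and the original phrase is never altered, just as the DNA itself is not modified.

```python
# Each numbered instruction is a small function that transforms a phrase.
INSTRUCTIONS = {
    1: lambda s: s.replace("think", "love"),
    2: lambda s: s.replace("biology", "therefore"),
    3: lambda s: s + " I am",
    4: lambda s: s.replace("I", "You", 1),  # first 'I' only
}


def follow(phrase, selected):
    """Apply only the selected instructions, in numerical order."""
    for number in sorted(selected):
        phrase = INSTRUCTIONS[number](phrase)
    return phrase


PHRASE = "I think biology"        # the shared set of 'genetic instructions' acts on this
you = follow(PHRASE, {1, 4})      # one 'cell' follows instructions 1 and 4
friend = follow(PHRASE, {2, 3})   # another 'cell' follows instructions 2 and 3

print(you)     # You love biology
print(friend)  # I think therefore I am
```

Both results start from the same phrase and the same instruction set; only the selection differs.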

genomically equivalent
The idea that all cells in an organism have the same genetic material, even though they express different genes and perform different functions.

Even though all cells in your body are genomically equivalent – they contain the same set of genetic instructions – they do not ‘follow’ all the instructions; in other words, they are selective. This selectivity is what gives rise first to the production of cell-type specific proteins, which in turn leads to cell differentiation. This selectivity is also what allows cells to respond appropriately to environmental cues. Note that this means that the cellular response not only has to be appropriate, but it also has to happen at the right time.

In this primer we will review the basics of what gene expression regulation is, and where it is important.

II.2 The central dogma

Genetic instructions or genes in the form of a sequence of DNA are first transcribed into mRNA (messenger RNA) and then translated into a protein. This process is often referred to as the ‘central dogma’.

The central dogma, representing the process of gene expression.

Figure II.2 The central dogma of gene expression.

Recall that DNA consists of a long sequence of nucleotides. Each nucleotide contains a sugar (deoxyribose), a phosphate group, and a nitrogenous base. There are four nitrogenous bases: adenine (A), thymine (T), guanine (G), and cytosine (C). The sequence of A, T, G, C ultimately determines the sequence of the RNA and protein that will be produced. A set of three nucleotides constitutes a codon. Most codons are translated into an amino acid, while a few signal the end of translation (a protein is a polymer made of a string of amino acids).
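To make the idea of codons concrete, here is a minimal Python sketch of transcription and translation. It rests on several assumptions: the DNA sequence is a made-up example, it is taken to be the coding strand (so the mRNA is the same sequence with U in place of T), and `CODON_TABLE` lists only the five codons needed here out of the 64 in the standard genetic code.

```python
# Partial codon table (mRNA codon -> amino acid); illustrative subset only.
CODON_TABLE = {
    "AUG": "Met",   # methionine, also the usual start codon
    "UUU": "Phe",   # phenylalanine
    "GGC": "Gly",   # glycine
    "AAA": "Lys",   # lysine
    "UAA": "Stop",  # a stop codon: no amino acid is added
}


def transcribe(coding_dna):
    """Return the mRNA matching the coding strand (T replaced by U)."""
    return coding_dna.replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGTTTGGCAAATAA"  # hypothetical example sequence
mrna = transcribe(dna)
protein = translate(mrna)

print(mrna)               # AUGUUUGGCAAAUAA
print("-".join(protein))  # Met-Phe-Gly-Lys
```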

If you wish to review these concepts (including DNA structure, amino acids, codons, and so on), read this excellent review.

DNA is double stranded, resulting from complementary bases forming hydrogen bonds (Figure II.3). So every strand of DNA contains a sequence of bases, and the double-stranded molecule contains a sequence of base pairs. You can see that DNA is a repository of information. This information is not stored in the binary language (with 0s and 1s) like in our computers, but with an alphabet of four bases (A, T, G, C).
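Base pairing and the four-letter alphabet can both be illustrated in Python. This is a simplified sketch: the example strand is invented, and `complementary_strand` writes the partner in the same left-to-right order, ignoring the fact that the two strands of real DNA run in opposite directions.

```python
import math

# Complementary base pairs: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the base-paired partner, written in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


strand = "ATGCCGTA"  # hypothetical example sequence
partner = complementary_strand(strand)

# A four-letter alphabet carries log2(4) = 2 bits of information per position,
# compared with 1 bit per position for the binary alphabet of a computer.
bits_per_base = math.log2(len(PAIRS))

print(strand)         # ATGCCGTA
print(partner)        # TACGGCAT
print(bits_per_base)  # 2.0
```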

Structure of the DNA including nitrogenous bases and the phosphate-deoxyribose backbone.

Figure II.3 DNA structure.

Madeleine Price Ball, Wikimedia commons, CC0 1.0

The human genome contains an astounding 3 billion base pairs. However, only a small fraction of these base pairs are genes that code for proteins. Some sequences are transcribed into RNA, but never translated to protein. Such non-coding RNAs (ncRNAs) have many functions, including the regulation of gene expression. More and more ncRNAs are being discovered every year. The role of a large fraction of the human genome remains unknown.

eukaryote
Single-celled or multicellular organism whose cell contains a distinct nucleus surrounded by a membrane.

intron
A sequence of base pairs that does not code for any amino acids and is spliced out before the mRNA is translated into a protein. See also: splicing.

exon
The sequence of base pairs that encodes amino acids and is translated into a protein.

splicing
The process of cutting out introns and joining the exons of a precursor mRNA before it is sent for translation. See also: intron and exon.

When mRNA is transcribed in eukaryotes, some portions of the RNA are cut out. Depending on the desired protein, different regions are cut out. The regions that are excluded are called introns, while the regions that are retained and eventually translated are called exons. As illustrated in Figure II.4, we see that the same set of instructions (the nascent, or freshly synthesised mRNA strand) can lead to different outcomes through splicing.
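Splicing can be sketched in Python as selecting and joining segments. Everything here is hypothetical: the segment labels, the short sequences, and the `splice` function are invented for illustration, and real splicing depends on sequence signals that this toy model ignores.

```python
# Hypothetical freshly transcribed mRNA, as ordered (label, sequence) segments.
PRE_MRNA = [
    ("exon1", "AUGGCU"),
    ("intron1", "GUAAGU"),
    ("exon2", "UUUGGC"),
    ("intron2", "GUGAGU"),
    ("exon3", "AAAUAA"),
]


def splice(pre_mrna, keep):
    """Join the kept segments in their original order; all others are cut out."""
    return "".join(seq for label, seq in pre_mrna if label in keep)


# Two ways of splicing the same transcript give two different mature mRNAs.
full = splice(PRE_MRNA, {"exon1", "exon2", "exon3"})
short = splice(PRE_MRNA, {"exon1", "exon3"})

print(full)   # AUGGCUUUUGGCAAAUAA
print(short)  # AUGGCUAAAUAA
```

The same starting transcript yields different messages, and therefore different proteins, depending on which regions are retained.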

Transcription and translation processes.

Figure II.4 Transcription and translation.

Evolution has resulted in ingenious ways of storing and utilising information. Which genes are ultimately translated into a protein has to be tightly regulated, because cells should not differentiate inappropriately, nor should they respond inadequately to environmental stresses.

prokaryote
Single-celled organism that does not have a distinct nucleus with a membrane.

Although most of the organisms you study in this course are eukaryotes, some aspects of gene regulation are much simpler in prokaryotes than in eukaryotes. Most of the text that follows applies to prokaryotes. Here are some significant differences in gene expression between eukaryotic and prokaryotic cells:

II.3 Gene regulation

So how is gene expression regulated? When does a gene get transcribed into RNA? To answer these questions, let’s have a look at the anatomy of a gene in Figure II.5.

The anatomy of a gene, showing the location of the enhancer, transcription factor, and promoter region on DNA.

Figure II.5 The anatomy of a gene.

By studying Figures II.4 and II.5, you should note a few important components for gene regulation that are related to DNA, mRNA and proteins. These components and their functions are listed in Table II.1 for your reference. As you read through this primer, the role of each component should become clearer.

| Molecule | Component | Function |
|---|---|---|
| DNA (inherited material, replicated during cell division) | Gene | A segment of DNA that is considered the basic unit of heredity. It may encode an RNA molecule or a protein. |
| | Operator | A segment of DNA on which a repressor binds in prokaryotes. |
| | Promoter | A segment of DNA on which RNA polymerase binds. |
| | Enhancer | A segment of DNA on which various modulators such as transcription factors may bind to help recruit RNA polymerase to the promoter. |
| RNA (produced via transcription) | Exons | Portions of the mRNA that are retained and joined together during splicing, and are in turn translated. |
| | Introns | Portions of the mRNA that are spliced out and hence do not get translated. |
| | Non-coding RNAs (ncRNAs) | RNAs that are never translated to protein. |
| Protein (produced via translation, and post-translational modifications) | Transcription factors | Proteins that recruit RNA polymerase to the DNA. They are sometimes called activators. |
| | RNA polymerase | The enzyme that is responsible for reading DNA and assembling a complementary mRNA strand. |

Table II.1 Important components of gene regulation.

promoter
A region of DNA that precedes a DNA sequence that is going to be expressed. Specific proteins bind to the promoter to initiate transcription of the sequence.

enhancer
A section of DNA that binds to transcription factors and enhances or promotes transcription of genes.

enzymes
Proteins that catalyse chemical reactions, usually within cells.

catalyse
To increase the rate of a reaction.

Figure II.5 shows the presence of a promoter sequence, an enhancer sequence and the gene sequence on a strand of DNA. RNA polymerase is an enzyme that catalyses transcription. It begins its work by first binding to the promoter of the gene. The promoter is usually just upstream of the gene.

Often, specialised proteins called transcription factors (TFs) recruit RNA polymerase to the promoter and mediate its binding there. TFs often function by binding to the enhancer. The TF bound to the enhancer helps make DNA more accessible and makes it easier for RNA polymerase to bind to the promoter (Figure II.6). Enhancers can be located thousands of base pairs away from the promoter.

A transcription factor binding to the enhancer can cause DNA to fold and recruit RNA polymerase to the promoter.

Figure II.6 Effect of enhancer binding on gene activation.

The presence or absence of TFs strongly influences RNA polymerase recruitment to the promoter, thereby regulating gene expression. TFs play a very important role in cell differentiation. One example is the family of TFs encoded by Hox genes. These genes specify segment identity (such as the head, thorax, abdomen, and so on) during animal development. Hox genes are present throughout the animal kingdom, but are most extensively studied in the fruit fly Drosophila melanogaster.

The presence of a repressor on the operator prevents RNA polymerase from binding and beginning transcription.

Figure II.7 The role of the operator in responding to the environment in prokaryotes.

The cell’s environment is dynamic, whether the cell is part of a larger whole in a multicellular organism, or it comprises the entire unicellular organism.

Being able to respond to the environment is important for the cell’s survival. Table II.2 provides estimates of the rate of transcription and translation in the cells of various organisms.

| Organism | Transcription rate (nucleotides/second) |
|---|---|
| E. coli (bacteria) | 10–100 |
| Monkey cell line | 100 |
| H. sapiens (humans) | 6–70 |

| Organism | Translation rate (amino acids/second) |
|---|---|
| E. coli (bacteria) | 10–20 |
| S. cerevisiae (yeast) | 3–10 |
| M. musculus (mouse) | 6 |

Table II.2 Transcription (in nucleotides) and translation (in amino acids) rates in various organisms.

Adapted from Milo, R and Phillips, R, ‘What Is Faster, Transcription or Translation?’, accessed 9 June 2021.

As you can see from the table, in E. coli, a protein that consists of 100 amino acids may take about 5 to 10 seconds to get translated (100 amino acids at 10–20 amino acids per second). In mice (M. musculus), at 6 amino acids per second, it takes about 17 seconds. Response time scales are important when a cell or organism needs to react to rapid changes in the environment.
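The arithmetic behind these estimates is simply protein length divided by translation rate. The short Python sketch below applies it to the translation rates in Table II.2; the names `TRANSLATION_RATES` and `translation_time` are chosen for this example, and the calculation assumes a constant rate with no pauses.

```python
# Translation rates in amino acids per second (low, high), from Table II.2.
TRANSLATION_RATES = {
    "E. coli": (10, 20),
    "S. cerevisiae": (3, 10),
    "M. musculus": (6, 6),
}


def translation_time(length_aa, organism):
    """Return (fastest, slowest) time in seconds to translate the protein."""
    slow_rate, fast_rate = TRANSLATION_RATES[organism]
    return length_aa / fast_rate, length_aa / slow_rate


# Time needed for a protein of 100 amino acids in each organism.
for organism in TRANSLATION_RATES:
    fastest, slowest = translation_time(100, organism)
    print(f"{organism}: {fastest:.1f} to {slowest:.1f} s")
```

The same division works for transcription if the length is given in nucleotides and the rate in nucleotides per second.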

Controlling gene expression

In prokaryotes, repressor proteins prevent gene expression by binding to an operator and preventing RNA polymerase from transcribing a gene. Figure II.7 shows how a repressor protein prevents a gene from being transcribed into mRNA. The repressor protein ‘turns off’ the gene by occupying the operator, which blocks RNA polymerase from transcribing the gene. Most genes in a cell are blocked by repressor proteins.

How does a cell ‘turn on’ a gene? One way is to cause the repressor protein to disengage from the operator. How can this be achieved?

Suppose a gene encoding an enzyme that metabolises a food source is switched off. When that food source is present in the environment, it would make sense to produce this enzyme. The cell has to firstly detect the food and secondly ‘turn on’ the gene.

Both steps can be achieved if the food source, say a sugar, binds to the repressor, causing it to disengage from the operator. The operator becomes available for RNA polymerase to bind to, and the gene is turned on, allowing expression of the enzyme.
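This hypothetical switch can be written as two lines of logic in Python. It is a deliberately crude sketch: the function names are invented here, and binding is treated as all-or-nothing, whereas in a real cell it depends on concentrations.

```python
def repressor_on_operator(sugar_present):
    """The sugar binds the repressor, which then disengages from the operator."""
    return not sugar_present


def gene_is_on(sugar_present):
    """RNA polymerase can transcribe the gene only when the operator is free."""
    return not repressor_on_operator(sugar_present)


print(gene_is_on(sugar_present=False))  # False: repressor bound, gene off
print(gene_is_on(sugar_present=True))   # True: repressor released, gene on
```

Detection and response happen in a single step: the sugar itself is the signal that releases the repressor.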

While this was a hypothetical example, the principle of repression and activation is very common. In fact, one classic example of gene regulation in prokaryotes is the lac operon. When glucose is depleted but lactose is present, prokaryotes such as E. coli can use lactose as an energy source. To do so, the cell needs firstly to detect the absence of glucose, and secondly to detect the presence of lactose. Work through this interactive activity to learn more about the lac operon.
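The two conditions described for the lac operon can be summarised as a truth table. The Python sketch below is a simplification made for illustration: `lac_operon_on` is an invented name, and expression is treated as fully on or fully off, which ignores the intermediate expression levels seen in real cells.

```python
from itertools import product


def lac_operon_on(glucose_present, lactose_present):
    """Simplified rule: expressed only if lactose is present and glucose is absent."""
    return lactose_present and not glucose_present


# Print the state of the operon for all four combinations of the two sugars.
for glucose, lactose in product([False, True], repeat=2):
    state = "on" if lac_operon_on(glucose, lactose) else "off"
    print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> operon {state}")
```

Only one of the four combinations, lactose without glucose, switches the operon on in this model.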

II.4 Adding to the central dogma

This primer on gene regulation highlights some nuances that we need to add to the original picture of the central dogma. These are splicing, the role of proteins in gene regulation (initiating transcription), and ncRNAs.

Given this additional information, would you agree with the modified version of the central dogma, as illustrated in Figure II.8?

Modified central dogma, reflecting ncRNAs, splicing, and gene regulation by transcription factors.

Figure II.8 Modified central dogma, reflecting ncRNAs, splicing, and gene regulation by transcription factors.

Gene regulation is an area that continues to be studied due to its significance and complexity. This primer covers a very small portion of what is known about gene regulation. You should, however, have an appreciation of what is meant by a gene being activated, expressed, or turned on, or conversely, when gene expression is repressed or turned off. These concepts are used in several chapters of iThink Biology, including malaria, rotavirus, and cotton.