The coding sequence of a gene is a series of three-nucleotide codons that specify the sequence of amino acids in its polypeptide product. In prokaryotic genomes, the codon for each amino acid lies immediately adjacent to the codon for the next amino acid in the polypeptide chain. In eukaryotic genomes, by contrast, the coding sequences are interrupted by stretches of non-coding sequence. These intervening non-coding sequences are called introns and the coding sequences are called exons. Because of the presence of introns, the gene is said to be in pieces, or split. The process of removing the non-coding sequences from the transcript of a gene is termed splicing. In humans too, protein-coding genes are mainly organized as introns and exons. There are also intergenic regions that are transcribed into various types of non-coding RNA (RNA that is not translated into protein). Some introns may also harbor transcription units that are pseudogenes. Pseudogenes are genomic DNA sequences that resemble normal genes but are non-functional. They are generally regarded as defunct relatives of functional genes.

Introns are often much longer than the exons they separate. The primary transcript of the gene (called the pre-mRNA) nevertheless contains the entire gene sequence, exons as well as introns, so the introns must be spliced out before the transcript is translated by the protein-synthesizing machinery of the cell. Splicing converts the pre-mRNA into mature mRNA, and it must occur with great precision: the addition or loss of even a single nucleotide at a splice junction would shift the reading frame and lead to incorrect amino acids being incorporated into the protein. Some pre-mRNAs can be spliced in more than one way, generating alternative mRNAs in which different combinations of exons are retained, so a gene can give rise to more than one polypeptide product. One Drosophila gene could potentially generate about 38,000 products as a result of alternative splicing. In a eukaryotic system, an intron was cloned within a transposable element and the element was allowed to transpose from a plasmid to genomic DNA. The intron was found to be absent from the transposable element in its new location, which indicates that transposition proceeded through a spliced RNA intermediate. Such an element is termed a retroposon.
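The need for precision at the splice junctions can be illustrated with a minimal Python sketch. The sequence, the intron coordinates and the function names `splice` and `codons` are all hypothetical and chosen only for illustration; real splice-site selection depends on sequence signals and the spliceosome, not on coordinates handed in by the caller.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as 0-based, end-exclusive (start, end) pairs,
    and join the remaining exons in their original order."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


def codons(mrna):
    """Split an mRNA into complete triplets, reading from the first base."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Toy transcript (an assumption, not a real gene): exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCU", "GUAAGUUUUCAG", "GAAUUUUAA"
pre_mrna = exon1 + intron + exon2

precise = splice(pre_mrna, [(6, 18)])  # intron removed exactly
sloppy = splice(pre_mrna, [(6, 17)])   # one intron nucleotide left behind

print(codons(precise))  # ['AUG', 'GCU', 'GAA', 'UUU', 'UAA']
print(codons(sloppy))   # ['AUG', 'GCU', 'GGA', 'AUU', 'UUA']
```

In the imprecise case a single retained nucleotide shifts every downstream codon, and the stop codon of the toy message is lost from the reading frame.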

The spliceosome machinery

The transesterification reactions of splicing are carried out by a huge molecular machine called the spliceosome. It is comparable in size to a ribosome and comprises 5 RNAs and about 150 proteins. Many functions of the spliceosome are carried out by its RNA components rather than by protein molecules. The 5 RNAs, U1, U2, U4, U5 and U6, are called the small nuclear RNAs (snRNAs). They are 100-300 nucleotides long and are complexed with several proteins. These RNA-protein complexes are called small nuclear ribonucleoproteins (snRNPs). The snRNPs have three roles in splicing: they recognize the 5ˈ splice site and the branch site; they bring these sites together as required; and they catalyze the RNA cleavage and joining reactions. Different snRNPs recognize the same or overlapping sequences in the pre-mRNA at different stages of the splicing reaction. The snRNAs of U1 and U6 both recognize the 5ˈ splice site, at different stages. The U2 snRNP recognizes the branch site. RNA:RNA pairing between the U2 and U6 snRNAs then brings the 5ˈ splice site and the branch site together. Finally, the same sequence within the pre-mRNA may be recognized by a protein (not part of an snRNP) at one stage and by an snRNP, which displaces that protein, at another. These changes accompany the arrival and departure of components of the spliceosome and the structural rearrangements that are required for splicing to occur.

Splicing pathways

The 5ˈ splice site is recognized by the U1 snRNP through base pairing between its snRNA and the pre-mRNA. One subunit of U2AF binds to the pyrimidine tract and the other to the 3ˈ splice site. The former subunit interacts with the branch-point binding protein (BBP) and helps that protein bind to the branch site. This arrangement is called the early (E) complex. The U2 snRNP then binds to the branch site, aided by U2AF and displacing BBP. This arrangement is called the A complex. The base pairing between the U2 snRNA and the branch site is such that the branch-site adenosine (A) residue is extruded from the resulting stretch of double-helical RNA as a single-nucleotide bulge. This residue is thus unpaired and available to react with the 5ˈ splice site. The A complex is then rearranged to bring the three splice sites together. This happens when the U4 and U6 snRNPs, along with the U5 snRNP, join the complex. Together these three snRNPs are called the tri-snRNP particle, within which the U4 and U6 snRNPs are held together by complementary base pairing between their RNA components, while the U5 snRNP is more loosely associated through protein:protein interactions. With the entry of the tri-snRNP, the A complex is converted into the B complex. U1 then leaves the complex and U6 replaces it at the 5ˈ splice site. This requires that the base pairing between the U1 snRNA and the pre-mRNA be broken, allowing the U6 RNA to anneal with the same region.
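The order of arrivals and departures described above can be summarized as a small bookkeeping model. This is only a sketch of the sequence of events as stated in the text: the class name `SpliceosomeModel` and its methods are hypothetical, and the model tracks which named components are present, not any structure or chemistry.

```python
from dataclasses import dataclass, field


@dataclass
class SpliceosomeModel:
    """Tracks which components are bound at each assembly stage (illustrative only)."""
    stage: str = "unassembled"
    bound: set = field(default_factory=set)
    five_prime_partner: str = ""  # snRNA currently paired at the 5' splice site

    def _require(self, expected):
        if self.stage != expected:
            raise ValueError(f"expected stage {expected!r}, found {self.stage!r}")

    def form_e_complex(self):
        # U1 pairs with the 5' splice site; U2AF binds the pyrimidine tract and
        # 3' splice site and helps BBP bind the branch site.
        self._require("unassembled")
        self.bound |= {"U1", "U2AF", "BBP"}
        self.five_prime_partner = "U1"
        self.stage = "E"

    def form_a_complex(self):
        # U2 binds the branch site, aided by U2AF, and displaces BBP.
        self._require("E")
        self.bound.discard("BBP")
        self.bound.add("U2")
        self.stage = "A"

    def form_b_complex(self):
        # The U4/U6.U5 tri-snRNP joins.
        self._require("A")
        self.bound |= {"U4", "U5", "U6"}
        self.stage = "B"

    def exchange_u1_for_u6(self):
        # U1 leaves; U6 anneals to the same region at the 5' splice site.
        self._require("B")
        self.bound.discard("U1")
        self.five_prime_partner = "U6"


model = SpliceosomeModel()
model.form_e_complex()
model.form_a_complex()
model.form_b_complex()
model.exchange_u1_for_u6()
print(model.stage, sorted(model.bound), model.five_prime_partner)
```

Calling the steps out of order raises a `ValueError`, which mirrors the point that each complex is built from the one before it.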

Self-splicing is the phenomenon in which rare introns fold into a ribozyme and carry out the functions of the spliceosome with RNA alone. Group I, group II and group III are the three different kinds of self-splicing introns. Unlike spliceosomal introns, group I and group II introns can be spliced without requiring any protein.

Two transesterifications characterize the mechanism by which group I introns are spliced:

  1. The 3ˈ-OH of a free guanosine or guanine nucleotide cofactor attacks the phosphate at the 5ˈ splice site.
  2. The 3ˈ-OH of the 5ˈ exon then acts as a nucleophile, and the second transesterification joins the two exons.
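The two steps for group I introns can be traced on strings as a rough sketch. The function name `group_i_splice` and the toy sequences are hypothetical; the code only records which pieces are joined or released at each step and does not model RNA folding or catalysis.

```python
def group_i_splice(exon5, intron, exon3, cofactor="G"):
    """Trace the two transesterifications of group I splicing on plain strings."""
    # Step 1: the 3'-OH of the free guanosine cofactor attacks the 5' splice site.
    # The 5' exon is released with a free 3'-OH, and the cofactor becomes
    # attached to the 5' end of the intron.
    free_exon5 = exon5
    intermediate = cofactor + intron + exon3

    # Step 2: the 3'-OH of the 5' exon attacks the 3' splice site.
    # The exons are joined and the intron leaves as a linear molecule
    # that still carries the added cofactor.
    released_intron = intermediate[: len(cofactor) + len(intron)]
    mature = free_exon5 + intermediate[len(cofactor) + len(intron):]
    return mature, released_intron


mature, released = group_i_splice("AAACCC", "uuuuuu", "GGGAAA")
print(mature)    # AAACCCGGGAAA
print(released)  # Guuuuuu
```

Note that the released intron is one residue longer than the intron in the precursor, because the cofactor is added in the first step.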

The mechanism of splicing of group II introns is as follows:

  1. A lariat forms as the 2ˈ-OH of a specific adenosine in the intron attacks the 5ˈ splice site.
  2. The exons are joined as the 3ˈ-OH of the 5ˈ exon carries out the second transesterification at the 3ˈ splice site.
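The lariat pathway can be sketched in the same string-based style. The function name `group_ii_splice`, the `branch_index` parameter and the toy sequences are assumptions made for illustration; the lariat is represented simply as a loop segment (from the 5ˈ end of the intron to the branch adenosine) plus a linear tail.

```python
def group_ii_splice(exon5, intron, exon3, branch_index):
    """Trace the two transesterifications of lariat-forming splicing on strings."""
    if intron[branch_index].upper() != "A":
        raise ValueError("the branch point must be an adenosine")

    # Step 1: the 2'-OH of the branch adenosine attacks the 5' splice site.
    # The 5' end of the intron is now linked to the branch A, closing a loop;
    # the 5' exon is released with a free 3'-OH.
    lariat = {
        "loop": intron[: branch_index + 1],  # 5' end through the branch A
        "tail": intron[branch_index + 1:],   # branch A to the 3' splice site
    }

    # Step 2: the 3'-OH of the 5' exon attacks the 3' splice site,
    # joining the exons and releasing the intron as a lariat.
    mature = exon5 + exon3
    return mature, lariat


mature_rna, lariat = group_ii_splice("AAACCC", "guuuuacuu", "GGGAAA", branch_index=5)
print(mature_rna)  # AAACCCGGGAAA
print(lariat)      # {'loop': 'guuuua', 'tail': 'cuu'}
```

Unlike the group I case, no external cofactor is added, so the loop and tail together contain exactly the nucleotides of the original intron.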

Alternative splicing

The production of multiple products from a single gene is due to alternative splicing. This kind of splicing occurs often in the cell because some splice sites are used only some of the time, leading to the production of different versions of the RNA from different transcripts of the same gene. Alternative splicing can be either constitutive or regulated. In constitutive alternative splicing, more than one product is always made from the transcribed gene. In regulated splicing, different forms are generated at different times, under different conditions, or in different cell or tissue types. Constitutive alternative splicing is observed for the T antigen of the monkey virus SV40. Regulated alternative splicing is controlled by activators and repressors. An SR (serine/arginine-rich) protein directs the splicing machinery to different splice sites under given conditions and thereby determines whether a particular splice site is used in a particular cell type. The different proteins that arise from one gene as a result of alternative splicing are called isoforms.
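How a handful of optional splice choices multiplies into several mRNAs can be shown with a short enumeration. This is a sketch under simplifying assumptions: exons are represented by labels, only exon skipping is modeled, and every optional exon is treated as independent of the others. The function name `isoforms` and the exon labels are hypothetical.

```python
from itertools import product


def isoforms(exons, optional):
    """List every mRNA obtainable by including or skipping the optional exons.

    exons    -- exon labels in 5'-to-3' order
    optional -- set of indices into `exons` that may be skipped
    """
    # Each optional exon contributes two choices (keep or skip);
    # every other exon is always kept.
    choices = [(True, False) if i in optional else (True,)
               for i in range(len(exons))]
    return ["-".join(label for label, keep in zip(exons, pattern) if keep)
            for pattern in product(*choices)]


variants = isoforms(["E1", "E2", "E3", "E4"], optional={1, 2})
for v in variants:
    print(v)
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With n independent optional exons the count is 2 to the power n, which gives a feel for how a single gene with many alternative choices can encode a very large number of potential products.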