Glossary
A
Allele: An alternate sequence of a gene that may or may not affect how the gene is expressed. A sequence variant of a gene. There are two copies of each chromosome and thus two copies of each gene or region of the genome (for the most part). Differences between the two copies are referred to as alleles. In FSHD, there are two alleles commonly referenced, termed A and B, that refer to sequence differences at the end of the DUX4 gene. The A allele is permissive for FSHD while the B allele is nonpermissive for FSHD.
Amino acids: A group of organic compounds that form the building blocks of proteins, which are a major component of the body's tissues. They are involved in almost every body function, including growth and development, healing and repair, normal digestion, and providing energy for the body.
AON: Antisense oligonucleotide. Synthetic (typically) nucleic acid-based molecules used to knock down expression of any gene of interest by binding to the mRNA and either directing its degradation or blocking its translation into protein.
AONs targeting the DUX4 mRNA are a widely used strategy in FSHD preclinical therapeutics and will likely reach the clinic in the near future. Because these target the mRNA, the gene is still ON and being expressed at the mRNA level, so the AON needs to be present continuously and at high enough concentrations to maintain the knockdown.
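The idea that an AON binds its target mRNA by base pairing can be sketched in a few lines of Python. This is only an illustration of antisense (reverse-complement) pairing: the function name and the 9-base target are hypothetical, the target is not real DUX4 sequence, and real AON design involves chemistry and specificity checks that are not modeled here.

```python
# Watson-Crick partner (written as DNA) for each RNA base in the target.
RNA_TO_ANTISENSE_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(mrna_target: str) -> str:
    """Return the DNA sequence (5'->3') that base pairs with an mRNA segment (5'->3').

    The two strands pair in opposite orientations, so the target is read
    backwards while each base is swapped for its partner.
    """
    return "".join(RNA_TO_ANTISENSE_DNA[base] for base in reversed(mrna_target.upper()))


toy_target = "AUGGCCUUG"  # hypothetical 9-base mRNA segment, NOT real DUX4 sequence
print(antisense_dna(toy_target))  # CAAGGCCAT
```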
Apoptosis: A type of cell death, cellular suicide, programmed cell death. Cells initiate a death cascade in themselves in response to signals as a normal way to remove unwanted or damaged cells. However, aberrant apoptosis can play a role in disease pathology.
In FSHD, expression of DUX4 can lead to apoptosis in muscle cells.
Array: In the context of FSHD, the term array refers to the head-to-tail arrangement of the D4Z4 repeat DNA sequences found at chromosomes 4q35 and 10q26. These D4Z4 arrays can include anywhere from 1 to more than 100 D4Z4 repeat units.
Autosome: Chromosomes 1-22. Humans have two copies (one from Mom, one from Dad) of 23 chromosomes for a total of 46. The 23rd pair of chromosomes is the sex chromosomes, called X and Y. One is typically either XX or XY.
Autosomal dominant: A type of genetic mutation or type of disease in which a single mutant allele on an autosome (chromosome 1-22) results in the phenotype or disease. Thus, autosomal dominant diseases are passed on (inherited) to the next generation at a rate of 50%.
Not to be confused with X-linked diseases, in which a single mutation on the X chromosome is also enough to cause disease in males, who have only a single X chromosome.
FSHD is an autosomal dominant disease. One only needs a contraction on one copy of chromosome 4 to have FSHD1, or a mutation in one copy of the SMCHD1 gene (on chromosome 18) to have FSHD2.
Other neuromuscular diseases with somewhat similar clinical presentation to FSHD that show autosomal dominant inheritance include myotonic dystrophy (DM1 and DM2), type 1 limb-girdle muscular dystrophies (LGMD1A-1H) and Emery Dreifuss muscular dystrophy (EDMD2, 4, 5 and 7).
Autosomal recessive: A type of genetic mutation or type of disease in which both copies of a gene need to be mutated in order for the phenotype or the disease to be observed. Most human heritable diseases, including most neuromuscular diseases, are autosomal recessive and therefore are very rare. Inherited autosomal recessive diseases require both parents to have at least one mutation and thus are not often inherited by children, even if one parent is affected, unless there is consanguinity in the family.
Other neuromuscular diseases with somewhat similar clinical presentation to FSHD that show autosomal recessive inheritance include type 2 limb-girdle muscular dystrophies (LGMD2A-Q) and Emery Dreifuss muscular dystrophy (EDMD3).
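The inheritance rates described in the autosomal dominant and autosomal recessive entries can be worked out by listing every combination of parental alleles (a Punnett square). The sketch below is a simplified illustration that assumes a single gene, full penetrance, and no new mutations; the allele labels "M" (mutant) and "+" (normal) and the function name are made up for this example.

```python
from fractions import Fraction
from itertools import product


def fraction_affected(parent1, parent2, mode):
    """Fraction of children expected to be affected.

    Each parent is a pair of alleles, "M" (mutant) or "+" (normal).
    Each child receives one allele from each parent with equal chance.
    """
    children = list(product(parent1, parent2))
    if mode == "dominant":
        affected = [c for c in children if "M" in c]        # one mutant copy is enough
    elif mode == "recessive":
        affected = [c for c in children if c == ("M", "M")]  # both copies must be mutant
    else:
        raise ValueError("mode must be 'dominant' or 'recessive'")
    return Fraction(len(affected), len(children))


# Dominant: one affected parent, one unaffected parent
print(fraction_affected(("M", "+"), ("+", "+"), "dominant"))   # 1/2
# Recessive: both parents are unaffected carriers
print(fraction_affected(("M", "+"), ("M", "+"), "recessive"))  # 1/4
# Recessive: one affected parent, other parent carries no mutation
print(fraction_affected(("M", "M"), ("+", "+"), "recessive"))  # 0
```

The last case shows why a recessive disease is not often passed on even when one parent is affected: the other parent must also carry a mutation.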
B
Bisulfite sequencing: Abbreviated BSS, this is a type of DNA sequencing that provides the methylation status of the DNA. Sodium bisulfite treatment converts cytosine to uracil while methylated cytosine is not converted. Upon sequencing, the converted (unmethylated) cytosine is read as a “T” instead of a “C” while the methylated cytosines remain as “C”. Thus, when one compares the sequence obtained with the known predicted sequence, every “C” that now reads as a “T” was unmethylated, while every “C” that still reads as a “C” was methylated.
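The comparison step described above can be sketched as follows. This is a minimal illustration, assuming the read is already aligned to the reference, has the same length, and comes from the same strand; the function name and the toy sequences are hypothetical, and real analysis pipelines handle alignment, both strands, and sequencing errors.

```python
def call_methylation(reference: str, bisulfite_read: str) -> dict:
    """Map each reference C position to True (methylated) or False (unmethylated).

    A C that still reads as C was protected from conversion (methylated).
    A C that reads as T was converted (unmethylated).
    Any other base at a C position is ambiguous and is skipped.
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must be aligned and equal in length")
    calls = {}
    for position, (ref_base, read_base) in enumerate(zip(reference.upper(), bisulfite_read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[position] = True
        elif read_base == "T":
            calls[position] = False
    return calls


reference = "ACGTCGACG"  # toy sequence with three Cs, not a real genomic region
read = "ACGTTGATG"       # toy read after bisulfite treatment
calls = call_methylation(reference, read)
print(calls)  # {1: True, 4: False, 7: False}
percent_methylated = 100 * sum(calls.values()) / len(calls)
```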
C
Cas9: CRISPR associated protein 9. The bacterial enzyme that cuts DNA (a DNA endonuclease) in the CRISPR system. It is guided to cut a specific DNA sequence by an associated guide RNA.
Cas13: CRISPR associated protein 13. The bacterial enzyme that cuts RNA (an RNA endonuclease) in the CRISPR system. It is guided to cut a specific RNA sequence by an associated guide RNA.
Chromatin: A highly organized structure made up of a combination of DNA, RNA and proteins utilized to package the genomic DNA into the nucleus; makes up a chromosome.
The core packaging proteins are histones.
The DNA is wrapped around an octamer of core histones, two each of histone H2A, H2B, H3, and H4, which are then further packaged and condensed to compact the DNA. This packaging is regulated in a cell-type specific manner to affect gene expression.
Chromosome: A unit of DNA packaged around proteins and condensed, which is replicated and a copy is passed on as a unit during cell division. Humans have 23 pairs of chromosomes.
For comparison, the dog genome is about the same size and has about the same number of genes as the human genome but is organized into 39 pairs of chromosomes.
Mitochondria also contain a chromosome, although its structure is quite different.
CpG dinucleotide: There are four DNA bases, denoted C, T, A, G. Bases are linked by phosphate (p). There are 16 possible combinations of any two bases. Deoxycytosine (C) is the only base that can be methylated in mammals (mC) and only when followed by a deoxyguanine (G). Thus, the CpG dinucleotide can exist as unmethylated (CpG) or methylated (mCpG) and thus is the only DNA sequence utilized in DNA methylation analysis of human DNA samples.
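As a small illustration of the counting above, the following sketch lists all 16 two-base combinations and finds the CpG sites in a sequence. The function names and the toy sequence are assumptions made for this example only.

```python
from itertools import product

BASES = "CTAG"


def all_dinucleotides():
    """Every ordered pair of the four DNA bases (4 x 4 = 16)."""
    return ["".join(pair) for pair in product(BASES, repeat=2)]


def cpg_positions(sequence: str):
    """0-based positions of each C that is immediately followed by a G."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


print(len(all_dinucleotides()))    # 16
print(cpg_positions("ACGTCGACG"))  # [1, 4, 7]
```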
CRISPR: Clustered regularly interspaced short palindromic repeats. A family of DNA sequences in the genomes of bacteria derived from bacteriophage (viruses that infect bacteria) that had previously infected the organism.
Exists naturally as a primitive immune system in bacteria to defend against bacteriophage infections.
The technology has been harnessed for use in virtually any living system to cut specific DNA sequences.
The term “CRISPR” is now used to refer to cutting or editing a specific DNA sequence.
CRISPR-Cas9: The system adapted from bacteria for use in molecular biology or biomedical approaches targeting the cutting of a specific genomic DNA sequence.
The system can be modified to not just cut the DNA but also edit it, which permanently changes the DNA sequence.
For FSHD, CRISPR-Cas9 could theoretically be used to edit and destroy the DUX4 PAS, although there are significant hurdles to putting this into clinical practice.
CRISPR-Cas13: The system adapted from bacteria for use in molecular biology or biomedical approaches targeting the destruction of a specific mRNA in the cytoplasm.
For FSHD, CRISPR-Cas13 is used to target the DUX4 mRNA.
Cytosine: A pyrimidine, one of the four main bases found in nucleic acids. In DNA, it is found as deoxycytidine (dC) and base pairs with deoxyguanosine (dG). It is the only DNA base that can be methylated in vertebrates.
D
D4Z4: A D4Z4 element is a specific DNA sequence in the human genome. Each D4Z4 consists of ~3,303 base pairs of DNA and encodes the first two exons of the DUX4 gene. Multiple copies are repeated and aligned head to tail directly in an array, referred to as a D4Z4 macrosatellite array. There are several types of D4Z4 repeats in the human genome. Those found on chromosomes 4 and 10 are highly similar to each other and are associated with FSHD. Those found elsewhere in the genome have divergent sequence, do not encode DUX4, and are not associated with FSHD.
dCas9: Dead Cas9. An enzymatically inactive Cas9 that cannot cut DNA. However, the dCas9 protein can still be targeted to a specific DNA sequence through its associated guide RNA.
The highly specific DNA and genome targeting capacity of dCas9 is used to bring fusion proteins to a DNA sequence of interest. For example, dCas9-KRAB (KRAB is a transcriptional repressor) will turn OFF a gene of interest and dCas9-VP16 (VP16 is a transcriptional activator) will turn ON a gene of interest.
In FSHD, dCas9-KRAB has been used to shut off pathogenic DUX4 expression.
DNA: Deoxyribonucleic acid. The carrier of genetic information in humans and almost all other organisms.
DNA binding activity: Some proteins or nucleoprotein complexes are capable of interacting with and binding to DNA. There are two main types of DNA binding activities, nonspecific DNA binding and sequence specific DNA binding.
DUX4 protein is a sequence-specific DNA binding protein, meaning that it binds to specific DNA sequences in the genome and then activates the expression of nearby genes.
DNA methylation: A post-replicative modification of DNA involving the addition of a methyl-group to the DNA. Thus, this is a mechanism to label DNA without changing the sequence. In humans and other vertebrates, DNA methylation only occurs symmetrically on cytosine in the context of a CpG dinucleotide.
DNA Sequencing: The process of determining the order of (or reading) DNA bases in a piece of DNA.
DNMT3B: The gene encoding DNA methyltransferase 3B, the protein that de novo methylates the DNA of the D4Z4 array.
One of the genes mutated in FSHD2.
DUX4 (Double homeobox protein 4): The gene responsible for mediating FSHD pathogenesis (development of disease) is the DUX4 gene, which encodes the DUX4 protein. The gene name is based on the presence of two homeobox sequence motifs that mediate the DNA binding specificity of the DUX4 protein.
The DUX4 protein is a transcriptional activator that, under normal circumstances, turns on a specific set of genes (DUX4 target genes) early in human development. These genes are not normally expressed in muscle.
In FSHD, the DUX4 gene is aberrantly expressed in skeletal muscles and thus again turns on the specific set of DUX4 target genes, many of which would otherwise be off, or not expressed in muscle.
DUX4-fl: The DUX4 gene encodes (produces) at least two different mRNAs that produce two different proteins, DUX4-short (DUX4-s) and DUX4-full length (DUX4-fl). Only the expression of the DUX4-fl mRNA and protein leads to FSHD. When referring to mRNA or protein, “DUX4” generally refers to DUX4-fl.
E
Epiallele: An epigenetic variant of a gene. For example, when two copies of a gene differ by DNA methylation status.
In respect to FSHD, the DUX4 gene residing on a healthy chromosome 4 is highly methylated (hypermethylated) while the DUX4 gene on an FSHD1 contracted chromosome 4 is lowly methylated (hypomethylated). Thus, these two genetically identical but epigenetically different copies of the DUX4 gene are epialleles.
Epigenetics: Heritable context-dependent and DNA sequence independent regulation of gene expression patterns. Cellular memory of gene expression. Gene regulation in the absence of signal.
Eukarya/Eukaryotes/Eukaryotic cells: One of the three domains of life (the others being Archaea and Bacteria). Organisms whose cells contain a membrane-enclosed nucleus. This includes everything from yeast and protists to plants and people. Bacteria and archaea are not eukaryotes; they are both prokaryotes.
Exon: The parts of a gene sequence that become part of the final mature mRNA. There are coding exons, that encode (produce) the amino acid sequence for the gene and protein, and noncoding exons that contain regulatory sequence.
Exome sequencing: The technique of sequencing only the fraction of the genome (~1%) that is expressed as mRNA (exons). The majority of the human genome does not encode protein. Most (but not all) disease-causing mutations are in exons or at intron/exon boundaries and will be identified by exome sequencing. Thus, it is a cheaper and more efficient method to look for mutations when one does not know which gene to look at (this looks at them all).
F
FSHD: Facioscapulohumeral muscular dystrophy. There are two genetic classifications of FSHD, FSHD1 and FSHD2. Both forms of FSHD are clinically the same and caused by aberrant expression of the DUX4 gene in skeletal muscle.
FSHD1: When FSHD is caused genetically by a contraction of one FSHD permissive chromosome 4q35 D4Z4 array to between 1 and 10 repeat units, which corresponds to an EcoRI/BlnI restriction fragment of <38kb.
FSHD2: When FSHD is caused genetically by a pathogenic mutation in a chromatin regulatory protein involved in repressing gene expression from the 4q35 D4Z4 array.
Currently there are three known FSHD2 genes, SMCHD1, DNMT3B, and LRIF1.
FSHD nonpermissive: This refers to not having the genetic requirements associated with chromosome 4q35 distal to the D4Z4 array that lead to polyadenylation of the DUX4 mRNA. Thus, if one is FSHD nonpermissive, one does not have the potential to develop FSHD regardless of other genetic and epigenetic conditions. Roughly 25% of the human population has two FSHD nonpermissive chromosome 4q35s.
The most common FSHD nonpermissive sequence is chromosome 4B (4B163 or 4B168).
Being nonpermissive for FSHD means you do not have FSHD and cannot develop FSHD.
FSHD permissive: This refers to having the genetic requirements associated with chromosome 4q35 distal to the D4Z4 array that lead to polyadenylation of the DUX4 mRNA and allow expression of the DUX4 gene. Thus, if one is FSHD permissive, one has the potential to develop FSHD if other genetic and epigenetic conditions are met. Roughly 75% of the human population has at least one FSHD permissive chromosome 4q35.
Being permissive for FSHD does not mean you have FSHD.
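The FSHD1, permissive, and nonpermissive definitions above can be combined into a short logical sketch. It only restates this glossary's definitions in code and is not a diagnostic tool. It assumes, as in the Allele entry, that the A allele is permissive and the B allele is nonpermissive, and it ignores FSHD2, DNA methylation, and the finer haplotype subtypes. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chromosome4q35:
    allele: str      # "A" (permissive) or "B" (nonpermissive), per the Allele entry
    d4z4_units: int  # number of D4Z4 repeat units in the array


def is_fshd_permissive(chromosomes) -> bool:
    """At least one chromosome 4q35 carries the permissive A allele."""
    return any(c.allele == "A" for c in chromosomes)


def meets_fshd1_genetic_definition(chromosomes) -> bool:
    """A permissive chromosome 4q35 whose array is contracted to 1-10 repeat units."""
    return any(c.allele == "A" and 1 <= c.d4z4_units <= 10 for c in chromosomes)


# Hypothetical examples
contracted = [Chromosome4q35("A", 6), Chromosome4q35("B", 40)]
permissive_only = [Chromosome4q35("A", 35), Chromosome4q35("B", 8)]
nonpermissive = [Chromosome4q35("B", 5), Chromosome4q35("B", 30)]

print(meets_fshd1_genetic_definition(contracted))       # True
print(meets_fshd1_genetic_definition(permissive_only))  # False: the short array is on a B chromosome
print(is_fshd_permissive(nonpermissive))                # False
```

The second example illustrates the point of the last line of the entry: being permissive alone does not mean a person has FSHD.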
G
Gene: The fundamental unit of heredity. There are protein coding genes and noncoding genes (do not lead to proteins).
Gene expression: The state of a gene in respect to being ON (expressed) or OFF (not expressed). Gene expression is often cell-type specific and can be responsive to stimuli.
Genome: All the genetic material of an organism. This includes the DNA in the chromosomes, in the mitochondria, and in the chloroplasts (for plants).
The haploid genome is one copy of one’s genetic material. Sperm and eggs have haploid genomes (however, the mitochondria in sperm are generally not passed on to the embryo). Human somatic cells have a diploid genome, one copy from Mom and one from Dad.
Your mitochondrial genome came entirely from Mom.
Germline mutation: Mutations in the sperm or egg. Germline mutations are passed on to the next generation. This is in contrast to somatic mutations that are acquired through life in other tissues and not passed on to the next generation.
H
Heritable: Able to be passed on to the next generation. This may be in the context of the next cell division or in the context of the next generation of an organism.
DNA methylation patterns are passed on to the next cellular generation and some are passed on after reproduction. Thus, some genes that are methylated in Mom (or Dad) are also methylated in baby.
Histone: Small, highly basic proteins in eukaryotic cells that are used to pack the DNA into the cell nucleus.
Hypermethylation: High levels of DNA methylation on a particular sequence that can show variable levels of DNA methylation.
Hypomethylation: Low levels of DNA methylation on a particular sequence that can show variable levels of DNA methylation.
I
Intron: The parts of a protein coding gene sequence that do not become part of the final mature mRNA.
K
Kb: Kilobase pairs of DNA or 1,000 base pairs of DNA. A measurement of DNA sequence length. Each D4Z4 repeat unit is 3.3kb or 3,300bp of DNA long.
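The unit conversion is simple arithmetic, sketched below using the rounded figure of 3,300 bp per D4Z4 repeat unit given above. The function names are made up for this example, and the result is the length of the repeat units only, without any flanking DNA.

```python
BP_PER_KB = 1_000
D4Z4_UNIT_BP = 3_300  # rounded size of one D4Z4 repeat unit, from the entry above


def bp_to_kb(base_pairs: int) -> float:
    """Convert base pairs to kilobase pairs."""
    return base_pairs / BP_PER_KB


def d4z4_array_bp(repeat_units: int) -> int:
    """Approximate length in bp of the repeat units alone (no flanking sequence)."""
    return repeat_units * D4Z4_UNIT_BP


print(d4z4_array_bp(10))            # 33000
print(bp_to_kb(d4z4_array_bp(10)))  # 33.0
```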
L
LGMD: Limb-girdle muscular dystrophy. A class of muscular dystrophies clinically similar to FSHD but caused by completely different gene mutations and pathogenic mechanisms.
LGMD Type 1 (LGMD1A-1H) are the autosomal dominant forms and LGMD Type 2 (LGMD2A-Q) are the autosomal recessive forms.
M
Methylation: The addition of a methyl group (-CH3, or one carbon and three hydrogens) to DNA, RNA, or protein. These modifications are added post-synthesis. For example, DNA is replicated and then methylated by DNA methyltransferase enzymes.
Mitochondria: Cellular organelles that generate most of the ATP (energy) needed in a cell. They contain an independent genome, called mtDNA.
miRNA: microRNA. These are small noncoding RNAs that are naturally expressed in a cell to regulate gene expression. These RNAs are transcribed from miRNA genes and form a hairpin structure that is then processed to produce a single-stranded antisense RNA that binds to and regulates translation of a specific mRNA.
This technology has been harnessed in the lab to be used as a tool to repress the translation of target specific mRNAs of interest.
In FSHD, miRNAs have been engineered that target the DUX4 mRNA and are being used as a therapeutic approach.
mRNA: Messenger RNA. It is an RNA molecule that is complementary to the exons of a protein-coding gene. It is processed to remove the noncoding introns, capped at the 5’ end with a modified guanine and polyadenylated at the 3’ end using the polyadenylation site (PAS).
The mRNA molecule gets transported from the nucleus to the cytoplasm of a cell where it is translated by ribosomes into protein.
Pre-mRNAs that are not polyadenylated or capped are degraded and do not get translated into proteins. FSHD nonpermissive 4q chromosomes lack the PAS for the DUX4 gene. Thus, the DUX4 mRNA is not polyadenylated and gets degraded such that no DUX4 protein is made.
There are several therapeutic approaches targeting the DUX4 mRNA for degradation, including miRNAs, AONs, and CRISPR Cas13.
Muscular Dystrophy: A group of nine classes of neuromuscular diseases that cause progressive muscle weakness and loss of muscle mass. These are Duchenne muscular dystrophy (DMD), Becker muscular dystrophy (BMD), myotonic dystrophy (DM), limb-girdle muscular dystrophy (LGMD), facioscapulohumeral muscular dystrophy (FSHD), congenital muscular dystrophy (CMD), distal muscular dystrophy (DD), oculopharyngeal muscular dystrophy (OPMD), and Emery-Dreifuss muscular dystrophy (EDMD). These muscular dystrophies affect specific muscle groups, have disease-specific symptoms including age of onset and severity, and are caused by mutations in different genes.
N
Next-generation sequencing (NGS): Any one of a number of relatively new high-throughput DNA sequencing technologies. Using NGS, the entire human genome can be sequenced in a single day.
P
PCR: Polymerase chain reaction (PCR) is a molecular biology technique that allows one to amplify any specific piece of DNA to a very high copy number so that it can be studied.
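To give a sense of what "very high copy number" means, the sketch below assumes an idealized reaction in which every copy is doubled in every cycle. Real reactions are less than perfectly efficient and eventually plateau, so this is an upper bound; the function name is hypothetical.

```python
def ideal_pcr_copies(starting_copies: int, cycles: int) -> int:
    """Copies after a number of cycles, assuming perfect doubling each cycle."""
    return starting_copies * 2 ** cycles


print(ideal_pcr_copies(1, 10))  # 1024
print(ideal_pcr_copies(1, 30))  # 1073741824 (over a billion copies from one molecule)
```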
PAS or polyadenylation signal: A polyadenylation signal (PAS) is a specific DNA sequence at the end of a gene that signals for the termination of transcription and the addition of multiple adenosines (a poly(A) tail) to the end of an mRNA transcript. This is required for the mRNA to be stabilized and to ultimately become translated into protein. In FSHD, there are two alleles at the end of the DUX4 gene, termed A and B. The presence of a PAS in exon 3 of the DUX4 gene is considered permissive as it stabilizes the DUX4 mRNA and allows for DUX4 protein to be made. The absence of a PAS at the end of the DUX4 gene is considered nonpermissive as the DUX4 mRNA would not be polyadenylated and thus the mRNA is degraded and no DUX4 protein is made.
Penetrance: A term used in genetics that describes the extent to which the properties controlled by a gene, its phenotype, will be expressed. A highly penetrant gene will express itself almost regardless of the effects of environment; whereas a gene with low penetrance will only sometimes produce the symptom or trait with which it has been associated. In some cases, the phenotype in question will occur only when the gene is present; in other cases, they may occur for unrelated reasons. In the case of low penetrance, it is difficult to distinguish environmental from genetic factors.
Penetrance and heritability appear closely related at first glance, but in fact it is possible to carry a huge number of inherited genes with low penetrance and not be aware of them. The opposite is not the case, however: if you carry a highly penetrant gene, you will know (assuming the gene has a noticeable effect; many do not). Relatively few of the genes in the genome show high penetrance. Most genes make their little contribution to a very complex milieu of biological interactions, to which many other genes are also contributing. As a result, most genes and their effects and mechanisms of action are very difficult to fully understand, because the required observations and experiments are complex and difficult to devise.
The penetrance of some diseases is age-related, complicating the determination of true penetrance. An example is multiple endocrine neoplasia 1 (MEN1), a disorder characterized by parathyroid hyperplasia and pancreatic islet-cell and pituitary adenomas. It is due to a mutation in the MEN1 gene (which encodes the protein menin) on chromosome 11q13. In one study the age-related penetrance of MEN1 was 7 percent by age 10 years and nearly 100 percent by age 60 years.
FSHD is moderately penetrant, depending upon the size of one’s D4Z4 array. The shorter the array, the higher the disease penetrance.
Phenotype: The phenotype of an individual organism describes one of its traits or characteristics that is measurable and that is expressed in only a subset of the individuals within that population. Examples include “blue eyes” or “aggressive behavior”. Some phenotypes are controlled entirely by the individual’s genes. Others are controlled by genes but are significantly affected by extragenetic or environmental factors. Still other phenotypes are entirely extragenetic, for example, a person’s language or physical traits that were altered by surgery. Many phenotypes are determined by multiple genes and can also be influenced by environmental factors. Thus, the identity of one or a few alleles of an organism does not always enable prediction of its phenotype.
Polyadenylation: The addition of multiple adenosine monophosphates (As) to the end of an mRNA transcript, which is required for mRNA stability and translation into protein. The ends of genes have one or more polyadenylation signals (PAS) that tell the transcription machinery to terminate the mRNA.
In FSHD, elimination of DUX4 polyadenylation is a common experimental therapeutic approach being considered.
Protein: The building blocks of body tissue and also a source of energy for the body. The central dogma of biology states that DNA contains the information (instructions) that is copied into RNA, a “readable” format, which is then translated (made) into protein.
Most genes are protein coding genes, thus the end products of much of your genetic information are proteins.
In FSHD, the DUX4 gene encodes the DUX4 protein.
Q
Quartile: When a population is split into four equal groups (quartiles) based on the distribution of values of a particular variable. The numbers are first put in order from smallest to largest, and the quartiles are the cuts.
For example, if you had a set of 16 values, 1-16, Q1 (first quartile) = 4.5, Q2 (second quartile) = 8.5 and Q3 (third quartile) = 12.5.
In the FSHD research test, the DNA methylation levels are reported as quartiles to account for the different numbers of methylation readings between individual assays, allowing for proper comparisons.
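The worked example with the values 1-16 can be reproduced with a short sketch. It uses the "median of each half" convention, which is the one that gives 4.5, 8.5, and 12.5 for this data set; other conventions (including the default of Python's `statistics.quantiles`) interpolate differently and return slightly different cut points. The function name is made up for this example.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Q2 is the median of all values. Q1 and Q3 are the medians of the lower
    and upper halves; with an odd count the middle value is left out of both.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


print(quartiles(range(1, 17)))  # (4.5, 8.5, 12.5)
```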
R
RNA: Ribonucleic acid. Most commonly, acts as the messenger between DNA and ribosomes to convert the genetic information encoded in the DNA to protein (termed mRNA or messenger RNA). However, RNA can have noncoding regulatory functions as well (termed ncRNA or noncoding RNA). There are many types of ncRNAs. Cells also contain transfer RNAs (tRNAs), which are linked to amino acids, and ribosomal RNAs (rRNAs), which are involved in building proteins (translation).
RNAi: RNA interference, also called co-suppression or post-transcriptional gene silencing (PTGS). This is a natural cellular mechanism used to destroy unwanted foreign gene expression. It is stimulated by double stranded RNA (dsRNA), which is commonly produced by viruses, in the cytoplasm of a cell.
The dsRNA is chopped up and then used by the cellular machinery to identify and eliminate all RNAs with the same sequences, thus protecting the cell from viral gene expression.
This natural cellular mechanism has been harnessed and is used to artificially knock down the expression of any gene of interest either by introducing dsRNAs for the gene of interest, or more typically by introducing an RNA (or synthetic molecule) that is antisense to the mRNA of interest. In RNAi, the target gene is still transcribed (it is still expressed and ON), but no protein is made.
In FSHD, one can therapeutically knock down DUX4 mRNA expression by using an antisense RNA (or synthetic molecule) against the DUX4 mRNA to trigger the cellular RNAi response. The DUX4 gene is still ON, but no DUX4 protein is made.
S
Somatic cell: A cell that forms the body of an organism. Any cell that is not a gamete (egg or sperm) or germ cell (cells that become sperm or eggs) or embryonic stem cell.
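Somatic mosaicism: When an individual has two or more populations of cells that differ genetically because a mutation arose in a somatic cell after fertilization. See somatic mutation (below).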
Somatic mutation: These are spontaneous mutations that arise in cells other than the sperm or egg after fertilization and during development or as an adult. Somatic mutations are not passed on to the next generation.
SMCHD1: The gene encoding the Structural Maintenance of Chromosomes flexible Hinge Domain Containing 1 protein involved in maintaining DNA methylation of the D4Z4 array.
The most common gene mutated in FSHD2.
Subtelomere: The region of a chromosome adjacent to the telomeric repeat sequence and structure.
In FSHD, the FSHD-associated 4q35 D4Z4 array is very close to the chromosome 4 telomere, and the subtelomere region contains the 4A or 4B sequence.
T
Telomere: The structure at the end of linear chromosomes; each human chromosome has two telomeres, one at each end. Composed of repetitive DNA and proteins. Protects the chromosome ends from degradation, prevents chromosomes from fusing with each other, and allows the chromosome ends to be maintained through rounds of DNA replication.
Transcription: The process of copying DNA into RNA by RNA polymerase.
In FSHD, blocking transcription of the DUX4 gene, and thus turning the gene OFF, is a common therapeutic approach.
Transgenic: Containing genetic material in the genome from another organism (or synthetic sequence) introduced using artificial means. The FLExDUX4 FSHD-like mouse is a transgenic mouse because it contains the human DUX4 gene in its mouse genome.
Translation: The process whereby ribosomes synthesize proteins based on the information or code in the mRNA.
In FSHD, preventing translation of the DUX4 mRNA into DUX4 proteins using antisense oligonucleotides (AONs) or miRNAs is a therapeutic approach.
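The flow of information from DNA to mRNA to protein (see also Transcription and Protein) can be sketched with the standard genetic code. This is a simplified illustration: it treats the mRNA as the coding-strand DNA sequence with U in place of T, ignores introns, capping, and polyadenylation, and uses a made-up 12-base toy gene rather than any real sequence. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def transcribe(coding_dna: str) -> str:
    """mRNA has the same sequence as the coding DNA strand, with U instead of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; return one-letter amino acids."""
    seq = mrna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_gene = "ATGGCCTTGTAA"   # hypothetical sequence: Met, Ala, Leu, stop
mrna = transcribe(toy_gene)
print(mrna)                 # AUGGCCUUGUAA
print(translate(mrna))      # MAL
```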
Translocation: When chromosomes rearrange their genetic information due to a chromosome breakage and reattachment to a different chromosome.
X
Xenograft: Growing a tissue or organ from one species in another species.
For FSHD, mice can be generated that grow human skeletal muscle xenografts derived from human FSHD muscle cells. These FSHD xenograft mice recapitulate many of the aspects of FSHD muscle and are useful for preclinical testing of potential FSHD therapeutics.