The Eukaryotic Linear Motif resource for
Functional Sites in Proteins
Accession:
Functional site class:
Integrin-binding motifs of the Collagen-type
Functional site description:
Integrins are cell adhesion-mediating receptors present in all metazoans. Each integrin is composed of one α and one β subunit; in humans, 18 α and 8 β subunits can combine to form 24 different dimers, each with unique ligand specificities. Four of the human integrins (α1β1, α2β1, α10β1 and α11β1) are major interactors of the extracellular matrix, recognizing collagens by binding to SLiMs embedded in the collagen triple helix [Gullberg,2002]. The motif-binding pocket is located on the α-I domain and contains a divalent cation that is a critical component of the interaction with the ligand. The integrin-binding motifs in collagens contain an invariant Glu residue, which helps coordinate this divalent cation together with residues from the α-I domain itself [22030389], thereby propagating the signal towards other parts of the integrin [Emsley,2000]. Thus, the interaction is ion-mediated and is sensitive to the presence of the correct ions in the buffer, similarly to RGD-like interactions.
ELM Description:
This integrin-interaction motif is found exclusively in various collagens. Collagens form trimers in a collagen triple helix conformation. This unique structural unit requires repeats of Pro-Pro-Gly residue triplets, which constitute large sections of various collagens. The motif is centered around the proline preceding the glycine, and this proline is most often hydroxylated, which is marked by the parentheses in the regular expression. Widespread proline hydroxylation is a hallmark of collagens, and this motif seems to favour it [Sipila,2018], even though it is not a strict requirement for integrin binding, as certain non-hydroxylated peptides conforming to the motif definition have been observed to bind integrins in their activated states.
A variant of the collagen motif is recognized by the collagen-binding family of integrins which, in humans, has four members: α1β1, α2β1, α10β1 and α11β1. In all cases, motif recognition is mediated by the I-domain in the α subunit. The glutamic acid in the +2 position, counting from the hydroxylated proline, is essential, as it coordinates a divalent cation embedded in the interacting I-domain of the integrin. Based on peptide binding assays, the hydrophobic residue in the -1 position seems to be variable, with methionine probably being suboptimal, as it is recognized only by activated integrins.
In the last position of the motif, Lys is accepted but is suboptimal, leading to weaker binding. A motif with Asp in place of Glu in the preceding +2 position is unlikely to bind, as a Glu-to-Asp mutation completely abolished binding in the case of the collagen α-1(I) chain [Knight,2000].
Based on synthetic peptide assays, NMR and SAXS, the GL(P)GEN peptide (also called the GLOGEN peptide, with O marking hydroxylated proline), found in the collagen α-1(XXII) chain and the collagen α-1(III) chain, is a good binder for integrin α1, but integrin α2 might have a different selectivity profile [Chin,2013].
Pattern: G[FLMR](P)GE[RKNA]
Pattern Probability: 0.0000019
Present in taxon: Metazoa
Interaction Domains:
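The pattern G[FLMR](P)GE[RKNA] can be tried directly with a standard regular expression engine. The sketch below is only an illustration, assuming Python's `re` module, where the parentheses around P act as an ordinary capture group rather than as a hydroxylation marker. The helper names `find_motifs` and `annotate` are hypothetical and not part of any ELM tool. `annotate` numbers residues relative to the central proline, so the essential Glu falls at +2 and the variable hydrophobic residue at -1, as described above.

```python
import re

# ELM pattern from this entry. "(P)" marks the proline that is usually
# hydroxylated; in Python regex syntax it is simply a capture group.
PATTERN = re.compile(r"G[FLMR](P)GE[RKNA]")


def find_motifs(sequence, offset=1):
    """Return (start, end, motif) tuples with inclusive positions.

    Positions are numbered from `offset` (1 by default, i.e. relative to
    the first residue of the sequence that was passed in).
    """
    return [
        (m.start() + offset, m.end() + offset - 1, m.group(0))
        for m in PATTERN.finditer(sequence)
    ]


def annotate(motif):
    """Label each residue with its position relative to the central proline."""
    m = PATTERN.fullmatch(motif)
    if m is None:
        raise ValueError(f"{motif!r} does not conform to the pattern")
    centre = m.start(1)  # index of the captured proline
    return [(i - centre, residue) for i, residue in enumerate(motif)]


if __name__ == "__main__":
    print(find_motifs("PGQAGAPGFPGERGEKGDRG"))
    print(annotate("GLPGEN"))
    # A Glu-to-Asp change at +2 no longer matches the pattern.
    print(find_motifs("GFPGDR"))
```

A regular expression match only says that a sequence conforms to the motif definition; it says nothing about the triple-helical context or the hydroxylation state, both of which matter for binding.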
o Abstract
Integrins are metazoan-specific receptors not present in the other crown group taxa Fungi or Viridiplantae. All human cells express one or more of the 24 types of dimeric integrins spanning the plasma membrane [Barczyk,2009], which mediate signals between the intracellular space and neighbouring cells or the extracellular matrix [Takada,2007; Campbell,2011; Hynes,2002]. The presence and ratio of various integrins reflect the cell’s function. Eight of the 24 human integrins, which resemble the evolutionarily most ancient metazoan integrins, can recognize RGD and RGD-like sequence patterns in their ligands: components of the extracellular matrix (ECM), cell surface proteins or other extracellular signaling proteins. In these integrins, the motif binding site is a deep groove between the α and the β subunits of the integrin, with the bound ligand making contacts with both. However, in evolutionarily newer integrins, an additional domain, called the α-I domain or the inserted interaction domain, can be found on the α subunit. This domain fills up the canonical RGD binding site, and can bind a different type of motif on its opposing side. Once ligand binding occurs, the α-I domain undergoes a structural rearrangement, filling up the RGD-binding site and initiating downstream signaling similar to RGD-binding integrins.

The α-I domains of four integrins (α1β1, α2β1, α10β1 and α11β1) have been described to bind collagens. While there is ample structural and biochemical data on the direct interactions between the described motif in collagens and the α1β1 and α2β1 integrins, the exact details of collagen binding by the α10β1 and α11β1 integrins are much less explored. Currently, there is no direct proof that these integrins recognize the same motifs in the same binding mode; however, their similarities to the α1β1 and α2β1 integrins make this very likely.

So far, the well-described proteins harbouring this motif are all collagens, and it is presumed that this motif is highly characteristic of this protein family. This is supported by the observation that, contrary to most linear motifs, this specific integrin-binding motif occurs in the ordered structural context of trimeric collagen helices. Well-studied instances of these collagen motifs seem to lose their binding capacity when taken out of this structural context [Knight,2000]. However, at least one non-collagen protein that seems to utilize the same binding mode has been described in detail: the streptococcal collagen-like protein [Humtsoe,2005].

In the collagen instances detailed in this entry, the motif is embedded in the typical collagen PPG repeats, with the middle proline being the central residue. This proline is most often hydroxylated, as is common for prolines in collagens in general. Most of our understanding of this motif comes from binding assays with synthetic peptides. Based on these observations [Carafoli,2013], there are two binding modes: in the high-affinity mode the integrin interacts with two of the three chains of a collagen trimer, while in the low-affinity mode it interacts with only one. The motif, at least in the high-affinity mode, needs the ordered scaffold of the collagen trimer, but does not need the integrin to be activated prior to binding. In contrast, the low-affinity mode requires the integrin to be activated, and the two binding modes have different preferences for residues. Several variations can bind integrin α2β1, including GFOGER, GMOGER, GROGER, GLOGEN, GLOGER, GLOGEA, GFOGEK and GFPGER (with ‘O’ representing hydroxyproline), although the affinities towards activated and non-activated integrins vary over a wide range. Met in the -1 position seems to be suboptimal but tolerated, and can bind with high affinity to the activated integrin subunit. If the peptide is ideal, such as GFOGER [Knight,2000], then the interaction can form even without the proline hydroxylation; however, proline hydroxylation seems to enhance the binding.
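The peptide variants named above use ‘O’ for hydroxyproline, whereas the ELM pattern writes that residue as a parenthesised P. A minimal sketch of how the two notations can be reconciled is given here, assuming that every ‘O’ in a peptide stands for hydroxyproline (as in this entry) and using hypothetical helper names `to_plain_sequence` and `classify`. It reports only whether a peptide conforms to the pattern and whether it carries a hydroxyproline; it does not model affinity or integrin activation state.

```python
import re

# ELM pattern from this entry; "(P)" is the (usually hydroxylated) proline.
PATTERN = re.compile(r"G[FLMR](P)GE[RKNA]")

# Peptides listed in the text as binders of integrin α2β1.
VARIANTS = [
    "GFOGER", "GMOGER", "GROGER", "GLOGEN",
    "GLOGER", "GLOGEA", "GFOGEK", "GFPGER",
]


def to_plain_sequence(peptide):
    """Replace 'O' (hydroxyproline, by assumption) with 'P'.

    Returns the plain sequence and the 0-based indices that were 'O'.
    """
    hydroxylated = [i for i, residue in enumerate(peptide) if residue == "O"]
    return peptide.replace("O", "P"), hydroxylated


def classify(peptide):
    """Report pattern conformance and hydroxylation for one peptide."""
    plain, hydroxylated = to_plain_sequence(peptide)
    return {
        "peptide": peptide,
        "matches": PATTERN.fullmatch(plain) is not None,
        "hydroxylated": bool(hydroxylated),
    }


if __name__ == "__main__":
    for variant in VARIANTS:
        print(classify(variant))
```

All eight listed variants conform to the pattern once ‘O’ is read as proline, which is consistent with the pattern having been defined to cover them.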
o 8 selected references:

o 7 Instances for LIG_Integrin_collagen_1
(Notes column: number of switches / number of interactions)
Acc.    Gene     Name         Start  End   Subsequence           Logic  #Ev.  Organism              Notes
P02462  COL4A1   CO4A1_HUMAN  385    390   PGQAGAPGFPGERGEKGDRG  TP     5     Homo sapiens (Human)  1
P02453  COL1A1   CO1A1_BOVIN  679    684   SGARGERGFPGERGVQGPPG  TP     2     Bos taurus (Cattle)   2
P02452  COL1A1   CO1A1_HUMAN  680    685   SGARGERGFPGERGVQGPPG  TP     2     Homo sapiens (Human)  2
P02461  COL3A1   CO3A1_HUMAN  288    293   PGLKGENGLPGENGAPGPMG  TP     2     Homo sapiens (Human)  1
Q8NFW1  COL22A1  COMA1_HUMAN  1432   1437  TGLMGPQGLPGENGPVGPPG  TP     2     Homo sapiens (Human)  1
P02458  COL2A1   CO2A1_HUMAN  702    707   VGPRGERGFPGERGSPGAQG  TP     2     Homo sapiens (Human)  1
P02458  COL2A1   CO2A1_HUMAN  327    332   PGPMGPRGLPGERGRTGPAG  TP     1     Homo sapiens (Human)  1
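The instances listed above can be sanity-checked against the pattern. The sketch below is an assumption-laden illustration, not an ELM tool: the `INSTANCES` list is transcribed by hand from this entry, the helper `check_instance` is hypothetical, and the check only confirms that each subsequence window contains a pattern match whose length agrees with the Start and End columns.

```python
import re

# ELM pattern from this entry; "(P)" is the (usually hydroxylated) proline.
PATTERN = re.compile(r"G[FLMR](P)GE[RKNA]")

# (accession, entry name, start, end, subsequence window), copied from the
# instance list in this entry.
INSTANCES = [
    ("P02462", "CO4A1_HUMAN", 385, 390, "PGQAGAPGFPGERGEKGDRG"),
    ("P02453", "CO1A1_BOVIN", 679, 684, "SGARGERGFPGERGVQGPPG"),
    ("P02452", "CO1A1_HUMAN", 680, 685, "SGARGERGFPGERGVQGPPG"),
    ("P02461", "CO3A1_HUMAN", 288, 293, "PGLKGENGLPGENGAPGPMG"),
    ("Q8NFW1", "COMA1_HUMAN", 1432, 1437, "TGLMGPQGLPGENGPVGPPG"),
    ("P02458", "CO2A1_HUMAN", 702, 707, "VGPRGERGFPGERGSPGAQG"),
    ("P02458", "CO2A1_HUMAN", 327, 332, "PGPMGPRGLPGERGRTGPAG"),
]


def check_instance(start, end, subsequence):
    """Return the matched motif if its length equals end - start + 1, else None."""
    match = PATTERN.search(subsequence)
    if match is None:
        return None
    motif = match.group(0)
    return motif if len(motif) == end - start + 1 else None


# One result per instance, in the order listed above.
RESULTS = [check_instance(start, end, window) for _, _, start, end, window in INSTANCES]

if __name__ == "__main__":
    for (acc, name, start, end, _), motif in zip(INSTANCES, RESULTS):
        print(acc, name, start, end, motif)
```

This does not verify the absolute coordinates against the full UniProt sequences; that would need the complete protein sequences, which are not part of this entry.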
Please cite: The Eukaryotic Linear Motif resource: 2022 release. (PMID:34718738)

ELM data can be downloaded & distributed for non-commercial use according to the ELM Software License Agreement