ABSTRACT

Analogous to human leukocyte antigens, blood group antigens are surface markers on the erythrocyte cell membrane whose structures differ among individuals and which can be serologically identified. The Blood Group Antigen Gene Mutation Database (BGMUT) is an online repository of allelic variations in genes that determine the antigens of various human blood group systems. The database is manually curated with allelic information collated from the scientific literature and from direct submissions from research laboratories. Currently, the database documents sequence variations of a total of 1251 alleles of all 40 gene loci that together are known to affect antigens of 30 human blood group systems. When available, information on the geographic or ethnic prevalence of an allele is also provided. The BGMUT website also has general information on the human blood group systems and the genes responsible for them. BGMUT is a part of the dbRBC resource of the National Center for Biotechnology Information, USA, and is available online at http://www.ncbi.nlm.nih.gov/projects/gv/rbc/xslcgi.fcgi?cmd=bgmut. The database should be of use to members of the transfusion medicine community, to those interested in studies of genetic variation and related topics such as human migrations, and to students as well as members of the general public.

INTRODUCTION

The BGMUT Blood Group Antigen Gene Mutation Database documents variations in genes that encode antigens for human blood groups. It is a part of the dbRBC resource (1) of the National Center for Biotechnology Information (NCBI), USA, and can be freely accessed online at http://www.ncbi.nlm.nih.gov/projects/gv/rbc/xslcgi.fcgi?cmd=bgmut.

Recent documentation of the extent and surprisingly high number of mutations in the human genome has suggested that, perhaps with the exception of identical twins, no two individuals bear exact copies of chromosomal DNA.
In some of those studies, DNA from random subjects is compared, but more often phenotypic differences observed in disease states, whether in single-gene inherited disorders or in association studies of complex conditions, are used as criteria for selecting the individuals whose DNA is examined for sequence changes. In the latter studies, large fragments of DNA are usually examined and compared statistically with those of matched control individuals. Changes in blood group phenotypes are another criterion for selecting subjects who may show differences in the sequences of two or more defined sets of genes. These genes encode a group of red cell membrane proteins that are polymorphic in the population and are defined as blood group antigens; in addition, these genes may encode certain glycosyltransferases that are involved in the synthesis of red cell membrane glycans whose structures also differ among individuals. The former group consists of structural molecules, channels, adhesion molecules or enzymes and, apart from their location in the red cell membrane, can be considered representative of any human protein, whereas the latter are similar to most other glycosyltransferases. Evidence suggests that sequence changes in these proteins, or even their absence as in null phenotypes, are not, in most cases, physiologically harmful. The proteins or the glycans fulfill their role as blood group antigens because they are polymorphic in the population and their sequence changes can be readily predicted by serological approaches; they are known as ‘antigens’ because in the course of transfusion or pregnancy the presence of a variant protein epitope is recognized as ‘non-self’ and may ultimately result in an adverse immunological reaction. Because transfusion is ubiquitous in the practice of medicine, populations worldwide are serologically tested, and variant antigens and their genes, in contrast to many other variant genes, are being documented in a large number of diverse populations.
Although some variants are rare and may be observed in only a single individual or family, others appear in unexpectedly large populations, such as the MiIII phenotype, encoded by the MiIII GYPA gene, in Taiwan (where its incidence can be as high as 88% among Ami tribes) (2). For many alleles, the database also provides information on the geographic or ethnic origin of the alleles and/or their associated serological phenotypes when such information was presented in the publications describing the alleles. This may be of use to those interested in population migrations.

HISTORY AND CURRENT STATE

BGMUT was developed in 1999 as a locus-specific gene mutation database under the aegis of the Human Genome Variation Society. It was curated under the direction of one of the authors (OOB) with original information contributed by more than a dozen blood group system experts. The database was hosted online by the Department of Biochemistry at the Albert Einstein College of Medicine, New York. A scholarly review of more than 200 locus-specific databases identified BGMUT as one of three model databases (3). In 2006, BGMUT became a part of the dbRBC resource of the NCBI. At dbRBC, curatorship and direction for maintenance of the database have been provided by another of the authors (WH). The number of alleles in BGMUT has approximately doubled since 2004, when the database was first described in a scholarly publication (4). This publication has been cited many times in the scientific literature, indicating that BGMUT has been a useful resource. Links to BGMUT database records are available on relevant pages of the Wikipedia online encyclopedia and on many of NCBI’s online resources. BGMUT is also a part of the PhenCode project, which aims to integrate genetic variation data with the UCSC Genome Browser (5). As of August 2011, BGMUT had 1251 alleles belonging to 40 genes that are together responsible for 30 human blood group systems (Table 1).
Alleles of some genes, such as ABO and RHCE/D, which are responsible, respectively, for the ABO and Rh blood group systems (the systems most frequently examined in populations), are more numerous than those of other genes (Table 1). According to the International Society for Blood Transfusion, there are 30 human blood group systems (http://ibgrl.blood.co.uk/ISBT%20Pages/ISBT%20Terminology%20Pages/Table%20of%20blood%20group%20systems.htm), all of which are covered by BGMUT. BGMUT currently treats the Globoside (GLOB) system as part of the P1PK system and, in addition, treats the system of T and Tn antigens as a separate blood group system.

DATABASE ARCHITECTURE

BGMUT is accessible online for viewing, searching and administration as a website of HTML (hypertext markup language) pages (Figures 1 and 2). A Microsoft SQL Server relational database is used for data storage. Code written in SQL (structured query language), C++ and XSLT (extensible stylesheet language transformations) is used to interact with the SQL Server database and to render BGMUT’s web-based interface. Raw data on all or a subset of the alleles described in BGMUT can be downloaded from the BGMUT website in tab-delimited or comma-separated (CSV) text formats. Compilations of allelic sequences for the ABO, H, MNS and Rh systems in Microsoft Excel format are also available for download.

DATABASE CURATION AND ALLELE SUBMISSION

The BGMUT database is curated manually. Allelic information is either collated periodically from the scientific literature, which is the source for the majority of the alleles listed in BGMUT, or obtained as direct submissions from researchers through the database website. During curation, the quality of the methods used in each study is assessed. The sequence of a new candidate allele, as published and/or submitted to a publicly available repository such as NCBI’s GenBank, is compared with the sequence of BGMUT’s reference allele for that gene.
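The comparison step just described can be illustrated with a minimal sketch that lists the nucleotide differences between a candidate allele and a reference allele and deduces the resulting amino acid change for each. This is not BGMUT’s curation code. It assumes that both sequences are in-frame coding sequences (cDNA) of equal length, and it handles only single-nucleotide substitutions. The `c.`/`p.` notation is an HGVS-style choice made for illustration, and the names `compare_to_reference` and `Change` are hypothetical.

```python
from dataclasses import dataclass

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in the order T, C, A, G: TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


@dataclass
class Change:  # hypothetical record type for this sketch
    nucleotide: str  # e.g. "c.4G>A": 1-based coding position, ref>alt
    amino_acid: str  # e.g. "p.V2I"; "p.K4=" marks a synonymous change


def compare_to_reference(reference: str, candidate: str) -> list[Change]:
    """List substitutions in `candidate` relative to `reference`.

    Assumption: both are in-frame coding sequences of equal length;
    insertions and deletions are out of scope for this sketch.
    """
    reference, candidate = reference.upper(), candidate.upper()
    if len(reference) != len(candidate) or len(reference) % 3 != 0:
        raise ValueError("expected equal-length, in-frame coding sequences")
    changes = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, candidate), start=1):
        if ref_base == alt_base:
            continue
        # Translate the whole codon containing this position in both sequences.
        # If two substitutions share a codon, both report the combined effect.
        start = (pos - 1) // 3 * 3
        ref_aa = CODON_TABLE[reference[start:start + 3]]
        alt_aa = CODON_TABLE[candidate[start:start + 3]]
        codon_number = start // 3 + 1
        alt_label = "=" if ref_aa == alt_aa else alt_aa
        changes.append(Change(f"c.{pos}{ref_base}>{alt_base}",
                              f"p.{ref_aa}{codon_number}{alt_label}"))
    return changes


# Toy sequences for illustration only (not BGMUT data).
for change in compare_to_reference("ATGGTTTGGAAA", "ATGATTTGAAAG"):
    print(change.nucleotide, change.amino_acid)
```

On the toy sequences this prints `c.4G>A p.V2I`, `c.9G>A p.W3*` and `c.12A>G p.K4=`, which correspond to a missense change, a nonsense change and a synonymous change. A real curation workflow would also have to handle insertions, deletions and noncoding changes. BGMUT’s own tables may also present positions and changes differently from this notation.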
The sequence positions and types of the deduced amino acid changes are also verified. Authors are consulted in case of a question or disagreement. For direct submission of information on a new allele for inclusion in the database, a scientific publication describing the allele is not required. However, submitters are encouraged to deposit the allele’s sequence in a publicly available repository, and in the absence of a scientific publication this is a requirement.

ALLELES IN BGMUT

Alleles in BGMUT are grouped by the blood group system that their gene affects. For each allele in the database, BGMUT provides details of the nucleotide changes and of the deduced amino acid changes in the protein encoded by the allele’s gene. These changes are given relative to a ‘reference’ allele, which is itself included in BGMUT and is the same for all alleles of a gene. In addition to the sequence changes, BGMUT records for each allele its frequency of occurrence, the associated blood group phenotype, references to the studies that identified and characterized the allele, and the accession numbers of the relevant sequences in NCBI GenBank, when such information is available. GenBank accession numbers, however, are not available for many alleles because, although these alleles have been described in the published literature, their authors did not deposit the sequences in the repository. When known, the regions of the gene or cDNA that were sequenced to identify the allele, the prevalence of the allele in different geographical regions or ethnic populations, and any association of the allele with disease are also noted. Often, a name is also provided for an allele. Names make it easier to refer to alleles and can indicate the associated phenotype and/or the nucleotide or amino acid variation.

Table 1. Blood group systems in the BGMUT database