Alignment and germline assignment == Each sequence was first aligned to each V gene using the Smith-Waterman algorithm with an affine gap penalty [56].

... or directly neutralizing their target. This diversity is made possible by the processes of VDJ recombination, in which random joining of V, D and J genes generates an initial combinatorial diversity of BCR sequences, and affinity maturation, which further modifies these sequences. The affinity maturation process, in which antibodies increase binding affinity for their cognate antigens, is essential to mounting a precise humoral immune response. Affinity maturation proceeds via a nucleotide substitution process that combines Darwinian mutation and selection processes. Mutational diversity is usually generated by somatic hypermutation (SHM), in which a targeted molecular mechanism mutates the BCR sequence. This diversity is then passed through a selective sieve in which B cells that bind well to antigen are stimulated to divide, whereas those that do not bind well or bind to self are marked for destruction. The combination of VDJ recombination and affinity maturation enables B cells to respond to an almost limitless diversity of antigens. Understanding the substitution process and selective forces shaping the diversity of the memory B-cell repertoire thus has implications for disease prophylaxis and treatment.

It has recently become possible to gain detailed information about the B-cell repertoire using high-throughput sequencing [15]. Recent reviews have highlighted the need for new computational tools that make use of BCR sequence data to bring new insight, including the need for reproducible computational pipelines [69]. Rigorous analysis of the B-cell repertoire will require statistical analysis of how evolutionary processes define affinity maturation.
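To make the alignment step named at the head of this section concrete, the following is a minimal sketch of Smith-Waterman local alignment with an affine gap penalty, using the standard three-matrix (Gotoh) recurrences. It is an illustration only, not the implementation cited in [56]. The scoring values, the function names `smith_waterman_affine` and `best_v_gene`, and the toy sequences are all assumptions chosen for the example.

```python
def smith_waterman_affine(query, ref, match=2, mismatch=-1,
                          gap_open=-3, gap_extend=-1):
    """Best local alignment score of `query` against `ref`.

    Assumed scoring convention: a gap of length k costs
    gap_open + (k - 1) * gap_extend (all values are illustrative).
    """
    neg_inf = float("-inf")
    n, m = len(query), len(ref)
    # H: best score of any local alignment ending at cell (i, j).
    # E: best score ending with a gap in the query (ref base unpaired).
    # F: best score ending with a gap in the reference (query base unpaired).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if query[i - 1] == ref[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def best_v_gene(read, v_genes):
    """Score `read` against every V gene and return (best_name, all_scores)."""
    scores = {name: smith_waterman_affine(read, seq)
              for name, seq in v_genes.items()}
    return max(scores, key=scores.get), scores


# Toy germline set; names and sequences are hypothetical.
toy_v_genes = {"V_a": "ACGTACGTAC", "V_b": "TTTTGGGGCC"}
name, scores = best_v_gene("CGTACGT", toy_v_genes)
```

Under the affine scheme a two-base gap costs less than two separate one-base gaps, which is why a single longer insertion or deletion is preferred over several scattered ones. A production pipeline would also record the traceback to recover the aligned positions, which this sketch omits.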
Statistical nucleotide molecular evolution models are often described in terms of three interrelated processes: mutation, the process generating diversity; selection, the process determining survival or loss of mutations; and substitution, the observed process of evolution that follows from the first two processes. One major vein of research has focused on how nucleotide mutation rates depend on the identity of surrounding nucleotides (reviewed in [10]; see also [11,12]), but little has been done concerning other aspects of the process, such as how the substitution process differs between gene segments.

Along with mutation, selection owing to competition for antigen binding forms the other key part of the affinity maturation process. Inference of selective pressures in this context is complicated by nucleotide context-dependent mutation, leading some authors to proclaim that such selection inference is not possible [13]. Indeed, if one does not correct for context-dependent mutation bias, interactions between mutational motifs and the genetic code can lead to false-positive identification of selective pressure. Previous work has developed methodology to analyse selection on sequence tracts in this context (reviewed in 3b), but no methods have yet achieved the goal of statistical per-residue selection estimates. This has, however, been recently recognized as an important goal [11]. Such selection estimates could be used to better direct generation of synthetic libraries of antibodies for high-throughput screening. Another application would be to the engineering of antibody Fc regions with specific properties, such as for bispecific monoclonal antibodies or antibody-derived fragments, while preserving overall stability.

The ensemble of germline V, D and J genes that rearrange to encode antibodies (equivalently: immunoglobulins) is divided into nested units.
They can first be distinguished by their locus: IGH, denoting the heavy chain; IGK, denoting the kappa light chain; or IGL, denoting the lambda light chain. Our dataset contains solely the IGH locus, so we will frequently omit the locus prefix for simplicity. Genes within a locus can be first subdivided by their segment, which is whether they are a V, D or J gene. IGHV genes are further divided into subgroups, which share at least 75% nucleotide identity. Genes also have variants grouped into alleles, which represent polymorphisms of the gene between individuals [14]. VDJ recombination does not always produce a functional antibody, such as when the V and J segments are not in the same reading frame after recombination (an out-of-frame rearrangement).
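The nested units described above (locus, segment, subgroup, allele) can be read directly off a gene name. The sketch below assumes IMGT-style names such as `IGHV1-69*01`, a convention the text does not spell out. The `GeneName` type, the `parse_gene_name` function, and the `default_locus` parameter are hypothetical names introduced for illustration. Because the text assigns subgroups only to V genes, the subgroup field is filled in for V genes alone.

```python
import re
from typing import NamedTuple, Optional


class GeneName(NamedTuple):
    locus: str               # IGH, IGK or IGL
    segment: str             # V, D or J
    subgroup: Optional[str]  # filled in for V genes only in this sketch
    gene: str                # name without the allele suffix
    allele: Optional[str]    # e.g. "01", or None if no allele is given


# Optional locus prefix, segment letter, gene designation, optional *allele.
_GENE_RE = re.compile(r"^(IG[HKL])?([VDJ])([^*]+)(?:\*(\d+))?$")


def parse_gene_name(name: str, default_locus: str = "IGH") -> GeneName:
    """Split an IMGT-style germline gene name into its nested units.

    `default_locus` is used when the locus prefix is omitted (e.g. "V1-69").
    """
    m = _GENE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised gene name: {name!r}")
    locus, segment, designation, allele = m.groups()
    locus = locus or default_locus
    # For V genes, the number before the first hyphen is the subgroup.
    subgroup = designation.split("-")[0] if segment == "V" else None
    return GeneName(locus, segment, subgroup,
                    f"{locus}{segment}{designation}", allele)


full = parse_gene_name("IGHV1-69*01")
short = parse_gene_name("V1-69")      # locus prefix omitted
j_gene = parse_gene_name("IGHJ4*02")
```

Here `full` has locus `IGH`, segment `V`, subgroup `1`, gene `IGHV1-69` and allele `01`. `short` resolves to the same gene with no allele, which mirrors the practice of omitting the locus prefix when only IGH data are present.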