A user-friendly workflow for analysis of Illumina gene expression bead array data available at the arrayanalysis.org portal.

BMC Genomics. 2015;16:482

Authors: Eijssen LM, Goelela VS, Kelder T, Adriaens ME, Evelo CT, Radonjic M

Abstract

BACKGROUND: Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily applied by less experienced users. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows.

RESULTS: To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing of these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina’s GenomeStudio and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org, but can also be downloaded for further downstream analysis in third-party tools.

CONCLUSIONS: The Illumina bead array analysis module is available at http://www.arrayanalysis.org. A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for the statistical evaluation and pathway analysis provided on the website, or to generate processed input data for a broad range of applications in life sciences research.

PMID: 26122086 [PubMed – in process]

Determination of DNA Methylation Levels Using Illumina HumanMethylation450 BeadChips.

Methods Mol Biol. 2015;1288:143-92

Authors: Carless MA

Abstract

DNA methylation is a modifiable epigenetic phenomenon that has a strong influence over transcriptional regulation and as such has been consistently implicated in development and disease. Several platforms are targeted toward the identification of DNA methylation changes that might be pertinent to the disease process and include regional analysis (e.g., pyrosequencing) as well as genome-wide analysis (e.g., next-generation sequencing and microarray). The Illumina HumanMethylation450 BeadChip is one of the most comprehensive microarray platforms available, and due to the high costs associated with next-generation sequencing, it is becoming a widely used tool for the analysis of genome-wide DNA methylation levels. Providing quantitative DNA methylation levels at 482,421 CpG sites within CpG islands, shores, and shelves, as well as intergenic regions, the HumanMethylation450 BeadChip can allow accurate assessment of differential methylation across large studies. This chapter outlines the laboratory methodologies associated with performing the Illumina Infinium Methylation Assay, including bisulfite conversion, whole-genome amplification, BeadChip hybridization, XStain procedures, and imaging systems. Furthermore, this chapter provides an outline of data analysis tools, including the GenomeStudio pipeline, quality control measures, and additional statistical considerations. This comprehensive overview can aid not only in performing the Illumina Infinium Methylation Assay but also in the interpretation of data derived from this platform.

PMID: 25827880 [PubMed – in process]
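
The chapter refers to quantitative methylation levels and to statistical considerations without giving formulas. As a point of reference, here is a minimal sketch of the two summaries most often derived from the methylated (M) and unmethylated (U) intensities of each CpG site: the beta value and the M-value. The offset of 100 and alpha of 1 are commonly used defaults assumed here, not values taken from this chapter, and the function names are illustrative.

```python
import math


def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Fraction methylated, from methylated (M) and unmethylated (U) probe intensities.

    Negative intensities (possible after background correction) are clipped to zero,
    and the assumed offset keeps the ratio stable when both signals are low.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


def m_value(methylated: float, unmethylated: float, alpha: float = 1.0) -> float:
    """log2 ratio of methylated to unmethylated signal; alpha avoids log(0) and division by zero."""
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return math.log2((m + alpha) / (u + alpha))
```

Beta values are easy to read as a fraction methylated, while M-values are often preferred for statistical testing because their variance is more uniform across the methylation range. For example, m_value(1023, 255) evaluates log2(1024 / 256) = 2.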

Hidden Markov Model-Based CNV Detection Algorithms for Illumina Genotyping Microarrays.

Cancer Inform. 2014;13(Suppl 7):77-83

Authors: Seiser EL, Innocenti F

Abstract

Somatic alterations in DNA copy number have been well studied in numerous malignancies, yet the role of germline DNA copy number variation in cancer is still emerging. Genotyping microarrays generate allele-specific signal intensities to determine genotype, but these intensities may also be used to infer DNA copy number through additional computational approaches. Numerous tools have been developed to analyze Illumina genotyping microarray data for copy number variant (CNV) discovery, but the commonly used, freely available algorithms are based on hidden Markov models (HMMs). QuantiSNP, PennCNV, and GenoCN use HMMs with six copy number states but differ in how transition and emission probabilities are calculated. The performance of these CNV detection algorithms has been shown to vary between genotyping platforms and data sets, although HMM approaches generally outperform other current methods. Low sensitivity is common among HMM-based algorithms, suggesting the need for continued improvement in CNV detection methodologies.

PMID: 25657572 [PubMed]
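
The abstract describes HMM-based CNV calling only in outline. The sketch below illustrates the decoding step such tools share: Viterbi decoding of a hidden copy-number state from log R ratio (LRR) values ordered along a chromosome. It is not QuantiSNP, PennCNV or GenoCN. It assumes five states rather than six (the copy-neutral loss-of-heterozygosity state is omitted because LRR alone cannot separate it from two copies; real tools also use B allele frequency), Gaussian emissions on LRR only, and a single "stay" probability. All parameter values are illustrative assumptions.

```python
import math

# Hypothetical, simplified state set: total copy number 0-4.
# Six-state tools add a copy-neutral LOH state, which needs B allele frequency to detect.
STATES = ["CN0", "CN1", "CN2", "CN3", "CN4"]

# Illustrative mean LRR for each state; not calibrated values from any tool.
LRR_MEANS = {"CN0": -3.5, "CN1": -0.66, "CN2": 0.0, "CN3": 0.4, "CN4": 0.68}
LRR_SD = 0.2  # assumed common standard deviation of LRR noise


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(lrr: list[float], stay: float = 0.99) -> list[str]:
    """Most likely copy-number state per probe, given LRR values ordered along a chromosome."""
    n = len(STATES)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n - 1))  # switch probability spread evenly
    log_init = math.log(1.0 / n)                      # uniform initial distribution

    # scores[s]: best log-probability of any state path ending in s at the current probe
    scores = {s: log_init + gaussian_logpdf(lrr[0], LRR_MEANS[s], LRR_SD) for s in STATES}
    backpointers = []
    for x in lrr[1:]:
        new_scores, pointer = {}, {}
        for s in STATES:
            # best predecessor for s, including the transition penalty
            prev = max(STATES, key=lambda p: scores[p] + (log_stay if p == s else log_switch))
            trans = log_stay if prev == s else log_switch
            new_scores[s] = scores[prev] + trans + gaussian_logpdf(x, LRR_MEANS[s], LRR_SD)
            pointer[s] = prev
        scores = new_scores
        backpointers.append(pointer)

    # trace back from the best final state
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]
```

Raising stay makes the model more reluctant to open short segments, trading sensitivity for fewer spurious calls; this is one of the levers that governs the sensitivity of HMM-based callers.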

Transfer of clinically relevant gene expression signatures in breast cancer: from Affymetrix microarray to Illumina RNA-Sequencing technology.

BMC Genomics. 2014 Nov 21;15(1):1008

Authors: Fumagalli D, Blanchet-Cohen A, Brown D, Desmedt C, Gacquer D, Michiels S, Rothé F, Majjaj S, Salgado R, Larsimont D, Ignatiadis M, Maetens M, Piccart M, Detours V, Sotiriou C, Haibe-Kains B

Abstract

BACKGROUND: Microarrays have revolutionized breast cancer (BC) research by enabling studies of gene expression on a transcriptome-wide scale. Recently, RNA-Sequencing (RNA-Seq) has emerged as an alternative for precise readouts of the transcriptome. To date, no study has compared the ability of the two technologies to quantify clinically relevant individual genes and microarray-derived gene expression signatures (GES) in a set of BC samples encompassing the known molecular BC subtypes. To accomplish this, RNA from 57 BCs representing the four main molecular subtypes (triple negative, HER2 positive, luminal A, luminal B) was profiled with Affymetrix HG-U133 Plus 2.0 chips and sequenced using the Illumina HiSeq 2000 platform. Cross-platform agreement was evaluated for three clinically relevant BC genes, six molecular subtype classifiers, and a selection of 21 GES.

RESULTS: 16,097 genes common to the two platforms were retained for downstream analysis. Gene-wise comparison of microarray and RNA-Seq data revealed that 52% of these genes had a Spearman’s correlation coefficient greater than 0.7, with highly correlated genes displaying significantly higher expression levels. We found excellent correlation between microarray and RNA-Seq for the estrogen receptor (ER; rs = 0.973; 95% CI: 0.971-0.975), progesterone receptor (PgR; rs = 0.95; 0.947-0.954), and human epidermal growth factor receptor 2 (HER2; rs = 0.918; 0.912-0.923), although a few discordances were observed between ER and PgR levels quantified by immunohistochemistry and by RNA-Seq/microarray. All the subtype classifiers evaluated agreed well (Cohen’s kappa coefficients > 0.8), and all the proliferation-based GES showed excellent Spearman correlations between microarray and RNA-Seq (all rs > 0.965). Immune-, stroma- and pathway-based GES showed lower correlations than prognostic signatures (all rs > 0.6).

CONCLUSIONS: To our knowledge, this is the first study to report a systematic comparison of RNA-Seq to microarray for the evaluation of single genes and GES clinically relevant to BC. According to our results, the vast majority of single gene biomarkers and well-established GES can be reliably evaluated using the RNA-Seq technology.

PMID: 25412710 [PubMed – as supplied by publisher]
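
The gene-wise comparison in this study computes, for each gene, Spearman's rs between its microarray and RNA-Seq values across samples, and uses Cohen's kappa to measure agreement between subtype calls. A minimal sketch of both statistics follows, with tied values given average ranks. The function names and the input layout (gene mapped to per-sample values in a shared sample order) are assumptions, not the authors' code.

```python
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either input is constant


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rs: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


def fraction_above(array_expr: dict, rnaseq_expr: dict, threshold: float = 0.7) -> float:
    """Share of common genes whose across-sample rs exceeds threshold.

    Each input maps gene -> per-sample values, with samples in the same order in both.
    """
    common = sorted(set(array_expr) & set(rnaseq_expr))
    high = sum(spearman(array_expr[g], rnaseq_expr[g]) > threshold for g in common)
    return high / len(common)


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two subtype assignments, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Because Spearman's rs uses ranks only, it is insensitive to the different scales of microarray intensities and RNA-Seq counts, which suits a cross-platform comparison.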

EXPRSS: an Illumina based high-throughput expression-profiling method to reveal transcriptional dynamics.

BMC Genomics. 2014 May 6;15(1):341

Authors: Rallapalli G, Kemen EM, MacLean D, Robert-Seilaniantz A, Segonzac C, Etherington G, Sohn KH, Jones JD

Abstract

EXpression Profiling through Randomly Sheared cDNA tag Sequencing (EXPRSS) uses adaptive focused acoustics to randomly shear cDNA and generate sequence tags at a relatively defined position (~150-200 bp) from the 3′ end of each mRNA. EXPRSS is a strand-specific, restriction-enzyme-independent tag sequencing method that does not require cDNA length-based data transformations, reveals alternative polyadenylation and polyadenylated antisense transcripts, and is highly reproducible. It is high-throughput and, through barcoded multiplexing, cost-effective; it avoids the biases of existing SAGE and derivative methods and can reveal polyadenylation positions from paired-end sequencing. The EXPRSS method was validated by comparing expression data generated by EXPRSS, NlaIII-DGE and Affymetrix microarrays, and by qPCR quantification of selected genes. Unlike array-based methods, it can be applied to genomes for which high-quality reference sequences are unavailable.

PMID: 24884414 [PubMed – as supplied by publisher]

Design of Large-Insert Jumping Libraries for Structural Variant Detection Using Illumina Sequencing.

Curr Protoc Hum Genet. 2014;80:7.22.1-7.22.9

Authors: Hanscom C, Talkowski M

Abstract

Next-generation sequencing is an important and efficient tool for the identification of structural variation, particularly balanced chromosomal rearrangements, because such events are not routinely detected by microarray and localization of altered regions by karyotype is imprecise. Indeed, the degree of resolution that can be obtained through next-generation technologies enables elucidation of precise breakpoints and has facilitated the discovery of numerous pathogenic loci in human disease and congenital anomalies. The protocol described here explains one type of large-insert “jumping library” and the steps required to generate such a library for multiplexed sequencing using Illumina sequencing technology. This approach allows for cost-efficient multiplexing of samples and provides a very high yield of fragments with large inserts, or “jumping” fragments.

PMID: 24789519 [PubMed – as supplied by publisher]

iCall: A genotype-calling algorithm for rare, low-frequency and common variants on the Illumina exome array.

Bioinformatics. 2014 Feb 23;

Authors: Zhou J, Tantoso E, Wong LP, Ong RT, Bei JX, Li Y, Liu J, Khor CC, Teo YY

Abstract

MOTIVATION: Next-generation genotyping microarrays have been designed with insights from the 1000 Genomes Project and whole-exome sequencing studies. These arrays additionally include variants that are typically present at lower frequencies. Determining the genotypes of these variants from hybridization intensities is challenging because, when allele counts are low, there is little support for locating the minor alleles. Existing algorithms are designed mainly for calling common variants and often fail to generate accurate calls for low-frequency and rare variants. Here we introduce a new calling algorithm, iCall, to call genotypes for variants across the whole spectrum of allele frequencies.

RESULTS: We benchmarked iCall against four of the most commonly used algorithms, GenCall, optiCall, illuminus and GenoSNP, as well as zCall, a post-processing caller that adopts a two-stage calling design. Normalized hybridization intensities for 12,370 individuals genotyped on the Illumina HumanExome BeadChip were considered, of whom 81 had also been whole-genome sequenced. The sequence calls were used to benchmark genotype-calling accuracy, and our comparisons indicated that iCall outperforms all four single-stage calling algorithms in terms of call rates and concordance, particularly in the calling accuracy of minor alleles, which is the principal concern for rare and low-frequency variants. Applying zCall to post-process the output from iCall also produced marginally better performance than the combination of zCall and GenCall.

AVAILABILITY: iCall is implemented in C++ for use on Linux operating systems and is available for download at http://www.statgen.nus.edu.sg/~software/icall.html.

CONTACT: zhoujin@nus.edu.sg, statyy@nus.edu.sg.

PMID: 24567545 [PubMed – as supplied by publisher]
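
The iCall benchmark scores array-based genotype calls against sequence-derived calls using call rate and concordance. A minimal sketch of those two metrics follows, assuming genotypes coded as minor-allele counts with None for a no-call. The minor_only option is a hypothetical way to focus on minor-allele accuracy and is not necessarily the definition used in the paper.

```python
from typing import Optional, Sequence

# Genotype coded as minor-allele count (0, 1 or 2); None marks a no-call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotypes that received a call."""
    return sum(g is not None for g in calls) / len(calls)


def concordance(array_calls: Sequence[Genotype], seq_calls: Sequence[Genotype],
                minor_only: bool = False) -> float:
    """Fraction of jointly called genotypes where the array call matches the sequencing call.

    With minor_only=True, only genotypes carrying at least one minor allele in the
    sequencing data are scored, focusing on the calls that matter for rare variants.
    """
    pairs = [
        (a, s)
        for a, s in zip(array_calls, seq_calls)
        if a is not None and s is not None and (not minor_only or s > 0)
    ]
    if not pairs:
        return float("nan")
    return sum(a == s for a, s in pairs) / len(pairs)
```

For a rare variant, overall concordance can look excellent simply because almost every sample is homozygous for the major allele; restricting to minor-allele carriers exposes the errors that matter.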

The new sequencer on the block: comparison of Life Technology’s Proton sequencer to an Illumina HiSeq for whole-exome sequencing.

Hum Genet. 2013 Jun 12;

Authors: Boland JF, Chung CC, Roberson D, Mitchell J, Zhang X, Im KM, He J, Chanock SJ, Yeager M, Dean M

Abstract

We assessed the performance of the new Life Technologies Proton sequencer by comparing whole-exome sequence data from a Centre d’Etude du Polymorphisme Humain trio (family 1463) with data from the Illumina HiSeq instrument. To simulate a typical user’s results, we used the standard capture, alignment and variant calling methods specific to each platform. We restricted data analysis to the capture region common to both methods. The Proton produced high-quality data at a comparable average depth and read length, and the Ion Reporter variant caller identified 96% of the single nucleotide polymorphisms (SNPs) detected by the HiSeq and GATK pipeline. However, only 40% of small insertion and deletion variants (indels) were identified by both methods. Use of the trio structure and segregation of platform-specific alleles supported this result. Further comparison of the trio data with Complete Genomics sequence data and Illumina SNP microarray genotypes documented high concordance and accurate SNP genotyping on both the Proton and Illumina platforms. However, our study underscored the difficulty of accurately detecting indels on both the Proton and HiSeq platforms.

PMID: 23757002 [PubMed – as supplied by publisher]
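
The Proton/HiSeq comparison restricts both call sets to a shared capture region, then reports how many HiSeq SNPs the Proton recovers and how many indels both platforms share. A minimal sketch of such a comparison follows, assuming variants represented as (chrom, pos, ref, alt) tuples, regions as half-open [start, end) intervals, and exact-match comparison. Real indel comparisons usually normalize variant representation (for example, left-alignment) first; this sketch omits that step, and the function names are illustrative.

```python
Variant = tuple[str, int, str, str]   # (chrom, pos, ref, alt); assumed representation
Region = tuple[str, int, int]         # (chrom, start, end), half-open


def in_regions(variant: Variant, regions: list[Region]) -> bool:
    """True if the variant falls in any interval [start, end) on its chromosome."""
    chrom, pos = variant[0], variant[1]
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def compare_callsets(reference_calls: list[Variant], test_calls: list[Variant],
                     regions: list[Region]) -> dict:
    """Overlap statistics for two call sets, restricted to a shared target region."""
    ref = {v for v in reference_calls if in_regions(v, regions)}
    test = {v for v in test_calls if in_regions(v, regions)}
    shared = ref & test
    union = ref | test
    return {
        # share of the reference platform's calls that the test platform also made
        "recall_vs_reference": len(shared) / len(ref) if ref else float("nan"),
        # share of all calls (from either platform) that both platforms made
        "shared_fraction": len(shared) / len(union) if union else float("nan"),
    }
```

The two fractions answer different questions: recall_vs_reference treats one platform as the reference, while shared_fraction penalizes calls made by either platform alone.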

Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array.

Epigenetics Chromatin. 2013 Mar 3;6(1):4

Authors: Price EM, Cotton AM, Lam LL, Farré P, Emberly E, Brown CJ, Robinson WP, Kobor MS

Abstract

BACKGROUND: Measurement of genome-wide DNA methylation (DNAm) has become an important avenue for investigating potential physiologically relevant epigenetic changes. Illumina Infinium (Illumina, San Diego, CA, USA) is a commercially available microarray suite used to measure DNAm at many sites throughout the genome. However, it has been suggested that a subset of array probes may give misleading results due to issues related to probe design. To facilitate biologically meaningful data interpretation, we set out to enhance probe annotation of the newest Infinium array, the HumanMethylation450 BeadChip (450k), with >485,000 probes covering 99% of Reference Sequence (RefSeq) genes (National Center for Biotechnology Information (NCBI), Bethesda, MD, USA). Annotation that was added or expanded includes: 1) documented SNPs in the probe target, 2) probe binding specificity, 3) CpG classification of target sites and 4) gene feature classification of target sites.

RESULTS: Probes with documented SNPs at the target CpG (4.3% of probes) were associated with increased within-tissue variation in DNAm. An example of a probe with a SNP at the target CpG demonstrated how sample genotype can confound the measurement of DNAm. Additionally, 8.6% of probes mapped to multiple locations in silico. Measurements from these non-specific probes likely represent a combination of DNAm from multiple genomic sites. The expanded biological annotation demonstrated that grouping probes by an alternative high-density and intermediate-density CpG island classification yielded a distinctive pattern of DNAm. Finally, variable enrichment for differentially methylated probes was noted across CpG classes and gene feature groups, depending on the tissues that were compared.

CONCLUSION: DNAm arrays offer a high-throughput approach, but probe content should be considered carefully to better understand the biological processes affected. Probes containing SNPs and non-specific probes may affect the assessment of DNAm using the 450k array. Additionally, probe classification by CpG enrichment classes and, to a lesser extent, gene feature groups resulted in distinct patterns of DNAm. Thus, we recommend that compromised probes be removed from analyses and that the genomic context of DNAm be considered in studies deciphering the biological meaning of Illumina 450k array data.

PMID: 23452981 [PubMed – as supplied by publisher]
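
The 450k annotation study recommends removing compromised probes and taking genomic context into account. A minimal sketch of that filtering step follows, assuming an annotation table with hypothetical fields snp_at_target, genomic_hits (number of in silico matches) and cpg_class; this is not the authors' annotation format.

```python
from statistics import mean


def filter_probes(betas: dict, annotation: dict) -> dict:
    """Keep probes with no SNP at the target CpG and a single in silico genomic match.

    betas: probe ID -> per-sample beta values.
    annotation: probe ID -> dict with hypothetical keys 'snp_at_target' (bool),
    'genomic_hits' (int) and 'cpg_class' (str).
    """
    kept = {}
    for probe, values in betas.items():
        info = annotation.get(probe)
        if info is None:
            continue  # unannotated probes are dropped in this sketch
        if info["snp_at_target"] or info["genomic_hits"] != 1:
            continue  # SNP-affected or non-specific probe
        kept[probe] = values
    return kept


def mean_beta_by_class(betas: dict, annotation: dict, key: str = "cpg_class") -> dict:
    """Average beta value over all probes and samples within each annotation class."""
    pooled = {}
    for probe, values in betas.items():
        pooled.setdefault(annotation[probe][key], []).extend(values)
    return {cls: mean(vals) for cls, vals in pooled.items()}
```

Filtering before grouping matters: a SNP-affected probe can carry genotype-driven variation into its class summary.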

Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray.

Epigenetics. 2013 Jan 11;8(2)

Authors: Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R

Abstract

DNA methylation, an important type of epigenetic modification in humans, participates in crucial cellular processes, such as embryonic development, X-inactivation, genomic imprinting and chromosome stability. Several platforms have been developed to study genome-wide DNA methylation. Many investigators in the field have chosen the Illumina Infinium HumanMethylation microarrays for their ability to reliably assess DNA methylation following sodium bisulfite conversion. Here, we analyzed methylation profiles of 489 adult males and 357 adult females generated by the Infinium HumanMethylation450 microarray. Among the autosomal CpG sites that displayed significant methylation differences between the two sexes, we observed a significant enrichment of cross-reactive probes that co-hybridize to the sex chromosomes with more than 94% sequence identity. This could lead investigators to mistakenly infer the existence of significant autosomal sex-associated methylation. Using sequence identity cutoffs derived from the sex methylation analysis, we concluded that 6% of the array probes can potentially generate spurious signals because of co-hybridization to alternate genomic sequences that are highly homologous to the intended targets. Additionally, we discovered probes targeting polymorphic CpGs that overlapped SNPs. The methylation levels detected by these probes simply reflect underlying genetic polymorphisms but could be misinterpreted as true signals. The existence of probes that are cross-reactive or target polymorphic CpGs in the Illumina HumanMethylation microarrays can confound data obtained from these microarrays. Therefore, investigators should exercise caution when significant biological associations are found using these array platforms. All cross-reactive probes and polymorphic CpGs that we identified are listed and annotated in this paper.

PMID: 23314698 [PubMed – as supplied by publisher]
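
This study identifies cross-reactive probes by sequence identity to off-target regions (more than 94%) and polymorphic CpGs by overlap with SNPs. A minimal sketch of both checks follows, assuming ungapped comparison against pre-extracted candidate off-target sequences of the same length as the probe (genome-wide searches normally use a dedicated aligner), and CpG sites given as the (chrom, pos) of the C. Function names and input layouts are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def flag_cross_reactive(probes: dict, off_targets: dict, cutoff: float = 94.0) -> set:
    """IDs of probes with at least one candidate off-target sequence above the identity cutoff.

    probes: probe ID -> probe sequence; off_targets: probe ID -> list of candidate sequences.
    """
    flagged = set()
    for probe_id, seq in probes.items():
        if any(percent_identity(seq, hit) > cutoff for hit in off_targets.get(probe_id, [])):
            flagged.add(probe_id)
    return flagged


def flag_polymorphic_cpgs(cpg_sites: dict, snp_positions) -> set:
    """IDs of probes whose target CpG (C at pos, G at pos + 1) overlaps a known SNP."""
    snps = set(snp_positions)
    return {
        probe_id
        for probe_id, (chrom, pos) in cpg_sites.items()
        if (chrom, pos) in snps or (chrom, pos + 1) in snps
    }
```

With 50-nucleotide probes, as used on the 450k array, an identity strictly above 94% allows at most two mismatches (48 of 50 bases, or 96%).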