EXPRSS: an Illumina based high-throughput expression-profiling method to reveal transcriptional dynamics.

Related Articles

EXPRSS: an Illumina based high-throughput expression-profiling method to reveal transcriptional dynamics.

BMC Genomics. 2014 May 6;15(1):341

Authors: Rallapalli G, Kemen EM, MacLean D, Robert-Seilaniantz A, Segonzac C, Etherington G, Sohn KH, Jones JD

Abstract

EXpression Profiling through Randomly Sheared cDNA tag Sequencing (EXPRSS) employs adaptive focused acoustics to randomly shear cDNA and generate sequence tags at a relatively defined position (~150-200 bp) from the 3′ end of each mRNA. EXPRSS is a strand specific and restriction enzyme independent tag sequencing method that does not require cDNA length-based data transformations, reveals alternative polyadenylation, polyadenylated antisense transcripts and is highly reproducible. It is high-throughput, cost-effective using barcoded multiplexing, avoids the biases of existing SAGE and derivative methods and can reveal polyadenylation position from paired-end sequencing. Implementation of the EXPRSS method was verified through comparative analysis of expression data generated from EXPRSS, NlaIII-DGE and Affymetrix microarray and through qPCR quantification of selected genes. Unlike array-based methods, it can be applied to genomes for which high-quality reference sequences are unavailable.

PMID: 24884414 [PubMed - as supplied by publisher]

Design of Large-Insert Jumping Libraries for Structural Variant Detection Using Illumina Sequencing.

Related Articles

Design of Large-Insert Jumping Libraries for Structural Variant Detection Using Illumina Sequencing.

Curr Protoc Hum Genet. 2014;80:7.22.1-7.22.9

Authors: Hanscom C, Talkowski M

Abstract

Next-generation sequencing is an important and efficient tool for the identification of structural variation, particularly balanced chromosomal rearrangements, because such events are not routinely detected by microarray and localization of altered regions by karyotype is imprecise. Indeed, the degree of resolution that can be obtained through next-generation technologies enables elucidation of precise breakpoints and has facilitated the discovery of numerous pathogenic loci in human disease and congenital anomalies. The protocol described here explains one type of large-insert “jumping library” and the steps required to generate such a library for multiplexed sequencing using Illumina sequencing technology. This approach allows for cost-efficient multiplexing of samples and provides a very high yield of fragments with large inserts, or “jumping” fragments. Curr. Protoc. Hum. Genet. 80:7.22.1-7.22.9. © 2014 by John Wiley & Sons, Inc.

PMID: 24789519 [PubMed - as supplied by publisher]

iCall: A genotype-calling algorithm for rare, low-frequency and common variants on the Illumina exome array.

Related Articles

iCall: A genotype-calling algorithm for rare, low-frequency and common variants on the Illumina exome array.

Bioinformatics. 2014 Feb 23;

Authors: Zhou J, Tantoso E, Wong LP, Ong RT, Bei JX, Li Y, Liu J, Khor CC, Teo YY

Abstract

MOTIVATION: Next-generation genotyping microarrays have been designed with insights from 1000 Genomes Project and whole exome-sequencing studies. These arrays additionally include variants that are typically present at lower frequencies. Determining the genotypes of these variants from hybridization intensities is challenging as there is less support to locate the presence of the minor alleles when the allele counts are low. Existing algorithms are mainly designed for calling common variants and are notorious for failing to generate accurate calls for low-frequency and rare variants. Here we introduce a new calling algorithm, iCall, to call genotypes for variants across the whole spectrum of allele frequencies.

RESULTS: We benchmarked iCall against four of the most commonly used algorithms, GenCall, optiCall, illuminus and GenoSNP, as well as a post-processing caller zCall that adopted a two-stage calling design. Normalized hybridization intensities for 12,370 individuals genotyped on the Illumina HumanExome BeadChip were considered, of which 81 individuals were also whole-genome sequenced. The sequence calls were used to benchmark the accuracy of the genotype calling and our comparisons indicated that iCall outperforms all four single-stage calling algorithms in terms of call rates and concordance, particularly in the calling accuracy of minor alleles which is the principal concern for rare and low-frequency variants. The application of zCall to post-process the output from iCall also produced marginally improved performance to the combination of zCall and GenCall.

AVAILABILITY: iCall is implemented in C++ for use on Linux operating systems and is available for download at http://www.statgen.nus.edu.sg/~software/icall.html.

CONTACT: zhoujin@nus.edu.sg, statyy@nus.edu.sg.

PMID: 24567545 [PubMed - as supplied by publisher]

The new sequencer on the block: comparison of Life Technology’s Proton sequencer to an Illumina HiSeq for whole-exome sequencing.

Related Articles

The new sequencer on the block: comparison of Life Technology’s Proton sequencer to an Illumina HiSeq for whole-exome sequencing.

Hum Genet. 2013 Jun 12;

Authors: Boland JF, Chung CC, Roberson D, Mitchell J, Zhang X, Im KM, He J, Chanock SJ, Yeager M, Dean M

Abstract

We assessed the performance of the new Life Technologies Proton sequencer by comparing whole-exome sequence data in a Centre d’Etude du Polymorphisme Humain trio (family 1463) to the Illumina HiSeq instrument. To simulate a typical user’s results, we utilized the standard capture, alignment and variant calling methods specific to each platform. We restricted data analysis to include the capture region common to both methods. The Proton produced high quality data at a comparable average depth and read length, and the Ion Reporter variant caller identified 96 % of single nucleotide polymorphisms (SNPs) detected by the HiSeq and GATK pipeline. However, only 40 % of small insertion and deletion variants (indels) were identified by both methods. Usage of the trio structure and segregation of platform-specific alleles supported this result. Further comparison of the trio data with Complete Genomics sequence data and Illumina SNP microarray genotypes documented high concordance and accurate SNP genotyping of both Proton and Illumina platforms. However, our study underscored the problem of accurate detection of indels for both the Proton and HiSeq platforms.

PMID: 23757002 [PubMed - as supplied by publisher]

Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array.

Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array.

Epigenetics Chromatin. 2013 Mar 3;6(1):4

Authors: Price EM, Cotton AM, Lam LL, Farré P, Emberly E, Brown CJ, Robinson WP, Kobor MS

Abstract

BACKGROUND: Measurement of genome-wide DNA methylation (DNAm) has become an important avenue for investigating potential physiologically-relevant epigenetic changes. Illumina Infinium (Illumina, San Diego, CA, USA) is a commercially available microarray suite used to measure DNAm at many sites throughout the genome. However, it has been suggested that a subset of array probes may give misleading results due to issues related to probe design. To facilitate biologically significant data interpretation, we set out to enhance probe annotation of the newest Infinium array, the HumanMethylation450 BeadChip (450k), with >485,000 probes covering 99% of Reference Sequence (RefSeq) genes (National Center for Biotechnology Information (NCBI), Bethesda, MD, USA). Annotation that was added or expanded on includes: 1) documented SNPs in the probe target, 2) probe binding specificity, 3) CpG classification of target sites and 4) gene feature classification of target sites. RESULTS: Probes with documented SNPs at the target CpG (4.3% of probes) were associated with increased within-tissue variation in DNAm. An example of a probe with a SNP at the target CpG demonstrated how sample genotype can confound the measurement of DNAm. Additionally, 8.6% of probes mapped to multiple locations in silico. Measurements from these non-specific probes likely represent a combination of DNAm from multiple genomic sites. The expanded biological annotation demonstrated that based on DNAm, grouping probes by an alternative high-density and intermediate-density CpG island classification provided a distinctive pattern of DNAm. Finally, variable enrichment for differentially methylated probes was noted across CpG classes and gene feature groups, dependant on the tissues that were compared. CONCLUSION: DNAm arrays offer a high-throughput approach for which careful consideration of probe content should be utilized to better understand the biological processes affected. Probes containing SNPs and non-specific probes may affect the assessment of DNAm using the 450k array. Additionally, probe classification by CpG enrichment classes and to a lesser extent gene feature groups resulted in distinct patterns of DNAm. Thus, we recommend that compromised probes be removed from analyses and that the genomic context of DNAm is considered in studies deciphering the biological meaning of Illumina 450k array data.

PMID: 23452981 [PubMed - as supplied by publisher]

Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray.

Related Articles

Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray.

Epigenetics. 2013 Jan 11;8(2)

Authors: Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R

Abstract

DNA methylation, an important type of epigenetic modification in humans, participates in crucial cellular processes, such as embryonic development, X-inactivation, genomic imprinting and chromosome stability. Several platforms have been developed to study genome-wide DNA methylation. Many investigators in the field have chosen the Illumina Infinium HumanMethylation microarrays for its ability to reliably assess DNA methylation following sodium bisulfite conversion. Here, we analyzed methylation profiles of 489 adult males and 357 adult females generated by the Infinium HumanMethylation450 microarray. Among the autosomal CpG sites that displayed significant methylation differences between the two sexes, we observed a significant enrichment of cross-reactive probes co-hybridizing to the sex chromosomes with more than 94% sequence identity. This could lead investigators to mistakenly infer the existence of significant autosomal sex-associated methylation. Using sequence identity cutoffs derived from the sex methylation analysis, we concluded that 6% of the array probes can potentially generate spurious signals because of co-hybridization to alternate genomic sequences that are highly homologous to the intended targets. Additionally, we discovered probes targeting polymorphic CpGs that overlapped SNPs. The methylation levels detected by these probes are simply the reflection of underlying genetic polymorphisms but could be misinterpreted as true signals. The existence of probes that are cross-reactive or target polymorphic CpGs in the Illumina HumanMethylation microarrays can confound data obtained from these microarrays. Therefore, investigators should exercise caution when significant biological associations are found using these array platforms. A list of all cross-reactive probes and polymorphic CpGs identified by us are annotated in this paper.

PMID: 23314698 [PubMed - as supplied by publisher]

Analyzing Illumina Gene Expression Microarray Data from Different Tissues: Methodological Aspects of Data Analysis in the MetaXpress Consortium.

Analyzing Illumina Gene Expression Microarray Data from Different Tissues: Methodological Aspects of Data Analysis in the MetaXpress Consortium.

PLoS One. 2012;7(12):e50938

Authors: Schurmann C, Heim K, Schillert A, Blankenberg S, Carstensen M, Dörr M, Endlich K, Felix SB, Gieger C, Grallert H, Herder C, Hoffmann W, Homuth G, Illig T, Kruppa J, Meitinger T, Müller C, Nauck M, Peters A, Rettig R, Roden M, Strauch K, Völker U, Völzke H, Wahl S, Wallaschofski H, Wild PS, Zeller T, Teumer A, Prokisch H, Ziegler A

Abstract

Microarray profiling of gene expression is widely applied in molecular biology and functional genomics. Experimental and technical variations make meta-analysis of different studies challenging. In a total of 3358 samples, all from German population-based cohorts, we investigated the effect of data preprocessing and the variability due to sample processing in whole blood cell and blood monocyte gene expression data, measured on the Illumina HumanHT-12 v3 BeadChip array.Gene expression signal intensities were similar after applying the log(2) or the variance-stabilizing transformation. In all cohorts, the first principal component (PC) explained more than 95% of the total variation. Technical factors substantially influenced signal intensity values, especially the Illumina chip assignment (33-48% of the variance), the RNA amplification batch (12-24%), the RNA isolation batch (16%), and the sample storage time, in particular the time between blood donation and RNA isolation for the whole blood cell samples (2-3%), and the time between RNA isolation and amplification for the monocyte samples (2%). White blood cell composition parameters were the strongest biological factors influencing the expression signal intensities in the whole blood cell samples (3%), followed by sex (1-2%) in both sample types. Known single nucleotide polymorphisms (SNPs) were located in 38% of the analyzed probe sequences and 4% of them included common SNPs (minor allele frequency >5%). Out of the tested SNPs, 1.4% significantly modified the probe-specific expression signals (Bonferroni corrected p-value<0.05), but in almost half of these events the signal intensities were even increased despite the occurrence of the mismatch. Thus, the vast majority of SNPs within probes had no significant effect on hybridization efficiency.In summary, adjustment for a few selected technical factors greatly improved reliability of gene expression analyses. Such adjustments are particularly required for meta-analyses.

PMID: 23236413 [PubMed - in process]

Generalization of the normal-exponential model: exploration of a more accurate parametrisation for the signal distribution on Illumina BeadArrays.

Generalization of the normal-exponential model: exploration of a more accurate parametrisation for the signal distribution on Illumina BeadArrays.

BMC Bioinformatics. 2012 Dec 11;13(1):329

Authors: Plancade S, Rozenholc Y, Lund E

Abstract

ABSTRACT: BACKGROUND: Illumina BeadArray technology includes non specific negative control features that allow a precise estimation of the background noise. As an alternative to the background subtraction proposed in BeadStudio which leads to an important loss of information by generating negative values, a background correction method modeling the observed intensities as the sum of the exponentially distributed signal and normally distributed noise has been developed. Nevertheless, Wang and Ye (2012) display a kernel-based estimator of the signal distribution on Illumina BeadArrays and suggest that a gamma distribution would represent a better modeling of the signal density. Hence, the normal-exponential modeling may not be appropriate for Illumina data and background corrections derived from this model may lead to wrong estimation. RESULTS: We propose a more flexible modeling based on a gamma distributed signal and a normal distributed background noise and develop the associated background correction, implemented in the R-package NormalGamma. Our model proves to be markedly more accurate to model Illumina BeadArrays: on the one hand, it is shown on two types of Illumina BeadChips that this model offers a more correct fit of the observed intensities. On the other hand, the comparison of the operating characteristics of several background correction procedures on spike-in and on normal-gamma simulated data shows high similarities, reinforcing the validation of the normal-gamma modeling. The performance of the background corrections based on the normal-gamma and normal-exponential models are compared on two dilution data sets, through testing procedures which represent various experimental designs. Surprisingly, we observe that the implementation of a more accurate parametrisation in the model-based background correction does not increase the sensitivity. These results may be explained by the operating characteristics of the estimators: the normal-gamma background correction offers an improvement in terms of bias, but at the cost of a loss in precision. CONCLUSIONS: This paper addresses the lack of fit of the usual normal-exponential model by proposing a more flexible parametrisation of the signal distribution as well as the associated background correction. This new model proves to be considerably more accurate for Illumina microarrays, but the improvement in terms of modeling does not lead to a higher sensitivity in differential analysis. Nevertheless, this realistic modeling makes way for future investigations, in particular to examine the characteristics of pre-processing strategies.

PMID: 23231059 [PubMed - as supplied by publisher]

Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis.

Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis.

BMC Med Genomics. 2012 Aug 21;5(1):35

Authors: Turnbull AK, Kitchen RR, Larionov AA, Renshaw L, Dixon JM, Sims AH

Abstract

ABSTRACT: BACKGROUND: Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately many of these studies are underpowered and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. RESULTS: Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets we evaluated several approaches to directly integrate intensity level expression data from the two platforms. After mapping probe sequences to Ensembl genes we demonstrate that, ComBat and cross platform normalisation (XPN), significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences from integrated datasets. CONCLUSION: Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.

PMID: 22909195 [PubMed - as supplied by publisher]

Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms.

Related Articles

Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms.

Nucleic Acids Res. 2005;33(18):5914-23

Authors: Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P

Abstract

The growth in popularity of RNA expression microarrays has been accompanied by concerns about the reliability of the data especially when comparing between different platforms. Here, we present an evaluation of the reproducibility of microarray results using two platforms, Affymetrix GeneChips and Illumina BeadArrays. The study design is based on a dilution series of two human tissues (blood and placenta), tested in duplicate on each platform. The results of a comparison between the platforms indicate very high agreement, particularly for genes which are predicted to be differentially expressed between the two tissues. Agreement was strongly correlated with the level of expression of a gene. Concordance was also improved when probes on the two platforms could be identified as being likely to target the same set of transcripts of a given gene. These results shed light on the causes or failures of agreement across microarray platforms. The set of probes we found to be most highly reproducible can be used by others to help increase confidence in analyses of other data sets using these platforms.

PMID: 16237126 [PubMed - indexed for MEDLINE]