Since its emergence a little more than a decade ago, we have witnessed how RNA-Seq technology has revolutionized our knowledge of the transcriptome. This methodology estimates gene expression from the number of sequenced RNA fragments that align with a given region of a reference genome or transcriptome.
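As a rough illustration of the counting idea, the sketch below tallies fragments per gene from coordinates that are assumed to be already aligned. The gene names, the coordinates, and the `count_fragments` helper are all hypothetical. Real pipelines also deal with strandedness, multi-mapping reads, and ambiguous overlaps, which this toy version ignores.

```python
# Hypothetical annotation: gene name -> (start, end) on a single reference sequence.
GENES = {"geneA": (100, 500), "geneB": (800, 1200)}

def count_fragments(fragments, genes=GENES):
    """Count aligned fragments that overlap each gene interval (inclusive coordinates)."""
    counts = {name: 0 for name in genes}
    for frag_start, frag_end in fragments:
        for name, (gene_start, gene_end) in genes.items():
            # Two intervals overlap when each one starts before the other ends.
            if frag_start <= gene_end and frag_end >= gene_start:
                counts[name] += 1
    return counts

# Made-up fragment alignments as (start, end) pairs.
fragments = [(120, 220), (450, 550), (900, 1000), (1500, 1600)]
print(count_fragments(fragments))
```

The resulting per-gene counts are the raw material that differential expression analysis compares across conditions.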
Differential expression analysis compares transcriptional activity between conditions of interest in order to identify which genes are responsible for the differences between them. It should be kept in mind that in this type of experiment there is noise inherent to the RNA-Seq methodology itself, resulting from the combination of biological variability and the variability introduced during sample processing and sequencing. Therefore, an appropriate experimental design is required to counteract this drawback and guarantee an accurate interpretation of the results.
In general, a larger number of replicates yields statistically more robust data, allowing differentially expressed genes to be identified and expression levels to be estimated more accurately. Therefore, the ideal would be to start with the largest possible number of samples per condition, so that any samples with inadequate results can be discarded. However, mainly due to economic and time constraints, it is not always easy to obtain replicates. Although the use of three replicates is often suggested, this is not applicable to all types of projects.
- If you want to compare expression differences between different groups, you need a minimum of 3 samples per condition, as long as you expect little variation between them, as is often the case with samples from bacterial cultures. In contrast, if they are of human origin, the variation between them is greater and it is recommended to use at least 6 biological replicates to increase statistical power.
- If, on the other hand, the aim is to identify biologically significant differentially expressed genes, or to characterise transcripts and/or splicing variants, it is preferable to increase the number of samples to 12 per condition. This ensures a sufficient number to obtain quality results even if some have to be eliminated.
- If you are studying genes with very low expression or want to compare differential expression at the isoform level, in addition to having a minimum of 6 replicates, it is necessary to increase the sequencing depth.
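The guidelines above can be condensed into a small lookup, shown here as a minimal sketch. The function name, the goal labels, and the return format are hypothetical; only the replicate numbers come from the text, and they are starting points rather than fixed rules.

```python
def recommended_replicates(goal, high_variability=False):
    """Return (minimum replicates per condition, whether to increase sequencing depth).

    `goal` is one of three hypothetical labels:
      "group_comparison"              - compare expression between groups
      "significant_genes_or_splicing" - biologically significant DE genes, transcripts, splicing variants
      "low_expression_or_isoform"     - very lowly expressed genes or isoform-level comparison
    """
    if goal == "group_comparison":
        # 3 for low-variability samples (e.g. bacterial cultures), 6 for e.g. human samples.
        return (6 if high_variability else 3, False)
    if goal == "significant_genes_or_splicing":
        return (12, False)
    if goal == "low_expression_or_isoform":
        # At least 6 replicates, plus greater sequencing depth.
        return (6, True)
    raise ValueError(f"unknown goal: {goal!r}")

print(recommended_replicates("group_comparison", high_variability=True))
```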
In summary, although the decision on how many replicates to use is ultimately up to the researcher, increasing the number always leads to an increase in sensitivity and specificity, and ultimately in the quality of the results.
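To make the effect of replicate number on estimation accuracy more tangible, the following sketch simulates many experiments and measures how much the estimated mean expression of a single gene varies from one experiment to the next. It assumes normally distributed expression values, which is a simplification (RNA-Seq counts are usually modelled with overdispersed count distributions). The function name and all parameter values are illustrative.

```python
import random
import statistics

def spread_of_mean(n_replicates, true_mean=100.0, sd=20.0, n_experiments=2000, seed=0):
    """Standard deviation of the estimated mean across simulated experiments."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = [
        statistics.fmean(rng.gauss(true_mean, sd) for _ in range(n_replicates))
        for _ in range(n_experiments)
    ]
    return statistics.stdev(means)

# The spread should shrink roughly as sd / sqrt(n_replicates).
for n in (3, 6, 12):
    print(n, round(spread_of_mean(n), 2))
```

Under these assumptions, going from 3 to 12 replicates roughly halves the uncertainty of the estimate, since the spread scales with the inverse square root of the number of replicates.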
- Lamarre S. et al. (2018) Optimization of an RNA-Seq Differential Gene Expression Analysis Depending on Biological Replicate Number and Library Size. Front Plant Sci.
- Schurch NJ. et al. (2016) How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA.
- Gierliński M. et al. (2015) Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment. Bioinformatics.
- Wang Z. et al. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet.