An “idealized” dataset of RNA-seq reads was created using the polyester R package (Frazee et al., 2015). This simulated dataset is free of technical biases, and the fold changes between isoforms are known, which allows the sensitivity limits of the method to be tested in the absence of external factors. The simulation experiment comprises 20 samples, 10 in each of two conditions.

For the “main” dataset, polyadenylation sites splitting each transcript into two isoforms (short and long) were obtained from the poly(A) site atlas (Gruber et al., 2016) for 11000 human transcripts. Each isoform (“short” and “short + long”) was simulated as a different transcript. The expression of the “short + long” isoform was unchanged between conditions, whereas eleven different fold changes were applied between conditions to the “short” isoform in order to produce a range of different ratios, R. Hence, each fold change is represented by approximately 1000 transcripts in the dataset. Additionally, within each fold change category we assigned 100 different mean expression levels (from 100 to 1000) in order to assess the effect of expression level on the ability of the method to detect alternative polyadenylation events.

For the “biased” dataset, the aim was to create a scenario where fold changes between two conditions are confounded by the presence of an additional factor. In this specific example, we created an imbalanced dataset of 1000 transcripts in which male-origin and female-origin samples are present in unequal numbers in the control group (7 males and 3 females) and the condition group (3 males and 7 females). Although group membership for the factor of interest (condition) plays no role in the choice of polyadenylation site for these transcripts, membership in the male or female group does, which confounds the outcome of methods that do not take additional covariates into account.
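The layout of the two simulated datasets can be sketched as a small design table. The Python snippet below is only an illustration of the counts stated above, not the simulation code itself (the reads were generated with polyester). Several details are assumptions rather than facts from the text: the mean expression levels are taken to be evenly spaced between 100 and 1000, transcripts are assigned to design cells in round-robin order, fold change categories are referred to by index because their values are not listed here, and all identifiers and field names (`tx00000`, `fc_category`, `sample_sheet`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import product

N_TRANSCRIPTS = 11000   # transcripts in the "main" dataset
N_FOLD_CHANGES = 11     # fold change categories, indexed 0..10 (values not given)
N_EXPR_LEVELS = 100     # mean expression levels per fold change category
EXPR_MIN, EXPR_MAX = 100, 1000

# Assumption: the 100 levels are evenly spaced between 100 and 1000.
step = (EXPR_MAX - EXPR_MIN) / (N_EXPR_LEVELS - 1)
expr_levels = [EXPR_MIN + i * step for i in range(N_EXPR_LEVELS)]

# Every (fold change category, mean expression) combination is one design cell.
cells = list(product(range(N_FOLD_CHANGES), expr_levels))

# Assumption: transcripts are spread over the cells in round-robin order.
main_design = []
for i in range(N_TRANSCRIPTS):
    fc_category, mean_expr = cells[i % len(cells)]
    main_design.append(
        {"transcript": f"tx{i:05d}", "fc_category": fc_category, "mean_expr": mean_expr}
    )

# Number of transcripts that fall into each fold change category.
per_fc = Counter(row["fc_category"] for row in main_design)


def sample_sheet(layout):
    """Build one row per sample from a {(group, sex): count} layout."""
    rows = []
    for (group, sex), count in layout.items():
        for _ in range(count):
            rows.append({"sample": f"s{len(rows) + 1:02d}", "group": group, "sex": sex})
    return rows


# "Biased" dataset: sex is unevenly distributed across the two groups.
biased_samples = sample_sheet(
    {
        ("control", "male"): 7,
        ("control", "female"): 3,
        ("condition", "male"): 3,
        ("condition", "female"): 7,
    }
)
sex_by_group = Counter((row["group"], row["sex"]) for row in biased_samples)

if __name__ == "__main__":
    print(len(cells), "design cells;", dict(per_fc))
    print(dict(sex_by_group))
```

Under these assumptions the grid has 11 × 100 = 1100 cells, so each cell holds 10 transcripts and each fold change category holds 1000, matching the figure quoted above. The sample sheet makes the confounding explicit: any comparison of control against condition that ignores sex is also, in part, a comparison of mostly male against mostly female samples.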