Speedup-Test: Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques
- Sid-Ahmed-Ali Touati, University of Versailles Saint-Quentin-en-Yvelines
- Grigori Fursin, INRIA-Saclay
1. General introduction
2. Commonly observed non-rigorous experimental methodologies
3. Different kinds of observed speedups
4. Checking the statistical significance of the speedup of the average execution time
5. Checking the statistical significance of the speedup of the median execution time
6. Proportion of accelerated benchmarks
7. Conclusion on the Speedup-Test protocol
8. Tool presentation
9. Using collaborative research tools with common API and collective optimization database to improve the quality and reproducibility of research (with demo)
Description
Code optimisation methods are usually evaluated by observing the initial and the optimised execution times several times in order to declare a speedup. Even with a fixed input and execution environment, program execution times generally vary, especially for toy/kernel benchmarks. With the introduction of multi-core architectures, execution times are becoming increasingly unstable. Hence different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many speedups published in the literature are observations from a set of experiments that do not guarantee reproducibility. In order to improve the reproducibility of experimental results, this tutorial presents a rigorous statistical methodology for program performance analysis. We rely on well-known statistical tests (the Shapiro-Wilk test, Fisher's F-test, Student's t-test, the Kolmogorov-Smirnov test, the Wilcoxon-Mann-Whitney test) to study whether the observed speedups are statistically significant. By fixing a desired risk level $0<\alpha<1$, we are able to analyse the statistical significance of the speedup of the average execution time as well as of the median. We can also check whether P(X>Y)>1/2, where P(X>Y) is the probability that an individual execution of the optimised code is faster than an individual execution of the initial code. Our methodology is a consistent improvement over the usual performance analysis methods in high performance computing, such as those in \cite{Jain:1991:ACS,lilja:book}. For each situation, we explain which hypotheses must be checked in order to declare a correct risk level for the statistics. The Speedup-Test protocol, which certifies the observed speedups with rigorous statistics, is implemented and will be distributed for the tutorial as an open source tool based on the R software.
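As an illustration of the quantities named above, the following is a minimal sketch, not the Speedup-Test tool itself (which is based on R). It assumes that X and Y are lists of measured execution times of the initial and the optimised code. It computes the observed speedups of the mean, median, and minimum, estimates P(X>Y) by counting pairs, and derives a one-sided Wilcoxon-Mann-Whitney p-value with a plain normal approximation (no tie or continuity correction, which is a simplification). The function names and the sample timings are hypothetical.

```python
import math
from statistics import mean, median


def observed_speedups(initial, optimised):
    """Observed speedups as ratios initial/optimised (no significance claim)."""
    return {
        "mean": mean(initial) / mean(optimised),
        "median": median(initial) / median(optimised),
        "min": min(initial) / min(optimised),
    }


def prob_x_greater_y(initial, optimised):
    """Estimate P(X > Y) over all pairs (x, y); a tie counts for one half."""
    wins = 0.0
    for x in initial:
        for y in optimised:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(initial) * len(optimised))


def mann_whitney_p_value(initial, optimised):
    """One-sided p-value for the alternative P(X > Y) > 1/2.

    Assumption: normal approximation of the U statistic, without tie or
    continuity correction; only reasonable for moderately large samples.
    """
    n, m = len(initial), len(optimised)
    u = prob_x_greater_y(initial, optimised) * n * m  # U statistic
    mu = n * m / 2.0
    sigma = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)


# Hypothetical execution times in seconds, for illustration only.
initial_times = [10.2, 10.5, 9.9, 10.8, 10.1, 10.4, 10.0, 10.6]
optimised_times = [9.1, 9.4, 8.9, 9.6, 9.0, 9.3, 9.2, 9.5]
alpha = 0.05  # desired risk level, 0 < alpha < 1

print(observed_speedups(initial_times, optimised_times))
print("estimated P(X>Y):", prob_x_greater_y(initial_times, optimised_times))
p = mann_whitney_p_value(initial_times, optimised_times)
print("significant at risk level alpha:", p < alpha)
```

A complete analysis following the protocol would first check the hypotheses of each test (for example normality with the Shapiro-Wilk test before applying Student's t-test), which this sketch does not do.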
Links: