However, the statistical analysis on which SPT is based can require many runs of the application. This may be undesirable for long-running applications. For example, for a program that runs for several hours, it may not be feasible to execute the program 200 times to collect enough data for the analysis. In addition, large programs may have proportionately more points that the user wishes to analyze, requiring additional runs of the program. Production parallel programs are often both large and long running, compounding the problem. High-performance parallel machines may also be in high demand, making a large number of runs even less desirable.
SPT has several advantages that mitigate some of the above concerns (e.g., because SPT does not change the computed results, the experimental runs can be actual production runs producing useful results). However, minimizing the number of runs is a central concern for performance tuning systems based on SPT. The proposed work will augment SPT techniques via analysis of parallel programs to:
DPM techniques will be used to
develop pre-processors and post-processors that ``wrap''
around the SPT ``engine''.
The pre-processing tools will identify potentially important factors by
statically (and dynamically) analyzing program structure.
The synchronization structure of the program will also be analyzed
to determine which synchronizations are likely to play an important role in
determining overall execution time. The idle times incurred at
each of these synchronizations are additional responses that can be
analyzed by SPT.
Post-processing tools will be used to interpret the results of SPT
experiments and to automatically direct future experimentation.
The additional knowledge to be gained from analysis of program structure
and behavior has the potential to substantially augment and enhance
S-Check and SPT analysis.
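
As a minimal sketch of the idle-time response mentioned above, the following Python fragment computes the total idle time incurred at each barrier-style synchronization from per-process arrival times. The function name, the barrier names, and the arrival times are all hypothetical and chosen for illustration only. The sketch assumes that every process waits until the last process arrives; it is not the proposed pre-processing tool, only one way such a response could be derived from trace data.

```python
from typing import Dict, List


def barrier_idle_times(arrivals: Dict[str, List[float]]) -> Dict[str, float]:
    """Return the total idle time per barrier.

    Assumption: a barrier releases all processes when the last one
    arrives, so each process idles for (latest arrival - its own arrival).
    """
    idle = {}
    for name, times in arrivals.items():
        release = max(times)  # time at which the last process arrives
        idle[name] = sum(release - t for t in times)
    return idle


# Hypothetical arrival times (in seconds) of four processes at two barriers.
arrivals = {
    "barrier_A": [1.0, 1.5, 1.25, 3.0],
    "barrier_B": [5.0, 5.0, 5.25, 5.0],
}

idle = barrier_idle_times(arrivals)

# Rank synchronizations by total idle time, largest first.
ranked = sorted(idle, key=idle.get, reverse=True)
```

Under these assumptions, the per-synchronization totals in `idle` could be recorded on every run as additional responses, and the ordering in `ranked` suggests which synchronizations deserve attention first.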