The execution of Spark applications is based on the execution order and parallelism of the different jobs, given data and available resources. Spark reifies these dependencies in a graph that we refer to as the (parallel) execution plan of the application. All the approaches that have studied the estimation of the execution times and the dynamic provisioning of resources for this kind of applications have always assumed that the execution plan is unique, given the computing resources at hand. This assumption is at least simplistic for applications that include conditional branches or loops and limits the precision of the prediction techniques. This paper introduces SEEPEP, a novel technique based on symbolic execution and search-based test generation, that: i) automatically extracts the possible execution plans of a Spark application, along with dedicated launchers with properly synthesized data that can be used for profiling, and ii) tunes the allocation of resources at runtime based on the knowledge of the execution plans for which the path conditions hold. The assessment we carried out shows that SEEPEP can effectively complement dynaSpark, an extension of Spark with dynamic resource provisioning capabilities, to help predict the execution duration and the allocation of resources.
Baresi, L., Denaro, G., Quattrocchi, G. (2019). Symbolic execution-driven extraction of the parallel execution plans of Spark applications. In ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp.246-256). 1515 BROADWAY, NEW YORK, NY 10036-9998 USA : Association for Computing Machinery, Inc [10.1145/3338906.3338973].
Symbolic execution-driven extraction of the parallel execution plans of Spark applications
Denaro G.Membro del Collaboration Group
;
2019
Abstract
The execution of Spark applications is based on the execution order and parallelism of the different jobs, given data and available resources. Spark reifies these dependencies in a graph that we refer to as the (parallel) execution plan of the application. All the approaches that have studied the estimation of the execution times and the dynamic provisioning of resources for this kind of applications have always assumed that the execution plan is unique, given the computing resources at hand. This assumption is at least simplistic for applications that include conditional branches or loops and limits the precision of the prediction techniques. This paper introduces SEEPEP, a novel technique based on symbolic execution and search-based test generation, that: i) automatically extracts the possible execution plans of a Spark application, along with dedicated launchers with properly synthesized data that can be used for profiling, and ii) tunes the allocation of resources at runtime based on the knowledge of the execution plans for which the path conditions hold. The assessment we carried out shows that SEEPEP can effectively complement dynaSpark, an extension of Spark with dynamic resource provisioning capabilities, to help predict the execution duration and the allocation of resources.File | Dimensione | Formato | |
---|---|---|---|
3338906.3338973.pdf
Solo gestori archivio
Tipologia di allegato:
Publisher’s Version (Version of Record, VoR)
Dimensione
1.11 MB
Formato
Adobe PDF
|
1.11 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.