Towards Seamless Configuration Tuning of Big Data Analytics


The execution of distributed data processing workloads (such as those running on top of Hadoop or Spark) in cloud environments presents a unique opportunity to explore multiple trade-offs between elasticity (and types of resources being allocated), overall runtime and total costs. However, beyond high-level constraints and objectives, it’s not the end-users who should be mainly concerned with those optimizations, but the cloud providers. They have both the vantage point to collect actionable information, economies of scale and position to adjust parameters when dynamic conditions change, in order to fulfil SLOs that go beyond classic measures of latency and throughput. This is at odds with the existing approach of making software (including the interfaces to the cloud and the processing frameworks) as configurable as possible. We propose that rather than configurability, self-tunability (or the illusion of it as far as the end-user is concerned) is a better long-term goal.

In International Conference on Distributed Computing Systems (ICDCS 2019), IEEE.