Tetrium
tags: #geo-distributed #article
source: Wide-Area Analytics with Multiple Resources [link]
- Problem
- Past approaches:
- Place map tasks at the sites that hold the input data, and place reduce tasks there as well to minimize shuffle time.
- This assumes every site has unlimited resources to run the job, which is wrong.
- In practice only a fraction of the reduce tasks execute at a time, because compute resources are constrained.
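A rough way to see why limited compute slots matter: when a site has fewer slots than reduce tasks, the tasks run in waves. The sketch below is not from the paper. The function name `stage_time`, the task counts, slot counts, and task duration are all hypothetical, and it deliberately ignores shuffle (network) time, which is the other half of the trade-off.

```python
import math


def stage_time(num_tasks: int, slots: int, task_seconds: float) -> float:
    """Time to run equal-length tasks on a fixed number of slots, wave by wave."""
    if num_tasks == 0:
        return 0.0
    waves = math.ceil(num_tasks / slots)
    return waves * task_seconds


# Hypothetical job: 100 reduce tasks of 10 s each.
# "Infinite resources" assumption: every task gets a slot, so one wave.
unlimited = stage_time(100, slots=100, task_seconds=10.0)

# All reduce tasks placed at the data-heavy site, which has only 20 slots.
limited = stage_time(100, slots=20, task_seconds=10.0)

# Same tasks spread over two sites (20 and 30 slots); the stage ends
# when the slower site finishes. Shuffle cost is NOT modelled here.
spread = max(
    stage_time(40, slots=20, task_seconds=10.0),
    stage_time(60, slots=30, task_seconds=10.0),
)

print(unlimited, limited, spread)
```

With these made-up numbers, the single-site placement needs five waves while the spread placement needs two per site, which is the gap the "infinite resources" assumption hides.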
- Approach/Intuition
- Jointly allocate multiple resources (compute and network slots) to analytics jobs made of parallel tasks, on geo-distributed systems where resource availability varies.
- Past approaches: