Home
Geo Distributed Analytics Architecture
tags: #geo-distributed
- Architecture of geo-distributed-systems
- There is a central master where queries are submitted; this is for SparkSQL, HiveQL, Pig Latin, etc
- Query Optimizer at master is responsible for preparing Query Execution plan.
- There is also a centralized scheduler which places tasks in Nodes
- placement is based on resource availability and task dependencies (upstream/ downstream, etc)