
problems and approaches (geo-distributed systems)

tags: #geo-distributed

  • Problems and Approaches in [[geo-distributed-systems]]
    • low response times (latency) for analytics queries issued across all geo-distributed sites
      • The most common approach is to load the relevant data into a single datacenter and execute the query there centrally.
        • Not ideal, as this is slow and we might need (near) real-time analysis
      • Latency depends heavily on the query execution plan
        • But conventional query optimizers are not aware of WAN conditions when choosing a plan
      • There can also be bottlenecks at runtime
        • Lube - a framework that detects and minimizes bottlenecks at runtime
      • Data transfer issues across sites & network performance bottlenecks
        • running queries over geo-distributed inputs using the current intra-DC analytics frameworks
          • also leads to high latency because these frameworks cannot cope with the relatively low and variable capacity of WAN links
          • It is also expensive to transfer all data to a single site
            • Iridium - WAN-aware input data and task placement for two-stage MapReduce jobs (see the placement sketch after this list)
            • Clarinet - a WAN-aware query optimizer that addresses this (see the plan-selection sketch after this list)
        • Running parallel jobs across geo-distributed sites
      • Network transfer cost also needs to be considered
        • Kimchi - a network-cost-aware geo-distributed analytics system (see the cost sketch after this list)
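
A minimal sketch of the WAN-aware placement idea behind systems like Iridium: given how much input data sits at each site and each site's WAN bandwidth, pick the execution site that minimizes the slowest transfer. The site names, data sizes, and bandwidths are made-up assumptions, and the real system jointly places data and tasks rather than just choosing one destination site.

```python
# Toy WAN-aware site selection: run the query at the site that minimizes the
# bottleneck (slowest) WAN transfer of the remote inputs.
# All sites, sizes, and bandwidths below are illustrative placeholders.

SITES = {
    # site: input data held there (GB) and its WAN uplink/downlink bandwidth (Gbps)
    "us-east":  {"data_gb": 120, "uplink_gbps": 1.0, "downlink_gbps": 2.0},
    "eu-west":  {"data_gb": 300, "uplink_gbps": 0.5, "downlink_gbps": 1.0},
    "ap-south": {"data_gb": 80,  "uplink_gbps": 0.8, "downlink_gbps": 1.5},
}

def transfer_time_s(gb: float, gbps: float) -> float:
    """Seconds to move `gb` gigabytes over a `gbps` link (8 bits per byte)."""
    return gb * 8 / gbps

def bottleneck_time(dest: str) -> float:
    """Slowest WAN transfer if all remote inputs are shipped to `dest`."""
    uploads = [
        transfer_time_s(s["data_gb"], s["uplink_gbps"])
        for name, s in SITES.items() if name != dest
    ]
    remote_gb = sum(s["data_gb"] for name, s in SITES.items() if name != dest)
    download = transfer_time_s(remote_gb, SITES[dest]["downlink_gbps"])
    return max(uploads + [download])

best = min(SITES, key=bottleneck_time)
print(f"run query at {best}: bottleneck transfer ~{bottleneck_time(best):.0f}s")
```

The "load everything into one datacenter" approach above corresponds to fixing `dest` in advance; the WAN-aware version searches over destinations, and real systems go further by splitting individual tasks across sites.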
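
A similar sketch of WAN-aware plan selection in the spirit of Clarinet: two candidate plans ship different amounts of data over different inter-site links, and the plan with fewer total bytes is not necessarily the faster one once heterogeneous link bandwidths are taken into account. The plans, byte counts, and bandwidths are invented for illustration.

```python
# Toy WAN-aware plan selection: a WAN-oblivious optimizer might pick the plan
# that shuffles the fewest bytes; a WAN-aware one weighs bytes by the links
# they cross. All numbers are invented.

LINK_GBPS = {("eu-west", "us-east"): 1.0, ("ap-south", "us-east"): 0.2}

CANDIDATE_PLANS = {
    # plan name -> GB shipped over each inter-site link
    "join_order_A": {("eu-west", "us-east"): 40, ("ap-south", "us-east"): 30},
    "join_order_B": {("eu-west", "us-east"): 90, ("ap-south", "us-east"): 5},
}

def wan_time_s(plan: dict) -> float:
    """Transfers on different links proceed in parallel; the slowest dominates."""
    return max(gb * 8 / LINK_GBPS[link] for link, gb in plan.items())

def total_gb(plan: dict) -> float:
    return sum(plan.values())

fewest_bytes = min(CANDIDATE_PLANS, key=lambda p: total_gb(CANDIDATE_PLANS[p]))
fastest_wan = min(CANDIDATE_PLANS, key=lambda p: wan_time_s(CANDIDATE_PLANS[p]))
print("fewest bytes:", fewest_bytes, "| fastest over WAN:", fastest_wan)
```

With these numbers, join_order_A ships fewer bytes overall but is slower because most of them cross the 0.2 Gbps link, which is the kind of gap a WAN-aware optimizer exploits.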
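
Finally, a sketch of the network-cost dimension that Kimchi targets: weighing transfers by per-GB prices, which differ by source site, rather than only by time. The prices and data sizes are placeholders, not real cloud pricing.

```python
# Toy cost-aware site selection: pull remote inputs into the site that
# minimizes total WAN transfer cost. Prices and sizes are placeholders.

DATA_GB = {"us-east": 120, "eu-west": 300, "ap-south": 80}

# assumed egress price ($ per GB) for data leaving each site
EGRESS_PER_GB = {"us-east": 0.02, "eu-west": 0.05, "ap-south": 0.09}

def transfer_cost(dest: str) -> float:
    """Dollar cost of shipping all remote inputs to `dest`."""
    return sum(DATA_GB[src] * EGRESS_PER_GB[src] for src in DATA_GB if src != dest)

cheapest = min(DATA_GB, key=transfer_cost)
print(f"cheapest execution site: {cheapest} (${transfer_cost(cheapest):.2f})")
```

A real system would trade this cost off against the latency estimates above rather than optimizing either in isolation.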