A Cache-based Data Movement Infrastructure for On-demand Scientific Cloud Computing

  • David Abramson¹
  • Jake Carroll¹
  • Chao Jin¹
  • Michael Mallon¹
  • Zane van Iperen¹
  • Hoang Nguyen¹
  • Allan McRae¹
  • Liang Ming²
  1. The University of Queensland, St Lucia, QLD 4072, Australia
  2. Huawei Technologies Co., Ltd., Shenzhen, China

2019

Asian Conference on Supercomputing Frontiers 2019

https://doi.org/10.1007/978-3-030-18645-6_3

Abstract

As cloud computing has become the de facto standard for big data processing, there is interest in using a multi-cloud environment that combines public cloud resources with private on-premise infrastructure. However, decentralizing the infrastructure requires a uniform storage solution that moves data between the different clouds to support on-demand computing. This paper presents such a solution based on our earlier work, the MeDiCI (Metropolitan Data Caching Infrastructure) architecture. Specifically, we extend MeDiCI to simplify the movement of data between different clouds and a centralized storage site. It uses a hierarchical caching system and supports the most popular infrastructure-as-a-service (IaaS) interfaces, including Amazon AWS and OpenStack. As a result, our system allows existing parallel data-intensive applications to be offloaded into IaaS clouds directly. The solution is illustrated using a large bioinformatics application, a Genome Wide Association Study (GWAS), with Amazon AWS, HUAWEI Cloud, and a private centralized storage system. The system is evaluated on Amazon AWS and the Australian national cloud.
