TY - JOUR
T1 - PEFS: AI-driven Prediction based Energy-aware Fault-tolerant Scheduling Scheme for Cloud Data Center
AU - Marahatta, Avinab
AU - Xin, Qin
AU - Chi, Ce
AU - Zhang, Fa
AU - Liu, Zhiyong
PY - 2020/8/11
Y1 - 2020/8/11
N2 - Cloud data centers (CDCs) have become increasingly widespread in recent years with the growing popularity of cloud computing and high-performance computing. Due to the multi-step computation of data streams and heterogeneous task dependencies, task failures occur frequently, resulting in poor user experience and additional energy consumption. To reduce both task execution failures and energy consumption, we propose a novel AI-driven, energy-aware, proactive fault-tolerant scheduling scheme for CDCs. First, a machine-learning-based prediction model is trained to classify arriving tasks as "failure-prone" or "non-failure-prone" according to their predicted failure rate. Then, two efficient scheduling mechanisms are proposed to allocate the two types of tasks to the most appropriate hosts in a CDC. A vector reconstruction method is developed to construct super tasks from failure-prone tasks, and these super tasks and the non-failure-prone tasks are scheduled separately to the most suitable physical hosts. All tasks are scheduled in an earliest-deadline-first manner. Our evaluation results show that the proposed scheme can intelligently predict task failure, achieves better fault tolerance, and reduces total energy consumption more than existing schemes.
U2 - 10.1109/TSUSC.2020.3015559
DO - 10.1109/TSUSC.2020.3015559
M3 - Article
SP - 106152
EP - 106168
JO - IEEE Transactions on Sustainable Computing
JF - IEEE Transactions on Sustainable Computing
ER -