aHDFS: An Erasure-Coded Data Archival System for Hadoop Clusters (Nov-2017)
The main aim of this project is to propose an erasure-coded data archival system, called aHDFS, for Hadoop clusters, in which erasure codes are employed to archive data replicas in the Hadoop Distributed File System (HDFS).
To tackle the problem of growing storage cost, we develop an erasure-coded data archival system called aHDFS, which archives rarely accessed data in large-scale data centers to minimize storage cost. One way to reduce storage cost is to convert a 3X replica-based storage system into an erasure-coded one. It makes sense to maintain 3X replicas for frequently accessed data; importantly, however, managing non-popular data with erasure-coded schemes saves storage capacity without imposing an adverse performance penalty. A significant portion of the data in data centers is considered non-popular, because access frequency inevitably declines over time. Evidence shows that most data are accessed only within a short period early in their lifetime.
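The storage savings behind this conversion can be illustrated with a short sketch (not taken from the project itself). It compares the raw-bytes-per-logical-byte overhead of 3X replication against a Reed-Solomon (k, m) erasure code, using RS(6, 3) as an example policy; the function names and parameter choice here are illustrative assumptions:

```python
# Illustrative sketch: storage overhead of n-way replication vs. an
# RS(k, m) erasure code. Function names are hypothetical, not from aHDFS.

def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per logical byte under n-way replication."""
    return float(replicas)

def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte under RS(k, m):
    each stripe holds k data blocks plus m parity blocks."""
    return (k + m) / k

# 3X replication stores 3.0 raw bytes per logical byte,
# whereas RS(6, 3) stores only 1.5 -- a 2x capacity saving.
print(replication_overhead(3))  # 3.0
print(erasure_overhead(6, 3))   # 1.5
```

Both schemes here tolerate the loss of any three blocks of a stripe's worth of data, which is why converting cold replicas to erasure-coded form trades little durability for substantial capacity savings.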
The following three factors motivate us to develop the erasure-code-based archival system for Hadoop clusters:
o A pressing need to lower storage cost,
o The high cost-effectiveness of erasure-coded storage, and
o The popularity of Hadoop computing platforms.