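Nie Xuejun (聂雪军), Qin Leihua, Zhou Jingli. A content aware chunking scheme for data de-duplication in archival storage systems[J]. High Technology Letters (English edition), 2012, 18(1): 45-50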
A content aware chunking scheme for data de-duplication in archival storage systems
Keywords: data de-duplication, content aware chunking (CAC), candidate anchor histogram (CAH)
Authors: Nie Xuejun (聂雪军), Qin Leihua, Zhou Jingli
Abstract:
Building on variable-sized chunking, this paper proposes a content aware chunking scheme, called CAC, that does not assume fully random file contents but instead considers the characteristics of each file type. CAC uses a candidate anchor histogram (CAH) and file-type-specific knowledge to refine how anchors are determined during de-duplication of file data, and it enforces the selected average chunk size. CAC finds more chunks, which in turn yields a smaller average chunk size and better data reduction. A detailed evaluation shows that the scheme improves the compression ratio of chunking for file types whose bytes are not randomly distributed (from 11.3% to 16.7%, depending on the dataset) and improves write throughput by 9.7% on average.
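The abstract does not spell out how the candidate anchor histogram is built or used, so the following Python sketch is only one possible reading, not the authors' method. It assumes that anchors are positions where the low bits of a rolling hash are zero, that the histogram counts how many candidate anchors each mask width would produce for a given file, and that the mask width is then picked so the resulting average chunk size lands near a chosen target. The names `window_hashes`, `candidate_anchor_histogram`, `pick_mask_bits` and `chunk`, the polynomial rolling hash, and all constants are hypothetical.

```python
WINDOW = 16  # bytes covered by the rolling hash (assumed value)


def window_hashes(data: bytes, window: int = WINDOW):
    """Yield (end_offset, hash) for every full window, using a polynomial rolling hash."""
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte that leaves the window
    h = 0
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + b) % mod                    # shift in the new byte
        if i >= window - 1:
            yield i + 1, h


def candidate_anchor_histogram(data: bytes, max_bits: int = 16):
    """hist[k] = number of positions whose hash has its k lowest bits equal to zero."""
    hist = [0] * (max_bits + 1)
    for _, h in window_hashes(data):
        for k in range(max_bits + 1):
            if h & ((1 << k) - 1):
                break
            hist[k] += 1
    return hist


def pick_mask_bits(data: bytes, target_avg: int, max_bits: int = 16) -> int:
    """Pick the mask width whose anchor count gives an average chunk size nearest the target."""
    hist = candidate_anchor_histogram(data, max_bits)
    return min(range(1, max_bits + 1),
               key=lambda k: abs(len(data) / (hist[k] + 1) - target_avg))


def chunk(data: bytes, bits: int):
    """Cut data at every anchor, i.e. wherever the low `bits` bits of the hash are zero."""
    mask = (1 << bits) - 1
    chunks, start = [], 0
    for end, h in window_hashes(data):
        if h & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last anchor
    return chunks
```

A caller would run `pick_mask_bits(data, target_avg)` once per file (or per file type) and pass the result to `chunk`. With a fixed mask width, data whose bytes are not randomly distributed can produce far fewer or far more anchors than expected; choosing the width from the per-file histogram is what keeps the average chunk size near the target in this sketch.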