Article Abstract
Nie Xuejun (聂雪军), Qin Leihua, Zhou Jingli. [J]. High Technology Letters (English edition), 2012, 18(1): 45-50
A content aware chunking scheme for data de-duplication in archival storage systems
  
Keywords: data de-duplication, content aware chunking (CAC), candidate anchor histogram (CAH)
Author Name    Affiliation
Nie Xuejun(聂雪军)  
Qin Leihua  
Zhou Jingli  
Abstract:
      Based on variable-sized chunking, this paper proposes a content aware chunking scheme, called CAC, that does not assume fully random file contents but instead considers the characteristics of each file type. CAC uses a candidate anchor histogram (CAH) and file-type-specific knowledge to refine how anchors are determined during de-duplication of file data, and it enforces the selected average chunk size. CAC finds more chunks, which in turn yields a smaller average chunk size and a better reduction in data. We present a detailed evaluation of CAC; the experimental results show that, for file types whose bytes are not randomly distributed, the scheme improves the compression ratio of chunking by 11.3% to 16.7% depending on the dataset, and improves write throughput by 9.7% on average.
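      The abstract does not spell out how CAC selects anchors, so the following is only a minimal sketch of the baseline variable-sized (content-defined) chunking that CAC builds on, not the authors' method. The rolling hash, the window width, the anchor mask, and the size limits are all assumptions chosen for illustration; the candidate anchor histogram and file-type-specific refinement are not reproduced here.

```python
import hashlib
import random
from typing import List, Set

# All parameters below are illustrative assumptions, not values from the paper.
WINDOW = 16           # bytes covered by the rolling hash
MASK = (1 << 6) - 1   # a position is a candidate anchor when the low 6 hash bits are 0
MIN_SIZE = 32         # never cut a chunk shorter than this (except the final one)
MAX_SIZE = 256        # always cut once a chunk reaches this size
BASE = 257
MOD = (1 << 31) - 1


def chunk(data: bytes) -> List[bytes]:
    """Split data at content-defined anchors found with a polynomial rolling hash."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # add the newest byte
        size = i + 1 - start
        at_anchor = i + 1 >= WINDOW and (h & MASK) == 0
        if (at_anchor and size >= MIN_SIZE) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def fingerprints(data: bytes) -> Set[str]:
    """SHA-1 fingerprint of every chunk; equal fingerprints mark duplicate chunks."""
    return {hashlib.sha1(c).hexdigest() for c in chunk(data)}


# Usage: insert a few bytes into a file and count the chunks that still match.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
edited = original[:100] + b"HELLO" + original[100:]
shared = fingerprints(original) & fingerprints(edited)
print(len(shared), "of", len(fingerprints(original)), "chunk fingerprints survive the edit")
```

      Because each anchor depends only on the bytes inside the hash window, an insertion shifts later chunk boundaries along with the content, so most chunks keep their fingerprints. This sketch uses uniformly random input, which is exactly the assumption the abstract says CAC drops for file types whose bytes are not randomly distributed.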