Thursday, September 17, 2009

The Efficient Storage of Text Documents in Digital Libraries, by Przemysław Skibiński et al.

In this paper we investigate the possibility of improving the efficiency of data compression, and thus reducing storage requirements, for seven widely used text document formats. We propose an open-source text compression software library, featuring an advanced word-substitution scheme with static and semidynamic word dictionaries. The empirical results show an average storage space reduction as high as 78 percent compared to uncompressed documents, and as high as 30 percent compared to documents compressed with the free compression software gzip.
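To make the idea of word substitution a little more concrete, here is a minimal Python sketch of a semidynamic dictionary. This is not the authors' library or coding scheme. It is a toy illustration under stated assumptions: the input is plain ASCII text, the dictionary is simply the document's 128 most frequent words, and each dictionary word is replaced by a single byte in the otherwise unused range 0x80 to 0xFF. Because the dictionary is derived from the document itself, it has to be stored with it, so the hypothetical `pack`/`unpack` helpers prepend it as a header. A general-purpose compressor such as gzip is then applied to the result.

```python
import gzip
import re
from collections import Counter

# Toy assumptions: ASCII-only input, so bytes 0x80-0xFF are free to use as
# one-byte codewords. That allows at most 128 dictionary entries.
MAX_ENTRIES = 128
WORD = re.compile(r"[A-Za-z]+")


def build_dictionary(text: str, size: int = MAX_ENTRIES) -> list[str]:
    """Semidynamic dictionary: the most frequent words of this document.
    Single letters are skipped because a 1-byte codeword saves nothing."""
    counts = Counter(w for w in WORD.findall(text) if len(w) > 1)
    return [w for w, _ in counts.most_common(size)]


def encode(text: str, words: list[str]) -> bytes:
    """Replace every whole word found in the dictionary with its codeword."""
    index = {w: i for i, w in enumerate(words)}

    def substitute(m: re.Match) -> str:
        w = m.group()
        return chr(0x80 + index[w]) if w in index else w

    return WORD.sub(substitute, text).encode("latin-1")


def decode(data: bytes, words: list[str]) -> str:
    """Expand codewords (bytes >= 0x80) back into the original words."""
    return "".join(words[b - 0x80] if b >= 0x80 else chr(b) for b in data)


def pack(text: str) -> bytes:
    """Store the dictionary as a NUL-terminated header before the payload."""
    words = build_dictionary(text)
    header = "\n".join(words).encode("ascii") + b"\0"
    return header + encode(text, words)


def unpack(blob: bytes) -> str:
    header, payload = blob.split(b"\0", 1)  # the header itself contains no NUL
    words = header.decode("ascii").split("\n") if header else []
    return decode(payload, words)


if __name__ == "__main__":
    sample = "the document and the library store the text of the document. " * 200
    plain_gz = gzip.compress(sample.encode("ascii"))
    packed_gz = gzip.compress(pack(sample))
    print(len(sample), len(plain_gz), len(packed_gz))
```

The round trip `unpack(pack(text)) == text` holds for any ASCII input. Whether the word-substituted version actually compresses better under gzip depends on the text. The sketch makes no claim to reproduce the paper's reported gains, which come from a more elaborate scheme that also uses static dictionaries.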