LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS

Artur Ferreira; Arlindo Oliveira; Mario Figueiredo

doi:10.34629/ipl.isel.i-ETC.6

Abstract

The sliding window dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are widely used for universal lossless data compression. The encoding component of these algorithms performs repeated substring search. Data structures such as hash tables, binary search trees, and suffix trees have been used to speed up these searches, at the expense of memory usage. Previous work has shown how suffix arrays (SAs) can be used for dictionary representation and LZ77 decomposition. In this paper, we improve on that work by proposing a new, efficient algorithm to update the sliding window each time a token is produced at the output. The proposed algorithm toggles between two SAs on consecutive tokens. The resulting SA-based encoder requires less memory than conventional tree-based encoders. Comparing the two approaches on a large set of benchmark files, we find that, in some compression settings, our encoder is also faster than tree-based encoders.
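To illustrate the kind of substring search the abstract refers to, the following is a minimal sketch of how a suffix array built over the window can locate the longest match for the lookahead data. It is not the authors' algorithm: the function names (build_suffix_array, common_prefix_len, longest_match) are hypothetical, the construction is a naive sort chosen for brevity, matches are assumed to lie entirely inside the window, and the two-SA sliding window update proposed in the paper is not shown.

```python
def build_suffix_array(window: bytes) -> list[int]:
    """Return suffix start positions of `window` in lexicographic order.

    Naive construction by sorting suffixes; fine for a small illustration,
    not meant as an efficient SA builder.
    """
    return sorted(range(len(window)), key=lambda i: window[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(window: bytes, sa: list[int], lookahead: bytes) -> tuple[int, int]:
    """Return (position, length) of the longest prefix of `lookahead`
    that occurs in `window`, or (0, 0) if no byte matches.

    Assumption: matches may not run past the end of the window.
    """
    # Binary search for where `lookahead` would be inserted among the
    # sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if window[sa[mid]:] < lookahead:
            lo = mid + 1
        else:
            hi = mid

    # The suffix sharing the longest prefix with `lookahead` is always one
    # of the two neighbours of the insertion point.
    best_pos, best_len = 0, 0
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            n = common_prefix_len(window[sa[idx]:], lookahead)
            if n > best_len:
                best_pos, best_len = sa[idx], n
    return best_pos, best_len


if __name__ == "__main__":
    window = b"abracadabra"
    sa = build_suffix_array(window)
    print(longest_match(window, sa, b"abrax"))
```

In an LZ77-style encoder, the returned (position, length) pair would feed the next output token, after which the window must be updated; that update step is the subject of the paper and is deliberately left out of this sketch.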

Keywords

Lempel-Ziv compression; suffix arrays; sliding window update; substring search

DOI: http://dx.doi.org/10.34629/ipl.isel.i-ETC.6

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
