Information about http://www.gzip.org/zlib/rfc1951.pdf

Network Working Group …

Tags: aladdin enterprises, compilations, compression methods, copyright notice, dard, deletions, general purpose, huffman coding, input data stream, intellectual property rights, intermediate storage, internet community, lz77 algorithm, network working group, peter deutsch, png, request for comments, rfc 1951, substantive changes, url ftp,
Pages: 15
Language: english
Created: Wed Feb 9 11:13:01 1910
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
Page 7
image
Page 8
image
Page 9
image
Page 10
image
Page 11
image
Page 12
image
Page 13
image
Page 14
image
Page 15
image
Network Working Group                                                                            P. Deutsch
Request for Comments: 1951                                                               Aladdin Enterprises
Category: Informational                                                                           May 1996


    DEFLATE Compressed Data Format Specification version 1.3


Status of This Memo

This memo provides information for the Internet community. This memo does not specify an Internet stan-
dard of any kind. Distribution of this memo is unlimited.


IESG note:

The IESG takes no position on the validity of any Intellectual Property Rights statements contained in this
document.


Notices

Copyright c 1996 L. Peter Deutsch
Permission is granted to copy and distribute this document for any purpose and without charge, including
translations into other languages and incorporation into compilations, provided that the copyright notice and
this notice are preserved, and that any substantive changes or deletions from the original are clearly marked.
A pointer to the latest version of this and related documentation in HTML format can be found at the URL
 ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html .


Abstract

This specification defines a lossless compressed data format that compresses data using a combination of
the LZ77 algorithm and Huffman coding, with efficiency comparable to the best currently available general-
purpose compression methods. The data can be produced or consumed, even for an arbitrarily long sequen-
tially presented input data stream, using only an a priori bounded amount of intermediate storage. The format
can be implemented readily in a manner not covered by patents.




Deutsch                                        Informational                                         [Page 1]
RFC 1951                       DEFLATE Compressed Data Format Specification                                                   April 1996


Contents
1         Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .    .   .    2
      1.1     Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .    .   .    2
      1.2     Intended audience . . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .    .   .    2
      1.3     Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .    .   .    2
      1.4     Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .    .   .    3
      1.5      Definitions of terms and conventions used . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .    .   .    3
      1.6     Changes from previous versions . . . . . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .    .   .    3
2         Compressed representation overview . . . . . . . . . . . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .    .   .    3
3         Detailed specification . . . . . . . . . . . . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .    .   .    4
      3.1     Overall conventions . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .    .   .    4
             3.1.1 Packing into bytes . . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .    .   .    4
      3.2     Compressed block format . . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .    .   .    5
             3.2.1 Synopsis of prefix and Huffman coding . . . . . . . . . .          .   .   .   .   .   .   .   .   .   .   .   .    .   .    5
             3.2.2 Use of Huffman coding in the "deflate" format . . . . . .          .   .   .   .   .   .   .   .   .   .   .   .    .   .    6
             3.2.3 Details of block format . . . . . . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .    .   .    7
             3.2.4 Non-compressed blocks (BTYPE=00) . . . . . . . . . . .             .   .   .   .   .   .   .   .   .   .   .   .    .   .    9
             3.2.5 Compressed blocks (length and distance codes) . . . . . .          .   .   .   .   .   .   .   .   .   .   .   .    .   .    9
             3.2.6 Compression with fixed Huffman codes (BTYPE=01) . .                .   .   .   .   .   .   .   .   .   .   .   .    .   .   10
             3.2.7 Compression with dynamic Huffman codes (BTYPE=10)                  .   .   .   .   .   .   .   .   .   .   .   .    .   .   10
      3.3     Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .    .   .   12
4         Compression algorithm details . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .    .   .   12
5         References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .    .   .   13
6         Security Considerations . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .    .   .   13
7         Source code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .    .   .   13
8         Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .    .   .   14
9         Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .    .   .   14


1 Introduction

1.1     Purpose

The purpose of this specification is to define a lossless compressed data format that:
        Is independent of CPU type, operating system, file system, and character set, and hence can be used
        for interchange;
        Can be produced or consumed, even for an arbitrarily long sequentially presented input data stream,
        using only an a priori bounded amount of intermediate storage, and hence can be used in data com-
        munications or similar structures such as Unix filters;




Deutsch                                            Informational                                                                      [Page 2]
RFC 1951                     DEFLATE Compressed Data Format Specification                             April 1996


      Compresses data with efficiency comparable to the best currently available general-purpose compres-
      sion methods, and in particular considerably better than the "compress" program;
      Can be implemented readily in a manner not covered by patents, and hence can be practiced freely;
      Is compatible with the file format produced by the current widely used gzip utility, in that conforming
      decompressors will be able to read data produced by the existing gzip compressor.
The data format defined by this specification does not attempt to:
      Allow random access to compressed data;
      Compress specialized data (e.g., raster graphics) as well as the best currently available specialized al-
      gorithms.
A simple counting argument shows that no lossless compression algorithm can compress every possible input
data set. For the format defined here, the worst case expansion is 5 bytes per 32K-byte block, i.e., a size
increase of 0.015% for large data sets. English text usually compresses by a factor of 2.5 to 3; executable
files usually compress somewhat less; graphical data such as raster images may compress much more.


1.2   Intended audience

This specification is intended for use by implementors of software to compress data into "deflate" format
and/or decompress data from "deflate" format.
The text of the specification assumes a basic background in programming at the level of bits and other prim-
itive data representations. Familiarity with the technique of Huffman coding is helpful but not required.


1.3   Scope

The specification specifies a method for representing a sequence of bytes as a (usually shorter) sequence of
bits, and a method for packing the latter bit sequence into bytes.


1.4   Compliance

Unless otherwise indicated below, a compliant decompressor must be able to accept and decompress any data
set that conforms to all the specifications presented here; a compliant compressor must produce data sets that
conform to all the specifications presented here.


1.5    Definitions of terms and conventions used

Byte: 8 bits stored or transmitted as a unit (same as an octet). For this specification, a byte is exactly 8 bits,
even on machines which store a character on a number of bits different from eight. See below, for the num-
bering of bits within a byte.

Deutsch                                          Informational                                          [Page 3]
RFC 1951                     DEFLATE Compressed Data Format Specification                         April 1996


String: a sequence of arbitrary bytes.


1.6   Changes from previous versions

There have been no technical changes to the deflate format since version 1.1 of this specification. In version
1.2, some terminology was changed. Version 1.3 is a conversion of the specification to RFC style.


2 Compressed representation overview

A compressed data set consists of a series of blocks, corresponding to successive blocks of input data. The
block sizes are arbitrary, except that non-compressible blocks are limited to 65,535 bytes.
Each block is compressed using a combination of the LZ77 algorithm and Huffman coding. The Huffman
trees for each block are independent of those for previous or subsequent blocks; the LZ77 algorithm may use
a reference to a duplicated string occurring in a previous block, up to 32K input bytes before.
Each block consists of two parts: a pair of Huffman code trees that describe the representation of the com-
pressed data part, and a compressed data part. (The Huffman trees themselves are compressed using Huffman
encoding.) The compressed data consists of a series of elements of two types: literal bytes (of strings that
have not been detected as duplicated within the previous 32K input bytes), and pointers to duplicated strings,
where a pointer is represented as a pair length, backward distance . The representation used in the "de-
flate" format limits distances to 32K bytes and lengths to 258 bytes, but does not limit the size of a block,
except for uncompressible blocks, which are limited as noted above.
Each type of value (literals, distances, and lengths) in the compressed data is represented using a Huffman
code, using one code tree for literals and lengths and a separate code tree for distances. The code trees for
each block appear in a compact form just before the compressed data for that block.


3     Detailed specification

3.1   Overall conventions

In the diagrams below, a box like this:
      +---+
      |   |