Gzip

From Crypto++ Wiki
#include <cryptopp/gzip.h>

Gzip is a lossless compression format standardized in RFC 1952, GZIP file format specification. Strictly speaking, Gzip is a file format that wraps compressed data with additional metadata (such as the original filename, file modification time and comments); the underlying compression is DEFLATE from RFC 1951. Crypto++ provides GZIP compression through the Gzip class, and decompression through the Gunzip class.
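To make the metadata concrete, the following is a minimal sketch of reading the header fields laid out in RFC 1952 (magic bytes, flags, MTIME, optional FNAME and FCOMMENT). It uses only the C++ standard library and is not how Crypto++ implements Gunzip; GzipHeaderInfo, ReadZString and ParseGzipHeader are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type, not part of Crypto++.
struct GzipHeaderInfo {
    std::uint32_t mtime = 0;     // MTIME: modification time, seconds since the Unix epoch
    std::string filename;        // FNAME: original filename, empty if absent
    std::string comment;         // FCOMMENT: comment, empty if absent
    std::size_t headerSize = 0;  // offset at which the DEFLATE data begins
};

// Bits of the FLG byte as defined in RFC 1952.
enum : std::uint8_t { FTEXT = 0x01, FHCRC = 0x02, FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10 };

// Read a zero-terminated string starting at 'pos' and advance 'pos' past the terminator.
std::string ReadZString(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
    std::string out;
    for (;;) {
        if (pos >= buf.size())
            throw std::runtime_error("truncated gzip header");
        const std::uint8_t c = buf[pos++];
        if (c == 0)
            break;
        out.push_back(static_cast<char>(c));
    }
    return out;
}

GzipHeaderInfo ParseGzipHeader(const std::vector<std::uint8_t>& buf)
{
    // Fixed 10-byte part: ID1 ID2 CM FLG MTIME(4) XFL OS
    if (buf.size() < 10 || buf[0] != 0x1f || buf[1] != 0x8b)
        throw std::runtime_error("not a gzip stream");
    if (buf[2] != 8)  // CM = 8 means DEFLATE
        throw std::runtime_error("unsupported compression method");

    const std::uint8_t flags = buf[3];
    GzipHeaderInfo info;

    // MTIME is stored little-endian.
    info.mtime = static_cast<std::uint32_t>(buf[4])
               | (static_cast<std::uint32_t>(buf[5]) << 8)
               | (static_cast<std::uint32_t>(buf[6]) << 16)
               | (static_cast<std::uint32_t>(buf[7]) << 24);

    std::size_t pos = 10;
    if (flags & FEXTRA) {  // skip the extra field: 2-byte length, then payload
        if (pos + 2 > buf.size())
            throw std::runtime_error("truncated gzip header");
        const std::size_t xlen = buf[pos] | (static_cast<std::size_t>(buf[pos + 1]) << 8);
        pos += 2 + xlen;
        if (pos > buf.size())
            throw std::runtime_error("truncated gzip header");
    }
    if (flags & FNAME)
        info.filename = ReadZString(buf, pos);
    if (flags & FCOMMENT)
        info.comment = ReadZString(buf, pos);
    if (flags & FHCRC) {   // skip the 2-byte header CRC
        pos += 2;
        if (pos > buf.size())
            throw std::runtime_error("truncated gzip header");
    }
    info.headerSize = pos;
    return info;
}
```

Passing the first bytes of a .gz file to ParseGzipHeader yields the stored filename, comment and modification time; everything from headerSize onward is the DEFLATE stream followed by the trailer.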

The Gzip compressor takes a pointer to a BufferedTransformation. Because a pointer is taken, the Gzip object owns the attached transformation and will destroy it. See ownership for more details.

The Crypto++ implementation supports filenames, comments and filetimes as of Crypto++ 6.0. The support was added under Issue 420, Add Gzip Filename, Filetime and Comment support. For Crypto++ 5.6.5 and earlier, you will have to patch the library to get filename, comment and filetime support.

The Gzip class inherits from the Deflator class (which provides the RFC 1951 implementation), so many of the constants used by Gzip are provided by Deflator in zdeflate.h.

Construction

Gzip (BufferedTransformation *attachment=NULL,
      unsigned int deflateLevel=DEFAULT_DEFLATE_LEVEL,
      unsigned int log2WindowSize=DEFAULT_LOG2_WINDOW_SIZE,
      bool detectUncompressible=true)

Gzip (const NameValuePairs &parameters,
      BufferedTransformation *attachment=NULL)

attachment is a BufferedTransformation, such as another filter or sink. If attachment is NULL, then the Gzip object will internally accumulate the output byte stream.

deflateLevel is the deflation level. The value should be between 0 and 9. 0 provides minimum compression and executes quickly, while 9 provides maximum compression and executes the slowest. zdeflate.h provides some constants for the deflateLevel. MIN_DEFLATE_LEVEL is 0, DEFAULT_DEFLATE_LEVEL is 6, and MAX_DEFLATE_LEVEL is 9.

log2WindowSize controls the table size used for compression. The value should be between 9 and 15, meaning the table will be between 2^9 (512) and 2^15 (32768) bytes. 9 provides the smallest table size, while 15 provides the largest table size. zdeflate.h provides some constants for the log2WindowSize. MIN_LOG2_WINDOW_SIZE is 9, DEFAULT_LOG2_WINDOW_SIZE is 15, and MAX_LOG2_WINDOW_SIZE is 15.
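The relationship between log2WindowSize and the resulting size is a simple power of two. The sketch below works through the arithmetic; the constants repeat the limits quoted above so the snippet stands alone, and WindowSizeFromLog2 is a hypothetical helper, not a library function.

```cpp
#include <cstddef>
#include <stdexcept>

// Limits as quoted in the text; redeclared here so the sketch does not need zdeflate.h.
const unsigned int kMinLog2WindowSize = 9;
const unsigned int kMaxLog2WindowSize = 15;

// Hypothetical helper: returns 2^log2WindowSize after checking the allowed range.
std::size_t WindowSizeFromLog2(unsigned int log2WindowSize)
{
    if (log2WindowSize < kMinLog2WindowSize || log2WindowSize > kMaxLog2WindowSize)
        throw std::invalid_argument("log2WindowSize must be between 9 and 15");
    return static_cast<std::size_t>(1) << log2WindowSize;
}
```

WindowSizeFromLog2(9) gives 512 and WindowSizeFromLog2(15) gives 32768; any value outside 9 to 15 throws.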

detectUncompressible means the library should try to detect if a file is uncompressible. From zdeflate.h, detectUncompressible makes it faster to process uncompressible files, but if a file has both compressible and uncompressible parts, it may fail to compress some of the compressible parts.

parameters are NameValuePairs used in the alternate constructor. The names recognized are Log2WindowSize, DeflateLevel and DetectUncompressible.

Sample Programs

The following is a small collection of sample programs to demonstrate using the Gzip compressor.

In-memory String

string data = "abcdefghijklmnopqrstuvwxyz";
string compressed;

Gzip zipper(new StringSink(compressed));
zipper.Put((byte*) data.data(), data.size());
zipper.MessageEnd();

On-disk File

string filename("test.txt.gz");
string data = "abcdefghijklmnopqrstuvwxyz";

Gzip zipper(new FileSink(filename.c_str(), true));
zipper.Put((byte*) data.data(), data.size());
zipper.MessageEnd();

String using Pipeline

string data = "abcdefghijklmnopqrstuvwxyz";
string compressed;
        
StringSource ss(data, true,
    new Gzip(
        new StringSink(compressed)
));

File using Pipeline

string filename("test.txt.gz");
string data = "abcdefghijklmnopqrstuvwxyz";
        
StringSource ss(data, true,
    new Gzip(
        new FileSink(filename.c_str(), true)
));

String using Put/Get

string data = "abcdefghijklmnopqrstuvwxyz";

Gzip zipper;
zipper.Put((byte*)data.data(), data.size());
zipper.MessageEnd();
        
word64 avail = zipper.MaxRetrievable();
if(avail)
{
    string compressed;
    compressed.resize(avail);
            
    zipper.Get((byte*)&compressed[0], compressed.size());
}

Array using Put/Get

string data = "abcdefghijklmnopqrstuvwxyz";

Gzip zipper;
zipper.Put((byte*)data.data(), data.size());
zipper.MessageEnd();
        
word64 avail = zipper.MaxRetrievable();
if(avail)
{
    vector<byte> compressed;
    compressed.resize(avail);
            
    zipper.Get(&compressed[0], compressed.size());
}

Patch

The patch below adds the ability to read and write the original filename, the modified filetime and comments for an archive. The sample program below shows how it could be used.

try {
        
    string filename("test.txt.gz"), s1, s2;
    string data = "abcdefghijklmnopqrstuvwxyz";
        
    // Create a compressor, save stream to memory via 's1'
    Gzip zipper(new StringSink(s1));

    // Add some Gzip specific fields
    zipper.SetFilename(filename);
    zipper.SetFiletime((word32)time(0));
    zipper.SetComment("This is a test of filenames and comments");
    
    // Write the data to the stream
    zipper.Put((byte*) data.c_str(), data.size());
    zipper.MessageEnd();
        
    // Save the compressed data to a file
    FileSink fs(filename.c_str(), true);
    fs.Put((byte*) s1.data(), s1.size());
    fs.MessageEnd();
    
    // Create a decompressor, save stream to memory via 's2'
    Gunzip unzipper(new StringSink(s2));

    // Add the compressed data to it
    unzipper.Put( (unsigned char*) s1.data(), s1.size());
    unzipper.MessageEnd();
        
    // Print the Gzip specific data
    cout << "Filename: " << unzipper.GetFilename() << endl;
    cout << "Filetime: " << unzipper.GetFiletime() << endl;
    cout << "Comment: " << unzipper.GetComment() << endl;

    // Print the uncompressed stream
    cout << "Data: " << s2 << endl;        
}
catch(CryptoPP::Exception& ex)
{
    cerr << ex.what() << endl;
}

A typical run of the program is shown below.

$ ./cryptopp-test.exe
Filename: test.txt.gz
Filetime: 1420337339
Comment: This is a test of filenames and comments
Data: abcdefghijklmnopqrstuvwxyz

Saving to the original filename with a pipeline using Crypto++ can be tricky because the original filename is not available when the FileSink is created. Here's one way to do it:

// Create a decompressor, save stream to a ByteQueue
ByteQueue queue;
Gunzip unzipper(new Redirector(queue));

// Add the compressed data to it
unzipper.Put( (unsigned char*) compressed.data(), compressed.size());
unzipper.MessageEnd();

FileSink fs(unzipper.GetFilename().c_str(), true);
queue.TransferTo(fs);
fs.MessageEnd();

To unpack the archive using the original filename from the command line, you would use gunzip -N. It can be tested by renaming test.txt.gz to something else, like test.gz.

The archive as it appears in The Archive Browser (screenshot: Gzip-archive-with-patch.png).

Note: The Archive Browser on OS X displays the implicit filename (the archive name without the gz extension), and not the original filename embedded in the header. Also see Issue 802: The Archive Browser does not honor original filename field in a GZIP header.

Downloads

gzip.diff.zip - patch that adds the ability to set and retrieve the original filename, the modified filetime and comments on a GZIP archive. The ZIP includes the diff of changes to gzip.h and gunzip.h, and the modified gzip.h and gunzip.h files themselves.