Gunzip

From Crypto++ Wiki
#include <cryptopp/gzip.h>

Gzip is a lossless compression format standardized in RFC 1952, GZIP file format specification. Strictly speaking, Gzip is a file format that carries additional metadata (such as the original filename, the file modification time and comments), while the underlying compression is Deflate from RFC 1951. Crypto++ provides GZIP compression through the Gzip class, and decompression through the Gunzip class.

The Gunzip decompressor takes a pointer to a BufferedTransformation. Because a pointer is taken, the Gunzip object owns the attached transformation and will destroy it when the Gunzip object itself is destroyed. See ownership for more details.
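The ownership rule can be illustrated with a small model that does not use Crypto++ at all. In the sketch below, Stage, RecordingSink and OwningFilter are hypothetical stand-ins invented for illustration, not library classes; the only point is that a filter constructed from a raw pointer deletes that object when the filter goes out of scope.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a pipeline stage (not a Crypto++ class).
struct Stage {
    virtual ~Stage() = default;
    virtual void Put(const std::string& chunk) = 0;
};

// Hypothetical sink: appends to a caller-owned string and records its own destruction.
class RecordingSink : public Stage {
public:
    RecordingSink(std::string& out, bool& destroyed) : out_(out), destroyed_(destroyed) {}
    ~RecordingSink() override { destroyed_ = true; }
    void Put(const std::string& chunk) override { out_ += chunk; }
private:
    std::string& out_;
    bool& destroyed_;
};

// Hypothetical filter: takes a raw pointer and assumes ownership of it,
// which models the rule described above.
class OwningFilter : public Stage {
public:
    explicit OwningFilter(Stage* attachment) : attachment_(attachment) {}
    void Put(const std::string& chunk) override {
        if (attachment_) attachment_->Put(chunk);  // forward data downstream
    }
private:
    std::unique_ptr<Stage> attachment_;  // deleted together with the filter
};

// Returns true if the sink received the data and was destroyed along with the filter.
bool SinkDestroyedWithFilter() {
    std::string out;
    bool destroyed = false;
    {
        OwningFilter filter(new RecordingSink(out, destroyed));
        filter.Put("abc");
    }  // filter leaves scope here and deletes the sink it owns
    return destroyed && out == "abc";
}
```

This is why the samples on this page pass new StringSink(...) directly to Gunzip and never delete the sink themselves.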

The Crypto++ implementation does not allow you to set or retrieve the original filename, the modified filetime or comments in the archive. A patch provided below allows programs to do it, but the library will have to be recompiled.
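The fields in question live in the member header defined by RFC 1952, so they can also be read directly from the first bytes of an archive without touching the library. The following is a minimal sketch under that assumption; GzipHeaderInfo and ParseGzipHeader are hypothetical names that are not part of Crypto++ or of the patch, and the code only walks the header without decompressing anything.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Metadata carried in an RFC 1952 member header (hypothetical helper type).
struct GzipHeaderInfo {
    std::uint32_t mtime = 0;     // seconds since the Unix epoch
    std::string filename;        // empty if the FNAME flag is clear
    std::string comment;         // empty if the FCOMMENT flag is clear
    std::size_t headerSize = 0;  // offset of the first byte of Deflate data
};

// Read a zero-terminated field at pos and advance pos past the terminator.
inline std::string ReadZeroTerminated(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::string out;
    while (true) {
        if (pos >= buf.size()) throw std::runtime_error("truncated gzip header");
        const std::uint8_t c = buf[pos++];
        if (c == 0) return out;
        out.push_back(static_cast<char>(c));
    }
}

// Parse the header of the first gzip member in buf.
inline GzipHeaderInfo ParseGzipHeader(const std::vector<std::uint8_t>& buf) {
    // Fixed part: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
    if (buf.size() < 10 || buf[0] != 0x1f || buf[1] != 0x8b)
        throw std::runtime_error("not a gzip stream");
    if (buf[2] != 8)
        throw std::runtime_error("unsupported compression method");

    const std::uint8_t flags = buf[3];
    GzipHeaderInfo info;
    // MTIME is stored little-endian
    info.mtime = static_cast<std::uint32_t>(buf[4])
               | (static_cast<std::uint32_t>(buf[5]) << 8)
               | (static_cast<std::uint32_t>(buf[6]) << 16)
               | (static_cast<std::uint32_t>(buf[7]) << 24);
    std::size_t pos = 10;

    if (flags & 0x04) {  // FEXTRA: 2-byte little-endian length, then that many bytes
        if (pos + 2 > buf.size()) throw std::runtime_error("truncated gzip header");
        const std::size_t xlen = static_cast<std::size_t>(buf[pos])
                               | (static_cast<std::size_t>(buf[pos + 1]) << 8);
        pos += 2 + xlen;
        if (pos > buf.size()) throw std::runtime_error("truncated gzip header");
    }
    if (flags & 0x08) info.filename = ReadZeroTerminated(buf, pos);  // FNAME
    if (flags & 0x10) info.comment = ReadZeroTerminated(buf, pos);   // FCOMMENT
    if (flags & 0x02) {  // FHCRC: 2-byte header CRC, skipped without verification
        pos += 2;
        if (pos > buf.size()) throw std::runtime_error("truncated gzip header");
    }
    info.headerSize = pos;
    return info;
}
```

A caller would read the beginning of the .gz file into a vector and pass it to ParseGzipHeader. The sketch does not verify the header CRC and treats the strings as raw bytes.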

The Gunzip class inherits from the Inflator class (which provides the RFC 1951 implementation).

Construction

Gunzip (BufferedTransformation *attachment=NULL,
        bool repeat=false,
        int autoSignalPropagation=-1)

attachment is a BufferedTransformation, such as another filter or sink. If attachment is NULL, then the Gunzip object will internally accumulate the output byte stream.

repeat signals whether the object will decompress multiple compressed streams in series. The default value is false.

autoSignalPropagation indicates whether MessageEnd should be called and propagated to attached transformations. Set to 0 to disable calls to MessageEnd on the attached transformations. The default value is -1, which propagates all MessageEnd calls to all attached transformations.

Sample Programs

The following is a small collection of sample programs to demonstrate using the Gunzip decompressor.

In-memory String

string decompressed, data = ...;
Gunzip unzipper(new StringSink(decompressed));

unzipper.Put((const byte*) data.data(), data.size());
unzipper.MessageEnd();

On-disk File

string filename("test.txt.gz");
FileSource fs(filename.c_str(), true);

Gunzip unzipper;
fs.TransferTo(unzipper);

String using Pipeline

string decompressed, data = ...;
        
StringSource ss(data, true,
    new Gunzip(
        new StringSink(decompressed)
));

File using Pipeline

string filename("test.txt.gz");
string decompressed;
        
FileSource fs(filename.c_str(), true,
    new Gunzip(
        new StringSink(decompressed)
));

String using Put/Get

string data = ...;   // gzip-compressed input

Gunzip unzipper;
unzipper.Put((const byte*)data.data(), data.size());
unzipper.MessageEnd();

// Declared outside the if block so the result remains usable afterwards
string decompressed;
word64 avail = unzipper.MaxRetrievable();
if(avail)
{
    decompressed.resize(avail);
    unzipper.Get((byte*)&decompressed[0], decompressed.size());
}

Array using Put/Get

string data = ...;   // gzip-compressed input

Gunzip unzipper;
unzipper.Put((const byte*)data.data(), data.size());
unzipper.MessageEnd();

// Declared outside the if block so the result remains usable afterwards
vector<byte> decompressed;
word64 avail = unzipper.MaxRetrievable();
if(avail)
{
    decompressed.resize(avail);
    unzipper.Get(&decompressed[0], decompressed.size());
}

Patch

The patch below adds the ability to read and write the original filename, the modified filetime and comments for an archive. The sample program below shows how it could be used.

try {

    string filename("test.txt.gz"), s1, s2;
    string data = "abcdefghijklmnopqrstuvwxyz";

    // Create a compressor, save stream to memory via 's1'
    Gzip zipper(new StringSink(s1));

    // Add some Gzip specific fields (setters added by the patch)
    zipper.SetFilename(filename);
    zipper.SetFiletime((word32)time(0));
    zipper.SetComment("This is a test of filenames, filetimes and comments");

    // Write the data to the stream
    zipper.Put((const byte*) data.data(), data.size());
    zipper.MessageEnd();

    // Save the compressed data to a file
    FileSink fs(filename.c_str(), true);
    fs.Put((const byte*) s1.data(), s1.size());
    fs.MessageEnd();

    // Create a decompressor, save stream to memory via 's2'
    Gunzip unzipper(new StringSink(s2));

    // Add the compressed data to it
    unzipper.Put((const byte*) s1.data(), s1.size());
    unzipper.MessageEnd();

    // Print the Gzip specific data (getters added by the patch)
    cout << "Filename: " << unzipper.GetFilename() << endl;
    cout << "Filetime: " << unzipper.GetFiletime() << endl;
    cout << "Comment: " << unzipper.GetComment() << endl;

    // Print the uncompressed stream
    cout << "Data: " << s2 << endl;
}
catch(CryptoPP::Exception& ex)
{
    cerr << ex.what() << endl;
}

A typical run of the program is shown below.

$ ./cryptopp-test.exe
Filename: test.txt.gz
Filetime: 1420337339
Comment: This is a test of filenames, filetimes and comments
Data: abcdefghijklmnopqrstuvwxyz
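The Filetime value in the run above is the raw MTIME field, which RFC 1952 defines as seconds since the Unix epoch (with 0 meaning that no timestamp is available). A small sketch for turning it into a readable UTC timestamp follows; FormatGzipMtime is a hypothetical helper name, not part of Crypto++ or of the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <string>

// Format a gzip MTIME value (seconds since the Unix epoch) as an ISO 8601 UTC string.
// Returns an empty string if the conversion fails.
inline std::string FormatGzipMtime(std::uint32_t mtime) {
    const std::time_t t = static_cast<std::time_t>(mtime);
    const std::tm* utc = std::gmtime(&t);  // note: std::gmtime is not thread-safe
    if (utc == nullptr) return std::string();

    char buf[32];
    const std::size_t n = std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc);
    return std::string(buf, n);
}
```

For the value printed above, FormatGzipMtime(1420337339) yields 2015-01-04T02:08:59Z.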

To unpack the archive using the original filename, you would use gunzip -N. It can be tested by renaming test.txt.gz to something else, like test.gz.

A view of the archive under The Archive Browser:

[Image: Gzip-archive-with-patch.png]

Note: The Archive Browser on OS X displays the implicit filename (the archive name without the gz extension), and not the original filename embedded in the header. Also see Issue 802: The Archive Browser does not honor original filename field in a GZIP header.

Downloads

gzip.diff.zip - patch that adds the ability to set and retrieve the original filename, the modified filetime and comments on a GZIP archive. The ZIP includes the diff of changes to gzip.h and gzip.cpp, and the modified gzip.h and gzip.cpp files themselves.