Monday, December 31, 2012

.net framework 4.5 Compression : DeflateStream & GZipSream

.Net framework includes DeflateStream and GZipStream to convert a stream into a compressed stream. Both of them were introduced first in .net framework 2.0. Starting from .net framework 4.5, they use zlib library for compression. DeflateStream and GZipStream are available in System.IO.Compression namespace in System assembly.


DeflateStream and GZipStream directly inherit Stream class.


Since they work on streams, they read the data byte by byte. So they cannot determine the best compression based on the overall data. Since they are based on DEFLATE algorithm which works like a dictionary encoder, they are not good with already compressed data. Neither of them supports adding individual files to the compressed stream. Both of these types are supported in metro style (.net for Windows Store) apps and Portable Class Library projects.


DeflateStream and GZipStream were introduced in .net framework 2.0 to compress a stream. In .net framework 4.5, they are updated to use zlib library. Because of the changes in Stream, they also support asynchronous behavior.

Compression using DeflateStream
DeflateStream is based on DEFLATE algorithm. Compressing a stream is just about copying another stream to DeflateStream instance with the proper compression mode set. In the following code, the path of source and destination file is specified as parameters to a method. It just reads the source file using FileStream and compresses it using DeflateStream to destination path.


Decompression is also as easier as compression. Here we are reading a source file using FileStream and passing it to DeflateStream with Decompression mode set. Then it is just a matter of copying it to a destination FileStream object. The term Inflate is used for decompression.


GZipStream Compression
GZip uses an industry standards data format for compression. The compressed stream can be saved with a *.gz extension. This compressed file can be decompressed with commonly available decompression tools and utilities. Using GZipStream is similar to using DeflateStream. The following code reads a file using FileStream and compresses it using a GZipStream object.


And the following is the code which can be used for decompressing a file using GZipStream. The file must have been compressed with GZip format.


Async compression and decompression
GZipStream and DeflateStream supports two Microsoft Async patterns. They are as follows:
  1. Asynchronous Programming Model [APM]
  2. The compression streams support APM by providing BeginRead / EndRead and BeginWrite / EndWrite method pairs. This is a legacy asynchronous pattern by Microsoft and is no longer recommended for new development. The pattern is also sometimes known as ASynchResult pattern. The BeginX / EndX method pair starts and ends the asynchronous operations. The BeginX method returns an ASyncResult object which can be used with the corresponding EndX method to block until the operation started with BeginX is completed. The IASyncResult can also be used to wait on the asynchronous operation to complete. The exceptions are also rethrown in the calling thread when EndX operation is called. Although BeginRead / EndRead and BeginWrite / EndWrite method pairs are inherited from Stream. They do get overridden in GZipStream and DeflateStream but since APM is no more recommended after .net framework 4.5, these methods are not recommended to be used for new development.

  3. Task-based Asynchronous Pattern [TAP]
  4. .net framework 4.5 introduced new XAsync methods to support another Microsoft asynchronous pattern TAP [Task based Asynchronous Pattern]. There are following methods provided in .net framework 4.5:

    1. FlushAsync
    2. ReadAsync
    3. WriteAsync
    4. CopyToAsync

    These methods, including their overrides, are inherited from Stream. There are overrides for these methods which support cancellation.
Word of Caution
As we discussed, DeflateStream is based on LZ77 and Huffman coding. LZ77 work on dictionary based encoding. Based on the extra bits created for dictionary keys, they might result in bigger compressed data, even bigger than the source data. So if the data size is too small, then it is better not to use them at all.

Download Code

No comments: