Monday, December 31, 2012

.NET Framework 4.5 Compression: DeflateStream & GZipStream

The .NET Framework includes DeflateStream and GZipStream to wrap a stream in a compressed stream. Both were first introduced in .NET Framework 2.0. Starting with .NET Framework 4.5, they use the zlib library for compression. DeflateStream and GZipStream are available in the System.IO.Compression namespace in the System assembly.

DeflateStream and GZipStream both inherit directly from the Stream class.

Because they work on streams, they process the data sequentially as it arrives, so they cannot choose the best compression strategy based on the data as a whole. Because they are based on the DEFLATE algorithm, which works like a dictionary encoder, they do not do well with data that is already compressed. Neither of them supports adding individual files to the compressed stream. Both types are supported in Metro style (.NET for Windows Store) apps and Portable Class Library projects.

The move to the zlib library is not the only change in .NET Framework 4.5. Because of the additions made to Stream in that release, DeflateStream and GZipStream also support asynchronous operations.

Compression using DeflateStream
DeflateStream is based on the DEFLATE algorithm. Compressing a stream is just a matter of copying another stream into a DeflateStream instance with the proper compression mode set. In the following code, the paths of the source and destination files are passed as parameters to a method. It reads the source file using FileStream and compresses it to the destination path using DeflateStream.
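A minimal sketch of such a method could look like this. The class and method names (DeflateSample, Compress) are assumptions made for illustration.

```csharp
using System.IO;
using System.IO.Compression;

public static class DeflateSample
{
    // Reads the file at sourcePath and writes a raw DEFLATE stream to destinationPath.
    public static void Compress(string sourcePath, string destinationPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream destination = File.Create(destinationPath))
        using (var deflate = new DeflateStream(destination, CompressionMode.Compress))
        {
            // Everything copied into the DeflateStream is compressed
            // and written to the underlying destination stream.
            source.CopyTo(deflate);
        }
    }
}
```

Disposing the DeflateStream (here through the using block) is what flushes the final compressed bytes to the destination file.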

Decompression is just as easy as compression. Here we read a source file using FileStream and pass it to a DeflateStream with the Decompress mode set. Then it is just a matter of copying it to a destination FileStream object. The term Inflate is used for decompression.
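The decompression side might be sketched as follows. Again, the names InflateSample and Decompress are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class InflateSample
{
    // Reads a raw DEFLATE stream from sourcePath and writes the inflated bytes to destinationPath.
    public static void Decompress(string sourcePath, string destinationPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (var inflate = new DeflateStream(source, CompressionMode.Decompress))
        using (FileStream destination = File.Create(destinationPath))
        {
            // Reading from the DeflateStream yields the decompressed data.
            inflate.CopyTo(destination);
        }
    }
}
```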

GZipStream Compression
GZip uses an industry-standard data format for compression. The compressed stream can be saved with a *.gz extension. This compressed file can be decompressed with commonly available decompression tools and utilities. Using GZipStream is similar to using DeflateStream. The following code reads a file using FileStream and compresses it using a GZipStream object.
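As a sketch, the only change from the DeflateStream version is the stream type. The names GZipCompressSample and Compress are assumptions for illustration.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipCompressSample
{
    // Compresses the file at sourcePath into a GZip file (typically named *.gz).
    public static void Compress(string sourcePath, string destinationPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream destination = File.Create(destinationPath))
        using (var gzip = new GZipStream(destination, CompressionMode.Compress))
        {
            source.CopyTo(gzip);
        }
    }
}
```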

And the following code can be used to decompress a file using GZipStream. The file must have been compressed in the GZip format.
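A possible shape for that method is shown here, with hypothetical names (GZipDecompressSample, Decompress).

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipDecompressSample
{
    // Decompresses a GZip file at sourcePath into destinationPath.
    public static void Decompress(string sourcePath, string destinationPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (FileStream destination = File.Create(destinationPath))
        {
            gzip.CopyTo(destination);
        }
    }
}
```

If the input is not valid GZip data, reading from the GZipStream throws an InvalidDataException.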

Async compression and decompression
GZipStream and DeflateStream support two Microsoft asynchronous patterns. They are as follows:
  1. Asynchronous Programming Model [APM]
     The compression streams support APM by providing BeginRead / EndRead and BeginWrite / EndWrite method pairs. This is a legacy asynchronous pattern by Microsoft, sometimes also known as the IAsyncResult pattern. The BeginX / EndX method pair starts and ends the asynchronous operation. The BeginX method returns an IAsyncResult object, which can be passed to the corresponding EndX method to block until the operation started with BeginX has completed. The IAsyncResult can also be used to wait for the asynchronous operation to complete. Exceptions are rethrown on the calling thread when EndX is called. The BeginRead / EndRead and BeginWrite / EndWrite method pairs are inherited from Stream and are overridden in GZipStream and DeflateStream, but since APM is no longer recommended as of .NET Framework 4.5, these methods should not be used for new development.

  2. Task-based Asynchronous Pattern [TAP]
     .NET Framework 4.5 introduced new XAsync methods to support another Microsoft asynchronous pattern, TAP [Task-based Asynchronous Pattern]. The following methods are provided in .NET Framework 4.5:

    1. FlushAsync
    2. ReadAsync
    3. WriteAsync
    4. CopyToAsync

    These methods, including their overloads, are inherited from Stream. There are overloads of these methods which support cancellation.
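As an illustration of the TAP methods, the sketch below compresses a file with CopyToAsync and a CancellationToken. The names AsyncGZipSample and CompressAsync, the 4096-byte file buffer, and the 81920-byte copy buffer are choices made for this example only.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncGZipSample
{
    // Asynchronously compresses sourcePath into destinationPath using GZip.
    public static async Task CompressAsync(
        string sourcePath, string destinationPath, CancellationToken cancellationToken)
    {
        // useAsync: true asks FileStream to use asynchronous file I/O.
        using (var source = new FileStream(
            sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        using (var destination = new FileStream(
            destinationPath, FileMode.Create, FileAccess.Write, FileShare.None, 4096, useAsync: true))
        using (var gzip = new GZipStream(destination, CompressionMode.Compress))
        {
            // This CopyToAsync overload takes a buffer size and a cancellation token.
            await source.CopyToAsync(gzip, 81920, cancellationToken);
        }
    }
}
```

A caller would typically await CompressAsync and pass the token of a CancellationTokenSource so that the copy can be cancelled.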
Word of Caution
As we discussed, DeflateStream is based on the DEFLATE algorithm, which combines LZ77 and Huffman coding. LZ77 is a dictionary-based encoding. The extra bits needed for the dictionary references can make the compressed data bigger, even bigger than the source data. So if the data is very small, it is better not to compress it at all.
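The effect is easy to observe with a small helper that compresses a byte array in memory and reports the compressed size. This sketch uses GZipStream, where the format's fixed header and trailer add overhead on top of the DEFLATE data. The names SmallInputSample and CompressedLength are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class SmallInputSample
{
    // Returns the number of bytes produced by GZip-compressing data.
    public static long CompressedLength(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the GZipStream
            // is disposed, which is when the last compressed bytes are written.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }
}
```

Calling CompressedLength with an array of only a few bytes returns a value larger than the input length, while a long, repetitive input shrinks considerably.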


Saturday, December 15, 2012

.Net Framework 4.5 and Compression

In this post we are going to discuss the new features in System.IO.Compression introduced in .NET Framework 4.5. In particular, we are going to discuss how easy it is to create zip archives using those new features. We are also going to discuss how to open zip packages and extract their contents using a simple project.

ZipArchive in .net 4.5
Let's create a new .NET 4.5 console application project named AppCompression.

Let's add references to the System.IO.Compression and System.IO.Compression.FileSystem assemblies to the project. System.IO.Compression contains ZipArchive and its related types, while System.IO.Compression.FileSystem adds ZipFile and extension methods that ease working with compression / decompression.

Let's also add a test folder with some files as follows:

The folder also has a sub-folder. The contents of the sub-folder are shown below:

ZipArchive was introduced in .NET Framework 4.5. It allows us to create a zipped package / compressed file. All individual files are added as entries to the package. The package conforms to the standard zip format, so the zip file can be transmitted and decompressed using a regular decompression utility such as WinZip. The following code uses ZipArchive to compress a folder. We are using the main folder discussed above as the source folder. As you can see, we are creating a ZipArchive instance and adding individual files from the folder as entries.
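A sketch of this first attempt might look as follows. It assumes the files are enumerated recursively and that each entry is named after the file name alone. The names FlatZipSample and CompressFolderFlat are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class FlatZipSample
{
    // Adds every file under sourceFolder to a new zip file, using only the file name
    // as the entry name. zipPath should be located outside sourceFolder.
    public static void CompressFolderFlat(string sourceFolder, string zipPath)
    {
        using (FileStream zipStream = File.Create(zipPath))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
        {
            foreach (string file in Directory.EnumerateFiles(
                sourceFolder, "*", SearchOption.AllDirectories))
            {
                // CreateEntryFromFile is an extension method from the
                // System.IO.Compression.FileSystem assembly.
                archive.CreateEntryFromFile(file, Path.GetFileName(file));
            }
        }
    }
}
```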

As expected, when executed, the above code creates a zip file at the specified path. We notice, however, that it hasn't maintained the folder hierarchy in the compressed file. So we would not be able to decompress this file into the same folder structure as the original uncompressed folder.

As a matter of fact, ZipArchive does support maintaining the folder hierarchy. We just need to specify the file name including the relative path. This should be enough information for ZipArchive to create the intended folder structure. In the code below, we are using Uri.MakeRelativeUri to determine the relative path of each entry using the base folder's path. Now this relative path is used to create entries as follows:
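One way to sketch this is shown here. The helper GetRelativePath and the class name HierarchyZipSample are assumptions for illustration. The trailing separator appended to the base folder matters, because without it Uri treats the last path segment as a file rather than a directory.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class HierarchyZipSample
{
    // Returns filePath relative to baseFolder, with '/' as the separator.
    public static string GetRelativePath(string baseFolder, string filePath)
    {
        string basePath = Path.GetFullPath(baseFolder)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
            + Path.DirectorySeparatorChar;
        var baseUri = new Uri(basePath);
        var fileUri = new Uri(Path.GetFullPath(filePath));

        // MakeRelativeUri returns an escaped URI, so unescape it (e.g. %20 back to a space).
        return Uri.UnescapeDataString(baseUri.MakeRelativeUri(fileUri).ToString());
    }

    // Zips sourceFolder, naming each entry by its path relative to sourceFolder.
    public static void CompressFolder(string sourceFolder, string zipPath)
    {
        using (FileStream zipStream = File.Create(zipPath))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
        {
            foreach (string file in Directory.EnumerateFiles(
                sourceFolder, "*", SearchOption.AllDirectories))
            {
                archive.CreateEntryFromFile(file, GetRelativePath(sourceFolder, file));
            }
        }
    }
}
```

File names containing characters that have a special meaning in URIs may need extra care with this approach.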

When we look at the compressed folder in Windows Explorer, we do see the folder hierarchy being maintained. It creates the sub folder and adds the intended files to the sub folder. The folder looks as follows:

If we don't want to add the System.IO.Compression.FileSystem assembly reference, we can create a ZipArchiveEntry directly from the ZipArchive instance. We can then use a stream to copy the file's contents to the ZipArchiveEntry. The following code can be used to replace the CreateEntryFromFile method, which is provided as an extension method for the ZipArchive type in the System.IO.Compression.FileSystem assembly.
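A minimal replacement could be sketched like this, using only types from the System.IO.Compression assembly. The names ManualEntrySample and AddFile are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class ManualEntrySample
{
    // Adds the file at filePath to the archive under entryName by copying its stream.
    public static void AddFile(ZipArchive archive, string filePath, string entryName)
    {
        ZipArchiveEntry entry = archive.CreateEntry(entryName);

        using (FileStream source = File.OpenRead(filePath))
        using (Stream entryStream = entry.Open())
        {
            // Data written to the entry stream is compressed into the archive.
            source.CopyTo(entryStream);
        }
    }
}
```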

ZipArchive can also be used to decompress a zip file into a directory. First let us see the easier way to do it. There are extension methods for ZipArchive available in the System.IO.Compression.FileSystem assembly for this purpose. The code below uses the ExtractToDirectory extension method to decompress a zip file to the specified folder. The method extracts the contents of the compressed file while still maintaining the folder hierarchy. As you can see, we can use ZipArchiveMode.Read, as we don't need to write any new entries to the package.
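In sketch form, with the hypothetical names ExtractSample and Extract:

```csharp
using System.IO;
using System.IO.Compression;

public static class ExtractSample
{
    // Extracts every entry of the zip file at zipPath into destinationFolder.
    public static void Extract(string zipPath, string destinationFolder)
    {
        using (FileStream zipStream = File.OpenRead(zipPath))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            // Extension method from the System.IO.Compression.FileSystem assembly.
            archive.ExtractToDirectory(destinationFolder);
        }
    }
}
```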

We can also enumerate the entries in a ZipArchive and decompress each entry. In order to maintain the folder hierarchy, ZipArchiveEntry's FullName property can be used. The ExtractToDirectory method can be replaced with the following code. The code takes care of creating the expected folder structure for the compressed file:
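The sketch below shows one way to do this by hand. The names ManualExtractSample and ExtractManually are hypothetical, and the check that skips entries resolving outside the destination folder is a precaution added in this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ManualExtractSample
{
    // Extracts each entry to destinationFolder, recreating sub-folders from entry.FullName.
    public static void ExtractManually(string zipPath, string destinationFolder)
    {
        string root = Path.GetFullPath(destinationFolder)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);

        using (FileStream zipStream = File.OpenRead(zipPath))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string targetPath = Path.GetFullPath(Path.Combine(root, entry.FullName));

                // Precaution: ignore entries whose path would land outside the root folder.
                if (!targetPath.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
                {
                    continue;
                }

                // An entry with an empty Name represents a directory.
                if (entry.Name.Length == 0)
                {
                    Directory.CreateDirectory(targetPath);
                    continue;
                }

                Directory.CreateDirectory(Path.GetDirectoryName(targetPath));
                using (Stream entryStream = entry.Open())
                using (FileStream target = File.Create(targetPath))
                {
                    entryStream.CopyTo(target);
                }
            }
        }
    }
}
```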

ZipFile in .net 4.5
ZipFile is an easier alternative in .NET 4.5 for working with compression / decompression. It also supports opening a ZipArchive, which we can then use to add entries to a package. The package can later be saved as a new compressed file or back to the same one.
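For example, compressing a whole folder can be sketched with the static ZipFile.CreateFromDirectory method. The wrapper names ZipFileCreateSample and CompressFolder are assumptions for illustration.

```csharp
using System.IO.Compression;

public static class ZipFileCreateSample
{
    // Creates zipPath from the contents of sourceFolder, keeping the folder hierarchy.
    // zipPath must not already exist and should be located outside sourceFolder.
    public static void CompressFolder(string sourceFolder, string zipPath)
    {
        ZipFile.CreateFromDirectory(sourceFolder, zipPath);
    }
}
```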

How simple is that? Now let's see how we can use ZipFile to extract a compressed file into a folder. The following code extracts the contents of a zip file into the specified folder. We don't even have to make any extra effort to maintain the folder hierarchy during the extraction; ZipFile takes care of that automatically.
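A sketch of the extraction, with the hypothetical wrapper names ZipFileExtractSample and Extract:

```csharp
using System.IO.Compression;

public static class ZipFileExtractSample
{
    // Extracts the zip file at zipPath into destinationFolder, recreating sub-folders.
    public static void Extract(string zipPath, string destinationFolder)
    {
        ZipFile.ExtractToDirectory(zipPath, destinationFolder);
    }
}
```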

As discussed earlier, ZipFile can also be used to open a compressed file. There are two overloads of Open() which allow us to load a compressed file using the Read, Create or Update ZipArchiveMode. There is another method, OpenRead(), which opens the compressed package in Read mode. These modes restrict what we can do with the returned ZipArchive instance. The following sample code uses ZipFile to open a compressed package and writes the names of the entries to the console.
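Such a listing might be sketched as follows. The names ListEntriesSample and PrintEntries are hypothetical, and the sketch assumes that the full entry name, including any relative path, is what gets printed.

```csharp
using System;
using System.IO.Compression;

public static class ListEntriesSample
{
    // Opens the zip file read-only and writes each entry's full name to the console.
    public static void PrintEntries(string zipPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                Console.WriteLine(entry.FullName);
            }
        }
    }
}
```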

It results in the following output:
