GZIP compression is an extremely popular technique for compressing web content. From the web pages themselves to the videos and photos referenced on them, GNU Zip’s lossless compression is used by over fifty percent of all websites on the Internet (source: https://w3techs.com/technologies/details/ce-gzipcompression). Despite GZIP’s current popularity, its compression ratio is often worse than that of Brotli, a newer format that offers a modest improvement over its predecessor. Simply put, GZIP’s adoption is slowly trending downward as websites move to more modern technologies.
That is not to say that GZIP, or other compression formats such as xz, are going away: each has its own advantages and disadvantages. For example, gzip-compressed pages still offer slightly lower decompression times on the client, which can be useful for lower-end devices. Crucially, GZIP is also much faster than Brotli on the server side, so lower-end devices and servers will run better with the older compression technique. As for formats such as xz, their disadvantage is not compression ratio (in fact, they often compress better than Brotli); the issue is that they take much longer to decompress client-side, which could ruin the end-user experience.
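The size side of this trade-off can be explored with Python’s standard library, which includes gzip, bz2, and lzma (the xz format). The following is a minimal sketch that assumes a made-up, repetitive sample. Brotli is left out because it is not part of the standard library, and the numbers will differ for real content.

```python
import bz2
import gzip
import lzma

# Hypothetical sample: repetitive HTML-like text, chosen only for illustration.
sample = b"".join(
    b"<li class='item'>Product number %d is in stock</li>\n" % i
    for i in range(2000)
)


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for each standard-library codec."""
    return {
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),  # lzma's default container is .xz
    }


sizes = compressed_sizes(sample)
print("original:", len(sample), "bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(sample):.1%} of original)")
```

Size is only half of the picture: timing the `compress` and `decompress` calls on representative content is what shows the speed differences described above.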
Outside of web technologies, GZIP is also commonly used for transferring files. You may have seen the large number of Linux packages that come packaged in “tarballs.” Many of their file names end in .gz; all of those were compressed with GNU Zip (GZIP).
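As a rough illustration of what such a tarball is, the sketch below builds a gzip-compressed tar archive in memory with Python’s `tarfile` module and reads it back. The file name and contents are hypothetical.

```python
import io
import tarfile

# Hypothetical file contents, used only for illustration.
payload = b"hello from inside a tarball\n"

buffer = io.BytesIO()
# Mode "w:gz" writes a tar archive and gzip-compresses it in one step.
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="example/readme.txt")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

tarball = buffer.getvalue()
# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
print(tarball[:2].hex())

# Mode "r:gz" decompresses the stream and unpacks the archive.
with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as archive:
    names = archive.getnames()
    restored = archive.extractfile("example/readme.txt").read()
```

The archive step (tar) and the compression step (gzip) are independent, which is why the same tar archive can also be shipped with other compressors.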
Latency (in this context, page load time) is an important metric to keep an eye on, because slow websites drive away traffic. gzip-compressed web content will often be several times smaller than the original file, and sometimes close to an order of magnitude smaller. With gzip, or even Brotli, page loads tend to be far faster than if the site sent uncompressed data: from reduced SSL overhead to smaller file sizes, less work is needed before the page can appear, which improves metrics such as time to first render. That is why compression is still used, even with rising Internet speeds around the world.
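To put rough numbers on this, the sketch below estimates transfer time as payload size divided by link speed. The page body and the 5 Mbit/s link are assumptions chosen for illustration, and the model ignores round trips, TLS handshakes, and decompression time.

```python
import gzip


def transfer_seconds(size_bytes: int, bandwidth_bits_per_s: float) -> float:
    """Idealized transfer time: payload bits divided by link speed."""
    return size_bytes * 8 / bandwidth_bits_per_s


# Hypothetical page body and link speed, chosen only for illustration.
rows = "".join(
    f"<tr><td>Item {i}</td><td>{i * 3} in stock</td></tr>\n" for i in range(5000)
)
page = f"<html><body><table>\n{rows}</table></body></html>".encode("utf-8")
link_speed = 5_000_000  # 5 Mbit/s

compressed = gzip.compress(page)
plain_time = transfer_seconds(len(page), link_speed)
gzip_time = transfer_seconds(len(compressed), link_speed)

print(f"uncompressed: {len(page)} bytes, about {plain_time * 1000:.0f} ms")
print(f"gzip:         {len(compressed)} bytes, about {gzip_time * 1000:.0f} ms")
```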
Given the nature of gzip’s lossless compression, one popular topic in computer science comes up: Huffman coding. After gzip identifies repetition in the content being transmitted (through the LZ77 algorithm), a Huffman code (tree) is built to compress the data further. Consider the small string “Hello, world! Hello, bunny.net!”:
First, the LZ77 algorithm attempts to link repeated occurrences of a sequence back to a single earlier occurrence. Here, the second “Hello” becomes a reference that points back to the first.
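The back-reference idea can be sketched with a toy LZ77 tokenizer. This is a brute-force illustration, not gzip’s implementation: DEFLATE limits references to a 32 KB sliding window and finds matches far more efficiently. The function names are hypothetical.

```python
def lz77_tokens(text: str, min_match: int = 3) -> list:
    """Greedy toy LZ77: emit literal characters or (distance, length) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_len, best_dist = 0, 0
        # Try every earlier position and keep the longest match found.
        for start in range(pos):
            length = 0
            while (pos + length < len(text)
                   and text[start + length] == text[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # "go back N, copy M"
            pos += best_len
        else:
            tokens.append(text[pos])  # too short to be worth a reference
            pos += 1
    return tokens


def lz77_decode(tokens: list) -> str:
    """Rebuild the text by replaying literals and back-references."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])  # copy one character at a time
        else:
            out.append(token)
    return "".join(out)


text = "Hello, world! Hello, bunny.net!"
tokens = lz77_tokens(text)
print(tokens)
```

For this string, the only reference emitted is `(14, 7)`: go back 14 characters and copy 7, which reproduces the second “Hello, ” (including the comma and space). Everything else stays as literal characters.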
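Next, gzip’s final processing step (Huffman coding) can be applied: symbols that occur often are assigned shorter bit patterns than symbols that occur rarely. Thinking of this as one code per character is merely a visualization; it is not an accurate representation of how gzip compresses, and individual characters won’t be represented in this way.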
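In the same spirit as that visualization, here is a minimal Huffman coding sketch over individual characters. It is a teaching example rather than gzip’s scheme: DEFLATE applies Huffman codes to literal/length and distance symbols produced by LZ77. The function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict:
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "Hello, world! Hello, bunny.net!"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(text) * 8, "bits as 8-bit characters")
print(len(encoded), "bits with this Huffman code")
```

Because no code is a prefix of another, the bit string can be decoded unambiguously by walking it from left to right.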
With that complete, the data can be sent to a client as a binary stream. The client, or browser, then reverses the process: it decodes the stream using the Huffman tree, then resolves the LZ77 references, which yields the original, untouched content (HTML/images/videos/etc.).
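The client side of this exchange can be sketched with Python’s `zlib` module, which decodes a gzip stream incrementally as chunks arrive. The payload and the 64-byte chunk size are assumptions for illustration; a real browser does this inside its networking stack.

```python
import gzip
import zlib

# Hypothetical response body; gzip.compress stands in for the server.
original = b"<p>Hello, world! Hello, bunny.net!</p>" * 100
wire_bytes = gzip.compress(original)


def decode_in_chunks(stream: bytes, chunk_size: int = 64) -> bytes:
    """Decompress a gzip stream piece by piece, as data would arrive."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(zlib.MAX_WBITS | 16)
    parts = []
    for start in range(0, len(stream), chunk_size):
        parts.append(decoder.decompress(stream[start:start + chunk_size]))
    parts.append(decoder.flush())
    return b"".join(parts)


restored = decode_in_chunks(wire_bytes)
print(restored == original)  # True: lossless, so the bytes match exactly
```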
Given the speed of GZIP’s compression algorithms, it becomes clear that it was designed both to run on virtually any client or server and to provide an acceptable level of compression for static and dynamic content (live streams, etc.). In essence, GZIP works well with all types of content, while technologies such as bz2, xz, and Brotli work best with static content that does not change (e.g. videos, images, CSS, static HTML pages, and JS).
While support for GNU Zip on the web is slowly trending downward, it still has many uses that will keep it around for years to come. Even with newer compression technologies, the fundamental limitations of lossless compression mean that compression ratio will always be a trade-off against server-side and client-side processing time. Brotli, for example, performs nearly the same as GZIP for client-side decompression but can take orders of magnitude longer to compress server-side, which makes it realistic only for content that is compressed once and then cached for subsequent requests.
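The compress-once-then-cache pattern can be sketched as follows. Since Brotli is not in Python’s standard library, gzip’s own compression levels stand in for the slow-versus-fast trade-off here. The cache, function names, and asset path are hypothetical.

```python
import gzip

# Hypothetical in-memory cache, keyed by asset path.
_compressed_cache = {}
compress_calls = 0


def get_compressed_static(path: str, body: bytes, level: int = 9) -> bytes:
    """Compress a static asset once at a slow, high level and reuse the result."""
    global compress_calls
    if path not in _compressed_cache:
        compress_calls += 1
        _compressed_cache[path] = gzip.compress(body, compresslevel=level)
    return _compressed_cache[path]


def compress_dynamic(body: bytes) -> bytes:
    """Per-request content cannot be cached, so favor a fast, low level."""
    return gzip.compress(body, compresslevel=1)


stylesheet = b"body { margin: 0; padding: 0; }\n" * 200
first = get_compressed_static("/static/site.css", stylesheet)
second = get_compressed_static("/static/site.css", stylesheet)
print("compression runs for two requests:", compress_calls)
```

The expensive compression cost is paid once per asset rather than once per request, which is what makes slower, denser settings practical for static content.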
Compression involves running an algorithm to make a file/image/etc. smaller. There are two modes of compression: lossy and lossless.
GZIP stands for GNU Zip.