Linux users: Know thy compression utilities

Linux offers a specific compression utility for almost any job. But don't get scared off by the number of Linux utilities for this task. Learn the basics to see how flexible and simple Linux compression can be.

When asked about file compression, most administrators envision the likes of WinZip or PKZIP, both proprietary tools from the DOS and Windows world. But when those same administrators hear the names bzip2, bunzip2, gzip, gunzip, unzip, and zip, their heads begin to spin.

The head spinning is needless, however, when you consider that Linux's compression offerings are relatively simple to use and much more flexible than their Windows counterparts. The Linux command-line compression utilities each handle compression/decompression differently, making them ideal for specific jobs. In this Daily Feature, I will show you how to use the different Linux compression utilities. I'll also explain which tools are best for which job.

bzip2 and bunzip2

The bzip2 and bunzip2 compression utilities are both built on the Burrows-Wheeler Transform (BWT) algorithm. The BWT takes a block of data and rearranges it using a specific sorting algorithm. The resulting output block contains exactly the same data elements with which it started; the only difference between the transformed block and the original block is that the data has been placed in a different order, one that tends to group similar characters together so that the coding stage that follows (bzip2 uses Huffman coding) can shrink it. The transformation is completely reversible with zero loss of integrity. The biggest difference between the BWT method and the more popular methods is that BWT acts on an entire block of data at once, whereas most other compression utilities act on data in streaming mode (one byte or a few bytes at a time).
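To get a feel for what the BWT does, here is a toy sketch in shell. It is not how bzip2 implements the transform; it simply assumes a short sample string, banana, with a $ appended as an end marker (both are my own illustrative choices). The script lists every rotation of the string, sorts the rotations, and keeps the last character of each one.

```shell
# Toy Burrows-Wheeler Transform of a short string (illustration only).
s='banana$'   # '$' is an assumed end marker that sorts before the letters

# 1. Print every rotation of the string, one per line.
rotations=$(awk -v s="$s" 'BEGIN {
  n = length(s)
  for (i = 1; i <= n; i++) print substr(s, i) substr(s, 1, i - 1)
}')

# 2. Sort the rotations bytewise.
sorted=$(printf '%s\n' "$rotations" | LC_ALL=C sort)

# 3. Keep the last character of each sorted rotation.
bwt=$(printf '%s\n' "$sorted" | awk '{ printf "%s", substr($0, length($0)) }')

echo "$bwt"   # prints: annb$aa
```

The output holds the same seven characters as the input, only reordered so that identical letters sit next to each other (nn, aa). Runs like these are what the later coding stages can squeeze down.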

Since the BWT handles its process in memory, the block of data is limited in size, which is the main drawback of the BWT algorithm. bzip2 works on blocks of 100 KB to 900 KB, selected with the switches -1 through -9; the smaller the block, the less memory is needed. If the data goes beyond the limits of one block, it must be broken into pieces that are compressed separately.
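The block size can be chosen on the command line: bzip2 accepts -1 (100 KB blocks) through -9 (900 KB blocks, the default). The following sketch compresses the same data both ways so you can compare the results. The scratch directory and the sample data generated with seq are assumptions for illustration only.

```shell
# Sketch: compress the same sample data with the smallest and largest block sizes.
workdir=$(mktemp -d)                      # scratch directory (assumed location)
seq 1 200000 > "$workdir/numbers.txt"     # roughly 1.3 MB of sample text

bzip2 -1 -c "$workdir/numbers.txt" > "$workdir/blocks_100k.bz2"   # 100 KB blocks
bzip2 -9 -c "$workdir/numbers.txt" > "$workdir/blocks_900k.bz2"   # 900 KB blocks

# Show the size in bytes of the original and of both compressed files.
wc -c "$workdir/numbers.txt" "$workdir/blocks_100k.bz2" "$workdir/blocks_900k.bz2"
```

The -c switch writes the compressed data to standard output, which leaves the original file in place for the second run.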

Because of this limitation, bzip2 and bunzip2 are best suited for small to midsize blocks of data in need of compressing. Instead of using these tools for the compression of hard drive images, databases, or source code, use bzip2 and bunzip2 for images, e-mail attachments, and smaller compression needs where data integrity is critical.

bzip2 and bunzip2 usage

Using the bzip2 and bunzip2 utilities is as simple as using any other command-line tool. There are switches to use with the main command but typical usage will be without switches.

The most important thing to remember is that bzip2 compresses and bunzip2 decompresses. If you have a file named todays_payroll and you need this file compressed with bzip2, run the command bzip2 todays_payroll, which will replace it with the file todays_payroll.bz2. To decompress the new file, run the command bunzip2 todays_payroll.bz2, and the original file will appear intact.
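A quick round trip might look like the sketch below. The scratch directory, the sample payroll contents, and the .orig reference copy are assumptions made up for the example.

```shell
# Sketch: compress a file with bzip2, then restore it with bunzip2.
workdir=$(mktemp -d)                       # scratch directory (assumed location)
cd "$workdir"
printf 'employee,hours\nalice,40\nbob,38\n' > todays_payroll
cp todays_payroll todays_payroll.orig      # reference copy for comparison

bzip2 todays_payroll                       # creates todays_payroll.bz2, removes the original
ls                                         # shows todays_payroll.bz2 and todays_payroll.orig

bunzip2 todays_payroll.bz2                 # restores todays_payroll
cmp todays_payroll todays_payroll.orig && echo "round trip OK"
```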

The bzip2recover utility (part of the bzip2 package) can recover data from a .bz2 file damaged by a transmission error or bad media. It is only useful on larger .bz2 files, because the larger the file, the more independently recoverable blocks it contains. To attempt recovery, run the command bzip2recover file_name. Each extracted block is written to its own file, whose name starts with rec followed by the block number (rec00001 for the first block, and so on).
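The mechanics of bzip2recover can be tried out safely on an undamaged file, as in this sketch. The sample data and file names are assumptions; with a genuinely damaged archive, some of the extracted pieces would fail to decompress and would have to be left out.

```shell
# Sketch: split a multi-block .bz2 file into its blocks and reassemble them.
workdir=$(mktemp -d)               # scratch directory (assumed location)
cd "$workdir"
seq 1 200000 > data.txt            # roughly 1.3 MB of sample text
bzip2 -1 -k data.txt               # -1: 100 KB blocks, so the file holds several; -k keeps data.txt

bzip2recover data.txt.bz2          # writes one rec<number>data.txt.bz2 file per block

# Decompress the pieces in order and glue the output back together.
bzip2 -dc rec*data.txt.bz2 > recovered.txt
cmp data.txt recovered.txt && echo "all blocks recovered"
```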

gzip and gunzip

Unlike bzip2 and bunzip2, the gzip compression utilities use Lempel-Ziv coding (LZ77). This compression technique looks for strings that have already appeared earlier in the data and replaces each later occurrence with a short reference back to the first appearance. The algorithm is complex, and on short inputs it doesn't offer an enormous upside in file size reduction. A short test string, abaabaaabbabb, that I encoded using a Lempel-Ziv scheme came out as 0a0b1a2b1ab45, which is about the same length as the input.

I compressed a 34-MB file with bzip2 down to 11 MB; gzip compressed the same file to 12 MB but needed only about half the time. Remember: bzip2 has to rearrange blocks in such a way as to make the overall file smaller; gzip simply makes each string smaller by replacement.
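You can run a comparison like this on your own data. The sketch below uses generated sample text (an assumption; your results will depend heavily on the kind of file) and prints the resulting sizes. To compare speed as well, prefix each compression command with time in a shell that supports it.

```shell
# Sketch: compress the same file with gzip and bzip2 and compare sizes.
workdir=$(mktemp -d)                         # scratch directory (assumed location)
seq 1 500000 > "$workdir/sample.txt"         # generated sample data

gzip  -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"
bzip2 -c "$workdir/sample.txt" > "$workdir/sample.txt.bz2"

orig_size=$(wc -c < "$workdir/sample.txt"     | tr -d ' ')
gz_size=$(wc -c   < "$workdir/sample.txt.gz"  | tr -d ' ')
bz2_size=$(wc -c  < "$workdir/sample.txt.bz2" | tr -d ' ')

echo "original: $orig_size bytes"
echo "gzip:     $gz_size bytes"
echo "bzip2:    $bz2_size bytes"
```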

Because gzip doesn't have quite the compression ratio of bzip2, yet is able to compress much faster, gzip is best suited for on-the-fly compression where size is not an issue. Other than speed, gzip holds one other benefit over bzip2: it is able to work with multiple formats. Where bzip2 is only able to handle its own .bz2 format, gunzip can decompress .gz, .Z, and .tgz files, as well as .zip archives that contain a single file.

gzip and gunzip usage

Using the gzip tools is very similar to using bzip2. The syntax of the compression command is gzip file_name, which will result in a compressed file named file_name.gz. The decompression can be done with either gzip -d file_name.gz (note the lowercase d) or gunzip file_name.gz.
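Here is a short sketch of both ways to decompress, using lowercase -d for gzip. The scratch directory, sample contents, and .orig reference copy are assumptions for the example.

```shell
# Sketch: gzip round trip, once with gzip -d and once with gunzip.
workdir=$(mktemp -d)              # scratch directory (assumed location)
cd "$workdir"
printf 'line one\nline two\n' > file_name
cp file_name file_name.orig       # reference copy for comparison

gzip file_name                    # creates file_name.gz and removes file_name
gzip -d file_name.gz              # lowercase -d decompresses; file_name is back
cmp file_name file_name.orig && echo "gzip -d round trip OK"

gzip file_name                    # compress again
gunzip file_name.gz               # gunzip does the same job as gzip -d
cmp file_name file_name.orig && echo "gunzip round trip OK"
```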

Both gzip and gunzip have a number of switches that can be passed to the command. The three most useful switches are:

  • -N: This always saves the original file name and time stamp.
  • -r: This recursively compresses a directory.
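  • -c: This writes the compressed output to standard output and leaves the original file untouched, which makes it possible to concatenate several files into one .gz file.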
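The -r and -N switches can be tried out as in the following sketch. The directory layout and file names are invented for the example. Note that -r compresses each file individually; it does not bundle the directory into a single archive.

```shell
# Sketch: recursive compression with -r, and name restoration with -N.
workdir=$(mktemp -d)               # scratch directory (assumed location)
cd "$workdir"
mkdir -p reports/archive
printf 'january\n'  > reports/jan.txt
printf 'december\n' > reports/archive/dec.txt

gzip -r reports                    # compresses every file under reports/ separately
find reports -type f               # each file now carries a .gz extension

# gzip stores the original name in the .gz file; gunzip -N restores it.
mv reports/jan.txt.gz reports/renamed.gz
gunzip -N reports/renamed.gz       # recreates reports/jan.txt, not reports/renamed
```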

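The -c switch must be used with caution, because the shell redirection decides what happens to the target file. Concatenating two files requires two steps:
  1. gzip -c file1 > file.gz
  2. gzip -c file2 >> file.gz

Note that in step 2, the second greater-than sign indicates that the compressed file2 is to be appended to file.gz; a single greater-than sign would overwrite it. Decompressing file.gz then produces the contents of file1 followed by the contents of file2 as a single file.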
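The two steps can be checked with a sketch like this one, which uses two tiny sample files (made up for the example) and prints the decompressed result.

```shell
# Sketch: concatenate two compressed files with gzip -c and shell redirection.
workdir=$(mktemp -d)               # scratch directory (assumed location)
cd "$workdir"
printf 'first file\n'  > file1
printf 'second file\n' > file2

gzip -c file1 >  file.gz           # ">" creates (or overwrites) file.gz
gzip -c file2 >> file.gz           # ">>" appends to file.gz

gunzip -c file.gz                  # prints "first file" then "second file"
```

Because -c writes to standard output, file1 and file2 themselves are left uncompressed on disk.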

zip and unzip

Compatible with PKZIP and with the zip utilities on MS-DOS and Windows NT, zip behaves much like its counterparts on those platforms. The one aspect of zip that makes it a bit more compelling to use is its flexibility. Not only is zip a compression utility, it is also an archiving utility that can encrypt using passwords.

The main reason to use the zip and unzip utilities is for cross-platform compatibility. A .zip file compressed with WinZip can be decompressed with the Linux unzip utility (and vice versa). The compression of zip is nearly identical to that of gzip.

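Say you have a directory, /var/log, that you want to compress and password protect. To do this, run the command zip -er log /var/log, which will prompt for a password and result in the file log.zip. The -r switch is needed so that zip descends into the directory; without it, only the directory entry itself is stored, not the files inside it.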

zip and unzip usage

Basic usage of these tools, as shown above, is relatively simple. There are, of course, many options that can be applied to both zip and unzip. The most useful switches for zip include:

  • -b: This switch dictates where zip writes its temporary file while it builds the archive; the finished archive still ends up at the name you specify.
  • -e: This switch encrypts the archive with a password.
  • -f: This switch freshens the archive: it replaces a file already in the archive only if the copy on disk is more recent, and it does not add new files.
  • -r: This switch travels the directory structure recursively, which will compress all files within the directory.
  • -T: This switch tests the integrity of a specified zip file.

There are many more switches that can be seen in the zip man page. (Simply type the command man zip to see this page.)

Remember their uses

When deciding which utility to use, remember that each one is best suited for specific jobs. If you have small to midsize files where data integrity is critical, use bzip2. For larger files and on-the-fly compression, gzip is the tool for the job. Finally, for cross-platform compatibility, use zip.

Three different tools, three different uses. This just goes to show that Linux is nothing if not flexible.

TechRepublic is the online community and information resource for all IT professionals, from support staff to executives. We offer in-depth technical articles written for IT professionals by IT professionals. In addition to articles on everything from Windows to e-mail to fire walls, we offer IT industry analysis, downloads, management tips, discussion forums, and e-newsletters.

©2001 TechRepublic, Inc.
