XZ Utils

XZ Utils (previously LZMA Utils) is a set of free software command-line lossless data compressors, including the programs lzma and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows. For compression/decompression the Lempel–Ziv–Markov chain algorithm (LZMA) is used. XZ Utils started as a Unix port of Igor Pavlov's LZMA-SDK that has been adapted to fit seamlessly into Unix environments and their usual structure and behavior.

XZ Utils
Original author	Lasse Collin
Developer	The Tukaani Project

Stable release	5.8.2 / 17 December 2025; 4 months ago (2025-12-17)

Written in	C
Operating system	Cross-platform
Type	Data compression
License	Public domain; some components GPL^[1]
Website	tukaani.org/xz/
Repository	github.com/tukaani-project/xz

Features

XZ Utils can compress and decompress the xz and lzma file formats. Because the LZMA format is considered legacy,^[2] XZ Utils compresses to xz by default. In addition, decompression of the .lz format used by lzip has been supported since version 5.3.4.^[3]
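
The two container formats can be illustrated with Python's standard-library lzma module, which is a binding to liblzma. This is a minimal sketch rather than a description of how the xz tool itself is implemented; the sample payload is a made-up value.

```python
import lzma

# Hypothetical sample payload.
data = b"example payload " * 64

# Default container: .xz
xz_bytes = lzma.compress(data, format=lzma.FORMAT_XZ)

# Legacy container: .lzma (called FORMAT_ALONE in the module).
lzma_bytes = lzma.compress(data, format=lzma.FORMAT_ALONE)

# Decompression auto-detects the container by default (FORMAT_AUTO).
assert lzma.decompress(xz_bytes) == data
assert lzma.decompress(lzma_bytes) == data

# Only the .xz container starts with the six-byte xz magic number.
print(xz_bytes[:6].hex(" "))  # fd 37 7a 58 5a 00
```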

In most cases, xz achieves higher compression ratios than alternatives like zip,^[4] gzip and bzip2. Decompression is faster than bzip2 but slower than gzip. Compression can be much slower than gzip, and is slower than bzip2 at high compression levels, so xz is most useful when a compressed file will be used many times.^[5]^[6]
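
A rough way to observe such differences is to compress the same input with the DEFLATE, bzip2 and LZMA bindings in Python's standard library. The sketch below only reports sizes for one hypothetical, highly repetitive input; real results depend strongly on the data and settings, and it does not measure speed.

```python
import bz2
import lzma
import zlib

# Hypothetical sample input: repetitive text, which flatters every compressor.
data = b"The quick brown fox jumps over the lazy dog. " * 2000

# Each library is used with its default compression level.
sizes = {
    "zlib (DEFLATE, as used by gzip)": len(zlib.compress(data)),
    "bz2": len(bz2.compress(data)),
    "lzma (xz container)": len(lzma.compress(data)),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes from {len(data)}")
```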

XZ Utils consists of two major components:

xz, the command-line compressor and decompressor (analogous to gzip)
liblzma, a software library with an API similar to zlib

Various command shortcuts exist, such as lzma (for xz --format=lzma), unxz (for xz --decompress; analogous to gunzip) and xzcat (for unxz --stdout; analogous to zcat).

Usage

Both the behavior of the software and the properties of the file format have been designed to work similarly to those of the popular Unix compression tools gzip and bzip2.

Just like gzip and bzip2, xz and lzma can only compress single files (or data streams) as input. They cannot bundle multiple files into a single archive; to do this, an archiving program such as tar is used first.

Compressing an archive:

xz   my_archive.tar    # results in my_archive.tar.xz
lzma my_archive.tar    # results in my_archive.tar.lzma

Decompressing the archive:

unxz    my_archive.tar.xz      # results in my_archive.tar
unlzma  my_archive.tar.lzma    # results in my_archive.tar

Version 1.22 or greater of the GNU implementation of tar has transparent support for tarballs compressed with lzma and xz, using the switches --xz or -J for xz compression, and --lzma for LZMA compression.

Creating an archive and compressing it:

tar -c --xz   -f my_archive.tar.xz   /some_directory    # results in my_archive.tar.xz
tar -c --lzma -f my_archive.tar.lzma /some_directory    # results in my_archive.tar.lzma

Decompressing the archive and extracting its contents:

tar -x --xz   -f my_archive.tar.xz      # results in /some_directory
tar -x --lzma -f my_archive.tar.lzma    # results in /some_directory

The same operations with single-letter tar options and the short .txz suffix:

tar cJf keep.txz keep   # archive then compress the directory ./keep/ into the file ./keep.txz
tar xJf keep.txz        # decompress then extract the file ./keep.txz creating the directory ./keep/

xz has supported multi-threaded compression (with the -T flag)^[7] since version 5.2.0, released in 2014;^[3] threaded decompression was added in version 5.4.0. Threaded decompression requires multiple compressed blocks within a stream, which are created by the threaded compression interface. Fewer threads than requested may be used if the file is not big enough for threading with the given settings or if using more threads would exceed the memory usage limit.^[7]

File format

xz (file format)
Filename extension	`.xz`
Internet media type	application/x-xz
Magic number	`FD 37 7A 58 5A 00`
Developed by	Lasse Collin, Igor Pavlov
Initial release	14 January 2009; 17 years ago (2009-01-14)
Latest release	1.2.1 / 8 April 2024; 2 years ago (2024-04-08)
Type of format	Data compression
Open format?	Yes
Free format?	Yes
Website	tukaani.org/xz/format.html

An xz file is a sequence of one or more streams. There may be null bytes (padding) after each stream.
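
The multi-stream property can be observed with Python's lzma module: two independently compressed streams, joined byte for byte, still decompress as one file. This is a small sketch with made-up payloads; it does not exercise stream padding.

```python
import lzma

# Each call produces one complete stream.
first = lzma.compress(b"first stream\n")
second = lzma.compress(b"second stream\n")

# Joining the streams gives a valid multi-stream file,
# much like concatenating two .xz files on disk.
combined = first + second

print(lzma.decompress(combined))  # b'first stream\nsecond stream\n'
```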

The xz format improves on lzma by allowing for preprocessing filters (BCJ and delta). The exact filters used are similar to those used in 7z, as 7z's filters are available in the public domain via the LZMA SDK. xz's RISC-V BCJ filter is its own addition.
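
As an illustration of a preprocessing filter, the sketch below builds a filter chain of a delta filter followed by LZMA2 using Python's lzma module. The sample data and the delta distance of 2 bytes are assumptions chosen for the example, not recommended settings.

```python
import lzma
import struct

# Hypothetical sample: 16-bit little-endian values forming a slow ramp.
samples = struct.pack("<2000h", *range(2000))

# Filter chain: delta preprocessing over a 2-byte distance, then LZMA2.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 2},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

packed = lzma.compress(samples, format=lzma.FORMAT_XZ, filters=filters)

# The chain is recorded in the file, so decoding needs no filter options.
assert lzma.decompress(packed) == samples
print(len(samples), "->", len(packed), "bytes")
```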

Meta research from 2016 shows that, among lz4, zstd, zlib, and xz, xz has the highest compression ratio and the slowest compression and decompression.^[8]

The author of lzip claims that the xz format is inadequate for general use due to its complexity and extensibility, saying that it puts xz-compressed data at an increased risk of undetected corruption.^[9]

Endianness is little-endian.^[10]

Stream structure

Offset (bytes)	Field	Size (bytes)	Description
0	Header magic number	6	Magic number. Must be FD 37 7A 58 5A 00.
6	Flags	2	Flags. The first byte and the four most significant bits of the second byte must be zero (reserved for future use). The type of check (last field in the block structure) is encoded in the four least significant bits of the second byte; see the check type values below.
8	Header CRC32	4	CRC-32 of the flags field. Used to distinguish between a corrupted file and unsupported flags (i.e. non-zero reserved bit).
12	Blocks	Varies	Sequence of zero or more blocks.
Varies	Index	Varies	See index below.
Varies	Footer CRC32	4	CRC-32 of the flags and backward size.
Varies	Backward Size	4	Size of the index field.
Varies	Flags	2	Copy of the flags field above.
Varies	Footer magic number	2	Magic number. Must be 59 5A.

Check type values (four least significant bits of the second flags byte):

Value	Size (bytes)	Description
0	0	None
1	4	CRC-32
2	4	Reserved
3	4	Reserved
4	8	CRC-64
5	8	Reserved
6	8	Reserved
7	16	Reserved
8	16	Reserved
9	16	Reserved
10	32	SHA-256
11	32	Reserved
12	32	Reserved
13	64	Reserved
14	64	Reserved
15	64	Reserved
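
The header and footer fields can be checked with a few lines of Python. The following is a minimal sketch, assuming the input holds exactly one stream with no trailing padding; the function name parse_stream_ends is hypothetical. It also assumes, following the format specification rather than the table above, that the backward size is stored as a count of four-byte units minus one.

```python
import lzma
import zlib

HEADER_MAGIC = bytes.fromhex("fd377a585a00")
FOOTER_MAGIC = bytes.fromhex("595a")

def parse_stream_ends(data):
    """Validate the 12-byte header and 12-byte footer of a single stream."""
    header, footer = data[:12], data[-12:]

    if header[:6] != HEADER_MAGIC or footer[10:] != FOOTER_MAGIC:
        raise ValueError("bad magic number")

    flags = header[6:8]
    if zlib.crc32(flags) != int.from_bytes(header[8:12], "little"):
        raise ValueError("header CRC-32 mismatch")
    if flags[0] != 0 or flags[1] & 0xF0:
        raise ValueError("reserved flag bits are set")

    # Footer layout: CRC-32 (4), backward size (4), flags (2), magic (2).
    if zlib.crc32(footer[4:10]) != int.from_bytes(footer[:4], "little"):
        raise ValueError("footer CRC-32 mismatch")
    if footer[8:10] != flags:
        raise ValueError("footer flags differ from header flags")

    # Assumption from the format specification: the stored value counts
    # four-byte units minus one, not plain bytes.
    stored = int.from_bytes(footer[4:8], "little")
    return {"check_id": flags[1] & 0x0F, "index_size": (stored + 1) * 4}

data = lzma.compress(b"hello", check=lzma.CHECK_CRC32)
info = parse_stream_ends(data)
print(info["check_id"])  # 1, the value for CRC-32 in the table above
```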

Block structure

Offset (bytes)	Field	Size (bytes)	Description
0	Header size	1	Size of the header. Note: `real_header_size = (encoded_header_size + 1) * 4`.
1	Flags	1	Flags. Bits 0–1: Number of filters (1–4). Bits 2–5: Must be zero. Bit 6: Compressed size field is present. Bit 7: Uncompressed size field is present.
2	Compressed size	0 or varies	Size of the compressed data. Present if bit 6 of the flags is set. Encoded as a variable-length integer.
Varies	Uncompressed size	0 or varies	Size of the block after decompression. Present if bit 7 of the flags is set. Encoded as a variable-length integer.
Varies	Filter flags	Varies	Sequence of filter flags. The amount is encoded in bits 0–1 of the flags.
Varies	Header padding	Varies	As many null bytes as needed to make the header (i.e. fields before the compressed data) have the size specified in the header size field.
Varies	CRC32	4	CRC-32 of all bytes in the block up to (not including) this field.
Varies	Compressed data	Varies	The compressed data.
Varies	Block padding	0, 1, 2 or 3	0–3 null bytes to make the size of the block a multiple of 4.
Varies	Check	0, 4, 8, or 32	Error-detecting mechanism calculated from the data before compression. The type of check is encoded in the flags of the stream structure.
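
The header size formula and the flag bits can be exercised with a short Python sketch. It reads only the fixed part of the first block header, assuming the block directly follows the 12-byte stream header; the function name parse_block_header is hypothetical, and the optional size fields and filter flags are not decoded.

```python
import lzma
import zlib

def parse_block_header(data, offset=12):
    """Read the fixed part of the block header that starts at offset."""
    encoded_size = data[offset]
    if encoded_size == 0:
        raise ValueError("this is the index, not a block")

    # real_header_size = (encoded_header_size + 1) * 4
    real_size = (encoded_size + 1) * 4
    header = data[offset:offset + real_size]

    # The last four header bytes hold the CRC-32 of everything before them.
    if zlib.crc32(header[:-4]) != int.from_bytes(header[-4:], "little"):
        raise ValueError("block header CRC-32 mismatch")

    flags = header[1]
    if flags & 0x3C:  # bits 2-5 must be zero
        raise ValueError("reserved flag bits are set")

    return {
        "header_size": real_size,
        "filter_count": (flags & 0x03) + 1,  # two bits encode 1 to 4
        "has_compressed_size": bool(flags & 0x40),    # bit 6
        "has_uncompressed_size": bool(flags & 0x80),  # bit 7
    }

block_info = parse_block_header(lzma.compress(b"hello"))
print(block_info)
```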

Index structure

Offset (bytes)	Field	Size (bytes)	Description
0	Index indicator	1	Must be zero to distinguish the index from a block, because this field overlaps with the first field in the block structure.
1	Number of records	Varies	Number of records in the next field. Must be the same as the number of blocks in the stream. Encoded as a variable-length integer.
Varies	Records	Varies	Sequence of records. Each record contains two variable-length integers: Unpadded size: size of the block excluding the padding field. Uncompressed size: size in bytes of the uncompressed block.
Varies	Padding	0, 1, 2 or 3	0–3 null bytes to make the size of the index a multiple of 4.
Varies	CRC32	4	CRC-32 of all bytes in the index except this field.
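
Reading the index of a small file shows how the records relate to the blocks. The sketch below assumes a single stream without trailing padding and locates the index through the footer's backward size, taking (stored value + 1) × 4 as its size in bytes per the format specification. The helper names read_varint and parse_index are hypothetical.

```python
import lzma
import zlib

def read_varint(buf, pos):
    """Decode one variable-length integer; return (value, next position)."""
    value = 0
    for i in range(9):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (i * 7)
        if byte & 0x80 == 0:
            return value, pos + i + 1
    raise ValueError("variable-length integer is longer than 9 bytes")

def parse_index(index):
    """Return the (unpadded size, uncompressed size) pairs of an index."""
    if index[0] != 0:
        raise ValueError("index indicator must be zero")
    if zlib.crc32(index[:-4]) != int.from_bytes(index[-4:], "little"):
        raise ValueError("index CRC-32 mismatch")

    count, pos = read_varint(index, 1)
    records = []
    for _ in range(count):
        unpadded, pos = read_varint(index, pos)
        uncompressed, pos = read_varint(index, pos)
        records.append((unpadded, uncompressed))
    return records

# Hypothetical payload of 1200 bytes.
payload = b"hello world\n" * 100
data = lzma.compress(payload)

# The backward size sits in bytes 4-7 of the 12-byte footer.
index_size = (int.from_bytes(data[-8:-4], "little") + 1) * 4
records = parse_index(data[-12 - index_size:-12])
print(records)
```

For this input the single record's uncompressed size equals the payload length of 1200 bytes.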

Variable-length integer

Values from 0 to 127 are stored as is, in one byte. Values greater than 127 (up to 2^63 − 1) are stored in two to nine bytes, seven bits per byte with the least significant bits first. All bytes except the last one have the most significant bit set.^[11]

The following Python code implements functions to encode and decode a variable-length integer.

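def encode(num):
    """Encode an integer in the range 0 to 2**63 - 1 as a variable-length integer."""
    if num < 0 or num >= 2**63:
        raise ValueError("num must be non-negative and have at most 63 bits")

    buf = bytearray()

    # Emit seven bits at a time, least significant group first.
    # The most significant bit marks every byte except the last.
    while num >= 0x80:
        buf.append(0x80 | (num & 0x7F))
        num >>= 7

    buf.append(num)

    return bytes(buf)

def decode(buf):
    """Decode the variable-length integer at the start of buf."""
    if len(buf) == 0:
        raise ValueError("buf must not be empty")

    num = 0

    for i, byte in enumerate(buf):
        if i > 8:
            raise ValueError("num must not have more than 63 bits")
        num |= (byte & 0x7F) << (i * 7)
        if byte & 0x80 == 0:
            return num

    # Every byte had its most significant bit set: the integer is truncated.
    raise ValueError("buf ends before the last byte of the integer")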

Development and adoption

Development of XZ Utils took place within the Tukaani Project, a small group of developers who once maintained a Linux distribution based on Slackware. The name "XZ" is not an abbreviation; it appears to have been chosen arbitrarily, as the official specification does not give it any meaning.^[12] The .xz file format specification version 1.0.0 was officially released in January 2009.^[13]

All of the source code for xz and liblzma has been released into the public domain. The XZ Utils source distribution additionally includes some optional scripts and an example program that are subject to various versions of the GNU General Public License (GPL).^[1] The resulting xz and liblzma binaries are public domain, unless the optional LGPL getopt implementation is incorporated.^[14]

Binaries are available for FreeBSD, NetBSD, Linux systems, Microsoft Windows, and FreeDOS. A number of Linux distributions, including Fedora, Slackware, Ubuntu, and Debian, use xz for compressing their software packages. Arch Linux previously used xz to compress packages,^[15] but as of 27 December 2019, packages are compressed with Zstandard.^[16] Fedora Linux also switched to compressing its RPM packages with Zstandard with Fedora Linux 31.^[17] The GNU FTP archive also uses xz.

Backdoor incident

On 29 March 2024, Andres Freund, a PostgreSQL developer working at Microsoft, announced that he had found a backdoor in XZ Utils, impacting versions 5.6.0 and 5.6.1. Malicious code for setting up the backdoor had been hidden in compressed test files, and the configure script in the tar files was modified to trigger the hidden code. Freund started his investigation "After observing a few odd symptoms around liblzma (part of the xz package)"; specifically that ssh logins using sshd were "taking a lot of CPU" and producing valgrind errors.^[18] The vulnerability received a Common Vulnerability Scoring System (CVSS) score of 10 (the highest).^[19]
