| LZMA SDK 4.57 |
| ------------- |
| |
| LZMA SDK Copyright (C) 1999-2007 Igor Pavlov |
| |
| LZMA SDK provides the documentation, samples, header files, libraries, |
| and tools you need to develop applications that use LZMA compression. |
| |
| LZMA is the default and general compression method of the 7z format |
| in the 7-Zip compression program (www.7-zip.org). LZMA provides a high |
| compression ratio and very fast decompression. |
| |
| LZMA is an improved version of the well-known LZ77 compression algorithm. |
| It was designed to maximize the compression ratio while keeping |
| decompression fast and the memory requirements for decompression low. |
| |
| |
| |
| LICENSE |
| ------- |
| |
| LZMA SDK is available under any of the following licenses: |
| |
| 1) GNU Lesser General Public License (GNU LGPL) |
| 2) Common Public License (CPL) |
| 3) Simplified license for unmodified code (read SPECIAL EXCEPTION) |
| 4) Proprietary license |
| |
| This means that you can select one of these four options and follow the rules of that license. |
| |
| |
| 1,2) GNU LGPL and CPL licenses are pretty similar and both these |
| licenses are classified as |
| - "Free software licenses" at http://www.gnu.org/ |
| - "OSI-approved" at http://www.opensource.org/ |
| |
| |
| 3) SPECIAL EXCEPTION |
| |
| Igor Pavlov, as the author of this code, expressly permits you |
| to statically or dynamically link your code (or bind by name) |
| to the files from LZMA SDK without subjecting your linked |
| code to the terms of the CPL or GNU LGPL. |
| Any modifications or additions to files from LZMA SDK, however, |
| are subject to the GNU LGPL or CPL terms. |
| |
| SPECIAL EXCEPTION allows you to use LZMA SDK in applications with closed code, |
| while you keep LZMA SDK code unmodified. |
| |
| |
| SPECIAL EXCEPTION #2: Igor Pavlov, as the author of this code, expressly permits |
| you to use this code under the same terms and conditions contained in the License |
| Agreement you have for any previous version of LZMA SDK developed by Igor Pavlov. |
| |
| SPECIAL EXCEPTION #2 allows owners of proprietary licenses to use latest version |
| of LZMA SDK as update for previous versions. |
| |
| |
| SPECIAL EXCEPTION #3: Igor Pavlov, as the author of this code, expressly permits |
| you to use code of the following files: |
| BranchTypes.h, LzmaTypes.h, LzmaTest.c, LzmaStateTest.c, LzmaAlone.cpp, |
| LzmaAlone.cs, LzmaAlone.java |
| as public domain code. |
| |
| |
| 4) Proprietary license |
| |
| LZMA SDK also can be available under a proprietary license which |
| can include: |
| |
| 1) Right to modify code without subjecting modified code to the |
| terms of the CPL or GNU LGPL |
| 2) Technical support for code |
| |
| To request such a proprietary license or any additional consultation, |
| send an email message from this page: |
| http://www.7-zip.org/support.html |
| |
| |
| You should have received a copy of the GNU Lesser General Public |
| License along with this library; if not, write to the Free Software |
| Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA |
| |
| You should have received a copy of the Common Public License |
| along with this library. |
| |
| |
| LZMA SDK Contents |
| ----------------- |
| |
| LZMA SDK includes: |
| |
| - C++ source code of LZMA compressing and decompressing |
| - ANSI-C compatible source code for LZMA decompressing |
| - C# source code for LZMA compressing and decompressing |
| - Java source code for LZMA compressing and decompressing |
| - Compiled file->file LZMA compressing/decompressing program for Windows system |
| |
| The ANSI-C LZMA decompression code was ported from the original C++ sources to C. |
| It was also simplified and optimized for code size, |
| but it remains fully compatible with LZMA from 7-Zip. |
| |
| |
| UNIX/Linux version |
| ------------------ |
| To compile the C++ version of file->file LZMA, go to the directory |
| CPP/7zip/Compress/LZMA_Alone |
| and type "make", or "make clean all" to recompile everything. |
| |
| On some UNIX/Linux versions you must compile LZMA with static libraries. |
| To compile with static libraries, change the line in the makefile |
| LIB = -lm |
| to |
| LIB = -lm -static |
| |
| |
| Files |
| ----- |
| C - C source code |
| CPP - C++ source code |
| CS - C# source code |
| Java - Java source code |
| lzma.txt - LZMA SDK description (this file) |
| 7zFormat.txt - 7z Format description |
| 7zC.txt - 7z ANSI-C Decoder description |
| methods.txt - Compression method IDs for .7z |
| LGPL.txt - GNU Lesser General Public License |
| CPL.html - Common Public License |
| lzma.exe - Compiled file->file LZMA encoder/decoder for Windows |
| history.txt - history of the LZMA SDK |
| |
| |
| Source code structure |
| --------------------- |
| |
| C - C files |
| Compress - files related to compression/decompression |
| Lz - files related to LZ (Lempel-Ziv) compression algorithm |
| Lzma - ANSI-C compatible LZMA decompressor |
| |
| LzmaDecode.h - interface for LZMA decoding in ANSI-C |
| LzmaDecode.c - LZMA decoding in ANSI-C (new fastest version) |
| LzmaDecodeSize.c - LZMA decoding in ANSI-C (old size-optimized version) |
| LzmaTest.c - test application that decodes an LZMA-encoded file |
| LzmaTypes.h - basic types for LZMA Decoder |
| LzmaStateDecode.h - interface for LZMA decoding (State version) |
| LzmaStateDecode.c - LZMA decoding in ANSI-C (State version) |
| LzmaStateTest.c - test application (State version) |
| |
| Branch - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code |
| |
| Archive - files related to archiving |
| 7z_C - 7z ANSI-C Decoder |
| |
| |
| CPP - C++ files |
| |
| Common - common files for C++ projects |
| Windows - common files for Windows related code |
| 7zip - files related to 7-Zip Project |
| |
| Common - common files for 7-Zip |
| |
| Compress - files related to compression/decompression |
| |
| LZ - files related to LZ (Lempel-Ziv) compression algorithm |
| |
| Copy - Copy coder |
| RangeCoder - Range Coder (special code of compression/decompression) |
| LZMA - LZMA compression/decompression in C++ |
| LZMA_Alone - file->file LZMA compression/decompression |
| |
| Branch - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code |
| |
| Archive - files related to archiving |
| |
| Common - common files for archive handling |
| 7z - 7z C++ Encoder/Decoder |
| |
| Bundles - Modules that are bundles of other modules |
| |
| Alone7z - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2 |
| Format7zR - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2 |
| Format7zExtractR - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2. |
| |
| UI - User Interface files |
| |
| Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll |
| Common - Common UI files |
| Console - Code for console archiver |
| |
| |
| |
| CS - C# files |
| 7zip |
| Common - some common files for 7-Zip |
| Compress - files related to compression/decompression |
| LZ - files related to LZ (Lempel-Ziv) compression algorithm |
| LZMA - LZMA compression/decompression |
| LzmaAlone - file->file LZMA compression/decompression |
| RangeCoder - Range Coder (special code of compression/decompression) |
| |
| Java - Java files |
| SevenZip |
| Compression - files related to compression/decompression |
| LZ - files related to LZ (Lempel-Ziv) compression algorithm |
| LZMA - LZMA compression/decompression |
| RangeCoder - Range Coder (special code of compression/decompression) |
| |
| C/C++ source code of LZMA SDK is part of 7-Zip project. |
| |
| You can find the ANSI-C LZMA decompression code in the folder |
| C/Compress/Lzma |
| 7-Zip doesn't use that ANSI-C LZMA code; it was developed |
| specially for this SDK. Files from C/Compress/Lzma do not need |
| files from other directories of the SDK to compile. |
| |
| 7-Zip source code can be downloaded from 7-Zip's SourceForge page: |
| |
| http://sourceforge.net/projects/sevenzip/ |
| |
| |
| LZMA features |
| ------------- |
| - Variable dictionary size (up to 1 GB) |
| - Estimated compressing speed: about 1 MB/s on 1 GHz CPU |
| - Estimated decompressing speed: |
| - 8-12 MB/s on 1 GHz Intel Pentium 3 or AMD Athlon |
| - 500-1000 KB/s on 100 MHz ARM, MIPS, PowerPC or other simple RISC |
| - Small memory requirements for decompressing (8-32 KB + DictionarySize) |
| - Small code size for decompressing: 2-8 KB (depending on |
| speed optimizations) |
| |
| The LZMA decoder uses only integer operations and can be |
| implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions). |
| |
| Some critical operations that affect the speed of LZMA decompression: |
| 1) 32*16 bit integer multiply |
| 2) Mispredicted branches (the penalty mostly depends on pipeline length) |
| 3) 32-bit shift and arithmetic operations |
| |
| The speed of LZMA decompression mostly depends on CPU speed. |
| Memory speed matters little, but if your CPU has a small data cache, |
| the overall weight of memory speed will slightly increase. |
| |
| |
| How To Use |
| ---------- |
| |
| Using LZMA encoder/decoder executable |
| -------------------------------------- |
| |
| Usage: LZMA <e|d> inputFile outputFile [<switches>...] |
| |
| e: encode file |
| |
| d: decode file |
| |
| b: Benchmark. There are two tests: compressing and decompressing |
| with LZMA method. Benchmark shows rating in MIPS (million |
| instructions per second). Rating value is calculated from |
| measured speed and it is normalized with AMD Athlon 64 X2 CPU |
| results. Also Benchmark checks possible hardware errors (RAM |
| errors in most cases). Benchmark uses these settings: |
| (-a1, -d21, -fb32, -mfbt4). You can change only -d. Also you |
| can change number of iterations. Example for 30 iterations: |
| LZMA b 30 |
| Default number of iterations is 10. |
| |
| <Switches> |
| |
| |
| -a{N}: set compression mode 0 = fast, 1 = normal |
| default: 1 (normal) |
| |
| -d{N}: set dictionary size - [0, 30], default: 23 (8MB) |
| The maximum value for dictionary size is 1 GB = 2^30 bytes. |
| Dictionary size is calculated as DictionarySize = 2^N bytes. |
| To decompress a file compressed by the LZMA method with dictionary |
| size D = 2^N, you need about D bytes of memory (RAM). |
| |
| -fb{N}: set number of fast bytes - [5, 273], default: 128 |
| A larger number usually gives a slightly better compression ratio |
| and a slower compression process. |
| |
| -lc{N}: set number of literal context bits - [0, 8], default: 3 |
| Sometimes lc=4 gives a gain for big files. |
| |
| -lp{N}: set number of literal pos bits - [0, 4], default: 0 |
| The lp switch is intended for periodic data whose period is |
| equal to 2^N. For example, for 32-bit (4 bytes) |
| periodic data you can use lp=2. It is often better to set lc0 |
| if you change the lp switch. |
| |
| -pb{N}: set number of pos bits - [0, 4], default: 2 |
| The pb switch is intended for periodic data |
| whose period is equal to 2^N. |
| |
| -mf{MF_ID}: set Match Finder. Default: bt4. |
| Algorithms from the hc* group don't provide a good compression |
| ratio, but they often work pretty fast in combination with |
| fast mode (-a0). |
| |
| Memory requirements depend on dictionary size |
| (parameter "d" in the table below). |
| |
| MF_ID    Memory            Description |
| |
| bt2      d * 9.5 + 4MB     Binary Tree with 2 bytes hashing. |
| bt3      d * 11.5 + 4MB    Binary Tree with 3 bytes hashing. |
| bt4      d * 11.5 + 4MB    Binary Tree with 4 bytes hashing. |
| hc4      d * 7.5 + 4MB     Hash Chain with 4 bytes hashing. |
| |
| -eos: write End Of Stream marker. By default LZMA doesn't write |
| eos marker, since LZMA decoder knows uncompressed size |
| stored in .lzma file header. |
| |
| -si: Read data from stdin (it will write End Of Stream marker). |
| -so: Write data to stdout |
| |
| |
| Examples: |
| |
| 1) LZMA e file.bin file.lzma -d16 -lc0 |
| |
| compresses file.bin to file.lzma with a 64 KB dictionary (2^16=64K) |
| and 0 literal context bits. -lc0 reduces the memory requirements |
| for decompression. |
| |
| |
| 2) LZMA e file.bin file.lzma -lc0 -lp2 |
| |
| compresses file.bin to file.lzma with settings suitable |
| for 32-bit periodic data (for example, ARM or MIPS code). |
| |
| 3) LZMA d file.lzma file.bin |
| |
| decompresses file.lzma to file.bin. |
| |
| |
| Compression ratio hints |
| ----------------------- |
| |
| Recommendations |
| --------------- |
| |
| To increase the compression ratio of LZMA, it is desirable |
| to have aligned data (if possible) and to arrange the data |
| so that code is grouped in one place and data is grouped in |
| another (this is better than mixing them: code, data, code, |
| data, ...). |
| |
| |
| Using Filters |
| ------------- |
| You can increase the compression ratio for some data types by using |
| special filters before compressing. For example, it's possible to |
| increase the compression ratio by 5-10% for code for these CPU ISAs: |
| x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC. |
| |
| You can find the C/C++ source code of such filters in the folder "7zip/Compress/Branch" |
| |
| You can check the compression ratio gain of these filters with the following |
| 7-Zip commands (example for ARM code): |
| No filter: |
| 7z a a1.7z a.bin -m0=lzma |
| |
| With filter for little-endian ARM code: |
| 7z a a2.7z a.bin -m0=bc_arm -m1=lzma |
| |
| With filter for big-endian ARM code (using additional Swap4 filter): |
| 7z a a3.7z a.bin -m0=swap4 -m1=bc_arm -m2=lzma |
| |
| It works as follows: |
| Compressing = Filter_encoding + LZMA_encoding |
| Decompressing = LZMA_decoding + Filter_decoding |
| |
| The compressing and decompressing speed of such filters is very high, |
| so they will not increase decompression time much. |
| Moreover, filtering reduces the time for LZMA_decoding, |
| since the compression ratio with filtering is higher. |
| |
| These filters convert CALL (procedure call) instructions |
| from relative offsets to absolute addresses, so such data becomes more |
| compressible. The source code of these CALL filters is pretty simple |
| (about 20 lines of C++), so you can convert it from the C++ version yourself. |
| |
| For some ISAs (for example, MIPS) such a filter gives no gain. |
| |
| |
| LZMA compressed file format |
| --------------------------- |
| Offset Size Description |
| 0 1 Special LZMA properties for compressed data |
| 1 4 Dictionary size (little endian) |
| 5 8 Uncompressed size (little endian). -1 means unknown size |
| 13 Compressed data |
| |
| |
| ANSI-C LZMA Decoder |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| To compile the ANSI-C LZMA Decoder you can use one of the following file sets: |
| 1) LzmaDecode.h + LzmaDecode.c + LzmaTest.c (fastest version) |
| 2) LzmaDecode.h + LzmaDecodeSize.c + LzmaTest.c (old size-optimized version) |
| 3) LzmaStateDecode.h + LzmaStateDecode.c + LzmaStateTest.c (zlib-like interface) |
| |
| |
| Memory requirements for LZMA decoding |
| ------------------------------------- |
| |
| The LZMA decoder doesn't allocate memory itself, so you must |
| allocate memory and pass it to LZMA. |
| |
| Stack usage of LZMA decoding function for local variables is not |
| larger than 200 bytes. |
| |
| How To decompress data |
| ---------------------- |
| |
| LZMA Decoder (ANSI-C version) now supports 5 interfaces: |
| 1) Single-call Decompressing |
| 2) Single-call Decompressing with input stream callback |
| 3) Multi-call Decompressing with output buffer |
| 4) Multi-call Decompressing with input callback and output buffer |
| 5) Multi-call State Decompressing (zlib-like interface) |
| |
| Variant-5 is similar to Variant-4, but Variant-5 doesn't use callback functions. |
| |
| Decompressing steps |
| ------------------- |
| |
| 1) read LZMA properties (5 bytes): |
| unsigned char properties[LZMA_PROPERTIES_SIZE]; |
| |
| 2) read uncompressed size (8 bytes, little-endian) |
| |
| 3) Decode properties: |
| |
| CLzmaDecoderState state; /* it's 24-140 bytes structure, if int is 32-bit */ |
| |
| if (LzmaDecodeProperties(&state.Properties, properties, LZMA_PROPERTIES_SIZE) != LZMA_RESULT_OK) |
| return PrintError(rs, "Incorrect stream properties"); |
| |
| 4) Allocate memory block for internal Structures: |
| |
| state.Probs = (CProb *)malloc(LzmaGetNumProbs(&state.Properties) * sizeof(CProb)); |
| if (state.Probs == 0) |
| return PrintError(rs, kCantAllocateMessage); |
| |
| The LZMA decoder uses an array of CProb variables as its internal structure. |
| By default, CProb is unsigned short. But you can define _LZMA_PROB32 to make |
| it unsigned int. This can increase speed on some 32-bit CPUs, but memory |
| usage will be doubled in that case. |
| |
| |
| 5) Main Decompressing |
| |
| You must use one of the following interfaces: |
| |
| 5.1 Single-call Decompressing |
| ----------------------------- |
| When to use: RAM->RAM decompressing |
| Compile files: LzmaDecode.h, LzmaDecode.c |
| Compile defines: no defines |
| Memory Requirements: |
| - Input buffer: compressed size |
| - Output buffer: uncompressed size |
| - LZMA Internal Structures (~16 KB for default settings) |
| |
| Interface: |
| int res = LzmaDecode(&state, |
| inStream, compressedSize, &inProcessed, |
| outStream, outSize, &outProcessed); |
| |
| |
| 5.2 Single-call Decompressing with input stream callback |
| -------------------------------------------------------- |
| When to use: File->RAM or Flash->RAM decompressing. |
| Compile files: LzmaDecode.h, LzmaDecode.c |
| Compile defines: _LZMA_IN_CB |
| Memory Requirements: |
| - Buffer for input stream: any size (for example, 16 KB) |
| - Output buffer: uncompressed size |
| - LZMA Internal Structures (~16 KB for default settings) |
| |
| Interface: |
| typedef struct _CBuffer |
| { |
| ILzmaInCallback InCallback; |
| FILE *File; |
| unsigned char Buffer[kInBufferSize]; |
| } CBuffer; |
| |
| int LzmaReadCompressed(void *object, const unsigned char **buffer, SizeT *size) |
| { |
| CBuffer *bo = (CBuffer *)object; |
| *buffer = bo->Buffer; |
| *size = MyReadFile(bo->File, bo->Buffer, kInBufferSize); |
| return LZMA_RESULT_OK; |
| } |
| |
| CBuffer g_InBuffer; |
| |
| g_InBuffer.File = inFile; |
| g_InBuffer.InCallback.Read = LzmaReadCompressed; |
| int res = LzmaDecode(&state, |
| &g_InBuffer.InCallback, |
| outStream, outSize, &outProcessed); |
| |
| |
| 5.3 Multi-call decompressing with output buffer |
| ----------------------------------------------- |
| When to use: RAM->File decompressing |
| Compile files: LzmaDecode.h, LzmaDecode.c |
| Compile defines: _LZMA_OUT_READ |
| Memory Requirements: |
| - Input buffer: compressed size |
| - Buffer for output stream: any size (for example, 16 KB) |
| - LZMA Internal Structures (~16 KB for default settings) |
| - LZMA dictionary (dictionary size is encoded in stream properties) |
| |
| Interface: |
| |
| state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize); |
| |
| LzmaDecoderInit(&state); |
| do |
| { |
| LzmaDecode(&state, |
| inBuffer, inAvail, &inProcessed, |
| g_OutBuffer, outAvail, &outProcessed); |
| inAvail -= inProcessed; |
| inBuffer += inProcessed; |
| } |
| while you need more bytes |
| |
| see LzmaTest.c for more details. |
| |
| |
| 5.4 Multi-call decompressing with input callback and output buffer |
| ------------------------------------------------------------------ |
| When to use: File->File decompressing |
| Compile files: LzmaDecode.h, LzmaDecode.c |
| Compile defines: _LZMA_IN_CB, _LZMA_OUT_READ |
| Memory Requirements: |
| - Buffer for input stream: any size (for example, 16 KB) |
| - Buffer for output stream: any size (for example, 16 KB) |
| - LZMA Internal Structures (~16 KB for default settings) |
| - LZMA dictionary (dictionary size is encoded in stream properties) |
| |
| Interface: |
| |
| state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize); |
| |
| LzmaDecoderInit(&state); |
| do |
| { |
| LzmaDecode(&state, |
| &bo.InCallback, |
| g_OutBuffer, outAvail, &outProcessed); |
| } |
| while you need more bytes |
| |
| see LzmaTest.c for more details. |
| |
| |
| 5.5 Multi-call State Decompressing (zlib-like interface) |
| ------------------------------------------------------------------ |
| When to use: file->file decompressing |
| Compile files: LzmaStateDecode.h, LzmaStateDecode.c |
| Compile defines: no defines |
| Memory Requirements: |
| - Buffer for input stream: any size (for example, 16 KB) |
| - Buffer for output stream: any size (for example, 16 KB) |
| - LZMA Internal Structures (~16 KB for default settings) |
| - LZMA dictionary (dictionary size is encoded in stream properties) |
| |
| Interface: |
| |
| state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize); |
| |
| |
| LzmaDecoderInit(&state); |
| do |
| { |
| res = LzmaDecode(&state, |
| inBuffer, inAvail, &inProcessed, |
| g_OutBuffer, outAvail, &outProcessed, |
| finishDecoding); |
| inAvail -= inProcessed; |
| inBuffer += inProcessed; |
| } |
| while you need more bytes |
| |
| see LzmaStateTest.c for more details. |
| |
| |
| 6) Free all allocated blocks |
| |
| |
| Note |
| ---- |
| LzmaDecodeSize.c is a size-optimized version of LzmaDecode.c. |
| However, the compiled code of LzmaDecodeSize.c can be larger than |
| the compiled code of LzmaDecode.c, so it's better to use |
| LzmaDecode.c in most cases. |
| |
| |
| EXIT codes |
| ----------- |
| |
| LZMA decoder can return one of the following codes: |
| |
| #define LZMA_RESULT_OK 0 |
| #define LZMA_RESULT_DATA_ERROR 1 |
| |
| If you use a callback function for input data and it returns an |
| error code, the LZMA Decoder also returns that code. |
| |
| |
| |
| LZMA Defines |
| ------------ |
| |
| _LZMA_IN_CB - Use callback for input data |
| |
| _LZMA_OUT_READ - Use read function for output data |
| |
| _LZMA_LOC_OPT - Enable local speed optimizations inside code. |
| _LZMA_LOC_OPT is only for LzmaDecodeSize.c (size-optimized version). |
| _LZMA_LOC_OPT doesn't affect LzmaDecode.c (speed-optimized version) |
| and LzmaStateDecode.c |
| |
| _LZMA_PROB32 - It can increase speed on some 32-bit CPUs, |
| but memory usage will be doubled in that case |
| |
| _LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler |
| and long is 32-bit. |
| |
| _LZMA_SYSTEM_SIZE_T - Define it if you want to use the system's size_t. |
| You can use it to enable support for 64-bit sizes. |
| |
| |
| |
| C++ LZMA Encoder/Decoder |
| ~~~~~~~~~~~~~~~~~~~~~~~~ |
| The C++ LZMA code uses COM-like interfaces, so if you want to use it, |
| it helps to study the basics of COM/OLE. |
| |
| By default, the LZMA Encoder contains all Match Finders. |
| But for compressing it's enough to have just one of them. |
| So to reduce the size of the compression code you can define: |
| #define COMPRESS_MF_BT |
| #define COMPRESS_MF_BT4 |
| and it will use only the bt4 match finder. |
| |
| |
| --- |
| |
| http://www.7-zip.org |
| http://www.7-zip.org/support.html |