LZMA SDK 4.65
-------------

LZMA SDK provides the documentation, samples, header files, libraries,
and tools you need to develop applications that use LZMA compression.

LZMA is the default and general compression method of the 7z format
in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
compression ratio and very fast decompression.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was improved to maximize the compression ratio while keeping
high decompression speed and low memory requirements for
decompression.



LICENSE
-------

LZMA SDK is written and placed in the public domain by Igor Pavlov.


LZMA SDK Contents
-----------------

LZMA SDK includes:

  - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
  - Compiled file->file LZMA compressing/decompressing program for Windows system


UNIX/Linux version
------------------
To compile the C++ version of the file->file LZMA encoder, go to the directory
CPP/7zip/Compress/LZMA_Alone
and call make to recompile it:
  make -f makefile.gcc clean all

On some UNIX/Linux versions you must compile LZMA with static libraries.
To compile with static libraries, you can use
LIB = -lm -static


Files
---------------------
lzma.txt     - LZMA SDK description (this file)
7zFormat.txt - 7z Format description
7zC.txt      - 7z ANSI-C Decoder description
methods.txt  - Compression method IDs for .7z
lzma.exe     - Compiled file->file LZMA encoder/decoder for Windows
history.txt  - history of the LZMA SDK


Source code structure
---------------------

C/  - C files
        7zCrc*.*   - CRC code
        Alloc.*    - Memory allocation functions
        Bra*.*     - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
        LzFind.*   - Match finder for LZ (LZMA) encoders
        LzFindMt.* - Match finder for LZ (LZMA) encoders for multithreading encoding
        LzHash.h   - Additional file for LZ match finder
        LzmaDec.*  - LZMA decoding
        LzmaEnc.*  - LZMA encoding
        LzmaLib.*  - LZMA Library for DLL calling
        Types.h    - Basic types for other .c files
        Threads.*  - The code for multithreading.

    LzmaLib  - LZMA Library (.DLL for Windows)

    LzmaUtil - LZMA Utility (file->file LZMA encoder/decoder).

    Archive - files related to archiving
      7z     - 7z ANSI-C Decoder

CPP/ - C++ files

  Common  - common files for C++ projects
  Windows - common files for Windows related code

  7zip    - files related to 7-Zip Project

    Common   - common files for 7-Zip

    Compress - files related to compression/decompression

      Copy         - Copy coder
      RangeCoder   - Range Coder (special code of compression/decompression)
      LZMA         - LZMA compression/decompression on C++
      LZMA_Alone   - file->file LZMA compression/decompression
      Branch       - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code

    Archive - files related to archiving

      Common   - common files for archive handling
      7z       - 7z C++ Encoder/Decoder

    Bundles    - Modules that are bundles of other modules

      Alone7z           - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
      Format7zR         - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
      Format7zExtractR  - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.

    UI        - User Interface files

      Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
      Common   - Common UI files
      Console  - Code for console archiver



CS/ - C# files
  7zip
    Common   - some common files for 7-Zip
    Compress - files related to compression/decompression
      LZ         - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA       - LZMA compression/decompression
      LzmaAlone  - file->file LZMA compression/decompression
      RangeCoder - Range Coder (special code of compression/decompression)

Java/  - Java files
  SevenZip
    Compression    - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)


The C/C++ source code of LZMA SDK is part of the 7-Zip project.
7-Zip source code can be downloaded from 7-Zip's SourceForge page:

  http://sourceforge.net/projects/sevenzip/



LZMA features
-------------
  - Variable dictionary size (up to 1 GB)
  - Estimated compressing speed: about 2 MB/s on 2 GHz CPU
  - Estimated decompressing speed:
      - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
      - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
  - Small memory requirements for decompressing (16 KB + DictionarySize)
  - Small code size for decompressing: 5-8 KB

The LZMA decoder uses only integer operations and can be
implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).

Some critical operations that affect the speed of LZMA decompression:
  1) 32*16 bit integer multiply
  2) Mispredicted branches (the penalty mostly depends on pipeline length)
  3) 32-bit shift and arithmetic operations

The speed of LZMA decompression mostly depends on CPU speed.
Memory speed matters little. But if your CPU has a small data cache,
the overall weight of memory speed will slightly increase.


How To Use
----------

Using LZMA encoder/decoder executable
--------------------------------------

Usage:  LZMA <e|d> inputFile outputFile [<switches>...]

  e: encode file

  d: decode file

  b: Benchmark. There are two tests: compressing and decompressing
     with the LZMA method. Benchmark shows a rating in MIPS (million
     instructions per second). The rating value is calculated from
     measured speed and is normalized with Intel's Core 2 results.
     Benchmark also checks for possible hardware errors (RAM
     errors in most cases). Benchmark uses these settings:
     (-a1, -d21, -fb32, -mfbt4). You can change only the -d parameter.
     You can also change the number of iterations. Example for 30 iterations:
       LZMA b 30
     Default number of iterations is 10.

<Switches>


  -a{N}:  set compression mode 0 = fast, 1 = normal
          default: 1 (normal)

  -d{N}:  Sets Dictionary size - [0, 30], default: 23 (8MB)
          The maximum value for dictionary size is 1 GB = 2^30 bytes.
          Dictionary size is calculated as DictionarySize = 2^N bytes.
          To decompress a file compressed by the LZMA method with dictionary
          size D = 2^N you need about D bytes of memory (RAM).

  -fb{N}: set number of fast bytes - [5, 273], default: 128
          Usually a bigger number gives a slightly better compression ratio
          and a slower compression process.

  -lc{N}: set number of literal context bits - [0, 8], default: 3
          Sometimes lc=4 gives gain for big files.

  -lp{N}: set number of literal pos bits - [0, 4], default: 0
          The lp switch is intended for periodical data when the period is
          equal to 2^N. For example, for 32-bit (4 bytes)
          periodical data you can use lp=2. Often it's better to set lc0
          if you change the lp switch.

  -pb{N}: set number of pos bits - [0, 4], default: 2
          The pb switch is intended for periodical data
          when the period is equal to 2^N.

  -mf{MF_ID}: set Match Finder. Default: bt4.
              Algorithms from the hc* group don't provide a good compression
              ratio, but they often work pretty fast in combination with
              fast mode (-a0).

              Memory requirements depend on dictionary size
              (parameter "d" in table below).

               MF_ID     Memory                   Description

                bt2    d *  9.5 + 4MB  Binary Tree with 2 bytes hashing.
                bt3    d * 11.5 + 4MB  Binary Tree with 3 bytes hashing.
                bt4    d * 11.5 + 4MB  Binary Tree with 4 bytes hashing.
                hc4    d *  7.5 + 4MB  Hash Chain with 4 bytes hashing.

  -eos:   write End Of Stream marker. By default LZMA doesn't write
          the eos marker, since the LZMA decoder knows the uncompressed size
          stored in the .lzma file header.

  -si:    Read data from stdin (it will write End Of Stream marker).
  -so:    Write data to stdout
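
The -d{N} rule and the match finder memory table above can be turned into a
small calculator. The following is only a sketch: the function names are
invented for illustration and are not part of the SDK, and the factors are
copied from the table rather than measured.

```c
#include <stdint.h>
#include <string.h>

/* Dictionary size for switch -d{N}: DictionarySize = 2^N bytes, N in [0, 30]. */
uint64_t dict_size_bytes(unsigned n)
{
    return (uint64_t)1 << n;
}

/* Encoder memory estimate from the MF_ID table: d * factor + 4 MB.
   Factors are stored doubled (19 = 9.5 * 2) to stay in integer arithmetic.
   Returns 0 for an MF_ID that is not in the table. */
uint64_t encoder_mem_bytes(const char *mf_id, uint64_t d)
{
    static const struct { const char *id; unsigned twice_factor; } table[] = {
        { "bt2", 19 }, { "bt3", 23 }, { "bt4", 23 }, { "hc4", 15 }
    };
    size_t i;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(mf_id, table[i].id) == 0)
            return d * table[i].twice_factor / 2 + ((uint64_t)4 << 20);
    return 0;
}
```

With the defaults (-d23, bt4) this estimates 8 MB * 11.5 + 4 MB = 96 MB for
the encoder, while the decoder needs only about the 8 MB dictionary.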


Examples:

1) LZMA e file.bin file.lzma -d16 -lc0

compresses file.bin to file.lzma with a 64 KB dictionary (2^16=64K)
and 0 literal context bits. -lc0 reduces memory requirements
for decompression.


2) LZMA e file.bin file.lzma -lc0 -lp2

compresses file.bin to file.lzma with settings suitable
for 32-bit periodical data (for example, ARM or MIPS code).

3) LZMA d file.lzma file.bin

decompresses file.lzma to file.bin.


Compression ratio hints
-----------------------

Recommendations
---------------

To increase the compression ratio of LZMA, align the data where possible,
and order it so that code is grouped in one place and data is
grouped in another place (this is better than mixing them: code, data,
code, data, ...).


Filters
-------
You can increase the compression ratio for some data types by using
special filters before compressing. For example, it's possible to
increase the compression ratio by 5-10% for code for these CPU ISAs:
x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.

You can find the C source code of such filters in the C/Bra*.* files.

You can check the compression ratio gain of these filters with the following
7-Zip commands (example for ARM code):
No filter:
  7z a a1.7z a.bin -m0=lzma

With filter for little-endian ARM code:
  7z a a2.7z a.bin -m0=arm -m1=lzma

It works as follows:
Compressing   = Filter_encoding + LZMA_encoding
Decompressing = LZMA_decoding + Filter_decoding

The compressing and decompressing speed of such filters is very high,
so they will not increase decompression time much.
Moreover, filtering reduces the time spent in LZMA_decoding,
since the compression ratio with filtering is higher.

These filters convert CALL (calling procedure) instructions
from relative offsets to absolute addresses, so such data becomes more
compressible.

For some ISAs (for example, for MIPS) it's impossible to get gain from such a filter.
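
To illustrate the relative-to-absolute conversion described above, here is a
deliberately simplified sketch for x86. It is not the code from C/Bra*.*: it
treats every 0xE8 byte as a CALL with a 32-bit relative operand and skips the
extra checks the real filter performs. The function name and parameters are
assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified CALL filter sketch (NOT the SDK implementation).
   ip is the virtual address of data[0].
   encoding != 0: relative offset -> absolute address
   encoding == 0: absolute address -> relative offset */
void call_filter_sketch(uint8_t *data, size_t size, uint32_t ip, int encoding)
{
    size_t i = 0;
    while (i + 5 <= size) {
        uint32_t v, next;
        if (data[i] != 0xE8) {      /* not a CALL opcode: move on */
            i++;
            continue;
        }
        /* Read the 32-bit little-endian operand after the opcode. */
        v = (uint32_t)data[i + 1]
          | ((uint32_t)data[i + 2] << 8)
          | ((uint32_t)data[i + 3] << 16)
          | ((uint32_t)data[i + 4] << 24);
        /* Address of the instruction that follows this CALL. */
        next = ip + (uint32_t)i + 5;
        v = encoding ? v + next : v - next;
        data[i + 1] = (uint8_t)v;
        data[i + 2] = (uint8_t)(v >> 8);
        data[i + 3] = (uint8_t)(v >> 16);
        data[i + 4] = (uint8_t)(v >> 24);
        i += 5;                     /* operand bytes are never re-scanned */
    }
}
```

Two calls to the same procedure from different places have different relative
offsets but the same absolute address, so after encoding they become identical
byte sequences that the LZ match finder can reuse. Because the opcode bytes
are left unchanged, decoding visits the same positions and restores the
original data.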


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data
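
The 13-byte header above can be decoded as in the following sketch. The
struct and function names are invented for illustration. The properties byte
layout, (pb * 5 + lp) * 9 + lc, is not spelled out in this file; it is taken
from how the SDK decoder interprets that byte, so check it against LzmaDec.c
before relying on it.

```c
#include <stdint.h>

/* Hypothetical container for the decoded header fields. */
typedef struct {
    unsigned lc, lp, pb;
    uint32_t dict_size;
    uint64_t unpack_size;   /* UINT64_MAX (that is, -1) means unknown size */
} LzmaHeaderSketch;

/* Parses the 13-byte .lzma header.
   Returns 0 on success, -1 if the properties byte is out of range. */
int parse_lzma_header(const uint8_t h[13], LzmaHeaderSketch *out)
{
    unsigned d = h[0];
    int i;
    if (d >= 9 * 5 * 5)
        return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    /* Offset 1: dictionary size, 4 bytes, little endian. */
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);

    /* Offset 5: uncompressed size, 8 bytes, little endian. */
    out->unpack_size = 0;
    for (i = 0; i < 8; i++)
        out->unpack_size |= (uint64_t)h[5 + i] << (8 * i);
    return 0;
}
```

For the default settings (lc=3, lp=0, pb=2) the properties byte is
(2 * 5 + 0) * 9 + 3 = 93 = 0x5D.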


ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that the interfaces for the ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version of LZMA SDK
from the sourceforge.net site.

To use the ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + Types.h
LzmaUtil/LzmaUtil.c is an example application that uses these files.


Memory requirements for LZMA decoding
-------------------------------------

Stack usage of the LZMA decoding function for local variables is not
larger than 200-400 bytes.

The LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + (1.5 << (lc + lp))) KB
that is, 4 KB plus 1.5 KB multiplied by 2^(lc + lp).
By default (lc=3, lp=0), state_size = 16 KB.
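
The state_size formula above can be evaluated in bytes with integer
arithmetic, since 1.5 KB is 1536 bytes. The function names below are made up
for this sketch; the second one simply adds the dictionary size to obtain the
"16 KB + DictionarySize" style estimate quoted in the features list.

```c
#include <stddef.h>

/* state_size = 4 KB + 1.5 KB * 2^(lc + lp), returned in bytes. */
size_t lzma_state_size_bytes(unsigned lc, unsigned lp)
{
    return (size_t)4096 + ((size_t)1536 << (lc + lp));
}

/* Rough decoder memory: internal state plus the dictionary buffer. */
size_t lzma_decoder_mem_bytes(unsigned lc, unsigned lp, size_t dict_size)
{
    return lzma_state_size_bytes(lc, lp) + dict_size;
}
```

This also shows why -lc0 helps on small devices: with lc=0, lp=0 the state
shrinks from 16 KB to 5.5 KB.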


How To decompress data
----------------------

The LZMA Decoder (ANSI-C version) now supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must use an external allocator:
Example:
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };

The p = p; statement only silences "unused parameter" compiler warnings.
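
The p argument can also carry allocator state. The sketch below counts
allocations; it is an illustration under two assumptions that should be
verified against Types.h: that ISzAlloc consists of exactly the Alloc and
Free callbacks shown in the example above, and that the SDK passes the
ISzAlloc pointer itself as p. The ISzAlloc typedef here is a local stand-in
so the example compiles on its own, and CountingAlloc is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for ISzAlloc; in real code include Types.h instead. */
typedef struct {
    void *(*Alloc)(void *p, size_t size);
    void (*Free)(void *p, void *address);
} ISzAlloc;

/* Hypothetical allocator with state. The callback table must be the
   first member so that a pointer to it is also a pointer to the struct. */
typedef struct {
    ISzAlloc vt;
    unsigned allocs;
    unsigned frees;
} CountingAlloc;

void *CountingAlloc_Alloc(void *p, size_t size)
{
    ((CountingAlloc *)p)->allocs++;
    return malloc(size);
}

void CountingAlloc_Free(void *p, void *address)
{
    if (address != NULL)
        ((CountingAlloc *)p)->frees++;
    free(address);
}

void CountingAlloc_Init(CountingAlloc *a)
{
    a->vt.Alloc = CountingAlloc_Alloc;
    a->vt.Free = CountingAlloc_Free;
    a->allocs = 0;
    a->frees = 0;
}
```

You would then pass &a.vt wherever an ISzAlloc pointer is expected and compare
a.allocs with a.frees after decoding to spot leaks.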


Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END when you know that
                           the current output buffer covers the last bytes of the stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Output:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If the LZMA decoder sees end_marker before reaching the output limit, it returns an OK result,
  and the output value of destLen will be less than the output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check Result and "status" variable.
    2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
    3) Check that output(srcLen) = compressedSize, if you know the real compressedSize.
       You must use the correct finish mode in that case.


Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties

  CLzmaDec state;
  LzmaDec_Construct(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Init the LzmaDec structure before any new LZMA stream, then call LzmaDec_DecodeToBuf in a loop

  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
        const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);
    ...
  }


4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

For a full code example, look at the C/LzmaUtil/LzmaUtil.c code.


How To compress data
--------------------

Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size

The LZMA Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
better compression speed. Note that Windows has a bad implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.
Luigi 'Comio' Mantellini | fc9c172 | 2008-09-08 02:46:13 +0200 | [diff] [blame] | 462 | |

Single-call Compression with callbacks
--------------------------------------

Check C/LzmaUtil/LzmaUtil.c as an example.

When to use: file->file compressing

1) You must implement callback structures for interfaces:
ISeqInStream
ISeqOutStream
ICompressProgress (optional: NULL is passed in step 6)
ISzAlloc

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) { p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;


2) Create CLzmaEncHandle object:

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) Initialize CLzmaEncProps properties:

  CLzmaEncProps props;

  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send LZMA properties to LZMA Encoder:

  res = LzmaEnc_SetProps(enc, &props);

5) Write encoded properties to header:

    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy LZMA Encoder Object:
      LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);

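Step 5 stores the uncompressed size as 8 bytes, least significant byte first,
right after the encoded properties. The sketch below isolates that layout so it
can be checked without the SDK. The names are hypothetical, and
DEMO_LZMA_PROPS_SIZE is assumed to match LZMA_PROPS_SIZE (5 bytes).

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: matches LZMA_PROPS_SIZE in the SDK headers. */
#define DEMO_LZMA_PROPS_SIZE 5
#define DEMO_HEADER_SIZE (DEMO_LZMA_PROPS_SIZE + 8)

/* Copies the encoded properties, then appends fileSize as 8 little-endian
   bytes. Returns the total header size. */
static size_t DemoBuildHeader(unsigned char *header,
    const unsigned char *props, uint64_t fileSize)
{
  size_t pos = 0;
  int i;
  for (i = 0; i < DEMO_LZMA_PROPS_SIZE; i++)
    header[pos++] = props[i];
  for (i = 0; i < 8; i++)
    header[pos++] = (unsigned char)(fileSize >> (8 * i));
  return pos;
}

/* Inverse operation: reads the 8-byte size field back from a header. */
static uint64_t DemoReadSize(const unsigned char *header)
{
  uint64_t size = 0;
  int i;
  for (i = 0; i < 8; i++)
    size |= (uint64_t)header[DEMO_LZMA_PROPS_SIZE + i] << (8 * i);
  return size;
}
```

Shifting byte by byte keeps the header layout independent of the byte order of
the machine that writes it.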

If a callback function returns an error code, LzmaEnc_Encode also returns that code.

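The callback pattern and the error propagation described above can be sketched
without the SDK. All Demo* names below are hypothetical stand-ins: DemoSeqInStream
imitates the shape of a stream callback structure, and DemoConsume plays the
role of a consumer such as the encoder. It is not LzmaEnc_Encode.

```c
#include <stddef.h>
#include <string.h>

typedef int DemoRes;
#define DEMO_OK 0
#define DEMO_ERROR_READ 8 /* hypothetical error code for this sketch */

/* Callback table: the function receives a pointer to the table itself. */
typedef struct
{
  DemoRes (*Read)(void *p, void *buf, size_t *size);
} DemoSeqInStream;

/* Memory-backed stream. funcTable is the first member, so a pointer to it
   can be cast back to the whole object inside the callback. */
typedef struct
{
  DemoSeqInStream funcTable;
  const unsigned char *data;
  size_t pos;
  size_t len;
  int fail; /* when nonzero, Read reports an error */
} DemoMemInStream;

static DemoRes DemoMemRead(void *p, void *buf, size_t *size)
{
  DemoMemInStream *s = (DemoMemInStream *)p;
  size_t rem = s->len - s->pos;
  if (s->fail)
    return DEMO_ERROR_READ;
  if (*size > rem)
    *size = rem; /* *size == 0 on return signals end of stream */
  memcpy(buf, s->data + s->pos, *size);
  s->pos += *size;
  return DEMO_OK;
}

/* Pulls all input through the callback and returns the first error code
   the callback reports, unchanged. */
static DemoRes DemoConsume(DemoSeqInStream *in, size_t *total)
{
  unsigned char buf[16];
  *total = 0;
  for (;;)
  {
    size_t size = sizeof(buf);
    DemoRes res = in->Read(in, buf, &size);
    if (res != DEMO_OK)
      return res;
    if (size == 0)
      return DEMO_OK;
    *total += size;
  }
}
```

The caller fills in funcTable.Read and the data fields, then passes
&stream.funcTable to the consumer, in the same way inStream.funcTable is
passed in step 6.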

Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
  CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
  ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - Output buffer overflow
  SZ_ERROR_THREAD     - Errors in multithreading functions (only for Mt version)

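A caller usually wants to turn these return codes into something readable. The
sketch below uses local DEMO_* constants so that it compiles on its own. The
numeric values are an assumption meant to mirror the SZ_* codes in Types.h;
when building against the SDK, use the SZ_* names from the header instead.

```c
/* Local stand-ins so the sketch compiles without the SDK.
   Assumption: values mirror the SZ_* codes in the SDK's Types.h. */
#define DEMO_SZ_OK 0
#define DEMO_SZ_ERROR_MEM 2
#define DEMO_SZ_ERROR_PARAM 5
#define DEMO_SZ_ERROR_OUTPUT_EOF 7
#define DEMO_SZ_ERROR_THREAD 12

/* Maps a return code of the RAM->RAM call to a short message. */
static const char *DemoDescribeRes(int res)
{
  switch (res)
  {
    case DEMO_SZ_OK: return "OK";
    case DEMO_SZ_ERROR_MEM: return "memory allocation error";
    case DEMO_SZ_ERROR_PARAM: return "incorrect parameter";
    case DEMO_SZ_ERROR_OUTPUT_EOF: return "output buffer overflow";
    case DEMO_SZ_ERROR_THREAD: return "error in multithreading functions";
    default: return "unexpected error code";
  }
}
```

An output buffer overflow is the one error a caller can typically recover from,
by retrying with a larger destination buffer.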


LZMA Defines
------------

_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.

_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use the size_t type.

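The effect of two of these defines can be illustrated with a small sketch. The
DEMO_* macros and Demo* types are hypothetical stand-ins, not the SDK's own
definitions. They only show how a compile-time define can select a wider type
and why that doubles the size of a table of probability counters.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-in for _LZMA_UINT32_IS_ULONG: pick a type with at least 32 bits. */
#ifdef DEMO_LZMA_UINT32_IS_ULONG
typedef unsigned long DemoUInt32; /* for compilers where int is 16-bit */
#else
typedef unsigned int DemoUInt32;
#endif

/* Stand-in for _LZMA_PROB32: wider counters need more memory. */
#ifdef DEMO_LZMA_PROB32
typedef DemoUInt32 DemoProb;
#else
typedef unsigned short DemoProb;
#endif

/* Compilation fails here if the selected DemoUInt32 has fewer than 32 bits. */
typedef char DemoUInt32WidthCheck[(sizeof(DemoUInt32) * CHAR_BIT >= 32) ? 1 : -1];

/* Size in bytes of a table holding numProbs probability counters. */
static size_t DemoProbTableBytes(size_t numProbs)
{
  return numProbs * sizeof(DemoProb);
}
```

Compiling with -DDEMO_LZMA_PROB32 switches DemoProb to the wider type, so the
same table takes twice as many bytes on a platform with 16-bit short and
32-bit int.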

C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
C++ LZMA code uses COM-like interfaces. So if you want to use it,
you can study the basics of COM/OLE.
C++ LZMA code is just a wrapper over ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must make sure that you handle the "new" operator correctly.
7-Zip can be compiled with MSVC 6.0, whose "new" operator does not throw an exception
on failure.
So 7-Zip uses "CPP\Common\NewHandler.cpp" that redefines "new" operator:
void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}
If you use a version of MSVC whose "new" operator throws an exception, you can compile
without "NewHandler.cpp". The standard exception will be used in that case. Actually some
code of 7-Zip catches any exception in internal code and converts it to HRESULT code.
So you don't need to catch CNewException, if you call COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html