7-Zip command lines (Split from code's DX runtimes)

Questions or comments on the switchless installers? Want to create a new one? Talk about it here.
OuTman
Posts: 171
Joined: Wed Jul 05, 2006 6:40 pm


Post by OuTman » Mon Sep 15, 2008 6:44 pm

code65536 wrote:I have posted (see first post) the script that I use to generate these installers (which includes the exact 7zip parameters used to get the nice compression ratio), so anybody is free to update this package in October in case I'm too busy to...
wow that's an insane 7z compression command-line!
set sfx7zmethod=-m0=BCJ2 -m1=LZMA:d27:fb=128:mc=256 -m2=LZMA:d24:fb=128:mc=256 -m3=LZMA:d24:fb=128:mc=256 -mb0:1 -mb0s1:2 -mb0s2:3
-m2 and -m3 are the same; is that normal?

code65536
Posts: 735
Joined: Wed Mar 14, 2007 2:58 pm
Location: .us

Post by code65536 » Mon Sep 15, 2008 6:53 pm

OuTman wrote:
set sfx7zmethod=-m0=BCJ2 -m1=LZMA:d27:fb=128:mc=256 -m2=LZMA:d24:fb=128:mc=256 -m3=LZMA:d24:fb=128:mc=256 -mb0:1 -mb0s1:2 -mb0s2:3
-m2 and -m3 are the same; is that normal?
Yes, it is. Did you have any reason to think otherwise? Please see the 7-Zip command line documentation for details.

(short version: the BCJ2 filter optimizes compression for executables by splitting the data into a main stream, a stream for calls, and a stream for jumps, and by improving repetition through converting short relative addresses to absolute ones. m1, m2, and m3 then specify the compression for the main, call, and jump streams, respectively.)
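The address-conversion idea is easy to demonstrate outside of 7-Zip. Below is a toy Python sketch, not 7-Zip's actual BCJ2 implementation (the real filter also splits the streams and handles jump opcodes), that rewrites the rel32 operand of x86 E8 CALL instructions to an absolute target, so repeated calls to the same function become byte-identical and compress better:

```python
import lzma
import struct

def bcj_like(code: bytes) -> bytes:
    # Toy BCJ pass: rewrite each x86 E8 CALL's rel32 operand to the
    # absolute target address.  Calls to the same target then share
    # identical operand bytes, which LZMA can match.  (Invertible: the
    # original offset is the target minus the next instruction's address.)
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            rel = struct.unpack_from("<i", out, i + 1)[0]
            target = (i + 5 + rel) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, target)
            i += 5
        else:
            i += 1
    return bytes(out)

# 500 call sites at different offsets, all calling the same function at 0x10000.
blob = bytearray()
for _ in range(500):
    blob += b"\x90" * 11                 # filler NOPs
    rel = 0x10000 - (len(blob) + 5)      # rel32 is relative to the next instruction
    blob += b"\xE8" + struct.pack("<i", rel)

plain = len(lzma.compress(bytes(blob)))
filtered = len(lzma.compress(bcj_like(bytes(blob))))
# After the rewrite, every 16-byte block is identical, so filtered < plain.
```

Every rel32 operand differs before the rewrite; afterwards they are all the same four bytes, which is exactly the repetition the filter is meant to create.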
My addons: CmdOpen - HashCheck - Notepad2 - MS Runtimes - DirectX

Into the breach, meatbags!

OuTman
Posts: 171
Joined: Wed Jul 05, 2006 6:40 pm

Post by OuTman » Mon Sep 15, 2008 7:07 pm

Okay, thank you for the explanations :wink: The 7z CLI is quite hard to understand... (but it is powerful!)

RogueSpear
Posts: 1155
Joined: Tue Nov 23, 2004 9:50 pm
Location: Buffalo, NY

Post by RogueSpear » Mon Sep 15, 2008 9:29 pm

code65536 wrote:Please see the 7-Zip command line documentation for details.
Actually, when I saw your 7-Zip sequence, I was inspired to do just that. I ended up spending the better part of the day going over that documentation and then looking up things I didn't fully understand at various web sites, most of which I never knew existed. Kind of bummed out that I didn't take that pilgrimage years ago.

code65536
Posts: 735
Joined: Wed Mar 14, 2007 2:58 pm
Location: .us

Post by code65536 » Mon Sep 15, 2008 10:43 pm

(This is going way off-topic; if a mod could split out the recent compression-related posts into a new thread, that'd be great!)

Hehe. I've been fascinated by compression algorithms ever since studying the original Lempel-Ziv algorithm some years ago.

The key here is the dictionary, which, in a simplified nutshell, stores patterns that have been encountered. You encounter pattern A, compress/encode it, and then the next time you see pattern A, since it's already in the dictionary, you can just use that. With a small dictionary, what happens is that you'll encounter pattern A, then pattern B, C, D, etc., and by the time you encounter pattern A again, it's been pushed out of the dictionary so that you can't re-use it, and thus you take a hit on your size. All compression algorithms basically work on eliminating repetition and patterns, so being able to recognize them is vital, and for dictionary-based algorithms (which comprise most mainstream general-purpose algorithms), a small dictionary forces you to forget what you saw earlier, thus hurting your ability to recognize those patterns and repetitions; small dictionary == amnesia.
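This "amnesia" effect is easy to reproduce with Python's lzma module (a sketch using raw LZMA2 streams; the dict_size option plays the same role as 7-Zip's d parameter):

```python
import lzma
import os

def lzma_size(data: bytes, dict_size: int) -> int:
    # Compress as a raw LZMA2 stream with an explicit dictionary size.
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters))

pattern = os.urandom(8 * 1024)       # "pattern A": incompressible by itself
filler = os.urandom(64 * 1024)       # unrelated data that pushes A out
data = pattern + filler + pattern    # A ... 64 KB of other stuff ... A again

small = lzma_size(data, 4 * 1024)    # 4 KiB dictionary: A was long forgotten
large = lzma_size(data, 1 << 20)     # 1 MiB dictionary: second A is one big match
```

With the large dictionary, the repeated 8 KB pattern is encoded as a match and costs almost nothing; with the small dictionary it has to be re-encoded nearly in full.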

Solid compression is also important because without it, you are using a separate dictionary for every file. Compress file 1, reset dictionary, compress file 2, reset dictionary, etc. In other words, regular non-solid compression == amnesia. With solid compression, you treat all the files as one big file and never reset your dictionary. The downside is that it's harder to get to a file in the middle or end of the archive, as it means that you must decompress everything before it to reconstruct the dictionary, and it also means that any damage to the archive will affect every file beyond the point of damage. The first problem is not an issue if you are extracting the whole archive anyway, and the second issue is really only a problem if you are using it for archival backup and can be mitigated with Reed-Solomon (WinRAR's recovery data, or something external, like PAR2 files). Solid archiving is very, very important if you have lots of files that are similar, as is the case for DirectX runtimes (since you have a dozen or so versions of what is basically the same DLL). For example, when I was compressing a few versions of a DVD drive's firmware some years ago, the first file in the archive was compressed to about 40% or so of the original size, but every subsequent file was compressed to less than 0.1% of their original size, since they were virtually duplicates of the first file (with only some minor differences). Of course, with a bunch of diverse files, solid archiving won't help as much; the more similar the files are, the more important solid archiving becomes.
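The firmware example can be reproduced in miniature (a sketch with Python's lzma module; two "versions" of a file that differ in only a few bytes):

```python
import lzma
import os

v1 = bytearray(os.urandom(64 * 1024))   # "firmware v1": 64 KB of incompressible data
v2 = bytearray(v1)
for off in (100, 9000, 40000):          # "firmware v2": same file, 3 bytes patched
    v2[off] ^= 0xFF

# Non-solid: each file is compressed with a fresh dictionary.
non_solid = len(lzma.compress(bytes(v1))) + len(lzma.compress(bytes(v2)))

# Solid: one stream, one dictionary; v2 is almost entirely a match against v1.
solid = len(lzma.compress(bytes(v1) + bytes(v2)))
```

The solid size comes out close to half the non-solid size here, for exactly the reason described above: the second file never resets the dictionary, so it compresses to nearly nothing.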

The dictionary is what really makes 7-Zip's LZMA so powerful. DEFLATE (used for Zip) uses a 32KB dictionary, and coupled with the lack of solid archiving, Zip has the compression equivalent of Alzheimer's. WinRAR's maximum dictionary size is 4MB. LZMA's maximum dictionary size is 4GB (though in practice, anything beyond 128MB is pretty unwieldy and 64MB is the most you can select from the GUI; plus, it doesn't make sense to use a dictionary larger than the size of your uncompressed data, and 7-Zip will automatically adjust the dictionary size down if the uncompressed data isn't that big). Take away the dictionary size and the solid compression, and LZMA loses a lot of its edge.

In the case of the DirectX runtimes, because of their repetitive nature and the large size of the uncompressed data, a large dictionary and solid compression really shine here, much more so than they would in other scenarios. My command-line parameters set the main stream dictionary to 128MB, and the other, much smaller streams to 16MB. For the 32-bit runtimes, where the total uncompressed size is less than 64MB, this isn't any different from the Ultra setting in the GUI, and 7-Zip will even reduce the dictionary down to 64MB. For the 64-bit runtimes, the extra dictionary size has only a modest effect, because 64MB is already pretty big, and also because the data with the most similarity are already placed close to each other (by the ordering of the files). The other parameters (fast bytes and search cycles) basically make the CPU work harder and look more closely when searching for patterns, but their effect is also somewhat limited. In all, these particular parameters are only slightly better than Ultra from the GUI, but I'd much rather run a script than wade through a GUI (just as I prefer unattended installs over wading through installs manually).

Oh, and another caveat: Ultra from the GUI will apply the BCJ2 filter only to things it thinks are executables, whereas this command line will apply it to everything. That means this command line should not be used for an archive with lots of non-executable data (in this case, the only non-executable is a tiny INF file, so it's okay). It also means that this command line will perform much better with executable files that don't get auto-detected (for example, *.ax files are executable, but are not recognized as such by 7-Zip, so an archive with a bunch of .ax files will do noticeably better with these parameters than with GUI-Ultra). If you want to use the 7z CLI for unattended compression but would prefer that 7-Zip auto-select BCJ2 for appropriate files, use -m0=LZMA:d27:fb=128:mc=256 (which is also much shorter than that big long line; for files that do get BCJ2'ed, it will just use the default settings for the minor streams).
My addons: CmdOpen - HashCheck - Notepad2 - MS Runtimes - DirectX

Into the breach, meatbags!

RogueSpear
Posts: 1155
Joined: Tue Nov 23, 2004 9:50 pm
Location: Buffalo, NY

Post by RogueSpear » Mon Sep 15, 2008 11:06 pm

code65536 wrote:LZMA's maximum dictionary size is 4GB (though in practice, anything beyond 128MB is pretty unwieldy and 64MB is the most you can select from the GUI
It used to be that you could select up to 192MB - up until v4.32, I believe. It's funny; I never thought I would run into someone as interested in compression as myself - other than someone who writes compression software, that is. Ever since I first found PKZip back around the time of DOS 3.3 or so, I've loved the idea of getting something for nothing (kinda).

On a related, though totally different, topic - I've spent the last couple of months reading up on and experimenting with everything H.264. Now that I have a PS3 and a NAS for my video, I suddenly have a reason to care, I suppose. It also has my wife a little upset, as I'm always running an encode on her SSE4-capable laptop :P

code65536
Posts: 735
Joined: Wed Mar 14, 2007 2:58 pm
Location: .us

Post by code65536 » Mon Sep 15, 2008 11:49 pm

RogueSpear wrote:Ever since I first found PKZip back around the time of DOS 3.3 or so, I've loved the idea of getting something for nothing (kinda).
Ah, those were the days... I remember being so excited when I first used PKZip. And to be fair to Zip's DEFLATE algorithm, a small dictionary does have one big advantage: less memory usage when compressing and decompressing. The 32KB dictionary for DEFLATE allowed Zip to survive in its era. You'll need about 64MB of RAM to decompress an LZMA archive that uses a 64MB dictionary, which is just fine for XP-era machines, but it would've killed my 386.
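Python's lzma module makes the memory cost visible (a sketch; the exact accounting is liblzma's, but the decompressor's requirement tracks the dictionary size declared in the archive headers):

```python
import lzma

# Compress a tiny payload, but declare a 64 MiB dictionary in the .xz headers.
filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 26}]
blob = lzma.compress(b"hello world" * 100, format=lzma.FORMAT_XZ, filters=filters)

# An uncapped decompressor handles it fine...
assert lzma.decompress(blob) == b"hello world" * 100

# ...but one capped at 4 MiB refuses the stream: it would need roughly
# 64 MiB just to hold the dictionary.
dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=4 << 20)
try:
    dec.decompress(blob)
    refused = False
except lzma.LZMAError:
    refused = True
```

The decompressor has to allocate the full declared dictionary regardless of how small the actual payload is, which is exactly why a 64MB dictionary would have been unthinkable on a 386.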
My addons: CmdOpen - HashCheck - Notepad2 - MS Runtimes - DirectX

Into the breach, meatbags!

RogueSpear
Posts: 1155
Joined: Tue Nov 23, 2004 9:50 pm
Location: Buffalo, NY

Post by RogueSpear » Sat Sep 27, 2008 12:49 am

I got a little inspired and wrote a VBScript for using these command-line options. The basic premise is to create a shortcut to the script in your Send To folder. I've been using it for the last couple of weeks and haven't had any problems yet, so I decided to post it for public consumption.

I'm considering modifying it so that you can define multiple sets of options; the script would compress whatever you select using each set of options and then keep the smallest file - kind of like UPX's --ultra-brute option. The other thing I want to add is a test-after-compress option like WinRAR has. Man, I've been wanting that option in 7-Zip for the longest time now.
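For what it's worth, both ideas are easy to sketch. Here's a hypothetical keep-the-smallest loop using Python's lzma module rather than the 7za CLI (the option sets below are just illustrative filter chains, not anyone's recommended settings), with a decompress-and-compare step standing in for WinRAR-style test-after-compress:

```python
import lzma

def smallest_archive(data: bytes, option_sets):
    # Compress with every option set, verify each result round-trips,
    # and keep whichever output is smallest.
    best = None
    for filters in option_sets:
        out = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
        assert lzma.decompress(out) == data   # "test after compress"
        if best is None or len(out) < len(best):
            best = out
    return best

# Illustrative candidate option sets.
option_sets = [
    [{"id": lzma.FILTER_LZMA2, "preset": 1}],
    [{"id": lzma.FILTER_LZMA2, "preset": 9 | lzma.PRESET_EXTREME}],
    [{"id": lzma.FILTER_DELTA, "dist": 4},
     {"id": lzma.FILTER_LZMA2, "preset": 9}],
]
```

A VBScript version would do the same thing by invoking 7za once per option set and comparing the resulting file sizes, optionally running `7za t` on each archive as the verification step.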

cybpsych
Posts: 421
Joined: Wed Jan 12, 2005 2:33 am

Post by cybpsych » Sun Sep 28, 2008 5:33 am

Here's an interesting util that optimizes 7z archives... something similar to Rogue's idea above:

http://maxcompress.narod.ru/ultra7z.htm

Discussion thread: http://www.encode.ru/forum/showthread.php?t=15

Lots of experimental archivers (especially PAQ) are available here ...

cybpsych
Posts: 421
Joined: Wed Jan 12, 2005 2:33 am

Post by cybpsych » Mon Dec 15, 2008 6:53 am

-removed-

doing more tests :)

dumpydooby
Posts: 530
Joined: Sun Jan 15, 2006 6:09 am

Post by dumpydooby » Mon Jan 26, 2009 6:34 pm

I've always used "-m0=lzma -mx=9 -mfb=273 -md=128m -ms=on" and that works fine for me. I tried out your method on a DirectX 10 addon that I'm making for myself, then I tried mine. Yours weighed in a little under 1 kilobyte lighter than mine.

I'll do some more comparisons. I wrote that one above after spending a good deal of time perusing the 7-Zip documentation, and I had decided that was how I wanted to do it. I have no idea what any of these switches signify now, though.

x-force
Posts: 31
Joined: Tue Feb 19, 2008 4:29 pm

Post by x-force » Sun Mar 15, 2009 8:49 am

@code65536 Thanks for the information.

Some test results:

Test PC config: AMD Athlon X2 CPU, 2 GB 800 MHz RAM
File info: .NET AIO files, 1329 files / 271 directories, total 292,671,866 bytes

Test 1 parameters / multithreading OFF / 4 threads running
set metod=-m0=BCJ2 -m1=LZMA:d27:fb=128:mc=256 -m2=LZMA:d24:fb=128:mc=256 -m3=LZMA:d24:fb=128:mc=256 -mb0:1 -mb0s1:2 -mb0s2:3
..\7za a -t7z -mx=9 -mmt=off %metod% -r "..\test1.7z" *
Run 1: 4 min 03 sec, 45,766,078 bytes
Run 2: 3 min 58 sec, 45,766,078 bytes

Test 2 parameters / multithreading ON / 10 threads running
set metod=-m0=BCJ2 -m1=LZMA:d27:fb=128:mc=256 -m2=LZMA:d24:fb=128:mc=256 -m3=LZMA:d24:fb=128:mc=256 -mb0:1 -mb0s1:2 -mb0s2:3
7za a -t7z -mx=9 -mmt %metod% -r "..\test2.7z" *
Run 1: 2 min 36 sec, 45,762,039 bytes
Run 2: 2 min 32 sec, 45,762,039 bytes

From the 7-Zip command-line documentation on the -mmt parameter:
mt=[off | on | {N}]
Sets multithread mode. If you have a multiprocessor or multicore system, you can get a speed increase with this switch. This option affects only compression (with any method) and decompression of BZip2 streams. Each thread in the multithread mode uses 32 MB of RAM for buffering. If you specify {N}, 7-Zip tries to use N threads.

UtCollector
Posts: 464
Joined: Sun Apr 09, 2006 8:31 pm

Post by UtCollector » Wed Apr 15, 2009 9:12 pm

Well, it works nicely. I made it into a small addon for Send To use.
Release here: http://www.ryanvm.net/forum/viewtopic.php?p=91939#91939
