
MP3-Beating Compression

Started April 06, 2000 01:58 PM
494 comments, last by kieren_j
(and decompress it!)
Pariah: At best, that was very immature. And no, I wasn't the original poster.

AIG: The burden of proof may normally be on me, but not in this case, because I am merely defending kieren_j. The burden's on him.

Now why couldn't that work? You have a header plus a little info. You insert certain sequences into the data using the header info, and now you have a bigger file. Using the header from that, you insert yet more (and possibly bigger) sequences as appropriate until you have another file, and so on. It's almost like having a random seed, but more complex and in reverse.

Zipster: I'm sorry, I must've missed that. I only read the first three pages once, and that was back when I started posting.

ga: you seemed to be talking about regular compression where there's more than one pattern to keep track of, and because of this, if you index all of a random file, you save no space.

Jesse: I might've been, I just saw a lot of that going on. Not you specifically, though.

Ridcully: The earth is a disc! What are you talking about?

AP: Ha ha. Someone already took your joke. It's over.

kieren: Finally, thanks.

On window size: say you have 100 megs of completely random, evenly distributed chars. Now, each one is 8 bits. You may find no pattern in that frame. But if you take it down to 3 or 4 bits, you automatically have patterns within each byte, and patterns in the overlap to other bytes, right? Or if you crank it up to, say, 8,000,000 bits, somewhere there will be a repeat of a 1 meg sequence (this may be stretching it, but somewhere there's a happy medium, depending on file size and the data within). If you replace those (at least two megs) with one bit each, you've saved 2 megs minus two bits. Then you add 98 bits for the rest of the sequences that aren't the same. If you now tack that one meg sequence onto the header, you still have a 99 meg + 100 bit file that can be restored perfectly. If you then run it at a different byte level, you can get even more compression. And/or bit reordering and/or bit masking.
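Here is a minimal sketch that tries this scheme directly. It is nobody's actual program from this thread, just a quick test of the simplest version of the idea (fixed n-bit windows, the single most frequent pattern replaced by a 1 flag bit, every other window prefixed by a 0 flag bit, the chosen pattern stored in the header), run on uniformly random bytes and counting the resulting size:

#include <stdio.h>
#include <stdlib.h>

#define FILE_BYTES (1 << 20)   /* 1 MB of simulated random data */

/* read an n-bit block starting at bit offset pos (n <= 16) */
static unsigned get_block(const unsigned char *data, size_t pos, int n)
{
    unsigned v = 0;
    for (int b = 0; b < n; b++) {
        size_t bit = pos + b;
        v = (v << 1) | ((unsigned)(data[bit >> 3] >> (7 - (bit & 7))) & 1u);
    }
    return v;
}

int main(void)
{
    size_t file_bits = (size_t)FILE_BYTES * 8;
    unsigned char *data = malloc(FILE_BYTES);
    if (!data) return 1;

    srand(12345);
    for (size_t i = 0; i < FILE_BYTES; i++)
        data[i] = (unsigned char)(rand() & 0xFF);

    int sizes[] = { 4, 8, 16 };   /* window sizes that divide the file evenly */
    for (int s = 0; s < 3; s++) {
        int n = sizes[s];
        size_t nblocks = file_bits / (size_t)n;
        size_t *count = calloc((size_t)1 << n, sizeof *count);
        if (!count) return 1;

        /* histogram of all n-bit windows */
        for (size_t i = 0; i < nblocks; i++)
            count[get_block(data, i * (size_t)n, n)]++;

        /* pick the most frequent pattern */
        size_t best = 0;
        for (size_t v = 1; v < ((size_t)1 << n); v++)
            if (count[v] > count[best]) best = v;

        /* replaced windows cost 1 bit, all others cost n+1 bits (flag + pattern),
           plus n header bits to store the chosen pattern itself */
        size_t replaced = count[best];
        size_t out_bits = replaced + (nblocks - replaced) * (size_t)(n + 1) + (size_t)n;

        printf("n=%2d: most frequent window occurs %7zu times, input=%zu bits, output=%zu bits\n",
               n, replaced, file_bits, out_bits);

        free(count);
    }
    free(data);
    return 0;
}

On random input the output comes out larger than the input for every window size tried, which is exactly what the counting argument later in this thread predicts.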






Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!

Edited by - LackOfKnack on 4/15/00 6:00:28 AM
Will this topic ever end?

Now, I have no idea how compression algorithms work (I've got a rough idea), or what the hell a window size is (a window to me means something with 3 little icons in the top-right hand corner), and so on, but I do know this: this is getting BORING. The bloody thread has been CLOSED, stop posting! Wait until kieren posts a demo, but until then, leave IT!

geez

MENTAL

This had better make 8 pages
(damn)
Disputing my reply at the top of page 7.

Lack tells me a change in byte size will change the data in some way so as to make it compressible. Once random, always random, until you sort it (which doesn't allow "decompression").

Furthermore, Lack misquoted me by quoting my line about compressing terabytes onto one disk. I was merely showing what _could_ be done if Kieren_J could compress any file, which we all seem to have agreed cannot be done.

I am probably going out on a limb if I generalise and say everyone is happy that pure random data cannot be compressed, as patterns cannot be identified at all.
Nonetheless, I will make that generalisation.

Now, what's left is to decide whether his (Kieren_j's) claims of compressing zip, MP3 and whichever other files he has tested... are correct.

On these I cannot comment. I do not know internal file formats or data representation within zips or MP3s.
What I do know is that you're not very likely to find any patterns within a zip file, because zip's algorithms eliminate regular patterns and replace them with lookups (am I right?)
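For what it's worth, here is a toy illustration of that "replace with lookups" idea (a much-simplified LZ77-style sketch, not zip's actual DEFLATE; the MIN_MATCH threshold and the sample string are made up): repeated runs of bytes become (distance, length) back-references, which is why so little repetition survives in zip output.

#include <stdio.h>
#include <string.h>

#define MIN_MATCH 4   /* assumed threshold: shorter repeats aren't worth a token */

int main(void)
{
    const char *buf = "abcabcabcabc -- hello hello hello";
    size_t n = strlen(buf);

    for (size_t i = 0; i < n; ) {
        size_t best_len = 0, best_dist = 0;

        /* find the longest earlier occurrence of the bytes starting at i
           (overlapping matches are allowed, as in LZ77) */
        for (size_t j = 0; j < i; j++) {
            size_t len = 0;
            while (i + len < n && buf[j + len] == buf[i + len])
                len++;
            if (len > best_len) { best_len = len; best_dist = i - j; }
        }

        if (best_len >= MIN_MATCH) {
            printf("back-reference: distance=%zu length=%zu\n", best_dist, best_len);
            i += best_len;
        } else {
            printf("literal: '%c'\n", buf[i]);
            i++;
        }
    }
    return 0;
}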

MP3, on the other hand, has the potential to be random, as there should be no way to predict what data could come next. However (not knowing the MP3 format), I believe an MP3's compressed data comes from breaking a sound into multiple frequencies and storing frequency data. I may be wrong.
A lot of frequencies, and the patterns therein, would probably occur many times within a song... thus I am assuming that an MP3 must have some repeated data (if the input were some average song).

Anyone care to elaborate?





regards,

GeniX
Lack, you'll never find a sequence in random data that will shrink the file. Do you know what the odds are of finding a 1 meg sequence repeated even once in a 100 meg file, just so you can shrink the original file by just 1 meg? I know you were just giving an example, but it is extremely unrealistic. In reality, you won't find any sequences that will make up for the 1 bits inserted for the other sequences.
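To put a rough number on those odds (a back-of-the-envelope sketch, assuming bit-aligned 1 meg windows and uniformly random bits): there are roughly 2^58 pairs of window positions in a 100 meg file, and each pair matches with probability 2^-8388608, so the expected number of repeated 1 meg windows is, for all practical purposes, zero.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double file_bits  = 100.0 * 8.0 * 1048576.0;  /* 100 MB of random bits */
    double block_bits = 8.0 * 1048576.0;          /* one 1 MB window */

    double positions  = file_bits - block_bits + 1.0;  /* possible window starts */
    double pairs_log2 = 2.0 * log2(positions) - 1.0;   /* ~log2(positions^2 / 2) */
    double match_log2 = pairs_log2 - block_bits;       /* each pair matches with prob 2^-block_bits */

    printf("log2(expected number of repeated 1 MB windows) ~= %.0f\n", match_log2);
    return 0;
}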

But since you're still hanging on, I suggest you stop trying to prove it to us and prove it to yourself. You understand the algorithm, so code it up. It shouldn't take more than a couple of hours. Try actually testing a running example rather than theorizing. That's the only way you'll get it.

I think it's time for the thread to stop until someone has a working demo that others can use (even if in a CGI script to keep the program from being disassembled).

aig
Right, this is really getting boring.
I'd say leave it, and I will do so.
Last post in this thread, and I mean it (at least my last one).

(will I by any chance get on top of page 8?)
quote:
Q. It's impossible to compress random data.
A. HOAXcompressor doesn't compress random data. It compresses *any* data. What the hell would you do with a random data compressor?


The guy has a point here ....


-kertropp

C:\Projects\rg_clue\ph_opt.c(185) : error C3142: 'PushAll' :bad idea
C:\Projects\rg_clue\ph_opt.c(207) : error C324: 'TryCnt': missing point
Lack, AIG showed that it wouldn't be useful if you index only one pattern. Each n-bit pattern would have a frequency of 1/2^n, and if you substitute one of them with one bit, you could save (1/2^n - 1/(n*2^n)) of the file's bits. But you need one additional bit every time another pattern occurs, so you'd need filesize(bits)/n - filesize/(n*2^n) additional bits. And you have to store the "most frequent" pattern in the header, and if you work with variable pattern sizes, you also have to store the length of the pattern in the header. Let's say the file is m bits long.
You save m/2^n - m/(n*2^n) bits, but you need m/n - m/(n*2^n) additional bits, plus >= n header bits.

m/2^n - m/(n*2^n) < m/n - m/(n*2^n) + n

and m/2^n = m/n - m/(n*2^n) is true for n=1, but for n>1 the left side falls off like 1/2^n while the right side only shrinks like 1/n, so the left side stays smaller; add the >= n header bits and the inequality above holds for every n, so you can't pack any random data with your algorithm.
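Plugging numbers into the formulas above (a quick sketch; m = 8 * 2^20 bits is just an example file size) shows the cost side beating the savings side at every window size n:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double m = 8.0 * 1048576.0;   /* example file size: 1 MB in bits */

    for (int n = 1; n <= 24; n++) {
        double p = pow(2.0, (double)n);
        double savings = m / p - m / (n * p);     /* bits saved on the replaced pattern */
        double cost    = m / n - m / (n * p) + n; /* extra flag bits plus header */
        printf("n=%2d  savings=%14.1f bits  cost=%14.1f bits\n", n, savings, cost);
    }
    return 0;
}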

Visit our homepage: www.rarebyte.de.st

GA

Edited by - ga on 4/15/00 1:53:45 PM
AHHHHHHHH! SOMEONE

PLEASE!!!!!!!


KILL THIS POST BEFORE IT GROWS ANY LARGER!!!

This topic is closed to new replies.
