Other anonymous poster:
Ah, I thought the files in question were _unzipped_ mp3s and mpegs, which were then being reordered and compressed. I told you I thought I missed something.
MP3-Beating Compression
April 14, 2000 03:40 PM
Wow, unless that common byte is _really_ common, that ain't gonna do squat to reduce file size.
April 14, 2000 03:45 PM
Aack, sorry about posting all of this in separate messages.
Think about it... to reduce file size using this method, over 1/8 of the file has to be the same byte. Definitely not happening, even after some miraculous data massage.
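Here's a quick sketch of that break-even arithmetic, in case anyone wants to see where the 1/8 figure comes from (n and k are just names I made up for the illustration, nothing from kieren_j's program):

#include <stdio.h>

/* Flag-bit scheme as described: the most common byte costs 1 bit,
   every other byte costs 9 bits (the flag bit plus the original 8). */
int main(void)
{
    long long n = 1000000;                 /* total bytes in the file   */
    for (int pct = 0; pct <= 50; pct += 5) {
        long long k = n * pct / 100;       /* count of most common byte */
        long long out_bits = k * 1 + (n - k) * 9;
        long long in_bits  = n * 8;
        printf("common byte = %2d%% of file  ->  output/input = %.3f\n",
               pct, (double)out_bits / (double)in_bits);
    }
    /* out_bits < in_bits exactly when k + 9(n - k) < 8n, i.e. k > n/8,
       which is where the "over 1/8 of the file" figure comes from. */
    return 0;
}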
Wait everyone, get this:
His algorithm compresses, but there is NO decompression algorithm (yet)!! Hmmmm.... Kinda missed the whole point of compression, huh?
Actually, what if kieren_j THINKS his algorithm works when it really doesn't? Like if there's a TINY flaw that he constantly overlooks in his code that happens to delete a huge portion of the file? That has happened to me plenty of times. So, in a sense, he is telling the truth as far as he knows, even though it could be totally impossible.
Kieren_J - If it really does work, mail me some $$$!
I still don't know why this thread is still going; we're deviating from the original topic slightly. Besides, I hate waiting for a huge post to load into my browser.
BTW, go to the GDNet Lounge forum. There's a thread with 629 posts, I think!!!! 30 pages!!! Oh, man.
If I did my math right, on a 1 Meg file of purely random data (i.e. all 256 possible bytes each occur the same number of times, so there's no most common byte), the algorithm will produce a file that is about 124K *larger* than the original 1 Meg file. (1/256 of the original file (= 4K of bytes) could be represented by the 0 bits (512 bytes), and the rest of the bytes would take 9 bits each (the 1 bits + the original bytes), which works out to roughly 1,175K of output.)
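Quick sanity check in code, for anyone who wants to rerun that arithmetic themselves (just a throwaway C snippet of mine, nothing to do with kieren_j's actual program):

#include <stdio.h>

/* Size of the flag-bit output on a 1 Meg file where every byte value
   occurs equally often, so the "most common" byte is 1/256 of the file. */
int main(void)
{
    long long n      = 1048576;           /* 1 Meg of input bytes      */
    long long common = n / 256;           /* 4096 bytes -> 1 bit each  */
    long long others = n - common;        /* the rest  -> 9 bits each  */
    long long out    = (common * 1 + others * 9 + 7) / 8;  /* bits -> bytes, rounded up */

    printf("input  = %lld bytes\n", n);
    printf("output = %lld bytes (%lld bytes larger, about %lldK)\n",
           out, out - n, (out - n) / 1024);
    return 0;
}

It prints an output of 1,175,552 bytes, i.e. 126,976 bytes (124K) bigger than the input.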
So I don't see how this will work on mp3 files, zips, or anything that is already essentially "random" data. But it should work fine on other specific uncompressed types of files, including some bmp, txt, etc., where one byte occurs much more often than the others.
I vote we keep the thread open. The thread is fun, even though I don't believe the algorithm works for anything except files that are already highly compressible.
aig
You're missing something: who says you have to use 8-bit bytes?
The reason why it must be possible on some scale is that (I think) it's pretty hard, or nearly impossible, to create a random file that has no patterns at all at 3-, 4-, 5-, and larger block sizes. At the very least, not in a 100mb file. It may be completely random and evenly distributed at the 8-bit-byte level, but shift gears and it's a whole new world.
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Listen Kieren... don't worry about what we all think. If it works, then it works and you are set for life. If it doesn't, then most of us have nothing to worry about. BUT if it truly works, then I would stop posting code just to prove the point. Maybe after you patent it or sell it or whatever.
BASSOFeeSH
P.S. You said that it is over 500 lines of code but only took 20 minutes... for some reason that seems wrong.
"Why am I the only one on the away team with a red shirt!?"
BASSOFeeSH
><>
-- What would Sweetness do?
Isn't a byte defined as 8 bits?
Regardless, if you use 4- to 6-bit patterns and they are roughly evenly distributed, then the file size will potentially shrink even less than with 8-bit patterns. (Because for each pattern that is not your most common pattern, you add one extra bit.)
It has been shown above that there is no single compression algorithm that can guarantee compression on all possible files of a certain length, even by a bit. However, this doesn't mean that kieren_j isn't compressing all files.
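For anyone who missed that argument earlier in the thread, the counting behind it fits in a few lines of C (purely illustrative):

#include <stdio.h>

/* Pigeonhole counting: there are 2^n distinct n-bit files, but only
   2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 files strictly shorter than n bits,
   so no lossless scheme can shrink every n-bit file, even by one bit. */
int main(void)
{
    for (int n = 1; n <= 16; n++) {
        unsigned long long files_of_len_n = 1ULL << n;
        unsigned long long shorter_files  = (1ULL << n) - 1;
        printf("n=%2d: %llu files of that length, only %llu possible shorter outputs\n",
               n, files_of_len_n, shorter_files);
    }
    return 0;
}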
When stages are mentioned, it makes me think that multiple passes are being used, and that different compression algorithms are being applied to the current data set. And if you choose a compression algorithm tailored to the data, you could theoretically compress random files. (Theoretically, because you'd need access to a near-infinite supply of different compression algorithms to guarantee compression on random data.)
This does lead me to believe that kieren_j could be putting together a decent compression system, but I'm skeptical of the numbers we've seen. (Because other programs have used this multiple-algorithm strategy with great success, but never with anything like what these numbers suggest.)
As for voting on whether the thread should be closed, I say leave it open. I've learned a bunch about compression already, and the discussion can't hurt anything.
Jesse
Website - Third Party Ninjas | Twitter - twitter.com/Chounard
April 14, 2000 04:45 PM
Lack, if I tell you I have a completely random stream of 1's and 0's, then any sort of windowing of the data (5-bit, 8-bit, 13-bit, whatever) is going to be patternless. If it weren't, then the distribution of 1's and 0's wouldn't be random. I know I can generate a random stream of 1's and 0's, so I also know I can make a stream of data that has no pattern, no matter how big the chunks are that you care to look at it through.
This is like me telling you that 37 is a prime number, and then you saying, "WAIT, have you tried to divide it by 4??!? WAIT, what about 3??!?"
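If you don't believe it, try a toy experiment like this (my own quick C sketch, using rand() as a stand-in for a decent random source): re-window the same random bits at a few different widths and watch how flat the counts stay.

#include <stdio.h>
#include <stdlib.h>

#define NBITS (1 << 20)                    /* about a million random bits */

static unsigned char bits[NBITS];

int main(void)
{
    srand(12345);
    for (long i = 0; i < NBITS; i++)
        bits[i] = (unsigned char)(rand() & 1);

    /* Re-read the same bit stream in 3-, 5-, and 8-bit chunks and check
       whether any chunk value dominates.  For random bits none will:
       the most common value stays right around 1/2^width of the chunks. */
    int widths[] = { 3, 5, 8 };
    for (int w = 0; w < 3; w++) {
        int width = widths[w];
        long counts[256] = { 0 };
        long nchunks = NBITS / width;
        for (long c = 0; c < nchunks; c++) {
            int value = 0;
            for (int b = 0; b < width; b++)
                value = (value << 1) | bits[c * width + b];
            counts[value]++;
        }
        long best = 0;
        for (int v = 0; v < (1 << width); v++)
            if (counts[v] > best) best = counts[v];
        printf("width %d: most common chunk is %.4f of all chunks (uniform would be %.4f)\n",
               width, (double)best / nchunks, 1.0 / (1 << width));
    }
    return 0;
}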
April 14, 2000 04:59 PM
GO TO SCHOOL. Seriously. I don't mean this to be elitist, but you're just spreading dangerous misinformation. In particular:
> (1) Find most common char, store in P
> (2) For all source bytes:
> (2.a) If byte is P, write a ZERO
> (2.b) else, write a ONE then the byte
> (3) Repeat for all bytes
>
> It''s really quite simple!
> Besides, it''s not huffman - it''s one of my own invented schemes.
Huffman is a GREEDY algorithm that finds THE OPTIMAL variable-bit (prefix) encoding of a dictionary of characters of a specified size (typically 8 bits, but it could be anything). Huffman's output is provably OPTIMAL among such encodings. This means that there is NO OTHER POSSIBLE variable-bit encoding (such as the one you're suggesting here, with 0 for the most common character and 1 plus the byte for everything else) that could be more efficient than building the Huffman tree. The above code will never, ever, ever be better than Huffman. At best, it would be equivalent.
The fact that you don't even understand this, along with a rudimentary knowledge of information theory, lets me sleep soundly at night knowing that this is all a bunch of hot air. The reason I and others keep posting is because you are SERIOUSLY misleading other people. It's all well and good if we want to talk about compression. But there is no excuse for continuing to promote PROVABLY FALSE CONCEPTS.
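If anyone wants to see the gap on a real file, here is a standalone C sketch of my own (it assumes a filename on the command line and just builds a byte histogram; none of it is kieren_j's code):

#include <stdio.h>
#include <math.h>

/* Compare the posted flag-bit scheme against the entropy floor on a real
   file.  The flag-bit scheme is itself a prefix code (codeword lengths of
   1 and 9 bits), so by Huffman's optimality it can never beat the code
   Huffman would build from the same byte frequencies, and the entropy
   figure below is a floor that even Huffman cannot go under. */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    long long count[256] = { 0 }, total = 0;
    int c;
    while ((c = fgetc(f)) != EOF) { count[c]++; total++; }
    fclose(f);
    if (total == 0) return 0;

    long long most = 0;
    double entropy_bits = 0.0;
    for (int i = 0; i < 256; i++) {
        if (count[i] > most) most = count[i];
        if (count[i] > 0) {
            double p = (double)count[i] / (double)total;
            entropy_bits += count[i] * -log2(p);
        }
    }

    long long flag_bits = most * 1 + (total - most) * 9;
    printf("input        : %lld bytes\n", total);
    printf("flag-bit code: %.0f bytes\n", flag_bits / 8.0);
    printf("entropy floor: %.0f bytes (no byte-by-byte prefix code, Huffman included, can beat this)\n",
           entropy_bits / 8.0);
    return 0;
}

On anything that's already compressed (mp3, zip), the flag-bit figure comes out bigger than the input, which is exactly the point.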
This topic is closed to new replies.