Advertisement

MP3-Beating Compression

Started by April 06, 2000 01:58 PM
494 comments, last by kieren_j 24 years, 8 months ago
AIG: Yeah, so what? Who''d want to compress an 8-bit file? As soon as you have enough bits around to make patterns, you can do the zero-replace thing, and then you just kill the pigeon and stuff just one of his feathers with his name (/type/whatever it is that makes him unique) on it into the hole. You can fit a lot more feathers than pigeons into that hole.


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
It''s just like reading through a list. If you have 700 part numbers, and one of them occurs more than the others, you can just say ''same'' instead of reading out the entire part number.

But you knew that already.

If certain parts (perhaps 1kb chunks) of a large file, say, have a certain pattern occur more often than the same pattern does elsewhere in the file (which it should if your bytes are random and evenly distributed), you could break the file into smaller chunks and use whatever works best on each piece.


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Advertisement
"quote:
--------------------------------------------------------------------------------

Let me clarify something, by ''random data is not compressible'', we mean that every file of length n is not compressible by the same algorithm. I could easily write an algorithm that could compress/uncompress a file to a size of 1 byte, but for every other file of the same length, I would see detrement, no compression.
--------------------------------------------------------------------------------


Right, but if you have an array of methods and algorithms to use, sure you could."

Lack, read aig''s post. If you have enough different methods to compress any file, you must have a very big header only to determine which method you''re using, and this header is going to make the compression useless.

My definition of random files:
okay, there will be patterns. But if you use this patterns for compression with lim n -> infinite, the saved space will be zero.

But with this definition, you''re able to find out much things about random data:
The number of 0''s and 1''s is equal. And if you search for n bit patterns, the number of each of the 2^n possible patterns is equal, also with overlapping patterns.

Visit our homepage: www.rarebyte.de.st

GA
Visit our homepage: www.rarebyte.de.stGA
Mr. Lack, I seriously doubt you''re really listening to what AIG is saying.

There are 2^7 different ways to organize a file of 7 bits. There are 2^8 different ways to organize a file of 8 bits. You simply can''t compress every 8 bit file into 7 bits. If your compression method isn''t lossy, each input file must produce a unique output file. Well, there are 2^8 possible inputs and only 2^7 possible outputs, so it''s impossible to compress every 8 bit file into 7 bits.

Please think about what people have been saying recently. I''m starting to think you''re arguing simply because you enjoy arguing, not because you have something to say / prove
Lack: Sorry, I guess I missed that.
My Geekcode: "GCS d s: a14 C++$ P+(++) L+ E-- W+++$ K- w++(+++) O---- M-- Y-- PGP- t XR- tv+ b++ DI+(+++) D- G e* h!"Decode my geekcode!Geekcode.com
Visit our web site:Asylum Entertainment
quote: Original post by ga

Lack, read aig''s post. If you have enough different methods to compress any file, you must have a very big header only to determine which method you''re using, and this header is going to make the compression useless.


Only if you''re using an index of only one type of algorithm. If you can scramble the data, you can get it to work with another algorithm.

quote:

My definition of random files:
okay, there will be patterns. But if you use this patterns for compression with lim n -> infinite, the saved space will be zero.

But with this definition, you''re able to find out much things about random data:
The number of 0''s and 1''s is equal.


Okay, that would explain the imbalance in the string you posted, and maybe why I could compress it. So if you would, post a string that you think is uncompressable by definition.

quote:
And if you search for n bit patterns, the number of each of the 2^n possible patterns is equal, also with overlapping patterns.


Now here''s where it''s at: You''re saying that I will not find an imbalance of patterns with any byte size, with any offset? How is this possible?

TheHoff: Now wait a minute, I didn''t say you could compress any 8 bits into seven. But if you have something like this:

00011110 10110011

and you look at it like this:

0001111 0101100 11...

you get a different set of things. I wasn''t implying that you could simply change the byte size and save 1/8th of the filesize.

AIG, you understood the window thing... what happened?


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Advertisement
No, you don't get different data just by grouping the bits differently. If any patterns in the data emerge after you regroup the bits, then all that says is that there is a pattern in the original data too.

The fact remains: No matter how hard you try, you can't represent 2^8 files with only 2^7 different files. This argument is easily extended into larger files.

Edited by - TheHoff on 4/18/00 7:51:42 PM
Lack, look at my definition of random data:
e.g., you examine 2 bit patterns. At every position in the file, the possibility that the next bit is 1 is 50%. With the bit after this bit it''s the same. The possibility that the two bits are both 1 is 25%, for each of the other combinations (10,01,00) it''s the same. And if you take the lim for n -> infinite, the possibilities get frequencies.

Visit our homepage: www.rarebyte.de.st

GA
Lack, look at my definition of random data:
e.g., you examine 2 bit patterns. At every position in the file, the possibility that the next bit is 1 is 50%. With the bit after this bit it''s the same. The possibility that the two bits are both 1 is 25%, for each of the other combinations (10,01,00) it''s the same. And if you take the lim for n -> infinite, the possibilities get frequencies.

Visit our homepage: www.rarebyte.de.st

GA
Visit our homepage: www.rarebyte.de.stGA
Lack,

Yes, files with patterns compress. ;-) But using any algorithm will not compress all files - what about files where the patterns tallies are not significantly different from each other? I posted a example of that a few pages back...

About using multiple algorithms, even if you used a million algorithms, you won't get 2^n pigeons to go into 2^(n-1) + 2^(n-2) + 2^(n-3)... unique holes!

Yes, I understood "windowing". I still do. But it also can't make those pigeons all fit. The best it can do is put in the pigeons that another algorithm couldn't put in, but to do so it's gonna have to take some other pigeons out. Yes, if you scramble the data you'll get it to work with another algorithm - but only by voiding the compressibility of some other sequence.

> But if you have something like this:
> 00011110 10110011
> and you look at it like this:
> 0001111 0101100 11...
> you get a different set of things.

Doesn't matter. You have 16 bits either way, for a total of 65536 different combinations - and you just can't represent every combinations in 15 bits or less. The most different combinations you could represent in 15 bits or less is 65534. There are always going to be at least two 16-bit combinations that won't compress, no matter which "windowing" strategy, or any other algorithm or combination of algorithms you use. Different algorithms or windowing lengths will just change which 2 combinations can't compress.

Note that just as you can't compress 8 bits into 7 (because 2 combinations won't compress), you can't compress 16 into 15 (for the same reason), and you can't compress 32 into 31 (ditto).....going with larger files doesn't get better as you might expect. Sure, the ratio of compressable-by-even-just-one-bit to non-compressable gets better, but you'll always have 2 pigeons left over, even if they fly into a window. ;-)

BTW, I'm *really* done now. I'm going into lurk mode until a demo is posted.

aig

Edited by - An Irritable Gent on 4/18/00 8:13:17 PM
aig

This topic is closed to new replies.

Advertisement