Advertisement

Random bit stream

Started by May 17, 2000 06:40 PM
24 comments, last by LackOfKnack 24 years, 4 months ago
Also, why couldn''t those files be broken into smaller ones? If they aren''t necessarily evenly distributed, then their smaller pieces can be compressed, right?


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
On another point, even if half are not compressable with a certain algorithm, there are 2^1000000 different files that can take up one ~meg, right? So who says you''ll necessarily hit on the hard-to-compress half? That''s a huge number, and only of those that are exactly the same size.

Oh, now I remember something: If half are uncompressable, but with a certain algorithm (mine, for instance) the average of all of the compressed file sizes does actually yield positive compression (contrary to other''s previous statements), then doesn''t that mean that for any file (of reasonable size) you could split it into smaller files, and probably half of them would compress?


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Advertisement
I don''t know enough to really argue compression theory. However, there is no such thing as an uncompressible file. You send me any file, I can write a program to compress it to 1 byte. My program won''t do squat for any other files, but for that 1 exact file, I am Mr. Compression.

I''m guessing the $5000 thing was more like: "You send me a compression algorithm, and I guarantee I can find a file that cannot be compressed.

Jesse Chounard
ya, i agree with jesse. every compression algorithm has a good case (compress into 1 byte) and a bad case (dont compress at all)

- pouya
--------------
Maybe in order to understand mankind, we have to look at the word itself: "Mankind". Basically, it's made up of two separate words - "mank" and "ind". What do these words mean? It's a mystery, and that's why so is mankind
Yeah, I know about the algorithm thing, that''s why I kept saying ''using a certain algorithm,'' like, just the one on a bunch of files.

I didn''t know that that''s how the contest worked, though. All righty.


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Actually it''s pretty trivial to create a bit sequence for any given compression algorithm that is uncompressible. Fixed point theorem on compression states that if you feed the output of compression into the compression enough times, eventually the result will be uncompressable by that algorithm.
Advertisement
I think osmanb mentioned in the compression thread that you could compress 2^n-1 of the 2^n files with size n (bits). But you need also information about the length of the original file. Let's say we have one of the for our filesystem biggest possible files, so you'll need round(log2(n)) bits to store the length of the original file.
Then we've got:
one file with 0 bits : log2(n) bits
two files with 1 bit : [1 + log2(n) bits] * 2
four files with 2 bits : [2 + log2(n) bits] * 4
................
2^(n-1) files with n-1 bits : [n-1 + log2(n) bits] * 2^(n-1)
one of the 2^n files can't be compressed : n + log2(n) bits

Now we must add those values and divide them by 2^n to see if the result is more or less than n:
Let's do it with the log2s first: 2^n*log2(n)/2^n = log2(n)
plus the other things: [sum(i*2^i,i,0,n-1)]/2^n = [2^n*(n-2) + 2]/2^n
Let's add the log2(n): n - 2 + 1/2^(n-1) + log2(n)
This is the average compressed file size (if I've made no mistake) and it's bigger than n.

Visit our homepage: www.rarebyte.de.st

GA

Edited by - ga on May 19, 2000 3:15:32 PM
Visit our homepage: www.rarebyte.de.stGA
Most compression algorithms work linearly, right? Well, if something doesn''t compress well linearly, couldn''t you look at it like a matrix?

For instance,
1001011011000101

Would be
1001
0110
1100
0101

Which, read vertically instead of horizontally, would be
1010011101001001

Now, I''m not sure, but this might be useful. Just thinking.

lntakitopi@aol.com | http://geocities.com/guanajam/
Excuse my stupidity, ga, but how does log work exactly?

Thanks.


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Oh, and SHilbert: I''m not sure if that would yield better results or not, but I''d guess not (try it, maybe it does.) I tried bit offsets, and that did nothing, but I tried bit masking and that definitely works (as long as you''re at a different byte size than the original.)


Lack

Christianity, Creation, metric, Dvorak, and BeOS for all!
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!

This topic is closed to new replies.

Advertisement