look, i already tried to explain to you that even your so-called random sequences can be compressed, because there are no truly random sequences. we are not talking about bits and bytes that we don't know before compressing them. and there are no files where every pattern occurs with the same frequency! such a file would have to be of infinite size, and that is not possible, because our universe's size is itself limited. so you will always find a pattern that you can describe with a function. the more 'random' your file gets, however, the less compression you will be able to achieve with this method. now kieren_j is trying to get rid of some of the file's 'randomness' by applying bitmasks. that might make his file even bigger, but in the end a different compressor like huffman would be able to compress the file even more - until the file reaches a minimum size and you can't compress it any further. whoever said that huffman is already a very greedy algo is of course right, but maybe kieren_j has found a way to perfect huffman coding by making random sequences not so random anymore. and of course there is a lower limit that depends on the file's size and complexity (i didn't want to write randomness again). we are just trying to get down to that very limit!
//.athias
___________________________________________
//.athias .\\üller mmathias@magnet.at
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
MP3-Beating Compression
Am I wrong in assuming that the only ways to improve on Huffman encoding are
1. Lossy encoding when it doesn't matter.
2. Increasing the lengths of the patterns the Huffman encoding detects
?
#pragma DWIM // Do What I Mean!
~ Mad Keith ~
**I use Software Mode**
It's only funny 'till someone gets hurt. And then it's just hilarious. Unless it's you.
mmathias, you want to say there are no truly random data sequences, because there are bigger patterns, e.g. with length n, where some of the possible 2^n patterns can't occur because the file is smaller than n*2^n bits. That's true, but in our random file there won't be an n-bit pattern which occurs multiple times. And each of those patterns is a random pattern, just like the whole file itself.
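As a rough, purely illustrative sketch of that counting (nothing from this thread, just C++ to make the numbers concrete): a file of S bits holds at most S - n + 1 windows of length n, while there are 2^n possible n-bit patterns, so for large enough n most patterns cannot appear in the file at all.

#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t S = 8ull * 1024 * 1024;      // a 1 MB file, measured in bits
    for (int n = 1; n <= 32; ++n) {
        std::uint64_t windows  = S - n + 1;          // n-bit windows that fit in the file
        std::uint64_t patterns = 1ull << n;          // 2^n possible n-bit patterns
        if (patterns > windows) {
            std::cout << "at n = " << n << " there are already more possible patterns ("
                      << patterns << ") than windows in the file (" << windows << ")\n";
            break;                                   // beyond this point most patterns can't occur
        }
    }
    return 0;
}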
Then you said you can compress any data to a certain size, but if there is no truly random data, you'd be able to compress the resulting file again.
Or I compress all my big files to that certain size, then I copy them together into one file, which is bigger than that certain size, and I can compress this again. I could store as much data as I wanted on my hard drive, which is definitely not true!
Visit our homepage: www.rarebyte.de.st
GA
quote: Original post by MadKeithV
Am I wrong in assuming that the only ways to improve on Huffman encoding are
1. Lossy encoding when it doesn't matter.
2. Increasing the lengths of the patterns the Huffman encoding detects
?
#pragma DWIM // Do What I Mean!
~ Mad Keith ~
**I use Software Mode**
we are not only talking about huffman coding; it was just an example!
let us look at the output of a sine function:
x = sin(y);
we will make y range from 0 to 360. now let us analyze all the x values. i think that if you added up all the x values and divided the sum by 360, you would get 0 as a result. that means that the probability for every bit to be 0 or 1 would be 50% ---> a random file?! i do not think so. if you analyzed the output of this sine function, even knowing only the output, you could easily find the function that generated your data. now all you would have to do is store the function!
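To make that concrete, here is a minimal sketch (my wording, and regenerate() is just a made-up name): the "stored description" is nothing but the function itself, and rerunning it reproduces all 360 samples exactly - the printed mean over one period comes out close to 0, as claimed above.

#include <cmath>
#include <iostream>
#include <vector>

std::vector<double> regenerate() {
    const double pi = 3.14159265358979323846;
    std::vector<double> x;
    for (int deg = 0; deg < 360; ++deg)
        x.push_back(std::sin(deg * pi / 180.0));   // sample y = 0..359 degrees
    return x;
}

int main() {
    std::vector<double> data = regenerate();       // "decompression": just rerun the function
    double sum = 0.0;
    for (double v : data) sum += v;
    std::cout << "mean over one period = " << sum / 360.0 << '\n';   // comes out close to 0
    return 0;
}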
//.athias
PS: i hope that everybody understood what i was trying to write. sorry for my english.
___________________________________________
//.athias .\\üller mmathias@magnet.at
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
quote: Original post by ga
mmathias, you want to say there are no truly random data sequences, because there are bigger patterns, e.g. with length n, where some of the possible 2^n patterns can't occur because the file is smaller than n*2^n bits. That's true, but in our random file there won't be an n-bit pattern which occurs multiple times. And each of those patterns is a random pattern, just like the whole file itself.
Then you said you can compress any data to a certain size, but if there is no truly random data, you'd be able to compress the resulting file again.
Or I compress all my big files to that certain size, then I copy them together into one file, which is bigger than that certain size, and I can compress this again. I could store as much data as I wanted on my hard drive, which is definitely not true!
Visit our homepage: www.rarebyte.de.st
GA
but this certain size i was talking about depends on the size and complexity of your original data, so you would not be able to store as much data as you wanted on your hd! there will always be this lower limit, but huffman just does not get there. that's why kieren_j is trying to apply bitmasks to the compressed data in order to get rid of some of the file's complexity. that way coding methods like huffman can compress the file even better. i am not saying that every file can be compressed down to the smallest amount of information (a bit). what i am trying to tell you is that it is possible to enhance existing compression algos.
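For what it's worth, here is a minimal sketch of the kind of bitmask step being described - not kieren_j's actual code, which nobody here has seen, and the Huffman stage only appears in a comment as a placeholder for any stock coder. The one property that matters is that XOR with a fixed mask is its own inverse, so a decompressor can undo it after decoding.

#include <cassert>
#include <cstdint>
#include <vector>

// XOR every byte with a fixed mask; applying the same mask again restores the data.
std::vector<std::uint8_t> apply_mask(std::vector<std::uint8_t> data, std::uint8_t mask) {
    for (std::uint8_t& b : data)
        b ^= mask;
    return data;
}

int main() {
    std::vector<std::uint8_t> original = {0x00, 0xFF, 0x12, 0x34};
    std::vector<std::uint8_t> masked   = apply_mask(original, 0x55);  // pre-pass before coding
    // ... hand `masked` to any stock coder here, e.g. a plain Huffman encoder ...
    assert(apply_mask(masked, 0x55) == original);   // the step undoes itself exactly
    return 0;
}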
//.athias
___________________________________________
//.athias .\\üller mmathias@magnet.at
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Why do I keep repeating myself here? Why?
Lack: Yes, in theory you can write a program (call it TRUCK) that can compress all but two files of 1MB length down to some smaller number of bits. The proof was given earlier, but just to reiterate: the total number of bit patterns smaller than N bits is 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^(N-1), which is equal to 2^N - 1. I'm even giving you the 0-bit file, just to be generous. Okay, there is still some file 1MB long that you can't store in fewer bits - there is no free pattern - so we'll just store that one in its original format. TRUCK can now compress almost every file of length 1MB, and never increases the size of such a file. BUT: there's a problem. I want to use TRUCK to compress a file that's only 600kb long (or any other length, for that matter). Well, when I run any file less than 1MB through the decompressor, it MUST produce a file that is 1MB long, so there are no possible files shorter than 600kb that I can use to represent a compressed 600kb file. In fact, the shortest sequence "available" is 1MB.
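That sum is easy to check by brute force; the little standalone program below (mine, not Brian's) tallies how many bit strings are strictly shorter than N bits and compares the total with the 2^N possible inputs of length N - it always comes up exactly one short.

#include <cstdint>
#include <iostream>

int main() {
    for (int N = 1; N <= 20; ++N) {
        std::uint64_t shorter = 0;                      // count of strings with length < N
        for (int k = 0; k < N; ++k)
            shorter += (std::uint64_t{1} << k);         // 2^k strings of length exactly k
        std::uint64_t inputs = std::uint64_t{1} << N;   // 2^N strings of length N
        std::cout << "N=" << N << "  shorter=" << shorter
                  << "  inputs=" << inputs << '\n';     // shorter == inputs - 1
    }
    return 0;
}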
All of this argument rests squarely on a very basic fact that some people (I'm not saying you, just some people) continue to overlook: decompression must be deterministic. I don't care if you have an array of algorithms, or not. I don't care if you compress the headers, or run the compression repeatedly. Ultimately, if I give you a specific file, there must be exactly one original file that results from that. Two possible counterarguments:
1) No, we pair off files, and every pair of files compresses down to a single file. Then the decompressor produces both simultaneously, and the user has to pick which one they wanted. This doesn't work because the information that you're asking the user to provide (which is only one more bit) is required for the decompression process, so you've just moved some of the compressed data from the disk to the user's brain. And that extra bit of cost is exactly the bit you'd save compared against the scheme above. (Divide the number of files by 2, and you need one less bit to store them all...)
2) The decompressor can be run repeatedly, and stopping at different points produces different original files. I won't go into the gory details, but it's the same problem as above... The user needs to know how many times to decompress the file, and that information is once again the exact amount of information you save in the file. (And you haven't compressed it, you've just moved it to the user's brain.)
mmathias:
You said "we are not talking about bits and bytes that we don''t know before compressing them". Yes we are! That''s the whole point. Did the writers of Winzip know what files you plan on compressing when they created the program? Of course not. I''m now staring at my screen trying to decide what else I can say in response, but I''m truely speechless. Personally, it doesn''t disturb me that kieren_j made his claims, and probably believed them (I''ve been in that situtaion). But I am absolutely frightened if you believe what you''ve written. I don''t know what else to say. Hopefully someone else can pick up where I''m leaving off...
-Brian
Your sin example doesn't change anything. Btw, single bits can't make a good sin function (with only two values); we need e.g. 8-bit values (-128 to 127). Then most (2/3) of the values of sin(x) are higher than 0.5 or lower than -0.5, which in our example means higher than 63 or lower than -64. So most of them would have bit 6 (the 2^6 bit) set, and the number of 1s wouldn't be equal to the number of 0s.
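A quick way to check the 2/3 figure (again just an illustrative sketch, not anyone's compressor): sample one period of sin, quantise to signed 8-bit, and count how many samples land outside [-64, 63]; the fraction comes out near two thirds.

#include <cmath>
#include <iostream>

int main() {
    const double pi = 3.14159265358979323846;
    const int samples = 360;
    int outside = 0;
    for (int deg = 0; deg < samples; ++deg) {
        double x = std::sin(deg * pi / 180.0);           // one full period
        int q = static_cast<int>(std::lround(x * 127));  // quantise to roughly -127..127
        if (q > 63 || q < -64) ++outside;                // beyond half amplitude
    }
    std::cout << outside << " of " << samples << " samples exceed half amplitude\n";
    return 0;
}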
Visit our homepage: www.rarebyte.de.st
GA
mmathias sure, you could make your sin function with one bit, but then you'd be far away from what I defined as random data ;-)
Visit our homepage: www.rarebyte.de.st
GA
If you could compress random data once, then what stops your algorithm from thinking it can still be compressed, thus creating an infinite loop, just like time travel?
I.e. (bear with me on this):
If you somehow went back in time and killed your ancestor, you'd never have been born, so you wouldn't have been able to go back in time and kill him, so he wouldn't have died after all and you'd end up being born anyway and still going back in time and killing him, so you'd never have been born... Catch my drift?
No? GOOD! There was absolutely no point for this reply, much the same as there is no point for this thread because most people who post don't know what the hell they are talking about.
Just my 200th of 4 dollars
Edited by - Blah! on 4/19/00 10:27:21 AM
"End Communication!", Kang from Rigel-4
quote: Original post by ga
mmathias sure, you could make your sin function with one bit, but then you'd be far away from what I defined as random data ;-)
Visit our homepage: www.rarebyte.de.st
GA
well, of course i was not trying to do the sine with only one bit of data, and it was yet another example of what i was trying to tell you. i understand your pigeonhole example and that you can't represent 8 bits of data by 7 bits without losing one. what i am saying is that there are ways to find functions that describe even very complex strings of data, and those function definitions require a smaller amount of memory than the original data. i am also not talking about approximating. it might of course take some time until the compressor finds the function that correctly fits my data, but once that function is found, you don't have to worry about your original data anymore, because you can always recreate it from this very function. of course it won't be as easy as finding a sine function, but i am sure that there are ways to accomplish (lossless) data compression that way!
i think that we are talking about two different things right now. kieren_j said that he can enhance existing data compression methods by applying bit masks. you are trying to tell everybody that it's impossible to fit 8 pigeons into 7 holes. true, but that's not what kieren_j is doing! and finally i am trying to tell you that it might be possible to find functions that describe strings of data in order to store those strings as their corresponding functions! i think everybody on this board, including me, really wants to see kieren_j's demo, which is not his complete compression algo!
well, that should give everybody something to argue about ;-)
//.athias
PS: i also feel like i am repeating myself. maybe we should just wait for kieren_j's demo and stop arguing about compression in general, because there is no such thing as data compression, is there? (just look at the homeless pigeon!)
___________________________________________
//.athias .\\üller mmathias@magnet.at
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯