Other anonymous poster:
Ah, I thought the files in question were _unzipped_ mp3s and mpegs, which were then being reordered and compressed. I told you I thought I missed something.
MP3-Beating Compression
April 14, 2000 03:40 PM
Wow, unless that common byte is _really_ common, that ain't gonna do squat to reduce file size.
April 14, 2000 03:45 PM
Aack, sorry about posting all of this in separate messages.
Think about it... to reduce file size using this method, over 1/8 of the file has to be the same byte. Definitely not happening, even after some miraculous data massage.
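Here's a quick sketch of that break-even arithmetic, in case anyone wants to see where the 1/8 figure comes from (n and k are just names I made up for the illustration, nothing from kieren_j's program):

#include <stdio.h>

/* Flag-bit scheme as described: the most common byte costs 1 bit,
   every other byte costs 9 bits (the flag bit plus the original 8). */
int main(void)
{
    long long n = 1000000;                 /* total bytes in the file   */
    for (int pct = 0; pct <= 50; pct += 5) {
        long long k = n * pct / 100;       /* count of most common byte */
        long long out_bits = k * 1 + (n - k) * 9;
        long long in_bits  = n * 8;
        printf("common byte = %2d%% of file  ->  output/input = %.3f\n",
               pct, (double)out_bits / (double)in_bits);
    }
    /* out_bits < in_bits exactly when k + 9(n - k) < 8n, i.e. k > n/8,
       which is where the "over 1/8 of the file" figure comes from. */
    return 0;
}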
Wait everyone, get this:
His algorithm compresses, but there is NO decompression algorithm (yet)!! Hmmmm.... Kinda missed the whole point of compression, huh?
Actually, what if kieren_j THINKS his algorithm works when it really doesn't? Like if there's a TINY flaw that he constantly overlooks in his code that happens to delete a huge portion of the file? That has happened to me plenty of times. So, in a sense, he is telling the truth as far as he knows, even though it could be totally impossible.
Kieren_J - If it really does work, mail me some $$$!
I still don't know why this thread is still going; we're deviating from the original topic slightly. Besides, I hate waiting for a huge post to load into my browser.
BTW, go to the GDNet Lounge forum. There's a thread with 629 posts, I think!!!! 30 pages!!! Oh, man.
If I did my math right, on a 1 Meg file of purely random data (i.e. all 256 possible bytes each occur the same number of times, so there's no most common byte), the algorithm will produce a file that is about 124K *larger* than the original 1 Meg file. (1/256 of the original file (= 4K of bytes) could be represented by the 0 bits (512 bytes), and the rest of the bytes would take 9 bits each (the 1 bits + the original bytes), which works out to roughly 1,175K of output.)
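Quick sanity check in code, for anyone who wants to rerun that arithmetic themselves (just a throwaway C snippet of mine, nothing to do with kieren_j's actual program):

#include <stdio.h>

/* Size of the flag-bit output on a 1 Meg file where every byte value
   occurs equally often, so the "most common" byte is 1/256 of the file. */
int main(void)
{
    long long n      = 1048576;           /* 1 Meg of input bytes      */
    long long common = n / 256;           /* 4096 bytes -> 1 bit each  */
    long long others = n - common;        /* the rest  -> 9 bits each  */
    long long out    = (common * 1 + others * 9 + 7) / 8;  /* bits -> bytes, rounded up */

    printf("input  = %lld bytes\n", n);
    printf("output = %lld bytes (%lld bytes larger, about %lldK)\n",
           out, out - n, (out - n) / 1024);
    return 0;
}

It prints an output of 1,175,552 bytes, i.e. 126,976 bytes (124K) bigger than the input.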
So I don't see how this will work on mp3 files, zips, or anything that is already essentially "random" data. But it should work fine on other specific uncompressed types of files, including some bmp, txt, etc., where one byte occurs much more often than the others.
I vote we keep the thread open. The thread is fun, even though I don't believe the algorithm works for anything except files that are already highly compressible.
aig
You're missing something: who says you have to use 8-bit bytes?
The reason why it must be possible on some scale is that (I think) it's pretty hard, or nearly impossible, to create a random file that has no patterns at all at 3-, 4-, 5-, and larger block sizes. At the very least, not in a 100mb file. It may be completely random and evenly distributed at the 8-bit-byte level, but shift gears and it's a whole new world.
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Listen Kieren... don't worry about what we all think. If it works, then it works and you are set for life. If it doesn't, then most of us have nothing to worry about. BUT if it truly works, then I would stop posting code just to prove the point. Maybe after you patent it or sell it or whatever.
BASSOFeeSH
P.S. You said that it is over 500 lines of code but only took 20 minutes... for some reason that seems wrong.
"Why am I the only one on the away team with a red shirt!?"
BASSOFeeSH
><>
-- What would Sweetness do?
Isn't a byte defined as 8 bits?
Regardless, if you use 4- to 6-bit patterns and they are roughly evenly distributed, then the file size will potentially shrink even less than with 8-bit patterns. (Because for each pattern that is not your most common pattern, you add one extra bit.)
It has been shown above that there is no single compression algorithm that can guarantee compression on all possible files of a certain length, even by a bit. However, this doesn't mean that kieren_j isn't compressing all files.
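For anyone who missed that argument earlier in the thread, the counting behind it fits in a few lines of C (purely illustrative):

#include <stdio.h>

/* Pigeonhole counting: there are 2^n distinct n-bit files, but only
   2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 files strictly shorter than n bits,
   so no lossless scheme can shrink every n-bit file, even by one bit. */
int main(void)
{
    for (int n = 1; n <= 16; n++) {
        unsigned long long files_of_len_n = 1ULL << n;
        unsigned long long shorter_files  = (1ULL << n) - 1;
        printf("n=%2d: %llu files of that length, only %llu possible shorter outputs\n",
               n, files_of_len_n, shorter_files);
    }
    return 0;
}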
When stages are mentioned, it makes me think that multiple passes are being used, and that different compression algorithms are being applied to the current data set. And if you choose a compression algorithm tailored to the data, you could theoretically compress random files. (Theoretically, because you'd need access to a near-infinite supply of different compression algorithms to guarantee compression on random data.)
This does lead me to believe that kieren_j could be putting together a decent compression system, but I'm skeptical of the numbers we've seen. (Because other programs have used this multiple-algorithm strategy with great success, but never with anything like what these numbers suggest.)
As for voting on whether the thread should be closed, I say leave it open. I've learned a bunch about compression already, and the discussion can't hurt anything.
Jesse
Website - Third Party Ninjas | Twitter - twitter.com/Chounard
April 14, 2000 04:45 PM
Lack, if I tell you I have a completely random stream of 1's and 0's, then any sort of windowing of the data (5-bit, 8-bit, 13-bit, whatever) is going to be patternless. If it weren't, then the distribution of 1's and 0's wouldn't be random. I know I can generate a random stream of 1's and 0's, so I also know I can make a stream of data that has no pattern, no matter how big the chunks are that you care to look at it through.
This is like me telling you that 37 is a prime number, and then you saying, "WAIT, have you tried to divide it by 4??!? WAIT, what about 3??!?"
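If you don't believe it, try a toy experiment like this (my own quick C sketch, using rand() as a stand-in for a decent random source): re-window the same random bits at a few different widths and watch how flat the counts stay.

#include <stdio.h>
#include <stdlib.h>

#define NBITS (1 << 20)                    /* about a million random bits */

static unsigned char bits[NBITS];

int main(void)
{
    srand(12345);
    for (long i = 0; i < NBITS; i++)
        bits[i] = (unsigned char)(rand() & 1);

    /* Re-read the same bit stream in 3-, 5-, and 8-bit chunks and check
       whether any chunk value dominates.  For random bits none will:
       the most common value stays right around 1/2^width of the chunks. */
    int widths[] = { 3, 5, 8 };
    for (int w = 0; w < 3; w++) {
        int width = widths[w];
        long counts[256] = { 0 };
        long nchunks = NBITS / width;
        for (long c = 0; c < nchunks; c++) {
            int value = 0;
            for (int b = 0; b < width; b++)
                value = (value << 1) | bits[c * width + b];
            counts[value]++;
        }
        long best = 0;
        for (int v = 0; v < (1 << width); v++)
            if (counts[v] > best) best = counts[v];
        printf("width %d: most common chunk is %.4f of all chunks (uniform would be %.4f)\n",
               width, (double)best / nchunks, 1.0 / (1 << width));
    }
    return 0;
}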
April 14, 2000 04:59 PM
GO TO SCHOOL. Seriously. I don't mean this to be elitist, but you're just spreading dangerous misinformation. In particular:
> (1) Find most common char, store in P
> (2) For all source bytes:
> (2.a) If byte is P, write a ZERO
> (2.b) else, write a ONE then the byte
> (3) Repeat for all bytes
>
> It''s really quite simple!
> Besides, it''s not huffman - it''s one of my own invented schemes.
Huffman is a GREEDY algorithm that finds THE OPTIMAL variable-bit (prefix) encoding of a dictionary of characters of a specified size (typically 8 bits, but it could be anything). Huffman's output is provably OPTIMAL among such encodings. This means that there is NO OTHER POSSIBLE variable-bit encoding (such as the one you're suggesting here, with 0 for the most common character and 1 plus the byte for everything else) that could be more efficient than building the Huffman tree. The above code will never, ever, ever be better than Huffman. At best, it would be equivalent.
The fact that you don't even understand this, along with a rudimentary knowledge of information theory, lets me sleep soundly at night knowing that this is all a bunch of hot air. The reason I and others keep posting is because you are SERIOUSLY misleading other people. It's all well and good if we want to talk about compression. But there is no excuse for continuing to promote PROVABLY FALSE CONCEPTS.
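If anyone wants to see the gap on a real file, here is a standalone C sketch of my own (it assumes a filename on the command line and just builds a byte histogram; none of it is kieren_j's code):

#include <stdio.h>
#include <math.h>

/* Compare the posted flag-bit scheme against the entropy floor on a real
   file.  The flag-bit scheme is itself a prefix code (codeword lengths of
   1 and 9 bits), so by Huffman's optimality it can never beat the code
   Huffman would build from the same byte frequencies, and the entropy
   figure below is a floor that even Huffman cannot go under. */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    long long count[256] = { 0 }, total = 0;
    int c;
    while ((c = fgetc(f)) != EOF) { count[c]++; total++; }
    fclose(f);
    if (total == 0) return 0;

    long long most = 0;
    double entropy_bits = 0.0;
    for (int i = 0; i < 256; i++) {
        if (count[i] > most) most = count[i];
        if (count[i] > 0) {
            double p = (double)count[i] / (double)total;
            entropy_bits += count[i] * -log2(p);
        }
    }

    long long flag_bits = most * 1 + (total - most) * 9;
    printf("input        : %lld bytes\n", total);
    printf("flag-bit code: %.0f bytes\n", flag_bits / 8.0);
    printf("entropy floor: %.0f bytes (no byte-by-byte prefix code, Huffman included, can beat this)\n",
           entropy_bits / 8.0);
    return 0;
}

On anything that's already compressed (mp3, zip), the flag-bit figure comes out bigger than the input, which is exactly the point.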
This topic is closed to new replies.