MP3-Beating Compression
Pariah: At best, that was very immature. And no, I wasn't the original poster.
AIG: The burden of proof may normally be on me, but not in this case because I am merely defending kieren_j. The burden's on him.
Now why couldn't that work? You have a header plus a little info. You insert certain sequences into the code using the header info, you have a bigger file now. Using the header from that, you insert yet more (and possibly bigger) sequences as appropriate until you have another file, and so on. It's almost like having a random seed but more complex and in reverse.
Zipster: I'm sorry, I must've missed that. I only read the first three pages once, and that was back when I started posting.
ga: you seemed to be talking about regular compression where there's more than one pattern to keep track of, and because of this, if you index all of a random file, you save no space.
Jesse: I might've been, I just saw a lot of that going on. Not you specifically, though.
Ridcully: The earth is a disc! What are you talking about?
AP: Ha ha. Someone already took your joke. It's over.
kieren: Finally, thanks.
On window size: say you have 100 megs of completely random, evenly distributed chars. Now, each one is 8 bits. You may find no pattern in that frame. But if you take it down to 3 or 4 bits, you automatically have patterns within each byte, and patterns in the overlap to other bytes, right? Or if you crank it up to say 8,000,000 bits, somewhere there will be a repeat of a 1 meg sequence (this may be stretching it, but somewhere there's a happy medium, depending on file size and the data within). If you replace those (at least two megs) with one bit each, you've saved 2 megs minus two bits. Then you add 98 bits for the rest of the sequences that aren't the same. If you now tack that one meg sequence onto the header, you still have a 99 meg + 100 bit file that can be restored perfectly. If you then run it at a different byte level, you can get even more compression. And/or bit reordering and/or bit masking.
Lack
Christianity, Creation, metric, Dvorak, and BeOS for all!
Edited by - LackOfKnack on 4/15/00 6:00:28 AM
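The small-window idea in the post above can be checked directly. This is a minimal Python sketch, assuming uniformly random bytes read as aligned 4-bit groups; the function name `nibble_counts`, the sample size and the seed are illustrative choices, not anything from the thread. Every 4-bit value does appear, but all sixteen appear about equally often, so no single pattern stands out as worth replacing with a shorter code.

```python
import random
from collections import Counter

def nibble_counts(num_bytes, seed=0):
    """Count how often each 4-bit value occurs in seeded pseudo-random bytes."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_bytes):
        byte = rng.getrandbits(8)
        counts[byte >> 4] += 1      # high nibble
        counts[byte & 0x0F] += 1    # low nibble
    return counts

counts = nibble_counts(100_000)
most = max(counts.values())
least = min(counts.values())
print(f"distinct 4-bit patterns seen: {len(counts)}")
print(f"most common / least common count: {most / least:.3f}")
```

A ratio close to 1 means the "patterns" at this window size are just the uniform distribution showing up again at a smaller scale.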
Will this topic ever end?
Now, I have no idea how compression algorithms work (I've got a rough idea), what the hell a window size is (a window to me means something with 3 little icons in the top-right hand corner), and so on, but I do know this: This is getting BORING. The bloody thread has been CLOSED, stop posting! Wait until kieren posts a demo, but until then, leave IT!
geez
MENTAL
This had better make 8 pages
Disputing my reply at the top of page 7.
Lack tells me a change in byte size will change the data in some way as to make it compressible. Once random, always random until you sort it (which doesn't allow "decompression").
Furthermore, Lack misquoted me by quoting my line of compressing terabytes to one disk. I was merely showing what _could_ be done if Kieren_J could compress any file, which we all seem to have agreed cannot be done.
I am probably going out on a limb if I generalise and say everyone is happy that pure random data cannot be compressed as patterns cannot be identified at all.
Nonetheless I will make that generalisation.
Now, what's left is to decide if his (Kieren_j's) claims of compressing zip, mp3 and whichever other files he has tested... are correct.
On these I cannot comment. I do not know internal file formats or data representation within zips or mp3s.
What I do know is that you're not very likely to find any patterns within a zip file because zip's algorithms eliminate regular patterns and replace them with lookups (am I right?)
MP3 on the other hand, has the potential to be random, as there should be no way to predict what data could come next. However, (not knowing mp3 format) mp3's compressed data is a way of breaking a sound into multiple frequencies and storing frequency data. I may be wrong.
A lot of frequencies, and patterns therein, would probably occur many times within a song... thus I am assuming that an mp3 must have some repeated data (if the input were some average song).
Anyone care to elaborate?
regards,
GeniX
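The point above about zip output leaving few patterns behind is easy to try out. This is a minimal Python sketch using the standard `zlib` module (DEFLATE, the method zip files commonly use); the word list, sizes and seed are made up for illustration, and it says nothing about mp3 internals. It compresses pseudo-random bytes, compresses some repetitive word soup, and then compresses the already-compressed output a second time.

```python
import random
import zlib

rng = random.Random(1234)   # fixed seed so runs are repeatable

# Illustrative inputs: word soup with obvious repetition, and uniform random bytes.
words = [b"compress", b"random", b"pattern", b"header",
         b"window", b"bits", b"file", b"data"]
text_data = b" ".join(rng.choice(words) for _ in range(30_000))
random_data = bytes(rng.getrandbits(8) for _ in range(200_000))

packed_random = zlib.compress(random_data, 9)
packed_text = zlib.compress(text_data, 9)
repacked_text = zlib.compress(packed_text, 9)   # second pass over compressed output

print("random :", len(random_data), "->", len(packed_random))
print("text   :", len(text_data), "->", len(packed_text))
print("repack :", len(packed_text), "->", len(repacked_text))
```

In a run like this the word soup should shrink a lot, the random bytes should come out slightly larger than they went in, and the second pass should gain essentially nothing, because the first pass has already used up the redundancy.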
Lack, you'll never find a sequence in random data that will shrink the file. Do you know what the odds are of finding a 1meg sequence repeated even once in a 100meg file, just so you can shrink the orig file by just 1meg? I know you were just giving an example, but it is extremely unrealistic. In reality, you won't find any sequences that will make up for the 1 bits inserted for the other sequences.
But since you're still hanging on, I suggest you stop trying to prove it to us and prove it to yourself. You understand the algorithm, so code it up. It shouldn't take more than a couple of hours. Try actually testing a running example rather than theorizing. That's the only way you'll get it.
I think it's time for the thread to stop until someone has a working demo that others can use (even if in a CGI script to keep the program from being disassembled.)
aig
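To put a rough number on those odds, here is a back-of-the-envelope Python sketch. It assumes the 100 meg file is cut into 100 aligned 1 meg blocks of uniformly random bits; the aligned-block simplification and the function name are assumptions made for illustration. Any given pair of such blocks matches with probability 2^-(bits per block), so the expected number of matching pairs is the number of pairs times that probability. The result is kept as a power of two because the value itself is far too small for a float.

```python
import math

def log2_expected_matching_pairs(num_blocks, bits_per_block):
    """log2 of the expected number of identical block pairs in uniform random data."""
    pairs = math.comb(num_blocks, 2)          # number of distinct block pairs
    return math.log2(pairs) - bits_per_block  # each pair matches with prob 2**-bits

bits_per_meg = 8 * 1024 * 1024
exponent = log2_expected_matching_pairs(100, bits_per_meg)
print(f"expected matching 1 meg pairs is about 2^{exponent:.1f}")
```

Allowing the repeat to start at any bit offset instead of only on block boundaries raises the pair count to roughly 2^58, which adds fewer than 60 to an exponent of about minus 8.4 million.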
right, this is really getting boring.
i'd say leave it and i will do so.
last post in this thread, and i mean it. (at least my last one)
(will i get by any chance on top of page 8? )
quote:
Q. It's impossible to compress random data.
A. HOAXcompressor doesn't compress random data. It compresses *any* data. What the hell would you do with a random data compressor?
The guy has a point here ....
-kertropp
C:\Projects\rg_clue\ph_opt.c(185) : error C3142: 'PushAll' :bad idea
C:\Projects\rg_clue\ph_opt.c(207) : error C324: 'TryCnt': missing point
Lack, AIG showed that it wouldn't be useful if you index only one pattern. Each n-bit pattern would have a frequency of 1/2^n, and if you substitute one of them with one bit, you could save (1/2^n of the file - 1/(n*2^n) of the file) bits. But you need one additional bit every time another pattern occurs, so you'd need filesize(bits)/n - filesize/(n*2^n) additional bits. And you have to store the "most frequent" pattern in the header and, if you work with variable pattern sizes, you have to store the length of the pattern in the header too. Let's say the file is m bits long.
You save m/2^n - m/(n*2^n) bits, but you need m/n - m/(n*2^n) additional bits + >=n header bits.
m/2^n - m/(n*2^n) < m/n - m/(n*2^n) + n
The m/(n*2^n) terms cancel, leaving m/2^n < m/n + n, and since 2^n > n for every n >= 1 that is always true, so you can't pack any random data with your algorithm.
Visit our homepage: www.rarebyte.de.st
GA
Edited by - ga on 4/15/00 1:53:45 PM
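The accounting in the post above can also be checked by simulation. This is a minimal Python sketch, assuming one particular encoding of the single-pattern idea: an n-bit header naming the most frequent pattern, then a lone `0` bit for each occurrence of it and a `1` bit followed by the n raw bits for every other chunk. The function name, seed and sizes are illustrative, and chunks are taken on aligned n-bit boundaries.

```python
import random
from collections import Counter

def single_pattern_size(chunks, n):
    """Bits needed when the most frequent n-bit chunk is sent as a lone flag bit.

    Assumed encoding: an n-bit header naming the pattern, then per chunk
    either '0' (the pattern) or '1' followed by the n raw bits.
    """
    top_count = Counter(chunks).most_common(1)[0][1]
    others = len(chunks) - top_count
    return n + top_count * 1 + others * (n + 1)

rng = random.Random(2000)   # fixed seed so runs are repeatable
total_bits = 240_000        # divisible by every n tried below
results = {}
for n in (1, 2, 3, 4, 8, 16):
    chunks = [rng.getrandbits(n) for _ in range(total_bits // n)]
    results[n] = single_pattern_size(chunks, n)
    print(f"n={n:2d}: {total_bits} bits in, {results[n]} bits out")
```

On uniformly random input the flag bits paid on all the other chunks outweigh what the favoured pattern saves, so the output should come out larger than the input at every window size tried.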
This topic is closed to new replies.