Hi,
I call myself Multimedia Mike because I am obsessed with multimedia technology. I also love watching these console time attack videos. One big drawback, I think, is the compression artifacts present with lossy coding algorithms such as MPEG-4 (a.k.a. DivX) and H.264. This is because such codecs are not designed to encode synthetic video very well. To that end, I am experimenting with and prototyping a new video codec that should be better suited for encoding console movies such as time attacks. It is designed to encode synthetic video data (vs. realistic, filmed video data) losslessly.
Anyway, I just thought maybe people on this forum would like to know about my little project. I call it the Palettized Animation Video Codec, or PAVC. If you are interested in what I have done so far, I have been writing about my experiments here:
http://multimedia.cx/eggs/index.php?cat=20
(Heh, ignore the fact that my latest experiment, from August 1, crashes my computer; I am pretty sure that is a hardware problem.)
-Multimedia Mike
Joined: 4/21/2004
Posts: 3518
Location: Stockholm, Sweden
Hi M-Mike. If you manage to encode a certain avi (movie) file with better picture (the currenct codec, H264 has outstanding picture quality) and sound than the current codec, I think everybody would be much pleased, though Im a bit concerned how windows media player and other media player would react to your "home-made" codec and how a avi playing dvd player would play your movie.
At this point, I am only hypothesizing on a better video codec. However, there may be an opportunity for better audio coding as well, perhaps a lossless codec that takes advantage of certain characteristics of synthesized square & triangle waves, such as those emanating from the NES.
Since the goal is 100% lossless compression, I am confident that image quality will be acceptable. The big concern is how it stacks up size-wise.
For playing under Windows, someone (likely me) will need to make some downloadable codec modules available. Same for QuickTime under Mac OS. Fortunately, this whole effort will be open source and anyone will be free to jump in.
Not sure about the AVI-capable DVD player. However, per my understanding, it is tricky to get videos to work in those things in the first place.
Thanks for the feedback.
-Multimedia Mike
Consider, also, that an AVI file encoded with this hypothetical new video codec will retain 100% of the original video information. If an end-user were so inclined, he could download the file and transcode it to a high-bitrate DVD-compliant MPEG-2 video stream for burning onto a DVD. The DVD disc would play in a much larger range of standalone players.
-Multimedia Mike
I once experimented with lossless video compression optimized for at-most-256-color tile-based scrolling videos. Here are some results (no sound, just picture information):
First 1109 frames of the SMB3 video: 263320 bytes.
First 5492 frames of the Snake Rattle&Roll video: 2248127 bytes.
First 8646 frames of the Vicedoom video: 3079616 bytes.
While the results were promising, they were only a bit smaller than DivX-encoded versions of the same frames with acceptable quality. I decided that it (at least my idea) was not worth the efforts: While you would get lossless video, it would require people to download and install an obscure codec, while at a comparable file size and acceptable quality it would be enough to have the far more popular DivX codec installed.
Moreover, with the introduction of H264, which has even a better quality/size ratio than DivX, the idea became even more obsolete.
x264 has a lossless mode too. (Todays I always record the movies with it, and then later re-encode to a lossy format.)
I'm not sure if it's in the h264 standard though.
This sounds very interesting indeed. But I must say I rather keep to the Xvid codec which most people have installed already, and it's not that big files anyways.
But I really like that people tries to improve quality and compress the size better. Without this we won't see an evolution at all. So I very much encourage this.
The way you seem to be coding it this will have applications outside of video game movies (most any animation will benefit really, as animations tend to png compress well, which *seems* to be the idea behind your codec). But you don't seem to have implimented a low motion compression based scheme, which will be a great plus to the codec if you can get it hammered out.
I'm actually very interested to see where this goes.
Build a man a fire, warm him for a day,
Set a man on fire, warm him for the rest of his life.
That makes 2 of us. :-) The intraframe compression I am experimenting with is quite different from PNG. But as you note, the interframe compression is going to be the most important part of this codec. I am prototyping various ways to exploit the inherent interframe redundancy in these videos (so much of the frame remains the same or just shifts by a few pixels).
The patterns are certainly present. We just need to find efficient methods for a computer to notice them.
Is there a website where I can learn more about this optimized for at-most-256-color tile-based scrolling videos codec? Obviously, I am very interested in studying this domain.
Lossy codecs (like MPEG-4 & H.264) may be "good enough" for most users. But I know of a few people (myself included) who would like to see an efficient lossless codec for this type of data. I write it all off as an academic exercise. :-)
Explaining my idea exactly? Probably not (unless someone has come up with the same idea and made such page, which isn't completely impossible, actually). I used well-known motion video compression techniques, though. Here's a short description:
My main idea was to use motion vectors to describe the motion of square-shaped areas of the picture. This is exactly the same thing as the MPEG-1 motion vectors idea (as far as I know). It works like this:
Divide the image into square-shaped blocks. The optimal size for the blocks depends on a few factors (the smaller the blocks the more optimal motion vectors you will find, but the more vectors there will be, taking up space, of course; and the other way around). By trying different sizes I found out that 16x16 pixel blocks are the most optimal for NES videos (I also tried 8x8 and 4x4 but they gave bigger results).
Now, in frame N, for each block search from previous frames (ie. frame N-1, N-2, etc) a 16x16 area of the image which most resembles the contents of the block in this frame. The block being searched in previous frames can be located anywhere in the frame (not just at block boundaries). In other words, the search should be started from every pixel of the previous frame. Naturally the more frames you search the slower the encoding will become. I designed my video format so that it would support searching from at most the N-16 frame (ie. I reserved 4 bits for this information). In practice I only searched at most from the N-8 frame because of speed/compression ratio.
The comparison function as another thing to optimize. Often you will find an exact match (ie. a block which is identical to the current block) but not always. In this case you should find a block which is as similar as possible. Since I supported only paletted 256-color images in my video I ended up with a quite simple function being more or less optimal: A function which counts how many of the pixels are different in the two squares being compared.
To speed up the searching you can limit how far from the location of the current block (in image coordinates) the search is done. Since games usually don't scroll too much at a time (eg. scrolling a half screen at a time is extremely rare) you can limit the search to a certain amount of pixels away from the current block. I used 32 pixels as this limit (in other words it searches inside an area of 65x65 pixels around the current block). My format supports motion vectors between (-128,-128) and (127,127) (ie. I reserved two bytes per motion vector).
There are tricks to speed up the searching: I started the search from the same location as the current block and then in a spiraling order around it since it's most probable that an identical block will be found in earlier frames very close to the location of the current block (naturally if you find an identical block there's no need to continue searching). Also, as you search, you naturally keep count of the best match found so far. A radical speedup is achieved with the simplest trick: When comparing blocks if the current sum of differing pixels becomes larger than the current best, stop comparing. (It's actually a very simple idea but it didn't occur to me at first...)
Once you find the motion vectors for all the blocks in the current frame, write them to the result. Now, since the frame which can be reconstructed from these motion vectors will usually not be identical to the original frame, of course, we will now have to write a differential frame. That is, reconstruct the current frame using the motion vectors (ie. for each motion vector copy the block it refers to and reconstruct a frame in this way) and then calculate the difference of the actual current frame and the reconstructed one. Since the differences will be minimal, most of the diff frame will be full of zeros (which compress very well). Write this diff info to the result.
Now, compression has to be used for the result, of course. I used zlib for this. Unlike many other C libraries, zlib is surprisingly easy to use, especially for writing compressed files: You simply use "gz" versions of the regular C file handing functions (ie. gzopen(), gzwrite(), etc) and that's it, you get a compressed file.
The main idea in this is, naturally, that since the diff frames are usually just mostly full of zeros they compress extremely well, and this is indeed the case.
I used some other minor tweaks to increase compression as well:
- I used different types of frames. For example, if the diff info is completely full of zeros it's rather useless to store all those zeros, thus I just store a special frame type which has only motion vectors but not diff info. And likewise, if all the motion vectors are zeros it's useless to store them. If both are full of zeros, then it's not necessary to store the frame at all. I also made "one-color frames" which are used if the original frame is just filled with one color (in which case it's enough to simply store that color).
- When searching for motion vectors I tested first the same motion vector as in the previous block. In scrolling videos this often gives an identical block right away, speeding up the encoding, plus it makes more of the motion vectors identical, enhancing the compression ratio.
I actually made simple player program for unix (uses GTK) to play the videos in order to check that they are bugless. If anyone is interested I could make a packet and include the three example videos.
But it's not genuinely lossless, is it? I believe that the saturation data is lossy, or something like that. It sure does look very good, though.
Well, yes, it's lossless in the limits of the colourspace.
I think the colourspace used by x264 is YCbCr with 4:2:2 chroma subsampling. Perhaps even 4:4:4.
It would be truly lossless only if there was no chroma subsampling. I think that such option is not supported by the H264 format.
I'm also not sure whether the RGB->YCbCr conversion is lossless. If it isn't, some colours might change an unnoticeable amount.
For my raw AVIs, I use the CamStudio Lossless codec. Best thing is, it's GPL, and while the binaries posted are for Windows, the source is available, and thus it's possible (but I don't know how practical) to port it to other OS's. It's a bit slow to encode the more you compress, but on gzip level 9 the files are quite small. I used Gigafrost's Zelda 2 run as a testbed most recently and the raw file was only 107,814,400 bytes (including mono PCM audio at 44KHz).
Yes, I think Warp here more or less wrote everything I was wishing for a lossless codec for NES/SNES to have.
There are some additional enhancements you could add, like a table for pixel data, which could further enhance compression, as scanlines of pixels would often be same. (But maybe zlib would manage this?)
The previously-posted synopsis of tile-based compression has a few subtle but important changes that could be made to be, ideally, to likely improve compression rate if M-Mike decides to re-implement such a codec. You might have done these, and simply not mentioned them explicitly in your synopsis before, Warp. If so, I appologize.
First, and this is the simple trick, merge all the tiles that move the same amount into a single 'macro tile' when there are more than a certain amount of them. Considering that a bit-mask of the entire screen on NES (256x240) would only be 30 bytes (16x15 blocks, 240 bits, 8 bits to the byte) and the vast majority of the screen IS going to move in lock-step, this is very likely to reduce the space needed for matching block-moves dramatically.
Second, also simple, don't try to keep things aligned on the byte-order. For example, if you only need (-16,15) range, don't use 8 bits. Use 5 bits.
Third, and this would be simple but processor-intensive (what compression isn't?), try all possible alignments of the grid on the new screen. Don't lock them to the boundaries of the screen, so to speak. This would allow better 'locking' with two-axis-scrolling games, and moderately improve the lockdown on single-axis-scrolling games.
Fourth, something of an amgamation of the above... a possible (and completely different approach) would be to only make bit-masks covering portions of the screen with motion-vectors attached, then simply encode the 'unvectored pixels' directly. The bitmasks can be trivially encoded with run-length compression run through some type of Huffman compression as a trivial solution, which could also work quite well with the limited-selection 'unvectored pixel' values on highly-limited-palette systems like the NES games since such a technique would cause a LOT of chaotically-random sequential pixels that might not compress well because their localized relevance was dramatically reduced, so Huffman's advantage (localized differences aren't any harder to compress than distant differences) might give it an edge over more modern LZ-based compression. For truly ridiculous compression ratios though, a BWT pre-filter on the data would likely 'cure' almost all of these problems, since a full-bitstream BWT would be easilly possible on the unvectored bits.
And yes, I do way too much research into compression in my spare time. :-)
I'm curious to see how this idea would perform :
Dump every data sent to the PPU and compress it. Then, make a video codec that emulate a PPU and process the data back
First, and this is the simple trick, merge all the tiles that move the same amount into a single 'macro tile' when there are more than a certain amount of them. Considering that a bit-mask of the entire screen on NES (256x240) would only be 30 bytes (16x15 blocks, 240 bits, 8 bits to the byte) and the vast majority of the screen IS going to move in lock-step, this is very likely to reduce the space needed for matching block-moves dramatically.
That might or might not reduce the video size, depending on how exactly zlib does its work. Zlib will, naturally, already compress sets of identical motion vectors in some way. I'm actually not completely convinced that even if the result is smaller that the reduction would be so dramatical (it could be, of course).
You have to also take into account that your 30-byte bitmask will often be a rather random set of values, making compression harder.
My guess is: More compression, probably. Dramatically more compression, unlikely.
Second, also simple, don't try to keep things aligned on the byte-order. For example, if you only need (-16,15) range, don't use 8 bits. Use 5 bits.
Again, the benefit of this depends a lot on how it affects the values to be compressed and how zlib works. If zlib works on a byte-by-byte or a 2-bytes-by-2-bytes basis, storing values between byte boundaries might actually do more harm than good (depending on how "random" the resulting bytes will be, of course). Of course only practical testing will show.
When I tested my compression idea, the single element which BY FAR most affected compression was, naturally, that the diff frames are mostly full of zeros. Other little optimizations (such as creating special frames which do not store as much information or choosing motion vectors so that there would be more repetition) only improved the compression slightly.
I don't think that eg. reducing somehow the amount of motion vectors will have a big impact on the compression ratio (although even a 1% better compression is always welcome, of course).
Third, and this would be simple but processor-intensive (what compression isn't?), try all possible alignments of the grid on the new screen. Don't lock them to the boundaries of the screen, so to speak. This would allow better 'locking' with two-axis-scrolling games, and moderately improve the lockdown on single-axis-scrolling games.
This might have some impact, and another thing which might have some impact is to extend the macroblock search (partially) outside the image boundaries (which I did not do). It might increase the amount of zeros in the frame diff info which can markedly increase compression.
The bitmasks can be trivially encoded with run-length compression run through some type of Huffman compression as a trivial solution, which could also work quite well with the limited-selection 'unvectored pixel' values on highly-limited-palette systems like the NES games since such a technique would cause a LOT of chaotically-random sequential pixels that might not compress well because their localized relevance was dramatically reduced, so Huffman's advantage (localized differences aren't any harder to compress than distant differences) might give it an edge over more modern LZ-based compression.
Huffman encoding is actually part (in fact, I think it's the major compression algorithm) of zlib. By default zlib makes first a dictionary search and then it compresses the result using huffman-encoding. So it's probably already doing what you describe above (about compressing the pixels with huffman).
Actually, if lossless quality itself isn't really needed (one could do it himself with an appropriate emulator and any video grabbing utility), can any other encoding algorithm be implemented? Something assymetrical, maybe IFS will do the trick?
The encoding will be slow of course, but the main goal is fast decoding and great quality on simple images.
Warp wrote:
Edit: I think I understand now: It's my avatar, isn't it? It makes me look angry.
That might or might not reduce the video size, depending on how exactly zlib does its work.
Then what you were describing wasn't actually a compression technique, like I read it as, but a pre-process technique that was relying on zLib to do the actual heavy-weight compression. If you're relying on something like zLib, then yes, keep EVERYTHING byte-aligned. :-)
Though to be honest, a modern BWT-prewashed Adaptive Arithmatic coder would perform significantly better than zLib does. But things like BWT didn't physically exist when zLib came out in the first place. It's damn good tech, but there's better stuff available now. :-)
I'm curious to see how this idea would perform :
Dump every data sent to the PPU and compress it. Then, make a video codec that emulate a PPU and process the data back
Interesting idea, but as another poster pointed out, getting timing right can be tricky. Plus, if you carry this approach too far, you may as well be working with an emulator, the ROM image, and the movie-action file. Also, it would be limited to NES data, while we have a good opportunity to design this codec to cope with data from the 8-bit Atari all the to the Neo-Geo.
-Multimedia Mike