Posts for Sour

Experienced Forum User
Joined: 6/29/2016
Posts: 53
Essentially, rewind works by taking a save state every 60 frames and recording the inputs in between. When you start rewinding, the emulator first needs to load the previous save state and re-run the emulation up to the current point in time before it can start showing the video in reverse. This means your PC has to emulate at least 60 frames before anything can happen on the screen (that's the best-case scenario; in practice it can be up to 119 frames). Even running at maximum speed, 120 frames can potentially take a good half second or more (this varies wildly from one PC to another), which is where the delay comes from. The "rewind 10 secs/1 min" options simply load a save state and skip all of this extra processing, which is why they're instantaneous. The ideal/obvious solution would be to always keep the last ~120 frames of video/audio in memory, and to start preparing the rewound video/audio for the 120 frames before that while the first 120 frames are being displayed. But I'm not entirely convinced the extra code complexity and RAM usage would be worth it.
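The save-state-plus-input-log scheme described above can be sketched roughly as follows. This is a hypothetical illustration, not Mesen's code: `MachineState`, `EmulateFrame` and `RewindHistory` are made-up stand-ins, and the rule "go back far enough to regenerate at least one full 60-frame interval" is an assumption, chosen because it reproduces the 60-119 frame range mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the emulator: the whole machine state is one
// number, and each frame folds that frame's controller input into it.
struct MachineState {
    uint64_t value = 0;
};

MachineState EmulateFrame(MachineState s, uint8_t input) {
    s.value = s.value * 31 + input + 1;
    return s;
}

class RewindHistory {
public:
    static constexpr size_t kInterval = 60;  // frames between save states

    // Call once per frame with the state *before* that frame is emulated.
    void Record(const MachineState& before, uint8_t input) {
        if (inputs_.size() % kInterval == 0) snapshots_.push_back(before);
        inputs_.push_back(input);
    }

    // Reloads an older save state, re-emulates forward to the present and
    // returns the regenerated frames newest-first (reverse playback order).
    // `replayed` receives how many frames had to be emulated before the
    // first reversed frame could be shown.
    std::vector<MachineState> BuildReverseBuffer(size_t& replayed) const {
        const size_t total = inputs_.size();
        replayed = 0;
        if (total == 0) return {};
        // Go back far enough to regenerate at least one full interval.
        const size_t startFrame =
            total >= kInterval ? ((total - kInterval) / kInterval) * kInterval : 0;
        MachineState s = snapshots_[startFrame / kInterval];
        std::vector<MachineState> frames;
        for (size_t f = startFrame; f < total; ++f) {
            s = EmulateFrame(s, inputs_[f]);
            frames.push_back(s);
        }
        replayed = frames.size();
        return std::vector<MachineState>(frames.rbegin(), frames.rend());
    }

private:
    std::vector<MachineState> snapshots_;  // snapshot i = state before frame i * 60
    std::vector<uint8_t> inputs_;          // one input byte per recorded frame
};
```

In this sketch, rewinding after 179 frames replays 119 frames before the first reversed frame is available, while rewinding right at an interval boundary replays 60. Caching the frames that were just displayed, as suggested above, would hide that replay cost at the price of more RAM.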
Experienced Forum User
Spikestuff wrote:
• Input: Movie recording/playback
I should probably mention that the timing for the input polling currently isn't quite as robust as it should be. It occurs at the end of the current CPU instruction after vblank begins, but if that instruction happens to trigger a really long DMA transfer, the polling can be delayed for a stupidly long amount of time. I need to fix this, but I only realized the issue a couple of days ago and didn't want to change that code and risk breaking something just before a release, so I kept it as-is for now. So it's probably best not to start experimenting too much with movies just yet, but this particular problem should be fixed by the next release.
Dimon12321 wrote:
Does SNES actually have games that support analog sticks?
Sorry, I might have worded that a bit weirdly. I meant that you can control how big the deadzone is for the analog sticks on your PC's controller (some people prefer using them over the D-pad). It doesn't have any effect on the emulated SNES controllers.
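For illustration, here is a radial deadzone, which is one common way such a setting is implemented (not necessarily what Mesen does). `Stick` and `ApplyRadialDeadzone` are hypothetical names, and axis values are assumed to be normalized to -1.0..1.0.

```cpp
#include <cassert>
#include <cmath>

struct Stick {
    double x;  // -1.0 .. 1.0
    double y;  // -1.0 .. 1.0
};

// Radial deadzone: positions closer to the center than `deadzone`
// (0 <= deadzone < 1) are treated as centered, and the remaining range is
// rescaled so that full tilt still reads as 1.0.
Stick ApplyRadialDeadzone(Stick in, double deadzone) {
    const double magnitude = std::sqrt(in.x * in.x + in.y * in.y);
    if (magnitude <= deadzone) return {0.0, 0.0};
    const double clamped = std::fmin(magnitude, 1.0);
    const double scaled = (clamped - deadzone) / (1.0 - deadzone);
    return {in.x / magnitude * scaled, in.y / magnitude * scaled};
}
```

Turning the stick into D-pad presses would then just threshold the rescaled values; all of this happens while reading the host controller, before anything reaches the emulated console.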
Experienced Forum User
Ah, somehow I didn't get a notification e-mail for your post. Thanks! There's still a lot to get done, but it's coming together quite a bit faster than Mesen did. Being able to reuse a lot of code has definitely been helpful :p I'm working on SA-1 support at the moment; it should hopefully be in a decent state (including its own debugger window, etc.) in a few more days. I also added NES-like overclocking options that work by adding scanlines before/after NMI, and surprisingly, it seems to work pretty well so far. Once SA-1 is done, I'll probably tackle Super FX next. With DSP/SA-1/Super FX supported, the number of unsupported games will be down to fewer than 10, which is a pretty acceptable number!
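A rough sketch of how "extra scanlines before/after NMI" overclocking can be structured: the frame is lengthened by extra lines during which the CPU keeps running but the PPU's scanline counter is held, so games get more CPU time per frame without changing what is drawn. This is an assumption about the general technique, not Mesen's code. The 262-line NTSC frame and NMI at scanline 225 (vblank start with overscan off) are standard SNES values; `OverclockConfig`, `TotalLines` and `PpuScanline` are hypothetical.

```cpp
#include <cassert>

constexpr int kNtscScanlines = 262;  // scanlines per NTSC SNES frame
constexpr int kNmiScanline = 225;    // vblank/NMI start when overscan is off

struct OverclockConfig {
    int extraBeforeNmi = 0;  // extra lines inserted just before NMI
    int extraAfterNmi = 0;   // extra lines inserted just after NMI
};

int TotalLines(const OverclockConfig& cfg) {
    return kNtscScanlines + cfg.extraBeforeNmi + cfg.extraAfterNmi;
}

// Maps a line index in the lengthened frame to the PPU scanline it stands for,
// or -1 for an inserted line (the CPU runs, the PPU scanline counter is held).
int PpuScanline(int line, const OverclockConfig& cfg) {
    if (line < kNmiScanline) return line;        // active display
    line -= kNmiScanline;
    if (line < cfg.extraBeforeNmi) return -1;    // extra time before NMI
    line -= cfg.extraBeforeNmi;
    if (line == 0) return kNmiScanline;          // NMI fires on this line
    line -= 1;
    if (line < cfg.extraAfterNmi) return -1;     // extra time after NMI
    line -= cfg.extraAfterNmi;
    return kNmiScanline + 1 + line;              // remainder of vblank
}
```

Lines added before NMI give the game's main loop more time to finish before the NMI handler runs; lines added after NMI stretch vblank, which mostly helps games that do heavy work in their NMI handler.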
Experienced Forum User
Ah, just noticed this thread now. Thanks for taking the time to make one!
I'm interested to know what this will be able to do that higan/bsnes can't.
Arguably there are already a few features in the debugger tools that aren't available in bsnes-plus, though in terms of gameplay it doesn't offer all that much right now (I can only write so much code in 2 months!). Then again, I think bsnes/snes9x don't have a rewind feature? Could be wrong on that.
Like being fast!
To be fair, it's not very fast at the moment (slower than Mesen in most cases), but it does manage to process mid-scanline writes like the accuracy profile of higan while running a lot faster. Though, considering a number of things still aren't displaying quite right, it might not be a fair comparison in the long run. Still, I'm "hoping" to eventually improve both performance and accuracy compared to where they are now. For now, I'm mostly trying to focus on fixing all the emulation bugs I can find before trying to optimize anything.
Experienced Forum User
Unfortunately, I haven't really had the chance to work on TAS tools yet. Most of my time this year has been spent improving the debugging tools and adding some smaller features here and there (along with making a libretro core). The code refactoring I did for VS Dualsystem support in 0.9.6 (which also allowed me to create the "History viewer" tool) should come in handy once I do start working on a TAS editor, though, since it gives me the ability to run multiple independent copies of the emulation core at once, etc. I don't think there is too much missing from the emulation core to support TASing; it's mostly a matter of building the proper UI for it. Unfortunately, I still have no clue how people typically create TAS runs (I've never done it myself), so that's a bit of a roadblock at the moment. I need to actually sit down and learn how people use the tools in FCEUX/Bizhawk before I can create something that meets most people's expectations. A large chunk of what I wanted to implement for the debugging tools is done at this point, and a TAS editor is one of the last few major things still missing, so I think I might actually start working on a TAS editor when I get back (like Spikestuff mentioned, I will be away until next December).
Experienced Forum User
Sonia wrote:
Since I don't have real hardware to test, I'm not sure which one is correct.
This was an issue with Mesen (I just fixed it with the latest commit) - kudos for noticing that! I'm not sure I would have noticed the difference even if I had played the game in both emulators :p
Experienced Forum User
Spikestuff wrote:
The game dumps incorrectly compared to an actual console. Just pointing it out. And I'm stating this for FCEUX, BizHawk (NESHawk & QuickNES) as well as Mesen. The bottom is being cut off.
Your FCEUX screenshot is only 224px tall, while the NES's vertical resolution is 240px. AFAIK, both FCEUX and Mesen default to cropping 8px from the top and bottom edges, which is most likely why you're seeing the same behavior in both emulators. Both emulators should look like the console capture you posted if you remove the overscan cropping.
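The arithmetic above (240 - 8 - 8 = 224) as a small sketch, assuming a 256x240 frame buffer of 32-bit pixels in row-major order. `Frame` and `CropOverscan` are hypothetical helpers, not FCEUX's or Mesen's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kNesWidth = 256;
constexpr int kNesHeight = 240;

struct Frame {
    int width;
    int height;
    std::vector<uint32_t> pixels;  // row-major, width * height entries
};

// Removes `top`/`bottom` rows and `left`/`right` columns from a full NES frame.
Frame CropOverscan(const std::vector<uint32_t>& full, int top, int bottom, int left, int right) {
    Frame out{kNesWidth - left - right, kNesHeight - top - bottom, {}};
    out.pixels.reserve(static_cast<size_t>(out.width) * out.height);
    for (int y = top; y < kNesHeight - bottom; ++y) {
        const uint32_t* row = full.data() + static_cast<size_t>(y) * kNesWidth;
        out.pixels.insert(out.pixels.end(), row + left, row + kNesWidth - right);
    }
    return out;
}
```

With the default 8px top/bottom crop, the output is 256x224, matching the screenshot; passing zeros for every edge returns the full 240-line image a console capture shows.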
Experienced Forum User
FYI, there's a list of games with bugs that make them rely on the state of uninitialized RAM here (I just added Silva Saga to it): http://wiki.nesdev.com/w/index.php/Game_bugs#Reliance_on_RAM_values
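Since these bugs depend on what RAM contains at power-on, one way to expose them is to let the emulator fill RAM with different patterns and see whether the game behaves differently. A minimal sketch, assuming three modes (all zeros, all $FF, seeded random); `RamPowerOnState` and `MakePowerOnRam` are hypothetical names. The seed keeps the random fill reproducible, which matters for movies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

enum class RamPowerOnState { AllZeros, AllOnes, Random };

// Builds the initial contents of a RAM chip for the chosen power-on mode.
std::vector<uint8_t> MakePowerOnRam(size_t size, RamPowerOnState mode, uint32_t seed) {
    std::vector<uint8_t> ram(size);
    switch (mode) {
        case RamPowerOnState::AllZeros:
            std::fill(ram.begin(), ram.end(), static_cast<uint8_t>(0x00));
            break;
        case RamPowerOnState::AllOnes:
            std::fill(ram.begin(), ram.end(), static_cast<uint8_t>(0xFF));
            break;
        case RamPowerOnState::Random: {
            std::mt19937 rng(seed);  // same seed -> same "random" RAM every run
            std::uniform_int_distribution<int> byte(0, 255);
            for (auto& b : ram) b = static_cast<uint8_t>(byte(rng));
            break;
        }
    }
    return ram;
}
```

A game that works with one mode but not another (2048 bytes being the NES's internal RAM size) is a good candidate for that list.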
Experienced Forum User
In case anybody is interested, I finally released a new version of Mesen. It adds support for 20 more NES/Famicom peripherals (controllers, keyboards, mice, barcode readers, external storage, etc.). This version also rewrites the movie file format to use a format very similar to Bizhawk's (a zip container with the input logs and settings stored as text). It can record every input device Mesen now supports (although I have not had the time to thoroughly test every single one of them), including weird stuff like the barcode readers or even the Family BASIC Data Recorder (it stores the loaded file as base64 data in the movie's input log). It also works properly with VS System and FDS games, etc. Beyond that, it's mostly a lot of random small features (60.0 FPS mode, exclusive fullscreen, etc.), a good number of debugger improvements, and a lot of bug fixes. Mesen is still a ways off from being usable for TASing, but with the movie format rewrite done, a lot of the groundwork needed to start building TAS-related features is in place.
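The Data Recorder point above relies on base64 turning arbitrary binary data into plain text that can live inside a text input log. A self-contained encoder using the standard alphabet with `=` padding looks like this; `EncodeBase64` is an illustrative helper, not Mesen's own function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes bytes as base64: every 3 input bytes become 4 output characters,
// and a final partial group is padded with '='.
std::string EncodeBase64(const std::vector<uint8_t>& data) {
    static const char kTable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    size_t i = 0;
    for (; i + 2 < data.size(); i += 3) {
        const uint32_t n = (static_cast<uint32_t>(data[i]) << 16) |
                           (static_cast<uint32_t>(data[i + 1]) << 8) | data[i + 2];
        out += kTable[(n >> 18) & 63];
        out += kTable[(n >> 12) & 63];
        out += kTable[(n >> 6) & 63];
        out += kTable[n & 63];
    }
    const size_t rest = data.size() - i;
    if (rest == 1) {
        const uint32_t n = static_cast<uint32_t>(data[i]) << 16;
        out += kTable[(n >> 18) & 63];
        out += kTable[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const uint32_t n = (static_cast<uint32_t>(data[i]) << 16) |
                           (static_cast<uint32_t>(data[i + 1]) << 8);
        out += kTable[(n >> 18) & 63];
        out += kTable[(n >> 12) & 63];
        out += kTable[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

For example, the bytes of "Man" encode to "TWFu", and the output only ever contains letters, digits, `+`, `/` and `=`, so it is safe inside a line-based text file.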
Experienced Forum User
Thanks for the link - I'll take a look at FCEUX's docs before getting started on a TAS editor. I'm actually relatively close to finishing the input rework: there are just a few minor things left to fix and implement in the core, and then it's basically a matter of adding the UI for the key bindings of the new input devices. As for sub-frame input, I've currently moved to per-frame input (whereas before, the input was polled whenever the NES game asked for it). But it would be relatively simple to make the process repeat multiple times per frame at a configurable scanline interval (e.g. every 65 scanlines for roughly 4 inputs per frame). This wouldn't be all that useful for regular gameplay (since each frame is emulated in a couple of milliseconds and then the CPU sleeps until the next frame), but it would allow for more flexibility TAS-wise. Also, I'm trying to figure out the best moment to poll the input normally. Bizhawk/FCEUX both do it on scanline 240, iirc? Was this chosen because games typically check the input in their NMI handler?
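A sketch of the "repeat polling several times per frame" idea: given a number of polls per frame, spread them evenly over the 262 scanlines of an NTSC NES frame, starting from whatever scanline normal polling uses. `SubFramePollScanlines` is hypothetical and assumes scanlines numbered 0-261; with 4 polls the spacing works out to 65-66 scanlines, in line with the "every 65 scanlines" estimate.

```cpp
#include <cassert>
#include <vector>

constexpr int kScanlinesPerFrame = 262;  // NTSC NES, including vblank and pre-render

// Returns the scanlines at which input would be latched during one frame,
// evenly spaced and wrapping past the last scanline.
std::vector<int> SubFramePollScanlines(int pollsPerFrame, int firstScanline) {
    std::vector<int> lines;
    for (int k = 0; k < pollsPerFrame; ++k) {
        const int offset = k * kScanlinesPerFrame / pollsPerFrame;
        lines.push_back((firstScanline + offset) % kScanlinesPerFrame);
    }
    return lines;
}
```

With one poll per frame this degenerates to ordinary frame input; higher counts give a TAS editor one input row per poll instead of one per frame.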
Experienced Forum User
Spikestuff wrote:
Mesen just needs a TAS editor, as it already has movie creation, debugging and dumping. I haven't played around with Mesen's recording myself in a while, so I don't know to what extent it works or doesn't. The other issue is that Mesen's input file (.mmo) currently can't be edited in a simple text editor like FCEUX's .fm2 format, and it isn't a zip file like BizHawk's/lsnes's that would give access to the input.
I'm actually in the middle of adding support for over a dozen of the controllers currently missing from Mesen (including keyboards, barcode readers, etc.), rewriting the movie file format, and allowing every kind of input device to be recorded or used via netplay. The new format is essentially the same as bk2: a zip file containing text files for the configuration and input log. After that, I still need to make some sort of TAS editor, but that may still take a while (I personally have never used one and don't really know what features people want/need from one).
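To show why a text input log is convenient to edit by hand, here is a sketch of one frame of a standard NES controller encoded as a text line and parsed back. The `|A..S....|` layout is hypothetical and is neither Mesen's nor bk2's actual format; the bit order (A, B, Select, Start, Up, Down, Left, Right) is the order in which the NES controller shifts out its buttons.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// One character per button, in controller shift order:
// A, B, Select, Start, Up, Down, Left, Right.
constexpr char kButtonChars[] = "ABsSUDLR";

// Encodes a button byte as "|ABsSUDLR|", with '.' for released buttons.
std::string EncodeFrame(uint8_t buttons) {
    std::string line = "|";
    for (int bit = 0; bit < 8; ++bit)
        line += (buttons & (1 << bit)) ? kButtonChars[bit] : '.';
    line += '|';
    return line;
}

// Parses a line produced by EncodeFrame; returns false (leaving `buttons`
// untouched) if the line is malformed.
bool DecodeFrame(const std::string& line, uint8_t& buttons) {
    if (line.size() != 10 || line.front() != '|' || line.back() != '|') return false;
    uint8_t parsed = 0;
    for (int bit = 0; bit < 8; ++bit) {
        const char c = line[bit + 1];
        if (c == kButtonChars[bit]) parsed |= static_cast<uint8_t>(1 << bit);
        else if (c != '.') return false;
    }
    buttons = parsed;
    return true;
}
```

One line per frame (or per poll) means a movie can be inspected, diffed or hand-edited with any text editor, which is the main advantage of the zip-of-text-files approach.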
Experienced Forum User
Alyosha wrote:
Although Kaettekita Mario Bros wasn't crashing for me previously, where was it crashing for you? Maybe I have a different version? Also, thanks for the valuable R&D on this, Sour!
After the 1/2 player select screen, a scene plays out before the game asks you to switch to side B. There are 3 different scenes, and the game selects one of them randomly. As far as I can tell, Bizhawk used to play 1 of the 3 scenes correctly (which seems to be the one that occurs most frequently in my tests), while the other 2 caused a black screen (at least, this is the case on Bizhawk 1.13). In other emulators, the other 2 scenes worked, but the "switch to side B" screen was corrupted. No problem! Someone reported an issue with Putt Putt Golf to me, which I ended up tracking down to the hack I was using for Kaettekita, so I figured I'd stop trying to fix it with guesses and test the hardware's actual behavior instead :p
Experienced Forum User
Completely unrelated, but just in case you didn't see this thread already: http://forums.nesdev.com/viewtopic.php?f=3&t=16507
It's an FDS test ROM that validates the IRQ's behavior. Bizhawk currently fails a number of the tests (which causes Kaettekita Mario Bros to crash). Should be pretty simple to fix, though.
Experienced Forum User
Just a few random ideas: the "switch (timer_control & 3)" part might be faster as a lookup table, e.g.:
static int[] mask = new int[4] { 0x200, 0x8, 0x20, 0x80 };
state = divider_reg & mask[timer_control & 3];
Might just end up being pretty similar, though.
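A quick way to check that the table is a drop-in replacement is to compare both versions over every input. This sketch assumes the original switch maps cases 0-3 to the same masks as the table (0x200, 0x8, 0x20, 0x80); `TimerBitSwitch` and `TimerBitTable` are illustrative names.

```cpp
#include <cassert>

// Branching version: pick the divider bit that clocks the timer.
int TimerBitSwitch(int divider, int timerControl) {
    switch (timerControl & 3) {
        case 0: return divider & 0x200;
        case 1: return divider & 0x8;
        case 2: return divider & 0x20;
        default: return divider & 0x80;
    }
}

// Table-driven version: one indexed load instead of a branch.
int TimerBitTable(int divider, int timerControl) {
    static constexpr int kMask[4] = {0x200, 0x8, 0x20, 0x80};
    return divider & kMask[timerControl & 3];
}
```

Whether the table actually beats the switch depends on what the compiler or JIT generates (a dense switch may already become a jump table), so profiling both is the only way to know.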
if (timer < timer_old)
Couldn't this just be if(timer == 0) or if(timer == Int32.MinValue) (or whatever matches the data type used here)? That should technically be faster than comparing against another variable, though I imagine it's probably not causing much of a performance issue either. If you don't actually need the values of old_state & old_state_c elsewhere, maybe try something like this?
if(state == 0 || state_c == 0) {
   if(oldStatesNotZero) {
      ...
   }
   oldStatesNotZero = false;
} else if(state > 0 && state_c > 0)  {
   oldStatesNotZero = true;
}
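Since the masked states can only be zero or positive, the single flag above should detect exactly the same transitions as comparing against stored old values. A small check of that claim with a made-up sample sequence, assuming the original condition is "both old values non-zero and at least one current value zero". `Sample`, `CountWithOldValues` and `CountWithFlag` are hypothetical names; each count corresponds to one entry into the `...` branch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Sample = std::pair<int, int>;  // (state, state_c), both >= 0 after masking

// Reference version: remember the previous values and detect the transition
// from "both non-zero" to "at least one zero".
int CountWithOldValues(const std::vector<Sample>& samples) {
    int count = 0;
    int oldState = 0;
    int oldStateC = 0;
    for (const Sample& s : samples) {
        const bool nowZero = s.first == 0 || s.second == 0;
        if (nowZero && oldState != 0 && oldStateC != 0) ++count;
        oldState = s.first;
        oldStateC = s.second;
    }
    return count;
}

// Flag version, mirroring the snippet above.
int CountWithFlag(const std::vector<Sample>& samples) {
    int count = 0;
    bool oldStatesNotZero = false;
    for (const Sample& s : samples) {
        if (s.first == 0 || s.second == 0) {
            if (oldStatesNotZero) ++count;  // the "..." branch
            oldStatesNotZero = false;
        } else if (s.first > 0 && s.second > 0) {
            oldStatesNotZero = true;
        }
    }
    return count;
}
```

The flag version replaces two stored integers and two comparisons per step with a single boolean, which is where any speedup would come from.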
Experienced Forum User
Alyosha wrote:
@Sour: I don't know what happened, but I downloaded fresh from github and rebuilt (since my current local build wasn't running anymore for some reason after i compiled it) and now I'm getting the proper fps.
No problem, let me know if you ever run into the same issue again.
About macros: while they do tend to improve performance, this is usually because they are the equivalent of inlining the whole code (instead of making potentially costly function calls). They are a pain to debug, though (and sometimes to understand), so I tend to avoid them whenever possible. Normally in C++, you can use __forceinline (or similar, depending on the compiler) to force the compiler to inline functions. You can also do something similar in C# (as of .NET 4.5) by adding this attribute to a function: [MethodImpl(MethodImplOptions.AggressiveInlining)]. This is not exactly the same, but it will allow the JIT compiler to inline the function in most cases, no matter its size. Without this attribute, only very small functions will be inlined. Whether inlining will make it faster or not really depends on the scenario, though; only testing will tell. That being said, it looked like most of the PPU emulation is in a single function already, so there is probably relatively little to gain from function inlining there.
I don't think structs are slow to access in general; they aren't really any different from any other variable (e.g. ints and structs are both value types, and memory-wise are stored the same way). But because they are value types, a copy of the struct is made every time it is passed from one function to another as a parameter (the function uses a copy of the struct, not the original). This is the opposite of classes, which are always reference types: a reference is given to the function, not a copy of the object itself. (This is all C#/.NET-only stuff, by the way; structs & classes in C++ are virtually identical in behavior.) So while passing a struct as a function parameter in C# could potentially be slow (this can be avoided by using the "ref" keyword on the parameter, though), accessing it in itself shouldn't be slower than accessing other variables, e.g.:
struct MyStruct
{
   public int a;
}
class Test
{
   int b;
   MyStruct data;
}
In this case, I would expect accessing "this.b" to be pretty much as fast as accessing "this.data.a", unless the CLR is doing something pretty funky with structs that I'm not aware of (which is entirely possible - I'm very familiar with C# itself, but have never really needed to optimize C# code at such a low level). I ended up writing a wall of text, but maybe something in all of this will be of use. Sorry if I just ended up saying a bunch of things you already know!
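For the C++ side of this, a sketch of a portable force-inline macro (`__forceinline` is MSVC-specific; GCC and Clang use `__attribute__((always_inline))`) plus the value-vs-reference distinction, which in C++ depends on the parameter type rather than on struct vs class. `FORCE_INLINE`, `Registers` and the helper functions are illustrative names.

```cpp
#include <cassert>

// Portable "force inline": MSVC and GCC/Clang spell it differently.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

struct Registers {
    int a;
    int b;
};

// Pass by value: the function works on a copy, so the caller's struct is untouched.
void IncrementCopy(Registers regs) {
    regs.a++;
}

// Pass by reference (the C# equivalent would be a `ref` parameter):
// the caller's struct is modified and nothing is copied.
FORCE_INLINE void IncrementInPlace(Registers& regs) {
    regs.a++;
}

// Const reference: no copy and no mutation, the usual choice for
// read-only access to larger structs.
FORCE_INLINE int SumFields(const Registers& regs) {
    return regs.a + regs.b;
}
```

As with macros, forcing inlining is a hint to try and measure, not a guaranteed win: it can grow code size enough to hurt instruction-cache usage.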
Experienced Forum User
That's just one of the limits of managed code, sadly. Despite what everybody likes to say, for these kinds of things, managed code or scripted languages like JS are definitely slower than compiled code. Just compare Visual 2C02/2A03 with my C++ port: the JS versions are 10-20x slower, and all I did was pretty much copy/paste the JavaScript code and adjust it to make it compile in C++. (Obviously JS is a good amount slower than C# in general.) I took a quick look at NESHawk, and it looks like these 3 lines on their own account for ~8% of the CPU usage:
sl_sprites[0, xt * 8 + xp] = 0;
sl_sprites[1, xt * 8 + xp] = 0;
sl_sprites[2, xt * 8 + xp] = 0;
That seems odd, but the managed array accesses might be slow in this case (e.g. due to bounds checks, etc.), so maybe try using unsafe code to access them? Unsafe code is already used elsewhere, so there's no real reason not to if it's causing a bottleneck. Also, since your condition seems to be sl_sprites[1, ...] != 0, I don't think you actually need to reset the other 2 indexes? That would probably help too. PpuOpenBusDecay being called in runppu is also eating up a lot of processing (6%). Is there a reason you can't calculate the decay only when needed (e.g. when reading a register)? There are most likely more ways to speed things up, but I think it's unlikely you'd break past 200fps or so. For comparison, I get 170fps in MyNes (C#) and 210fps in Nintaco (Java). And about the weird FPS in Mesen, could I ask you to check the Console::GetFrameDelay() function and see what values you're getting for frameDelay & emulationSpeed in it for a few different emulation speed values in the UI?
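The "only calculate the decay when needed" idea can be sketched as storing, per bit, the time it was last driven, and applying decay only when the latch is read. This is a hypothetical illustration, not NESHawk's `PpuOpenBusDecay`; the per-bit timestamps and the `decayCycles` threshold are assumptions (the real decay time should come from hardware documentation).

```cpp
#include <cassert>
#include <cstdint>

class OpenBusLatch {
public:
    explicit OpenBusLatch(uint64_t decayCycles) : decayCycles_(decayCycles) {}

    // Called whenever the bus is driven (e.g. a register write). Driven bits
    // take the new value and their decay timers restart; other bits are kept.
    void Drive(uint8_t value, uint8_t drivenMask, uint64_t now) {
        value_ = static_cast<uint8_t>((value_ & ~drivenMask) | (value & drivenMask));
        for (int bit = 0; bit < 8; ++bit) {
            if (drivenMask & (1 << bit)) lastDriven_[bit] = now;
        }
    }

    // Decay is evaluated lazily, only when the latch is actually read,
    // instead of on every PPU tick.
    uint8_t Read(uint64_t now) {
        for (int bit = 0; bit < 8; ++bit) {
            if (now - lastDriven_[bit] >= decayCycles_) {
                value_ = static_cast<uint8_t>(value_ & ~(1 << bit));
            }
        }
        return value_;
    }

private:
    uint64_t decayCycles_;
    uint8_t value_ = 0;
    uint64_t lastDriven_[8] = {};
};
```

The per-tick cost drops to zero; the work moves to register reads, which are far rarer than PPU cycles.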
Experienced Forum User
That's weird. Is it giving 160fps at all emulation speeds from 300% upward until it hits "Maximum speed"? If so, it might have something to do with the emulation thread sleeping too long on your computer (e.g. the thread asks to sleep for ~5ms, but Windows doesn't wake it until 7+ms have passed). Once you hit "Maximum speed", the thread doesn't sleep at all, so the problem wouldn't occur there. Have you been using the built-in performance profiler in VS? It's really easy to use and pretty great at finding bottlenecks, too.
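One common workaround for a thread that oversleeps is to sleep until slightly before the deadline and then yield in a loop for the remainder, while scheduling frames against absolute deadlines so that one late wake-up doesn't push every later frame back. A minimal sketch using `std::chrono`; it is not how Mesen's `Console::GetFrameDelay()` works, and `WaitUntil`, `RunPaced` and the 2 ms spin margin are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Sleeps until `spinMargin` before the deadline, then yields in a loop until
// the deadline itself, so an OS oversleep of up to `spinMargin` is absorbed.
void WaitUntil(Clock::time_point deadline, Clock::duration spinMargin) {
    const Clock::time_point sleepTarget = deadline - spinMargin;
    if (Clock::now() < sleepTarget) std::this_thread::sleep_until(sleepTarget);
    while (Clock::now() < deadline) std::this_thread::yield();
}

// Runs `frames` calls to `emulateFrame`, paced at `targetFps` against absolute
// deadlines, and returns the frame rate actually achieved.
template <typename Fn>
double RunPaced(int frames, double targetFps, Fn emulateFrame) {
    const auto frameDuration = std::chrono::duration_cast<Clock::duration>(
        std::chrono::duration<double>(1.0 / targetFps));
    const Clock::time_point start = Clock::now();
    Clock::time_point deadline = start;
    for (int i = 0; i < frames; ++i) {
        emulateFrame();
        deadline += frameDuration;  // absolute schedule: lateness doesn't accumulate
        WaitUntil(deadline, std::chrono::milliseconds(2));
    }
    const double seconds = std::chrono::duration<double>(Clock::now() - start).count();
    return frames / seconds;
}
```

The trade-off is some extra CPU usage during the yield loop; and because deadlines are absolute, a late frame is followed by shorter waits until the schedule catches up.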
Experienced Forum User
Alyosha wrote:
To give some context, right now NESHawk can run Battletoads at about 130 fps on my laptop. Mesen meanwhile can do around 160 fps.
I'm curious why your results are so different from my own tests. For example, on my computer (Mesen vs Bizhawk 2.2):
• MM2 stage select: 320fps vs 110fps
• Battletoads pause screen: 315fps vs 110fps
• Super Dodge Ball title screen: 440fps vs 113fps
Not sure if something is causing Bizhawk to cap at around 110fps on my computer (first-gen i5 @ 3.4GHz), but it is running at 100% CPU (and the QuickNes core does go up to 2000+fps). On the other hand, maybe something is causing Mesen to be slow on your computer? What's your laptop's CPU? Also, just a note: if you're using a build of Mesen that you compiled yourself, it'll probably be ~10-20% slower than the official releases, since those use PGO (profile-guided optimization) to boost performance some more.
Experienced Forum User
I did a few tests using clumsy: https://github.com/jagt/clumsy
With my ping set to 20ms, I get around 30-40ms of delay. A ping of 100ms gives about 160ms of delay, and a ping of 500ms gives ~650ms of delay. These are pretty much what I would expect the numbers to be: 500ms of ping + 10 frames of buffer means ~660ms in theory. The 3/10 buffer size values are not exactly a number of frames, but rather (in the case of standard controllers) the number of times the controller data is refreshed, which in the majority of games happens once per frame. What game(s) are you testing with? And what OS? If it's Linux, the network socket code for Linux hasn't been tested much, so it might have more issues than the Windows version.
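The "~660ms in theory" estimate is just the ping plus the number of buffered frames times the frame duration. As a tiny helper (hypothetical; it assumes the buffer is refreshed once per frame, as in most games, and uses the NTSC NES rate of about 60.0988 fps):

```cpp
#include <cassert>

// Expected input delay: configured network lag plus the time needed to
// drain the input buffer, one entry per frame.
double ExpectedDelayMs(double pingMs, int bufferedFrames, double framesPerSecond = 60.0988) {
    const double frameMs = 1000.0 / framesPerSecond;
    return pingMs + bufferedFrames * frameMs;
}
```

`ExpectedDelayMs(500.0, 10)` comes out at roughly 666 ms, close to the ~650ms measured with clumsy.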
Experienced Forum User
doomday45 wrote:
-: frame latency very high for client even if both players have good internet(not playable) Summary: ~0,1s delay with 40ms ping until 1s(!!!) with 100+ms ping?
This sounds more like a bug in the netplay logic than anything else. I've played several hours' worth of time with my brother via netplay in the past, and it was definitely very playable for both of us. I've also gotten feedback from other users via email saying they used the netplay feature to play with friends. So it may not work perfectly in all scenarios, but it definitely does in some. Like I said, though, it could most likely be improved; it just hasn't been my focus. The problem is that network code that is highly sensitive to latency is relatively complex to test/debug when developing it on my own (e.g. on a single computer).
Experienced Forum User
feos wrote:
Mesen has been tested and it has an increasing lag.
If it does, it's probably only because of this: https://github.com/SourMesen/Mesen/blob/master/Core/GameClientConnection.cpp#L152
The netplay code gradually increases the buffer's size whenever it becomes empty (typically because of network lag), but never tries to decrease it again (it probably should). If/when it reaches 10, that's typically at least 10 frames of lag, which is pretty noticeable. I haven't really looked at the netplay code much in the last couple of years, so it could definitely be improved.
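A sketch of the "it probably should" part: grow the buffer when it runs dry, and give a frame of latency back after a stretch of frames without underruns. This is hypothetical code, not Mesen's `GameClientConnection`; the 3 and 10 limits mirror the buffer size values mentioned in this thread, and the 120-frame shrink window is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>

class AdaptiveInputBuffer {
public:
    AdaptiveInputBuffer(int minSize, int maxSize, int shrinkAfter)
        : min_(minSize), max_(maxSize), shrinkAfter_(shrinkAfter), size_(minSize) {}

    // The buffer ran dry (typically network lag): grow it, as described above.
    void OnUnderrun() {
        size_ = std::min(size_ + 1, max_);
        stableFrames_ = 0;
    }

    // A frame was consumed without running dry: after enough calm frames,
    // shrink by one to reduce latency again.
    void OnFrameConsumed() {
        if (++stableFrames_ >= shrinkAfter_) {
            size_ = std::max(size_ - 1, min_);
            stableFrames_ = 0;
        }
    }

    int size() const { return size_; }

private:
    int min_;
    int max_;
    int shrinkAfter_;
    int size_;
    int stableFrames_ = 0;
};
```

Shrinking slowly and growing immediately biases the buffer toward smoothness; a shorter shrink window would recover latency faster at the risk of more stutter on unstable connections.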
Experienced Forum User
The CPU calls MemoryRead()/MemoryWrite() for every CPU cycle, and the first thing done in either of them is to call IncCycleCount(), which runs the PPU/APU. So this is done before actually reading/writing memory. If what you're checking is vblank-based, it might be worth mentioning that Mesen sets vblank on cycle 0 (scanline 241) and clears it on cycle 1 (scanline -1), whereas the timing sheet on the wiki says it should be set on cycle 1, iirc. I think Nintendulator also does this, but I'm unsure. I feel like I looked at this in the simulators before, but can't recall the result.
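A toy model of the ordering described above: every CPU memory access first advances the rest of the system, and only then performs the read or write. The `MemoryRead`/`MemoryWrite`/`IncCycleCount` names come from the post, but the bodies below are a hypothetical sketch that just logs the order of events; running 3 PPU dots before the APU within IncCycleCount, and treating every address as mirrored internal RAM, are simplifying assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

class ToyConsole {
public:
    uint8_t MemoryRead(uint16_t addr) {
        IncCycleCount();            // catch the PPU/APU up before the access
        log.push_back("read");
        return ram_[addr & 0x7FF];  // every address treated as mirrored 2 KB RAM
    }

    void MemoryWrite(uint16_t addr, uint8_t value) {
        IncCycleCount();
        log.push_back("write");
        ram_[addr & 0x7FF] = value;
    }

    std::vector<std::string> log;  // order of events, for inspection

private:
    void IncCycleCount() {
        for (int i = 0; i < 3; ++i) log.push_back("ppu");  // NTSC: 3 PPU dots per CPU cycle
        log.push_back("apu");
    }

    uint8_t ram_[0x800] = {};
};
```

Because the PPU is always caught up before the access, a read of a PPU register in this model sees the state as of the end of that CPU cycle's PPU dots, which is exactly where a one-dot vblank timing difference would show up.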
Experienced Forum User
Actually, Mesen does 3x PPU followed by 1x CPU (with the APU running before the CPU cycle) too. I think most emulators I've checked also do it this way. A problem with hardware vs emulators that can't be solved is CPU/PPU alignment: even with a TAS movie that syncs up on an NES, there is no guarantee that it would sync up on every single possible alignment. I had already tried splitting up the ticks into master clock ticks in Mesen a few months back, but as far as I could tell, it didn't really have much of an impact (I didn't exactly go into depth with it; it didn't help me solve the problems I was trying to solve at the time, and it broke a few timing ROMs, so I just abandoned the idea). Most effects happen on the rising or falling edges of their own clock (e.g. flags in the PPU tend to be set on the rising edge, but some are set on the falling edge of the PPU's clock). So splitting the CPU/PPU clock processing into rising/falling edges could potentially lead to more exact timings, but at the end of the day, the random CPU/PPU alignments are always going to cause trouble (and there's nothing to be done there) when you're talking about running TAS movies on actual hardware. As far as Bizhawk vs Mesen goes, it's pretty likely that there are minor timing differences in the PPU implementation, or in stuff like DMC or DMA stalling, etc.
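A toy master-clock model of why alignment matters: on NTSC, the CPU advances once every 12 master clocks and the PPU once every 4, so the 3:1 ratio holds for any alignment; only the phase of the PPU dots within each CPU cycle changes, and that phase is what a TAS can't control on real hardware. The four offsets here are a simplification, and `ClockStats`/`RunMasterClock` are hypothetical names; real hardware also has the rising/falling-edge details discussed above.

```cpp
#include <cassert>

// NTSC dividers: the CPU advances every 12 master clocks, the PPU every 4,
// which is where the 3 PPU dots per CPU cycle ratio comes from.
constexpr long kCpuDivider = 12;
constexpr long kPpuDivider = 4;

struct ClockStats {
    long cpuCycles = 0;
    long ppuDots = 0;
    long firstPpuTick = -1;  // master tick of the first PPU dot
};

// `alignment` (0-3) offsets the PPU's phase relative to the CPU, standing in
// for the power-on alignment that can differ from one boot to the next.
ClockStats RunMasterClock(long masterTicks, long alignment) {
    ClockStats stats;
    for (long t = 0; t < masterTicks; ++t) {
        if (t % kCpuDivider == 0) ++stats.cpuCycles;
        if ((t + alignment) % kPpuDivider == 0) {
            if (stats.firstPpuTick < 0) stats.firstPpuTick = t;
            ++stats.ppuDots;
        }
    }
    return stats;
}
```

Every alignment yields the same cycle and dot counts over a long run, so a movie can drift out of sync only through effects that depend on the sub-cycle phase, which is exactly the part emulators usually approximate.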
Experienced Forum User
Shouldn't be too hard, and this also made me realize that the flag is not supposed to be cleared on reset, according to the wiki (it currently does get cleared on reset, though). For now, you can edit PPU::Reset() to change this on your end if you need to:
_statusFlags = {};
_statusFlags.VerticalBlank = true; // <<-- add this line below the previous one
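Rather than forcing the flag to true, a reset that leaves it unchanged could look like this. A hypothetical sketch: `PpuStatusFlags`, its field names and `ResetStatusFlags` are stand-ins modeled on `_statusFlags` above, not Mesen's actual definitions; only `VerticalBlank` is carried over, and the other flags are cleared as the existing `_statusFlags = {};` line already does.

```cpp
#include <cassert>

struct PpuStatusFlags {
    bool SpriteOverflow = false;
    bool Sprite0Hit = false;
    bool VerticalBlank = false;
};

// Reset clears the status flags except vblank, which keeps whatever value it had.
void ResetStatusFlags(PpuStatusFlags& flags) {
    const bool verticalBlank = flags.VerticalBlank;
    flags = {};
    flags.VerticalBlank = verticalBlank;
}
```

Preserving the old value (instead of always setting it) keeps reset behavior correct whether or not vblank happened to be set when the reset button was pressed.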
Experienced Forum User
Enable the "Hide the pause screen" option in the Preferences. It'll remove the pause overlay and keep the FPS/lag/frame counters visible on the screen.