I'm making this thread similar to the AtariHawk one so I can sort things out more easily.
This thread is for general bug fixes in NESHawk. Please post any issues you have.
Things to emulate:
DMA on multi-write instructions
Exact power on and reset behaviour
FDS / NES audio filtering / mixing
Second look at sprite limit
VS Dual System
Microphone
Things to investigate:
FDS Metroid
FDS Yume Koujou Doki Doki Panic
Chinese translation for TMNT3
VRC7 sound (e.g. Lagrange Point)
Chaos World (CH) Save Ram
Tests to Pass:
scanline/scanline
tvpassfail/tv
Games that don't work:
unsupported low priority
Lots of pirate and multi cart stuff
Test build:
https://ci.appveyor.com/project/zeromus/bizhawk-udexo/build/artifacts
RDY wasn't added until after the relevant code in NESHawk was written.
Most of the bugs in NESHawk can be solved by burning it down and replacing it with code based on more up-to-date knowledge. It was accurate at the time but is outdated now.
Well, I know Blues Brothers has lag on the second/third stage that differs between BizHawk, FCEUX, and a version of FCEUX that fixes a bug in Mahjong.
That fix also desyncs a number of other runs, linked in the post.
I have been (very) slowly working on the NESHawk PPU. So far I have rewritten sprite evaluation to take place simultaneously with background generation, as it does on a real NES. Together with proper read behaviour of $2004 during this time, I finally managed to fix those annoying horizontal lines and shaking in Micro Machines.
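To illustrate the kind of per-dot behaviour involved in reads of 2004 (the $2004 OAMDATA register), here is a minimal sketch in C. It is not NESHawk's code (NESHawk is written in C#), and it assumes the timing as commonly documented on NESdev: on a visible scanline with rendering enabled, dots 1-64 clear secondary OAM and a read of $2004 returns $FF, and afterwards a read sees whatever byte the sprite logic is currently handling. The function name, its parameters and the `eval_latch` value are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

static uint8_t oam[256];  /* primary OAM: 64 sprites x 4 bytes */

/* Value returned by a CPU read of $2004 at a given dot of a visible scanline.
 * eval_latch is a hypothetical stand-in for the byte that sprite evaluation
 * (or sprite fetching) currently has latched. */
uint8_t oam_data_read(int dot, bool rendering, uint8_t oam_addr, uint8_t eval_latch)
{
    if (!rendering)
        return oam[oam_addr];   /* outside rendering: plain OAM read */
    if (dot >= 1 && dot <= 64)
        return 0xFF;            /* secondary OAM is being cleared to $FF */
    return eval_latch;          /* assumption: the byte the sprite logic is handling */
}
```

The point of the sketch is only that the result depends on the current dot, which is why sprite evaluation has to be stepped alongside background generation rather than done once per scanline.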
This is still very early WIP and generally breaks other things, but the timing is correct, so I should be able to start slowly fixing other timing issues and bugs.
Right now it also slows down emulation by a noticeable factor. I don't think my code takes considerably longer to run than the original, so this is the first major issue to be worked out.
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
I greatly improved my code and committed it to my fork for testing. Anyone who wants to try it out, please do, and let me know if anything is amiss. I have tested with several games so far (SMB, SMB3, Battletoads, Micro Machines) and found no deficiencies.
https://github.com/alyosha-tas/BizHawk
This version also passes 2 accuracy tests that previously failed:
sprite_overflow_tests/3.Timing
sprite_overflow_tests/4.Obscure
NesHawk now passes:
apu_test/rom_singles/7-dmc_basics
Figuring out what was going on here took some effort but I now feel the implementation is pretty accurate.
Unfortunately it now hangs completely on sprdma_and_dmc_dma. I haven't the slightest idea why. Previously the test resulted in answers off by a factor of about 5, but at least it finished. I'm not sure what to make of this and posted on NesDev looking for some insight. Safe to say it's still a work in progress.
But other games I tested that use DMC still work fine, so perhaps I'm just off by a couple of cycles. sprdma_and_dmc_dma is a very exacting test.
EDIT:
After a bit of work filling in some of the undocumented opcodes, I am now able to pass instr_misc/instr_misc.
So 4 extra passing tests means we should be tied for 2nd place with MyNes, alright!
This may be unrelated, but do any of these tests correlate with how accurate lag emulation is or could something like this only be really tested by comparing movie files on nesbot and on emu?
Accurate lag is directly correlated to accurate timing. Any time timing is made more accurate, lag is made more accurate. Whether a particular change makes nesbot sync is irrelevant unless that's the only data concerning whether or not the change is accurate. In the case of tests, whether the tests pass is data; nesbot is irrelevant.
Joined: 3/31/2010
Posts: 1466
Location: Not playing Puyo Tetris
In theory, the more accurate the NESHawk core gets, the more TASes NESBot is likely to be able to play back. However, the TAS MUST be made on the changed and improved NESHawk core. Old TASes will still desync or not work.
When TAS does Quake 1, SDA will declare war.
The Prince doth arrive he doth please.
Joined: 4/17/2010
Posts: 11495
Location: Lake Chargoggagoggmanchauggagoggchaubunagungamaugg
What about attempting to speed up the core too? FCEUX is so fast because it uses a separate function for every memory address for read and write, so it has to do zero checks regarding regions (and I think Nestopia does too). Considering that reading is done at least once every instruction, and sometimes more, and quite often it also writes, doing this millions of times per second with all the region checks must be causing quite some slowdown.
I also tried manually inlining the opcode functions right into the switch, but that gave zero speed up.
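For comparison, this is roughly what a read path with region checks looks like. It is a minimal sketch in C, not the actual NESHawk code (which is C#): the function names, the fixed 32 KiB PRG array and the stubbed PPU register read are all hypothetical, and only the standard NES CPU memory map is assumed.

```c
#include <stdint.h>

static uint8_t ram[0x800];   /* 2 KiB internal RAM */
static uint8_t prg[0x8000];  /* hypothetical fixed 32 KiB PRG ROM, no mapper */

/* Hypothetical stand-in for a PPU register read. */
static uint8_t ppu_reg_read(uint16_t reg) { (void)reg; return 0; }

/* Every read walks the same chain of range tests before touching memory. */
uint8_t read_with_region_checks(uint16_t addr)
{
    if (addr < 0x2000) return ram[addr & 0x07FF];      /* RAM, mirrored every 2 KiB */
    if (addr < 0x4000) return ppu_reg_read(addr & 7);  /* PPU registers, mirrored every 8 bytes */
    if (addr >= 0x8000) return prg[addr - 0x8000];     /* cartridge PRG ROM */
    return 0;  /* APU/IO and expansion area omitted in this sketch */
}
```

A per-address handler table replaces this chain of comparisons with one indexed, indirect call. Whether that is actually faster depends on how well the host CPU predicts the branches versus the indirect call, so it would need measuring.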
For every byte of the CPU address space there's a function for reading and a function for writing?
CPUs have branch predictors, so if there aren't too many tests (and they follow predictable patterns) it should still be fast enough.
Yes.
How can you predict what memory address a game will need?
- reading blocks of data: predict it's sequential and read ahead
- loops: predicting that the CPU jumps back to the beginning of the loop
- switches: remember the most-travelled path
Well, my current improvements to accuracy slow down the core by a noticeable amount. At 400% speed I can run SMB3 on my laptop at about 180 fps on 1.11.6. The current BizHawk master build can do about 160. My current test build can do about 152.
So... we're kind of going in the wrong direction there 8D
These numbers reflect almost entirely PPU changes, so that might also be a good place to look for performance improvements.
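To put the frame rates quoted above in relative terms, here is a small helper in C. The function name is made up for illustration; the inputs are just the approximate fps figures from the post.

```c
/* Relative slowdown, in percent, when going from before_fps to after_fps. */
double slowdown_percent(double before_fps, double after_fps)
{
    return (before_fps - after_fps) / before_fps * 100.0;
}
```

With the figures above, going from 160 to 152 fps is a 5% slowdown relative to master, and going from 180 to 152 fps is about 15.6% relative to 1.11.6.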
EDIT: after spending a long time to find a very small bug, oam_stress/oam_stress now passes.
I'm also making progress on sprdma_and_dmc_dma. I'm hoping that maybe by the end of the month those tests will pass.
Good work! 180 to 152 is fine with me. CPUs speed up over time, accuracy improves over time. NESHawk isn't made to be fast.
But it's a bit worrisome if it continues. The architecture may not support this level of accuracy without the speed degrading to obscene levels. I hope we don't need to burn it all down and rebuild just for speed reasons.
By the way can you please install https://visualstudiogallery.msdn.microsoft.com/c8bccfe2-650c-4b42-bc5c-845e21f96328
zeromus, what do you think about the fceux approach I mentioned?
How obscene?
I had a look at the source after feos' post, and while I didn't find all these functions (macros?), I saw lots of "hack", "probably shouldn't do it but we do it anyway!" etc. comments. It doesn't really make for an impression of good and stable architecture.
Language: c
// function pointer types
typedef uint8 (*readfunc)(uint32 A);
typedef void (*writefunc)(uint32 A, uint8 V);
// macros to declare handler signatures
#define DECLFR(x) uint8 x(uint32 A)
#define DECLFW(x) void x(uint32 A, uint8 V)
// example handler for the RAM region
static DECLFR(ARAM) {
    return RAM[A];
}
void SetReadHandler(int32 start, int32 end, readfunc func) {
    int32 x;
    // do all needed checks here, when emu starts
    // go through every address in the region
    for (x = end; x >= start; x--)
        // point this address at the handler
        ARead[x] = func;
}
// example call for a region
SetReadHandler(0, 0x7FF, ARAM);
// and this is used in opcodes
static __inline uint8 RdMem(unsigned int A)
{
    return (_DB = ARead[A](A));
}
It's probably written oddly, since the core is old and not many people wanted to refactor it, but it's just regular C, and I see no harm in having an array of functions. And if it speeds things up, that's actually a benefit, since this is probably not a case where being slow is required.
Feos, be my guest. I don't like the way it looks, and it's a speedhack, but that part of the emulator core is not likely ever to change again, so it's a fine time to speedhack it. Just don't do it for the mappers first; that's too big a job.
Nowhere near that bad. The performance hit so far came from code that runs every pixel (3 times per CPU tick), but that is done now. I think only accurate APU emulation remains that will have a significant negative impact on speed (my guess is another 2-3%). After that I do plan to go over everything and make obvious optimizations to gain back some of the performance losses, but I doubt I can reach parity with 1.11.6 again in terms of speed.
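The "3 times per CPU tick" ratio is why per-pixel code is so expensive. A minimal sketch in C of that stepping, assuming the NTSC ratio of 3 PPU dots per CPU cycle; the tick functions here are hypothetical stubs that only count calls, not NESHawk's actual code:

```c
#include <stdint.h>

static uint64_t cpu_cycles;  /* CPU cycles executed so far */
static uint64_t ppu_dots;    /* PPU dots (pixels) executed so far */

/* Hypothetical stubs: a real core would do its per-cycle work here. */
static void cpu_tick(void) { cpu_cycles++; }
static void ppu_tick(void) { ppu_dots++; }

/* Advance the machine by n CPU cycles, stepping the PPU 3 dots for each. */
void run_cpu_cycles(uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        ppu_tick();
        ppu_tick();
        ppu_tick();
        cpu_tick();
    }
}
```

Anything added inside the per-dot function runs three times as often as anything added to the CPU step, so small costs there add up quickly.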
I don't really understand what feos is suggesting with the CPU refactor, so I'll leave that up to him. It seems like a huge undertaking.
zeromus wrote:
Sure thing, I installed it, but I can't tell what, if anything, it is doing. How do I use it?
_____________________
I've been testing various games known to be tricky to emulate. The first game that was previously incompatible with BizHawk now works: Fire Hawk!
Curiously this game runs on FCEUX but went to black screen on previous versions of BizHawk.
If anyone knows of other games that are just plain incompatible, please let me know (I am currently aware of Time Lord). Having games to test on really helps.
EDIT: oh, the recent fixes to OAM reads also fix cpu_dummy_writes/cpu_dummy_writes_oam, so scratch another one off the list!
I have a suggestion: try running the console-verified runs on BizHawk and see if they sync. Most of them were done on FCEUX, but if one does not sync for whatever reason, maybe something is wrong?