Good idea! I think I'll give it a try once I get DMA timings worked out. Only games that are very generous with timing alignments will sync without that.
Speaking of which, after several failed attempts and hours of debugging, dpcmletterbox/dpcmletterbox now works properly. This is a significant improvement to DMC DMA timing, so I hope I can make more progress on OAM DMA now.
Wow, it's crazy how fast you get these improvements out. Great work.
I have a question, though: is an emulator that passes all of the tests above perfect, in the sense that it behaves exactly like real hardware?
Or does it just pass all published test ROMs, while there might still be differences from real hardware because those ROMs don't cover every aspect of it?
Emulator Coder, Site Developer, Site Owner, Expert player
(3581)
Joined: 11/3/2004
Posts: 4754
Location: Tennessee
In the case of NES test ROMs, they most definitely test edge cases beyond anything games use, for instance certain undocumented opcodes. The NES tests in general are very well established, and the hardware is very well understood. Passing all tests would generally mean high "game compatibility" as well as very accurate emulation of the hardware. I put "game compatibility" in air quotes because in reality the NES also has an insane number of cartridge types to support, and that generally dwarfs the core emulation in terms of developer effort.
Thanks!
In the case of NesHawk specifically, yes there is a very real difference between the emulation architecture and the real hardware.
NesHawk runs at the timing level of a PPU tick: when 3 of those ticks go by, we run one CPU cycle. A real NES, though, runs at a timing level one step above this, where a master clock ticks off pulses and the PPU and CPU use their own dividers to time execution. In particular, the 6502 (or the NES variant in this case) actually runs 2 phases which make up one CPU clock cycle, and this level of hardware detail is not emulated at all by NesHawk. The fact that it can still pass all these edge case tests is strong evidence that this level of detail is not really needed almost all of the time.
If you are interested in how things really work, you can check out Visual 2A03 (the NES CPU) or Visual 2C02 (the PPU), pretty neat stuff!
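To make the difference concrete, here is a minimal Python sketch (not NesHawk's actual code) contrasting the two approaches. It assumes the NTSC dividers, where the master clock is divided by 4 for the PPU and by 12 for the CPU, and it ignores both the CPU/PPU phase alignment at power-on and the CPU's two internal phases. Either way the ratio works out to 3 PPU dots per CPU cycle.

```python
# NTSC dividers: the master clock is divided by 4 for the PPU and by 12 for the CPU.
PPU_DIVIDER = 4
CPU_DIVIDER = 12
DOTS_PER_CPU_CYCLE = CPU_DIVIDER // PPU_DIVIDER  # 3


def run_master_clock(master_ticks):
    """Hardware-style: derive PPU dots and CPU cycles from master clock ticks."""
    ppu_dots = cpu_cycles = 0
    for tick in range(1, master_ticks + 1):
        if tick % PPU_DIVIDER == 0:
            ppu_dots += 1
        if tick % CPU_DIVIDER == 0:
            cpu_cycles += 1
    return ppu_dots, cpu_cycles


def run_ppu_ticks(ppu_dots):
    """PPU-tick style (simplified): step the PPU, and run one CPU cycle every third dot."""
    cpu_cycles = 0
    for dot in range(1, ppu_dots + 1):
        if dot % DOTS_PER_CPU_CYCLE == 0:
            cpu_cycles += 1
    return ppu_dots, cpu_cycles


# Both models agree on the counts; only the master-clock model has sub-dot resolution.
print(run_master_clock(120), run_ppu_ticks(30))  # prints (30, 10) (30, 10)
```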
____________________________
I finally got around to working on PPU open bus behaviour. It was pretty simple; now cpu_dummy_writes/cpu_dummy_writes_ppumem and ppu_open_bus/ppu_open_bus both pass. Scratch 2 more off the list.
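For readers unfamiliar with PPU open bus: the PPU has its own data-bus latch, so reading a write-only register returns whatever was last driven on that bus, and $2002 only drives its top three bits. A minimal sketch of that idea follows. The class and method names are hypothetical, $2004/$2007 reads are simplified to a caller-supplied value, and the latch's gradual decay on real hardware is not modeled.

```python
# Hypothetical sketch of the PPU I/O latch ("PPU open bus"); decay is not modeled.
WRITE_ONLY = {0x2000, 0x2001, 0x2003, 0x2005, 0x2006}


class PpuOpenBus:
    def __init__(self):
        self.latch = 0   # last value driven on the PPU's CPU-facing data bus
        self.status = 0  # $2002: only bits 7-5 are real status bits

    def write(self, addr, value):
        # Any write to $2000-$2007 (mirrored through $3FFF) refreshes the latch.
        self.latch = value & 0xFF

    def read(self, addr, port_value=0):
        """port_value stands in for the OAM/VRAM byte a $2004/$2007 read would return."""
        reg = 0x2000 + (addr & 7)  # registers mirror every 8 bytes
        if reg in WRITE_ONLY:
            return self.latch  # nothing drives the bus, so the latch is seen
        if reg == 0x2002:
            value = (self.status & 0xE0) | (self.latch & 0x1F)
            self.status &= 0x7F  # reading $2002 clears the vblank flag
        else:  # $2004 / $2007, simplified as driving all 8 bits
            value = port_value & 0xFF
        self.latch = value
        return value
```

For example, after writing $AB to $2000, a read of $2005 returns $AB, and a $2002 read with only the vblank flag set returns $8B.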
I noticed that the DMA work you've been doing relies on the RDY pin. I've put off working much on the 6502 CPU core until I could get some automated testing going. However, the IncPc micro-op in the MOS6502X core was missing its dummy read (which would be stalled by RDY). I've committed that fix to master today.
The AHX/SHX/SHY illegal opcodes, which are affected by the high byte of the effective address, also behave differently depending on the RDY pin. More info here: http://csdb.dk/forums/index.php?roomid=11&topicid=94460
If you read that, keep in mind that in the C64 context, RDY is controlled by the video chip and is made inactive when it needs the bus fully.
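The RDY behavior being discussed can be summarized as follows: on the NMOS 6502, pulling RDY low halts the CPU on its next read cycle, and that read is simply repeated until RDY goes high again, while write cycles go through regardless. That repeated read is why a missing dummy read matters for DMA timing. Here is a minimal Python sketch under those assumptions; the BusCycle type and the rdy_low_clocks input are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusCycle:
    kind: str  # "read" or "write"
    addr: int


def run_with_rdy(cycles, rdy_low_clocks):
    """Return (clock, kind, addr) for every bus access actually performed.

    rdy_low_clocks: clock numbers during which RDY is held low, e.g. by a DMA
    unit. A read attempted while RDY is low is stalled and repeated on the next
    clock; writes ignore RDY, as on the NMOS 6502.
    """
    log = []
    clock = 0
    i = 0
    while i < len(cycles):
        cycle = cycles[i]
        log.append((clock, cycle.kind, cycle.addr))
        stalled = cycle.kind == "read" and clock in rdy_low_clocks
        if not stalled:
            i += 1  # advance to the next micro-op only if this access completed
        clock += 1
    return log
```

For instance, with RDY held low on clocks 1 and 2, a read/write/read sequence performs the write on clock 1 but repeats the final read on clocks 2 and 3.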
Thanks!
Oh, good catch! After looking at that, I noticed that the dummy reads for various instructions are also not affected by the RDY pin as they should be, and I think that is part of what is keeping NesHawk from passing some more of these tests, so fortuitous timing.
Glad to hear! I'm thinking that should take care of all the dummy reads. A quick skim revealed nothing else.
If you're looking over it more, just remember the rule of thumb: if the 6502 isn't writing, it's reading. This applies at all times, regardless of whether the read is useful. If the code doesn't perform exactly one of these in any given cycle, it's a bug.
I finally got Time Lord working, hooray! This was the only total incompatibility I knew of, so I'm quite happy to get it working. It was actually a simple fix; luckily the NesDev wiki already said what needed to be done, and it was just a matter of implementing it.
Unfortunately, things have gone awry elsewhere. NesHawk had a timing bug and apparently passed some rather important tests through sheer luck and cancellation of errors. Removing this bug has caused these tests to no longer pass, despite my repeated efforts to resolve them. So it looks like a long road ahead.
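That rule of thumb lends itself to an automated check. Here is a minimal sketch that assumes a hypothetical per-cycle trace, where each entry lists the bus accesses the core made in that cycle, and flags any cycle that does not contain exactly one read or write. As an example, an implied instruction such as INX takes two cycles and performs a dummy read of the next byte on the second, so a core that drops that read gets caught.

```python
def find_bus_violations(trace):
    """Return indices of cycles that break 'exactly one read or write per cycle'.

    trace: list of per-cycle access lists, e.g. [["read"], ["write"]]
    (a hypothetical trace format used only for illustration).
    """
    return [
        i
        for i, accesses in enumerate(trace)
        if len(accesses) != 1 or accesses[0] not in ("read", "write")
    ]


# INX: opcode fetch, then a dummy read of the following byte.
good_inx = [["read"], ["read"]]
bad_inx = [["read"], []]  # dummy read missing
print(find_bus_violations(good_inx), find_bus_violations(bad_inx))  # prints [] [1]
```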
cpu_interrupts_v2/rom_singles/4-irq_and_dma
cpu_interrupts_v2/rom_singles/5-branch_delays_irq
^ Those tests fail as a direct result of fixing the timing bug (which involves running the CPU one extra cycle before starting OAM DMA).
They are both related to IRQs caused by the frame timer at $4017. I tried repeatedly to fix them, but whenever I do, other tests fail (jitter, NMI and IRQ, even IRQ timing). I can't get them all to pass at once, which makes me think something else is going wrong.
Also, sprdma_and_dmc_dma is consistently 1 cycle too long in all 16 tests. Nothing I do makes that 1 cycle go away. I think all these issues are somehow related, but I have no idea how.
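Part of why a single extra CPU cycle is so disruptive is that OAM DMA length depends on cycle parity. The NesDev wiki describes the CPU as suspended for 513 or 514 cycles after the $4014 write: one halt cycle, one extra alignment cycle depending on whether it starts on an odd or even cycle, then 256 read/write pairs. A minimal sketch follows; which parity needs the alignment cycle is an assumption made here for illustration.

```python
def oam_dma_stall_cycles(first_dma_cycle):
    """CPU cycles stalled by OAM DMA, given the CPU cycle it begins on.

    Assumption for illustration: an odd starting cycle needs one extra
    alignment cycle before the reads and writes can pair up.
    """
    halt = 1
    alignment = 1 if first_dma_cycle % 2 == 1 else 0
    transfer = 256 * 2  # 256 bytes, each a read followed by a write
    return halt + alignment + transfer


# Starting one cycle later flips the parity, which changes the length of the
# DMA and therefore the timing of every IRQ that lands after it.
print(oam_dma_stall_cycles(1000), oam_dma_stall_cycles(1001))  # prints 513 514
```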
cpu_exec_space/test_cpu_exec_space_apu now passes with a few simple fixes. That makes 10 new tests passed, up to 90%!
I did not include the DMA timing fixes, as I still don't know what the problem is, so that 90% should be taken with a grain of salt, but still, this is progress.
Pretty much every remaining test involves the APU. This is good, since it means the rest of the core is really solid, but the APU is also the least developed part, so it will take some effort to get in order.
After days and days of repeated failure, I finally found the last piece of the puzzle needed to get sprdma_and_dmc_dma to pass in a cycle-accurate way, with a full RDY implementation and no hardcoded hacks.
Visual 2A03 saved the day here, and without it there is absolutely no way I could have figured this out. The key seems to be an overlooked feature of DMC DMA where if it is called from a write to $4015 and the buffer is empty, it will take one less cycle to complete. This 1 cycle is what had me stumped and there is no other conceivable way it could have worked out (I tried every imaginable alternative.)
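The adjustment described above can be written down as a small rule. This is a hypothetical sketch, not NesHawk's implementation: the base cost uses the commonly described DMC DMA sequence (a halt cycle, a dummy cycle, an optional alignment cycle, and the sample fetch), and the only part taken from this post is that a DMA triggered by a $4015 write with an empty sample buffer takes one cycle less.

```python
def dmc_dma_stall_cycles(needs_alignment, from_4015_write, buffer_empty):
    """Hypothetical DMC DMA stall length in CPU cycles."""
    cycles = 1  # halt cycle
    cycles += 1  # dummy cycle
    if needs_alignment:
        cycles += 1  # wait for the right get/put parity
    cycles += 1  # sample fetch ("get")
    if from_4015_write and buffer_empty:
        cycles -= 1  # the one-cycle-shorter case found with Visual 2A03
    return cycles
```

Under these assumptions an ordinary aligned fetch costs 4 cycles, while the same DMA triggered by a $4015 write into an empty buffer costs 3.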
There is still some work to be done to clean things up, but I doubt there will be any remaining tests as difficult as this one, so onward to 100%!
Joined: 4/17/2010
Posts: 11556
Location: Lake Chargoggagoggmanchauggagoggchaubunagungamaugg
Unbelievable, man!
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn them into something clear and reusable. I hate depending on the unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
With a little added logic, the other DMA test variant, sprdma_and_dmc_dma_512, now passes as well. That's 2 of the most grueling tests passing, so if I can just work out the frame IRQ stuff, there will be no more hurdles and only tuning the APU left to worry about.
I'm a little worried about the accruing performance hits, though. I'm currently at about a 20% penalty compared to 1.11.6, which on my laptop works out to about 140 fps. Hopefully it will still run at over 2x speed once I'm done, but I think it will need some serious optimization work.
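For a rough sense of the margin, assuming the NTSC NES frame rate of about 60.0988 fps: 140 fps is roughly 2.33x real-time speed, and 2x speed needs about 120.2 fps, so the remaining headroom before dropping below 2x is around 14%. The arithmetic as a small sketch:

```python
NTSC_FPS = 60.0988  # approximate NTSC NES frame rate


def speed_multiple(fps):
    """How many times faster than real time a given emulation frame rate is."""
    return fps / NTSC_FPS


def headroom_before(target_multiple, fps):
    """Fraction of the current speed that can be lost before falling below target."""
    return 1 - (target_multiple * NTSC_FPS) / fps


print(round(speed_multiple(140), 2), round(headroom_before(2, 140), 3))
```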
Once I got DMC figured out, it was relatively easy to sort out the frame timer. I got it working correctly and cleaned up the regression in DMA_and_IRQ. In the process, I also got several other tests from the same test suite passing.
So with that, there are only 10 more tests to go! Some of them seem relatively easy, so we should be tied with puNES in no time!
The only challenging ones that I am worried about are scanline (I have no idea how it's supposed to pass, and puNES doesn't pass it either) and tvpassfail, which seems like a great deal of work.
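For context, the frame timer IRQ itself comes down to a few flag rules: bit 7 of $4017 selects 4-step or 5-step mode, bit 6 inhibits the IRQ and clears the flag when set, only 4-step mode raises the flag at the end of its sequence, and reading $4015 returns the flag in bit 6 and clears it. The sketch below is a hypothetical, deliberately simplified model of those rules. Step timings, the delayed sequencer reset after a $4017 write, and the other $4015 bits are left out, and in practice the hard part is exactly the cycle timing omitted here.

```python
class FrameCounter:
    """Flag logic of the APU frame counter ($4017); step timings omitted."""

    def __init__(self):
        self.five_step = False
        self.irq_inhibit = False
        self.irq_flag = False

    def write_4017(self, value):
        self.five_step = bool(value & 0x80)    # bit 7: sequencer mode
        self.irq_inhibit = bool(value & 0x40)  # bit 6: IRQ inhibit
        if self.irq_inhibit:
            self.irq_flag = False              # setting inhibit clears the flag

    def end_of_sequence(self):
        """Called by a (hypothetical) scheduler at the last step of the sequence."""
        if not self.five_step and not self.irq_inhibit:
            self.irq_flag = True               # only 4-step mode raises the IRQ

    def read_4015(self):
        value = 0x40 if self.irq_flag else 0   # bit 6: frame interrupt (other bits omitted)
        self.irq_flag = False                  # reading $4015 clears it
        return value
```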
Joined: 4/17/2010
Posts: 11556
Location: Lake Chargoggagoggmanchauggagoggchaubunagungamaugg
It's not that much work; we can just adopt an existing NTSC filter and it will magically pass. No emulator emulates composite artifacts internally, but some shaders do, as do some old-fashioned filters.
I'd leave this task to whenever someone adopts Bisqwit's new algorithm:
http://forums.nesdev.com/viewtopic.php?f=2&t=14338
I'm going to figure out the tweaks it needs for PAL emulation and implement it in FCEUX someday.
Thanks!
As I expected, the remaining tests are going pretty smoothly; only 7 more to go!
@feos: oh wow, thanks for pointing that out, I guess the hard part is already done then!
EDIT: We're #1! Only 5 tests to go!