VBlank count is definitely not accurate. Apologies for uploading this 3 times, the website was giving me an error since the file was missing VBlankCount. I assumed it didn't go through.
This is far from optimal.
"What's with the 18 seconds of nothing on the file name screen?"
So, the way this TAS works is by sending in non-matching inputs over and over again with the SubNesHawk core (bizhawk version 2.8) to create precise amounts of lag. And it needs to be incredibly precise. At the end of any frame, there are the following instructions.
STA $2000
STA $FF
RTI
JMP $E45B
"STA $2000" Enables the Non Maskable Interrupt, so another frame can begin after that point. "STA $FF" stores a copy of the data in $2000 at $FF. "RTI" pulls 3 bytes off the stack, and returns to an infinite loop at address $E45B. "JMP $E45B" is the infinite loop. It just keeps jumping to itself. I need to create lag so precise, that it starts a new frame after "STA $2000" but before "RTI". I think this is around a 5 CPU cycle window.
Keep in mind, whenever I stall for additional inputs, it's not adding by a single CPU cycle, it adds 218 cycles. To time it just right, I need to stall for more than just one frame. If I time it just right, a new frame begins in that 5 cycle window, adding 3 more bytes to the stack. This sets it up so if I hit the "RTI" instruction on a later frame, it will immediately follow it up with a second RTI instruction. I can repeat this process as long as I want. What does this achieve?
Well, if I do it enough times I can make the stack overflow. The "Register Name" selection on the file name screen has an interesting property. Your inputs are pushed to the stack, and aren't overwritten before the end of the frame. On the frame you initially select this option, a value of $34 is pushed to the stack one byte higher than where your inputs will get stored. By overflowing the stack, writing these desired bytes, and letting the giant tower of RTI instructions all execute, you could return to address $xy34, where "xy" is whatever buttons you hold down on controller 1. The names of the save files are stored at $0638, so I could use this to begin executing from $0634 by holding down "06" which is a combination of Down and Left.
Slight issue, $0634 is set to a value of 00 if the middle file doesn't exist yet. So I need to set up that save file, return to the menu, and re-enter the file name screen. With a value of 00 it executes a BRK instruction, which jumps to the programmable interrupt. Zelda doesn't use such an interrupt, so it jumps to some code that leads to another BRK. an infinite loop. If $0634 has a value of "01", however, it's an ORA instruction, and I don't have to worry about it.
"What are you writing with the file names?"
The code I can write is extremely limited. the bytes I can write are 00 - 24, 28 - 2C, 63, and 64. Not the most useful bytes, but let's see what we can do with them. We have the ASL instructions, which let's us multiply a byte by 2. ("Arithmetic Shift Left" also tosses bit 7 into the carry flag. Also any number greater than 255 will get truncated to a single byte.) This can be used to change some of the bytes in the payload to more useful ones.
I write:
ASL $06
ASL $06
ASL $06 (Address $06 starts at a value of 0x12. shifting it thrice gives it a value of 0x90, which is "BCC" or Branch on Carry Clear.)
ASL $07 (Address $07 starts at a value of 0xFF. shifting it gives it a value of 0xFE. combined with the branch instruction, if executed this causes an infinite loop, since 0xFE is -2. it branches back 2 bytes to the start of the branch instruction.)
ASL $0622,X (X = 0x25, this shifts address $0647)
ASL $0622,X (this also shifts address $0647)
ASL $60,X (The number 60 isn't a byte that I can write. It starts as 0x18, but the two previous instructions shift it to a value of 0x60. the offset of X makes it shift address $85, which is the vertical position of the cursor for selecting characters to name the file.)
CLC (Clear the Carry flag just in case)
JSR $0006 (Jump to that infinite loop branch I made)
In short, this creates an infinite loop, moves the cursor for selecting letters to name a file with, and then jumps to that loop.
With the cursor offset, I can now grab characters from outside the bounds of the table.
"Hold up. Why are you jumping to an infinite loop?"
Remember how the game jumps to an infinite loop after regularly executing "RTI"? I'm unable to jump back there with the bytes I have to work with, so I need to create my own loop. This is the game's way of "Spinning" while we wait for the next frame.
"What about the changes you make to the save file with the new characters?"
This time, I have more bytes to work with. However, I am unable to change the name of file 2, since that file already exists. I'll need to work around it. I write:
LDA #$13 (A = 0x13)
STA $12 (Store A in $12. this wins the game)
INX (X is now 0x26)
NOP
NOP
NOP
ASL $0622,X (X = 0x26, this shifts address $0648)
ASL $0622,X (this also shifts address $0648)
ASL $60,X (This is leftover from the previous payload. it has no use in the second payload)
JMP $0648 (I can't write "JMP" so I needed to shift something else into this value. it started as 0x13. With "JMP" set up, it creates an infinite loop.)
In short, this wins the game and then loops infinitely.
"Okay... why did you run payload 1 on the name register screen, but payload 2 in game?"
Unfortunately, the code for the credits isn't loaded on the name register screen, so I had to start the game. Also, due to underflowing the stack, the game will crash after registering the new names. It does however save them, so I can simply reset the game and start from there. After loading in, I press the start button to pause and begin subframe mashing again. Once the stack is about to overflow, I press UP+A on controller 2 to bring up a menu that has the same properties as the name register menu. That's right, it pushes my inputs to the stack and never overwrites it! Once again I can jump to $0634 and execute the new file names.
This wins the game from the pause screen.
"You mentioned this is far from optimal. What can be improved?"
One glaring issue with this TAS is the several seconds of subframe mashing. First off, I can begin mashing earlier. If I subframe mash while writing the file names, or while the level is loading for payload 2, I can execute the code earlier. Second, I'm still figuring out how to optimize the subframe mash.
Right now, I have a LUA script simply alternate inputs over and over until it lands a new frame inside the 5 cycle window. This is inconsistent. Some times it takes 4 frames, other times around 20. Sometimes I have chains of 4 frames multiple times in a row, other times 20 frames multiple times in a row. It's likely due to executing the "STA $FF" (the last instruction before the next frame) on a different cycle than other times it works. Either that, or due to how a frame can take 29780 or 29781 CPU cycles before the next one begins. I've got some research to do. I imagine I might be able to save time by stalling longer than 20 frames if it can end up in another large chain of 4 frame successes. I'd also like to cut out a payload entirely, but I've been unable to win the game without offsetting the cursor. One idea I had was using link's position to write a branch-loop, but I'm unable to jump to his position with the limited 24 bytes I have to work with. I'll keep looking for ideas though.
Anyway, the combination of what was written in the above paragraph and the obvious lack of entertainment during the periods of subframe mashing is why this is currently a user file and not being submitted for publishing yet.