Use the Bizhawk SubNesHawk Core to watch. (Or the youtube link)
This has not been console verified yet, so if anybody wants to give it a go, it would be very appreciated. But you should know this run involves executing uninitialized RAM as code, (using bizhawk's 00 00 00 00 FF FF FF FF pattern) so keep that in mind. I'm hesitant to submit a Subframe Input TAS without console verification.
In this TAS, I complete the Game "Gimmick" by utilizing subframe inputs, not unlike my TAS of Super Mario Bros. 3. More info on the technical details if verified, but I'll explain a general idea of how the "route" works.
At the moment I execute RAM as code, the game has uninitialized RAM from address $F1 through $F4. This is a bit problematic for verification, but it's also the key idea of how this run works. Controller data is stored at $F5 through $F8, in pretty much the same manner as SMB3.
Circling back to the uninitialized RAM, bizhawk's "00 00 00 00 FF FF FF FF" pattern for RAM on console boot leaves address $F2 with "00" and $F4 with "FF". Both of those bytes get executed, and "FF" is a 3 byte instruction. That limits the number of "controller bytes" from 4 to 2. (Those are also the only uninitialized bytes that get executed. Gimmick! Clears the first 240 bytes of the zero page, leaving $F0 - $FF uninitialized, but by the time I execute code, the only uninitialized bytes are $F1 through $F4)
On the first frame with inputs, I 'stall' for 58 inputs, then using both controllers, I write STY $00F4. This replaces the "FF" at address $F4 with "08", as that's what was held in the Y register. This creates a PHP instruction, but what's more useful is that it's only 1 byte long. Now I have the full 4 "controller bytes"
I reset the console, and recall that address $F4 is uninitialized. That means when resetting, Gimmick doesn't clear the 08. I might go back and try to improve this TAS, but the subframe mashing to execute code seems to only work on the first frame with inputs. I need to strategically write JSR $E0F0, or 20 F0 E0 written as bytes. I need to hold down E0 (A + B + Select) on controller 1, while the new inputs are 20 (Select) so I use the first possible frame after resetting to hold down A + B. Then I reset again.
On the third and final loop, I repeat the same subframe mashing as before, and hold down the buttons that write JSR $E0F0, which wins the game when executed.
If I could find a way to offset the code so I can execute 3 controller bytes while also lining up the 20 F0 E0 for loop 2, I could save 5 frames. Likewise, if I can execute RAM after the first frame with inputs, it's likely I could cut out the second reset, saving time.