Thanks for those suggestions!
I couldn't get your first example working as-is, and I'm not sure why. You're right that the NMI reads the joypads.
I found the NMI also updates the $15 counter, so there's no need to spend byte(s) incrementing it.
This version works and is 12 bytes long, though one byte is a 00, so only 11 glitch activations would be needed. For the final program bytes to land in the right place, $15 has to start at the correct value.
Loop:
JSR $96E5 -- NMI (reads the joypads and updates $15)
LDX $15 -- load counter into X
LDA $F7 -- controller one into accumulator
STA $0100,X -- store byte at the next location ($0100 + X)
BNE Loop -- loop while the controller byte is non-zero
Program:
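For anyone wanting to double-check the byte count of the 12-byte loop, here's a rough Python sketch that hand-assembles it. The opcode values are the standard 6502 encodings; the function name `assemble_loader` is just something I made up for illustration, and the branch offset assumes `Loop` is the first byte of the routine.

```python
# Standard 6502 opcodes for the addressing modes used in the loop.
JSR_ABS, LDX_ZP, LDA_ZP, STA_ABS_X, BNE_REL = 0x20, 0xA6, 0xA5, 0x9D, 0xD0


def assemble_loader():
    """Hand-assemble the 12-byte loader loop (hypothetical helper)."""
    code = bytearray()
    code += bytes([JSR_ABS, 0xE5, 0x96])    # JSR $96E5 (address is little-endian)
    code += bytes([LDX_ZP, 0x15])           # LDX $15
    code += bytes([LDA_ZP, 0xF7])           # LDA $F7
    code += bytes([STA_ABS_X, 0x00, 0x01])  # STA $0100,X
    # BNE offset is relative to the address right after the 2-byte branch.
    # Assumption: Loop is at offset 0 of this routine.
    offset = (0 - (len(code) + 2)) & 0xFF
    code += bytes([BNE_REL, offset])        # BNE Loop
    return bytes(code)


loader = assemble_loader()
print(loader.hex(" ").upper())  # 20 E5 96 A6 15 A5 F7 9D 00 01 D0 F4
print(len(loader), "bytes,", loader.count(0), "of them zero")
```

The single 00 is the low byte of the $0100 operand, which is why only 11 of the 12 bytes would need a glitch activation.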
Your second suggestion is intriguing for implementing the staging program:
LDA $15 -- load the counter value (2 bytes)
STA $01xx -- store it at the target byte (3 bytes)
JSR $897b -- (3 bytes)
That's only 8 bytes, so it should be doable using a combination of the x and y sprite locations. It may end up being preferable to use the $15 counter to define each byte here, since not all values 0-255 are actually available via the controller: up + down and left + right are not recognized, independent of emulator settings. I don't think that limitation would be a show-stopper for writing the final program, since there's a lot more flexibility with length there.
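To put a number on the controller limitation, here's a small Python sketch that enumerates which byte values remain reachable once up + down and left + right are ruled out. The bit layout below (A = $80 down to Right = $01) is my assumption of the usual NES ordering, not something I've confirmed for $F7; the total count doesn't depend on it, but which specific values are missing does.

```python
# Assumed bit layout for the controller byte (not confirmed for $F7).
A, B, SELECT, START, UP, DOWN, LEFT, RIGHT = (
    0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01,
)


def reachable_values():
    """Return every byte that avoids up+down and left+right together."""
    values = set()
    for bits in range(256):
        if bits & UP and bits & DOWN:
            continue  # opposing vertical directions are not recognized
        if bits & LEFT and bits & RIGHT:
            continue  # opposing horizontal directions are not recognized
        values.add(bits)
    return values


ok = reachable_values()
print(len(ok))     # 144 (16 button combos x 3 vertical x 3 horizontal)
print(0x60 in ok)  # True:  B + Select under the assumed layout
print(0x4C in ok)  # False: would need Up + Down under the assumed layout
```

So 112 of the 256 values can't be entered directly, which is where using the $15 counter to supply the byte could help.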
Edit:
Got a 9-byte version working, short enough that it should be possible to implement with a single invocation of the 7-1 glitch!
Loop:
JSR $96E5 -- NMI
LDA $F7 -- load controller 1
STA ($15),Y -- store to zero page through the pointer at $15; the $15 counter advances it to the next byte
-- Y never changes
BNE Loop -- loop while controller 1 input is non-zero
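Here's a rough Python model of how the 9-byte loop lays bytes down, in case it helps with planning the starting value of $15. It leans on several assumptions that I've only partly verified: the NMI call bumps $15 by exactly one and latches the pad into $F7, $16 holds 00 (so the pointer stays in zero page), and Y is 0. `run_loader` and its parameters are names I made up for the sketch.

```python
def run_loader(inputs, counter_start, y=0, pointer_high=0x00):
    """Model the 9-byte loop; returns a list of (address, value) writes."""
    mem = bytearray(0x10000)
    mem[0x15] = counter_start
    mem[0x16] = pointer_high  # assumption: high byte of the pointer is 00
    writes = []
    for pad in inputs:
        # JSR $96E5: assumed to add one to $15 and store the pad in $F7.
        mem[0x15] = (mem[0x15] + 1) & 0xFF
        mem[0xF7] = pad
        a = mem[0xF7]                         # LDA $F7
        pointer = mem[0x15] | (mem[0x16] << 8)
        target = (pointer + y) & 0xFFFF       # STA ($15),Y
        mem[target] = a
        writes.append((target, a))
        if a == 0:                            # BNE Loop falls through on zero
            break
    return writes


for address, value in run_loader([0xA9, 0x01, 0x00], counter_start=0x7F):
    print(f"${address:04X} <- ${value:02X}")
```

With $15 starting at $7F, the three inputs land at $0080, $0081 and $0082, and the zero byte ends the loop. Note that the store goes through the same memory the counter lives in, so a write landing on $15 or $16 would redirect the pointer.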