This post is supplementary material for this submission, although it is useful here as well.
----
Here's an analytical look at my TAS.
It is known that program execution in Pokemon Yellow can be hijacked with nothing more than a 7-byte sequence with 13 zeroes preceding it. One such 7-byte sequence is:
D368:
76 HALT
F0 F5 LDH A (FFF5)
22 LDI (HL) A
C3 5B D3 JP D35B
We'll get into how it works later.
For the purposes of this TAS, the following 9-byte sequence takes less time to enter (and requires D350-D365 to be all 0):
D366:
22 LDI (HL) A
00 NOP
76 HALT
00 NOP
F0 F5 LDH A (FFF5)
D4 50 D3 CALL NC D350
Why? Well, go back to when the rival was named (space) (female) (PK) (END). Memory in D343 at that point is:
D343: 00 00 00 00 30 00 7F F5 E1 50 00
We want to change it to the 9-byte sequence somehow, so we do this:
* Reset in the middle of saving. This overwrites D162 with FF and the game thinks you have 255 Pokemon. More importantly, you can now switch Pokemon in slots beyond the 6th. Such switches swap large chunks of memory.
* Switch any Pokemon 1-9 with the 10th Pokemon. This overwrites D31C with FF and the game thinks you have 255 items.
* Switch the 17th and 20th Pokemon. Apart from other stuff going on, this will swap D322-D32C with D343-D34D. So now D322, which is close to the beginning of the item list, reads:
D322: 00 00 00 00 30 00 7F F5 E1 50 00
or from the beginning of the item list:
D31D: FF FF FF FF FF 00 00 00 00 30 00 7F F5 E1 50 00
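None of the "items" cause problems when trying to toss them (some values crash the game, or are untossable). The list consists of (ID, quantity) byte pairs, so the quantities sit at the even addresses (D31E, D320, D322, ...). Each of them can be reduced by tossing items, and a quantity of 00 is treated as 0x100, so tossing one gives 0xFF. In the steps below, an item is named by the address of its ID byte, and the byte that changes is the quantity right after it.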
We now do this:
* Toss D321 completely. This shifts it so it looks like this:
D31D: FF FF FF FF 00 00 00 30 00 7F F5 E1 50 00 00 00
* Toss 14 of D323. This changes 0x30 to 0x22.
* Toss 9 of D325. This changes 0x7F to 0x76.
* Toss 13 of D327. This changes 0xE1 to 0xD4.
* Toss 45 of D329. This changes the 0x00 directly after 0x50 to 0xD3.
D31D: FF FF FF FF 00 00 00 22 00 76 F5 D4 50 D3 00 00
* Switch D32B and D329. This switches the last 00 00 with 50 D3.
* Switch D329 and D327. This switches the same 00 00 with F5 D4.
D31D: FF FF FF FF 00 00 00 22 00 76 00 00 F5 D4 50 D3
* Toss 16 of D327. This changes the 0x00 at D328 to 0xF0.
D31D: FF FF FF FF 00 00 00 22 00 76 00 F0 F5 D4 50 D3
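The toss and switch steps above can be replayed on the 16 bytes shown in the dumps. This is only a sketch: the helper names `toss`, `toss_all` and `switch` are made up for illustration, it models nothing but the byte arithmetic, and it assumes the two bytes shifted in after a full toss are 00, as the dump in the text shows.

```python
BASE = 0xD31D
# Item list bytes from D31D, as given after the 17th/20th Pokemon switch.
mem = bytearray.fromhex("FF FF FF FF FF 00 00 00 00 30 00 7F F5 E1 50 00")

def toss(mem, item_addr, count):
    """Subtract count from the quantity byte that follows the ID at item_addr."""
    q = item_addr + 1 - BASE
    mem[q] = (mem[q] - count) % 256   # a quantity of 0x00 behaves like 0x100

def toss_all(mem, item_addr):
    """Drop the two-byte slot at item_addr; later slots shift down by two."""
    i = item_addr - BASE
    del mem[i:i + 2]
    mem.extend(b"\x00\x00")           # assumption: zeros are shifted in

def switch(mem, addr_a, addr_b):
    """Swap two (ID, quantity) slots."""
    a, b = addr_a - BASE, addr_b - BASE
    mem[a:a + 2], mem[b:b + 2] = mem[b:b + 2], mem[a:a + 2]

toss_all(mem, 0xD321)
toss(mem, 0xD323, 14)     # 0x30 -> 0x22
toss(mem, 0xD325, 9)      # 0x7F -> 0x76
toss(mem, 0xD327, 13)     # 0xE1 -> 0xD4
toss(mem, 0xD329, 45)     # 0x00 -> 0xD3
switch(mem, 0xD32B, 0xD329)
switch(mem, 0xD329, 0xD327)
toss(mem, 0xD327, 16)     # 0x00 -> 0xF0

print(mem.hex(" ").upper())
```

The last nine bytes of the result are the 9-byte sequence, sitting at D324-D32C.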
Notice that we have our 9-byte sequence above starting from D324, but it needs to go to D366. So we do some Pokemon switches:
* Switch the 19th and 17th Pokemon. This swaps D322-D32C with D338-D342.
* Switch the 12th and 11th Pokemon. This swaps D322-D34D with D34E-D379 so now D338-D342 ends up at D364-D36E, exactly where we want it. Notice that the switching ensures that D350-D363 are all 0.
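To see where the sequence lands, the two switches can be replayed using only the address ranges quoted above. This is a rough sketch: `swap_ranges` is a made-up helper, and the memory around the payload is an all-zero stand-in rather than the game's real contents, so it checks the final address and not the claim about D350-D363.

```python
def swap_ranges(mem, start_a, start_b, length):
    """Swap two equally long, non-overlapping byte ranges."""
    for i in range(length):
        mem[start_a + i], mem[start_b + i] = mem[start_b + i], mem[start_a + i]

mem = bytearray(0x10000)                       # stand-in address space, all zero
payload = bytes.fromhex("22 00 76 00 F0 F5 D4 50 D3")
mem[0xD324:0xD324 + len(payload)] = payload    # where the item edits left it

swap_ranges(mem, 0xD322, 0xD338, 11)   # 19th <-> 17th: D322-D32C with D338-D342
swap_ranges(mem, 0xD322, 0xD34E, 44)   # 12th <-> 11th: D322-D34D with D34E-D379

start = mem.find(payload)
print(hex(start))
```

The payload moves by 0x16 in the first swap and by 0x2C in the second, which takes D324 to D366.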
Now close the menu. This is what happens:
* At some point, the game reads the pointer stored at D36D-D36E, which is supposed to hold a ROM address, and jumps to it. That pointer is now D350. The value in register A is 0x50, and HL holds 0xD350, the location of the jump (note that the instruction 00 is NOP, which does nothing).
D350: 00 00 ... 00 22 00 76 00 F0 F5 D4 50 D3
The instruction LDI (HL),A (22) writes whatever is in A to the address pointed to by HL, then increments HL. When execution reaches HALT (76), it waits for the next frame.
D350: 50 00 00 ... 00 22 00 76 00 F0 F5 D4 50 D3
HALT itself only waits; the interrupt handler that wakes it runs a routine that puts the key input into FFF5 as a number. The instruction LDH A,(FFF5) (F0 F5) loads this number into A. Then CALL NC,D350 (D4 50 D3), which is taken when the carry flag is clear, jumps back to D350 (the return address it pushes is of no interest here) and the cycle repeats. Thus, by running LDI (HL),A over and over, with LDH A,(FFF5) reading the input in between, we can write a program starting at D350, one byte per cycle. However, since the code at D350 is executed on every cycle, every partially written state of the program must be safe to run.
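The write cycle can be modelled in a few lines. This is a simplified sketch, not an emulator: `bootstrap_writes` is a made-up name, and the model ignores the fact that the partially written code at D350 is also executed on each pass.

```python
def bootstrap_writes(inputs, first_a=0x50, start=0xD350):
    """Return {address: byte} stored by the LDI (HL),A cycle.

    Each pass stores the value currently in A at HL, increments HL, and then
    loads the next input into A, so an input byte is stored one pass after
    it is read.
    """
    written = {}
    hl, a = start, first_a
    for value in inputs:
        written[hl] = a      # 22: LDI (HL),A
        hl += 1
        a = value            # 76, F0 F5: HALT, then LDH A,(FFF5)
    return written

result = bootstrap_writes([0x18, 0x0F, 0x16])
print({hex(k): hex(v) for k, v in result.items()})
```

With the inputs 0x18, 0x0F, 0x16, the bytes 50 18 0F end up at D350-D352 and the 0x16 is still waiting in A for the next pass.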
The program at D350 reads:
50 LD D B
which is harmless. We feed the input 0x18 and now it reads:
50 LD D B
18 00 JR 0
JR 0 means "jump relative by 0", which doesn't go anywhere. We feed the input 0x0F and now it reads:
50 LD D B
18 0F JR D362
Notice the significance of the last instruction. It jumps 15 bytes forward to D362, which is still before the instruction LDI (HL),A at D366. That means we can now enter anything for the next 15 bytes and execution will skip over them. We now write:
50 LD D B
18 0F JR there
16 98 LD D 0x98
21 6F D3 LD HL D36F
input:
76 HALT
F0 F5 LDH A (FFF5)
22 LDI (HL) A
15 DEC D
CA 6F D3 JP Z D36F
18 F6 JR input
there:
Feed the input 0x18:
50 LD D B
18 0F JR there
16 98 LD D 0x98
21 6F D3 LD HL D36F
input:
76 HALT
F0 F5 LDH A (FFF5)
22 LDI (HL) A
15 DEC D
CA 6F D3 JP Z D36F
18 F6 JR input
there:
18 00 JR 0
Again, when jumping to the label "there", the instruction 18 00 does nothing because it is a jump by 0. Now feed the input 0xEF:
50 LD D B
18 0F JR there
enter:
16 98 LD D 0x98
21 6F D3 LD HL D36F
input:
76 HALT
F0 F5 LDH A (FFF5)
22 LDI (HL) A
15 DEC D
CA 6F D3 JP Z D36F
18 F6 JR input
there:
18 EF JR enter
Execution now jumps to "there", which then jumps to "enter", executing the program we set up. This is the stage 2 program, a simple RAM writer that writes 152 bytes to D36F and then executes it. It is left to the reader to verify this.
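As a partial check, the relative jumps in the stage 2 bytes can be resolved mechanically. The sketch below assumes the program starts at D350, as in the listings; `jr_target` is a made-up helper that applies the usual rule for JR (address of the next instruction plus the signed offset byte).

```python
STAGE2 = bytes.fromhex(
    "50"        # D350: LD D,B
    "18 0F"     # D351: JR there
    "16 98"     # D353 (enter): LD D,0x98
    "21 6F D3"  # D355: LD HL,D36F
    "76"        # D358 (input): HALT
    "F0 F5"     # D359: LDH A,(FFF5)
    "22"        # D35B: LDI (HL),A
    "15"        # D35C: DEC D
    "CA 6F D3"  # D35D: JP Z,D36F
    "18 F6"     # D360: JR input
    "18 EF"     # D362 (there): JR enter
)

def jr_target(addr, mem, base=0xD350):
    """Target of the two-byte JR located at addr."""
    offset = mem[addr - base + 1]
    if offset >= 0x80:          # offsets are signed 8-bit values
        offset -= 0x100
    return addr + 2 + offset

for addr in (0xD351, 0xD360, 0xD362):
    print(hex(addr), "->", hex(jr_target(addr, STAGE2)))
```

The three jumps resolve to D362 ("there"), D358 ("input") and D353 ("enter"), and the counter value 0x98 is 152 in decimal.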
Stage 3 is now constructed, and it is large. Here it is:
21308F LD HL 8F30 //where to place data in Tile Data Table
01D2D3 LD BC data //where 8x8 data is
1628 LD D 28 //size of 8x8 data
loop:
0A LD A,(BC)
22 LDI (HL),A
22 LDI (HL),A
03 INC BC
15 DEC D
20F9 JR NZ loop
2100D6 LD HL D600
3E10 LD A #10
loop1.5:
22 LDI (HL),A //clear mirror
CB44 BIT 0,H
28FB JR Z loop1.5 //loop until D700
1E00 LD E 00 // register DE is offset of BTM
outerloop:
76 HALT
F0F5 LDH A,(FFF5)
FEEF CP A,#EF
2869 JR Z exit
FEEE CP A,#EE
282B JR Z refresh
3016 JR NC numbers
4F LD C A
CB61 BIT 4,C
2804 JR Z sprite1
3EF4 LD A #F4 //pirev
1802 JR skip
sprite1:
3EF3 LD A #F3 //pi
skip:
CBA1 RES 4,C
0600 LD B 0
2100D6 LD HL D600 //start of lower BTM mirror (starting from 9980)
09 ADD HL,BC
77 LD (HL),A
18DD JR outerloop
numbers:
210298 LD HL 9802 //start of BTM+2
19 ADD HL,DE
77 LD (HL),A //write "Fx" number to BTM
13 INC DE
CB63 BIT 4,E
28D3 JR Z outerloop
7B LD A E
C610 ADD A,#10
5F LD E A
30CD JR NC outerloop
14 INC D
18CA JR outerloop
refresh:
218299 LD HL 9982 //start of lower BTM
0100D6 LD BC D600
loop2: //loop for 256 bytes
0A LD A,(BC)
22 LDI (HL),A
3E10 LD A #10
02 LD (BC),A //clear mirror
03 INC BC
CB40 BIT 0,B
28F6 JR Z loop2 //that means B is still D6
18B8 JR outerloop
data:
FF 81 5B DB DB DB DB B9 //f3 pi
FF 3B B7 B7 B7 B7 B5 03 //f4 pirev
00 00 00 00 00 30 30 00 //f5 .
00 38 4C C6 C6 64 38 00 //f6 0
00 18 38 18 18 18 7E 00 //f7 1
exit:
F3 DI //disable interrupts, such as current music
218099 LD HL 9980
3EF3 LD A #F3 //pi
loop3:
22 LDI (HL), A
CB54 BIT 2,H //loop until 9C00
28FB JR Z loop3
done:
18FE JR done
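The data rows in the listing hold one byte per pixel row, and the copy loop stores each byte twice. Game Boy tiles use two bytes (two bitplanes) per row, so doubling a byte makes every pixel either the lowest or the highest of the four shades. The sketch below, with made-up helper names, expands a tile the same way and prints its bit pattern with the most significant bit on the left.

```python
PI_TILE = bytes.fromhex("FF 81 5B DB DB DB DB B9")   # the "f3 pi" row of data

def expand_2bpp(rows):
    """Duplicate each row byte, as the two LDI (HL),A in the copy loop do."""
    out = bytearray()
    for b in rows:
        out += bytes([b, b])
    return bytes(out)

def render(rows):
    """One text line per row: '#' for a set bit, '.' for a clear bit."""
    return ["".join("#" if b & (0x80 >> x) else "." for x in range(8))
            for b in rows]

print(expand_2bpp(PI_TILE).hex(" ").upper())
print("\n".join(render(PI_TILE)))
```

Five tiles of 8 bytes each give the 0x28 bytes counted in register D, and each expands to the 16 bytes a tile occupies in VRAM.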
It works as such:
* First, it draws out the 8x8 tiles for pi, pi upside down ("pirev"), decimal point, 0, and 1. All other digits are already in the tileset.
* Then, it takes input and changes the VRAM accordingly. Input is as follows:
** If input is from 0x00 to 0xED, the program draws to a cache which can later be dumped into VRAM at 9982. The drawing field is 8 rows by 16 columns. The input byte is read as the bit pattern aaacbbbb, where aaa is the row, bbbb is the column, and c=0 for the pi tile and c=1 for the pirev tile. The choice of format is motivated by the fact that going down a row in VRAM is the same as adding 0x20=32 to the address.
** If input is 0xEE, the program dumps the cache into 9982, and then clears the cache using the black tile (value 0x10).
** If input is 0xEF, the program executes its ending sequence.
** If input is 0xF0-0xFF, the program writes directly to 9802, with the drawing field being 12 rows by 16 columns. The input's own number is written in as the value of the tile; all tiles of interest are in the Fx row. Tiles are written serially from left to right, then from top to bottom. Technically, the program can draw into the bottom drawing field used by the first case of input above.
* Its ending routine is a "fake crash" that disables interrupts, floods the bottom screen with pi tiles, then gets itself into an infinite loop (18 FE, or JR -2).
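The input handling described above can be restated as a small decoder. It is a sketch of the behaviour as described, not a translation of every instruction; `decode` and `digit_addresses` are made-up names.

```python
def decode(value):
    """Classify one stage 3 input byte the way the outer loop does."""
    if value == 0xEF:
        return ("exit",)
    if value == 0xEE:
        return ("dump",)
    if value >= 0xF0:
        return ("digit", value)              # tile number is written as-is
    tile = 0xF4 if value & 0x10 else 0xF3    # bit 4 picks pirev or pi
    offset = value & 0xEF                    # RES 4,C: offset into the D600 cache
    row, col = offset >> 5, offset & 0x0F
    return ("cache", row, col, tile, 0x9982 + offset)

def digit_addresses(count):
    """VRAM addresses used by successive 0xF0-0xFF inputs."""
    de, out = 0, []
    for _ in range(count):
        out.append(0x9802 + de)
        de += 1
        if de & 0x10:       # BIT 4,E set: column 15 was just written
            de += 0x10      # move on to the next 0x20-byte row
    return out

print(decode(0x7A))
print([hex(a) for a in digit_addresses(17)[-2:]])
```

For example, 0x7A decodes to row 3, column 10 with the pirev tile, which the dump later places at 99EC. The 16th and 17th direct writes go to 9811 and 9822, which shows the wrap to the next row.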
And that's about it.
There is no guarantee that VRAM writing occurs in a manner that is safe on a real GB; the program does not check the LCD registers FF40-FF41 (for reference, pages 51-53 of
http://marc.rawer.de/Gameboy/Docs/GBCPUman.pdf ).
----
The C++ parser works as follows:
* It recognizes the characters "0123456789abcdef|*s@." as well as uppercase variants. All other characters are delimiters.
* If it first detects a character from "0123456789abcdef", it expects a second character from "01234567" immediately after it. This is for drawing to the cache representing VRAM at 9982. The first character is the column and the second character is the row. If the pair is immediately followed by '*', the pirev tile is used; otherwise the pi tile is used.
* If it first detects the character '|', it counts to 15, inserting 0xED inputs along the way (the do-nothing-visible input), then inserts 0xEE (dump cache). It also resets the count to 1.
* If it first detects the character '@', it counts to 8, inserting 0xED inputs along the way (the do-nothing-visible input), then inserts 0xEE (dump cache).
* If it first detects 's', it expects a second character from ".0123456789" immediately after it. This is for drawing directly to VRAM at 9802, using the tile that corresponds to the second character.
The parser does not insert the ending code (0xEF); it must be added manually with a hex editor. Also, the number of inputs between 0xEE (dump cache) commands should not exceed the applicable count, 15 or 8.
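The actual parser is written in C++ and is not shown here. The Python sketch below only illustrates how the two drawing tokens could map to input bytes, following the aaacbbbb format and the Fx tile numbers from the stage 3 description. The function names are made up, the mapping of '2'-'9' to F8-FF is an assumption (the text only places '.', '0' and '1' at F5-F7), and the 0xED padding done for '|' and '@' is not modelled.

```python
def encode_cell(token):
    """'<column hex digit><row 0-7>', optionally followed by '*', to an input byte."""
    col = int(token[0], 16)
    row = int(token[1], 8)            # raises ValueError for rows 8 and 9
    pirev = token[2:3] == "*"
    return (row << 5) | (0x10 if pirev else 0) | col

def encode_direct(token):
    """'s.' or 's<digit>' to the tile number written directly to VRAM."""
    ch = token[1]
    if ch == ".":
        return 0xF5
    return 0xF6 + int(ch)             # assumption: digits 0-9 occupy F6-FF

print(hex(encode_cell("a3*")), hex(encode_direct("s1")))
```

Under these assumptions, "a3*" becomes 0x7A (column 10, row 3, pirev) and "s1" becomes 0xF7.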
----
Edit: Fixed pointer to data.