I have a lot of respect for the incredible work that both Chef Stef (and Baxter) did in the previous movies. We all took very different routes and improved on each other. Not much can be added to their detailed explanations, except to talk about what I did differently.
Although I use a bot, I take different approach than Chef Stef:
They built a C-based reverse engineered version of the game, yielding enourmous speedups compared to a using a lua on top of bizhawk. My approach, simply compiling my bot against quickNES (C++) is a bit less sophisticated, but by no means slow. The fact that it took them a year of execution on a 6-core machine but took me a month to do this one on a 64-core machine gives an idea of the relative performances.
They built a decision tree based on the 8 possible angles a ball can take, I went full brute, allowing any x position for the paddle when a ball is in the 'hittable' layer. To compensate for the impossibly large exploration space, I pruned the action space to either pressing "L" or "R" -- standing still was not an option. This paid off pretty nicely, I must say.
The main difficulty during making this movie was the bonus score screen. After finishing some levels, you get 'punished' with extra score -- I believe this has to do with the amount of enemies you killed, but so far haven't been able to decode it (nor did I spend much time to research it either).
Future Work
Before starting this project I tried to get in contact with Chef Stef via DM (and Discord). I had no luck, but also didn't try any further. This is the thing: their approach to this game and mine are not entirely mutually exclusive; if we were to join forces in the future, we could even be talking about optimality here.
So here it is: Chef Stef, if you are reading this, HMU and let's kill this game once and for all.
Re-Record Count
The re-record (load state, change input, advance state) count of this movie is: 2,529,914,332,046. I'd like to ask for this value be properly represented in the publication. I'll keep the execution logs for a while if someone wants to check them out.
Great work. I am a big fan of this kind of TAS optimization. I appreciate the comparison video.
It's not clear to me why these two approaches should produce different output—unless the simulation in [3943] NES Arkanoid "warpless" by Chef Stef in 11:11.18 made an unjustified assumption somewhere and was not completely representative. Do you have a feeling for what the difference might be? For example, is it true that there are exactly 8 possible angles, or did JaffarPlus find a possibility that the simulator in 3943M did not?
I suppose the best explanation would come from re-running the simulator from 3943M on a round where the strategies differ, find the point at which the simulator considered (a prefix of) the particular strategy used in this submission, and then figure out why that strategy was rejected. Or, if the simulator never considered the strategy used in this submission, find out what prevented it from exploring that part of the input space.
Or does it come down to different scores resulting in different RNG outputs?
Round 24 is a good example. As far as I can tell, the ball launch and ball movement are identical, up until this submission collects the multiball powerup 1 or 2 frames earlier. But according to #6347: Chef_Stef's NES Arkanoid "warpless" in 11:11.18, variable delay before collecting a powerup was one of the things considered by the simulator. ("There's a 10-frame window where the powerup is collect-able. Collecting it on different frames lets us manipulate when the single ball turns into a multiball, which affects how each ball can be bounced after that.") So why did the earlier simulator not find the faster solution found here?
In Round 32, this submission loses a ball, while 3943M pruned any path that lost a ball. That's one concrete difference that could affect optimization, but it doesn't come into play in every round.
It took me a while to realize that the score tally is what makes Round 23 slower. This submission clears all the blocks more than a second faster than 3943M, but the score tally makes it slightly slower overall. The old 3943M actually earns more points than this submission does (77220 versus 77020, because of 1 additional enemy kill I believe). My impression is that this round is score-limited rather than block-limited: 3943M breaks a few more blocks early and then takes time to clean up the rest, but that extra time is mostly "free" because it has to wait for the score to catch up anyway; whereas this submission starts slower and then breaks a large burst at the end, so less of the score tally runs concurrently with gameplay.
Joined: 11/14/2014
Posts: 933
Location: South Pole, True Land Down Under
You blow me away. I thought that Stef Chef ended this for good. It is amazing to hear that it could go even further. I applaud you on this effort.
Yes vote!
I recently discovered that if you haven't reached a level of frustration with TASing any game, then you haven't done your due diligence.
----
SOYZA: Are you playing a game?
NYMX: I'm not playing a game, I'm TASing.
SOYZA: Oh...so its not a game...Its for real?
----
Anybody got a Quantum computer I can borrow for 20 minutes?
Nevermind...eien's 64 core machine will do. :)
----
BOTing will be the end of all games. --NYMX
Here's a picture of what I mean. I wrote a Lua script to record, for each frame, the number of blocks remaining (called remainingBlocks in jaffarPlus, blue in the graph), as well as the 6-element array at 0x037c..0x0381 that represents the score increment; i.e., that points that are waiting to be tallied into the player's score (orange in the graph). Even though 3943M takes longer to break all the blocks, it is overall slightly faster. By breaking blocks at a slower, but more consistent rate, it keeps the score increment more under control. The maximum score increment in Round 23 in 3943M is 13010; here in 8315S it is 33040.
The way scoring works is, when you earn points, they don't go directly into your score (the 6-element array at 0x0370..0x0375). Instead the points go into the score increment, and from there they move into your score, a little each frame. Here's an excerpt of the CSV log from 3943M to show how the transfer works:
frame
round
remaining_blocks
p1_score
score_increment
2006
2
63
007820
000470
2007
2
63
007830
000460
2008
2
63
007840
000450
2009
2
63
007850
000440
2010
2
63
007860
000430
2011
2
63
007870
000420
2012
2
63
007880
000410
2013
2
63
007890
000400
2014
2
63
007990
000300
2015
2
63
008090
000200
2016
2
63
008190
000100
2017
2
63
008290
000000
The score increment counts down by tens as long as the tens digit is nonzero, then it counts down by hundreds until the whole thing is zero. So, for example, an increment of 1350 takes 13 + 5 = 18 frames to tally into the player's score. A side effect of this method is that sometimes earning points can decrease the amount of time it takes to tally the score increment. For example, an increment of 250 takes 7 frames to tally, whereas an increment of 300 takes only 3 frames.
I don't know the internals of jaffarPlus, but would it make sense to add something like scoreIncrement/100 + scoreIncrement/10%10; into the reward in getStateReward, to make the bot take score increment tally delay into account?
Here's a picture of what I mean. I wrote a Lua script to record, for each frame, the number of blocks remaining (called remainingBlocks in jaffarPlus, blue in the graph), as well as the 6-element array at 0x037c..0x0381 that represents the score increment; i.e., that points that are waiting to be tallied into the player's score (orange in the graph). Even though 3943M takes longer to break all the blocks, it is overall slightly faster. By breaking blocks at a slower, but more consistent rate, it keeps the score increment more under control. The maximum score increment in Round 23 in 3943M is 13010; here in 8315S it is 33040.
The way scoring works is, when you earn points, they don't go directly into your score (the 6-element array at 0x0370..0x0375). Instead the points go into the score increment, and from there they move into your score, a little each frame. Here's an excerpt of the CSV log from 3943M to show how the transfer works:
frame
round
remaining_blocks
p1_score
score_increment
2006
2
63
007820
000470
2007
2
63
007830
000460
2008
2
63
007840
000450
2009
2
63
007850
000440
2010
2
63
007860
000430
2011
2
63
007870
000420
2012
2
63
007880
000410
2013
2
63
007890
000400
2014
2
63
007990
000300
2015
2
63
008090
000200
2016
2
63
008190
000100
2017
2
63
008290
000000
The score increment counts down by tens as long as the tens digit is nonzero, then it counts down by hundreds until the whole thing is zero. So, for example, an increment of 1350 takes 13 + 5 = 18 frames to tally into the player's score. A side effect of this method is that sometimes earning points can decrease the amount of time it takes to tally the score increment. For example, an increment of 250 takes 7 frames to tally, whereas an increment of 300 takes only 3 frames.
I don't know the internals of jaffarPlus, but would it make sense to add something like scoreIncrement/100 + scoreIncrement/10%10; into the reward in getStateReward, to make the bot take score increment tally delay into account?
Wow, thanks so much for doing this research. I didn't know this tally mechanism existed and is what caused the delay at the end of the stage (and for some egregiously so).
I'll try to factor this into the reward function next time. Hopefully that will push the bot to pursue solutions that, albeit slower in the destruction of blocks, leads to an earlier tally finish.
Typically, we don't include botted re-record counts and we just put "Unknown" as the re-record count in such cases. We also have an ongoing topic about tracking re-record counts, where no conclusion was reached thus far. Are you fine with me mentioning an unknown re-record count in this case as well?