Posts for Sand

1 2
6 7
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
#9396: dwangoAC, ephphatha, staphen & AJenbo's Windows Diablo in 04:10.31 has this interesting note:
Diablo uses a type of pseudo-random number generator called a Linear Congruential Generator (LCG). … Each dungeon seed is picked by advancing the RNG state then treating the 32 bit state as a signed integer value and transforming it into a positive integer value between 0 and 231 using the C standard library function abs() (yielding a 31 bit seed[6]). [6]: Plus an extra value; because the absolute value of -231 cannot be represented as a positive signed 32 bit integer, Diablo ends up using this value as-is.
I found more information at The Cutting Room Floor:
Very, very rarely, the random number generator can generate a negative number when the code expects a number to fall within a random range of positive numbers. The issue lies with a broken implementation of Borland's random function (as listed in the Hellfire source code):
//***************************************************************************
//***************************************************************************
long GetRndSeed() {
	SeedCount++;
	static const DWORD INCREMENT = 1;
	static const DWORD MULTIPLIER = 0x015a4e35L;
	sglGameSeed = MULTIPLIER * sglGameSeed + INCREMENT;
	return abs(sglGameSeed);
}


//***************************************************************************
//***************************************************************************
long random(byte idx,long v) {
	if (v <= 0) return 0;

	// high order bits are more "random" than low order bits
	if (v < 0x0ffff) return (GetRndSeed() >> 16) % v;

	return GetRndSeed() % v;
}
The problem is with the call abs(sglGameSeed). The evident intention is that GetRndSeed should never return a negative number; which is why the function takes an absolute value before returning. But there is one case where the output of abs may be negative. When sglGameSeed has the value 1457187811, the calculation MULTIPLIER * sglGameSeed + INCREMENT results in −2147483648; i.e., −231; i.e., 0x80000000, the most negative signed 32-bit integer. Because this value has no positive counterpart in two's complement arithmetic, the call abs(sglGameSeed) returns the same negative value, −2147483648. This is the only case where GetRndSeed returns a negative value, and this is the only negative value it may return. When this happens, a call to random will return a value that is ''non-positive''. When v evenly divides −2147483648 (i.e., when v is a power of 2), the return value will be 0; otherwise it will be negative. To put it more plainly: if the game needs to generate a random number within a range (and the upper limit of the range is not a power of 2), there is a very small chance that the number generated will actually be a negative number, instead of a number within that range. This can wreak havoc in the game, causing anything from memory corruption to random sound effects being played. Since the chances of this happening are incredibly small and the effects are so random, it's unlikely that this has been observed more than a handful of times over the years.
The idea of an out-of-bounds write is intriguing, because that is the kind of thing that arbitrary code execution exploits and credits warps are made of. If the code is trying to write into a random index of an array of size 10, for example, with the correct RNG seed the game will write to index −8 instead. You would need to find a place in the code where random is called and its return value used in the address of a subsequent write, with either the value written or the modulus v being usefully controllable. You'd likely get only one chance, unless there is way to re-seed the RNG, because it's only one point in the RNG's period where this happens.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Do you have the start, end coordinates for each race in a convenient machine readable format? 150 vertices is few enough to at least approximate a solution to traveling salesman. I was looking at traveling salesman solvers recently. (Even though it turned out not to be necessary for the problem I was looking at.) For this problem, you set up a directed graph over 150 vertices. The weight of the edge between vertex i and vertex j is the distance from the end of race i to the start of race j. wij = distance(Ri.end, Rj.start) To take resets in to account, you'll need to figure out some conversion factor from distance on the map to time (i.e., seconds), and measure the reset_time in seconds. Then the weight between vertex i and vertex j is the smaller of the no-reset option (the one above) and the reset option. wij = min( distance_to_time × distance(Ri.end, Rj.start), distance_to_time × distance(Ri.start, Rj.start) + reset_time ) Here's an example of approximating a solution using NetworkX in Python. I used the pixel coordinates of the first few races from the map you posted. You'll have to fill in the full table of coordinates. There's documentation on traveling_salesman_problem and simulated_annealing_tsp.
Language: python

import collections import math import random import networkx import networkx.algorithms.approximation.traveling_salesman as TSP Point = collections.namedtuple("Point", ("x", "y")) Race = collections.namedtuple("Race", ("label", "start", "end")) RACES = ( Race(1, Point(1263, 1724), Point(6888, 556)), Race(2, Point(1400, 1775), Point(7555, 979)), Race(3, Point(1477, 1798), Point(1728, 1715)), # ... ) def distance(p_1, p_2): return math.sqrt((p_2.x - p_1.x)**2 + (p_2.y - p_1.y)**2) G = networkx.DiGraph() for j in range(len(RACES)): r_j = RACES[j] for i in range(len(RACES)): if i == j: continue r_i = RACES[i] G.add_edge(r_i.label, r_j.label, weight = distance(r_i.end, r_j.start)) print(G) path = TSP.traveling_salesman_problem( G, cycle = False, method = lambda g, weight: TSP.simulated_annealing_tsp(g, "greedy"), ) print("path", path) print("weight", networkx.path_weight(G, path, "weight"))
For me this outputs
DiGraph with 3 nodes and 6 edges
path [3, 1, 2]
weight 6086.839929148261
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
CasualPokePlayer wrote:
MUGG wrote:
I'm saying when loading a state it should "forget" that I inputted a hard-reset. Loading a state should rewind to the exact situation I was in before.
Savestates do not hold your current input.
I also was surprised to learn this fact. I originally thought that the current state of input would be stored along with each savestate, and that loading a savestate would replace the current input with what was stored. But in actuality, the input state is a kind of global property of the emulator, one that remains the same as various savestates are loaded, and is only "cashed in" to become a part of the input log after a frame is advanced. At Post #532299 I talk about adding joypad state to a "savestate object" data type defined in Lua code, in order to make inputs be saved and restored alongside savestates, as I originally and wrongly guessed they would be.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Sand wrote:
But what I'm more excited about, and what I plan to make a thread at tool-assisted laboratory for, is the rudimentary data-oriented savestate manipulation library I used to script this TAS.
The promised thread is Thread #25755: Data-oriented savestate manipulation in BizHawk and Lua.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
The inputs in my recent Kwirk TAS were produced by a Lua script. The script programmatically navigates the menus, manipulates RNG, and plays the solutions to the puzzles. Instead of hardcoding frame offsets, the script empirically discovers the earliest frame at which each input is accepted, for every input, each time it is run. When there are two ways to get through a menu, the script tries both of them, and keeps the faster one. Doing all this requires sophisticated control over the emulator that is difficult to achieve using BizHawk callbacks alone. To make it easier, the script includes a chunk of support code for managing savestates as immutable data, using ordinary Lua statements, loops, and function calls, not callbacks. The programming style enabled by this support code is what I want to talk about in this thread. If you've ever felt limited by while true do emu.frameadvance() end, this is for you. User movie #638643807847186809 The link above is the Kwirk Lua script as a userfile. It's the same script as was used for [6203] GB Kwirk "Heading Out?, all floors" by Sand in 25:15.19, with additional comments and edits for clarity. In my Git repository at https://www.bamsoftware.com/git/kwirk.git, it's tas.lua at commit ef235541. The important parts for this discussion (generic savestate manipulation code, not specific to Kwirk) are from BASE_SAVESTATE to earliest_effective_frame. Before I talk about how the support code works, I'll show some excerpts from the script to show what it looks like to program in this style. The first one below is from the beginning of the run, where we dismiss the title screen and enter the menus. The script waits for the title screen code to run (by waiting for the address TITLE_SCREEN_ADDR to be executed), then finds the earliest possible frame to press the Start button (using an auxiliary function, earliest_effective_frame, that I'll say more about later), and finally selects "Heading Out?" mode from the first menu.
Language: lua

local savestate = BASE_SAVESTATE -- Run until the title screen (skipping over the BIOS animation). savestate = run_until_exec(savestate, nil, TITLE_SCREEN_ADDR) -- Find the earliest frame at which we can press Start to get past the -- title screen. savestate = earliest_effective_frame(savestate, function (s) return savestate_joypad_set(s, {Start = true}) end, function (s) return run_until_exec(s, 32, MENU_ADDR) end) savestate = select_menu_item(savestate, 1) -- Heading Out?
The next excerpt is part of the post_run_reconfigure function, which is responsible for doing the menuing between segments. The function reads the current values of settings from memory, tries two ways of navigating the menus to change the settings, and returns a savestate representing the one that's faster:
Language: lua

local cur_skill = savestate_read_u8(savestate, SKILL_ADDR) local cur_num_floors = savestate_read_u8(savestate, NUM_FLOORS_ADDR) -- The display variable uses the encoding 0 = DISPLAY_BIRDSEYE, -- 1 = DISPLAY_DIAGONAL which is the opposite of our convention (which -- uses the order in which the options appear in the menu). local cur_display = assert(({[0] = 1, [1] = 0})[savestate_read_u8(savestate, DISPLAY_ADDR)]) -- Try the End menu. local menu_savestate = savestate if cur_display ~= display or cur_num_floors ~= num_floors or cur_skill ~= skill then menu_savestate = select_menu_item(menu_savestate, 1) -- End if cur_skill ~= skill then menu_savestate = select_menu_item(menu_savestate, 2) -- Select skill menu_savestate = select_menu_item(menu_savestate, skill) else menu_savestate = select_menu_item(menu_savestate, 1) -- Select course end menu_savestate = select_num_floors(menu_savestate, num_floors) menu_savestate = select_menu_item(menu_savestate, display) end -- Try the B Button. local b_savestate = savestate if cur_display ~= display or cur_num_floors ~= num_floors or cur_skill ~= skill then b_savestate = cancel_menu(b_savestate) if cur_num_floors ~= num_floors or cur_skill ~= skill then b_savestate = cancel_menu(b_savestate) if cur_skill ~= skill then b_savestate = cancel_menu(b_savestate) b_savestate = select_menu_item(b_savestate, skill) end b_savestate = select_num_floors(b_savestate, num_floors) end b_savestate = select_menu_item(b_savestate, display) end if savestate_frame_count(menu_savestate) <= savestate_frame_count(b_savestate) then savestate = menu_savestate else savestate = b_savestate end return savestate
"Savestates" in these excerpts are actually custom savestate objects, built around BizHawk's native savestates. They are ordinary data that you can assign to a variable or pass into a function. Once created, a savestate is immutable: you cannot change the emulator state is represents, but you can derive new, different savestates from it. Functions like select_menu_item that take a savestate as input produce a new savestate as output, with the input savestate remaining unchanged. You can assign a returned savestate value over an existing variable if you don't need the old value (as happens in the first excerpt above); or you can assign it to a new variable to get two variables that represent different emulator states (as with menu_savestate and b_savestate in the second excerpt). Notice there's no explicit "rewind" operation in the second excerpt, even though, after deriving menu_savestate from the passed-in savestate, we need to go back to the beginning of the menu to begin deriving b_savestate. The focus is on computing over data, not driving the emulator. The emulator is, in fact, going to be jumping around various savestates and emulating frames as you run these function calls, but you don't write the code that way. The support functions take care of controlling the emulator to implement the savestate data operations you request. There are two difficulties with writing Lua code in BizHawk that the support code is trying to address:
  1. The emulator itself—its emulated CPU registers and RAM, the state of joypad inputs, and more—is effectively a big mutable global variable, with all the problems that come with that. Instead of writing functions that change their behavior depending on the current state of the emulator, we'd prefer to have functions depend only on their parameters. This means representing the emulator state as data, and abstracting away the low-level details of controlling the emulator.
  2. Programming with callbacks is awkward. We frequently want to let the emulator run until some event occurs, such as a change in RAM or the passing of a certain number of frames. The only built-in way for a BizHawk Lua script to do this is to relinquish control, let the emulator run, and wait to be called back. It gets more complicated if you want to wait for event A, then event B, then event C in sequence, or wait for multiple events at the same time. (JavaScript programmers know about callback hell.) We'd prefer to write code that looks like ordinary synchronous Lua code, using ordinary statements and control structures.
We deal with the first difficulty by defining an abstract "savestate" data type and requiring that all changes to the emulator state be mediated through it. The code inside the do ... end block is the code that is trusted to control the emulator directly (calling functions like memorysavestate.loadcorestate and joypad.set), and to do operations that depend on the current emulator state (like memorysavestate.savecorestate and emu.framecount). All other code must work through the safe primitives that are exported (not marked as local) by this block. (If you know Rust, the do ... end block is analogous to an unsafe block, providing a safe interface around low-level details that require careful management.) We deal with the second difficulty using Lua coroutines. I've remarked on the inconvenience of programming with callbacks: you have to register a callback, let your script exit so the emulator can run, and then when the event happens and the callback is called, you have to reconstruct what you were doing before you gave up control. This is a kind of problem that coroutines were meant to solve. Instead of letting your script exit, you can register a callback and then call coroutine.yield, which lets the emulator run but also remembers the execution state of the script. In the callback function, you call coroutine.resume, which jumps back to where you were waiting for the event. Here's a small example of converting code from the callback style to the coroutine style. This is how you would wait for the address ADDR to be executed in a callback, using console.log to represent the code that should be run after the event happens. The event.unregisterbyid in the callback is because BizHawk's event registrations are multi-shot, and we only want to be called back once.
Language: Lua

local event_id event_id = event.on_bus_exec(function () event.unregisterbyid(event_id) console.log("exec", ADDR) end, ADDR)
And here's an equivalent example that uses coroutines in the way I described:
Language: Lua

local co = coroutine.running() local event_id event_id = event.on_bus_exec(function () event.unregisterbyid(event_id) coroutine.resume(co) end, ADDR) coroutine.yield() -- Execution continues here after callback calls coroutine.resume. console.log("exec", ADDR)
It may not look like much of an improvement at first. But notice that, crucially, the console.log happens at the same level as coroutine.yield—not inside the callback. This makes a big difference when the code that needs to be executed after the event occurs is more complex—and especially when that code goes on to wait for further events. It's analogous to converting callback-oriented code to use promises and async in JavaScript. You'll see this pattern used in the support code's functions run_until_event_raw and frame_align. You can, of course, encapsulate the coroutine operations in a function. Then waiting for multiple events in sequence becomes a simple sequence of statements:
Language: Lua

local function wait_for_exec(addr) local co = coroutine.running() local event_id event_id = event.on_bus_exec(function () event.unregisterbyid(event_id) coroutine.resume(co) end) coroutine.yield() end wait_for_exec(ADDR1) console.log("exec", ADDR1) wait_for_exec(ADDR2) console.log("exec", ADDR2)
Whereas doing the same in the callback style would require nested callbacks. In the callback style, the emulator is in control: the emulator runs, and calls into your Lua script when certain events happen. In the coroutine style, your code is in control: your code runs, and calls into the emulator when it needs the emulator to compute a new savestate. Of course, behind the scenes, everything is still all built on the emulator issuing callbacks to the Lua engine, but you get to write your program in a natural way. Think of the emulator as a hardware peripheral, a co-processor optimized for computing over savestates. Driving the emulator is not the goal in itself; rather the emulator is a tool that helps you incrementally compute the game-winning savestate you want. Now I'll comment on a few parts of the support code to help explain what it's doing.
Language: Lua

-- The metatable for all savestate objects. The important metamethod is -- __gc, which tells BizHawk to reclaim the resources of savestates -- that are no longer referenced. local savestate_mt = { __gc = function(savestate) delete_emu_savestate(savestate.id) end, __tostring = function(savestate) return string.format("savestate{%s}", savestate.id) end, } -- Create a new savestate object from the current state of the -- emulator. It is the caller's responsibility to ensure that the -- emulator state is frame-aligned. local function new_savestate() -- A savestate object consists of the identifier of an -- emulator-native savestate, and the joypad state. local savestate = { id = save_emu_savestate(), buttons = joypad.get(), } -- Keep the keys from joypad.get, but set all the values to false. for k in pairs(savestate.buttons) do savestate.buttons[k] = false end setmetatable(savestate, savestate_mt) return savestate end
A "savestate" object is a table with id and buttons keys. This is all I needed for Kwirk; other games may need more. id is the ID of a native BizHawk savestate (which I've called an emu_savestate to distinguish it from our savestate objects). buttons makes the state of input part of the savestate, because BizHawk savestates do not store what buttons are currently pressed. The joypad state would otherwise be another global variable. We set a __gc metamethod on savestate objects to free the savestate object's associated BizHawk-native savestate when the object is no longer referenced.
Language: Lua

-- Assume that at the point at which this script is run, it's a safe -- time to make a savestate (i.e., we're frame-aligned). BASE_SAVESTATE = new_savestate()
The only way to get a savestate object is to do an operation on an already existing savestate object. Where does the first savestate object come from? BASE_SAVESTATE is set to the state of the emulator at the point when the script is loaded.
Language: Lua

-- Run the emulator until the event registered by the register_event -- function happens, or frame_limit frames elapse, whichever happens -- first. If the register_event event happens first, return true; if -- frame_limit is exceeded first, return false. frame_limit may be an -- integer, or nil to mean "no limit". register_event should be a -- callback registration function such as event.on_bus_exec, -- event.onframeend, etc. The ... variable arguments are passed to -- register_event. local function run_until_event_raw(frame_limit, register_event, ...) if frame_limit ~= nil and frame_limit <= 0 then return false end local co = coroutine.running() -- We register two event callbacks: one for the event the -- caller is interested in (event_id), and one to count frames -- and enforce frame_limit (limit_id). We register both -- callbacks, then yield to let the emulator run until one of -- the two callbacks calls coroutine.resume. The event_id -- callback resumes with the value true; the limit_id callback -- resumes with the value false. After being resumed, we -- dispose of the callback that was not called. We register -- event_id before limit_id so that, if both callbacks happen -- for the same event, the caller's event takes precedence. local event_id = register_event(function () if co ~= nil then assert(coroutine.resume(co, true)) end end, ...) local frame_count = 0 local limit_id = event.onframeend(function () frame_count = frame_count + 1 if frame_limit ~= nil and frame_count >= frame_limit then if co ~= nil then assert(coroutine.resume(co, false)) end end end) local event_occurred = coroutine.yield() assert(event.unregisterbyid(event_id)) assert(event.unregisterbyid(limit_id)) -- We want only one of the two callbacks above to run -- coroutine.resume. At this point, one of them has done so -- -- but just unregistering both callback IDs does not guarantee -- that the other will not also run coroutine.resume. That is -- because both callbacks may be looking for the very same -- event; i.e. register_event is event.onframeend and -- frame_limit is 1). BizHawk runs all callbacks that were -- registered at the time the event occurred, even if, in the -- course of working through the list of callbacks, a later one -- gets unregistered by an earlier one. We set co to nil as a -- signal to both callbacks that if they run, they should not -- call coroutine.resume. co = nil return event_occurred end -- Starting from the given savestate, return a new savestate that is -- the result of running until the event registered by register_event -- happens; or return nil if frame_limit frames elapse without the -- event happening. frame_limit may be nil for no limit. register_event -- should be a callback registration function such as -- event.on_bus_exec, event.onframeend, etc. The ... variable arguments -- are passed to register_event. function run_until_event(savestate, frame_limit, register_event, ...) restore_savestate(savestate) local event_occurred = run_until_event_raw(frame_limit, register_event, ...) if event_occurred then -- The event may not have occurred at a frame boundary, -- so align the emulator state before creating the new -- savestate. frame_align() return new_savestate() else return nil end end
The function run_until_event_raw encapsulates the basic register/coroutine.yield/coroutine.resume pattern I've described, for arbitrary events. It gives you the option of specifying a maximum number of frames to wait for the event to happen before giving up. The internal, "raw" version of the function operates on the emulator directly: it starts at the current state of the emulator, and mutates the state before returning. The public function run_until_event wraps run_until_event_raw in a safe interface that takes a savestate object as input and returns another savestate object as output. run_until_event calls restore_savestate (another trusted function) to reset the emulator to the input savestate object before calling run_until_event_raw; and then calls new_savestate to create a savestate object from the emulator state.
Language: Lua

-- Find the earliest frame at which running an action function on a savestate -- produces a savestate that satisfies a test predicate. -- -- The action function takes a savestate as input and returns a savestate as -- output. The test function takes a savestate as input and returns a Boolean -- output (nil|false or otherwise). This function runs the action function on -- the given savestate, then runs the test predicate on the resulting -- savestate. If the test function's return value is not nil and not false, it -- returns two values: the return value of action and the return value of test. -- If the test function returns nil or false, it goes back to the original -- savestate, advances one frame, and tries the action and test again. The -- process continues, advancing one frame at a time, until the test predicate -- succeeds. -- -- It is common to call this function with a test predicate whose return value -- is either a savestate or nil. In that case, you can regard the two return -- values as "savestate after action performed" and "savestate after test -- passed". local function earliest_effective_frame(savestate, action, test) local action = action or function (savestate) return savestate end local frames = savestate_frame_count(savestate) while true do local action_performed = action(savestate) local test_result = test(action_performed) if test_result then return action_performed, test_result end -- No luck with this frame. Try doing the action one frame later. savestate = run_one_frame(savestate) local new_frames = savestate_frame_count(savestate) assert(new_frames == frames + 1, string.format("frames=%d new_frames=%d", frames, new_frames)) frames = new_frames end end
earliest_effective_frame is the most important auxiliary function used throughout the script. Starting from a provided savestate, it finds the earliest frame at which performing an action (such as pressing a button) eventually has some effect (such as a memory location being written to or an address being executed). The while true loop advances one frame at a time, running the action function on a savestate for that frame, then running test on the savestate that action returns. The loop terminates when test returns a true value. The function returns the results of calling both action and test (because sometimes you want one or the other). In the Kwirk TAS, the test function is usually one of the functions defined in terms of run_until_event, such as run_until_exec or run_until_u8_becomes. These functions return either a future savestate, or nil if the event doesn't happen within the frame limit. The effect of using such a test function in combination with earliest_effective_frame is that we peek a short time into the future after performing every action, to see if the action will have an effect. This is a versatile operation that helps with a lot of TASing tasks. In the title screen code excerpt above, you'll see earliest_effective_frame called with an action function that presses the Start button, and a test function that peeks up to 32 frames ahead to see if MENU_ADDR gets executed. The savestate manipulation support code in tas.lua is what I needed for Kwirk. The same or a similar technique could work for lots of other games and TASing tasks, so I encourage you to try it with your own projects. But you may have to make some adjustments, depending on what you're doing. For example, my TAS only needed joypad buttons; if you're doing a console with an analog stick, you'll need to add the analog stick state to the savestate object. The current version of the support code wouldn't work well for press-and-hold inputs (Kwirk only needs one-frame presses). In order to hold buttons in a convenient way, you may have to enrich the savestate object to support some kind of "sticky" inputs that don't become unpressed when the frame is advanced, perhaps by adding code in the frame counter callback of run_until_event_raw that renews the inputs every frame. What about the common while true do emu.frameadvance() end pattern, which is endorsed in the BizHawk docs and used in example scripts? It turns out it's equivalent to a special case of what I have shown here. emu.frameadvance just calls coroutine.yield after setting a flag indicating that the script wants to be resumed at the next frame. Scripts that yield in that way are resumed by a function that calls coroutine.resume. So calling emu.frameadvance is effectively the same as calling run_until_event with event.onframestart, relying on special-purpose code inside BizHawk. The advantage of doing it yourself is that you can do it with any event, not just event.onframestart.
Post subject: GBA resync
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Thank you Alyosha, that's great! What did you have to do to resync the video for console verification? I didn't specifically design for this, but a few days ago I found that the tas.lua script copes just fine in GBA mode, even though I originally did the TAS with Gambatte. That is, start recording a movie, load the script, watch the game play itself, and save the movie file when the script finishes. I'm curious whether this produces a movie file equivalent to what you used for the verification. Below are some rough instructions on running the script from the README file in the repository. The main thing is you need to edit tas.lua to set SAVESTATE_DIRECTORY to a filesystem path—this is a slower mode, but necessary when you want to save the final movie file.
Run the tas.lua script
BizHawk-2.9.1$ ./EmuHawkMono.sh --lua=/path/to/tas.lua kwirk.gb
How I do this is:
  1. Run BizHawk as shown above.
  2. Pause the emulator.
  3. Ctrl+R to reboot the emulator.
  4. Start recording a movie.
  5. Click the "Toggle Script" button in the Lua window to refresh tas.lua.
  6. Unpause the emulator.
  7. Wait for the emulation to finish.
You will see the game window scrubbing back and forth in time frequently – this is the TAS script trying multiple ways of doing the same action and choosing the fastest one. But beware: the bk2 movie files created this way will not be replayable. They will desync if you try to replay them. That is because, by default, the tas.lua script uses in-memory savestates that don't faithfully preserve an input log. To make the movie file replayable once you have found the best one, you will have to enable on-disk savestates as shown below. Record a movie When you're ready to record a video that plays back, you need to adjust tas.lua to make it save real on-disk savestates, rather than in-memory savestates. The difference with on-disk savestates is that they contain a full input history, which is not the case with in-memory savestates. But on-disk savestates are slower to save and restore, which is why they're not the default. Just edit this line and set SAVESTATE_DIRECTORY to some writable directory.
local SAVESTATE_DIRECTORY = "/tmp"
Record a movie as shown above. This one will be able to be played back. Afterward, you can delete the tempsavestate-* files.
Post subject: Looking into root cause
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
I looked at an instruction trace of User movie #638616165666042151. I traced things back as far as where the game starts executing some data as instructions, until it hits an 0xff opcode (which is rst $38) at 0x4cb3. The code starting at 0x0038 is just NOPs, so execution falls through to the vblank interrupt handler at 0x0040. The vblank interrupt handler returns with a reti instruction, returning to the address 0x4cb4, which another 0xff rst $38 instruction, so the vblank interrupt handler gets called again, returns again, and continues at 0x4cb5. What we get is a long sequence of instructions, most of which are 0xff, which means that the code repeatedly invokes the vblank interrupt handler, punctuated by a few other instructions, until finally a long buffer of 0xff bytes ends at 0xc000. Stripping out the code of the jumping into and returning from the interrupt handler, what we have is this:
4CB3:  FF        rst $38             A:00 F:10 B:79 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421866
4CB4:  FF        rst $38             A:00 F:10 B:79 C:80 D:20 E:71 H:98 L:f0 LY:29 SP:dfe9  Cy:4465422036
4CB5:  FE FD     cp a, $FD           A:00 F:10 B:79 C:80 D:20 E:71 H:98 L:f0 LY:2a SP:dfe9  Cy:4465422206
4CB7:  FF        rst $38             A:00 F:70 B:79 C:80 D:20 E:71 H:98 L:f0 LY:2a SP:dfe9  Cy:4465422210
4CB8:  5F        ld e, a             A:00 F:70 B:79 C:80 D:20 E:71 H:98 L:f0 LY:2b SP:dfe9  Cy:4465422380
4CB9:  CF        rst $08             A:00 F:70 B:79 C:80 D:20 E:00 H:98 L:f0 LY:2b SP:dfe9  Cy:4465422382
AFD8:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:2c SP:dfe9  Cy:4465422744
AFD9:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:2d SP:dfe9  Cy:4465422914
AFDA:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:2e SP:dfe9  Cy:4465423084
AFDB:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:2e SP:dfe9  Cy:4465423254
... long buffer of FF
BFFB:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:3a SP:dfe9  Cy:4466128134
BFFC:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:3b SP:dfe9  Cy:4466128304
BFFD:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:3c SP:dfe9  Cy:4466128474
BFFE:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:3c SP:dfe9  Cy:4466128644
BFFF:  FF        rst $38             A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:3d SP:dfe9  Cy:4466128814
C000:  00        nop                 A:d7 F:20 B:f0 C:80 D:00 E:e0 H:af L:d7 LY:3e SP:dfe9  Cy:4466128984
The trace that leads to the first execution of 0xff rst $38 is strange. It looks like two separate pieces of code that have been mashed together. Here is the trace that leads to the execution of 0x4cb3:
4CA1:  2A        ld a, [hl+]         A:01 F:20 B:78 C:80 D:20 E:71 H:d6 L:3d LY:28 SP:dfe9  Cy:4465421818
4CA2:  6E        ld l, [hl]          A:20 F:20 B:78 C:80 D:20 E:71 H:d6 L:3e LY:28 SP:dfe9  Cy:4465421822
4CA3:  67        ld h, a             A:20 F:20 B:78 C:80 D:20 E:71 H:d6 L:71 LY:28 SP:dfe9  Cy:4465421826
4CA4:  7E        ld a, [hl]          A:20 F:20 B:78 C:80 D:20 E:71 H:20 L:71 LY:28 SP:dfe9  Cy:4465421828
4CA5:  E6 80     and a, $80          A:10 F:20 B:78 C:80 D:20 E:71 H:20 L:71 LY:28 SP:dfe9  Cy:4465421832
4CA7:  B0        or a, b             A:00 F:a0 B:78 C:80 D:20 E:71 H:20 L:71 LY:28 SP:dfe9  Cy:4465421836
4CA8:  77        ld [hl], a          A:78 F:00 B:78 C:80 D:20 E:71 H:20 L:71 LY:28 SP:dfe9  Cy:4465421838
4CA9:  07        rlca                A:78 F:00 B:78 C:80 D:20 E:71 H:20 L:71 LY:28 SP:dfe9  Cy:4465421842
4CAA:  0D        dec c               A:f0 F:00 B:78 C:80 D:20 E:71 H:20 L:71 LY:28 SP:dfe9  Cy:4465421844
4CAB:  09        add hl, bc          A:f0 F:60 B:78 C:7f D:20 E:71 H:20 L:71 LY:28 SP:dfe9  Cy:4465421846
4CAC:  03        inc bc              A:f0 F:00 B:78 C:7f D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421850
4CAD:  0F        rrca                A:f0 F:00 B:78 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421854
4CAE:  A1        and a, c            A:78 F:00 B:78 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421856
4CAF:  04        inc b               A:00 F:a0 B:78 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421858
4CB0:  3F        ccf                 A:00 F:00 B:79 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421860
4CB1:  7F        ld a, a             A:00 F:10 B:79 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421862
4CB2:  7F        ld a, a             A:00 F:10 B:79 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421864
4CB3:  FF        rst $38             A:00 F:10 B:79 C:80 D:20 E:71 H:98 L:f0 LY:28 SP:dfe9  Cy:4465421866
The trace stops making sense as code at 0x4ca9 rlca. What it looks like, to me, is that the instructions before that point come from one part of the ROM, and the instructions after that come from a different part of the ROM. Here's the code that starts at 0x1f4ca1 in the ROM. Notice how it agrees with the above trace up through 0x1f4ca8 ld [hl], a, and disagrees thereafter:
1f4ca1  2a      ldi   a, [hl]
1f4ca2  6e      ld    l, [hl]
1f4ca3  67      ld    h, a
1f4ca4  7e      ld    a, [hl]
1f4ca5  e680    and   0x80
1f4ca7  b0      or    b
1f4ca8  77      ld    [hl], a
1f4ca9  e5      push  hl
1f4caa  cd181e  call  0x1e18
1f4cad  e1      pop   hl
1f4cae  cdd34d  call  0x4dd3
1f4cb1  fa79d6  ld    a, [0xd679]
And here is the code (actually looks more like data) that starts at 0x374ca1 in the ROM. Notice how it agrees with the above trace after 0x374ca9 rlca, but disagrees before that:
374ca1  0d      dec   c
374ca2  09      add   hl, bc
374ca3  0a      ld    a, [bc]
374ca4  0604    ld    b, 0x04
374ca6  04      inc   b
374ca7  84      add   h
374ca8  02      ld    [bc], a
374ca9  07      rlca
374caa  0d      dec   c
374cab  09      add   hl, bc
374cac  03      inc   bc
374cad  0f      rrca
374cae  a1      and   c
374caf  04      inc   b
374cb0  3f      ccf
374cb1  7f      ld    a, a
374cb2  7f      ld    a, a
374cb3  ff      rst   0x38
Again, I don't know much about the Game Boy architecture, but what it looks like to me is, the game had 0x1f4cxx mapped in, and while that code was executing, 0x374cxx got mapped in on top of it. Is that possible?
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
jlun2 wrote:
I wasn't able to replicate the trigger from my previous input file, so I found a new spot right below an exit tile that jumps to 0xC000 once you ground pound on it. The game then jumps back to the main loop, and I beat the stage with ending id set to 6. This plays the real final chapter ending credits after saving.
This is awesome! I'm impressed. The disassembly visualization in the video is great, too. An alternative to writing the instructions you need would be to find a place where those instructions already exist in the ROM, and jump to them. (Something like finding a "gadget" in return-oriented programming.) There is a sequence in the ROM at address 0x00264b0e that is tailor-made, ld a, 0x06; ld [0xd70a], a, followed by a ret:
264afa:  3e01    ld    a, 0x01
264afc:  1816    jr    0x16
264afe:  3e02    ld    a, 0x02
264b00:  1812    jr    0x12
264b02:  3e03    ld    a, 0x03
264b04:  180e    jr    0x0e
264b06:  3e04    ld    a, 0x04
264b08:  180a    jr    0x0a
264b0a:  3e05    ld    a, 0x05
264b0c:  1806    jr    0x06
264b0e:  3e06    ld    a, 0x06
264b10:  ea0ad7  ld    [0xd70a], a
264b13:  c9      ret
264b14:  ea0ad7  ld    [0xd70a], a
264b17:  cd9e06  call  0x069e
264b1a:  c9      ret
Instead of encoding your own ld a, 0x06; ld [0xd70a], a (5 bytes), you could do a call 0x4b0e (3 bytes). Or even jp 0x4b0e: maybe you'll get lucky and there will be a non-crashing return address on the stack for the ret to return to. However, I tried dumping memory at the time when the game starts executing 0xc000, and it looks like the above code is not mapped at that point. (At least I think—I don't know Game Boy architecture that well.) So this idea may be a dead end.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Exploit code constructed out of treasure and puzzle flags, that's amazing :) It's not every day you have to think about the Hamming weight of your shellcode. I'm enjoying reading about this. If the contents of the a register are predictable, it may be possible to replace ld a, $06 (3e06) with add a, $XX (c6XX), where XX is the difference between 0x06 and the value of a. 0xc6 saves one bit relative to 0x3e, but whether it requires fewer treasures overall depends on what XX is.
jlun2 wrote:
The following applies to gameboy. I found out that when you ground pound in certain areas below the stage, you can crash the game. I checked why, and it was because it jumped to 0xE200. 0xC504, and thus 0xE504 is the treasure flags. So there's a small spot to manipulate for a credits warp.
I see why the contents of 0xe504 are the same as 0xc504—that's echo RAM. But how does the program counter get from 0xe200 to 0xe504? Is it all NOPs in between?
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Sand wrote:
I just learned there's a cheat code in Heading Out? mode. If you hold the Left and Select buttons while confirming the "Select Display" menu, then the floor schedule will be overwritten to play all the floors in the chosen skill level, exactly once and in order. Each floor may still be inverted or not, but the cheat code skips seeding the RNG, so the inversion schedule is always the same as well. (Unless you've played an earlier round to set the seed.) This cheat code would obviously be the fastest way to play all floors in Heading Out? mode, but I'm not planning to use it in the TAS, because it would take away the fun of RNG seed routing.
Out of curiosity, and because I wanted to make a speedrunning guide, I hacked the tas.lua script to use the cheat code and play all the floors in order, in 3 segments. The total time is 87537 frames and 24:25.128. That's 3033 frames and 50.071 seconds faster than the 90570 frames and 25:15.195 of Post #530388, with its 10 segments. That tracks with the estimate of each restart costing about 8 seconds. Using the cheat code, it might be possible to get a slightly lower time by playing the difficulty levels in a different order. The floors would be the same, but they would be inverted differently.
SkillSeed# FloorsFloors
EasyN/A3030, 31i, 32, 33, 34, 35, 36i, 37i, 38i, 39, 40, 41, 42i, 43, 44, 45, 46, 47, 48, 49, 50, 51i, 52, 53i, 54, 55, 56, 57i, 58, 59i
AverageN/A5060i, 61i, 62, 63, 64i, 65, 66, 67i, 68, 69i, 70i, 71, 72, 73, 74i, 75i, 76i, 77, 78, 79i, 80i, 81, 82, 83, 84, 85, 86, 87i, 88, 89i, 90i, 91, 92i, 93i, 94i, 95, 96, 97, 98i, 99, 100, 101, 102, 103, 104, 105, 106, 107i, 108, 109i
HardN/A40110i, 111, 112, 113, 114, 115, 116i, 117i, 118, 119, 120i, 121, 122, 123i, 124, 125i, 126i, 127, 128, 129i, 130i, 131, 132, 133, 134i, 135, 136i, 137, 138, 139, 140, 141, 142, 143i, 144, 145i, 146i, 147, 148, 149
Link to video
Post subject: Linear algebra
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
There's a way to model "lights out" problems like this as linear algebra, with 0/1 matrices and vectors. You have a vector p that represents the current state of the board (in this case a 9-vector), a vector r that represents the desired end state of the board (another 9-vector), and a matrix A that represents which lights get flipped when you press each of the lights (a 9×9 matrix). Then the vector x that tells you which lights to press can be computed as x = (rp)A−1. Here's a youtube video about it: Solving the "Lights Out" Problem And here's a good web page: The Mathematics of Lights Out. The technique isn't exactly applicable to Match Blox, because of this restriction: "Only the orange squares may be changed. If you press the fire button while on a blue square, nothing happens." But it's pretty close. Jaap's Puzzle Page also has discussion of "lit-only games":
The lit-only game is solved in nearly the same manner as the normal game. Simply figure out whether any of the lit buttons need to be pressed, and then press them. If any unlit button needs to be pressed, you will need to delay pressing it until other button presses have lit it. Occasionally all buttons that remain in the solution are unlit, so then some lit button has to be pressed first to allow you to move on. Later that same button will have to be pressed again, since it wouldn't have been part of the solution if you hadn't been forced to press it the first time. This situation can usually be avoided.
What this means is that the order of presses is important (you have to make sure that each square is lit at the time to press it), and in some cases you may need to press a light that's not in the vector solution, then press it again later. The other aspect of the game that this doesn't capture is the time required for cursor movement. Your bot is likely a more straightforward solution, given these additional complications. But just for interest, here's a way of doing the matrix computations using SageMath. You can run the program at https://sagecell.sagemath.org/. This is for the first batch of solutions (not the updated batch).
Language: python

# Each row shows which squares are affected by pressing the square that # corresponds to the number of that row. For example, pressing square 1 flips # squares 1, 2, 4, and 5 (first row of the matrix). # https://archive.org/details/1986-11-computegazette/page/n57/mode/2up # "Choosing one of the four corner squares (1, 3, 7, or 9) reverses the color of # that square and the three adjacent squares. Choosing an edge square # (2, 4, 6, or 8) reverses its color as well as the two adjoining corner # squares. If you select the center square, its color is reversed and so are the # colors of the four edge squares." A = matrix(GF(2), [ [1,1,0,1,1,0,0,0,0], [1,1,1,0,0,0,0,0,0], [0,1,1,0,1,1,0,0,0], [1,0,0,1,0,0,1,0,0], [0,1,0,1,1,1,0,1,0], [0,0,1,0,0,1,0,0,1], [0,0,0,1,1,0,1,1,0], [0,0,0,0,0,0,1,1,1], [0,0,0,0,1,1,0,1,1], ]) Ainv = A.inverse() # https://www.jaapsch.net/puzzles/lomath.htm#linalg def solve(p, r): return (p - r) * Ainv # Target patterns, each square numbered in the natural way. UNI_COLOR = vector(GF(2), [0,0,0,0,0,0,0,0,0]) CROSS = vector(GF(2), [1,0,1,0,0,0,1,0,1]) NO_CENTER = vector(GF(2), [0,0,0,0,1,0,0,0,0]) FOUR_CORNERS = vector(GF(2), [0,1,0,1,1,1,0,1,0]) FIVE_POINTS = vector(GF(2), [0,1,0,1,0,1,0,1,0]) for p, r, name in ( ([1,1,0,1,0,1,0,0,1], FIVE_POINTS, "5 POINTS"), ([0,0,1,0,0,0,1,1,0], FOUR_CORNERS, "4 CORNERS"), ([0,0,0,1,1,1,1,1,0], NO_CENTER, "NO CENTER"), ([1,0,0,1,0,0,1,1,1], UNI_COLOR, "UNI-COLOR"), ([1,1,1,1,1,0,1,0,0], CROSS, "CROSS"), ): p = vector(GF(2), p) x = solve(p, r) print(f"{name:9}", p, x, "->", [i+1 for (i, xi) in enumerate(x) if xi != 0])
Here's the output:
r=5 POINTS  p=(1, 1, 0, 1, 0, 1, 0, 0, 1) x=(0, 1, 1, 0, 0, 0, 0, 0, 1) -> [2, 3, 9]
r=4 CORNERS p=(0, 0, 1, 0, 0, 0, 1, 1, 0) x=(1, 1, 0, 0, 1, 0, 1, 0, 0) -> [1, 2, 5, 7]
r=NO CENTER p=(0, 0, 0, 1, 1, 1, 1, 1, 0) x=(1, 0, 0, 1, 1, 0, 0, 0, 0) -> [1, 4, 5]
r=UNI-COLOR p=(1, 0, 0, 1, 0, 0, 1, 1, 1) x=(1, 0, 0, 0, 1, 0, 1, 0, 1) -> [1, 5, 7, 9]
r=CROSS     p=(1, 1, 1, 1, 1, 0, 1, 0, 0) x=(0, 1, 0, 1, 0, 1, 0, 1, 1) -> [2, 4, 6, 8, 9]
The solutions match what your bot found, except that they may be out of order. The exception is CROSS, which is one of those cases where none of the square you need to press is lit, so you need to press some other square (5 in this case) and press it again later: [2, 4, 5, 6, 9, 5, 8].
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Alyosha wrote:
Do you foresee the tools you used here being applicable to other projects?
One obvious application would be to try the floor solver and/or extractor with A-mazing Tater. I don't know the game, but from [3716] GB A-mazing Tater "Puzzle Mode" by TheRealThingy & Alyosha in 17:22.00 the mechanics look the same. For this project, I had to work out how floors are stored in the ROM (see the rip-floors script in the source code), and if it's the same format in Amazing Tater, it might be useful to extract the floors for an external solver. It would require some analysis to find the appropriate ROM offsets. But what I'm more excited about, and what I plan to make a thread at tool-assisted laboratory for, is the rudimentary data-oriented savestate manipulation library I used to script this TAS. All the inputs come from a Lua script, which does things like find the earliest frame at which each input is accepted. To do that, the script tries pressing the input, then runs forward a few frames to see if a relevant memory location has changed; if not, it rewinds to where it started, tries the same input 1 frame later, runs ahead again, scrubs back if necessary, etc., until it finds the earliest frame that accepts the input. That kind of thing is difficult to write using event callbacks and emu.frameadvance(). I found a nice way to express operations like that in Lua, one that looks more like manipulating data using straightforward imperative code, functions and loops. It's similar to rewriting callback-oriented JavaScript code to instead use promises, async, and await. I'm hoping to write more about this, but see, for example, the earliest_effective_frame function in tas.lua, and how it is called at the title screen to find the earliest frame at which you can press Start. Rather than treating the emulator as a massive global variable, the function accepts an immutable savestate as input and produces an immutable savestate as output, purely functional as long as you remain within the abstraction.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Sand wrote:
It was surprisingly difficult to find a route faster than the 90741-frame one I posted as a point of comparison in Post #529667. Long story short, the best route I've found is 90570 frames long, about 2.8 seconds faster.
I've submitted the 90570-frame route E42x30-H34x22-H22x10-H58x8-A37x12-A02x15-A04x15-A17x9: #9257: Sand's GB Kwirk "Heading Out?, all floors" in 25:15.19. The video encode has a tracker on the side that shows what floors have been completed. It also shows the frame counter used to seed the RNG, and the RNG seed for the current segment. Link to video
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Thank you. That explanation helps me understand. If I run an emu.totalexecutedcycles() each frame, I can see that the difference between consecutive frames is not constant. What BizHawk is doing is totally reasonable, given the constraints you've outlined. I reworked my status variable logger to store a timestamp, rather than a frame number, with each set of values. When postprocessing the encoded video, I can take the video decoder's timestamp for each frame, and use that to look up the set of values with the closest timestamp. It's usually not an exact match, but the difference is less than 1 frame, which is fine for what I'm doing.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
I tried encoding a video from a bk2 file, and while the video file has the expected total duration and what seems to be a correct framerate, the total number of frames is smaller than the number of frames in the bk2 file. Take User movie #638529018325843149 for example. The bk2 contains 90741 frames (which, given the duration of 1517.80 s, implies a framerate of 59.7845 fps). The encoded video, however, contains 90655 frames for the same total duration, leading to a framerate of 59.7275 fps. Is this expected? I'm using BizHawk 2.9.1 on Linux, Gambatte core, FFmpeg lossless Matroska encoding. Reproduction instructions and more analysis follow.
  1. Download User movie #638529018325843149.
  2. Run BizHawk as ./EmuHawkMono.sh --movie=kwirk-headingout-90741.bk2 kwirk.gb.
  3. EmulationPause and FileMoviePlay from Beginning.
  4. FileMovieOn Movie End...✓Pause.
  5. FileAVI/WAVConfig and Record AVI/WAV....
  6. Click OK at "Most of these options will cause crashes on Linux."
  7. Select FFmpeg writer and leave Sync to Audio checked. Click OK.
  8. Select Matroska Lossless. Command should be -c:a pcm_s16le -c:v libx264rgb -qp 0 -pix_fmt rgb24 -f matroska. Click OK.
  9. Choose an output filename (I used tmp.mkv) and click Save.
  10. Uncheck ConfigSpeed/SkipClock Throttle, then uncheck EmulationPause. Let the movie play until it auto-pauses at frame 90741.
  11. FileAVI/WAVStop AVI/WAV
These are the stats from Header.txt in the bk2 file:
CycleCount 3183059634
ClockRate 2097152
3183059634 / 2097152 yields a total duration of 1517.80 s, which matches what's on the userfile page. 90741 frames divided by that duration gives a framerate of 59.7845 fps. The encoded video file has the expected duration of 1517.80 s, but counting frames reveals that the video stream has just 90655 frames, and a framerate of 59.7275 fps (some ffprobe output omitted):
$ ffprobe -show_streams -count_frames tmp.mkv
Input #0, matroska,webm, from 'tmp.mkv':
  Metadata:
    ENCODER         : Lavf59.27.100
  Duration: 00:25:17.81, start: 0.000000, bitrate: 1514 kb/s
  Stream #0:0: Video: h264 (High 4:4:4 Predictive), gbrp(pc, gbr/unknown/unknown, progressive), 160x144 [SAR 1:1 DAR 10:9], 59.73 fps, 59.73 tbr, 1k tbn
    Metadata:
      ENCODER         : Lavc59.37.100 libx264rgb
      DURATION        : 00:25:17.810000000
  Stream #0:1: Audio: pcm_s16le, 44100 Hz, 2 channels, s16, 1411 kb/s
    Metadata:
      ENCODER         : Lavc59.37.100 pcm_s16le
      DURATION        : 00:25:17.801000000
[STREAM]
index=0
r_frame_rate=23891/400
avg_frame_rate=23891/400
nb_read_frames=90655
[/STREAM]
(Note, however, that the audio stream has the expected 90741 frames):
[STREAM]
index=1
r_frame_rate=0/0
avg_frame_rate=0/0
nb_read_frames=90741
[/STREAM]
I am not sure where the 59.7845 fps (derived from the cycle counts in Header.txt) originally stems from, but the 59.7275 fps looks to be attributable directly to Gambatte.cs, as well as being attested in online documentation about Game Boy hardware. https://github.com/TASEmulators/BizHawk/blob/3c1248547f5c8116152a041a43d8e806419dc0fe/src/BizHawk.Emulation.Cores/Consoles/Nintendo/Gameboy/Gambatte.cs#L267-L275
Language: csharp

/// <summary> /// the nominal length of one frame /// </summary> private const uint TICKSINFRAME = 35112; /// <summary> /// number of ticks per second /// </summary> private const uint TICKSPERSECOND = 2097152;
2097152 / 35112 = 59.72750057..., which as a ratio also matches https://mgba-emu.github.io/gbdoc/#hardware's 4194304 cycles per second divided by 70224 cycles per frame. In summary, with lossless video encoding, I expected there to be exactly one video frame for each frame in the bk2 file. Instead, the encoded video file has fewer frames for the same total duration (which implies a somewhat lower framerate and that some frames must have been discarded). I am not sure if this is expected behavior or not, and I would appreciate some guidance before doing a lot of experiments with different cores, etc. Any hints? The reason I noticed this at all is that I had a Lua script to write out the values of some status variables on each frame, with the intention of adding a tracker UI in postprocessing by matching up the status variable with the video, frame by frame. But because the video did not represent every frame in the input file, the tracker very slowly drifted out of sync with the game.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
It was surprisingly difficult to find a route faster than the 90741-frame one I posted as a point of comparison in Post #529667. Long story short, the best route I've found is 90570 frames long, about 2.8 seconds faster. Link to video BizHawk file This route has 8 segments, compared to Post #529667's 10 segments. It has one repeated floor, floor 90 in Average skill, but repeating the floor is faster than restarting with additional segments.
SkillSeed# FloorsFloors
Easy423032i, 35i, 55i, 46, 59, 45i, 49i, 58, 56, 33, 30i, 57, 48i, 31i, 47i, 51i, 37, 41, 38, 44i, 50i, 34i, 42i, 53, 43, 54i, 40i, 36, 52i, 39i
Hard3422138i, 110i, 139, 113i, 146, 148i, 111i, 144i, 143, 140, 131, 142i, 136i, 149i, 112i, 147, 121, 129i, 127i, 141, 135i, 115i
Hard2210128i, 123i, 145i, 117, 114i, 125, 119, 120, 133, 137
Hard588132, 122i, 126, 124i, 134i, 130, 118, 116
Average3712109i, 100, 106, 104i, 98i, 65i, 63i, 93, 90, 73i, 95i, 96
Average21589i, 108, 71, 94, 87i, 102, 86, 92i, 80, 103i, 60i, 91, 97, 64, 62i
Average41568, 82i, 83i, 74i, 99, 70i, 76, 84i, 101, 77, 67, 90, 69, 107, 105i
Average17978, 75, 61i, 79, 88, 72, 85, 81i, 66i
I started using an abbreviated notation to describe routes. The above route is E42x30-H34x22-H22x10-H58x8-A37x12-A02x15-A04x15-A17x9. Here are other routes I discovered and tested over the past month:
RouteSegmentsRepeatsFramesComment
E42x30-H34x23-H22x10-H58x7-A26x2-A21x16-A45x17-A04x8-A17x99291644
E42x30-H34x22-H22x10-H58x8-A21x3-A21x12-A45x16-A03x3-A04x8-A17x910191528
E42x30-H34x22-H22x10-H58x8-A21x5-A37x13-A45x7-A04x14-A17x10-A53x110090930
E42x30-H34x23-H22x10-H58x7-A21x6-A37x11-A45x13-A04x12-A48x2-A17x610090904
E42x30-H34x23-H22x10-H58x7-A21x6-A37x12-A45x13-A04x11-A53x1-A17x710090902
E42x30-H34x23-H22x10-H58x7-A26x2-A21x22-A17x10-A19x5-A04x8-A45x310090896
E42x30-H34x23-H22x10-H58x7-A02x29-A17x9-A26x5-A10x1-A14x3-A13x310090873Best with UCS
E42x30-H44x8-H41x15-H22x5-H07x4-H58x8-A45x3-A25x1-A23x2-A48x4410090786
E42x30-H44x8-H41x17-H22x4-H07x4-H58x7-A22x1-A45x3-A25x4-A48x4210090743
E42x30-H34x23-H22x10-H58x7-A02x29-A17x9-A26x5-A13x5-A10x1-A05x110090741Post #529667
E42x30-H34x22-H22x10-H58x8-A37x12-A02x14-A04x16-A17x98190578
E42x30-H34x22-H22x10-H58x8-A37x12-A02x13-A04x17-A17x98190577
E42x30-H34x22-H22x10-H58x8-A37x12-A02x15-A04x15-A17x98190570This post
I let the uniform cost search optimizer from Post #529667 keep running for a week or so. It did eventually find an exact solution, E42x30-H34x23-H22x10-H58x7-A02x29-A17x9-A26x5-A10x1-A14x3-A13x3. The optimizer doesn't take into account the fact that the time to restart a new segment is variable (mainly because of the time to wait for the next RNG seed), because that it too hard to simulate. So the idea was to get some fast route candidates, then get precise frame counts for them by executing the routes in the emulator. (Because the TAS is automated using a Lua script, testing candidate routes doesn't require manual TASing.) The optimizer estimates that E42x30-H34x23-H22x10-H58x7-A02x29-A17x9-A26x5-A10x1-A14x3-A13x3 is 18 frames faster than the E42x30-H34x23-H22x10-H58x7-A02x29-A17x9-A26x5-A13x5-A10x1-A05x1 of Post #529667, but when actually played out, it's actually 183 frames slower. If allowed to continue, the optimizer would eventually have gotten to E42x30-H34x23-H22x10-H58x7-A02x29-A17x9-A26x5-A13x5-A10x1-A05x1 and competitive routes, but here I decided to try something different, a genetic algorithm. The genetic algorithm has the advantage that is is not bound by the simplifying restrictions introduced to make the problem more tractable. In particular, it's allowed to repeat floors, and start a new segment even when the current segment is not just about to hit a repeat. The tradeoff, of course, is that there's no guarantee of optimality. The first breakthrough was when the genetic algorithm found the solution H34x22-H22x10-H58x8 for the Hard floors. This is very similar to how Hard was solved in Post #529667, H34x23-H22x10-H58x7, just moving one floor from the seed 58 segment to the seed 34 segment, but its slightly different history buffer of recently played levels clears the way for a faster clear of Average. I ran the genetic optimizer several thousand times and tested the best routes it found. Ultimately the best one is the one you see in the video above. The tas.lua script to execute it is in commit 78a676ebcde7f6885d99b751ac27d3f1683df17f in my repo.
Post subject: A complete route for Heading Out?, all floors
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
I've been working on the all-floors route since correcting my understanding of how RNG is affected by previously completed floors. As I suspected, it is a harder overall than the set cover / traveling salesman I had envisioned before. But while the larger search space makes solutions harder to find, solutions also have the potential to be faster, with fewer segments and fewer repeated floors. I found a working route for "Heading Out? all unique floors" that hits all 120 floors in 10 segments with no repeats. It's 90741 frames in length. You can find it in User movie #638529018325843149, and I'll say more about it below. I'm planning to try running a few more variations through the route optimizer to see if I can find a faster route. The route optimizer I wrote is the usual uniform cost search kind of thing. It treats the game states (skill, floor schedule, number of unique floors completed) as vertices in a graph. Being at a certain vertex means you have just completed a floor, and you now have the option of playing the next floor in the schedule, or starting a new segment with a new skill and RNG seed. We're trying to find a shortest path from the origin vertex to a vertex that has all floors completed. To make the problem more tractable, I've imposed some simplifying restrictions:
  1. Optimize only one skill level at a time (Easy, Average, or Hard). This means when we follow a "restart" edge, we only have to consider up to 60 seeds in the current skill, not up to 180 seeds across all skills. This restriction could cause the optimizer to ignore some possibilities; for example, suppose there is a route where you play a few segments in Hard, which lines up the RNG just right to solve Average, which then lines up the RNG to finish the rest of Hard.
  2. Don't follow edges ("next floor" or "restart") that would have us repeat a floor that has already been done. This one is actually not much of a restriction, because we expect an optimum route not to have any repeats anyway.
  3. Only consider "restart" edges when "next floor" is impossible because it would cause a repeat. This means that every segment is as long as possible without a repeat. The optimizer will ignore opportunities to restart a segment earlier. Since every "restart" choice means adding up to 60 new vertices to the candidate list (which in turn can add 60 of its own, and so on), this restriction is important for keeping the size of the candidate vertex set manageable.
The Easy and Hard skill levels are not too hard to deal with. Easy has only 30 floors, and we already know that seed 42 works from boot to cover all 30 perfectly. The optimizer finds that single-segment route for Easy instantly. Hard has 40 floors, which makes the optimizer work more, but is still not too bad. Hard can be done after Easy in just 3 segments. The difficult part is Average skill, with its 50 floors. The most progress I have been able to make in Average, so far, is 48 floors. That's with giving it several hours of computation and about 150 GB of space to work with. I tried optimizing the skill levels in the order Easy, Hard, Average. It got to 118 floors before running out of space. The unfinished floors were 76 and 105. I tried restarting the optimizer with just those two floors, and it found a way to cover them with one additional segment each. This is probably suboptimal (even with the restrictions listed above) because it's not a global optimization: we optimized the problem in two parts and glued them together. But it is at least a functional route that can serve as a basis of comparison. Here is the route. The movie file for it is User movie #638529018325843149. The tas.lua script that executes the route is available at commit 32c435cca14ae486f42be4b261a3fa5a061e0c82 in my repo.
SkillSeed# FloorsFloors
Easy423032i, 35i, 55i, 46, 59, 45i, 49i, 58, 56, 33, 30i, 57, 48i, 31i, 47i, 51i, 37, 41, 38, 44i, 50i, 34i, 42i, 53, 43, 54i, 40i, 36, 52i, 39i
Hard3423138i, 110i, 139, 113i, 146, 148i, 111i, 144i, 143, 140, 131, 142i, 136i, 149i, 112i, 147, 121, 129i, 127i, 141, 135i, 115i, 118
Hard2210128, 123i, 145i, 117i, 114, 125i, 119, 120, 133, 137
Hard587132, 122, 126i, 124i, 134i, 130, 116
Average22989i, 98, 90, 108, 71i, 94, 102, 86i, 106, 92i, 103i, 60, 104, 63, 91i, 97i, 64i, 62, 68, 88i, 93, 65i, 74i, 107, 73, 109, 72, 84, 78
Average17967, 69, 75i, 61, 79, 85, 81, 70i, 66i
Average26587, 80i, 77, 82i, 95i
Average13583, 100i, 101i, 99i, 96
Average101105i
Average5176
I'm planning to try a few more experiments with route optimization, but not take it too far. Finding routes with fewer restarts, or more favorable floor inversions, can be left as an opportunity for future improvements.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
TheAllmightyStegosaurus, what are you hoping to be able to do? If you're working on a TAS, you don't necessarily need the full source code for a game. You do need to spend time learning and understanding the game, but there's a lot you can accomplish just using savestates and observing what happens onscreen. Reverse engineering the binary code of a game to understand how it works is a big job in itself. For a game like Mega Man 3, even something as seemingly simple as, say, understanding the map file format could take dozens of hours of work. My advice, if you're working on a TAS, is not to start at such a low level. Watch the fastest RTA runs (megamanleaderboards.net, speedrun.com) and learn what tricks they use. Try making a TAS of a full stage, using RTA strategies but with no mistakes. An intermediate step, before you dive into full reverse engineering of the code, is finding useful RAM addresses, such as Mega Man's (x, y) coordinates, HP, and weapon energy. Learning how to use RAM search is a good step towards acquiring the other skills you will need. Randomno is correct, there's no automatic way to rip the source code from a compiled program. It's not like the source code is hidden in the program somewhere. The best you can do is look at the compiled binary code and infer what the source code would have been. I'm not certain about Mega Man 3, but Mega Man 1 for DOS was written in assembly, not C or some other high-level language. You can tell this (when you have enough experience) just by observing the way functions are called, for example, but in this case there's definitive evidence in the form of a fragment of source code visible in Gaming Historian's video. When I reverse engineered the Sewer Rat AI in Mega Man 1, for example, I did it just as Randomno says: I looked at the assembly language disassembled from the program executable, identified the loops, figured out what the function calls did, etc. Then I wrote C code (C-like pseudocode really) with equivalent behavior because C is easier to read and understand than assembly—but the game was definitely not originally written in C. The comments are my interpretation: there are no comments or even variable names in the program itself. Randomno recommended the ScummVM's reverse engineering howto which is a good place to start. There are lots of reverse engineering guides online if it's something you want to learn. I use Rizin for disassembling the machine code, annotating memory locations, giving names to functions, etc. Honestly it's not an ideal tool for this purpose; in particular, it's not very good with the memory segments of 16-bit DOS code. But I haven't spent a lot of time looking for alternative tools. But any of these tools is going to require substantial background knowledge. You will need to know how a CPU works and how to read assembly language, at least. If you're not able to do those things yet, you will have to build a foundation of knowledge first. Start by doing some experiments with savestates and RAM search, and document your progress in the Mega Man 3 thread because we're interested in seeing what you come up with :)
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
TheAllmightyStegosaurus wrote:
Do we have a resources page for Mega Man 3 DOS?
No, just Mega Man 1. There are some brief notes about Mega Man 3 at Talk: Mega Man 3 at the Cutting Room Floor. rainwarrior on that page has a profile here too and has worked on Mega Man 3. There's also S&F Prod.'s page with Megaman3stuff.zip with some utilities. BTW there is an existing topic for Mega Man 3 DOS at Thread #15702: Mega Man 3.
Post subject: Optimized segment order – and a plot twist
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
I found approximately optimal orderings of the 14 segments. But I had overlooked something initially in the way random floor schedules are generated. The schedules are not independent, but are affected by schedules that have already been executed. In light of this new knowledge, optimizing the route through RNG seeds is going to require different techniques. I put the segment transition costs from Post #529194 into traveling salesman solvers, with the simplifying assumption of using Bird's-eye view only. This would forego potential minor time savings of using Diagonal view for some segments. I tried the LK solver and GenericGraph.traveling_salesman_problem from Sage. (I also tried algorithms.approximation.traveling_salesman.traveling_salesman_problem from NetworkX, but even with a reduced graph of 7 segments, it was still running after 2 hours, while the other two solvers took only a few seconds. There must be a bug, because 7! is only 5,040, small enough to solve quickly even by brute force.) LK and Sage gave me different routes with equal cost:
FramesSkillSeed# FloorsFloors
17379Easy423032i, 35i, 55i, 46, 59, 45i, 49i, 58, 56, 33, 30i, 57, 48i, 31i, 47i, 51i, 37, 41, 38, 44i, 50i, 34i, 42i, 53, 43, 54i, 40i, 36, 52i, 39i
42681Hard3428120, 133i, 138i, 110i, 118i, 123, 139, 119i, 113i, 146, 148i, 111i, 144, 143i, 140[, 131i, 142i, 136i, 149, 116, 112, 147i, 121i, 129i, 127i, 141, 135, 115i
46056Hard263137i, 140i, 130i
49567Hard584128i, 132, 122i, 126
54155Hard214125, 134, 114i, 145i
57988Average125105i, 94, 104, 77i, 99i
60217Average36260, 101i
64304Average54682, 91, 97, 78, 96, 95i
67084Average52474i, 89, 93i, 68i
70738Average114109i, 100i, 62, 83
73845Average45490, 103, 87, 76i
75257Average321102
91869Average172567i, 78, 84i, 69, 75, 61, 79, 88, 72, 85, 81, 70i, 66, 92, 86, 98, 73i, 71i, 108i, 106, 65, 63i, 107i, 64i, 80i
93520Hard352124i, 117
FramesSkillSeed# FloorsFloors
17379Easy423032i, 35i, 55i, 46, 59, 45i, 49i, 58, 56, 33, 30i, 57, 48i, 31i, 47i, 51i, 37, 41, 38, 44i, 50i, 34i, 42i, 53, 43, 54i, 40i, 36, 52i, 39i
42681Hard3428120, 133i, 138i, 110i, 118i, 123, 139, 119i, 113i, 146, 148i, 111i, 144, 143i, 140, 131i, 142i, 136i, 149, 116, 112, 147i, 121i, 129i, 127i, 141, 135, 115i
46076Hard263137i, 140i, 130i
62664Average172567i, 78, 84i, 69, 75, 61, 79, 88, 72, 85, 81, 70i, 66, 92, 86, 98, 73i, 71i, 108i, 106, 65, 63i, 107i, 64i, 80i
66497Average125105i, 94, 104, 77i, 99i
68726Average36260, 101i
72813Average54682, 91, 97, 78, 96, 95i
75593Average52474i, 89, 93i, 68i
79247Average114109i, 100i, 62, 83
82354Average45490, 103, 87, 76i
83771Average321102
88276Hard214125, 134, 114i, 145i
91869Hard584128i, 132, 122i, 126
93520Hard352124i, 117
But when I entered one of the schedules into the Lua script to have it executed, it desynced after the first segment. There was an error in my understanding of how the random floor schedule generation algorithm works. I understood that the algorithm does not allow repeating a floor if that same floor was already played within the most recent 20 floors of a schedule. What I did not understand is that this 20-floor history carries over across schedules, too. The first floor in a schedule cannot be the same as any of the most recent 20 floors played in previous runs, then the window shifts by 1 for the second floor, and so on. (The equality that matters for this purpose is the relative floor number; that is, the floor's index number minus the index of the first floor in the skill level. So floor 35 (index 5 in Easy) will block not only 35, but also 65 (index 5 in Average) and 115 (index 5 in Hard). This is the code that I initially misinterpreted:
generate_random_floor_schedule:
        31e7    1111c3  ld      de, heading_out_floor_schedule
                        ; Seed the RNG.
        31ea    fa77c3  ld      a, [frame_counter_copy]
        31ed    ea28cf  ld      [rng_state_a], a
        31f0    ea29cf  ld      [rng_state_b], a
        31f3    ea2acf  ld      [rng_state_c], a
        31f6    ea2bcf  ld      [rng_state_d], a
        31f9    0e63    ld      c, 99           ; Always generate a full schedule of 99 rooms.
next_random:
        31fb    cd5432  call    rng_next        ; Get a random 8-bit integer.
        31fe    6f      ld      l, a
        31ff    2600    ld      h, 0
        3201    cdc93e  call    heading_out_num_floors_for_skill_level
        3204    78      ld      a, b
        3205    cd2b1f  call    remainder_16b   ; Reduce the random number mod the number of floors in the skill level.
        3208    c61e    add     30              ; Add 30 to skip over Going Up? floors.
        320a    cd3c32  call    check_for_recent_duplicate_floors
        320d    feab    cp      0xab            ; 0xab is a sentinel that means a duplicate was found.
        320f    28ea    jr      Z, next_random
        3211    12      ld      [de], a         ; Assign next slot in heading_out_num_floors.
        3212    13      inc     de
        3213    0d      dec     c               ; Are we done with all 99 floors yet?
        3214    20e5    jr      nZ, next_random
        3216    fab9c2  ld      a, [heading_out_num_floors]
        3219    47      ld      b, a
        321a    fab8c2  ld      a, [game_mode]
        321d    fe01    cp      0x01
        321f    2808    jr      Z, copy_last_20_into_history
        3221    fabac2  ld      a, [heading_out_num_floors_remaining]
        3224    4f      ld      c, a
        3225    90      sub     b
        3226    3801    jr      C, copy_last_20_into_history
        3228    41      ld      b, c
copy_last_20_into_history:
        3229    78      ld      a, b
        322a    21fdc2  ld      hl, heading_out_floor_schedule - 20
        322d    cd6b24  call    add_16b         ; Add the selected number of floors to hl.
        3230    0614    ld      b, 20
        3232    11fdc2  ld      de, heading_out_floor_schedule - 20
                        ; Copy 20 bytes from heading_out_floor_schedule - 20 + num_floors
                        ; into the history buffer at heading_out_floor_schedule - 20.
copy_loop:
        3235    2a      ldi     a, [hl]
        3236    12      ld      [de], a
        3237    13      inc     de
        3238    05      dec     b
        3239    20fa    jr      nZ, copy_loop
        323b    c9      ret

check_for_recent_duplicate_floors:
        323c    d5      push    de
        323d    c5      push    bc
        323e    0614    ld      b, 20
        3240    4f      ld      c, a
        3241    1b      dec     de
check_loop:
        3242    1a      ld      a, [de]
        3243    b9      cp      c
        3244    2806    jr      Z, dup_found    ; Duplicate found, bail out.
        3246    1b      dec     de
        3247    05      dec     b               ; Done with all 20 yet?
        3248    20f8    jr      nZ, check_loop
        324a    1804    jr      no_dups         ; Finished all 20, no duplicates.
dup_found:
        324c    3eab    ld      a, 0xab         ; 0xab is a sentinel indicating a duplicate.
        324e    1801    jr      done
no_dups:
        3250    79      ld      a, c            ; Restore the caller's a register.
done:
        3251    c1      pop     bc
        3252    d1      pop     de
        3253    c9      ret
The important part is the copy_last_20_into_history label at the end of generate_random_floor_schedule. There is an array of 99 bytes at 0xc311 that I call heading_out_floor_schedule, which holds the randomly generated floor schedule. There is also a buffer that starts 20 bytes before heading_out_floor_schedule, which I'll call the "history" buffer. Initially, the history buffer is filled with 0xee, which is unequal to any floor number. I thought the purpose of this buffer was just to simplify the code of check_for_recent_duplicate_floors, allowing it to unconditionally look 20 bytes back, even when that goes before the beginning of heading_out_floor_schedule. But the loop at copy_last_20_into_history copies the 20 floors at the end of the schedule into the history buffer. (It's the 20 floors at the end of the schedule only up to the player's chosen number of floors, which may be less than 99.) So on the second and subsequent runs, the history buffer is no longer filled with 0xee, but contains floor numbers from the previous schedule, which affects the subsequent generation of floors, because of the no-repeat rule. It is not the case that there are only 60 possible schedules in each skill level, one for each seed, as was claimed in Post #527972. That's true for the first run, but after that each of the 60 seeds in each skill level will yield a slightly different random sequence of floors than it would have the first time. This is both good and bad. It will likely be possible to visit all unique floors in fewer than the 13 restarts that I had previously computed, because there are more opportunities to affect the RNG sequence. But the search space is greatly increased – we're no longer looking at just 180 basically static schedules – so it may not be possible to find a route that is definitely minimal.
Post subject: Timing transitions between segments
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Now that we have a list of 14 segments that together cover all unique floors, we have to decide what order to play them in. The order matters for two reasons. First, menuing. Settings carry over between segments, so transitioning from one Average segment to another Average segment, for example, is faster than transitioning from an Average to a Hard, because you don't have to go as deep into the menus. Second: RNG. Recall that the RNG seed is a frame counter mod 60. Depending on where we are in the RNG seed cycle after finishing a segment, we have to wait longer for some seeds that others. (At the same time, forced RNG waiting may make some extra menuing "free", as long as it doesn't come at the expense of something even faster.) I wrote a script to automatically measure the time to transition between each pair of segments. For each segment, it gets into the segment from the title screen (to find the transition cost if that segment happens to be chosen as the first segment), plays through the segment and records the time at the end, then (starting from a savestate immediately after solving the segment), measures how long it takes to get into every other segment. There are two ways to go through the menus and change settings between segments, and the script tries both. It also does every segment in both Diagonal and Bird's-eye views. It also measures the time to complete each floor and transition between floors within a segment. After running for about 9 hours, it outputs a table of 29×28×2 segment transition costs and 122 floor transition costs (there are 120 unique floors, 2 of which get played twice). These are segments.csv and floors.csv in the repository. Here's a sample of segments.csv. It says: from boot, it takes 524 frames to get into the (easy, birdseye, 30, 42) segment (earliest frame that gives the desired RNG seed). At the end of that segment, 17030 total frames have elapsed. To complete the (easy, birdseye, 30, 43) segment and transition into the (average, birdseye, 4, 11) segment takes 17417 frames if you use the B Button to traverse menus, or 17416 if you use the END menu. To complete the (easy, birdseye, 30, 43) segment and transition into the (hard, birdseye, 28, 34) segment takes 17440 and 17379 frames respectively for the B Button and END menu respectively.
from_skillfrom_displayfrom_num_floorsfrom_seedto_skillto_displayto_num_floorsto_seedreconfigureframes
easybirdseye3042524
easybirdseye304217030
easybirdseye3042averagebirdseye411b17417
easybirdseye3042averagebirdseye411menu17416
easybirdseye3042hardbirdseye2834b17440
easybirdseye3042hardbirdseye2834menu17379
Why test every segment in both Diagonal and Bird's-eye views? We know from #8905: Nitrodon, ZenicReverie, CyberShadow, Sand, Alyosha & Deadcode's GB Kwirk "Going Up?" in 15:15.89 that Bird's-eye view is preferable during gameplay, because pushing a block into a hole is 8 frames faster. In addition, Bird's-eye saves around 16 frames before the first floor, and between each pair of floors in a segment, as you can see in this excerpt from floors.csv (compare corresponding transition_frames between display=diagonal and display=birdseye):
prev_floorfloordisplayseednum_floorsfloor_seqtransition_framesfloor_frames
109idiagonal11413361101
109i100idiagonal1142230523
100i62diagonal1143229585
6283diagonal1144229259
109ibirdseye11413211101
109i100ibirdseye1142214499
100i62birdseye1143213585
6283birdseye1144213259
The only advantage of Diagonal is that it is 1 frame faster to choose at the SELECT DISPLAY menu—which might be enough to save an RNG cycle and therefore 60 frames. Of course, 16 frames of that gets immediately erased on entering the first floor in a segment, and the segment cannot be too long, or else the rest of the gain will be eaten up by longer floor transitions. (And of course, the segment cannot involve many instances of pushing a block into a hole.) It's unlikely, but possible, that using Diagonal display in some of the shortest segments could save time. In between segments, you have to go through the menus to change the settings for the next segment. The settings you may have to change are the skill level (Easy, Average, or Hard), number of floors, and view (Diagonal or Bird's-eye). How much time you spend in the menus depends on how many settings you have to change. In the special case where the skill level, number of floors, and view are the same in two consecutive segments, you don't have to enter the menus at all, you only have to wait for the right RNG seed. But there are two ways to navigate the menu. You can back up screen by screen using the B Button, or you can directly choose a screen to go to using the END menu. In either case, you then can only advance linearly through the rest of the screens. This is what the possible menu transitions look like graphically: The difference in time between the END menu and the B Button depends on how far back you have to go in the menu sequence:
If you need to change the skill level (SELECT SKILL menu),the END menu is 1 or 61 frames faster than the B Button
If you need to change the number of floors (SELECT COURSE menu),the END menu is 0 or 60 frames faster than the B Button
If you need to change the view (SELECT DISPLAY menu),the END menu is 2 or 62 frames slower than the B Button
I was surprised at seeing the numbers 1, 2, 61, and 62. It seems like every difference should always be a multiple of 60, because of the RNG seed cycle. Either the difference between the END menu and the B Button is small enough to hit the same cycle, or else one is one cycle slower. That is true, but as it turns out, the RNG seed variable is not an exact frame counter. It can fail to count vblanks in some situations, and moving between menus is one of them. This is why the END menu is usually faster than the B Button, because there are fewer total menu transitions. Only in the special case of having to visit only the final menu (SELECT DISPLAY) can it be faster to use the B Button. The next step is to load the segment transition costs into a traveling salesman solver to optimize the order of segments, in each case using whichever of the B Button or END menu gives the fastest transition between consecutive segments, and keeping the option of doing each segment in either Diagonal or Bird's-eye view.
Post subject: Left + Select cheat code
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
I just learned there's a cheat code in Heading Out? mode. If you hold the Left and Select buttons while confirming the "Select Display" menu, then the floor schedule will be overwritten to play all the floors in the chosen skill level, exactly once and in order. Each floor may still be inverted or not, but the cheat code skips seeding the RNG, so the inversion schedule is always the same as well. (Unless you've played an earlier round to set the seed.) This cheat code would obviously be the fastest way to play all floors in Heading Out? mode, but I'm not planning to use it in the TAS, because it would take away the fun of RNG seed routing. Also, while reverse engineering, I discovered a feature that I couldn't find documented. If you hold the B Button while playing, it disables movement auto-repeat. You have to release and press the D-Pad for each step. It's probably not useful for a speedrun.
Post subject: Re: 27 frames to push a block into a hole, not 28
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Sand wrote:
Pushing a block into a hole consistently takes 27 frames total, not the 28 frames reported at Post #527986.
Never mind, I was mistaken. The inputs are still spaced 28 frames apart in the movie file. It was an artifact of the way the script counts frames and how the script decides when an input has taken effect. In floor 56, for example, the 4 W moves in the middle have 2× block push, 1× hole fill, 1× step, the costs of which we think of as
exec	W	10
exec	W	10
exec	W	28
exec	W	9
But the script considers a move accomplished (and starts looking for the first possible frame for the next move) when Kwirk's position in memory changes, not when the hole is completely filled. So it ends up being the same total number of frames, just subdivided differently:
exec	W	10
exec	W	10
exec	W	10
exec	W	27
The script was just attributing the 18 frames to fill the hole to the next move, rather than the current move. The 27 comes from 18+9 (fill hole and take a step), not 10+17 (push block and fill hole).
Post subject: 27 frames to push a block into a hole, not 28
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
EDIT: This was incorrect, see Post #528398. The execution Lua script doesn't have move costs hardcoded. After making a move, it tries the next move at multiple offsets to find the earliest frame where an input has effect. It finds the move costs that are already known, e.g. 9 frames for a move into an empty square, 10 to push a block, 12 to push a turnstile, with one exception. Pushing a block into a hole consistently takes 27 frames total, not the 28 frames reported at Post #527986. For example, applying this patch to the Lua script to log move costs:
Language: diff

diff --git a/tas.lua b/tas.lua index 1886e26..19aa575 100644 --- a/tas.lua +++ b/tas.lua @@ -717,6 +717,7 @@ end local function play_floor(savestate, schedule) console.log("executing schedule", schedule) savestate = run_until_exec(savestate, nil, SWITCH_PLAYERS_ADDR) + local prev_frames for i = 1, string.len(schedule) do local code = string.sub(schedule, i, i) local button = ({ @@ -732,6 +733,11 @@ local function play_floor(savestate, schedule) end, function (s) return run_until_player1_moves(s, 16) end, true) + local frames = savestate_frame_count(savestate) + if prev_frames ~= nil then + console.log("exec", code, frames - prev_frames) + end + prev_frames = frames end return savestate end
results in this log for floor 56:
floor	56
executing schedule	WWWWWWWNWENEESWWWWSNWWWSWWWWWW
exec	W	9
exec	W	9
exec	W	9
exec	W	9
exec	W	9
exec	W	9
exec	N	9
exec	W	12
exec	E	9
exec	N	9
exec	E	9
exec	E	9
exec	S	9
exec	W	10
exec	W	10
exec	W	10
exec	W	27
exec	S	12
exec	N	9
exec	W	12
exec	W	9
exec	W	9
exec	S	9
exec	W	9
exec	W	9
exec	W	9
exec	W	9
exec	W	9
exec	W	9
Despite the thread for #8905: Nitrodon, ZenicReverie, CyberShadow, Sand, Alyosha & Deadcode's GB Kwirk "Going Up?" in 15:15.89 saying 28 frames, I checked the input file and it actually uses 27 frames. I didn't see anything to improve there. I re-ran both the per-floor and route optimizers. The frame count for a few floors decreases slightly, but the solutions don't change in any significant ways. The RNG seed segments remain the same.
Sand
He/Him
Experienced Forum User, Published Author, Player (143)
Joined: 6/26/2018
Posts: 175
Nice analysis, thanks for the writeup.
1 2
6 7