SimTower Reverse Engineering – Part 2

With the header sorta figured out, and unit data partially figured out, there’s still a lot more file that hasn’t been determined yet.

Immediately after the last floor’s unit information, the game does a 4 Byte read, followed by some number of 16 Byte entries. As I quickly suspected, the number of entries is exactly the same as those 4 Bytes interpreted as an integer. Perfect, now we at least know the structure of another large chunk of the file.
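A count-prefixed block like this takes only a few lines to read. This is a minimal sketch, assuming the 4 Byte count is an unsigned little-endian integer (the byte order noted in Part 1); the function name and the synthetic sample data are placeholders for illustration, not anything taken from the game or the original parser.

```python
import io
import struct

ENTRY_SIZE = 16  # each entry after the count is 16 bytes long


def read_counted_entries(stream):
    """Read a 4-byte little-endian count, then that many 16-byte entries."""
    (count,) = struct.unpack("<I", stream.read(4))
    entries = [stream.read(ENTRY_SIZE) for _ in range(count)]
    if any(len(entry) != ENTRY_SIZE for entry in entries):
        raise ValueError("file ended before all entries were read")
    return entries


# Synthetic example: a count of 2 followed by two 16-byte entries.
sample = struct.pack("<I", 2) + bytes(range(16)) + bytes(range(16, 32))
entries = read_counted_entries(io.BytesIO(sample))
```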

Next Parts

But what do we do with this? I started by dumping the values into a terminal to see if I could spot any patterns. Two things jumped out at me: the first and third values increment. The first goes up from 0 to 113, which is the count of floors in the building that have anything on them (including the cathedral’s multiple stories above floor 100). So we’ve got entries grouped per floor, it seems. Some floors seem to have no entries, which appear to be lobbies or otherwise completely empty floors.

Next, I started looking for a pattern in the number of entries on a floor and what’s on that floor. Quickly, I saw that there were 30 entries for floors with 10 condo units on them. This suggests a condo has 3 entries. Knowing the game, I know that a condo has a population of 3 people, so these must be entries related to people!

Checking a floor with 19 offices, there are 114 entries. Offices have 6 people per office, so that’s what this data structure is for. On to the contents of the data structure, past the two that were immediately apparent. The next pattern I spot is that the second byte is also incrementing, and seems to be the index of the unit on the floor. Now we’ve got 3 / 16 Bytes figured out. What’s next?
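The per-floor and per-unit tallies described above (30 entries for 10 condos, 114 for 19 offices) can be reproduced with a counter. This is only a sketch: it assumes the first byte of each 16 Byte entry is the floor and the second is the unit index on that floor, as inferred in the text, and the sample entries are made up.

```python
from collections import Counter


def people_per_floor(entries):
    """Count 16-byte people entries by their floor byte (byte 0)."""
    return Counter(entry[0] for entry in entries)


def people_per_unit(entries):
    """Count people entries by (floor byte, unit index byte)."""
    return Counter((entry[0], entry[1]) for entry in entries)


# Synthetic data: floor 12 holding two condos with three people each.
# The remaining 14 bytes are zero filler, not real save data.
sample_entries = [
    bytes([12, unit]) + bytes(14)
    for unit in range(2)
    for _ in range(3)
]
floor_counts = people_per_floor(sample_entries)
unit_counts = people_per_unit(sample_entries)
```

Dividing a floor’s total by the number of units on it gives the people-per-unit figure that pointed at condos (3) and offices (6).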

I name a couple people and use the in game tools to find them throughout the building, and see that byte 7 seems to be the current floor they’re on, and byte 5 looks suspiciously like bit-flags, even if I don’t know what exactly they mean yet. They may show things like if a person is in the building or not, sleeping, etc. I think the last four bytes are two 16 bit integers, and these may be storing the stress and eval(uation), but these don’t look consistent.

What’s After People?

I decide to put the people data aside and see what’s next. The ProcMon CSV (explained in Part 1) shows a read of 9,216 Bytes, with no read for length before it. This suggests to me that this is a static sized block. It’s divisible by 16, but 576 isn’t a nice “round” number, not like 512. Seeing that this is close to 512, I try the next larger even number. 9,216 / 18 is 512. There’s our nice round number.
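The same arithmetic can be automated for any fixed-size block by listing every entry size that divides it evenly. A small sketch, with an arbitrary search range of 2 to 64 bytes chosen purely for illustration:

```python
def candidate_entry_sizes(block_len, smallest=2, largest=64):
    """Return (entry_size, entry_count) pairs that divide block_len evenly."""
    return [
        (size, block_len // size)
        for size in range(smallest, largest + 1)
        if block_len % size == 0
    ]


candidates = dict(candidate_entry_sizes(9216))
# candidates[16] is 576 and candidates[18] is 512
```

The list only narrows the options. Picking 18 over 16 still relies on knowing that 512 is a more plausible table size than 576.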

From this, I can strongly infer that I’m dealing with a fixed length of 512 entries, each 18 bytes in size. Maybe they have a similar structure to our unit structure, which also has an 18 byte size. I can also see from a hex editor that not all of them are full. Where else do I have a similar number? I know the count of commercial units (shop, restaurant, fast-food place, etc.) is 419. I’m betting that I’ll have 419 complete values, and the rest as empty placeholders/padding. Let’s see. I’ve included some entries from the section, with their decimal values shown.

0: [40, 0, 2, 14, 50, 25, 11, 24, 35, 21, 50, 0, 220, 255, 0, 0, 46, 0]
127: [56, 1, 2, 30, 50, 25, 8, 27, 35, 18, 50, 3, 220, 255, 0, 0, 48, 0]
255: [71, 1, 1, 27, 50, 10, 6, 29, 35, 5, 30, 4, 220, 255, 0, 0, 29, 0]
383: [42, 8, 1, 3, 30, 10, 12, 13, 25, 3, 0, 9, 230, 255, 0, 0, 13, 0]
511: [255, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

One of the things I do with values is stick them in a dictionary to see what sorts of results I get, and how many.

from collections import defaultdict

# commercial_list is a list of "unparsed" commercial objects, each holding the 18 values shown above.
# values[i] maps every value seen at byte i to the number of entries it appears in.
values = [defaultdict(int) for _ in range(18)]
for comm in commercial_list:
#    if comm.values[0] == 255:  # uncomment to skip empty entries
#        continue
    for i in range(18):
        values[i][comm.values[i]] += 1
# And then a quick and dirty print to see the results.
for i, e in enumerate(values):
    print(f"{i} ({len(e)}):")
    print(', '.join(f"{k}: {v}" for k, v in e.items()))
    # Or sorted with values only, and not counts. Equivalent to using a set().
    print(', '.join(str(k) for k in sorted(e)))

I’m not going to include the full output, but I can quickly see a few things. Most evident is that the values for the first byte range from 1 to around 100, which suggests that this is a floor number. Except several entries are FF (=255), and these are otherwise empty. How many non-empty values do we have? 419! Yep, this is additional metadata related to commercial tenants, separate from the data in the Units data. Likely this stores their profits, eval, etc. Given that we know the floor now, let’s see what else we can find out.
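Counting the non-empty entries is a one-liner once the block is split up. The sketch below assumes the 9,216 Byte block has already been read into memory and treats a first byte of FF as the empty marker, as described above; the names and the synthetic block are illustrative only.

```python
ENTRY_SIZE = 18
ENTRY_COUNT = 512
EMPTY_MARKER = 0xFF  # first byte of an unused entry


def split_commercial_block(block):
    """Split the fixed-size block into 512 entries of 18 bytes."""
    if len(block) != ENTRY_SIZE * ENTRY_COUNT:
        raise ValueError("expected a 9,216 byte block")
    return [block[i:i + ENTRY_SIZE] for i in range(0, len(block), ENTRY_SIZE)]


def count_used(entries):
    """Count entries whose first byte is not the empty marker."""
    return sum(1 for entry in entries if entry[0] != EMPTY_MARKER)


# Synthetic block: 419 used entries followed by 93 empty placeholders.
used = bytes([40, 0, 2]) + bytes(15)
empty = bytes([255, 0, 3]) + bytes(15)
block = used * 419 + empty * 93
commercial_entries = split_commercial_block(block)
used_count = count_used(commercial_entries)
```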

I can also see that the 15th, 16th and 18th (last) bytes are 0, at least in this test tower. I see that byte 3 only has the values 0, 1, 2 and 3. Knowing other things about the game helps me look for other values. I know there are 5 fast food places, 5 restaurants and 11 different shops. None of the bytes has 21 different entries, but the 12th byte has 11 entries, ranging from 0 to 10. Perhaps this is the index of which variant, which is supported by there being many more entries with 0-4 than the other 6 values, which are nearly identical. So I think this byte is the unit’s variant.

How do I check this easily? I used my thumbnail generation code from before and made the unit look up its entry in the commercial data section by its index, and then use the variant to choose a colour. In this case, I generated 11 different HSV colours, converted to RGB and used those. From this, I can see this is indeed the variant, and from looking at the game, I can see what each value refers to.
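Generating a palette like that needs nothing beyond the standard library. The sketch below spaces 11 hues evenly around the colour wheel at full saturation and brightness; the exact hues the original thumbnails used are not known, so treat the spacing as an assumption.

```python
import colorsys


def variant_palette(count=11):
    """Return `count` evenly spaced hues as 8-bit (R, G, B) tuples."""
    palette = []
    for i in range(count):
        r, g, b = colorsys.hsv_to_rgb(i / count, 1.0, 1.0)
        palette.append((round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = variant_palette()
# palette[variant] is then the fill colour for a unit with that variant byte
```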

But something is wrong. That red block on the bottom left? Those three aren’t all the same, so something is going on here. This looks correct overall, but it seems that the order things were built in matters. There don’t appear to be pointers back to the units in the Commercial Data Section, and there isn’t anything in the Unit data structure. But there’s that 188 Bytes for each floor. I divide 188 by 4, but 47 isn’t a number that means much in SimTower. 188 by 2 is 94, which is much more interesting. 375 tiles (the width of a floor) / 4 tiles for the smallest unit (a single hotel room or parking stall) gives 93.75 maximum units on a floor. So, maybe that 188 bytes is a 2 byte count, and up to 93 remapping pointers?
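The arithmetic, and one way to view the 188 bytes as 2 byte values, can be sketched as follows. Reading them as 94 unsigned little-endian integers is an assumption for illustration; what the values actually mean is still unresolved.

```python
import struct

FLOOR_WIDTH_TILES = 375
SMALLEST_UNIT_TILES = 4
TABLE_BYTES = 188

max_units = FLOOR_WIDTH_TILES / SMALLEST_UNIT_TILES  # 93.75
slots = TABLE_BYTES // 2                             # 94


def read_u16_table(raw):
    """Interpret the 188-byte per-floor block as 94 little-endian uint16s."""
    if len(raw) != TABLE_BYTES:
        raise ValueError("expected 188 bytes")
    return list(struct.unpack("<94H", raw))


# Synthetic table: values 1 and 2, then zero filler.
table = read_u16_table(bytes([1, 0, 2, 0]) + bytes(184))
```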

Doing them in order allowed me to figure out the indices, but now I have the issue that they’re not correctly re-mapped. So how does the re-mapping table work? There isn’t an initial count byte, so it looks like there are 94 possible 2-byte counts here.

After trying to figure out the mapping, I decided to move on, but will revisit the section. It’s tantalizingly close, but having more entries in it than units on a floor, as well as repeated entries, suggests that it’s not as straightforward as I’d hoped.

Elevator Data

Next thing in the data we have is a repeated data structure composed of a 194 Byte read, a 480 Byte read, two 120 Byte reads and then up to 29x 324 Byte reads and up to 8x 346 Byte reads. This was all repeated 24 times for a building full of elevators. The game allowing only 24 elevators is a well known limitation, so this is clearly all of the elevators.

Normal and service elevators can be 29 floors tall and have a maximum of 8 cars, so these segments likely store that information about an elevator shaft. This is further confirmation that these are indeed elevators. For a tower with fewer than 24 elevator shafts, only the 194 Byte header exists for the unused shafts, so somewhere inside this block is metadata related to height. 120 Bytes is probably 120 flags related to floors.

Looking at a building full of elevators, I can see that a maximum height elevator has 29 of the 324 Byte entries and minimum height elevator has 1. I can also see that the last 8 bytes in the header are an elevator car’s home floor, with the value being the start floor of the elevator if those cars don’t exist.

Further investigation reveals a count of cars, starting tile from the left the elevator is on, and the top and bottom floors. But what did I do to figure this out? I made a change, such as adding another car or changing the elevator height to see what changed.
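The change-one-thing-and-compare approach boils down to a byte-level diff of the same block from two saves. A minimal sketch, assuming both blocks are the same length; the offset and values in the example are made up and do not correspond to a real field.

```python
def diff_bytes(before, after):
    """Return (offset, old, new) for every byte that differs."""
    if len(before) != len(after):
        raise ValueError("blocks must be the same length to compare")
    return [
        (offset, old, new)
        for offset, (old, new) in enumerate(zip(before, after))
        if old != new
    ]


# Synthetic example: a 194-byte header where one byte changes from 0 to 2.
before = bytes(194)
modified = bytearray(before)
modified[10] = 2  # hypothetical offset, purely for illustration
changes = diff_bytes(before, bytes(modified))
```

Keeping each in-game change as small as possible (one extra car, one extra floor) keeps the diff short enough to read by eye.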

Sure looks like elevators. Black is normal elevators, red is service and blue is express elevators. Express elevators have a slightly different format. They only have a floor data structure for each floor they can stop at, not all of the floors they cover. This is likely because in the game, they only stop on underground floors, and at floors 1, 15, 30, 45, 60, 75, and 90 (which all can be sky-lobbies).

From there, the only other data in the header that isn’t determined yet is a 56 byte segment in the middle. There appear to possibly be 4 sub-sections that are 14 bytes each. One segment is all 5s, which is the default number of floors for an elevator car to service. Which means that this is storing the configuration of the elevator scheduler in the elevator properties window. Changing the settings confirms this, but there are only 6 periods to configure. This could mean that there’s a 7th hidden period, or more likely in my opinion, this was changed at some point in development.

With that figured out, the elevator header is done. But what about the next 480 bytes, or the two sections of 120 bytes? Those look suspiciously like information about floors. What happens if we take the info and assign every 4 numbers a colour and generate a 4 x 120 byte image for the 480 byte segment? This could be a status indicator of some sort for each car, so perhaps we need to split into 8x 4 bit values. Let’s see what that looks like too.
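Both views of the 480 Byte segment are simple reshapes. This sketch splits the block into 120 rows of 4 bytes, and then splits a row into eight 4 bit values; emitting the high nibble of each byte first is an assumption, since the text does not say which order was used.

```python
def rows_of_four(block):
    """Split the 480-byte segment into 120 rows of 4 bytes."""
    if len(block) != 480:
        raise ValueError("expected a 480 byte segment")
    return [block[i:i + 4] for i in range(0, 480, 4)]


def nibbles(row):
    """Split each byte of a row into its high and low 4-bit values."""
    values = []
    for byte in row:
        values.append(byte >> 4)    # high nibble (assumed to come first)
        values.append(byte & 0x0F)  # low nibble
    return values


# Synthetic segment: one non-zero row followed by zero filler.
rows = rows_of_four(bytes([0x12, 0x34, 0xAB, 0xCD]) + bytes(476))
first_row_nibbles = nibbles(rows[0])
```

Each row, or each list of nibbles, can then be mapped to colours one pixel per value to get the kind of image described here.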

That certainly looks like it has something to do with elevator car statuses. But what exactly? The values don’t seem to match those shown in the elevator’s status section, but there is definitely a pattern there. I see repeated patterns at lobby floors, as well as underground (but not on B10, which can only be used for the Metro line’s tunnel). But also similar patterns between the express elevator and the normal elevator, especially where they overlap. Is this from people on those floors wanting to get somewhere? Something else? I’m still not sure, but being able to visualize data like this really helps.

Moving on, next I looked at the 324 byte floor data segment. The first thing I notice is that 324 bytes would be an even 80 entries of 4 Bytes, plus a single 4 Byte header, and this is exactly what this structure looks like. I could see that the values looked exactly like IDs in the people data segment. Closer inspection indicated that after the header, there are two independent segments. I noticed this because on the bottom floor of the elevator, one half was completely empty, with the same being true of the other half on the top floor. On floors not serviced, both were empty. But what about the header? It looks like 4 single byte values? A quick look at the game showed that the first and third values were the number of people waiting on the left and right of the elevator, or going up and down, respectively, and that this count capped at 40.
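That layout translates directly into a parser. The sketch below assumes the 4 Byte entries are unsigned little-endian integers, and leaves the two halves unnamed because the text does not say which half belongs to which direction; the key names are illustrative.

```python
import struct


def parse_floor_queue(segment):
    """Split a 324-byte elevator floor segment into header and two halves."""
    if len(segment) != 324:
        raise ValueError("expected a 324 byte segment")
    header = struct.unpack("<4B", segment[:4])
    first_half = struct.unpack("<40I", segment[4:164])
    second_half = struct.unpack("<40I", segment[164:324])
    # header[0] and header[2] are the two waiting counts described above.
    return {"header": header, "first_half": first_half, "second_half": second_half}


# Synthetic segment: 2 people waiting in one queue, 1 in the other.
segment = (
    bytes([2, 0, 1, 0])
    + struct.pack("<40I", 7, 9, *([0] * 38))
    + struct.pack("<40I", 5, *([0] * 39))
)
parsed = parse_floor_queue(segment)
```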

With the elevator data mostly sort-of decoded, I decided to move on to the next segments. I’m getting close to the end of the file, so there isn’t a whole lot else. I’m expecting data for the finance window, stairs and escalators and similar, though perhaps some of this is stored in that 490 Byte block at the beginning that I skipped.

The Next Segments

I can see a read of 88 bytes, of 132 bytes, of 12 bytes and of 42 bytes. SimTower does use 32 bit integers in some places, but the game is still really a 16 bit game, so even in places the save uses 32 bit integers to store the data, the numbers never get that large. This means that something that looks like XX XX 00 00 in the game file is usually a giveaway to interpret this as a 32 bit integer. The first and second entries look like this, while the third and fourth don’t.
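The XX XX 00 00 giveaway is easy to turn into a quick check. This is a rough heuristic sketch, not a definitive test: it only recognises small non-negative values, and a block of 16 bit values that happen to alternate with zeros would pass as well.

```python
def looks_like_small_u32s(block):
    """True if every 4-byte group has its upper two bytes set to zero."""
    if not block or len(block) % 4 != 0:
        return False
    return all(
        block[i + 2] == 0 and block[i + 3] == 0
        for i in range(0, len(block), 4)
    )


# Synthetic examples: two small little-endian values, then arbitrary bytes.
small_values = bytes([0x2C, 0x01, 0x00, 0x00, 0x10, 0x27, 0x00, 0x00])
arbitrary = bytes([1, 2, 3, 4])
```

A 42 byte block fails immediately because its length is not a multiple of 4.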

The second entry has numbers that match the finance window, so that’s easily decoded.

I got sidetracked while looking at that segment, and I found that the next segment, at 1026 bytes long, had an initial value that was what looked like a 2 byte count, and then an increasing index up to that count. I looked at what else shared that count, and it was the number of parking stalls in the tower. So this stores some information related to the parking stalls. Once I got past the basic structure, which appears to be a count of connected stalls, verified by removing a stall and seeing that the count of stalls with red ‘X’s in them was subtracted here, the rest appeared to be a 2B index value. However, once I removed and added a stall, and checked, the values got a bit weird, so I’m not sure what this does.

Next comes a 22 Byte long block, which is mostly empty, so maybe this is padding or a placeholder?

After that come 64x 10 Byte blocks. There can be 64 stairs and escalators in the game, so that’s what’s stored here, as there isn’t much else left in the file, and I haven’t found it anywhere else yet.

Looking at the actual structure, the first byte is 01 if there’s a set of stairs or an escalator built, and the second appears to indicate what is built. Interestingly, 0 is escalator, so maybe they were added first. There are 6 values in total across the stair and escalator variants. The next two bytes are the same for all the stairs/escalators in one test tower, and they appear to be how far from the left side it is. The next byte is the start/bottom floor, though this is potentially two bytes.

The next set of two bytes, or single byte and second padding/other byte, appears to be the count of people going up and down the escalator respectively. How did I figure this out? Well, I guessed that the number of people shown in the game must be stored somewhere, and like the elevator cars, the total number of people should be stored inside this segment. But I had an escalator with 14 people on it, and I didn’t see 14. After staring at it a bit, I realized that 9 and 5 equal 14. Sure enough, this value matched on all the stairs and escalators I checked. I figured out the direction by looking at the counts first thing in the morning just after the fast food places opened, and people were only going up from my floor 1 lobby, via escalators, to them.

Final Bits

After the escalator/stairs section are 8 segments of 484 Bytes each. This looks suspiciously like a 4-Byte header, and 120 entries afterwards, one for each floor. Each entry might be a 4B value, but it could also be 4x 1B or 2x 2B. I didn’t have much luck decoding this one, other than to note that the first 4 Bytes are a header, because it’s a specific value if the rest of the entry is empty, and that the 120 values don’t look like 4 Byte integers. I’ll need to poke at this some more, but it looks like something that maybe isn’t exposed directly in the game and is instead internal simulation related.

Next are 10x 2 Byte entries. My first thought is security offices, as there are a maximum of 10 of these. There are also a maximum of 10 medical clinics, but security offices are treated differently by the game, so it makes sense that these would be noted separately, even though they’re stored in the Units Data section as well. And sure enough, each value is either -1 or the floor that a security office is built on.
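Since -1 appears as a value, the ten entries decode naturally as signed 16 bit integers. A minimal sketch, assuming little-endian byte order; the floor numbers in the example are made up and are returned as stored, without converting them to in-game floor labels.

```python
import struct


def parse_security_floors(raw):
    """Decode 10 signed 16-bit values and drop the -1 (unused) slots."""
    if len(raw) != 20:
        raise ValueError("expected 10 entries of 2 bytes")
    values = struct.unpack("<10h", raw)
    return [value for value in values if value != -1]


# Synthetic block: two security offices, eight unused slots.
raw_security = struct.pack("<10h", 15, 40, *([-1] * 8))
security_floors = parse_security_floors(raw_security)
```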

There are still some more sections to go. I see a bunch of 6 Byte long entries, 10x 4 Byte entries (medical clinics), 16x 12 Byte entries, an 80 Byte entry and a 40 Byte entry. After that are three entries that seem the same length in a few towers I looked at, which are a 4,354 Byte entry, a 2,114 Byte entry and a 3,234 Byte entry. I have no idea what these store, but it’s probably more internal simulation state. A cursory inspection didn’t really reveal any structure, but a more thorough investigation may show something.

At the very end, there’s an 8 Byte read (of what seems to be empty data) and then 16B entries for named entities in the game. I’m not entirely sure how the entries are mapped, but the first entries are for named units, and the rest are for named people in the tower.

Ending Thoughts

I was very quickly able to determine most of the overall structure of the file, but things got more difficult towards the end of the file where there were lots of blobs that weren’t structured in a way that made their usage apparent.

My approach of figuring out the reads the game was doing and then looking at the data those reads contained really helped. I’ve looked at newer games that just load the entire file into memory and parse it, and they’re much more annoying to reverse engineer.

I’ve also poked at reverse engineering other games that I was less familiar with, and knowing things like there can only be 512 commercial units in a building really helped when I had a section that was a multiple of that length long.

There’s still a lot to be determined, and lots of unknown values sprinkled in the documentation, but overall, I got a large proportion of the file format figured out. As was the case for my SimCity 2000 city format reverse engineering project, the first 90% takes 10% of the time, and the remaining 10% takes 90% of the time.

I’ll need to decide whether or not I’m interested in grinding out more of the documentation on the format, but the documentation is open source on GitHub, so other people can always use it as a basis and open pull requests if they discover something new. But there are still parts I skipped that seem like they’d be relatively easy to do, so I’ll probably do some more work on this before I set it aside for whatever my next project is.

Or I’ll start a re-implementation project of the game. No guarantees…

More Reverse Engineer – SimTower

Another game from my childhood that I played a lot was SimTower. Never to the same extent as SimCity 2000, but still a fixture of my game playing time. Note that I’m writing this as I work on the reverse engineering, and I plan to update with more information on my process.

Someone in a Discord was commenting that they would like to see a viewer (and maybe editor) like the one I’ve created for SimCity 2000. I decided to take a crack at the .TDT (Tower DaTa?) format SimTower uses to store the towers, especially given I’ve got a decent amount of experience reverse engineering old game formats like what SimTower uses.

First Steps

The first step was to get the game working. I found that Winevdm worked pretty well, but I needed to copy WAVEMIX.INI from my CD into Winevdm’s WINDOWS folder, and WAVEMIX.DLL into WINDOWS\SYSTEM in order for the game to start. With that out of the way, the game started.

Now, one of the tools I used to help reverse engineer the .SC2 file format was ProcMon, because old games frequently loaded small chunks of the save file into memory, likely to save on relatively limited RAM resources as they may be stored in a more RAM efficient data structure when in use, and serialized to something that’s better for that (or not).

SimTower was luckily no different, giving me my first hints as to what the file format looks like. I’ve attached a CSV output from ProcMon, as an image from VSCode so that it’s a little nicer coloured below.

A colourized CSV from ProcMon showing the process reading from the tower file, listing offsets into the file and lengths of the read.
Part of the CSV output of ProcMon after filtering. “ReadFile” means that it’s reading a file, the next part is the path to the files and “SUCCESS” means the read was successful and returned data. But the interesting part is the offset into the file and the length.
The small, short reads are generally indicative of reading a value into a single variable. The repeated pattern is also really interesting, as it suggests repeated entries of the same data structure.

Floor Data

Immediately I can pick out a few things. The first 70 bytes look like a header, because it was common for games of the time to load various single variables individually, especially if they store counts or lengths of later values. Inspecting the value of a few towers confirms this is probably the case. I’ll touch on the header later. There also seems to be another 490B block common in most tower files I looked at. Past that, there’s a suspicious repetition of 360 reads following the same format. 6 Bytes, variable bytes and 188 Bytes.

Note: The game save is little-endian, the native byte order on x86 systems, so multi-byte values are stored in the reverse order of big-endian, which I’m more used to. In this case, I looked at the hex representation of several of the 4 Byte values, and saw XX YY 00 00, which is the ordering I’d expect for a small number stored little-endian. Big-endian would be `00 00 YY XX` instead, with the unused data on the left and the byte order reversed.
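The difference is easy to see by decoding the same four bytes both ways. The bytes below are an example value chosen for illustration (375, the floor width that shows up later), not a quote from a specific offset in a save.

```python
raw = bytes.fromhex("77 01 00 00")

little = int.from_bytes(raw, "little")  # 0x00000177
big = int.from_bytes(raw, "big")        # 0x77010000

print(little, hex(big))
```

The little-endian reading gives a small, plausible number, while the big-endian reading gives a value far larger than anything the game would need to store.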

As a guess, 360 reads means 120 actual values. SimTower has at least 113 floors (10 underground, 100 normal floors and an additional 3 floors for the cathedral on the 100th floor), so a few extra to pad out the window to 120 seems reasonable. I also note that each variable length piece is always a multiple of 18 Bytes long, so my first guess is that each of these is a “unit” on a floor.

The 6 Bytes repeated screams “row header” to me, and after creating a tower with 110 empty floors (save for the first floor lobby), I see my first pattern. I know, from counting money spent building a full width floor, that I have 375 horizontal tiles to work with. In my empty tower, each of the 6 Bytes looks like 01 00 00 00 77 01, which interpreted as 3 16 bit integers is 1, 0, 375. Given that the floors cover the whole width, the second and third values appear to be the start and end of the floor. Inspecting several other saves confirms this, and I also noticed a pattern in the first value. It’s the count of the number of 18B entries after it.
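Put together, one floor record can be read like this. It is a sketch under the assumptions in the text: three unsigned little-endian 16 bit values in the header (count, start, end), then that many 18 Byte units, then the 188 Byte block whose meaning is still unknown. The dictionary keys are illustrative names.

```python
import io
import struct

UNIT_SIZE = 18
TRAILER_SIZE = 188


def read_floor(stream):
    """Read one floor: 6-byte header, count x 18-byte units, 188-byte trailer."""
    count, start, end = struct.unpack("<3H", stream.read(6))
    units = [stream.read(UNIT_SIZE) for _ in range(count)]
    trailer = stream.read(TRAILER_SIZE)
    return {"start": start, "end": end, "units": units, "trailer": trailer}


# Synthetic floor using the header bytes from the empty tower,
# with zero filler standing in for the unit and trailer data.
header = bytes.fromhex("01 00 00 00 77 01")
floor = read_floor(io.BytesIO(header + bytes(UNIT_SIZE) + bytes(TRAILER_SIZE)))
```

Calling `read_floor` 120 times in a row on the real stream would account for the 360 reads seen in ProcMon.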

Which means yes, we’ve got floor data here. There’s still a lot of data that isn’t accounted for. People’s and businesses’ nicknames (the game allows setting names for people and businesses, and these appear to be stored at the end of the file), elevators, various simulation variables, etc.

Unit Data

Now that we know, at minimum, a unit is given by an 18Byte structure, we can start sussing that out. Right away, I see that my empty tower has one unit, and the data structure mirrors the 6 Byte header, with the first 4 Bytes being 00 00 77 01 followed by 00 or 24 for my height 1 lobby, which suggests that this is a unit type field.
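The part of the unit structure identified so far can be decoded from the first five bytes. This sketch assumes two unsigned little-endian 16 bit values (start and end tile) followed by a single type byte; the type value in the example is just a sample, and the other 13 bytes are left untouched.

```python
import struct


def parse_unit_prefix(unit):
    """Decode start tile, end tile and type from an 18-byte unit entry."""
    if len(unit) != 18:
        raise ValueError("expected an 18 byte unit entry")
    start, end, unit_type = struct.unpack_from("<HHB", unit, 0)
    return start, end, unit_type


# Synthetic unit: full-width (0 to 375) with a sample type byte of 24.
sample_unit = bytes.fromhex("00 00 77 01") + bytes([24]) + bytes(13)
prefix = parse_unit_prefix(sample_unit)
```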

If I know start and end of a floor, as well as start and end of a unit and the unit’s type, then I can start figuring out what values in the field correspond to the type in the game. Rather than squint at a blast of text in a proof-of-concept Python parsing script, I decided to write a simple image generator using Pillow. Here’s an example of a 5 star tower I built to test things out.

A colourful schematic view of the floor data in a tower's data file.
A 100 story tower with empty space in blue, lobbies in purple, yellow for shops, grey for fast food, green for restaurants, salmon and cyan for hotel rooms, turquoise for offices, brown for condos and other colours for other things. Referring between this and the game is easy to figure out the index mapping.

And just like that, with randomly chosen colours, we can see a tower take shape. The smallest “block” in SimTower is a 10×45 pixel tile (or so), so I made each block 1×5 pixels in this image to maintain a similar aspect ratio. Immediately I can see the blank space (blue), various commercial units, the lobbies, the Metro station (3 stacked colours in the bottom right, as well as the Metro tunnel at the very bottom), the Cathedral’s 5 stories at the top and various other units. This allowed me to quickly get the type values for the rest of the units, because I could just look at them in the game.

Looking at this simple graphic can also be useful to figure out what other parts of the data structure mean. For example, here’s another one. For those who play the game, can you guess what the various colours mean?

A colourful schematic view of the pricing/rate data in a tower's data file.
Pricing modeled after the in-game map/overlay. Default is “average” pricing, or yellow. High is red, low is green and very low is cyan. The top part was set for testing, but I had some shops stubbornly empty due to low traffic unless I dropped the pricing.

It’s game pricing of the units. Yellow is average price, red is high, green is low and cyan is very low. I set these colours after I figured it out, to mirror what was in the game, but it made visually figuring out that part of the data structure very easy.

Others can be more perplexing, such as this one showing a specific unknown value in the Unit data structure for shops. Perhaps this is an index somewhere else, as the values look largely unique. But being able to see data like this makes finding patterns so much easier.

A colourful schematic view of all the shop unit type's data in a tower's data file.
Unknown data, with each entry coloured a random colour, or gray for the build footprint of the tower.

Header Data

Okay, but now what about that header? Here I use another debugging tool, CheatEngine, to inspect the running memory of the game. While CheatEngine may be designed to allow cheating at games, I’ve almost exclusively used it as a debugging tool, as there aren’t that many easy debugging tools with a similar feature set on Windows.

Unlike SimCity 2000, which had a normal start address, SimTower running under Winevdm made for a little more work. But not much. I started by looking for the values in the save file, saving the game a couple times and scanning for the new value in CheatEngine. By opening towers with different ratings, I quickly found that the first value was the tower’s rating (SimTower towers start at 1 star and progress through 5 stars to finally receive a tower rating). I also saw, without needing CheatEngine, that some of the other values in the save referred to money and budgetary values (the game displays money values multiplied by $100, so $100 on screen is stored as 1).

I played the game and watched the values change, quickly finding a value tracking number of commercial units (fast-food, restaurant and shops), one for parking stalls, one for recycling facilities, one for security facilities as well as a few others. As of when this was written, I don’t have them all figured out.

One thing I noticed while looking is that the game runs a varying number of ticks per hour of in-game time. The one hour period between Noon and 1:00PM has 800 ticks out of a total of 2600 ticks for the whole day associated with it. Meanwhile, the whole period between 1:00AM and 7:00AM only has 200 ticks of simulation time. When I was playing the game as a child, I assumed that was just because there was more to simulate during the day, not that the game had lower fidelity simulations at night. I guess lunch time being exceptionally busy is a 90’s Japanese office-worker thing that I was never exposed to. This also means that the day starts at 7:00AM, at least as far as the game is concerned, because this is when the tick counter rolls over.

I’ve attached the whole 70 Byte header in CheatEngine with a 5-star tower with $115,300,900 in cash, paused, to show what various values look like. I added the labels myself after I figured them out, or noted it as unknown. Note that the addresses shown are likely not going to be the same after an app restart.

Final Thoughts

I got a lot more done a lot quicker than I expected, likely because I’ve spent quite a bit of time getting good at this specific sort of reverse engineering from the SimCity 2000 reverse engineering project. I’ve started a GitHub repo with my findings here, which includes my sample parsing code in all its gory glory. It’s quick, dirty, and gets the job done.

I’ll write a part 2 as I work on the rest of the file. Elevators and a bunch of other information still need to be determined, and I’d like to share more about the more fiddly aspects of reverse engineering, but this post is enough for tonight.

I’ve added more to part 2, here.