Apple II Monitor ROM Disassembly

The Apple II Reference Manual includes the source code for the original monitor ROM, starting on page 155, and the autostart monitor ROM, starting on page 136. The former shipped in the original Apple ][, the latter in the Apple ][+.

As an exercise, I loaded the ROM images into 6502bench SourceGen and reproduced the contents. This is a fairly faithful rendition, and provides little in the way of additional commentary or improved formatting. It does, however, make it possible to search, and you can use SourceGen's cross-reference features to see how things connect.

An excellent source of information on the Apple II monitor is the book "Apple II Monitors Peeled", published by Apple Computer in 1981.


The Oft-Misunderstood WAIT

The explanation of how long the WAIT routine at $FCA8 takes to run is incorrect in multiple sources. For example, the original monitor ROM listing says:

fcaa: e9 01        WAIT3       sbc     #$01            ;1.0204 usec
fcac: d0 fc                    bne     WAIT3           ;(13+27/2*A+5/2*A*A)

Neither comment is correct. The official Apple documentation, Apple II Monitors Peeled, says:

  2.5A**2 + 13.5A + 13 machine cycles of 1.023 microseconds

William F. Luebbert's What's Where in the Apple says:

  wait estimated at 2.5A^2+13.5A+13 wait cycles of 1.02 microseconds

These are both treating the CPU's clock speed (in cycles per second) as if it were the cycle time (in seconds per cycle): 1.02 is the clock rate in MHz, and the cycle count should be multiplied by its reciprocal, about 0.98 microseconds. A 2MHz machine would run the code in half the time, not twice the time.

So what's the correct answer? Let's start by confirming the cycle count. The code is:

fca8: 38           WAIT     sec                ;2
fca9: 48           WAIT2    pha                ;3
fcaa: e9 01        WAIT3    sbc     #$01       ;2
fcac: d0 fc                 bne     WAIT3      ;2+
fcae: 68                    pla                ;4
fcaf: e9 01                 sbc     #$01       ;2
fcb1: d0 f6                 bne     WAIT2      ;2+
fcb3: 60                    rts                ;6

The inner SBC/BNE loop is usually 5 cycles per iteration, because BNE takes 3 cycles when the branch is taken. The last iteration takes one cycle fewer, because the branch falls through. The outer loop decrements A each time, so if initially A=4, the inner loop executes 4+3+2+1 times in total across A separate runs. So this takes A*(A+1)/2 * 5 - A cycles.

The outer loop executes A times, and takes 12 cycles per pass (PHA 3 + PLA 4 + SBC 2 + BNE 3), not counting the inner loop. Again, the last time through takes one cycle fewer: A*12 - 1.

Outside of that, we have 8 cycles of non-loop stuff (SEC/RTS). If we want to add the JSR that called here that's another 6 cycles, but I prefer to put that in the caller's account instead (could've been a JMP at the end of a function rather than a JSR).

Putting it together yields A*(A+1)/2 * 5 - A + A*12 - 1 + 8. Applying algebra:

  (A*A/2 + A/2) * 5 + A*11 + 7
  A*A*5/2 + A*5/2 + A*11 + 7
  A*A*2.5 + A*13.5 + 7
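
The algebra can be double-checked by walking the loop structure and adding up the per-instruction cycle counts from the listing. This is a minimal sketch, not a 6502 emulator: the function names are mine, it assumes A is in the range 1 to 255, and it assumes neither branch crosses a page boundary (true for the ROM addresses shown, which all sit in page $FC).

```python
def wait_cycles(a):
    """Count cycles for WAIT entered with A = a (1..255), excluding the caller's JSR."""
    cycles = 2                      # SEC
    outer = a
    while True:
        cycles += 3                 # PHA
        inner = outer
        while True:
            inner -= 1
            cycles += 2             # SBC #$01
            if inner != 0:
                cycles += 3         # BNE WAIT3, branch taken
            else:
                cycles += 2         # BNE WAIT3, falls through
                break
        cycles += 4                 # PLA
        outer -= 1
        cycles += 2                 # SBC #$01
        if outer != 0:
            cycles += 3             # BNE WAIT2, branch taken
        else:
            cycles += 2             # BNE WAIT2, falls through
            break
    return cycles + 6               # RTS


def formula(a):
    """2.5*A*A + 13.5*A + 7, kept in integers (5*A*A + 27*A is always even)."""
    return (5 * a * a + 27 * a + 14) // 2


mismatches = [a for a in range(1, 256) if wait_cycles(a) != formula(a)]
print("mismatches:", mismatches)
```

If the derivation is right, the list of mismatches comes out empty.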

Throw in the 6-cycle JSR and you get the formula from Apple II Monitors Peeled. So the cycle-count part of their formula is correct. What about the time per cycle?

In a comp.sys.apple2 post, awanderin notes:

The CPU has 64 clock periods of 14 * (1 / 14.318181 MHz) or 0.978µs and one stretched period of 16 * (1 / 14.318181 MHz) or 1.117µs, which gives an average clock period of 0.980µs. That works out to an average clock speed of 1.0205 MHz.
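
The arithmetic in that post is easy to reproduce. A short sketch, using only the numbers quoted above (the variable names are my own):

```python
MASTER_MHZ = 14.318181                 # master oscillator frequency from the quoted post

normal_us = 14 / MASTER_MHZ            # ordinary CPU clock period, 64 per group
stretched_us = 16 / MASTER_MHZ         # stretched period, 1 per group
average_us = (64 * normal_us + stretched_us) / 65
average_mhz = 1 / average_us

print(f"normal    {normal_us:.3f} usec")
print(f"stretched {stretched_us:.3f} usec")
print(f"average   {average_us:.3f} usec = {average_mhz:.4f} MHz")
```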

This gives a final result of:

  (A*A*2.5 + A*13.5 + 7) * 0.980 usec

Which is about 4% less than the "official" estimate.
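
To put numbers on that, here is a small sketch that evaluates both versions for a few values of A. The helper names and the include_jsr flag are my own invention; the "official" figure is the Apple II Monitors Peeled formula, which counts the JSR, so the comparison below adds the JSR to both sides.

```python
CYCLE_USEC = 0.980       # average cycle time from the comp.sys.apple2 post
OFFICIAL_USEC = 1.023    # figure used in Apple II Monitors Peeled


def wait_cycles(a, include_jsr=False):
    """2.5*A*A + 13.5*A + 7 cycles, plus 6 if the caller's JSR is counted."""
    cycles = (5 * a * a + 27 * a + 14) // 2
    return cycles + 6 if include_jsr else cycles


def wait_usec(a, include_jsr=False):
    """Estimated duration in microseconds using the average cycle time."""
    return wait_cycles(a, include_jsr) * CYCLE_USEC


def official_usec(a):
    """The published estimate: (2.5*A*A + 13.5*A + 13) * 1.023."""
    return wait_cycles(a, include_jsr=True) * OFFICIAL_USEC


for a in (1, 16, 255):
    ours = wait_usec(a, include_jsr=True)
    theirs = official_usec(a)
    print(f"A={a:3d}: {ours:10.2f} usec vs {theirs:10.2f} usec "
          f"({100 * (1 - ours / theirs):.1f}% less)")
```

Leaving the JSR out of one side only makes the gap look much larger for small values of A, where 6 cycles is a big fraction of the total.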

Side note: calling WAIT with A set to zero is *almost* the same as A=256. The code does the subtraction before the zero test, so it doesn't exit immediately. However, the first subtraction clears the carry, which means the next subtraction will subtract 2 instead of 1. So the first two executions of the inner loop have one fewer iteration (the first one because of the inner-loop SBC, the second one because of the outer-loop SBC). So it's 10 cycles short.
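
The A=0 case is fiddly enough that it's worth simulating. The sketch below models only what WAIT needs: the accumulator, the carry flag, and the cycle counts from the listing. It assumes decimal mode is off and that no branch crosses a page boundary; the function names are hypothetical.

```python
def sbc1(a, carry):
    """SBC #$01 in binary mode: A - 1 - (1 - C). Returns (new A, new carry)."""
    result = a - 1 - (1 - carry)
    # Carry is set when no borrow was needed, clear when the result went negative.
    return result & 0xFF, 1 if result >= 0 else 0


def simulate_wait(a):
    """Run WAIT with accumulator a (0..255); return cycles, excluding the JSR."""
    cycles = 2                          # SEC
    carry = 1
    while True:
        saved = a                       # PHA
        cycles += 3
        while True:
            a, carry = sbc1(a, carry)   # SBC #$01
            cycles += 2
            if a != 0:
                cycles += 3             # BNE WAIT3 taken
            else:
                cycles += 2             # BNE WAIT3 not taken
                break
        a = saved                       # PLA (leaves carry alone)
        cycles += 4
        a, carry = sbc1(a, carry)       # SBC #$01
        cycles += 2
        if a != 0:
            cycles += 3                 # BNE WAIT2 taken
        else:
            cycles += 2                 # BNE WAIT2 not taken
            break
    return cycles + 6                   # RTS


ideal_256 = (5 * 256 * 256 + 27 * 256 + 14) // 2   # formula evaluated at A=256
print(ideal_256 - simulate_wait(0))                # cycles lost to the borrow
```

For A from 1 to 255 the carry is never clear when an SBC executes, so the simulation agrees with the formula; only A=0 triggers the borrow.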


Copyright 2019 by Andy McFadden
