I happened to come across 50 original German GEOS 2.0 disks that were broken and sent in for replacement. In the first part, I covered the disks that were broken probably due to user error. Now let’s look at the read errors on the remaining disks. As it turns out, there might be a bug in GEOS that caused the boot disks to break!
Error Types
On a 1541 disk, every track consists of sectors that are encoded like this:
The SYNC marks are used to find the start of a header and the start of data. The header contains the number of the sector. Before and after the sector data, there are gaps to account for uncertainty in timing when overwriting a sector.
On the GEOS boot disks, SYNC marks are 3 bytes long, and gaps are 8 bytes. Headers are always 10 bytes, and sector contents are 325 bytes. About 2% at the end of each track is unused (tail gap).
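As a rough cross-check of these byte counts, the following Python sketch adds up one track in speed zone 3. It assumes 21 sectors per track, a nominal 300 rpm (200 ms per revolution), 26 µs per byte, 325 bytes of encoded sector data, and a per-sector layout of SYNC, header, gap, SYNC, data, gap. The names are illustrative and not taken from any GEOS or 1541 source.

```python
# Rough byte budget of one zone-3 track on a GEOS boot disk.
# Assumptions (not measured here): 300 rpm, 26 µs per byte, 21 sectors,
# layout per sector = SYNC, header, gap, SYNC, data, gap.
REVOLUTION_US = 200_000      # one revolution at 300 rpm
US_PER_BYTE = 26             # speed zone 3
SECTORS = 21

SECTOR_LAYOUT = {
    "sync": 3 + 3,           # header SYNC + data SYNC
    "header": 10,
    "gap": 8 + 8,            # gap after header + gap after data
    "data": 325,
}

def track_shares():
    """Return the percentage of the track taken by each part, plus the tail gap."""
    capacity = REVOLUTION_US / US_PER_BYTE
    shares = {part: 100 * SECTORS * size / capacity
              for part, size in SECTOR_LAYOUT.items()}
    shares["tail"] = 100 - sum(shares.values())
    return shares

for part, pct in track_shares().items():
    print(f"{part:6s} {pct:5.1f}%")
```

Under these assumptions, sector data takes a little under 90% of the track, and headers, SYNC marks and the tail gap each take only a few percent.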
About 90% of a track is actual sector data. 3% is header data, and 1.5% are SYNC marks. So if a disk degrades through wear and tear, physical damage or demagnetizing, we expect mostly data checksum errors, and maybe a few header checksum errors and missing SYNC marks. But this is the actual statistic:
Missing Header SYNC | Missing Data SYNC | Header Checksum Error | Data Checksum Error |
---|---|---|---|
213 | 0 | 1 | 205 |
The ratio between missing data sync and the checksum errors is about what we expect – but what about the huge number of missing header SYNCs?
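To put a number on that surprise, here is a small Python sketch that scales the 205 data checksum errors by the area shares quoted above. It assumes that random degradation hits each part of the track in proportion to the space it occupies, which is a simplification for illustration, not a measured model.

```python
# Assumption: random damage is spread uniformly over the track, so errors in
# each region scale with the share of the track that region occupies.
AREA_SHARE = {"data": 90.0, "header": 3.0, "sync": 1.5}   # percent, from the text
OBSERVED = {"data": 205, "header": 1, "header_sync": 213, "data_sync": 0}

def expected_errors(region):
    """Expected error count in a region, scaled from the observed data errors."""
    return OBSERVED["data"] * AREA_SHARE[region] / AREA_SHARE["data"]

expected_header = expected_errors("header")
expected_sync = expected_errors("sync")      # header and data SYNCs combined
observed_sync = OBSERVED["header_sync"] + OBSERVED["data_sync"]

print(f"header checksum errors: expected ~{expected_header:.1f}, observed {OBSERVED['header']}")
print(f"missing SYNCs:          expected ~{expected_sync:.1f}, observed {observed_sync}")
```

With this crude model, only a handful of missing SYNCs would be expected in total, against 213 observed, all of them header SYNCs.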
In addition, three of the disks that don’t boot fail because of the very same issue: they fail in DeskTop when trying to read track 1, sector 9, because of a missing header SYNC.
Overwritten Headers
A missing header can have different causes:
- The SYNCs that mark the header might have degraded. Commodore DOS writes 40 bit SYNC marks, but on GEOS boot disks, they are only 24 bits. But this would apply to header SYNCs and data SYNCs evenly, and we are not seeing any missing data blocks.
- The byte after the SYNC, which distinguishes the header from the sector, might have degraded. This is very specific and quite unlikely.
- The SYNC and maybe more parts of the header may have been overwritten.
Checking several dozen cases showed exactly the same picture:
- The preceding sector was written by the user. (A side effect of the copy protection allows detecting which sectors are unchanged from the original disk contents and which sectors got overwritten.)
- The preceding sector data is intact.
- The preceding sector was written 7-8% more slowly – too much for the gap to cover. Its last bytes overwrote the next sector header, but not the next sector data.
Track 1 Sector 9
The reason why track 1 sector 9 is broken on many disks is simple: The preceding sector on disk is track 1 sector 8, which happens to be the first free block of the GEOS boot disk according to the GEOS filesystem logic. Whenever GEOS searches for a free block for writing on the boot disk, it would pick track 1 sector 8. And this happens a lot: Whenever the user starts a desk accessory, like the alarm clock or preferences, the part of memory required by the desk accessory gets written to disk into a “Swap File” and later loaded back in. So every start of a desk accessory while the boot disk is in the drive will write track 1 sector 8. And sometimes, writing a sector will break the next one.
Track 1 sector 9 is the info sector (the sector that contains the icon and some other metadata) of the joystick driver, which is on the first page of the disk and will be shown by DeskTop after booting – unless the user switched to e.g. using mouse for input, in which case a different input driver would be shown on the first page.
Speed Zone
So we have established that sometimes sectors are written too slowly, so they write over the next sector header, making the next sector unreadable. A problem that is not too uncommon with 1541-style disk drives is that the motor speed might be off: Because of the sector gaps, the on-disk format can tolerate a motor that is about 2.5% too slow, and we’re seeing a case of 7-8% here. But let’s first gather more data before jumping to conclusions.
This graph shows that the overwritten headers only ever happen on tracks 1 to 17:
Tracks 1 to 17 have one thing in common: They are speed zone 3. A 1541 disk has four speed zones to account for the different lengths of the tracks:
Track | # Sectors | Speed Zone | µs/Byte | Raw Kbit/Track |
---|---|---|---|---|
1 – 17 | 21 | 3 | 26 | 60.0 |
18 – 24 | 19 | 2 | 28 | 55.8 |
25 – 30 | 18 | 1 | 30 | 52.1 |
31 – 35 | 17 | 0 | 32 | 48.8 |
In speed zone 3, one byte will be written every 26 microseconds. If we write a sector with the speed zone incorrectly set to 2 on a track that should be speed zone 3, one byte will be written every 28 microseconds instead of every 26, which is 7.7% slower. The 330 bytes written with the speed zone 2 setting will cover a length of 355 bytes of the speed zone 3 equivalent – so it will overwrite about 25 bytes at the end. This will overshoot the 8 byte gap and completely destroy the next header.
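The arithmetic in this paragraph can be replayed in a few lines of Python. The sketch below assumes that the area following the rewritten sector is laid out as 8-byte gap, 3-byte header SYNC, 10-byte header, 8-byte gap, data SYNC, as described at the top of the article. The function and variable names are made up for illustration.

```python
# How far does a sector written at the zone-2 bit rate run on a zone-3 track?
US_PER_BYTE = {3: 26, 2: 28}   # from the speed zone table
BYTES_WRITTEN = 330            # bytes the drive writes for one sector (from the text)
GAP, SYNC, HEADER = 8, 3, 10   # GEOS boot disk layout, in bytes

def overshoot(written_zone, track_zone, length=BYTES_WRITTEN):
    """Bytes (in track_zone units) written beyond the sector's own footprint."""
    stretched = length * US_PER_BYTE[written_zone] / US_PER_BYTE[track_zone]
    return stretched - length

extra = overshoot(written_zone=2, track_zone=3)
past_gap = extra - GAP                       # bytes reaching into the next sector
header_destroyed = past_gap >= SYNC + HEADER
data_sync_reached = past_gap > SYNC + HEADER + GAP

print(f"slowdown:  {100 * (US_PER_BYTE[2] / US_PER_BYTE[3] - 1):.1f}%")
print(f"overshoot: {extra:.1f} bytes, {past_gap:.1f} of them past the gap")
print(f"next header destroyed: {header_destroyed}, next data SYNC reached: {data_sync_reached}")
print(f"tolerated slowdown before the gap is used up: {100 * GAP / BYTES_WRITTEN:.1f}%")
```

Under these assumptions the write ends a few bytes into the gap after the next header, which fits the observation that the next sector's header is lost while its data survives.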
The fact that the 28/26 ratio matches the measured data perfectly and that the errors only ever happen in speed zone 3 are very strong indications that sometimes, sectors in speed zone 3 get incorrectly written with the speed zone 2 setting.
GEOS-Specific
First, it is important to find out whether this has anything to do with GEOS. Maybe it happens on all 1541 disks, but since no other tasks on a C64 are as disk-bound as running GEOS, the error might not show itself much anywhere else. This is a distribution of missing sector header errors across about 300 random disks from my collection:
The error is spread evenly, except for the lowest tracks, which are written with a lower density, and track 18, which holds the file directory and therefore usually consists mostly of empty sectors.
So it is not an inherent property of the 1541 hardware or its firmware. It is GEOS-specific.
In order to find out whether it is specific to GEOS boot disks, I would need a large number of GEOS work disks. After all, the boot disks in my collection have been hand-picked to have many errors. I don’t have access to such a collection, but I don’t see why boot disks would be special in any way, so I am assuming this happens to all disks used with GEOS.
The GEOS 1541 Driver
This makes the GEOS 1541 driver the suspect. The driver takes control of the drive and its firmware and uses its own sector read/write and bus communication code. It does reuse some ROM code though for tasks like GCR encoding/decoding.
This is the reverse-engineered GEOS driver code:
https://github.com/mist64/geos/blob/master/drv/drv1541.s
The speed setting is stored in bits 5 and 6 of VIA #2 port B (register at $1C00). The regular speed setup code is this:
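Drv_NewDisk_6:
    jsr $f24b           ; ROM: look up sectors per track; speed zone ends up in X
    sta $43             ; keep the sector count
Drv_NewDisk_7:
    lda $1c00           ; VIA #2 port B
    and #$9f            ; clear bits 5 and 6 (speed setting)
    ora DTrackTab,x     ; insert the bits for speed zone X
Drv_NewDisk_8:
    sta $1c00
    rts
DTrackTab:
    .byte $00, $20, $40, $60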
The ROM code at $F24B looks up the number of sectors for the track number in A, and as a side effect, returns the speed zone (0-3) in X. The code there is identical on all versions of the 1541, the 1571 and all common clones.
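As a reading aid, the same logic can be modelled in Python. This is only a sketch of the excerpt above: `speed_zone` is a stand-in for the zone that the ROM routine at $F24B leaves in X (reconstructed from the speed zone table in this article, not from the ROM), and `set_speed` mirrors the `and #$9f` / `ora DTrackTab,x` sequence.

```python
DTRACK_TAB = (0x00, 0x20, 0x40, 0x60)   # speed zone 0-3 shifted into bits 5 and 6
SPEED_MASK = 0x60

def speed_zone(track):
    """Speed zone for a track, per the table in this article (stand-in for the ROM lookup)."""
    if not 1 <= track <= 35:
        raise ValueError("track out of range")
    if track <= 17:
        return 3
    if track <= 24:
        return 2
    if track <= 30:
        return 1
    return 0

def set_speed(port_b, track):
    """New value of $1C00: clear bits 5/6, then OR in the zone bits for the track."""
    return (port_b & 0x9F) | DTRACK_TAB[speed_zone(track)]

# A zone-3 track must end up with both bits set, a zone-2 track with only bit 6.
print(hex(set_speed(0x00, 1) & SPEED_MASK))    # zone 3
print(hex(set_speed(0xFF, 18) & SPEED_MASK))   # zone 2
```

In this model, a sector on tracks 1 to 17 written while bits 5 and 6 hold $40 instead of $60 would be the speed zone 2 case discussed above.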
There is one other place that writes to $1C00, the code to move the head (direction in X, number of steps in A):
D_DUNK6:
    stx $4a             ; direction
    asl
    tay
    lda $1c00
    and #$fe
    sta $70             ; working copy of port B
    lda #$1e
    sta $71
D_DUNK6_1:
    lda $70
    add $4a
    eor $70
    and #%00000011      ; only bits 0 and 1 may differ from the old value
    eor $70
    sta $70
    sta $1c00
Bits 0 and 1 of $1C00 control movement of the head. The remaining bits are saved in $70. I don’t see how this code would be able to change bits 5 and 6.
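One way to double-check that reading is to model the inner loop in Python and try every possible register value. The sketch covers only the instructions shown above and assumes that `add` is an assembler macro for `clc` followed by `adc`. The direction values 0x01 and 0xFF are also assumptions for illustration. Anything else in the driver that touches $1C00 is outside its scope.

```python
def step_once(saved, direction):
    """Model of D_DUNK6_1: returns the new value of $70, which is also written to $1C00.

    saved     -- current contents of $70 (copy of port B)
    direction -- contents of $4a, e.g. 0x01 or 0xFF (values assumed for illustration)
    """
    total = (saved + direction) & 0xFF       # lda $70 / add $4a (8-bit wraparound)
    changed = (total ^ saved) & 0b00000011   # eor $70 / and #%00000011
    return changed ^ saved                   # eor $70

def preserves_upper_bits():
    """True if bits 2-7 survive the step for every input value and both directions."""
    return all(
        step_once(value, direction) & 0xFC == value & 0xFC
        for value in range(256)
        for direction in (0x01, 0xFF)
    )

print(preserves_upper_bits())
```

The check holds for all 256 input values, so within this excerpt the two EORs only ever let bits 0 and 1 change, which agrees with the conclusion that the stepping code leaves the speed bits alone.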
Conclusion
This is what we know:
- Most GEOS boot disks fail because a sector of speed zone 3 (tracks 1 to 17) has been written with an incorrect speed setting of 2, overwriting the following sector header and thus making that sector inaccessible.
- This pattern cannot be found on disks used outside of GEOS. It could not be shown that it can be found on users’ data disks used inside GEOS, but I don’t see where boot and data disks would be different in this regard.
- No bug in the GEOS 1541 disk driver could be found.
This is an unresolved mystery. I would be very grateful for:
- collections of data/work disks (.G64 format or physical disks) to confirm this is not specific to boot disks
- a way to reproduce this
- the solution why this is happening!
Fascinating, as always! But this time, the suspense is really killing me.. 😉
This goes years back, but I do recall something about programs *never* looking up the speed zone for Track 18, it was always going to be speed zone 2. Is it possible that $1c00 got written as speed zone 2 after a Directory write on Track 18, and missed a lookup while headed down to Tracks 1-17?
I also wonder how many disks wrote to upper tracks with the wrong zone, but work fine. Assuming you could write a higher speed zone um to a lower speed zone (and probably successfully on a quality diskette), but not vice versa…
Lastly, what speed zone are tracks 36-40? (don’t some drives sweep into track 38 before hitting the Track “0” head stop during head alignment?) If essentially they are speed zone “-1” then is it possible moving back to track 1 sector 9 might be set up as Speed Zone 2 (3 minus the deficit -1)
Track 36-40 should afaik really have an even lower speed than the speeds available in the 1541.
I get why the original 2040/3040/4040 drives only used 35 tracks, as that was the original spec for the first 5.25″ disks. But by the time of the 1540 (an older sibling of the 1541, with only the ROM differing), 40 tracks were common on 5.25″ disks, so it would have made sense to actually use all tracks. That would have made the disks more incompatible with the 2040/3040/4040 though, but still…
You wrote:
This is an unresolved mystery. I would be very grateful for:
collections of data/work disks (.G64 format or physical disks) to confirm this is not specific to boot disks
a way to reproduce this
the solution why this is happening!
I can mostly emulate that. That is, in emulation the sync is shortened to 7 bytes. Which means that a floppy with about 299.07 rpm instead of 300 rpm would overwrite the complete sync.
The thing is that the whole sector is mastered too fast and therefore writing to it with standard speed by a 1541 does overwrite parts.
You need a kryoflux dump of a completely unused (i. e. not even installed) geos boot disk and convert it to p64 to emulate that. Then a simple
open 1,8,15,"i"
open 2,8,2,"#"
print#1,"u1 2 0 1 8"
print#1,"u2 2 0 1 8"
close 2
close 1
does the trick – the sync is shortened to just 7 bits instead of 32 bits, as you can inspect with p64conv together with g64conv (git master, the current release is not ready for decoding flux data).
Of course, a floppy with RPM 300.93 and not 299.07 is needed for overwriting the complete sync.
If I write the GEOS 2.0 G64 with KryoFlux mastering information back to two 5.25″ floppy disks (you need “Systemdiskette” and “Sicherheitssystem”), then I can successfully install that GEOS.
And after that
open 1,8,15,"i"
open 2,8,2,"#"
print#1,"u1 2 0 1 8"
print#1,"u2 2 0 1 8"
close 2
close 1
does break track 1 sector 9 on my 1571. Perfectly reproducible. And as predicted by my emulation.