For decades, PC users have been able to relax by watching the computer defragment a disk. Now C64 users can do the same! Introducing “defrag1541”, a disk defragmentation tool for C64 and 1541.
- Download: defrag1541.prg (9 blocks)
- Source: github.com/mist64/defrag1541
defrag1541 consists of 49 lines of BASIC, takes about 15-30 minutes to defragment a typical disk, and visualizes its work using colored blocks on the screen. The different colors symbolize the different files, and the character shapes show whether the block is at its original (round) or final (square) position. (Note that this tool is meant for visualization only, do not use it on real data!)
In the remainder of this article, I will describe how the program works.
The 1541 Disk Format
Commodore 1541 floppy disks use the CBM filesystem. A disk consists of 683 sectors of 256 bytes, which are addressed as a track/sector tuple. There are
- 35 tracks, numbered 1-35
- with 21 to 17 sectors each, numbered from 0
The number of sectors depends on the track, higher tracks (closer to the center) have fewer sectors:
Tracks | Sectors |
---|---|
1-17 | 21 |
18-24 | 19 |
25-30 | 18 |
31-35 | 17 |
The following table visualizes this: The columns are the tracks, the lines are the sectors. The higher numbered tracks don’t have as many sectors.
111111111112222222222333333 T0123456789012345678900123456789012345 S 0 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 1 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 2 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 3 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 4 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 5 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 6 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 7 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 8 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 9 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 10 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 11 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 12 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 13 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 14 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 15 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 16 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 17 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 18 OOOOOOOOOOOOOOOOOOOOOOOOOO 19 OOOOOOOOOOOOOOOOOO 20 OOOOOOOOOOOOOOOOOO
Track 18 is special, it holds all the metadata. 18/0 contains the disk name and the Block Availability Map (BAM), which uses one bit per block to keep track of which blocks are allocated, and the remaining sectors contain 32 byte directory entries, so 8 directory entries per block. Every directory entry consists of a file type, a 16 byte file name, and a track/sector tuple for the first block of the file.
Every file is stored as a linked list. Every block holds the track/sector tuple of the next block in its first two bytes, followed by 254 data bytes. A track value of 0 indicated the last block of a file.
The Data Model
The easiest way to look at the problem of defragmenting a disk is to imagine all files on the disk as concatenated into one long file that needs to be defragmented. Now every used block on disk has a logical block index. Logical block 0 is the first block of the first file, block 1 is the second block of the first file โ if the first file was two blocks or more. Otherwise it’s the first block of the second file, and so on.
So for every physical block, we need to know what logical block is stored there. Then we build the desired mapping: For every physical block, what logical block should be stored there. Our core defragmenter code then moves sectores around and updates links until the two mappings are identical.
The Data Structures
At the very beginning, we are defining the arrays. We do this early, so we can’t run into OUT OF MEMORY errors at runtime. In fact, this program occupies about 99% of the 38911 bytes available to BASIC on the C64.
10 dimds(17):dimn(143):dimcc(143):dimci(682):dimic(682):dimcj(682):dimjc(682) 15 dimtr(682):dimsc(682):dimaa(682):dimdd(682):dimoo(682):dimss(682):n$=chr$(0)
The following table contains a small overview of the arrays. We will talk about them more as they are used in the code.
ds(17) | array of sector numbers of dentry chain |
n(143) | map dentry index -> number of blocks |
cc(143) | map dentry index -> color |
ci(682) | map physical block index -> logical block index – current |
ic(682) | inverse of ci() |
cj(682) | map physical block index -> logical block index – desired |
jc(682) | inverse of cj() |
tr(682) | map block index -> track |
sc(682) | map block index -> sector |
aa(682) | map block index -> screen ram location |
dd(682) | map logical block index -> dentry |
oo(682) | map logical block index -> offset |
ss(682) | stack |
The UI
We create a border around a 35×21 character area on the screen, so that every block can be represented by a character.
20 a=1024:b=1060:poke646,15:v=53280:pokev,0:pokev+1,15:printchr$(147):pokev+1,0 25 pokea,112:pokea+22*40,109:fori=1to35:pokea+i,67:pokea+22*40+i,67:next 30 pokeb,110:fori=1to21:pokea+40*i,66:pokeb+40*i,66:next:pokeb+22*40,125
The array cc() will map a file index to a color. We cycle through the C64 palette skipping black, which is the background color.
35 j=1:fori=0to143:cc(i)=j:j=j+1:ifj=16thenj=1 40 next
Reading the Current State
As a first step, we need to find out where data is stored on disk, i.e. for every physical block, we need to know what logical block (or none) is currently stored there.
The following code iterates over all directory entries and follows the linked lists of all files, recording in the array ci() the logical block index for each physical block. The reverse mapping (at what physical location is the logical block stored?) is built in the array ic(). The array nb() collects the number of blocks each file occupies.
In addition, we are keeping track of the sectors on track 18 that are used for directory entries. We will need this information later to update the directory entries when blocks have moved.
Since we are using linear block indexes for everything, we need to convert the track/sector tuples that the filesystem works with. This is done by the t2 comparisons in the second half of the code block.
45 open1,8,15,"i":fori=0to682:ci(i)=-1:ic(i)=-1:next:s=1:open2,8,2,"#" 50 ds(di)=s:ci(357+s)=-2:di=di+1:print#1,"u1 2 0 18"s:get#2,t$,s$:t1=asc(t$+n$) 55 s1=asc(s$+n$):fori=0to255step32:print#1,"u1 2 0 18"s:poke1082+s*40,4 60 print#1,"b-p 2"i+2:get#2,f$:f=asc(f$+n$):iff<>129andf<>130goto105 65 get#2,t$,s$:t2=asc(t$+n$):s2=asc(s$+n$):ift2=0goto105 70 print#1,"u1 2 0"t2;s2:poke1064+t2+s2*40,81 75 poke55336+t2+s2*40,cc(ni):ift2<18thena=s2+(t2-1)*21:goto95 80 ift2<25thena=s2+(t2-18)*19+357:goto95 85 ift2<31thena=s2+(t2-25)*18+490:goto95 90 a=s2+(t2-31)*17+598 95 ci(a)=cd:ic(cd)=a:n(ni)=n(ni)+1 100 cd=cd+1:goto65 105 ni=ni+1:next:s=s1:on-(t1<>0)goto50:ci(357)=-2
The ci(357) and ci(357+s) assignments mark the directory sectors as occupied, so that the defrag algorithm can’t use them as temporary storage.
As this code is walking the linked lists, it shows them on the screen in the form of a bullet, in a color that represents the file index.
Helper Data Structures
We have to do some programmatic work, which will take about a minute, so we draw progress bars. For this, we first draw “[” and “]” characters in an empty line at the bottom of the screen.
110 poke1944,27:poke1980,29
When updating links during defragmentation, we will need to convert linear block indexes back into track/sector tuples, so we create the two arrays tr() and sc() for a quick lookup. In addition, in aa() we calculate the screen position for the character that corresponds to the block.
115 i=0:fort=1to35:readns:fors=0tons:tr(i)=t:sc(i)=s:aa(i)=1064+t+s*40:i=i+1 120 next:poke1944+t,46:next:data20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20 125 data20,18,18,18,18,18,18,18,17,17,17,17,17,17,16,16,16,16,16
Later, we will also need to know for every logical block index what file it belongs to and which index it has within the file. This will be stored in dd() and oo().
130 ford=0to143:ifn(d)<>0theno=0:forj=1ton(d):dd(l)=d:oo(l)=o:l=l+1:o=o+1:next 135 poke1945+int(d/4.2),81:next
Desired Allocation
We have to build a desired allocation array, which has the same format as the current allocation array, just with the data about where we would like the blocks to be. The array cj maps physical blocks to logical blocks, and jc() is the reverse of this.
For simplicity, we are just filling the disk linearly, starting with track 1, and not using an interleave. This doesn’t make the defragmenter very useful, because the disk might actually read more slowly after this, but it makes for a very clean visualization!
140 fori=0to682:cj(i)=-1:jc(i)=-1:next:c=0:b=0:fori=0to143:ifn(i)=0goto155 145 forj=1ton(i):ifc>=357andc<376thenc=376 150 cj(c)=b:jc(b)=c:c=c+1:b=b+1:next 155 poke1945+int(i/4.2),160:next:fori=357to375:cj(i)=ci(i):next
The code could be easily extended to use a more reasonable strategy. The 1541 operating system for example picks the empty block closest to track 18 as the first block for a new file, then stays on the same track as long as there are free blocks, preferably with an interleave of 10. Then it goes on to the closest track with an empty sector again.
We're done thinking, so let's clear the progress bar:
160 fori=0to34:poke1945+i,32:next:poke1944,32:poke1980,32:sl=0:i=0
The Main Algorithm
Now we iterate over all physical blocks and make sure the current allocation matches the desired allocation.
- If the current allocation matches the desired allocation, do nothing.
- If the block is currently empty, just move the correct block there.
- Otherwise, we need to move the logical block that is there to its new location first โ using recursion!
For recursion, we use the array ss() as the stack and sl as the stack pointer. Recursion can cause a cycle though, e.g. when logical block A is stored where logical block B should be stored and vice versa. So if the block we are looking at is the same as the bottom of the stack, we have to break the cycle.
The following code checks for block identity, empty block as well as a cycle:
165 on-(ci(i)=cj(i))goto190:on-(ci(i)=-1)goto185:on-(sl=0orss(0)<>i)goto180
To break the cycle, we need the help of a temporary block. We find a free block and move the current block there (subroutine 225, we'll talk about it later), then continue.
170 forj=0to682:ifci(j)<>-1thennext 175 i1=i:i2=j:gosub205:goto165
For recursion, we push the current index on the stack and continue with the physical block that should contain the data that the current block currently contains.
180 ss(sl)=i:sl=sl+1:i=jc(ci(i)):goto165
In case the current block is empty, we move it using subroutine 225.
185 i1=ic(cj(i)):i2=i:gosub205
At the end of the loop, continue with the value from the top of the stack, if there is any, otherwise we take the next block in sequence.
190 ifci(i)>=0thenpokeaa(i),160 195 ifsl<>0thensl=sl-1:i=ss(sl):goto165 200 i=i+1:on-(i<683)goto165:close2:close1:open1,8,15,"v":end
When we are done, we send a "V" (validate) command to the drive, which will follow the links of all files and recreate the block availability map. For simplicity, our code never updates this table, so we have the operating system do it for us at the end.
Moving a Block
In order to move a block, in addition to copying the data between the two physical blocks and updating our data structures, we also have to update the track/sector link tuple from the previous block or the directory entry.
This reads the block into a buffer and writes it to a different location without transfering it to the computer:
205 pokeaa(i1),18:print#1,"u1 2 0"tr(i1)sc(i1):pokeaa(i2),23 210 pokeaa(i2)+54272,cc(dd(i2)):print#1,"u2 2 0"tr(i2)sc(i2) 215 pokeaa(i1),32:pokeaa(i2),160
The POKE instructions update the screen.
To update the parent link, we have to find out whether this is the first block in a file, in which the parent link is located in the directory entry.
220 c=ci(i1):o=oo(c):t=tr(i2):s=sc(i2):ifo<>0goto240
If that's the case, we find the correct directory entry, read the block and write the new link.
225 d=dd(c):s1=ds(int(d/8)):pp=peek(aa(357+s1)):pokeaa(357+s1),0 230 print#1,"u1 2 0"18;s1:print#1,"b-p 2"(dand7)*32+3 235 print#2,chr$(t)chr$(s);:print#1,"u2 2 0"18;s1:pokeaa(357+s1),pp:goto250
Otherwise, we read the preceding block in the file and update the link there.
240 a=ic(c-1):tt=tr(a):ss=sc(a):print#1,"u1 2 0"tt;ss:pp=peek(aa(a)) 245 pokeaa(a),0:print#2,chr$(t)chr$(s);:print#1,"u2 2 0"tt;ss:pokeaa(a),pp
Finally, we update the data structures to reflect the fact that this physical block now contains the new data, and the block we moved it from is now empty.
250 v=ci(i1):ic(v)=i2:ci(i2)=v:ci(i1)=-1:return
Limitations
defrag1541 is not meant as a serious tool. Nevertheless, here are some limitations and ideas for improvement:
- The algorithm for the desired allocation was chosen for its looks, not to make disks actually faster.
- Optionally, the algorithm could also place data blocks on track 18. That would give the user 17 more blocks on the disk.
- That said, disks that currently have data blocks on track 18 can't get defragmented with the current algorithm. It avoids track 18 and will run out of space.
- The current code does not test for read errors at all.
- We also do not update BAM ourselves and rely on the operating system doing it at the end. While the code is written so that interrupting the process at any point will not lead to data loss, it is important to run the VALIDATE command at the end if data should ever be written to the disk again.
- Of course it could be much faster. A lot of this is CPU bound, because BASIC is so slow. Rewriting everything in assembly would help. Some parts of the code could also directly run inside the disk drive. I doubt the complete data structures would ever fit into the 1541's 2 KB of RAM, so defragmentation can't completely run on the disk drive, but major parts of the logic could still live there, to minimize the lag introduced by communication between the computer and the drive.
the command you need to run at the end to rebuild the bam is “validate”, not “verify” ๐
just for giggles, i made the opposite of this program a while ago, so if you need a good starting point, screw up your disk image with it: https://csdb.dk/release/?id=111975
Whoops, fixed. Thanks.
1. Loading from 1541 disk is not fastest when loading sectors next to each other. Fastloaders usually prefer interleave of 8 or 10.
2. File search times is another issue. That’s why directory is stored on middle track 18 to make the way the head needs to move to the data track after reading the directory entry as short as possible.
This is no criticism about the tool itself. Using a disk over and over again with different files may lead to fragmentation. The optimization is just not fitting to how 1541 drives work.
Wonderful, I always wanted to watch defragmentizing C64-Blocks in realtime. However dude, that GIF. This abomination has 3000 Frames before my GIMP crashed. I get that IT-Experts suck at image-optimization, but man. 3000 FRAMES! INSANE! My guess is you videoscreened the output with the usual framerate of 20f/s and let it run for some Minutes or so. Due to the lack of colors and the spacing, the GIF-Compression was able to create the illusion of a smooth 5MB-GIF, which is way too big but who gives a fuck in Giphy/Instagram-Times am I rite. Thus, you created a monster. Here’s a pro-tipp: C64-Cursors blink 2 times per second and the defrag moves every second. Videoscreen this with 2 Frames per Second, you get the same result with a filesize of about 1MB and 100 Frames or so. You’re welcome ๐
sgnd
formerly C64-Coder and Graphic Designer
As noted, this will be slower on a 1541 than a non-defragmented disk due to the head not being in an optimal position to read consecutive sectors (the disk is moving). All that being said, it seems that it may be possible to optimize the algorithm for interleaving. Alter the algorithm so that the definition of defragmentation is not files stored in consecutive sectors, but rather a file stored in an optimally interleaved fashion, so that the chained sectors in a file may be read in the most efficient way.