In almost all assembly books you’ll find some nice tricks to do fast multiplications. E.g. instead of “imul eax, ebx, 3” you can do “lea eax, [ebx+ebx*2]” (ignoring flag effects). It’s pretty clear how this works. But how can we speed up, say, a division by 3? This is quite important since division is still a really slow operation. If you never thought or heart about this problem before, get pen and paper and try a little bit. It’s an interesting problem.
UP9600: How to Bit-Bang 9600 Baud RS-232 on the C64
The user port of the Commodore 64 exposes a TTL-level RS-232 serial port that supports up to 1200 baud1. In 1997, Daniel Dallmann came up with a very sophisticated trick that allowed sending and receiving at 9600 baud2, using slightly different wiring and a dedicated driver. This “UP9600” wiring has become the de-facto standard for all modern accessories, like C64 WiFi modems. Let’s see how UP9600 works.
The Ultimate Game Boy Talk [video]
This is the video recording of “The Ultimate Game Boy Talk” at 33C3.
Clockslide: How to waste an exact number of clock cycles on the 6502
by Sven Oliver ‘SvOlli’ Moll; the original German language version has been simultaneously posted on his blog.
Chaosradio Express #177: Commodore 64
HFS+ File System Analysis and Forensics with fileXray
Modern filesystems are highly optimized database systems that are a core function of modern operating systems. They allow concurrent access by many CPUs, they keep locality up and fragementation down, and they can recover from crashes guaranteeing consistent data structures.
Having Fun with Branch Delay Slots
Branch Delay Slots are one of the awkward features of RISC architectures. RISC CPUs are pipelined by definition, so while the current instruction is in execution, the following instruction(s) will be in the pipeline already. If there is for example a conditional branch in the instruction stream, the CPU cannot know whether the next instruction is the one following the branch or the instruction at the target location until it has evaluated the branch. This would cause a bubble in the pipeline; therefore some RISC architectures have a branch delay slot: The instruction after the branch will always be executed, no matter whether the branch is taken or not.
A Standalone printf() for Early Bootup
A while ago, I complained about operating systems with overly complicated startup code that spends too much time in assembly and does hot have printf() or framebuffer access until very late.
Minimizing the Assembly needed for Machine Initialization
In many operating systems, I have seen overly complicated startup code. Too much is done in assembly, and printf() and framebuffer access is only available very late. In the next three blog posts, I will show how this can be avoided.
LOAD"$",8
Commodore computers up to BASIC 2.0 (like the Commodore 64, the VIC-20 and the PET 2001) only had a very basic understanding of mass storage: There were physical device numbers that were mapped to the different busses, and the “KERNAL” library had “open”, “read”, “write” and “close” functions that worked on these devices. There were also higher-level “load” and “save” functions that could load and save arbitrary regions of memory: The first two bytes of the file would be the (little endian) start address of the memory block.
Readable and Maintainable Bitfields in C
Bitfields are very common in low level C programming. You can use them for efficient storage of a data structure with lots of flags, or to pass a set of flags between functions. Let us look at the different ways of doing this.
Optimized Flags Code for 6502
When I disassembled Steve Wozniak’s Apple I BASIC, I found a 6502 trick that I had never seen before, although I had read a lot of 6502 code, including the very well-written Commodore BASIC (i.e. Microsoft BASIC for 6502).
Simple compiler optimization
I thought of an optimization that compilers for most CPUs could do that I think should be implemented. Let’s say you have C code like this: