This is our first assembly language puzzle for the new site! These puzzles are tests to see whether you are good enough of an assembly nerd, and to learn some tricks if you’re not =^_^=
Our first puzzle is of a classic type: size optimization. It is for x86-32 assembly language, certainly the most widely known assembly language. We will definitely do other puzzles for other processors though!
You might find that many of these puzzles are good ideas to place into compilers and other automated assembly/machine code generators.
The puzzle: In terms of opcode bytes, find the smallest sequence of x86-32 instructions to implement the following C/C++ code:
if ((x == 0) || (y == 0))
   goto label;
Rules:
- x and y are 32-bit integers or pointers.
- x and y are each already in general-purpose registers or memory locations of your choice.
- Do not assume a particular state of the flags, except that you may assume the direction flag is always clear as that is its usual state.
- You may destroy any general-purpose registers or memory locations as you see fit, including the locations of x and y.
- Assume that label is within range of a short jump.
- Do not assume that you have access to protected instructions.
- In general, answers that are the same size but faster or less destructive are considered better than others.
I was rather verbose in the rules because it’s the first puzzle. Future puzzles won’t necessarily mention these restrictions.
Answers that don’t fit all the rules but have other merits like creativity are certainly welcomed!
The smallest answer I could find was 6 bytes. The straightforward answer is 8 bytes. Good luck!
-Myria
(check comments for solution(s))
or eax,ebx
jnz label
4 bytes: 09 D8 75 [rel jump distance]
Oops, I read it backwards…now I feel dumb. That code jumps if either is nonzero. If the two are in ecx and eax, and then
jecxz label
or eax,eax
jz label
So yeah, 6 bytes.
(I’m the post author. We haven’t gotten the site to show who posted what yet…)
dowhatimean.net: You have what my 6 byte solution was. There is a problem with your code that results in 6 bytes rather than 4. “mul” sets the zero flag based on the low half of the result, not the whole double-sized result. Thus 0x80000000 * 0x80000000 would break your code. This is required in order to get it correct (my original solution):
mul edx
or eax, edx
jz label
Matt Parks: Omg, I totally forgot about jecxz! On older systems, your code is certainly much faster than mine. I don’t know about the P4 and A64 though, since they have really fast multiplies. It all depends on whether the mul is more expensive than the CISC-like (and therefore microcoded) jecxz and its accompanying branch misprediction penalty.
-Melissa
Forgive me my ignorance Melissa…
My answer would have been
or eax, edx
jz label
I see no need for the mul 😉
-stefan
Ohhhh Ohhh Ohhh I see my error 😉
OR not AND
I found a 5 byte one. =)
See if you can guess it.
-Myria
Maybe something like
Input: eax, ecx
dec eax
jo label
jecxz label
??? (I don’t remember at the moment if underflow also triggers o flag)
Stefan – I tried that code snippet out on my trusty DOS debug, and the jo didn’t branch like it should…guess the overflow wasn’t set. Good thought though. Any 5-byte answer has to look something like that though…what other one-byte instructions modify the flags?
Myria – keep the puzzles coming! This is fun.
Unfortunately, “inc” and “dec” do not modify the carry flag, because the 8080 (and its clone, Z80) did not. “inc” and “dec” do modify the overflow flag, however. The problem with overflow is that it represents a *signed* overflow/underflow (for “dec”, this means going from 80000000 to 7FFFFFFF). This makes it useless for this problem >_< The 5 byte solution I found isn't as big a trick as it might seem. -Myria
What about an IMUL? The same idea as mul, but does it set the zero flag correctly?
The 5-byte solution I came up with is:
jecxz label
xchg eax, ecx
jecxz label
xchg eax, ecx is a single-byte opcode. Strangely, this optimal solution is also the least destructive: x and y were not destroyed by the operation if the condition is false, although they did switch places.
Thanks to Matt for reminding me about jecxz =)
imul has 3 forms, unlike mul. The first is the same format as mul: edx:eax = eax * param. The second form is a multiply of any register (not just eax) by another. However, this only stores the low 32 bits of the result. The third form is dest = src * immediate, again only storing the low 32 bits of the result.
None of these sets the zero flag when the *whole* result is zero, only when the low 32 bits is zero. They *do* set carry when the result overflows (meaning that the operands weren’t really zero to begin with). Unfortunately, there is no “jump if zero and not carry” instruction in x86, as no normal arithmetic comparison would need one.
-Myria
Values in eax and ebx
lea ecx,[eax*ebx]
jecxz label
I think that should do it. Let me know.
Derek
You can’t do register*register with lea. You can only do register*X where X is 1, 2, 3, 4, 5, 8 or 9. (Technically, only 1, 2, 4, 8, but you can add one register as well, plus the multiplication. So lea eax, [eax*5] is actually lea eax, [eax*4+eax].)
-Myria
I’m mortified that I made a stupid error like that. I haven’t programmed in asm since 1991 and I have no 32 bit debugger to test instructions so my memory is showing it’s age.
Derek
I think the following should do:
and eax,ebx
jz label
Dadhikaka,
That only works when they are equal or both 0. What happens when the arguments have no 1 bits in common?
Hello! Good Site! Thanks you! byqbufbqgzz
good site…
checkout assembly.co.nr for assembly language online guide.
jecxz is nice and small, but painfully slow on the K6 onward.
They slowed down the loop instruction, because of a popular
software *cough* using it in delay loops (not Turbo Pascal,
the other popular oneâŚ), to 300 MHz or so, and jecxz seems
to share the implementation or something.
Doesn’t hinder me from using it in the bootloader though 😉
I also tried out the OR idea and got positive results, but others here seem to have had a different experience?
My code exactly:
mov eax, 0
mov ebx, 0
or eax, ebx
jz label
No matter what values I set EAX and EBX to, the jump only occurs if both are 0. Am I missing something?
After posting, it hit me that it’s supposed to be if one is 0, not both. Replacing the OR with an AND seems to have solved that – and it’s still 4 bytes.
I’ve discovered your blog in August 2017 and reached the end (= the beginning ;)) of the blog – this post – today, early in September 2017! Lots of super interesting information here! 🙂
Thanks a lot! 🙂