{"id":214,"date":"2012-12-23T07:58:08","date_gmt":"2012-12-23T07:58:08","guid":{"rendered":"http:\/\/halobates.de\/blog\/?p=214"},"modified":"2012-12-26T20:30:45","modified_gmt":"2012-12-26T20:30:45","slug":"efficient-n-states-on-x86-systems","status":"publish","type":"post","link":"http:\/\/halobates.de\/blog\/p\/214","title":{"rendered":"Efficient n-states on x86 systems"},"content":{"rendered":"<p>Programs often have flags that affect the control flow. The simplest case is a boolean that can be either TRUE or FALSE (normally 1 and 0). The x86 ISA uses <a href=\"http:\/\/www.sandpile.org\/x86\/cc.htm\">condition codes<\/a> to test and do a conditional jumps. It has a direct flag to test for zero, so that a test for zero and non zero can be efficiently implemented. <\/p>\n<pre>\r\n<code> \r\nif (flag == 0) case_1(); \r\nelse case_2(); \r\n<\/code>\r\n<\/pre>\n<p>In assembler this will look something like this (don&#8217;t worry if you don&#8217;t know assembler, just read the comments): <\/p>\n<pre>\r\n<code> \r\ntest $-1,flag       ; and flag with -1 and set condition codes \r\njz   is_zero        ; jump to is_zero if flag was zero \r\n; flag was non zero here \r\n<\/code> \r\n<\/pre>\n<p>But what happens when flag can have more than two states (like the classic <a href=\"http:\/\/us.thedailywtf.com\/Articles\/What_Is_Truth_0x3f_.aspx\">true, false, file_not_found<\/a>). How can we implement such a test in the minimum number of instructions? A common way is to use a negative value (-1) for the third state. x86 has a condition code for negative, so the code becomes simply <\/p>\n<pre><code> \r\nif (flag == 0)       case_1(); \r\nelse if (flag < 0)   case_2(); \r\nelse \/* > 0 *\/       case_3(); \r\n<\/code> <\/pre>\n<p>(if you try that in gcc use a call, not just a value for the differing control flows, otherwise the compiler may use tricks to implement this without jumps) <\/p>\n<p>In assembler this looks like <\/p>\n<pre>\r\n<code> \r\ntestl $-1,flag   ; and flag with -1 and set condition codes \r\njz case_1        ; jump if flag == 0 \r\njs case_2        ; jump if flag < 0 \r\n; flag > 0 here \r\n<\/code> \r\n<\/pre>\n<p>Now how do we handle four states with a single test and a minimum number of jumps? Looking at the x86 <a href=\"http:\/\/www.sandpile.org\/x86\/cc.htm\">condition tables<\/a> it has more condition codes. One interesting but obscure one is the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Parity_flag\">parity<\/a> flag. Parity is true if the number of bits set is even. In x86 it also only applies to the lowest (least-significant) byte of the value. <\/p>\n<p>Parity used to be used for 7bit ASCII characters to detect errors, but adding a parity bit in the 8th bit that makes the parity always even (or odd, I forgot). These days this is mostly obsolete and usually replaced with better codes that can not only detect, but also correct errors. <\/p>\n<p>Back to the n-value problem.  So we can use an additional positive value that has an even parity, that is an even number of bits, like 3 (0b11). The total values now are: -1, 0, 1, 3 <\/p>\n<p>In assembler this a straight forward additional test for &#8220;p&#8221;: <\/p>\n<pre>\r\n<code> \r\ntestl $-1,flag     ; and flag with -1 and set condition codes \r\njz    case_1       ; jump if flag == 0 \r\njs    case_2       ; jump if flag < 0 \r\njp    case_3       ; jump if flag > 0 and flag has even parity \r\n; flag > 0 and has odd parity \r\n<\/code> \r\n<\/pre>\n<p>Note that the 3 would be only detected in the lowest byte, so all 32bit values with 3 in the lowest 8bit would be equivalent to 3. That is also true for all negative values which are the same as -1. Just 0 is lonely. <\/p>\n<p>Now how to implement this in C if you don&#8217;t want to write assembler (or not writing a compiler or JIT)? gcc has a __builtin_parity, function but unfortunately it is defined as parity on 4 bytes, not a single byte. This means it cannot map directly to the &#8220;p&#8221; condition code. <\/p>\n<p>One way is to use the <a href=\"http:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Extended-Asm.html\">asm goto<\/a> extension that is available in gcc 4.5 and use inline assembler. With asm goto an inline assembler statement can change control flow and jump directly to C labels. <\/p>\n<pre>\r\n<code> \r\nasm goto(\r\n    \"      testl $-1,%0\\n\"\r\n    \"      jz   %l1\\n\"\r\n    \"      js   %l2\\n\"          \r\n    \"      jp   %l3\" :: \"rm\" (flag) :: lzero, lneg, lpar);         \r\n\/* flag > 0, odd parity here (1) *\/ \r\nlzero:  \/* flag == 0 here (0) *\/ \r\nlneg:   \/* flag < 0 here (-1) *\/ \r\nlpar:   \/* flag > 0 (3) *\/ \r\n<\/code> \r\n<\/pre>\n<p>This works and could be packaged into a more generic macro. The disadvantage of course is &#8212; apart from being non portable &#8212; that the compiler cannot look into the inline assembler code. If flag happens to be a constant known to the compiler it won&#8217;t be able to replace the test with a direct jump. <\/p>\n<p>Can this be extended to more than four states? We could add a fifth negative state with two negative numbers that have even and odd parity respective. This would require the negative cases to go through two conditional jumps however: first to check if the value is negative and then another to distinguish the two parity cases. <\/p>\n<p>It&#8217;s not clear that is a good idea. Some older x86 micro architectures have limits on the number of jumps in a 16byte region that can be branch predicted (that is why gcc sometimes add weird nops for -mtune=generic). Losing branch prediction would cost us far more than we could gain with any optimization here. <\/p>\n<p>Another approach would be to use a cmp (subtraction) instead of a test. In this case the overflow and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Carry_flag\">carry<\/a> condition codes would be also set. But it&#8217;s not clear if that would gain us any more possibilities. <\/p>\n<p>Update 12\/26: Fix typo<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Programs often have flags that affect the control flow. The simplest case is a boolean that can be either TRUE or FALSE (normally 1 and 0). The x86 ISA uses condition codes to test and do a conditional jumps. It has a direct flag to test for zero, so that a test for zero and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[10,7,12,11,1],"tags":[],"_links":{"self":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts\/214"}],"collection":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/comments?post=214"}],"version-history":[{"count":12,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts\/214\/revisions"}],"predecessor-version":[{"id":226,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/posts\/214\/revisions\/226"}],"wp:attachment":[{"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/media?parent=214"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/categories?post=214"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/halobates.de\/blog\/wp-json\/wp\/v2\/tags?post=214"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}