Wednesday, January 6, 2016

Achieving the ultimate date continued, good interpretations

Hello google wanderer, let me disappoint you because once more this article, although it has to do with speed dates, won't help you any to find your soul mate, and especially not how to understand her/him...

Ah, BOOST preprocessing, as explained before, will be postponed once more.

Another inexpressible thing


It turns out there is no way to tell a C++ compiler that it can see the same bits as two different types of values. reinterpret_cast and C-style casts that reinterpret the value are simply not allowed in constexprs. But we have unions, right? For example, I thought for this code
union bits64 {
    long integer;
    double floating;

    constexpr bits64(double init) noexcept: floating(init) {}
    constexpr bits64(long init) noexcept: integer(init) {}
};
that integer and floating could be used indistinctly; furthermore, for ages my code has done exactly that to significant gains of clarity and performance. However, what that code really means is you can set either and read the same; if you last set one and read the other, that’s undefined behavior. IN YOUR FACE!. In particular, for constexprs, where the compiler can track which is the last member set, the so-called active member, it blows into an error. I am discovering that unions are discriminated. If you need persuasion that this is the case, check this code:
union bits64 {
    long integer;
    double floating;

    constexpr bits64(double init): floating(init) {}
    constexpr bits64(long init): integer(init) {}

    constexpr operator long() const { return integer; }
};

constexpr bits64 fromLong{long(0)};

static_assert(0 == long(fromLong), "");

constexpr bits64 fromDouble = 0.0;
static_assert(0 == long(fromDouble), "");
There will be a compilation error, accessing fromDouble.integer when the compiler is sure that the last union member set (the active member) is other than the one being accessed.

I don’t know what is there to gain by such definition of unions. I mean, there are things that come to mind, we could put an addendum to tools such as valgrind to keep track of the last union member set and track its usage.  That’s all good, but it does not require a language feature to force it; however, what about the legitimate use of indiscriminated unions?

There is a good example for wanting to use an alternative interpretation for the same bits: A date type such as the one described above as trivial conversions to and from integer, if the user does not care about endianness:
union date_record {
    struct {
        uint8_t char day, month;
        int16_t year;
    };
    int32_t integer;

    constexpr bool operator<(date_record other) const noexcept {
        return integer < other.integer;
    }
};

bool compare(date_record dr) {
    return dr.day + dr.month * 256 + dr.year == dr.as_integer;
}
compare will always be true in little endian arithmetic (such as AMD64). This fact can be used to simplify many operations. I implemented an integer comparison which under the assumption of little endianness is unbeatable in terms of economy of programming effort, clarity, and runtime performance; precisely the kind of things a good language should encourage.

Using an alternative interpretation for the same bits through the union feature, is very well defined "undefined behavior" in all compilers I’ve ever used.

Let us see another example, a date increment by one month operation (this implementation is not complete because adding one month to a January 30 needs to change the day component too, however, for simplicity of exposition):
union date_record {
    struct {
        uint8_t char day, month;
        int16_t year;
    };
    int32_t integer;

    constexpr bool operator<(date_record other) const noexcept {
        return integer < other.integer;
    }
};

date_record incrementMonthAggregate(date_record dr) {
    if(dr.month < 12) { ++dr.month; }
    else { dr.month = 0; ++dr.year; }
    return dr;
}

date_record incrementMonthInteger(date_record dr) {
    date_record rv;
    rv.integer =
        dr.month < 12 ?
        dr.integer + 256 :
        (dr.integer & 0xFFFF00FF) + 0x10000;
    return rv;
}
Notice both functions do essentially the same work, so, programming one, even for constexprs would make the other superflous except for the apparently arbitrary rule that unions are discriminated. But since that rule is in the language, gcc honors it and I don’t know how to turn it off, then if full compile time reasoning is desired, there is no recourse but to implement all the functions as many times as there are alternative interpretations. I thought that was the end of it, and I had my article nicely tied up.

However, being thorough, I checked the assember. It turns out they are sufficiently different. In gcc, 20 assembler lines versus 8, in clang, 18 versus 12 --notice clang’s assembler for the integer operation, despite being slightly longer, is superior to gcc’s because it does not have a conditional jump–. It took me a while to figure out what’s the difference (GCC case):
incrementMonthAggregate(date_record):
        movl    %edi, %edx ;; edx <- dr
        movl    %edi, %esi ;; esi <- dr
        movzbl  %dh, %ecx  ;; ecx <- dr.month
        sarl    $16, %esi  ;; esi = dr.year
        cmpb    $11, %cl   ;; if(dr.month <= 11)
        movl    %esi, %eax ;;
        jbe     .L5        ;; { eax <- dr.year; goto .L5 }
        leal    1(%rsi), %eax ;; eax <- 1 + dr.year
        xorl    %ecx, %ecx ;; ecx <- 0
        movb    %cl, %dh   ;; dr.month = 0
        sall    $16, %eax  ;; eax <- (1 + dr.year) << 16
        movzwl  %dx, %edx  ;; edx <- dr.month * 256 + dr.day
        orl     %edx, %eax ;; eax <- dr
        ret
.L5:
        addl    $1, %ecx   ;; ecx <- 1 + dr.month
        sall    $16, %eax  ;; eax <- dr.year << 16
        movb    %cl, %dh
        movzwl  %dx, %edx  ;; edx <- dr.month * 256 + dr.day
        orl     %edx, %eax ;; eax <- dr
        ret
incrementMonthInteger(date_record):
        movq    %rdi, %rax
        cmpb    $11, %ah
        jbe     .L10
        andl    $-65281, %edi
        leal    65536(%rdi), %eax
        ret
.L10:
        leal    256(%rdi), %eax
        ret
Contrary to my expectations, quite simply all three compilers (intel’s, gcc, clang) fail to de-aggregate the day, month, year representation into a consolidated integer, so, they do the superfluous work of splitting the value into components and resynthetizing it.

Now that we have been informed that there actually is a runtime performance difference, we have to make sure we use the best interpretaton for each operation. I wish the compiler would let me use the interpretation I see as easiest to implement an operation and then give me the equivalent for the best interpretation, because I am never in the mood to do the compiler’s job while programming, but can’t anymore, there is performance to be gained.

In summary, discriminated unions mean that in practice the compile time functions have to respect which is the active member, leading to programming several versions of exactly the same operations, and more complications to indicate to the end programmer which ones to use for the runtime. IMO, this is not acceptable.

No comments:

Post a Comment