Wednesday, January 6, 2016

Achieving the ultimate date continued, good interpretations

Hello google wanderer, let me disappoint you because once more this article, although it has to do with speed dates, won't help you any to find your soul mate, and especially not how to understand her/him...

Ah, BOOST preprocessing, as explained before, will be postponed once more.

Another inexpressible thing


It turns out there is no way to tell a C++ compiler that it can see the same bits as two different types of values. reinterpret_cast and C-style casts that reinterpret the value are simply not allowed in constexprs. But we have unions, right? For example, I thought for this code
union bits64 {
    long integer;
    double floating;

    constexpr bits64(double init) noexcept: floating(init) {}
    constexpr bits64(long init) noexcept: integer(init) {}
};
that integer and floating could be used indistinctly; furthermore, for ages my code has done exactly that to significant gains of clarity and performance. However, what that code really means is you can set either and read the same; if you last set one and read the other, that’s undefined behavior. IN YOUR FACE!. In particular, for constexprs, where the compiler can track which is the last member set, the so-called active member, it blows into an error. I am discovering that unions are discriminated. If you need persuasion that this is the case, check this code:
union bits64 {
    long integer;
    double floating;

    constexpr bits64(double init): floating(init) {}
    constexpr bits64(long init): integer(init) {}

    constexpr operator long() const { return integer; }
};

constexpr bits64 fromLong{long(0)};

static_assert(0 == long(fromLong), "");

constexpr bits64 fromDouble = 0.0;
static_assert(0 == long(fromDouble), "");
There will be a compilation error, accessing fromDouble.integer when the compiler is sure that the last union member set (the active member) is other than the one being accessed.

I don’t know what is there to gain by such definition of unions. I mean, there are things that come to mind, we could put an addendum to tools such as valgrind to keep track of the last union member set and track its usage.  That’s all good, but it does not require a language feature to force it; however, what about the legitimate use of indiscriminated unions?

There is a good example for wanting to use an alternative interpretation for the same bits: A date type such as the one described above as trivial conversions to and from integer, if the user does not care about endianness:
union date_record {
    struct {
        uint8_t char day, month;
        int16_t year;
    };
    int32_t integer;

    constexpr bool operator<(date_record other) const noexcept {
        return integer < other.integer;
    }
};

bool compare(date_record dr) {
    return dr.day + dr.month * 256 + dr.year == dr.as_integer;
}
compare will always be true in little endian arithmetic (such as AMD64). This fact can be used to simplify many operations. I implemented an integer comparison which under the assumption of little endianness is unbeatable in terms of economy of programming effort, clarity, and runtime performance; precisely the kind of things a good language should encourage.

Using an alternative interpretation for the same bits through the union feature, is very well defined "undefined behavior" in all compilers I’ve ever used.

Let us see another example, a date increment by one month operation (this implementation is not complete because adding one month to a January 30 needs to change the day component too, however, for simplicity of exposition):
union date_record {
    struct {
        uint8_t char day, month;
        int16_t year;
    };
    int32_t integer;

    constexpr bool operator<(date_record other) const noexcept {
        return integer < other.integer;
    }
};

date_record incrementMonthAggregate(date_record dr) {
    if(dr.month < 12) { ++dr.month; }
    else { dr.month = 0; ++dr.year; }
    return dr;
}

date_record incrementMonthInteger(date_record dr) {
    date_record rv;
    rv.integer =
        dr.month < 12 ?
        dr.integer + 256 :
        (dr.integer & 0xFFFF00FF) + 0x10000;
    return rv;
}
Notice both functions do essentially the same work, so, programming one, even for constexprs would make the other superflous except for the apparently arbitrary rule that unions are discriminated. But since that rule is in the language, gcc honors it and I don’t know how to turn it off, then if full compile time reasoning is desired, there is no recourse but to implement all the functions as many times as there are alternative interpretations. I thought that was the end of it, and I had my article nicely tied up.

However, being thorough, I checked the assember. It turns out they are sufficiently different. In gcc, 20 assembler lines versus 8, in clang, 18 versus 12 --notice clang’s assembler for the integer operation, despite being slightly longer, is superior to gcc’s because it does not have a conditional jump–. It took me a while to figure out what’s the difference (GCC case):
incrementMonthAggregate(date_record):
        movl    %edi, %edx ;; edx <- dr
        movl    %edi, %esi ;; esi <- dr
        movzbl  %dh, %ecx  ;; ecx <- dr.month
        sarl    $16, %esi  ;; esi = dr.year
        cmpb    $11, %cl   ;; if(dr.month <= 11)
        movl    %esi, %eax ;;
        jbe     .L5        ;; { eax <- dr.year; goto .L5 }
        leal    1(%rsi), %eax ;; eax <- 1 + dr.year
        xorl    %ecx, %ecx ;; ecx <- 0
        movb    %cl, %dh   ;; dr.month = 0
        sall    $16, %eax  ;; eax <- (1 + dr.year) << 16
        movzwl  %dx, %edx  ;; edx <- dr.month * 256 + dr.day
        orl     %edx, %eax ;; eax <- dr
        ret
.L5:
        addl    $1, %ecx   ;; ecx <- 1 + dr.month
        sall    $16, %eax  ;; eax <- dr.year << 16
        movb    %cl, %dh
        movzwl  %dx, %edx  ;; edx <- dr.month * 256 + dr.day
        orl     %edx, %eax ;; eax <- dr
        ret
incrementMonthInteger(date_record):
        movq    %rdi, %rax
        cmpb    $11, %ah
        jbe     .L10
        andl    $-65281, %edi
        leal    65536(%rdi), %eax
        ret
.L10:
        leal    256(%rdi), %eax
        ret
Contrary to my expectations, quite simply all three compilers (intel’s, gcc, clang) fail to de-aggregate the day, month, year representation into a consolidated integer, so, they do the superfluous work of splitting the value into components and resynthetizing it.

Now that we have been informed that there actually is a runtime performance difference, we have to make sure we use the best interpretaton for each operation. I wish the compiler would let me use the interpretation I see as easiest to implement an operation and then give me the equivalent for the best interpretation, because I am never in the mood to do the compiler’s job while programming, but can’t anymore, there is performance to be gained.

In summary, discriminated unions mean that in practice the compile time functions have to respect which is the active member, leading to programming several versions of exactly the same operations, and more complications to indicate to the end programmer which ones to use for the runtime. IMO, this is not acceptable.

Tuesday, January 5, 2016

Achieving the ultimate date

If you stumbled upon this article from a search engine, sorry to disappoint, but this is not about dating, nor how to find your soul mate. It is about the puerile pursuit of the ultimate implementation for computer program values that hold dates.

The intellectual fertility of implementing the ultimate Date class


In the article lists of function call arguments we introduced an idiom to configure a class.

Part of the motivation for the technique is that lists of function call arguments are not expressible directly into C++, especially the very useful case of partial lists of parameters; thus, we wanted to explore for the special case of boolean parameters whether it was feasible to pack them into a C++ type. Fortunately, we saw that it was possible, without a loss of performance, and potentially a performance gain. As a matter of fact, now I recommend not using boolean parameters at all, but to group them into their own set of flags. To further make the idiom practical, I introduced supporting BOOST preprocessing macros to take care of the boilerplate.

In my outgoing remarks I mentioned in passing that the technique could be generalized to arbitrary lists of function call arguments, provided their types respected value semantics.
 
This is one aspect that has kept me busy researching. The technique works, however, implementing it, and its supporting BOOST preprocessing macros, I encountered obstacles of different natures:
  1. The compilers for reasons not known to me translate essentially the same code in different ways
  2. The language has a few rules that conspire against highly performing and easy to understand code
  3. I reached hard limitations of BOOST preprocessing
  4. Some compilers may fail to perform optimizations that GCC can.
I thought a class for dates was simple enough for illustration purposes and non-trivial enough that the idioms would be applicable.

Offline a friend pointed out the deficiency of the idiom presented of boolean flags of not being able to force a complete set of flags to initialize a class. That is, the expression of complete sets of flags, in the example we have been using, to construct a Font only if the FontOptions has all its flags set. This problem has a solution that I suspect is original.

Generalizing the conversion of boolean parameters into a consolidated aggregate type for any value-semantic argument list turned into a through and through research project, because the straightforward solution, which is expressible, still leads to very interesting aspects. In that regard, it seems the conceptualization I arrived at is at the leading edge of the capabilities of the language, compilers and supporting techniques. This makes for very interesting writing material.

In this article, I’ve decided to simply introduce the aspects, to be explained in depth in successive articles. Yet again, I am forced to postpone explaining BOOST preprocessing, actually, I am afraid this time I had to use BOOST preprocessing techniques I don’t understand too well, and even beyond that, there are techniques that I don’t look forward to explaining because they are too weird, but I'll try.

 

Complete flag sets


Let us start with recapitulating the solution for the sets of options to create a font:

#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/cat.hpp>

#define PP_SEQ_I_DECLARE_ENUMERATION_VALUE(r, dummy, i, e) e = (1 << i),
#define PP_SEQ_I_FLAG_SET_MEMBER_FUNCTIONS(r, name, i, e)\
    constexpr name e() const noexcept { return value | internal::e; }\
    constexpr name BOOST_PP_CAT(not_, e)() const noexcept { return value & ~internal::e; }\
    constexpr bool BOOST_PP_CAT(is_, e)() const noexcept { return value & internal::e; }

#define BOOLEAN_FLAG_SET(name, flags)\
    class name {\
        struct internal { enum: unsigned char {\
            EMPTY = 0,\
            BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_DECLARE_ENUMERATION_VALUE, ~, flags)\
        }; };\
        unsigned char value = 0;\
        constexpr name(unsigned v) noexcept: value(v) {}\
    public:\
        name() = default;\
        name(const name &) = default;\
        BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_FLAG_SET_MEMBER_FUNCTIONS, name, flags)\
        constexpr bool operator==(name other) const noexcept { return value == other.value; }\
        constexpr bool operator!=(name other) const noexcept { return !(*this == other); }\
    }

We can have a class Font that requires a FontOptions to be constructed, like this:

BOOLEAN_FLAG_SET(FontOptions, (bold)(italic)(underlined)(strike_through));

class Font {
    public:

    Font() = default;
    Font(const Font &) = default;

    inline constexpr Font(FontOptions fo) noexcept;
};
 

Power set covariant return type template


This example suggest that the only way to make a new Font is by first having a FontOptions. However, there is nothing to prevent a Font being constructed with FontOptions not completely specified. This may be an important feature in other examples. Without further ado, I present the solution I came up with, which potentially is a new idiom that I will call “the power-set covariant return type template”, first, in its hand coded form:

namespace zoo {
    
template<unsigned FlagsSet> class FOI {
protected:
    enum Flags { BOLD = 1, ITALIC = 2, UNDERLINED = 4, STRIKE_THROUGH = 8 };
    unsigned char flagsSet = 0;

    explicit constexpr FOI(unsigned integer) noexcept: flagsSet(integer) {}

    constexpr unsigned flagVal(bool v, unsigned flag) const noexcept {
        return v ? (flagsSet | flag) : (flagsSet & ~flag);
    }

    template<unsigned> friend class FOI;

public:
    FOI() = default;
    FOI(const FOI &) = default;

    constexpr FOI<FlagsSet | BOLD> bold(bool v) const noexcept {
        return FOI<FlagsSet | BOLD>(flagVal(v, BOLD));
    }

    constexpr FOI<FlagsSet | ITALIC> italic(bool v) const noexcept {
        return FOI<FlagsSet | ITALIC>(flagVal(v, ITALIC));
    }

    constexpr FOI<FlagsSet | UNDERLINED> underlined(bool v) const noexcept {
        return FOI<FlagsSet | UNDERLINED>(flagVal(v, UNDERLINED));
    }

    constexpr FOI<FlagsSet | STRIKE_THROUGH> strike_through(bool v) const noexcept {
        return FOI<FlagsSet | STRIKE_THROUGH>(flagVal(v, STRIKE_THROUGH));
    }

    constexpr bool getBold() const noexcept { return flagsSet & BOLD; }
    constexpr bool getItalic() const noexcept { return flagsSet & ITALIC; }
    constexpr bool getUnderlined() const noexcept { return flagsSet & UNDERLINED; }
    constexpr bool getStrikeThrough() const noexcept { return flagsSet & STRIKE_THROUGH; }
};

}

template<unsigned fs> using FontOptions = zoo::FOI<fs>;

struct Font {
    bool b, i, u, s;
    Font(FontOptions<15> o): b(o.getBold()), i(o.getItalic()), u(o.getUnderlined()), s(o.getStrikeThrough()) {}
};

Font makeFont() {
    return  FontOptions<0>().bold(true).underlined(false).italic(true).strike_through(true);
}

Go ahead and try to build a Font from less than the result of initializing all the flags, the compiler will reject the code.

The technique before lets you:
  1. To not lose any performance (if in doubt, construct a test case and see the assembler)
  2. Full battery of compile-time reasoning about the values
  3. To set the flags by name
  4. Regardless of order
  5. To have partial sets of flags and communicate them all over
  6. The hability to enforce that all the flags are needed, simply require the signature with the number that represents all bits set.
  7. Performance gains from keeping the set consolidated
In summary, at the same time greater clarity and performance.

The name “power-set covariant return type” stems for the fact that the return type of the accessors in effect traverse the power set of flags. Remember, the power set of a set is the set of all of its subsets. That is, the return types typically begin with the empty set (no flag has been set), and progress adding any flag to the current set; the possible ways are all of the power set.

Now, let us add macros to not repeat the same boilerplate code:
 
#define PP_SEQ_I_DECLARE_ENUMERATION_VALUE(r, dummy, i, e) e = (1 << i),
#define PP_SEQ_I_POWER_SET_SET_FLAG_FUNCTION(r, name, i, e)\
    constexpr name<set | internal::e> e(bool v) const noexcept { return flagVal(v, internal::e); }\
    constexpr bool BOOST_PP_CAT(get_, e)() const noexcept { return value & internal::e; }

#define POWER_SET_COVARIANT_RETURN_TYPE_FLAG_SET(name, flags)\
    template<unsigned set> class name {\
        struct internal {\
            enum {\
                BOOST_PP_SEQ_FOR_EACH_I(\
                    PP_SEQ_I_DECLARE_ENUMERATION_VALUE,\
                    ~,\
                    flags\
                )\
            };\
        };\
        unsigned char value = 0;\
        constexpr name(unsigned v) noexcept: value(v) {}\
        constexpr unsigned flagVal(bool v, unsigned flag) const noexcept {\
           return v ? (value | flag) : (value & ~flag);\
        }\
        template<unsigned> friend struct name;\
    public:\
        name() = default;\
        name(const name &) = default;\
        BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_POWER_SET_SET_FLAG_FUNCTION, name, flags)\
    }

 

But it is just a generalization!


So far, the language, compiler and preprocessor are cooperating with the idioms. Let’s now generalize the idiom of transforming the non-expressible concept of boolean argument list (potentially partial) into a configuration set for arbitrary types.

First, the input can’t be just a list of identifiers, because the members will have arbitrary types. This generalization won’t be for the uninteresting case of the members having the same type. Since the list will be pairs of name and type, and because in preprocessing it does not make a difference whether they are pairs, triplets, or larger tuples, let us make them triplets: the name of the internal member variable, the type, and the public interface accessor.

The example I’ve chosen to illustrate the generalization is that of a date, expressed either as an integer or as an aggregate of day, month, year. You’ll see below I first exemplify the technique by hand-coding an example, and then, a fairly complicated set of macros that generate the same thing. Notice that to be fair, I am also using union to compare the assembler generated to the case in which an alternate representation makes sense.

Again, the code will be published without further explanation.

Investigating on date representations such as these I stumbled upon interesting phenomena, which is not related to preprocessing. Since I do not want to bore the audience with too many preprocessing articles in a row, the next article will be on that subject matter.

#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/tuple/elem.hpp>
#include <boost/preprocessor/comparison/equal.hpp>
#include <boost/preprocessor/control/iif.hpp>
#include <boost/preprocessor/comma_if.hpp>

namespace pp {

enum Month: unsigned char {
    JANUARY, FEBRUARY, JULY
};

#define PP_SEQ_MEMBER_OPERATOR(r, representative, op)\
    constexpr bool operator op(const self_type &arg) const noexcept { return representative(*this) op representative(arg); }

#define PP_COMPARABLE_OPERATIONS(type, representative)\
    using self_type = type;\
    BOOST_PP_SEQ_FOR_EACH(PP_SEQ_MEMBER_OPERATOR, representative, (==)(!=)(<)(<=)(>)(>=))

#define PP_EXPAND(...) __VA_ARGS__
#define MEMBERF(return_name, args) constexpr PP_EXPAND return_name (PP_EXPAND args) const noexcept

struct DateRecord {
    DateRecord() = default;
    DateRecord(const DateRecord &) = default;
    DateRecord &operator=(const DateRecord &) = default;

    constexpr operator int() const noexcept {
        using u8 = unsigned char;
        return d_ + (u8(m_) << 8) + (y_ << 16);
    }

    PP_COMPARABLE_OPERATIONS(DateRecord, int)
protected:
    union {
        struct {
            unsigned char d_;
            Month m_;
            short y_;
        };
        int integer;
    };

    constexpr DateRecord(unsigned char d, Month m, short y) noexcept:
        d_(d), m_(m), y_(y)
    {}
};

struct Date;

struct DI: DateRecord {
    using DateRecord::DateRecord;

    constexpr DI year(short y) const noexcept { return DI(d_, m_, y); }
    constexpr DI month(Month m) const noexcept { return DI(d_, m, y_); }
    constexpr DI day(unsigned char d) const noexcept { return DI(d, m_, y_); }

    constexpr inline Date date() const noexcept;
};

struct Date: DateRecord {
    Date() = default;
    Date(const Date &) = default;

    constexpr Date(DI initializer) noexcept: DateRecord(initializer) {}
};

constexpr Date DI::date() const noexcept{
    return Date(*this);
}

constexpr Date Apollo11Landing = DI().month(JULY).day(22).year(1969);
constexpr auto UnixEpoch = DI().year(1970).month(JANUARY).day(1).date();

static_assert(Apollo11Landing < UnixEpoch, "No!");

}

// The following six macros are "framework" style macros that perform
// useful work in many situations
#define PP_SEQ_METACALL(r, macro, e) macro e
#define PP_SEQ_I_ENUM_METACALL(r, macro, i, e) BOOST_PP_COMMA_IF(i) macro e

#define PP_EXPAND(...) __VA_ARGS__
#define PP_EMPTY()
#define PP_DEFER(...) __VA_ARGS__ PP_EMPTY()
#define PP_DEFERRED_BOOST_PP_SEQ_FOR_EACH_I() BOOST_PP_SEQ_FOR_EACH_I

// Now, the macros specific to the implementation of this idiom
#define PP_IMPL_DECLARE_IDENTIFIER(identifier, type, name) type identifier

/// Used for initializer lists to define constructors
#define PP_IMPL_INITIALIZER(name, type, fullname) name(name)

/// Simply add a semicolon to the declaration
#define PP_IMPL_DECLARE_MEMBER(name, type, fullname) PP_IMPL_DECLARE_IDENTIFIER(name, type, fullname);

/// Implementation detail of PP_SEQ_I_PSEUDO_CONSTRUCTOR: fragment of the setter signature
#define PP_IMPL_PSEUDO_CONSTRUCTOR_FRAGMENT(name, type, fullname) fullname(type arg) const noexcept

/// Converts the given tuple into an argument copy or \c arg depending on
/// Whether the index is the same
#define PP_IMPL_SEQ_I_PSEUDO_COPY(r, index, i, e)\
    BOOST_PP_COMMA_IF(i) BOOST_PP_IIF(BOOST_PP_EQUAL(index, i), arg, BOOST_PP_TUPLE_ELEM(3, 0, e))

/// Copies all the members except the member with the given index, which will be \c arg
#define PP_IMPL_PSEUDO_COPY(members, index)\
    PP_DEFER(PP_DEFERRED_BOOST_PP_SEQ_FOR_EACH_I)()(PP_IMPL_SEQ_I_PSEUDO_COPY, index, members)

/// This is the setter that returns a copy modified
#define PP_IMPL_SEQ_I_PSEUDO_CONSTRUCTOR(r, name_members, i, e)\
    constexpr BOOST_PP_TUPLE_ELEM(2, 0, name_members) PP_IMPL_PSEUDO_CONSTRUCTOR_FRAGMENT e {\
        return BOOST_PP_TUPLE_ELEM(2, 0, name_members) (\
            PP_IMPL_PSEUDO_COPY(BOOST_PP_TUPLE_ELEM(2, 1, name_members), i)\
        );\
    }

// Now, macros intended to be used by the end programmer
#define PP_IMPL_DECLARE_MEMBERS(members)\
    BOOST_PP_SEQ_FOR_EACH(PP_SEQ_METACALL, PP_IMPL_DECLARE_MEMBER, members)

#define PP_VALUES_CONSTRUCTOR(name, members)\
    constexpr name(BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_ENUM_METACALL, PP_IMPL_DECLARE_IDENTIFIER, members)) noexcept:\
    BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_ENUM_METACALL, PP_IMPL_INITIALIZER, members) {}

#define PP_PSEUDO_CONSTRUCTORS(name, members)\
    PP_EXPAND(BOOST_PP_SEQ_FOR_EACH_I(PP_IMPL_SEQ_I_PSEUDO_CONSTRUCTOR, (name, members), members))

#define PP_CONFIGURATOR(name, members)\
    class name {\
        protected:\
        PP_IMPL_DECLARE_MEMBERS(members)\
        PP_VALUES_CONSTRUCTOR(name, members)\
        public:\
        name() = default;\
        name(const name &) = default;\
        PP_PSEUDO_CONSTRUCTORS(name, members)\
    }

#define MEMBERS ((d_, unsigned char, day))((m_, pp::Month, month))((y_, short, year))
PP_CONFIGURATOR(dateI, MEMBERS);

struct date: dateI {
    date() = default;
    date(const date &) = default;

    constexpr date(dateI d) noexcept: dateI(d) {}

    MEMBERF((operator int),()) {
        using u8 = unsigned char;
        return d_ + (u8(m_) << 8) + (y_ << 16);
    }

    PP_COMPARABLE_OPERATIONS(date, int)
};

constexpr date Apollo11Landing = dateI().month(pp::JULY).day(22).year(1969);
constexpr date UnixEpoch = dateI().year(1970).month(pp::JANUARY).day(1);

static_assert(Apollo11Landing < UnixEpoch, "No!");

constexpr dateI dt = dateI().month(pp::JULY).day(22).year(1969);

PP_CONFIGURATOR(
    fulltime,
    ((n_, unsigned, nanos))
    ((s_, unsigned short, seconds))
    ((m_, unsigned char, minutes))
    ((h_, unsigned char, hours))
    ((d_, unsigned char, day))
    ((mo, pp::Month, month))
    ((y_, short, year))
);

constexpr fulltime ft = fulltime().nanos(2134).hours(3).minutes(22).month(pp::FEBRUARY).year(2015).seconds(33).day(3);

fulltime pairoft[2];
static_assert(sizeof(pairoft) == 24, "");