Tuesday, January 5, 2016

Achieving the ultimate date

If you stumbled upon this article from a search engine, sorry to disappoint, but this is not about dating, nor how to find your soul mate. It is about the puerile pursuit of the ultimate implementation for computer program values that hold dates.

The intellectual fertility of implementing the ultimate Date class


In the article lists of function call arguments we introduced an idiom to configure a class.

Part of the motivation for the technique is that lists of function call arguments are not expressible directly into C++, especially the very useful case of partial lists of parameters; thus, we wanted to explore for the special case of boolean parameters whether it was feasible to pack them into a C++ type. Fortunately, we saw that it was possible, without a loss of performance, and potentially a performance gain. As a matter of fact, now I recommend not using boolean parameters at all, but to group them into their own set of flags. To further make the idiom practical, I introduced supporting BOOST preprocessing macros to take care of the boilerplate.

In my outgoing remarks I mentioned in passing that the technique could be generalized to arbitrary lists of function call arguments, provided their types respected value semantics.
 
This is one aspect that has kept me busy researching. The technique works, however, implementing it, and its supporting BOOST preprocessing macros, I encountered obstacles of different natures:
  1. The compilers for reasons not known to me translate essentially the same code in different ways
  2. The language has a few rules that conspire against highly performing and easy to understand code
  3. I reached hard limitations of BOOST preprocessing
  4. Some compilers may fail to perform optimizations that GCC can.
I thought a class for dates was simple enough for illustration purposes and non-trivial enough that the idioms would be applicable.

Offline a friend pointed out the deficiency of the idiom presented of boolean flags of not being able to force a complete set of flags to initialize a class. That is, the expression of complete sets of flags, in the example we have been using, to construct a Font only if the FontOptions has all its flags set. This problem has a solution that I suspect is original.

Generalizing the conversion of boolean parameters into a consolidated aggregate type for any value-semantic argument list turned into a through and through research project, because the straightforward solution, which is expressible, still leads to very interesting aspects. In that regard, it seems the conceptualization I arrived at is at the leading edge of the capabilities of the language, compilers and supporting techniques. This makes for very interesting writing material.

In this article, I’ve decided to simply introduce the aspects, to be explained in depth in successive articles. Yet again, I am forced to postpone explaining BOOST preprocessing, actually, I am afraid this time I had to use BOOST preprocessing techniques I don’t understand too well, and even beyond that, there are techniques that I don’t look forward to explaining because they are too weird, but I'll try.

 

Complete flag sets


Let us start with recapitulating the solution for the sets of options to create a font:

#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/cat.hpp>

#define PP_SEQ_I_DECLARE_ENUMERATION_VALUE(r, dummy, i, e) e = (1 << i),
#define PP_SEQ_I_FLAG_SET_MEMBER_FUNCTIONS(r, name, i, e)\
    constexpr name e() const noexcept { return value | internal::e; }\
    constexpr name BOOST_PP_CAT(not_, e)() const noexcept { return value & ~internal::e; }\
    constexpr bool BOOST_PP_CAT(is_, e)() const noexcept { return value & internal::e; }

#define BOOLEAN_FLAG_SET(name, flags)\
    class name {\
        struct internal { enum: unsigned char {\
            EMPTY = 0,\
            BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_DECLARE_ENUMERATION_VALUE, ~, flags)\
        }; };\
        unsigned char value = 0;\
        constexpr name(unsigned v) noexcept: value(v) {}\
    public:\
        name() = default;\
        name(const name &) = default;\
        BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_FLAG_SET_MEMBER_FUNCTIONS, name, flags)\
        constexpr bool operator==(name other) const noexcept { return value == other.value; }\
        constexpr bool operator!=(name other) const noexcept { return !(*this == other); }\
    }

We can have a class Font that requires a FontOptions to be constructed, like this:

BOOLEAN_FLAG_SET(FontOptions, (bold)(italic)(underlined)(strike_through));

class Font {
    public:

    Font() = default;
    Font(const Font &) = default;

    inline constexpr Font(FontOptions fo) noexcept;
};
 

Power set covariant return type template


This example suggest that the only way to make a new Font is by first having a FontOptions. However, there is nothing to prevent a Font being constructed with FontOptions not completely specified. This may be an important feature in other examples. Without further ado, I present the solution I came up with, which potentially is a new idiom that I will call “the power-set covariant return type template”, first, in its hand coded form:

namespace zoo {
    
template<unsigned FlagsSet> class FOI {
protected:
    enum Flags { BOLD = 1, ITALIC = 2, UNDERLINED = 4, STRIKE_THROUGH = 8 };
    unsigned char flagsSet = 0;

    explicit constexpr FOI(unsigned integer) noexcept: flagsSet(integer) {}

    constexpr unsigned flagVal(bool v, unsigned flag) const noexcept {
        return v ? (flagsSet | flag) : (flagsSet & ~flag);
    }

    template<unsigned> friend class FOI;

public:
    FOI() = default;
    FOI(const FOI &) = default;

    constexpr FOI<FlagsSet | BOLD> bold(bool v) const noexcept {
        return FOI<FlagsSet | BOLD>(flagVal(v, BOLD));
    }

    constexpr FOI<FlagsSet | ITALIC> italic(bool v) const noexcept {
        return FOI<FlagsSet | ITALIC>(flagVal(v, ITALIC));
    }

    constexpr FOI<FlagsSet | UNDERLINED> underlined(bool v) const noexcept {
        return FOI<FlagsSet | UNDERLINED>(flagVal(v, UNDERLINED));
    }

    constexpr FOI<FlagsSet | STRIKE_THROUGH> strike_through(bool v) const noexcept {
        return FOI<FlagsSet | STRIKE_THROUGH>(flagVal(v, STRIKE_THROUGH));
    }

    constexpr bool getBold() const noexcept { return flagsSet & BOLD; }
    constexpr bool getItalic() const noexcept { return flagsSet & ITALIC; }
    constexpr bool getUnderlined() const noexcept { return flagsSet & UNDERLINED; }
    constexpr bool getStrikeThrough() const noexcept { return flagsSet & STRIKE_THROUGH; }
};

}

template<unsigned fs> using FontOptions = zoo::FOI<fs>;

struct Font {
    bool b, i, u, s;
    Font(FontOptions<15> o): b(o.getBold()), i(o.getItalic()), u(o.getUnderlined()), s(o.getStrikeThrough()) {}
};

Font makeFont() {
    return  FontOptions<0>().bold(true).underlined(false).italic(true).strike_through(true);
}

Go ahead and try to build a Font from less than the result of initializing all the flags, the compiler will reject the code.

The technique before lets you:
  1. To not lose any performance (if in doubt, construct a test case and see the assembler)
  2. Full battery of compile-time reasoning about the values
  3. To set the flags by name
  4. Regardless of order
  5. To have partial sets of flags and communicate them all over
  6. The hability to enforce that all the flags are needed, simply require the signature with the number that represents all bits set.
  7. Performance gains from keeping the set consolidated
In summary, at the same time greater clarity and performance.

The name “power-set covariant return type” stems for the fact that the return type of the accessors in effect traverse the power set of flags. Remember, the power set of a set is the set of all of its subsets. That is, the return types typically begin with the empty set (no flag has been set), and progress adding any flag to the current set; the possible ways are all of the power set.

Now, let us add macros to not repeat the same boilerplate code:
 
#define PP_SEQ_I_DECLARE_ENUMERATION_VALUE(r, dummy, i, e) e = (1 << i),
#define PP_SEQ_I_POWER_SET_SET_FLAG_FUNCTION(r, name, i, e)\
    constexpr name<set | internal::e> e(bool v) const noexcept { return flagVal(v, internal::e); }\
    constexpr bool BOOST_PP_CAT(get_, e)() const noexcept { return value & internal::e; }

#define POWER_SET_COVARIANT_RETURN_TYPE_FLAG_SET(name, flags)\
    template<unsigned set> class name {\
        struct internal {\
            enum {\
                BOOST_PP_SEQ_FOR_EACH_I(\
                    PP_SEQ_I_DECLARE_ENUMERATION_VALUE,\
                    ~,\
                    flags\
                )\
            };\
        };\
        unsigned char value = 0;\
        constexpr name(unsigned v) noexcept: value(v) {}\
        constexpr unsigned flagVal(bool v, unsigned flag) const noexcept {\
           return v ? (value | flag) : (value & ~flag);\
        }\
        template<unsigned> friend struct name;\
    public:\
        name() = default;\
        name(const name &) = default;\
        BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_POWER_SET_SET_FLAG_FUNCTION, name, flags)\
    }

 

But it is just a generalization!


So far, the language, compiler and preprocessor are cooperating with the idioms. Let’s now generalize the idiom of transforming the non-expressible concept of boolean argument list (potentially partial) into a configuration set for arbitrary types.

First, the input can’t be just a list of identifiers, because the members will have arbitrary types. This generalization won’t be for the uninteresting case of the members having the same type. Since the list will be pairs of name and type, and because in preprocessing it does not make a difference whether they are pairs, triplets, or larger tuples, let us make them triplets: the name of the internal member variable, the type, and the public interface accessor.

The example I’ve chosen to illustrate the generalization is that of a date, expressed either as an integer or as an aggregate of day, month, year. You’ll see below I first exemplify the technique by hand-coding an example, and then, a fairly complicated set of macros that generate the same thing. Notice that to be fair, I am also using union to compare the assembler generated to the case in which an alternate representation makes sense.

Again, the code will be published without further explanation.

Investigating on date representations such as these I stumbled upon interesting phenomena, which is not related to preprocessing. Since I do not want to bore the audience with too many preprocessing articles in a row, the next article will be on that subject matter.

#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/tuple/elem.hpp>
#include <boost/preprocessor/comparison/equal.hpp>
#include <boost/preprocessor/control/iif.hpp>
#include <boost/preprocessor/comma_if.hpp>

namespace pp {

enum Month: unsigned char {
    JANUARY, FEBRUARY, JULY
};

#define PP_SEQ_MEMBER_OPERATOR(r, representative, op)\
    constexpr bool operator op(const self_type &arg) const noexcept { return representative(*this) op representative(arg); }

#define PP_COMPARABLE_OPERATIONS(type, representative)\
    using self_type = type;\
    BOOST_PP_SEQ_FOR_EACH(PP_SEQ_MEMBER_OPERATOR, representative, (==)(!=)(<)(<=)(>)(>=))

#define PP_EXPAND(...) __VA_ARGS__
#define MEMBERF(return_name, args) constexpr PP_EXPAND return_name (PP_EXPAND args) const noexcept

struct DateRecord {
    DateRecord() = default;
    DateRecord(const DateRecord &) = default;
    DateRecord &operator=(const DateRecord &) = default;

    constexpr operator int() const noexcept {
        using u8 = unsigned char;
        return d_ + (u8(m_) << 8) + (y_ << 16);
    }

    PP_COMPARABLE_OPERATIONS(DateRecord, int)
protected:
    union {
        struct {
            unsigned char d_;
            Month m_;
            short y_;
        };
        int integer;
    };

    constexpr DateRecord(unsigned char d, Month m, short y) noexcept:
        d_(d), m_(m), y_(y)
    {}
};

struct Date;

struct DI: DateRecord {
    using DateRecord::DateRecord;

    constexpr DI year(short y) const noexcept { return DI(d_, m_, y); }
    constexpr DI month(Month m) const noexcept { return DI(d_, m, y_); }
    constexpr DI day(unsigned char d) const noexcept { return DI(d, m_, y_); }

    constexpr inline Date date() const noexcept;
};

struct Date: DateRecord {
    Date() = default;
    Date(const Date &) = default;

    constexpr Date(DI initializer) noexcept: DateRecord(initializer) {}
};

constexpr Date DI::date() const noexcept{
    return Date(*this);
}

constexpr Date Apollo11Landing = DI().month(JULY).day(22).year(1969);
constexpr auto UnixEpoch = DI().year(1970).month(JANUARY).day(1).date();

static_assert(Apollo11Landing < UnixEpoch, "No!");

}

// The following six macros are "framework" style macros that perform
// useful work in many situations
#define PP_SEQ_METACALL(r, macro, e) macro e
#define PP_SEQ_I_ENUM_METACALL(r, macro, i, e) BOOST_PP_COMMA_IF(i) macro e

#define PP_EXPAND(...) __VA_ARGS__
#define PP_EMPTY()
#define PP_DEFER(...) __VA_ARGS__ PP_EMPTY()
#define PP_DEFERRED_BOOST_PP_SEQ_FOR_EACH_I() BOOST_PP_SEQ_FOR_EACH_I

// Now, the macros specific to the implementation of this idiom
#define PP_IMPL_DECLARE_IDENTIFIER(identifier, type, name) type identifier

/// Used for initializer lists to define constructors
#define PP_IMPL_INITIALIZER(name, type, fullname) name(name)

/// Simply add a semicolon to the declaration
#define PP_IMPL_DECLARE_MEMBER(name, type, fullname) PP_IMPL_DECLARE_IDENTIFIER(name, type, fullname);

/// Implementation detail of PP_SEQ_I_PSEUDO_CONSTRUCTOR: fragment of the setter signature
#define PP_IMPL_PSEUDO_CONSTRUCTOR_FRAGMENT(name, type, fullname) fullname(type arg) const noexcept

/// Converts the given tuple into an argument copy or \c arg depending on
/// Whether the index is the same
#define PP_IMPL_SEQ_I_PSEUDO_COPY(r, index, i, e)\
    BOOST_PP_COMMA_IF(i) BOOST_PP_IIF(BOOST_PP_EQUAL(index, i), arg, BOOST_PP_TUPLE_ELEM(3, 0, e))

/// Copies all the members except the member with the given index, which will be \c arg
#define PP_IMPL_PSEUDO_COPY(members, index)\
    PP_DEFER(PP_DEFERRED_BOOST_PP_SEQ_FOR_EACH_I)()(PP_IMPL_SEQ_I_PSEUDO_COPY, index, members)

/// This is the setter that returns a copy modified
#define PP_IMPL_SEQ_I_PSEUDO_CONSTRUCTOR(r, name_members, i, e)\
    constexpr BOOST_PP_TUPLE_ELEM(2, 0, name_members) PP_IMPL_PSEUDO_CONSTRUCTOR_FRAGMENT e {\
        return BOOST_PP_TUPLE_ELEM(2, 0, name_members) (\
            PP_IMPL_PSEUDO_COPY(BOOST_PP_TUPLE_ELEM(2, 1, name_members), i)\
        );\
    }

// Now, macros intended to be used by the end programmer
#define PP_IMPL_DECLARE_MEMBERS(members)\
    BOOST_PP_SEQ_FOR_EACH(PP_SEQ_METACALL, PP_IMPL_DECLARE_MEMBER, members)

#define PP_VALUES_CONSTRUCTOR(name, members)\
    constexpr name(BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_ENUM_METACALL, PP_IMPL_DECLARE_IDENTIFIER, members)) noexcept:\
    BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_ENUM_METACALL, PP_IMPL_INITIALIZER, members) {}

#define PP_PSEUDO_CONSTRUCTORS(name, members)\
    PP_EXPAND(BOOST_PP_SEQ_FOR_EACH_I(PP_IMPL_SEQ_I_PSEUDO_CONSTRUCTOR, (name, members), members))

#define PP_CONFIGURATOR(name, members)\
    class name {\
        protected:\
        PP_IMPL_DECLARE_MEMBERS(members)\
        PP_VALUES_CONSTRUCTOR(name, members)\
        public:\
        name() = default;\
        name(const name &) = default;\
        PP_PSEUDO_CONSTRUCTORS(name, members)\
    }

#define MEMBERS ((d_, unsigned char, day))((m_, pp::Month, month))((y_, short, year))
PP_CONFIGURATOR(dateI, MEMBERS);

struct date: dateI {
    date() = default;
    date(const date &) = default;

    constexpr date(dateI d) noexcept: dateI(d) {}

    MEMBERF((operator int),()) {
        using u8 = unsigned char;
        return d_ + (u8(m_) << 8) + (y_ << 16);
    }

    PP_COMPARABLE_OPERATIONS(date, int)
};

constexpr date Apollo11Landing = dateI().month(pp::JULY).day(22).year(1969);
constexpr date UnixEpoch = dateI().year(1970).month(pp::JANUARY).day(1);

static_assert(Apollo11Landing < UnixEpoch, "No!");

constexpr dateI dt = dateI().month(pp::JULY).day(22).year(1969);

PP_CONFIGURATOR(
    fulltime,
    ((n_, unsigned, nanos))
    ((s_, unsigned short, seconds))
    ((m_, unsigned char, minutes))
    ((h_, unsigned char, hours))
    ((d_, unsigned char, day))
    ((mo, pp::Month, month))
    ((y_, short, year))
);

constexpr fulltime ft = fulltime().nanos(2134).hours(3).minutes(22).month(pp::FEBRUARY).year(2015).seconds(33).day(3);

fulltime pairoft[2];
static_assert(sizeof(pairoft) == 24, "");

No comments:

Post a Comment