Tuesday, December 29, 2015

The path to the innermost nature of things: metaprogramming

Metaprogramming

I once had the privilege of attending a presentation by Stephen Dewhurst on the subject of metaprogramming. It is from this event that I learned metaprogramming was not designed into C++ but discovered. That is, our predecessors trying to squeeze performance out of the language developed idioms that would have the compiler go beyond merely translating to object code, to also perform useful computation and insert the results to the object code, so that they would not have to be computed at run time; these techniques grew and became a new way to program; C++'s type system is so powerful that it is “turing complete”, that is, it can be used to program anything. This does not mean that it is practical, especially not that it is easy to do it.

Expressing relationships

There are times when meta programming is helpful, not just to bring to compilation time computation that otherwise would be done at run time, but computation that helps improve software quality. Most software elements have many relationships to other software elements, better programming languages allows the expression and maintenance of more and more abstract of those relationships. It is this interest, of improving software quality by expressing sophisticated relationships between software elements, what is my main interest in metaprogramming. I have introduced BOOST preprocessing in the last two articles illustrating techniques whose aim is to express relationships between software elements that can not be expressed within the pure language, I thought on reviewing those articles that I should have begun with relationships that can be expressed, hence I allow myself to postpone once again the description of how to use boost preprocessing.
By way of example, let us discuss the data structure of the singly linked list. Not caring about the operations, the data layout may be:
template<typename T> struct SListNode { T value; SListNode *link; };
struct SList { SListNode *head; unsigned size; };
However, this can be wasteful. In today’s architectures a pointer is typically 8 bytes long, if the element type T is as small as a single byte, because of very important alignment considerations, the size of the node will be 16 bytes. That’s an overhead of 16 to 1…

Type inferences

Can this be implemented better? Most architectures will have the same performance dereferencing a pointer as dereferencing a pointer plus an offset. That is, *pointer performs as well as *(pointer + offset). This is the case on the AMD64 architectures, and in Intel’s 32 bit and AMD64, there is another mode of addressing with the same performance, the “Scale Index Base”: *(pointer + index*element_size + offset) is as performing as *pointer provided the element_size is any of 1, 2, 4, or 8. Incidentally, in intel32/AMD64, the fastest way to multiply by a constant such as 5, (or 3 or 9) is to “load the effective address” of the operand register + 4*register, for example, lea (rax + 4*rax), rbx.
If we are not going to hold more than 4 giga elements, using a pointer to discriminate is wasteful, so it may be practical to create a memory arena for our lists, just a large array of bytes. Then, our node could be represented like this:
template<typename T> struct SListNode {
    static SListNode *arena; ///< a class-member common to all lists of this type T
    T value;
    unsigned arenaIndexOfNext;

    SListNode &next() { return arena[arenaIndexOfNext]; }
    unsigned currentIndex() const noexcept { return this - arena; }
};
If the total number of nodes will be less than 2^16, thus we won’t have more than 2^16 indices, and the size of the type T is 2 bytes, our current implementation of using an unsigned (typically 4 bytes) forces the node to be of 8 bytes when it really requires 4; where I am going with this line of reasoning is that the optimal type used for the index is dependent on the size of the type T and the maximum number of nodes to be supported. Conceivably, we can manually implement the structure for each possible combination:
  1. 1 byte element, up to 256 nodes: use a byte for index
  2. up to 2 byte elements and less than 2^16 indices: use a two-byte for the index
  3. up to 4 byte elements use a four-byte index
  4. more than 4 byte elements use a pointer

Generic programming

And this is in the case we want to keep the value and the link together, if we chose to have separate arenas for the values and the links represented as indices, we have the whole cartesian product of sizes: for the element type size < 2, < 4, < 8, >= 8 and the total number of elements, < 2^8, < 2^16, > 2^16 or 12 different cases that are essentially the same code, just substituting the appropriate types and constants. The language has templates, both with type parameters and non-type parameters to accomplish coding the mechanism (singly linked list through arenas and indices). In that sense it is generic, in which the implementation may not care about the details of the parameters.
To use the generic implementation the programmer/user still has to make a choice: what is the type of the index that best suits the number of possible indices and the type of the elements; as explained in the preceding list, there is a clear way to select the right index. Since both the element type and the maximum number of indices are not known when writing the code to select the right index type, in effect that code will reason about elements that will be given to the program at a later time, that is, they work with elements of the program itself, hence metaprogramming.
So, generic is when the specific types (or constants) don’t matter; meta is when the parameters to templates will be decided based off other types the programmer gives, of course that each are the other to some degree. Alexander Stepanov, the person with the largest share of the credit for the invention of Generic Programming said in this zero-waste interview that he “[believes] it is necessary to design a programming language from the ground up to enable generic programming in a consistent way”. It is safe to say that the higher one goes into generic abstraction, that is, into proper meta-programming territory, the inconsistencies become much harder to overcome.
It begs the question, is it worth at all fighting this accidental complexity on top of the already huge inherent complexity? I would say yes, it is, because the inherent complexity (being able to program in a generic way) is the ultimate way to make software, to make something generic, one needs to understand it to its core, thus the inherent complexity of generic programming is how to capture the innermost nature of what’s being programmed. With regards to the accidental complexity, I draw from my personal experience in that I have had to exercise myself fully into getting around the inconsistencies; I have used techniques of functional programming, of logic programming, of artificial intelligence, and mathematics, more mathematics and even more mathematics. I have reached a much higher understanding of what programming is all about thanks to my interest in trying to accomplishing genericity. Surprisingly, I have actually succeeded at getting great practical results after it all. Think about all of the frustrations that needed overcoming to program the STL so that it could be approved into the 1998 standard, but then look at the monumental accomplishment… just to name one example, C++, to my knowledge, is the only language that has a managed array type (std::vector) with essentially zero performance overhead over unmanaged arrays… and in the 17 years and counting since it has been part of the standard, all other languages still “don’t get it”, for example, Java, C# have “generics” which are but pathetic syntactic sugar.
That is, the fighting itself has been an excellent school.
I find the subject so fascinating that it motivates me to make community out of sharing our findings.
Back to the pedestrian dydactic example, here’s where meta programming is helpful:
  1. lets the programmer express the relationships between the element size and the index size (or whether to use pointers),
  2. lets the compiler choose the implementation case that best suits the size of the type and the maximum number of elements,
  3. relieves the programmer from the responsability to choose the index (and thus the chance for her/him to make a poor choice)
  4. keeps the option for further refinements
I trust the reader to be able to program the generic arena solution given the optimal type for the indices, and being able to specialize for the case in which it makes more sense to use the standard node structure. For dydactical purposes, this is what will give the adequate index type for the arena implementation in which the nodes are the composition of the element and the index. The template alias meta::Index that takes a template type parameter (what is going to be held in lists) and the maximum number of nodes for lists of that type; this is defined below:
namespace meta {
    template<unsigned> struct type_size { using type = unsigned long; };
    template<> struct type_size<4> { using type = unsigned; };
    template<> struct type_size<2> { using type = short unsigned; };
    template<> struct type_size<1> { using type = unsigned char; };

    constexpr unsigned powerOf(unsigned base, unsigned exponent) noexcept {
        return exponent ? base*powerOf(base, exponent - 1) : 1;
    }

    constexpr unsigned logCeiling(unsigned base, unsigned v) noexcept {
        return v <= 1 ? 0 : 1 + logCeiling(base, (v + base - 1)/base);
    }

    template<typename T> constexpr T max(T a, T b) { return a < b ? b : a; }

    template<typename ElementType, unsigned MaxIndex> using Index =
        typename type_size<
            powerOf(
                2,
                max(
                    logCeiling(2, sizeof(T)),
                    logCeiling(256, Max)
                )
            )
        >::type;
}
Just a few definitions and the compiler will be able to pick the right index type. Provided you have an implementation of a heap manager, you’d have reduced the space overhead of containing elements in lists to its optimum. Since indexing as opposed to direct dereferencing has no penalty, the net effect of generic programming plus metaprogramming are
  1. Generics make it so that two implementations (normal single linked list and arenas) reduce the chances for errors of doing things manually or the different varieties to drift apart
  2. The space overhead is minimal,
  3. the increase in space efficiency translates into net speedup due to more effective caching
  4. No other performance penalty
  5. Increased compilation time
In essence, metaprogramming has been used in this example to express the relationship between the optimal index type to the element type and the maximum number of nodes that will be supported. The benefit or the importance of being able to code this relationship is trivial, however, the relationships that can be expressed with the mechanisms of metaprogramming are of arbitrary complexity, and they lead to truly great benefits.

Wednesday, December 23, 2015

Lists of function call arguments

Another thing not expressible in C++ are lists of arguments to functions, but curiously enough, lists of arguments to templates are actually expressible in a limited way.

Whenever I have referred to inexpressible things, what I have meant is that one loses the help from the compiler to keep track of details it should be able to keep track of, one is forced to do it manually, even resorting to very dangerous copying and pasting.

I’ve just read in the e-zine “Overload” number 130 the QM Bite on boolean parameters. The author points out three issues with boolean parameters, for example’s sake, let’s say a function that takes three flags to construct a font: bold, italic, underlined.

Without any more information, can you tell what does this fragment of code does?

auto userFont = createFont(12, false, true, false, true);

What about this other fragment?:

auto userFont = createFont(12, FontOptions().italic().underlined());

In this article we will work towards implementing a building block that will let us transform lists of boolean arguments into a call as the one just portrayed.

The author of the QM Bite (Matthew Wilson) objects to boolean parameters. The three issues he mentions are:

  1. Since they all have the same time, it is easy to to try to set boldness but make a mistake and set italic instead (you’d have to remember the exact position of each parameter)
  2. It will be difficult to ever change the order of the arguments, change their number, etc.
  3. The arguments obfuscate the code, to see what a true as argument means, the reader needs to jump to the declaration of the argument, manually count how many positions.

I would add a fourth complaint: The quantity of arguments must be minimized, the fewer the number of arguments, the easier to understand, manage what a function does. It improves quality.

Since lists of function call arguments are not expressible, try to avoid passing individual parameters, instead, group them together into sets. As always, the members of the set must be highly cohesive. If you do this right, then you’d have an abstraction improvement that will accelerate your development.

Continuing with the example by Matthew Wilson, imagine you capture the flags from the user: Preserving the list of arguments, there is no recourse but to manually copy the values of each of the flags every time something needs propagating. This makes no sense, it is clear that bold, italic, underlined, strike-through, etc. are a set of configuration options. Implement them exactly like that.

I think Mr. Wilson’s advice is no good, “enumeralization” won’t help you any: You incurr the extra work to create synonims of the type bool (which is not easily supported in C++ either), and you continue to have the problems of remembering the exact order of the arguments, the same inflexibility as to their quantity, etc. All you have helped yourself with is for the compiler catching when you accidentally set, for example, bold when italics was intended, a dubious trade for the extra work.

Here’s a hopefully better solution, if we could get for free the following code, the implementation of the set of options:

struct FontOptions {
    enum Options: unsigned char {
        UNMODIFIED = 0,
        BOLD = 1,
        ITALIC = 2,
        UNDERLINED = 4,
        STRIKE_THROUGH = 8
    };

    unsigned char value = 0;

    FontOptions() = default;
    FontOptions(const FontOptions &) = default;

    FontOptions unmodified() const { return FontOptions(0); }
    FontOptions bold() const { return value | BOLD; }
    bool is_bold() const { return value & BOLD; }
    FontOptions not_bold() const { return value & ~BOLD; }
    FontOptions italic() const { return value | ITALIC; }
    bool is_italic() const { return value & ITALIC; }
    FontOptions not_italic() const { return value & ~ITALIC; }
    FontOptions underlined() const { return value | UNDERLINED; }
    bool is_underlined() const { return value & UNDERLINED; }
    FontOptions not_underlined() const { return value & ~UNDERLINED; }
    FontOptions strike_through() const { return value | STRIKE_THROUGH; }
    bool is_strike_through() const { return value & STRIKE_THROUGH; }
    FontOptions not_strike_through() const { return value & ~STRIKE_THROUGH; }

    // The usual equality, difference operators, perhaps even the relational
    // not implemented for brevity of exposition

private:
    FontOptions(unsigned v): value(v) {}
};

then creating a font could have this signature: Font createFont(unsigned size, FontOptions options). And it could be called like this:

auto myFont = createFont(12, FontOptions().italic().underlined());

As you can see, instead of suffering of problems mentioned, this example:

  • shows how easy it is to understand exactly what are the options set,
  • there is no need to remember the order,
  • introducing new options won’t even necessarily require recompilation, and in particular no code change,
  • and should you want to change the meaning of any of the options, a simple grep of the desired setter will tell you reliably all the call sites so that you change accordingly.

Very probably inside the Font implementation you’d want to have a set of options. You could use ‘FontOptions’ exactly as it is inside Font. You could communicate internally the whole set of options using FontOptions. It is useful.

Observe that the implementation of FontOptions is very repetitive. And incomplete, since the constructors, setters, etc, are not constexpr noexcept, there is no operator for comparison, etc; it is error prone too, at any moment one could have written ‘italic’ meaning ‘bold’. These things are best left to something automated. Unfortunately, we begin with a list of identifiers (bold, …) and as already explained in the previous article, there is no support in pure C++ for them; we have turned the inexpressible function call argument list into a list of identifiers, thus, as done in the previous article, the next best choice is to use preprocessing:

#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/cat.hpp>

#define PP_SEQ_I_DECLARE_ENUMERATION_VALUE(r, dummy, i, e) e = (1 << i),
#define PP_SEQ_I_FLAG_SET_MEMBER_FUNCTIONS(r, name, i, e)\
    constexpr name e() const noexcept { return value | internal::e; }\
    constexpr name BOOST_PP_CAT(not_, e)() const noexcept { return value & ~internal::e; }\
    constexpr bool BOOST_PP_CAT(is_, e)() const noexcept { return value & internal::e; }

#define BOOLEAN_FLAG_SET(name, flags)\
    class name {\
        struct internal { enum: unsigned char {\
            EMPTY = 0,\
            BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_DECLARE_ENUMERATION_VALUE, ~, flags)\
        }; };\
        unsigned char value = 0;\
        constexpr name(unsigned v) noexcept: value(v) {}\
    public:\
        name() = default;\
        name(const name &) = default;\
        BOOST_PP_SEQ_FOR_EACH_I(PP_SEQ_I_FLAG_SET_MEMBER_FUNCTIONS, name, flags)\
        constexpr bool operator==(name other) const noexcept { return value == other.value; }\
        constexpr bool operator!=(name other) const noexcept { return !(*this == other); }\
    }

The use of the macro BOOLEAN_FLAG_SET is very easy, for example:

BOOLEAN_FLAG_SET(FO, (bold)(italic)(underlined)(strike_through));

This would declare and define a class FO that does the same things as FontOptions. I hope you’d agree that declaration and definition takes very little effort and prevents lots of errors. Also, we could improve this so that we also get inserter and extractor operators with minimal effort.

Before explaining the implementation, as a conclusion to this article, I advocate for the following:

  • Because there is no support for list of arguments, avoid them
  • Strive to minimize the number of arguments in your function calls
  • Grouping parameters into a single aggregate tends to capture useful semantics of the application domain that will speed up development and improve quality
  • Pure C++ offers limited support for expressing the groups of parameters identified, typically, the preprocessor may palliate this defficiency.

The same technique illustrated in this example can be generalized for arbitrary collections of parameters; for example, to build a date:

Date().month(FEBRUARY).year(2015).day(28)

That is, the same pattern can be used but the setters may take a value argument. Expressing the type of these arguments make the macro more complicated, hence its exposition will be postponed.

Also, there is no need to worry about returning by value and potentially doing wasteful copies of returned values, observe that the compiler has full visibility into what is going on, all the intermediate values are temporaries, and the “return value optimization” can be applied recursively from last to first (in this case from calling day back to Date()); hopefully this will be illustrated in the real world. Also notice that this pattern of acting on a value and not changing it but returning a modified copy is a tried and true technique in functional programming languages.