Archive for Programming

Cargo Cult Programming

I recently encountered a nice example of cargo cult programming. In bug 10980 Robert Wohlrab helpfully built a large number of Debian packages with the gold linker and reported errors about unknown options. These were options supported by the GNU linker but not by gold. (I’ve now added all the options to gold).

Among the options that packages used were -g and -assert. The GNU linker accepts and ignores these options. It has never done anything with them. Why do people pass them to the linker? I can only assume that they were copied from some other linker invocation.

In today’s increasingly complex world of programming, when so much code involves integrating libraries in various ways, I expect that cargo cult programming is on the rise.

Comments (1)

Generics

The goal of generics in a programming language is to save programmer time. Generics permit you to write code that works with multiple types only once. They also permit one programmer to write generic algorithms which you can use with your program’s types. This kind of thing is attractive to anybody who has written a lot of C code, where you find yourself writing the same linked list and hash table code over and over again. The core code will use void * pointers but for everything else you need type casts. Those type casts are mechanical and tedious.

I would say that the most significant advantage of C++ over C is the C++ mechanism for generic programming, which is templates. C++ has a powerful template system which is somewhat like a macro system built into the language proper. It’s not really a macro system—in a true macro system the code would be parsed only at template instantiation time, but in fact C++ code is parsed at template definition time—but I’ve discovered that people who are not deeply familiar with the language think of it that way, and it is an effective approach. C++ pays for this power in compilation time, as each template must be separately compiled for each type for which it is instantiated. It also pays in language complexity, which is most visible to users in the widely noted incomprehensible error messages. For a while the new C++ standard introduced the idea of concepts which would at least improve the error messages, but concepts themselves turned out to be very complex and were dropped from the standardization process for now.

Other languages have other approaches to generics. I know these languages less well, so I hope I don’t make any mistakes. Java restricts generics to types which are inherited from Object, which permits the templates to be instantiated a single time; a cost of Java’s approach is that the actual types are lost (this is known as type erasure). C# instantiates generics at runtime using a JIT; all types which are representable as pointers can share a single instantiation. ML has two different generics implementations: simple polymorphism which can be used for, e.g., type-safe containers, and a powerful system in which you can write a transformation which takes a set of types, variables, and functions and produces a different set; the latter system is pretty much as powerful as C++ templates but does not need to be compiled each time (I think) and produces comprehensible error messages. LISP has an extremely powerful macro system which simply rewrites the program, something which is relatively easy to do in LISP due to the simple syntax; this is so powerful that it lets you add new syntactic constructs to your program.

I’ve been thinking about generics because of Go. Go’s goals include fast compilation and efficient execution. Those goals are in conflict when it comes to generics. Go is not interpreted, and it is not the case that every type looks like a pointer. Therefore efficient execution suggests separate compilation for each instantiation, which of course slows down compilation time. ML’s approach is quite powerful, but it may be complex to use in simple cases, and it’s hard to understand how to implement it for Go. Relying on every type matching the empty interface doesn’t help when somebody wants to write a generic function for chan T, since chan int is not the same as chan interface{}.

Implementing generics in Go looks like a struggle among the goals of fast compilation time (i.e., avoid multiple compilation of generics), easy programming (i.e., have some form of generics), and fast execution (i.e., different types are represented differently). All languages have those goals in different degrees, and all languages seem to compromise on one of them when it comes to implementing generics. The question for Go is whether we can find an appropriate compromise.

Comments (18)

A Gcc Frontend

When writing the gccgo frontend I had to figure out how to write a new gcc frontend. This is a largely undocumented procedure. Unfortunately, I did not take notes as I went along. However, here are some retrospective comments.

Every gcc frontend needs a set of language hooks. This is done by including "langhooks-def.h" and writing struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;. The specific language hooks are defined via various LANG_HOOKS_xxx macros.

Some language hooks are required even though for a language like Go there is nothing for them to do: LANG_HOOKS_GLOBAL_BINDINGS_P, LANG_HOOKS_PUSHDECL, LANG_HOOKS_GETDECLS. Also LANG_HOOKS_TYPE_FOR_MODE and LANG_HOOKS_TYPE_FOR_SIZE must be defined and must do something reasonable. The LANG_HOOKS_INIT function must call some functions: build_common_tree_nodes, set_sizetype, build_common_tree_nodes_2, build_common_builtin_nodes. It must also set the global variable void_list_node. As far as I know all these steps are required and none of them are documented. The LANG_HOOKS_POST_OPTIONS hook must set flag_excess_precision_cmdline.

The main language hook is LANG_HOOKS_PARSE_FILE. It will find the input file names in the global variables in_fnames and num_in_fnames. At that point the frontend can take over and do the actual parsing and initial compilation.

After LANG_HOOKS_PARSE_FILE creates a complete parse tree (in a global variable) and returns, the rest of the work is done by LANG_HOOKS_WRITE_GLOBALS. The gccgo frontend generates GENERIC, although these days it could be modified to generate GIMPLE instead. This basically means creating appropriate DECL nodes for all the global types, variables, and functions. For a function, the frontend must create a cfun structure via push_struct_function or similar, and it must set current_function_decl. The frontend must not only create the FUNCTION_DECL, it must fill in the DECL_RESULT field with a RESULT_DECL. After creating the GENERIC or GIMPLE, it should call cgraph_finalize_function.

After all the functions have been finalized and the global variables created, the frontend must call cgraph_finalize_compilation_unit. That is where the middle-end really takes over and generates code. The frontend must finish by calling wrapup_global_declarations, check_global_declarations, and emit_debug_global_declarations.

The above is probably slightly inaccurate, and I’m sure I’ve left out some details. And, of course, the frontend interfaces changes with each new gcc release. However, if you are interested in writing a gcc frontend, I hope this will give you a start.

Comments (2)

Go New/Make

One of the aspects of Go that some people find confusing is the two predeclared functions, new and make. Both functions are specially implemented in the compiler. They take a type as an argument and return a value. This can make it seem confusing when you should use one and when you should use the other. In fact the functions are quite different.

The new function is the familiar one from C++ and other languages: it takes a type, allocates a new zero value of that type in the heap, and returns a pointer to the new value. A call new(T) returns a value *T (in Go the * in a pointer type appears before the type to which it points, not after). The important point here is that new creates a zero value of the given type.

The make function is different. It creates a non-zero value of the given type. The make function may only be used with a few specific types: slices, maps, and channels. The zero value for these types is simply nil. In the case of maps and channels, make is the only way to create a real, non-nil, value. In the case of a slice, make creates an array in the heap (as with new) and then returns a pointer to that array converted to a slice.

So you use new when you want to allocate space in the heap, e.g., for a linked list. You use make when you want to create a map or channel, or as a convenient shorthand for creating a slice.

Composite literals are also related to new, but not make. If you take the address of a composite literal, as in &(struct { i int}){1} then you get a new value on the heap. Each time you execute that code, you get a new one. So taking the address of a composite literal is similar to using new, except that you get a non-zero value. This serves as a convenient, frequently used, shorthand.

Comments

Go Interface Values

While interface values in Go are flexible, they do have confusing aspects.

An interface value—e.g., a variable of an interface type— holds a value of some other type. The interface type is known as the static type, since that is the type that the compiler sees at compile type. The other type, which is only visible at runtime, is known as the dynamic type. The dynamic type is, by definition, never an interface type; it can be any other sort of type, though.

When you copy an interface value, via assignment or a function call, you are making a copy of the value of the dynamic type. This is how most types work in general. However, a very common case of using an interface is to have the dynamic type be a pointer type. In that case, when you copy the interface, you copy the pointer, but you of course do not copy the value to which the pointer points. In many cases, the methods of a type will take a receiver of pointer type; this is required if the methods are going to change the value itself. When that happens for a method which the interface requires, then the interface actually requires that the dynamic type be a pointer type.

What this means is that although interfaces are technically always copied as a value, in actual use they often behave as though they are copied by reference. That is, although there is no explicit marker, interface objects often act like pointers. This can be confusing until you understand what is really going on.

In my last post I mentioned that for gccgo, an interface value always holds a pointer to the value stored in the interface. Now I’m going to correct that: if a program stores a pointer (or a value of the type unsafe.Pointer) in an interface, then the value stored in the interface is the pointer itself. That is, gccgo does not store a pointer to the pointer (which would require allocating heap space to hold the pointer). This is a natural efficiency hack, since in practice most interface objects hold pointers.

This efficiency hack carries through to the implementation of methods. Methods are implemented to always accept a pointer as the receiver parameter. If the receiver type of the method is actually not a pointer, then the pointer is implicitly dereferenced, and the value copied, at the start of the method’s code. This means that when calling a method on an interface, the value stored in the interface can be passed directly to the method, regardless of whether the dynamic type is a pointer or not. (As with the last post, the gc compiler does things somewhat differently.)

Comments (2)

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »