Archive for December, 2009

A Gcc Frontend

When writing the gccgo frontend I had to figure out how to write a new gcc frontend. This is a largely undocumented procedure. Unfortunately, I did not take notes as I went along. However, here are some retrospective comments.

Every gcc frontend needs a set of language hooks. The frontend provides them by including "langhooks-def.h" and defining struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; in one of its source files. The specific language hooks are defined via various LANG_HOOKS_xxx macros.

Some language hooks are required even though for a language like Go there is nothing for them to do: LANG_HOOKS_GLOBAL_BINDINGS_P, LANG_HOOKS_PUSHDECL, LANG_HOOKS_GETDECLS. Also LANG_HOOKS_TYPE_FOR_MODE and LANG_HOOKS_TYPE_FOR_SIZE must be defined and must do something reasonable. The LANG_HOOKS_INIT function must call some functions: build_common_tree_nodes, set_sizetype, build_common_tree_nodes_2, build_common_builtin_nodes. It must also set the global variable void_list_node. As far as I know all these steps are required and none of them are documented. The LANG_HOOKS_POST_OPTIONS hook must set flag_excess_precision_cmdline.

The main language hook is LANG_HOOKS_PARSE_FILE. It will find the input file names in the global variables in_fnames and num_in_fnames. At that point the frontend can take over and do the actual parsing and initial compilation.

After LANG_HOOKS_PARSE_FILE creates a complete parse tree (in a global variable) and returns, the rest of the work is done by LANG_HOOKS_WRITE_GLOBALS. The gccgo frontend generates GENERIC, although these days it could be modified to generate GIMPLE instead. This basically means creating appropriate DECL nodes for all the global types, variables, and functions. For a function, the frontend must create a cfun structure via push_struct_function or similar, and it must set current_function_decl. The frontend must not only create the FUNCTION_DECL, it must fill in the DECL_RESULT field with a RESULT_DECL. After creating the GENERIC or GIMPLE, it should call cgraph_finalize_function.

After all the functions have been finalized and the global variables created, the frontend must call cgraph_finalize_compilation_unit. That is where the middle-end really takes over and generates code. The frontend must finish by calling wrapup_global_declarations, check_global_declarations, and emit_debug_global_declarations.

The above is probably slightly inaccurate, and I’m sure I’ve left out some details. And, of course, the frontend interface changes with each new gcc release. However, if you are interested in writing a gcc frontend, I hope this will give you a start.


Go New/Make

One of the aspects of Go that some people find confusing is the pair of predeclared functions, new and make. Both functions are specially implemented in the compiler. Both take a type as an argument and return a value. This can make it hard to tell when you should use one and when you should use the other. In fact the functions are quite different.

The new function is the familiar one from C++ and other languages: it takes a type, allocates a new zero value of that type in the heap, and returns a pointer to the new value. A call new(T) returns a value of type *T (in Go the * in a pointer type appears before the type to which it points, not after). The important point here is that new creates a zero value of the given type.
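As a minimal sketch of the behavior described above, the following program calls new on a built-in type and on a struct type. The node type and the helper names newCounter and newNode are hypothetical, introduced only for this illustration.

```go
package main

import "fmt"

// node is a hypothetical linked-list element used only for illustration.
type node struct {
	value int
	next  *node
}

// newCounter returns a pointer to a freshly allocated int.
// new(int) has type *int and points at the zero value 0.
func newCounter() *int {
	return new(int)
}

// newNode returns a pointer to a freshly allocated node whose
// fields all hold their zero values.
func newNode() *node {
	return new(node)
}

func main() {
	p := newCounter()
	fmt.Println(*p) // the zero value of int
	*p = 42         // the pointer can be used like any other
	fmt.Println(*p)

	n := newNode()
	fmt.Println(n.value, n.next == nil)
}
```

Each call to new yields a distinct pointer, so two counters created this way never alias each other.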

The make function is different. It creates a non-zero value of the given type. The make function may only be used with a few specific types: slices, maps, and channels. The zero value for these types is simply nil. In the case of channels, make is the only way to create a real, non-nil value; for maps, the only alternative is a composite literal. In the case of a slice, make creates an array in the heap (as with new) and then returns a pointer to that array converted to a slice.
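The contrast between the nil zero values and the values returned by make can be sketched as follows. The helper names zeroValues and makeValues, and the particular lengths and capacities, are assumptions chosen for this example.

```go
package main

import "fmt"

// zeroValues reports whether the zero values of a slice, a map,
// and a channel compare equal to nil.
func zeroValues() (bool, bool, bool) {
	var s []int
	var m map[string]int
	var c chan int
	return s == nil, m == nil, c == nil
}

// makeValues uses make to create usable, non-nil values:
// a slice of length 3 and capacity 10, an empty map,
// and a channel with a buffer of one element.
func makeValues() ([]int, map[string]int, chan int) {
	s := make([]int, 3, 10)
	m := make(map[string]int)
	c := make(chan int, 1)
	return s, m, c
}

func main() {
	fmt.Println(zeroValues())

	s, m, c := makeValues()
	s[0] = 7   // the slice elements start out as zero and can be assigned
	m["a"] = 1 // storing into a nil map would panic; this map is ready to use
	c <- 5     // the buffered channel accepts one value without a receiver
	fmt.Println(len(s), cap(s), s[0], m["a"], <-c)
}
```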

So you use new when you want to allocate space in the heap, e.g., for a linked list. You use make when you want to create a map or channel, or as a convenient shorthand for creating a slice.

Composite literals are also related to new, but not make. If you take the address of a composite literal, as in &struct{ i int }{1}, then you get a new value on the heap. Each time you execute that code, you get a new one. So taking the address of a composite literal is similar to using new, except that you get a non-zero value. This serves as a convenient, frequently used, shorthand.
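A short sketch of the comparison above, using a hypothetical point type and a hypothetical helper newPoint: each evaluation of the address-of expression produces a distinct pointer, and the pointed-to value is already initialized, unlike the zero value that new returns.

```go
package main

import "fmt"

// point is a hypothetical type used only for illustration.
type point struct{ x, y int }

// newPoint returns the address of a composite literal. Every call
// evaluates the literal again and so returns a fresh value.
func newPoint(x, y int) *point {
	return &point{x, y}
}

func main() {
	a := newPoint(1, 2)
	b := newPoint(1, 2)
	// The two values have equal contents but live at different addresses.
	fmt.Println(*a == *b, a == b)

	// new, by contrast, always yields the zero value.
	z := new(point)
	fmt.Println(*z == point{})
}
```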