Archive for Programming

Go Interface Values

While interface values in Go are flexible, they do have confusing aspects.

An interface value—e.g., a variable of an interface type— holds a value of some other type. The interface type is known as the static type, since that is the type that the compiler sees at compile type. The other type, which is only visible at runtime, is known as the dynamic type. The dynamic type is, by definition, never an interface type; it can be any other sort of type, though.

When you copy an interface value, via assignment or a function call, you are making a copy of the value of the dynamic type. This is how most types work in general. However, a very common case of using an interface is to have the dynamic type be a pointer type. In that case, when you copy the interface, you copy the pointer, but you of course do not copy the value to which the pointer points. In many cases, the methods of a type will take a receiver of pointer type; this is required if the methods are going to change the value itself. When that happens for a method which the interface requires, then the interface actually requires that the dynamic type be a pointer type.

What this means is that although interfaces are technically always copied as a value, in actual use they often behave as though they are copied by reference. That is, although there is no explicit marker, interface objects often act like pointers. This can be confusing until you understand what is really going on.

In my last post I mentioned that for gccgo, an interface value always holds a pointer to the value stored in the interface. Now I’m going to correct that: if a program stores a pointer (or a value of the type unsafe.Pointer) in an interface, then the value stored in the interface is the pointer itself. That is, gccgo does not store a pointer to the pointer (which would require allocating heap space to hold the pointer). This is a natural efficiency hack, since in practice most interface objects hold pointers.

This efficiency hack carries through to the implementation of methods. Methods are implemented to always accept a pointer as the receiver parameter. If the receiver type of the method is actually not a pointer, then the pointer is implicitly dereferenced, and the value copied, at the start of the method’s code. This means that when calling a method on an interface, the value stored in the interface can be passed directly to the method, regardless of whether the dynamic type is a pointer or not. (As with the last post, the gc compiler does things somewhat differently.)

Comments (2)

Go Interfaces

One of the interesting aspects of the Go language is interface objects. In Go, the word interface is overloaded to mean several different things.

Every type has an interface, which is the set of methods defined for that type. This bit of code defines a struct type S with one field, and defines two methods for S.

type S struct { i int }
func (p *S) Get() int { return p.i }
func (p *S) Put(v int) { p.i = v }

You can also define an interface type, which is simply a set of methods. This defines an interface I with two methods:

type I interface {
  Get() int;
  Put(int);
}

S is a valid implementation for I, because it defines the two methods which I requires. Note that this is true even though there is no explicit declaration that S implements I. A Go program can use this fact via yet another meaning of interface, which is an interface value:

func f(p I) { fmt.Println(p.Get()); p.Put(1) }

Here the variable p holds a value of interface type. Because S implements I, we can call f passing in a pointer to a value of type S:

var s S; f(&s)

The reason we need to take the address of S, rather than a value of type S, is because we defined the methods on S to operate on pointers. This is not a requirement—we could have defined the methods to take values—but then the Put method would not work as expected.

The fact that you do not need to declare whether a type implements an interface means that Go implements a form of duck typing. This is not pure duck typing, because when possible the Go compiler will statically check whether the type implements the interface. However, Go does have a purely dynamic aspect, in that you can convert from one interface type to another. In the general case, that conversion is checked at runtime. If the conversion is invalid—if the type of the value stored in the existing interface value does not satisfy the interface to which it is being converted—the program will fail with a runtime error.

For example, since every type satisfies the empty interface interface {}:

func g(i interface{}) int { return i.(I).Get() }
func h() {
  var s S;
  fmt.Println(g(&s));
  fmt.Println(g(s)); // will fail at runtime
}

The first call to g will work fine and will print 0. The second call will fail at runtime; when using gccgo, the program will print

panic: interface conversion failed: no 'Get' method

This is because, as discussed above, a value of type S rather than *S does not have any methods.

So, how does this work? I will describe the current gccgo implementation. The implementation used in the 6g/8g compiler is generally similar but is different in important respects.

For every type which is converted to an interface type, gccgo will build a type descriptor. Type descriptors are used by the type reflection support in the reflect package, and they are also used by the internal runtime support. Type descriptors are defined in the file libgo/runtime/go-type.h in the source code. The relevant part here is that for each type they define a (possibly empty) list of methods. For each method five pieces of information are stored: a hash code for the type of the method (i.e., the parameter and result types); the name of the method; for a method which is not exported, the name of the package in which it is defined; the type descriptor for the type of the method; a pointer to the function which implements the method.

In gccgo, an interface value is really a struct with three fields (libgo/runtime/interface.h): a pointer to the type descriptor for the type of the value currently stored in the interface value; a pointer to a table of functions implementing the methods for the current value; a pointer to the current value itself.

The table of functions is stored sorted by the name of the method, although the name does not appear in the table. Calling a method on an interface means calling a specific entry in this table, where the entry to call is known at compile time. Thus the table of functions is essentially the same as a C++ virtual function table, and calling a method on an interface requires the same series of steps as calling a C++ virtual function: load the address of the virtual table, load a specific entry in the table, and call it.

When a value is statically converted to an interface type, the gccgo compiler will build the table of methods required for that value and that interface type. This table is specific to the pair of types involved. gccgo will give this table comdat linkage, so that it is only built once for each pair of types in a program. Thus a static conversion to an interface simply requires loading the three fields of the interface struct with values known at compile time.

A dynamic conversion from one interface type to another is more complex. Of the three fields in an interface value, the type descriptor and the pointer to the real value can simply be copied to the new interface value. However, the table of methods must be built at runtime. This is done by looking at the list of methods defined in the value’s type descriptors and the list of methods defined in the type descriptor for the interface type itself. Both lists are sorted by the name of the method. The runtime code (libgo/runtime/go-convert-interface.c) merges the two sorted lists to produce the method table. The merging is done using the name of the method and the type hash code. If the interface requires a method which the type does not provide, the conversion fails.

Interfaces in Go are similar to ideas in several other programming languages: pure abstract virtual base classes in C++; typeclasses in Haskell; duck typing in Python; etc. That said, I’m not aware of any other language which combines interface values, static type checking, dynamic runtime conversion, and no requirement for explicitly declaring that a type satisfies an interface. The result in Go is powerful, flexible, efficient, and easy to write.

Comments (3)

Go Channels

Since the first thing I liked about Go is channels, I’ll talk some more about them.

IA channel can transmit a value of a specific type. That is, a channel value of type chan T can transmit any value of type T. This is not much of a restriction, as the type interface {} can be used to match any value at all.

There are two different kinds of channels: buffered and unbuffered. Unbuffered ones are the default. An unbuffered channel provides synchronous transmission of a value from a sender to a receiver. This means that if you try to send a value on the channel, and nothing is receiving from the channel, then you will block until there is a receiver. Similarly, if you try to receive and nothing is sending, you will block until there is a sender (which is pretty obvious). In other words, an unbuffered channel serves as a synchronization point between two goroutines. This is the key point of the Go memory model, which more or less amounts to saying that if you receive a value on a channel, then you will see all the writes made by the sender of the channel.

Buffered channels are more straightforward: if you write a value to the channel, it simply goes into the buffer. If there isn’t room, you want until there is room.

Implementing the above is fairly straightforward. However, there is another aspect to channels in Go: the select statement. The select statement permits a program to wait until some operation on a channel can proceed. It can list an arbitrary number of channels to wait for, and for each channel it can indicate whether it wants to wait to send a value or to receive a value. Implementing this efficiently is at the heart of the goroutine scheduling mechanism. For example, one interesting case is when one goroutine is selecting to write to an unbuffered channel while another goroutine is selecting to read from the channel. The scheduler must get both goroutines to synch up and transmit the value.

In gccgo I currently use a thread for each goroutine, although this will have to change at some point. In order to implement select, each unbuffered channel has a linked list of select statements that are waiting for the channel to be available. When a new select statement comes along such that a send and a receive are linked, and the new select statements picks that operation to run, it has to wake up the other select such that it is forced to use the new channel. This can fail due to race conditions if there is a third select statement involved, in which case the first select has to restart. This was all a huge headache to get working, and I completely rewrote all the select code three times.

Comments

Go

I heard about the Go programming language in May 2008 and was immediately interested. I spent a couple of weeks writing a preliminary gcc frontend for Go. I’ve been working on it ever since, with just a few interruptions here and there. It’s a relief that this work is finally out there.

Both the language and the frontend are still explicitly experimental. So I was quite surprised by how much reaction it got. I was expecting just a few people to actually try it. Instead, I’ve basically spent all my time since Tuesday responding to comments on IRC and on the public mailing list.

Anyhow, what attracted me to the language is something I’ve written about in the past, namely multicore programming. Go provides cheap threads built into the language (they’re called goroutines) and it provides cheap communication between the threads, via channels. I won’t describe them in detail in this post, but I think that making these the easy mechanisms for multi-core, as opposed to making locks easy, greatly encourages correctness.

Go has a simple philosophy for multicore, which is “Do not communicate by sharing memory; instead, share memory by communicating.” What this means is that when you want two threads to share some data structure, you should do it by using explicit communication between the threads handing off ownership. You should not do it by having them both modify shared memory. Of course this only works if they have some way to communicate, which is what Go provides via channels. The point is that communication can be used to make it clear who is free to modify the data structure, and strict adherence to this approach eliminates race conditions.

It’s far too early to tell whether Go will catch on, but I certainly hope that even if it doesn’t, the basic ideas in the language will become part of programming’s standard tools for multicore.

There is of course more to the language than multicore, and I plan to write a few more notes about it as a I find time between mailing list replies.

Comments (7)

Compiler to Assembler

The gcc compiler has always worked by writing out assembly code in text format. The assembler reads this text file to produce an object file. Most compilers work this way, although there have been some exceptions such as the MIPS compiler used on Irix.

Clearly this process of producing text and then parsing it again is inefficient. When the GNU assembler was first written, it read a text format that was very precisely specified. Spacing had to be exact and generally omitted, no comments were permitted, etc. A special directive, #APP, was used to tell the assembler to go back to a normal parsing mode, which was implemented using an input scrubber which converted more free-form text into the precise form. The idea was that the compiler would generate this precise specification, reducing the parsing costs in the assembler, while still permitting users to write assembler code by hand for use in asm statements.

This idea still exists in the assembler, but it has been lost to some extent. By default, today, the assembler will accept input with arbitrary spacing, comments, etc. This can be disabled using the -f option. The assembler used to also accept a #NO_APP directive to go back to precise mode, but that is now only effective if it appears in the first line of the file. OgccGNU/Linux, the compiler does neither of these by default. Oddly, it does generate #NO_APP after an asm statement, where the assembler ignores it. The assembler is also now built in a mode which does permit some whitespace even in precise mode that was, for a while, not permitted.

What this tells me is that nobody cares very much about how long it takes the assembler to parse text. This is not unreasonable, since the assembler is in fact quite fast, and is not a major part of overall compilation time. Still, time spent parsing in the assembler is time lost.

In 1997 David Henkel-Wallace at Cygnus proposed converting the GNU assembler into a library which would be invoked directly by the compiler. That plan was never implemented in gcc. In today’s world it no longer makes much sense. Running the compiler and assembler as separate processes takes better advantage of today’s multicore machines. It’s hard to make a compiler multi-threaded; let’s not take away the one limited form of multi-threading that we already have.

What does make sense is using a structured data format, rather than text, to communicate between the compiler and the assembler. In gcc’s terms, the compiler should generate insn patterns with associated operands. The assembler should piece those together into its own internal data structures. In fact, of course, ideally gcc and the assembler would use the same internal data structure. It would be interesting to design such a structure so that it could be transmitted in a file, or over a pipe, or in shared memory.

I think the time savings would happen less on the assembler side than on the gcc side: gcc would no longer have to format the output.

I think this would have the potential to cut compilation time by 5 to 10 percent. Not a big savings for the effort required, which is why nobody has done it. Compilation time is less important these days due to the use of big compilation clusters, but programs are also getting bigger, so it is not wholly unimportant.

Comments (3)

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »