Go Linkage Names

All Go code lives in a package. Every Go source file starts with a package declaration which names the package that it lives in. A package name is a simple identifier; besides appearing in a package clause, package names are also used when referring to names imported from another package. That poses the problem of what to do when one program links together two different packages which use the same package name. We can’t expect the author of a large program to be aware of every package that the program uses. However, since Go compiles straight to object files, it’s natural to use the package name in the generated symbol names. How can we avoid multiple definition errors?
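The collision is not hypothetical. In the current standard library, for example, both `math/rand` and `crypto/rand` declare `package rand`. In source code a program tells them apart by giving at least one of them a local import name, as in this small sketch. The helper names `rollPseudo` and `rollSecure` are made up for illustration. The object files still need distinct symbol names for the two packages, and that is the problem this post is about.

```go
package main

import (
	crand "crypto/rand" // package clause: "package rand"
	"fmt"
	mrand "math/rand" // package clause: also "package rand"
)

// rollPseudo returns a pseudo-random int in [0, n) from math/rand.
func rollPseudo(n int) int {
	return mrand.Intn(n)
}

// rollSecure returns n bytes read from crypto/rand's secure source.
func rollSecure(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := crand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

func main() {
	fmt.Println("pseudo:", rollPseudo(6))
	b, err := rollSecure(4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("secure bytes:", len(b))
}
```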

The gc compiler comes with its own Go-specific linker. That linker now supports automatic symbol renaming at link time based on the name used to import the package. That name is presumed to be unique. This means that all imports of the same package must use the same name to import it; otherwise you might get multiple definitions of a global variable in the package. In the future there may be some need to adjust packages which are distributed without their source code, to ensure that they don’t accidentally alias locally compiled package names.
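To make the gc rule concrete, here is a toy model. The `linkSymbol` and `definitions` functions are hypothetical and do not reproduce gc's real symbol format; they assume only what is stated above, namely that the link-time name is derived from the name used in the import. Under that assumption, importing one package under two different spellings yields two distinct symbols for the same global variable, which is to say two copies of it. The variable name `count` is also made up.

```go
package main

import "fmt"

// linkSymbol is a hypothetical mangling scheme: the link-time symbol is
// built from the name used to import the package plus the identifier.
// It is NOT gc's actual format, only an illustration of the idea.
func linkSymbol(importName, ident string) string {
	return importName + "." + ident
}

// definitions counts how many distinct link-time symbols a set of
// imports of one package would produce for a package-level variable.
func definitions(importNames []string, ident string) int {
	seen := make(map[string]bool)
	for _, name := range importNames {
		seen[linkSymbol(name, ident)] = true
	}
	return len(seen)
}

func main() {
	// Consistent import names: one symbol, one copy of the variable.
	fmt.Println(definitions([]string{"container/list", "container/list"}, "count")) // 1
	// The same package imported under two spellings: two symbols.
	fmt.Println(definitions([]string{"container/list", "./container/list"}, "count")) // 2
}
```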

For the gccgo compiler I have so far avoided using a specific linker, or rather linker wrapper. For large programs gccgo now requires a new option, -fgo-prefix=PREFIX, to be used when compiling a package. The PREFIX should be a string unique to that package; for example, in a typical installation, it could be the directory where the package is installed. The prefix is used to give the package’s symbols unique names in the compiled code. If the -fgo-prefix option is not used, everything will still work as long as there are not, in fact, two packages with the same name.
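A similar toy model for gccgo, again assuming only what is described above: a per-package prefix, when given, becomes part of every symbol name. The `mangle` function and the directory strings are hypothetical, and gccgo's real symbol format is not reproduced here.

```go
package main

import "fmt"

// mangle is a hypothetical sketch of prefix-based naming. When a prefix
// is supplied (as with -fgo-prefix=PREFIX) it is prepended; otherwise
// only the package name qualifies the identifier.
func mangle(prefix, pkg, ident string) string {
	if prefix == "" {
		return pkg + "." + ident
	}
	return prefix + "." + pkg + "." + ident
}

func main() {
	// Two unrelated packages that both declare "package list".
	distinctWithPrefix := mangle("/usr/lib/go/container", "list", "New") !=
		mangle("/home/me/src/mylist", "list", "New")
	distinctWithoutPrefix := mangle("", "list", "New") != mangle("", "list", "New")
	fmt.Println(distinctWithPrefix, distinctWithoutPrefix) // true false
}
```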

9 Comments

  1. Brian Slesinsky said,

    January 27, 2010 @ 10:22 am

    This is pretty confusing in a way that reminds me of classloaders in Java. I thought that the name of a Go package was the name at the top of the file, but apparently it isn’t? What do you mean by “based on the name used to import the package?” Does it make a difference whether the import is a relative or absolute path? How do I know whether two import statements import the same package?

    It seems like it would be much clearer to do no renaming and simply disallow two definitions of the same package in a single program.

  2. Ian Lance Taylor said,

    January 27, 2010 @ 8:26 pm

    There are two names to consider. The first is the name of the package in the Go language. The second is the name of the file that is created. The name in the language is indeed the name given in the package clause which starts every Go file. The file name is conventionally the same, but it does not have to be.

    When you import a package, the import statement gives the name of the file which contains the package (it can also give a local alias, but I’ll disregard that here). The code doing the import can then refer to the imported symbols using the name of the package. That is, the import statement uses the file name, and the code uses the package name. Again, they are conventionally the same, but need not be.
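The two names can be seen directly with the standard `go/parser` package, which records the package clause and each import spec separately. This sketch parses a made-up file (package `shapes`) that imports one package by its default name and another under a local alias; the `describe` helper is also made up for illustration.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// src is a made-up source file. Its package clause names the package
// "shapes"; it imports container/list by its default name and strings
// under the local alias "str".
const src = `package shapes

import (
	"container/list"
	str "strings"
)
`

// describe returns the package-clause name and, for each import,
// "localName path" (localName is "(default)" when no alias is given).
func describe(source string) (string, []string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "shapes.go", source, parser.ImportsOnly)
	if err != nil {
		return "", nil, err
	}
	var imports []string
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value) // the import path string
		if err != nil {
			return "", nil, err
		}
		local := "(default)"
		if spec.Name != nil {
			local = spec.Name.Name // explicit local alias
		}
		imports = append(imports, local+" "+path)
	}
	return f.Name.Name, imports, nil
}

func main() {
	pkg, imports, err := describe(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("package clause:", pkg)
	for _, imp := range imports {
		fmt.Println("import:", imp)
	}
}
```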

    In the gc compiler, one should now always use the same file name to import a package. That is, it does make a difference whether the import is a relative or an absolute path.

    It’s unfortunately not feasible to disallow two packages to have the same name in a large program. A package name in Go is simply an identifier. There aren’t enough good identifiers. When a large program combines packages written by different people, we can’t assume that there is no package name conflict.

    It’s true that this sort of problem can arise in C, but that is no reason to repeat the mistake in Go.

  3. Brian Slesinsky said,

    January 27, 2010 @ 10:36 pm

    Thanks for the explanation! I had never really noticed before that package names in Go cannot contain dots. (That is, for the benefit of anyone else reading this, there’s no “container.list” package; it’s really just “list”.)

    This is so astonishing from a Java programmer’s point of view that I’d never imagined it might be different. Even saying it’s “just an identifier” didn’t quite make it clear for some reason, so strong was my assumption that of course package names must be hierarchical. (I had assumed that when people talked about the list package, it was just an abbreviation of the full name.)
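In current Go this can be checked mechanically: `go/token.IsIdentifier` (available since Go 1.13) reports whether a string is a valid Go identifier, and the language spec additionally forbids the blank identifier as a package name. The `validPackageName` helper below is a hypothetical convenience wrapper, not a standard API.

```go
package main

import (
	"container/list"
	"fmt"
	"go/token"
)

// validPackageName reports whether name may appear in a package clause:
// it must be an identifier (token.IsIdentifier also rejects keywords),
// and the spec forbids the blank identifier "_".
func validPackageName(name string) bool {
	return token.IsIdentifier(name) && name != "_"
}

func main() {
	// Imported by the path "container/list", but referred to as "list".
	l := list.New()
	l.PushBack("a")
	l.PushBack("b")
	fmt.Println(l.Len()) // 2

	for _, name := range []string{"list", "container.list", "container/list"} {
		fmt.Println(name, validPackageName(name))
	}
}
```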

    It seems like a cleaner solution would be to just do package names the same way as Java. That way filenames appear only in makefiles, not in source code, and the meaning of a program only depends on the contents of source files included in the program, not where they’re located in the filesystem. (Of course, a compiler may require source code to be laid out in a certain way.) Also, giving each package a single, unique identifier that appears in every source file and import statement makes searching for all versions and usages of it much easier.

  4. Ian Lance Taylor said,

    January 28, 2010 @ 10:49 pm

    I am not at all a Java expert, but as far as I can tell when you import a Java package it adds all the names to the global level. That was explicitly not what Go wanted to do, because it effectively forces you to include the name of your module in the name of your functions and types. If you don’t do that, then you get name conflicts. In Go we want to avoid those extra names, so if you want to call your function New, you can go right ahead and do that without fear of conflict. Go uses the package name to disambiguate. So Go accepts PACKAGENAME.IDENTIFIER, and package names appear all over the place.
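For instance, several current standard packages each export a function called `New`, and only the package name keeps the calls apart. A short sketch (the `build` helper is just for illustration):

```go
package main

import (
	"container/list"
	"container/ring"
	"errors"
	"fmt"
)

// build calls three different functions that are all named New; the
// package name in front of each call is what disambiguates them.
func build() (*list.List, *ring.Ring, error) {
	return list.New(), ring.New(3), errors.New("no such package")
}

func main() {
	l, r, err := build()
	fmt.Println(l.Len(), r.Len(), err) // 0 3 no such package
}
```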

    Also, if package names are not simple identifiers, then parsing becomes harder. When we see A.B.C we’re not sure what we’re looking at, because A might not be anything at all. There might only be A.B. So complex package names complicate and slow down the parser, which is not good. Worse, what should happen if you have a package A.B and a struct A? Does the existence of package A.B prevent you from having struct A?
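The parsing point can be observed with `go/parser.ParseExpr`: it turns `A.B.C` into nested selector expressions, `(A.B).C`, without deciding what `A` is. With single-identifier package names, resolving such a chain only requires looking up the leftmost name. The `selectorChain` helper is a hypothetical name for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// selectorChain parses an expression such as "A.B.C" and returns its
// parts from left to right. The parser nests selectors as (A.B).C and
// does not know whether A is a package, a variable, or a type; that is
// settled later by looking up the single identifier A.
func selectorChain(expr string) ([]string, error) {
	e, err := parser.ParseExpr(expr)
	if err != nil {
		return nil, err
	}
	var parts []string
	for {
		switch x := e.(type) {
		case *ast.SelectorExpr:
			parts = append([]string{x.Sel.Name}, parts...)
			e = x.X // descend into the left operand
		case *ast.Ident:
			return append([]string{x.Name}, parts...), nil
		default:
			return nil, fmt.Errorf("unexpected %T", x)
		}
	}
}

func main() {
	parts, err := selectorChain("A.B.C")
	if err != nil {
		panic(err)
	}
	fmt.Println(parts) // [A B C]
}
```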

    The meaning of a Go program does depend only on the contents of the source files, and does not depend on where they’re located in the file system. When using the gc compiler, it is important (at least for now) that you import the same package under the same name. But this is not a matter of where the package is in the file system. It’s a matter of the name used in the import statement. The gc compiler says that you must use a consistent name in the import statement for every reference to a single package. As far as I know Java has the same requirement, although it may not look that way because it falls out naturally from how Java looks up files.

  5. Brian Slesinsky said,

    January 29, 2010 @ 9:44 pm

    In Java, some classes are used in a way that’s similar to packages in Go. For example, the java.util.Arrays class is not really used as a class at all; it just has a bunch of static methods (basically functions) that are useful with arrays, so you can say:

    import java.util.Arrays;

    Arrays.sort(anArray);

    So Go-style naming mostly works fine in Java though it’s not used as much as it should be. The main issue is that you occasionally get naming conflicts for class names (not packages or method names) because there’s no way to locally rename a class. In that case, Java forces you to write out the full package name for one of the classes you’d like to use, and people don’t like that so they use longer names for classes. But this shouldn’t be an issue for Go since it does have imports that do a rename.
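For comparison, here is a rough Go counterpart of the Java snippet, plus a local rename. In the current standard library both `text/template` and `html/template` declare `package template`, so a program that uses both renames one of them in its import. The helper names `sortedCopy` and `render` are made up for this sketch.

```go
package main

import (
	"fmt"
	htmltemplate "html/template" // renamed: this package is also named "template"
	"sort"
	"strings"
	"text/template"
)

// sortedCopy returns a sorted copy of a; sort.Ints plays the role of
// Java's Arrays.sort here.
func sortedCopy(a []int) []int {
	b := append([]int(nil), a...)
	sort.Ints(b)
	return b
}

// render executes the same template text with both template packages.
func render(text string, data any) (plain, escaped string, err error) {
	var pb, hb strings.Builder
	t, err := template.New("t").Parse(text)
	if err != nil {
		return "", "", err
	}
	if err = t.Execute(&pb, data); err != nil {
		return "", "", err
	}
	h, err := htmltemplate.New("h").Parse(text)
	if err != nil {
		return "", "", err
	}
	if err = h.Execute(&hb, data); err != nil {
		return "", "", err
	}
	return pb.String(), hb.String(), nil
}

func main() {
	fmt.Println(sortedCopy([]int{3, 1, 2})) // [1 2 3]
	plain, escaped, err := render("{{.}}", "<b>")
	if err != nil {
		panic(err)
	}
	fmt.Println(plain)   // <b>
	fmt.Println(escaped) // &lt;b&gt;
}
```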

    So I’m thinking that Go should have something like this: packages would have both a global name (“container.list”) and a local short name (“list”) where by default, the short name is the last part of the global name, but this can be overridden in an import statement. Global names appear only in package and import statements, so the rest of the language is unaffected. An import statement always takes a global name and binds a short name. So:

    package container.list

    func New() *List { return new(List).Init() }

    package main
    import container.list

    func main() {
        aList := list.New();

    }

    (Perhaps there could be quotes around the global package names, but I don’t think it’s necessary. For consistency, the name in the import statement should be the same as the name in the package statement, so I think if there are quotes they should be in both places.)

    The symbols that the compiler outputs would always be based on the package’s global name, so in this case the linker would see a definition and usage of “container.list.New” in suitably mangled form. (I don’t know that much about linker symbols.)

    It seems to me that this would be more transparent than generating global symbol names from filenames used in import statements, because symbols are not really files and corner cases like relative filenames are a trap for the unwary when they seem to refer to the same package. Does a Go compiler even need to look at the source code in the directory named by an import statement? If not, there’s no reason for it to look like a filesystem path.
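The proposal in this comment (a global name plus a default short name) can be modeled in a few lines. This is a sketch of the suggested rules only, not of anything Go implements; `shortName` and `symbol` are hypothetical helpers, and the plain `global.ident` form stands in for whatever mangling a real linker would need.

```go
package main

import (
	"fmt"
	"strings"
)

// shortName sketches the proposed rule: the local name defaults to the
// last dot-separated element of the global name, unless the import
// statement supplies an override.
func shortName(global, override string) string {
	if override != "" {
		return override
	}
	// LastIndex returns -1 when there is no dot, so the whole name is kept.
	return global[strings.LastIndex(global, ".")+1:]
}

// symbol sketches the proposed linker symbol: always derived from the
// global name, never from the local name or a file path.
func symbol(global, ident string) string {
	return global + "." + ident
}

func main() {
	fmt.Println(shortName("container.list", ""))      // list
	fmt.Println(shortName("container.list", "dlist")) // dlist
	fmt.Println(symbol("container.list", "New"))      // container.list.New
}
```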

  6. Brian Slesinsky said,

    January 29, 2010 @ 9:55 pm

    Hmm, on the other hand, if Go did consistently use quotes then perhaps packages could have more descriptive names? For example:

    package "Standard Go Lists" list;

    func New() *List { return new(List).Init() }

    package main
    import "Standard Go Lists" list;

    func main() {
        aList := list.New();

    }

  7. Ian Lance Taylor said,

    February 2, 2010 @ 6:25 am

    The Go compiler does not need to look at the source code, but it does need to look at the compiled package.

    As far as I can tell, in Java you also have to always import packages using the same name. It’s just that the name is forced so no confusion is possible. You seem to be suggesting that Go should adopt the Java rule, but to me the advantage—avoiding hypothetical confusion—seems minimal. Why would a single program use different import names for the same package? If we can fix the problem with no additional complexity, we should, but in practice there is always a tradeoff.

    Using descriptive names would require some mapping from the descriptive names to the actual compiled package file, which means updating the mapping when a package is compiled, which does not seem like a good idea. It’s true that the language doesn’t specify exactly how the compiler finds the compiled package given the name of the package, and that is intentional. There eventually ought to be a way to relatively easily import a package from some third party repository out on the net.

  8. Brian Slesinsky said,

    February 2, 2010 @ 10:42 am

    Actually, the mapping doesn’t need to be stored anywhere. The compiled package could be found via a package path and some kind of name mangling. For example, in Java, something like “com.example.foo.Bar” is converted to “com/example/foo/Bar.class”, which is then searched for in the classpath. We could invent a similar scheme for finding packages, whether using hierarchical or descriptive names.

    I’m not sure where you’re going with the idea of third-party repositories. It seems like the list of code repositories you use (whether local or remote) is a developer-specific flag that doesn’t belong in the source code and shouldn’t necessarily be checked in. Developers should be able to change it by setting a flag in the build system, not by editing the imports in all their source files. (Java handles this by changing the classpath.)

    On the other hand, the global unique id of a package is something that should be independent of any build system or linker, so it makes sense for it to be in the source code and checked in, so that it’s available to editors and source code indexers. Having a -fgo-prefix compiler option puts the unique id of a package in the build system which makes it harder to get at it from tools that shouldn’t need to know about specific build systems.

    An analogy to relational databases: the primary key and foreign keys in a database are part of the data, which is implementation-independent. The way that the implementation uses a key to look up a record is implementation-specific. Package names are essentially like keys, and it should be possible to figure out which foreign keys (imports) refer to a package (record) without any implementation-specific information.
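The "foreign key" idea can be illustrated with current Go, assuming the import path string serves as the key: a tool can build a reverse index of importers from source text alone, with no build system involved. The `importers` helper and the file contents are hypothetical. Note that a local alias (`l` below) does not change the key.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
)

// importers builds a reverse index from import path to the files that
// import it, using only source text: no search path, build flags, or
// compiled packages are consulted.
func importers(files map[string]string) (map[string][]string, error) {
	fset := token.NewFileSet()
	index := make(map[string][]string)
	for name, src := range files {
		f, err := parser.ParseFile(fset, name, src, parser.ImportsOnly)
		if err != nil {
			return nil, err
		}
		for _, spec := range f.Imports {
			path, err := strconv.Unquote(spec.Path.Value)
			if err != nil {
				return nil, err
			}
			index[path] = append(index[path], name)
		}
	}
	for path := range index {
		sort.Strings(index[path]) // map iteration order is unspecified
	}
	return index, nil
}

func main() {
	files := map[string]string{
		"a.go": "package a\nimport \"container/list\"\n",
		"b.go": "package b\nimport l \"container/list\"\nimport \"fmt\"\n",
	}
	index, err := importers(files)
	if err != nil {
		panic(err)
	}
	fmt.Println(index["container/list"]) // [a.go b.go]
	fmt.Println(index["fmt"])            // [b.go]
}
```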

  9. Ian Lance Taylor said,

    February 2, 2010 @ 5:40 pm

    Requiring every package to have a globally unique ID is not a feature.

    You do have some good arguments; if you are interested in changing the language, I recommend that you bring this to the golang-nuts mailing list. Nothing is going to happen based on a discussion here.
