Versioning

One of the very nice features of Go is the package system. This permits compilation to be much faster than C++. In C++, if library A depends on library B, which depends on library C, that generally means that header files in A include header files in B, and header files in B include header files in C. That means that when compiling A, the compiler has to parse C’s header files. In real programs this very quickly adds up to hundreds of thousands of lines of code being parsed for each input file. To make matters worse, parsing happens to be the very slowest part of a non-optimizing C++ compilation, taking up nearly 50% of overall compilation time.

Go’s package system means that this does not happen. If package A imports package B but does not import package C, then when compiling A the compiler only needs to read B’s exported data, not C’s. It is possible for B’s exported data to include some of C’s types, in which case those types will be present in B’s exported data. In practice, though, this only happens for a limited subset of C’s types, and of course it does not include C’s functions, variables or constants at all. The effect is that the Go compiler sees far fewer lines of code during a compilation. Plus, of course, parsing Go is faster than parsing C++. This is the main reason why the Go compiler is so much faster than a C++ compiler.

Many languages have similar package systems, of course. It works very well for Go but it’s hardly a unique feature.

But what I want to talk about now is dynamic linking. An issue that always arises in dynamic linking is versioning. Your program expects a certain ABI. The shared object you link against provides a certain ABI. If those ABIs don’t match, you are in trouble. C and C++ have no internal support for providing a consistent ABI. The developer of the shared library must arrange for that to happen through some other mechanism.

On GNU/Linux, this mechanism is symbol versioning. This works well for C code. When used carefully, it permits the shared library developer to provide backward compatibility for the library: a program compiled against version N of the shared library, and linked dynamically against version N+1, will run correctly. Symbol versioning is all done outside of the C language itself, using asm statements in the code and linker version scripts when creating the shared object. The programmer has to understand what changes affect the ABI and what do not. In general it requires great care to be used correctly to provide backward compatibility.

For C++, backward compatibility is much harder to implement. C++ programs are normally written using classes with methods. Some methods are inlined and some are not. A program compiled using a specific version of some header files will have some inlined methods and some not. Those inlined methods will imply a specific layout for the class. The effect is that almost any change to a class can change the ABI of a shared library. Providing backward compatibility means that you have to provide old versions of all the class methods which may not have been inlined. Of course that class may invoke other classes. In effect, for backward compatibility in the face of changes to class layout, the library has to have a copy of the old class as well as the new one, and the methods of the old class have to be given specific versions.
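The layout problem can be seen in miniature in any language with fixed struct layouts. The following is only a rough sketch, written in Go for brevity rather than in C++, and the type names StringV1 and StringV2 are hypothetical. It shows how adding one field moves an existing field, which is exactly the offset that an already-compiled inlined method would have baked in.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringV1 is a hypothetical layout shipped in version 1 of a library.
type StringV1 struct {
	Len int64
	Cap int64
}

// StringV2 is the same type in version 2, with a reference count
// added in front of the existing fields.
type StringV2 struct {
	Refs int64
	Len  int64
	Cap  int64
}

// LenOffsets reports the byte offset of the Len field in each version.
func LenOffsets() (v1, v2 uintptr) {
	var a StringV1
	var b StringV2
	return unsafe.Offsetof(a.Len), unsafe.Offsetof(b.Len)
}

func main() {
	v1, v2 := LenOffsets()
	// Code inlined against version 1 reads Len at the v1 offset. Run
	// against a version 2 object, that offset now holds Refs instead.
	fmt.Printf("offset of Len: v1=%d v2=%d\n", v1, v2)
}
```

A caller that inlined a read of Len while compiling against the first layout would silently read the reference count from an object with the second layout, which is why the old class has to be kept around unchanged.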

Unfortunately the current tools provide no good way to support this. Symbol versions are assigned based on the name of the function. In C code old copies of the function can be renamed in the source code, and then named back to the desired names, with specific symbol versions, via asm statements. Those asm statements refer to the mangled names of the functions, which of course in C are the same as the C names. In C++ the names are different. So the programmer who wants to provide backward compatibility has to rename the class and then write asm statements for all the mangled method names.

This is so hard to get right that, as far as I know, nobody even attempts it. This has real consequences. The type std::string in gcc’s standard C++ library is a reference-counted copy-on-write implementation of string. That was a fine implementation in the single-threaded world, but it is a poor choice in today’s multi-core multi-threaded world, because it means that the reference counts must be manipulated with relatively slow atomic instructions. The gcc library includes a more efficient non-copy-on-write implementation of string under the name __gnu_cxx::__vstring. It would be nice to be able to flip the default std::string to be __gnu_cxx::__vstring. But that would require using symbol versioning to provide std::string for all existing C++ programs.

One interesting possibility might be to add a class attribute which changes the name mangling used for that class. If that were done carefully, it might be possible to use that in header files to control the names used for all class methods. Then it would be possible to use a version script to give appropriate versions to all the methods of the two similar classes. This approach just occurred to me and I don’t know if it would handle all cases.

In Go the issues are different. In Go the external ABI of a package is provided in the export information. Any change to the ABI shows up as a change to the export information (though the reverse is not true: a change to the export information, such as adding a function, need not imply a change to the ABI). This has an advantage over C++: it is much easier to tell when the ABI may have changed. In particular, the export information generated by gccgo has a checksum. Although this is not currently implemented, the checksum could be stored by the compiler for all imported packages. Then the dynamic linker, or the Go startup code, could compare the checksums for all dynamically linked packages with the checksums stored at compile time. This would implement a simple and reliable test that the program had no ABI issues, something that in C/C++ is only possible if the shared library is developed by a skilled and careful programmer.
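The checksum comparison might look something like the sketch below. This is only an illustration under stated assumptions: the export information is modelled as a list of declaration strings, SHA-256 stands in for whatever checksum gccgo actually uses, and the names ExportChecksum, CheckImports and the package example/file are all made up for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// ExportChecksum hashes a package's export information. The export
// information is modelled as declaration strings, which is an
// assumption; the real gccgo export format is different.
func ExportChecksum(decls []string) string {
	sorted := append([]string(nil), decls...)
	sort.Strings(sorted) // make the result independent of declaration order
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))
	return hex.EncodeToString(sum[:])
}

// CheckImports compares the checksums recorded at compile time with the
// checksums of the packages actually loaded. It returns the sorted names
// of packages that are missing or whose checksum differs.
func CheckImports(recorded, loaded map[string]string) []string {
	var bad []string
	for pkg, want := range recorded {
		if got, ok := loaded[pkg]; !ok || got != want {
			bad = append(bad, pkg)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// Hypothetical export information for two builds of one package.
	atCompile := []string{"func Open(name string) *File", "type File struct { fd int }"}
	atRun := []string{"func Open(name string) *File", "type File struct { fd int; flags int }"}

	recorded := map[string]string{"example/file": ExportChecksum(atCompile)}
	loaded := map[string]string{"example/file": ExportChecksum(atRun)}

	fmt.Println("mismatched packages:", CheckImports(recorded, loaded))
}
```

Note that a test like this is conservative: adding a function changes the checksum even though it does not change the ABI for existing programs.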

The equivalent of the C/C++ versioning scheme would be separate sets of export information for different versions of the shared library. There would have to be some way to indicate which functions/types/etc. were available in which versions, and the names they should have in those versions. I have not thought this through yet. But I’m fairly sure that Go’s package system can be used to make this aspect of dynamic linking more reliable and easier to use than the C system, and much simpler than the C++ system.
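To make the idea a little more concrete, here is one purely hypothetical shape such versioned export information could take. Nothing like this exists in gccgo as far as this post describes; the Export type, the Resolve function and the symbol names are invented for the sketch.

```go
package main

import "fmt"

// Export describes one variant of an exported name in a hypothetical
// versioned export table.
type Export struct {
	Name   string // name used in Go source
	Symbol string // link-time name for this variant (made up)
	Since  int    // first library version that provides this variant
	Until  int    // last version that provides it; 0 means no end yet
}

// Resolve returns the link-time symbol that a program compiled against
// library version v should bind to for the given name. The boolean is
// false if the name was not available in that version.
func Resolve(table []Export, name string, v int) (string, bool) {
	for _, e := range table {
		if e.Name == name && e.Since <= v && (e.Until == 0 || v <= e.Until) {
			return e.Symbol, true
		}
	}
	return "", false
}

func main() {
	table := []Export{
		{Name: "Open", Symbol: "file.Open.v1", Since: 1, Until: 1},
		{Name: "Open", Symbol: "file.Open.v2", Since: 2},
		{Name: "Sync", Symbol: "file.Sync.v2", Since: 2},
	}
	for _, v := range []int{1, 2} {
		sym, ok := Resolve(table, "Open", v)
		fmt.Printf("version %d: Open -> %q (available: %v)\n", v, sym, ok)
	}
}
```

The library would keep both variants of Open, and a program built against version 1 would continue to bind to the old one.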

3 Comments »

  1. Giovanni Bajo said,

    November 12, 2010 @ 10:41 am

    Qt (and KDE) achieves a stable ABI with full backward compatibility. If you compile a program against Qt 4.x, you can run it with any newer version in the Qt 4 series (which spans 5-10 years of development of a team of around a hundred people, so it’s not like it works because no development is being done).

    They actually grew an internal deep knowledge of all tricks that need to be done to preserve a stable ABI on multiple platforms with multiple ABIs (eg: at one point they verified that they can add an implementation to a virtual method previously defined as pure, on all platforms they support).

    This page lists some of the things that you can and cannot do in C++ while trying to preserve a stable ABI (Itanium ABI in this case):
    http://techbase.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B

    It’s interesting reading actually, because not everything is obvious and consequential, even if one has a fair understanding of the Itanium ABI.

  2. Giovanni Bajo said,

    November 12, 2010 @ 10:44 am

    For ABI checking, the waf build system has a handy builtin ABI checking feature:
    http://wiki.samba.org/index.php/Waf#ABI_Checking

  3. pfee said,

    November 23, 2010 @ 4:39 pm

    The ability to decorate source code with visibility attributes means that developers need not deal with the complexity of linker map files.

    http://developers.sun.com/solaris/articles/symbol_scope.html
    http://gcc.gnu.org/wiki/Visibility

    This has brought visibility to the masses and made it practical for use with C++.

    For quite some time I’ve been thinking that a way to decorate source code with symbol versioning information would also be very useful, particularly for C++ where mangling and RTTI would make mapfile maintenance quite difficult.

    It’s very encouraging to see you’ve had the same idea independently. Not to mention that you’d be much more capable than me in this area.

    I expect the best I could do would be to encourage you to think some more about C++ symbol versioning via source code attributes. I doubt if I’d be able to contribute to developing a solution myself, though I wouldn’t mind learning more about the topic.

    Thanks,
    Paul
