<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Airs - Ian Lance Taylor &#187; Search Results  &#187;  linker</title>
	<atom:link href="http://www.airs.com/blog/?s=linker&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.airs.com/blog</link>
	<description>Ian Lance Taylor</description>
	<lastBuildDate>Tue, 31 Aug 2010 14:28:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Destructors</title>
		<link>http://www.airs.com/blog/archives/362</link>
		<comments>http://www.airs.com/blog/archives/362#comments</comments>
		<pubDate>Tue, 18 May 2010 13:30:17 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/?p=362</guid>
		<description><![CDATA[The Go language does not have destructors. Instead, it has two more dynamic mechanisms. A defer statement may be used to run a function on function exit or when processing a panic. A finalizer may be used to run a function when the garbage collector finds that a block of memory has nothing pointing to [...]]]></description>
			<content:encoded><![CDATA[<p>The Go language does not have destructors.  Instead, it has two more dynamic mechanisms.  A <code>defer</code> statement may be used to run a function on function exit or when processing a <code>panic</code>.  A finalizer may be used to run a function when the garbage collector finds that a block of memory has nothing pointing to it and can be released.  Both approaches are dynamic, in that you have to execute the <code>defer</code> statement or call the <code>runtime.SetFinalizer</code> function.  They have no lexical scoping; a single <code>defer</code> statement in a loop can cause its argument to be called many times on function exit.</p>
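<p>As a small illustration of that last point, here is a minimal sketch in which one <code>defer</code> statement inside a loop queues several calls.  The function name <code>deferInLoop</code> is made up for this example.  The deferred calls run only when the function returns, in last-in, first-out order.</p>

```go
package main

import "fmt"

// deferInLoop executes a single defer statement n times and records the
// order in which the deferred calls actually run.
func deferInLoop(n int) (order []int) {
	for i := 0; i < n; i++ {
		i := i // capture the value for this iteration
		// The call is queued here, but it runs only when deferInLoop
		// returns, after the loop has finished.
		defer func() { order = append(order, i) }()
	}
	return order // the deferred calls run after this, last queued first
}

func main() {
	fmt.Println(deferInLoop(3)) // prints [2 1 0]
}
```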
<p>These ideas are significantly different from destructors, which are associated with a type, and are executed when an object of that type goes out of lexical scope or is explicitly deleted.  Destructors are primarily used to release resources acquired by an object of the type.  This is a less important concept in a garbage collected language like Go.</p>
<p>The absence of destructors means that Go does not support the RAII pattern, in which an object is used to acquire a mutex or some other resource for the scope of a lexical block.  Implementing this in Go requires two statements: one to acquire the mutex, and a <code>defer</code> statement to release the mutex on function exit.  Because deferred functions are run on function exit, the mapping is not exact; you can not use this technique to acquire a lock in a loop.  In fact, acquiring a mutex in a loop and correctly releasing it when a panic occurs is rather difficult in Go; fortunately it is easy to handle correctly by moving the body of the loop to a separate function.  In any case, Go discourages this type of programming.  Mutexes are available in Go, but channels are the preferred mechanism for synchronization.</p>
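<p>The two-statement idiom, and the trick of moving the loop body into its own function, might look like the following sketch.  The <code>counter</code> type and the <code>add</code> and <code>sum</code> functions are hypothetical names chosen for illustration; only <code>sync.Mutex</code> comes from the standard library.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical shared structure guarded by a mutex.
type counter struct {
	mu    sync.Mutex
	total int
}

// add is the loop body moved into its own function.  The lock and the
// deferred unlock are the two statements; the unlock runs when add
// returns, including when it panics.
func (c *counter) add(v int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v < 0 {
		panic("negative value")
	}
	c.total += v
}

// sum acquires and releases the mutex once per iteration by calling add.
func sum(c *counter, values []int) int {
	for _, v := range values {
		c.add(v)
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.total
}

func main() {
	c := &counter{}
	fmt.Println(sum(c, []int{1, 2, 3})) // prints 6
}
```

<p>Writing <code>defer c.mu.Unlock()</code> directly inside the loop in <code>sum</code> would instead hold the mutex until <code>sum</code> returned, so the second iteration would block forever.</p>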
<p>Are <code>defer</code> statements and finalizers sufficient replacement for destructors in a garbage collected language?  They are for me.  When I write C++ my destructors are almost entirely concerned with releasing memory.  In fact, in the gold linker I often deliberately omitted destructors, because many of the data structures live for the life the program; in such a case, destructors serve only to slow down program exit.  I would be interested to hear of a pattern of programming which relies on destructors for cases other than releasing memory or RAII.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/362/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Thread Sanitizer</title>
		<link>http://www.airs.com/blog/archives/321</link>
		<comments>http://www.airs.com/blog/archives/321#comments</comments>
		<pubDate>Sat, 13 Feb 2010 02:36:51 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/?p=321</guid>
		<description><![CDATA[I recently ran the gold linker under Thread Sanitizer. It&#8217;s a nice plugin for Valgrind which looks for race conditions in multi-threaded programs. To describe it briefly, it builds Happens-Before relationships based on mutex operations and warns when it notices a write and a read/write to the same memory location without a Happens-Before relationship. This [...]]]></description>
			<content:encoded><![CDATA[<p>I recently ran the gold linker under <a href="http://code.google.com/p/data-race-test/wiki/ThreadSanitizer">Thread Sanitizer</a>.  It&#8217;s a nice plugin for Valgrind which looks for race conditions in multi-threaded programs.  To describe it briefly, it builds Happens-Before relationships based on mutex operations and warns when it notices a write and a read/write to the same memory location without a Happens-Before relationship.  This approach can yield false positives to be sure, but it does a very nice job of identifying real problems.</p>
<p>It was able to identify one real bug in gold, one problem that led to less efficient link time, and several cases where several threads would set a shared memory location to the same value.  The latter cases are not a problem on x86 architectures, though they could be on other processors.  In order to get clean Thread Sanitizer results in the future, I fixed all of the cases so that I could get a clean run of gold, at least with the default settings.</p>
<p>The real bug that it found was a typical multi-threaded bug: the code looked fine, but it had a well-hidden error.  Gold uses a workqueue of tasks to execute, with a pool of worker threads.  Many of the tasks are run using a blocker token.  The blocker token is set to the number of tasks that precede it.  As each task completes, it decrements the blocker count.  When the count goes to zero, the next set of tasks can start.  This is a simple way to parallelize linker operations, in which one set of operations (e.g., process the symbol tables) must be run before the next set (e.g., process the relocations) can begin.  Naturally I paid close attention to the blocker behaviour when a task completed, and there were no problems there.  The problem arose in setting the blocker count when the tasks started.  The code was doing a loop of &#8220;increment the blocker count&#8221; and then &#8220;queue the task.&#8221;  What I forgot was that the process of queuing the task actually lets another thread in the pool start working on it immediately.  When the task completed, it decremented the blocker count with a lock.  But if the task completed fast enough, the initial code was still running the loop queuing up new tasks, and thus incrementing the blocker count.  I didn&#8217;t think that I needed to lock the increment, since I wasn&#8217;t expecting any task to actually complete before I started all of the tasks.  A dumb mistake&mdash;just the kind of mistake one makes in multi-threaded programming.</p>
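<p>One way to avoid this kind of race is to take the lock on every update and to set the whole count before any task is queued.  The sketch below shows that idea in Go rather than C++.  It is not gold&#8217;s code, and it is not necessarily the fix that was applied there; the <code>blocker</code> type and <code>runStage</code> function are hypothetical.</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// blocker is a hypothetical stand-in for a blocker token: a counter
// that fires a callback when it drops to zero.
type blocker struct {
	mu      sync.Mutex
	count   int
	release func()
}

// add raises the count under the lock.
func (b *blocker) add(n int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.count += n
}

// done lowers the count under the lock and fires release at zero.
func (b *blocker) done() {
	b.mu.Lock()
	b.count--
	zero := b.count == 0
	b.mu.Unlock()
	if zero {
		b.release()
	}
}

// runStage starts every task and reports how many had finished at the
// moment the blocker was released.
func runStage(tasks []func()) int64 {
	if len(tasks) == 0 {
		return 0
	}
	var finished int64
	released := make(chan int64, 1)
	b := &blocker{}
	b.release = func() { released <- atomic.LoadInt64(&finished) }
	// Set the whole count before queuing anything, so a task that
	// finishes early cannot drive the count to zero prematurely.
	b.add(len(tasks))
	for _, task := range tasks {
		task := task
		go func() {
			task()
			atomic.AddInt64(&finished, 1)
			b.done()
		}()
	}
	return <-released
}

func main() {
	noop := func() {}
	fmt.Println(runStage([]func(){noop, noop, noop})) // prints 3
}
```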
<p>Gold is written in C++.  In Go I would of course have each task communicate its completion on a channel.  The locking would be handled by the runtime, and there would be no chance for me to make the same sort of error.  If you write multi-threaded code, and you can&#8217;t use Go, you should definitely check out Thread Sanitizer.</p>
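<p>A minimal sketch of the channel approach follows, assuming tasks that each return a small result.  The <code>runTasks</code> name is invented for this example.  There is no shared counter to lock: the coordinator simply receives one value per task.</p>

```go
package main

import "fmt"

// runTasks starts each task on its own goroutine and waits until every
// one has reported completion on the channel.
func runTasks(tasks []func() int) int {
	done := make(chan int)
	for _, task := range tasks {
		task := task
		go func() { done <- task() }()
	}
	total := 0
	// Receive exactly one completion message per task.  The next stage
	// can begin only after this loop ends.
	for range tasks {
		total += <-done
	}
	return total
}

func main() {
	tasks := []func() int{
		func() int { return 1 },
		func() int { return 2 },
		func() int { return 3 },
	}
	fmt.Println(runTasks(tasks)) // prints 6
}
```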
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/321/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Go Linkage Names</title>
		<link>http://www.airs.com/blog/archives/309</link>
		<comments>http://www.airs.com/blog/archives/309#comments</comments>
		<pubDate>Wed, 27 Jan 2010 14:29:40 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/?p=309</guid>
		<description><![CDATA[All Go code lives in a package. Every Go source file starts with a package declaration which names the package that it lives in. A package name is a simple identifier; besides appearing in a package clause, package names are also used when referring to names imported from another package. That poses the problem of [...]]]></description>
			<content:encoded><![CDATA[<p>All Go code lives in a package.  Every Go source file starts with a <code>package</code> declaration which names the package that it lives in.  A package name is a simple identifier; besides appearing in a <code>package</code> clause, package names are also used when referring to names imported from another package.  That poses the problem of what to do when one program links together two different packages which use the same package name.  We can&#8217;t expect the author of a large program to  be aware of every package that the program uses.  However, since Go compiles straight to object files, it&#8217;s natural to use the package name in the generated symbol names.  How can we avoid multiple definition errors?</p>
<p>The gc compiler comes with its own Go-specific linker.  That linker now supports automatic symbol renaming at link time based on the name used to import the package.  That name is presumed to be unique.  This means that all imports of the same package must use the same name to import it; otherwise you might get multiple definitions of a global variable in the package.  In the future there may be some need to adjust packages which are distributed without their source code, to ensure that they don&#8217;t accidentally alias locally compiled package names.</p>
<p>For the gccgo compiler I have so far avoided using a specific linker, or rather linker wrapper.  For large programs gccgo now requires a new option, <code>-fgo-prefix=PREFIX</code>, to be used when compiling a package.  The <code>PREFIX</code> should be a string unique to that package; for example, in a typical installation, it could be the directory where the package is installed.  This gives a unique name used in the compiled code.  If the <code>-fgo-prefix</code> option is not used, everything will still work as long as there are not, in fact, two packages with the same name.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/309/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Protected Symbols</title>
		<link>http://www.airs.com/blog/archives/307</link>
		<comments>http://www.airs.com/blog/archives/307#comments</comments>
		<pubDate>Fri, 22 Jan 2010 05:41:52 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/?p=307</guid>
		<description><![CDATA[Now for something really controversial: what&#8217;s wrong with protected symbols? In an ELF shared library, an ordinary global symbol may be overridden if a symbol of the same name is defined in the executable or in a shared library which appears earlier in the runtime search path. This is called symbol interposition. It is often [...]]]></description>
			<content:encoded><![CDATA[<p>Now for something really controversial: what&#8217;s wrong with protected symbols?</p>
<p>In an ELF shared library, an ordinary global symbol may be overridden if a symbol of the same name is defined in the executable or in a shared library which appears earlier in the runtime search path.  This is called symbol interposition.  It is often used with functions such as <code>malloc</code>.  A shared library can define <code>malloc</code> and it can have code which calls <code>malloc</code>.  If the executable linked with the shared library defines <code>malloc</code> itself, then the version in the executable will be used rather than the version in the shared library.  This permits the executable to control the memory allocation done by the shared library, perhaps for debugging or logging purposes.  In this regard, shared libraries act much as static archives do.</p>
<p>This has a few consequences.  One of them is that within a shared library, all references to a global symbol must use the GOT and PLT, to make the overriding possible.  That means that all function calls and variable accesses are slightly slower.  Also, some compiler optimizations are forbidden: the compiler can not inline a call to a global symbol, since that symbol might be overridden at run time.</p>
<p>When building a shared library, you can provide a version script which indicates that some symbols are actually not global.  That can eliminate the GOT and PLT accesses, but it does not permit the compiler optimizations, and you do have to write that version script and keep it up to date.</p>
<p>When compiling code that goes into a shared library, you can set the visibility of symbols.  You can use hidden visibility, which means that the symbol is not visible outside the shared library.  You can use internal visibility, which is a lot like hidden&mdash;I&#8217;ll skip the difference here.  Or you can use protected visibility.  Protected visibility means that the symbol is visible outside of the shared library, and can be accessed as usual.  However, all references from within the shared library will use the definition in the shared library.  In other words, the symbol acts more or less as usual, but it can not be overridden.  This means that accesses to the symbol avoid the GOT and PLT, and it permits compiler optimizations.</p>
<p>So, what&#8217;s wrong with them?  It turns out that protected symbols are slower at dynamic link time, which means that programs which use the shared library start up slower.  This happens because of the C rule that two pointers to the same function must compare as equal.  Since protected symbols are globally visible, you can get a pointer to a protected function in the main executable.  You can also get a pointer to that same function in the shared library, of course.  Those pointers have to be equal, or the C rule will break.</p>
<p>As noted, the access to the function in the shared library will not use the GOT or PLT.  The access in the main executable obviously will use the PLT.  How can we make those function pointers equal?  We can&#8217;t.  The executable will have a direct reference to the PLT.  The shared library will have a direct reference to the function itself.  In neither case will there be a relocation for the reference.  So there is no way to make the results equal.  (This can work for some targets, but not for ones with simple function references like the x86 targets.)</p>
<p>So, I must have lied.  The lie was that there is a case where you need to use the GOT for a protected symbol: when compiling position independent code for a shared library, and taking the address of a protected function, you need to use the GOT.  Unfortunately, gcc for the x86_64 target, surely the most widely used gcc target today, gets this wrong: <a href="http://gcc.gnu.org/PR19520">http://gcc.gnu.org/PR19520</a>.  This generally reveals itself as an error report when you go to create a shared library: <code>relocation R_X86_64_PC32 against protected symbol `NAME' can not be used when making a shared object</code>.</p>
<p>In any case, when the compiler gets it right, the dynamic linker has to fill in that GOT entry.  In order to make the function pointers compare as equal, it has to fill in the entry with the address of the PLT in the executable (or the earlier shared library).  But remember, this is a protected symbol, and protected symbols don&#8217;t support symbol interposition.  So the dynamic linker must only use the PLT of the executable if the reference in the executable refers to the definition in the shared library.  That means that when the dynamic linker sees a reloc against a protected symbol in a shared library, it has to do another walk through the executable and earlier shared libraries to see if any of them have a definition for the symbol, in which case the GOT entry must <i>not</i> be set to that earlier PLT entry but must instead be set to the address of the symbol in the shared library itself.  This check has to be done for every symbol in the shared library.</p>
<p>Those extra symbol resolution passes mean a slowdown for every program which uses the shared library, and that is what is wrong with protected symbols.</p>
<p>So how do you get the compiler and linker speedups available by avoiding symbol interpositioning?  Unfortunately, you have to give your symbols hidden visibility, which means that they can not be accessed from other modules.  Assuming you do want them to be accessed, you need to define symbol aliases for the ones which should be publicly visible.  That means that you need to use different names for the hidden symbols.  This is awkward at best.  Unfortunately I have nothing better to offer.  ELF is designed to support symbol interpositioning, and there is no very good way to avoid that without causing other consequences.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/307/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Version Scripts</title>
		<link>http://www.airs.com/blog/archives/300</link>
		<comments>http://www.airs.com/blog/archives/300#comments</comments>
		<pubDate>Wed, 13 Jan 2010 01:20:34 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/?p=300</guid>
		<description><![CDATA[I recently spent some time sorting through linker version script issues, so I&#8217;m going to document what I discovered. Linker symbol versioning was invented at Sun. The Solaris linker lets you use a version script when you create a shared library. This script assigns versions to specific named symbols, and defines a version hierarchy. When [...]]]></description>
			<content:encoded><![CDATA[<p>I recently spent some time sorting through linker version script issues, so I&#8217;m going to document what I discovered.</p>
<p>Linker symbol versioning was invented at Sun.  The Solaris linker lets you use a version script when you create a shared library.  This script assigns versions to specific named symbols, and  defines a version hierarchy.  When an executable is linked against the shared library, the versions that it uses are recorded in the executable.  If you later try to dynamically link the executable with a shared library which does not provide the required versions, you get a sensible error message.</p>
<p>Sun&#8217;s scheme (as I understand it) only permits you to add new versions and new symbols.  Once a symbol has been defined at a specific version, you can not change that in later releases.  If you change the behaviour of a symbol, you don&#8217;t change the version of the symbol itself; instead you add a new version to the library even if it does not define any symbols.  That is sufficient to ensure that an executable will not be dynamically linked against a version of the shared library which is too old.</p>
<p>Eric Youngdale and Ulrich Drepper introduced a more sophisticated symbol versioning scheme in the GNU linker and the GNU/Linux dynamic linker.  The GNU linker permits symbols to have multiple versions, of which only one is the default.  These versions are specified in the object files linked together to form the shared library.  The assembler <code>.symver</code> directive is used to assign a version to a symbol (the version is simply encoded in the name of the symbol).  This scheme permits using symbol versioning to actually change the behaviour of a symbol; older executables will continue to use the old version.  This also permits deleting symbols, by removing the default version.  The older versions of the symbol remain but are inaccessible.</p>
<p>That is all fine.  The problems come in with the extensions to the version script language.  First, the GNU linker permits wildcards in version scripts.  Second, the GNU linker permits symbols to match against demangled names, again typically using wildcards.  Third, the GNU linker permits the version script to hide symbols which have explicit versions in input object files.</p>
<p>Every symbol can only have one version.  When the linker asks for the version of a symbol, there can only be one answer.  The support for wildcards and matching of demangled names in GNU linker version scripts means that there may not be a unique answer for the version to use for a given name.  The fact that the GNU linker permits version scripts to hide symbols with explicit versions means that in some cases you absolutely must list a symbol two times in a version script (because you might have a <code>local: *;</code> entry which must not match your symbol with an old version).  This potential confusion means that using version scripts correctly with wildcards requires a clear understanding of exactly how the linker parses a version script.</p>
<p>Unfortunately, this was never documented.  Until now.  Here are the rules which the GNU linker uses to parse version scripts, as of 2010-01-11.</p>
<p>The GNU linker walks through the version tags in the order in which they appear in the version script.  For each tag, it first walks through the global patterns for that tag, then the local patterns.  When looking at a single pattern, it first applies any language specific demangling as specified for the pattern, and then matches the resulting symbol name to the pattern.  If it finds an exact match for a literal pattern (a pattern enclosed in quotes or with no wildcard characters), then that is the match that it uses.  If it finds a match with a wildcard pattern, then it saves it and continues searching.  Wildcard patterns that are exactly &#8220;*&#8221; are saved separately.</p>
<p>If no exact match with a literal pattern is ever found, then if a wildcard match with a global pattern was found it is used, otherwise if a wildcard match with a local pattern was found it is used.</p>
<p>This is the result:</p>
<ul>
<li>If there is an exact match, then we use the first tag in the version script where it matches.
<ul>
<li>If the exact match in that tag is global, it is used.</li>
<li>Otherwise the exact match in that tag is local, and is used.</li>
</ul>
</li>
<li>Otherwise, if there is any match with a global wildcard pattern:
<ul>
<li>If there is any match with a wildcard pattern which is not &#8220;*&#8221;, then we use the tag in which the <i>last</i> such pattern appears.
</li>
<li>Otherwise, we matched &#8220;*&#8221;.  If there is no match with a local wildcard pattern which is not &#8220;*&#8221;, then we use the <i>last</i> match with a global &#8220;*&#8221;.  Otherwise, continue.
</li>
</ul>
</li>
<li>Otherwise, if there is any match with a local wildcard pattern:
<ul>
<li>If there is any match with a wildcard pattern which is not &#8220;*&#8221;, then we use the tag in which the <i>last</i> such pattern appears.
</li>
<li>Otherwise, we matched &#8220;*&#8221;, and we use the tag in which the <i>last</i> such match occurred.
</li>
</ul>
</li>
</ul>
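<p>The list above can be sketched as a small lookup function.  This is only a model of the rules as stated, not the GNU linker&#8217;s code.  It makes several assumptions: the <code>tag</code> type and <code>gnuLookup</code> function are hypothetical, demangling and quoted patterns are ignored, and Go&#8217;s <code>path.Match</code> stands in for the linker&#8217;s own glob matching (so a &#8220;*&#8221; will not match a name containing a slash).</p>

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// tag is a hypothetical model of one version tag in a version script.
type tag struct {
	name          string
	global, local []string
}

// isWild reports whether a pattern contains glob characters.
func isWild(p string) bool { return strings.ContainsAny(p, "*?[") }

// gnuLookup returns the chosen tag name and whether the match was local.
func gnuLookup(script []tag, sym string) (name string, local, found bool) {
	// Last matching tag of each kind; index 0 is global, 1 is local.
	var last [2]struct{ wild, star *tag }
	for i := range script {
		t := &script[i]
		for k, pats := range [][]string{t.global, t.local} {
			for _, p := range pats {
				if !isWild(p) {
					if p == sym {
						return t.name, k == 1, true // first exact match wins
					}
					continue
				}
				if ok, err := path.Match(p, sym); err != nil || !ok {
					continue
				}
				if p == "*" {
					last[k].star = t // "*" is saved separately
				} else {
					last[k].wild = t
				}
			}
		}
	}
	g, l := last[0], last[1]
	switch {
	case g.wild != nil: // last global wildcard other than "*"
		return g.wild.name, false, true
	case g.star != nil && l.wild == nil: // last global "*"
		return g.star.name, false, true
	case l.wild != nil: // last local wildcard other than "*"
		return l.wild.name, true, true
	case l.star != nil: // last local "*"
		return l.star.name, true, true
	}
	return "", false, false
}

func main() {
	script := []tag{
		{"V1", []string{"foo", "bar*"}, []string{"*"}},
		{"V2", []string{"bar_new*", "baz"}, []string{"bar_private*"}},
	}
	for _, sym := range []string{"foo", "bar_new_init", "qux"} {
		name, local, _ := gnuLookup(script, sym)
		fmt.Println(sym, name, local)
	}
}
```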
<p>As mentioned above, there is an additional wrinkle.  When the GNU linker finds a symbol with a version defined in an object file due to a <code>.symver</code> directive, it looks up that symbol name in that version tag.  If it finds it, it matches the symbol name against the patterns for that version.  If there is no match with a global pattern, but there is a match with a local pattern, then the GNU linker marks the symbol as local.</p>
<p>I want gold to be compatible, but I also want gold to be efficient.  I&#8217;ve introduced a hash table in gold to do fast lookups for exact matches.  That makes it impossible for gold to follow the exact rules when matching demangled names.  Currently gold does not do the final lookup to see if a symbol with an explicit version should be forced local; I don&#8217;t understand why that is useful.  It is possible that I will be forced to add that to gold at some later date.</p>
<p>Here are the current rules for gold:</p>
<ul>
<li>If there is an exact match for the mangled name, we use it.
<ul>
<li>If there is more than one exact match, we give a warning, and we use the first tag in the script which matches.
</li>
<li>If a symbol has an exact match as both global and local for the same version tag, we give an error.
</li>
</ul>
</li>
<li>Otherwise, we look for an extern C++ or an extern Java exact match.  If we find an exact match, we use it.
<ul>
<li>If there is more than one exact match, we give a warning, and we use the first tag in the script which matches.
</li>
<li>If a symbol has an exact match as both global and local for the same version tag, we give an error.
</li>
</ul>
</li>
<li>Otherwise, we look through the wildcard patterns, ignoring &#8220;*&#8221; patterns.  We look through the version tags in reverse order.  For each version tag, we look through the global patterns and then the local patterns.  We use the first match we find (i.e., the <i>last</i> matching version tag in the file).
</li>
<li>Otherwise, we use the &#8220;*&#8221; pattern if there is one.  We give a warning if there are multiple &#8220;*&#8221; patterns.
</li>
</ul>
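<p>Here is a comparable sketch of the gold rules, again as a model of the list above rather than gold&#8217;s actual code.  The assumptions: the <code>tag</code> and <code>match</code> types and both functions are hypothetical, a Go map plays the role of the hash table, the extern C++ and extern Java step is left out because it needs a demangler, warnings and errors are not reported, and <code>path.Match</code> stands in for the real glob matching.  The list does not say which &#8220;*&#8221; wins when there are several, so this sketch simply keeps the one it sees last during the reverse walk.</p>

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// tag is a hypothetical model of one version tag in a version script.
type tag struct {
	name          string
	global, local []string
}

// match records the chosen tag and whether the pattern was local.
type match struct {
	tag   string
	local bool
}

// isWild reports whether a pattern contains glob characters.
func isWild(p string) bool { return strings.ContainsAny(p, "*?[") }

// buildExact collects the literal patterns into a map for fast lookup.
// When a name is listed more than once, the first tag in the script wins.
func buildExact(script []tag) map[string]match {
	exact := make(map[string]match)
	for _, t := range script {
		for k, pats := range [][]string{t.global, t.local} {
			for _, p := range pats {
				if _, dup := exact[p]; !isWild(p) && !dup {
					exact[p] = match{t.name, k == 1}
				}
			}
		}
	}
	return exact
}

// goldLookup tries an exact match, then wildcards in reverse tag order,
// then a "*" pattern.
func goldLookup(script []tag, exact map[string]match, sym string) (match, bool) {
	if m, ok := exact[sym]; ok {
		return m, true
	}
	var star *match
	for i := len(script) - 1; i >= 0; i-- {
		t := script[i]
		for k, pats := range [][]string{t.global, t.local} {
			for _, p := range pats {
				if p == "*" {
					star = &match{t.name, k == 1} // used only as a last resort
					continue
				}
				if !isWild(p) {
					continue
				}
				if ok, err := path.Match(p, sym); err == nil && ok {
					return match{t.name, k == 1}, true
				}
			}
		}
	}
	if star != nil {
		return *star, true
	}
	return match{}, false
}

func main() {
	script := []tag{
		{"V1", []string{"foo", "bar*"}, []string{"*"}},
		{"V2", []string{"bar_new*", "baz"}, []string{"bar_private*"}},
	}
	exact := buildExact(script)
	for _, sym := range []string{"foo", "bar_new_init", "bar_private_x", "qux"} {
		m, _ := goldLookup(script, exact, sym)
		fmt.Println(sym, m.tag, m.local)
	}
}
```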
<p>I hope for your sake that this information never actually matters to you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/300/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cargo Cult Programming</title>
		<link>http://www.airs.com/blog/archives/294</link>
		<comments>http://www.airs.com/blog/archives/294#comments</comments>
		<pubDate>Sat, 09 Jan 2010 05:35:29 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/?p=294</guid>
		<description><![CDATA[I recently encountered a nice example of cargo cult programming. In bug 10980 Robert Wohlrab helpfully built a large number of Debian packages with the gold linker and reported errors about unknown options. These were options supported by the GNU linker but not by gold. (I&#8217;ve now added all the options to gold). Among the [...]]]></description>
			<content:encoded><![CDATA[<p>I recently encountered a nice example of cargo cult programming.  In <a href="http://sourceware.org/bugzilla/show_bug.cgi?id=10980">bug 10980</a> Robert Wohlrab helpfully built a large number of Debian packages with the gold linker and reported errors about unknown options.  These were options supported by the GNU linker but not by gold.  (I&#8217;ve now added all the options to gold.)</p>
<p>Among the options that packages used were <code>-g</code> and <code>-assert</code>.  The GNU linker accepts and ignores these options.  It has never done anything with them.  Why do people pass them to the linker?  I can only assume that they were copied from some other linker invocation.</p>
<p>In today&#8217;s increasingly complex world of programming, when so much code involves integrating libraries in various ways, I expect that cargo cult programming is on the rise.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/294/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Back</title>
		<link>http://www.airs.com/blog/archives/266</link>
		<comments>http://www.airs.com/blog/archives/266#comments</comments>
		<pubDate>Fri, 06 Nov 2009 05:59:35 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Random]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/?p=266</guid>
		<description><![CDATA[It&#8217;s been a year. I&#8217;m back. I don&#8217;t have much new to say, but I&#8217;ve been starting to write blog entries in my head, so I might as well write them here. I&#8217;ll be aiming for three posts a week for now. I&#8217;ll start with a few quick comments that came to mind as I [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a year.  I&#8217;m back.  I don&#8217;t have much new to say, but I&#8217;ve been starting to write blog entries in my head, so I might as well write them here.  I&#8217;ll be aiming for three posts a week for now.</p>
<p>I&#8217;ll start with a few quick comments that came to mind as I skimmed my blog posts from November 2007 to November 2008.</p>
<ul>
<li>Obviously, Obama did win, and he did increase spending on infrastructure, and it does seem to have helped end the recession.  Open questions are how long unemployment will stay high (employment historically lags economic recovery) and whether there will be any structural reforms to prevent the same sort of thing from happening again in a few years.</li>
<li>We were able to trap the feral mother cat several months later and had her neutered as well.  No new feral kittens have been seen on our street for some time.</li>
<li>Everything continues to get more complicated.</li>
<li>Iraq is doing much better than I ever thought it would, and Moktada al-Sadr seems to have disappeared.  Was the U.S. right to invade?  Was it worth the cost?  I haven&#8217;t seen much discussion of these questions recently.</li>
<li>On the other hand lots of people are discussing whether it&#8217;s worth it for the U.S. to keep pushing that Sisyphean stone in Afghanistan.  How long should the U.S. continue?  What does success even look like?</li>
<li>What&#8217;s up with Israel and settlements?  Just stop building new ones, already.</li>
<li>The Watchmen movie.  It was better than I thought it would be.  I&#8217;m not sure it was actually good, but I did enjoy it.  The credit sequence was really interesting&#8211;in fact, I just saw the same idea in Zombieland.</li>
<li>The gcc in C++ work is now available as a configure option in gcc mainline.  I&#8217;m letting it rest before I make the next push toward making it the default.</li>
<li>The gold linker seems to be getting a fair amount of use, judging by the bug reports.  The most contentious issue is different handling of linking against shared libraries which themselves refer to other shared libraries not mentioned in the link.</li>
<li>Robert Zemeckis is making yet another movie using technology that fails the uncanny valley test.  I won&#8217;t be seeing it.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/266/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Exception Destruction</title>
		<link>http://www.airs.com/blog/archives/257</link>
		<comments>http://www.airs.com/blog/archives/257#comments</comments>
		<pubDate>Thu, 16 Oct 2008 01:26:56 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/archives/257</guid>
		<description><![CDATA[Languages that support exceptions need to support destructors or they need to support a try/finally construct. Otherwise using exceptions is too difficult, because if you have some local state to clean up in a function, you have to catch and rethrow every exception. The goal of exceptions in C++ is that code which does not [...]]]></description>
			<content:encoded><![CDATA[<p>Languages that support exceptions need to support destructors or they need to support a try/finally construct.  Otherwise using exceptions is too difficult, because if you have some local state to clean up in a function, you have to catch and rethrow every exception.</p>
<p>The goal of exceptions in C++ is that code which does not throw an exception should be just as efficient as code which is compiled without any support for exceptions.  Unfortunately, this is impossible.  When any function can throw an exception, and when there are destructors which must be run if an exception is thrown, the compiler is limited in its ability to move instructions across function calls.  Of course it is not generally possible to move instructions which change global or heap memory across a function call, but in the absence of exceptions it is generally possible to move instructions which do not change memory or which change only stack memory.  This means that exceptions limit what the compiler is able to do, and it follows that compiling with exception support generates code which is less efficient than compiling without exception support.</p>
<p>Of course exceptions still have their uses, but let&#8217;s consider programming without them (this is easy for me to imagine&#8211;I didn&#8217;t use exceptions in the gold linker).  If you program without exceptions, how useful are destructors and/or try/finally?  What comes to mind is functions with multiple return points, loops with multiple exits, and RAII coding.</p>
<p>C has neither destructors nor try/finally.  Does it miss them?  I would say yes.  A common workaround I&#8217;ve seen is to change all return points and loop exit points to use a goto to a label which does cleanups.</p>
<p>The gcc compiler has an extension to C to support, in effect, destructors.  You can use <code>__attribute__ ((__cleanup__ (function)))</code>  with any local variable.  When the variable goes out of scope, the function will be called, passing it the address of the variable.  This is an effective extension, but it is not widely used.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/257/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Combining Versions</title>
		<link>http://www.airs.com/blog/archives/220</link>
		<comments>http://www.airs.com/blog/archives/220#comments</comments>
		<pubDate>Fri, 18 Jul 2008 14:43:36 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/archives/220</guid>
		<description><![CDATA[Sun introduced a symbol versioning scheme to use for the linker. Their implementation is relatively simple: symbol versions are defined in a version script provided when a shared library was created. The dynamic linker can verify that all required versions are present. This is useful for ensuring that an application can run with a specific [...]]]></description>
			<content:encoded><![CDATA[<p>Sun introduced a symbol versioning scheme to use for the linker.  Their implementation is relatively simple: symbol versions are defined in a version script provided when a shared library is created.  The dynamic linker can verify that all required versions are present.  This is useful for ensuring that an application can run with a specific version of the library.</p>
<p>In the Sun versioning scheme, when a symbol is changed to have an incompatible interface, the library file name must change.  This then produces a new DT_SONAME entry, which leads to new DT_NEEDED entries, and thus manages incompatibility at that level.</p>
<p>Ulrich Drepper and Eric Youngdale introduced a much more sophisticated symbol versioning scheme, which is used by glibc, the GNU linker, and gold.  The key differences are that versions may be specified in object files and that shared libraries may contain multiple independent versions of the same symbol.  Versions are specified in object files by naming the symbol NAME@VERSION or NAME@@VERSION.  In the former case the symbol is a hidden version, available only by specific request.  In the latter case the symbol is a default version, and references to NAME will be linked to NAME@@VERSION.  Versions may also be specified in version scripts.</p>
<p>This facility means that in principle it is never necessary to change the library file name.  The versioning scheme lets the dynamic linker direct each symbol reference to the appropriate version.  This in turn means that in a complicated program with many shared libraries compiled against different versions of the base library, only one instance of the base library needs to be loaded.</p>
<p>However, this additional complexity leads to additional ambiguity.  There are now two possible sources of a symbol version: the name in the object file and an entry in the version script.  There is the possibility that two instances of the same name will disagree on whether the name should be globally visible or not&#8211;in fact, this is normal, as undefined references will always use NAME@VERSION, not NAME@@VERSION.  Symbol overriding can be confusing: if the main executable defines NAME without a version, which versions should it override in the shared library?  Which version should be used in the program?  Symbol visibility adds an additional wrinkle to this.</p>
<p>The most important issue for the linker arises when it sees both NAME and NAME@VERSION, and then sees NAME@@VERSION.  At that time the linker has seen two separate symbols and has to decide whether to merge them.  The rules that gold currently follows are these:</p>
<ul>
<li>If NAME is hidden, and NAME@@VERSION is in a shared object, they are two independent symbols, and we do not change NAME or its version.</li>
<li>If NAME already has a version, because we earlier saw NAME@@VERSION2, then we produce two separate symbols, and leave NAME@@VERSION2 as the default symbol.</li>
<li>Otherwise, we change the version of NAME to VERSION, and do normal symbol resolution.</li>
</ul>
<p>I recently fixed a bug in this code in gold, which was breaking symbol overriding in a specific case.  I wouldn&#8217;t be surprised if there are more bugs.  As far as I know nobody has worked through all the symbol combining issues and defined what should happen.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/220/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Linker relro</title>
		<link>http://www.airs.com/blog/archives/189</link>
		<comments>http://www.airs.com/blog/archives/189#comments</comments>
		<pubDate>Sat, 10 May 2008 01:16:44 +0000</pubDate>
		<dc:creator>Ian Lance Taylor</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.airs.com/blog/archives/189</guid>
		<description><![CDATA[gcc, the GNU linker, and the glibc dynamic linker cooperate to implement an idea called read-only relocations, or relro. This permits the linker to designate a part of an executable or (more commonly) a shared library as being read-only after dynamic relocations have been applied. This may be used for read-only global variables which are [...]]]></description>
			<content:encoded><![CDATA[<p>gcc, the GNU linker, and the glibc dynamic linker cooperate to implement an idea called read-only relocations, or relro.  This permits the linker to designate a part of an executable or (more commonly) a shared library as being read-only after dynamic relocations have been applied.</p>
<p>This may be used for read-only global variables which are initialized to something which requires a relocation, such as the address of a function or a different global variable.  Because the global variable requires a runtime initialization in the form of a dynamic relocation, it can not be placed in a read-only segment.  However, because it is declared to be constant, and therefore may not be changed by the program, the dynamic linker can mark it as read-only after the dynamic relocation has been applied.</p>
<p>For some targets this technique may also be used for the PLT or parts of the GOT.</p>
<p>Making these pages read-only helps catch some cases of memory corruption, and making the PLT in particular read-only helps prevent some types of buffer overflow exploits.</p>
<p>The first step is in gcc.  When gcc sees a variable which is constant but requires a dynamic relocation, it puts it into a section named <code>.data.rel.ro</code> (this functionality unfortunately relies on magic section names).  A variable which requires a dynamic relocation against a local symbol is put into a <code>.data.rel.ro.local</code> section; this helps group such variables together, so that the dynamic linker may apply the relocations, which will always be <code>RELATIVE</code> relocations, more efficiently, especially when using <code>combreloc</code>.</p>
<p>The linker groups <code>.data.rel.ro</code> and <code>.data.rel.ro.local</code> sections as usual.  The new step is that the linker then emits a <code>PT_GNU_RELRO</code> program segment which covers these sections.  If the PLT and/or GOT can be read-only after dynamic relocations, they are put next to the <code>.data.rel.ro</code> sections and also become part of the new segment.  This segment will be enclosed within a <code>PT_LOAD</code> segment.  The <code>p_vaddr</code> field of the <code>PT_GNU_RELRO</code> segment gives the virtual address of the start of the region that is read-only after dynamic relocations, and the <code>p_memsz</code> field gives its length.</p>
<p>When the dynamic linker sees a <code>PT_GNU_RELRO</code> segment, it uses <code>mprotect</code> to mark the pages as read-only after the dynamic relocations have been applied.  Of course this only works if the segment does in fact cover an entire page.  The linker will try to force this to happen.</p>
<p>Note that the current dynamic linker code will only work correctly if the <code>PT_GNU_RELRO</code> segment starts on a page boundary.  This is because the dynamic linker rounds the <code>p_vaddr</code> field down to the previous page boundary.  If there is anything on the page which should not be read-only, the program is likely to fail at runtime.  So in effect the linker must only emit a <code>PT_GNU_RELRO</code> segment if it ensures that it starts on a page boundary.</p>
<p>I see this as a relatively minor security benefit.  It is not an optimization as far as I can see.  I am documenting it here as part of my general documentation of obscure linker features.  The current description of this feature in the GNU linker manual is rather obscure.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.airs.com/blog/archives/189/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
