<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Single Threaded Memory Model</title>
	<atom:link href="http://www.airs.com/blog/archives/79/feed" rel="self" type="application/rss+xml" />
	<link>http://www.airs.com/blog/archives/79</link>
	<description>Ian Lance Taylor</description>
	<lastBuildDate>Sun, 07 Mar 2010 17:12:02 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Memory access vs variable access &#124; keyongtech</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-15788</link>
		<dc:creator>Memory access vs variable access &#124; keyongtech</dc:creator>
		<pubDate>Sun, 18 Jan 2009 16:25:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-15788</guid>
		<description>[...] Re: Memory access vs variable access     On Jun 24, 5:51*pm, Gerhard Fiedler &lt;geli...@gmail.com&gt; wrote: &gt; On 2008-06-24 11:50:26, gpderetta wrote: &gt; &gt; &gt; If a specific architecture didn&#039;t allow 32 bit load/stores to 32 bit &gt; &gt; objects, it would require the implementation to pad every object to the &gt; &gt; smaller load/store granularity. Pretty much all common architectures &gt; &gt; allow access to memory at least at 8/16/32 bit granularity (except for &gt; &gt; DSPs I guess), so it is not a problem. &gt; &gt; Ah, I didn&#039;t know that. So on common hardware (maybe x86, x64, AMD, AMD64, &gt; IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is &gt; possible in byte granularity? Which then means that no common compiler &gt; would write to locations that are not the actual purpose of the write &gt; access?  All x86 derivatives allow 8/16/32/64 access at any offset. I think both PowerPC and ARM allow access at any granularity as long as the access is properly aligned. IIRC very old Alphas only allowed accessing aligned 32/64 bits (no byte access), but it got fixed because it was extremely inconvenient. I do not know about IA-64, MIPS, SPARC and PA-RISC, but I would be extremely surprised if they didn&#039;t.  &gt; &gt; &gt; Current compilers do not implement the rule above, but thread aware &gt; &gt; compilers approximate it well enough that, as long as you use correct &gt; &gt; locks, things work correctly *most of the time* (some compilers have &gt; &gt; been known to miscompile code which used trylocks for example). &gt; &gt; Do you have any links about which compilers specifically don&#039;t create code &gt; that works correctly? One objective of mine is to be able to separate this &gt; &quot;most of the time&quot; into two clearly defined subsets, one of which works &gt; &quot;all of the time&quot; :) &gt;  Many in corner cases do. 
Usually these are considered bugs and are fixed when they are encountered. See for example http://www.airs.com/blog/archives/79  &gt; &gt; Actually, discussing whether the next C++ standard prohibits &gt; &gt; speculative writes, is language specific and definitely on topic. &gt; &gt; Is &quot;speculative writes&quot; the technical term for the situation I described? &gt;  I&#039;m not sure if it applies to this example. I think that &quot;speculative store&quot; is defined as the motion of a store outside of its position in program order (usually sinking it outside of loops or branches). It doesn&#039;t take much to generalize the concept to that of the *addition* of a store not present in the original program (i.e. adjacent field overwrites).  For details see &quot;Concurrency memory model compiler consequences&quot; by Hans Boehm:  http://www.open-std.org/jtc1/sc22/wg...007/n2338.html  HTH,  -- gpd [...]</description>
		<content:encoded><![CDATA[<p>[...] Re: Memory access vs variable access     On Jun 24, 5:51*pm, Gerhard Fiedler &lt;geli&#8230;@gmail.com&gt; wrote: &gt; On 2008-06-24 11:50:26, gpderetta wrote: &gt; &gt; &gt; If a specific architecture didn&#8217;t allow 32 bit load/stores to 32 bit &gt; &gt; objects, it would require the implementation to pad every object to the &gt; &gt; smaller load/store granularity. Pretty much all common architectures &gt; &gt; allow access to memory at least at 8/16/32 bit granularity (except for &gt; &gt; DSPs I guess), so it is not a problem. &gt; &gt; Ah, I didn&#8217;t know that. So on common hardware (maybe x86, x64, AMD, AMD64, &gt; IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is &gt; possible in byte granularity? Which then means that no common compiler &gt; would write to locations that are not the actual purpose of the write &gt; access?  All x86 derivatives allow 8/16/32/64 access at any offset. I think both PowerPC and ARM allow access at any granularity as long as the access is properly aligned. IIRC very old Alphas only allowed accessing aligned 32/64 bits (no byte access), but it got fixed because it was extremely inconvenient. I do not know about IA-64, MIPS, SPARC and PA-RISC, but I would be extremely surprised if they didn&#8217;t.  &gt; &gt; &gt; Current compilers do not implement the rule above, but thread aware &gt; &gt; compilers approximate it well enough that, as long as you use correct &gt; &gt; locks, things work correctly *most of the time* (some compilers have &gt; &gt; been known to miscompile code which used trylocks for example). &gt; &gt; Do you have any links about which compilers specifically don&#8217;t create code &gt; that works correctly? 
One objective of mine is to be able to separate this &gt; &quot;most of the time&quot; into two clearly defined subsets, one of which works &gt; &quot;all of the time&quot; <img src='http://www.airs.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  &gt;  Many in corner cases do. Usually these are considered bugs and are fixed when they are encountered. See for example <a href="http://www.airs.com/blog/archives/79" rel="nofollow">http://www.airs.com/blog/archives/79</a>  &gt; &gt; Actually, discussing whether the next C++ standard prohibits &gt; &gt; speculative writes, is language specific and definitely on topic. &gt; &gt; Is &quot;speculative writes&quot; the technical term for the situation I described? &gt;  I&#8217;m not sure if it applies to this example. I think that &quot;speculative store&quot; is defined as the motion of a store outside of its position in program order (usually sinking it outside of loops or branches). It doesn&#8217;t take much to generalize the concept to that of the *addition* of a store not present in the original program (i.e. adjacent field overwrites).  For details see &quot;Concurrency memory model compiler consequences&quot; by Hans Boehm:  <a href="http://www.open-std.org/jtc1/sc22/wg...007/n2338.html" rel="nofollow">http://www.open-std.org/jtc1/sc22/wg&#8230;007/n2338.html</a>  HTH,  &#8212; gpd [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jarkao2</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6893</link>
		<dc:creator>jarkao2</dc:creator>
		<pubDate>Fri, 16 Nov 2007 17:47:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6893</guid>
		<description>Yes! After rethinking it again I found that this example is still wrong, but you were faster... I had thought about
division being too invasive, since such a transformation would have changed the &#039;else&#039; branch (except for division by 1). Sorry for thinking out loud.

I simply wondered whether such an optimization is safe
against all such illegal traps, and it seems it really is!
It&#039;s just hard to get used to.

On the other hand, isn&#039;t it funny: gcc forces people to respect C&#039;s single-threadedness just when they are
about to forget about single-processor boxes!

Thanks very much for these great programming articles!</description>
		<content:encoded><![CDATA[<p>Yes! After rethinking it again I found that this example is still wrong, but you were faster&#8230; I had thought about<br />
division being too invasive, since such a transformation would have changed the &#8216;else&#8217; branch (except for division by 1). Sorry for thinking out loud.</p>
<p>I simply wondered whether such an optimization is safe<br />
against all such illegal traps, and it seems it really is!<br />
It&#8217;s just hard to get used to.</p>
<p>On the other hand, isn&#8217;t it funny: gcc forces people to respect C&#8217;s single-threadedness just when they are<br />
about to forget about single-processor boxes!</p>
<p>Thanks very much for these great programming articles!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Lance Taylor</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6887</link>
		<dc:creator>Ian Lance Taylor</dc:creator>
		<pubDate>Fri, 16 Nov 2007 15:24:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6887</guid>
		<description>Division by zero is a trapping instruction, so that modification could introduce a trap where one wasn&#039;t before.  Therefore, the transformation is not valid in this case.</description>
		<content:encoded><![CDATA[<p>Division by zero is a trapping instruction, so that modification could introduce a trap where one wasn&#8217;t before.  Therefore, the transformation is not valid in this case.</p>
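<p>A minimal sketch of the point about traps, assuming plain int arithmetic. The function names, parameters and the demo helper are hypothetical and only serve the illustration. The first function divides only when the condition holds; the second hoists the division above the test, which gives the same answer only as long as the divisor is known to be nonzero.</p>

```c
#include <assert.h>

/* Original shape: the division is evaluated only when cond is true. */
static int divide_if(int cond, int num, int den, int fallback)
{
    if (cond)
        return num / den;
    return fallback;
}

/* Speculated shape: the division is hoisted above the test.
   This matches divide_if only when den != 0. With den == 0 the
   division would execute (and may trap) even when cond is false,
   so a compiler may not make this change unless it can prove
   that den is nonzero. */
static int divide_hoisted(int cond, int num, int den, int fallback)
{
    int q = num / den;
    return cond ? q : fallback;
}

/* Hypothetical usage: the two shapes agree for a nonzero divisor,
   and only the original shape is safe to call with den == 0. */
static void demo(void)
{
    assert(divide_if(1, 6, 3, -1) == divide_hoisted(1, 6, 3, -1));
    assert(divide_if(0, 6, 3, -1) == divide_hoisted(0, 6, 3, -1));
    assert(divide_if(0, 6, 0, -1) == -1);  /* never divides */
}
```

<p>The demo deliberately never calls divide_hoisted with a zero divisor, since that call is exactly the case the transformation would break.</p>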
]]></content:encoded>
	</item>
	<item>
		<title>By: jarkao2</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6881</link>
		<dc:creator>jarkao2</dc:creator>
		<pubDate>Fri, 16 Nov 2007 14:09:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6881</guid>
		<description>OK, this y changes too much, sorry.
Let it be the same memory still:

But, doesn&#039;t the standard say anything about flow control?
Eg., can something like this:
if (sin(x) == 2.0)
acquires_count = ++acquires_count/0;

be turned to this as well?:
r = sin(x) == 2.0;
acquires_count = (acquires_count += r == 0)/0;</description>
		<content:encoded><![CDATA[<p>OK, this y changes too much, sorry.<br />
Let it be the same memory still:</p>
<p>But, doesn&#8217;t the standard say anything about flow control?<br />
Eg., can something like this:<br />
if (sin(x) == 2.0)<br />
acquires_count = ++acquires_count/0;</p>
<p>be turned to this as well?:<br />
r = sin(x) == 2.0;<br />
acquires_count = (acquires_count += r == 0)/0;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jarkao2</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6879</link>
		<dc:creator>jarkao2</dc:creator>
		<pubDate>Fri, 16 Nov 2007 13:00:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6879</guid>
		<description>&quot;The standard says nothing about precisely when values are written to memory.&quot;

But, doesn&#039;t the standard say anything about flow control?
Eg., can something like this:
if (sin(x) == 2.0)
    y = ++acquires_count/0;

be turned to this as well?:
r = sin(x) == 2.0;
y = (acquires_count += r == 0)/0;</description>
		<content:encoded><![CDATA[<p>&#8220;The standard says nothing about precisely when values are written to memory.&#8221;</p>
<p>But, doesn&#8217;t the standard say anything about flow control?<br />
Eg., can something like this:<br />
if (sin(x) == 2.0)<br />
    y = ++acquires_count/0;</p>
<p>be turned to this as well?:<br />
r = sin(x) == 2.0;<br />
y = (acquires_count += r == 0)/0;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Miriam Ruiz</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6529</link>
		<dc:creator>Miriam Ruiz</dc:creator>
		<pubDate>Thu, 08 Nov 2007 22:30:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6529</guid>
		<description>[...] Reading mig21&#8217;s weblog, in which I often find really interesting stuff, I found Ian Lance Taylor&#8217;s article &#8220;Single Threaded Memory Model&#8221;, which kind of bothers me a bit. It reports a recent discussion on the gcc and LKML mailing lists about how C compilers, gcc in this case, optimize for single threaded code, sometimes leading to counter-intuitive results which won&#8217;t work properly in multi-threaded software (leading, for example, to race conditions). [...]</description>
		<content:encoded><![CDATA[<p>[...] Reading mig21&#8217;s weblog, in which I often find really interesting stuff, I found Ian Lance Taylor&#8217;s article &#8220;Single Threaded Memory Model&#8221;, which kind of bothers me a bit. It reports a recent discussion on the gcc and LKML mailing lists about how C compilers, gcc in this case, optimize for single threaded code, sometimes leading to counter-intuitive results which won&#8217;t work properly in multi-threaded software (leading, for example, to race conditions). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ncm</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6145</link>
		<dc:creator>ncm</dc:creator>
		<pubDate>Wed, 31 Oct 2007 21:01:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6145</guid>
		<description>I should never post when I have a fever.</description>
		<content:encoded><![CDATA[<p>I should never post when I have a fever.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Lance Taylor</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6128</link>
		<dc:creator>Ian Lance Taylor</dc:creator>
		<pubDate>Wed, 31 Oct 2007 03:45:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6128</guid>
		<description>But no memory operations were moved over the pthread_mutex_trylock call.  It didn&#039;t make the transformation you state.  The original test case looked like
&lt;blockquote&gt;
    if (pthread_mutex_trylock (&amp;m) == 0)
      ++acquires_count;
&lt;/blockquote&gt;
gcc turned it into
&lt;blockquote&gt;
    r = pthread_mutex_trylock (&amp;m);
    acquires_count += r == 0;
&lt;/blockquote&gt;
(gcc generated an add with carry flag to memory, which is a standard x86 instruction).  A load and store were effectively added in the implicit else branch of the conditional, but no loads or stores were moved over the call to pthread_mutex_trylock.</description>
		<content:encoded><![CDATA[<p>But no memory operations were moved over the pthread_mutex_trylock call.  It didn&#8217;t make the transformation you state.  The original test case looked like</p>
<blockquote><p>
    if (pthread_mutex_trylock (&amp;m) == 0)<br />
      ++acquires_count;
</p></blockquote>
<p>gcc turned it into</p>
<blockquote><p>
    r = pthread_mutex_trylock (&amp;m);<br />
    acquires_count += r == 0;
</p></blockquote>
<p>(gcc generated an add with carry flag to memory, which is a standard x86 instruction).  A load and store were effectively added in the implicit else branch of the conditional, but no loads or stores were moved over the call to pthread_mutex_trylock.</p>
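<p>The two forms can be compared in a small single-threaded sketch. This is only an illustration: the trylock result is passed in as a plain int r so that no thread library is needed, and the helper names (count_with_branch, count_branchless, same_result, demo) are hypothetical. In one thread both forms leave the same value in acquires_count; the difference is that the second form always loads and stores the counter, even when the lock was not taken.</p>

```c
#include <assert.h>

/* Stand-in for the shared counter from the test case. */
static int acquires_count;

/* Original form: the counter is touched only when the lock was
   taken (r == 0 is the success return of pthread_mutex_trylock). */
static void count_with_branch(int r)
{
    if (r == 0)
        ++acquires_count;
}

/* Transformed form: the counter is loaded and stored on every
   call, adding 0 when the lock was not taken. */
static void count_branchless(int r)
{
    acquires_count += (r == 0);
}

/* Single-threaded check: both forms leave the same value behind. */
static int same_result(int start, int r)
{
    int a, b;

    acquires_count = start;
    count_with_branch(r);
    a = acquires_count;

    acquires_count = start;
    count_branchless(r);
    b = acquires_count;

    return a == b;
}

/* Hypothetical usage covering both outcomes of the trylock. */
static void demo(void)
{
    assert(same_result(0, 0));  /* lock taken: both add 1 */
    assert(same_result(7, 1));  /* lock busy: both leave 7 */
}
```

<p>With a second thread that holds the mutex and updates acquires_count, the unconditional store in the second form can overwrite that update, which is the race the discussion is about.</p>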
]]></content:encoded>
	</item>
	<item>
		<title>By: ncm</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6099</link>
		<dc:creator>ncm</dc:creator>
		<pubDate>Tue, 30 Oct 2007 19:55:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6099</guid>
		<description>So (pulling code from one of the e-mails referenced) the code generated is as if the source said

tmp = acquires_count;
res = pthread_mutex_trylock(&amp;mutex);
acquires_count = tmp + (res == 0)

and as somebody else noted, under POSIX threads, moving the load across the pthread_* call is not allowed.  Everybody writing from Gcc says acquires_count should have been declared volatile.  It seems to me that the pthread call should have an attribute forbidding the compiler from moving memory operations across it.  I&#039;m astonished there isn&#039;t such an attribute.  It seems to me it&#039;s needed for practically every synchronization primitive.</description>
		<content:encoded><![CDATA[<p>So (pulling code from one of the e-mails referenced) the code generated is as if the source said</p>
<p>tmp = acquires_count;<br />
res = pthread_mutex_trylock(&amp;mutex);<br />
acquires_count = tmp + (res == 0)</p>
<p>and as somebody else noted, under POSIX threads, moving the load across the pthread_* call is not allowed.  Everybody writing from Gcc says acquires_count should have been declared volatile.  It seems to me that the pthread call should have an attribute forbidding the compiler from moving memory operations across it.  I&#8217;m astonished there isn&#8217;t such an attribute.  It seems to me it&#8217;s needed for practically every synchronization primitive.</p>
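<p>For reference, a minimal sketch of what the volatile suggestion looks like, with the trylock result again passed in as a plain int r and hypothetical helper names. Accesses to a volatile object count as observable behavior, so the compiler may not add a load or store on the path where the lock was not taken. This only illustrates what the qualifier constrains; it is not by itself a substitute for holding the lock.</p>

```c
#include <assert.h>

/* Counter qualified volatile, as suggested in the thread. */
static volatile int acquires_count;

/* r is the value returned by a trylock-style call; 0 means success. */
static void count_if_acquired(int r)
{
    if (r == 0)
        acquires_count = acquires_count + 1;  /* one volatile load, one volatile store */
}

/* Hypothetical usage: only the successful call changes the counter. */
static void demo(void)
{
    acquires_count = 0;
    count_if_acquired(1);   /* lock busy: counter untouched */
    assert(acquires_count == 0);
    count_if_acquired(0);   /* lock taken: counter incremented */
    assert(acquires_count == 1);
}
```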
]]></content:encoded>
	</item>
	<item>
		<title>By: ncm</title>
		<link>http://www.airs.com/blog/archives/79/comment-page-1#comment-6096</link>
		<dc:creator>ncm</dc:creator>
		<pubDate>Tue, 30 Oct 2007 18:25:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.airs.com/blog/archives/79#comment-6096</guid>
		<description>I&#039;m sorry, I missed seeing the links you posted.</description>
		<content:encoded><![CDATA[<p>I&#8217;m sorry, I missed seeing the links you posted.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
