Archive for Programming

Piece of PIE

Modern ELF systems can randomize the address at which shared libraries are loaded. This is generally referred to as Address Space Layout Randomization, or ASLR. Shared libraries are always position independent, which means that they can be loaded at any address. Randomizing the load address makes it slightly harder for attackers of a running program to exploit buffer overflows or similar problems, because they have no fixed addresses that they can rely on. ASLR is part of defense in depth: it does not by itself prevent any attacks, but it makes it slightly more difficult for attackers to exploit certain kinds of programming errors in a useful way beyond simply crashing the program.

Although it is straightforward to randomize the load address of a shared library, an ELF executable is normally linked to run at a fixed address that can not be changed. This means that attackers have a set of fixed addresses they can rely on. Permitting the kernel to randomize the address of the executable itself is done by generating a Position Independent Executable, or PIE.

It turns out to be quite simple to create a PIE: a PIE is simply an executable shared library. To make a shared library executable you just need to give it a PT_INTERP segment and appropriate startup code. The startup code can be the same as the usual executable startup code, though of course it must be compiled to be position independent.

When compiling code to go into a shared library, you use the -fpic option. When compiling code to go into a PIE, you use the -fpie option. Since a PIE is just a shared library, these options are almost exactly the same. The only difference is that since -fpie implies that you are building the main executable, there is no need to support symbol interposition for defined symbols. In a shared library, if function f1 calls f2, and f2 is globally visible, the code has to consider the possibility that f2 will be interposed. Thus, the call must go through the PLT. In a PIE, f2 can not be interposed, so the call may be made directly, though of course still in a position independent manner. Similarly, if the processor can do PC-relative loads and stores, all global variables can be accessed directly rather than going through the GOT.

Other than that ability to avoid the PLT and GOT in some cases, a PIE is really just a shared library. The dynamic linker will ask the kernel to map it at a random address and will then relocate it as usual.

This does imply that a PIE must be dynamically linked, in the sense of using the dynamic linker. Since the dynamic linker and the C library are closely intertwined, linking the PIE statically with the C library is unlikely to work in general. It is possible to design a statically linked PIE, in which the program relocates itself at startup time. The dynamic linker itself does this. However, there is no general mechanism for this at present.

Comments (2)

Non-free Services

As both of my faithful readers can see, my blog postings have dropped significantly. I’ve been posting my random little comments on Google+ instead.

Which leads me to the following. There is a hard-core group of people who only use free software. I’m not quite that hard-core, but in practice I do use only free software, except perhaps for some binary drivers in the kernel (I don’t actually know whether the systems I’m running use binary drivers or not, and I’m not hard-core enough to find out).

I’ve seen some people argue that if you are serious about using free software, you should also only use Internet services which are themselves free software. For example, you should not use Facebook or Google+, because the software used to run those services is not free.

I don’t agree with that argument. The key goal of free software is that I always have the right to change the software that I am running. When I use an Internet service like Google+, I am not running the software. Even if I had a copy of the software, I would not be able to run it, because I don’t have enough servers. And even if I had enough servers, it would be useless for me to run the software, because I don’t have the data. And there is no way to grant me access to the data, because that would violate the reasonable privacy choices of everybody else using the service.

When it comes to a service like Google+, whether the software is free is not important. Releasing the software would not give me any more freedom than I already have. Google+ is only interesting when many people are operating out of a single shared data base, and that data base must have privacy safeguards to ensure that it is not copied.

What matters with Google+ is not the software, but the data. It is important that I be able to retrieve all my data associated with Google+, and that I be able to retrieve it in a way that makes it possible to use with other software. That is, I should be able to retrieve my posts, my comments on other people’s posts, my list of followers, my photos, etc. And I should be able to plug them into some other software service if I so choose.

In fact Google+ does have a set of APIs which permit me to retrieve my data. I haven’t verified that all Google+ data is available via the APIs, but all the obvious stuff seems to be available. Given those APIs, it should be possible for me to move all my data to some other service which provides the required APIs itself.

So I personally don’t see any reason why even a hard-core free software supporter should avoid using a service like Google+. This isn’t to say that it wouldn’t be nice if Google freed up the software and accepted patches from outside users. It’s just that that is not a critical part of freedom to use software.

Comments (5)


I’ve been playing Sudoku on Google+. I’ve more or less mastered the easy and medium levels, but it takes me about 30 minutes to do a hard level, and I haven’t tried expert yet. Sudoku is a fairly dumb game in some ways; as a colleague of mine pointed out, it’s trivial to write a computer program which will win every time. But I find the game somewhat interesting because it mirrors, in reverse, the way I think about programming.

You can write a computer program more or less any way you like. So I tend to think of a program in terms of constraints. Typical constraints are: the desired behaviour; the available runtime; the algorithmic complexity; the available libraries; the language; maintainability; who is going to review the code and what they will accept. Writing a program is a matter of finding the simplest solution which meets the constraints. Difficult programming problems are ones where the constraints come into conflict, and it’s hard to see your way through.

Sudoku works the same way, only in reverse. In programming you are allowed to write any code that meets the constraints. In Sudoku you know that there is only one solution, so you have to look for moves that are forced by the constraints. Solving a Sudoku puzzle is a matter of looking deeper and deeper into the problem until you have eliminated all moves but one.

My hope is that practice in this area will subconsciously encourage me to look deeper for constraints when writing code, which will save time in the long run because I will have to throw away less code. I doubt this will actually work, but it seems worth a try.

Also Sudoku is a good way to exercise short term memory, as I’m avoiding writing anything down while solving the puzzle. I used to play cards regularly (bridge, whist) and I was able to remember the location of many of the cards in other people’s hands. I noticed that I lost that facility as I’ve failed to practice it. As I write this I realize that short term memory is not too important in today’s world, but at least it makes me feel smarter.

Comments (4)


Sorry for the long posting hiatus. I have enough readers now that it’s hard to write the usual random nonsense.

I was recently reminded of an old problem using CVS over SSH, which was an interesting example of various different instances of reasonable behaviour adding up to a bug. It’s possible that this bug has been fixed, but I’ll assume that it hasn’t. The bug is that “cvs diff 2>&1 | less” will sometimes drop data, leaving you looking at an incomplete diff.

When CVS invokes SSH, it sets up file descriptors 0 (standard input) and 1 (standard output), but not 2 (standard error). Thus SSH inherits file descriptor 2 from CVS. This means that any SSH errors get reported to the standard error passed to CVS, which is what you want to have happen. However, when using 2>&1, this means that SSH’s file descriptor 2 will be the same as CVS’s file descriptors 1 and 2.

SSH puts its file descriptors 0, 1, and 2 into nonblocking mode, so that it can use select to send data back and forth without blocking. This means that SSH puts CVS’s file descriptor 2 into nonblocking mode. When using 2>&1, file descriptors 1 and 2 are the same, so this puts CVS’s file descriptor 1 into nonblocking mode.

CVS uses stdio to output data to standard output. When writing to a pipe, the buffer can fill up. CVS itself never puts the descriptor into nonblocking mode, but when SSH has done so indirectly and the buffer fills, the write fails and the data being written is discarded. This is what causes the bug: the discarded data is never seen by the user.

So what’s the fix? It’s reasonable for SSH to put its descriptors into nonblocking mode. It’s reasonable for CVS to pass its file descriptor 2 to SSH. It’s reasonable for CVS to use stdio to output data. It’s reasonable for stdio to not specially handle a nonblocking file descriptor—any program which wants to use a nonblocking descriptor needs to handle I/O retries itself. It’s reasonable for 2>&1 to mean that file descriptors 1 and 2 refer to the same underlying pipe. It’s reasonable for the user to use 2>&1 when piping cvs diff output to less.

I think the only remaining link in the sequence leading to the bug is that when SSH sets its file descriptor 2 to nonblocking mode, this affects the file in CVS. This is a consequence of the Unix file model, in which file descriptors refer to underlying files. A file descriptor has only one flag: whether it is closed if the exec system call is run. All the other information is attached to the underlying file. Using 2>&1 means that two file descriptors point to the same file. Forking and execing SSH does not change this; in fact, it adds two more file descriptors, in the SSH process, which point to the same file. Any change in the flags associated with that file is seen by all the associated file descriptors.

This separation of file descriptor and file is what makes 2>&1 work. It’s also what makes >> work; >> opens a file in append mode, and the append flag is inherited by other processes which refer to that file. In any case, what really counts here is not the exec, but the fork; forking a process should not change the flags associated with a file. Further, I’m sure there are programs which depend on the fact that changing the flags on a file after a fork affects the file as seen by the parent process.

It’s possible to imagine that file descriptors point to a new shared structure which then points to the underlying file. The file position and some flags would stay with the underlying file. The new shared structure would just hold some flags which need not always be shared: O_NONBLOCK, O_ASYNC, etc. Calling fork would not create a new shared structure, but calling exec would, copying the existing structure. That would let some flags not be copied across exec process boundaries.

However, that would be a significant change to the Unix file model, a model which has lasted for decades and is not clearly broken. Absent that change, we are left with a complex bug, in which all choices are reasonable.

The workaround for the problem is to invoke SSH with 2>/dev/null, and assume that SSH never writes anything useful to standard error. The 2>/dev/null disassociates SSH file descriptor 2 from CVS file descriptor 2, so CVS file descriptor 1 is not accidentally set into nonblocking mode.

Comments (7)

Executable stack

The gcc compiler implements an extension to C: nested functions. A trivial example:

int f() {
  int i = 2;
  int g(int j) { return i + j; }
  return g(3);
}

The function f will return 5. Note in particular that the nested function g refers to the variable i defined in the enclosing function.

You can mostly treat nested functions as ordinary functions. In particular, you can take the address of a nested function and pass the resulting function pointer to another function; that function can then call through the function pointer, and the nested function will correctly refer to variables in the enclosing function’s stack frame. I’m not going to go into the details of how this is implemented here. What I will say is that gcc currently implements this by writing instructions to the stack and using a pointer to those instructions. This requires that the stack be executable.

This approach was implemented many years ago, before computers were routinely attacked. In the hostile Internet environment of today, an area of memory that is both writable and executable is dangerous, because it gives an attacker space to create brand new instructions to execute. Since the stack must be writable, this means that we want to make the stack non-executable if possible. Since very few programs use nested functions, this is normally possible. But we don’t want to break those few programs either.

This is how the GNU tools do it on ELF systems such as GNU/Linux. The compiler adds a new section to all code that it compiles. The section is named .note.GNU-stack. It is empty and not allocated, which means that it takes up no space at runtime. If the code being compiled does not require an executable stack—the normal case—the compiler doesn’t set any flags for the section. If the code does require an executable stack, the compiler sets the SHF_EXECINSTR flag.

When the linker links a program, it checks each input object for a .note.GNU-stack section. If there is no such section, the linker assumes that the object must be old, and therefore may require an executable stack. If there is such a section, the linker checks the section flags to see whether the code requires an executable stack. The linker discards the .note.GNU-stack sections, and creates a PT_GNU_STACK segment in the output executable. The PT_GNU_STACK segment is empty and is not part of any PT_LOAD segment. The segment flags PF_R and PF_W are always set. If the linker has determined that the program requires an executable stack, it also sets the PF_X flag.

When the Linux kernel starts a program, it looks for a PT_GNU_STACK segment. If it does not find one, it sets the stack to be executable (if appropriate for the architecture). If it does find a PT_GNU_STACK segment, it marks the stack as executable if the segment flags call for it. (It’s possible to override this and force the kernel to never use an executable stack.) Similarly, the dynamic linker looks for a PT_GNU_STACK in any executable or shared library that it loads, and changes the stack to be executable if any of them require it.

When this all works smoothly, most programs wind up with a non-executable stack, which is what we want. The most common reason that this fails these days is that part of the program is written in assembler, and the assembler code does not create a .note.GNU-stack section. If you write assembler code for GNU/Linux, you must always be careful to add the appropriate line to your file. For most targets, the line you want is:

.section .note.GNU-stack,"",@progbits

There are some linker options to control this. The -z execstack option tells the linker to mark the program as requiring an executable stack, regardless of the input files. The -z noexecstack option marks it as not requiring an executable stack. The gold linker has a --warn-execstack option which will cause the linker to warn about any object which is missing a .note.GNU-stack section or which has an executable .note.GNU-stack section.

The execstack program may also be used to query whether a program requires an executable stack, and to change its setting.

These days we could probably change the default: we could probably say that if an object file does not have a .note.GNU-stack section, then it does not require an executable stack. That would avoid the problem of files written in assembler which do not create the section. It’s possible that this would cause some programs to incorrectly get a non-executable stack, but I think that would be quite unlikely in practice. An advantage of changing the default would be that the compiler would not have to create an empty .note.GNU-stack section in all object files.

By the way, there is one thing you can do with a normal function that you can not do with a nested function: if the nested function refers to any variables in the enclosing function, you can not return a pointer to the nested function to the caller. If you do, the variable will disappear, so the variable reference in the nested function will be a dangling reference. It’s worth noting here that the Go language supports nested function literals which may refer to variables in the enclosing function, and when using Go this works correctly. The compiler creates variables on the heap if necessary, so they do not disappear until the garbage collector determines that nothing refers to them any more.

Finally, I’ll mention that there are some plans to implement a different scheme for nested functions in C, one which does not require any memory to be both writable and executable, but these plans have not yet been implemented. I’ll leave the implementation as an exercise for the reader.

Comments (4)
