Archive for Programming

Newsqueak

A mention of Squeak in a comment on my last post reminded me of Newsqueak, an interesting little language by Rob Pike. Newsqueak has nothing to do with Squeak. Newsqueak implements Hoare’s idea of Communicating Sequential Processes.

The interesting part of Newsqueak is the channel data type. A channel is a two-way communication path. Given a variable v of type chan of int, the assignment c< - 3 sends 3 on the channel, and the expression < -c receives the next value on the channel.

Thus a channel is basically a Unix pipe, represented as a fundamental data type. It can be used for synchronization as well as communication. This should make it easier to write safe multi-threaded programs.

I’ve never programmed in Newsqueak, and I don’t know if there are even any implementations out there outside of Plan 9. And of course it’s possible to use Unix pipes to communicate between threads today, although the required kernel calls and marshalling may make them less efficient than in-process channels. Still it’s interesting to think about threading using channels.

Comments (5)

Scheme

When I was in school I wrote programs in Scheme and its variant T. I still remember that as the easiest programming language I’ve ever used. In Scheme you never waste time on pointless boilerplate. You just write code. In order to run some function on a bunch of data, a very common operation, you just write a function closure, which is trivial. Dynamic typing means that you don’t waste time writing types.

In other words, Scheme has all the advantages of today’s interpreted languages like Python and Ruby. It is more powerful in practice, because Scheme makes it very easy to manipulate and evaluate Scheme code itself, something which is not feasible in most other languages. This means that Scheme can be its own macro processor.

And of course it is possible to compile Scheme to reasonably efficient machine code–not C/C++ efficient, but not bad.

So why hasn’t Scheme caught on? It still lives in various niche environments, but it is not popular. Is it as simple as people not liking prefix notation?

When people ask me what they should do to learn to program, or more commonly these days what their teenage kids should do, I always recommend Abelson and Sussman’s book Structure and Interpretation of Computer Programs (which can now be read online). It uses Scheme. It’s the best introduction I know to what computer programming is. It’s only an introduction, of course; it doesn’t cover the issues which arise in the workplace. But I think that anybody who wants to be a programmer has to be able to master the material in that book.

Comments (8)

Multithreaded Garbage Collection

Garbage collection is the traditional solution to the problem of managing memory. Multithreaded programming is the current wave of the future. I’ve written about the difficulties of multithreaded programming before, but people are going to do it regardless. In which case: how do we garbage collection in a multithreaded program?

Let’s assume that we don’t want to halt the whole program during garbage collection, because that is expensive. In that case, it’s not too hard to understand how it can be done if you can 1) halt the whole program (other than the garbage collection thread) for a brief period; 2) any change to a heap object will put the object on a list of changed objects; and 3) you can assume that all pointer loads and stores are atomic with respect to each other. Then the garbage collection thread can halt the program while it scans the roots, let the program run while it does a mark pass, halt the program again and scan the changed objects, and let the program while it does a sweep. (This has to be an in-place garbage collector, not one that moves the valid objects).

It’s possible to implement those requirements for an interpreted language like the traditional setting for Java. You can still JIT code that uses the heap, and it will help to do some escape analysis to see whether a heap pointer can possibly escape the function.

I don’t really see how to implement those requirements for a native code language like C++. In particular tracking the changed objects seems somewhat painful. There was a garbage collection proposal for the next C++ standard, though I believe that it may have been withdrawn. But I don’t see how to implement garbage collection efficiently in a multi-threaded programming. I did some web searches, but the most helpful sounding ideas I could find were all in academic papers which weren’t online. I wonder if there are any actual implementations which try to implement my suggested requirements.

Comments (3)

Layered Programming

Many programs today are written at a very high level. They are run in an interpreted environment, not a compiler. Often many different components running in different interpreted environments are hooked together. HTML and XML, for example, started out as markup languages, but now they are often also used as components of programs hooking together the output of different servers.

Computer programming has always been based on layering and abstraction. The processor abtracts the transistor, the traditional programming language abstracts the processor, the kernel abstracts the hardware. What seems fairly new to me is the speed at which these layers change and their complexity. New ideas are implemented in the form of extensive libraries. Each library can be learned in isolation, but there is no unifying principle across libraries.

It is becoming increasingly difficult to be a systems expert. When I learned to program, it was possible to understand your entire program from the source code, in whatever language, down to the machine code. When writing a modern Ajax application, that is simply impossible. There are too many different interpreters. There is too much code involved. Even fixing on a new base level above the processor–perhaps the browser–doesn’t help. This all leads to decreased performance, which is sometimes important, and decreased security, which is often important.

We can’t go back. What I wonder is whether we will again cohere to a programming model which can be understood at all relevant layers. Or whether things are just going to get increasingly complicated.

Comments (4)

Peer Review

Peer review can be a useful technique when programming. It ensures that at least one other person has read the code. It can catch dumb bugs and help ensure that the code is not unnecessarily obscure. Several popular programming methodologies use it. (Pair programming has the same benefits.)

Peer review has one obvious disadvantage: it slows down coding. In order for peer review to be meaningful, you have to present digestible chunks for review. And that
mean waiting for the review, or using some sort of patch management to permit continued coding until the review is complete and to incorporate changes suggested by the review.

I generally have not worked on project that require peer review. The gcc project requires maintainer approval of all changes, but maintainers are permitted to commit their own changes without review. I can see the advantages of a peer review system, provided there is some mechanism to ensure that reviews happen quickly. If reviews can linger, then projects can stall very quickly.

gcc has a difficult enough time getting patches reviewed as it is. It’s hard to recommend anything which would make it slower. One approach that might make it more acceptable would be to say that if a maintainer writes a patch, the peer review can be done by anybody–it would not have to be another maintainer. That is, require a reviewer for every patch, but only require that either the author or the reviewer be a maintainer.

I’m not sure whether this would be a good idea or not. It would be good to improve the quality of the gcc code base, but the quality is not so bad that drastic measures are required. Only a small additional cost would be acceptable.

Comments (2)

« Previous entries · Next entries »