School Testing

I always did extremely well on standardized tests in school, which convinces me that standardized tests are almost completely useless for assessing knowledge or intelligence or the skills of one’s teacher. Success at standardized tests does not require general intelligence; it requires the particular ability to think like the people who create the tests. Where my friends would sometimes get hung up on issues like trying to figure out the right answer to a question, my approach was instead to figure out which answer the test writers were looking for. This approach only works for multiple choice tests, but in that limited arena it was highly effective.

I still recall my moment of realization in high school. There was a pair of math tests which were the lead-in to selecting the members of the U.S. team for the International Math Olympiad. The first test was multiple choice, I think 100 questions, and I did very well as usual. I placed third in the state or something, at any rate good enough to go on to the next level. The second test was a series of five or so math problems. You had three or four hours to solve them. The questions were essays, and so were the answers: no multiple choice. I scored a zero. I could understand the questions, but I had no idea whatsoever how to actually solve them. A good friend of mine, on the other hand, did know how to solve them, and in fact went on to be a member of the U.S. team.

My conclusion is that standardized tests tell you something, but they don’t tell you what you really want to know: how good a student is at operating in the real world. The big school reforms which are based on using standardized tests to assess students, like Bush’s No Child Left Behind or Obama’s Race to the Top, are thus based on an invalid premise. And when you start to judge teachers based on how well their students do on standardized tests, you are creating a perverse incentive: you are rewarding them if they produce students who do well on tests rather than producing students who do well in the real world.

I do agree that assessing students is important: you need to know how well your student body is doing. And standardized tests are the best mechanism I know for doing so: they are easy to hand out, easy to grade, and if they are carefully written (which is very hard) they can produce results which can be compared across socio-economic gaps. But you must always be deeply aware of the shortcomings of these tests. They are telling you something, but they are not telling you everything or even most of what is important.

When tests are used to assess the quality of teachers, you are falling deep into the measurement problem: you are judging people based not on what they are achieving, but based on what you can measure.

And all that said, I do agree that it is important to assess teachers, because there are bad teachers out there. I had some of them myself. There has to be some way to get them out of schools and into some place where they can stop holding children back. It’s just that standardized testing is not that way. Just because there is something we need to do, and there is something else that we can measure, we should not start to think that the thing we can measure will tell us what we need to do.


  1. fche said,

    September 24, 2010 @ 7:52 am

    An alternate interpretation of your math exam experience would be that you were really smart (especially in contrast to classmates who could not figure out the tests), but not quite as smart as your friend. So at the top levels of achievement, the multiple-choice test banks don’t help discriminate, but then again maybe they don’t need to. If used to evaluate the overall student body / teaching, the excellent and super-excellent can be pigeonholed together without harm.

    So maybe your personal experiences don’t mean much as to testing applied to the broader population.

    (Your point re. measuring something, anything, out of desperation. One’d need to crunch a lot of advice, data, and try to unravel bias everywhere, to tune this peculiar monopoly. If only there were a proper market, so central planning were not required…)

  2. Paul Clayton said,

    September 24, 2010 @ 5:04 pm

    Multiple choice is not identical to standardized.

    Standardized merely indicates that the same test (and evaluation
    mechanism) is applied. For some things this is relatively easy
    to achieve. Even automatically demonstrating equivalence of
    multiple mathematical formulae might not be that difficult
    (assuming no intentional convolution). The labeling of a cell
    diagram is easy for a machine to evaluate. The placement of
    historical events on a timeline is easy for a machine to
    evaluate–but that is a very tiny fraction of what should be
    learned from history classes.

    For areas that require human evaluation, it seems one
    would need multiple samples and perhaps multiple
    evaluations of each sample. In order to allow such
    broad coverage, it seems one would need to integrate
    the standardized testing into the class. A teacher
    would not evaluate the tests for that teacher’s students
    (but, of course, would receive the graded tests). With
    the low cost of transmitting information this might be
    practical. (I think the specialization–external
    end-of-term–of the tests is one of the problems.
    Even with a perfect test, two hours per term is not
    enough to provide strong evaluation.)

    (I think sampling error is a problem with current
    standardized testing. I remember a reading comprehension
    essay on the subject of the Sun–already having 80% of
    the factual content and having a framework to which new
    content could be easily attached that essay was far easier
    for me to understand than it might have been for most

    Human evaluation is also flawed. In the second semester
    of introductory physics, a test had an electronics problem
    that I solved as a mechanics problem and my answer was
    initially marked as incorrect.

    I also did not receive the impression that the programs
    actually evaluate individual teachers–rather schools
    are ‘graded’. I think the results are also not well
    designed for teacher evaluation–no consideration
    seems to be given for the differences in entry-knowledge,
    ‘giftedness’, etc. of the students.

    One of the problems with public schools is that they are
    ‘free’–one is more likely to pay attention to value when
    one is more directly exposed to the cost. The lack of
    choice also reduces the inclination to care–after all, if
    any positive difference was possible an alternative would
    exist, and a conscious choice also implies an investment
    in the choice–and the inability to express disapproval
    by choosing an alternative makes it less likely that
    disapproval will be expressed.

  3. Adam Olsen said,

    September 24, 2010 @ 7:54 pm

    If essay form is more reliable, but too expensive to evaluate, and it wouldn’t be fair to only make a few students do an essay form, why not make them all do multiple choice+essay, give all the students their result on multiple choice, and select a random sample of 10% of them to evaluate the essay for?

    That would at least motivate the teachers to do better. Unfortunately it risks the students not taking the essay seriously. Not sure how to solve that. Bumping up the sample size to 33% or 50% might be sufficient, while still offering significant cost savings.

  4. ppluzhnikov said,

    September 24, 2010 @ 10:13 pm

    I have found GRE tests to be excellent: even though they are multiple choice, as far as I can tell there is no way to guess which answer — is it 5, 15 or 1/3 — test writers are looking for, without actually solving the problem.

    So perhaps the problem is not multiple-choice tests, but low quality multiple-choice tests?

    When my son was in middle school, I was often astounded by multiple-choice tests he brought home — the questions were ill-formed, and didn’t have a correct answer at all!

  5. Simetrical said,

    September 26, 2010 @ 11:32 am

    It’s necessary to shut down schools that perform too poorly. Running a school (or anything else) is a hard job, and we should expect most people given that job to fail. Most new businesses fail quickly, and almost all fail eventually. That this doesn’t happen to public schools is only because they’re propped up by the government, which normally cannot fail. Mark Shuttleworth has an interesting essay on this (although I think it veers into impracticality at the end):

    The problem is evaluating which schools are doing poorly. As you note, standardized tests are a crude instrument. They could certainly improve the situation despite that — a standardized test can easily distinguish someone who can read at a first-grade level from one who can read at a seventh-grade level, and that kind of disparity apparently can exist if the teachers are terrible enough.

    What we generally do when this problem comes up in capitalist societies is just let people choose what goods and services they use, and then let the ones they don’t like disappear through competition. In the context of schooling, this would be a voucher system. If parents can be trusted to figure out what’s best for their children at least as well as the government can, such a system would presumably produce students who are at least as well off as students today, and certainly at lower cost.

    I’m not clear what the problem with vouchers is, if the only goal is actually improving students’ education. Clearly they’d lead to public schools becoming much less important, with resultant reduction of government control over education, and the loss of many cushy union teacher jobs. But is there any credible reason to think it would do anything but improve education quality and reduce costs in the long term? What makes it different from any other industry in that regard?

  6. fche said,

    September 26, 2010 @ 2:53 pm

    “I’m not clear what the problem with vouchers is, if the only goal is actually improving students’ education.”

    I suppose the theory would be that they constitute an escape from another redistribution-of-wealth scheme, in this case one that is supposed to benefit the poor’s kids.

  7. Simetrical said,

    September 26, 2010 @ 3:23 pm

    That doesn’t make sense, since vouchers redistribute wealth just as much as public schools. I think the biggest reason is that there are massive vested interests in the status quo — particularly teachers’ unions, but also just the thousands of public employees who would inevitably lose their jobs if public schools were outcompeted. Similar to the post office, which gets huge government assistance compared to private competitors. People tend to get very angry if they lose their jobs, so this is a powerful incentive for politicians not to encourage large institutions to die. GM also falls into this category, for that matter.

    It also helps that people who are involved in politics often went to private school and send their children to private school, so it doesn’t affect them much. And there are some liberals who are afraid that private schools will reinforce values they don’t like or think are false, such as religion or creationism. Or, more generally, that it will reduce centralized control over education. I like this quote from John Stuart Mill’s On Liberty, arguing in favor of mandatory education (which I guess was controversial in his time):

    All that has been said of the importance of individuality of character, and diversity in opinions and modes of conduct, involves, as of the same unspeakable importance, diversity of education. A general State education is a mere contrivance for moulding people to be exactly like one another: and as the mould in which it casts them is that which pleases the predominant power in the government, whether this be a monarch, a priesthood, an aristocracy, or the majority of the existing generation, in proportion as it is efficient and successful, it establishes a despotism over the mind, leading by natural tendency to one over the body. An education established and controlled by the State, should only exist, if it exist at all, as one among many competing experiments, carried on for the purpose of example and stimulus, to keep the others up to a certain standard of excellence.

  8. Ian Lance Taylor said,

    September 28, 2010 @ 6:29 am

    Thanks for all the comments.

    fche: It’s true that I am fairly good at math, but I think I’m even better at test taking. It’s not my only example.

    Paul Clayton: Admittedly my test-taking days are long past. The standardized tests I took were all multiple choice, except for the AP tests. Perhaps test technology has gotten better, although I haven’t heard that. Using tests to grade schools does indirectly mean that they grade teachers, and it means that teachers are under pressure to have their students get good results on the tests.

    ppluzhnikov: The only GRE I took was the computer science one, and it was the first year they gave it, so it was not a good sample for me. Multiple choice tests let you work backward from the solution to the problem, which is often much easier than solving the problem directly. Also, of course, most tests are designed such that if you can eliminate one answer, your score is on average improved by guessing. And as I mentioned you can often solve the problem by thinking about what the test writer is thinking about, rather than thinking about the problem directly. This is how one approaches those questions you mention that are ill-formed. It’s a particular skill, and it’s a useful one; it’s the same skill I use to figure out how to use a DVD player. It’s just not what the test is looking for.
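A quick way to check the guessing claim above is to work out the expected score. The sketch below assumes the classic "formula scoring" rule, which the post itself does not spell out: one point for a right answer, a penalty of 1/(k-1) for a wrong answer on a k-choice question, and nothing for a blank. It also assumes that every answer you eliminate really is wrong. The function name and its parameters are illustrative only.

```python
from fractions import Fraction


def expected_guess_score(num_choices, num_eliminated=0):
    """Expected score from guessing uniformly among the remaining choices.

    Assumed scoring rule (not stated in the post): +1 for a right answer,
    -1/(num_choices - 1) for a wrong answer, 0 for a blank.
    Assumes every eliminated choice is actually wrong.
    """
    if num_choices < 2:
        raise ValueError("need at least two choices")
    remaining = num_choices - num_eliminated
    if num_eliminated < 0 or remaining < 1:
        raise ValueError("invalid number of eliminated choices")

    penalty = Fraction(1, num_choices - 1)
    p_right = Fraction(1, remaining)
    # Gain from a lucky guess minus the expected penalty for an unlucky one.
    return p_right - (1 - p_right) * penalty


if __name__ == "__main__":
    for eliminated in range(4):
        score = expected_guess_score(5, eliminated)
        print(f"eliminated {eliminated}: expected score {score}")
```

Under that assumed rule, blind guessing on a five-choice question is worth 0 on average, while ruling out a single wrong answer first raises the expected value to 1/16 of a point per question.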

    Simetrical: one problem with a voucher system in the U.S. is that people will opt their kids into schools which teach their beliefs, such as creationism, rather than at least exposing their kids to the prevailing ideas in society. Today many people do that by home schooling, but of course that is expensive, and many more would do it if it were free. I’m a little scared to think what this country would look like without a certain leveling from the public schools. However, I agree that that is not a terribly strong argument, and perhaps a voucher system would be the best approach. It seems to be working OK in Sweden, admittedly a far more homogeneous society.

  9. fche said,

    September 28, 2010 @ 7:55 am

    “people will opt their kids into schools which teach their beliefs, such as creationism, rather than at least exposing their kids to the prevailing ideas in society”

    Yeah, but but but … there exist worse myths to believe. People can be so illogical about some things; it might as well be on a topic that won’t impact their daily lives. When myths help other people behave in ways that we imagine rationally appropriate anyway, well, let them believe.

  10. Paul Clayton said,

    September 28, 2010 @ 5:18 pm

    Sorry to have misled. I did not mean that existing practice used
    mechanisms more advanced than multiple choice, merely that such
    was well within the technical ability of automated evaluation. When
    ambiguity or creativity are involved, machine ‘intelligence’ is quite

    If I were a good teacher, I would not want my evaluation to be
    the mean of the quality of all teachers at the school. (Imagine if
    students were evaluated in this manner!) If I were a teacher
    with integrity, I would want to be managed as appropriate for
    my ability–not just monetary compensation but other
    incentives to improve, stop teaching, or otherwise maximize
    the general good and assignment to match my ability.
    School grading does make sense–a great teacher cannot
    perform well with inadequate facilities, unmanaged
    discipline problems, etc.–; and the school management
    is responsible for managing the teachers. However, the
    school management needs a means of evaluating teachers
    and making those evaluations available to the consumers
    is good for market economics. (Of course, most
    organizations would be disinclined to provide information
    about the competence of their workers. It seems that
    many software vendors do not even allow distribution of
    information about the performance of their products!)

    With respect to vouchers supporting ignorance, it
    would not be a significant burden to have national
    standards for certification of passing various grade
    levels–i.e., the students would have to be exposed
    to orthodox ideas. I do seriously dislike supporting
    indoctrination into highly disagreeable belief systems
    (‘the space program is a hoax’ is unpleasant but
    ‘women are worth less than dogs’ is [understatement]
    just a bit worse).

    Does one want the poor indoctrinated by the State
    (which is also the sole source of authority for the
    use of force) to generate loyal subjects, by
    corporations to generate loyal customers, by
    non-profit organizations committed to
    particular truths to generate loyal disciples?

    I think broad participation is important to
    provide checks and balances. (It is interesting
    that the U.S. three branches of government
    seem to loosely follow the Judeo-Christian model
    of Prophet (judicial–truth speakers), Priest
    (legislative–‘the voice of the People is the voice
    of God’), and King (administrative).) Without
    choice, parental involvement will not be
    maximized. (Parents also need to believe that
    educational choices matter. If one believes
    that one’s poverty (or wealth) is perpetual and
    generational, the incentive to exert effort is
    significantly diminished.)

  11. Simetrical said,

    September 28, 2010 @ 5:46 pm

    I’ll vouch for the fact that at least the math subject GRE is very well-designed. It’s multiple-choice, but in most cases you really do have to solve the problem to figure out which answer is right, because they very cleverly select choices that all seem plausible and questions that don’t let you easily work backward from the answers.

    But test-taking skills are still critical — if nothing else, some people are more calm and focused when taking tests than others. I’m also very good at test-taking, but I have smart friends who aren’t. Still, this doesn’t invalidate the tests. Being good at test-taking will help you a lot, but you can’t get anywhere if you haven’t studied the material enough.

    Voucher systems will of course lead to a greater variety of beliefs, including beliefs that most people don’t like, or think are harmful. This is a good thing. Diversity of beliefs is part and parcel of free speech, and allowing the state to control what beliefs most people are exposed to in their childhood gives the majority too much control over minorities’ beliefs. To ensure that students are at least exposed to standard beliefs like evolution that they might be taught are false, of course, we have mandatory standardized tests.

    (Although I’ve never seen anything in a course that provides much convincing evidence for evolution, or other scientific facts that are controversial among the general public. Biology courses mostly expect students to accept evolution on authority, just like they expect them to accept the role of the endoplasmic reticulum on authority. This is a poor approach when some students come into the class believing that evolution is a fraud — these courses should really go into depth on some of the clear-cut evidence. Ditto for the Big Bang and so on.)

    Anyway, to all this one could reply that students should be exposed to a diversity of beliefs, and not go to a school that solely consists of people like them, so as to make them less dogmatic and reduce conflict within society. This is a plausible line of argument (although I don’t think I buy it), but again, it could be solved by narrowly-targeted regulation of some kind, far short of total government control of the school system.

    Paul Clayton: The Biblical Israelite model of governance isn’t at all like the U.S. model. There’s really no separation of powers at all in the Biblical model: the judge or king runs everything, and the priests get certain rights but have no real say. Prophets don’t fit in anywhere, since they can range from random people who have no practical authority (e.g., Jonah) to judges, kings, or priests (Joshua, Solomon, Aaron, etc.). Likewise, medieval Christian rule generally gave the secular authorities total power provided that they were suitably respectful of the clergy. There are certainly major Judeo-Christian influences on the Constitution, but separation of powers isn’t one of them.

  12. redbrain said,

    October 1, 2010 @ 10:55 am

    I haven’t finished reading it all yet, but I’ve read most of it (well written, btw!). I personally think education over here in the UK and Northern Ireland, where I am, is a bit of a mess to say the least. For example, over here we do what are called GCSEs, and you can leave high school with those or do what I did: do A-levels and then go to university. I was only ever good at math and never took any computer courses in school; I just thought I liked playing computer games until I went to university.

    But when you take these high school exams, be they GCSEs or A-levels, there are what are called different exam boards. Over here we sat the CCEA exams, which gave you your GCSE or your A-level in math or further math etc., and it is seen to be the hardest exam board. By contrast, you can do the AQA, which is seen to be a much easier exam but gives you the exact same qualification. We even got to compare the exams, and the AQA ones were much easier: they led you into the questions and you didn’t have to learn as many techniques, whereas the CCEA one gave you problems which you had to solve, which makes more sense in my book. I think we need a more standard exam, and I don’t necessarily agree with the amount of choice students get in high school these days. It gives the notion that a student can focus on a certain area, be it science or art etc., but really what’s important for high school students is math, science, and language, and maybe a choice to do one of the following: art, music, history, etc. And focus on teaching the core things very well instead of others.

    But then I guess that’s why I do a degree in Mathematics and Computer Science, lol. I try to do mostly math, since the computer science dept here isn’t that great and told me off for handing in a software project that worked on Linux and not Windows, since they didn’t know what a Makefile was :S.
