Scoring

The AWA section is scored separately from the Quantitative and Verbal sections. Because schools place more emphasis on the combined score from the Quantitative and Verbal sections (scored on a 200 to 800 point scale), most students focus on those sections of the test and spend less time preparing for the essay section.

Analytical Writing Assessment (AWA) Scoring

The overall AWA score will be one number ranging from zero (low) to six (high), inclusive, in half-point increments (for example, 4.5 is one possible score). In order to calculate that number, each essay is first given two separate ratings on the zero to six scale. One rating is given by a person and the other by a software program called GMAT Write™, an automated essay-scoring engine. In total, there will be four scores for the two essays. These four scores are averaged and rounded up to the nearest half-point increment, resulting in an overall essay score between zero and six. A score of zero will be given only if the essay is not completed or if the essay did not address the given topic.

If the initial two ratings for an essay (the one given by a person and the one assigned by GMAT Write) differ by more than one point, then a second person will be assigned to rate the essay and resolve the discrepancy.
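
The arithmetic described above can be sketched in a few lines. This is only an illustration of the averaging and rounding as worded in this section, not the test-makers' implementation: the function names are hypothetical, "rounded up" is taken literally, and the sketch does not model how a second human rating would enter the average, since the text does not say.

```python
import math

def awa_score(ratings):
    """Average four 0-6 ratings and round up to the next half-point step."""
    if len(ratings) != 4:
        raise ValueError("expected four ratings: two for each of the two essays")
    average = sum(ratings) / len(ratings)
    # Rounding up to a half-point step: double, take the ceiling, then halve.
    return math.ceil(average * 2) / 2

def needs_second_reader(human_rating, engine_rating):
    """True when the two initial ratings differ by more than one point."""
    return abs(human_rating - engine_rating) > 1

# Hypothetical ratings: (human, engine) for essay 1, then for essay 2.
print(awa_score([4, 5, 4, 4]))       # average 4.25 -> 4.5
print(needs_second_reader(3, 5))     # True: the ratings are two points apart
```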

The GMAT Scoring Algorithm for the Quantitative and Verbal Sections

Scores on the GMAT are not based on the percentage of questions answered correctly. Tests you took in school were generally based on percentage of questions correct: the more you got right, the higher the score you received. As a result, we have been trained to take our time and try to get everything right when we take a test. This general strategy does not work well on computer-adaptive tests such as the GMAT. On the GMAT, most people actually answer similar percentages of questions correctly, typically in the 50% to 70% range (even at high scoring levels).

If test-takers all get a similar percentage correct, how does the GMAT distinguish among different performance levels? “Regular” school tests gave everyone the same questions and performance was determined based upon percentage correct. On the GMAT, everybody answers different questions, some easier, some harder. You can think of the GMAT as a test that searches for each person’s “60% level,” or the difficulty range in which the person is able to answer approximately 60% of the questions correctly. (This is not exactly what happens, but it’s a good way to think of the difference between “regular” tests and computer-adaptive tests.) Your score will be determined by the difficulty of the questions that you answer correctly versus the difficulty of those that you answer incorrectly.

How the Algorithm Works: An Overview

The approach discussed above requires the test-writers to know something about the difficulty level of the various questions offered on the exam. Although the “difficulty level” or “difficulty bucket” of an individual question is often talked about, the questions actually are not ranked by a specific percentile or difficulty level. Instead, each question has what’s called an “Item Characteristic Curve” (ICC), a probability curve that describes how likely it is for a student of a certain ability level to get that particular question right.

The ICC shown above indicates that a student with a 500-level ability has a 30% chance of answering this question correctly. A student with a 700-level ability, by contrast, has a 90% chance of answering this question correctly. Every question has its own ICC, developed during the experimental phase (discussed later in this section). Every question also has its own inverse ICC, a curve that shows the probability of answering the question incorrectly; this curve is called an inverse ICC because it is simply the mirror image, or inverse, of the regular ICC.
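
As a rough illustration of the curve just described, the sketch below fits an S-shaped (logistic) curve through the two points quoted in the text: a 30% chance at the 500 level and a 90% chance at the 700 level. The logistic shape and the helper names (fit_logistic_icc, icc, inverse_icc) are assumptions made for illustration; the actual functional form of the test's curves is not given here.

```python
import math

def fit_logistic_icc(point_a, point_b):
    """Return (slope, midpoint) of a logistic curve through two (ability, probability) points."""
    (x1, p1), (x2, p2) = point_a, point_b
    logit1 = math.log(p1 / (1 - p1))
    logit2 = math.log(p2 / (1 - p2))
    slope = (logit2 - logit1) / (x2 - x1)
    midpoint = x1 - logit1 / slope   # ability at which the chance is exactly 50%
    return slope, midpoint

def icc(ability, slope, midpoint):
    """Assumed logistic ICC: probability of answering correctly."""
    return 1 / (1 + math.exp(-slope * (ability - midpoint)))

def inverse_icc(ability, slope, midpoint):
    """Probability of answering incorrectly: the complement of the ICC."""
    return 1 - icc(ability, slope, midpoint)

slope, midpoint = fit_logistic_icc((500, 0.30), (700, 0.90))
for ability in (400, 500, 600, 700, 800):
    right = icc(ability, slope, midpoint)
    wrong = inverse_icc(ability, slope, midpoint)
    print(f"{ability}: right {right:.2f}, wrong {wrong:.2f}")
```

At every ability level the two probabilities sum to one, which is all that "inverse" means here.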

Now, think back to your studies of probability. When you want to calculate the probability of multiple events occurring (say, flipping a coin twice and getting heads each time), you multiply the probabilities of each individual event. In this coin example, you would multiply 1/2 by 1/2 to calculate an overall probability of 1/4. The same thing happens on the GMAT, but the overall curves get multiplied, not just single numbers. If you get a question right, the scoring algorithm uses the regular ICC; if you get a question wrong, the scoring algorithm uses the inverse ICC. All of the curves (regular or inverse) for all of the questions you’ve answered are then multiplied to give a new “estimator” curve. That new estimator curve will look like a bell curve (pictured below), with a peak somewhere in between the two end-points; this peak represents the algorithm’s best estimate of the test-taker’s current performance up to that point on the test.
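
The multiplication of curves can be sketched as follows. Everything specific in this example is an assumption for illustration: the logistic curve shape, the slope value, the question difficulties, and the answer pattern are all hypothetical, and the real scoring engine is not public.

```python
import math

def icc(ability, difficulty, slope=0.015):
    """Assumed logistic ICC; `difficulty` is the ability with a 50% chance of success."""
    return 1 / (1 + math.exp(-slope * (ability - difficulty)))

def estimator_curve(responses, grid):
    """Multiply the regular ICC (right answers) or inverse ICC (wrong answers) at each ability."""
    curve = []
    for ability in grid:
        likelihood = 1.0
        for difficulty, correct in responses:
            p = icc(ability, difficulty)
            likelihood *= p if correct else (1 - p)
        curve.append(likelihood)
    return curve

# Hypothetical answer history: (question difficulty, answered correctly?)
responses = [(500, True), (600, True), (700, False), (650, False), (600, True)]
grid = list(range(200, 801, 10))
curve = estimator_curve(responses, grid)

# The peak of the product curve is the best estimate of the ability level so far.
peak = grid[curve.index(max(curve))]
print("estimated level:", peak)
```

Because the history mixes right and wrong answers, the product falls toward zero at both ends of the scale and peaks in between, which is the bell-like shape described above.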

Calculating the Scaled Scores

An individual, two-digit score, called a scaled score, will be calculated for the Quantitative and Verbal multiple-choice sections. While both sections will be scored on a scale of zero (low) to sixty (high), the two scoring scales are not the same. For example, a scaled score of 40 on the Quantitative section represents the 58th percentile, while a scaled score of 40 on the Verbal section represents the 89th percentile (all statistics as of November 2009). Essentially, the same scaled score, 40, represents a much higher performance on Verbal than on Quantitative.

The two individual sub-scores are then converted into one three-digit scaled score given on a scale of 200 (low) to 800 (high). This is the score people are talking about when they tell you what they got on the GMAT. The exact conversion mechanism, from two-digit sub-scores to three-digit scaled score, has not been made public by the test-makers, but the Verbal sub-score appears to be given somewhat more weight in the overall score than the Quantitative sub-score (this effect can range from minimal to mild, depending upon the exact mix of sub-scores).

Pacing and a Bit More About the GMAT Algorithm

Because of the way the scoring works on an adaptive test, there are some crucial recommendations for maximizing your score when taking the GMAT.

To begin with, you need to accept that you are going to get a lot of questions wrong. Not only do you not need to get everything right, you actively do not even want to try to get everything right. Such an attempt will likely negatively impact your score.

How is that possible? Let’s revisit the scoring algorithm for a moment.

Because of the way the algorithm works, certain events cause especially steep drops in scoring.

First, getting an easier question wrong hurts your score more than getting a harder question wrong. In fact, the easier the question, relative to your overall score at that point, the more damage to your score if you get the question wrong.
(Note: it is still very possible to get the score you want even if you make mistakes on a few of the easier questions.)
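
A toy model shows the direction of this effect. The sketch below is not the real algorithm: it assumes logistic curves with a made-up slope, a hypothetical answer history, and an assumed bell-shaped starting belief centered on 500 (the prior_mean and prior_sd parameters), and it ignores the fact that the real test adapts its questions. It simply compares how far the estimate falls after missing an easy question versus a hard one.

```python
import math

def icc(ability, difficulty, slope=0.015):
    """Assumed logistic ICC; `difficulty` is the ability with a 50% chance of success."""
    return 1 / (1 + math.exp(-slope * (ability - difficulty)))

def estimate(responses, prior_mean=500, prior_sd=100):
    """Return the ability (200-800) that maximizes starting belief x product of curves."""
    best_ability, best_log_weight = None, -math.inf
    for ability in range(200, 801):
        # Assumed starting belief: a bell curve around prior_mean (hypothetical).
        log_weight = -((ability - prior_mean) ** 2) / (2 * prior_sd ** 2)
        for difficulty, correct in responses:
            p = icc(ability, difficulty)
            log_weight += math.log(p if correct else 1 - p)
        if log_weight > best_log_weight:
            best_ability, best_log_weight = ability, log_weight
    return best_ability

# Hypothetical history: (question difficulty, answered correctly?)
history = [(450, True), (550, True), (650, False), (600, False)]
baseline = estimate(history)
drop_easy = baseline - estimate(history + [(400, False)])  # miss an easy question
drop_hard = baseline - estimate(history + [(700, False)])  # miss a hard question
print("drop after easy miss:", drop_easy)
print("drop after hard miss:", drop_hard)
```

In this toy setup the easy miss costs noticeably more than the hard miss, because a wrong answer on a question the test-taker "should" get right is strong evidence of a lower level.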

Second, getting three or four questions wrong in a row hurts your score more, on a per-question basis, than getting the same number of questions wrong but having them interspersed with correct answers. In other words, the effective per-question penalty actually increases as you have more questions wrong in a row.

The second of the two penalties just discussed is the more important of the two: it is critical to avoid putting yourself in a position to get more than four questions wrong in a row. The most common way in which people miss that many questions in a row is by mismanaging their time. The most widespread scenario is simply running low on time — using too much time earlier in the section and then having to rush towards the end, increasing the chances of making multiple mistakes. A less common scenario is rushing toward the beginning of the section due to general time pressure, thus making multiple mistakes in a row early on.

Note: the biggest penalty of all is reserved for running out of time before all of the questions have been answered. If you are running out of time, guess an answer for the remaining questions; getting a question wrong will hurt your score less than leaving a question blank.

So, given the significant differences in the way that this test is scored (compared to regular paper tests), test-takers need to approach computer-adaptive tests with a different mindset. It’s critical to maintain steady timing, giving yourself a fair chance at every single question, including the last one. This means you will have to “let go” of some questions, guess, and move on; most people have to do this on 4 to 7 questions per section.

If you have trouble adopting this mindset, pretend you’re playing tennis — yes, tennis! In tennis, you don’t need to win every point in order to win the match. Ultimately, the point that matters most is the very last point; that’s the one you absolutely have to win in order to win the match. Your overall goal is to put yourself into a position to win that last point.

The GMAT is similar to tennis in this regard: you need to put yourself into the best position possible to “win that last point,” or have a chance to answer the last question correctly (though, on the GMAT, it’s still okay to get that last question wrong). Getting any one question right along the way is not worth it if you have to spend so much extra time that you do not even give yourself a chance to “win that last point” — that is, if you cause yourself to run out of time before you’re done with the section.

This is exactly why it is so critical to maintain steady timing throughout the test, giving you a fair chance at every single question, including the last one. As you study and take practice tests, keep reminding yourself of the tennis analogy to help maintain that steady timing. If necessary, let a problem go; guess and move on if it is taking too much time. Remember, almost everyone has to do this at least a few times during the test.

Debunking a Myth: The Early Questions are NOT Worth More

Many test-takers are under the impression that the earlier questions on the exam are worth more and thus believe more time should be spent on those questions. That line of thinking is actually based on a complete myth.

In the late 1990s, shortly after the GMAT switched to the computer-adaptive format, two researchers at the Educational Testing Service (Manfred Steffen and Walter D. Way) conducted a study on adaptive testing. (At the time, the Educational Testing Service was responsible for making the GMAT.) In the study, they examined many different simulated scenarios, starting with what would happen if someone got the first question right vs. wrong, or the first two questions right vs. wrong (the remaining questions were answered identically). The results showed that answering the first questions correctly led to a score increase in some circumstances, but the simulation didn’t completely mirror reality: it assumed that the test-taker did not take any extra time to answer those early questions correctly.

The researchers later adjusted the simulation to account for the reality of the situation: spending more time on earlier questions may improve performance earlier in the section, but it would also decrease performance toward the end of the exam due to lack of time later in that section. The researchers first assumed that the test-taker would answer a certain number of questions in a row correctly at the beginning, earning a certain score premium at that point on the test. Next, the researchers calculated how many questions the test-taker could answer incorrectly in a row at the end without offsetting that score premium earned at the beginning. In other words, if the test-taker had more questions wrong in a row at the end than “allowed,” then the score premium earned earlier would be completely erased; if enough problems were wrong, the test-taker could see a significant drop in the score.

Note: the researchers assumed that spending extra time at the beginning automatically meant that those questions would be answered correctly. Obviously, when you spend extra time on the real test, there is no guarantee that you will answer that question correctly!

“True” level | # questions correct at beginning | # allowed wrong at end before score is damaged
370 | 3 | 6
500 | 3 | 3
780 | 5 | 1

All data from “Test-Taking Strategies in Computerized Adaptive Testing,” Steffen and Way, Educational Testing Service, presented at the National Council on Measurement in Education, Montreal, April 1999.

Let’s look at the data. If a 370-level scorer could get the first 3 questions right, the test-taker could get as many as 6 questions wrong in a row at the end before wiping out the score premium earned at the beginning. That sounds pretty good, except for one thing: it’s very unlikely that a 370-level scorer will answer the first three questions in a row correctly, no matter how much extra time is spent.

The performance for a mid-level scorer at the 500-level ends up evening out. The extra time spent to get 3 questions right at the beginning would probably result in at least 2, if not 3, wrong answers at the end, due to lack of time. In addition, it would be challenging for a 500-level tester to answer the first 3 questions in a row correctly, regardless of time spent.

Now let’s look at a high-level scorer at the 780-level. If the highest-level test-taker answers the first 5 questions in a row correctly, he or she cannot get more than one question wrong at the end; if the test-taker does get more than one wrong at the end, then the score premium earned from the first 5 problems will completely disappear! This means that the highest-level test-taker has to answer all of those early questions correctly while spending almost no extra time.

So what’s the big take-away? If you want to spend an extra 15 to 20 seconds on a few of the early questions, feel free to do so – but choose to do so specifically because the problem seems to warrant a little bit of extra time, not just because the problem is an early problem. Absolutely do not, however, spend 60+ extra seconds on those early questions (or any questions anywhere in the section); the data clearly shows that it’s not worth it in the end.

Why Educated Guessing is Important

Given what was discussed earlier about scoring and timing, you should anticipate guessing on some questions. There are two kinds of guessing: random and educated.

Random guessing is exactly what it sounds like: you have no idea what to do on a problem (or maybe you don’t even have time to read the problem) and you guess randomly from among the five answer choices, giving you a 20% chance of answering the question correctly. Ideally, you would like to avoid having to make any random guesses at all during the GMAT.

You cannot, however, entirely avoid making guesses on the test, so when you do have to guess, you want to make educated guesses. An educated guess is simply this: you identify and cross off some wrong answers before guessing, improving the odds that you will guess correctly. (On occasion, you may be able to use educated guessing to identify and eliminate all four wrong answers, so you can answer the question correctly even if you don’t know how to figure out the right answer in the “official” way!) There are multiple ways to make an educated guess, and different methods are appropriate for different kinds of questions. One of your tasks, when studying, is to learn how to make educated guesses, depending upon the type of problem or the content being tested.
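
The payoff from crossing off wrong answers is simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes that every eliminated choice really is wrong and that the final guess is made evenly among the remaining choices.

```python
from fractions import Fraction

def guess_probability(eliminated, choices=5):
    """Chance of guessing correctly after eliminating `eliminated` wrong answers."""
    if not 0 <= eliminated < choices:
        raise ValueError("must leave at least one answer choice")
    return Fraction(1, choices - eliminated)

for eliminated in range(5):
    p = guess_probability(eliminated)
    print(f"{eliminated} eliminated: {float(p):.0%}")
# 0 eliminated: 20%
# 1 eliminated: 25%
# 2 eliminated: 33%
# 3 eliminated: 50%
# 4 eliminated: 100%
```

Eliminating even two choices raises the odds from one in five to one in three.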

Here are some examples of educated guessing methods on the Quantitative section (there are many more than appear in this table):

Note: PS = Problem Solving; DS = Data Sufficiency

Problem Type | Technique | When we can use the technique
PS | Estimation | when the answers are in the form of real numbers
PS | Test real numbers | when the answers are in the form of variable expressions (e.g., 3x+5)
PS or DS | Test real numbers | when the problem tests pure theory; the solution is not tied to a specific real number
PS | Partial answer | when a number appears part-way through the calculations and is also in the answer choices, that number is almost never the right answer
PS | Wrong calculation | when an answer choice is the result of calculations that you know are the wrong way to solve the problem; for instance, if an answer is the result of multiplying two numbers but you know that multiplication is the wrong way to solve
Rate or Work | Odd one out | when the answers are in "pairs," eliminate the "odd one out." For instance, Johnny and Susie together walk a total of 20 miles. How far does Johnny walk? 6, 8, 9, 11, 12. The answers are in pairs of possibilities for Johnny and Susie: 8+12 = 20 and 9+11 = 20. 6 is the odd one out.
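
The "odd one out" technique from the last row of the table can be checked mechanically. The sketch below uses the Johnny and Susie example from the table; the function name is hypothetical, and the approach is only a heuristic for narrowing down a guess, not a way of solving the problem.

```python
def odd_ones_out(choices, total):
    """Return the answer choices whose partner (total - choice) is not also a choice."""
    available = set(choices)
    return [choice for choice in choices if (total - choice) not in available]

# Johnny and Susie walk 20 miles in total; the choices for Johnny's distance:
print(odd_ones_out([6, 8, 9, 11, 12], 20))   # [6]
```

The remaining choices (8, 9, 11, 12) form the pairs 8+12 and 9+11, so an educated guess would be made among those four.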

On the Verbal section, it is rare not to be able to eliminate at least one answer choice (in particular, on Sentence Correction) and it is often the case that you’ll find the right answer on a verbal question by first eliminating the four wrong answers. On Sentence Correction, even if you don’t know all of the grammar rules being tested, you will likely know or be able to make a good guess about at least one of the rules.

To get better at eliminating wrong answers on Verbal, you need to study not just why the wrong answers are wrong, but why the tempting wrong answers are so tempting. You also need to study both why the right answer is right and why someone might mistakenly think the right answer is wrong. On Reading Comprehension questions, for example, wrong answers are often “out of scope” — things that go beyond the scope of the information given in the passage. Even if you’re not entirely sure what the question is asking, you may be able to eliminate a couple of choices because they talk about things that were not actually discussed in the passage.

There are innumerable ways in which you can make these kinds of educated guesses on both the Quantitative and Verbal sections; it’s necessary to analyze problems (ideally from one of the official sources published by GMAC, the makers of the test) in order to learn how to eliminate wrong answers effectively. Remember to include time for this analysis in your study.

Experimental Questions

The GMAT includes what are called experimental, or nonoperational, questions. These questions do not count at all towards your score; instead, the test-makers are testing these questions on you in order to determine the Item Characteristic Curves (among other things) so that these questions can be used on future tests. (See the discussion on algorithms, earlier in this document, for more on Item Characteristic Curves.) Up to ten of the questions in each multiple-choice section (Quantitative and Verbal) may be experimental. (That means roughly a quarter of the questions that you answer in each section may not count toward your score.)

There are two big drawbacks to this. First, you have no idea which questions are experimental. You have to assume that any question you see counts; even very experienced test-takers cannot tell which ones don’t count. Second, the experimental questions don’t have any assigned difficulty level, so the algorithm doesn’t know that it’s about to give a very high-level test-taker a 10th percentile question, or vice versa. If you’re suddenly given a question that seems much easier than the previous questions, that doesn’t mean you’ve bombed the test; the question may be experimental. By the same token, if you suddenly see an impossible question, don’t despair or celebrate; again, there is a good chance that the question is experimental. Try your best within the expected timeframe for a question of that type and then move on.