Mathematics

Quartiles and Interquartile Range

Quartiles are values that divide a data set into four equal parts. The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1), representing the middle 50% of the data. These measures are used to understand the spread and distribution of a data set, particularly in statistics and data analysis.

Written by Perlego with AI-assistance

12 Key excerpts on "Quartiles and Interquartile Range"

  • Practitioner's Guide to Statistics and Lean Six Sigma for Process Improvements
    • Mikel J. Harry, Prem S. Mann, Ofelia C. De Hodgins, Richard L. Hulbert, Christopher J. Lacke (Authors)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    quartiles. Looking at the root of this word, one might suspect that the quartiles divide a dataset into four quarters, just as the median divides a dataset into two halves. There are three quartiles that divide an arranged dataset into four equal parts. In fact, the median is one of the three quartiles, because it identifies the upper boundary of the second quarter of the data.
    The three quartiles are called the first (also called the lower) quartile, the second quartile (which is also the median), and the third (also called the upper) quartile. The first quartile divides the arranged data in such a way that 25% of the values are smaller than this quartile and 75% are larger than it. Similarly, 50% of the values are smaller than the second quartile and 50% are larger; and 75% of the values are smaller than the third quartile and 25% are larger. To calculate the three quartiles, first we arrange the given data in increasing order. The value of the middle term in this arranged dataset is the second quartile, which is the same as the median. The value of the middle term in the observations smaller than the median gives the first quartile. The value of the middle term in the observations larger than the median gives the third quartile. The three quartiles are denoted by Q1, Q2, and Q3, respectively.
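    The procedure just described (median of the whole dataset, then the median of each half) can be sketched in a few lines of Python. This is only an illustration: the function name `quartiles_by_halves` and the sample values are assumptions made here, not taken from the book, and the sketch expects at least two observations.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    For an odd number of observations the median itself is left out
    of both halves, matching "observations smaller/larger than the median".
    """
    xs = sorted(data)            # arrange the data in increasing order
    n = len(xs)
    half = n // 2                # size of each half
    q2 = median(xs)              # second quartile = median
    q1 = median(xs[:half])       # middle term of the lower half
    q3 = median(xs[n - half:])   # middle term of the upper half
    return q1, q2, q3

# Hypothetical sample, for illustration only.
sample = [7, 1, 13, 5, 9, 3, 11]
q1, q2, q3 = quartiles_by_halves(sample)
iqr = q3 - q1                    # interquartile range
print(q1, q2, q3, iqr)           # 3 7 11 8
```

    Other textbooks and software packages use slightly different quartile conventions, so results on small datasets may differ from this sketch.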
    The interquartile range (IQR) is obtained by taking the difference between the third quartile and the first quartile:
    IQR = Q3 − Q1
    Example 8.10 shows how to calculate the three quartiles and the IQR.
    Example 8.10 The following dataset, reproduced from Example 8.3, gives rents paid per month by 30 tenants selected from a small city:
    Find the three quartiles and the interquartile range for these data.
    Solution To find the three quartiles for these data, first we arrange these 30 values in increasing order as follows.
    The value of the middle term in these 30 observations is given by the average of the 15th and the 16th values as was discussed in Example 8.7. Thus
  • Understanding Quantitative Data in Educational Research
    Because the interquartile range is based only on the middle half of the distribution, or middle 50% of the values, it reflects only the dispersion in this defined section of the distribution; the outliers do not impact it. We should always also report the first and third quartiles because a quarter of scores were smaller than the lower quartile, and a quarter higher than the upper quartile. In Example 5.1, the interquartile range is as follows:
    IQR = Q3 − Q1 = 75 − 47 = 28
    which means 50% of the values lie within an interval of length 28, whereas the range is a much higher value, 69. The interquartile range is best used with measurements such as the median and total range to give a complete picture of the distribution of data around the median. For example, the higher the interquartile range, the more spread out the data values are around the median.
    Displaying the quartiles and the interquartile range in R using the boxplot
    Create the boxplot for the data set from Example 5.1:
    boxplot(MathTest$Scores, horizontal=TRUE, xlab="Scores")
    Add the quartiles, maximum and minimum values to the graph:
    text(x = boxplot.stats(MathTest$Scores)$stats, labels = boxplot.stats(MathTest$Scores)$stats, y = 1.25)
    Add appropriate text to the graph above each quartile and the maximum and minimum values:
    text(x=boxplot.stats(MathTest$Scores)$stats, labels = c("Min", "Q1", "Q2", "Q3", "Max"), y = 1.3)
    Figure 5.3 Boxplot for Example 5.1
    Figure 5.3 shows the values of each quartile on the boxplot. The box represents the distance between the first quartile (Q1 = 47) and the third quartile (Q3 = 75), which is the interquartile range (IQR = Q3 − Q1 = 75 − 47 = 28). The second quartile (Q2 = 57) is the median. The whiskers extend to the smallest and largest data values that are not outliers, in this example to the minimum and maximum values, 28 and 97, respectively.

    Interpretation of boxplots and quartiles

    In Example 5.1, knowing that the median (Q2) is 57, shown in Figure 5.3 as the thickest line inside the box, gives no indication of how the scores are spread. In addition, if we calculate the first quartile (Q1), which is 47, and the third quartile (Q3
  • Research with People
    Theory, Plans and Practicals

    There is no real difference between these and you should feel free to use whichever form you prefer the look of. The second version is perhaps slightly easier to read, as long as it is clear you have used a dash rather than a minus sign!
    The interquartile range and interpercentile range
    The interquartile range is a version of the range which should not be affected by extreme scores, and it is really quite a simple idea. If you line up all your scores from the lowest to the highest, the interquartile range is the middle 50% of the scores. Look at this set of 12 scores, which we have arranged from the lowest to the highest:
    The mean of these 12 scores is 18.25 and the interquartile range – calculated from the central six scores – is 7 to 15. So, if we were describing this set of numbers to somebody, we might say something like ‘Our data had a mean score of 18.25 with an interquartile range of 7–15.’
    You can see how the interquartile range deals with extreme scores – outliers – by the magnificently simple method of totally ignoring them: if only we could deal with all life’s difficulties this way!
    Remember the big set of ages in the previous section, which had a mean of 20 with a range of 40, even though almost everybody in the group was aged 20? If we were to use the interquartile range, we would describe these data as having a mean of 20 with an interquartile range of 0 (20 minus 20). We’re sure you’ll agree this value of zero, indicating no variation, can give a more accurate picture of what the group looks like.
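    A quick way to see this effect is to compute both measures on a small made-up group. The ages below are hypothetical (they are not the data set from the book), and the quartiles come from Python's standard `statistics.quantiles`, which uses one of several common quartile conventions.

```python
from statistics import mean, quantiles

# Hypothetical group: almost everyone is 20, plus one younger
# and one much older person.
ages = [10] + [20] * 18 + [50]

total_range = max(ages) - min(ages)   # driven entirely by the two extremes
q1, q2, q3 = quantiles(ages, n=4)     # three quartile cut points
iqr = q3 - q1                         # spread of the middle 50%

print(mean(ages), total_range, iqr)   # 21 40 0.0
```

    The range of 40 suggests a very mixed group, while the interquartile range of 0 reflects that the middle half of the group are all the same age.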
    The main difficulty with the interquartile range is that somebody without any statistics training will run away if you use the word ‘interquartile’ at them.
    The interquartile range is making use of percentiles.
  • Making Sense of Statistics
    A Conceptual Overview

    • Deborah M. Oh, Fred Pyrczak (Authors)
    • 2023 (Publication Date)
    • Routledge (Publisher)
    Figure 13.1). Notice that the scores are in order from low to high. The arrow on the left separates the lowest 25% from the middle 50%, and the arrow on the right separates the highest 25% from the middle 50%. It turns out that the range for the middle 50% is 3 points. For those interested in the computation of the IQR, notice that the right arrow is at 5.5 and the left arrow is at 2.5. By subtracting (5.5 – 2.5 = 3.0), the approximate IQR is obtained.
    Figure 13.1 Illustration of the meaning of interquartile range (IQR)
    When 3.0 is reported as the IQR, consumers of research will know that the range of the middle 50% of participants is 3 points, indicating little variability for the majority of the participants. Note that the undue influence of the outlier score of 20 has been overcome by using the interquartile range. In some research, the semi-interquartile range is used; it is half of the interquartile range, that is, the average width of the two middle quarters of the scores.
    Semi-interquartile range is half of the interquartile range, that is, the average width of the two middle quarters.
    Example 3 Scores: 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 20
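    The arrow positions and the resulting IQR can be checked with a short Python sketch using the twelve scores listed above. It assumes, as in this example, that the number of scores divides evenly into four quarters, and places each arrow halfway between the neighbouring scores; the variable names are illustrative only.

```python
scores = [2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 20]

xs = sorted(scores)
quarter = len(xs) // 4                 # 3 scores in each quarter

# Arrow between the lowest 25% and the middle 50%.
left_arrow = (xs[quarter - 1] + xs[quarter]) / 2
# Arrow between the middle 50% and the highest 25%.
right_arrow = (xs[-quarter - 1] + xs[-quarter]) / 2

iqr = right_arrow - left_arrow         # range of the middle 50%
semi_iqr = iqr / 2                     # semi-interquartile range
total_range = xs[-1] - xs[0]           # inflated by the outlier 20

print(left_arrow, right_arrow, iqr, semi_iqr, total_range)
# 2.5 5.5 3.0 1.5 18
```

    The total range of 18 is dominated by the single score of 20, while the IQR of 3.0 is unaffected by it.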
    The interquartile range may be thought of as the “first cousin” of the median. Recall that the median is reported for skewed data, where the skew may be caused by extreme outliers. (To review the median, see Chapter 12.) Thus, when the median is reported as the average for a set of scores, it is customary to also report the interquartile range as the measure of variability.
    When the median is reported as the average, the IQR is usually reported as the measure of variability.
    As a general rule, it is customary to report the value of an average (such as the value of the median) first, followed by the value of a measure of variability (such as the interquartile range).
    A summary of the measures of central tendency discussed in Chapter 12 and the measures of variability (range and interquartile range) is shown in Table 13.1. Standard deviation, which is the most widely used measure of variability, will be discussed in Chapter 14
  • Statistical Inference
    A Short Course

    • Michael J. Panik (Author)
    • 2012 (Publication Date)
    • Wiley (Publisher)
    3 . Remember that quartiles are “positional values.” Hence the following:
    (3.12a)
    (3.12b)
    (3.12c)
    (provided, of course, that our data have been arranged in an increasing sequence). Given Equation (3.12), we can easily calculate the interquartile range (IQR) as IQR = Q3 − Q1; it is the range between the first and third quartiles and serves to locate the middle 50% of the observations on a variable X. Then the quartile deviation is readily formed as
    (3.13)
    and can be used as a measure of dispersion in the middle half of a distribution—a device that is particularly useful when outliers are present.
    There are nine computed deciles that divide all the observations on X into 10 equal parts: D1, D2, ..., D9. Thus 10% of all observations on X are below D1, 20% of all data points for X lie below D2, and so on. Here too deciles are “positional values” so that:
    (3.14)
    For instance, once all the observations on X are arranged in an increasing sequence, D7 occupies the position (7/10)(n + 1).
    Finally, 99 computed percentiles divide the observations on X into 100 equal parts: P1, P2, ..., P99. Here 1% of all observations on X lie below P1, 2% of the X values lie below P2, and so on. Looking to the positions of these percentiles, we find that:
    (3.15)
    with, say, P40 found at position (40/100)(n + 1).
    It is important to note that we can actually move in the reverse direction and find the percentile that corresponds to a particular value of X. That is, to determine the percentile of the score X′, find
    (3.16)
    provided, of course, that the observations on X are in an increasing sequence. The value obtained in Equation (3.16) is rounded to the nearest integer.
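    To make the idea of “positional values” concrete, here is a minimal Python sketch. It rests on two assumptions that are not confirmed by the excerpt, because the numbered equations did not survive extraction: that the k-th of m quantiles sits at position k(n + 1)/m in the sorted data, and that a fractional position is resolved by linear interpolation between the two neighbouring observations. The function name `positional_quantile` is made up for this illustration, and the data are the 14 observations of Example 3.11 below.

```python
import math

def positional_quantile(data, k, m):
    """k-th of the m-quantiles, located at position k(n + 1)/m (1-based).

    m = 4 gives quartiles, m = 10 deciles, m = 100 percentiles.
    Fractional positions are linearly interpolated (an assumption).
    """
    xs = sorted(data)                 # increasing sequence
    n = len(xs)
    pos = k * (n + 1) / m             # position in the ordered data
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    lo = math.floor(pos)              # observation just below the position
    frac = pos - lo                   # how far toward the next observation
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

x = [3, 8, 8, 6, 7, 9, 5, 10, 11, 20, 19, 15, 11, 16]   # n = 14

print(positional_quantile(x, 1, 4))     # Q1,  position 3.75 -> 6.75
print(positional_quantile(x, 2, 10))    # D2,  position 3.0  -> 6.0
print(positional_quantile(x, 30, 100))  # P30, position 4.5  -> 7.5
```

    If the book resolves fractional positions differently (for example by rounding or averaging), the values at non-integer positions would change accordingly.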
    Example 3.11
    Given the following set of observations on X: 3, 8, 8, 6, 7, 9, 5, 10, 11, 20, 19, 15, 11, 16, find Q1, D2 and P30. Upon arranging these n = 14 X i
  • Research Methods in Politics
    positively skewed. In many universities, the final degree classification uses the median exam mark rather than average mark. In this way, exceptionally good or bad exam marks are discounted.
    Another statistical measure of the data is the range – the difference between the highest and lowest term. However, as noted earlier, the range can be distorted by exceptionally high or low outliers, e.g. the very mature student. In the revised class, the range would be 67 (i.e. 85 − 18). So statisticians developed the interquartile range. This is the difference between the first and third quartiles when the terms of the series are placed in ranked order. So in a series of 99 terms, the interquartile range is the difference between the 25th and 75th term. The first and third quartiles are identified in a similar way to the median by calculating their positions in the ranked series, (n + 1)/4 and 3(n + 1)/4, where n is the number of terms.
    For example, in our initial class of 22 students ranked by age, the interquartile range is the difference between the ages of the 5.75th and 17.25th students, i.e. 20 − 19 = 1.
    The greatest use of median measurements probably lies in representing unequal distributions especially in terms of resources. Income is a prime example where a small number of people may have vast wealth and, at the other end of the scale, a large number virtually nothing. In these cases, the inequality is shown by calculating and comparing the income of the 10th and 90th ‘percentiles’. So, in a fair society where incomes are equal, then the 10th and 90th percentiles will be the same whereas, in less equal countries, the ratio of 10th to 90th percentiles may be as high as 100. A more sophisticated descriptive statistic for measuring unequal distribution of income and wealth is provided by the Gini coefficient. This is the ratio between the areas above and below the Lorenz curve
  • Empirical Political Analysis
    International Edition

    • Richard C. Rich, Craig Leonard Brians, Jarol B. Manheim, Lars Willnat (Authors)
    • 2018 (Publication Date)
    • Routledge (Publisher)
    We then subtract the value associated with the twenty-first case from that associated with the eighty-first (q = q4 − q1 = 4 − 1 = 3) to obtain the quintile range. In sample 3, the equivalent computation yields a quintile range of 1 (q = 3 − 2 = 1), suggesting by comparison that this distribution is better typified by its median of 2.5 than is sample 2 by its median of 3. An examination of the two frequency distributions will confirm the validity of this conclusion.
    One difficulty in interpreting quantile ranges is that they are extremely sensitive to variation in the number of categories on a given variable. The more categories there are, the greater the range is likely to be. For this reason, quantile ranges can prove difficult to interpret for comparison between variables that differ significantly in their number of categories. For similarly coded variables, for longitudinal or cross-sectional comparisons of the values of any single variable, or for some absolute indication of variability around the median, however, quantile ranges are generally quite adequate.
    Measures for Interval/Ratio Variables
    Interval data provide us with the most complete information of all, including categorization, rank, and distance. Interval values can be subjected to any arithmetic manipulation. Consequently, our measures of central tendency and dispersion for interval data can and should take this added information and capability into account.
    The Mean
    The measure of central tendency for interval data is the mean, a measure that locates the central point of a distribution in terms of both the number of cases on either side of that point and their distance from it. The mean of a distribution is the statistic many people commonly associate with the term average. You can visualize the nature of the mean by using Figure 15.2
  • Probability and Statistics for Computer Scientists
    $\sigma/\sqrt{n}$, and it can be estimated by $s(\bar{X}) = s/\sqrt{n}$.
    8.2.6 Interquartile range
    Sample mean, variance, and standard deviation are sensitive to outliers. If an extreme observation (an outlier) erroneously appears in our data set, it can rather significantly affect the values of $\bar{X}$ and $s^2$.
    In practice, outliers may be a real problem that is hard to avoid. To detect and identify outliers, we need measures of variability that are not very sensitive to them. One such measure is an interquartile range.
    Definition 8.10
    An interquartile range is defined as the difference between the first and the third quartiles, $IQR = Q_3 - Q_1$.
    It measures variability of data. Not much affected by outliers, it is often used to detect them. IQR is estimated by the sample interquartile range $\widehat{IQR} = \hat{Q}_3 - \hat{Q}_1$.
    Detection of outliers
    A “rule of thumb” for identifying outliers is the rule of 1.5(IQR). Measure $1.5(\hat{Q}_3 - \hat{Q}_1)$ down from the first quartile and up from the third quartile. All the data points observed outside of this interval are assumed suspiciously far. They are the first candidates to be handled as outliers.
    Remark: The rule of 1.5(IQR) originally comes from the assumption that the data are nearly normally distributed. If this is a valid assumption, then 99.3% of the population should appear within 1.5 interquartile ranges from quartiles (Exercise 8.4). It is so unlikely to see a value of X outside of this range that such an observation may be treated as an outlier.
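    The 99.3% figure in the remark can be checked numerically. The sketch below is not the book's solution to Exercise 8.4; it simply uses the standard normal distribution from Python's `statistics` module, and the variable names are chosen here for illustration.

```python
from statistics import NormalDist

z = NormalDist()                 # standard normal: mean 0, sd 1

q1 = z.inv_cdf(0.25)             # first quartile, about -0.6745
q3 = z.inv_cdf(0.75)             # third quartile, about +0.6745
iqr = q3 - q1                    # about 1.349 standard deviations

lower = q1 - 1.5 * iqr           # about -2.698
upper = q3 + 1.5 * iqr           # about +2.698

# Probability that a normal observation falls inside the fences.
coverage = z.cdf(upper) - z.cdf(lower)
print(f"{coverage:.1%}")         # 99.3%
```

    Because the calculation is done in units of standard deviations, the same proportion holds for any normal distribution, whatever its mean and variance.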
    Example 8.18 (ANY OUTLYING CPU TIMES?). Can we suspect that sample (8.1) (data set CPU) has outliers? Compute
    $\widehat{IQR} = \hat{Q}_3 - \hat{Q}_1 = 59 - 34 = 25$
    and measure 1.5 interquartile ranges from each quartile:
    $\hat{Q}_1 - 1.5(\widehat{IQR}) = 34 - 37.5 = -3.5$;
    $\hat{Q}_3 + 1.5(\widehat{IQR}) = 59 + 37.5 = 96.5$.
    In our data, one task took 139 seconds, which is well outside of the interval [−3.5, 96.5]. This may be an outlier.
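    The arithmetic of this example is easy to reproduce. The following Python sketch takes the two sample quartiles quoted above (34 and 59) as given, since the full CPU data set is not part of this excerpt; the helper name `iqr_fences` is an assumption made for illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Sample quartiles quoted in Example 8.18.
low, high = iqr_fences(34, 59)
print(low, high)                       # -3.5 96.5

# The 139-second task lies outside the interval, so it is flagged.
suspect = not (low <= 139 <= high)
print(suspect)                         # True
```

    A flagged value is only a candidate outlier; the rule says where to look, not what to delete.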
  • Teacher's Skills Tests For Dummies
    • Colin Beveridge, Andrew Green (Authors)
    • 2014 (Publication Date)
    • For Dummies (Publisher)
    Quartiles are variations on the median (which we define earlier in ‘Meeting the three averages’): in a sense, the upper quartile is ‘a typical high value’ and the median of the upper half, and the lower quartile is ‘a typical low value’ and the median of the lower half. They’re not particularly interesting in their own right (as concerns the on-screen tests), but you use them to find the interquartile range.
    Here’s how to find a quartile:
    1. Put the list in order and count how many items it contains.
    2. Add one to the number: for the lower quartile, find a quarter of this number; for the upper quartile, find three-quarters of it (don’t worry if you don’t get a whole number).
      • If you have a whole number, you want that number item in the list. For instance, with seven items the lower quartile is the second item and the upper quartile is the sixth.
      • If you don’t have a whole number, you want to find the mean of the numbers on either side of it. For instance, with a list of 50, you get 12.75 as the number for the lower quartile, and you’d find the mean of the 12th and 13th numbers. The upper quartile number is 38.25, and so you’d find the mean of the 38th and 39th numbers.
    When you’ve found the quartiles, finding the interquartile range is just as simple as finding the range:
    1. Find the upper quartile.
    2. Find the lower quartile.
    3. Find the difference between them and that’s the interquartile range.
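    The recipe above translates almost line for line into Python. This is a sketch of that recipe only; other books and software use other quartile conventions, and the function name `quartile_by_position` is invented for this example. It needs at least three items so that both positions fall inside the list.

```python
def quartile_by_position(values, which):
    """Lower (which=1) or upper (which=3) quartile by the 'add one' recipe."""
    xs = sorted(values)                    # step 1: put the list in order
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three items")
    pos = which * (n + 1) / 4              # step 2: quarter or three-quarters of n + 1
    if pos == int(pos):
        return xs[int(pos) - 1]            # whole number: take that item
    below = int(pos)                       # otherwise: mean of the items either side
    return (xs[below - 1] + xs[below]) / 2

seven = [1, 2, 3, 4, 5, 6, 7]
fifty = list(range(1, 51))

print(quartile_by_position(seven, 1), quartile_by_position(seven, 3))   # 2 6
print(quartile_by_position(fifty, 1), quartile_by_position(fifty, 3))   # 12.5 38.5

# Interquartile range: upper quartile minus lower quartile.
iqr_fifty = quartile_by_position(fifty, 3) - quartile_by_position(fifty, 1)
print(iqr_fifty)                                                        # 26.0
```

    With the list of 50, the positions are 12.75 and 38.25, so each quartile is the mean of the two items either side, exactly as in the bullet points.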

    Tracking Trends

    This section isn’t about catwalk fashions or the latest Internet hot topics. Trend in this context is any kind of consistent pattern.
    For example, what’s the next item in this sequence: 2, 4, 6, 8, … ? (You got 10, right? Good. We’d also accept ‘who do we appreciate’.) That sequence is an example of a trend and that’s really all you need to do: spot the pattern and carry it on.
    You may be asked in the test to decide whether a pattern has a consistent trend (something like ‘true or false: school attendance improved each year?’). That can be a bit of a booby-trap question, because even if attendance dropped for just one year (or even stayed the same), the answer is false. It’s only true if every year saw an increase.

    Carrying on extrapolating

    Extrapolating
  • Interpreting Statistics for Beginners
    A Guide for Behavioural and Social Scientists

    • Vladimir Hedrih, Andjelka Hedrih (Authors)
    • 2022 (Publication Date)
    • Routledge (Publisher)
    Such entities are called outliers. For example, one way to decide which entities are outliers is the so-called 1.5 interquartile range rule, or 1.5 IQR rule. It essentially states that we should multiply the interquartile range by 1.5 and then add that value to the 3rd quartile and subtract it from the 1st quartile. All entities whose values fall outside the interval defined by these two values are considered outliers according to this rule (Figure 3.1).
    Figure 3.1 An illustration of the 1.5 interquartile range rule on a number line. 1.5 times the span of the interquartile range is added to the 75th percentile (i.e. 3rd quartile) and the same value is subtracted from the 25th percentile (i.e. 1st quartile). Entities outside the range defined by these two values are considered outliers.
    3.5 How can a distribution be represented?
    Other than simply making a list of all entities in the sample with their values, i.e. as a vector, a distribution is typically represented in the literature either as a table in which all different values of the variables are presented with the frequency, proportion or percentage of each value, or through a number of graphical means. Discrete distributions can be represented through pie charts, line diagrams and similar graphical depictions, while continuous distributions are typically presented using histograms, boxplots and graphical presentation of the probability density function (Figures 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8).
    Figure 3.2 A representation of a distribution through a line graph – distribution of education levels of the sample from a study of work-family relations in Serbia. The vertical axis represents frequencies, i.e. the number of study participants in the category, while different education categories are listed on the horizontal axis (Hedrih, 2017)
  • Mathematics Content for Elementary Teachers
    • Douglas K. Brumbaugh, Peggy L. Moch, MaryE Wilkinson (Authors)
    • 2004 (Publication Date)
    • Routledge (Publisher)
    Percentiles should only be used when there is a very large data set, like scores from a whole state or at least scores from several school districts. You may have taken a standardized test on which you scored in the 85th percentile. Did you think that meant you made an 85% on the exam? Nope, you might have gotten 90% or 60% of the problems correct, but you did better than 85% of the other people who took the exam. Another way to think about it is that only 15% of the people taking the exam got more right answers than you did.
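    The difference between a percent-correct score and a percentile can be shown with a few lines of Python. Everything in this sketch is hypothetical: the 20 raw scores, the 30-question test, and the helper `percentile_rank`, which here simply reports the percentage of test takers who scored lower.

```python
def percentile_rank(scores, x):
    """Percentage of scores that are strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

# Hypothetical raw scores of 20 test takers on a 30-question exam.
scores = list(range(1, 21))            # 1, 2, ..., 20 questions correct

my_score = 18
percent_correct = 100 * my_score / 30  # share of questions answered correctly
rank = percentile_rank(scores, my_score)

print(percent_correct, rank)           # 60.0 85.0
```

    Here a score of only 60% correct still sits at the 85th percentile, because 85% of the other test takers did worse. A group this small is used only to keep the arithmetic visible; as the passage says, percentiles are really meant for very large data sets.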
    The 50th percentile is approximately equal to the median score. A box and whisker plot, such as the one in Fig. 6.14, is often used when discussing the results of standardized tests. Remember, for the box and whisker plot we needed five things: the lowest score, the highest score, the second quartile (the median score), the first quartile (the median of the scores below the median), and the third quartile (the median of the scores above the median). If the median or second quartile is equivalent to the 50th percentile, then you might guess that the first quartile is the 25th percentile and the third quartile is the 75th percentile. This connection between percentiles and quartiles points out the usefulness of quartiles as well as box and whisker plots.
    Fig. 6.14.
    Data points falling outside of two standard deviations are considered outliers. But what if you only have a box and whisker plot? Aha! We can determine outliers by using the interquartile range (IQR). The IQR is found by subtracting the first quartile (Q1) from the third quartile (Q3), or Q3 − Q1 = IQR. The expected range is calculated by subtracting 1.5 times IQR from the first quartile, and adding 1.5 times IQR to the third quartile: [Q1 − 1.5 IQR, Q3 + 1.5 IQR]. Any data point outside these values is considered an outlier. Figure 6.14 is the box and whisker plot developed for the key data.
    Without peeking back, use Fig. 6.14 to determine what value represented the first quartile. What value represented the second quartile or median? What value represented the third quartile? You should have said 2 keys, 3 keys, and 5 keys, respectively. Using this information, Q1 = 2 and Q3 = 5, the IQR is 5 − 2, or 3.
    Because the product of the IQR and 1.5 is 4.5, Q1 − 1.5(IQR) = −2.5 and Q3 + 1.5(IQR) = 9.5 for the expected range. What were the fewest and most keys present according to the Fig. 6.14
  • Social Statistics
    Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2017 (Publication Date)
    • Routledge (Publisher)
    If the variable is measured at the nominal level, the only center we can find is the mode. For example, if we were examining the types of schools students attend (public, private secular, private religious), the type of school with the highest frequency of children would be the mode. If the variable is measured at the ordinal level, there is a mode, but we can also find the median. This is because the median requires ordered data, and we can order the categories of an ordinal-level variable. For example, if we are using an ordinal-level measure of education (less than high school, high school, some college, college degree, graduate degree), we would be able to say, hypothetically, that the modal respondent graduated from high school and the median respondent had some college. If we had a ratio-level variable, then we could find all three measures of the center: the mode, the median, and the mean. For example, if we were measuring education in years, we might find that the mode was 12 years, the median was 13 years, and the mean was 13.27 years of education. If you want to make sure that you have the ability to measure a variable at all three centers, then you have to make sure that you measure that variable at the ratio level.

    Close Relatives of the Median: Quartiles and Percentiles

    The median, with half the cases above and half below, can be considered by another name: the 50th percentile: 50% of the cases are above and 50% are below. Often in the real world, you will hear other percentiles. A common use of percentiles in today’s world is those unsavory standardized tests. If you received your grades, and found out that you were at the 50th percentile, this would mean that 50% of the people who took the test scored lower than your score, and 50% of the people who took the test scored higher than your score. If you found out you were at the 93rd percentile, this would mean that 93% of the people who took the test scored lower than your score, and 7% of the people who took the test scored higher than your score. Although the 50th percentile is the most often used of the percentiles, people sometimes use the 25th percentile and 75th percentile. With these three percentiles, people sometimes organize their results into quartiles:
    • First quartile: from the lowest case to the 25th percentile
    • Second quartile: from the 26th percentile to the 50th percentile
    • Third quartile: from the 51st percentile to the 75th percentile
    • Fourth quartile: from the 76th percentile to the highest case
    This provides a nice way to convert a large set of values to a mere four categories. Let’s use the frequency distribution from the GSS black women TV watching example from above. Here it is again, with the Cumulative Percent column added: