Mathematics

Quartiles

Quartiles are values that divide a data set into four equal parts, each representing 25% of the data. The first quartile (Q1) marks the 25th percentile, the second quartile (Q2) is the median, and the third quartile (Q3) represents the 75th percentile. Quartiles are useful for understanding the spread and distribution of a dataset.

Written by Perlego with AI-assistance

10 Key excerpts on "Quartiles"

  • Stats Means Business

    Statistics and Business Analytics for Business, Hospitality and Tourism

    • John Buglear (Author)
    • 2019 (Publication Date)
    • Routledge (Publisher)
    One way of looking at the median, or middle observation, of a distribution is to regard it as the point that separates the distribution into two equal halves, one consisting of the lower half of the observations and the other consisting of the upper half of the observations. The median, in effect, cuts the distribution into two.
    If the median is a single cut that divides a distribution into two, the quartiles are a set of three separate points in a distribution that divide it into four equal quarters. The first, or lower, quartile, known as Q1, is the point that separates the lowest quarter of the observations in a distribution from the rest. The second quartile is the median itself; it separates the lower two quarters (i.e. the lower half) of the observations in the distribution from the upper two quarters (i.e. the upper half). The third, or upper, quartile, known as Q3, separates the highest quarter of observations in the distribution from the rest.
    The median and the quartiles are known as order statistics because their values are determined by the order, or sequence, of observations in a distribution. You may come across other order statistics such as deciles, which divide a distribution into tenths, and percentiles, which divide a distribution into hundredths.
    You can find the quartiles of a distribution from an array or a stem-and-leaf display of the observations in the distribution. The quartile position is halfway between the end of the distribution and the median, so it can be defined in relation to the median position, which is (n + 1)/2, where n
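The excerpt stops before it states the quartile positions. The Python sketch below assumes one common convention: Q1 and Q3 sit at positions (n + 1)/4 and 3(n + 1)/4 of the ordered data, and the code interpolates between neighbouring observations when a position is fractional. The full text may use a different rule. The helper names are hypothetical. The comparison with the standard library's `statistics.quantiles` is only a cross-check, since its default `method="exclusive"` uses the same (n + 1)-based positions.

```python
import math
import statistics

def value_at_position(sorted_xs, pos):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    n = len(sorted_xs)
    pos = min(max(pos, 1), n)           # keep the position inside the data
    lower = math.floor(pos)
    frac = pos - lower
    if lower == n:                      # position is exactly the last observation
        return float(sorted_xs[-1])
    below, above = sorted_xs[lower - 1], sorted_xs[lower]
    return below + frac * (above - below)

def quartiles_by_position(data):
    """Q1, Q2, Q3 at positions (n+1)/4, (n+1)/2 and 3(n+1)/4 (assumed convention)."""
    xs = sorted(data)
    n = len(xs)
    median_pos = (n + 1) / 2
    return tuple(value_at_position(xs, p)
                 for p in ((n + 1) / 4, median_pos, 3 * (n + 1) / 4))

data = [21, 3, 14, 8, 25, 12, 5, 18, 7, 13]   # illustrative values only
print(quartiles_by_position(data))            # (6.5, 12.5, 18.75)
print(statistics.quantiles(data, n=4))        # [6.5, 12.5, 18.75]
```

For these ten values, Q1 falls at position 2.75. It therefore lies three quarters of the way from the 2nd ordered value (5) to the 3rd (7).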
  • Interpreting Statistics for Beginners

    A Guide for Behavioural and Social Scientists

    • Vladimir Hedrih, Andjelka Hedrih (Authors)
    • 2022 (Publication Date)
    • Routledge (Publisher)
    how big a certain value is in comparison with the values of entities in a sample), and these are based on percentages, i.e. division of the sample into hundredths. However, there is no universal rule stating that position in the distribution must be indicated using hundredths: dividing the whole distribution into any other number of equal parts is equally valid. The general name for such divisions of the distribution into a certain number of equal parts is quantiles (Harding et al., 2014), also often called n-tiles, where n stands for the number of parts the whole distribution is divided into. Percentiles divide the distribution into 100 equal parts, but any other number of parts is equally valid. Other commonly used quantiles aside from percentiles are:
    Quartiles – divide the distribution into 4 equal parts. There are therefore 4 quartiles: the 1st quartile corresponds to the 25th percentile, the 2nd quartile to the 50th percentile, the 3rd to the 75th percentile and the 4th quartile to the 100th percentile. It should be noted that some authors use quartiles not to indicate points in the distribution but to denote intervals of it. In such a system the 1st quartile refers to the quarter of the sample with the lowest scores, the 2nd quartile to variable values between the 25th and the 50th percentile, the 3rd quartile to values between the 50th and the 75th percentile, and the 4th quartile to the top quarter of the sample with regard to the values of the considered variable.
    Quintiles – divide the distribution into 5 equal parts. Everything described for quartiles holds equally for quintiles, the only difference being that the sample is now divided into 5 equal groups instead of 4.
    Deciles – divide the sample into 10 equal groups
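The two senses of "quartile" described above can be sketched in Python. `statistics.quantiles` returns the n − 1 cut points. A hypothetical helper, `ntile_group`, labels each observation with the interval (1st to nth part) it falls in. Placing a value that equals a cut point in the upper group is an assumption, not a rule from the excerpt. The 4th quartile "point" in the excerpt's convention is simply the maximum, so it is not returned as a cut.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def ntile_cut_points(data, n):
    """The n - 1 cut points that split the data into n equal parts."""
    return quantiles(data, n=n)

def ntile_group(value, cuts):
    """1-based group ("interval" sense): 1 = lowest part, len(cuts) + 1 = highest.

    A value equal to a cut point goes to the upper group (assumed tie rule)."""
    return bisect_right(cuts, value) + 1

data = list(range(1, 21))                   # illustrative scores 1..20
quartile_cuts = ntile_cut_points(data, 4)   # [5.25, 10.5, 15.75]
quintile_cuts = ntile_cut_points(data, 5)   # [4.2, 8.4, 12.6, 16.8]
print(Counter(ntile_group(x, quartile_cuts) for x in data))
print(Counter(ntile_group(x, quintile_cuts) for x in data))
```

With 20 distinct scores, each quartile group receives 5 observations and each quintile group receives 4.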
  • Statistics and Data Visualisation with Python
    quantiles and they split our observations into equal groups.
    The median splits a dataset into two halves.
    Quantiles split our data into equal groups.
    We may want to create not only two groups but, for example, four groups, each containing 25% of the full set. The values that split the data in this way have a special name: quartiles. Note that if we partition our data into n groups, we have n − 1 quantiles. In the case of quartiles we therefore have 3 of them, with the middle one being equal to the median. Other quantiles with special names include deciles, which split the data into 10 groups, and percentiles, which split it into 100 groups. In this case the 50th percentile corresponds to the median.
    Quartiles split the data into 4 groups. The median is therefore also called the second quartile.
    Percentiles partition the data into 100 equal groups. The median is thus the 50th percentile.
    We find our n − 1 quantiles by first ranking the data in order and then cutting it at n − 1 equally spaced points, obtaining n groups. It is important to mention that the terms quartile, decile, percentile, etc. refer to the cut-off points and not to the groups obtained. The groups should be referred to as quarters, tenths, etc.
    We need to order our data to obtain our quantiles.
    As we mentioned before, quantiles are useful to specify the position of a set of data. In that way, given an unknown distribution of observations we may be able to compare its quantiles against the values of a known distribution. This can help us determine whether a model is a good fit for our data. A widely used method is to create a scatterplot known as the Q-Q plot or quantile-quantile plot. If the result is (roughly) linear, the model is indeed a good fit.
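Both ideas in this excerpt can be sketched in Python. First, `statistics.quantiles(data, n=k)` returns the k − 1 cut points. Second, the hypothetical `qq_points` helper pairs each sorted observation with the matching quantile of a normal model, so the pairs can be scatter-plotted as a Q-Q plot. Two choices in the helper are assumptions, and other conventions exist: the normal model is fitted using the sample mean and standard deviation, and the plotting positions are (i − 0.5)/n.

```python
from statistics import NormalDist, fmean, median, quantiles, stdev

def qq_points(sample):
    """(theoretical, observed) pairs comparing a sample with a fitted normal model."""
    xs = sorted(sample)
    n = len(xs)
    model = NormalDist(fmean(xs), stdev(xs))   # assumption: normal model fitted by mean and sd
    # Plotting positions (i - 0.5) / n are one common choice among several.
    theoretical = [model.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))

sample = [41, 50, 53, 59, 62, 64, 70, 77, 81, 94]   # illustrative values only
cuts = quantiles(sample, n=4)
print(len(cuts), cuts[1] == median(sample))   # 3 True: 4 groups need 3 cuts, the middle one is the median
for theo, obs in qq_points(sample):
    print(f"{theo:6.2f}  {obs:5.2f}")
```

Plotting the printed pairs as (x, y) points and judging whether they lie roughly on a straight line is left to the reader's plotting tool.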
  • Understanding Quantitative Data in Educational Research
    Figure 5.1 One-dimensional plot for the MathScores.csv data set, which shows the range

    5.2 Percentiles, deciles and quartiles

    Percentiles are values which divide a set of data into 100 equal intervals; each interval contains 1% of the elements in the data set. There are thus a total of 99 percentiles, and they are a useful and convenient way of ranking data sets with many observations or data values. Deciles are values which divide the data set into 10 intervals, and each interval contains 10% of the elements. A percentile or decile range can also be calculated: this is the difference between two specified percentiles. The most common decile range is the 10–90 range, which is a robust estimator of spread and is found by subtracting the 10th percentile from the 90th percentile.
    The quartiles are the values of the variable one-quarter, two-quarters and three-quarters of the way through the distribution. The value of the variable one-quarter, or 25%, of the way through the distribution is called the lower or first quartile (Q1), and the one which is three-quarters, or 75%, of the way through the distribution is called the upper or third quartile (Q3). The second quartile (Q2), which is 50% of the way through the distribution, is the median. Since a quarter of the distribution lies below the lower quartile and a quarter lies above the upper quartile, half of the distribution lies between these two quartiles. Note that a percentile, decile or quartile is not a percentage; it is a value in the data set that marks a certain percentage of the way through the data.
    There are different ways (or algorithms) to estimate percentiles, deciles and quartiles because there are situations when there is no score that is
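Two points from this excerpt can be illustrated in Python with `statistics.quantiles`: the 10–90 range, and the fact that different algorithms give different values. That function offers an `"exclusive"` method (the default) and an `"inclusive"` method. `percentile_range` is a hypothetical helper, and the scores are illustrative.

```python
from statistics import quantiles

def percentile_range(data, low, high, method="exclusive"):
    """Difference between the high-th and low-th percentiles (integers 1..99)."""
    cuts = quantiles(data, n=100, method=method)   # the 99 percentiles P1..P99
    return cuts[high - 1] - cuts[low - 1]

data = list(range(1, 21))   # illustrative scores 1..20
print(percentile_range(data, 10, 90))               # the 10-90 range
print(quantiles(data, n=4, method="exclusive"))     # [5.25, 10.5, 15.75]
print(quantiles(data, n=4, method="inclusive"))     # [5.75, 10.5, 15.25]
```

The two methods agree on the median here but not on Q1 and Q3. This is exactly the kind of discrepancy the excerpt warns about.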
  • Empirical Political Analysis

    International Edition

    • Richard C. Rich, Craig Leonard Brians, Jarol B. Manheim, Lars Willnat (Authors)
    • 2018 (Publication Date)
    • Routledge (Publisher)
    We then subtract the value associated with the twenty-first case from that associated with the eighty-first (q = q4 − q1 = 4 − 1 = 3) to obtain the quintile range. In sample 3, the equivalent computation yields a quintile range of 1 (q = 3 − 2 = 1), suggesting by comparison that this distribution is better typified by its median of 2.5 than sample 2 is by its median of 3. An examination of the two frequency distributions will confirm the validity of this conclusion. One difficulty in interpreting quantile ranges is that they are extremely sensitive to variation in the number of categories on a given variable: the more categories there are, the greater the range is likely to be. For this reason, quantile ranges can be difficult to interpret when comparing variables that differ significantly in their number of categories. For similarly coded variables, for longitudinal or cross-sectional comparisons of the values of any single variable, or for some absolute indication of variability around the median, however, quantile ranges are generally quite adequate.
    Measures for Interval/Ratio Variables
    Interval data provide us with the most complete information of all, including categorization, rank, and distance. Interval values can be subjected to any arithmetic manipulation. Consequently, our measures of central tendency and dispersion for interval data can and should take this added information and capability into account.
    The Mean
    The measure of central tendency for interval data is the mean, a measure that locates the central point of a distribution in terms of both the number of cases on either side of that point and their distance from it. The mean of a distribution is the statistic many people commonly associate with the term average. You can visualize the nature of the mean by using Figure 15.2
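The case-counting rule in this excerpt can be sketched in Python. The excerpt applies it to 100 cases, taking the 21st and 81st cases. Generalising it to other sample sizes through n // 5 and 4n // 5 is an assumption of this sketch. The responses are hypothetical ordinal codes, not the book's samples.

```python
from statistics import median

def quintile_range_by_cases(values):
    """Value of the case just past the 80% mark minus that just past the 20% mark.

    For 100 cases this picks the 21st and 81st cases, as in the excerpt; using
    n // 5 and 4 * n // 5 for other sample sizes is an assumption."""
    xs = sorted(values)
    n = len(xs)
    q1 = xs[n // 5]         # 0-based index 20 -> 21st case when n == 100
    q4 = xs[4 * n // 5]     # 0-based index 80 -> 81st case when n == 100
    return q4 - q1

# Hypothetical ordinal responses coded 1-5 (not the book's samples).
responses = [1] * 15 + [2] * 15 + [3] * 30 + [4] * 25 + [5] * 15
print(quintile_range_by_cases(responses), median(responses))   # 2 3.0
```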
  • Mathematics for Biological Scientists
    • Mike Aitken, Bill Broadhurst, Stephen Hladky (Authors)
    • 2009 (Publication Date)
    • Garland Science (Publisher)
    n + 1)/2!
    Similarly, we can measure dispersion in terms of the range, which is just what it sounds like: the span from the smallest to the largest value in the data set. The range of our frog mass data is from 7.7 g to 42.5 g, i.e. 34.8 g.
    Additional information about dispersion is provided by the interquartile range (IQR), or H-spread. This is the range between the lower and upper quartile boundaries (also called hinges) of our data set. What are quartile boundaries? These are the values that divide our data, when placed in ascending order, into four equally sized sets, or quartiles. Just as the median splits the data into halves, we need two other values to split each half into two quarters: these are known as the upper and lower quartile boundaries (UQB and LQB). For most samples, no values split the data precisely into four quarters, and a variety of techniques exist to generate the quartile boundaries. The commonest, and simplest, is as follows.
    Split your data at the median into two sets. If you started with an odd number of items, include the median in both of these sets. The lower quartile boundary is the median of the bottom half of the data set, and the upper quartile boundary is the median of the top half of the data set (because the lower quartile boundary is halfway into the first half, it is one quarter of the way from the bottom; because the upper quartile boundary is halfway into the top half, it is three quarters of the way from the bottom). By definition, one half of the data points lie within the IQR (i.e. between upper and lower quartile boundaries).
    In our frog mass data, there are 50 data points. The lower quartile boundary is the median of the lower 25 numbers; that is, the 13th number from the bottom, and the upper quartile boundary is the 13th number up from the middle (38th in the overall ascending list). The 13th number in the list is 12.3, and the 38th is 25.7, thus the IQR is 12.3 g to 25.7 g, i.e. 13.4 g.
    We could have used a spreadsheet function to calculate the median and quartile boundaries. For example, Microsoft Excel® provides the functions ‘=MEDIAN(range)’ and ‘=QUARTILE(range,Q)’, in which Q = 1 for the lower quartile boundary and Q = 3 for the upper quartile boundary. However, if we use this QUARTILE function, we get slightly different values for the quartile boundaries from those given above, as Excel® uses a slightly different approach from the one given in the text for calculating quartiles of small samples. Many statistical calculators use yet another technique, slightly different from the one above, for samples containing an odd number of data points (calculating the quartile boundaries from each half not
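The median-of-halves rule described above can be sketched in Python, including the median in both halves when n is odd. `statistics.quantiles(method="inclusive")` is shown for comparison. The claim that the inclusive method reproduces Excel's QUARTILE function is an assumption, based on both using (n − 1)-based linear interpolation. Check it against your own spreadsheet before relying on it.

```python
from statistics import median, quantiles

def quartile_boundaries(data):
    """Lower and upper quartile boundaries by the median-of-halves rule.

    For an odd number of items the median is included in both halves."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2          # odd n: the two halves share the middle value
    lower_half = xs[:half]
    upper_half = xs[n - half:]
    return median(lower_half), median(upper_half)

data = [7, 2, 9, 4, 1, 10, 3, 6, 8, 5]     # illustrative values only
lqb, uqb = quartile_boundaries(data)
print(lqb, uqb, uqb - lqb)                  # 3 8 5
# Interpolating method, assumed here to correspond to Excel's QUARTILE.
q1, _, q3 = quantiles(data, n=4, method="inclusive")
print(q1, q3)                               # 3.25 7.75
```

For 50 values this rule picks the 13th and 38th ordered values, matching the frog-mass description.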
  • Conducting Research in Human Geography

    theory, methodology and practice

    • Rob Kitchin, Nick Tate (Authors)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    Now calculate the quartile values. The upper quartile (also known as the upper hinge) is obtained from the value midway between the median and the highest value, and the lower quartile (lower hinge) from the value midway between the median and the lowest value. As with the median, this may not coincide with an actual data value but may lie between two values (Step 3). Using these values, we can define the interquartile range (also known as the midspread) as the difference between the upper and lower quartiles. This is often used to identify the farthest points of the whiskers in the plot, and any extreme values (termed outliers) beyond these points (Hartwig and Dearing, 1979). Multiplying the midspread by 1.5 produces a quantity known as the step (Erickson and Nosanchuk, 1992) (Step 4). Observations larger than the upper quartile plus the step, or smaller than the lower quartile minus the step, are identified as outliers. The largest and smallest values in the data set that are not beyond these limits are the end-points of the whiskers in the plot (Step 5). The final stage is to draw the box plot. Each end of the box is marked by the upper and lower quartiles, with the median drawn as a line across the box. The whiskers extend to the smallest and largest values identified in Step 5, with any outliers marked by a symbol. This is often a dot or, as in the MINITAB software, an 'O'. Different data sets can be compared by constructing a series of adjacent box-and-whisker plots, and this often provides a useful comparison prior to a statistical test for differences between data sets, such as the analysis of variance or the t-test, which we will encounter in Chapter 5. As an example we will plot the sheep data from Table 4.3: see Boxes 4.16 and 4.17.
    4.6 Probability
    Probability is concerned with the likelihood or chance of occurrence of a certain event
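Steps 3 to 5 above can be sketched in Python as follows. Computing the hinges as medians of the lower and upper halves is one assumed reading of "midway between the median and the highest value". `box_plot_summary` is a hypothetical helper, and drawing the plot is left to a plotting library.

```python
from statistics import median

def box_plot_summary(data, step_factor=1.5):
    """Numbers needed to draw a box-and-whisker plot.

    Quartiles use the median-of-halves (hinge) rule, an assumed reading of
    "midway between the median and the highest/lowest value"."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    lower_q, upper_q = median(xs[:half]), median(xs[n - half:])
    midspread = upper_q - lower_q                 # interquartile range (Step 3)
    step = step_factor * midspread                # 1.5 x midspread (Step 4)
    low_limit, high_limit = lower_q - step, upper_q + step
    inside = [x for x in xs if low_limit <= x <= high_limit]
    return {
        "median": median(xs),
        "lower_quartile": lower_q,
        "upper_quartile": upper_q,
        "whisker_low": min(inside),               # smallest value not beyond the limit (Step 5)
        "whisker_high": max(inside),              # largest value not beyond the limit (Step 5)
        "outliers": [x for x in xs if x < low_limit or x > high_limit],
    }

print(box_plot_summary([9, 4, 30, 2, 7, 5, 8, 3, 6]))   # illustrative values only
```

Here the midspread is 4, so the step is 6. The value 30 lies beyond the upper limit of 14 and is reported as an outlier.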
  • Mathematics Content for Elementary Teachers
    • Douglas K. Brumbaugh, Peggy L. Moch, MaryE Wilkinson (Authors)
    • 2004 (Publication Date)
    • Routledge (Publisher)
    is considered to be an outlier or an unexpected input. When we were discussing the median values and ranges for Groups A and B, we purposely introduced an outlier, 1000 keys, to show what effect an outlier can have on the mean and the range while not affecting the median or mode. Sometimes it is easy to recognize an outlier. In our example, 1000 keys is a pretty extreme data point. However, if the data point is only one or two values outside the expected range, it may be difficult to spot. Look at the key data for Group C. How many of the data points are within one, two, or three standard deviations of the mean? Are there any outliers? Are any of the data points farther than 1.85, 3.70, or 5.55 from the mean of 2.7 keys?
    Your Turn
    Use the first 10 key data points reported by your classmates to answer the following:
    3. Find the variance of the sample using both formulas. Calculate each by hand and then verify your results using a software application.
    4. Find the standard deviation for this sample of your data.
    5. Look back at your full data set for the key data. Are there any outliers based on the standard deviation of the sample?
    6. Does this sample accurately represent the whole class? Why or why not?
    Quartiles
    Quartiles? What are quartiles and why do I have to know about them? You need to be comfortable with quartiles so that you can adequately and comfortably discuss standardized test scores. Quartiles are directly related to percentiles. So naturally the next question is, “What’s a percentile? Is it like a percent?”
    Percentiles should only be used when there is a very large data set, like scores from a whole state or at least scores from several school districts. You may have taken a standardized test on which you scored in the 85th percentile. Did you think that meant you made an 85% on the exam? Nope, you might have gotten 90% or 60% of the problems correct, but you did better than 85% of the other people who took the exam. Another way to think about it is that only 15% of the people taking the exam got more right answers than you did.
    The 50th percentile is approximately equal to the median score. A box and whisker plot, such as the one in Fig. 6.14
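Two calculations from this excerpt can be sketched in Python. The first is the percentile rank behind "did better than 85% of the other people". The second is counting how many data points lie within one, two or three standard deviations of the mean. Percentile-rank definitions vary (some count half of tied scores), so the strictly-below version here is an assumption. All values are illustrative.

```python
from statistics import fmean, stdev

def percentile_rank(scores, score):
    """Percentage of scores strictly below `score`.

    Definitions differ (some count half of tied scores); this one is an assumption."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

def count_within_sd(data, k):
    """How many values lie within k sample standard deviations of the mean."""
    m, s = fmean(data), stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s)

test_scores = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]   # illustrative values only
print(percentile_rank(test_scores, 85))                  # 70.0

keys = [2, 4, 4, 4, 5, 5, 7, 9]                          # illustrative values only
print([count_within_sd(keys, k) for k in (1, 2, 3)])     # [6, 8, 8]
```

A score of 85 is at the 70th percentile here, even though it is 85 points. This illustrates the excerpt's point that a percentile is not a percent correct.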
  • Practical Statistics for Geographers and Earth Scientists
    • Nigel Walford (Author)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    Each has its relative strengths and weaknesses and should be used with attributes or variables recorded according to the different scales of measurement. If a large number of values are possible, then the mode is unlikely to provide much useful information. At one extreme it might simply record that each value occurs only once or, slightly more promisingly, that one out of 50 values is repeated twice or perhaps even three times. In other words, the mode is only really helpful when dealing with nominal attributes that have a limited range of values representing the labels attached to the raw data. For example, respondents in a survey may be asked to answer an attitude battery question with 'Strongly Agree', 'Agree', 'Neutral', 'Disagree' or 'Strongly Disagree' as the possible predefined responses. Suppose further that these responses have been assigned numerical codes from 1 to 5 respectively. Although the number of potential values has been limited, it is entirely feasible that the mode will be one of the extreme values (1 or 5). Hence, the notion of central tendency with respect to the mode relates to the most common nominal category, since in this example the values 1 to 5 do not imply any order of magnitude in measurement and there is no requirement that 'Strongly Agree' should have been labelled 1, 'Agree' 2, etc. It is simply a matter of arbitrary convenience to allocate the code numbers in this way; they could just as easily have been completely mixed up ('Strongly Agree' as 3, 'Agree' as 5, 'Neutral' as 4, 'Disagree' as 1 and 'Strongly Disagree' as 2). The median is, by definition, more likely to provide a central measure within a given set of numbers. The main drawback of the median is its limited focus on either a single central value or the two values either side of the midpoint. It says nothing about the values at either extreme.
    Box 4.2a: Central tendency measures for nonspatial data: Sample site for the measurement of water temperature in a fluvioglacial stream from Les Bossons Glacier, France.
    Mean – population symbol: μ; sample symbol: x̄
    Box 4.2b: Calculation of the mode, median and mean.
    The various measures of central tendency are perhaps some of the most intuitive statistics (or parameters) available. Their purpose is to convey something about the typical value of an attribute or variable and to allow you to quantify whether the central values in two or more sets of numbers are similar to or different from each other. Each of the main measures (mode, median and mean) can be determined for variables measured on the interval or ratio scale; ordinal variables can yield their mode and median, but only the mode is appropriate for nominal attributes. So, if you have data for two or more samples of the same category of observations, for instance samples of downtown, suburban and rural households, you could compare the amounts of time they spend travelling to work.
    The mode is the most frequently occurring value in a set of numbers and is obtained by counting. If two or more adjacent values appear the same number of times, the question arises as to whether there are two modes (i.e. both values) or one, midway between them. If no value occurs more than once, then a mode cannot be determined. The median is the value that lies at the midpoint of an ordered set of numbers. One way of determining its value is simply to sort all the data values into either ascending or descending order and then identify the middle value. This works perfectly well if there is an odd, as opposed to even, number of values, since the median will necessarily be one of the recorded values. However, if there is an even number, the median lies halfway between the two observations in the middle, and will not be one of the recorded values when the two middle observations are different. Calculation of the arithmetic mean involves adding up or summing all the values for a variable and then dividing by the total number of values (i.e. the number of observations).
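The counting and ordering procedures in this excerpt can be sketched in Python. `median_of`, `modes` and `mean_of` are hypothetical helpers. When values tie for most frequent, `modes` returns all of them. The excerpt also mentions a second option, a single mode midway between adjacent tied values, and the sketch does not implement it.

```python
from collections import Counter

def median_of(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def modes(values):
    """All most-frequent values; empty list when no value occurs more than once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []            # no value repeats, so no mode can be determined
    return [v for v, c in counts.items() if c == top]

def mean_of(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

print(median_of([7, 3, 5]), median_of([7, 3, 5, 10]))   # 5 6.0
print(modes([1, 2, 2, 3, 3]), modes([1, 2, 3]))         # [2, 3] []
print(mean_of([7, 3, 5]))                                # 5.0
```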
  • Designing and Conducting Research in Health and Human Performance
    • Tracey D. Matthews, Kimberly T. Kostelis (Authors)
    • 2011 (Publication Date)
    • Jossey-Bass (Publisher)
    2 ).
    Figure 11.8 Standard Deviation
    Quartile Deviation
    The quartile deviation is similar in purpose to the standard deviation, but it describes the amount of spread around the median rather than around the mean. More specifically, the quartile deviation takes the range covered by the middle 50 percent of the scores (the interquartile range) and divides it in half, i.e. QD = (Q3 − Q1)/2.
    Quartile deviation: indicates the semi-interquartile range
    The quartile deviation is not commonly reported, yet it is appropriate for skewed distributions or when the data are ordinal.
    Standard Scores
    Using standard scores allows you to transform scores and examine sample data. This may be necessary to have a reference point when you are comparing research variables that have different units of measurement. This is also commonly used with norm-referenced assessments that you may be using to measure your research variable.
    Standard scores: data scores that can be transformed into common score points, including percentile ranks, z scores, and T scores
    Common standard scores include percentile ranks, z scores, and T scores. A typical example is a battery of fitness tests, such as the mile run, sit-ups, push-ups, and sit-and-reach, each of which produces a different type of score. You would transform each test score into a standard score and then compare the different tests, comparing apples to apples rather than apples to oranges.
    Hypothesis Testing
    Now that you have a better understanding of descriptive statistics to describe your sample data, you are ready to start calculating inferential statistics. Inferential statistics attempt to draw conclusions about the target population based on the data collected from the sample. Before you begin learning, calculating, and interpreting inferential statistics, we want to cover hypothesis testing. Hypothesis testing becomes your statistical decision-making process. It will not tell you the importance of your results; you are the judge of their importance, and you need to interpret your results in light of your research design. There is always a chance of a type I or type II error, even if you find significance; these terms will be discussed in more detail. As a result, the important part is to consider statistical results together with your own judgment. The following list presents the steps in hypothesis testing that will be covered. Some of the steps have already been covered in Chapter Nine, and some will be covered in more detail in Chapter Twelve. Nevertheless, we want to introduce this process as your statistical decision-making process. Many aspects of hypothesis testing apply to any statistical analysis.
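The quartile deviation and standard scores from the Matthews and Kostelis excerpt can be sketched in Python:

- The quartile deviation is half the interquartile range.
- A z score is the distance from the mean in standard deviations.
- A T score rescales z to a mean of 50 and a standard deviation of 10. This is the usual convention, stated here as an assumption because the excerpt does not define it.

Quartiles come from `statistics.quantiles` with its default method, so other quartile rules would give slightly different values. The helper names are hypothetical.

```python
from statistics import fmean, quantiles, stdev

def quartile_deviation(data):
    """Semi-interquartile range: half the distance between Q1 and Q3."""
    q1, _, q3 = quantiles(data, n=4)   # default "exclusive" method; other methods differ slightly
    return (q3 - q1) / 2

def z_score(data, x):
    """Distance of x from the sample mean, in sample standard deviations."""
    return (x - fmean(data)) / stdev(data)

def t_score(data, x):
    """T score on a scale with mean 50 and standard deviation 10 (assumed convention)."""
    return 50 + 10 * z_score(data, x)

scores = list(range(1, 21))          # illustrative values only
print(quartile_deviation(scores))    # 5.25
print(z_score(scores, 10.5), t_score(scores, 10.5))
```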
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.