Mathematics

Statistics

Statistics is a branch of mathematics that involves collecting, analyzing, interpreting, and presenting numerical data. It encompasses methods for summarizing and making inferences from data, as well as techniques for dealing with uncertainty and variability. Statistics is widely used in various fields such as science, business, economics, and social sciences to make informed decisions and draw meaningful conclusions from data.

Written by Perlego with AI-assistance

Related key terms

Data Interpretation

Engineering Statistics

Health Statistics

Inferences in Statistics

Official Statistics

Probability and Statistics

Statistical Analysis in Biology

Statistical Graphs

Statistical Measures

Statistical Models

12 Key excerpts on "Statistics"

eBook - ePub
Probability, Statistics, and Stochastic Processes for Engineers and Scientists
- Aliakbar Montazer Haghighi, Indika Wickramasinghe(Authors)
- 2020(Publication Date)
- CRC Press
  (Publisher)
4 Descriptive Statistics

4.1 Introduction and History of Statistics

The word “Statistics” seems to be tied to “data” and “analysis of data”. In 1749, Statistik, a book in German, was published describing the analysis of demographic and economic data about the state (“political arithmetic” in English). However, the word “Statistics” originates from the Latin term “statisticum collegium”, which means “council of state”, and from the Italian word “statista”, which means “statesman” or “politician”. In the 1800s, the word “Statistics” expanded its meaning to cover summarizing and analyzing data. It has since widened its scope further to include the concept of probability for the purpose of statistical inference.

4.2 Basic Statistical Concepts

Statistics is essentially linked to the theory of probability, and probability theory is a branch of mathematics. Hence, it could be said that Statistics is a branch of mathematics. On the other hand, since Statistics deals with gathering, analyzing, and interpreting data, there are lots of human judgments involved in statistical analysis. This idea seems to separate Statistics from mathematics, to the extent that it is becoming difficult for pure mathematicians to accept Statistics as part of mathematics. However, the second part of Statistics, inferential Statistics, is mainly mathematics with less human judgment.

Thus, Statistics is sometimes considered a branch of mathematics and, at other times, a discipline in its own right. Regardless of how it is viewed, it is now a very important discipline: its applications are so vast and diverse that no area of science can do without it. In fact, it has spread so widely that the humanities, psychology, the social sciences, and even communication cannot do without the gathering and analysis of data.

We start this chapter by defining some important terms that are widely used throughout this chapter. Of course, concepts of probability theory discussed in Chapter 2
eBook - ePub
Total Quality Management
Key Concepts and Case Studies
- D.R. Kiran(Author)
- 2016(Publication Date)
- Butterworth-Heinemann
  (Publisher)
Originally, the word “Statistics” was derived from the word “STATUS” or “STATE”, i.e., a science dealing with the affairs of administration of a state. Today, Statistics refers to the scientific approach to the collection, presentation, analysis, and interpretation of numerical data or information, presented in a way that makes the information understandable.

Different individuals react differently under the same set of circumstances. But the ability to predict with reliable confidence how the entire group is likely to react under the given circumstances is very important. Hence, it is necessary to evaluate a certain statistical parameter, which would represent the characteristics of the entire group. This is called an average. The method of Statistics actually deals with the evaluation of such characteristics and hence it is also defined as the “Science of Averages.”

16.2 Role of Statistics in Analysis

Today Statistics is indispensable for a clear appreciation of any problem affecting human welfare, whether in industry, transport, business, or medicine, and, more significantly, in quality control analysis.
Statistics is an aid to economic measurement. It is a technique for analyzing the data obtained in order to draw conclusions about economic progress and forecast future trends.
Statistics is an aid to the manager, particularly in these days of impersonal relationships between the employer and employee. With this, he can estimate demand more accurately than by guesswork. Statistics helps in recording past knowledge and experience and in drawing up standards against which results can be compared from time to time. The expected changes, the reasons for these changes, and their effects on the nation’s economy can also be estimated.
eBook - ePub
Modeling Uncertainty in the Earth Sciences
- Jef Caers(Author)
- 2011(Publication Date)
- Wiley
  (Publisher)
2 Review on Statistical Analysis and Probability Theory

2.1 Introduction

This chapter should be considered mostly as a review of what has been covered in a basic course in Statistics and/or probability theory for engineers or the sciences. It provides a review with an emphasis on what is important for modeling uncertainty in the Earth Sciences.

Statistics is a term often used to describe a “collection or summary of a set of numbers.” Particular examples are labor or baseball Statistics, which represent a series of numbers collected through investigation, sampling, or surveys. These numbers are often recombined or rearranged into a new set of clearly interpretable figures. Such summaries are needed to reduce the possibly large amount of data, make sense of the data, and derive conclusions or decisions thereupon. The scientific field of Statistics is not very different. What Statistics provides, however, is a rigorous mathematical framework in which to analyze sample data and to make conclusions on that basis. However, Statistics as a traditional science has considerable limitations when applied to the Earth sciences. Key to modeling in the Earth Sciences is the spatial aspect or nature of the data. A sample or measurement is often attached to a spatial coordinate (x, y, z) describing where the sample was taken. Traditional Statistics very often neglects this spatial context and simply works with the data as they are. However, from experience, it is known that samples located close together are more “related” to each other, and this relationship may be useful to us when interpreting our data. Geostatistics and also spatial Statistics are fields that deal explicitly with data distributed in space or time and aim at explicitly modeling the spatial relationship between data.

Critical to any statistical study is to obtain a good overview of the data and to obtain an idea about its key characteristics. This analysis of the data is also termed exploratory data analysis (EDA). We will distinguish between graphical and numerical techniques.
eBook - ePub
Hands-on Data Analysis and Visualization with Pandas
Engineer, Analyse and Visualize Data, Using Powerful Python Libraries
- Purna Chander Rao, (Authors)
- 2020(Publication Date)
- BPB Publications
  (Publisher)
CHAPTER 8
Introduction to Statistics

Statistics is a branch of applied mathematics in which we collect data, organize it, and interpret it so that we can go on to visualize it.
So what is the use of it? How can it be used for decision making?
In the real world, collecting all the possible data points upfront is not just difficult but close to impossible. In such cases, Statistics provides a mechanism to understand the overall picture using only a small sample of data points. Statistics is broadly composed of two categories: descriptive Statistics and inferential Statistics; these two categories give different insights about data. Using one of them alone does not give a full picture of the data.
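The idea that a small sample can stand in for the whole population can be sketched as follows. This is a minimal illustration using only Python's standard library rather than the pandas/scipy stack the book covers; the population is synthetic, and the size, mean, spread, and seed are assumptions chosen for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 synthetic measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]

# In practice only a small sample of the population is observed.
sample = random.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Summarizing the 200 sampled values is descriptive statistics; using the sample mean to say something about the population mean is the starting point of inferential statistics.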

Structure

Population

Sample

Types of data

Categorical variables

Numerical data

Levels of measurement

Qualitative nominal

Qualitative ordinal

Quantitative interval

Quantitative ratio

Descriptive Statistics

Measures of central tendency

Mean

Median

Mode

Measures of variability

Range

Quartile

Deviation/Variance

Standard deviation

Coefficient of variation

Covariance

Inferential Statistics

What is the distribution?

Standardization (z-score)

Central limit theorem

Standard error

Confidence intervals

Hypothesis Testing

Null hypothesis

Alternate hypothesis

Errors

Type 1 error

Type 2 error

One-tailed test

Two-tailed test

Hypothesis testing steps

The T-Test (Student T-Test)

Z-Test

Objective

The objective of this chapter is to take a brief tour of Statistics and see how we can use pandas and numpy together along with the Statistics library scipy to conduct statistical analysis. This chapter is not a primer on Statistics, but it just serves as an illustration of using pandas along with stats packages. Here we will be learning about population, sample, measures of central tendency, descriptive and inferential Statistics, central limit theorem, confidence intervals, and hypothesis testing.
eBook - ePub
Statistics and Probability
- Britannica Educational Publishing, Nicholas Faulkner(Authors)
- 2017(Publication Date)
- Britannica Educational Publishing
  (Publisher)
i.e., a subset of the population, to make statements about a population, they are performing statistical inference. Estimation and hypothesis testing are procedures used to make statistical inferences. Fields such as health care, biology, chemistry, physics, education, engineering, business, and economics make extensive use of statistical inference.

Methods of probability were developed initially for the analysis of gambling games. Probability plays a key role in statistical inference; it is used to provide measures of the quality and precision of the inferences. Some of these methods are used primarily for single-variable studies, while others, such as regression and correlation analysis, are used to make inferences about relationships among two or more variables.
Descriptive Statistics
Descriptive Statistics are tabular, graphical, and numerical summaries of data. The purpose of descriptive Statistics is to facilitate the presentation and interpretation of data. Most of the statistical presentations appearing in newspapers and magazines are descriptive in nature. Univariate methods of descriptive Statistics use data to enhance the understanding of a single variable; multivariate methods focus on using Statistics to understand the relationships among two or more variables. To illustrate methods of descriptive Statistics, the previous example in which data were collected on the age, gender, marital status, and annual income of 100 individuals will be examined.
Tabular Methods
The most commonly used tabular summary of data for a single variable is a frequency distribution. A frequency distribution shows the number of data values in each of several nonoverlapping classes. Another tabular summary, called a relative frequency distribution, shows the fraction, or percentage, of data values in each class. The most common tabular summary of data for two variables is a cross tabulation, a two-variable analogue of a frequency distribution.
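The three tabular summaries described above can be sketched with the standard library. The ten records below are hypothetical and are not the 100-individual data set the excerpt refers to; the variable names are likewise illustrative.

```python
from collections import Counter

# Hypothetical records: (gender, marital status) for ten individuals.
records = [
    ("F", "single"), ("M", "married"), ("F", "married"), ("M", "single"),
    ("F", "married"), ("M", "married"), ("F", "single"), ("F", "married"),
    ("M", "single"), ("M", "married"),
]

# Frequency distribution for one variable: number of values in each class.
status_freq = Counter(status for _, status in records)

# Relative frequency distribution: fraction of values in each class.
n = len(records)
status_rel_freq = {status: count / n for status, count in status_freq.items()}

# Cross tabulation: counts for each combination of the two variables.
crosstab = Counter(records)

print(status_freq)
print(status_rel_freq)
print(crosstab[("F", "married")])
```

Because the classes do not overlap, the frequencies sum to the number of records and the relative frequencies sum to 1.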
eBook - ePub
Mathematics for Engineers
- Georges Fiche, Gerard Hebuterne(Authors)
- 2013(Publication Date)
- Wiley-ISTE
  (Publisher)
Chapter 3
Statistics

In order to formulate any statement about a system, engineers need to acquire as precise a knowledge as possible about it and the environment in which it is to work. Such knowledge cannot result from pure reasoning alone, and has to rely on observation and measurement. In the telecommunications field, for instance, these measurements will allow us to detect a signal, or to estimate flow volume or the quality of service. For example, the spectral analysis of an unknown signal amounts to measuring some of its statistical characteristics, such as average value or covariance, in order to recognize it. Similarly, during field trials, as well as during the normal life of a telecommunications network, we have to estimate traffic levels, response times and loss ratio. Again, for reliability studies, equipment lifetime, system availability, etc. are measured. Similar measurements can be made in the preliminary design step, during simulation studies.

As a consequence, a certain amount of data is collected on the system, and the question is now how to interpret them and to summarize them.

Descriptive Statistics help us to choose the parameters of interest and present the results in a synthetic way – that is, in visualizable form.

Now, exhaustive measurements are clearly impossible to obtain. Mathematical Statistics aim at providing tools to analyze data in order to extract all the possible information from it. For instance, estimation theory intends to evaluate the confidence level to be associated with the prediction of a parameter of interest. Hypothesis testing provides help when making decisions about the population under study, such as comparing two different samples or deciding the conformity of the measurements with a given theoretical distribution function.

Statistics are concerned with a set of elements, called the population. On each of the elements a character
eBook - ePub
Techniques in Human Geography
- Jim Lindsay(Author)
- 2006(Publication Date)
- Routledge
  (Publisher)
Why should we use Statistics? For most geographers there are probably two reasons. The first is that they allow us to attach numerical values to observable relationships between data. Statistical testing provides a way of assigning a level of confidence to our assessment of a relationship. The second reason is that more advanced Statistics (largely beyond the scope of this chapter) allow us to detect and assess relationships which may not be directly observable. At this level the use of Statistics becomes more than simply a means of confirming judgements made non-numerically.

Probability

Statistics is a science of probabilities rather than certainties, and its language reflects this carefully weighted assessment of outcomes. In Statistics we accept or reject hypotheses rather than speak confidently about truth and falsity. Many statistical techniques are concerned with the relationship between individual events or cases and general patterns of behaviour. If a researcher asks one person a series of questions about leisure activities, the results may be interesting but can hardly be used as a basis for generalisation. However if the same researcher then goes on to ask the same questions of another 199 people, general patterns will certainly emerge. Another look at the original case at this stage might show that the individual’s responses turn out to be typical of the population in some ways, but quite distinctive in others. Statistics recognises that the members of a population are similar but not identical. A population can be described both by measures of central tendency which indicate the characteristics most typical of it, and measures of dispersion, which show the range that individual members might occupy.

The theory of probability is concerned with the likelihood that events will have particular outcomes. Certain results can be eliminated entirely. Current knowledge says that it is impossible for anyone reading this book to live for ever, so the probability of that outcome can be set at 0 per cent or 0.0. In the same way, since every person must die, the probability of any individual’s eventual death is 100 per cent or 1.0. But when will it happen? The probability that someone will live to be a centenarian is a small but real one. Insurance companies, for which issues like this are significant, maintain actuarial tables based on real observations which allow them to calculate the probability of this taking place with some confidence for members of a particular population. However to calculate at birth the probability that any person will die at a particular age would be a futile exercise. Far too many factors affect the outcome of the individual case.
eBook - ePub
Teacher-Led Research
Designing and implementing randomised controlled trials and other forms of experimental research
- Richard Churches, Eleanor Dommett(Authors)
- 2016(Publication Date)
- Crown House Publishing
  (Publisher)
Chapter 5
Statistics – here comes the maths

By the end of this chapter, you will know about:
- Using descriptive and inferential Statistics to analyse your data.
- How inferential tests allow you to accept or reject hypotheses.
- Using a power analysis to estimate the sample size you need.
Analysing your data – descriptive Statistics

Once you have collected data from your study you need to analyse it to see if the data supports your experimental hypothesis. The first stage of this, probably best described as a preliminary analysis, requires you to calculate descriptive Statistics. Descriptive Statistics are summary values that describe your data (such as the mean, median, standard deviation and range). The type of descriptive Statistics you use will be dependent on the distribution of your data (see Brain Box 5.1).

Brain Box 5.1. Data distributions

The distribution of data is a description of the data values in a data set. We usually represent distribution as a curve. The easiest way to conceptualise it is to imagine people stacked up in columns underneath the curve, with each column representing a score on your dependent variable as shown in Figure 5.1.

Our curve is bell shaped and symmetrical. We refer to a curve like this as a normal distribution. Certain mathematical values can be understood in terms of this curve. The mean value corresponds to the peak of the curve, while the width of the curve gives an indication of the spread of values from this mean. A wide curve would indicate a greater spread.

Figure 5.1. A normal distribution.

Where a data set does not give a bell-shaped curve – for example, because it is skewed to the left or right (see Figure 5.2) – it cannot be considered to have a normal distribution, so different statistical analyses and descriptions are required.

Figure 5.2. Skewed data compared to normally distributed data.
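The properties of the normal curve described in Brain Box 5.1 can be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library; the means, standard deviations, and the skewed score list are hypothetical values chosen for illustration.

```python
from statistics import NormalDist, mean, median

# Two hypothetical normal distributions: same mean, different spread.
narrow = NormalDist(mu=50, sigma=5)
wide = NormalDist(mu=50, sigma=15)

# Height of each curve at the mean (its peak) and one unit of sigma away.
peak_narrow = narrow.pdf(50)
peak_wide = wide.pdf(50)
left_of_mean = narrow.pdf(45)
right_of_mean = narrow.pdf(55)

print(peak_narrow > left_of_mean)   # the peak sits at the mean
print(peak_narrow > peak_wide)      # a wider curve is flatter: greater spread

# Hypothetical right-skewed scores: one large value drags the mean upward,
# so the mean and the median no longer coincide.
skewed_scores = [40, 42, 45, 47, 50, 52, 95]
skewed_mean = mean(skewed_scores)
skewed_median = median(skewed_scores)
print(skewed_mean, skewed_median)
```

For symmetric, bell-shaped data the mean and median agree closely; a clear gap between them, as in the skewed list, is one simple sign that a normal distribution may not describe the data well.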
eBook - ePub
The Britannica Guide to Statistics and Probability
- Britannica Educational Publishing, Erik Gregersen(Authors)
- 2010(Publication Date)
- Britannica Educational Publishing
  (Publisher)
CHAPTER 1 HISTORY OF STATISTICS AND PROBABILITY
Statistics and probability are the branches of mathematics concerned with the laws governing random events, including the collection, analysis, interpretation, and display of numerical data. Probability has its origin in the study of gambling and insurance in the 17th century, and it is now an indispensable tool of both social and natural sciences. Statistics may be said to have its origin in census counts taken thousands of years ago. As a distinct scientific discipline, however, it was developed in the early 19th century as the study of populations, economies, and moral actions and later in that century as the mathematical tool for analyzing such numbers.

EARLY PROBABILITY

It is astounding that for a subject that has altered how humanity views nature and society, probability had its beginnings in frivolous gambling. How much should you bet on the turn of a card? An entirely new branch of mathematics developed from such questions.

GAMES OF CHANCE

The modern mathematics of chance is usually dated to a correspondence between the French mathematicians Pierre de Fermat and Blaise Pascal in 1654. Their inspiration came from a problem about games of chance, proposed by a remarkably philosophical gambler, the chevalier de Méré. De Méré inquired about the proper division of the stakes when a game of chance is interrupted. Suppose two players, A and B, are playing a three-point game, each having wagered 32 pistoles, and are interrupted after A has two points and B has one. How much should each receive?

Blaise Pascal invented the syringe and created the hydraulic press, an instrument based upon the principle that became known as Pascal’s law. Boyer/Roger Viollet/Getty Images

Fermat and Pascal proposed somewhat different solutions, but they agreed about the numerical answer. Each undertook to define a set of equal or symmetrical cases, then to answer the problem by comparing the number for A with that for B. Fermat, however, gave his answer in terms of the chances, or probabilities. He reasoned that two more games would suffice in any case to determine a victory. There are four possible outcomes, each equally likely in a fair game of chance. A might win twice, AA; or first A then B might win; or B then A; or BB. Of these four sequences, only the last would result in a victory for B. Thus, the odds for A are 3:1, implying a distribution of 48 pistoles for A and 16 pistoles for B.
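Fermat's enumeration can be reproduced in a few lines. The sketch below is an illustration of the counting argument described above, not a reconstruction of either mathematician's own notation; the function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

POINTS_TO_WIN = 3
score_a, score_b = 2, 1   # scores when the game is interrupted
stakes = 2 * 32           # each player wagered 32 pistoles

# At most two more rounds settle the game, so list every equally likely
# two-round sequence: AA, AB, BA, BB.
outcomes = list(product("AB", repeat=2))

def winner(sequence):
    """Return the player who first reaches POINTS_TO_WIN in this sequence."""
    a, b = score_a, score_b
    for point in sequence:
        if point == "A":
            a += 1
        else:
            b += 1
        if a == POINTS_TO_WIN:
            return "A"
        if b == POINTS_TO_WIN:
            return "B"
    raise ValueError("sequence too short to decide the game")

wins_a = sum(1 for seq in outcomes if winner(seq) == "A")
p_a = Fraction(wins_a, len(outcomes))   # probability that A wins

share_a = stakes * p_a
share_b = stakes - share_a
print(p_a, share_a, share_b)   # 3/4 48 16
```

Counting AB as a full two-round sequence, even though A has already won after the first round, is what keeps the four cases equally likely.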
eBook - ePub
An MBA in a Book
Everything You Need to Know to Master Business - In One Book!
- Xander Cansell(Author)
- 2023(Publication Date)
- Arcturus
  (Publisher)
ONE VARIABLE STATISTIC
Statistics allow you to analyze and describe data in different ways.

By accurately describing and analyzing datasets you can determine details and better interpret data. This helps you make better-informed business decisions, or to calculate the likelihood of different future events occurring.

One variable analysis is the simplest way of analyzing statistical data. It describes but does not take into account causes or relationships between data. For example, if you were interested in the scores of students who took an exam, you might be interested in how varied the results are.

You can use single variable Statistics to make summaries of the data, which give you a series of key metrics that measure the performance of the whole group that took the exam. The most basic of these are mean, median and mode. These are sometimes called measures of central tendency.
Mean is the average of all the results – the total of all the exam results added together and divided by the number of students who took the exam.
The median is the result that lies in the middle of all the exam scores when they are in ascending or descending order. This is more useful than the mean when there are outliers in the data that might skew the results (if one student gets 100% but all the others get between 40–50%, the mean will be skewed higher by that single result).

The mode is the most frequently occurring number in the dataset. The mode of the series 5, 2, 5, 3, 2, 2 would be 2 because it occurs more often than any other number.
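The three measures can be computed with Python's built-in `statistics` module. The exam scores below are hypothetical and are chosen to mirror the outlier scenario described above (one student at 100, the rest between 40 and 50); the last line uses the series from the text.

```python
import statistics

# Hypothetical exam scores: one student scores 100, the rest score 40-50.
scores = [40, 42, 45, 45, 48, 50, 100]

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequently occurring score

print(round(mean_score, 2), median_score, mode_score)

# The series from the text: 2 occurs three times, more than any other value.
series_mode = statistics.mode([5, 2, 5, 3, 2, 2])
print(series_mode)
```

Here the single score of 100 pulls the mean above every other student's score, while the median stays at 45, which is why the median is often preferred when outliers are present.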

Examiners compare overall exam results to achieve a mean, while judges at a dog show assess against agreed physical standards.

Standard deviation

Standard deviation is a measure of how spread-out numbers in a dataset are. It is usually represented by the symbol σ (the Greek letter sigma).
The formula for standard deviation is the square root of the variance. Variance in this instance is the average of the squared differences from the mean.
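Following that definition step by step gives the sketch below. The data set is hypothetical, and the calculation divides by the number of values (the population form, matching "the average of the squared differences"); a sample standard deviation would divide by one less than that.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical dataset

mean = sum(data) / len(data)

# Variance: the average of the squared differences from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]
variance = sum(squared_diffs) / len(data)

# Standard deviation: the square root of the variance.
sigma = math.sqrt(variance)

print(mean, variance, sigma)   # 5.0 4.0 2.0

# Cross-check against the standard library's population standard deviation.
library_sigma = statistics.pstdev(data)
```

The manual result should agree with `statistics.pstdev`; `statistics.stdev` would return a slightly larger value because it uses the sample form.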
eBook - ePub
Success at Statistics
A Worktext with Humor
- Fred Pyrczak(Author)
- 2016(Publication Date)
- Routledge
  (Publisher)
Section 1 Why is the Study of Statistics Important?
As humans, we are generally good at recognizing patterns using anecdotal information. When we rely on anecdotal information, we are using our own observations about the world around us. Based on anecdotal information, I can predict that my grandmother is likely to fall asleep if I turn on a movie, my cat will hiss if a stranger reaches to pet her, and I will become very cranky if I do not eat by a certain time. While anecdotal information is useful in many situations, it becomes problematic when we try to draw general conclusions from it because anecdotal experience varies from person to person. Just because my cat hisses at strangers does not mean that all cats do!

Rather than drawing conclusions from anecdotal information, scientists use observations based on anecdotal information as a starting point for a set of procedures referred to as the scientific method. The scientific method is a system that scientists use to acquire new knowledge. Scientists who study psychology, meteorology, biotechnology, and so forth, use these additional steps to reduce the likelihood of drawing incorrect conclusions:

1. Make observations about a phenomenon.

2. Create a hypothesis that might explain the observations.

3. Collect data to challenge the hypothesis.

4. Interpret the data.

5. Draw conclusions that state whether the hypothesis held up under scrutiny.

Statistics is inextricably linked to science because of its crucial role in interpreting data (Step 4). Even when using the scientific method, drawing conclusions from the data can be risky if the scientist does not use the proper techniques for data interpretation. Patterns are sometimes misleading, as we can see in the example that follows.

Example 1

Alex conducted a memory experiment to see if monetary incentives increase performance. He presented students with a list of 20 words and told half of them that they would receive $1 for each word correctly remembered. The other half were told they would earn no money for their memory responses. The results showed that the paid group correctly remembered an average of 9 items, and the unpaid group remembered an average of 6 items.
eBook - ePub
Statistics Explained
- Perry R. Hinton(Author)
- 2014(Publication Date)
- Routledge
  (Publisher)
Note also that calculating Statistics only gives you information. It is up to you how you interpret and use that information. A difference in means, or standard deviations, might be useful information, but that is all. Calculating Statistics will not explain similarities and differences between distributions. What the Statistics do is to provide us with pieces of information we can work with: they are tools to be used for our own purposes. After that we must use our judgement.

Details on how to produce Statistics to describe a set of data using the SPSS computer statistical package can be found in Chapter 3 of Hinton et al. (2014).

SOME IMPORTANT INFORMATION ABOUT NUMBERS

Up to now we have been calculating Statistics using sets of examination results. This is fine as examination results are the types of number that it makes sense to calculate means and other Statistics on. But this is not the case for all types of number. We need to know what type of data we have before we know what Statistics we are able to calculate.

Nominal data

Sometimes numbers are used like names. For example, in a sports squad of 22 players the number 15 on the back of a player’s shirt simply allows us to identify him or her during play. It does not mean that player number 15 is better than players 1 to 14 or worse than players 16 to 22. It is meaningless to calculate Statistics on these numbers as they are only nominal data, used as names.

nominal data

When we use numbers as labels for categories we refer to the data collected as nominal (names). We cannot perform mathematical operations on these numbers: for example if we label the category ‘men’ as 1 and ‘women’ as 2 we cannot add up two men and claim it equals one woman! The data are usually the frequency of responses in each category.

When we categorise someone or something we can use numbers to label the categories. For example, if we classify people by eye colour we might choose to label brown as 1, blue as 2, green as 3 and so on. Notice that the numbers are arbitrarily assigned to colours: we could have chosen other numbers or assigned the same numbers in a different way. The use of these numbers is nominal. We cannot use these numbers to calculate Statistics: it is nonsense to say that the mean of a brown eyed person (1) and a green eyed person (3) is a blue-eyed person (2)!
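The eye-colour example can be made concrete in code. In the sketch below the coding (1 = brown, 2 = blue, 3 = green) follows the text, while the list of observations is hypothetical.

```python
from collections import Counter
import statistics

# Arbitrary numeric labels for eye colour, as in the text.
EYE_COLOUR = {1: "brown", 2: "blue", 3: "green"}

# Hypothetical coded observations for six people.
observations = [1, 3, 1, 2, 1, 3]

# Frequencies per category are the meaningful summary for nominal data.
frequencies = Counter(EYE_COLOUR[code] for code in observations)
print(frequencies)

# The arithmetic runs without error, but the result has no meaning:
# the "mean" of brown (1) and green (3) is 2, the label for blue.
meaningless_mean = statistics.mean([1, 3])
print(meaningless_mean, EYE_COLOUR[meaningless_mean])
```

The software will not stop you from averaging the codes, so knowing the level of measurement is the analyst's responsibility.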

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
