Mathematics

Probability Density Function

A probability density function (PDF) is a function that describes the relative likelihood of a continuous random variable taking on a given value. It is used to model the probability distribution of a continuous random variable and is often represented graphically as a curve. The area under the curve over a specific interval gives the probability that the variable falls within that interval.

Written by Perlego with AI-assistance

12 Key excerpts on "Probability Density Function"

  • Statistics for Process Control Engineers
    eBook - ePub
    • Myke King (Author)
    • 2017 (Publication Date)
    • Wiley (Publisher)
    5 Probability Density Function
    The Probability Density Function (PDF) is a mathematical function that represents the distribution of a dataset. For example, if we were to throw a pair of unbiased six‐sided dice 36 times, we would expect (on average) the distribution of the total score to be that shown by Figure 5.1 . This shows the frequency distribution. To convert it to a probability distribution we divide each frequency by the total number of throws. For example, a total score of 5 would be expected to occur four times in 36 throws and so has a probability of 4/36 (about 0.111 or 11.1%). Figure 5.2 shows the resulting probability distribution.
    Figure 5.1
    Expected frequency of total score from two dice
    Figure 5.2
    Expected distribution
    Throwing dice generates a discrete distribution; in this case the result is restricted to integer values. Probability should not be plotted as a continuous line, since the probability of a non‐integer result is zero. But we can develop an equation for the line. In this case, if x is the total scored, the probability of scoring x is
    P(x) = \frac{6 - \lvert 7 - x \rvert}{36}, \qquad x = 2, 3, \ldots, 12    (5.1)
    Because x is discrete this function is known as the probability mass function (PMF). If the distribution were continuous we can convert the probability distribution to a Probability Density Function (PDF) by dividing the probability by the range over which it applies.
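    To make the dice example concrete, here is a minimal Python sketch (mine, not King's) that enumerates the 36 equally likely outcomes of two dice and checks the counting argument against the PMF reconstructed above as Equation (5.1):

    # Enumerate all 36 equally likely outcomes of two dice and compare the
    # resulting PMF with the closed form (6 - |7 - x|)/36 of Equation (5.1).
    from itertools import product

    counts = {}
    for d1, d2 in product(range(1, 7), repeat=2):
        counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

    for total in range(2, 13):
        by_counting = counts[total] / 36
        by_formula = (6 - abs(7 - total)) / 36
        assert by_counting == by_formula
        print(f"P(total = {total:2d}) = {by_counting:.4f}")  # e.g. P(5) = 4/36 = 0.1111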
    A condition of both functions is that the area they contain must be unity (or 100%) – in other words we are certain to be inside the area. So, in general
    \sum_{x} P(x) = 1 \qquad \text{or} \qquad \int_{-\infty}^{\infty} f(x)\, dx = 1    (5.2)
    Provided this condition is met, any function can be described as a PDF (or PMF). We will show later that there are many functions that have a practical application. Unfortunately, there are many more that appear to have been invented as a mathematical exercise and have yet to be shown to describe any real probability behaviour.
    While the PMF allows us to estimate the probability of x having a certain value, the PDF does not. It only tells us the probability of x falling within a specified range. The probability of x being between a and b is \int_{a}^{b} f(x)\, dx.
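    To illustrate this last point (a sketch of mine, not from the book), the probability that a continuous variable falls in [a, b] is the area under its PDF there; the standard normal PDF is used here purely as an arbitrary example:

    # Integrate a PDF over [a, b] and confirm against the CDF difference.
    from scipy.integrate import quad
    from scipy.stats import norm

    a, b = -1.0, 2.0
    area, _ = quad(norm.pdf, a, b)        # P(a <= X <= b) as the area under the PDF
    via_cdf = norm.cdf(b) - norm.cdf(a)   # the same probability from the CDF
    print(area, via_cdf)                  # both ~0.8186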
  • Probability and Random Processes
    eBook - ePub
    With Applications to Signal Processing and Communications
    • Scott Miller, Donald Childers (Authors)
    • 2004 (Publication Date)
    • Academic Press (Publisher)
    X:
    For discrete random variables, the CDF can be written in terms of the probability mass function defined in Chapter 2. Consider a general random variable, X, which can take on values from the discrete set \{x_1, x_2, x_3, \ldots\}. The CDF for this random variable is
    F_X(x) = \sum_{k \,:\, x_k \le x} P_X(x_k)    (3.6)
    The constraint in this equation can be incorporated using unit step functions, in which case the CDF of a discrete random variable can be written as
    F_X(x) = \sum_{k} P_X(x_k)\, u(x - x_k)    (3.7)
    In conclusion, if we know the PMF of a discrete random variable, we can easily construct its CDF.
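    A minimal sketch of this construction (my illustration, not the authors' code), using a hypothetical four-point PMF and the unit-step form of Equation (3.7):

    # Build the CDF of a discrete random variable from its PMF:
    # F_X(x) = sum_k P_X(x_k) * u(x - x_k), with u the unit step.
    import numpy as np

    xk = np.array([1, 2, 3, 4])            # hypothetical support points
    pmf = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical PMF values (they sum to 1)

    def cdf(x):
        return float(np.sum(pmf * (x >= xk)))  # (x >= xk) plays the role of u(x - x_k)

    print(cdf(0.5), cdf(2.0), cdf(10.0))   # 0.0, 0.3, 1.0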

    3.2 The Probability Density Function

    While the CDF introduced in the last section represents a mathematical tool to statistically describe a random variable, it is often quite cumbersome to work with CDFs. For example, we will see later in this chapter that the most important and commonly used random variable, the Gaussian random variable, has a CDF that cannot be expressed in closed form. Furthermore, it can often be difficult to infer various properties of a random variable from its CDF. To help circumvent these problems, an alternative and often more convenient description known as the Probability Density Function (PDF) is often used.
    DEFINITION 3.2: The Probability Density Function (PDF) of the random variable X evaluated at the point x is
    f_X(x) = \lim_{\varepsilon \to 0} \frac{\Pr(x \le X < x + \varepsilon)}{\varepsilon}    (3.8)
    As the name implies, the Probability Density Function is the probability that the random variable X lies in an infinitesimal interval about the point X = x , normalized by the length of the interval.
    Note that the probability of a random variable falling in an interval can be written in terms of its CDF as specified in Equation 3.4d . For continuous random variables,
    \Pr(x \le X < x + \varepsilon) = F_X(x + \varepsilon) - F_X(x)    (3.9)
    so that
    f_X(x) = \lim_{\varepsilon \to 0} \frac{F_X(x + \varepsilon) - F_X(x)}{\varepsilon} = \frac{dF_X(x)}{dx}    (3.10)
    Hence, it is seen that the PDF of a random variable is the derivative of its CDF. Conversely, the CDF of a random variable can be expressed as the integral of its PDF. This property is illustrated in Figure 3.3 . From the definition of the PDF in Equation 3.8
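    This derivative/integral relationship is easy to verify numerically (a sketch of mine, not the book's): differentiate the standard normal CDF by finite differences and compare with the known PDF:

    # Check numerically that the PDF is the derivative of the CDF.
    import numpy as np
    from scipy.stats import norm

    x = np.linspace(-4, 4, 2001)
    numeric_pdf = np.gradient(norm.cdf(x), x)         # finite-difference dF/dx
    print(np.max(np.abs(numeric_pdf - norm.pdf(x))))  # tiny discretization error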
  • Probability and Random Processes
    eBook - ePub
    With Applications to Signal Processing and Communications
    • Scott Miller, Donald Childers (Authors)
    • 2012 (Publication Date)
    • Academic Press (Publisher)
    While the CDF introduced in the last section represents a mathematical tool to statistically describe a random variable, it is often quite cumbersome to work with CDFs. For example, we will see later in this chapter that the most important and commonly used random variable, the Gaussian random variable, has a CDF which cannot be expressed in closed form. Furthermore, it can often be difficult to infer various properties of a random variable from its CDF. To help circumvent these problems, an alternative and often more convenient description known as the Probability Density Function is often used.
    Definition 3.2 : The PDF of the random variable X evaluated at the point x is
    f_X(x) = \lim_{\varepsilon \to 0} \frac{\Pr(x \le X < x + \varepsilon)}{\varepsilon}    (3.8)
    As the name implies, the PDF is the probability that the random variable X lies in an infinitesimal interval about the point X = x , normalized by the length of the interval.
    Note that the probability of a random variable falling in an interval can be written in terms of its CDF as specified in Equation (3.4d) . For continuous random variables,
    \Pr(x \le X < x + \varepsilon) = F_X(x + \varepsilon) - F_X(x)    (3.9)
    so that
    f_X(x) = \lim_{\varepsilon \to 0} \frac{F_X(x + \varepsilon) - F_X(x)}{\varepsilon} = \frac{dF_X(x)}{dx}    (3.10)
    Hence, it is seen that the PDF of a random variable is the derivative of its CDF. Conversely, the CDF of a random variable can be expressed as the integral of its PDF. This property is illustrated in Figure 3.3 . From the definition of the PDF in Equation (3.8) , it is apparent that the PDF is a nonnegative function although it is not restricted to be less than unity as with the CDF. From the properties of the CDFs, we can also infer several important properties of PDFs. Some properties of PDFs are
    (1) f_X(x) \ge 0    (3.11a)
    (2) f_X(x) = \frac{dF_X(x)}{dx}    (3.11b)
    (3) F_X(x) = \int_{-\infty}^{x} f_X(y)\, dy    (3.11c)
    (4) \int_{-\infty}^{\infty} f_X(x)\, dx = 1    (3.11d)
    (5) \int_{a}^{b} f_X(x)\, dx = \Pr(a < X \le b)    (3.11e)
    Figure 3.3 Relationship between the PDF and CDF of a random variable.
    Example 3.4 Which of the following are valid PDFs?
    (a)–(e): five candidate functions (given as equations in the original text).
    To verify the validity of a potential PDF, we need to verify only that the function is nonnegative and normalized so that the area underneath the function is equal to unity. The function in part (c) takes on negative values, while the function in part (b) is not properly normalized, and therefore these are not valid PDFs. The other three functions are valid PDFs.
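    Both checks are easy to automate; the following is an illustrative sketch (mine, not the authors' code), applied to two hypothetical candidates:

    # A candidate PDF is valid if it is nonnegative and its total area is 1.
    import numpy as np
    from scipy.integrate import quad

    def is_valid_pdf(f, lo, hi):
        nonnegative = np.all(f(np.linspace(lo, hi, 10001)) >= 0)
        area, _ = quad(f, lo, hi)
        return bool(nonnegative) and abs(area - 1.0) < 1e-3

    print(is_valid_pdf(lambda x: np.exp(-x), 0, 50))        # True: exponential density
    print(is_valid_pdf(lambda x: 2.0 * np.exp(-x), 0, 50))  # False: area is 2, not 1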
  • Introduction to Civil Engineering Systems
    eBook - ePub
    A Systems Perspective to the Development of Civil Engineering Facilities
    • Samuel Labi (Author)
    • 2014 (Publication Date)
    • Wiley (Publisher)
    Also, the mathematical relationship between the cumulative (upper-bound) values of the random variable and their corresponding probabilities is termed the cumulative distribution function for discrete variables and the probability distribution function for continuous variables (see Figure 5.10). Figure 5.10 Categories of probability functions with illustrations.
    5.4.1 Probability Mass Functions and Density Functions
    The probability mass function, or pmf, applies to discrete random variables and assigns a probability to each of the countably many values the variable can take. In the case of continuous distributions, the random variable cannot take on exact values with positive probability; as such, P(X = x) = 0 for every real number x. Also, the Probability Density Function (pdf) must be nonnegative and must integrate to unity over the range of the variable.
    5.4.2 Cumulative Distribution Functions and Probability Distribution Functions
    In these functions, we describe the probability that the random variable takes on any value up to a certain upper-bound limit. For a discrete random variable, the cumulative distribution function (or cdf) is given by F(x) = P(X ≤ x) for all x. Also, if a ≤ b, then F(a) ≤ F(b); in other words, every cdf is nondecreasing. For a continuous random variable X, the probability distribution function is given by F(x) = \int_{-\infty}^{x} f(t)\, dt. Again, if a ≤ b then F(a) ≤ F(b), so every probability distribution function is also nondecreasing. This means that F(−∞) = 0 and F(+∞) = 1. The probability that the continuous random variable takes any value between a and b is given by P(a ≤ X ≤ b) = F(b) − F(a).
    5.4.3 Constructing a Probability Function
    As part of the prerequisites for describing, analyzing, predicting, or evaluating any aspect in any phase of engineering systems development, it is often necessary to develop probability mass and density functions, or their corresponding cumulative functions, for some attribute of the system. This can be done using observational data (historical records) or experimentation, as sketched below.
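    As a sketch of that last step (mine, not Labi's), an empirical PMF and cumulative function can be estimated from observational records as relative frequencies and their running sums; the data below are hypothetical:

    # Estimate a PMF and CDF for a discrete attribute from observed records.
    import numpy as np

    observations = np.array([2, 3, 3, 4, 2, 5, 3, 4, 4, 3])  # hypothetical records
    values, counts = np.unique(observations, return_counts=True)
    pmf = counts / counts.sum()    # relative frequencies estimate the PMF
    cdf = np.cumsum(pmf)           # nondecreasing and ends at 1

    for v, p, F in zip(values, pmf, cdf):
        print(f"x = {v}: pmf = {p:.2f}, cdf = {F:.2f}")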
  • A User's Guide to Business Analytics
    P[X = a] = 0. For continuous random variables, the concept of probability must therefore be understood in terms of two-dimensional areas, and not in terms of masses of single points.
    FIGURE 7.1 A Probability Density Function.
    The characterizing function f(x) in the case of a continuous random variable is called its Probability Density Function (PDF), which must satisfy the following conditions.
    (i) f(x) ≥ 0 for all x.
    (ii) \int_{-\infty}^{\infty} f(x)\, dx = 1. If the support of the corresponding random variable is known to be [α, β], then \int_{\alpha}^{\beta} f(x)\, dx = 1. In non-technical language this means that the area bounded by the curve f(x) and the X-axis over the region [α, β] (or whatever may be the endpoints of the region of support of X) must be unity.
    There is one important distinction between the PMF and the PDF. In the discrete case, the PMF f(x) represents an actual probability and must be bounded above by 1. In the case of continuous random variables, f(x) does not represent a probability. Although the total area under the curve is bounded by 1, there is nothing to guarantee that the function f(x) itself must be everywhere bounded by 1 in this case. We will frequently come across perfectly legitimate PDFs where f(x
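    A concrete instance of this point (my example, not the book's): the uniform density on [0, 0.5] takes the value 2 everywhere on its support, yet it is a perfectly legitimate PDF because its total area is still 1.

    # A valid PDF whose values exceed 1: uniform on [0, 0.5] has f(x) = 2.
    from scipy.stats import uniform

    X = uniform(loc=0, scale=0.5)     # support is [0, 0.5]
    print(X.pdf(0.25))                # 2.0 -- larger than 1, and still valid
    print(X.cdf(0.5) - X.cdf(0.0))    # 1.0 -- the total probability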
  • Statistics Using Python
    CHAPTER 5
    PROBABILITY DISTRIBUTIONS
    This chapter provides an overview of some well-known discrete probability distributions as well as continuous probability distributions. You will see Python code samples for various distributions, along with a generated image with a sample output.
    The first section of this chapter starts with an explanation of the PDF, CDF, and PMF, followed by an introduction to discrete probability distributions, such as the Bernoulli, binomial, and Poisson distributions.
    The second section introduces continuous probability distributions, such as the chi-squared, Gaussian, and uniform distributions. This section also discusses non-Gaussian distributions and some of the causes of such distributions.
    In case you have not already done so, please make sure that the NumPy, Matplotlib, and SciPy libraries are installed in your Python environment by launching the following commands from the command line:
    pip3 install numpy matplotlib scipy

    Most of the code samples for generating probability distributions in this chapter were generated via GPT4 (unless indicated otherwise).
    PDF, CDF, AND PMF
    PDF, CDF, and PMF are initialisms for two probability functions and a cumulative distribution function that are important in statistics.
    PDF (Probability Density Function)
    CDF (cumulative distribution function)
    PMF (probability mass function)
    A Probability Density Function (PDF) is a statistical function used in continuous probability distributions to describe the likelihood of a random variable taking on a particular value. Unlike discrete probability distributions, where probabilities are defined for distinct, individual outcomes, continuous distributions involve a range of outcomes. A PDF has nonnegative values and the area under a PDF equals 1. In addition, a CDF can be expressed as an integral of the PDF in the continuous case.
    A PMF is a function that gives the probability of each possible outcome for a discrete random variable. Recall that unlike continuous random variables, which have a PDF, discrete random variables have distinct, individual outcomes, each with a specific probability of occurring. PDFs and CDFs are alternatives to binning data in datasets as a way to avoid binning bias.
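    As a minimal illustration of all three functions (my sketch, not one of the book's generated samples), each is available directly in scipy.stats:

    # Evaluate a PDF and CDF (continuous case) and a PMF (discrete case).
    from scipy.stats import norm, binom

    print(norm.pdf(0.0))              # PDF of N(0, 1) at 0: about 0.3989
    print(norm.cdf(1.96))             # CDF of N(0, 1): about 0.975
    print(binom.pmf(7, n=10, p=0.5))  # PMF: P(7 heads in 10 fair tosses), about 0.117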
  • Econometrics For Dummies
    • Roberto Pedace (Author)
    • 2013 (Publication Date)
    • For Dummies (Publisher)
    “Looking at all possibilities: Probability Density Function [PDF]” for details).
    If you’re observing a continuous random variable, the CDF can be described in a function or graph. The function shows how the random variable behaves over any possible range of values. In Figure 2-5 , I display the CDF for a normally distributed random variable.
    The precise shape of the CDF depends on the mean and variance (the square of the standard deviation) of your random variable. A smaller mean shifts the curve to the left, and a larger mean shifts the curve to the right. A smaller variance makes the curve steeper, whereas a larger variance makes the curve flatter.
    Figure 2-5: A graphical depiction of the cumulative distribution function for a normally distributed random variable.
    Putting variable information together: Bivariate or joint probability density
    Because one primary objective of econometrics is to examine relationships between variables, you need to be familiar with probabilities that combine information on two variables.
    A bivariate or joint probability density provides the relative frequencies (or chances) that events with more than one random variable will occur. Generally, this information is shown in a table.
    For two random variables, X and Y, you're already familiar with the notation for joint probabilities from your statistics class, which uses the intersection operator, ∩, like this: P(X = a ∩ Y = b).
    The variables a and b are possible values for the random variable. However, in econometrics, you likely need to become familiar with this mathematical notation for joint probabilities: f (X , Y ). In this notation, the comma is used instead of the intersection operator.
    In Table 2-5 , I provide an example of a joint probability table for random variables X and Y . The column headings in the middle of the first row list the X values (1, 2, and 3), and the first column lists the Y values (1, 2, 3, and 4). The values contained in the middle of Table 2-5 represent the joint or intersection probabilities. For example, the probability X equals 3 (see column 3) and Y equals 2 (row 2) is 0.10. In your econometrics class, the mathematical notation used to express this is likely to look like f (X = 3, Y
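    A sketch of such a joint table in code (my illustration; the entries are hypothetical rather than those of Table 2-5, except that the X = 3, Y = 2 cell is set to the 0.10 quoted above):

    # A joint probability table for X in {1, 2, 3} and Y in {1, 2, 3, 4},
    # stored as a 2-D array; entries are nonnegative and sum to 1.
    import numpy as np

    joint = np.array([
        [0.05, 0.10, 0.05],   # Y = 1
        [0.10, 0.10, 0.10],   # Y = 2
        [0.05, 0.15, 0.10],   # Y = 3
        [0.05, 0.10, 0.05],   # Y = 4
    ])
    assert abs(joint.sum() - 1.0) < 1e-12

    print(joint[1, 2])        # f(X = 3, Y = 2) = 0.10
    print(joint.sum(axis=0))  # marginal distribution of X
    print(joint.sum(axis=1))  # marginal distribution of Y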
  • Elements of Simulation
    The nature of F(x) determines the type of random variable in question, and we shall normally specify random variables by defining their distribution, which in turn provides us with F(x). If F(x) is a step function we say that X is a discrete random variable, while if F(x) is a continuous function of x then we say that X is a continuous random variable. Certain variables, called mixed random variables, may be expressed in terms of both discrete and continuous random variables, as is the case of the waiting-time experienced by cars approaching traffic lights; with a certain probability the lights are green, and the waiting-time may then be zero, but otherwise if the lights are red the waiting-time may be described by a continuous random variable. Mixed random variables are easily dealt with and we shall not consider them further here. Examples of many common c.d.f.’s are given later.
    2.3 The Probability Density Function (p.d.f.)
    When F(x) is a continuous function of x, with a continuous first derivative, then f(x) = dF(x)/dx is called the Probability Density Function of the (continuous) random variable X. If F(x) is continuous but has a first derivative that is not continuous at a finite number of points, then we can still define the Probability Density Function as above, but for uniqueness we set f(x) = 0, for instance, where dF(x)/dx does not exist; an example of this is provided by the c.d.f. of the random variable Y of Exercise 2.25. The p.d.f. has the following properties:
    (i) f(x) ≥ 0
    (ii) \int_{-\infty}^{\infty} f(x)\, dx = 1
    (iii) \Pr(a < X < b) = \Pr(a \le X < b) = \Pr(a < X \le b) = \Pr(a \le X \le b) = \int_{a}^{b} f(t)\, dt
    EXAMPLE 2.1
    Under what conditions on the constants α, β, γ can the following functions be a p.d.f.?
    g(x) =
    \begin{cases}
    e^{-\alpha x}(\beta + \gamma x) & \text{for } x \ge 0 \\
    0 & \text{for } x < 0
    \end{cases}
    We must verify that g(x) is non-negative, and that \int_{-\infty}^{\infty} g(x)\, dx = 1. If α ≤ 0, this integral cannot be finite, and so we must have α > 0.
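    Assuming the reconstruction of g(x) above, the normalization can be completed as a worked calculation (my continuation, not the book's text). For α > 0 we have \int_0^{\infty} e^{-\alpha x}\, dx = 1/\alpha and \int_0^{\infty} x e^{-\alpha x}\, dx = 1/\alpha^2, so

    \int_{-\infty}^{\infty} g(x)\, dx
      = \beta \int_{0}^{\infty} e^{-\alpha x}\, dx + \gamma \int_{0}^{\infty} x\, e^{-\alpha x}\, dx
      = \frac{\beta}{\alpha} + \frac{\gamma}{\alpha^{2}} = 1,

    together with β ≥ 0 and γ ≥ 0 (not both zero) so that g(x) is non-negative.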
  • Measurement, Data Analysis, and Sensor Fundamentals for Engineering and Science
    • Patrick F. Dunn (Author)
    • 2019 (Publication Date)
    • CRC Press (Publisher)
    This describes the probability of the number of successful outcomes, n, in N repeated trials, given that only success (with probability P) or failure (with probability Q = 1 − P) is possible. The binomial Probability Density Function, for example, describes the probability of rolling a given number of sixes in repeated throws of a die, or of getting a particular number of heads and tails for a series of coin tosses. The Poisson Probability Density Function models the probability of rarely occurring events. It can be derived from the binomial Probability Density Function. Two examples of processes that can be modeled by the Poisson Probability Density Function are the number of disintegrative emissions from an isotope and the number of micrometeoroid impacts on a spacecraft. Although the outcomes of these processes are discrete whole numbers, the process is considered continuous because of the very large number of events considered. This essentially amounts to possible outcomes that span a large, continuous range of whole numbers. Other Probability Density Functions are for continuous processes. The most common one is the normal (Gaussian) Probability Density Function. Many situations closely follow a normal distribution, such as the times of runners finishing a marathon, the scores on an exam for a very large class, and the IQs of everyone without a college degree (or with one). The Weibull Probability Density Function is used to determine the probability of component-failure times resulting from fatigue. The lognormal Probability Density Function is similar to the normal Probability Density Function but considers its variable to be related to the logarithm of another variable. The diameters of raindrops are lognormally distributed, as are the populations of various biological systems.
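    For concreteness (an illustrative sketch of mine, not Dunn's), both discrete families are available in scipy.stats:

    # Binomial: n successes in N trials; Poisson: counts of rare events.
    from scipy.stats import binom, poisson

    print(binom.pmf(2, n=12, p=1/6))  # P(exactly 2 sixes in 12 die rolls), about 0.296
    print(poisson.pmf(0, mu=1.5))     # P(no events) when the mean count is 1.5, about 0.223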
  • Introduction to Stochastic Processes and Simulation
    • Gerard-Michel Cochard (Author)
    • 2019 (Publication Date)
    • Wiley-ISTE (Publisher)
    dx as small as we like:
    p(x): probability of choosing point M between x and x + dx
    It is clear that p(x) must be proportional to dx because the larger dx is, the greater the probability of finding M in this interval. This means that p(x) is of the form p(x) = f(x)dx, where the function f(x) is called probability density. The constraints on this function are as follows:
    • p(x) ≥ 0, so f(x) ≥ 0;
    • f is continuous on the interval (a, b);
    • the sum of all the probabilities is equal to 1.
    Figure 2.3.
    Probability density
    It is clear, in the case of choosing a point on a segment, that f(x) = C (where C is a constant), because the choice of M does not depend on its position (Figure 2.4). In this case, normalization over (a, b) gives C = 1/(b − a):
    Figure 2.4 . Uniform distribution
    Figure 2.4 expresses the probability density of the uniform distribution. This is the simplest of the probability laws for a continuous variable. We will examine the others later on, including normal distribution (below) and exponential distribution (in a later chapter).

    2.2. Mean, variance, standard deviation

    Given a random variable X associated with a law of probability, we define the mean or mathematical expectation E(X) by:
    • E(X) = \sum_{i=1}^{n} p_i x_i if the variable X is discrete and takes n values x_i with p_i = p(X = x_i);
    • E(X) = \int_{a}^{b} x f(x)\, dx if the variable X is continuous with a probability density f(x) > 0 on [a, b].
    E(X) is the arithmetic mean of X weighted by the probabilities of each value. In what follows, it will sometimes be called simply m or μ
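    A short numerical sketch of both definitions (mine, not Cochard's), using a hypothetical three-point discrete variable and a uniform density:

    # E(X) as a probability-weighted sum (discrete) and as an integral (continuous).
    import numpy as np
    from scipy.integrate import quad

    x = np.array([1, 2, 3])        # hypothetical values
    p = np.array([0.2, 0.5, 0.3])  # their probabilities
    print(np.sum(p * x))           # E(X) = 2.1

    a, b = 2.0, 6.0                # uniform density f(x) = 1/(b - a) on [a, b]
    mean, _ = quad(lambda t: t / (b - a), a, b)
    print(mean)                    # E(X) = (a + b)/2 = 4.0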
  • Environmental Data Analysis with MatLab
    • William Menke, Joshua Menke (Authors)
    • 2016 (Publication Date)
    • Academic Press (Publisher)
    be anything in the range from −∞ to +∞ with equal probability.
    The second is the Normal Probability Density Function :
    p(d) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(d - \bar{d})^2}{2\sigma^2} \right\}    (3.5)
    The constants have been chosen so that the Probability Density Function, when integrated over the range −∞ < d < +∞, has unit area, and so that its mean is d̄ and its variance is σ². Not only is the Normal curve centered at the mean, but it is peaked at the mean and symmetric about the mean (Figure 3.8). Thus, both its mode and median are equal to its mean, d̄. The probability, P, enclosed by the interval d̄ ± nσ (where n is an integer) is given by the following table:
    n    P, %
    1    68.27
    2    95.45
    3    99.73
        (3.6)
    Figure 3.8 Examples of Normal Probability Density Functions. (Left) Normal Probability Density Functions with the same variance (σ² = 5²) but different means. (Right) Normal Probability Density Functions with the same mean (20) but different variances. MatLab scripts eda03_06 and eda03_07. (See Note 3.2.)
    It is easy to see why the Normal Probability Density Function is seen as an attractive one with which to model noisy observations. The typical observation will be near its mean, d̄, which is equal to the mode and median. Most of the probability (99.73%) is concentrated within ±3σ
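    The tabulated values are easy to reproduce (an illustrative sketch, not the authors' MatLab scripts); for a Normal distribution the enclosed probability depends only on n:

    # Probability enclosed by mean ± n*sigma for a Normal distribution.
    from scipy.stats import norm

    for n in (1, 2, 3):
        P = norm.cdf(n) - norm.cdf(-n)         # standard normal; mean and sigma cancel
        print(f"n = {n}: P = {100 * P:.2f}%")  # 68.27%, 95.45%, 99.73%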
  • Statistical Methods in the Atmospheric Sciences
    • Daniel S. Wilks (Author)
    • 2005 (Publication Date)
    • Academic Press (Publisher)
    Figure 4.2 is an example of the superposition of a Poisson probability distribution function on the histogram of observed annual numbers of tornados in New York state.
    The procedure for superimposing a continuous PDF on a histogram is entirely analogous. The fundamental constraint is that the integral of any Probability Density Function, over the full range of the random variable, must be one. That is, Equation 4.17 is satisfied by all Probability Density Functions. One approach to matching the histogram and the density function is to rescale the density function. The proper scaling factor is obtained by computing the area occupied collectively by all the bars in the histogram plot. Denoting this area as A, it is easy to see that multiplying the fitted density function f(x) by A produces a curve whose area is also A because, as a constant, A can be taken out of the integral:

    \int A f(x)\, dx = A \int f(x)\, dx = A \cdot 1 = A.

    Note that it is also possible to rescale the histogram heights so that the total area contained in the bars is 1. This latter approach is more traditional in statistics, since the histogram is regarded as an estimate of the density function.
    EXAMPLE 4.11 Superposition of PDFs onto a Histogram
    Figure 4.15 illustrates the procedure of superimposing fitted distributions and a histogram, for the 1933–1982 January precipitation totals at Ithaca from Table A.2 . Here n = 50 years of data, and the bin width for the histogram (consistent with Equation 3.12 ) is 0.5 in., so the area occupied by the histogram rectangles is A = (50)(0.5) = 25. Superimposed on this histogram are PDFs for the gamma distribution fit using Equation 4.41 or 4.43a (solid curve), and the Gaussian distribution fit by matching the sample and distribution moments (dashed curve). In both cases the PDFs (Equations 4.38 and 4.23 , respectively) have been multiplied by 25 so that their areas are equal to that of the histogram. It is clear that the symmetrical Gaussian distribution is a poor choice for representing these positively skewed precipitation data, since too little probability is assigned to the largest precipitation amounts and nonnegligible probability is assigned to impossible negative precipitation amounts. The gamma distribution represents these data much more closely, and provides a quite plausible summary of the year-to-year variations in the data. The fit appears to be worst for the 0.75 in. −1.25 in. and 1.25 in. −1.75 in. bins, although this easily could have resulted from sampling variations. This same data set will also be used in Section 5.2.5
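    A sketch of the superposition procedure in Python (mine, not the book's scripts; the data are synthetic and the gamma fit below is maximum likelihood rather than the moment/quantile fits of Equations 4.41 and 4.43a):

    # Superimpose a fitted gamma PDF on a histogram by rescaling the PDF
    # with the histogram area A = n * bin_width.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import gamma

    rng = np.random.default_rng(0)
    data = rng.gamma(shape=2.0, scale=1.0, size=50)  # synthetic positive data

    bin_width = 0.5
    bins = np.arange(0.0, data.max() + bin_width, bin_width)
    plt.hist(data, bins=bins, edgecolor="black")

    A = len(data) * bin_width                         # area occupied by the histogram bars
    shape, loc, scale = gamma.fit(data, floc=0)       # ML fit with location fixed at 0
    x = np.linspace(0.01, data.max(), 200)
    plt.plot(x, A * gamma.pdf(x, shape, loc, scale))  # PDF rescaled to the same area
    plt.show()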