Merged
52 changes: 24 additions & 28 deletions Doc/library/statistics.rst
@@ -479,7 +479,7 @@ measurements as a single entity.

Normal distributions arise from the `Central Limit Theorem
<https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
of applications in statistics, including simulations and hypothesis testing.
of applications in statistics.

.. class:: NormalDist(mu=0.0, sigma=1.0)

@@ -492,19 +492,19 @@ of applications in statistics, including simulations and hypothesis testing.

.. attribute:: mean

A read-only property representing the `arithmetic mean
A read-only property for the `arithmetic mean
<https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal
distribution.

.. attribute:: stdev

A read-only property representing the `standard deviation
A read-only property for the `standard deviation
<https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal
distribution.

.. attribute:: variance

A read-only property representing the `variance
A read-only property for the `variance
<https://en.wikipedia.org/wiki/Variance>`_ of a normal
distribution. Equal to the square of the standard deviation.

@@ -562,8 +562,8 @@ of applications in statistics, including simulations and hypothesis testing.
Dividing a constant by an instance of :class:`NormalDist` is not supported.
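
As a minimal sketch of this asymmetry, assuming only the standard-library :class:`NormalDist` (the names ``X``, ``y``, ``scaled``, and ``reverse_supported`` are illustrative): dividing an instance by a constant rescales it, while dividing a constant by an instance raises :exc:`TypeError`.

```python
from statistics import NormalDist

X = NormalDist(100, 15)
y = 10

# Dividing a distribution by a constant rescales both mu and sigma.
scaled = X / y
print(scaled)

# The reverse operation (constant / distribution) is not defined.
try:
    y / X
except TypeError:
    reverse_supported = False
else:
    reverse_supported = True
print(reverse_supported)
```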

Since normal distributions arise from additive effects of independent
variables, it is possible to `add and subtract two normally distributed
random variables
variables, it is possible to `add and subtract two independent normally
distributed random variables
<https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
represented as instances of :class:`NormalDist`. For example:

@@ -585,15 +585,15 @@ of applications in statistics, including simulations and hypothesis testing.

For example, given `historical data for SAT exams
<https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
are normally distributed with a mean of 1060 and standard deviation of 195,
are normally distributed with a mean of 1060 and a standard deviation of 195,
determine the percentage of students with scores between 1100 and 1200:

.. doctest::

>>> sat = NormalDist(1060, 195)
>>> fraction = sat.cdf(1200) - sat.cdf(1100)
>>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
>>> f'{fraction * 100 :.1f}% score between 1100 and 1200'
'18.2% score between 1100 and 1200'
'18.4% score between 1100 and 1200'
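
The half-point adjustment treats each whole-number score as covering the interval half a point to either side, which is commonly called a continuity correction. A minimal sketch comparing the two calculations, using the ``NormalDist(1060, 195)`` parameters from the doctest above (the names ``plain`` and ``corrected`` are illustrative):

```python
from statistics import NormalDist

sat = NormalDist(1060, 195)

# Treat the score as continuous: area between the two bounds as given.
plain = sat.cdf(1200) - sat.cdf(1100)

# Treat scores as rounded to whole numbers: widen each bound by half a point.
corrected = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)

print(f'{plain * 100:.1f}% without the correction')
print(f'{corrected * 100:.1f}% with the correction')
```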

To estimate the distribution for a model that isn't easy to solve
analytically, :class:`NormalDist` can generate input samples for a `Monte
@@ -612,20 +612,12 @@ model:

Normal distributions commonly arise in machine learning problems.

Wikipedia has a `nice example with a Naive Bayesian Classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge
is to guess a person's gender from measurements of normally distributed
features including height, weight, and foot size.
Wikipedia has a `nice example of a Naive Bayesian Classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge is to
predict a person's gender from measurements of normally distributed features
including height, weight, and foot size.

The `prior probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of
being male or female is 50%:

.. doctest::

>>> prior_male = 0.5
>>> prior_female = 0.5

We also have a training dataset with measurements for eight people. These
We're given a training dataset with measurements for eight people. The
measurements are assumed to be normally distributed, so we summarize the data
with :class:`NormalDist`:

@@ -638,28 +630,32 @@ with :class:`NormalDist`:
>>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
>>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

We observe a new person whose feature measurements are known but whose gender
is unknown:
Next, we encounter a new person whose feature measurements are known but whose
gender is unknown:

.. doctest::

>>> ht = 6.0 # height
>>> wt = 130 # weight
>>> fs = 8 # foot size

The posterior is the product of the prior times each likelihood of a
feature measurement given the gender:
Starting with a 50% `prior probability
<https://en.wikipedia.org/wiki/Prior_probability>`_ of being male or female,
we compute the posterior as the prior times the product of likelihoods for the
feature measurements given the gender:

.. doctest::

>>> prior_male = 0.5
>>> prior_female = 0.5
>>> posterior_male = (prior_male * height_male.pdf(ht) *
... weight_male.pdf(wt) * foot_size_male.pdf(fs))

>>> posterior_female = (prior_female * height_female.pdf(ht) *
... weight_female.pdf(wt) * foot_size_female.pdf(fs))
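
The same calculation can be sketched end to end with a single feature. This simplified version is an assumption-laden illustration, not the documented example: it uses only the foot-size samples shown above and leaves out height and weight. It also compares the two unnormalized posteriors to pick a label; ``prediction`` is an illustrative name.

```python
from statistics import NormalDist

# Foot-size training data from the example above.
foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

fs = 8  # foot size of the new person

prior_male = 0.5
prior_female = 0.5

# Unnormalized posteriors: prior times the likelihood of the observation.
posterior_male = prior_male * foot_size_male.pdf(fs)
posterior_female = prior_female * foot_size_female.pdf(fs)

# Pick whichever posterior is larger.
prediction = 'male' if posterior_male > posterior_female else 'female'
print(prediction)
```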

The final prediction is awarded to the largest posterior -- this is known as
the `maximum a posteriori
The final prediction goes to the largest posterior. This is known as the
`maximum a posteriori
<https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:

.. doctest::
3 changes: 2 additions & 1 deletion Lib/test/test_statistics.py
@@ -2123,6 +2123,7 @@ def test_pdf(self):
0.3605, 0.3589, 0.3572, 0.3555, 0.3538,
]):
self.assertAlmostEqual(Z.pdf(x / 100.0), px, places=4)
self.assertAlmostEqual(Z.pdf(-x / 100.0), px, places=4)
# Error case: variance is zero
Y = NormalDist(100, 0)
with self.assertRaises(statistics.StatisticsError):
@@ -2200,7 +2201,7 @@ def test_translation_and_scaling(self):
self.assertEqual(X * y, NormalDist(1000, 150)) # __mul__
self.assertEqual(y * X, NormalDist(1000, 150)) # __rmul__
self.assertEqual(X / y, NormalDist(10, 1.5)) # __truediv__
with self.assertRaises(TypeError):
with self.assertRaises(TypeError): # __rtruediv__
y / X

def test_equality(self):