	bpo-36018: Add the NormalDist class to the statistics module (GH-11973)
parent 64d6cc826d
commit 11c7953165

5 changed files with 556 additions and 1 deletion
		|  | @ -467,6 +467,201 @@ A single exception is defined: | |||
| 
 | ||||
|    Subclass of :exc:`ValueError` for statistics-related exceptions. | ||||
| 
 | ||||
| 
 | ||||
| :class:`NormalDist` objects | ||||
| =========================== | ||||
| 
 | ||||
| A :class:`NormalDist` is a composite class that treats the mean and standard | ||||
| deviation of data measurements as a single entity.  It is a tool for creating | ||||
| and manipulating normal distributions of a random variable. | ||||
| 
 | ||||
| Normal distributions arise from the `Central Limit Theorem | ||||
| <https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range | ||||
| of applications in statistics, including simulations and hypothesis testing. | ||||
| 
 | ||||
| .. class:: NormalDist(mu=0.0, sigma=1.0) | ||||
| 
 | ||||
|     Returns a new *NormalDist* object where *mu* represents the `arithmetic | ||||
|     mean <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of data and *sigma* | ||||
|     represents the `standard deviation | ||||
|     <https://en.wikipedia.org/wiki/Standard_deviation>`_ of the data. | ||||
| 
 | ||||
|     If *sigma* is negative, raises :exc:`StatisticsError`. | ||||
| 
 | ||||
|     .. attribute:: mu | ||||
| 
 | ||||
|         The mean of a normal distribution. | ||||
| 
 | ||||
|     .. attribute:: sigma | ||||
| 
 | ||||
|         The standard deviation of a normal distribution. | ||||
| 
 | ||||
|     .. attribute:: variance | ||||
| 
 | ||||
|        A read-only property representing the `variance | ||||
|        <https://en.wikipedia.org/wiki/Variance>`_ of a normal | ||||
|        distribution. Equal to the square of the standard deviation. | ||||
| 
 | ||||
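|     For example, an illustrative sketch of construction and attribute access | ||||
|     (the parameter values here are arbitrary): | ||||
| 
 | ||||
|     .. doctest:: | ||||
| 
 | ||||
|         >>> nd = NormalDist(2, 4) | ||||
|         >>> nd | ||||
|         NormalDist(mu=2, sigma=4) | ||||
|         >>> nd.mu, nd.sigma, nd.variance | ||||
|         (2, 4, 16.0) | ||||
| 
 | ||||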
|     .. classmethod:: NormalDist.from_samples(data) | ||||
| 
 | ||||
|        Class method that makes a normal distribution instance | ||||
|        from sample data.  The *data* can be any :term:`iterable` | ||||
|        and should consist of values that can be converted to type | ||||
|        :class:`float`. | ||||
| 
 | ||||
|        If *data* does not contain at least two elements, raises | ||||
|        :exc:`StatisticsError` because it takes at least one point to estimate | ||||
|        a central value and at least two points to estimate dispersion. | ||||
| 
 | ||||
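|        For example, an illustrative sketch (the data are chosen so the | ||||
|        result is exact): | ||||
| 
 | ||||
|        .. doctest:: | ||||
| 
 | ||||
|            >>> NormalDist.from_samples([1.0, 2.0, 3.0]) | ||||
|            NormalDist(mu=2.0, sigma=1.0) | ||||
| 
 | ||||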
|     .. method:: NormalDist.samples(n, seed=None) | ||||
| 
 | ||||
|        Generates *n* random samples for a given mean and standard deviation. | ||||
|        Returns a :class:`list` of :class:`float` values. | ||||
| 
 | ||||
|        If *seed* is given, creates a new instance of the underlying random | ||||
|        number generator.  This is useful for creating reproducible results, | ||||
|        even in a multi-threading context. | ||||
| 
 | ||||
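|        For example, an illustrative check that equal seeds reproduce the | ||||
|        same samples: | ||||
| 
 | ||||
|        .. doctest:: | ||||
| 
 | ||||
|            >>> X = NormalDist(100, 15) | ||||
|            >>> X.samples(3, seed=8675309) == X.samples(3, seed=8675309) | ||||
|            True | ||||
| 
 | ||||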
|     .. method:: NormalDist.pdf(x) | ||||
| 
 | ||||
|        Using a `probability density function (pdf) | ||||
|        <https://en.wikipedia.org/wiki/Probability_density_function>`_, | ||||
|        compute the relative likelihood that a random sample *X* will be near | ||||
|        the given value *x*.  Mathematically, it is the ratio ``P(x <= X < | ||||
|        x+dx) / dx``. | ||||
| 
 | ||||
|        Note that the relative likelihood of *x* can be greater than `1.0`.  The | ||||
|        probability for a specific point on a continuous distribution is `0.0`, | ||||
|        so the :func:`pdf` is used instead.  It gives the probability of a | ||||
|        sample occurring in a narrow range around *x*, divided by the width of | ||||
|        the range (hence the word "density"). | ||||
| 
 | ||||
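|        For example, an illustrative sketch showing that a narrow | ||||
|        distribution can have a density above `1.0` and that :func:`pdf` | ||||
|        agrees with a finite-difference estimate from :func:`cdf`: | ||||
| 
 | ||||
|        .. doctest:: | ||||
| 
 | ||||
|            >>> NormalDist(0, 0.1).pdf(0) > 1.0 | ||||
|            True | ||||
|            >>> X = NormalDist() | ||||
|            >>> dx = 2.0 ** -20 | ||||
|            >>> abs(X.pdf(0) - (X.cdf(dx) - X.cdf(0)) / dx) < 1e-4 | ||||
|            True | ||||
| 
 | ||||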
|     .. method:: NormalDist.cdf(x) | ||||
| 
 | ||||
|        Using a `cumulative distribution function (cdf) | ||||
|        <https://en.wikipedia.org/wiki/Cumulative_distribution_function>`_, | ||||
|        compute the probability that a random sample *X* will be less than or | ||||
|        equal to *x*.  Mathematically, it is written ``P(X <= x)``. | ||||
| 
 | ||||
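|        For example, with a standard normal distribution, half of the | ||||
|        probability lies at or below the mean (an illustrative check): | ||||
| 
 | ||||
|        .. doctest:: | ||||
| 
 | ||||
|            >>> NormalDist().cdf(0) | ||||
|            0.5 | ||||
| 
 | ||||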
|     Instances of :class:`NormalDist` support addition, subtraction, | ||||
|     multiplication and division by a constant.  These operations | ||||
|     are used for translation and scaling.  For example: | ||||
| 
 | ||||
|     .. doctest:: | ||||
| 
 | ||||
|         >>> temperature_february = NormalDist(5, 2.5)             # Celsius | ||||
|         >>> temperature_february * (9/5) + 32                     # Fahrenheit | ||||
|         NormalDist(mu=41.0, sigma=4.5) | ||||
| 
 | ||||
|     Dividing a constant by an instance of :class:`NormalDist` is not supported. | ||||
| 
 | ||||
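|     Attempting that division raises :exc:`TypeError` because the reciprocal | ||||
|     of a normally distributed variable is not normally distributed.  An | ||||
|     illustrative sketch: | ||||
| 
 | ||||
|     .. doctest:: | ||||
| 
 | ||||
|         >>> 2.0 / NormalDist(5, 2.5) | ||||
|         Traceback (most recent call last): | ||||
|           ... | ||||
|         TypeError: unsupported operand type(s) for /: 'float' and 'NormalDist' | ||||
| 
 | ||||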
|     Since normal distributions arise from additive effects of independent | ||||
|     variables, it is possible to `add and subtract two normally distributed | ||||
|     random variables | ||||
|     <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_ | ||||
|     represented as instances of :class:`NormalDist`.  For example: | ||||
| 
 | ||||
|     .. doctest:: | ||||
| 
 | ||||
|         >>> birth_weights = NormalDist.from_samples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5]) | ||||
|         >>> drug_effects = NormalDist(0.4, 0.15) | ||||
|         >>> combined = birth_weights + drug_effects | ||||
|         >>> f'mu={combined.mu :.1f}   sigma={combined.sigma :.1f}' | ||||
|         'mu=3.1   sigma=0.5' | ||||
| 
 | ||||
|     .. versionadded:: 3.8 | ||||
| 
 | ||||
| 
 | ||||
| :class:`NormalDist` Examples and Recipes | ||||
| ---------------------------------------- | ||||
| 
 | ||||
| A :class:`NormalDist` readily solves classic probability problems. | ||||
| 
 | ||||
| For example, given `historical data for SAT exams | ||||
| <https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores | ||||
| are normally distributed with a mean of 1060 and a standard deviation of 195, | ||||
| determine the percentage of students with scores between 1100 and 1200: | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|     >>> sat = NormalDist(1060, 195) | ||||
|     >>> fraction = sat.cdf(1200) - sat.cdf(1100) | ||||
|     >>> f'{fraction * 100 :.1f}% score between 1100 and 1200' | ||||
|     '18.2% score between 1100 and 1200' | ||||
| 
 | ||||
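| By symmetry, exactly half of the scores fall at or below the mean, which | ||||
| gives a quick sanity check (illustrative): | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|     >>> sat.cdf(1060) | ||||
|     0.5 | ||||
| 
 | ||||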
| To estimate the distribution for a model that isn't easy to solve | ||||
| analytically, :class:`NormalDist` can generate input samples for a `Monte | ||||
| Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_ of the | ||||
| model: | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|     >>> n = 100_000 | ||||
|     >>> X = NormalDist(350, 15).samples(n) | ||||
|     >>> Y = NormalDist(47, 17).samples(n) | ||||
|     >>> Z = NormalDist(62, 6).samples(n) | ||||
|     >>> model_simulation = [x * y / z for x, y, z in zip(X, Y, Z)] | ||||
|     >>> NormalDist.from_samples(model_simulation)           # doctest: +SKIP | ||||
|     NormalDist(mu=267.6516398754636, sigma=101.357284306067) | ||||
| 
 | ||||
| Normal distributions commonly arise in machine learning problems. | ||||
| 
 | ||||
| Wikipedia has a `nice example of a Naive Bayesian Classifier | ||||
| <https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_.  The challenge | ||||
| is to guess a person's gender from measurements of normally distributed | ||||
| features including height, weight, and foot size. | ||||
| 
 | ||||
| The `prior probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of | ||||
| being male or female is 50%: | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|     >>> prior_male = 0.5 | ||||
|     >>> prior_female = 0.5 | ||||
| 
 | ||||
| We also have a training dataset with measurements for eight people.  These | ||||
| measurements are assumed to be normally distributed, so we summarize the data | ||||
| with :class:`NormalDist`: | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|     >>> height_male = NormalDist.from_samples([6, 5.92, 5.58, 5.92]) | ||||
|     >>> height_female = NormalDist.from_samples([5, 5.5, 5.42, 5.75]) | ||||
|     >>> weight_male = NormalDist.from_samples([180, 190, 170, 165]) | ||||
|     >>> weight_female = NormalDist.from_samples([100, 150, 130, 150]) | ||||
|     >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10]) | ||||
|     >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9]) | ||||
| 
 | ||||
| We observe a new person whose feature measurements are known but whose gender | ||||
| is unknown: | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|     >>> ht = 6.0        # height | ||||
|     >>> wt = 130        # weight | ||||
|     >>> fs = 8          # foot size | ||||
| 
 | ||||
| The posterior is the product of the prior times each likelihood of a | ||||
| feature measurement given the gender: | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|    >>> posterior_male = (prior_male * height_male.pdf(ht) * | ||||
|    ...                   weight_male.pdf(wt) * foot_size_male.pdf(fs)) | ||||
| 
 | ||||
|    >>> posterior_female = (prior_female * height_female.pdf(ht) * | ||||
|    ...                     weight_female.pdf(wt) * foot_size_female.pdf(fs)) | ||||
| 
 | ||||
| The final prediction is awarded to the largest posterior -- this is known as | ||||
| the `maximum a posteriori | ||||
| <https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP: | ||||
| 
 | ||||
| .. doctest:: | ||||
| 
 | ||||
|   >>> 'male' if posterior_male > posterior_female else 'female' | ||||
|   'female' | ||||
| 
 | ||||
| 
 | ||||
| .. | ||||
|    # This modeline must appear within the last ten lines of the file. | ||||
|    kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8; | ||||
|  |  | |||
|  | @ -278,6 +278,32 @@ Added :func:`statistics.fmean` as a faster, floating point variant of | |||
| :func:`statistics.mean()`.  (Contributed by Raymond Hettinger and | ||||
| Steven D'Aprano in :issue:`35904`.) | ||||
| 
 | ||||
| Added :class:`statistics.NormalDist`, a tool for creating | ||||
| and manipulating normal distributions of a random variable. | ||||
| (Contributed by Raymond Hettinger in :issue:`36018`.) | ||||
| 
 | ||||
| :: | ||||
| 
 | ||||
|     >>> temperature_feb = NormalDist.from_samples([4, 12, -3, 2, 7, 14]) | ||||
|     >>> temperature_feb | ||||
|     NormalDist(mu=6.0, sigma=6.356099432828281) | ||||
| 
 | ||||
|     >>> temperature_feb.cdf(3)            # Chance of being under 3 degrees | ||||
|     0.3184678262814532 | ||||
|     >>> # Relative chance of being 7 degrees versus 10 degrees | ||||
|     >>> temperature_feb.pdf(7) / temperature_feb.pdf(10) | ||||
|     1.2039930378537762 | ||||
| 
 | ||||
|     >>> el_nino = NormalDist(4, 2.5) | ||||
|     >>> temperature_feb += el_nino        # Add in a climate effect | ||||
|     >>> temperature_feb | ||||
|     NormalDist(mu=10.0, sigma=6.830080526611674) | ||||
| 
 | ||||
|     >>> temperature_feb * (9/5) + 32      # Convert to Fahrenheit | ||||
|     NormalDist(mu=50.0, sigma=12.294144947901014) | ||||
|     >>> temperature_feb.samples(3)        # Generate random samples | ||||
|     [7.672102882379219, 12.000027119750287, 4.647488369766392] | ||||
| 
 | ||||
| 
 | ||||
| tokenize | ||||
| -------- | ||||
|  |  | |||
|  | @ -76,7 +76,7 @@ | |||
| 
 | ||||
| """ | ||||
| 
 | ||||
| __all__ = [ 'StatisticsError', | ||||
| __all__ = [ 'StatisticsError', 'NormalDist', | ||||
|             'pstdev', 'pvariance', 'stdev', 'variance', | ||||
|             'median',  'median_low', 'median_high', 'median_grouped', | ||||
|             'mean', 'mode', 'harmonic_mean', 'fmean', | ||||
|  | @ -85,11 +85,13 @@ | |||
| import collections | ||||
| import math | ||||
| import numbers | ||||
| import random | ||||
| 
 | ||||
| from fractions import Fraction | ||||
| from decimal import Decimal | ||||
| from itertools import groupby | ||||
| from bisect import bisect_left, bisect_right | ||||
| from math import hypot, sqrt, fabs, exp, erf, tau | ||||
| 
 | ||||
| 
 | ||||
| 
 | ||||
|  | @ -694,3 +696,155 @@ def pstdev(data, mu=None): | |||
|         return var.sqrt() | ||||
|     except AttributeError: | ||||
|         return math.sqrt(var) | ||||
| 
 | ||||
| ## Normal Distribution ##################################################### | ||||
| 
 | ||||
| class NormalDist: | ||||
|     'Normal distribution of a random variable' | ||||
|     # https://en.wikipedia.org/wiki/Normal_distribution | ||||
|     # https://en.wikipedia.org/wiki/Variance#Properties | ||||
| 
 | ||||
|     __slots__ = ('mu', 'sigma') | ||||
| 
 | ||||
|     def __init__(self, mu=0.0, sigma=1.0): | ||||
|         'NormalDist where mu is the mean and sigma is the standard deviation' | ||||
|         if sigma < 0.0: | ||||
|             raise StatisticsError('sigma must be non-negative') | ||||
|         self.mu = mu | ||||
|         self.sigma = sigma | ||||
| 
 | ||||
|     @classmethod | ||||
|     def from_samples(cls, data): | ||||
|         'Make a normal distribution instance from sample data' | ||||
|         if not isinstance(data, (list, tuple)): | ||||
|             data = list(data) | ||||
|         xbar = fmean(data) | ||||
|         return cls(xbar, stdev(data, xbar)) | ||||
| 
 | ||||
|     def samples(self, n, seed=None): | ||||
|         'Generate *n* samples for a given mean and standard deviation' | ||||
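|         # When a seed is given, use a separate Random instance so that | ||||
|         # repeated calls with the same seed yield identical samples. | ||||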
|         gauss = random.gauss if seed is None else random.Random(seed).gauss | ||||
|         mu, sigma = self.mu, self.sigma | ||||
|         return [gauss(mu, sigma) for _ in range(n)] | ||||
| 
 | ||||
|     def pdf(self, x): | ||||
|         'Probability density function:  P(x <= X < x+dx) / dx' | ||||
|         variance = self.sigma ** 2.0 | ||||
|         if not variance: | ||||
|             raise StatisticsError('pdf() not defined when sigma is zero') | ||||
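|         # Gaussian density: exp(-(x-mu)**2 / (2*sigma**2)) / sqrt(2*pi*sigma**2), | ||||
|         # written below with math.tau, which equals 2*pi. | ||||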
|         return exp((x - self.mu)**2.0 / (-2.0*variance)) / sqrt(tau * variance) | ||||
| 
 | ||||
|     def cdf(self, x): | ||||
|         'Cumulative distribution function:  P(X <= x)' | ||||
|         if not self.sigma: | ||||
|             raise StatisticsError('cdf() not defined when sigma is zero') | ||||
|         return 0.5 * (1.0 + erf((x - self.mu) / (self.sigma * sqrt(2.0)))) | ||||
| 
 | ||||
|     @property | ||||
|     def variance(self): | ||||
|         'Square of the standard deviation' | ||||
|         return self.sigma ** 2.0 | ||||
| 
 | ||||
|     def __add__(x1, x2): | ||||
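|         # Sums of independent normal variables are normal; their variances | ||||
|         # add, so the combined sigma is hypot(s1, s2) = sqrt(s1**2 + s2**2). | ||||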
|         if isinstance(x2, NormalDist): | ||||
|             return NormalDist(x1.mu + x2.mu, hypot(x1.sigma, x2.sigma)) | ||||
|         return NormalDist(x1.mu + x2, x1.sigma) | ||||
| 
 | ||||
|     def __sub__(x1, x2): | ||||
|         if isinstance(x2, NormalDist): | ||||
|             return NormalDist(x1.mu - x2.mu, hypot(x1.sigma, x2.sigma)) | ||||
|         return NormalDist(x1.mu - x2, x1.sigma) | ||||
| 
 | ||||
|     def __mul__(x1, x2): | ||||
|         return NormalDist(x1.mu * x2, x1.sigma * fabs(x2)) | ||||
| 
 | ||||
|     def __truediv__(x1, x2): | ||||
|         return NormalDist(x1.mu / x2, x1.sigma / fabs(x2)) | ||||
| 
 | ||||
|     def __pos__(x1): | ||||
|         return x1 | ||||
| 
 | ||||
|     def __neg__(x1): | ||||
|         return NormalDist(-x1.mu, x1.sigma) | ||||
| 
 | ||||
|     __radd__ = __add__ | ||||
| 
 | ||||
|     def __rsub__(x1, x2): | ||||
|         return -(x1 - x2) | ||||
| 
 | ||||
|     __rmul__ = __mul__ | ||||
| 
 | ||||
|     def __eq__(x1, x2): | ||||
|         if not isinstance(x2, NormalDist): | ||||
|             return NotImplemented | ||||
|         return (x1.mu, x1.sigma) == (x2.mu, x2.sigma) | ||||
| 
 | ||||
|     def __repr__(self): | ||||
|         return f'{type(self).__name__}(mu={self.mu!r}, sigma={self.sigma!r})' | ||||
| 
 | ||||
| 
 | ||||
| if __name__ == '__main__': | ||||
| 
 | ||||
|     # Show math operations computed analytically in comparison | ||||
|     # to a Monte Carlo simulation of the same operations | ||||
| 
 | ||||
|     from math import isclose | ||||
|     from operator import add, sub, mul, truediv | ||||
|     from itertools import repeat | ||||
| 
 | ||||
|     g1 = NormalDist(10, 20) | ||||
|     g2 = NormalDist(-5, 25) | ||||
| 
 | ||||
|     # Test scaling by a constant | ||||
|     assert (g1 * 5 / 5).mu == g1.mu | ||||
|     assert (g1 * 5 / 5).sigma == g1.sigma | ||||
| 
 | ||||
|     n = 100_000 | ||||
|     G1 = g1.samples(n) | ||||
|     G2 = g2.samples(n) | ||||
| 
 | ||||
|     for func in (add, sub): | ||||
|         print(f'\nTest {func.__name__} with another NormalDist:') | ||||
|         print(func(g1, g2)) | ||||
|         print(NormalDist.from_samples(map(func, G1, G2))) | ||||
| 
 | ||||
|     const = 11 | ||||
|     for func in (add, sub, mul, truediv): | ||||
|         print(f'\nTest {func.__name__} with a constant:') | ||||
|         print(func(g1, const)) | ||||
|         print(NormalDist.from_samples(map(func, G1, repeat(const)))) | ||||
| 
 | ||||
|     const = 19 | ||||
|     for func in (add, sub, mul): | ||||
|         print(f'\nTest constant with {func.__name__}:') | ||||
|         print(func(const, g1)) | ||||
|         print(NormalDist.from_samples(map(func, repeat(const), G1))) | ||||
| 
 | ||||
|     def assert_close(G1, G2): | ||||
|         assert isclose(G1.mu, G2.mu, rel_tol=0.01), (G1, G2) | ||||
|         assert isclose(G1.sigma, G2.sigma, rel_tol=0.01), (G1, G2) | ||||
| 
 | ||||
|     X = NormalDist(-105, 73) | ||||
|     Y = NormalDist(31, 47) | ||||
|     s = 32.75 | ||||
|     n = 100_000 | ||||
| 
 | ||||
|     S = NormalDist.from_samples([x + s for x in X.samples(n)]) | ||||
|     assert_close(X + s, S) | ||||
| 
 | ||||
|     S = NormalDist.from_samples([x - s for x in X.samples(n)]) | ||||
|     assert_close(X - s, S) | ||||
| 
 | ||||
|     S = NormalDist.from_samples([x * s for x in X.samples(n)]) | ||||
|     assert_close(X * s, S) | ||||
| 
 | ||||
|     S = NormalDist.from_samples([x / s for x in X.samples(n)]) | ||||
|     assert_close(X / s, S) | ||||
| 
 | ||||
|     S = NormalDist.from_samples([x + y for x, y in zip(X.samples(n), | ||||
|                                                        Y.samples(n))]) | ||||
|     assert_close(X + Y, S) | ||||
| 
 | ||||
|     S = NormalDist.from_samples([x - y for x, y in zip(X.samples(n), | ||||
|                                                        Y.samples(n))]) | ||||
|     assert_close(X - Y, S) | ||||
|  |  | |||
|  | @ -5,9 +5,11 @@ | |||
| 
 | ||||
| import collections | ||||
| import collections.abc | ||||
| import copy | ||||
| import decimal | ||||
| import doctest | ||||
| import math | ||||
| import pickle | ||||
| import random | ||||
| import sys | ||||
| import unittest | ||||
|  | @ -2025,6 +2027,181 @@ def test_compare_to_variance(self): | |||
|         expected = math.sqrt(statistics.variance(data)) | ||||
|         self.assertEqual(self.func(data), expected) | ||||
| 
 | ||||
| class TestNormalDist(unittest.TestCase): | ||||
| 
 | ||||
|     def test_slots(self): | ||||
|         nd = statistics.NormalDist(300, 23) | ||||
|         with self.assertRaises(TypeError): | ||||
|             vars(nd) | ||||
|         self.assertEqual(nd.__slots__, ('mu', 'sigma')) | ||||
| 
 | ||||
|     def test_instantiation_and_attributes(self): | ||||
|         nd = statistics.NormalDist(500, 17) | ||||
|         self.assertEqual(nd.mu, 500) | ||||
|         self.assertEqual(nd.sigma, 17) | ||||
|         self.assertEqual(nd.variance, 17**2) | ||||
| 
 | ||||
|         # default arguments | ||||
|         nd = statistics.NormalDist() | ||||
|         self.assertEqual(nd.mu, 0) | ||||
|         self.assertEqual(nd.sigma, 1) | ||||
|         self.assertEqual(nd.variance, 1**2) | ||||
| 
 | ||||
|         # error case: negative sigma | ||||
|         with self.assertRaises(statistics.StatisticsError): | ||||
|             statistics.NormalDist(500, -10) | ||||
| 
 | ||||
|     def test_alternative_constructor(self): | ||||
|         NormalDist = statistics.NormalDist | ||||
|         data = [96, 107, 90, 92, 110] | ||||
|         # list input | ||||
|         self.assertEqual(NormalDist.from_samples(data), NormalDist(99, 9)) | ||||
|         # tuple input | ||||
|         self.assertEqual(NormalDist.from_samples(tuple(data)), NormalDist(99, 9)) | ||||
|         # iterator input | ||||
|         self.assertEqual(NormalDist.from_samples(iter(data)), NormalDist(99, 9)) | ||||
|         # error cases | ||||
|         with self.assertRaises(statistics.StatisticsError): | ||||
|             NormalDist.from_samples([])                      # empty input | ||||
|         with self.assertRaises(statistics.StatisticsError): | ||||
|             NormalDist.from_samples([10])                    # only one input | ||||
| 
 | ||||
|     def test_sample_generation(self): | ||||
|         NormalDist = statistics.NormalDist | ||||
|         mu, sigma = 10_000, 3.0 | ||||
|         X = NormalDist(mu, sigma) | ||||
|         n = 1_000 | ||||
|         data = X.samples(n) | ||||
|         self.assertEqual(len(data), n) | ||||
|         self.assertEqual(set(map(type, data)), {float}) | ||||
|         # mean(data) expected to fall within 8 standard deviations | ||||
|         xbar = statistics.mean(data) | ||||
|         self.assertTrue(mu - sigma*8 <= xbar <= mu + sigma*8) | ||||
| 
 | ||||
|         # verify that seeding makes reproducible sequences | ||||
|         n = 100 | ||||
|         data1 = X.samples(n, seed='happiness and joy') | ||||
|         data2 = X.samples(n, seed='trouble and despair') | ||||
|         data3 = X.samples(n, seed='happiness and joy') | ||||
|         data4 = X.samples(n, seed='trouble and despair') | ||||
|         self.assertEqual(data1, data3) | ||||
|         self.assertEqual(data2, data4) | ||||
|         self.assertNotEqual(data1, data2) | ||||
| 
 | ||||
|         # verify that subclass type is honored | ||||
|         class NewNormalDist(NormalDist): | ||||
|             pass | ||||
|         nnd = NewNormalDist(200, 5) | ||||
|         self.assertEqual(type(nnd), NewNormalDist) | ||||
| 
 | ||||
|     def test_pdf(self): | ||||
|         NormalDist = statistics.NormalDist | ||||
|         X = NormalDist(100, 15) | ||||
|         # Verify peak around center | ||||
|         self.assertLess(X.pdf(99), X.pdf(100)) | ||||
|         self.assertLess(X.pdf(101), X.pdf(100)) | ||||
|         # Test symmetry | ||||
|         self.assertAlmostEqual(X.pdf(99), X.pdf(101)) | ||||
|         self.assertAlmostEqual(X.pdf(98), X.pdf(102)) | ||||
|         self.assertAlmostEqual(X.pdf(97), X.pdf(103)) | ||||
|         # Test vs CDF | ||||
|         dx = 2.0 ** -10 | ||||
|         for x in range(90, 111): | ||||
|             est_pdf = (X.cdf(x + dx) - X.cdf(x)) / dx | ||||
|             self.assertAlmostEqual(X.pdf(x), est_pdf, places=4) | ||||
|         # Error case: variance is zero | ||||
|         Y = NormalDist(100, 0) | ||||
|         with self.assertRaises(statistics.StatisticsError): | ||||
|             Y.pdf(90) | ||||
| 
 | ||||
|     def test_cdf(self): | ||||
|         NormalDist = statistics.NormalDist | ||||
|         X = NormalDist(100, 15) | ||||
|         cdfs = [X.cdf(x) for x in range(1, 200)] | ||||
|         self.assertEqual(set(map(type, cdfs)), {float}) | ||||
|         # Verify monotonic | ||||
|         self.assertEqual(cdfs, sorted(cdfs)) | ||||
|         # Verify center | ||||
|         self.assertAlmostEqual(X.cdf(100), 0.50) | ||||
|         # Error case: variance is zero | ||||
|         Y = NormalDist(100, 0) | ||||
|         with self.assertRaises(statistics.StatisticsError): | ||||
|             Y.cdf(90) | ||||
| 
 | ||||
|     def test_same_type_addition_and_subtraction(self): | ||||
|         NormalDist = statistics.NormalDist | ||||
|         X = NormalDist(100, 12) | ||||
|         Y = NormalDist(40, 5) | ||||
|         self.assertEqual(X + Y, NormalDist(140, 13))        # __add__ | ||||
|         self.assertEqual(X - Y, NormalDist(60, 13))         # __sub__ | ||||
| 
 | ||||
|     def test_translation_and_scaling(self): | ||||
|         NormalDist = statistics.NormalDist | ||||
|         X = NormalDist(100, 15) | ||||
|         y = 10 | ||||
|         self.assertEqual(+X, NormalDist(100, 15))           # __pos__ | ||||
|         self.assertEqual(-X, NormalDist(-100, 15))          # __neg__ | ||||
|         self.assertEqual(X + y, NormalDist(110, 15))        # __add__ | ||||
|         self.assertEqual(y + X, NormalDist(110, 15))        # __radd__ | ||||
|         self.assertEqual(X - y, NormalDist(90, 15))         # __sub__ | ||||
|         self.assertEqual(y - X, NormalDist(-90, 15))        # __rsub__ | ||||
|         self.assertEqual(X * y, NormalDist(1000, 150))      # __mul__ | ||||
|         self.assertEqual(y * X, NormalDist(1000, 150))      # __rmul__ | ||||
|         self.assertEqual(X / y, NormalDist(10, 1.5))        # __truediv__ | ||||
|         with self.assertRaises(TypeError): | ||||
|             y / X | ||||
| 
 | ||||
|     def test_equality(self): | ||||
|         NormalDist = statistics.NormalDist | ||||
|         nd1 = NormalDist() | ||||
|         nd2 = NormalDist(2, 4) | ||||
|         nd3 = NormalDist() | ||||
|         self.assertNotEqual(nd1, nd2) | ||||
|         self.assertEqual(nd1, nd3) | ||||
| 
 | ||||
|         # Test NotImplemented when types are different | ||||
|         class A: | ||||
|             def __eq__(self, other): | ||||
|                 return 10 | ||||
|         a = A() | ||||
|         self.assertEqual(nd1.__eq__(a), NotImplemented) | ||||
|         self.assertEqual(nd1 == a, 10) | ||||
|         self.assertEqual(a == nd1, 10) | ||||
| 
 | ||||
|         # Allow subclasses to compare equal, giving the same behavior | ||||
|         # as list, tuple, int, float, complex, str, dict, set, etc. | ||||
|         class SizedNormalDist(NormalDist): | ||||
|             def __init__(self, mu, sigma, n): | ||||
|                 super().__init__(mu, sigma) | ||||
|                 self.n = n | ||||
|         s = SizedNormalDist(100, 15, 57) | ||||
|         nd4 = NormalDist(100, 15) | ||||
|         self.assertEqual(s, nd4) | ||||
| 
 | ||||
|         # Don't allow duck type equality because we wouldn't | ||||
|         # want a lognormal distribution to compare equal | ||||
|         # to a normal distribution with the same parameters | ||||
|         class LognormalDist: | ||||
|             def __init__(self, mu, sigma): | ||||
|                 self.mu = mu | ||||
|                 self.sigma = sigma | ||||
|         lnd = LognormalDist(100, 15) | ||||
|         nd = NormalDist(100, 15) | ||||
|         self.assertNotEqual(nd, lnd) | ||||
| 
 | ||||
|     def test_pickle_and_copy(self): | ||||
|         nd = statistics.NormalDist(37.5, 5.625) | ||||
|         nd1 = copy.copy(nd) | ||||
|         self.assertEqual(nd, nd1) | ||||
|         nd2 = copy.deepcopy(nd) | ||||
|         self.assertEqual(nd, nd2) | ||||
|         nd3 = pickle.loads(pickle.dumps(nd)) | ||||
|         self.assertEqual(nd, nd3) | ||||
| 
 | ||||
|     def test_repr(self): | ||||
|         nd = statistics.NormalDist(37.5, 5.625) | ||||
|         self.assertEqual(repr(nd), 'NormalDist(mu=37.5, sigma=5.625)') | ||||
| 
 | ||||
| 
 | ||||
| # === Run tests === | ||||
| 
 | ||||
|  |  | |||
|  | @ -0,0 +1,3 @@ | |||
| Add statistics.NormalDist, a tool for creating and manipulating normal | ||||
| distributions of a random variable.  Features a composite class that treats | ||||
| the mean and standard deviation of measurement data as a single entity. | ||||