A Data Generating Process for Improper Payments

Author

Wade K. Copeland

Published

January 30, 2026

Statistical Methods

The proposed data generating process for improper payments involves a mixture distribution.

Payments are generated using a truncated gamma distribution X_i \sim \Gamma_T(\alpha, \theta) (see Appendix 3.1 for details). Practically speaking, unlike the gamma distribution, the first and second moments of the truncated gamma distribution don’t have closed-form solutions, so the values of \alpha and \theta are solved for numerically for a given E[X] and CV[X] = \frac{SD[X]}{E[X]}

Improper payments are stright forward. Let Y_i = B_iX_iZ_i where B_i \sim Unif(a \in [0, 1], b \ge a \in [0, 1]) is the percentage of each payment that is improper, and Z_{i} \sim Bin(1,p), where 1 is the number of trials and p is the probability of observing an improper payment (Bain and Engelhardt 1992, 4:92–95, 109–10).

Example

Suppose we want to generate 100,000 payments with mean payment amount of $100 and a coefficient of variation of 0.5. The percentage of each payment that is improper is between 40% and 60% of the payment amount, and the probability that each payment is improper is 10%.

We can accomplish this in Python using the improper_payments_dgp module (source code available in Appendix 3.2). From this module, the function improper_payment_rvs generates random variates from a truncated gamma distribution. It accepts the following arguments:

  • mean_target: Target mean of the truncated gamma distribution.
  • cv_target: Target coefficient of variation of the truncated gamma distribution.
  • A: Lower bound of the truncated gamma distribution.
  • B: Upper bound of the truncated gamma distribution.
  • b: Tuple indicating the lower and upper bounds of the uniform distribution for the percentage of each payment that is improper.
    • If a single value is provided, it is used for both the lower and upper bounds.
  • p_improper: Probability that a payment is improper.
  • size: Number of random variates to generate.
  • random_state: Seed for reproducibility.
    • Set to an integer for consistent results.
    • Set to None if reproducibility is not required.
from improper_payments_dgp import improper_payment_rvs
import matplotlib.pyplot as plt

pop_data = improper_payment_rvs(mean_target = 100, cv_target = 1/2, A = 0, B = 1000, b = (0.4, 0.6), p_improper = 0.1, size = 100000, random_state = 123)

Below is the population data. The function returns a Pandas DataFrame with the following:

  • X: Random variate(s) of a truncated gamma distribution for the payment amount.
  • B: Random variate(s) of a uniform distribution for the percent of the payment that is improper.
  • Z: Random variate(s) of a binomial distribution that indicate if a payment is improper.
  • Y: Random variate(s) for the improper payment amount.
pop_data
X B Z Y
0 116.246213 0.539294 0 0.0
1 35.021219 0.457228 0 0.0
2 59.890232 0.445370 0 0.0
3 55.471881 0.510263 0 0.0
4 54.394463 0.543894 0 0.0
... ... ... ... ...
99995 133.513989 0.476092 0 0.0
99996 79.941630 0.504328 0 0.0
99997 91.298149 0.573550 0 0.0
99998 22.203274 0.513245 0 0.0
99999 92.776933 0.522579 0 0.0

100000 rows × 4 columns

print(f"The mean payment amount is ${pop_data.X.mean():.2f} with a total payment amount of ${pop_data.X.sum():,.2f}.\nThe coefficient of variation for payment amounts is {pop_data.X.var()**0.5/pop_data.X.mean():.2%}.\nThe minimum and maximum percentages of improper payments are {pop_data.B.min():.2%} and {pop_data.B.max():.2%}, respectively.\nThe probability of an improper payment is {pop_data.Z.mean():.2%}.\nThe mean improper payment amount (conditional on being improper) is ${pop_data.Y[pop_data.Y > 0].mean():.2f} with a total improper payment amount of ${pop_data.Y.sum():,.2f}.")
The mean payment amount is $99.97 with a total payment amount of $9,996,861.11.
The coefficient of variation for payment amounts is 49.98%.
The minimum and maximum percentages of improper payments are 40.00% and 60.00%, respectively.
The probability of an improper payment is 9.95%.
The mean improper payment amount (conditional on being improper) is $59.66 with a total improper payment amount of $593,688.44.
plt.figure(figsize=(8,5))
plt.hist(pop_data.Y[pop_data.Y > 0], bins = "fd", density = False, alpha = 0.6, color = "steelblue", edgecolor = 'white')
plt.title(f"Distribution of Improper Payments")
plt.xlabel("Improper Payment Amount ($)")
plt.ylabel("Frequency")
plt.grid(True, alpha = 0.2)
plt.show()

Appendicies

Truncated Gamma Distribution

A continuous random variable X \sim GAM(\alpha, \theta) if it has a probability density function (PDF) of the form in Equation 1 [(Bain and Engelhardt 1992, 4:111)].

\begin{aligned} f(x|\alpha, \theta) = \frac{x^{\alpha-1}e^{-x/\theta}}{\theta^\alpha \Gamma(\alpha)} \text{ } \text{ for } x, \alpha, \theta > 0 \text{ with } \\ \Gamma(\alpha) = \int_{0}^{\infty} t^{\alpha - 1}e^{-t} dt \end{aligned} \tag{1}

For this PDF the cumulative distribution function (CDF) (a.k.a, Regularized Lower Incomplete Gamma Function) is shown in Equation 2 [(Bain and Engelhardt 1992, 4:112)].

F(x|\alpha, \theta) = \int_{0}^x f(u|\alpha, \theta) du = \frac{\gamma(\alpha, \frac{x}{\theta})}{\Gamma(\alpha)} \text { with } \gamma(\alpha, \frac{x}{\theta}) = \int_{0}^{x/\theta} t^{\alpha - 1}e^{-t} dt \tag{2}

The support of x is [0, \infty), so to create a truncated distribution we need to restrict this support to the interval [A, B] with 0 \le A < B. This implies the following truncated PDF in Equation 3 (e.g., \int_{A}^{B} f_T(x)dx = 1).

f_T(x|\alpha, \theta) = \frac{f(x)}{F(B|\alpha, \theta) - F(A|\alpha, \theta)} \text{ } \forall x \in [A, B] \tag{3}

For the truncated gamma distribution the first two moments are shown in Equation 4 and Equation 5. Detailed derivations for each of these moments can be found in Appendix 3.1.1. With the first two moments in hand, the variance follows immediately since Var[X] = E[X^2] - (E[X])^2 [(Bain and Engelhardt 1992, 4:74)].

E[X|A \le X \le B] = \frac{\int_{A}^{B} xf(x)dx}{\int_{A}^{B} f(x)dx} = \theta \alpha \frac{F(B|\alpha + 1, \theta) - F(A|\alpha + 1, \theta)}{F(B|\alpha, \theta) - F(A|\alpha, \theta)} \tag{4}

E[X^2|A \le X \le B] = \frac{\int_{A}^{B} x^2f(x)dx}{\int_{A}^{B} f(x)dx} = \theta^2(\alpha + 1)\alpha \frac{F(B|\alpha+2, \theta) - F(A|\alpha + 2, \theta)}{F(B|\alpha, \theta) - F(A|\alpha, \theta)} \tag{5}

Detailed Derivations

Derivation of the Denominator for Equation 4 and Equation 5

Theorem 1 shows the derivation of the denominator for Equation 4 and Equation 5.

Theorem 1 \int_{A}^{B} f(x)dx = F(B|\alpha, \theta) - F(A|\alpha, \theta)

Proof. \int_{A}^{B} f(x)dx = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} \int_{A}^{B} x^{\alpha-1}e^{-x/\theta}

Let t = \frac{x}{\theta} \implies x = t\theta and \frac{dx}{dt} = \theta \implies dx = \theta dt

Then \int_{A}^{B} x^{\alpha-1}e^{-x/\theta} = \int_{A/\theta}^{B/\theta} (t\theta)^{\alpha-1}e^{-t}\theta dt = \theta^{\alpha} \int_{A/\theta}^{B/\theta} t^{\alpha-1}e^{-t}dt = \theta^{\alpha} \big((\int_{0}^{B/\theta} t^{\alpha-1}e^{-t}dt) - (\int_{0}^{A/\theta} t^{\alpha-1}e^{-t}dt)\big) = \theta^{\alpha} (\gamma(\alpha, \frac{B}{\theta}) - \gamma(\alpha, \frac{A}{\theta}))

\therefore \int_{A}^{B} f(x)dx = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} (\theta^{\alpha} (\gamma(\alpha, \frac{B}{\theta}) - \gamma(\alpha, \frac{A}{\theta})) = F(B|\alpha, \theta) - F(A|\alpha, \theta)

Derivation of the Numerator for Equation 4

Theorem 2 shows the derivation of the numerator for Equation 4.

Theorem 2 \int_{A}^{B} xf(x)dx = \theta \alpha (F(B|\alpha+1, \theta) - F(A|\alpha+1, \theta))

Proof. \int_{A}^{B} xf(x)dx = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} \int_{A}^{B} x^1 x^{\alpha - 1}e^{-x/\theta} = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} \int_{A}^{B} x^{\alpha}e^{-x/\theta}

Let t = \frac{x}{\theta} \implies x = t\theta and \frac{dx}{dt} = \theta \implies dx = \theta dt

Then \int_{A}^{B} x^{\alpha}e^{-x/\theta} = \int_{A/\theta}^{B/\theta} (t\theta)^{\alpha}e^{-t} \theta dt = \theta^{\alpha + 1} \int_{A/\theta}^{B/\theta} t^{\alpha} e^{-t} dt = \theta^{\alpha + 1} (\gamma(\alpha + 1, \frac{B}{\theta}) - \gamma(\alpha + 1, \frac{A}{\theta}))

Substituting this integral into the original equation we have,

\int_{A}^{B} xf(x)dx = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} \theta^{\alpha + 1} (\gamma(\alpha + 1, \frac{B}{\theta}) - \gamma(\alpha + 1, \frac{A}{\theta})) = \theta \frac{\gamma(\alpha + 1, \frac{B}{\theta}) - \gamma(\alpha + 1, \frac{A}{\theta})}{\Gamma(\alpha)}

Note \Gamma(\alpha + 1) = \alpha \Gamma(\alpha) [(Bain and Engelhardt 1992, 4:111)].

\therefore \int_{A}^{B} xf(x)dx = \theta \frac{\gamma(\alpha + 1, \frac{B}{\theta}) - \gamma(\alpha + 1, \frac{A}{\theta})}{\Gamma(\alpha)} = \theta \frac{\gamma(\alpha + 1, \frac{B}{\theta}) - \gamma(\alpha + 1, \frac{A}{\theta})}{\Gamma(\alpha+1)/\alpha} = \theta \alpha (F(B|\alpha+1, \theta) - F(A|\alpha+1, \theta))

Derivation of the Numerator for Equation 5

Theorem 3 shows the derivation of the numerator for Equation 5.

Theorem 3 \int_{A}^{B} x^2f(x)dx = \theta^2(\alpha + 1)\alpha (F(B|\alpha + 2, \theta)) - (F(A|\alpha + 1, \theta))

Proof. \int_{A}^{B} x^2f(x)dx = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} \int_{A}^{B} x^2 x^{\alpha - 1}e^{-x/\theta} = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} \int_{A}^{B} x^{\alpha + 1}e^{-x/\theta}

Let t = \frac{x}{\theta} \implies x = t\theta and \frac{dx}{dt} = \theta \implies dx = \theta dt

Then \int_{A}^{B} x^{\alpha+1}e^{-x/\theta} = \int_{A/\theta}^{B/\theta} (t\theta)^{\alpha+1}e^{-t} \theta dt = \theta^{\alpha + 2} \int_{A/\theta}^{B/\theta} t^{\alpha + 1} e^{-t} dt = \theta^{\alpha + 2} (\gamma(\alpha + 2, \frac{B}{\theta}) - \gamma(\alpha + 2, \frac{A}{\theta}))

Substituting this integral into the original equation we have,

\int_{A}^{B} x^2f(x)dx = \frac{1}{\theta^{\alpha} \Gamma(\alpha)} \theta^{\alpha + 2} (\gamma(\alpha + 2, \frac{B}{\theta}) - \gamma(\alpha + 2, \frac{A}{\theta})) = \theta^2 \frac{\gamma(\alpha + 2, \frac{B}{\theta}) - \gamma(\alpha + 2, \frac{A}{\theta})}{\Gamma(\alpha)}

Note \Gamma(\alpha + 2) = (\alpha+1) \alpha \Gamma(\alpha) [(Bain and Engelhardt 1992, 4:111)].

\therefore \int_{A}^{B} x^2f(x)dx = \theta^2 \frac{\gamma(\alpha + 2, \frac{B}{\theta}) - \gamma(\alpha + 2, \frac{A}{\theta})}{\Gamma(\alpha+2)/(\alpha + 1)\alpha} = \theta^2(\alpha + 1)\alpha (F(B|\alpha + 2, \theta) - F(A|\alpha + 2, \theta))

Python Source Code

Module: improper_payments_dgp.py

import numpy as np
import pandas as pd
from scipy.stats import uniform, binom, gamma
from scipy.special import gammainc
from scipy.optimize import root

class TruncatedGamma:
    """Class object for the truncated gamma distribution.
    
    This class object calculates distribution parameters for a truncated gamma distribution.
    
    Attributes:
        A (int or float): The lower bound for the truncated gamma distribution.
        B (int or float): The upper bound for the truncated gamma distribution.
        alpha (int or float): The value of the parameter alpha.
        theta (int or float): The value of the parameter theta.
    """
    
    def __init__(self, A, B, alpha, theta):
        """Initializes a new TruncatedGamma instance.
        
        Args:
            A (int or float): The lower bound for the truncated gamma distribution.
            B (int or float): The upper bound for the truncated gamma distribution.
            alpha (int or float): The value of the parameter alpha.
            theta (int or float): The value of the parameter theta.
        
        Attributes:
            A (int or float): The lower bound for the truncated gamma distribution.
            B (int or float): The upper bound for the truncated gamma distribution.
            alpha (int or float): The value of the parameter alpha.
            theta (int or float): The value of the parameter theta.
        
        Returns:
            None
        
        Raises:
            ValueError if alpha is less than zero.
            ValueError if theta is less than zero.
            ValueError if A is less than zero.
            ValueError if B is less than or equal to A.
        
        Example:
            x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
            x.alpha, x.theta, x.A, x.B
        """
        
        if alpha <= 0:
            raise ValueError("The value of alpha should be greater than zero.")
        
        if theta <= 0:
            raise ValueError("The value of theta should be greater than zero.")
        
        if A < 0:
            raise ValueError("The lower bound A should be greater than or equal to zero.")
        
        if B <= A:
            raise ValueError("The upper bound B should be greater than A.")
        
        self.alpha = alpha
        self.theta = theta
        self.A = A
        self.B = B
        
        return None
    
    def truncgamma_m1(self):
        """
        Calculate the first moment for the truncated gamma distribution.
        
        Args:
            None
        
        Attributes:
            None
        
        Returns:
            Returns the first moment of the truncated gamma distribution.
        
        Example:
            >>> x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
            >>> x.truncgamma_m1()
        """
        zA, zB = self.A/self.theta, self.B/self.theta
        denominator = gammainc(self.alpha, zB) - gammainc(self.alpha, zA)
        numerator = gammainc(self.alpha + 1, zB) - gammainc(self.alpha + 1, zA)
        m1 = self.alpha*self.theta*(numerator/denominator)
        return m1
    
    def truncgamma_m2(self):
        """
        Calculate the second moment for the truncated gamma distribution.
        
        Args:
            None
        
        Attributes:
            None
        
        Returns:
            Returns the second moment of the truncated gamma distribution.
        
        Example:
            >>> x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
            >>> x.truncgamma_m2()
        """
        zA, zB = self.A/self.theta, self.B/self.theta
        denominator = gammainc(self.alpha, zB) - gammainc(self.alpha, zA)
        numerator = gammainc(self.alpha + 2, zB) - gammainc(self.alpha + 2, zA)
        m2 = self.alpha*(self.alpha + 1)*self.theta**2*(numerator/denominator)
        return m2
    
    def truncgamma_var(self):
        """
        Calculate the variance for the truncated gamma distribution.
        
        Args:
            None
        
        Attributes:
            None
        
        Returns:
            Returns the variance of the truncated gamma distribution.
        
        Example:
            >>> x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
            >>> x.truncgamma_var()
        """
        m1 = self.truncgamma_m1()
        m2 = self.truncgamma_m2()
        variance = m2 - m1**2
        return variance
    
    def truncgamma_cv(self):
        """
        Calculate the coefficient of variation for the truncated gamma distribution.
        
        Args:
            None
        
        Attributes:
            None
        
        Returns:
            Returns the coefficient of variation of the truncated gamma distribution.
        
        Example:
            >>> x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
            >>> x.truncgamma_cv()
        """
        var = self.truncgamma_var()
        mean = self.truncgamma_m1()
        cv = var**0.5/mean
        return cv

def truncgamma_rvs(mean_target, cv_target, A, B, size = 1, random_state = None):
    """
    Generate random variates from a truncated gamma distribution.
    
    Args:
        mean_target (int or float):  The target mean for truncated gamma distribution.
        cv_target (float):  The target coefficient of variation for truncated gamma distribution.
        A (int or float): The lower bound for the truncated gamma distribution.
        B (int or float): The upper bound for the truncated gamma distribution.
        size (int): The number of random variates to generate.
        random_state (int): The seed for random number generation to create reproducible results.
     
    Returns:
        A numpy.ndarray containing the random variates from a truncated gamma distribution.
    
    Example:
        >>> x = truncgamma_rvs(mean_target = 100, cv_target = 1/2, A = 0, B = 1000, size = 10, random_state = 123)
    """
    # Calculate the starting values of alpha and theta from an untruncated gamma distribution
    alpha0 = 1/cv_target**2
    theta0 = mean_target/alpha0
    
    # Find the roots from the first and second moments of the truncated gamma distrubtion to determine the values of alpha theta for the target mean and coefficient of variation.
    def system(vars):
        log_alpha, log_theta = vars
        alpha, theta = np.exp(log_alpha), np.exp(log_theta)
        x = TruncatedGamma(alpha = alpha, theta = theta, A = A, B = B)
        mean = x.truncgamma_m1()
        cv = x.truncgamma_cv()
        return np.array([mean - mean_target, cv - cv_target])
    
    x0 = np.array([np.log(alpha0), np.log(theta0)])
    sol = root(system, x0, method='hybr')
    
    if not sol.success: 
        raise ValueError(f"Unable to find alpha and theta using the 2D-solver!:\n\n{sol}")
    else:
        alpha, theta = np.exp(sol.x)
    
    # Use the calculated values of alpha and theta to draw samples from the truncated gamma distrubtion.
    rng = np.random.default_rng(random_state)
    FA = gamma.cdf(x = A, a = alpha, scale = theta)
    FB = gamma.cdf(x = B, a = alpha, scale = theta)
    u = rng.uniform(FA, FB, size = size)
    res = gamma.ppf(u, a = alpha, scale = theta)
    
    return res

def improper_payment_rvs(mean_target, cv_target, A, B, b, p_improper, size, random_state = None):
    """This function generates random variates for improper payments.
    
    Args:
        mean_target (int or float):  The target mean for truncated gamma distribution.
        cv_target (float):  The target coefficient of variation for truncated gamma distribution.
        A (int or float): The lower bound for the truncated gamma distribution.
        B (int or float): The upper bound for the truncated gamma distribution.
        b (tuple): Bounds of the uniform distribution (0 <= b <= 1) for the percentage of each payment that is improper.
        p_improper (float): Probability that a given payment is improper.
        size (int): The number of random variates to generate.
        random_state (int): The seed for random number generation to create reproducible results.
        
    Returns:
        A Pandas DataFrame with the following:
            X: Random variate(s) of a truncated gamma distribution for the payment amount.
            B: Random variate(s) of a uniform distribution for the percent of the payment that is improper.
            Z: Random variate(s) of a binomial distribution that indicate if a payment is improper.
            Y: Random variate(s) for the improper payment amount.
        
    Example:
        >>> pop_data = improper_payment_rvs(mean_target = 100, cv_target = 1/2, A = 0, B = 1000, b = (0.4, 0.6), p_improper = 0.1, size = 100000, random_state = 123)
        >>> print(f"The mean payment amount is ${pop_data.X.mean():.2f} with a total payment amount of ${pop_data.X.sum():,.2f}.\nThe coefficient of variation for payment amounts is {pop_data.X.var()**0.5/pop_data.X.mean():.2%}.\nThe minimum and maximum percentages of improper payments are {pop_data.B.min():.2%} and {pop_data.B.max():.2%}, respectively.\nThe probability of an improper payment is {pop_data.Z.mean():.2%}.\nThe mean improper payment amount (conditional on being improper) is ${pop_data.Y[pop_data.Y > 0].mean():.2f} with a total improper payment amount of ${pop_data.Y.sum():,.2f}.")
    """
    b = b if type(b) is tuple else (b, b)
    
    X = truncgamma_rvs(mean_target = mean_target, cv_target = cv_target, A = A, B = B, size = size, random_state = random_state)
    B = uniform.rvs(size = size, loc = b[0], scale = b[1] - b[0], random_state = random_state)
    Z = binom.rvs(size = size, n = 1, p = p_improper, random_state = random_state)
    Y = X*B*Z
    
    pop_data = pd.DataFrame({
        "X": X,
        "B": B,
        "Z": Z,
        "Y": Y})
    
    return pop_data

Unit Tests: test_improper_payments_dgp.py

Unit tests for the improper_payments_dgp module are provided in the test_improper_payments_dgp.py file.

from improper_payments_dgp import TruncatedGamma, truncgamma_rvs, improper_payment_rvs
import numpy as np
import pandas as pd
import numpy.testing as npt
import pytest

def test_initilization():
    x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
    assert isinstance(x, TruncatedGamma)
    assert (x.alpha, x.theta, x.A, x.B) == (4, 25, 0, 1000)

    with pytest.raises(ValueError) as exc_info:
        TruncatedGamma(alpha = 0, theta = 25, A = 0, B = 1000)
    assert 'The value of alpha should be greater than zero.' in str(exc_info.value)

    with pytest.raises(ValueError) as exc_info:
        TruncatedGamma(alpha = 4, theta = 0, A = 0, B = 1000)
    assert 'The value of theta should be greater than zero.' in str(exc_info.value)
    
    with pytest.raises(ValueError) as exc_info:
        TruncatedGamma(alpha = 4, theta = 25, A = -1000, B = 1000)
    assert 'The lower bound A should be greater than or equal to zero.' in str(exc_info.value)
    
    with pytest.raises(ValueError) as exc_info:
        TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 0)
    assert 'The upper bound B should be greater than A.' in str(exc_info.value)

def test_truncgamma_m1():
    x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
    assert x.truncgamma_m1() == np.float64(99.99999999995468)

def test_truncgamma_m2():
    x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
    assert x.truncgamma_m2() == np.float64(12499.99999994902)

def test_truncgamma_var():
    x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
    assert x.truncgamma_var() == np.float64(2499.999999958083)

def test_truncgamma_cv():
    x = TruncatedGamma(alpha = 4, theta = 25, A = 0, B = 1000)
    assert x.truncgamma_cv() == np.float64(0.4999999999960349)

def test_truncgamma_rvs():
    actual = truncgamma_rvs(mean_target = 100, cv_target = 1/2, A = 0, B = 1000, size = 10, random_state = 123)
    expected = np.array([116.24621327,  35.02121931,  59.89023217,  55.47188101, 54.39446304, 140.63124805, 177.52242542,  66.43900397, 142.4522452 , 163.13733856])
    npt.assert_allclose(actual, expected)

def test_improper_payment_rvs():
    pop_data = improper_payment_rvs(mean_target = 100, cv_target = 1/2, A = 0, B = 1000, b = (0.4, 0.6), p_improper = 0.1, size = 100000, random_state = 123)
    
    assert isinstance(pop_data, pd.DataFrame)
    assert list(pop_data.columns) == ['X', 'B', 'Z', 'Y']
    assert len(pop_data) == 100000
    
    mean_X = pop_data.X.mean()
    total_X = pop_data.X.sum()
    cv_X = pop_data.X.var()**0.5/pop_data.X.mean()
    min_B = pop_data.B.min()
    max_B = pop_data.B.max()
    mean_Z = pop_data.Z.mean()
    mean_Y_improper = pop_data.Y[pop_data.Y > 0].mean()
    total_Y = pop_data.Y.sum()
    
    assert mean_X == np.float64(99.96861105724925)
    assert total_X == np.float64(9996861.105724925)
    assert cv_X == np.float64(0.49977019979482173)
    assert min_B == np.float64(0.4000001820563135)
    assert max_B == np.float64(0.5999945536761311)
    assert mean_Z == np.float64(0.09952)
    assert mean_Y_improper == np.float64(59.655189118224804)
    assert total_Y == np.float64(593688.4421045732)

Session Information

All of the files needed to reproduce these results can be downloaded from the Git repository https://github.com/wkingc/improper-payments-dgp-py.

-----
improper_payments_dgp       NA
matplotlib                  3.10.8
pandas                      3.0.0
session_info                v1.0.1
-----
Python 3.14.2 (main, Dec  5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.6.3.2)]
macOS-26.2-arm64-arm-64bit-Mach-O
-----
Session information updated at 2026-01-30 12:52

References

Bain, Lee J, and Max Engelhardt. 1992. Introduction to Probability and Mathematical Statistics. Vol. 4. Duxbury Press Belmont, CA.