A Data Generating Process for Improper Payments
from improper_payments_dgp import improper_payments_dgp
pop_data = improper_payments_dgp(mean_target=100, cv_target=1/2, A=0, B=1000, b=(0.4, 0.6), p_improper=0.1, size=100000, random_state=123)
Statistical Methods
The proposed data generating process for improper payments involves a mixture distribution.
Payments are generated using a truncated gamma distribution X_i \sim \Gamma_T(\alpha, \theta) (see https://www.kingcopeland.com/truncated-gamma-rvs-py/ for details). Unlike those of the gamma distribution, the first and second moments of the truncated gamma distribution don’t have closed-form solutions, so \alpha and \theta are solved for numerically given target values of E[X] and CV[X] = \frac{SD[X]}{E[X]}.
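One way to carry out this numerical step is sketched below. It is not taken from the improper_payments_dgp package: the function names truncated_gamma_moments and solve_truncated_gamma are hypothetical, and the approach is an assumption about how such a solve could be done. It evaluates the truncated moments through SciPy's regularized incomplete gamma function (itself computed numerically) and matches them to the targets with scipy.optimize.least_squares.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)


def truncated_gamma_moments(alpha, theta, lower, upper):
    """Return (mean, sd) of a gamma(shape=alpha, scale=theta) truncated to [lower, upper]."""
    lo, hi = lower / theta, upper / theta
    mass = gammainc(alpha, hi) - gammainc(alpha, lo)  # P(lower <= X <= upper)
    # E[X^k | lower <= X <= upper]
    #   = theta^k * Gamma(alpha + k) / Gamma(alpha)
    #     * [P(alpha + k, hi) - P(alpha + k, lo)] / mass
    m1 = theta * alpha * (gammainc(alpha + 1, hi) - gammainc(alpha + 1, lo)) / mass
    m2 = (theta**2 * alpha * (alpha + 1)
          * (gammainc(alpha + 2, hi) - gammainc(alpha + 2, lo)) / mass)
    return m1, np.sqrt(max(m2 - m1**2, 0.0))


def solve_truncated_gamma(mean_target, cv_target, lower, upper):
    """Solve numerically for (alpha, theta) that match a target mean and CV."""
    def residuals(log_params):
        alpha, theta = np.exp(log_params)  # log scale keeps both parameters positive
        m1, sd = truncated_gamma_moments(alpha, theta, lower, upper)
        return [m1 / mean_target - 1.0, (sd / m1) / cv_target - 1.0]

    # Start from the untruncated gamma, where CV = 1/sqrt(alpha) and mean = alpha * theta.
    alpha0 = 1.0 / cv_target**2
    theta0 = mean_target / alpha0
    fit = least_squares(residuals, x0=np.log([alpha0, theta0]))
    return tuple(np.exp(fit.x))


alpha, theta = solve_truncated_gamma(mean_target=100, cv_target=0.5, lower=0, upper=1000)
print(f"alpha = {alpha:.4f}, theta = {theta:.4f}")
```

When the upper bound lies far in the right tail, as it does with these inputs, the solution should sit very close to the untruncated values \alpha = 4 and \theta = 25. The truncation matters more as the bound moves toward the mean.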
Improper payment amounts are defined as Y_i = B_iX_iZ_i, where B_i \sim Unif(a, b) with 0 \le a \le b \le 1 is the proportion of payment i that is improper, and Z_i \sim Bin(1, p) indicates whether payment i is improper, with 1 the number of trials and p the probability of observing an improper payment (Bain and Engelhardt 1992, 4:92–95, 109–10). A proper payment has Z_i = 0 and therefore Y_i = 0.
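The mixture can be sketched with NumPy alone. This is a minimal illustration, not the package's implementation: the function name simulate_improper_payments is hypothetical, the truncated gamma is drawn by simple rejection sampling, and X_i, B_i, and Z_i are assumed to be drawn independently, which the text above does not state. The values of alpha and theta are illustrative.

```python
import numpy as np


def simulate_improper_payments(alpha, theta, lower, upper, a, b, p, size, seed=None):
    """Draw X (truncated gamma), B (uniform), Z (Bernoulli) and return Y = B * X * Z."""
    rng = np.random.default_rng(seed)

    # Truncated gamma by rejection: keep only gamma draws that fall in [lower, upper].
    x = np.empty(0)
    while x.size < size:
        draws = rng.gamma(shape=alpha, scale=theta, size=size)
        x = np.concatenate([x, draws[(draws >= lower) & (draws <= upper)]])
    x = x[:size]

    b_frac = rng.uniform(a, b, size=size)  # share of each payment that is improper
    z = rng.binomial(1, p, size=size)      # 1 if the payment is improper, else 0
    y = b_frac * x * z                     # improper amount; zero for proper payments
    return x, b_frac, z, y


# Illustrative parameter values only (alpha and theta are not taken from the package).
x, b_frac, z, y = simulate_improper_payments(
    alpha=4.0, theta=25.0, lower=0.0, upper=1000.0,
    a=0.4, b=0.6, p=0.1, size=10_000, seed=123,
)
print(f"share improper: {z.mean():.3f}, mean improper amount: {y[z == 1].mean():.2f}")
```

Rejection sampling is easy to read but becomes slow when the truncation interval holds little of the gamma's probability mass. An inverse-CDF draw would avoid that at the cost of a little more code.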
Example
Suppose we want to generate 100,000 payments with a maximum payment amount of $1,000, a mean payment amount of $100, and a standard deviation equal to half the mean payment amount (i.e., a coefficient of variation of 0.5). The percentage of each payment that is improper is between 40% and 60% of the payment amount, and the probability that each payment is improper is 10%.
We can accomplish this in Python using the improper_payments_dgp package (Copeland 2026). Its function of the same name, improper_payments_dgp, simulates improper payment data. It accepts the following arguments:
- mean_target: The target mean payment amount.
- cv_target: The target coefficient of variation (i.e., CV[X] = SD[X]/E[X]) for the payment amount.
- A: The minimum payment amount.
- B: The maximum payment amount.
- b: Bounds of the uniform distribution (0 \le b \le 1) for the percentage of each payment that is improper.
- p_improper: Probability that a given payment is improper.
- size: The number of payments to generate.
- random_state: The seed for random number generation to create reproducible results.
  - Set to an integer for consistent results.
  - Set to None if reproducibility is not required.
The population data are shown below. The function returns a pandas DataFrame with the following columns:
- X: Random variate(s) of a truncated gamma distribution for the payment amount.
- B: Random variate(s) of a uniform distribution for the percent of the payment that is improper.
- Z: Random variate(s) of a binomial distribution (one trial) that indicate whether a payment is improper (1) or not (0).
- Y: Random variate(s) for the improper payment amount.
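A quick consistency check on a frame with these four columns might look like the sketch below. The helper check_dgp_frame is hypothetical and is not part of the package, and the small hand-made frame only stands in for the simulated population. The check on Y assumes the column equals the product B * X * Z, as in the definition of Y_i.

```python
import numpy as np
import pandas as pd


def check_dgp_frame(df, lower, upper, b_bounds):
    """Return a dict of boolean checks on a frame with columns X, B, Z, Y."""
    return {
        "X within [A, B]": bool(df["X"].between(lower, upper).all()),
        "B within bounds": bool(df["B"].between(*b_bounds).all()),
        "Z is 0 or 1": bool(df["Z"].isin([0, 1]).all()),
        "Y equals B*X*Z": bool(np.allclose(df["Y"], df["B"] * df["X"] * df["Z"])),
    }


# A small hand-made frame standing in for the simulated population.
toy = pd.DataFrame({
    "X": [116.25, 35.02, 59.89],
    "B": [0.54, 0.46, 0.45],
    "Z": [0, 1, 0],
})
toy["Y"] = toy["B"] * toy["X"] * toy["Z"]
print(check_dgp_frame(toy, lower=0, upper=1000, b_bounds=(0.4, 0.6)))
```

The same call could be applied to pop_data with the bounds used in the example (A = 0, B = 1000, b = (0.4, 0.6)).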
pop_data

|  | X | B | Z | Y |
|---|---|---|---|---|
| 0 | 116.246213 | 0.539294 | 0 | 0.0 |
| 1 | 35.021219 | 0.457228 | 0 | 0.0 |
| 2 | 59.890232 | 0.445370 | 0 | 0.0 |
| 3 | 55.471881 | 0.510263 | 0 | 0.0 |
| 4 | 54.394463 | 0.543894 | 0 | 0.0 |
| ... | ... | ... | ... | ... |
| 99995 | 133.513989 | 0.476092 | 0 | 0.0 |
| 99996 | 79.941630 | 0.504328 | 0 | 0.0 |
| 99997 | 91.298149 | 0.573550 | 0 | 0.0 |
| 99998 | 22.203274 | 0.513245 | 0 | 0.0 |
| 99999 | 92.776933 | 0.522579 | 0 | 0.0 |
100000 rows × 4 columns
print(
    f"The mean payment amount is ${pop_data.X.mean():.2f} with a total payment amount of ${pop_data.X.sum():,.2f}.\n"
    f"The coefficient of variation for payment amounts is {pop_data.X.std() / pop_data.X.mean():.2%}.\n"
    f"The minimum and maximum percentages of improper payments are {pop_data.B.min():.2%} and {pop_data.B.max():.2%}, respectively.\n"
    f"The probability of an improper payment is {pop_data.Z.mean():.2%}.\n"
    f"The mean improper payment amount (conditional on being improper) is ${pop_data.Y[pop_data.Y > 0].mean():.2f} with a total improper payment amount of ${pop_data.Y.sum():,.2f}."
)
The mean payment amount is $99.97 with a total payment amount of $9,996,861.11.
The coefficient of variation for payment amounts is 49.98%.
The minimum and maximum percentages of improper payments are 40.00% and 60.00%, respectively.
The probability of an improper payment is 9.95%.
The mean improper payment amount (conditional on being improper) is $59.66 with a total improper payment amount of $593,688.44.
import matplotlib.pyplot as plt
# Histogram of the nonzero improper payment amounts; "fd" sets bin widths with the Freedman-Diaconis rule.
plt.figure(figsize=(8, 5))
plt.hist(pop_data.Y[pop_data.Y > 0], bins="fd", density=False, alpha=0.6, color="steelblue", edgecolor="white")
plt.title("Distribution of Improper Payments")
plt.xlabel("Improper Payment Amount ($)")
plt.ylabel("Frequency")
plt.grid(True, alpha=0.2)
plt.show()
Session Information
All of the files needed to reproduce these results can be downloaded from the Git repository https://github.com/wkingc/improper-payments-dgp-py.
-----
improper_payments_dgp 0.1.6
matplotlib 3.10.8
pandas 3.0.0
session_info v1.0.1
-----
Python 3.14.2 (main, Dec 5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.6.3.2)]
macOS-26.3-arm64-arm-64bit-Mach-O
-----
Session information updated at 2026-02-27 17:12