Tutorial 1: Deriving Lognormal statistics from measurements¶
This tutorial shows how to derive parameters of the Lognormal
distribution using the Method of Moments, and by doing statistics on the
logarithm of the values
Step 1: Importing the dataset, and interpreting the data.¶
In the next cells, we import the python packages necessary to run this
tutorial. If the packages are not available, install them in your Python
environment using pip install <package_name>
import numpy as np
import scipy.stats as stats
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Import the data using the Pandas package, and show the first 5 entries of the dataframe with the measurements
# The headers are automatically taken as column names.
df = pd.read_csv("input_volumetricweight_testresults.txt", sep=",")
df
VolWeight | |
---|---|
0 | 17.17 |
1 | 18.16 |
2 | 19.05 |
3 | 17.50 |
4 | 17.57 |
5 | 20.26 |
6 | 18.64 |
7 | 18.35 |
8 | 17.18 |
9 | 15.58 |
10 | 17.35 |
11 | 19.52 |
12 | 22.01 |
13 | 17.36 |
14 | 21.16 |
Get our measurements in a numpy array x
, and plot the histogram of
the data.
x = df['VolWeight'].to_numpy()
sns.histplot(x, bins=5)
plt.show()

Step 2: Sample mean and sample standard deviation of the data set¶
Step 2a: Calculate the mean and standard deviation of the ‘real’ values¶
# First set up a function to calculate.
def calculate_mean_and_std(x):
sample_mean = np.mean(x)
sample_std = np.std(x, ddof=1)
cov = sample_std/sample_mean
print('Sample mean = \t\t\t%0.2f'%(sample_mean))
print('Sample std = \t\t\t%0.3f'%(sample_std))
print('Coefficient of variation cov = \t%0.3f'%(cov))
return sample_mean, sample_std
# Now calculate the mean and std with that function
print('Mean and standard deviation of x')
mu, sigma = calculate_mean_and_std(x)
Mean and standard deviation of x
Sample mean = 18.46
Sample std = 1.701
Coefficient of variation cov = 0.092
Step 2b: Calculate the mean and standard deviation of the logarithmic values¶
# Here we do almost exactly the same, however, we
# calculate the mean and std of log(x)
print('Mean and standard deviation of ln(x)')
m, s = calculate_mean_and_std( np.log(x) )
Mean and standard deviation of ln(x)
Sample mean = 2.91
Sample std = 0.090
Coefficient of variation cov = 0.031
Step 3: Fitted Lognormal Distribution using method of moments (MoM)¶
First we introduce two functions.
The first
lognormal_parameters_from_mu_sigma(mu_x, sigma_x, shift)
transforms the mean and standard deviation of x (i.e. the first and second moment, respectively) into the parameters of the Lognormal distribution.The second
mu_sigma_from_lognormal_parameters(mu_y, sigma_y, shift)
does the exact opposite: it transforms the parameters of the Lognormal distribution into the mean and standard deviation of x.
def lognormal_parameters_from_mu_sigma(mu, sigma, shift=0):
"""
Convention: a Lognormal distribution of X,
means that Y = ln(X) is normally distributed.
INPUT
mu = mean(X)
sigma = mean(X)
shift = default 0
OUTPUT
m = mean(Y)
s = stdev(Y)
shift
"""
m = np.log( (mu - shift)**2 / np.sqrt(sigma**2 + (mu - shift)**2) )
s = np.sqrt( np.log(sigma**2 / (mu - shift)**2 + 1))
return m, s, shift
def mu_sigma_from_lognormal_parameters(m,s,shift=0):
"""
Convention: a Lognormal distribution of X,
means that Y = ln(X) is normally distributed.
INPUT
m = mean(Y)
s = std(Y)
shift = default 0
OUTPUT
mu = mean(X)
sigma = mean(X)
shift = default 0
"""
mu = np.exp(m + s**2 / 2) + shift
sigma = np.sqrt((mu - shift)**2 * (np.exp(s**2) - 1))
return mu, sigma, shift
To transform the mean and standard deviation (the first and second moment, respectively) into the parameters of the Lognormal distribution, we call the function. Then we print the result.
# Assume the lognormal is not shifted (shift=0)
shift=0
m, s, shift = lognormal_parameters_from_mu_sigma(mu, sigma, shift)
print('Location, scale, and shift parameters of the\
Lognormal distribution (Y=log(X)) are: \n\
m_Y = \t\t%0.2f\n\
s_Y = \t\t%0.3f\n\
shift_Y =\t%0.1f '%(m,s,shift))
Location, scale, and shift parameters of theLognormal distribution (Y=log(X)) are:
m_Y = 2.91
s_Y = 0.092
shift_Y = 0.0
The next step is to check the other formula for ‘the other way around’ (parameters of the Lognormal distribution -> mean and standard deviation).
# Assume the lognormal is not shifted (shift=0)
shift=0
mu, sigma, shift = mu_sigma_from_lognormal_parameters(m,s,shift)
print('Mean, standard deviation, and shift parameters of the\
Lognormal distribution are:\n\
mu_X = \t\t%0.2f\n\
sigma_X = \t%0.3f\n\
shift_X =\t%0.1f \n'%(mu,sigma,shift))
Mean, standard deviation, and shift parameters of theLognormal distribution are:
mu_X = 18.46
sigma_X = 1.701
shift_X = 0.0
Step 3c: Visual inspection of the results¶
Now we can visulize the results. We first set up two functions for drawing the pdf of a normal and a lognormal. Then we call this function to make the plots we want.
# First set up
def plot_pdf_normal(ax, x_range, mu, sigma):
'''
Calculate the pdf values using the norm.pdf function.
Note that python uses the following terminology:
location (loc) = mean value
scale (scale) = standard deviation
'''
pdf_values = stats.norm.pdf(x_range, loc=mu, scale=sigma)
ax.plot(x_range, pdf_values, label='PDF normal distribution')
ax.set_label('Probability density (pdf)')
def plot_pdf_lognormal(ax, x_range, m, s, shift=0, label=''):
'''
Calculate the pdf values using the lognorm.pdf function.
Note that python uses the following terminology:
loc (location) = shift
scale (scale) = median value (where cdf=0.5 ) -> exp(m)
s (standard deviation) = s
'''
pdf_values = stats.lognorm.pdf(x_range, s=s, loc=shift, scale=np.exp(m))
ax.plot(x_range, pdf_values, label='PDF lognormal distribution '+label)
ax.set_label('Probability density (pdf)')
ax.set_ylim(0)
ax.legend(loc=1)
# Plotting the histogram of the data
ax = sns.histplot(df, bins=5, stat='density')
# Set up an x range for plotting
x_range = np.linspace(14,23,200)
# Plot the pdf of the normal and lognormal distribution for
# the xrange, using the above function.
plot_pdf_normal(ax, x_range, mu, sigma)
plot_pdf_lognormal(ax, x_range, m, s)

Optional step 3d: Derive parameters for a shifted Lognormal distribution, with all values shifted to 14¶
Note: this is an advanced step to show how a (manually shifted) lognormal distribution could be derived. To this end, we first shift all our data points ‘to the left’ with a value of 14 (assuming 14 is some kind of physical lower limit for this parameter). Then we derive the statistics for the lognormal parameter, and parse the shift=14 parameter to the plots.
# We assume 14 is a physical lower limit for our volumetric weight parameter
shift = 14
# Shift our measurements and derive the lognormal statistics.
m_, s_ = calculate_mean_and_std( np.log(x - shift) )
# Plot the histogram and the pdf of a shifted Lognormal distribution
ax = sns.histplot(df, bins=5, stat='density')
plot_pdf_normal(ax, x_range, mu, sigma)
plot_pdf_lognormal(ax, x_range, m_, s_, shift)
Sample mean = 1.42
Sample std = 0.401
Coefficient of variation cov = 0.282

Step 4: Characteristic values¶
Step 4a: Calculate the characteristic value with the Student-T factor.¶
The number of measurements influences the statistical uncertainty of our estimated mean and standard deviation. Therefore we apply the Student-T factor to account for statistical uncertainty due to a small number of samples. The characteristic value is calculated as follows:
with
n = len(df)
student_t_factor = stats.t.ppf(0.05, df=n-1) # df = degrees of freedom, which is n-1
print('t-value \t\t%0.2f'%(student_t_factor))
# The 5%-characteristic value, using the t distributions, including
# the statistical uncertainty in the estimated mean (s/sqrt(n))
# Note the + sign before the student_t_factor, because student_t_factor is negative
characteristic_value = np.exp(m + student_t_factor * s * np.sqrt( 1 + 1 / n ) )
print("5p-char_value = \t%0.2f"%(characteristic_value))
t-value -1.76
5p-char_value = 15.55
Step 4b: Calculate the characteristic value with and without spatial avaraging¶
To account for spatial averaging the
Default assumptions on this factor are: -
# Allocate a list to store multiple outcomes
stored_values = []
# Calculate the characteristic value for different choises of spatial averaging:
gamma2_values = [1.0, 0.25, 0.0];
for gamma2 in gamma2_values:
characteristic_value = np.exp(m + student_t_factor * s * np.sqrt( gamma2 + 1 / n ) )
print("gamma2: 5p-char_value = \t%0.2f"%(characteristic_value))
stored_values.append(characteristic_value)
gamma2: 5p-char_value = 15.55
gamma2: 5p-char_value = 16.78
gamma2: 5p-char_value = 17.63
Step 4c: Visually checking the results¶
The results are visually checked with the histogram and the pdf’s, showing the calculated characteristic value for the different asssumptions with a black vertical line and a colored dot.
ax = sns.histplot(df, bins=5, stat='density')
plot_pdf_normal(ax, x_range, mu, sigma)
plot_pdf_lognormal(ax, x_range, m, s)
for gamma2, value in zip( gamma2_values, stored_values):
plt.vlines(value, ymin=0, ymax= 0.1, color='k')
plt.scatter(value, 0.10, label="$\Gamma ^{2}=%s$"%(gamma2) )
plt.legend(fontsize='small', bbox_to_anchor=(1,0.95))
plt.show()

Step 5: Get the inputs for D-Stability, taking into account the statistical uncertainty¶
To correct the Lornormal standard deviation for the statistical uncertainty, we increase the std with a factor.
This factor equals
Step 5a: For the uncertainty in the point values¶
correction_factor = student_t_factor / -1.645
s_point_incl_stat = s * correction_factor * np.sqrt(1 + 1 / n)
mu, sigma, shift = mu_sigma_from_lognormal_parameters(m,
s_point_incl_stat,
shift=0)
print('Mean, standard deviation, and shift parameters of the\
Lognormal distribution \n\
For D-Stability input:\n\
mu_X = \t\t%0.2f\n\
sigma_X = \t%0.3f\n\
shift_X =\t%0.1f \n'%(mu,sigma,shift))
Mean, standard deviation, and shift parameters of theLognormal distribution
For D-Stability input:
mu_X = 18.47
sigma_X = 1.883
shift_X = 0.0
Step 5b: For the uncertainty in the regional average¶
s_regional_average_incl_stat = s * correction_factor * np.sqrt(0.25 + 1 / n)
mu, sigma, shift = mu_sigma_from_lognormal_parameters(m,
s_regional_average_incl_stat,
shift=0)
print('Mean, standard deviation, and shift parameters of the\
Lognormal distribution \n\
For D-Stability input:\n\
mu_X = \t\t%0.2f\n\
sigma_X = \t%0.3f\n\
shift_X =\t%0.1f \n'%(mu,sigma,shift))
Mean, standard deviation, and shift parameters of theLognormal distribution
For D-Stability input:
mu_X = 18.41
sigma_X = 1.021
shift_X = 0.0
Step 5c: For the uncertainty in the spatial average¶
s_spatial_average_incl_stat = s * correction_factor * np.sqrt(1 / n)
mu, sigma, shift = mu_sigma_from_lognormal_parameters(m,
s_spatial_average_incl_stat,
shift=0)
print('Mean, standard deviation, and shift parameters of the\
Lognormal distribution \n\
For D-Stability input:\n\
mu_X = \t\t%0.2f\n\
sigma_X = \t%0.3f\n\
shift_X =\t%0.1f \n'%(mu,sigma,shift))
Mean, standard deviation, and shift parameters of theLognormal distribution
For D-Stability input:
mu_X = 18.39
sigma_X = 0.467
shift_X = 0.0
Step 5d: Visualize the results¶
ax = sns.histplot(df, bins=5, stat='density')
plot_pdf_lognormal(ax, x_range, m, s_point_incl_stat,
label='point statistics')
plot_pdf_lognormal(ax, x_range, m, s_regional_average_incl_stat,
label='regional average')
plot_pdf_lognormal(ax, x_range, m, s_spatial_average_incl_stat,
label='spatial average')
colors=['tab:blue', 'tab:orange', 'tab:green']
for i, (gamma2, value) in enumerate(zip( gamma2_values, stored_values)):
plt.vlines(value, ymin=0, ymax=0.5, color=colors[i], label="Characteristic value with $\Gamma ^{2}=%s$"%(gamma2) )
plt.legend(bbox_to_anchor=(1.05,1))
plt.ylim([0,1])
plt.show()
