
Probability distributions

The distribution of real data from a sample, that is the empirical data distribution, can be presented by means of ``frequency tables'' (select Statistics menu→Descriptive analysis→Frequency tables). For example, the distribution of the number of free minutes used by the subscribers of a mobile network operator (EXAMPLE, distribution.pqs file) is presented in the following table:

The results included in the table are usually presented graphically as a histogram or a bar plot.

Such a graph can be created by selecting the Add graph option in the Frequency tables window.

A theoretical data distribution, also called a probability distribution, is usually presented graphically as a line graph. The line is described by a function (a mathematical model) called a density function. You can replace the empirical distribution with an appropriate theoretical distribution.

Note: To replace an empirical distribution with an appropriate theoretical distribution, it is not enough to judge the similarity of their shapes intuitively. To check it, you should use dedicated goodness-of-fit tests.

The most frequently used probability distribution is the normal (Gaussian) distribution. The data on the number of free minutes used (EXAMPLE, distribution.pqs file) follow such a distribution, with a mean of 161.15 and a standard deviation of 13.03.

Continuous probability distributions

  • Normal distribution, also called the Gaussian distribution or the bell curve, is one of the most important distributions in statistics. It has very interesting mathematical properties and occurs very often in nature. It is usually denoted by $N(\mu,\sigma)$.

A density function is defined by: \begin{displaymath}
f(x,\mu,\sigma)=\frac{1}{\sqrt{2\pi}\sigma}\exp\bigg(-\frac{(x-\mu)^2}{2\sigma^2}\bigg), \label{r_normalny_fun}
\end{displaymath}

where:

$-\infty<x<+\infty$,

$\mu$ – the expected value of the population (estimated by the sample mean),

$\sigma$ – standard deviation.

\psset{xunit=1.25cm,yunit=8cm}
\begin{pspicture}(-3.5,-.1)(4.2,0.9)
\psaxes[Dy=0.1]{->}(0,0)(-4.5,0)(5,0.9)
\uput[-90](5,0){x}\uput[0](0,0.85){y}
\psGauss[linecolor=red, linewidth=2pt, mue=0, sigma=1]{-4}{4}%
\rput(1.5,0.27){\textcolor{red}{$N(0,1)$}}
\psGauss[linecolor=blue, linestyle=dotted, mue=1, sigma=1]{-4}{4}%
\rput(2.6,0.25){\textcolor{blue}{$N(1,1)$}}
\psGauss[linecolor=green,linestyle=dashed, mue=0, sigma=0.5]{-4}{4}%
\rput(1.1,0.6){\textcolor{green}{$N(0,0.5)$}}
\end{pspicture}

The normal distribution is symmetric about the vertical line that passes through the point on the x-axis at which the mean, the mode and the median coincide.

The normal distribution with a mean of $\mu=0$ and a standard deviation of $\sigma=1$, that is $N(0,1)$, is called the standardised normal distribution.
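
Any value $x$ from a normal distribution $N(\mu,\sigma)$ can be brought to the standardised scale. As a brief illustration (the symbol $z$ is notation introduced only for this sketch), the standardisation and the height of the density at the mean are:
\begin{displaymath}
z=\frac{x-\mu}{\sigma}, \qquad f(\mu,\mu,\sigma)=\frac{1}{\sqrt{2\pi}\sigma}\approx\frac{0.3989}{\sigma}.
\end{displaymath}
For $\sigma=1$ the peak of the curve is therefore about $0.40$, and for $\sigma=0.5$ about $0.80$, which agrees with the heights of the curves drawn in the graph above.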

  • Student's t-distribution: its shape is similar to that of the standardised normal distribution, but its tails are heavier. The higher the number of degrees of freedom ($df$), the closer the shape of the t-distribution is to the normal distribution.

A density function is defined by: \begin{displaymath}
f(x,df)=\frac{\Gamma(\frac{df+1}{2})}{\Gamma(\frac{df}{2})\sqrt{df\pi}}\left(1+\frac{x^2}{df}\right)^{-\frac{df+1}{2}},
\end{displaymath}

where:

$-\infty<x<+\infty$,

$df$ – degrees of freedom (the sample size reduced by the number of constraints in the given calculation),

$\Gamma$ is the gamma function.

\psset{xunit=1.25cm,yunit=10cm}
\begin{pspicture}(-5,-0.1)(5,.5)
\psaxes[Dy=0.1]{->}(0,0)(-4.5,0)(5,0.5)
\uput[-90](5,0){x}\uput[0](0,0.45){y}
\psGauss[linecolor=red, linewidth=2pt, mue=0, sigma=1]{-4}{4}%
\rput(1.6,0.25){\textcolor{red}{$N(0,1)$}}
\psTDist[linecolor=blue,linestyle=dotted,nue=1]{-4}{4}
\rput(2.5,0.2){\textcolor{blue}{$T(df=1)$}}
\psTDist[linecolor=green,linestyle=dashed,nue=4]{-4}{4}
\rput(3,0.15){\textcolor{green}{$T(df=4)$}}
\end{pspicture}
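
As a quick check of the formula, and only as an illustrative special case, take $df=1$. Using $\Gamma(1)=1$ and $\Gamma(\frac{1}{2})=\sqrt{\pi}$, the density reduces to
\begin{displaymath}
f(x,1)=\frac{\Gamma(1)}{\Gamma(\frac{1}{2})\sqrt{\pi}}\left(1+x^2\right)^{-1}=\frac{1}{\pi(1+x^2)},
\end{displaymath}
so at $x=0$ the curve $T(df=1)$ reaches about $0.318$, below the $0.399$ of $N(0,1)$, while its tails decrease only like $x^{-2}$.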

  • Chi-square distribution: a right-skewed distribution whose shape depends on the number of degrees of freedom $df$. The higher the number of degrees of freedom, the closer the shape of the $\chi^2$ distribution is to the normal distribution.

A density function is defined by: \begin{displaymath}
f(x,df)=\frac{1}{2^{\frac{df}{2}}\Gamma\left(\frac{df}{2}\right)}x^{\frac{df}{2}-1}e^{-\frac{x}{2}},
\end{displaymath}

where:

$x>0$,

$df$ – degrees of freedom (the sample size reduced by the number of constraints in the given calculation),

$\Gamma$ is the gamma function.

\psset{xunit=1.2cm,yunit=10cm,plotpoints=200}
\begin{pspicture*}(-0.75,-0.1)(9.5,.65)
\uput[-90](9.4,0){x}\uput[0](0,0.55){y}
\psChiIIDist[linewidth=1pt,linecolor=red, nue=1,]{0.01}{9}
\rput(1.8,0.4){\textcolor{red}{$\chi^2(df=1)$}}
\psChiIIDist[linewidth=1pt,linecolor=blue,linestyle=dotted, nue=5,]{0.01}{9}
\rput(4,0.2){\textcolor{blue}{$\chi^2(df=5)$}}
\psChiIIDist[linewidth=1pt,linecolor=green,linestyle=dashed, nue=10,]{0.01}{9}
\rput(8,0.15){\textcolor{green}{$\chi^2(df=10)$}}
\psaxes[Dy=0.1]{->}(0,0)(9.5,.6)
\end{pspicture*}
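
For illustration, take $df=2$ (a value chosen here only because it simplifies the formula). Then $\Gamma(1)=1$ and $x^{0}=1$, so
\begin{displaymath}
f(x,2)=\frac{1}{2^{1}\Gamma(1)}x^{0}e^{-\frac{x}{2}}=\frac{1}{2}e^{-\frac{x}{2}},
\end{displaymath}
which integrates to $1$ over $x>0$, as every density function must.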

  • Fisher-Snedecor distribution: a distribution with a longer right tail and a shape that depends on the numbers of degrees of freedom $df_1$ and $df_2$.

A density function is defined by: \begin{displaymath}
f(x,df_1,df_2)=\frac{\sqrt{\frac{(df_1x)^{df_1}df_2^{df_2}}{(df_1x+df_2)^{df_1+df_2}}}}{xB\left(\frac{df_1}{2},\frac{df_2}{2}\right)},
\end{displaymath}

where:

$x>0$,

$df_1$, $df_2$ – degrees of freedom (it is assumed that if $X$ and $Y$ are independent and have $\chi^2$ distributions with $df_1$ and $df_2$ degrees of freedom respectively, then $F=\frac{X/df_1}{Y/df_2}$ has the Fisher-Snedecor distribution $F(df_1,df_2)$),

$B$ is the beta function.

\psset{xunit=2cm,yunit=10cm,plotpoints=100}
\begin{pspicture*}(-0.5,-0.07)(5.5,0.8)
\psFDist[linecolor=green,linestyle=dashed]{0.1}{5}
\rput(1,0.05){\textcolor{green}{$F(df_1=1,df_2=1)$}}
\psFDist[linecolor=red,nue=3,mue=12]{0.01}{5}
\rput(4,0.15){\textcolor{red}{$F(df_1=3,df_2=12)$}}
\psFDist[linecolor=blue,linestyle=dotted,nue=12,mue=3]{0.01}{5}
\rput(2,0.4){\textcolor{blue}{$F(df_1=12,df_2=3)$}}
\psaxes[Dy=0.1]{->}(0,0)(5,0.75)
\end{pspicture*}
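
As a sanity check of the density, consider $df_1=df_2=2$ (values chosen only for illustration). Since $B(1,1)=1$, the formula becomes
\begin{displaymath}
f(x,2,2)=\frac{\sqrt{\frac{(2x)^{2}\cdot 2^{2}}{(2x+2)^{4}}}}{x\,B(1,1)}=\frac{4x}{x(2x+2)^{2}}=\frac{1}{(1+x)^{2}},
\end{displaymath}
and indeed $\int_0^{\infty}(1+x)^{-2}\,dx=1$.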

2022/02/09 12:56

Probability distribution calculator

The area under the curve (the density function) represents the probability $p$ of occurrence of the values of the analysed random variable. The whole area under the curve equals $p=1$. If you want to analyse only a part of this area, you must enter the boundary value, which is called the critical value or Statistic. To do this, open the Probability distribution calculator window. In this window you can calculate not only the area under the curve (p-value) of the given distribution on the basis of the Statistic, but also the Statistic value on the basis of the p-value. To open the Probability distribution calculator window, select Probability distribution calculator from the Statistics→Calculators menu.
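
In symbols, and as a sketch only (the names $s$ for the Statistic and $f$ for the density of the selected distribution are notation assumed here, and the left-tail convention is inferred from the example that follows), the two basic quantities are:
\begin{displaymath}
p=\int_{-\infty}^{s}f(x)\,dx, \qquad 1-p=\int_{s}^{+\infty}f(x)\,dx.
\end{displaymath}
Calculating the Statistic from a given p-value then means solving the first equation for $s$.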

EXAMPLE Probability distribution calculator

A mobile network operator carried out a study of the usage of ``free minutes'' given to its clients on pay-monthly contracts. On the basis of a sample of 200 clients of this network (in which the distribution of used free minutes has the shape of a normal distribution), the mean $\overline{x}=161.15$ min and the standard deviation $sd=13.03$ min were calculated. We want to calculate the probability that a randomly chosen client used:

  1. 150 minutes or less,
  2. more than 150 minutes,
  3. a number of minutes within the range $[\overline{x}-sd,\overline{x}+sd]=[148.12,174.18]$ min,
  4. a number of minutes outside the range $\overline{x}\pm sd$.

Open the Probability distribution calculator window, select the Gaussian distribution, enter the mean $\overline{x}=161.15$ min and the standard deviation $sd=13.03$ min, and select the option indicating that you are going to calculate the p-value.

(1) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used 150 free minutes or less, enter the value 150 in the Statistic field. Confirm the selected settings by clicking Calculate.

\psset{xunit=1.2cm,yunit=8cm}
\begin{pspicture}(-3.5,-.05)(4.2,0.4)
\psline{-}(-4,0)(4,0)
\psGauss[linecolor=blue, mue=0, sigma=1]{-4}{4}%
\pscustom[fillstyle=solid,fillcolor=red!30]{%
\psGauss[linewidth=1pt,mue=0, sigma=1]{-4}{-0.85572}%
\psline(-0.85572,0)(-4,0)}
\rput(2.4,0.25){\textcolor{blue}{$N(161.15,13.03)$}}
\rput(-0.85572,-0.05){\textcolor{blue}{150}}
\end{pspicture}

The obtained p-value is 0.193961.
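
The calculation can be traced by hand through standardisation. A sketch, assuming the rounded mean and standard deviation quoted above, and writing $X$ for the number of minutes used and $\Phi$ for the distribution function of $N(0,1)$ (both notations introduced here):
\begin{displaymath}
z=\frac{150-161.15}{13.03}\approx-0.8557, \qquad p=P(X\le 150)=\Phi(z).
\end{displaymath}
With these rounded inputs $\Phi(-0.8557)\approx 0.196$, so a hand calculation may differ from the calculator's output in the third decimal place.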

Note

You can carry out similar calculations on the basis of the empirical distribution. It is enough to calculate the percentage of clients who used 150 minutes or less (example (\ref{tab_licznosci})) in the Frequency tables window. In the analysed sample of 200 clients, 40 clients used 150 minutes or less. This is 20\% of the whole sample, so the probability you are looking for is $p=0.2$.

(2) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used more than 150 free minutes, enter the value 150 in the Statistic field and then select the option 1 - (p-value). Confirm the selected settings by clicking Calculate.

\psset{xunit=1.2cm,yunit=8cm}
\begin{pspicture}(-3.5,-.05)(4.2,0.4)
\psline{-}(-4,0)(4,0)
\psGauss[linecolor=blue, mue=0, sigma=1]{-4}{4}%
\pscustom[fillstyle=solid,fillcolor=red!30]{%
\psline(-0.85572,0)(-0.85572,0)%
\psGauss[linewidth=1pt,mue=0, sigma=1]{-0.85572}{4}%
\psline(4,0)(-0.85572,0)}
\rput(2.4,0.25){\textcolor{blue}{$N(161.15,13.03)$}}
\rput(-0.85572,-0.05){\textcolor{blue}{150}}
\end{pspicture}

The obtained p-value is 0.806039.
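
This value is simply the complement of the probability obtained in point (1). Writing $X$ for the number of minutes used (notation assumed here):
\begin{displaymath}
P(X>150)=1-P(X\le 150)=1-0.193961=0.806039.
\end{displaymath}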

(3) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used a number of free minutes within the range $[\overline{x}-sd,\overline{x}+sd]=[148.12,174.18]$ min, enter one of the endpoints of the range in the Statistic field and then select the two-sided option. Confirm the selected settings by clicking Calculate.

\psset{xunit=1.2cm,yunit=8cm}
\begin{pspicture}(-3.5,-.05)(4.2,0.4)
\psline{-}(-4,0)(4,0)
\psGauss[linecolor=blue, mue=0, sigma=1]{-4}{4}%
\pscustom[fillstyle=solid,fillcolor=red!30]{%
\psline(-1,0)(-1,0)%
\psGauss[linewidth=1pt,mue=0, sigma=1]{-1}{1}%
\psline(1,0)(-1,0)}
\rput(2.4,0.25){\textcolor{blue}{$N(161.15,13.03)$}}
\rput(-1,-0.05){\textcolor{blue}{148.12}}
\rput(1,-0.05){\textcolor{blue}{174.18}}
\end{pspicture}

The obtained p-value is 0.682689.
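
The endpoints of the range lie exactly one standard deviation from the mean, so they standardise to $\pm1$. As a sketch, with $X$ the number of minutes used and $\Phi$ the distribution function of $N(0,1)$ (notation assumed here):
\begin{displaymath}
P(\overline{x}-sd\le X\le\overline{x}+sd)=\Phi(1)-\Phi(-1)=2\Phi(1)-1\approx 2\cdot 0.8413447-1\approx 0.682689.
\end{displaymath}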

(4) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used a number of free minutes outside the range $[\overline{x}-sd,\overline{x}+sd]=[148.12,174.18]$ min, enter one of the endpoints of the range in the Statistic field and then select the options two-sided and 1 - (p-value). Confirm the selected settings by clicking Calculate.

\psset{xunit=1.2cm,yunit=8cm}
\begin{pspicture}(-3.5,-.05)(4.2,0.4)
\psline{-}(-4,0)(4,0)
\psGauss[linecolor=blue, mue=0, sigma=1]{-4}{4}%
\pscustom[fillstyle=solid,fillcolor=red!30]{%
\psGauss[linewidth=1pt,mue=0, sigma=1]{-4}{-1}%
\psline(-1,0)(-4,0)}
\pscustom[fillstyle=solid,fillcolor=red!30]{%
\psline(1,0)(1,0)%
\psGauss[linewidth=1pt,mue=0, sigma=1]{1}{4}%
\psline(4,0)(1,0)}
\rput(2.4,0.25){\textcolor{blue}{$N(161.15,13.03)$}}
\rput(-1,-0.05){\textcolor{blue}{148.12}}
\rput(1,-0.05){\textcolor{blue}{174.18}}
\end{pspicture}

The obtained p-value is 0.317311.
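
Again the result is a complement, this time of the two-sided probability from point (3); by symmetry it also equals twice the area of a single tail. With $X$ the number of minutes used and $\Phi$ the distribution function of $N(0,1)$ (notation assumed here):
\begin{displaymath}
P(|X-\overline{x}|>sd)=1-0.682689=0.317311\approx 2\,\Phi(-1).
\end{displaymath}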
