The distribution of real data from a sample, the empirical data distribution, can be presented by means of ''frequency tables'' (select Statistics→Descriptive analysis→Frequency tables). For example, the distribution of the number of free minutes used by subscribers of a mobile network operator (EXAMPLE, distribution.pqs file) gives the following table:
The results in the table are usually presented graphically as a histogram or a bar plot. Such a graph can be created by selecting the Add graph option in the Frequency tables window.
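The idea of a frequency table can also be sketched outside the program. The Python fragment below is only an illustration: the data are synthetic (200 values drawn from a normal distribution, not the contents of distribution.pqs), and the function name `frequency_table` and the bin width of 10 minutes are assumptions made for this example.

```python
import random
from collections import Counter

# Synthetic stand-in for the sample: 200 values drawn from N(161.15, 13.03).
# These are NOT the values stored in distribution.pqs.
rng = random.Random(0)
minutes = [rng.gauss(161.15, 13.03) for _ in range(200)]


def frequency_table(values, bin_width=10):
    """Return rows (bin_start, bin_end, count, percent), sorted by bin."""
    # Map every value to the left edge of the bin it falls into.
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    n = len(values)
    return [(start, start + bin_width, count, 100.0 * count / n)
            for start, count in sorted(counts.items())]


table = frequency_table(minutes)
for start, end, count, percent in table:
    # Crude text histogram: one '#' per two observations.
    print(f"[{start:>3}, {end:>3})  {count:>3}  {percent:5.1f}%  {'#' * (count // 2)}")
```

The printed bars play the role of the histogram; a plotting library could be used instead.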
A theoretical data distribution, also called a probability distribution, is usually presented graphically as a line graph. The line is described by a function (a mathematical model) called the density function. You can replace the empirical distribution with an adequate theoretical distribution.
Note
To replace an empirical distribution with an adequate theoretical distribution, it is not enough to judge the similarity of their shapes intuitively. To check it, you should use dedicated compatibility (goodness-of-fit) tests.
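As an illustration of what such a test measures, the sketch below computes the Kolmogorov-Smirnov distance, that is the largest gap between the empirical cumulative distribution and a theoretical one. The choice of this particular statistic is an assumption made for the example; the text does not say which tests the program uses. The data are synthetic, the helper name `ks_statistic` is hypothetical, and the code computes only the distance, not a p-value.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(values, dist):
    """Largest distance between the empirical CDF and dist.cdf."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


rng = random.Random(1)
sample = [rng.gauss(161.15, 13.03) for _ in range(200)]  # synthetic data

# Distance to a normal curve fitted to the sample (small) ...
d_fitted = ks_statistic(sample, NormalDist(mean(sample), stdev(sample)))
# ... and to a deliberately misplaced normal curve (close to 1).
d_misplaced = ks_statistic(sample, NormalDist(60, 13.03))
print(d_fitted, d_misplaced)
```

A small distance alone is not a test result; a formal test compares it with the distribution of the statistic under the null hypothesis.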
The most commonly used probability distribution is the normal (Gaussian) distribution. The data on the number of free minutes used (EXAMPLE, distribution.pqs file) follow such a distribution, with a mean of 161.15 and a standard deviation of 13.03.
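The density function of the normal distribution is defined by:
$$f(x,\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where:
$-\infty<x<+\infty$,
$\mu$ – the expected value of the population (its measure is the mean),
$\sigma$ – the standard deviation.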
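A minimal sketch of the normal density in Python, written directly from the standard formula $f(x)=\exp(-(x-\mu)^2/(2\sigma^2))/(\sigma\sqrt{2\pi})$ and using the mean and standard deviation quoted for the free-minutes data. The function name `normal_pdf` is an assumption for this example; `statistics.NormalDist` from the standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density with expected value mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 161.15, 13.03

# The curve has its maximum at the mean ...
peak = normal_pdf(mu, mu, sigma)
# ... and takes equal values at equal distances on both sides of it.
left, right = normal_pdf(mu - 5, mu, sigma), normal_pdf(mu + 5, mu, sigma)

# Cross-check against the standard library implementation.
reference = NormalDist(mu, sigma).pdf(150)
print(peak, left, right, normal_pdf(150, mu, sigma), reference)
```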
The normal distribution is symmetric about the line perpendicular to the axis of abscissae that passes through the point marking the mean, mode and median.
The normal distribution with a mean of $\mu=0$ and a standard deviation of $\sigma=1$, denoted $N(0,1)$, is called the standardised normal distribution.
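The density function of the t-Student distribution is defined by:
$$f(x,df)=\frac{\Gamma\left(\frac{df+1}{2}\right)}{\sqrt{df\,\pi}\;\Gamma\left(\frac{df}{2}\right)}\left(1+\frac{x^2}{df}\right)^{-\frac{df+1}{2}}$$
where:
$-\infty<x<+\infty$,
$df$ – degrees of freedom (the sample size decreased by the number of limitations in the given calculations),
$\Gamma$ is the Gamma function.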
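A hedged sketch of a Gamma-based density with $df$ degrees of freedom, taking it to be the Student's t density $f(x,df)=\Gamma(\frac{df+1}{2})\,/\,(\sqrt{df\,\pi}\,\Gamma(\frac{df}{2}))\cdot(1+x^2/df)^{-(df+1)/2}$; that identification and the function name `t_pdf` are assumptions of this example. The logarithm of the Gamma function (`math.lgamma`) is used so that large $df$ values do not overflow.

```python
import math


def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    # Ratio of Gamma functions computed on the log scale to avoid overflow.
    log_ratio = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coefficient = math.exp(log_ratio) / math.sqrt(df * math.pi)
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


# With df = 1 the value at zero is 1/pi.
at_zero_df1 = t_pdf(0.0, 1)
# With many degrees of freedom the curve approaches the standardised normal one.
at_zero_df1000 = t_pdf(0.0, 1000)
print(at_zero_df1, at_zero_df1000)
```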
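The density function of the Chi-square distribution is defined by:
$$f(x,df)=\frac{1}{2^{df/2}\,\Gamma\left(\frac{df}{2}\right)}\,x^{\frac{df}{2}-1}e^{-\frac{x}{2}}$$
where:
$x>0$,
$df$ – degrees of freedom (the sample size decreased by the number of limitations in the given calculations),
$\Gamma$ is the Gamma function.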
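A hedged sketch of a second Gamma-based density, taking it to be the Chi-square density $f(x,df)=x^{df/2-1}e^{-x/2}\,/\,(2^{df/2}\,\Gamma(\frac{df}{2}))$ for $x>0$; that identification and the function name `chi2_pdf` are assumptions of this example.

```python
import math


def chi2_pdf(x, df):
    """Chi-square density with df degrees of freedom (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))


# For df = 2 the formula reduces to 0.5 * exp(-x / 2).
value_df2 = chi2_pdf(2.0, 2)

# Rough check that the area under the curve is close to 1
# (midpoint rule on (0, 60) for df = 4, step 0.01).
step = 0.01
area = sum(chi2_pdf((i + 0.5) * step, 4) for i in range(6000)) * step
print(value_df2, area)
```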
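The density function of the F Snedecor distribution is defined by:
$$f(x,df_1,df_2)=\frac{\sqrt{\dfrac{(df_1x)^{df_1}\,df_2^{\,df_2}}{(df_1x+df_2)^{df_1+df_2}}}}{x\,B\left(\frac{df_1}{2},\frac{df_2}{2}\right)}$$
where:
$x>0$,
$df_1$, $df_2$ – degrees of freedom (it is assumed that if $X$ and $Y$ are independent and have a $\chi^2$ distribution with $df_1$ and $df_2$ degrees of freedom respectively, then $F=\frac{X/df_1}{Y/df_2}$ has an F Snedecor distribution $F(df_1,df_2)$),
$B$ is the Beta function.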
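A minimal sketch of the F Snedecor density in Python, using the common form $f(x)=\sqrt{(df_1x)^{df_1}df_2^{df_2}/(df_1x+df_2)^{df_1+df_2}}\,/\,(x\,B(\frac{df_1}{2},\frac{df_2}{2}))$. The function names `beta` and `f_pdf` are assumptions for this example; the Beta function is expressed through `math.lgamma`, since the standard library has no Beta function of its own.

```python
import math


def beta(a, b):
    """Beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def f_pdf(x, df1, df2):
    """F Snedecor density with df1 and df2 degrees of freedom (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    numerator = math.sqrt((df1 * x) ** df1 * df2 ** df2 / (df1 * x + df2) ** (df1 + df2))
    return numerator / (x * beta(df1 / 2, df2 / 2))


# For df1 = df2 = 2 the density simplifies to 1 / (1 + x)^2.
print(f_pdf(1.0, 2, 2), f_pdf(3.0, 2, 2))
```

For large degrees of freedom the powers in the numerator can overflow; a log-scale version would be more robust.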
The area under the curve (density function) is the probability of occurrence of all possible values of the analysed random variable. The whole area under the curve equals $p=1$. To analyse only a part of this area, you must enter a border value, called the critical value or Statistic. To do this, open the Probability distribution calculator window. In this window you can calculate not only the area under the curve (p-value) of a given distribution on the basis of the Statistic, but also the Statistic value on the basis of the p-value. To open the window, select Probability distribution calculator from the Statistics→Calculators menu.
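The two directions of this calculation (area from a border value, and border value from an area) can be sketched with Python's standard library. This is only an illustration for the standardised normal distribution; the border value 1.96 is an arbitrary choice for the example, and the code does not reproduce the calculator itself.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)

# Border value -> area under the curve to its left (left-tail probability).
p_left = std_normal.cdf(1.96)

# Area -> border value: the inverse operation recovers the statistic.
statistic = std_normal.inv_cdf(p_left)

# The whole area under the curve is 1 (checked here over a very wide range).
total_area = std_normal.cdf(10) - std_normal.cdf(-10)

print(p_left, statistic, total_area)
```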
EXAMPLE Probability distribution calculator
A mobile network operator carried out research on the usage of the "free minutes" given to its clients on a pay-monthly contract. On the basis of a sample of 200 of these clients (in which the distribution of used free minutes has the shape of a normal distribution), the mean value $\overline{x}=161.15$ min and the standard deviation $sd=13.03$ min were calculated. We want to calculate the probability that a chosen client used:
(1) 150 minutes or less,
(2) more than 150 minutes,
(3) a number of minutes within the range $\overline{x}\pm sd$,
(4) a number of minutes outside the range $\overline{x}\pm sd$.
Open the Probability distribution calculator window, select Gaussian distribution, enter the mean $\overline{x}=161.15$ min and the standard deviation $sd=13.03$ min, and select the option indicating that you are going to calculate the p-value.
(1) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used 150 free minutes or less, enter the value 150 in the Statistic field. Confirm the selected settings by clicking Calculate.
The obtained p-value is 0.193961.
Note
Similar calculations can be carried out on the basis of the empirical distribution. You only need to calculate the percentage of clients who use 150 minutes or less, using the Frequency tables window. In the analysed sample (which consists of 200 clients) there are 40 clients who use 150 minutes or less. That is 20% of the whole sample, so the probability you are looking for is $p=0.2$.
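The comparison made in this note can be written down as a short calculation. The sketch below uses the counts given in the note (40 of 200 clients) for the empirical probability, and a normal model with the quoted mean and standard deviation for the theoretical one; the variable names are assumptions for this example.

```python
from statistics import NormalDist

# Empirical probability: share of clients who used 150 minutes or less.
n_clients = 200
n_at_most_150 = 40
p_empirical = n_at_most_150 / n_clients

# Theoretical probability from the normal model with the quoted parameters.
p_theoretical = NormalDist(161.15, 13.03).cdf(150)

print(p_empirical, p_theoretical)
```

Both estimates land close to 0.2, which is what one expects when the normal model describes the sample well.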
(2) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used more than 150 free minutes, enter the value 150 in the Statistic field and then select the option 1 - (p-value). Confirm the selected settings by clicking Calculate.
The obtained p-value is 0.806039.
(3) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used a number of free minutes within the range $\overline{x}\pm sd$, enter one of the range endpoints in the Statistic field and then select the two-sided option. Confirm the selected settings by clicking Calculate.
The obtained p-value is 0.682689.
(4) To calculate (using the normal (Gaussian) distribution) the probability that the chosen client used a number of free minutes outside the range $\overline{x}\pm sd$, enter one of the range endpoints in the Statistic field and then select both options: two-sided and 1 - (p-value). Confirm the selected settings by clicking Calculate.
The obtained p-value is 0.317311.
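The four cases of this example can be retraced with Python's standard library. This is a sketch under stated assumptions: the range in cases (3) and (4) is taken to be one standard deviation on either side of the mean (which matches the two-sided results 0.682689 and 0.317311), and the rounded values 161.15 and 13.03 are used as inputs. With these rounded inputs, cases (1) and (2) come out slightly different from the calculator results quoted above; the text does not make clear where that difference comes from.

```python
from statistics import NormalDist

mean_minutes, sd_minutes = 161.15, 13.03
dist = NormalDist(mean_minutes, sd_minutes)

# (1) 150 minutes or less: area to the left of 150.
p_at_most_150 = dist.cdf(150)

# (2) more than 150 minutes: the complement, 1 - (p-value).
p_more_than_150 = 1 - p_at_most_150

# (3) within one standard deviation of the mean: two-sided area.
lower, upper = mean_minutes - sd_minutes, mean_minutes + sd_minutes
p_inside = dist.cdf(upper) - dist.cdf(lower)

# (4) outside that range: two-sided and 1 - (p-value).
p_outside = 1 - p_inside

print(p_at_most_150, p_more_than_150, p_inside, p_outside)
```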