Suppose that we use a diagnostic test to detect the occurrence of a particular feature (most often a disease) and that we also know the gold standard, i.e. whether the feature really occurs in each of the examined people. On the basis of this information, we can build a contingency table:
where:
TP – true positive
FP – false positive
FN – false negative
TN – true negative
For such a table we can calculate the following measures.
Sensitivity and specificity of diagnostic test
Every diagnostic test can, in some cases, give results that differ from the actual state. For example, a diagnostic test classifies a patient, on the basis of the obtained parameters, either into the group of people suffering from a particular disease or into the group of healthy people. In reality, the number of people assigned to these groups by the test may differ from the number of people who are genuinely ill and genuinely healthy.
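There are two measures of the accuracy of the test:
Sensitivity – the ability of the test to detect the people who really have the feature (are ill): $sensitivity = \frac{TP}{TP+FN}$. The confidence interval is built on the basis of the Clopper-Pearson method for a single proportion.
Specificity – the ability of the test to detect the people who really do not have the feature (are healthy): $specificity = \frac{TN}{FP+TN}$. The confidence interval is built on the basis of the Clopper-Pearson method for a single proportion.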
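As an illustration, the sketch below computes sensitivity and specificity, i.e. TP/(TP+FN) and TN/(FP+TN), together with an exact Clopper-Pearson interval found by bisection on the binomial distribution. The counts and the function names are hypothetical, and PQStat's own implementation may differ in detail.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for the proportion k/n."""
    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) == target
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= k | p) = alpha/2
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# Hypothetical counts, not taken from any PQStat example file
TP, FP, FN, TN = 45, 10, 5, 90

sensitivity = TP / (TP + FN)
specificity = TN / (FP + TN)
sens_ci = clopper_pearson(TP, TP + FN)
spec_ci = clopper_pearson(TN, FP + TN)
print(sensitivity, sens_ci)
print(specificity, spec_ci)
```

Bisection is used here only to keep the sketch free of external libraries; a statistics package would normally obtain the same limits from quantiles of the beta distribution.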
Positive predictive values, negative predictive values and prevalence rate
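Positive predictive value (PPV) – the probability that a person with a positive test result really has the disease: $PPV = \frac{TP}{TP+FP}$. The confidence interval is built on the basis of the Clopper-Pearson method for a single proportion.
Negative predictive value (NPV) – the probability that a person with a negative test result really is healthy: $NPV = \frac{TN}{FN+TN}$. The confidence interval is built on the basis of the Clopper-Pearson method for a single proportion.
Positive and negative predictive values depend on the prevalence rate.
Prevalence – the probability of disease in the population for which the diagnostic test was conducted, estimated from the sample as $prevalence = \frac{TP+FN}{TP+FP+FN+TN}$. The confidence interval is built on the basis of the Clopper-Pearson method for a single proportion.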
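The predictive values and the sample prevalence can be sketched in a few lines, using PPV = TP/(TP+FP), NPV = TN/(FN+TN) and prevalence = (TP+FN)/n. The counts and the function name are hypothetical and serve only as an illustration.

```python
def predictive_values(tp, fp, fn, tn):
    """PPV, NPV and sample prevalence from a 2x2 contingency table."""
    n = tp + fp + fn + tn
    return {
        "PPV": tp / (tp + fp),         # share of true positives among positive results
        "NPV": tn / (fn + tn),         # share of true negatives among negative results
        "prevalence": (tp + fn) / n,   # share of truly ill people in the sample
    }

# Hypothetical counts, not taken from any PQStat example file
pv = predictive_values(tp=45, fp=10, fn=5, tn=90)
print(pv)
```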
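Likelihood ratio of a positive result: $LR_+ = \frac{sensitivity}{1-specificity}$. The confidence interval for $LR_+$ is built on the basis of the standard error: $SE(\ln LR_+) = \sqrt{\frac{1}{TP}-\frac{1}{TP+FN}+\frac{1}{FP}-\frac{1}{FP+TN}}$
Likelihood ratio of a negative result: $LR_- = \frac{1-sensitivity}{specificity}$. The confidence interval for $LR_-$ is built on the basis of the standard error: $SE(\ln LR_-) = \sqrt{\frac{1}{FN}-\frac{1}{TP+FN}+\frac{1}{TN}-\frac{1}{FP+TN}}$
Accuracy: $Acc = \frac{TP+TN}{TP+FP+FN+TN}$. The confidence interval is built on the basis of the Clopper-Pearson method for a single proportion.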
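Diagnostic odds ratio: $DOR = \frac{LR_+}{LR_-}$. The confidence interval for $DOR$ is built on the basis of the standard error: $SE(\ln DOR) = \sqrt{\frac{1}{TP}+\frac{1}{FP}+\frac{1}{FN}+\frac{1}{TN}}$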
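The likelihood ratios and accuracy can be sketched as follows, with LR+ = sensitivity/(1-specificity), LR- = (1-sensitivity)/specificity and Acc = (TP+TN)/n. The interval for each ratio is built on the logarithmic scale from a standard error, which is a common textbook approach; whether it matches PQStat's output digit for digit is an assumption. Counts and names are hypothetical.

```python
from math import sqrt, log, exp

def diagnostic_ratios(tp, fp, fn, tn, z=1.959964):
    """LR+, LR- (each with a log-scale confidence interval) and accuracy."""
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard errors of ln(LR+) and ln(LR-)
    se_pos = sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_neg = sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))

    def log_ci(value, se):
        return exp(log(value) - z * se), exp(log(value) + z * se)

    return {
        "LR+": (lr_pos, log_ci(lr_pos, se_pos)),
        "LR-": (lr_neg, log_ci(lr_neg, se_neg)),
        "Acc": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts, not taken from any PQStat example file
ratios = diagnostic_ratios(tp=45, fp=10, fn=5, tn=90)
print(ratios)
```

The default `z` corresponds to a 95% interval; all four cells must be non-zero for the standard errors to be defined.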
The settings window with the diagnostic tests can be opened via the menu Advanced statistics → Diagnostic tests → Diagnostic tests.
EXAMPLE (mammography.pqs file)
Mammography is one of the most popular screening tests for the detection of breast cancer. The following study was carried out on a group of 250 so-called "asymptomatic" women aged 40 to 50. Mammography can detect a cancerous lesion smaller than 5 mm and makes it possible to notice a change which is not yet a nodule but a change in the structure of tissues.
We will calculate the values enabling the assessment of the performed diagnostic test.
The diagnostic test is used for differentiating objects with a given feature (marked as (+), e.g. ill people) from objects without the feature (marked as (–), e.g. healthy people). For the diagnostic test to be considered valuable, it should yield a relatively small number of wrong classifications. If the test is based on a dichotomous variable then the proper tool for the evaluation of the quality of the test is the analysis of a contingency table of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values. Most frequently, though, diagnostic tests are based on continuous variables or ordered categorical variables. In such a situation the proper means of evaluating the capability of the test for differentiating (+) and (–) are ROC (Receiver Operating Characteristic) curves.
It is frequently observed that the greater the value of the diagnostic variable, the greater the odds of occurrence of the studied phenomenon, or the other way round: the smaller the value of the diagnostic variable, the smaller the odds of occurrence of the studied phenomenon. Then, with the use of ROC curves, the choice of the optimum cut-off is made, i.e. the choice of a certain value of the diagnostic variable which best separates the studied statistical population into two groups: (+) in which the given phenomenon occurs and (–) in which the given phenomenon does not occur.
When, on the basis of the studies of the same objects, two or more ROC curves are constructed, one can compare the curves with regard to the quality of classification.
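Let us assume that we have at our disposal a sample of elements in which each object has one value of the diagnostic variable. Each of the observed values of the diagnostic variable in turn becomes a cut-off.
If the diagnostic variable is:
a stimulant (an increase in its value increases the odds of occurrence of the studied phenomenon), then values greater than or equal to the cut-off are classified in group (+);
a destimulant (an increase in its value decreases the odds of occurrence of the studied phenomenon), then values smaller than or equal to the cut-off are classified in group (+).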
For each of the cut-offs we define true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values.
On the basis of those values each cut-off can be further described by means of sensitivity and specificity, positive predictive value (PPV), negative predictive value (NPV), positive result likelihood ratio (LR+), negative result likelihood ratio (LR-), and accuracy (Acc).
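The per-cut-off tabulation can be sketched as below. The sketch assumes a stimulant and the convention that values greater than or equal to the cut-off are classified as (+); the data and the function name are hypothetical.

```python
def confusion_at_cutoffs(values, has_feature):
    """Return (cut-off, TP, FP, FN, TN) for every observed value used as a cut-off.

    Assumes a stimulant: a value >= cut-off is classified as (+).
    """
    rows = []
    for cut in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, has_feature) if d and v >= cut)
        fp = sum(1 for v, d in zip(values, has_feature) if not d and v >= cut)
        fn = sum(1 for v, d in zip(values, has_feature) if d and v < cut)
        tn = sum(1 for v, d in zip(values, has_feature) if not d and v < cut)
        rows.append((cut, tp, fp, fn, tn))
    return rows

# Hypothetical sample: diagnostic variable and true state (True = feature present)
values = [0.2, 0.4, 0.9, 1.5, 2.0, 3.1]
has_feature = [False, False, True, False, True, True]

rows = confusion_at_cutoffs(values, has_feature)
for row in rows:
    print(row)
```

For a destimulant the two comparisons would simply be reversed (`<=` for the positive class).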
Note
The PQStat program computes the prevalence coefficient on the basis of the sample. The computed prevalence coefficient will reflect the occurrence of the studied phenomenon (illness) in the population in the case of screening of a large sample representing the population. If only people with suspected illness are directed to medical examinations, then the computed prevalence coefficient for them can be much higher than the prevalence coefficient for the population.
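Because both the positive and the negative predictive value depend on the prevalence coefficient, when the coefficient for the population is known a priori we can use it to compute, for each cut-off, corrected predictive values according to Bayes's formulas:
$PPV_{revised} = \frac{sensitivity \cdot P_a}{sensitivity \cdot P_a + (1-specificity) \cdot (1-P_a)}$
$NPV_{revised} = \frac{specificity \cdot (1-P_a)}{specificity \cdot (1-P_a) + (1-sensitivity) \cdot P_a}$
where:
$P_a$ – the prevalence coefficient entered by the user, the so-called pre-test probability of disease.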
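A short sketch of the Bayes correction follows. It shows how strongly the predictive values react to the pre-test probability; the function name and the numbers are hypothetical.

```python
def corrected_predictive_values(sens, spec, prevalence):
    """Predictive values recomputed by Bayes' theorem for a given a priori prevalence."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

# Hypothetical test with sensitivity 0.9 and specificity 0.9,
# applied to a population in which only 1% of people are ill
ppv, npv = corrected_predictive_values(sens=0.9, spec=0.9, prevalence=0.01)
print(round(ppv, 4), round(npv, 4))
```

With these hypothetical inputs the positive predictive value falls to roughly 8%, even though the test itself is fairly accurate, because true cases are rare.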
The ROC curve is created on the basis of the calculated values of sensitivity and specificity. On the abscissa axis $x = 1 - specificity$ is placed, and on the ordinate axis $y = sensitivity$. The points obtained in that manner are linked. The constructed curve, especially the area under the curve, presents the classification quality of the analyzed diagnostic variable. When the ROC curve coincides with the diagonal $y = x$, the decision made on the basis of the diagnostic variable is as good as a random allocation of the studied objects to group (+) and group (–).
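The construction of the curve can be sketched as follows: each observed value is used as a cut-off, the point (1 - specificity, sensitivity) is recorded, and the area under the linked points is obtained with the trapezoidal rule. The sketch assumes a stimulant (values greater than or equal to the cut-off are positive); the data and the function names are hypothetical.

```python
def roc_points(values, has_feature):
    """ROC points (1 - specificity, sensitivity) for a stimulant variable."""
    n_pos = sum(has_feature)
    n_neg = len(has_feature) - n_pos
    points = [(0.0, 0.0)]  # cut-off above every observed value
    for cut in sorted(set(values), reverse=True):
        tp = sum(1 for v, d in zip(values, has_feature) if d and v >= cut)
        fp = sum(1 for v, d in zip(values, has_feature) if not d and v >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under the linked ROC points (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical sample: diagnostic variable and true state (True = feature present)
values = [0.2, 0.4, 0.9, 1.5, 2.0, 3.1]
has_feature = [False, False, True, False, True, True]

points = roc_points(values, has_feature)
auc = auc_trapezoid(points)
print(points)
print(auc)
```

In this small sample 8 of the 9 possible (+)/(–) pairs are ordered correctly, so the area equals 8/9; a curve lying on the diagonal would give 0.5.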
AUC (area under curve) – the size of the area under the ROC curve falls within the range from 0 to 1. The greater the area, the more exact the classification of the objects into group (+) and group (–) on the basis of the analyzed diagnostic variable, and therefore the more useful that diagnostic variable is as a classifier. The area $AUC$, its standard error $SE_{AUC}$ and the confidence interval for AUC are calculated on the basis of:
For the classification to be better than a random allocation of objects to two classes, the area under the ROC curve should be significantly larger than the area under the line $y = x$, i.e. than 0.5.
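Hypotheses:
$H_0: AUC = 0.5$
$H_1: AUC \neq 0.5$
The test statistic has the form presented below:
$Z = \frac{AUC - 0.5}{SE_{0.5}}$
where:
$SE_{0.5} = \sqrt{\frac{n_{(+)}+n_{(-)}+1}{12 n_{(+)} n_{(-)}}}$,
$n_{(+)}$ – size of the sample (+), in which the given phenomenon occurs,
$n_{(-)}$ – size of the sample (–), in which the given phenomenon does not occur.
The statistic $Z$ asymptotically (for large sample sizes) has the normal distribution.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ and accept $H_1$;
if $p > \alpha$, there is no reason to reject $H_0$.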
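The test can be sketched as below. The standard error under the null hypothesis is taken here as sqrt((n(+) + n(–) + 1) / (12 · n(+) · n(–))), the usual rank-based value for AUC = 0.5; treating this as exactly what PQStat computes is an assumption, and the input numbers are hypothetical.

```python
from math import sqrt, erfc

def auc_z_test(auc, n_pos, n_neg):
    """Z statistic and two-sided p-value for H0: AUC = 0.5."""
    se_05 = sqrt((n_pos + n_neg + 1) / (12 * n_pos * n_neg))
    z = (auc - 0.5) / se_05
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p

# Hypothetical area and group sizes
z, p = auc_z_test(auc=0.8, n_pos=20, n_neg=30)
alpha = 0.05
print(round(z, 3), p, "reject H0" if p <= alpha else "no reason to reject H0")
```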
In addition, when the diagnostic variable yields a large area under the curve (AUC), we can select the optimal cut-off point.
EXAMPLE (bacteriemia.pqs file)
The point we are looking for is a value of the diagnostic variable which provides the optimum separation of the studied population into two groups: (+) in which the given phenomenon occurs and (–) in which the given phenomenon does not occur. The selection of the optimum cut-off is not easy because it requires specialist knowledge about the topic of the study. For example, different cut-offs will be required in a test used for screening a large group of people, e.g. a mammography study, and in invasive studies conducted to confirm an earlier suspicion, e.g. in histopathology. With the help of mathematical tools we can find the cut-off which is the most useful from a purely mathematical perspective.
PQStat allows you to select the optimal cut-off point by:
Errors which can be made when classifying the studied objects as belonging to group (+) and group (–) are false positive results (FP) and false negative results (FN). If committing those errors is equally costly (ethical, financial, and other costs), then in the fields Cost FP and Cost FN we enter the same positive value, usually 1. However, if we conclude that one type of error carries a greater cost than the other, we assign an appropriately greater weight to it.
The optimum cut-off value is calculated on the basis of sensitivity, specificity, and the value $m$ – the slope of the tangent line to the ROC curve. The slope is defined in relation to two values: the costs of wrong decisions and the prevalence coefficient. Normally the costs of wrong decisions have the value 1 and the prevalence coefficient is estimated from the sample. Knowing, a priori, the prevalence coefficient ($P_a$) and the costs of wrong decisions, the user can influence the value $m$ and, consequently, the search for the optimum cut-off. As a result, the optimum cut-off is determined to be such a value of the diagnostic variable for which the formula:
reaches the minimum (Zweig M.H. 1993).
The optimum cut-off point of the diagnostic variable, selected as described above, will finally be marked on the ROC curve.
The point marked on the graph is the minimum of the function presented above.
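The idea of a cost-weighted search can be sketched as follows. The exact function used by PQStat is not reproduced here; as an assumption, the sketch minimizes the expected cost of wrong decisions, cost_FN · P · (1 - sensitivity) + cost_FP · (1 - P) · (1 - specificity), which depends on the same ingredients (error costs and prevalence P). All names and data are hypothetical.

```python
def optimal_cutoff(rows, cost_fp=1.0, cost_fn=1.0, prevalence=None):
    """Return (cut-off, expected cost) with the lowest expected misclassification cost.

    rows: iterable of (cut-off, TP, FP, FN, TN).
    prevalence: a priori value; if None, it is estimated from the sample.
    """
    best = None
    for cut, tp, fp, fn, tn in rows:
        sens = tp / (tp + fn)
        spec = tn / (fp + tn)
        p = prevalence if prevalence is not None else (tp + fn) / (tp + fp + fn + tn)
        cost = cost_fn * p * (1 - sens) + cost_fp * (1 - p) * (1 - spec)
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best

# Hypothetical per-cut-off counts: (cut-off, TP, FP, FN, TN)
rows = [
    (0.2, 3, 3, 0, 0),
    (0.4, 3, 2, 0, 1),
    (0.9, 3, 1, 0, 2),
    (1.5, 2, 1, 1, 2),
    (2.0, 2, 0, 1, 3),
    (3.1, 1, 0, 2, 3),
]

print(optimal_cutoff(rows, cost_fn=5))  # false negatives weighted more heavily
print(optimal_cutoff(rows, cost_fp=5))  # false positives weighted more heavily
```

Raising the cost of false negatives pushes the chosen cut-off down (more objects classified as positive), while raising the cost of false positives pushes it up.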
The window with settings for ROC analysis is accessed via the menu Advanced statistics → Diagnostic tests → ROC curve.
EXAMPLE (file bacteriemia.pqs)
Persistent high fever in an infant or a small child without a clearly diagnosed cause is an indication for testing for bacteremia. The most useful and reliable parameters for screening and monitoring bacterial infections are the following indicators:
WBC – white blood cell count,
PCT – procalcitonin.
It is assumed that in a healthy infant or small child WBC should not exceed 15 thousand/µl and PCT should be lower than 0.5 ng/ml.
The sample values of those indicators for 136 children up to 3 years old with persistent fever are presented in the table fragment below:
One method of analyzing the PCT indicator is to transform it into a dichotomous variable by selecting a cut-off (e.g. 0.5 ng/ml) above which the result is considered "positive". The adequacy of such a division is indicated by the values of sensitivity and specificity. We want to use a more complex approach, that is, to calculate sensitivity and specificity not for one value only but for each PCT value obtained in the sample, which means constructing a ROC curve. On the basis of the information obtained in that manner we want to check whether the PCT indicator is indeed useful for diagnosing bacteremia. If so, we want to find the optimal cut-off above which the result can be considered "positive", i.e. detecting bacteremia.
In order to check whether PCT is really useful for diagnosing bacteremia we will calculate the size of the area under the ROC curve and verify the hypotheses:
$H_0:$ the area under the ROC curve for PCT $= 0.5$
$H_1:$ the area under the ROC curve for PCT $\neq 0.5$
As bacteremia is accompanied by an increased PCT level, in the test options window we will consider the indicator to be a stimulant. In the state variable we have to define which value in the bacteremia column determines its presence, so we select "yes". Apart from the result of the statistical test, in the report we can find an exact description of every possible cut-off.
The calculated size of the area under the ROC curve is AUC=0.889. At the adopted significance level $\alpha$, on the basis of the obtained value $p$<0.0001, we assume that diagnosing bacteremia with the use of the PCT indicator is indeed more useful than a random allocation of patients to 2 groups: suffering from bacteremia and not suffering from it. Therefore, we return to the analysis settings to define the optimal cut-off.
The algorithm of searching for the optimal cut-off takes into account the costs of wrong decisions and the prevalence coefficient.
FN cost – the cost of a wrong diagnosis in which we assume that the patient does not suffer from bacteremia although in reality he or she does (the cost of a false negative decision).
FP cost – the cost of a wrong diagnosis in which we assume that the patient suffers from bacteremia although in reality he or she does not (the cost of a false positive decision).
As the FN costs are much more serious than the FP costs, we enter a greater value in the FN cost field than in the FP cost field. We decided the value would be 5.
The PCT value is to be used in screening, so we do not enter the prevalence coefficient for the population (the a priori prevalence coefficient), which is very low, but use the coefficient estimated from the sample. We do so in order not to move the PCT cut-off too high and not to increase the number of false negative results.
The optimal PCT cut-off determined in this way is 1.819. For this point sensitivity=0.85 and specificity=0.96.
Another method of selecting the cut-off is the analysis of the costs graph and of the sensitivity and specificity intersection graph:
The analysis of the costs graph shows that the minimum of the costs of wrong decisions lies at PCT=1.819. The values of sensitivity and specificity are similar at PCT=1.071.
Very often the aim of a study is to compare the size of the area under one ROC curve ($AUC_1$) with the area under another ROC curve ($AUC_2$). The ROC curve with the greater area usually allows a more precise classification of objects. Methods for comparing the areas depend on the model of the study.
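Hypotheses:
$H_0: AUC_1 = AUC_2$
$H_1: AUC_1 \neq AUC_2$
The test statistic has the form presented below:
$Z = \frac{AUC_1 - AUC_2}{SE_{AUC_1-AUC_2}}$
where:
$AUC_1$, $AUC_2$ and the standard error of the difference in areas $SE_{AUC_1-AUC_2}$ are calculated on the basis of the nonparametric method proposed by DeLong (DeLong E.R. et al. 1988, Hanley J.A. and Hajian-Tilaki K.O. 1997).
The statistic $Z$ has (for large sample sizes) an asymptotic normal distribution.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ and accept $H_1$;
if $p > \alpha$, there is no reason to reject $H_0$.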
The window with settings for comparing dependent ROC curves is accessed via the menu Advanced statistics → Diagnostic tests → Dependent ROC Curves – comparison.
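Hypotheses:
$H_0: AUC_1 = AUC_2$
$H_1: AUC_1 \neq AUC_2$
The test statistic (Hanley J.A. and McNeil M.D. 1983) has the form:
$Z = \frac{AUC_1 - AUC_2}{\sqrt{SE_{AUC_1}^2 + SE_{AUC_2}^2}}$
where:
$AUC_1$, $AUC_2$ and the standard errors of the areas $SE_{AUC_1}$, $SE_{AUC_2}$ are calculated on the basis of:
The statistic $Z$ has (for large sample sizes) an asymptotic normal distribution.
The p-value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:
if $p \le \alpha$, we reject $H_0$ and accept $H_1$;
if $p > \alpha$, there is no reason to reject $H_0$.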
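For independent curves the comparison can be sketched as below, assuming the two areas come from separate groups so that the variance of their difference is the sum of the squared standard errors. The areas and standard errors are hypothetical inputs; the sketch does not show how the standard errors themselves are estimated.

```python
from math import sqrt, erfc

def compare_independent_auc(auc1, se1, auc2, se2):
    """Z statistic and two-sided p-value for H0: AUC1 = AUC2 (independent samples)."""
    z = (auc1 - auc2) / sqrt(se1 ** 2 + se2 ** 2)
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p

# Hypothetical areas and standard errors for two independent groups
z, p = compare_independent_auc(auc1=0.90, se1=0.04, auc2=0.85, se2=0.03)
alpha = 0.05
print(round(z, 3), round(p, 4), "reject H0" if p <= alpha else "no reason to reject H0")
```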
The window with settings for comparing independent ROC curves is accessed via the menu Advanced statistics → Diagnostic tests → Independent ROC Curves – comparison.
EXAMPLE cont. (bacteriemia.pqs file)
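We will make 2 comparisons:
1) we will construct 2 ROC curves to compare the diagnostic value of the WBC and PCT parameters;
2) we will construct 2 ROC curves to compare the diagnostic value of the PCT parameter for boys and for girls.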
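Both parameters, WBC and PCT, are stimulants (in bacteremia their values are high). In the course of the comparison of the diagnostic value of those parameters we verify the following hypotheses:
$H_0:$ the area under the ROC curve for WBC $=$ the area under the ROC curve for PCT
$H_1:$ the area under the ROC curve for WBC $\neq$ the area under the ROC curve for PCT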
The calculated areas are , . At the adopted significance level $\alpha$, on the basis of the obtained value $p$=0.13032, we conclude that we cannot determine which of the parameters, WBC or PCT, is better for diagnosing bacteremia.
The calculated areas are , . At the adopted significance level $\alpha$, on the basis of the obtained value $p$=0.6372, we conclude that we cannot indicate the sex for which the PCT parameter is better for diagnosing bacteremia.