There are also indexes of proximity that are used on a mixture of qualitative and quantitative variables. We will examine the Euclidean distance for quantitative variables, and some indexes of similarity for qualitative variables.

1 Euclidean distance

Consider a data matrix containing only quantitative (or binary) variables. If x and y are rows from the data matrix, then a function d(x, y) is said to be a distance between two observations if it satisfies the following properties:

• Non-negativity.

In general, $0 < F_i - Q_i < F_i$, $i = 1, 2, \ldots, N-1$, with the differences increasing as maximum concentration is approached. The concentration index, denoted by $R$, is defined by the ratio between the quantity $\sum_{i=1}^{N-1} (F_i - Q_i)$ and its maximum value, equal to $\sum_{i=1}^{N-1} F_i$. Thus,

$$R = \frac{\sum_{i=1}^{N-1} (F_i - Q_i)}{\sum_{i=1}^{N-1} F_i}$$

and $R$ assumes value 0 for minimal concentration and 1 for maximum concentration.

5 Measures of asymmetry

In order to obtain an indication of the asymmetry of a distribution it may be sufficient to compare the mean and median.
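Returning to the concentration index R defined in this passage, a minimal Python sketch of the calculation follows. The excerpt does not define F_i and Q_i, so the sketch assumes the usual convention: the N values are non-negative and sorted in ascending order, F_i = i/N is the cumulative share of units, and Q_i is the cumulative share of the total held by the first i units. The function name `concentration_index` is hypothetical.

```python
def concentration_index(values):
    """Return R = sum(F_i - Q_i) / sum(F_i), with i running from 1 to N-1.

    Assumptions (not stated in the excerpt):
      F_i = i / N                      cumulative share of units
      Q_i = (x_1 + ... + x_i) / total  cumulative share of the quantity
    with the x values sorted in ascending order.
    """
    x = sorted(float(v) for v in values)
    n = len(x)
    total = sum(x)
    if n < 2 or total <= 0 or x[0] < 0:
        raise ValueError("need at least two non-negative values with a positive total")

    numerator = 0.0    # accumulates sum of (F_i - Q_i)
    denominator = 0.0  # accumulates sum of F_i
    running = 0.0      # running sum x_1 + ... + x_i
    for i in range(1, n):  # i = 1, ..., N-1
        running += x[i - 1]
        f_i = i / n
        q_i = running / total
        numerator += f_i - q_i
        denominator += f_i
    return numerator / denominator
```

Under these assumptions, equal values give R = 0 (every Q_i equals F_i), while a total held entirely by one unit gives R = 1 (every Q_i with i < N is zero), matching the two extremes described in the text.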

This implies that the principal components are uncorrelated. The variance–covariance matrix between them is thus expressed by the diagonal matrix

$$\mathrm{Var}(Y) = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_k \end{pmatrix}$$

Consequently, the following ratio expresses the proportion of variability that is ‘maintained’ in the transformation from the original p variables to k < p principal components:

$$\frac{\mathrm{tr}(\mathrm{Var}\,Y)}{\mathrm{tr}(\mathrm{Var}\,X)} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}$$

This equation expresses a cumulative measure of the quota of variability (and therefore of the statistical information) ‘reproduced’ by the first k components, with respect to the overall variability present in the original data matrix, as measured by the trace of the variance–covariance matrix.
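The diagonal form of Var(Y) and the trace ratio can be checked numerically. The sketch below is an illustration under stated assumptions, not the book's own procedure. It assumes the principal components are obtained from the eigendecomposition of the sample variance–covariance matrix of X, and it uses NumPy's `np.cov` (divisor n − 1) and `np.linalg.eigh`. The choice of divisor does not affect the ratio. The function name `pca_variance_summary` and the simulated data `X_demo` are hypothetical.

```python
import numpy as np

def pca_variance_summary(X, k):
    """Return (Var(Y), tr(Var Y) / tr(Var X)) for the first k principal components."""
    X = np.asarray(X, dtype=float)
    S = np.atleast_2d(np.cov(X, rowvar=False))   # Var(X), a p x p matrix

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    if not 1 <= k <= eigvals.size:
        raise ValueError("k must be between 1 and the number of variables p")

    # Scores on the first k components: centred data times the leading eigenvectors.
    Y = (X - X.mean(axis=0)) @ eigvecs[:, :k]
    var_Y = np.atleast_2d(np.cov(Y, rowvar=False))   # approximately diag(lambda_1..lambda_k)

    ratio = np.trace(var_Y) / np.trace(S)
    return var_Y, float(ratio)

# Simulated, correlated data with p = 4 variables (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
var_Y, ratio = pca_variance_summary(X_demo, k=2)
```

Here `var_Y` should have off-diagonal entries that are zero up to floating-point error and the two largest eigenvalues on its diagonal, while `ratio` should equal the sum of those two eigenvalues divided by the sum of all four.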

