To form the clusters, Ward’s (1963) minimum-variance method was used to calculate the distance between clusters. The procedure builds partitions hierarchically. It starts from the largest possible number of clusters (i.e. each bank-year observation in a separate cluster) and, at each step, merges the two clusters whose union produces the smallest increase in the within-cluster sum of squared errors. Several studies found that Ward’s method performs better than other clustering procedures when the instruments contain few outliers and when clusters overlap.[1]
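The merging rule can be illustrated with a minimal pure-Python sketch of Ward agglomeration. This is not the SAS routine used for the Monitor, and the function names and toy data are hypothetical. The sketch relies on the standard identity that merging clusters a and b raises the total within-cluster sum of squared errors by n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2, where n is a cluster's size and c its centroid. It greedily applies the cheapest merge until k clusters remain. The naive pair search is cubic in the number of observations, which suits a toy example but not a large sample.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in total within-cluster SSE if clusters a and b are merged.

    Each cluster is a (size, centroid) pair; the cost is
    n_a * n_b / (n_a + n_b) * squared Euclidean distance between centroids.
    """
    n_a, c_a = a
    n_b, c_b = b
    sq_dist = sum((x - y) ** 2 for x, y in zip(c_a, c_b))
    return n_a * n_b / (n_a + n_b) * sq_dist


def merge(a, b):
    """Size and size-weighted centroid of the union of clusters a and b."""
    n_a, c_a = a
    n_b, c_b = b
    n = n_a + n_b
    return n, [(n_a * x + n_b * y) / n for x, y in zip(c_a, c_b)]


def ward_partition(points, k):
    """Agglomerate points with Ward's criterion until k clusters remain.

    Returns the clusters as sorted lists of point indices.
    """
    # Start with every observation in its own cluster.
    members = {i: [i] for i in range(len(points))}
    stats = {i: (1, list(p)) for i, p in enumerate(points)}
    next_id = len(points)
    while len(members) > k:
        # Merge the pair whose union raises the within-cluster SSE the least.
        i, j = min(
            combinations(members, 2),
            key=lambda pair: ward_cost(stats[pair[0]], stats[pair[1]]),
        )
        members[next_id] = members.pop(i) + members.pop(j)
        stats[next_id] = merge(stats.pop(i), stats.pop(j))
        next_id += 1
    return sorted(sorted(m) for m in members.values())


if __name__ == "__main__":
    # Hypothetical two-instrument data for five observations.
    toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(ward_partition(toy, 3))  # [[0, 1], [2, 3], [4]]
```

Because each step only picks the locally cheapest merge, the partition obtained for a given number of clusters is not guaranteed to be the global SSE minimum for that number; this is the usual trade-off of hierarchical procedures.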

A key problem often encountered in clustering is the presence of missing values. When an observation has one or more missing instrument values, it has to be dropped from the cluster analysis, since its similarity to other bank-year observations cannot be determined. The sample used in the Monitor contains such cases, despite efforts to choose indicators with high coverage ratios. To accommodate the entire sample of observations, unreported ‘intangible assets’ and ‘negative carrying values of derivative exposures’ were assumed to be zero in the calculation of ‘Trading assets’, ‘Debt liabilities’ and ‘Derivative exposures’, since banks are not required to report either balance sheet item unless it is significant.
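The missing-value rule can be sketched in a few lines, assuming bank-year records stored as Python dicts with None marking unreported items. The field names below are hypothetical, and the text does not give the formulas that turn raw items into ‘Trading assets’, ‘Debt liabilities’ and ‘Derivative exposures’. The sketch therefore only shows the two steps: zero-filling the two optional items, then dropping any observation that still lacks an instrument value (in practice, the derived indicators would be computed between these steps).

```python
# Hypothetical field names for the two items that may be left unreported
# when not significant; the Monitor's actual variable names are not given.
ZERO_IF_MISSING = ("intangible_assets", "negative_derivative_carrying_value")


def prepare_for_clustering(records, instruments):
    """Zero-fill the two optional items, then drop incomplete observations.

    records: list of dicts, one per bank-year, with None for missing values.
    instruments: keys that must all be present for an observation to be kept.
    """
    kept = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not mutated
        for key in ZERO_IF_MISSING:
            if rec.get(key) is None:
                rec[key] = 0.0
        # Listwise deletion: similarity cannot be computed across a gap.
        if all(rec.get(name) is not None for name in instruments):
            kept.append(rec)
    return kept
```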

All clustering procedures were carried out in SAS, using both built-in and user-contributed functions.

To determine the appropriate number of clusters, Calinski & Harabasz’s (1974) pseudo-F index was used as the primary ‘stopping rule’. The index is a sample estimate of the ratio of between-cluster variance to within-cluster variance.[2] The configuration with the highest pseudo-F value was taken as the most distinct clustering. The pseudo-F index attains a single maximum, at the five-cluster configuration, which is therefore the most distinct one. This number of clusters is confirmed by alternative stopping rules, namely the semi-partial R-squared, the cubic clustering criterion and the between-cluster sum of squares.
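The stopping rule can be illustrated with a short sketch that computes the pseudo-F index from scratch. For a partition of n observations into k clusters, pseudo-F = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster sum of squares (each cluster's size times the squared distance from its centroid to the overall centroid, summed over clusters) and W is the within-cluster sum of squares. The function names and toy partitions are hypothetical; in practice the candidate partitions would come from cutting the hierarchical tree at each number of clusters, and SAS can report the statistic directly.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length points."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo-F for one partition of the points."""
    n = len(points)
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("pseudo-F needs 1 < k < n clusters")
    grand = centroid(points)
    between = within = 0.0
    for rows in groups.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, grand)  # B: spread of centroids
        within += sum(sq_dist(r, c) for r in rows)  # W: spread inside clusters
    if within == 0.0:
        return float("inf")  # perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


def best_k(points, partitions):
    """Return the k whose partition has the largest pseudo-F.

    partitions: dict mapping k to a list of cluster labels, one per point.
    """
    return max(partitions, key=lambda k: pseudo_f(points, partitions[k]))


if __name__ == "__main__":
    # Hypothetical data: three tight pairs of observations.
    toy = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
    candidates = {
        2: [0, 0, 1, 1, 1, 1],
        3: [0, 0, 1, 1, 2, 2],
        4: [0, 0, 1, 1, 2, 3],
    }
    print(best_k(toy, candidates))  # 3
```

On this toy data the three-cluster partition scores 100, against roughly 11.3 for two clusters and 67 for four, so the maximum picks out the three natural groups, in the same way a single maximum at five clusters is read for the Monitor's sample.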

[1] See Milligan (1981) and references therein for an assessment of different clustering methods.

[2] Evaluating a variety of stopping rules for cluster analysis, Milligan & Cooper (1985) single out the Calinski & Harabasz index as the best and most consistent rule, which identified the correct number of clusters in over 90% of simulated cases.
