Comparative Analysis of Clustering Techniques for Classification of Philippine Climatological Rainfall

Several clustering methods are explored to determine the climate types of the Philippines. This study aims to find a way to reduce data that vary in both space and time, as rainfall data do, in order to classify areas with similar climatological rainfall patterns. A similar study was done in the early part of the 20th century by Fr. Coronas. However, changing climatic conditions necessitate an update of the Coronas Climate Atlas, as it is commonly known, and new technologies allow new methods to be used in the analysis of the data.

The data used in this study came from the Data Distribution Center of the Intergovernmental Panel on Climate Change. The dataset consists of global decadal and climatological monthly means at a 0.5° × 0.5° resolution. For the purposes of this study, precipitation data for the Philippines were extracted. Despite the country's relatively small area, its location and the spatial and temporal diversity of prevailing conditions (such as the monsoons) account for its interesting climate.

To determine the basic climate types in the Philippines and to classify areas according to these climate types, two data reduction and clustering techniques are studied. The first is Positive Matrix Factorization (PMF), a statistical tool used mainly for air quality monitoring and modeling, in which the sources of air pollution or particulate matter are identified from the elemental signatures derived from the samples. This method was applied to this study with minimal revisions: the climate types (sources) were determined from the monthly rainfall patterns (elemental signatures). With this method, four was found to be the most suitable number of factors.
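The factorization idea behind this approach can be illustrated with a minimal sketch. The code below is not PMF itself: it uses plain non-negative matrix factorization with multiplicative updates, which omits the per-observation uncertainty weighting that PMF applies. The synthetic rainfall matrix, the function name `nmf_multiplicative`, and all parameter values are assumptions made for illustration only; they are not the study's data or settings.

```python
import numpy as np


def nmf_multiplicative(X, n_factors, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative X (grid points x months) as G @ F, with G, F >= 0.

    Plain NMF with multiplicative updates for squared error; a simplified
    stand-in for PMF, which would also weight each residual by its uncertainty.
    """
    rng = np.random.default_rng(seed)
    n_points, n_months = X.shape
    G = rng.random((n_points, n_factors)) + eps   # factor contributions per grid point
    F = rng.random((n_factors, n_months)) + eps   # monthly pattern of each factor
    for _ in range(n_iter):
        F *= (G.T @ X) / (G.T @ G @ F + eps)
        G *= (X @ F.T) / (G @ F @ F.T + eps)
    return G, F


# Synthetic stand-in for climatological rainfall (hypothetical values, in mm):
# 200 grid points, 12 monthly means, mixed from 4 made-up monthly patterns.
rng = np.random.default_rng(42)
true_profiles = rng.random((4, 12)) * 300.0
true_weights = rng.random((200, 4))
rainfall = true_weights @ true_profiles

G, F = nmf_multiplicative(rainfall, n_factors=4)

# Contribution of each factor to a grid point's annual total, so that factors
# are comparable regardless of how the scale is split between G and F.
contribution = G * F.sum(axis=1)
climate_type = contribution.argmax(axis=1)        # dominant factor per grid point

rel_error = np.linalg.norm(rainfall - G @ F) / np.linalg.norm(rainfall)
print("relative reconstruction error:", rel_error)
```

In this sketch each row of `F` plays the role of a monthly pattern (the "elemental signature"), and each grid point is assigned to the factor that contributes most to its annual rainfall. The assignment rule is one plausible choice, not necessarily the one used in the study.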

The second technique is a statistical method called Empirical Orthogonal Functions (EOF). This is an extension of Principal Component Analysis whose main objective is the reduction of spatio-temporal data while maintaining the important features and characteristics of the data. The temporal data are expressed as points in a space in which each dimension corresponds to a grid point in the maps. The dataset was reduced by expressing each data point in terms of a new coordinate system, chosen with respect to the location of dominant clusters in the dataset. K-means clustering was then applied to the reduced dataset to identify four climate types.
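The EOF reduction followed by K-means can be sketched as follows. This is a minimal illustration under stated assumptions: the data matrix is synthetic, with rows as months and columns as grid points; the EOFs are obtained from a singular value decomposition of the time-centered matrix; and the grid points are clustered on their coordinates in the leading modes. The number of retained modes (`n_modes = 3`), the farthest-point initialization, and the helper name `kmeans` are hypothetical choices, not details taken from the study.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Basic Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in (hypothetical values, in mm): 12 monthly means at 200 grid
# points, built from 4 seasonal cycles whose peaks are shifted by 3 months.
rng = np.random.default_rng(0)
months = np.arange(12)
rainfall = np.empty((12, 200))
for g in range(4):
    cycle = 150.0 + 100.0 * np.cos(2.0 * np.pi * (months - 3 * g) / 12.0)
    rainfall[:, g * 50:(g + 1) * 50] = cycle[:, None] + rng.normal(0.0, 5.0, (12, 50))

# EOF analysis: remove each grid point's time mean, then decompose with an SVD.
anomalies = rainfall - rainfall.mean(axis=0)
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)               # variance fraction per mode

n_modes = 3                                       # assumed number of retained modes
# Coordinates of each grid point in the reduced system (grid points x modes).
coords = Vt[:n_modes].T * s[:n_modes]

labels, centers = kmeans(coords, k=4)
print("variance explained by retained modes:", explained[:n_modes].sum())
```

Here `labels` assigns each grid point to one of four clusters, which correspond to the climate types in this toy setting. On real data the number of retained modes would normally be chosen from the explained variance rather than fixed in advance.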

To determine which method better identifies climatological rainfall patterns in the Philippines, the representation of the grid points by each climate type was tested. Three of the four climate types identified by the two methods were quite similar. After subsequent testing, however, the first method, PMF, was found to be slightly better at representing climatological patterns of rainfall.