Showing posts from December, 2013

Multivariate Outlier Detection with R

For multivariate outlier detection, R offers a package called "mvoutlier". The package contains a number of multivariate outlier detection methods based on robust statistics. It implements several algorithms for identifying multivariate outliers in large, high-dimensional datasets, including pcout [1], uni.plot [2], sign2 [1], and symbol.plot [2]. All of these methods work on multi-dimensional data except symbol.plot, which is limited to two-dimensional data. Detailed information about the package can be found in its manual [3].

A sample R script using these methods is given below. It takes the name of the CSV file that contains the dataset as its parameter.

# parameter file: CSV file that contains the dataset
# separator: ;  quote: "  decimal symbol: ,  includes row names
# plots outliers to PNG images
mvOutliers = function(file) {
  data <- read.csv(file, header = TRUE, sep = ";", quote = "\"", dec = ",", row.names = 1);

  fn = paste("Outlier-", substr(file,