
Machine Learning: Support Vector Machines

Support Vector Machines (SVMs) were originally designed to solve pattern classification problems such as optical character recognition and face identification, but their use has since spread to function approximation and to fields such as the geosciences and environmental sciences.

There are three main problems in machine learning: classification, regression and density estimation. In each case the goal is to learn a function (or hypothesis) from the training data.
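For the classification case used in the rest of this post, the learned function can be written down explicitly. What follows is a brief sketch in standard textbook notation rather than anything taken from this post: the symbols are assumptions introduced here, with $x_i$ the training inputs, $y_i \in \{-1, +1\}$ the class labels, $w$ and $b$ the weights and bias, and $\xi_i$ slack variables that allow some points to violate the margin. A linear soft-margin SVM for two classes predicts with the first line and chooses $w$ and $b$ by solving the second:

```latex
\begin{aligned}
f(x) &= \operatorname{sign}\left(w^\top x + b\right) \\
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(w^\top x_i + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0
\end{aligned}
```

The constant $C$ trades margin width against training errors and corresponds to the `cost` value reported in the model summary later in the post. With more than two classes, as in the iris data, e1071 combines several such two-class models using a one-against-one scheme.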

Application of theory using R

Getting Data and loading packages in R

Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

library(e1071)
library(ggplot2)
data(iris)
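The code above loads the copy of iris that ships with R rather than downloading the file listed as the data source. If you would rather read the raw file, something like the following sketch should work. The helper name `read_iris_file` and the column names are assumptions made here (the raw file has no header row, so the names are chosen to match the built-in data set), and the species labels in the raw file carry an `Iris-` prefix.

```r
# Column names are an assumption chosen to match R's built-in iris data set;
# the raw file itself has no header row.
iris_cols <- c("Sepal.Length", "Sepal.Width",
               "Petal.Length", "Petal.Width", "Species")

# read_iris_file() is a hypothetical helper, not part of any package.
# `src` may be a URL, a local file path or a connection.
read_iris_file <- function(src) {
  read.csv(src, header = FALSE, col.names = iris_cols,
           stringsAsFactors = TRUE)
}

# Usage (requires network access):
# iris_raw <- read_iris_file(
#   "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
# str(iris_raw)
```

The rest of the post keeps using the built-in `iris` object, so this step is optional.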

The data set we are using:

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

A simple plot of petal length vs. petal width:

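# qplot() is deprecated as of ggplot2 3.4.0; ggplot() draws the same scatter plot
ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) + geom_point()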

The same plot, colored by species:

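# ggplot() equivalent of the deprecated qplot() call, with one color per species
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) + geom_point()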

Running SVM and summarizing the result:

# Fit a linear-kernel SVM that predicts Species from all four measurements
model1 <- svm(Species ~ ., data = iris,
              kernel = "linear")

summary(model1)
## 
## Call:
## svm(formula = Species ~ ., data = iris, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
## 
## Number of Support Vectors:  29
## 
##  ( 2 15 12 )
## 
## 
## Number of Classes:  3 
## 
## Levels: 
##  setosa versicolor virginica
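The summary reports how many support vectors the model kept. To see which flowers they are, the fitted object can be inspected directly. The sketch below refits the same model so that it runs on its own; it relies on the `index`, `nSV` and `tot.nSV` components of the object returned by `svm()`, and the names `sv_rows` and `sv_counts` are introduced here for illustration.

```r
library(e1071)
data(iris)

# Refit the same linear-kernel model as in the post.
model1 <- svm(Species ~ ., data = iris, kernel = "linear")

# model1$index holds the row numbers (in the training data) of the
# observations that ended up as support vectors.
sv_rows <- iris[model1$index, ]

# Count the support vectors per species; the counts should add up to the
# total reported by summary(model1).
sv_counts <- table(sv_rows$Species)
print(sv_counts)
print(model1$tot.nSV)
```

Classes that are easy to separate from the others generally need fewer support vectors, which is one way to read the per-class counts in the summary.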

Plotting and summarizing predictions:

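# Decision regions in the Petal.Length / Petal.Width plane; slice fixes the
# two sepal measurements that are not shown on the axes
plot(model1, data = iris,
     Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))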

# Predict the species of the same 150 flowers the model was trained on
pred <- predict(model1, iris)

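As the table below shows, the SVM correctly classified all 50 setosa flowers, 46 of the 50 versicolor flowers and 49 of the 50 virginica flowers. Because these predictions are made on the same data the model was trained on, the resulting 145 out of 150 (about 96.7%) is a training accuracy rather than an estimate of performance on new data.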

table(Predicted = pred, Actual = iris$Species)
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         46         1
##   virginica       0          4        49
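The counts in the table can be condensed into a couple of summary numbers. The sketch below re-enters the table by hand so that it runs without refitting the model; the names `conf_mat`, `accuracy` and `recall` are introduced here for illustration.

```r
# Confusion matrix copied from the table above
# (rows = predicted species, columns = actual species).
species <- c("setosa", "versicolor", "virginica")
conf_mat <- matrix(
  c(50,  0,  0,
     0, 46,  1,
     0,  4, 49),
  nrow = 3, byrow = TRUE,
  dimnames = list(Predicted = species, Actual = species)
)

# Overall accuracy: correctly classified flowers (the diagonal)
# divided by all flowers.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions divided by the actual
# number of flowers of each species (the column totals).
recall <- diag(conf_mat) / colSums(conf_mat)

print(round(accuracy, 3))  # 0.967
print(round(recall, 2))    # setosa 1.00, versicolor 0.92, virginica 0.98
```

With the fitted model at hand, the same two formulas can be applied to the object returned by `table(Predicted = pred, Actual = iris$Species)` instead of the hand-typed matrix.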