
Logistic Regression with scikit-learn

Open Boundary

Assume there are 3 lines separating the plane into 3 areas:
line1: y = x (x <= 10)
line2: y = 20 - x (x >= 10)
line3: x = 10 (y >= 10)
[Figure: the three lines dividing the plane into areas A, B, and C]
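
Before fitting anything, it helps to pin down which side of each line belongs to which area. Below is a minimal sketch of that region test; the exact assignment (A above line1, B to the right of line2 and line3, C in the bottom wedge) is my reading of the figure, not something spelled out explicitly:

def area(x, y):
    # For y >= 10, the A/B boundary is the vertical line3 (x = 10).
    if y >= 10:
        return 'A' if x <= 10 else 'B'
    # Below that, A sits above line1 (y = x) and B above line2 (y = 20 - x).
    if x <= 10 and y >= x:
        return 'A'
    if x >= 10 and y >= 20 - x:
        return 'B'
    return 'C'

print(area(10, 5))  # 'C', the area labeled 2 in the training data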

Each area has some sample points, given in the form [x, y, label]:
A:
[-3, -2, 0]
[-3, 1, 0]
[0, 4, 0]
[8, 9, 0]
[8, 10, 0]
[3, 4, 0]

B:
[11, 11, 1]
[15, 6, 1]
[13, 10, 1]
[13, 19, 1]
[23, -1, 1]
[25, -2, 1]

C:
[-2, -3, 2]
[3, 2, 2]
[10, 9, 2]
[15, 10, 2]
[20, -1, 2]
[25, -10, 2]

Run the code below; we expect [10, 5] to output the value 2, because the point (10, 5) lies in area C, which is labeled 2.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

X = np.array([
    [-3, -2],
    [-3, 1],
    [0, 4],
    [8, 9],
    [8, 10],
    [3, 4],
    [11, 11],
    [15, 6],
    [13, 10],
    [13, 19],
    [23, -1],
    [25, -2],
    [-2, -3],
    [3, 2],
    [10, 9],
    [15, 10],
    [20, -1],
    [25, -10]
])

Y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2])

logreg = linear_model.LogisticRegression(C=1e5)

# Create a logistic regression classifier and fit it to the data.
logreg.fit(X, Y)

print(logreg.predict(np.array([[10, 5]])))  # expect to return 2
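
As an optional check (not in the original code), predict_proba shows how confident the model is for each of the three classes:

print(logreg.predict_proba(np.array([[10, 5]])))  # probabilities for classes 0, 1, 2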

Run the code below; it plots the decision boundaries and regions:

h = .02  # step size in the mesh

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('x')
plt.ylabel('y')

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())

plt.show()

[Figure: the decision regions and boundaries learned by the classifier]
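
Incidentally, the straight boundaries can also be read off the fitted parameters. This is a hedged sketch, assuming scikit-learn's default one-vs-rest fit, where coef_ and intercept_ hold one linear decision function per class:

# Each row (w, b) defines the line w[0]*x + w[1]*y + b = 0 where the
# corresponding class's score crosses zero.
for label, (w, b) in enumerate(zip(logreg.coef_, logreg.intercept_)):
    print("class %d: %.2f*x + %.2f*y + %.2f = 0" % (label, w[0], w[1], b))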

Enclosed Boundary

Suppose we have two areas, A and B:
A: x^2 + y^2 <= 4
B: x^2 + y^2 > 4
(i.e., the boundary is a circle of radius 2, which matches the training and test points below)

[Figure: a circular boundary with area A inside and area B outside]

To solve this, we can use the polynomial logistic regression method, which is analogous to polynomial linear regression: expand the two features into polynomial terms, then fit an ordinary logistic regression on the expanded features.
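
To see what the expansion looks like, here is a small standalone example (degree 2, the same setting used below); PolynomialFeatures maps [x1, x2] to [1, x1, x2, x1^2, x1*x2, x2^2]:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(np.array([[2.0, 3.0]])))
# [[1. 2. 3. 4. 6. 9.]] -> [1, x1, x2, x1^2, x1*x2, x2^2]

The circle x^2 + y^2 = 4 is linear in these expanded features, so a plain logistic regression can separate the two areas. With that in mind, the full training code is: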

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

# Area A: x^2 + y^2 <= 4
# Area B: x^2 + y^2 > 4
X = np.array([
    # training set for A
    [0, 0],
    [1, 0],
    [1.5, -0.2],
    [1, 1.5],
    [-1, -1.5],
    [-1.9, 0],
    [-0.5, -1],
    [0, 1.9],
    [0, -1.9],
    [-1.9, 0],
    [-1.5, 0.5],
    # training set for B
    [2, 2.5],
    [2, 3],
    [1, 5],
    [1, -5],
    [-3, 0],
    [-2, -0.1],
    [-2, 1],
    [0, -2.1],
    [0, 2.1],
    [-2.1, 0],
    [-0.6, -2]
])

poly = PolynomialFeatures(degree=2)
X_trans = poly.fit_transform(X)

Y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

logreg = linear_model.LogisticRegression(C=1e6)

# Create a logistic regression classifier and fit it to the expanded features.
logreg.fit(X_trans, Y)

print(logreg.coef_)
print(logreg.intercept_)
print(logreg.predict(poly.transform([[0, 1.8]])))   # expect to return 0
print(logreg.predict(poly.transform([[0, -1.8]])))  # expect to return 0
print(logreg.predict(poly.transform([[0, -2.1]])))  # expect to return 1

And the result is:
[Figure: output of the code above]

Accordingly, we can build the hypothesis. PolynomialFeatures(degree=2) expands [x1, x2] into [1, x1, x2, x1^2, x1*x2, x2^2], so the fitted model is:

h(x) = 1 / (1 + e^(-z)), where z = intercept_ + coef_ . [1, x1, x2, x1^2, x1*x2, x2^2]

with intercept_ and coef_ as printed above.
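
As a sanity check (a sketch reusing logreg and poly from the code above), we can evaluate this hypothesis by hand and confirm it agrees with the model's predictions:

def hypothesis(point):
    # z = intercept_ + coef_ . expanded_features; sigmoid(z) = P(class 1)
    z = logreg.intercept_[0] + np.dot(logreg.coef_[0], poly.transform([point])[0])
    return 1.0 / (1.0 + np.exp(-z))

print(hypothesis([0, 1.8]))   # close to 0: inside the circle, area A
print(hypothesis([0, -2.1]))  # close to 1: outside the circle, area B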

Check my code on GitHub: open boundary, enclosed boundary