Bootcamp Machine Learning day04
-
Notes on what I felt and learned during day04 of the machine learning bootcamp.
The overall topic is Regularization.
ex00
Logistic Regression
This is the Logistic Regression class implemented on day03.
# **************************************************************************** #
# #
# ::: :::::::: #
# my_logistic_regression.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/30 02:00:53 by dochoi #+# #+# #
# Updated: 2020/05/30 17:27:02 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
import math
class MyLogisticRegression(object):
""" Description: My personnal logistic regression to classify things. """
def __init__(self, thetas, alpha=0.003, n_cycle=30000):
self.alpha = alpha
self.n_cycle = n_cycle
self.thetas = np.array(thetas, dtype=float).reshape(-1, 1)
def add_intercept(self,x):
if len(x) == 0 or x.ndim >= 3:
return None
if x.ndim == 1:
return np.vstack((np.ones(len(x)), x)).T
else:
return np.insert(x, 0, 1, axis=1)
def log_gradient(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
return np.sum(self.add_intercept(x).T @ (self.predict_(x) - y) / len(x),axis=1).reshape(-1,1)
def fit_(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
n_cycle= self.n_cycle
while n_cycle:
self.thetas -= self.alpha * ((self.log_gradient(x, y)))
n_cycle -= 1
return self.thetas
def predict_(self,x):
if (len(x) == 0 or len(self.thetas) == 0 or x.shape[1] != (len(self.thetas) - 1)):
return None
return 1 / (1 + pow(math.e, -self.add_intercept(x) @ self.thetas))
def cost_(self, x, y, eps=1e-15):
y_hat = self.predict_(x) - eps
if (y.shape != y_hat.shape or len(y) == 0 or len(y_hat) == 0):
return None
return -sum((y * np.log(y_hat)) + ((1 - y) * np.log(1 - y_hat)) ).squeeze() / len(y_hat)
ex01
This is the add_polynomial_features function implemented on day02.
# **************************************************************************** #
# #
# ::: :::::::: #
# polynomial_model.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/27 16:40:41 by dochoi #+# #+# #
# Updated: 2020/05/30 22:38:30 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
def add_polynomial_features(x, power):
temp = x.copy()
for i in range(2, power + 1):
temp = np.append(temp, np.power(x, i), axis=1 )
return temp
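As a quick sanity check (the example values are my own, not from the subject), an m * n matrix raised up to power p ends up with n * p columns, assuming the function above is in scope:

```python
import numpy as np

x = np.arange(1, 7).reshape(3, 2)            # a 3 * 2 matrix
print(add_polynomial_features(x, 3).shape)   # (3, 6): each column, its square, and its cube
print(add_polynomial_features(x, 3))
# [[  1   2   1   4   1   8]
#  [  3   4   9  16  27  64]
#  [  5   6  25  36 125 216]]
```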
ex02
Question Time!
These are the same questions that were asked on day02.
1 - What is overfitting?
- The cost function has been fitted optimally to the training data only, not to the underlying population. In that case, the error can become large when genuinely new data comes in.
2 - What do you think underfitting might be?
- If overfitting happens when the polynomial becomes too complex, underfitting is probably the case where the polynomial is too simple and the error never gets small.
3 - Why is it important to split the data set in a training and a test set?
- So that we can test on unseen data and catch overfitting.
4 - If a model overfits, what will happen when you compare its performance on the training set and the test set?
- It will predict the training set extremely well but produce large errors on the test set.
5 - If a model underfits, what do you think will happen when you compare its performance on the training set and the test set?
- If it underfits, the error will be large on both the training set and the test set.
ex03
Polynomial models 2
The add_polynomial_features I implemented already satisfies this requirement, so I will reuse it as-is.
def add_polynomial_features(x, power):
"""Add polynomial features to matrix x by raising its columns to every power in the range of
,! 1 up to the power given in argument.
Args:
x: has to be an numpy.ndarray, a matrix of dimension m * n.
power: has to be an int, the power up to which the columns of matrix x are going to be raised.
Returns:
The matrix of polynomial features as a numpy.ndarray, of dimension m * (n * p), containing the polynomial feature values for all training examples.
None if x is an empty numpy.ndarray.
Raises:
This function should not raise any Exception.
"""
temp = x.copy()
for i in range(2, power + 1):
temp = np.append(temp, np.power(x, i), axis=1 )
return temp
Interlude - Fighting Overfitting… With Regularization
This is part of the Evaluate stage.
On day02 we studied overfitting and why the data set is split into a training set and a test set. But being able to detect overfitting does not mean we can avoid it. To deal with this problem, we use Regularization. Overfitting occurs because the model exploits signals in the training data that are actually irrelevant. The basic idea is to penalize the model for giving too much weight to certain (usually high-degree polynomial) features, by adding an extra term to the cost function.
Doing this keeps the θ values as small as possible.
λ (lambda) is a parameter that lets us tune how strongly the regularization affects the model.
Notice
in the regularization term, the sum starts at j =1 because we do NOT want to penalize the value of θ0 (the y-intercept, which doesn’t depend on a feature).
The sum starts at j = 1 so that θ0 is not penalized.
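For reference, the regularized cost that the exercises below implement simply adds the squared L2 term, scaled by λ, to the usual cost (m examples, n features):

$$
J_{reg}(\theta) = J(\theta) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}
$$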
Be careful!
Machine Learning was essentially developed by computer scientists (not mathematicians). This can cause problems when we try to represent things mathematically. For example: using the θ0 notation to represent the y-intercept makes things easy when we apply the linear algebra trick, but it completely messes up the overall matrix notation!
This restates what we have been doing all along: because of θ0, X' has dimension m * (n + 1), its rows are indexed starting from 1, and its columns are indexed starting from 0.
Terminology
This is where norms come up, the same norm we saw in linear algebra; it is a bit more subtle than just calling it "size". The Euclidean norm is a notion of distance or magnitude, computed as the square root of the sum of the squared absolute values.
Again, the regularization technique used here is called L2 regularization,
because we add the square of the L2 norm of the theta vector to the cost function.
The L2 norm is the Euclidean norm.
Many other norms could be used in the regularization term (not just L2);
we use L2 because it is the most common.
In that sum, θ0 is not included.
Using vectorization, this can be computed efficiently.
Interlude - Answers to the Vectorization Problem
Since θ0 must not contribute, we set it to 0 and take the dot product of the resulting vector with itself, which gives the expression above.
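Writing θ' for the θ vector with its first component set to 0, that sum is just a dot product:

$$
\sum_{j=1}^{n}\theta_j^{2} = \theta' \cdot \theta' = \theta'^{\,T}\theta'
$$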
ex04
L2 Regularization
Let's implement this in two versions: one with a loop and one vectorized.
# **************************************************************************** #
# #
# ::: :::::::: #
# l2_reg.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/31 02:37:01 by dochoi #+# #+# #
# Updated: 2020/05/31 02:46:58 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
def iterative_l2(theta):
"""Computes the L2 regularization of a non-empty numpy.ndarray, with a for-loop.
Args:
theta: has to be a numpy.ndarray, a vector of dimension n * 1.
Returns:
The L2 regularization as a float.
None if theta in an empty numpy.ndarray.
Raises:
This function should not raise any Exception.
"""
if len(theta) == 0:
return None
l2_lst = [i * i for i in theta]
return sum(l2_lst) - l2_lst[0]
def l2(theta):
"""Computes the L2 regularization of a non-empty numpy.ndarray, without any for-loop.
Args:
theta: has to be a numpy.ndarray, a vector of dimension n * 1.
Returns:
The L2 regularization as a float.
None if theta in an empty numpy.ndarray.
Raises:
This function should not raise any Exception.
"""
if len(theta) == 0:
return None
l2_lst = theta * theta
return np.sum(l2_lst) - l2_lst[0]
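A quick check with a theta vector of my own choosing (both versions should agree):

```python
import numpy as np

theta = np.array([2., 14., -13., 5., 78., 2., -72.])
# sum of squares of every component except theta[0]:
# 14^2 + 13^2 + 5^2 + 78^2 + 2^2 + 72^2 = 11662
print(iterative_l2(theta))   # 11662.0
print(l2(theta))             # 11662.0
```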
ex05
Regularized Linear Cost Function
Let's implement this cost function. Since the (unregularized) linear cost function was already implemented, we only need to append the regularization term.
# **************************************************************************** #
# #
# ::: :::::::: #
# linear_cost_reg.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/31 02:48:04 by dochoi #+# #+# #
# Updated: 2020/05/31 03:12:11 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
import math
def cost_(y, y_hat):
if len(y) == 0 or len(y_hat) == 0 or y.shape != y_hat.shape:
return None
return ((y_hat - y) @ (y_hat - y)) / (2 * len(y))
def l2(theta):
if len(theta) == 0:
return None
l2_lst = theta * theta
return np.sum(l2_lst) - l2_lst[0]
def reg_cost_(y, y_hat, theta, lambda_):
"""Computes the regularized cost of a linear regression model from two non-empty
,! numpy.ndarray, without any for loop. The two arrays must have the same dimensions.
Args:
y: has to be an numpy.ndarray, a vector of dimension m * 1.
y_hat: has to be an numpy.ndarray, a vector of dimension m * 1.
theta: has to be a numpy.ndarray, a vector of dimension n * 1.
lambda_: has to be a float.
Returns:
The regularized cost as a float.
None if y, y_hat, or theta are empty numpy.ndarray.
None if y and y_hat do not share the same dimensions.
Raises:
This function should not raise any Exception.
"""
if (len(y) == 0 or len(y_hat) == 0 or len(theta) == 0):
return None
if (y.shape != y_hat.shape):
return None
return cost_(y, y_hat) + (lambda_ * l2(theta)/(2 * len(y_hat)))
Using a decorator might make this cleaner, but it would hurt readability, and I don't understand decorators well enough yet.
ex06
Regularized Logistic Cost Function
Same idea. Since the logistic cost function was already implemented, we just append the regularization term.
# **************************************************************************** #
# #
# ::: :::::::: #
# logistic_cost_reg.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/31 03:17:13 by dochoi #+# #+# #
# Updated: 2020/05/31 03:19:25 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
def vec_log_loss_(y, y_hat, eps=1e-15):
if (y.shape != y_hat.shape or len(y) == 0 or len(y_hat) == 0):
return None
y_hat = y_hat - eps
return -sum((y * np.log(y_hat)) + ((1 - y) * np.log(1 - y_hat)) ) / len(y_hat)
def l2(theta):
if len(theta) == 0:
return None
l2_lst = theta * theta
return np.sum(l2_lst) - l2_lst[0]
def reg_log_cost_(y, y_hat, theta, lambda_):
"""Computes the regularized cost of a logistic regression model from two non-empty
,! numpy.ndarray, without any for loop. The two arrays must have the same dimensions.
Args:
y: has to be an numpy.ndarray, a vector of dimension m * 1.
y_hat: has to be an numpy.ndarray, a vector of dimension m * 1.
theta: has to be a numpy.ndarray, a vector of dimension n * 1.
lambda_: has to be a float.
Returns:
The regularized cost as a float.
None if y, y_hat, or theta is empty numpy.ndarray.
None if y and y_hat do not share the same dimensions.
Raises:
This function should not raise any Exception.
"""
if (len(y) == 0 or len(y_hat) == 0 or len(theta) == 0):
return None
if (y.shape != y_hat.shape):
return None
return vec_log_loss_(y, y_hat) + (lambda_ * l2(theta)/(2 * len(y_hat)))
Interlude - Regularized Gradient
Improve
Since we added a regularization term to the cost function, its derivative also shows up as an extra term in the gradient.
Linear Gradient vs Logistic Gradient
The only difference between the two is the hypothesis function.
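Concretely, with X' the input matrix extended with a column of ones and θ' equal to θ with θ0 set to 0, the regularized gradient implemented in the next exercises is:

$$
\nabla(J)(\theta) = \frac{1}{m}\left(X'^{\,T}\left(h_\theta(X) - y\right) + \lambda\,\theta'\right)
$$

where the hypothesis is $h_\theta(X) = X'\theta$ for linear regression and the sigmoid of $X'\theta$ for logistic regression.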
ex07
Regularized Linear Gradient
Implement the formula, as functions, for the linear regression hypothesis in two versions: iterative and vectorized.
# **************************************************************************** #
# #
# ::: :::::::: #
# reg_linear_grad.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/31 13:37:39 by dochoi #+# #+# #
# Updated: 2020/05/31 13:58:34 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
def add_intercept(x):
if len(x) == 0 or x.ndim >= 3:
return None
if x.ndim == 1:
return np.vstack((np.ones(len(x)), x)).T
else:
return np.insert(x, 0, 1, axis=1)
def predict_(x, theta):
if x.ndim == 1:
x = x[:,np.newaxis]
if len(theta) - 1 != x.shape[1] or len(x) == 0:
return None
return add_intercept(x) @ theta
def reg_linear_grad(y, x, theta, lambda_):
"""Computes the regularized linear gradient of three non-empty numpy.ndarray, with two
for-loop. The three arrays must have compatible dimensions.
Args:
y: has to be a numpy.ndarray, a vector of dimension m * 1.
x: has to be a numpy.ndarray, a matrix of dimesion m * n.
theta: has to be a numpy.ndarray, a vector of dimension n * 1.
lambda_: has to be a float.
Returns:
A numpy.ndarray, a vector of dimension n * 1, containing the results of the formula for all j.
None if y, x, or theta are empty numpy.ndarray.
None if y, x or theta does not share compatibles dimensions.
Raises:
This function should not raise any Exception.
"""
if len(y) == 0 or len(x) == 0 or len(theta) == 0:
return None
if len(y) != len(x) or len(x[0]) != len(theta) - 1:
return None
y_hat = predict_(x, theta)
answer = np.sum(y_hat- y)
for j in range(0, len(theta) - 1):
temp = 0.0
for i in range(len(x)):
temp += ((y_hat[i] - y[i]) * x[i][j])
answer = np.append(answer, temp + lambda_ * theta[j + 1])
return (answer / len(x)).reshape(-1,1)
def vec_reg_linear_grad(y, x, theta, lambda_):
"""Computes the regularized linear gradient of three non-empty numpy.ndarray, without any
for-loop. The three arrays must have compatible dimensions.
Args:
y: has to be a numpy.ndarray, a vector of dimension m * 1.
x: has to be a numpy.ndarray, a matrix of dimesion m * n.
theta: has to be a numpy.ndarray, a vector of dimension n * 1.
lambda_: has to be a float.
Returns:
A numpy.ndarray, a vector of dimension n * 1, containing the results of the formula for all j.
None if y, x, or theta are empty numpy.ndarray.
None if y, x or theta does not share compatibles dimensions.
Raises:
This function should not raise any Exception.
"""
if len(y) == 0 or len(x) == 0 or len(theta) == 0:
return None
if len(y) != len(x) or len(x[0]) != len(theta) - 1:
return None
answer = add_intercept(x).T @ (predict_(x, theta) - y) + (lambda_ * theta)
answer[0] -= lambda_ * theta[0]
return answer / len(x)
In the vectorized version, I subtract the λθ0 contribution back out at the end so that θ0 is not regularized.
ex08
Regularized Logistic Gradient
Same formula as ex07; only the hypothesis has to change.
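There is no separate listing here, but a minimal vectorized sketch of the ex08 function, assuming the hypothesis is the sigmoid applied to the same linear combination as in ex07 (the sigmoid_ helper below is my own), could look like this:

```python
import numpy as np

def sigmoid_(x):
    # element-wise logistic function
    return 1 / (1 + np.exp(-x))

def vec_reg_logistic_grad(y, x, theta, lambda_):
    """Regularized logistic gradient, vectorized: same formula as ex07,
    only the hypothesis changes to the sigmoid."""
    if len(y) == 0 or len(x) == 0 or len(theta) == 0:
        return None
    if len(y) != len(x) or x.shape[1] != len(theta) - 1:
        return None
    x_prime = np.insert(x, 0, 1, axis=1)   # add the intercept column
    y_hat = sigmoid_(x_prime @ theta)      # logistic hypothesis
    answer = x_prime.T @ (y_hat - y) + lambda_ * theta
    answer[0] -= lambda_ * theta[0]        # theta_0 is never penalized
    return answer / len(x)
```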
Interlude - Linear Regression to the Next Level: Ridge Regression
Applying L2 regularization to linear regression gives what is called the Ridge Regression model.
It is nothing more than linear regression regularized with L2.
ex09
Let's build Ridge Regression through inheritance.
# **************************************************************************** #
# #
# ::: :::::::: #
# ridge.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/31 16:02:01 by dochoi #+# #+# #
# Updated: 2020/05/31 16:19:15 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import math
class MyLinearRegression(object):
""" Description: My personnal linear regression class to fit like a boss. """
def __init__(self, thetas, alpha=0.000001, n_cycle=30000):
self.alpha = alpha
self.n_cycle = n_cycle
self.thetas = np.array(thetas, dtype=float).reshape(-1, 1)
def add_intercept(self,x):
if len(x) == 0 or x.ndim >= 3:
return None
if x.ndim == 1:
return np.vstack((np.ones(len(x)), x)).T
else:
return np.insert(x, 0, 1, axis=1)
def gradient(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
return self.add_intercept(x).T @ (self.predict_(x).reshape(-1,1) - y) / len(x)
def fit_(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
n_cycle= self.n_cycle
while n_cycle:
self.thetas -= self.alpha * ((self.gradient(x, y)))
# for i, v in enumerate(self.gradient(x, y)):
# self.thetas[i] -= (self.alpha * v)
n_cycle -= 1
return self.thetas
def predict_(self,x):
if x.ndim == 1:
x = x[:,np.newaxis]
if len(self.thetas) - 1 != x.shape[1] or len(x) == 0:
return None
return self.add_intercept(x) @ self.thetas
def cost_elem_(self, x, y):
y_hat = self.predict_(x)
if y.ndim == 1:
y = y[:,np.newaxis]
if y.shape != y_hat.shape or len(y) == 0 or len(y_hat) ==0:
return None
return ((y - y_hat) ** 2) / (2 * len(y))
def cost_(self, x, y):
y_hat = self.predict_(x)
if len(y) == 0 or len(y_hat) == 0 or y.shape != y_hat.shape:
return None
return (sum((y_hat - y) * (y_hat - y)).squeeze() / (2 * len(y)))
class MyRidge(MyLinearRegression):
"""
Description:
My personnal ridge regression class to fit like a boss.
"""
def __init__(self, thetas, alpha=0.001, n_cycle=1000, lambda_=0.5):
self.alpha = alpha
self.n_cycle = n_cycle
self.thetas = thetas
self.lambda_ = lambda_
def gradient(self, x, y):
if len(y) == 0 or len(x) == 0 or len(self.thetas) == 0:
return None
if len(y) != len(x) or len(x[0]) != len(self.thetas) - 1:
return None
answer = self.add_intercept(x).T @ (self.predict_(x) - y) + (self.lambda_ * self.thetas)
answer[0] -= self.lambda_ * self.thetas[0]
return answer / len(x)
def cost_(y, y_hat):
if len(y) == 0 or len(y_hat) == 0 or y.shape != y_hat.shape:
return None
return ((y_hat - y) @ (y_hat - y)) / (2 * len(y))
def l2(theta):
if len(theta) == 0:
return None
l2_lst = theta * theta
return np.sum(l2_lst) - l2_lst[0]
def reg_cost_(y, y_hat, theta, lambda_):
if (len(y) == 0 or len(y_hat) == 0 or len(theta) == 0):
return None
if (y.shape != y_hat.shape):
return None
return cost_(y, y_hat) + (lambda_ * l2(theta)/(2 * len(y_hat)))
ex10
Let's apply Ridge Regression to a polynomial model.
- Split the data into a training set and a test set.
- Train 10 models: one plain Linear Regression and nine Ridge Regression models.
The polynomial hypothesis has degree 3, and the Ridge Regression models are built while varying λ from 0.1 to 1.
- Evaluate them with the MSE.
- Make a bar plot of the MSE score of each model for its λ value.
- Plot each hypothesis on top of the original data points.
**The polynomial model here is also multivariate, so theta has to grow accordingly.** With degree 3 and 3 features, theta must have 10 components.
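A small sanity check of the dimensions assumed here (my own snippet):

```python
import numpy as np
from polynomial_model_extended import add_polynomial_features

x = np.random.rand(5, 3)                 # 3 features
x_poly = add_polynomial_features(x, 3)   # degree 3
print(x_poly.shape)                      # (5, 9): 3 features * 3 powers
# so theta needs 9 + 1 = 10 components (one extra for the intercept term)
```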
# **************************************************************************** #
# #
# ::: :::::::: #
# polynomial_ridge.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/31 16:24:37 by dochoi #+# #+# #
# Updated: 2020/05/31 18:40:47 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import math
from data_spliter import data_spliter
from polynomial_model_extended import add_polynomial_features
def zscore(x):
if len(x) == 0:
return None
mu = sum(x)/len(x)
temp = 0.0
for elem in x:
temp += ((elem - mu) * (elem - mu))
var = temp
std = np.sqrt(var/ (len(x) - 1))
return (x - mu) / std
class MyLinearRegression(object):
""" Description: My personnal linear regression class to fit like a boss. """
def __init__(self, thetas, alpha=0.01, n_cycle=300000):
self.alpha = alpha
self.n_cycle = n_cycle
self.thetas = np.array(thetas, dtype=float).reshape(-1, 1)
def add_intercept(self,x):
if len(x) == 0 or x.ndim >= 3:
return None
if x.ndim == 1:
return np.vstack((np.ones(len(x)), x)).T
else:
return np.insert(x, 0, 1, axis=1)
def gradient(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
return self.add_intercept(x).T @ (self.predict_(x).reshape(-1,1) - y) / len(x)
def fit_(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
n_cycle= self.n_cycle
while n_cycle:
self.thetas -= self.alpha * ((self.gradient(x, y)))
n_cycle -= 1
return self.thetas
def predict_(self,x):
if x.ndim == 1:
x = x[:,np.newaxis]
if len(self.thetas) - 1 != x.shape[1] or len(x) == 0:
return None
return self.add_intercept(x) @ self.thetas
def cost_elem_(self, x, y):
y_hat = self.predict_(x)
if y.ndim == 1:
y = y[:,np.newaxis]
if y.shape != y_hat.shape or len(y) == 0 or len(y_hat) ==0:
return None
return ((y - y_hat) ** 2) / (2 * len(y))
def cost_(self, x, y):
y_hat = self.predict_(x)
if len(y) == 0 or len(y_hat) == 0 or y.shape != y_hat.shape:
return None
return (sum((y_hat - y) * (y_hat - y)).squeeze() / (2 * len(y)))
class MyRidge(MyLinearRegression):
"""
Description:
My personnal ridge regression class to fit like a boss.
"""
def __init__(self, thetas, alpha=0.001, n_cycle=30000, lambda_=0.5):
self.alpha = alpha
self.n_cycle = n_cycle
self.thetas = thetas
self.lambda_ = lambda_
def gradient(self, x, y):
if len(y) == 0 or len(x) == 0 or len(self.thetas) == 0:
return None
if len(y) != len(x) or len(x[0]) != len(self.thetas) - 1:
print(self.thetas)
return None
answer = self.add_intercept(x).T @ (self.predict_(x) - y) + (self.lambda_ * self.thetas)
answer[0] -= self.lambda_ * self.thetas[0]
return answer / len(x)
def l2(self, theta):
if len(theta) == 0:
return None
l2_lst = theta * theta
return np.sum(l2_lst) - l2_lst[0]
def reg_cost_(self, x, y):
if (len(y) == 0 or len(x) == 0 or len(self.thetas) == 0):
return None
if (len(y) != len(x)):
return None
return self.cost_(x, y) + (self.lambda_ * self.l2(self.thetas)/(2 * len(y)))
csv_data = pd.read_csv("../resources/spacecraft_data.csv")
x = np.array(csv_data[['Age','Thrust_power','Terameters']])
y = np.array(csv_data["Sell_price"]).reshape(-1,1)
y_n = []
x = zscore(x)
y = zscore(y)
temp = data_spliter(x, y, 0.5)
x_train = temp[0]
x_test = temp[1]
y_train = temp[2]
y_test = temp[3]
x_test_add_poly = add_polynomial_features(x_test, 3)
x_train_add_poly = add_polynomial_features(x_train, 3)
theta = np.zeros((10, 1), dtype=float)
mylr0 = MyRidge(theta, lambda_=0)
mylr0.fit_(x_train_add_poly, y_train)
mylr0.alpha = 0.003
mylr0.fit_(x_train_add_poly, y_train)
mylr0.alpha = 0.01
mylr0.fit_(x_train_add_poly, y_train)
mylr0.alpha = 0.03
mylr0.fit_(x_train_add_poly, y_train)
i = 0
plt.title(str(i))
plt.scatter(x_test[:,0], y_test,color="black", s=12, alpha=0.7,marker='s')
plt.scatter(x_test[:,0], mylr0.predict_(x_test_add_poly),color="red", s=7,alpha=0.7)
plt.legend(['Sell_price', 'Predicted sell_price'])
plt.show()
y_n.append(mylr0.reg_cost_(x_test_add_poly, y_test).squeeze())
for i in range(1, 10):
mylr0.lambda_ += 0.1
mylr0.thetas = np.array([[0],
[0],
[ 0],
[0],[ 0],[ 0],[ 0],[ 0],[ 0],[ 0]], dtype=float)
mylr0.fit_(x_train_add_poly, y_train)
mylr0.alpha = 0.003
mylr0.fit_(x_train_add_poly, y_train)
mylr0.alpha = 0.01
mylr0.fit_(x_train_add_poly, y_train)
mylr0.alpha = 0.03
mylr0.fit_(x_train_add_poly, y_train)
print(mylr0.thetas)
plt.title(str(i))
plt.scatter(x_test[:,0], y_test,color="black", s=12, alpha=0.7,marker='s')
plt.scatter(x_test[:,0], mylr0.predict_(x_test_add_poly),color="red", s=7,alpha=0.7)
plt.legend(['Sell_price', 'Predicted sell_price'])
plt.show()
y_n.append(mylr0.reg_cost_(x_test_add_poly, y_test).squeeze())
print(y_n)
plt.bar(range(0,10), y_n ,color=['black', 'red', 'green', 'blue','black', 'red', 'green', 'blue','black', 'red'])
plt.grid()
plt.show()
Below are the plots showing, for each model from λ = 0 upward, how closely its predictions match the test data.
Even though the data was split into training and test sets, the fit is by far the most accurate when λ = 0.
This is probably because the alpha and n_cycle I chose are not sufficient; in my setup, the larger λ gets, the worse the predictions become.
Interlude - Regularized Logistic Regression is still Logistic Regression
Applying regularization to logistic regression still gives something called logistic regression.
ex11
Regularized Logistic Regression
Goal: add a penalty parameter to the previously implemented Logistic Regression class.
The default penalty is 'l2'.
# **************************************************************************** #
# #
# ::: :::::::: #
# my_logistic_regression.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/30 02:00:53 by dochoi #+# #+# #
# Updated: 2020/06/01 01:33:34 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
import math
class MyLogisticRegression(object):
""" Description: My personnal logistic regression to classify things. """
def __init__(self, thetas, alpha=0.0001, n_cycle=5000, penalty='l2', lambda_=0.5):
self.alpha = alpha
self.n_cycle = n_cycle
self.penalty=penalty
self.thetas = np.array(thetas, dtype=float).reshape(-1, 1)
self.lambda_ = lambda_
def l2(self, theta):
if len(theta) == 0:
return None
l2_lst = theta * theta
return np.sum(l2_lst) - l2_lst[0]
def add_intercept(self,x):
if len(x) == 0 or x.ndim >= 3:
return None
if x.ndim == 1:
return np.vstack((np.ones(len(x)), x)).T
else:
return np.insert(x, 0, 1, axis=1)
def log_gradient(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
if self.penalty == 'l2':
answer = self.add_intercept(x).T @ (self.predict_(x) - y) + (self.lambda_ * self.thetas)
answer[0] -= self.lambda_ * self.thetas[0]
return answer / len(x)
else:
return np.sum(self.add_intercept(x).T @ (self.predict_(x) - y) / len(x),axis=1).reshape(-1,1)
def fit_(self, x, y):
if len(x) == 0 or len(y) == 0 or len(self.thetas) == 0:
return None
n_cycle= self.n_cycle
while n_cycle:
self.thetas -= self.alpha * ((self.log_gradient(x, y)))
n_cycle -= 1
return self.thetas
def predict_(self,x):
if (len(x) == 0 or len(self.thetas) == 0 or x.shape[1] != (len(self.thetas) - 1)):
return None
return 1 / (1 + pow(math.e, -self.add_intercept(x) @ self.thetas))
def cost_(self, x, y, eps=1e-15):
if (len(y) != len(x) or len(y) == 0 or len(x) == 0):
return None
y_hat = self.predict_(x) - eps
if self.penalty == 'l2':
return -sum((y * np.log(y_hat)) + ((1 - y) * np.log(1 - y_hat)) ).squeeze() / len(y_hat) + (self.lambda_ * self.l2(self.thetas)/(2 * len(y_hat)))
else :
return -sum((y * np.log(y_hat)) + ((1 - y) * np.log(1 - y_hat)) ).squeeze() / len(y_hat)
ex12
Practicing Regularized Logistic Regression
- Train ten different Logistic Regression models with a polynomial hypothesis of degree 3. The models will be trained with different λ values, ranging from 0 to 1. Use the one-vs-all method.
- Evaluate the f1 score of each of the ten models on the test set. You can use the f1_score_ function that you wrote in ex11 of day03.
- To properly visualize your results, make a bar plot showing the score of the models given their λ value.
According to your evaluations, what is the best hypothesis (or model) you can get?
This is the last exercise.
Again the degree must be 3; with 3 features, theta must have 10 components.
Using three features, a citizen's weight, height, and bone density, we have to classify each person into one of four areas.
How accurate will it be?
# **************************************************************************** #
# #
# ::: :::::::: #
# log_reg_model.py :+: :+: :+: #
# +:+ +:+ +:+ #
# By: dochoi <dochoi@student.42seoul.kr> +#+ +:+ +#+ #
# +#+#+#+#+#+ +#+ #
# Created: 2020/05/30 02:56:12 by dochoi #+# #+# #
# Updated: 2020/06/01 01:55:02 by dochoi ### ########.fr #
# #
# **************************************************************************** #
import numpy as np
import math
import pandas as pd
import matplotlib.pyplot as plt
from my_logistic_regression import MyLogisticRegression as MyLR
from data_spliter import data_spliter
from polynomial_model_extended import add_polynomial_features
def accuracy_score_(y, y_hat):
""" Compute the accuracy score.
Args:
y:a numpy.ndarray for the correct labels y_hat:a numpy.ndarray for the predicted labels
Returns: The accuracy score as a float. None on any error.
Raises: This function should not raise any Exception. """
return np.count_nonzero(y==y_hat) / float(len(y))
def precision_score_(y, y_hat, pos_label=1):
""" Compute the precision score.
Args:
y:a numpy.ndarray for the correct labels
y_hat:a numpy.ndarray for the predicted labels
pos_label: str or int,
the class on which to report the precision_score (default=1)
Returns: The precision score as a float.
None on any error.
Raises: This function should not raise any Exception. """
if np.count_nonzero(y_hat == pos_label) == 0:
return 0
return np.count_nonzero((y_hat == y) & (y_hat == pos_label)) / float(np.count_nonzero(y_hat==pos_label))
def recall_score_(y, y_hat, pos_label=1):
""" Compute the recall score.
Args: y:a numpy.ndarray for the correct labels
y_hat:a numpy.ndarray for the predicted labels
pos_label: str or int, the class on which to report the precision_score (default=1)
Returns: The recall score as a float. None on any error.
Raises: This function should not raise any Exception. """
return np.count_nonzero((y_hat == y) & (y_hat == pos_label)) / float(np.count_nonzero((y_hat == y) & (y_hat == pos_label)) + np.count_nonzero((y_hat != y) & (y_hat != pos_label)))
def f1_score_(y, y_hat, pos_label=1):
""" Compute the f1 score. Args: y:a numpy.ndarray for the correct labels
y_hat:a numpy.ndarray for the predicted labels
pos_label: str or int, the class on which to report the precision_score (default=1)
Returns: The f1 score as a float. None on any error.
Raises: This function should not raise any Exception"""
pre =precision_score_(y,y_hat, pos_label=pos_label)
re = recall_score_(y,y_hat, pos_label=pos_label)
if (pre * re) == 0:
return 0
return 2 * pre * re / (pre + re)
def zscore(x):
if len(x) == 0:
return None
mu = sum(x)/len(x)
temp = 0.0
for elem in x:
temp += ((elem - mu) * (elem - mu))
var = temp
std = np.sqrt(var/ (len(x) - 1))
return (x - mu) / std
csv_data_x = pd.read_csv("../resources/solar_system_census.csv")
csv_data_y = pd.read_csv("../resources/solar_system_census_planets.csv")
x = np.array(csv_data_x[["height","weight","bone_density"]])
y = np.array(csv_data_y["Origin"]).reshape(-1,1)
x = zscore(x)
temp = data_spliter(x, y, 0.5)
x_train = temp[0]
x_test = temp[1]
y_train = temp[2]
y_test = temp[3]
x_test_add_poly = add_polynomial_features(x_test, 3) # raise the features to degree 3
x_train_add_poly = add_polynomial_features(x_train, 3)
x = add_polynomial_features(x,3)
y_train0 = np.array([1 if i == 0 else 0 for i in y_train]).reshape(-1,1) # binary labels for the class-0 one-vs-all classifier
y_test0 = np.array([1 if i == 0 else 0 for i in y_test]).reshape(-1,1)
y_train1 = np.array([1 if i == 1 else 0 for i in y_train]).reshape(-1,1) # binary labels for the class-1 one-vs-all classifier
y_test1 = np.array([1 if i == 1 else 0 for i in y_test]).reshape(-1,1)
y_train2 = np.array([1 if i == 2 else 0 for i in y_train]).reshape(-1,1) # binary labels for the class-2 one-vs-all classifier
y_test2 = np.array([1 if i == 2 else 0 for i in y_test]).reshape(-1,1)
y_train3 = np.array([1 if i == 3 else 0 for i in y_train]).reshape(-1,1) # binary labels for the class-3 one-vs-all classifier
y_test3 = np.array([1 if i == 3 else 0 for i in y_test]).reshape(-1,1)
theta = np.ones((10, 1), dtype=float)
mylr0 = MyLR(theta, lambda_=0) # The flying cities of Venus (0)
mylr1 = MyLR(theta, lambda_=0) # United Nations of Earth (1)
mylr2 = MyLR(theta, lambda_=0) # Mars Republic (2)
mylr3 = MyLR(theta, lambda_=0) # The Asteroids’ Belt colonies (3).
y_n = []
y_n2= []
for i in range(10):
mylr0.thetas = np.array( [[-0.38004857],
[ 0.12257596],
[-1.13496089],
[ 0.64144711],
[ 0.13721429],
[-0.46771826],
[-1.18485222],
[-0.46742162],
[ 0.03928006],
[-0.1718098 ]])
mylr0.fit_(x_train_add_poly, y_train0)
mylr0.alpha = 0.00003
mylr0.fit_(x_train_add_poly, y_train0)
mylr0.alpha = 0.00007
mylr0.fit_(x_train_add_poly, y_train0)
mylr0.alpha = 0.0001
mylr0.fit_(x_train_add_poly, y_train0)
mylr0.lambda_ += 0.1
mylr1.thetas = np.array( [[-0.79899142],
[-0.3785926 ],
[ 1.24131593],
[ 1.13327427],
[-0.73841759],
[-0.79814797],
[ 0.03383971],
[-0.40336806],
[-0.76538218],
[ 0.75970411],])
mylr1.fit_(x_train_add_poly, y_train1)
mylr1.alpha = 0.00003
mylr1.fit_(x_train_add_poly, y_train1)
mylr1.alpha = 0.00007
mylr1.fit_(x_train_add_poly, y_train1)
mylr1.alpha = 0.0001
mylr1.fit_(x_train_add_poly, y_train1)
mylr1.lambda_ += 0.1
mylr2.thetas = np.array([[-1.01133034],
[ 0.43171802],
[ 1.63911229],
[ 0.20881501],
[-1.05478397],
[ 0.30739429],
[-1.12331241],
[ 0.09578607],
[ 0.84076726],
[ 0.25000609],])
mylr2.fit_(x_train_add_poly, y_train2)
mylr2.alpha = 0.00003
mylr2.fit_(x_train_add_poly, y_train2)
mylr2.alpha = 0.00007
mylr2.fit_(x_train_add_poly, y_train2)
mylr2.alpha = 0.0001
mylr2.fit_(x_train_add_poly, y_train2)
mylr2.lambda_ += 0.1
mylr3.thetas = np.array([[-1.37659546],
[ 0.88245098],
[ 0.06049297],
[-0.38618096],
[-0.22685866],
[-1.05910852],
[-0.12425495],
[ 0.76153196],
[-0.61360801],
[-1.65141989]])
mylr3.fit_(x_train_add_poly, y_train3)
mylr3.alpha = 0.00003
mylr3.fit_(x_train_add_poly, y_train3)
mylr3.alpha = 0.00007
mylr3.fit_(x_train_add_poly, y_train3)
mylr3.alpha = 0.0001
mylr3.fit_(x_train_add_poly, y_train3)
mylr3.lambda_ += 0.1
print(mylr0.thetas)
print(mylr1.thetas)
print(mylr2.thetas)
print(mylr3.thetas)
# models trained
# predict on the test data
y_hat0 = mylr0.predict_(x_test_add_poly)
y_hat1 = mylr1.predict_(x_test_add_poly)
y_hat2 = mylr2.predict_(x_test_add_poly)
y_hat3 = mylr3.predict_(x_test_add_poly)
y_hat_total = np.append(y_hat0, y_hat1, axis=1)
y_hat_total = np.append(y_hat_total, y_hat2, axis=1)
y_hat_total = np.append(y_hat_total, y_hat3, axis=1)
y_hat_pre_all = np.array([])
# pick the class with the highest predicted probability
for i in range(len(y_hat_total)):
y_hat_pre_all = np.append(y_hat_pre_all, np.argmax(y_hat_total[i]))
y_hat_pre_all = y_hat_pre_all.reshape(-1,1)
print(mylr0.cost_(x_test_add_poly,y_test0))
print(mylr1.cost_(x_test_add_poly,y_test1))
print(mylr2.cost_(x_test_add_poly,y_test2))
print(mylr3.cost_(x_test_add_poly,y_test3))
y_n.append(f1_score_(y_test, y_hat_pre_all))
y_n2.append(accuracy_score_(y_test, y_hat_pre_all))
# np.set_printoptions(formatter={'float_kind': lambda x: "{0:0.3f}".format(x)})
# visualization
plt.bar(range(0,10), y_n ,color=['black', 'red', 'green', 'blue','black', 'red', 'green', 'blue','black', 'red'])
plt.xlabel('lambda 0, 0.1, 0.2 ,..., 0.9')
plt.xticks(range(0,10))
plt.ylabel('f_score')
plt.ylim()
plt.title('**f_score for lambda change*')
plt.show()
print(y_n)
plt.bar(range(0,10), y_n2 ,color=['black', 'red', 'green', 'blue','black', 'red', 'green', 'blue','black', 'red'])
print(y_n2)
plt.xlabel('lambda 0, 0.1, 0.2 ,..., 0.9')
plt.xticks(range(0,10))
plt.ylabel('accuracy_score_')
plt.ylim()
plt.title('**accuracy_score_ for lambda change*')
plt.show()
Fittingly for the last exercise, it uses almost everything done so far:
data splitting, normalization, f1 score, add_polynomial_features…
The process I implemented is as follows:
- Load the data.
- Normalize the data.
- Split the data into a training set and a test set.
- Expand the features to polynomial degree 3.
- Build four classifiers using the one-vs-all method.
- Take the class whose classifier outputs the highest probability as the prediction.
- Fit the models while varying lambda.
- Visualize the results.
Hmm, something is odd... even though theta changes when lambda changes, the accuracy and f1 score stay exactly the same.
I checked, and the theta values really are different for each model... With more time, and more careful tuning of n_cycle and alpha, the results might come out differently.
For now, the four-class model is right about 58% of the time… compared with the 25% you would get by guessing at random, that actually seems pretty decent.
As expected of the last exercise, it was fairly difficult,
and it took quite a lot of time.
With this, the Bootcamp Machine Learning is complete.
Now I can build models of my own!!