Predicting Car Purchase Price
PROBLEM STATEMENT
Read a dataset with the columns below and, given a customer, predict roughly how much that customer can spend on a car, so that suitable cars can be shown to them.
- Customer Name
- Customer e-mail
- Country
- Gender
- Age
- Annual Salary
- Credit Card Debt
- Net Worth
Target value to predict:
- Car Purchase Amount
STEP #0: Import libraries and set up the Colab environment
In [48]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
Mount Google Drive so the csv file can be read.
In [49]:
# Click the folder icon in the left sidebar and press the mount button; this code is generated automatically
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Set the working directory to the folder containing the current file.
In [50]:
import os
In [51]:
os.chdir('/content/drive/MyDrive/Colab Notebooks/ml_plus/data')
In [52]:
# Check the encoding with the data provider
df = pd.read_csv('Car_Purchasing_Data.csv',encoding='ISO-8859-1')
In [53]:
df.head(3)
Out[53]:
| | Customer Name | Customer e-mail | Country | Gender | Age | Annual Salary | Credit Card Debt | Net Worth | Car Purchase Amount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Martina Avila | cubilia.Curae.Phasellus@quisaccumsanconvallis.edu | Bulgaria | 0 | 41.851720 | 62812.09301 | 11609.380910 | 238961.2505 | 35321.45877 |
| 1 | Harlan Barnes | eu.dolor@diam.co.uk | Belize | 0 | 40.870623 | 66646.89292 | 9572.957136 | 530973.9078 | 45115.52566 |
| 2 | Naomi Rodriquez | vulputate.mauris.sagittis@ametconsectetueradip... | Algeria | 1 | 43.152897 | 53798.55112 | 11160.355060 | 638467.1773 | 42925.70921 |
In [54]:
df.describe()
Out[54]:
| | Gender | Age | Annual Salary | Credit Card Debt | Net Worth | Car Purchase Amount |
|---|---|---|---|---|---|---|
| count | 500.000000 | 500.000000 | 500.000000 | 500.000000 | 500.000000 | 500.000000 |
| mean | 0.506000 | 46.241674 | 62127.239608 | 9607.645049 | 431475.713625 | 44209.799218 |
| std | 0.500465 | 7.978862 | 11703.378228 | 3489.187973 | 173536.756340 | 10773.178744 |
| min | 0.000000 | 20.000000 | 20000.000000 | 100.000000 | 20000.000000 | 9000.000000 |
| 25% | 0.000000 | 40.949969 | 54391.977195 | 7397.515792 | 299824.195900 | 37629.896040 |
| 50% | 1.000000 | 46.049901 | 62915.497035 | 9655.035568 | 426750.120650 | 43997.783390 |
| 75% | 1.000000 | 51.612263 | 70117.862005 | 11798.867487 | 557324.478725 | 51254.709517 |
| max | 1.000000 | 70.000000 | 100000.000000 | 20000.000000 | 1000000.000000 | 80000.000000 |
Who has the highest annual salary?
In [55]:
df['Annual Salary'].max()
Out[55]:
100000.0
In [56]:
df.loc[ df['Annual Salary'] == df['Annual Salary'].max() , : ]
Out[56]:
| | Customer Name | Customer e-mail | Country | Gender | Age | Annual Salary | Credit Card Debt | Net Worth | Car Purchase Amount |
|---|---|---|---|---|---|---|---|---|---|
| 28 | Gemma Hendrix | lobortis@non.co.uk | Denmark | 1 | 46.124036 | 100000.0 | 17452.92179 | 188032.0778 | 58350.31809 |
What is the annual salary of the youngest customer?
In [57]:
df['Age'].min()
Out[57]:
20.0
In [58]:
df.loc[ df['Age'].min() == df['Age'], 'Annual Salary' ]
Out[58]:
444    70467.29492
Name: Annual Salary, dtype: float64
In [59]:
sb.pairplot(data = df)
plt.show()
STEP #3: CREATE TESTING AND TRAINING DATASET/DATA CLEANING
If there are any NaN values, resolve them.
In [60]:
df.isna().sum()
Out[60]:
Customer Name          0
Customer e-mail        0
Country                0
Gender                 0
Age                    0
Annual Salary          0
Credit Card Debt       0
Net Worth              0
Car Purchase Amount    0
dtype: int64
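The counts above are all zero, so nothing needs fixing here. If the check had found missing values, a minimal sketch of the two usual remedies might look like this (the toy frame below is hypothetical, not the real df):

```python
import numpy as np
import pandas as pd

# Toy frame with missing values (hypothetical; the real df above has none)
df = pd.DataFrame({'Age': [41.0, np.nan, 43.0],
                   'Annual Salary': [62812.0, 66646.0, np.nan]})

# Option 1: drop every row that contains a NaN
dropped = df.dropna()

# Option 2: fill NaNs with the column mean so no rows are lost
filled = df.fillna(df.mean(numeric_only=True))
```

Dropping is simplest; filling with the mean keeps all 500 rows, which matters when the dataset is small.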
For training, build X from every column except 'Customer Name', 'Customer e-mail', 'Country', and 'Car Purchase Amount'.
In [61]:
df.head()
Out[61]:
| | Customer Name | Customer e-mail | Country | Gender | Age | Annual Salary | Credit Card Debt | Net Worth | Car Purchase Amount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Martina Avila | cubilia.Curae.Phasellus@quisaccumsanconvallis.edu | Bulgaria | 0 | 41.851720 | 62812.09301 | 11609.380910 | 238961.2505 | 35321.45877 |
| 1 | Harlan Barnes | eu.dolor@diam.co.uk | Belize | 0 | 40.870623 | 66646.89292 | 9572.957136 | 530973.9078 | 45115.52566 |
| 2 | Naomi Rodriquez | vulputate.mauris.sagittis@ametconsectetueradip... | Algeria | 1 | 43.152897 | 53798.55112 | 11160.355060 | 638467.1773 | 42925.70921 |
| 3 | Jade Cunningham | malesuada@dignissim.com | Cook Islands | 1 | 58.271369 | 79370.03798 | 14426.164850 | 548599.0524 | 67422.36313 |
| 4 | Cedric Leach | felis.ullamcorper.viverra@egetmollislectus.net | Brazil | 1 | 57.313749 | 59729.15130 | 5358.712177 | 560304.0671 | 55915.46248 |
In [62]:
X = df.loc[ :, 'Gender':'Net Worth' ]
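The label-range slice works here because the columns to keep happen to be contiguous. An equivalent sketch that does not depend on column order is to drop the unwanted columns explicitly (the one-row frame below is a stand-in with the same column names):

```python
import pandas as pd

# One-row stand-in with the same columns as the real df (values hypothetical)
df = pd.DataFrame({'Customer Name': ['Martina Avila'],
                   'Customer e-mail': ['a@b.edu'],
                   'Country': ['Bulgaria'],
                   'Gender': [0],
                   'Age': [41.85],
                   'Annual Salary': [62812.09],
                   'Credit Card Debt': [11609.38],
                   'Net Worth': [238961.25],
                   'Car Purchase Amount': [35321.46]})

# Same result as df.loc[:, 'Gender':'Net Worth'], but independent of column order
X = df.drop(columns=['Customer Name', 'Customer e-mail',
                     'Country', 'Car Purchase Amount'])
```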
Set y to the 'Car Purchase Amount' column.
In [63]:
y = df['Car Purchase Amount']
Now we do feature scaling, using normalization. Use MinMaxScaler.
In [64]:
from sklearn.preprocessing import MinMaxScaler
In [65]:
X # the columns have very different ranges, so feature scaling is needed; if everything were already in 0-1 it would be unnecessary
Out[65]:
| | Gender | Age | Annual Salary | Credit Card Debt | Net Worth |
|---|---|---|---|---|---|
| 0 | 0 | 41.851720 | 62812.09301 | 11609.380910 | 238961.2505 |
| 1 | 0 | 40.870623 | 66646.89292 | 9572.957136 | 530973.9078 |
| 2 | 1 | 43.152897 | 53798.55112 | 11160.355060 | 638467.1773 |
| 3 | 1 | 58.271369 | 79370.03798 | 14426.164850 | 548599.0524 |
| 4 | 1 | 57.313749 | 59729.15130 | 5358.712177 | 560304.0671 |
| ... | ... | ... | ... | ... | ... |
| 495 | 0 | 41.462515 | 71942.40291 | 6995.902524 | 541670.1016 |
| 496 | 1 | 37.642000 | 56039.49793 | 12301.456790 | 360419.0988 |
| 497 | 1 | 53.943497 | 68888.77805 | 10611.606860 | 764531.3203 |
| 498 | 1 | 59.160509 | 49811.99062 | 14013.034510 | 337826.6382 |
| 499 | 1 | 46.731152 | 61370.67766 | 9391.341628 | 462946.4924 |

500 rows × 5 columns
In [66]:
y # y also ranges in the tens of thousands, so it gets feature-scaled too
Out[66]:
0      35321.45877
1      45115.52566
2      42925.70921
3      67422.36313
4      55915.46248
          ...
495    48901.44342
496    31491.41457
497    64147.28888
498    45442.15353
499    45107.22566
Name: Car Purchase Amount, Length: 500, dtype: float64
In [67]:
scaler_X = MinMaxScaler() # one scaler per variable
In [68]:
scaler_y = MinMaxScaler()
In [69]:
X = scaler_X.fit_transform(X.values)
In [70]:
y = scaler_y.fit_transform(y.values) # y is 1-D, so this fails as-is; it has to be reshaped to 2-D first (continued below)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-70e0b418bccb> in <module>
----> 1 y = scaler_y.fit_transform(y.values)
...
ValueError: Expected 2D array, got 1D array instead:
array=[35321.45877 45115.52566 42925.70921 ... 64147.28888 45442.15353 45107.22566].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Change the shape of y for training.
In [ ]:
y.shape
In [ ]:
y.reshape(500, 1) # 500 rows, 1 column / this fails because y is a Series, which has no reshape of its own
In [ ]:
y.values.reshape(500,1) # .values returns a NumPy array, so this works
Feature-scale y as well: normalize y just like X.
In [72]:
y_scaled = scaler_y.fit_transform(y.values.reshape(500, 1)) # the reshape is only for MinMaxScaler; y itself should stay 1-D, so leave y alone and pass .values reshaped
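MinMaxScaler maps each column to (x - min) / (max - min), which is why every value lands in 0-1. A quick sanity check of that formula on a toy 2-D column (values hypothetical):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy 2-D column; MinMaxScaler requires 2-D input, hence the reshape above
y = np.array([9000.0, 44209.8, 80000.0]).reshape(-1, 1)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(y)

# The same thing by hand: (x - min) / (max - min)
manual = (y - y.min()) / (y.max() - y.min())
```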
STEP #4: TRAINING THE MODEL
Split into training and test sets. (Use a test size of 25%, and set random_state to 50 for reproducible results.)
In [75]:
from sklearn.model_selection import train_test_split
In [80]:
X_train, X_test, y_train, y_test = train_test_split(X, y_scaled, test_size= 0.25, random_state=50)
In [83]:
X_train.shape
Out[83]:
(375, 5)
In [84]:
X_test.shape
Out[84]:
(125, 5)
In [85]:
X_train
Out[85]:
array([[1. , 0.46353068, 0.467206 , 0.64213798, 0.36416975], [1. , 0.48605956, 0.44258078, 0.47349631, 0.5135217 ], [1. , 0.23792445, 0.50480432, 0.57378351, 0.58079638], ..., [1. , 0.30675752, 0.52153758, 0.49312537, 0.27744633], [0. , 0.61603869, 0.57308137, 0.4543552 , 0.52228606], [1. , 0.50682579, 0.52553958, 0.24525971, 0.34452963]])
Import the libraries below.
In [77]:
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler
Build the model using deep learning.
In [87]:
X_train[0] # confirms 5 columns == the input layer size is 5, since the input layer receives the data
Out[87]:
array([1. , 0.46353068, 0.467206 , 0.64213798, 0.36416975])
In [89]:
X_train.shape[1] # a more precise way to check the number of columns
Out[89]:
5
In [108]:
# Modeling is usually wrapped in a function.
def build_model() :
    model = Sequential()
    # add() appends hidden layers; the layer type is Dense.
    # activation: sigmoid is avoided in hidden layers because of the vanishing-gradient problem.
    # input_shape: how many values come in from the input layer; always given as a tuple.
    model.add( Dense(units = 5, activation = 'relu', input_shape = (5,)) )
    model.add( Dense(units = 25, activation = 'relu') )
    model.add( Dense(units = 10, activation = 'relu') )
    # The output layer is dictated by the data: the target here is a single amount, so one output unit.
    # Regression problems use 'linear': output the raw value instead of squashing it into 0-1.
    model.add( Dense(units = 1, activation = 'linear') )
    # First positional slot is the optimizer, second is the loss; the keywords can be omitted when the order matches, but keeping them is clearer.
    # 'mse' == mean squared error; metrics is the list of validation measures to report.
    model.compile(optimizer = 'adam', loss = 'mse', metrics = ['mse', 'mae'])
    return model
Compile with the 'adam' optimizer and 'mean_squared_error' as the loss function.
In [109]:
model = build_model()
In [111]:
model.summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 5)                 30
 dense_1 (Dense)             (None, 25)                150
 dense_2 (Dense)             (None, 10)                260
 dense_3 (Dense)             (None, 1)                 11
=================================================================
Total params: 451
Trainable params: 451
Non-trainable params: 0
_________________________________________________________________
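The Param # column is just weights plus biases per Dense layer: inputs × units + units. A quick arithmetic check of the summary above:

```python
# Dense layer parameters = inputs * units (weights) + units (biases)
layers = [(5, 5), (5, 25), (25, 10), (10, 1)]   # (inputs, units) for each Dense above

params = [n_in * units + units for n_in, units in layers]
total = sum(params)
print(params, total)  # [30, 150, 260, 11] 451
```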
Run the training.
In [112]:
model.fit(X_train, y_train, batch_size = 10, epochs = 20) # lower mse is better; it is the usual performance measure. "1s 2ms/step" means each step took 2 ms.
Epoch 1/20
38/38 [==============================] - 1s 2ms/step - loss: 0.2277 - mse: 0.2277 - mae: 0.4519
Epoch 2/20
38/38 [==============================] - 0s 2ms/step - loss: 0.1157 - mse: 0.1157 - mae: 0.3070
Epoch 3/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0252 - mse: 0.0252 - mae: 0.1271
Epoch 4/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0158 - mse: 0.0158 - mae: 0.1022
Epoch 5/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0142 - mse: 0.0142 - mae: 0.0960
Epoch 6/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0129 - mse: 0.0129 - mae: 0.0913
Epoch 7/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0114 - mse: 0.0114 - mae: 0.0858
Epoch 8/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0104 - mse: 0.0104 - mae: 0.0832
Epoch 9/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0095 - mse: 0.0095 - mae: 0.0784
Epoch 10/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0086 - mse: 0.0086 - mae: 0.0752
Epoch 11/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0077 - mse: 0.0077 - mae: 0.0711
Epoch 12/20
38/38 [==============================] - 0s 3ms/step - loss: 0.0070 - mse: 0.0070 - mae: 0.0674
Epoch 13/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0062 - mse: 0.0062 - mae: 0.0634
Epoch 14/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0056 - mse: 0.0056 - mae: 0.0602
Epoch 15/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0050 - mse: 0.0050 - mae: 0.0561
Epoch 16/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0042 - mse: 0.0042 - mae: 0.0516
Epoch 17/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0036 - mse: 0.0036 - mae: 0.0478
Epoch 18/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0030 - mse: 0.0030 - mae: 0.0435
Epoch 19/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0025 - mse: 0.0025 - mae: 0.0394
Epoch 20/20
38/38 [==============================] - 0s 2ms/step - loss: 0.0020 - mse: 0.0020 - mae: 0.0355
Out[112]:
<keras.callbacks.History at 0x7f022b41b910>
STEP #5: EVALUATING THE MODEL
In [113]:
model.evaluate(X_test, y_test) # evaluate against the answer key: measure performance on held-out data
4/4 [==============================] - 0s 3ms/step - loss: 0.0026 - mse: 0.0026 - mae: 0.0381
Out[113]:
[0.0025828008074313402, 0.0025828008074313402, 0.038085076957941055]
Make predictions on the test set.
In [114]:
y_pred = model.predict(X_test)
4/4 [==============================] - 0s 3ms/step
In [115]:
y_pred
Out[115]:
array([[0.5224 ], [0.7142014 ], [0.67470855], [0.51420337], [0.524148 ], [0.33085275], [0.5887935 ], [0.47477442], [0.6071884 ], [0.5833241 ], [0.5208583 ], [0.48203295], [0.43933874], [0.4049549 ], [0.5453125 ], [0.2012224 ], [0.5046336 ], [0.59748596], [0.46860176], [0.60647124], [0.4190597 ], [0.5223207 ], [0.35910094], [0.76474917], [0.564606 ], [0.51169306], [0.6643147 ], [0.4524073 ], [0.6818391 ], [0.54598415], [0.5616117 ], [0.52351815], [0.6710905 ], [0.54112625], [0.5507159 ], [0.5947876 ], [0.66437525], [0.46966928], [0.45158625], [0.6414284 ], [0.5429411 ], [0.3941781 ], [0.29050514], [0.63821 ], [0.5441814 ], [0.6086421 ], [0.51483566], [0.4319877 ], [0.29778302], [0.4967755 ], [0.70529056], [0.6722764 ], [0.31712365], [0.45829064], [0.31044108], [0.5821765 ], [0.6268752 ], [0.5070299 ], [0.56229174], [0.4521888 ], [0.4118389 ], [0.54514277], [0.560114 ], [0.47323143], [0.49272484], [0.39568865], [0.48952645], [0.49744576], [0.6376413 ], [0.31072587], [0.35352224], [0.324946 ], [0.48562276], [0.37657738], [0.50813985], [0.42472094], [0.52001643], [0.18346614], [0.605491 ], [0.3953464 ], [0.32523876], [0.6626122 ], [0.16433531], [0.5288373 ], [0.6053343 ], [0.47930628], [0.33166325], [0.48763084], [0.558701 ], [0.21733937], [0.33341086], [0.38507158], [0.45563573], [0.7256303 ], [0.52891946], [0.7649345 ], [0.4228297 ], [0.61918294], [0.43435878], [0.3048051 ], [0.35647637], [0.5800123 ], [0.655023 ], [0.30225468], [0.66525126], [0.58164525], [0.4059922 ], [0.2742763 ], [0.68889374], [0.56998974], [0.57322633], [0.5266626 ], [0.46465802], [0.3564937 ], [0.4664744 ], [0.38996315], [0.4734887 ], [0.69959366], [0.76846826], [0.32132196], [0.4230138 ], [0.34104586], [0.5780781 ], [0.4901495 ], [0.06654672]], dtype=float32)
In [116]:
y_test
Out[116]:
array([[0.51220225], [0.76168793], [0.6538108 ], [0.53430602], [0.52673735], [0.21316327], [0.56811431], [0.42720002], [0.56035434], [0.65554063], [0.53007738], [0.46357095], [0.42849005], [0.33560612], [0.53369389], [0.18438195], [0.54529522], [0.61231998], [0.54146119], [0.60642838], [0.46018938], [0.5343264 ], [0.26804521], [0.86759109], [0.59067519], [0.48846357], [0.67273869], [0.42962519], [0.66841888], [0.52129727], [0.60117759], [0.50939895], [0.71808681], [0.51540401], [0.54948752], [0.60183344], [0.68033622], [0.52080458], [0.43394857], [0.61562087], [0.46779854], [0.33173992], [0.21439436], [0.68227386], [0.50866938], [0.64444254], [0.55352142], [0.46885649], [0.3172683 ], [0.47531745], [0.72629843], [0.75865395], [0.37719946], [0.42443705], [0.26233016], [0.56409653], [0.63399262], [0.50109082], [0.56449711], [0.49908055], [0.41088501], [0.50814651], [0.64966102], [0.27669569], [0.41091372], [0.31338011], [0.46217916], [0.54888405], [0.67782275], [0.19154167], [0.35515157], [0.2138602 ], [0.47873651], [0.34170423], [0.47053558], [0.39792327], [0.49287831], [0. ], [0.62671101], [0.37927499], [0.36116341], [0.68037083], [0.16412978], [0.49091635], [0.61485077], [0.43107389], [0.22891454], [0.49841668], [0.49948317], [0.26493265], [0.31561446], [0.36510463], [0.49805458], [0.72143981], [0.48907732], [0.81860421], [0.34705263], [0.61908354], [0.38056275], [0.31289637], [0.29231919], [0.56429808], [0.69953618], [0.29916352], [0.69917902], [0.56199216], [0.39802597], [0.27381426], [0.67013948], [0.60775236], [0.52661271], [0.54372318], [0.45107076], [0.26992761], [0.50726309], [0.45573492], [0.48459001], [0.78016463], [0.81194719], [0.3119255 ], [0.38187033], [0.30740999], [0.59339372], [0.46743215], [0.01538345]])
In [117]:
# compute the MSE manually
error = y_test - y_pred # actual value minus predicted value
(error ** 2 ).mean()
Out[117]:
0.002582801007627578
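The 0.00258 MSE is in scaled 0-1 units, which is hard to interpret directly. A rough back-of-the-envelope translation (an approximation, using the Car Purchase Amount min 9000 and max 80000 from describe() above) takes the square root and multiplies by the target's range:

```python
import math

mse_scaled = 0.0025828           # test-set MSE from model.evaluate above
rmse_scaled = math.sqrt(mse_scaled)

# Rough error in the original units, using the target range from describe()
rmse_amount = rmse_scaled * (80000 - 9000)
```

This puts the typical prediction error at roughly 3600 in the original currency units.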
Plot the actual and predicted values.
In [118]:
plt.plot(y_test)
plt.plot(y_pred)
plt.show()
Calculate the MSE.
In [ ]:
# already done above
In [119]:
df.head(1)
Out[119]:
| | Customer Name | Customer e-mail | Country | Gender | Age | Annual Salary | Credit Card Debt | Net Worth | Car Purchase Amount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Martina Avila | cubilia.Curae.Phasellus@quisaccumsanconvallis.edu | Bulgaria | 0 | 41.85172 | 62812.09301 | 11609.38091 | 238961.2505 | 35321.45877 |
In [127]:
new_data = np.array([0, 38, 90000, 2000, 500000])
In [128]:
new_data
Out[128]:
array([ 0, 38, 90000, 2000, 500000])
In [129]:
new_data.shape
Out[129]:
(5,)
In [131]:
new_data = new_data.reshape(1,5)
In [132]:
new_data = scaler_X.transform(new_data)
In [133]:
y_pred2 = model.predict(new_data)
1/1 [==============================] - 0s 23ms/step
In [134]:
y_pred2
Out[134]:
array([[0.6526625]], dtype=float32)
In [ ]:
scaler_y # inverse transformation is possible because the scaler remembers the min and max
In [135]:
scaler_y.inverse_transform(y_pred2)
Out[135]:
array([[55339.04]], dtype=float32)
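inverse_transform is just the min-max formula run backwards: scaled * (max - min) + min. Using the Car Purchase Amount min (9000) and max (80000) from describe() above, the scaled prediction can be checked by hand:

```python
# inverse_transform run by hand: scaled * (max - min) + min
y_min, y_max = 9000.0, 80000.0   # Car Purchase Amount range from df.describe() above
pred_scaled = 0.6526625          # the model's scaled prediction above

pred_amount = pred_scaled * (y_max - y_min) + y_min
print(pred_amount)  # ≈ 55339.04, matching scaler_y.inverse_transform above
```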
There is new customer data. Predict how much each person can spend on a car.
The first customer is female, age 38, with an annual salary of 90000, card debt of 2000, and net worth of 500000.
The second customer is male, age 27, with an annual salary of 30000, card debt of 10000, and net worth of 300000.
In [154]:
new_data2 = np.array([0, 38, 90000, 2000, 500000, 1, 27, 30000, 10000, 300000])
In [155]:
new_data2 = new_data2.reshape(2,5)
In [156]:
new_data2
Out[156]:
array([[ 0, 38, 90000, 2000, 500000], [ 1, 27, 30000, 10000, 300000]])
In [157]:
new_data2 = scaler_X.transform(new_data2)
In [158]:
y_pred3 = model.predict(new_data2)
1/1 [==============================] - 0s 32ms/step
In [159]:
y_pred3
Out[159]:
array([[ 0.6526626 ], [-0.01829305]], dtype=float32)
In [160]:
scaler_y.inverse_transform(y_pred3)
Out[160]:
array([[55339.043 ], [ 7701.1934]], dtype=float32)
In [ ]:
# Hand these to the service developer: scaler_X and scaler_y in pkl format, the model in h5 format (Keras' own save feature)
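The hand-off above can be sketched with the standard pickle module for the scalers and Keras' built-in saver for the model. A minimal sketch (the file name and the stand-in object below are hypothetical; in the notebook you would dump the actual scaler_X / scaler_y and call model.save):

```python
import os
import pickle
import tempfile

# Stand-in for a fitted scaler object (hypothetical values)
scaler_stub = {'data_min_': 9000.0, 'data_max_': 80000.0}

# Dump to a .pkl file and load it back, as the service side would
path = os.path.join(tempfile.gettempdir(), 'scaler_y.pkl')
with open(path, 'wb') as f:
    pickle.dump(scaler_stub, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

# The model itself would use Keras' own format, e.g. model.save('car_model.h5')
```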
In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))