Avocado Price Forecasting (Facebook Prophet)¶
STEP #0: Dataset¶
- The data is U.S. avocado retail data: weekly observations from January 2015 through March 2018.
- It contains avocado sales volumes and prices.
Column descriptions:
- Date - The date of the observation
- AveragePrice - the average price of a single avocado
- type - conventional or organic
- year - the year
- region - the city or region of the observation
- Total Volume - Total number of avocados sold
- 4046 - Total number of avocados with PLU 4046 sold (PLU is the price look-up code used to identify produce)
- 4225 - Total number of avocados with PLU 4225 sold
- 4770 - Total number of avocados with PLU 4770 sold
STEP #1: Data Preparation¶
The Prophet library¶
Install: pip install prophet
If that fails: conda install -c conda-forge prophet
References: https://research.fb.com/prophet-forecasting-at-scale/
https://facebook.github.io/prophet/docs/quick_start.html#python-api
In [ ]:
# The Prophet package was renamed from fbprophet to prophet.
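Because the package name changed, an older environment may still only have fbprophet installed. A minimal sketch of a tolerant import follows; `PROPHET_SOURCE` is a hypothetical variable name used only for illustration, and the rest of this post simply assumes the `prophet` package is available.

```python
# Try the current package name first, then fall back to the legacy one.
try:
    from prophet import Prophet
    PROPHET_SOURCE = "prophet"
except ImportError:
    try:
        from fbprophet import Prophet  # legacy package name
        PROPHET_SOURCE = "fbprophet"
    except ImportError:
        Prophet = None  # neither package is installed
        PROPHET_SOURCE = None

print("Prophet imported from:", PROPHET_SOURCE)
```

If this prints `None`, install the package with one of the commands above before continuing.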
In [4]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import seaborn as sns
from prophet import Prophet
In [5]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [6]:
# Set the working directory
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks/ml_plus/data')
In [9]:
# Read avocado.csv
df = pd.read_csv('avocado.csv', index_col = 0) # index_col=0: use the first column as the index
In [ ]:
# One record per date => time series data
In [10]:
df
Out[10]:
| | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2015-12-27 | 1.33 | 64236.62 | 1036.74 | 54454.85 | 48.16 | 8696.87 | 8603.62 | 93.25 | 0.0 | conventional | 2015 | Albany |
1 | 2015-12-20 | 1.35 | 54876.98 | 674.28 | 44638.81 | 58.33 | 9505.56 | 9408.07 | 97.49 | 0.0 | conventional | 2015 | Albany |
2 | 2015-12-13 | 0.93 | 118220.22 | 794.70 | 109149.67 | 130.50 | 8145.35 | 8042.21 | 103.14 | 0.0 | conventional | 2015 | Albany |
3 | 2015-12-06 | 1.08 | 78992.15 | 1132.00 | 71976.41 | 72.58 | 5811.16 | 5677.40 | 133.76 | 0.0 | conventional | 2015 | Albany |
4 | 2015-11-29 | 1.28 | 51039.60 | 941.48 | 43838.39 | 75.78 | 6183.95 | 5986.26 | 197.69 | 0.0 | conventional | 2015 | Albany |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7 | 2018-02-04 | 1.63 | 17074.83 | 2046.96 | 1529.20 | 0.00 | 13498.67 | 13066.82 | 431.85 | 0.0 | organic | 2018 | WestTexNewMexico |
8 | 2018-01-28 | 1.71 | 13888.04 | 1191.70 | 3431.50 | 0.00 | 9264.84 | 8940.04 | 324.80 | 0.0 | organic | 2018 | WestTexNewMexico |
9 | 2018-01-21 | 1.87 | 13766.76 | 1191.92 | 2452.79 | 727.94 | 9394.11 | 9351.80 | 42.31 | 0.0 | organic | 2018 | WestTexNewMexico |
10 | 2018-01-14 | 1.93 | 16205.22 | 1527.63 | 2981.04 | 727.01 | 10969.54 | 10919.54 | 50.00 | 0.0 | organic | 2018 | WestTexNewMexico |
11 | 2018-01-07 | 1.62 | 17489.58 | 2894.77 | 2356.13 | 224.53 | 12014.15 | 11988.14 | 26.01 | 0.0 | organic | 2018 | WestTexNewMexico |
18249 rows × 13 columns
STEP #2: EDA (Exploratory Data Analysis)¶
In [11]:
df.describe()
Out[11]:
| | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | year |
---|---|---|---|---|---|---|---|---|---|---|
count | 18249.000000 | 1.824900e+04 | 1.824900e+04 | 1.824900e+04 | 1.824900e+04 | 1.824900e+04 | 1.824900e+04 | 1.824900e+04 | 18249.000000 | 18249.000000 |
mean | 1.405978 | 8.506440e+05 | 2.930084e+05 | 2.951546e+05 | 2.283974e+04 | 2.396392e+05 | 1.821947e+05 | 5.433809e+04 | 3106.426507 | 2016.147899 |
std | 0.402677 | 3.453545e+06 | 1.264989e+06 | 1.204120e+06 | 1.074641e+05 | 9.862424e+05 | 7.461785e+05 | 2.439660e+05 | 17692.894652 | 0.939938 |
min | 0.440000 | 8.456000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 2015.000000 |
25% | 1.100000 | 1.083858e+04 | 8.540700e+02 | 3.008780e+03 | 0.000000e+00 | 5.088640e+03 | 2.849420e+03 | 1.274700e+02 | 0.000000 | 2015.000000 |
50% | 1.370000 | 1.073768e+05 | 8.645300e+03 | 2.906102e+04 | 1.849900e+02 | 3.974383e+04 | 2.636282e+04 | 2.647710e+03 | 0.000000 | 2016.000000 |
75% | 1.660000 | 4.329623e+05 | 1.110202e+05 | 1.502069e+05 | 6.243420e+03 | 1.107834e+05 | 8.333767e+04 | 2.202925e+04 | 132.500000 | 2017.000000 |
max | 3.250000 | 6.250565e+07 | 2.274362e+07 | 2.047057e+07 | 2.546439e+06 | 1.937313e+07 | 1.338459e+07 | 5.719097e+06 | 551693.650000 | 2018.000000 |
In [12]:
df['year'].unique()
Out[12]:
array([2015, 2016, 2017, 2018])
In [13]:
df['region'].nunique()
Out[13]:
54
In [14]:
df['region'].unique()
Out[14]:
array(['Albany', 'Atlanta', 'BaltimoreWashington', 'Boise', 'Boston', 'BuffaloRochester', 'California', 'Charlotte', 'Chicago', 'CincinnatiDayton', 'Columbus', 'DallasFtWorth', 'Denver', 'Detroit', 'GrandRapids', 'GreatLakes', 'HarrisburgScranton', 'HartfordSpringfield', 'Houston', 'Indianapolis', 'Jacksonville', 'LasVegas', 'LosAngeles', 'Louisville', 'MiamiFtLauderdale', 'Midsouth', 'Nashville', 'NewOrleansMobile', 'NewYork', 'Northeast', 'NorthernNewEngland', 'Orlando', 'Philadelphia', 'PhoenixTucson', 'Pittsburgh', 'Plains', 'Portland', 'RaleighGreensboro', 'RichmondNorfolk', 'Roanoke', 'Sacramento', 'SanDiego', 'SanFrancisco', 'Seattle', 'SouthCarolina', 'SouthCentral', 'Southeast', 'Spokane', 'StLouis', 'Syracuse', 'Tampa', 'TotalUS', 'West', 'WestTexNewMexico'], dtype=object)
Remove the unnecessary first column.¶
In [ ]:
# Already handled above: index_col=0 in read_csv() loaded the first column as the index.
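If the file is read without `index_col=0`, the first column shows up as an ordinary column and has to be dropped by hand. The sketch below assumes that column has an empty header in the CSV (pandas then names it `Unnamed: 0`) and uses three rows copied from the table above in place of the real file.

```python
import io
import pandas as pd

# Three rows from the table above, laid out the way the raw CSV is assumed
# to store them: the first column has an empty header.
raw_csv = io.StringIO(
    ",Date,AveragePrice,region\n"
    "0,2015-12-27,1.33,Albany\n"
    "1,2015-12-20,1.35,Albany\n"
    "2,2015-12-13,0.93,Albany\n"
)

# Without index_col=0, pandas names the headerless column 'Unnamed: 0'.
df_raw = pd.read_csv(raw_csv)

# Drop it explicitly; the default 0..n-1 index stays unique.
df_clean = df_raw.drop(columns=["Unnamed: 0"])
print(df_clean.columns.tolist())
```

Dropping the column keeps a unique default index, whereas `index_col=0` reuses the per-region row counter, which is why the index labels repeat in the tables below.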
The dates are out of order. Sort the data by date.¶
In [18]:
df.sort_values('Date', inplace= True)
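The sort above works on strings, which is only safe because the dates are written as YYYY-MM-DD. A hedged alternative is to convert the column to real timestamps first and rebuild the index after sorting. The sample rows are copied from the tables in this post; the variable names are illustrative only.

```python
import pandas as pd

# Rows taken from the tables in this post, deliberately out of order.
sample = pd.DataFrame(
    {
        "Date": ["2018-03-25", "2015-01-04", "2015-12-27"],
        "AveragePrice": [1.36, 1.75, 1.33],
        "region": ["Chicago", "Southeast", "Albany"],
    },
    index=[0, 51, 0],  # duplicated labels, as in the original index
)

# Parse the strings into timestamps so the ordering is chronological by type,
# not only because the text happens to sort correctly.
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

# Sort without inplace=True and rebuild a clean 0..n-1 index.
sample_sorted = sample.sort_values("Date").reset_index(drop=True)
print(sample_sorted)
```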
In [19]:
df
Out[19]:
| | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
51 | 2015-01-04 | 1.75 | 27365.89 | 9307.34 | 3844.81 | 615.28 | 13598.46 | 13061.10 | 537.36 | 0.00 | organic | 2015 | Southeast |
51 | 2015-01-04 | 1.49 | 17723.17 | 1189.35 | 15628.27 | 0.00 | 905.55 | 905.55 | 0.00 | 0.00 | organic | 2015 | Chicago |
51 | 2015-01-04 | 1.68 | 2896.72 | 161.68 | 206.96 | 0.00 | 2528.08 | 2528.08 | 0.00 | 0.00 | organic | 2015 | HarrisburgScranton |
51 | 2015-01-04 | 1.52 | 54956.80 | 3013.04 | 35456.88 | 1561.70 | 14925.18 | 11264.80 | 3660.38 | 0.00 | conventional | 2015 | Pittsburgh |
51 | 2015-01-04 | 1.64 | 1505.12 | 1.27 | 1129.50 | 0.00 | 374.35 | 186.67 | 187.68 | 0.00 | organic | 2015 | Boise |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
0 | 2018-03-25 | 1.36 | 908202.13 | 142681.06 | 463136.28 | 174975.75 | 127409.04 | 103579.41 | 22467.04 | 1362.59 | conventional | 2018 | Chicago |
0 | 2018-03-25 | 0.70 | 9010588.32 | 3999735.71 | 966589.50 | 30130.82 | 4014132.29 | 3398569.92 | 546409.74 | 69152.63 | conventional | 2018 | SouthCentral |
0 | 2018-03-25 | 1.42 | 163496.70 | 29253.30 | 5080.04 | 0.00 | 129163.36 | 109052.26 | 20111.10 | 0.00 | organic | 2018 | SouthCentral |
0 | 2018-03-25 | 1.70 | 190257.38 | 29644.09 | 70982.10 | 0.00 | 89631.19 | 89424.11 | 207.08 | 0.00 | organic | 2018 | California |
0 | 2018-03-25 | 1.34 | 1774776.77 | 63905.98 | 908653.71 | 843.45 | 801373.63 | 774634.09 | 23833.93 | 2905.61 | conventional | 2018 | NewYork |
18249 rows × 13 columns
Take a quick look at how the price changes over time (use a plot).¶
In [24]:
df_date = df.groupby('Date')['AveragePrice'].mean()
In [25]:
df_date
Out[25]:
Date
2015-01-04    1.301296
2015-01-11    1.370648
2015-01-18    1.391111
2015-01-25    1.397130
2015-02-01    1.247037
                ...
2018-02-25    1.359630
2018-03-04    1.350185
2018-03-11    1.335093
2018-03-18    1.313704
2018-03-25    1.346852
Name: AveragePrice, Length: 169, dtype: float64
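To make the aggregation concrete, here is the same `groupby(...).mean()` on a handful of rows copied from the sorted table. This is only a sketch: it covers five of the rows for 2015-01-04, so its mean differs from the full-data value shown above.

```python
import pandas as pd

# The five 2015-01-04 rows shown in the sorted table, plus one later row.
sample = pd.DataFrame(
    {
        "Date": ["2015-01-04"] * 5 + ["2018-03-25"],
        "AveragePrice": [1.75, 1.49, 1.68, 1.52, 1.64, 1.36],
    }
)

# One value per date: the mean over every row recorded for that date.
per_date = sample.groupby("Date")["AveragePrice"].mean()
print(per_date.round(3))
```

Each row counts equally in this mean, whatever its region or type, so the resulting series is an unweighted average across all rows for a date.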
In [27]:
df_date.plot()
plt.show()
In [ ]:
# The series shows a periodic (seasonal) pattern.
Visualize the number of rows per 'region'.¶
In [28]:
df
Out[28]:
| | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
51 | 2015-01-04 | 1.75 | 27365.89 | 9307.34 | 3844.81 | 615.28 | 13598.46 | 13061.10 | 537.36 | 0.00 | organic | 2015 | Southeast |
51 | 2015-01-04 | 1.49 | 17723.17 | 1189.35 | 15628.27 | 0.00 | 905.55 | 905.55 | 0.00 | 0.00 | organic | 2015 | Chicago |
51 | 2015-01-04 | 1.68 | 2896.72 | 161.68 | 206.96 | 0.00 | 2528.08 | 2528.08 | 0.00 | 0.00 | organic | 2015 | HarrisburgScranton |
51 | 2015-01-04 | 1.52 | 54956.80 | 3013.04 | 35456.88 | 1561.70 | 14925.18 | 11264.80 | 3660.38 | 0.00 | conventional | 2015 | Pittsburgh |
51 | 2015-01-04 | 1.64 | 1505.12 | 1.27 | 1129.50 | 0.00 | 374.35 | 186.67 | 187.68 | 0.00 | organic | 2015 | Boise |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
0 | 2018-03-25 | 1.36 | 908202.13 | 142681.06 | 463136.28 | 174975.75 | 127409.04 | 103579.41 | 22467.04 | 1362.59 | conventional | 2018 | Chicago |
0 | 2018-03-25 | 0.70 | 9010588.32 | 3999735.71 | 966589.50 | 30130.82 | 4014132.29 | 3398569.92 | 546409.74 | 69152.63 | conventional | 2018 | SouthCentral |
0 | 2018-03-25 | 1.42 | 163496.70 | 29253.30 | 5080.04 | 0.00 | 129163.36 | 109052.26 | 20111.10 | 0.00 | organic | 2018 | SouthCentral |
0 | 2018-03-25 | 1.70 | 190257.38 | 29644.09 | 70982.10 | 0.00 | 89631.19 | 89424.11 | 207.08 | 0.00 | organic | 2018 | California |
0 | 2018-03-25 | 1.34 | 1774776.77 | 63905.98 | 908653.71 | 843.45 | 801373.63 | 774634.09 | 23833.93 | 2905.61 | conventional | 2018 | NewYork |
18249 rows × 13 columns
In [29]:
df['region'].value_counts()
Out[29]:
Southeast    338
NewOrleansMobile    338
SanDiego    338
BaltimoreWashington    338
Roanoke    338
RichmondNorfolk    338
Northeast    338
SouthCentral    338
GreatLakes    338
Louisville    338
Seattle    338
CincinnatiDayton    338
NewYork    338
Indianapolis    338
Chicago    338
Jacksonville    338
Columbus    338
Detroit    338
Philadelphia    338
PhoenixTucson    338
Nashville    338
Portland    338
HartfordSpringfield    338
Tampa    338
Orlando    338
West    338
Denver    338
GrandRapids    338
NorthernNewEngland    338
BuffaloRochester    338
HarrisburgScranton    338
Pittsburgh    338
Boise    338
LosAngeles    338
LasVegas    338
Atlanta    338
DallasFtWorth    338
MiamiFtLauderdale    338
Plains    338
StLouis    338
Syracuse    338
Midsouth    338
Sacramento    338
Boston    338
Charlotte    338
Spokane    338
Albany    338
Houston    338
SouthCarolina    338
SanFrancisco    338
TotalUS    338
RaleighGreensboro    338
California    338
WestTexNewMexico    335
Name: region, dtype: int64
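Almost every region has 338 rows, which matches 169 weekly dates times 2 types (conventional, organic). A small sketch for spotting regions that fall short of that count, using a few values copied from the output above; `expected` and `short` are illustrative names.

```python
import pandas as pd

# A few counts copied from the value_counts() output above.
counts = pd.Series(
    {"Southeast": 338, "Albany": 338, "TotalUS": 338, "WestTexNewMexico": 335},
    name="region",
)

# 169 weekly dates x 2 types (conventional, organic) = 338 expected rows.
expected = 169 * 2

# Regions below the expected count are missing some observations.
short = counts[counts < expected]
print(short)
```

On the full data, the same filter can be applied directly to `df['region'].value_counts()`.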
In [30]:
import seaborn as sb # same library as the 'sns' import above; the cells below use the 'sb' alias
In [36]:
plt.figure(figsize= (6,11) )
sb.countplot(data=df, y = 'region')
plt.show()
Check the number of rows per year ('year').¶
In [52]:
plt.figure(figsize= (10,5))
sb.countplot(data = df , y = 'year')
plt.show()
In [53]:
df.tail()
Out[53]:
| | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2018-03-25 | 1.36 | 908202.13 | 142681.06 | 463136.28 | 174975.75 | 127409.04 | 103579.41 | 22467.04 | 1362.59 | conventional | 2018 | Chicago |
0 | 2018-03-25 | 0.70 | 9010588.32 | 3999735.71 | 966589.50 | 30130.82 | 4014132.29 | 3398569.92 | 546409.74 | 69152.63 | conventional | 2018 | SouthCentral |
0 | 2018-03-25 | 1.42 | 163496.70 | 29253.30 | 5080.04 | 0.00 | 129163.36 | 109052.26 | 20111.10 | 0.00 | organic | 2018 | SouthCentral |
0 | 2018-03-25 | 1.70 | 190257.38 | 29644.09 | 70982.10 | 0.00 | 89631.19 | 89424.11 | 207.08 | 0.00 | organic | 2018 | California |
0 | 2018-03-25 | 1.34 | 1774776.77 | 63905.98 | 908653.71 | 843.45 | 801373.63 | 774634.09 | 23833.93 | 2905.61 | conventional | 2018 | NewYork |
For the Prophet analysis, select only two columns ('Date', 'AveragePrice').¶
In [42]:
# Prophet requires two things: a date column and the value to forecast for each date.
avocado_prophet_df = df[ [ 'Date', 'AveragePrice' ] ]
In [43]:
avocado_prophet_df
Out[43]:
| | Date | AveragePrice |
---|---|---|
51 | 2015-01-04 | 1.75 |
51 | 2015-01-04 | 1.49 |
51 | 2015-01-04 | 1.68 |
51 | 2015-01-04 | 1.52 |
51 | 2015-01-04 | 1.64 |
... | ... | ... |
0 | 2018-03-25 | 1.36 |
0 | 2018-03-25 | 0.70 |
0 | 2018-03-25 | 1.42 |
0 | 2018-03-25 | 1.70 |
0 | 2018-03-25 | 1.34 |
18249 rows × 2 columns
STEP #3: Forecasting with Prophet¶
Rename the columns to ds and y.¶
In [45]:
# Prophet expects these exact column names.
avocado_prophet_df.columns = ['ds', 'y']
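Assigning to `.columns` depends on the column order. A sketch of the same step with `rename()`, which maps names explicitly and returns a new frame, so the source data is left untouched. Two rows copied from the table above stand in for the full data frame.

```python
import pandas as pd

# Two rows from the table above stand in for the full data frame.
df_sample = pd.DataFrame(
    {
        "Date": ["2015-01-04", "2018-03-25"],
        "AveragePrice": [1.75, 1.36],
        "region": ["Southeast", "Chicago"],
    }
)

# Select the two columns and rename them by name rather than by position.
# rename() returns a new frame; df_sample keeps its original column names.
prophet_df = df_sample[["Date", "AveragePrice"]].rename(
    columns={"Date": "ds", "AveragePrice": "y"}
)
print(prophet_df.columns.tolist())
```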
In [46]:
avocado_prophet_df
Out[46]:
| | ds | y |
---|---|---|
51 | 2015-01-04 | 1.75 |
51 | 2015-01-04 | 1.49 |
51 | 2015-01-04 | 1.68 |
51 | 2015-01-04 | 1.52 |
51 | 2015-01-04 | 1.64 |
... | ... | ... |
0 | 2018-03-25 | 1.36 |
0 | 2018-03-25 | 0.70 |
0 | 2018-03-25 | 1.42 |
0 | 2018-03-25 | 1.70 |
0 | 2018-03-25 | 1.34 |
18249 rows × 2 columns
Run the Prophet forecast.¶
In [54]:
# 1. Create a Prophet model instance,
prophet = Prophet()
In [55]:
# 2. then fit it to the data.
prophet.fit(avocado_prophet_df)
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[55]:
<prophet.forecaster.Prophet at 0x7f5c6478ee50>
In [57]:
# Forecast 365 days ahead.
# 3. Choose the forecast horizon and build a data frame of dates with no values yet.
future = prophet.make_future_dataframe(periods=365, freq='D') # periods=365, freq='D' -> 365 days; periods=365, freq='W' -> 365 weeks
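To see what `periods` and `freq` mean for this data, the date column can be imitated with plain pandas. This is only an illustration of the resulting dates under the numbers used in this post (169 weekly observations, 365 daily steps); it is not how Prophet builds the frame internally.

```python
import pandas as pd

# The 169 weekly dates observed in the data (2015-01-04 .. 2018-03-25).
history = pd.date_range("2015-01-04", "2018-03-25", freq="7D")

# periods=365 with freq='D' appends 365 daily dates after the last observation.
future_days = pd.date_range(
    history[-1] + pd.Timedelta(days=1), periods=365, freq="D"
)

# About one year at the data's own weekly spacing would be 52 steps instead.
future_weeks = pd.date_range(
    history[-1] + pd.Timedelta(weeks=1), periods=52, freq="7D"
)

all_dates = history.append(future_days)
print(len(history), len(all_dates), all_dates[-1].date())
```

The counts agree with the `future` frame in this post: 169 observed dates plus 365 new ones give 534 rows ending on 2019-03-25. Note that the history is weekly while the added dates are daily.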
In [58]:
future
Out[58]:
| | ds |
---|---|
0 | 2015-01-04 |
1 | 2015-01-11 |
2 | 2015-01-18 |
3 | 2015-01-25 |
4 | 2015-02-01 |
... | ... |
529 | 2019-03-21 |
530 | 2019-03-22 |
531 | 2019-03-23 |
532 | 2019-03-24 |
533 | 2019-03-25 |
534 rows × 1 columns
In [59]:
# 4. Now that the future dates exist,
# use the future data frame above to make predictions.
forecast = prophet.predict(future)
In [60]:
forecast # yhat == predicted value
Out[60]:
| | ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2015-01-04 | 1.498608 | 0.879976 | 1.874224 | 1.498608 | 1.498608 | -0.113270 | -0.113270 | -0.113270 | -0.113270 | -0.113270 | -0.113270 | 0.0 | 0.0 | 0.0 | 1.385337 |
1 | 2015-01-11 | 1.493471 | 0.906704 | 1.877692 | 1.493471 | 1.493471 | -0.104849 | -0.104849 | -0.104849 | -0.104849 | -0.104849 | -0.104849 | 0.0 | 0.0 | 0.0 | 1.388621 |
2 | 2015-01-18 | 1.488334 | 0.889276 | 1.831219 | 1.488334 | 1.488334 | -0.104524 | -0.104524 | -0.104524 | -0.104524 | -0.104524 | -0.104524 | 0.0 | 0.0 | 0.0 | 1.383810 |
3 | 2015-01-25 | 1.483198 | 0.842019 | 1.809866 | 1.483198 | 1.483198 | -0.123469 | -0.123469 | -0.123469 | -0.123469 | -0.123469 | -0.123469 | 0.0 | 0.0 | 0.0 | 1.359729 |
4 | 2015-02-01 | 1.478061 | 0.821152 | 1.803670 | 1.478061 | 1.478061 | -0.151828 | -0.151828 | -0.151828 | -0.151828 | -0.151828 | -0.151828 | 0.0 | 0.0 | 0.0 | 1.326232 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
529 | 2019-03-21 | 1.161728 | 0.542878 | 1.613888 | 0.980910 | 1.336850 | -0.086221 | -0.086221 | -0.086221 | -0.086221 | -0.086221 | -0.086221 | 0.0 | 0.0 | 0.0 | 1.075508 |
530 | 2019-03-22 | 1.160997 | 0.562707 | 1.580983 | 0.979392 | 1.336800 | -0.084549 | -0.084549 | -0.084549 | -0.084549 | -0.084549 | -0.084549 | 0.0 | 0.0 | 0.0 | 1.076448 |
531 | 2019-03-23 | 1.160266 | 0.517173 | 1.591213 | 0.977661 | 1.336861 | -0.082604 | -0.082604 | -0.082604 | -0.082604 | -0.082604 | -0.082604 | 0.0 | 0.0 | 0.0 | 1.077662 |
532 | 2019-03-24 | 1.159535 | 0.540519 | 1.601641 | 0.975953 | 1.337079 | -0.080406 | -0.080406 | -0.080406 | -0.080406 | -0.080406 | -0.080406 | 0.0 | 0.0 | 0.0 | 1.079129 |
533 | 2019-03-25 | 1.158804 | 0.576940 | 1.572218 | 0.974279 | 1.337226 | -0.077982 | -0.077982 | -0.077982 | -0.077982 | -0.077982 | -0.077982 | 0.0 | 0.0 | 0.0 | 1.080822 |
534 rows × 16 columns
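The forecast frame holds fitted values for the observed dates as well as predictions for the new ones. A sketch for keeping only the future rows and the columns that matter most (`yhat` with its lower and upper bounds), using four rows copied from the table above; `last_observed` and `interval_width` are illustrative names.

```python
import pandas as pd

# Four rows and four columns copied from the forecast table above.
forecast_sample = pd.DataFrame(
    {
        "ds": pd.to_datetime(
            ["2015-01-04", "2015-01-11", "2019-03-24", "2019-03-25"]
        ),
        "yhat_lower": [0.879976, 0.906704, 0.540519, 0.576940],
        "yhat": [1.385337, 1.388621, 1.079129, 1.080822],
        "yhat_upper": [1.874224, 1.877692, 1.601641, 1.572218],
    }
)

last_observed = pd.Timestamp("2018-03-25")  # last date in the training data

# Keep only the rows that lie beyond the training period.
future_only = forecast_sample[forecast_sample["ds"] > last_observed].copy()

# Width of the uncertainty interval around each prediction.
future_only["interval_width"] = (
    future_only["yhat_upper"] - future_only["yhat_lower"]
)
print(future_only)
```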
In [ ]:
# Check the results with charts.
In [63]:
prophet.plot(forecast) # first way to draw a chart
plt.savefig('chart1.jpg') # save as an image; this also stops the chart from being displayed twice
In [65]:
prophet.plot_components(forecast) # second way to draw a chart
plt.savefig('chart2.jpg')
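`prophet.plot()` returns a matplotlib Figure, so another option is to keep that object in a variable and call its `savefig()` method. The sketch below shows the pattern without Prophet: it draws three forecast rows from the table above by hand, and `fig` plays the role of the returned figure. The in-memory buffer stands in for a file name such as 'chart1.jpg'.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the forecast table above.
rows = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"]),
        "yhat_lower": [0.879976, 0.906704, 0.889276],
        "yhat": [1.385337, 1.388621, 1.383810],
        "yhat_upper": [1.874224, 1.877692, 1.831219],
    }
)

# Draw the prediction line and shade the uncertainty interval around it.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(rows["ds"], rows["yhat"], label="yhat")
ax.fill_between(
    rows["ds"], rows["yhat_lower"], rows["yhat_upper"],
    alpha=0.2, label="uncertainty interval",
)
ax.set_xlabel("ds")
ax.set_ylabel("y")
ax.legend()

# Save through the figure handle instead of the global plt state.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
print(buffer.getbuffer().nbytes > 0)
```

Saving through the handle makes it explicit which figure is written, which helps once several charts are created in one notebook.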
PART 2: Forecast avocado prices for the West region.¶
In [88]:
sorted (df['region'].unique())
Out[88]:
['Albany', 'Atlanta', 'BaltimoreWashington', 'Boise', 'Boston', 'BuffaloRochester', 'California', 'Charlotte', 'Chicago', 'CincinnatiDayton', 'Columbus', 'DallasFtWorth', 'Denver', 'Detroit', 'GrandRapids', 'GreatLakes', 'HarrisburgScranton', 'HartfordSpringfield', 'Houston', 'Indianapolis', 'Jacksonville', 'LasVegas', 'LosAngeles', 'Louisville', 'MiamiFtLauderdale', 'Midsouth', 'Nashville', 'NewOrleansMobile', 'NewYork', 'Northeast', 'NorthernNewEngland', 'Orlando', 'Philadelphia', 'PhoenixTucson', 'Pittsburgh', 'Plains', 'Portland', 'RaleighGreensboro', 'RichmondNorfolk', 'Roanoke', 'Sacramento', 'SanDiego', 'SanFrancisco', 'Seattle', 'SouthCarolina', 'SouthCentral', 'Southeast', 'Spokane', 'StLouis', 'Syracuse', 'Tampa', 'TotalUS', 'West', 'WestTexNewMexico']
In [74]:
df_west = df.loc[ df['region'] == 'West', ]
In [77]:
df_west = df_west[ [ 'Date', 'AveragePrice']] # not strictly required; renaming the columns alone is enough for the forecast to work
In [78]:
df_west
Out[78]:
| | Date | AveragePrice |
---|---|---|
51 | 2015-01-04 | 1.40 |
51 | 2015-01-04 | 0.89 |
50 | 2015-01-11 | 1.39 |
50 | 2015-01-11 | 0.95 |
49 | 2015-01-18 | 0.96 |
... | ... | ... |
2 | 2018-03-11 | 1.00 |
1 | 2018-03-18 | 1.73 |
1 | 2018-03-18 | 0.99 |
0 | 2018-03-25 | 0.93 |
0 | 2018-03-25 | 1.60 |
338 rows × 2 columns
In [79]:
df_west.columns = ['ds','y'] # rename() would also work
In [80]:
df_west
Out[80]:
| | ds | y |
---|---|---|
51 | 2015-01-04 | 1.40 |
51 | 2015-01-04 | 0.89 |
50 | 2015-01-11 | 1.39 |
50 | 2015-01-11 | 0.95 |
49 | 2015-01-18 | 0.96 |
... | ... | ... |
2 | 2018-03-11 | 1.00 |
1 | 2018-03-18 | 1.73 |
1 | 2018-03-18 | 0.99 |
0 | 2018-03-25 | 0.93 |
0 | 2018-03-25 | 1.60 |
338 rows × 2 columns
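The West subset has 338 rows for 169 dates, i.e. two prices per date, presumably one per type (conventional and organic). A sketch of a helper that filters one region, optionally one type, and returns the ds / y frame in a single step. `to_prophet_frame` and its parameters are hypothetical names, and the sample uses SouthCentral, Chicago, and California rows because their types are visible in the tables of this post.

```python
import pandas as pd

# Rows for 2018-03-25 copied from the sorted table earlier in the post.
sample = pd.DataFrame(
    {
        "Date": ["2018-03-25"] * 4,
        "AveragePrice": [1.36, 0.70, 1.42, 1.70],
        "type": ["conventional", "conventional", "organic", "organic"],
        "region": ["Chicago", "SouthCentral", "SouthCentral", "California"],
    }
)


def to_prophet_frame(df, region, avocado_type=None):
    """Hypothetical helper: filter one region (optionally one type) and
    return a frame with the ds / y columns that Prophet expects."""
    mask = df["region"] == region
    if avocado_type is not None:
        mask = mask & (df["type"] == avocado_type)
    return (
        df.loc[mask, ["Date", "AveragePrice"]]
        .rename(columns={"Date": "ds", "AveragePrice": "y"})
        .reset_index(drop=True)
    )


both_types = to_prophet_frame(sample, "SouthCentral")
organic_only = to_prophet_frame(sample, "SouthCentral", "organic")
print(len(both_types), len(organic_only))
```

Whether to fit on both types together, as this post does, or on one type at a time is a modeling choice; the helper only makes either option easy to try.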
In [81]:
prophet2 = Prophet()
In [82]:
prophet2.fit( df_west )
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[82]:
<prophet.forecaster.Prophet at 0x7f5c6486bf10>
In [89]:
future2 = prophet2.make_future_dataframe(periods=365, freq='D')
In [90]:
future2
Out[90]:
| | ds |
---|---|
0 | 2015-01-04 |
1 | 2015-01-11 |
2 | 2015-01-18 |
3 | 2015-01-25 |
4 | 2015-02-01 |
... | ... |
529 | 2019-03-21 |
530 | 2019-03-22 |
531 | 2019-03-23 |
532 | 2019-03-24 |
533 | 2019-03-25 |
534 rows × 1 columns
In [91]:
forecast2 = prophet2.predict(future2)
In [92]:
forecast2
Out[92]:
| | ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2015-01-04 | 1.287090 | 0.639895 | 1.520475 | 1.287090 | 1.287090 | -0.188138 | -0.188138 | -0.188138 | -0.188138 | -0.188138 | -0.188138 | 0.0 | 0.0 | 0.0 | 1.098952 |
1 | 2015-01-11 | 1.284842 | 0.701152 | 1.559720 | 1.284842 | 1.284842 | -0.172230 | -0.172230 | -0.172230 | -0.172230 | -0.172230 | -0.172230 | 0.0 | 0.0 | 0.0 | 1.112611 |
2 | 2015-01-18 | 1.282593 | 0.663041 | 1.538315 | 1.282593 | 1.282593 | -0.163686 | -0.163686 | -0.163686 | -0.163686 | -0.163686 | -0.163686 | 0.0 | 0.0 | 0.0 | 1.118907 |
3 | 2015-01-25 | 1.280345 | 0.706518 | 1.547059 | 1.280345 | 1.280345 | -0.175002 | -0.175002 | -0.175002 | -0.175002 | -0.175002 | -0.175002 | 0.0 | 0.0 | 0.0 | 1.105342 |
4 | 2015-02-01 | 1.278096 | 0.669138 | 1.481156 | 1.278096 | 1.278096 | -0.196058 | -0.196058 | -0.196058 | -0.196058 | -0.196058 | -0.196058 | 0.0 | 0.0 | 0.0 | 1.082038 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
529 | 2019-03-21 | 1.725246 | 1.091646 | 1.927508 | 1.685382 | 1.759265 | -0.207836 | -0.207836 | -0.207836 | -0.207836 | -0.207836 | -0.207836 | 0.0 | 0.0 | 0.0 | 1.517409 |
530 | 2019-03-22 | 1.725805 | 1.092608 | 1.951162 | 1.685880 | 1.759976 | -0.202810 | -0.202810 | -0.202810 | -0.202810 | -0.202810 | -0.202810 | 0.0 | 0.0 | 0.0 | 1.522995 |
531 | 2019-03-23 | 1.726364 | 1.069243 | 1.962666 | 1.686323 | 1.760687 | -0.197214 | -0.197214 | -0.197214 | -0.197214 | -0.197214 | -0.197214 | 0.0 | 0.0 | 0.0 | 1.529150 |
532 | 2019-03-24 | 1.726924 | 1.108538 | 1.944338 | 1.686720 | 1.761397 | -0.191153 | -0.191153 | -0.191153 | -0.191153 | -0.191153 | -0.191153 | 0.0 | 0.0 | 0.0 | 1.535771 |
533 | 2019-03-25 | 1.727483 | 1.106256 | 1.981002 | 1.687123 | 1.762108 | -0.184741 | -0.184741 | -0.184741 | -0.184741 | -0.184741 | -0.184741 | 0.0 | 0.0 | 0.0 | 1.542743 |
534 rows × 16 columns
In [96]:
prophet2.plot(forecast2)
plt.savefig('chart3.png')
In [97]:
prophet2.plot_components(forecast2) # second way to draw a chart
plt.savefig('chart4.png')