Movie Recommendation System¶
PROBLEM STATEMENT¶
Recommender systems suggest items such as movies or songs, mainly based on a user's interests or usage history.
In this notebook, we implement a recommender system using item-based collaborative filtering.
Dataset MovieLens: https://grouplens.org/datasets/movielens/100k/
STEP #0: LIBRARIES IMPORT¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks/ml_plus/data')
STEP #1: IMPORT DATASET¶
Read the Movie_Id_Titles.csv file.¶
movie_titles_df = pd.read_csv('Movie_Id_Titles.csv')
movie_titles_df
item_id | title | |
---|---|---|
0 | 1 | Toy Story (1995) |
1 | 2 | GoldenEye (1995) |
2 | 3 | Four Rooms (1995) |
3 | 4 | Get Shorty (1995) |
4 | 5 | Copycat (1995) |
... | ... | ... |
1677 | 1678 | Mat' i syn (1997) |
1678 | 1679 | B. Monkey (1998) |
1679 | 1680 | Sliding Doors (1998) |
1680 | 1681 | You So Crazy (1994) |
1681 | 1682 | Scream of Stone (Schrei aus Stein) (1991) |
1682 rows × 2 columns
Open the 'u.data' file in Google Drive.¶
You will see that it is tab-separated and has no header row with column names.
So read it into a DataFrame while naming the columns 'user_id', 'item_id', 'rating', 'timestamp'.
movies_rating_df = pd.read_csv('u.data', sep='\t', names= [ 'user_id', 'item_id', 'rating', 'timestamp' ])
movies_rating_df
user_id | item_id | rating | timestamp | |
---|---|---|---|---|
0 | 0 | 50 | 5 | 881250949 |
1 | 0 | 172 | 5 | 881250949 |
2 | 0 | 133 | 1 | 881250949 |
3 | 196 | 242 | 3 | 881250949 |
4 | 186 | 302 | 3 | 891717742 |
... | ... | ... | ... | ... |
99998 | 880 | 476 | 3 | 880175444 |
99999 | 716 | 204 | 5 | 879795543 |
100000 | 276 | 1090 | 1 | 874795795 |
100001 | 13 | 225 | 2 | 882399156 |
100002 | 12 | 203 | 3 | 879959583 |
100003 rows × 4 columns
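The effect of `names=` on a headerless file can be checked on a tiny in-memory sample. This is only a sketch: the three rows below are sample values laid out like u.data, and `io.StringIO` stands in for the real file.

```python
import io
import pandas as pd

# Three sample rows in the same layout as u.data: tab-separated, no header line.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n"

cols = ['user_id', 'item_id', 'rating', 'timestamp']

# names= supplies the column labels, so the first data row is not consumed as a header.
with_names = pd.read_csv(io.StringIO(sample), sep='\t', names=cols)

# Without names=, pandas treats the first row as the header and one rating is lost.
without_names = pd.read_csv(io.StringIO(sample), sep='\t')

print(with_names.shape)
print(without_names.shape)
```

Comparing the two shapes shows why the column names have to be passed explicitly for this file.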
The timestamp column is not needed, so remove it from movies_rating_df entirely.¶
movies_rating_df.drop('timestamp', axis=1,inplace=True)
movies_rating_df
user_id | item_id | rating | |
---|---|---|---|
0 | 0 | 50 | 5 |
1 | 0 | 172 | 5 |
2 | 0 | 133 | 1 |
3 | 196 | 242 | 3 |
4 | 186 | 302 | 3 |
... | ... | ... | ... |
99998 | 880 | 476 | 3 |
99999 | 716 | 204 | 5 |
100000 | 276 | 1090 | 1 |
100001 | 13 | 225 | 2 |
100002 | 12 | 203 | 3 |
100003 rows × 3 columns
movie_titles_df.head(2)
item_id | title | |
---|---|---|
0 | 1 | Toy Story (1995) |
1 | 2 | GoldenEye (1995) |
movies_rating_df.head(2)
user_id | item_id | rating | |
---|---|---|---|
0 | 0 | 50 | 5 |
1 | 0 | 172 | 5 |
# how='left' keeps every row of the titles DataFrame, so movies without any rating are not dropped.
movies_rating_df = pd.merge(movie_titles_df, movies_rating_df, on='item_id', how='left')
movies_rating_df
item_id | title | user_id | rating | |
---|---|---|---|---|
0 | 1 | Toy Story (1995) | 308 | 4 |
1 | 1 | Toy Story (1995) | 287 | 5 |
2 | 1 | Toy Story (1995) | 148 | 4 |
3 | 1 | Toy Story (1995) | 280 | 4 |
4 | 1 | Toy Story (1995) | 66 | 3 |
... | ... | ... | ... | ... |
99998 | 1678 | Mat' i syn (1997) | 863 | 1 |
99999 | 1679 | B. Monkey (1998) | 863 | 3 |
100000 | 1680 | Sliding Doors (1998) | 863 | 2 |
100001 | 1681 | You So Crazy (1994) | 896 | 3 |
100002 | 1682 | Scream of Stone (Schrei aus Stein) (1991) | 916 | 3 |
100003 rows × 4 columns
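The difference between `how='left'` and an inner merge is easiest to see on toy data. The sketch below uses made-up tables (`titles`, `ratings`) in which one movie has no ratings; the names and values are illustrative only.

```python
import pandas as pd

# Toy data: movie 3 exists in the title table but nobody has rated it.
titles = pd.DataFrame({'item_id': [1, 2, 3],
                       'title': ['Movie A', 'Movie B', 'Movie C']})
ratings = pd.DataFrame({'user_id': [10, 11, 10],
                        'item_id': [1, 1, 2],
                        'rating': [4, 5, 3]})

left = pd.merge(titles, ratings, on='item_id', how='left')
inner = pd.merge(titles, ratings, on='item_id', how='inner')

# The left merge keeps 'Movie C' with NaN in user_id and rating;
# the inner merge drops it.
print(left)
print(inner)
```

In the left merge the unrated movie survives as a row of NaN values, which later steps such as `count()` simply ignore.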
STEP #2: VISUALIZE DATASET¶
For each movie title, show the basic rating statistics (max, min, median, standard deviation, and the 25th and 75th percentiles).¶
movie_titles_df.shape
(1682, 2)
movies_rating_df.groupby('item_id')['rating'].describe()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
item_id | ||||||||
1 | 452.0 | 3.878319 | 0.927897 | 1.0 | 3.0 | 4.0 | 5.0 | 5.0 |
2 | 131.0 | 3.206107 | 0.966497 | 1.0 | 3.0 | 3.0 | 4.0 | 5.0 |
3 | 90.0 | 3.033333 | 1.212760 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
4 | 209.0 | 3.550239 | 0.965069 | 1.0 | 3.0 | 4.0 | 4.0 | 5.0 |
5 | 86.0 | 3.302326 | 0.946446 | 1.0 | 3.0 | 3.0 | 4.0 | 5.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
1678 | 1.0 | 1.000000 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
1679 | 1.0 | 3.000000 | NaN | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
1680 | 1.0 | 2.000000 | NaN | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
1681 | 1.0 | 3.000000 | NaN | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
1682 | 1.0 | 3.000000 | NaN | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
1682 rows × 8 columns
# Grouping by item_id puts the numeric id in the index.
# When analyzing data in pandas, it is better to make the index
# something a person can read, so group by title instead.
movies_rating_df.groupby('title')['rating'].describe()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
title | ||||||||
'Til There Was You (1997) | 9.0 | 2.333333 | 1.000000 | 1.0 | 2.00 | 2.0 | 3.0 | 4.0 |
1-900 (1994) | 5.0 | 2.600000 | 1.516575 | 1.0 | 1.00 | 3.0 | 4.0 | 4.0 |
101 Dalmatians (1996) | 109.0 | 2.908257 | 1.076184 | 1.0 | 2.00 | 3.0 | 4.0 | 5.0 |
12 Angry Men (1957) | 125.0 | 4.344000 | 0.719588 | 2.0 | 4.00 | 4.0 | 5.0 | 5.0 |
187 (1997) | 41.0 | 3.024390 | 1.172344 | 1.0 | 2.00 | 3.0 | 4.0 | 5.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
Young Guns II (1990) | 44.0 | 2.772727 | 1.008421 | 1.0 | 2.00 | 3.0 | 3.0 | 5.0 |
Young Poisoner's Handbook, The (1995) | 41.0 | 3.341463 | 1.237129 | 1.0 | 3.00 | 4.0 | 4.0 | 5.0 |
Zeus and Roxanne (1997) | 6.0 | 2.166667 | 0.983192 | 1.0 | 1.25 | 2.5 | 3.0 | 3.0 |
unknown | 9.0 | 3.444444 | 1.130388 | 1.0 | 3.00 | 4.0 | 4.0 | 5.0 |
Á köldum klaka (Cold Fever) (1994) | 1.0 | 3.000000 | NaN | 3.0 | 3.00 | 3.0 | 3.0 | 3.0 |
1664 rows × 8 columns
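Grouping by title yields 1664 rows while grouping by item_id yields 1682, which suggests that some titles appear under more than one item_id and that their ratings are pooled when grouping by title. A minimal sketch of how such repeated titles could be located, using a made-up title table:

```python
import pandas as pd

# Toy title table in which two different item_id values share one title.
titles = pd.DataFrame({'item_id': [1, 2, 3, 4],
                       'title': ['Movie A', 'Movie B', 'Movie B', 'Movie C']})

n_ids = titles['item_id'].nunique()
n_titles = titles['title'].nunique()

# keep=False marks every row that belongs to a repeated title.
dupes = titles[titles['title'].duplicated(keep=False)]

print(n_ids, n_titles)
print(dupes)
```

Running the same check on `movie_titles_df` would show which titles account for the gap between the two row counts.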
# The result of describe() is itself a pandas DataFrame, so it can be sorted like this.
movies_rating_df.groupby('title')['rating'].describe().sort_values('mean', ascending=False)
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
title | ||||||||
They Made Me a Criminal (1939) | 1.0 | 5.0 | NaN | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
Marlene Dietrich: Shadow and Light (1996) | 1.0 | 5.0 | NaN | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
Saint of Fort Washington, The (1993) | 2.0 | 5.0 | 0.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
Someone Else's America (1995) | 1.0 | 5.0 | NaN | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
Star Kid (1997) | 3.0 | 5.0 | 0.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
Eye of Vichy, The (Oeil de Vichy, L') (1993) | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
King of New York (1990) | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Touki Bouki (Journey of the Hyena) (1973) | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Bloody Child, The (1996) | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Crude Oasis, The (1995) | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
1664 rows × 8 columns
Compute the mean rating for each movie and store it in ratings_df_mean.¶
ratings_df_mean = movies_rating_df.groupby('title')['rating'].mean()
ratings_df_mean
title
'Til There Was You (1997)                2.333333
1-900 (1994)                             2.600000
101 Dalmatians (1996)                    2.908257
12 Angry Men (1957)                      4.344000
187 (1997)                               3.024390
                                           ...
Young Guns II (1990)                     2.772727
Young Poisoner's Handbook, The (1995)    3.341463
Zeus and Roxanne (1997)                  2.166667
unknown                                  3.444444
Á köldum klaka (Cold Fever) (1994)       3.000000
Name: rating, Length: 1664, dtype: float64
Count how many ratings each movie has and store it in ratings_df_count.¶
ratings_df_count = movies_rating_df.groupby('title')['rating'].count()
ratings_df_count
title
'Til There Was You (1997)                  9
1-900 (1994)                               5
101 Dalmatians (1996)                    109
12 Angry Men (1957)                      125
187 (1997)                                41
                                        ...
Young Guns II (1990)                      44
Young Poisoner's Handbook, The (1995)     41
Zeus and Roxanne (1997)                    6
unknown                                    9
Á köldum klaka (Cold Fever) (1994)         1
Name: rating, Length: 1664, dtype: int64
Combine the two results into one DataFrame.¶
df1 = ratings_df_mean.to_frame()
df1.columns = [ 'mean' ]
df1
mean | |
---|---|
title | |
'Til There Was You (1997) | 2.333333 |
1-900 (1994) | 2.600000 |
101 Dalmatians (1996) | 2.908257 |
12 Angry Men (1957) | 4.344000 |
187 (1997) | 3.024390 |
... | ... |
Young Guns II (1990) | 2.772727 |
Young Poisoner's Handbook, The (1995) | 3.341463 |
Zeus and Roxanne (1997) | 2.166667 |
unknown | 3.444444 |
Á köldum klaka (Cold Fever) (1994) | 3.000000 |
1664 rows × 1 columns
df2 = ratings_df_count.to_frame()
df2.columns = ['count']
ratings_mean_count_df = df1.join(df2)
ratings_mean_count_df
mean | count | |
---|---|---|
title | ||
'Til There Was You (1997) | 2.333333 | 9 |
1-900 (1994) | 2.600000 | 5 |
101 Dalmatians (1996) | 2.908257 | 109 |
12 Angry Men (1957) | 4.344000 | 125 |
187 (1997) | 3.024390 | 41 |
... | ... | ... |
Young Guns II (1990) | 2.772727 | 44 |
Young Poisoner's Handbook, The (1995) | 3.341463 | 41 |
Zeus and Roxanne (1997) | 2.166667 | 6 |
unknown | 3.444444 | 9 |
Á köldum klaka (Cold Fever) (1994) | 3.000000 | 1 |
1664 rows × 2 columns
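As an alternative to building the two Series separately and joining them, `agg` can produce both columns in one call. The sketch below compares the two routes on made-up ratings; it is a different technique from the cells above, shown only for comparison.

```python
import pandas as pd

# Toy ratings in long format.
toy = pd.DataFrame({'title': ['Movie A', 'Movie A', 'Movie B', 'Movie B', 'Movie B'],
                    'rating': [4, 5, 3, 3, 2]})

# Two-step version, mirroring the cells above.
mean_s = toy.groupby('title')['rating'].mean()
count_s = toy.groupby('title')['rating'].count()
two_step = mean_s.to_frame('mean').join(count_s.to_frame('count'))

# One-step alternative: agg builds both columns at once.
one_step = toy.groupby('title')['rating'].agg(['mean', 'count'])

print(two_step)
print(one_step)
```

Both routes give a title-indexed table with `mean` and `count` columns, so the later cells would work with either.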
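If you check the column names, the joined columns would both be called rating. Set the column names to mean and count.¶
The cells above already do this by assigning df1.columns and df2.columns before the join.
Plot a histogram of mean.¶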
ratings_mean_count_df
mean | count | |
---|---|---|
title | ||
'Til There Was You (1997) | 2.333333 | 9 |
1-900 (1994) | 2.600000 | 5 |
101 Dalmatians (1996) | 2.908257 | 109 |
12 Angry Men (1957) | 4.344000 | 125 |
187 (1997) | 3.024390 | 41 |
... | ... | ... |
Young Guns II (1990) | 2.772727 | 44 |
Young Poisoner's Handbook, The (1995) | 3.341463 | 41 |
Zeus and Roxanne (1997) | 2.166667 | 6 |
unknown | 3.444444 | 9 |
Á köldum klaka (Cold Fever) (1994) | 3.000000 | 1 |
1664 rows × 2 columns
ratings_mean_count_df['mean'].hist()
plt.show()
Plot a histogram of count.¶
ratings_mean_count_df['count'].hist(bins=50)
plt.show()
(ratings_mean_count_df['count'] >= 500).sum()
4
ratings_mean_count_df.loc[ ratings_mean_count_df['count'] >= 500, ]
mean | count | |
---|---|---|
title | ||
Contact (1997) | 3.803536 | 509 |
Fargo (1996) | 4.155512 | 508 |
Return of the Jedi (1983) | 4.007890 | 507 |
Star Wars (1977) | 4.359589 | 584 |
Check which movies have a mean rating of 5.¶
ratings_mean_count_df['mean'] == 5
title
'Til There Was You (1997)                False
1-900 (1994)                             False
101 Dalmatians (1996)                    False
12 Angry Men (1957)                      False
187 (1997)                               False
                                         ...
Young Guns II (1990)                     False
Young Poisoner's Handbook, The (1995)    False
Zeus and Roxanne (1997)                  False
unknown                                  False
Á köldum klaka (Cold Fever) (1994)       False
Name: mean, Length: 1664, dtype: bool
ratings_mean_count_df.loc[ ratings_mean_count_df['mean'] == 5, ]
mean | count | |
---|---|---|
title | ||
Aiqing wansui (1994) | 5.0 | 1 |
Entertaining Angels: The Dorothy Day Story (1996) | 5.0 | 1 |
Great Day in Harlem, A (1994) | 5.0 | 1 |
Marlene Dietrich: Shadow and Light (1996) | 5.0 | 1 |
Prefontaine (1997) | 5.0 | 3 |
Saint of Fort Washington, The (1993) | 5.0 | 2 |
Santa with Muscles (1996) | 5.0 | 2 |
Someone Else's America (1995) | 5.0 | 1 |
Star Kid (1997) | 5.0 | 3 |
They Made Me a Criminal (1939) | 5.0 | 1 |
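Every movie with a mean of 5.0 in the table above has at most three ratings, so the mean alone is a weak signal. One possible way to account for this is to require a minimum number of ratings before ranking by mean. The table and the threshold below are made up for illustration.

```python
import pandas as pd

# Toy summary table shaped like ratings_mean_count_df.
summary = pd.DataFrame(
    {'mean': [5.0, 5.0, 4.4, 3.1],
     'count': [1, 3, 584, 200]},
    index=pd.Index(['Movie A', 'Movie B', 'Movie C', 'Movie D'], name='title'))

min_count = 100  # hypothetical threshold, chosen only for illustration

# Keep movies with enough ratings, then rank the survivors by mean.
popular = summary.loc[summary['count'] >= min_count]
top = popular.sort_values('mean', ascending=False)

print(top)
```

The two movies with a perfect mean drop out because they do not reach the threshold.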
Sort by count in descending order and show only the top 100.¶
ratings_mean_count_df.sort_values('count', ascending = False).head(100)
mean | count | |
---|---|---|
title | ||
Star Wars (1977) | 4.359589 | 584 |
Contact (1997) | 3.803536 | 509 |
Fargo (1996) | 4.155512 | 508 |
Return of the Jedi (1983) | 4.007890 | 507 |
Liar Liar (1997) | 3.156701 | 485 |
... | ... | ... |
Aladdin (1992) | 3.812785 | 219 |
Babe (1995) | 3.995434 | 219 |
Volcano (1997) | 2.808219 | 219 |
To Kill a Mockingbird (1962) | 4.292237 | 219 |
Murder at 1600 (1997) | 3.087156 | 218 |
100 rows × 2 columns
STEP #3: PERFORM ITEM-BASED COLLABORATIVE FILTERING FOR A SINGLE MOVIE¶
# ITEM-BASED COLLABORATIVE FILTERING == finding similar movies == finding movies whose ratings are positively correlated
# Build the pivot table below from movies_rating_df.
Pivot the table into the collaborative-filtering format¶
movies_rating_df
item_id | title | user_id | rating | |
---|---|---|---|---|
0 | 1 | Toy Story (1995) | 308 | 4 |
1 | 1 | Toy Story (1995) | 287 | 5 |
2 | 1 | Toy Story (1995) | 148 | 4 |
3 | 1 | Toy Story (1995) | 280 | 4 |
4 | 1 | Toy Story (1995) | 66 | 3 |
... | ... | ... | ... | ... |
99998 | 1678 | Mat' i syn (1997) | 863 | 1 |
99999 | 1679 | B. Monkey (1998) | 863 | 3 |
100000 | 1680 | Sliding Doors (1998) | 863 | 2 |
100001 | 1681 | You So Crazy (1994) | 896 | 3 |
100002 | 1682 | Scream of Stone (Schrei aus Stein) (1991) | 916 | 3 |
100003 rows × 4 columns
df = movies_rating_df.pivot_table(index= 'user_id', columns= 'title', values='rating', aggfunc= 'mean')
df
title | 'Til There Was You (1997) | 1-900 (1994) | 101 Dalmatians (1996) | 12 Angry Men (1957) | 187 (1997) | 2 Days in the Valley (1996) | 20,000 Leagues Under the Sea (1954) | 2001: A Space Odyssey (1968) | 3 Ninjas: High Noon At Mega Mountain (1998) | 39 Steps, The (1935) | ... | Yankee Zulu (1994) | Year of the Horse (1997) | You So Crazy (1994) | Young Frankenstein (1974) | Young Guns (1988) | Young Guns II (1990) | Young Poisoner's Handbook, The (1995) | Zeus and Roxanne (1997) | unknown | Á köldum klaka (Cold Fever) (1994) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
user_id | |||||||||||||||||||||
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | NaN | NaN | 2.0 | 5.0 | NaN | NaN | 3.0 | 4.0 | NaN | NaN | ... | NaN | NaN | NaN | 5.0 | 3.0 | NaN | NaN | NaN | 4.0 | NaN |
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
939 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
940 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
941 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
942 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | NaN | 3.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
943 | NaN | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 4.0 | 3.0 | NaN | NaN | NaN | NaN |
944 rows × 1664 columns
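The reshaping done by `pivot_table` is easier to follow on a handful of rows. The sketch below uses made-up ratings in the same long format (`user_id`, `title`, `rating`) and the same arguments as the cell above.

```python
import pandas as pd

# Toy ratings in long format: one row per (user, movie) rating.
long_df = pd.DataFrame({'user_id': [1, 1, 2, 2, 3],
                        'title': ['Movie A', 'Movie B', 'Movie A', 'Movie C', 'Movie B'],
                        'rating': [5, 3, 4, 2, 1]})

# One row per user, one column per title; missing ratings become NaN.
wide = long_df.pivot_table(index='user_id', columns='title',
                           values='rating', aggfunc='mean')

print(wide)
```

Each NaN marks a movie the user has not rated, which is why the real 944 × 1664 table is mostly NaN.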
Correlate Titanic with every other movie, then recommend the movies with the highest correlation coefficients to people who watched Titanic. Use the corrwith function.¶
df['Titanic (1997)'] # corrwith compares one column against every column of the DataFrame
user_id
0      NaN
1      NaN
2      5.0
3      NaN
4      NaN
      ...
939    NaN
940    5.0
941    NaN
942    3.0
943    NaN
Name: Titanic (1997), Length: 944, dtype: float64
# Compute the correlation between the Titanic column and every column of df
corr_titanic = df.corrwith( df['Titanic (1997)'] )
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:2683: RuntimeWarning: Degrees of freedom <= 0 for slice
  c = cov(x, y, rowvar, dtype=dtype)
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:2542: RuntimeWarning: divide by zero encountered in true_divide
  c *= np.true_divide(1, fact)
corr_titanic
title
'Til There Was You (1997)               -0.062017
1-900 (1994)                                  NaN
101 Dalmatians (1996)                    0.120113
12 Angry Men (1957)                      0.077700
187 (1997)                               0.315654
                                           ...
Young Guns II (1990)                     0.317274
Young Poisoner's Handbook, The (1995)    0.356783
Zeus and Roxanne (1997)                       NaN
unknown                                       NaN
Á köldum klaka (Cold Fever) (1994)            NaN
Length: 1664, dtype: float64
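What `corrwith` computes, and where the NaN values come from, can be sketched on a toy user × movie matrix. The column names and ratings below are made up. The manual check assumes the default Pearson method, computed only over users who rated both movies.

```python
import warnings
import numpy as np
import pandas as pd

# Toy user x movie matrix; NaN means the user did not rate the movie.
wide = pd.DataFrame({
    'Target':  [5.0, 4.0, 1.0, 2.0, np.nan],
    'Similar': [4.0, 5.0, 2.0, 1.0, 3.0],
    'Sparse':  [np.nan, np.nan, np.nan, 3.0, 4.0],
}, index=pd.Index([1, 2, 3, 4, 5], name='user_id'))

# 'Sparse' shares only one user with 'Target', so its correlation is undefined.
# The RuntimeWarning numpy emits for that column is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter('ignore', RuntimeWarning)
    corr = wide.corrwith(wide['Target'])

# Manual Pearson correlation on the users who rated both movies.
both = wide[['Target', 'Similar']].dropna()
manual = np.corrcoef(both['Target'], both['Similar'])[0, 1]

print(corr)
print(manual)
```

A column with fewer than two co-rating users cannot produce a correlation, which would explain both the warnings and the NaN entries in the real output.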
corr_titanic = corr_titanic.to_frame()
corr_titanic
0 | |
---|---|
title | |
'Til There Was You (1997) | -0.062017 |
1-900 (1994) | NaN |
101 Dalmatians (1996) | 0.120113 |
12 Angry Men (1957) | 0.077700 |
187 (1997) | 0.315654 |
... | ... |
Young Guns II (1990) | 0.317274 |
Young Poisoner's Handbook, The (1995) | 0.356783 |
Zeus and Roxanne (1997) | NaN |
unknown | NaN |
Á köldum klaka (Cold Fever) (1994) | NaN |
1664 rows × 1 columns
corr_titanic.columns = ['correlation']
corr_titanic = corr_titanic.join(ratings_mean_count_df['count']) # a Series can be joined onto a DataFrame like this
# A correlation based on only a few ratings is unreliable, so the rating count is joined as well.
corr_titanic
correlation | count | |
---|---|---|
title | ||
'Til There Was You (1997) | -0.062017 | 9 |
1-900 (1994) | NaN | 5 |
101 Dalmatians (1996) | 0.120113 | 109 |
12 Angry Men (1957) | 0.077700 | 125 |
187 (1997) | 0.315654 | 41 |
... | ... | ... |
Young Guns II (1990) | 0.317274 | 44 |
Young Poisoner's Handbook, The (1995) | 0.356783 | 41 |
Zeus and Roxanne (1997) | NaN | 6 |
unknown | NaN | 9 |
Á köldum klaka (Cold Fever) (1994) | NaN | 1 |
1664 rows × 2 columns
# NaN means no correlation could be computed for that movie, so drop those rows
corr_titanic.dropna(inplace=True)
corr_titanic
correlation | count | |
---|---|---|
title | ||
'Til There Was You (1997) | -0.062017 | 9 |
101 Dalmatians (1996) | 0.120113 | 109 |
12 Angry Men (1957) | 0.077700 | 125 |
187 (1997) | 0.315654 | 41 |
2 Days in the Valley (1996) | 0.017295 | 93 |
... | ... | ... |
Year of the Horse (1997) | 1.000000 | 7 |
Young Frankenstein (1974) | 0.107666 | 200 |
Young Guns (1988) | 0.199931 | 101 |
Young Guns II (1990) | 0.317274 | 44 |
Young Poisoner's Handbook, The (1995) | 0.356783 | 41 |
1356 rows × 2 columns
# Sort by correlation, highest first
corr_titanic.sort_values('correlation', ascending=False)
correlation | count | |
---|---|---|
title | ||
Nadja (1994) | 1.0 | 8 |
Pest, The (1997) | 1.0 | 8 |
Savage Nights (Nuits fauves, Les) (1992) | 1.0 | 3 |
For Ever Mozart (1996) | 1.0 | 3 |
Jerky Boys, The (1994) | 1.0 | 3 |
... | ... | ... |
Pather Panchali (1955) | -1.0 | 8 |
Angel Baby (1995) | -1.0 | 4 |
Blood Beach (1981) | -1.0 | 6 |
Two Bits (1995) | -1.0 | 5 |
Faces (1968) | -1.0 | 4 |
1356 rows × 2 columns
# Keep only movies with at least 80 ratings (for more reliable recommendations)
corr_titanic.loc[ corr_titanic['count'] >= 80 , ].sort_values('correlation',ascending=False).head(7)
correlation | count | |
---|---|---|
title | ||
Titanic (1997) | 1.000000 | 350 |
River Wild, The (1994) | 0.497600 | 146 |
Abyss, The (1989) | 0.472103 | 151 |
Bram Stoker's Dracula (1992) | 0.443560 | 120 |
True Lies (1994) | 0.435104 | 208 |
William Shakespeare's Romeo and Juliet (1996) | 0.430243 | 106 |
Last of the Mohicans, The (1992) | 0.427239 | 128 |
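The steps above (correlate, attach counts, drop NaN, filter by count, sort) can be collected into one helper. This is a sketch, not part of the original notebook: `similar_movies` and its parameters are hypothetical names, the data is made up, and unlike the cell above it removes the query movie from its own result.

```python
import numpy as np
import pandas as pd

def similar_movies(wide, counts, title, min_count=80, top_n=5):
    """Rank movies by rating correlation with `title` (hypothetical helper)."""
    corr = wide.corrwith(wide[title]).to_frame('correlation')
    corr = corr.join(counts.rename('count')).dropna()
    corr = corr[corr['count'] >= min_count]
    # Drop the movie itself, which always correlates 1.0 with itself.
    corr = corr.drop(index=title, errors='ignore')
    return corr.sort_values('correlation', ascending=False).head(top_n)

# Toy user x movie matrix; 'Sparse' correlates perfectly but has only two ratings.
wide = pd.DataFrame({
    'Target':   [5.0, 4.0, 1.0, 2.0, np.nan],
    'Similar':  [4.0, 5.0, 2.0, 1.0, 3.0],
    'Opposite': [1.0, 2.0, 5.0, 4.0, 3.0],
    'Sparse':   [np.nan, np.nan, 2.0, 3.0, np.nan],
}, index=pd.Index([1, 2, 3, 4, 5], name='user_id'))

counts = wide.count()  # number of ratings per movie

result = similar_movies(wide, counts, 'Target', min_count=3, top_n=5)
print(result)
```

With the real data, the call would presumably look like `similar_movies(df, ratings_mean_count_df['count'], 'Titanic (1997)')`. The count filter is what removes titles such as the toy `Sparse` column, whose perfect correlation rests on only two ratings.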
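Exercise. We will recommend movies to people who watched Star Wars. Find five movie titles to recommend.¶
Hint: first search for the exact title of Star Wars. Then take the ratings from users who watched Star Wars and run the same correlation analysis as above.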
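The first step of the hint, finding the exact title, could be done with a substring search on the column index. The sketch below uses a small made-up index in place of `df.columns`.

```python
import pandas as pd

# Toy column index standing in for df.columns.
titles = pd.Index(['Star Wars (1977)', 'Star Kid (1997)', 'Fargo (1996)',
                   'Lone Star (1996)'], name='title')

# case=False makes the match case-insensitive; regex=False treats the
# pattern as a plain substring.
mask = titles.str.contains('star wars', case=False, regex=False)
matches = titles[mask]

print(list(matches))
```

A case-insensitive search avoids having to guess the capitalization used in the dataset.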
# str.contains can also be used for this.
# 's' matches almost every title; a longer substring such as 'Star' narrows the list.
for i in df.columns :
    if 's' in i :
        print(i)
'Til There Was You (1997) 101 Dalmatians (1996) 2 Days in the Valley (1996) 20,000 Leagues Under the Sea (1954) 2001: A Space Odyssey (1968) 3 Ninjas: High Noon At Mega Mountain (1998) 39 Steps, The (1935) 8 Heads in a Duffel Bag (1997) 8 Seconds (1994) Absolute Power (1997) Abyss, The (1989) Ace Ventura: When Nature Calls (1995) Across the Sea of Time (1995) Addams Family Values (1993) Adventures of Pinocchio, The (1996) Adventures of Priscilla, Queen of the Desert, The (1994) Adventures of Robin Hood, The (1938) Aiqing wansui (1994) Airheads (1994) Aladdin and the King of Thieves (1996) Alaska (1996) Alien: Resurrection (1997) Aliens (1986) All Dogs Go to Heaven 2 (1996) All Things Fair (1996) Amadeus (1984) American President, The (1995) American Strays (1996) American in Paris, An (1951) Amistad (1997) Amityville 1992: It's About Time (1992) Amityville Curse, The (1990) Amityville II: The Possession (1982) Amityville: Dollhouse (1996) Amos & Andrew (1993) Anastasia (1997) Angels and Insects (1995) Angels in the Outfield (1994) Angus (1995) Antonia's Line (1995) Apocalypse Now (1979) Apostle, The (1997) April Fool's Day (1986) Aristocats, The (1970) Army of Darkness (1993) Around the World in 80 Days (1956) Arsenic and Old Lace (1944) As Good As It Gets (1997) Assassins (1995) Assignment, The (1997) Associate, The (1996) Audrey Rose (1977) August (1996) Austin Powers: International Man of Mystery (1997) Ayn Rand: A Sense of Life (1997) Baby-Sitters Club, The (1995) Babysitter, The (1995) Bad Boys (1995) Bad Girls (1994) Bad Taste (1987) Ballad of Narayama, The (Narayama Bushiko) (1958) Bananas (1971) Basic Instinct (1992) Basketball Diaries, The (1995) Basquiat (1996) Bastard Out of Carolina (1996) Batman Returns (1992) Beans of Egypt, Maine, The (1994) Beautician and the Beast, The (1997) Beautiful Girls (1996) Beauty and the Beast (1991) Beavis and Butt-head Do America (1996) Bed of Roses (1996) Bedknobs and Broomsticks (1971) Before Sunrise (1995) Believers, 
The (1987) Best Men (1997) Best of the Best 3: No Turning Back (1995) Beverly Hillbillies, The (1993) Beverly Hills Cop III (1994) Beverly Hills Ninja (1997) Big Lebowski, The (1998) Billy Madison (1995) Birds, The (1963) Bliss (1997) Blood For Dracula (Andy Warhol's Dracula) (1974) Bloodsport 2 (1995) Blue Chips (1994) Blues Brothers 2000 (1998) Blues Brothers, The (1980) Bob Roberts (1992) Body Parts (1991) Body Snatchers (1993) Bogus (1996) Boogie Nights (1997) Boot, Das (1981) Boy's Life 2 (1997) Boys (1996) Boys Life (1995) Boys in Venice (1996) Boys of St. Vincent, The (1993) Boys on the Side (1995) Boys, Les (1997) Bram Stoker's Dracula (1992) Brassed Off (1996) Breakfast at Tiffany's (1961) Breaking the Waves (1996) Bride of Frankenstein (1935) Bridges of Madison County, The (1995) Broken English (1996) Brother Minister: The Assassination of Malcolm X (1994) Brother's Kiss, A (1997) Brothers McMullen, The (1995) Brothers in Trouble (1995) Browning Version, The (1994) Bullets Over Broadway (1994) Burnt Offerings (1976) Bushwhacked (1995) Butch Cassidy and the Sundance Kid (1969) Butterfly Kiss (1995) C'est arrivé près de chez vous (1992) Candyman: Farewell to the Flesh (1995) Captives (1994) Career Girls (1997) Carlito's Way (1993) Carmen Miranda: Bananas Is My Business (1994) Casablanca (1942) Casino (1995) Casper (1995) Castle Freak (1995) Cats Don't Dance (1997) Celestial Clockwork (1994) Celluloid Closet, The (1995) Chasers (1994) Chasing Amy (1997) Christmas Carol, A (1938) Chungking Express (1994) Ciao, Professore! 
(1993) Cinema Paradiso (1988) Circle of Friends (1995) City Slickers II: The Legend of Curly's Gold (1994) City of Angels (1998) City of Industry (1997) City of Lost Children, The (1995) Clear and Present Danger (1994) Clerks (1994) Clockers (1995) Close Shave, A (1995) Clueless (1995) Collectionneuse, La (1967) Commandments (1997) Coneheads (1993) Conspiracy Theory (1997) Contempt (Mépris, Le) (1963) Cook the Thief His Wife & Her Lover, The (1989) Cool Runnings (1993) Cops and Robbersons (1994) Cosi (1996) Crash (1996) Crimson Tide (1995) Cronos (1992) Crossfire (1947) Crossing Guard, The (1995) Crow: City of Angels, The (1996) Crows and Sparrows (1949) Crude Oasis, The (1995) Cutthroat Island (1995) D3: The Mighty Ducks (1996) Daens (1992) Damsel in Distress, A (1937) Dances with Wolves (1990) Dangerous Beauty (1998) Dangerous Ground (1997) Dangerous Minds (1995) Daniel Defoe's Robinson Crusoe (1996) Dante's Peak (1997) Days of Thunder (1990) Daytrippers, The (1996) Dazed and Confused (1993) Dead Poets Society (1989) Dead Presidents (1995) Death in Brunswick (1991) Deconstructing Harry (1997) Deep Rising (1998) Delicatessen (1991) Delta of Venus (1994) Denise Calls Up (1995) Desert Winds (1995) Designated Mourner, The (1997) Desperado (1995) Desperate Measures (1998) Destiny Turns on the Radio (1995) Devil in a Blue Dress (1995) Devil's Advocate, The (1997) Devil's Own, The (1997) Die xue shuang xiong (Killer, The) (1989) Disclosure (1994) Dolores Claiborne (1994) Donnie Brasco (1997) Doors, The (1991) Double Happiness (1994) Down Periscope (1996) Dream With the Fishes (1997) Drunks (1995) Dunston Checks In (1996) Duoluo tianshi (1995) E.T. the Extra-Terrestrial (1982) East of Eden (1955) Ed's Next Move (1996) Empire Strikes Back, The (1980) Endless Summer 2, The (1994) English Patient, The (1996) Englishman Who Went Up a Hill, But Came Down a Mountain, The (1995) Entertaining Angels: The Dorothy Day Story (1996) Eraser (1996) Escape from L.A. 
(1996) Escape from New York (1981) Escape to Witch Mountain (1975) Etz Hadomim Tafus (Under the Domin Tree) (1994) Eve's Bayou (1997) Even Cowgirls Get the Blues (1993) Everest (1998) Everyone Says I Love You (1996) Excess Baggage (1997) Executive Decision (1996) Extreme Measures (1996) Faces (1968) Fantasia (1940) Far From Home: The Adventures of Yellow Dog (1995) Farewell to Arms, A (1932) Farinelli: il castrato (1994) Farmer & Chase (1995) Fast, Cheap & Out of Control (1997) Faster Pussycat! Kill! Kill! (1965) Fatal Instinct (1993) Fathers' Day (1997) Faust (1994) Fausto (1993) Fearless (1993) Feast of July (1995) Feeling Minnesota (1996) Female Perversions (1996) Field of Dreams (1989) Fierce Creatures (1997) Fille seule, La (A Single Girl) (1995) Firestorm (1998) First Kid (1996) First Knight (1995) First Wives Club, The (1996) Fish Called Wanda, A (1988) Flesh and Bone (1993) Flintstones, The (1994) Flirting With Disaster (1996) Flower of My Secret, The (Flor de mi secreto, La) (1995) Fools Rush In (1997) For Whom the Bell Tolls (1943) Forbidden Christ, The (Cristo proibito, Il) (1950) Foreign Correspondent (1940) Forget Paris (1995) Forrest Gump (1994) Four Days in September (1997) Four Rooms (1995) Four Weddings and a Funeral (1994) Free Willy 3: The Rescue (1997) French Kiss (1995) French Twist (Gazon maudit) (1995) Fresh (1994) Fried Green Tomatoes (1991) Frighteners, The (1996) Frisk (1995) From Dusk Till Dawn (1996) Further Gesture, A (1996) Gaslight (1944) Get on the Bus (1996) Ghost (1990) Ghost and Mrs. 
Muir, The (1947) Ghost and the Darkness, The (1996) Ghost in the Shell (Kokaku kidotai) (1995) Ghosts of Mississippi (1996) Gilligan's Island: The Movie (1998) Girls Town (1996) Glass Shield, The (1994) Glengarry Glen Ross (1992) Go Fish (1994) Gold Diggers: The Secret of Bear Mountain (1995) Golden Earrings (1947) Gone Fishin' (1997) GoodFellas (1990) Grass Harp, The (1995) Grease (1978) Grease 2 (1982) Great Escape, The (1963) Great Expectations (1998) Grifters, The (1990) Grosse Fatigue (1994) Grosse Pointe Blank (1997) Guilty as Sin (1993) Hackers (1995) Halloween: The Curse of Michael Myers (1995) Hearts and Minds (1996) Heathers (1989) Heaven's Prisoners (1996) Heavenly Creatures (1994) Heavyweights (1994) Heidi Fleiss: Hollywood Madam (1995) Hellraiser: Bloodline (1996) Herbie Rides Again (1974) Hercules (1997) Here Comes Cookie (1935) His Girl Friday (1940) Home for the Holidays (1995) Homeward Bound II: Lost in San Francisco (1996) Hoop Dreams (1994) Horse Whisperer, The (1998) Horseman on the Roof, The (Hussard sur le toit, Le) (1995) Hostile Intentions (1994) Hot Shots! Part Deux (1993) House Arrest (1996) House Party 3 (1994) House of Yes, The (1997) House of the Spirits, The (1993) Houseguest (1994) Hudsucker Proxy, The (1994) Hurricane Streets (1998) Hush (1998) I Can't Sleep (J'ai pas sommeil) (1994) I Don't Want to Talk About It (De eso no se habla) (1993) I Know What You Did Last Summer (1997) I, Worst of All (Yo, la peor de todas) (1990) Ill Gotten Gains (1997) In the Mouth of Madness (1995) In the Realm of the Senses (Ai no corrida) (1976) Indiana Jones and the Last Crusade (1989) Innocents, The (1961) Inspector General, The (1949) Intimate Relations (1996) Inventing the Abbotts (1997) Invitation, The (Zaproszenie) (1986) Island of Dr. 
Moreau, The (1996) It Takes Two (1995) It's My Party (1995) It's a Wonderful Life (1946) Jackie Chan's First Strike (1996) James and the Giant Peach (1996) Jason's Lyric (1994) Jaws (1975) Jaws 2 (1978) Jaws 3-D (1983) Jefferson in Paris (1995) Jerky Boys, The (1994) Joe's Apartment (1996) Johnny 100 Pesos (1993) Johns (1996) Journey of August King, The (1995) Jupiter's Wife (1994) Jurassic Park (1993) Just Cause (1995) Kansas City (1996) Kaspar Hauser (1993) Keys to Tulsa (1997) Kid in King Arthur's Court, A (1995) Kids (1995) Kids in the Hall: Brain Candy (1996) Killing Fields, The (1984) Kiss Me, Guido (1997) Kiss of Death (1995) Kiss the Girls (1997) Kissed (1996) Koyaanisqatsi (1983) Lady of Burlesque (1943) Lashou shentan (1992) Lassie (1994) Last Action Hero (1993) Last Dance (1996) Last Klezmer: Leopold Kozlowski, His Life and Music, The (1995) Last Man Standing (1996) Last Summer in the Hamptons (1995) Last Supper, The (1995) Last Time I Committed Suicide, The (1997) Last Time I Saw Paris, The (1954) Last of the Mohicans, The (1992) Late Bloomers (1996) Lawnmower Man 2: Beyond Cyberspace (1996) Leaving Las Vegas (1995) Legends of the Fall (1994) Life Less Ordinary, A (1997) Line King: Al Hirschfeld, The (1996) Little Odessa (1994) Little Princess, A (1995) Little Princess, The (1939) Little Rascals, The (1994) Live Nude Girls (1995) Loch Ness (1995) Locusts, The (1997) Long Kiss Goodnight, The (1996) Lord of Illusions (1995) Losing Chase (1996) Losing Isaiah (1995) Lost Highway (1997) Lost Horizon (1937) Lost World: Jurassic Park, The (1997) Lost in Space (1998) Love & Human Remains (1993) Love Is All There Is (1996) Love Jones (1997) Love and Death on Long Island (1997) Love and Other Catastrophes (1996) Love! Valour! Compassion! 
(1997) Lover's Knot (1996) Ma vie en rose (My Life in Pink) (1997) Madness of King George, The (1994) Mallrats (1995) Maltese Falcon, The (1941) Man in the Iron Mask, The (1998) Man of the House (1995) Manhattan Murder Mystery (1993) Manon of the Spring (Manon des sources) (1986) Margaret's Museum (1995) Mars Attacks! (1996) Marvin's Room (1996) Mary Poppins (1964) Mary Shelley's Frankenstein (1994) Mask, The (1994) Mat' i syn (1997) Maximum Risk (1996) Maya Lin: A Strong Clear Vision (1994) McHale's Navy (1997) Meet Me in St. Louis (1944) Meet Wally Sparks (1997) Men With Guns (1997) Men of Means (1998) Mercury Rising (1998) Metisse (Café au Lait) (1993) Miami Rhapsody (1995) Michael Collins (1996) Microcosmos: Le peuple de l'herbe (1996) Midnight Dancers (Sibak) (1994) Mighty Morphin Power Rangers: The Movie (1995) Miller's Crossing (1990) Mirror Has Two Faces, The (1996) Mission: Impossible (1996) Misérables, Les (1995) Mixed Nuts (1994) Moll Flanders (1996) Money Talks (1997) Monty Python's Life of Brian (1979) Mostro, Il (1994) Mouse Hunt (1997) Mr. Holland's Opus (1995) Mr. Jones (1993) Mr. Smith Goes to Washington (1939) Mrs. Brown (Her Majesty, Mrs. Brown) (1997) Mrs. Dalloway (1997) Mrs. Doubtfire (1993) Mrs. Parker and the Vicious Circle (1994) Mrs. 
Winterbourne (1996) Mulholland Falls (1996) Muppet Treasure Island (1996) Murder in the First (1995) Muriel's Wedding (1994) Mute Witness (1994) My Best Friend's Wedding (1997) My Favorite Season (1993) My Fellow Americans (1996) My Life and Times With Antonin Artaud (En compagnie d'Antonin Artaud) (1993) My Life as a Dog (Mitt liv som hund) (1985) Mystery Science Theater 3000: The Movie (1996) Naked Gun 33 1/3: The Final Insult (1994) National Lampoon's Senior Trip (1995) Natural Born Killers (1994) Nelly & Monsieur Arnaud (1995) Nemesis 2: Nebula (1995) New Jersey Drive (1995) Newton Boys, The (1998) Night Falls on Manhattan (1997) Nightmare Before Christmas, The (1993) Nina Takes a Lover (1994) Nine Months (1995) No Escape (1994) Nobody Loves Me (Keiner liebt mich) (1994) Nobody's Fool (1994) North by Northwest (1959) Nosferatu (Nosferatu, eine Symphonie des Grauens) (1922) Nosferatu a Venezia (1986) Nothing Personal (1995) Nothing to Lose (1994) Notorious (1946) Nutty Professor, The (1996) Of Love and Shadows (1994) Old Lady Who Walked in the Sea, The (Vieille qui marchait dans la mer, La) (1991) Once Upon a Time in the West (1969) Once Were Warriors (1994) One Flew Over the Cuckoo's Nest (1975) Open Season (1996) Original Gangstas (1996) Oscar & Lucinda (1997) Other Voices, Other Rooms (1997) Pagemaster, The (1994) Paradise Lost: The Child Murders at Robin Hood Hills (1996) Paradise Road (1997) Paris Is Burning (1990) Paris Was a Woman (1995) Paris, France (1993) Paris, Texas (1984) Passion Fish (1992) Paths of Glory (1957) People vs. 
Larry Flynt, The (1996) Persuasion (1995) Pest, The (1997) Pete's Dragon (1977) Phantoms (1998) Pharaoh's Army (1995) Pocahontas (1995) Poetic Justice (1993) Poison Ivy II (1995) Pompatus of Love, The (1996) Postino, Il (1994) Postman, The (1997) Preacher's Wife, The (1996) Price Above Rubies, A (1998) Priest (1994) Primary Colors (1998) Princess Bride, The (1987) Princess Caraboo (1994) Prisoner of the Mountains (Kavkazsky Plennik) (1996) Private Parts (1997) Professional, The (1994) Promesse, La (1996) Promise, The (Versprechen, Das) (1994) Psycho (1960) Pushing Hands (1992) Pyromaniac's Love Story, A (1995) Quest, The (1996) Radioland Murders (1994) Raiders of the Lost Ark (1981) Raise the Red Lantern (1991) Raising Arizona (1987) Ransom (1996) Real Genius (1985) Reality Bites (1994) Rebel Without a Cause (1955) Reckless (1995) Red Rock West (1992) Remains of the Day, The (1993) Renaissance Man (1994) Rendezvous in Paris (Rendez-vous de Paris, Les) (1995) Replacement Killers, The (1998) Reservoir Dogs (1992) Restoration (1995) Rhyme & Reason (1997) Rich Man's Wife, The (1996) Rising Sun (1993) Robert A. Heinlein's The Puppet Masters (1994) Robin Hood: Men in Tights (1993) Robin Hood: Prince of Thieves (1991) Romeo Is Bleeding (1993) Romy and Michele's High School Reunion (1997) Roommates (1995) Roseanna's Grave (For Roseanna) (1997) Rosencrantz and Guildenstern Are Dead (1990) Rosewood (1997) Ruby in Paradise (1993) Ruling Class, The (1972) Safe Passage (1994) Saint of Fort Washington, The (1993) Salut cousin! 
(1996) Santa Clause, The (1994) Santa with Muscles (1996) Savage Nights (Nuits fauves, Les) (1992) Schindler's List (1993) Schizopolis (1996) Scream of Stone (Schrei aus Stein) (1991) Screamers (1995) Searching for Bobby Fischer (1993) Secret Adventures of Tom Thumb, The (1993) Secret of Roan Inish, The (1994) Secrets & Lies (1996) Sense and Sensibility (1995) Senseless (1998) Seven Years in Tibet (1997) Seventh Seal, The (Sjunde inseglet, Det) (1957) Sex, Lies, and Videotape (1989) Sexual Life of the Belgians, The (1994) Shadow Conspiracy (1997) Shadow of Angels (Schatten der Engel) (1976) Shadowlands (1993) Shadows (Cienie) (1988) Shawshank Redemption, The (1994) She's So Lovely (1997) She's the One (1996) Shooting Fish (1997) Short Cuts (1993) Showgirls (1995) Silence of the Lambs, The (1991) Silence of the Palace, The (Saimt el Qusur) (1994) Simple Twist of Fate, A (1994) Simple Wish, A (1997) Sirens (1994) Six Degrees of Separation (1993) Sleepers (1996) Sleepless in Seattle (1993) Sliding Doors (1998) Slingshot, The (1993) Small Faces (1995) Smile Like Yours, A (1997) Smilla's Sense of Snow (1997) Sneakers (1992) Snow White and the Seven Dwarfs (1937) Some Folks Call It a Sling Blade (1993) Some Mother's Son (1996) Someone Else's America (1995) Sophie's Choice (1982) Sound of Music, The (1965) Spanish Prisoner, The (1997) Specialist, The (1994) Species (1995) Speechless (1994) Speed 2: Cruise Control (1997) Spirits of the Dead (Tre passi nel delirio) (1968) Star Maker, The (Uomo delle stelle, L') (1995) Star Maps (1997) Star Trek VI: The Undiscovered Country (1991) Star Trek: First Contact (1996) Star Trek: Generations (1994) Star Wars (1977) Stars Fell on Henrietta, The (1995) Starship Troopers (1997) Stefano Quantestorie (1993) Stephen King's The Langoliers (1995) Strange Days (1995) Stranger in the House (1997) Strawberry and Chocolate (Fresa y chocolate) (1993) Streetcar Named Desire, A (1951) Striking Distance (1993) Stripes (1981) Striptease (1996) 
Stuart Saves His Family (1995) Stupids, The (1996) Substance of Fire, The (1996) Substitute, The (1996) Sum of Us, The (1994) Sunchaser, The (1996) Sunset Blvd. (1950) Sunset Park (1996) Super Mario Bros. (1993) Surviving Picasso (1996) Swan Princess, The (1994) Swimming with Sharks (1995) Swingers (1996) Swiss Family Robinson (1960) Switchblade Sisters (1975) Symphonie pastorale, La (1946) Tales From the Crypt Presents: Demon Knight (1995) Tales from the Crypt Presents: Bordello of Blood (1996) Tales from the Hood (1995) Tango Lesson, The (1997) Telling Lies in America (1997) Temptress Moon (Feng Yue) (1996) Terror in a Texas Town (1958) Tetsuo II: Body Hammer (1992) Thieves (Voleurs, Les) (1996) Things to Do in Denver when You're Dead (1995) Thirty-Two Short Films About Glenn Gould (1993) This Is Spinal Tap (1984) Thousand Acres, A (1997) Three Caballeros, The (1945) Three Colors: Blue (1993) Three Colors: Red (1994) Three Colors: White (1994) Three Lives and Only One Death (1996) Three Musketeers, The (1993) Three Wishes (1995) Threesome (1994) Tie That Binds, The (1995) Tigrero: A Film That Was Never Made (1994) Time Tracers (1995) To Cross the Rubicon (1991) To Wong Foo, Thanks for Everything! Julie Newmar (1995) Tokyo Fist (1995) Tombstone (1993) Tomorrow Never Dies (1997) Total Eclipse (1995) Trainspotting (1996) Transformers: The Movie, The (1986) Treasure of the Sierra Madre, The (1948) Trees Lounge (1996) True Lies (1994) Trust (1990) Truth About Cats & Dogs, The (1996) Truth or Consequences, N.M. (1997) Turbo: A Power Rangers Movie (1997) Twelve Monkeys (1995) Twisted (1996) Twister (1996) Two Bits (1995) Two Deaths (1995) Two Friends (1986) Two or Three Things I Know About Her (1966) U.S. 
Marshalls (1998) Ulee's Gold (1997) Umbrellas of Cherbourg, The (Parapluies de Cherbourg, Les) (1964) Unbearable Lightness of Being, The (1988) Unhook the Stars (1996) Unstrung Heroes (1995) Until the End of the World (Bis ans Ende der Welt) (1991) Up Close and Personal (1996) Usual Suspects, The (1995) Vegas Vacation (1997) Vermont Is For Lovers (1992) Vie est belle, La (Life is Rosey) (1987) Virtuosity (1995) Visitors, The (Visiteurs, Les) (1993) Walk in the Clouds, A (1995) Wallace & Gromit: The Best of Aardman Animation (1996) Warriors of Virtue (1997) Washington Square (1997) Wedding Bell Blues (1996) Weekend at Bernie's (1989) Welcome to the Dollhouse (1995) Wend Kuuni (God's Gift) (1982) Wes Craven's New Nightmare (1994) What Happened Was... (1994) What's Eating Gilbert Grape (1993) What's Love Got to Do with It (1993) When Night Is Falling (1995) When We Were Kings (1996) When a Man Loves a Woman (1994) When the Cats Away (Chacun cherche son chat) (1996) White Man's Burden (1995) Widows' Peak (1994) Wild Reeds (1994) Wild Things (1998) William Shakespeare's Romeo and Juliet (1996) Window to Paris (1994) Wings of Courage (1995) Wings of Desire (1987) Wings of the Dove, The (1997) Winnie the Pooh and the Blustery Day (1968) Winter Guest, The (1997) Wishmaster (1997) With Honors (1994) Witness (1985) Woman in Question, The (1950) Wonderful, Horrible Life of Leni Riefenstahl, The (1993) Wooden Man's Bride, The (Wu Kui) (1994) World of Apu, The (Apur Sansar) (1959) Wrong Trousers, The (1993) Year of the Horse (1997) Young Frankenstein (1974) Young Guns (1988) Young Guns II (1990) Young Poisoner's Handbook, The (1995) Zeus and Roxanne (1997)
corr_starwars = df.corrwith(df['Star Wars (1977)'])
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:2683: RuntimeWarning: Degrees of freedom <= 0 for slice c = cov(x, y, rowvar, dtype=dtype) /usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:2542: RuntimeWarning: divide by zero encountered in true_divide c *= np.true_divide(1, fact)
corr_starwars = corr_starwars.to_frame()
ratings_mean_count_df.head(5)
mean | count | |
---|---|---|
title | ||
'Til There Was You (1997) | 2.333333 | 9 |
1-900 (1994) | 2.600000 | 5 |
101 Dalmatians (1996) | 2.908257 | 109 |
12 Angry Men (1957) | 4.344000 | 125 |
187 (1997) | 3.024390 | 41 |
corr_starwars.columns = ['correlation']
corr_starwars = corr_starwars.join( ratings_mean_count_df['count'])
corr_starwars.loc[corr_starwars['count']>= 80 , ].sort_values('correlation',ascending=False).head(7)
correlation | count | |
---|---|---|
title | ||
Star Wars (1977) | 1.000000 | 584 |
Empire Strikes Back, The (1980) | 0.748353 | 368 |
Return of the Jedi (1983) | 0.672556 | 507 |
Raiders of the Lost Ark (1981) | 0.536117 | 420 |
Austin Powers: International Man of Mystery (1997) | 0.377433 | 130 |
Sting, The (1973) | 0.367538 | 241 |
Indiana Jones and the Last Crusade (1989) | 0.350107 | 331 |
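The Star Wars steps above can be reproduced on a tiny table to see what each call does. This is only a sketch: the `toy` frame, its movie names, and the `min_count` threshold are made up for illustration. It also hints at where the `RuntimeWarning` shown earlier probably comes from: when two movies share at most one common rater, no correlation can be computed and the result is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies, NaN = not rated.
toy = pd.DataFrame(
    {
        "A": [5, 4, 1, 2, np.nan],
        "B": [4, 5, 2, 1, np.nan],            # same raters as A, similar taste
        "C": [np.nan, np.nan, np.nan, 5, 1],  # shares only one rater with A
    }
)

# Correlate every column with column "A", using only users who rated both.
corr_a = toy.corrwith(toy["A"]).to_frame(name="correlation")

# Attach the number of ratings per movie and keep only well-rated movies.
corr_a["count"] = toy.count()
min_count = 3  # illustrative threshold; the notebook uses 80
filtered = corr_a[corr_a["count"] >= min_count].sort_values(
    "correlation", ascending=False
)
print(filtered)
```

Movie "C" ends up with a NaN correlation and a count of 2, so the count filter removes it, just as the `count >= 80` filter removes sparsely rated movies above.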
# Everything up to this point was for learning purposes.
STEP #4: BUILD AN ITEM-BASED COLLABORATIVE FILTER FOR THE ENTIRE DATASET¶
# This is the approach used in practice.
df.shape
(944, 1664)
df
title | 'Til There Was You (1997) | 1-900 (1994) | 101 Dalmatians (1996) | 12 Angry Men (1957) | 187 (1997) | 2 Days in the Valley (1996) | 20,000 Leagues Under the Sea (1954) | 2001: A Space Odyssey (1968) | 3 Ninjas: High Noon At Mega Mountain (1998) | 39 Steps, The (1935) | ... | Yankee Zulu (1994) | Year of the Horse (1997) | You So Crazy (1994) | Young Frankenstein (1974) | Young Guns (1988) | Young Guns II (1990) | Young Poisoner's Handbook, The (1995) | Zeus and Roxanne (1997) | unknown | Á köldum klaka (Cold Fever) (1994) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
user_id | |||||||||||||||||||||
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | NaN | NaN | 2.0 | 5.0 | NaN | NaN | 3.0 | 4.0 | NaN | NaN | ... | NaN | NaN | NaN | 5.0 | 3.0 | NaN | NaN | NaN | 4.0 | NaN |
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
939 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
940 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
941 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
942 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | NaN | 3.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
943 | NaN | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 4.0 | 3.0 | NaN | NaN | NaN | NaN |
944 rows × 1664 columns
# Compute the correlation between every pair of movies.
# A pair only gets a correlation if at least 80 users rated both movies; all other pairs are NaN.
corr_movie = df.corr( min_periods= 80 )
corr_movie
title | 'Til There Was You (1997) | 1-900 (1994) | 101 Dalmatians (1996) | 12 Angry Men (1957) | 187 (1997) | 2 Days in the Valley (1996) | 20,000 Leagues Under the Sea (1954) | 2001: A Space Odyssey (1968) | 3 Ninjas: High Noon At Mega Mountain (1998) | 39 Steps, The (1935) | ... | Yankee Zulu (1994) | Year of the Horse (1997) | You So Crazy (1994) | Young Frankenstein (1974) | Young Guns (1988) | Young Guns II (1990) | Young Poisoner's Handbook, The (1995) | Zeus and Roxanne (1997) | unknown | Á köldum klaka (Cold Fever) (1994) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
title | |||||||||||||||||||||
'Til There Was You (1997) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1-900 (1994) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
101 Dalmatians (1996) | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
12 Angry Men (1957) | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 0.178848 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
187 (1997) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Young Guns II (1990) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Young Poisoner's Handbook, The (1995) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Zeus and Roxanne (1997) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Á köldum klaka (Cold Fever) (1994) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1664 rows × 1664 columns
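To make the effect of `min_periods` concrete, here is a small sketch on made-up data (the `toy` frame and the threshold of 3 are assumptions for illustration). It shows that the threshold applies to the number of users who rated both movies in a pair, which is also why the diagonal is NaN for movies with too few ratings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies.
toy = pd.DataFrame(
    {
        "A": [5, 4, 1, 2, 3, np.nan],
        "B": [4, 5, 2, 1, 3, np.nan],
        "C": [np.nan, np.nan, np.nan, 1, 2, 5],
    }
)

# A pair needs at least 3 users who rated BOTH movies to get a value.
corr = toy.corr(min_periods=3)
print(corr)

# A and B share 5 raters, so their correlation is computed.
# A and C share only 2 raters, so that cell is NaN.
# C has 3 ratings of its own, so its diagonal is still 1.0 here,
# but it becomes NaN once the threshold exceeds 3.
strict = toy.corr(min_periods=4)
print(strict.loc["C", "C"])
```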
Recommend movies based on my own ratings¶
My movie ratings are stored in the file My_Ratings.csv.
myRatings = pd.read_csv('My_Ratings.csv')
myRatings
Movie Name | Ratings | |
---|---|---|
0 | 101 Dalmatians (1996) | 5 |
1 | 2001: A Space Odyssey (1968) | 4 |
2 | You So Crazy (1994) | 1 |
myRatings['Movie Name'][0]
'101 Dalmatians (1996)'
movie_title = myRatings['Movie Name'][0]
corr_movie[movie_title] # select this movie's column by its label
title
'Til There Was You (1997)                NaN
1-900 (1994)                             NaN
101 Dalmatians (1996)                    1.0
12 Angry Men (1957)                      NaN
187 (1997)                               NaN
                                        ...
Young Guns II (1990)                     NaN
Young Poisoner's Handbook, The (1995)    NaN
Zeus and Roxanne (1997)                  NaN
unknown                                  NaN
Á köldum klaka (Cold Fever) (1994)       NaN
Name: 101 Dalmatians (1996), Length: 1664, dtype: float64
recom_movies = corr_movie[movie_title].dropna().sort_values(ascending=False).to_frame() # convert to a DataFrame so a weight column can be added
recom_movies.columns = ['correlation']
recom_movies['correlation'] * myRatings['Ratings'][0]
title
101 Dalmatians (1996)                           5.000000
Independence Day (ID4) (1996)                   1.555910
Twister (1996)                                  1.441875
Toy Story (1995)                                1.160591
Star Wars (1977)                                1.055661
Mission: Impossible (1996)                      0.998220
Return of the Jedi (1983)                       0.828296
Willy Wonka and the Chocolate Factory (1971)    0.526306
Name: correlation, dtype: float64
recom_movies['weight'] = recom_movies['correlation'] * myRatings['Ratings'][0]
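The weighting step is plain arithmetic: each correlation is multiplied by the rating I gave the reference movie. A minimal sketch with made-up numbers follows (the movie names, correlations, and the `weigh` helper are hypothetical, not part of the notebook).

```python
import pandas as pd

# Hypothetical correlations of three candidate movies with one movie I rated.
correlation = pd.Series(
    {"Movie X": 0.30, "Movie Y": 0.10, "Movie Z": -0.20}, name="correlation"
)


def weigh(correlation, my_rating):
    """Scale each correlation by the rating I gave the reference movie."""
    out = correlation.to_frame()
    out["weight"] = out["correlation"] * my_rating
    return out


loved = weigh(correlation, 5)     # I rated the reference movie 5
disliked = weigh(correlation, 1)  # I rated the reference movie 1
print(loved)
print(disliked)
```

Note that a 1-star rating only shrinks the weights; it does not flip their sign. Movies similar to one I disliked therefore still receive a small positive weight under this scheme.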
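Build a pipeline that automates the recommendation steps above.¶
Hint: use a loop to build a DataFrame of similar movies for each movie you rated, and keep adding it to the empty DataFrame below. When the loop finishes, sort that DataFrame by the weight column.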
myRatings.shape[0]
3
# 1. Get the title of each movie I have rated.
# There can be several, so loop over them.
similar_movies_list = pd.DataFrame()
for i in range( myRatings.shape[0] ) :
    movie_title = myRatings['Movie Name'][i]
    recom_movies = corr_movie[movie_title].dropna().sort_values(ascending=False).to_frame()
    recom_movies.columns = ['correlation']
    recom_movies['weight'] = recom_movies['correlation'] * myRatings['Ratings'][i]
    # DataFrame.append was removed in pandas 2.0; pd.concat adds the rows instead.
    similar_movies_list = pd.concat([similar_movies_list, recom_movies])
similar_movies_list
correlation | weight | |
---|---|---|
title | ||
101 Dalmatians (1996) | 1.000000 | 5.000000 |
Independence Day (ID4) (1996) | 0.311182 | 1.555910 |
Twister (1996) | 0.288375 | 1.441875 |
Toy Story (1995) | 0.232118 | 1.160591 |
Star Wars (1977) | 0.211132 | 1.055661 |
... | ... | ... |
Firm, The (1993) | -0.167599 | -0.670396 |
Fried Green Tomatoes (1991) | -0.170559 | -0.682234 |
Beauty and the Beast (1991) | -0.171573 | -0.686290 |
Last of the Mohicans, The (1992) | -0.186544 | -0.746177 |
Air Force One (1997) | -0.282994 | -1.131976 |
207 rows × 2 columns
# 1. Sort by weight (weight is the column that reflects the ratings I gave).
similar_movies_list = similar_movies_list.sort_values('weight', ascending = False)
similar_movies_list
correlation | weight | |
---|---|---|
title | ||
101 Dalmatians (1996) | 1.000000 | 5.000000 |
2001: A Space Odyssey (1968) | 1.000000 | 4.000000 |
Being There (1979) | 0.425009 | 1.700037 |
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) | 0.392916 | 1.571663 |
Independence Day (ID4) (1996) | 0.311182 | 1.555910 |
... | ... | ... |
Firm, The (1993) | -0.167599 | -0.670396 |
Fried Green Tomatoes (1991) | -0.170559 | -0.682234 |
Beauty and the Beast (1991) | -0.171573 | -0.686290 |
Last of the Mohicans, The (1992) | -0.186544 | -0.746177 |
Air Force One (1997) | -0.282994 | -1.131976 |
207 rows × 2 columns
# 2. Movies I have already rated don't need to be recommended, so remove them.
drop_index_list = myRatings['Movie Name'].to_list() # convert the Series to a list
for name in drop_index_list :
    if name in similar_movies_list.index :
        similar_movies_list.drop(name, axis = 0, inplace=True ) # drop rows with this label
similar_movies_list.index
Index(['Being There (1979)', 'Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)', 'Independence Day (ID4) (1996)', 'Clockwork Orange, A (1971)', 'Citizen Kane (1941)', 'Twister (1996)', 'Reservoir Dogs (1992)', 'Lawrence of Arabia (1962)', 'Chinatown (1974)', 'Apocalypse Now (1979)', ... 'Sound of Music, The (1965)', 'Crow, The (1994)', 'Jerry Maguire (1996)', 'Maverick (1994)', 'Phenomenon (1996)', 'Firm, The (1993)', 'Fried Green Tomatoes (1991)', 'Beauty and the Beast (1991)', 'Last of the Mohicans, The (1992)', 'Air Force One (1997)'], dtype='object', name='title', length=205)
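The drop loop above can also be written as a single boolean filter. The sketch below uses a made-up `similar` frame and `seen` list (both hypothetical) and relies on `Index.isin`, which returns True for every index label found in the given list.

```python
import pandas as pd

# Hypothetical stand-in for similar_movies_list, indexed by movie title.
similar = pd.DataFrame(
    {"correlation": [1.0, 0.4, 0.3, 1.0], "weight": [5.0, 2.0, 1.5, 4.0]},
    index=pd.Index(["Seen A", "New X", "New Y", "Seen B"], name="title"),
)

# Titles I have already rated; "Seen C" is absent from the index on purpose.
seen = ["Seen A", "Seen B", "Seen C"]

# Keep only the rows whose title is NOT in the list of rated movies.
unseen = similar[~similar.index.isin(seen)]
print(unseen)
```

Unlike `drop`, this does not need a membership check for titles that are missing from the index, and it returns a new frame instead of modifying the original in place.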
# Finally, the movies I have already rated are gone, so what remains can be recommended.
similar_movies_list
correlation | weight | |
---|---|---|
title | ||
Being There (1979) | 0.425009 | 1.700037 |
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) | 0.392916 | 1.571663 |
Independence Day (ID4) (1996) | 0.311182 | 1.555910 |
Clockwork Orange, A (1971) | 0.388071 | 1.552285 |
Citizen Kane (1941) | 0.370413 | 1.481653 |
... | ... | ... |
Firm, The (1993) | -0.167599 | -0.670396 |
Fried Green Tomatoes (1991) | -0.170559 | -0.682234 |
Beauty and the Beast (1991) | -0.171573 | -0.686290 |
Last of the Mohicans, The (1992) | -0.186544 | -0.746177 |
Air Force One (1997) | -0.282994 | -1.131976 |
205 rows × 2 columns
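# 3. The same movie can be recommended more than once.
# When a movie appears several times, recommend it using only
# its highest weight.
# That is, take the highest weight for each title,
# then sort by weight.
# First, check whether there are duplicates.
similar_movies_list.reset_index()['title'].value_counts() # duplicates are hard to spot in the index, so turn it into a column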
Toy Story (1995)                                2
Independence Day (ID4) (1996)                   2
Twister (1996)                                  2
Mission: Impossible (1996)                      2
Willy Wonka and the Chocolate Factory (1971)    2
                                               ..
Cool Hand Luke (1967)                           1
Professional, The (1994)                        1
Cape Fear (1991)                                1
Ed Wood (1994)                                  1
Air Force One (1997)                            1
Name: title, Length: 198, dtype: int64
# Remove the duplicates.
similar_movies_list.groupby('title')['weight'].max().sort_values(ascending=False) # groupby also works on the index name; max keeps the highest weight among duplicates
title
Being There (1979)                                                             1.700037
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)    1.571663
Independence Day (ID4) (1996)                                                  1.555910
Clockwork Orange, A (1971)                                                     1.552285
Citizen Kane (1941)                                                            1.481653
                                                                                 ...
Firm, The (1993)                                                              -0.670396
Fried Green Tomatoes (1991)                                                   -0.682234
Beauty and the Beast (1991)                                                   -0.686290
Last of the Mohicans, The (1992)                                              -0.746177
Air Force One (1997)                                                          -1.131976
Name: weight, Length: 198, dtype: float64
# This is the result to send back to the client.
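The whole sequence (weight, combine, drop rated movies, de-duplicate, sort) can be gathered into one function. The following is a rough sketch rather than the notebook's own code: the `recommend` function, its `top_n` parameter, and the toy `corr` and `mine` frames are assumptions made for illustration. The column names 'Movie Name' and 'Ratings' follow My_Ratings.csv.

```python
import numpy as np
import pandas as pd


def recommend(corr_matrix, my_ratings, top_n=5):
    """Rank unseen movies by their best (correlation * my rating) weight.

    corr_matrix: square DataFrame of movie-to-movie correlations.
    my_ratings: DataFrame with 'Movie Name' and 'Ratings' columns.
    """
    frames = []
    for title, rating in zip(my_ratings["Movie Name"], my_ratings["Ratings"]):
        if title not in corr_matrix.columns:
            continue  # skip movies missing from the correlation matrix
        weights = corr_matrix[title].dropna() * rating
        frames.append(weights.rename("weight"))
    if not frames:
        return pd.Series(dtype="float64", name="weight")
    combined = pd.concat(frames)
    # Drop movies I already rated, then keep the highest weight per title.
    combined = combined[~combined.index.isin(my_ratings["Movie Name"])]
    best = combined.groupby(level=0).max()
    return best.sort_values(ascending=False).head(top_n)


# Hypothetical 4-movie correlation matrix and two ratings of my own.
titles = ["A", "B", "C", "D"]
corr = pd.DataFrame(
    [
        [1.0, 0.5, 0.2, np.nan],
        [0.5, 1.0, 0.4, -0.3],
        [0.2, 0.4, 1.0, 0.1],
        [np.nan, -0.3, 0.1, 1.0],
    ],
    index=titles,
    columns=titles,
)
mine = pd.DataFrame({"Movie Name": ["A", "B"], "Ratings": [5, 2]})

result = recommend(corr, mine)
print(result)
```

With the notebook's variables, the call would presumably look like `recommend(corr_movie, myRatings, top_n=10)`.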