09.a Lecture: pandas

PANDAS OPERATIONS

In [2]:

import pandas as pd

In [3]:

df = pd.DataFrame({'Employee ID':[111, 222, 333, 444],
                   'Employee Name':['Chanel', 'Steve', 'Mitch', 'Bird'],
                   'Salary [$/h]':[35, 29, 38, 20],
                   'Years of Experience':[3, 4, 9, 1]})
df

Out[3]:

	Employee ID	Employee Name	Salary [$/h]	Years of Experience
0	111	Chanel	35	3
1	222	Steve	29	4
2	333	Mitch	38	9
3	444	Bird	20	1

In [ ]:

# Count the number of characters in each employee's name
# and store the result in a new column named length.

In [ ]:

# Creating new data == creating a new column

In [4]:

name = "Chanel"

In [5]:

len(name)

Out[5]:

6

In [ ]:

# A function that counts characters already exists:
# the len function.
# What we want is to apply len to all of the values
# stored in the DataFrame column at once.

In [6]:

df

Out[6]:

	Employee ID	Employee Name	Salary [$/h]	Years of Experience
0	111	Chanel	35	3
1	222	Steve	29	4
2	333	Mitch	38	9
3	444	Bird	20	1

In [7]:

df["Employee Name"]

Out[7]:

0    Chanel
1     Steve
2     Mitch
3      Bird
Name: Employee Name, dtype: object

In [8]:

# Pass apply() only the name of the function you want to apply; omit the ().
df["Employee Name"].apply(len)

Out[8]:

0    6
1    5
2    5
3    4
Name: Employee Name, dtype: int64
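
The cell above passes `len` itself rather than `len()` because `apply` expects a callable and calls it once per element. A minimal sketch of what that amounts to, assuming a stand-in Series named `names` (the variable names are illustrative, not from the notebook):

```python
import pandas as pd

# Stand-in for df["Employee Name"]; names here are illustrative only.
names = pd.Series(["Chanel", "Steve", "Mitch", "Bird"], name="Employee Name")

# apply(len) calls len(value) for each element and collects the results.
via_apply = names.apply(len)

# Roughly the same work written by hand with a list comprehension.
via_loop = pd.Series([len(value) for value in names],
                     index=names.index, name=names.name)

# Any callable works, including a lambda.
via_lambda = names.apply(lambda value: len(value))

print(via_apply.tolist())  # [6, 5, 5, 4]
```

Writing `apply(len())` instead would call `len` immediately with no argument and fail before `apply` ever runs.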

In [13]:

df["length"] = df["Employee Name"].apply(len)

In [14]:

df

Out[14]:

	Employee ID	Employee Name	Salary [$/h]	Years of Experience	length
0	111	Chanel	35	3	6
1	222	Steve	29	4	5
2	333	Mitch	38	9	5
3	444	Bird	20	1	4

In [ ]:

# Count the characters in each Employee Name
# and store the result in a new column named length2.

In [16]:

# Approach using the pandas .str accessor
df["Employee Name"].str.len()

Out[16]:

0    6
1    5
2    5
3    4
Name: Employee Name, dtype: int64
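
The exercise asks for the result to be stored in `length2`, which the cell above stops short of. The sketch below does that on a separate small DataFrame, and also shows one practical difference between the two approaches: how they behave when a name is missing. The data with the `None` entry is hypothetical, added only to illustrate the point.

```python
import pandas as pd

df = pd.DataFrame({"Employee Name": ["Chanel", "Steve", "Mitch", "Bird"]})

# Store the .str.len() result in a new column.
df["length2"] = df["Employee Name"].str.len()

# Hypothetical data with one missing name.
names_with_gap = pd.Series(["Chanel", "Steve", None, "Bird"])

# .str.len() leaves a missing value where the name is missing.
lengths = names_with_gap.str.len()

# apply(len) calls len() on the missing value itself, which raises TypeError.
apply_failed = False
try:
    names_with_gap.apply(len)
except TypeError as exc:
    apply_failed = True
    print("apply(len) failed:", exc)
```

For clean string columns the two approaches give the same numbers; the `.str` accessor is usually the more forgiving choice when gaps are possible.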

In [21]:

# Convert every Employee Name to uppercase
# and store the result in a new column named upper_name.

In [ ]:

# 1. Using the apply function

In [25]:

df["Employee Name"]

Out[25]:

0    Chanel
1     Steve
2     Mitch
3      Bird
Name: Employee Name, dtype: object

In [26]:

df["Employee Name"].apply(upper)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [26], line 1
----> 1 df["Employee Name"].apply(upper)

NameError: name 'upper' is not defined
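
The NameError above occurs because `upper` is a method of `str`, not a built-in function like `len`, so there is no bare name `upper` to pass. One way to make the apply approach work is to pass `str.upper` or a lambda. A minimal sketch with a stand-in Series (the variable names are illustrative):

```python
import pandas as pd

names = pd.Series(["Chanel", "Steve", "Mitch", "Bird"], name="Employee Name")

# str.upper is the method accessed through the class;
# apply calls str.upper(value) for each element.
upper_via_method = names.apply(str.upper)

# Equivalent lambda form.
upper_via_lambda = names.apply(lambda value: value.upper())

print(upper_via_method.tolist())  # ['CHANEL', 'STEVE', 'MITCH', 'BIRD']
```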

In [ ]:

# 2. Using the pandas .str accessor

In [24]:

df["Employee Name"].str.upper()

Out[24]:

0    CHANEL
1     STEVE
2     MITCH
3      BIRD
Name: Employee Name, dtype: object
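
The cell above only displays the uppercase names; the notebook's `df` is not modified. To finish the exercise, the result would be assigned to the `upper_name` column. A minimal sketch on a separate small DataFrame; the extra `starts_with_s` column is a hypothetical illustration that other `.str` methods follow the same pattern:

```python
import pandas as pd

df = pd.DataFrame({"Employee Name": ["Chanel", "Steve", "Mitch", "Bird"]})

# Assigning to a new column name creates the column.
df["upper_name"] = df["Employee Name"].str.upper()

# Other .str methods are used the same way, for example startswith().
df["starts_with_s"] = df["Employee Name"].str.startswith("S")

print(df)
```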

In [27]:

df

Out[27]:

	Employee ID	Employee Name	Salary [$/h]	Years of Experience	length
0	111	Chanel	35	3	6
1	222	Steve	29	4	5
2	333	Mitch	38	9	5
3	444	Bird	20	1	4

In [ ]:

# Classify each employee by hourly wage:
# label the row A if the wage is 30 or more,
# and B otherwise.

In [ ]:

# 1. Write a function that takes an hourly wage and returns A or B.

In [31]:

def grouping(salary):
    if salary >= 30:
        return "A"
    else:
        return "B"

In [34]:

grouping(40)

# A

Out[34]:

'A'

In [ ]:

df["Salary [$/h]"] >= 30

In [ ]:

# 2. Apply the function defined above to every value stored
# in the Salary [$/h] column of the DataFrame.

In [37]:

df['Salary [$/h]']

Out[37]:

0    35
1    29
2    38
3    20
Name: Salary [$/h], dtype: int64

In [36]:

df['Salary [$/h]'].apply(grouping)

Out[36]:

0    A
1    B
2    A
3    B
Name: Salary [$/h], dtype: object

In [38]:

df["Group"] = df['Salary [$/h]'].apply(grouping)

In [39]:

df

Out[39]:

	Employee ID	Employee Name	Salary [$/h]	Years of Experience	length	Group
0	111	Chanel	35	3	6	A
1	222	Steve	29	4	5	B
2	333	Mitch	38	9	5	A
3	444	Bird	20	1	4	B
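
For a simple two-way split like this one, the same column can also be built without a Python-level function call per row. The sketch below uses `numpy.where` as an alternative; this is not the notebook's method, and the `Group_np` column name is made up for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Salary [$/h]": [35, 29, 38, 20]})

def grouping(salary):
    """Same rule as in the notebook: A for 30 or more, otherwise B."""
    return "A" if salary >= 30 else "B"

# Element-by-element version, as in the notebook.
df["Group"] = df["Salary [$/h]"].apply(grouping)

# Vectorized version: np.where(condition, value_if_true, value_if_false).
df["Group_np"] = np.where(df["Salary [$/h]"] >= 30, "A", "B")

print(df["Group_np"].tolist())  # ['A', 'B', 'A', 'B']
```

On four rows the difference does not matter; on large columns the vectorized form is typically faster, while `apply` stays the more flexible option when the rule is too involved for a single condition.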

In [ ]:

# Conclusion: apply() is so useful because it lets you run your own user-defined functions on the data.
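
Two further patterns show how far user-defined functions can go with `apply`. The sketch below is illustrative only: the `threshold` parameter, the `is_senior_high_earner` rule, and the `Group_35` and `Senior` column names are assumptions made up for this example, not part of the notebook.

```python
import pandas as pd

df = pd.DataFrame({"Salary [$/h]": [35, 29, 38, 20],
                   "Years of Experience": [3, 4, 9, 1]})

def grouping(salary, threshold=30):
    """Return "A" when salary is at or above threshold, otherwise "B"."""
    return "A" if salary >= threshold else "B"

# Extra keyword arguments given to Series.apply are forwarded to the function.
df["Group_35"] = df["Salary [$/h]"].apply(grouping, threshold=35)

def is_senior_high_earner(row):
    """Hypothetical rule that needs two columns from the same row."""
    return row["Salary [$/h]"] >= 30 and row["Years of Experience"] >= 5

# DataFrame.apply with axis=1 passes each row to the function as a Series.
df["Senior"] = df.apply(is_senior_high_earner, axis=1)

print(df)
```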
