[python] merge, join, concatenate pandas

import pandas as pd


# Create df1
df1 = pd.read_csv('/Users/grace/Desktop/Alex/LOTR.csv')
df1

Out[3]:

	FellowshipID	FirstName	Skills
0	1001	Frodo	Hiding
1	1002	Samwise	Gardening
2	1003	Gandalf	Spells
3	1004	Pippin	Fireworks

# Create df2
df2 = pd.read_csv(r"/Users/grace/Desktop/Alex/LOTR 2.csv")
df2

Out[5]:

	FellowshipID	FirstName	Age
0	1001	Frodo	50
1	1002	Samwise	39
2	1006	Legolas	2931
3	1007	Elrond	6520
4	1008	Barromir	51

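1. merge is the most commonly used function. With how='inner' it keeps only the rows whose key values appear in both DataFrames.

- If an overlapping column is not listed as a key in on, the merged table keeps both copies and labels them with the suffixes _x (from the left frame) and _y (from the right frame).

# Use merge's on and how arguments to inner-join on FellowshipID; the second call joins on both FellowshipID and FirstName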
df3 = pd.merge(df1, df2, on = 'FellowshipID', how = "inner")
df3

df1.merge(df2, on = ['FellowshipID','FirstName'], how= 'inner')
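To see the _x/_y behaviour without the CSV files, the sketch below rebuilds df1 and df2 inline from the printed outputs (assuming the CSVs hold exactly those rows) and compares merging on FellowshipID alone with merging on both shared columns.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# Key on FellowshipID only: FirstName exists in both frames but is not a key,
# so pandas keeps both copies as FirstName_x (df1) and FirstName_y (df2)
on_id = pd.merge(df1, df2, on="FellowshipID", how="inner")
print(on_id.columns.tolist())
# ['FellowshipID', 'FirstName_x', 'Skills', 'FirstName_y', 'Age']

# Key on both shared columns: nothing overlaps outside the keys, so no suffixes
on_both = df1.merge(df2, on=["FellowshipID", "FirstName"], how="inner")
print(on_both)  # only Frodo and Samwise appear in both frames
```

If the default labels are unclear, merge's suffixes argument (for example suffixes=("_df1", "_df2")) replaces _x and _y.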

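2. outer shows all rows from both the left and right DataFrames; where a row has no match on the other side, the missing values appear as NaN.

# Merge df1 and df2 with how='outer' (without on, pandas joins on all shared columns: FellowshipID and FirstName)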
df1.merge(df2, how='outer')

Out[25]:

	FellowshipID	FirstName	Skills	Age
0	1001	Frodo	Hiding	50.0
1	1002	Samwise	Gardening	39.0
2	1003	Gandalf	Spells	NaN
3	1004	Pippin	Fireworks	NaN
4	1006	Legolas	NaN	2931.0
5	1007	Elrond	NaN	6520.0
6	1008	Barromir	NaN	51.0
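One way to check where each NaN in the outer result comes from is merge's indicator=True option (not used in the original post), which adds a _merge column saying whether each row was found in the left frame only, the right frame only, or both. This is a minimal sketch on data rebuilt inline from the printed outputs, assuming the CSVs contain exactly those rows.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# indicator=True adds a categorical _merge column: left_only / right_only / both
outer = df1.merge(df2, how="outer", indicator=True)
print(outer[["FellowshipID", "FirstName", "_merge"]])

# Rows that found no partner on the other side are exactly the rows with NaNs
unmatched = outer[outer["_merge"] != "both"]
print(unmatched["FirstName"].tolist())
```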

3. left: all rows from the left DataFrame, plus the matching values from the right (NaN where there is no match).

df1.merge(df2, how='left')
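A short sketch of the left merge on data rebuilt inline from the printed outputs (assuming the CSVs hold exactly those rows): every df1 row survives, and the two characters missing from df2 get NaN in Age, which also turns the Age column into floats.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# Keep every df1 row; pull Age from df2 where FellowshipID and FirstName match
left = df1.merge(df2, how="left")
print(left)
print(left["Age"].isna().sum())  # Gandalf and Pippin have no Age in df2
```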

4. right: all rows from the right DataFrame, plus the matching values from the left (NaN where there is no match).

df1.merge(df2, how='right')
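The right merge is the mirror image: every df2 row survives and Skills is NaN for the three df2-only characters. The sketch below (data rebuilt inline from the printed outputs, assuming the CSVs hold exactly those rows) also checks that swapping the frames and using how='left' keeps the same rows; only the column order differs.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# Keep every df2 row; Skills is NaN where df1 has no matching character
right = df1.merge(df2, how="right")
print(right)

# Same rows as a left merge with the frames swapped (column order differs)
swapped = df2.merge(df1, how="left")
print(right["FirstName"].tolist() == swapped["FirstName"].tolist())
```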


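5. cross: pairs the first row of the left DataFrame with every row of the right, then the second row, and so on (a Cartesian product). No key is matched, so the result has len(df1) * len(df2) rows.

# Cross-join df1 and df2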
df1.merge(df2, how='cross')
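A sketch of the cross merge on data rebuilt inline from the printed outputs, assuming pandas 1.2 or later (how='cross' was added in 1.2) and that the CSVs hold exactly those rows. Because no key is used, FellowshipID and FirstName appear twice and get the _x/_y suffixes, and 4 * 5 = 20 rows come back in left-row order.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

crossed = df1.merge(df2, how="cross")
print(crossed.shape)  # (20, 6)

# The first five rows pair Frodo (df1 row 0) with each df2 row in turn
print(crossed[["FirstName_x", "FirstName_y"]].head(5))
```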

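6. Running df1.join(df2, on = 'FellowshipID') on its own raises a ValueError: both DataFrames contain FellowshipID and FirstName columns, and join needs lsuffix/rsuffix to rename the overlap. join works best when joining on indexes. With on, df1's FellowshipID column is matched against df2's index (0 to 4 here), not against df2's FellowshipID column, so the call below runs but finds no matching rows.

# Outer join on FellowshipID; overlapping columns from df1 get the suffix _Left and those from df2 get _Right
df1.join(df2, on = 'FellowshipID', how='outer', lsuffix = '_Left', rsuffix = '_Right')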
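The sketch below, on data rebuilt inline from the printed outputs (assuming the CSVs hold exactly those rows), walks through note 6: the bare call raises a ValueError, and with suffixes the call runs but compares df1's FellowshipID with df2's default 0 to 4 index, so nothing lines up. Indexing df2 by FellowshipID first is one way to make on match real IDs; the default how='left' is used here to keep the output small.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# 1) No suffixes: FellowshipID and FirstName overlap, so join raises ValueError
try:
    df1.join(df2, on="FellowshipID")
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# 2) With suffixes it runs, but df1.FellowshipID is compared with df2's index (0..4)
unmatched = df1.join(df2, on="FellowshipID", lsuffix="_Left", rsuffix="_Right")
print(unmatched["Age"].isna().all())  # no ID from df1 is in 0..4

# 3) Index df2 by FellowshipID so that `on` compares IDs with IDs
matched = df1.join(df2.set_index("FellowshipID"), on="FellowshipID",
                   lsuffix="_Left", rsuffix="_Right")
print(matched[["FellowshipID", "FirstName_Left", "Age"]])
```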

7. When using join, set the key column as the index on both DataFrames.

# Create df4: set FellowshipID as the index of both df1 and df2, then outer-join them (lsuffix = '_Left', rsuffix = '_Right')
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'), how = 'outer', lsuffix = '_Left', rsuffix = '_Right')


df4
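To make df4 concrete, the sketch below rebuilds the data inline from the printed outputs (assuming the CSVs hold exactly those rows), repeats the set_index join, and compares it with a plain merge on the FellowshipID column that is indexed afterwards. Both give the same index and columns, which is why join is convenient once the key already lives in the index.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# Join on the index: FellowshipID is the index of both frames
df4 = df1.set_index("FellowshipID").join(
    df2.set_index("FellowshipID"), how="outer", lsuffix="_Left", rsuffix="_Right"
)
print(df4)

# The same outer join written with merge on the column, then indexed afterwards
via_merge = pd.merge(
    df1, df2, on="FellowshipID", how="outer", suffixes=("_Left", "_Right")
).set_index("FellowshipID")
print(df4.index.tolist() == via_merge.index.tolist())
print(df4.columns.tolist() == via_merge.columns.tolist())
```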

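8. concat stacks one DataFrame on top of the other by default (axis=0), whereas join/merge place them next to each other. With axis=1, concat also places them side by side, but it aligns rows by index rather than by a key column.

# Concatenate df1 and df2 side by side (axis=1) with an outer join on the index
df4 = pd.concat([df1, df2], join = 'outer', axis = 1)
# concat takes the join type ('outer' or 'inner') directly as an argument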
df4
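A sketch of what axis=1 does with data rebuilt inline from the printed outputs (assuming the CSVs hold exactly those rows): concat lines rows up by their index labels (0 to 4), not by FellowshipID, so row 2 puts Gandalf next to Legolas. join='inner' keeps only the index labels present in both frames.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

side_by_side = pd.concat([df1, df2], axis=1, join="outer")
print(side_by_side.shape)  # (5, 6): index label 4 exists only in df2
print(side_by_side.iloc[2].tolist())  # Gandalf's row sits next to Legolas's

shared = pd.concat([df1, df2], axis=1, join="inner")
print(shared.shape)  # (4, 6): only index labels 0..3 are in both frames
```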

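9. append adds the rows of one DataFrame to the end of another. append is rarely used and concat is recommended instead; DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.

# Append df2 to df1 (written df1.append(df2) before pandas 2.0)
pd.concat([df1, df2])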
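A sketch of the concat replacement for append on data rebuilt inline from the printed outputs (assuming the CSVs hold exactly those rows): stacking keeps the union of columns (Skills is NaN for df2's rows, Age for df1's), and ignore_index=True renumbers the rows instead of repeating 0 to 3 and 0 to 4.

```python
import pandas as pd

# Assumption: rebuild df1 and df2 from the printed outputs instead of reading the CSVs
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Barromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

stacked = pd.concat([df1, df2])
print(stacked.index.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3, 4]: original labels kept

renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.shape)  # (9, 4)
print(renumbered.columns.tolist())  # ['FellowshipID', 'FirstName', 'Skills', 'Age']
```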

Source: Alex the Analyst

그렉
