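isin : returns a boolean array indicating whether each element of the Series is contained in the passed sequence of values
match : computes, for each value, its integer index in an array of unique values (e.g. via Index.get_indexer)
unique : returns an array of the unique values in the Series, with duplicates removed
value_counts : returns a Series whose index holds the unique values and whose values are their frequencies (sorted by frequency, descending)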

order_id : order number
quantity : number of units of the item ordered
item_name : name of the item
choice_description : detailed options chosen for the ordered item
item_price : price of the ordered item

Top 10 most-ordered items
How many units of the most expensive item were sold?
How many times was Veggie Salad Bowl ordered?

import pandas as pd
file_path = 'chipotle.tsv'
chipo = pd.read_csv(file_path,sep='\t')

chipo.head()

Question 1¶

chipo['item_name'].value_counts()[0:10]

Chicken Bowl                    726
Chicken Burrito                 553
Chips and Guacamole             479
Steak Burrito                   368
Canned Soft Drink               301
Chips                           211
Steak Bowl                      211
Bottled Water                   162
Chicken Soft Tacos              115
Chips and Fresh Tomato Salsa    110
Name: item_name, dtype: int64

chipo['item_price'].value_counts().sort_index()

$1.09     106
$1.25     264
$1.50     117
$1.69      99
$1.99       1
         ... 
$8.75     730
$8.90      20
$8.99     246
$9.25     398
$9.39      17
Name: item_price, Length: 78, dtype: int64

chipo['item_price'].unique()

array(['$2.39 ', '$3.39 ', '$16.98 ', '$10.98 ', '$1.69 ', '$11.75 ',
       '$9.25 ', '$4.45 ', '$8.75 ', '$11.25 ', '$8.49 ', '$2.18 ',
       '$8.99 ', '$1.09 ', '$2.95 ', '$2.15 ', '$3.99 ', '$22.50 ',
       '$11.48 ', '$17.98 ', '$17.50 ', '$4.30 ', '$5.90 ', '$1.25 ',
       '$23.78 ', '$6.49 ', '$11.08 ', '$1.50 ', '$22.16 ', '$32.94 ',
       '$22.20 ', '$10.58 ', '$2.50 ', '$23.50 ', '$7.40 ', '$18.50 ',
       '$3.00 ', '$6.78 ', '$11.89 ', '$9.39 ', '$4.00 ', '$3.75 ',
       '$8.69 ', '$2.29 ', '$8.90 ', '$3.27 ', '$3.89 ', '$8.19 ',
       '$35.00 ', '$27.75 ', '$11.80 ', '$6.00 ', '$26.25 ', '$21.96 ',
       '$4.36 ', '$7.50 ', '$4.78 ', '$13.35 ', '$6.45 ', '$5.07 ',
       '$22.96 ', '$7.17 ', '$7.98 ', '$4.50 ', '$26.07 ', '$12.98 ',
       '$35.25 ', '$44.25 ', '$10.50 ', '$33.75 ', '$16.38 ', '$13.52 ',
       '$5.00 ', '$15.00 ', '$8.50 ', '$17.80 ', '$1.99 ', '$11.49 '],
      dtype=object)

Question 3¶

sum(chipo[chipo['item_name']=='Veggie Salad Bowl']['quantity'])

18

Question 2¶

Convert all prices to float

item_name=chipo['item_name']
quantity=chipo['quantity']
item_price=chipo['item_price']

# Strip the '$' sign and spaces from each price string, then convert it to float
for i in range(len(item_price)):
    item_price[i]=item_price[i].replace('$',"")
    item_price[i]=item_price[i].replace(' ',"")
    item_price[i]=float(item_price[i])
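
The element-by-element loop above can also be written with vectorized string methods. The following is only a sketch on a few hypothetical sample strings (the `prices` Series is made up for illustration and is not read from chipotle.tsv); it assumes every price has the form '$x.xx ' as seen in the unique() output.

```python
import pandas as pd

# Hypothetical sample rows mimicking the raw 'item_price' strings
prices = pd.Series(['$2.39 ', '$16.98 ', '$8.75 '])

cleaned = (
    prices.str.replace('$', '', regex=False)  # remove the dollar sign literally
          .str.strip()                        # remove surrounding spaces
          .astype(float)                      # convert the text to float64
)
print(cleaned.dtype)
print(cleaned.tolist())
```

Unlike the loop, this produces a float64 Series rather than an object Series, so numeric operations such as max() and division work on native floats.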

Items with the highest price per single unit¶

item_per_price=item_price/quantity
max(item_per_price)

11.89

sum(item_per_price==11.89)

28

item_per_price.sort_values().tail(10)

1326    11.89
606     11.89
4313    11.89
3208    11.89
749     11.89
4239    11.89
1229    11.89
2439    11.89
2442    11.89
2401    11.89
dtype: object

Looking these rows up gives the following menu items:
Steak Salad Bowl
Barbacoa Salad Bowl
Every one of these orders actually included optional choices.

chipo[chipo['item_name']=='Steak Salad Bowl']['choice_description']

281     [Tomatillo Red Chili Salsa, [Black Beans, Chee...
606     [Fresh Tomato Salsa, [Pinto Beans, Cheese, Gua...
607                         [Fresh Tomato Salsa, Lettuce]
613     [Tomatillo Red Chili Salsa, [Fajita Vegetables...
749     [Roasted Chili Corn Salsa, [Rice, Cheese, Lett...
1159    [Fresh Tomato Salsa, [Rice, Fajita Vegetables,...
1311    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
1505    [Fresh Tomato Salsa, [Rice, Pinto Beans, Chees...
1571    [Fresh Tomato Salsa, [Fajita Vegetables, Sour ...
1590    [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...
1816    [Fresh Tomato Salsa, [Rice, Black Beans, Chees...
2401    [Fresh Tomato Salsa, [Fajita Vegetables, Guaca...
2439    [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...
2600    [Fresh Tomato Salsa, [Fajita Vegetables, Lettu...
2624    [Fresh Tomato Salsa, [Black Beans, Sour Cream,...
2740    [Fresh Tomato Salsa, [Fajita Vegetables, Black...
2804    [Tomatillo Red Chili Salsa, [Rice, Black Beans...
2957    [Fresh Tomato Salsa, [Black Beans, Cheese, Gua...
3098    [Roasted Chili Corn Salsa, [Rice, Black Beans,...
3120    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
3350    [Fresh Tomato Salsa, [Cheese, Guacamole, Lettu...
3493    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4036    [Fresh Tomato Salsa, [Fajita Vegetables, Chees...
4241    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4313    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4391    [Fresh Tomato Salsa, [Black Beans, Pinto Beans...
4419    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4547    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4572    [Fresh Tomato Salsa, [Fajita Vegetables, Lettu...
Name: choice_description, dtype: object

chipo[chipo['choice_description'].isnull() & (chipo['quantity']==1)].sort_values(by='item_price')

chipo[chipo['choice_description'].isnull() & (chipo['quantity']==1)]['item_name'].unique()

array(['Chips and Fresh Tomato Salsa',
       'Chips and Tomatillo-Green Chili Salsa', 'Side of Chips',
       'Chips and Guacamole', 'Bottled Water',
       'Chips and Tomatillo Green Chili Salsa', 'Chips',
       'Chips and Tomatillo Red Chili Salsa',
       'Chips and Roasted Chili-Corn Salsa',
       'Chips and Roasted Chili Corn Salsa',
       'Chips and Tomatillo-Red Chili Salsa',
       'Chips and Mild Fresh Tomato Salsa'], dtype=object)

Many menu items never appear among these single-unit orders placed without options

chipo['item_name'].unique()

array(['Chips and Fresh Tomato Salsa', 'Izze', 'Nantucket Nectar',
       'Chips and Tomatillo-Green Chili Salsa', 'Chicken Bowl',
       'Side of Chips', 'Steak Burrito', 'Steak Soft Tacos',
       'Chips and Guacamole', 'Chicken Crispy Tacos',
       'Chicken Soft Tacos', 'Chicken Burrito', 'Canned Soda',
       'Barbacoa Burrito', 'Carnitas Burrito', 'Carnitas Bowl',
       'Bottled Water', 'Chips and Tomatillo Green Chili Salsa',
       'Barbacoa Bowl', 'Chips', 'Chicken Salad Bowl', 'Steak Bowl',
       'Barbacoa Soft Tacos', 'Veggie Burrito', 'Veggie Bowl',
       'Steak Crispy Tacos', 'Chips and Tomatillo Red Chili Salsa',
       'Barbacoa Crispy Tacos', 'Veggie Salad Bowl',
       'Chips and Roasted Chili-Corn Salsa',
       'Chips and Roasted Chili Corn Salsa', 'Carnitas Soft Tacos',
       'Chicken Salad', 'Canned Soft Drink', 'Steak Salad Bowl',
       '6 Pack Soft Drink', 'Chips and Tomatillo-Red Chili Salsa', 'Bowl',
       'Burrito', 'Crispy Tacos', 'Carnitas Crispy Tacos', 'Steak Salad',
       'Chips and Mild Fresh Tomato Salsa', 'Veggie Soft Tacos',
       'Carnitas Salad Bowl', 'Barbacoa Salad Bowl', 'Salad',
       'Veggie Crispy Tacos', 'Veggie Salad', 'Carnitas Salad'],
      dtype=object)

sum(chipo[chipo['item_name']=='Steak Salad Bowl']['item_price'])/sum(chipo[chipo['item_name']=='Steak Salad Bowl']['quantity'])

11.083548387096766

mean_price=[]

chipo['item_name'].unique()[0]

'Chips and Fresh Tomato Salsa'

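We cannot work out the price of each individual add-on, because we have no menu listing those prices. So let's compute the average price of each menu item instead. Every order's price necessarily includes the base price of the item, while the chosen options vary from customer to customer.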

for i in chipo['item_name'].unique():
    mean_price.append(sum(chipo[chipo['item_name']==i]['item_price'])/sum(chipo[chipo['item_name']==i]['quantity']))
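
The same per-item average (total revenue divided by total quantity) can be sketched with groupby instead of an explicit loop. The `orders` table below is a hypothetical miniature with made-up numbers, assuming prices have already been converted to float; it is not the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the orders table (prices already floats)
orders = pd.DataFrame({
    'item_name': ['Chips', 'Chips', 'Steak Salad Bowl', 'Steak Salad Bowl'],
    'quantity': [1, 2, 1, 1],
    'item_price': [2.0, 4.0, 12.0, 10.0],
})

# Sum price and quantity per item, then divide to get the mean unit price
totals = orders.groupby('item_name')[['item_price', 'quantity']].sum()
mean_price_by_item = totals['item_price'] / totals['quantity']

# idxmax returns the item name with the highest mean unit price
top_item = mean_price_by_item.idxmax()
print(mean_price_by_item)
print(top_item)
```

Because the result is a Series indexed by item name, the most expensive item can be read off directly instead of being looked up by position in unique().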

max(mean_price)

11.083548387096766

mean_price

[2.779692307692303,
 3.3899999999999997,
 3.39,
 2.39,
 9.648791064388927,
 1.6899999999999984,
 9.977797927461117,
 9.57232142857143,
 4.3498814229249145,
 9.442600000000002,
 9.234083333333334,
 9.43455160744498,
 1.090000000000002,
 9.832417582417586,
 9.963833333333334,
 10.376197183098595,
 1.4339336492890997,
 2.950000000000001,
 10.18727272727273,
 2.14930434782608,
 9.989837398373984,
 10.227104072398193,
 10.0184,
 9.636804123711343,
 9.97689655172414,
 9.926111111111112,
 2.9500000000000006,
 10.0175,
 10.13888888888889,
 2.39,
 2.950000000000001,
 9.3985,
 9.01,
 1.25,
 11.083548387096766,
 6.490000000000004,
 2.39,
 7.4,
 7.3999999999999995,
 7.4,
 9.745000000000001,
 8.915,
 3.0,
 9.245000000000001,
 11.056666666666667,
 10.64,
 7.4,
 8.49,
 8.49,
 8.99]

The menu item found this way

chipo['item_name'].unique()[-16]

'Steak Salad Bowl'

sum(chipo[chipo['item_name']=='Steak Salad Bowl']['quantity'])

31

chipo[chipo['item_name']=='Steak Salad Bowl']['choice_description']

281     [Tomatillo Red Chili Salsa, [Black Beans, Chee...
606     [Fresh Tomato Salsa, [Pinto Beans, Cheese, Gua...
607                         [Fresh Tomato Salsa, Lettuce]
613     [Tomatillo Red Chili Salsa, [Fajita Vegetables...
749     [Roasted Chili Corn Salsa, [Rice, Cheese, Lett...
1159    [Fresh Tomato Salsa, [Rice, Fajita Vegetables,...
1311    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
1505    [Fresh Tomato Salsa, [Rice, Pinto Beans, Chees...
1571    [Fresh Tomato Salsa, [Fajita Vegetables, Sour ...
1590    [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...
1816    [Fresh Tomato Salsa, [Rice, Black Beans, Chees...
2401    [Fresh Tomato Salsa, [Fajita Vegetables, Guaca...
2439    [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...
2600    [Fresh Tomato Salsa, [Fajita Vegetables, Lettu...
2624    [Fresh Tomato Salsa, [Black Beans, Sour Cream,...
2740    [Fresh Tomato Salsa, [Fajita Vegetables, Black...
2804    [Tomatillo Red Chili Salsa, [Rice, Black Beans...
2957    [Fresh Tomato Salsa, [Black Beans, Cheese, Gua...
3098    [Roasted Chili Corn Salsa, [Rice, Black Beans,...
3120    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
3350    [Fresh Tomato Salsa, [Cheese, Guacamole, Lettu...
3493    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4036    [Fresh Tomato Salsa, [Fajita Vegetables, Chees...
4241    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4313    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4391    [Fresh Tomato Salsa, [Black Beans, Pinto Beans...
4419    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4547    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4572    [Fresh Tomato Salsa, [Fajita Vegetables, Lettu...
Name: choice_description, dtype: object

# Look at another item with a similar price
chipo['item_name'].unique()[-6]

'Carnitas Salad Bowl'

The optional choices are similar

chipo[chipo['item_name']=='Carnitas Salad Bowl']['choice_description']

1132    [Fresh Tomato Salsa, [Rice, Black Beans, Chees...
1865     [Fresh Tomato Salsa, [Rice, Cheese, Sour Cream]]
2610    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
3115    [Tomatillo Green Chili Salsa, [Rice, Pinto Bea...
3749    [Roasted Chili Corn Salsa, [Fajita Vegetables,...
4239    [Tomatillo Green Chili Salsa, [Black Beans, Ch...
Name: choice_description, dtype: object

sum(chipo[chipo['item_name']=='Steak Salad Bowl']['quantity'])

31

pandas¶

https://pandas.pydata.org/

The big data era

A tool optimized for collecting and organizing data,
in support of the analysis process that extracts useful information from it

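pandas data structures¶

Data collected from various sources for analysis varies widely in form and attributes
Data of different kinds and formats must be unified into a structure with a common format so that a computer can process it
pandas provides two structured data formats: Series (1-dimensional) and DataFrame (2-dimensional)
Their purpose is to organize many different types of data into a common format
DataFrame : a 2-dimensional structure of rows and columns, widely used in practical data analysis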

1. Series¶

A 1-dimensional array in which data is laid out sequentially
Each index label corresponds one-to-one with a data value
Structurally similar to a Python dictionary

Dictionary ==> Series¶

pandas.Series(dictionary)

import pandas as pd

dict_data= {'a':1, 'b':2, 'c':3}
sr=pd.Series(dict_data)
print(type(sr))
print()
print(sr)

<class 'pandas.core.series.Series'>

a    1
b    2
c    3
dtype: int64

obj=pd.Series([4,7,-5,3]) # without an explicit index, the default labels 0, 1, 2, 3, ... are used
print(obj)

0    4
1    7
2   -5
3    3
dtype: int64

Series index / values¶

Series.index : the array of index labels
Series.values : the array of data values

print(obj.values)
print(obj.index)

[ 4  7 -5  3]
RangeIndex(start=0, stop=4, step=1)

import pandas as pd
obj2=pd.Series([4,7,-5,3], index=['d', 'b', 'a', 'c'])
print(obj2)
print(obj2.index)

d    4
b    7
a   -5
c    3
dtype: int64
Index(['d', 'b', 'a', 'c'], dtype='object')

import numpy as np 
import pandas as pd
list_A=np.array(list('abcdef'))
list_B= np.arange(10,70,10)

dict_data={key:value for key,value in zip(list_A,list_B)}
print(dict_data)

{'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50, 'f': 60}

sr=pd.Series(dict_data)
sr

a    10
b    20
c    30
d    40
e    50
f    60
dtype: int64

# More concise than the steps above
import numpy as np 
import pandas as pd
list_A=np.array(list('abcdef'))
list_B= np.arange(10,70,10)

sr=pd.Series(list_B, index=list_A)
for i in range(sr.size):
    key=sr.index[i]
    print("sr['{}'] : {} or sr[{}] : {}".format(key,sr[key],i, sr.values[i]))

sr['a'] : 10 or sr[0] : 10
sr['b'] : 20 or sr[1] : 20
sr['c'] : 30 or sr[2] : 30
sr['d'] : 40 or sr[3] : 40
sr['e'] : 50 or sr[4] : 50
sr['f'] : 60 or sr[5] : 60

print(sr['a'], sr[0], sr.values[0]) # all the same value

10 10 10

print(sr.index[0])

a

print(obj2)
print()
print(obj2[obj2>0])
print()
print(obj2*2)
print()
print(np.exp(obj2))

d    4
b    7
a   -5
c    3
dtype: int64

d    4
b    7
c    3
dtype: int64

d     8
b    14
a   -10
c     6
dtype: int64

d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64

print('b' in obj2)
print('e' in obj2)

True
False

sdata= {'ohio':35000, 'Texas':71000, 'Oregon':16000, 'Utah':5000}
obj3=pd.Series(sdata)
print(obj3)

ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

states= ['Callifornia', 'ohio','Texas','Oregon']
print(type(states))
obj4=pd.Series(sdata, index=states)
print(obj4)

<class 'list'>
Callifornia        NaN
ohio           35000.0
Texas          71000.0
Oregon         16000.0
dtype: float64

import pandas as pd
print(pd.isnull(obj4)) # is the value missing?
print()
print(pd.notnull(obj4)) # is the value present?

Callifornia     True
ohio           False
Texas          False
Oregon         False
dtype: bool

Callifornia    False
ohio            True
Texas           True
Oregon          True
dtype: bool

print(obj4.isnull())

Callifornia     True
ohio           False
Texas          False
Oregon         False
dtype: bool

print(obj3)
print()
print(obj4)
print()
print(obj3+obj4)

ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

Callifornia        NaN
ohio           35000.0
Texas          71000.0
Oregon         16000.0
dtype: float64

Callifornia         NaN
Oregon          32000.0
Texas          142000.0
Utah                NaN
ohio            70000.0
dtype: float64

#print(obj4.name)

obj4.name='population'
obj4.index.name='state' 
print(obj4)

state
Callifornia        NaN
ohio           35000.0
Texas          71000.0
Oregon         16000.0
Name: population, dtype: float64

print(obj)

0    4
1    7
2   -5
3    3
dtype: int64

obj.index=['Bob', 'Steve', 'Jeff', 'Ryan']
print(obj)

Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64

2. DataFrame¶

A 2-dimensional array
Derived from R's data frame
The same tabular form used in Excel, relational databases, and so on
Each column is a Series object

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'], # key: values
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

frame

frame.head()

frame.tail()

# The column order can be specified
pd.DataFrame(data, columns=['year','state','pop'])

Setting the row index / column names: pandas.DataFrame(2D array, index=array of row labels, columns=array of column names)¶

import pandas as pd
frame2= pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'], index=['one', 'two', 'three', 'four', 'five', 'six'])
frame2

Renaming row index labels: DataFrame.rename(index={old label: new label, ...})¶

Renaming columns: DataFrame.rename(columns={old name: new name, ...})¶

frame2.rename(columns={'year': 'YEA', 'state':'STA', 'pop':'POP', 'debt':'DEBT'}, inplace=True)

print(frame2.columns)

Index(['YEA', 'STA', 'POP', 'DEBT'], dtype='object')

frame2.rename(index={'one': '01', 'two':'02'}, inplace=True)

frame2

frame2['STA']

01         Ohio
02         Ohio
three      Ohio
four     Nevada
five     Nevada
six      Nevada
Name: STA, dtype: object

frame2.YEA

01       2000
02       2001
three    2002
four     2001
five     2002
six      2003
Name: YEA, dtype: int64

.iloc[[rows],[columns]]¶

Selects by integer position; only integers are allowed

.loc[[rows],[columns]]¶

Selects by the DataFrame's index labels; any label type can be used
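
One practical difference between the two accessors is how slices treat their end point. A minimal sketch on a hypothetical three-row frame (the names `df`, `by_label`, and `by_position` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'c'])

# Label-based slice: both end points ('a' and 'b') are included
by_label = df.loc['a':'b', 'x']

# Position-based slice: the stop position (1) is excluded, as in plain Python
by_position = df.iloc[0:1, 0]

print(by_label.tolist())
print(by_position.tolist())
```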

frame2

frame2.loc['three']

YEA     2002
STA     Ohio
POP      3.6
DEBT     NaN
Name: three, dtype: object

frame2.iloc[2]

YEA     2002
STA     Ohio
POP      3.6
DEBT     NaN
Name: three, dtype: object

frame2['DEB']=16.5 # assigns one scalar to the whole column ('DEB' is created as a new column; the existing one is named 'DEBT')
frame2

frame2['DEB']=np.arange(1,13,2)
frame2

val=pd.Series([-1.2,-1.5,-1.7], index=['02', 'four', 'six'])
frame2['DEB']=val
frame2

frame2['eastern']=frame2.STA=='Ohio'
frame2

frame2['Big_State']=(frame2.STA=='Ohio') & (frame2.POP>3.0)
frame2

del frame2['eastern']
frame2

del frame2['Big_State']
frame2

Nested dictionaries¶

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

frame3= pd.DataFrame(pop)
frame3

frame3.T

pd.DataFrame(pop, index=[2001,2002,2003])

frame3

print(frame3.iloc[0,0])
print(frame3.iloc[0,1])
print(frame3.iloc[1,0])
print(frame3.iloc[1,1])

2.4
1.7
2.9
3.6

frame3.iloc[0,[0,1]]

Nevada    2.4
Ohio      1.7
Name: 2001, dtype: float64

frame3.iloc[0,0:]

Nevada    2.4
Ohio      1.7
Name: 2001, dtype: float64

pdata= {'Ohio' : frame3['Ohio'][:-1], 'Nevada' : frame3['Nevada'][:-2]}
pd.DataFrame(pdata)

import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')

titanic.head()

titanic.tail()

df = titanic.loc[:,['age', 'fare']]

df.head()

df.tail()

df_add10= df+ 10

df_add10.head()

print(type(df_add10))

<class 'pandas.core.frame.DataFrame'>

df_sub= df_add10-df
df_sub

Index objects¶

obj=pd.Series(range(3), index=['a', 'b', 'c'])
index= obj.index
print(index)
index[1:]

Index(['a', 'b', 'c'], dtype='object')

Index(['b', 'c'], dtype='object')

import numpy as np
import pandas as pd 
labels=pd.Index(np.arange(3))
print(labels)
print()
obj2=pd.Series([1.5, -2.5, 0], index=labels)
print(obj2)

Int64Index([0, 1, 2], dtype='int64')

0    1.5
1   -2.5
2    0.0
dtype: float64

obj2.index is labels

True

dup_labels=pd.Index(['foo', 'foo', 'bar', 'bar']) # duplicate labels are allowed
dup_labels

Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

obj=pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

obj2=obj.reindex(['a','b','c','d','e'])
obj2

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

obj3=pd.Series(['blue', 'purple', 'yellow'], index=[0,2,4])
obj3

0      blue
2    purple
4    yellow
dtype: object

obj3.reindex(range(6), method="ffill") # fill missing entries with the preceding value (forward fill)

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object

import numpy as np 
import pandas as pd 

frame=pd.DataFrame(np.arange(9).reshape((3,3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
frame

frame2= frame.reindex(['a','b','c','d'])
frame2

states=['Texas', 'Utah', 'California']
frame.reindex(columns=states)

obj=pd.Series(np.arange(5.), index=['a','b','c','d','e'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

new_obj= obj.drop('c')
new_obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

new_obj2=obj.drop(['d', 'c'])
new_obj2

a    0.0
b    1.0
e    4.0
dtype: float64

data=pd.DataFrame(np.arange(16).reshape((4,4)), index= ['Ohio','Colorado','Utah', 'New York'] ,columns=['one', 'two','three', 'four'])
data

data.drop(['Colorado', 'Ohio']) # drop removes rows by default

data.drop('two', axis=1) # drop a column

data2= data.drop('two', axis=1)
data2.drop('Utah', axis=0)

data.drop(['two','four'], axis = 1)

data.drop('Ohio', axis='rows') # axis='rows' (or axis=0) is the default and can be omitted

data.drop('Ohio')

data

data3=data.copy() 
data3.drop("Ohio", inplace=True)
data3

Indexing¶

obj= pd.Series(np.arange(4.), index=['a','b','c','d'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

print(obj['b'], obj[1]); print()
print(obj[2:4])
print(obj[['b','a','d']])
print(obj[[1,3]]); print()
print((range(4),obj.index is obj))

1.0 1.0

c    2.0
d    3.0
dtype: float64
b    1.0
a    0.0
d    3.0
dtype: float64
b    1.0
d    3.0
dtype: float64

(range(0, 4), False)

obj['b':'c']=5
obj

a    0.0
b    5.0
c    5.0
d    3.0
dtype: float64

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

data['two']

Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

data[['three','one']]

data[:2]

data[data['three']>5]

data[data<5] =0 # data<5 is a boolean mask; cells where it is True are set to 0
data

data.loc['Colorado', ['two', 'three']]

two      5
three    6
Name: Colorado, dtype: int32

data

data.iloc[2,[3,0,1]]

four    11
one      8
two      9
Name: Utah, dtype: int32

data.iloc[[1,2], [3,0,1]]

data.loc[:'Utah', 'two']

Ohio        0
Colorado    5
Utah        9
Name: two, dtype: int32

data.iloc[:,:3][data.three>5]

ser = pd.Series(np.arange(3.))
ser

0    0.0
1    1.0
2    2.0
dtype: float64

ser[:1]
ser.loc[:1]
ser.iloc[:1]

0    0.0
dtype: float64

print(ser[:1]); print()
print(ser.loc[:1]); print() # here 1 is a label, so it is included
print(ser.iloc[:1])

0    0.0
dtype: float64

0    0.0
1    1.0
dtype: float64

0    0.0
dtype: float64

frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

np.abs(frame) # absolute value

np.random.randn : generates a matrix of random samples from the standard normal (Gaussian) distribution, with mean 0 and standard deviation 1

f=lambda x:x.max()-x.min()
frame.apply(f)

b    2.303884
d    1.460808
e    1.654872
dtype: float64

frame.apply(f, axis='columns')

Utah      0.858553
Ohio      1.616140
Texas     2.096070
Oregon    2.221988
dtype: float64

frame

def f(x):
    return pd.Series([x.min(), x.max()], index=['min','max'])
frame.apply(f)

obj = pd.Series(range(4), index=['d', 'a', 'b', 'c']) 
obj

d    0
a    1
b    2
c    3
dtype: int64

Sorting by index¶

obj.sort_index()

a    1
b    2
c    3
d    0
dtype: int64

frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c']) 
frame

frame.sort_index() # sort the rows (ascending)

frame.sort_index(axis=1) # sort the columns

frame.sort_index(axis='columns') # sort the columns

frame.sort_index(axis='columns',ascending=False) # descending order

frame.sort_index(axis='columns',ascending=True) # ascending order

obj = pd.Series([4, 7, -3, 2]) 
obj

0    4
1    7
2   -3
3    2
dtype: int64

obj.sort_values() # sort by value, smallest first

2   -3
3    2
0    4
1    7
dtype: int64

frame = pd.DataFrame({'b': [4, 7, -3, 8], 'a': [0, 1, 2, 3]})
frame

frame.sort_values(by=['b','a']) # sort by b first, then by a

frame.sort_values(by=['a','b']) # sort by a first, then by b

obj = pd.Series([7, -5, 7, 4, 2, 0, 4]) 
obj

0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

obj.rank() # rank (ties receive the average rank)

0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64

obj.rank(method='first') # ties are ranked in order of appearance (no shared ranks)

0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64

obj

0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

obj.rank(ascending=False, method='max') # e.g. rows 0 and 2 tie for first place; the default 'average' would give both 1.5, while 'max' gives both 2

0    2.0
1    7.0
2    2.0
3    4.0
4    5.0
5    6.0
6    4.0
dtype: float64
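
To compare the tie-breaking options side by side, the sketch below ranks the same seven values with several `method` settings and collects the results in one table. The `scores` and `ranks` names are introduced here only for illustration.

```python
import pandas as pd

scores = pd.Series([7, -5, 7, 4, 2, 0, 4])

# One column per tie-breaking method, all in ascending order
ranks = pd.DataFrame({
    'average': scores.rank(),                # ties share the mean rank
    'min': scores.rank(method='min'),        # ties share the lowest rank
    'max': scores.rank(method='max'),        # ties share the highest rank
    'first': scores.rank(method='first'),    # ties broken by order of appearance
    'dense': scores.rank(method='dense'),    # like 'min', but ranks have no gaps
})
print(ranks)
```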

frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],
                      'c': [-2, 5, 8, -2.5]})
frame

frame.rank(axis='columns') # rank the values across the columns within each row

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
obj

a    0
a    1
b    2
b    3
c    4
dtype: int64

obj.index.is_unique # False because the labels a and b are duplicated

False

obj['a']

a    0
a    1
dtype: int64

obj['c']

4

df = pd.DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
df

df.loc['b']

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])
df

df.sum()

one    9.25
two   -5.80
dtype: float64

df.sum(axis='columns')

a    1.40
b    2.60
c    0.00
d   -0.55
dtype: float64

df.mean(axis='columns', skipna=False) # skipna controls whether NaN values are skipped

a      NaN
b    1.300
c      NaN
d   -0.275
dtype: float64

df.mean(axis='columns',skipna=True) # skipna=True is the default and can be omitted

a    1.400
b    1.300
c      NaN
d   -0.275
dtype: float64

df.idxmax()

one    b
two    d
dtype: object

df.idxmin()

one    d
two    b
dtype: object

df

df

df.cumsum() # cumulative sum

df.describe()

Unique Values, Value Counts, and Membership¶

obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
obj

0    c
1    a
2    d
3    a
4    a
5    b
6    b
7    c
8    c
dtype: object

uniques= obj.unique()
uniques

array(['c', 'a', 'd', 'b'], dtype=object)

obj.value_counts()

a    3
c    3
b    2
d    1
dtype: int64

pd.value_counts(obj.values,sort=False)

c    3
b    2
a    3
d    1
dtype: int64

pd.value_counts(obj.values,sort=True)

a    3
c    3
b    2
d    1
dtype: int64

obj

0    c
1    a
2    d
3    a
4    a
5    b
6    b
7    c
8    c
dtype: object

mask=obj.isin(['b','c'])
mask

0     True
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8     True
dtype: bool

to_match = pd.Series(['c', 'a', 'b', 'b', 'c', 'a'])
to_match

0    c
1    a
2    b
3    b
4    c
5    a
dtype: object

unique_vals = pd.Series(['c', 'b', 'a'])
unique_vals

0    c
1    b
2    a
dtype: object

pd.Index(unique_vals).get_indexer(to_match) # map each value of to_match to its position in unique_vals (c=0, b=1, a=2)

array([0, 2, 1, 1, 0, 2], dtype=int64)

data = pd.DataFrame({'Qu1': [5, 1, 4, 5, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

data

data['Qu1'].value_counts()

5    2
4    2
1    1
Name: Qu1, dtype: int64

data['Qu1'].value_counts()[:1]

5    2
Name: Qu1, dtype: int64

data['Qu1'].value_counts()[1:]

4    2
1    1
Name: Qu1, dtype: int64

result = data.apply(pd.value_counts).fillna(0) # count the values in each column; fillna(0) replaces missing counts with 0
result

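isin : returns a boolean array indicating whether each element of the Series is contained in the passed sequence of values
match : computes, for each value, its integer index in an array of unique values (done above with pd.Index.get_indexer)
unique : returns an array of the unique values in the Series, with duplicates removed
value_counts : returns a Series whose index holds the unique values and whose values are their frequencies (sorted by frequency, descending)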
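
The four operations summarized above can be tried together on one small Series. This is only a sketch; the Series `s` and the lookup list are hypothetical. Note that get_indexer marks values missing from the lookup index with -1.

```python
import pandas as pd

s = pd.Series(['c', 'a', 'd', 'a', 'b'])

mask = s.isin(['a', 'b'])                          # membership test per element
codes = pd.Index(['a', 'b', 'c']).get_indexer(s)   # position in the lookup index, -1 if absent
uniques = s.unique()                               # unique values in order of first appearance
counts = s.value_counts()                          # frequency of each unique value

print(mask.tolist())
print(codes.tolist())
print(list(uniques))
print(counts['a'])
```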

vectorize¶

import numpy as np

matrix= np.arange(1,10).reshape(3,3)
print(matrix)

[[1 2 3]
 [4 5 6]
 [7 8 9]]

add_100= lambda i:i+100

vectorized_add_100= np.vectorize(add_100)

vectorized_add_100(matrix)

array([[101, 102, 103],
       [104, 105, 106],
       [107, 108, 109]])

find_odd = lambda i:i%2*i
vectorized_find_odd= np.vectorize(find_odd)

vectorized_find_odd(matrix)

array([[1, 0, 3],
       [0, 5, 0],
       [7, 0, 9]])

find_odd = lambda i : i if i%2==1 else 0
find_even = lambda i : i if i%2==0 else 0  # keep even values, zero out odd ones

vectorized_find_even= np.vectorize(find_even)
vectorized_find_even(matrix)

array([[0, 2, 0],
       [4, 0, 6],
       [0, 8, 0]])

import numpy as np

matrix= np.arange(1,10).reshape(3,3)
print(matrix)

print(np.max(matrix))
print()
print(np.min(matrix))
print()
print(np.min(matrix, axis=1)) # minimum of each row
print()
print(np.min(matrix, axis=0)) # minimum of each column

[[1 2 3]
 [4 5 6]
 [7 8 9]]
9

1

[1 4 7]

[1 2 3]

vector_column= np.max(matrix, axis=1, keepdims=True) 
print(vector_column)
print(vector_column.shape)

[[3]
 [6]
 [9]]
(3, 1)

print(matrix.shape)
print(matrix-vector_column)

(3, 3)
[[-2 -1  0]
 [-2 -1  0]
 [-2 -1  0]]

Mean, variance, standard deviation¶

import numpy as np

matrix= np.arange(1,10).reshape(3,3)
print(matrix)

print("Mean: ",np.mean(matrix))
print("Variance: ",np.var(matrix))
print("Standard deviation: ",np.std(matrix))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
Mean:  5.0
Variance:  6.666666666666667
Standard deviation:  2.581988897471611

print("Mean : ", np.mean(matrix))
print("Column means : ",np.mean(matrix,axis=0)) # mean of each column

Mean :  5.0
Column means :  [4. 5. 6.]

print(np.std(matrix))
print(np.std(matrix, ddof=1))

2.581988897471611
2.7386127875258306
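
The two results differ only in the divisor: with the default ddof=0 the sum of squared deviations is divided by n (population standard deviation), and with ddof=1 it is divided by n - 1 (sample standard deviation). A small sketch that reproduces both numbers by hand, using a flattened copy of the same values (the variable names are made up for illustration):

```python
import numpy as np

values = np.arange(1, 10)            # same numbers as the 3x3 matrix, flattened
n = values.size
squared_dev = (values - values.mean()) ** 2

pop_std = np.sqrt(squared_dev.sum() / n)           # ddof=0: divide by n
sample_std = np.sqrt(squared_dev.sum() / (n - 1))  # ddof=1: divide by n - 1

print(pop_std, sample_std)
```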

Matrix transpose¶

import numpy as np 
matrix= np.arange(1,10).reshape(3,3)
print(matrix)
print()
print(matrix.T)

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[[1 4 7]
 [2 5 8]
 [3 6 9]]

np.array([[1,2,3,4,5,6,7]]).T

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7]])

matrix=([[1,1,1], [1,1,10], [1,1,15]])
print(matrix)
np.linalg.matrix_rank(matrix)

[[1, 1, 1], [1, 1, 10], [1, 1, 15]]

2

inverse matrix¶

matrix=np.array([[1,4], [2,5]])
print(matrix)

[[1 4]
 [2 5]]

# inverse matrix
inv_matrix= np.linalg.inv(matrix)
print(inv_matrix)

[[-1.66666667  1.33333333]
 [ 0.66666667 -0.33333333]]

matrix @ inv_matrix

array([[1., 0.],
       [0., 1.]])

matrix= np.arange(1,7).reshape(2,3)
print(matrix)

[[1 2 3]
 [4 5 6]]

inv_matrix= np.linalg.pinv(matrix) # pseudo-inverse
print(inv_matrix)

[[-0.94444444  0.44444444]
 [-0.11111111  0.11111111]
 [ 0.72222222 -0.22222222]]

matrix@inv_matrix

array([[1.00000000e+00, 2.22044605e-16],
       [0.00000000e+00, 1.00000000e+00]])

print(np.round(matrix @ inv_matrix,0))

[[1. 0.]
 [0. 1.]]
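
For a non-square matrix the pseudo-inverse behaves like an inverse on one side only. The sketch below, using the same 2x3 matrix under the hypothetical names `A` and `A_pinv`, checks that A @ A_pinv is the 2x2 identity while A_pinv @ A is a 3x3 matrix that is not the identity; np.allclose is used instead of rounding to absorb floating-point error.

```python
import numpy as np

A = np.arange(1, 7).reshape(2, 3)
A_pinv = np.linalg.pinv(A)            # shape (3, 2)

right = A @ A_pinv                    # 2x2, equals the identity up to rounding
left = A_pinv @ A                     # 3x3, a projection rather than the identity

print(np.allclose(right, np.eye(2)))
print(np.allclose(left, np.eye(3)))
print(np.allclose(A @ A_pinv @ A, A)) # defining property of the pseudo-inverse
```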

import random
a= np.arange(5)
print(a)
np.random.shuffle(a)
print(a)

[0 1 2 3 4]
[3 0 1 2 4]

import random
a= np.arange(5)
print(a) 
b= np.random.permutation(a)
print(b)

[0 1 2 3 4]
[2 4 1 0 3]

print(np.random.permutation(5))

[0 4 1 2 3]

import matplotlib.pyplot as plt 
import numpy as np 
%matplotlib inline 
# display plots inline in the Jupyter notebook

# create the data
np.random.seed(1)
x= np.arange(10)
y= np.random.rand(10) # 10 values between 0 and 1

plt.plot(x,y)
plt.show()

%reset

import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline 

def f(x):
    return(x-2)*x*(x+2)

print(f(1))

-3

print(f(np.array([1,2,3])))

[-3  0 15]

x= np.arange(-3,3.5,0.5)
print(x)

[-3.  -2.5 -2.  -1.5 -1.  -0.5  0.   0.5  1.   1.5  2.   2.5  3. ]

np.linspace(start,stop,num,endpoint=True)¶

endpoint=True : the value given as stop is included
endpoint=False : the value given as stop is not included
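
A short sketch of the effect of endpoint, using four points on the interval from -3 to 3 (the variable names are chosen here for illustration):

```python
import numpy as np

# endpoint=True (default): stop is the last sample, spacing is 6 / (4 - 1) = 2
with_stop = np.linspace(-3, 3, 4)

# endpoint=False: stop is excluded, spacing is 6 / 4 = 1.5
without_stop = np.linspace(-3, 3, 4, endpoint=False)

print(with_stop)
print(without_stop)
```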

x=np.linspace(-3,3,10)
print(np.round(x,2))

[-3.   -2.33 -1.67 -1.   -0.33  0.33  1.    1.67  2.33  3.  ]

plt.plot(x,f(x))
plt.show()

import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline 

# define the function
def f2(x,w):
    return(x-w)*x*(x+w)

# define x
x=np.linspace(-3,3,100)

# draw the chart
plt.plot(x,f2(x,2), color="black", label="w=2")
plt.plot(x,f2(x,1), color="cornflowerblue", label="w=1")
plt.legend(loc="upper left") # show the legend
plt.ylim(-15,15) # set the y-axis range
plt.title('$f_2(x)$') # title; without the $ signs it is shown literally as f_2(x)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()

import matplotlib 
matplotlib.colors.cnames

{'aliceblue': '#F0F8FF',
 'antiquewhite': '#FAEBD7',
 'aqua': '#00FFFF',
 'aquamarine': '#7FFFD4',
 'azure': '#F0FFFF',
 'beige': '#F5F5DC',
 'bisque': '#FFE4C4',
 'black': '#000000',
 'blanchedalmond': '#FFEBCD',
 'blue': '#0000FF',
 'blueviolet': '#8A2BE2',
 'brown': '#A52A2A',
 'burlywood': '#DEB887',
 'cadetblue': '#5F9EA0',
 'chartreuse': '#7FFF00',
 'chocolate': '#D2691E',
 'coral': '#FF7F50',
 'cornflowerblue': '#6495ED',
 'cornsilk': '#FFF8DC',
 'crimson': '#DC143C',
 'cyan': '#00FFFF',
 'darkblue': '#00008B',
 'darkcyan': '#008B8B',
 'darkgoldenrod': '#B8860B',
 'darkgray': '#A9A9A9',
 'darkgreen': '#006400',
 'darkgrey': '#A9A9A9',
 'darkkhaki': '#BDB76B',
 'darkmagenta': '#8B008B',
 'darkolivegreen': '#556B2F',
 'darkorange': '#FF8C00',
 'darkorchid': '#9932CC',
 'darkred': '#8B0000',
 'darksalmon': '#E9967A',
 'darkseagreen': '#8FBC8F',
 'darkslateblue': '#483D8B',
 'darkslategray': '#2F4F4F',
 'darkslategrey': '#2F4F4F',
 'darkturquoise': '#00CED1',
 'darkviolet': '#9400D3',
 'deeppink': '#FF1493',
 'deepskyblue': '#00BFFF',
 'dimgray': '#696969',
 'dimgrey': '#696969',
 'dodgerblue': '#1E90FF',
 'firebrick': '#B22222',
 'floralwhite': '#FFFAF0',
 'forestgreen': '#228B22',
 'fuchsia': '#FF00FF',
 'gainsboro': '#DCDCDC',
 'ghostwhite': '#F8F8FF',
 'gold': '#FFD700',
 'goldenrod': '#DAA520',
 'gray': '#808080',
 'green': '#008000',
 'greenyellow': '#ADFF2F',
 'grey': '#808080',
 'honeydew': '#F0FFF0',
 'hotpink': '#FF69B4',
 'indianred': '#CD5C5C',
 'indigo': '#4B0082',
 'ivory': '#FFFFF0',
 'khaki': '#F0E68C',
 'lavender': '#E6E6FA',
 'lavenderblush': '#FFF0F5',
 'lawngreen': '#7CFC00',
 'lemonchiffon': '#FFFACD',
 'lightblue': '#ADD8E6',
 'lightcoral': '#F08080',
 'lightcyan': '#E0FFFF',
 'lightgoldenrodyellow': '#FAFAD2',
 'lightgray': '#D3D3D3',
 'lightgreen': '#90EE90',
 'lightgrey': '#D3D3D3',
 'lightpink': '#FFB6C1',
 'lightsalmon': '#FFA07A',
 'lightseagreen': '#20B2AA',
 'lightskyblue': '#87CEFA',
 'lightslategray': '#778899',
 'lightslategrey': '#778899',
 'lightsteelblue': '#B0C4DE',
 'lightyellow': '#FFFFE0',
 'lime': '#00FF00',
 'limegreen': '#32CD32',
 'linen': '#FAF0E6',
 'magenta': '#FF00FF',
 'maroon': '#800000',
 'mediumaquamarine': '#66CDAA',
 'mediumblue': '#0000CD',
 'mediumorchid': '#BA55D3',
 'mediumpurple': '#9370DB',
 'mediumseagreen': '#3CB371',
 'mediumslateblue': '#7B68EE',
 'mediumspringgreen': '#00FA9A',
 'mediumturquoise': '#48D1CC',
 'mediumvioletred': '#C71585',
 'midnightblue': '#191970',
 'mintcream': '#F5FFFA',
 'mistyrose': '#FFE4E1',
 'moccasin': '#FFE4B5',
 'navajowhite': '#FFDEAD',
 'navy': '#000080',
 'oldlace': '#FDF5E6',
 'olive': '#808000',
 'olivedrab': '#6B8E23',
 'orange': '#FFA500',
 'orangered': '#FF4500',
 'orchid': '#DA70D6',
 'palegoldenrod': '#EEE8AA',
 'palegreen': '#98FB98',
 'paleturquoise': '#AFEEEE',
 'palevioletred': '#DB7093',
 'papayawhip': '#FFEFD5',
 'peachpuff': '#FFDAB9',
 'peru': '#CD853F',
 'pink': '#FFC0CB',
 'plum': '#DDA0DD',
 'powderblue': '#B0E0E6',
 'purple': '#800080',
 'rebeccapurple': '#663399',
 'red': '#FF0000',
 'rosybrown': '#BC8F8F',
 'royalblue': '#4169E1',
 'saddlebrown': '#8B4513',
 'salmon': '#FA8072',
 'sandybrown': '#F4A460',
 'seagreen': '#2E8B57',
 'seashell': '#FFF5EE',
 'sienna': '#A0522D',
 'silver': '#C0C0C0',
 'skyblue': '#87CEEB',
 'slateblue': '#6A5ACD',
 'slategray': '#708090',
 'slategrey': '#708090',
 'snow': '#FFFAFA',
 'springgreen': '#00FF7F',
 'steelblue': '#4682B4',
 'tan': '#D2B48C',
 'teal': '#008080',
 'thistle': '#D8BFD8',
 'tomato': '#FF6347',
 'turquoise': '#40E0D0',
 'violet': '#EE82EE',
 'wheat': '#F5DEB3',
 'white': '#FFFFFF',
 'whitesmoke': '#F5F5F5',
 'yellow': '#FFFF00',
 'yellowgreen': '#9ACD32'}

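Displaying multiple graphs¶

subplot
plt.subplot(n1, n2, n)
n1: number of rows in the grid of plots
n2: number of columns in the grid of plots
n : position of the current plot (counted from the top left, moving right: 1, 2, 3, ...)
Note: n starts at 1, not 0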
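The numbering rule for n can be checked with plain arithmetic. The sketch below is only an illustration: subplot_position is a hypothetical helper, not a matplotlib function, and it assumes the row-major numbering described above.

```python
def subplot_position(n1, n2, n):
    """Return the zero-based (row, column) cell that subplot index n refers to."""
    if not 1 <= n <= n1 * n2:
        raise ValueError("n must be between 1 and n1 * n2")
    # n counts from 1, left to right and then top to bottom
    return divmod(n - 1, n2)

# A 2 x 3 grid: indices 1..3 fill the first row, 4..6 the second
for n in range(1, 7):
    print(n, subplot_position(2, 3, n))
```

For a 2 x 3 grid, index 4 is the first cell of the second row, which is why the loop over range(6) in the notebook passes i+1.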

import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline 

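# x and f2 are assumed to be defined in an earlier cell
plt.figure(figsize=(10,5)) # create the figure
plt.subplots_adjust(wspace=0.3, hspace=0.3) # set the spacing between plots
for i in range(6):
    plt.subplot(2,3,i+1) # position of the plot: 2 rows x 3 columns, numbering starts at 1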
    plt.title(i+1)
    plt.plot(x, f2(x,i), color="hotpink")
    plt.ylim(-25,25)
    plt.grid(True)
plt.show()

import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline 

def f3(x0,x1):
    r= 2 * x0**2 + x1**2 
    ans= r * np.exp(-r)
    return ans 
xn=50
x0= np.linspace(-2,2,xn)
x1= np.linspace(-2,2,xn)
y= np.zeros((len(x0), len(x1))) # start from an array filled with zeros
for i0 in range(xn):
    for i1 in range(xn):
        y[i1, i0]= f3(x0[i0], x1[i1])

print(x0)

[-2.         -1.91836735 -1.83673469 -1.75510204 -1.67346939 -1.59183673
 -1.51020408 -1.42857143 -1.34693878 -1.26530612 -1.18367347 -1.10204082
 -1.02040816 -0.93877551 -0.85714286 -0.7755102  -0.69387755 -0.6122449
 -0.53061224 -0.44897959 -0.36734694 -0.28571429 -0.20408163 -0.12244898
 -0.04081633  0.04081633  0.12244898  0.20408163  0.28571429  0.36734694
  0.44897959  0.53061224  0.6122449   0.69387755  0.7755102   0.85714286
  0.93877551  1.02040816  1.10204082  1.18367347  1.26530612  1.34693878
  1.42857143  1.51020408  1.59183673  1.67346939  1.75510204  1.83673469
  1.91836735  2.        ]

print(y)

[[7.37305482e-05 1.32338877e-04 2.31126710e-04 ... 2.31126710e-04
  1.32338877e-04 7.37305482e-05]
 [9.88167049e-05 1.77092462e-04 3.08776613e-04 ... 3.08776613e-04
  1.77092462e-04 9.88167049e-05]
 [1.30739998e-04 2.33937265e-04 4.07205749e-04 ... 4.07205749e-04
  2.33937265e-04 1.30739998e-04]
 ...
 [1.30739998e-04 2.33937265e-04 4.07205749e-04 ... 4.07205749e-04
  2.33937265e-04 1.30739998e-04]
 [9.88167049e-05 1.77092462e-04 3.08776613e-04 ... 3.08776613e-04
  1.77092462e-04 9.88167049e-05]
 [7.37305482e-05 1.32338877e-04 2.31126710e-04 ... 2.31126710e-04
  1.32338877e-04 7.37305482e-05]]

print(np.round(y,1))

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline

plt.figure(figsize=(7,5))
plt.gray()
plt.pcolor(y)
plt.colorbar()
plt.show()

import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline

plt.figure(figsize=(7,5))
plt.pcolor(y,cmap='hot')
plt.colorbar()
plt.show()

surface¶

from mpl_toolkits.mplot3d import Axes3D

xx0,xx1 = np.meshgrid(x0,x1)

plt.figure(figsize=(5,3.5))
ax = plt.subplot(1,1,1,projection='3d')
ax.plot_surface(xx0,xx1,y,rstride=1,cstride=1,alpha=0.3,color='blue',edgecolor='black')
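# arguments: x values, y values, z values, rstride = row step size (1: draw every row, 2: draw every other row), cstride = column step size, alpha = opacity, color = surface color, edgecolor = line color

ax.set_zticks((0,0.2))  # place z-axis ticks only at 0 and 0.2
ax.view_init(75,-95)    # ax.view_init(argument 1, argument 2)
plt.show()              # argument 1: vertical rotation angle (0: side view, 90: top view)
                        # argument 2: horizontal rotation angle (positive: clockwise)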

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

xn = 50
x0 = np.linspace(-2,2,xn)
x1 = np.linspace(-2,2,xn)
y = np.zeros((len(x0),len(x1)))

for i0 in range(xn):
    for i1 in range(xn):
        y[i1, i0] = f3(x0[i0], x1[i1])  # rows follow x1 and columns follow x0, matching np.meshgrid(x0, x1)
        
xx0, xx1 = np.meshgrid(x0,x1)
plt.figure(1, figsize= (4,4))
cont = plt.contour(xx0,xx1,y,6, colors='black')
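# the argument 6 sets the number of contour levels (matplotlib picks about that many)

cont.clabel(fmt='%3.2f',fontsize=8) 
# the return value of plt.contour is stored in cont, and cont.clabel labels the contour lines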

plt.xlabel("$x_0$", fontsize=14)
plt.ylabel("$x_1$", fontsize=14)
plt.show()

import numpy as np 
A= np.arange(0,15,2)
print(A)

[ 0  2  4  6  8 10 12 14]

print(A.shape)
for i in range(A.size):
    print("A[{}] : {}".format(i,A[i]))

(8,)
A[0] : 0
A[1] : 2
A[2] : 4
A[3] : 6
A[4] : 8
A[5] : 10
A[6] : 12
A[7] : 14

# print with negative indices, from -1 down to -8
print(A.shape)
for i in range(A.size):
    print("A[{}] : {}".format(-(i+1),A[-(i+1)]))

(8,)
A[-1] : 14
A[-2] : 12
A[-3] : 10
A[-4] : 8
A[-5] : 6
A[-6] : 4
A[-7] : 2
A[-8] : 0

import numpy as  np

A= np.arange(12).reshape(3,4)
print(A)
print("A.ndim:",A.ndim)
print(A.shape, A.shape[0], A.shape[1])

for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        print("A[{0}][{1}] : {2} ".format(i,j,A[i][j]), end=" ")
    print()
    
print()
j = A.shape[1] - 1  # last column index (the value j is left with after the loop above)
for i in range(A.shape[0]):
    print("A[{0}][{1}] : {2:2d}".format(i,j,A[i][j]), end=" ")

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
A.ndim: 2
(3, 4) 3 4
A[0][0] : 0  A[0][1] : 1  A[0][2] : 2  A[0][3] : 3  
A[1][0] : 4  A[1][1] : 5  A[1][2] : 6  A[1][3] : 7  
A[2][0] : 8  A[2][1] : 9  A[2][2] : 10  A[2][3] : 11  

A[0][3] :  3 A[1][3] :  7 A[2][3] : 11

import numpy as  np

A= np.arange(9).reshape(3,3)
print(A)
print()
for row in A: # A => [[0,1,2], [3,4,5], [6,7,8]]
    print(row)
print()
print(A.T) # transpose: swaps rows and columns
print()
for column in A.T:
    print(column)

[[0 1 2]
 [3 4 5]
 [6 7 8]]

[0 1 2]
[3 4 5]
[6 7 8]

[[0 3 6]
 [1 4 7]
 [2 5 8]]

[0 3 6]
[1 4 7]
[2 5 8]

flat¶

import numpy as  np
A=np.arange(9).reshape(3,3)
print(A)
print()
for a in A.flat: # iterates over every element, one at a time
    print(a, end=" ")

[[0 1 2]
 [3 4 5]
 [6 7 8]]

0 1 2 3 4 5 6 7 8

import numpy as  np
A= np.arange(0,20,2) # create a 1-D array
print(A)

[ 0  2  4  6  8 10 12 14 16 18]

print(A[:]) # all elements
print(A[0:3])
print(A[:3]) # a start index of 0 can be omitted

[ 0  2  4  6  8 10 12 14 16 18]
[0 2 4]
[0 2 4]

print(A[7:10])
print(A[7:]) # the end index can be omitted

[14 16 18]
[14 16 18]

print(A[::2]) # every second element, starting from the first

[ 0  4  8 12 16]

print(A[:-2]) # everything except the last 2 elements
print(A[-10:-2])

print(A[-2:]) # only the last 2 elements

[ 0  2  4  6  8 10 12 14]
[ 0  2  4  6  8 10 12 14]
[16 18]

print(A[0:3].shape)
print(A[0:3].ndim)

(3,)
1

print(A)
A[0:3]=100
print(A)

[ 0  2  4  6  8 10 12 14 16 18]
[100 100 100   6   8  10  12  14  16  18]

import numpy as  np
A=np.arange(1,13).reshape(3,4)
print(A)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Accessing individual elements¶

A[row][col] or A[row,col]

for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        print("A[{0}][{1}] : {2:2d} or A[{3}, {4}] : {5:2d}".format(i, j, A[i][j], i, j, A[i, j]))

A[0][0] :  1 or A[0, 0] :  1
A[0][1] :  2 or A[0, 1] :  2
A[0][2] :  3 or A[0, 2] :  3
A[0][3] :  4 or A[0, 3] :  4
A[1][0] :  5 or A[1, 0] :  5
A[1][1] :  6 or A[1, 1] :  6
A[1][2] :  7 or A[1, 2] :  7
A[1][3] :  8 or A[1, 3] :  8
A[2][0] :  9 or A[2, 0] :  9
A[2][1] : 10 or A[2, 1] : 10
A[2][2] : 11 or A[2, 2] : 11
A[2][3] : 12 or A[2, 3] : 12

for i in range(A.shape[0]):
     print("A[{0}] : {1} or A[{2},:] : {3}".format(i,A[i],i,A[i,:]))

A[0] : [1 2 3 4] or A[0,:] : [1 2 3 4]
A[1] : [5 6 7 8] or A[1,:] : [5 6 7 8]
A[2] : [ 9 10 11 12] or A[2,:] : [ 9 10 11 12]

print(A)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

print(A[:2,:2])

[[1 2]
 [5 6]]

A[:2,:2]=0
print(A)

[[ 0  0  3  4]
 [ 0  0  7  8]
 [ 9 10 11 12]]

import numpy as  np
A=np.arange(1,13).reshape(3,4)
print(A)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

print(A[0])
print(A[:,1])
print(A[...,1])

[1 2 3 4]
[ 2  6 10]
[ 2  6 10]

import numpy as  np
A=np.array([2,4,6,8]).reshape(2,2)
B=np.array([2,2,2,2]).reshape(2,2)
print(A)
print(B)

[[2 4]
 [6 8]]
[[2 2]
 [2 2]]

print(A+B)
print()
print(A-B)
print()
print(A*B)
print()
print(A/B)

[[ 4  6]
 [ 8 10]]

[[0 2]
 [4 6]]

[[ 4  8]
 [12 16]]

[[1. 2.]
 [3. 4.]]

print(A@B)

[[12 12]
 [28 28]]

print(np.dot(A,B))
print(A.dot(B))

[[12 12]
 [28 28]]
[[12 12]
 [28 28]]

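Arithmetic between arrays of different shapes (broadcasting)¶

A.shape= (2,3,4,5)
B.shape= (5,) => prepend 1s to the left of B's shape until both arrays have the same number of dimensions
B.shape= (1,1,1,5)

Compare the two shapes dimension by dimension.
Broadcasting is possible only when each pair of corresponding dimensions is equal or one of them is 1.
Note) Matrix multiplication rule: an (m x n) matrix times an (n x a) matrix gives an (m x a) matrix.
The element-wise broadcasting rule above does not apply to the matrix product itself, which follows the matrix multiplication rule instead.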

1Q)
A.shape : (2,1,3,1)
B.shape : (6,1,3)

=> Can these be broadcast?

1A)
Pad B's shape so that it has the same number of dimensions as A
A.shape : (2,1,3,1)
B.shape : (6,1,3)
B.shape : (1,6,1,3)

-> Yes. (result shape: (2,6,3,3))

2Q)
A.shape : (2,3,4)
B.shape : (3,1)

=> Can these be broadcast?

2A)
A.shape : (2,3,4)
B.shape : (3,1)
B.shape : (1,3,1)

-> Yes. (result shape: (2,3,4))

3Q)
A.shape : (3,)
B.shape : (4,)
=> Can these be broadcast?

3A)
A.shape : (3,)
B.shape : (4,)
-> No. 3 and 4 differ and neither is 1.

4Q)
A.shape : (3,3,5)
B.shape : ()
=> Can these be broadcast?

4A)
A.shape : (3,3,5)
B.shape : (1,1,1)

-> Yes. A scalar broadcasts against any shape.

+ (extra case)
A.shape: (3,4)
B.shape: (3,) => (1,3)

-> No. The last dimensions, 4 and 3, differ and neither is 1.
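The rule used in the questions above can be written out as a few lines of plain Python. This is a minimal sketch of the rule as stated in these notes, not NumPy's own implementation; broadcast_shape is a hypothetical helper name.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if the shapes are incompatible."""
    ndim = max(len(shape_a), len(shape_b))
    # Step 1: pad the shorter shape with 1s on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Step 2: each pair of dimensions must be equal, or one of them must be 1
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            return None
    return tuple(result)

print(broadcast_shape((2, 1, 3, 1), (6, 1, 3)))  # question 1
print(broadcast_shape((2, 3, 4), (3, 1)))        # question 2
print(broadcast_shape((3,), (4,)))               # question 3
print(broadcast_shape((3, 3, 5), ()))            # question 4
print(broadcast_shape((3, 4), (3,)))             # extra case
```

A return value of None corresponds to the cases where NumPy raises ValueError.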

import numpy as  np
A=np.arange(10,130,10).reshape(3,4)
print(A)
print(A.shape)
print()
B=np.arange(1,5)
print(B)
print(B.shape)

[[ 10  20  30  40]
 [ 50  60  70  80]
 [ 90 100 110 120]]
(3, 4)

[1 2 3 4]
(4,)

A+B

array([[ 11,  22,  33,  44],
       [ 51,  62,  73,  84],
       [ 91, 102, 113, 124]])

import numpy as  np
A=np.arange(10,130,10).reshape(3,4)
print(A)
print(A.shape)
print()
B=np.arange(1,4)
print(B)
print(B.shape)

[[ 10  20  30  40]
 [ 50  60  70  80]
 [ 90 100 110 120]]
(3, 4)

[1 2 3]
(3,)

A+B # cannot be broadcast

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-139-c8aa45903e18> in <module>
----> 1 A+B # cannot be broadcast

ValueError: operands could not be broadcast together with shapes (3,4) (3,)

import numpy as  np
A= np.arange(10,130,10).reshape(3,4)
print(A)
print()
B=5
A+B

[[ 10  20  30  40]
 [ 50  60  70  80]
 [ 90 100 110 120]]

array([[ 15,  25,  35,  45],
       [ 55,  65,  75,  85],
       [ 95, 105, 115, 125]])

A*B

array([[ 50, 100, 150, 200],
       [250, 300, 350, 400],
       [450, 500, 550, 600]])

import numpy as np
A = np.arange(10,130,10).reshape(3,4)
print(A)
print(A.shape)
B = np.arange(1,5)
print(B)
print(B.shape)

[[ 10  20  30  40]
 [ 50  60  70  80]
 [ 90 100 110 120]]
(3, 4)
[1 2 3 4]
(4,)

A*B

array([[ 10,  40,  90, 160],
       [ 50, 120, 210, 320],
       [ 90, 200, 330, 480]])

A@B

array([ 300,  700, 1100])

B= np.array([1, 2, 3])
print(B.shape)

(3,)

A= np.array([[1], [2], [3]])
print(A.shape)

(3, 1)

C= np.array([[[1, 2, 3, 4], [4, 5, 6,7],[1, 2, 3, 4]], [[1, 2, 3, 4],[1, 2, 3, 4], [4, 5, 6,7]], ])
print(C.shape)

(2, 3, 4)

A*C

array([[[ 1,  2,  3,  4],
        [ 8, 10, 12, 14],
        [ 3,  6,  9, 12]],

       [[ 1,  2,  3,  4],
        [ 2,  4,  6,  8],
        [12, 15, 18, 21]]])

B*C

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-181-03507f8da0a7> in <module>
----> 1 B*C

ValueError: operands could not be broadcast together with shapes (3,) (2,3,4)

C*B

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-182-a5fce2007bf4> in <module>
----> 1 C*B

ValueError: operands could not be broadcast together with shapes (2,3,4) (3,)

A = np.arange(10,130,10).reshape(3,4)  # use the (3, 4) array from the earlier cells again
C*A

array([[[ 10,  40,  90, 160],
        [200, 300, 420, 560],
        [ 90, 200, 330, 480]],

       [[ 10,  40,  90, 160],
        [ 50, 120, 210, 320],
        [360, 500, 660, 840]]])
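When a 1-D array of length 3 has to be multiplied along the middle axis of a (2, 3, 4) array, one common fix is to give it a trailing axis of length 1. The sketch below assumes small example arrays chosen for illustration; only the shapes match the cells above.

```python
import numpy as np

B = np.array([1, 2, 3])              # shape (3,)
C = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4)

# (3,) is padded to (1, 1, 3); the last dimensions 3 and 4 clash, so B * C fails
try:
    B * C
    failed = False
except ValueError:
    failed = True

# A trailing axis gives shape (3, 1), padded to (1, 3, 1), which is compatible
B_col = B[:, np.newaxis]
result = B_col * C

print(failed, B_col.shape, result.shape)
```

Each row j of every 3 x 4 block in C is then scaled by B[j].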

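Installing NumPy¶

A library for numerical computing in Python
Provides an efficient multidimensional array (the NumPy array) and fast routines for operations between such arrays
pip install numpy
conda install numpy

import numpy as np

List : can hold elements of several different data types
NumPy array : holds elements of a single data type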

import numpy as np
A= np.array([1,2,3])

A

array([1, 2, 3])

A= [1, 2, 3]
B= [-1, -2, -3]
C= []
for a, b in zip(A, B):
    C.append(a+b)
print(C)

[0, 0, 0]

import numpy as np 
A= np.array([1, 2, 3])
B= np.array([-1, -2, -3])
C= A+B
print(C)

[0 0 0]

len(A)

3

import numpy as np
a= np.array([0.1, 0.2, 0.3])
print(a)
print(a.dtype)
print(type(a[0]))

[0.1 0.2 0.3]
float64
<class 'numpy.float64'>

import numpy as np
b= np.array([1, 2, 3])
print(b)
print(b.dtype)
print(type(b[0]))

[1 2 3]
int32
<class 'numpy.int32'>

Specify the data type with dtype=' '

c= np.array([1, 2, 3], dtype='int32')

print(c)
print(c.dtype)

[1 2 3]
int32

d= np.array([1.1, 2.2, 3.3, 4.7])
print(d.dtype)

float64

Convert the data type with astype(dtype); converting floats to an integer type drops the fractional part

e= d.astype(np.int32)
print(e.dtype)

int32

- numpy.ndarray¶

The type of a NumPy array is the numpy.ndarray class

import numpy as np
A= np.array([[1, 2, 3], [4, 5, 6]])
print(A)
print(type(A))

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>

ndarray.ndim¶

Number of dimensions of the array

A= np.array([[1, 2, 3], [4, 5, 6]])
print(A.ndim)

2

B= np.array([1, 2, 3])
print(B.ndim)

1

C= np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(C.ndim)

3

ndarray.shape¶

Tuple giving the size of the array along each dimension (its length equals ndim)

B= np.array([1, 2, 3])
print(B.ndim)
print(B.shape)

1
(3,)

A= np.array([[1, 2, 3], [4, 5, 6]])
print(A.ndim)
print(A.shape)

2
(2, 3)

C= np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(C.ndim)
print(C.shape)

3
(2, 2, 3)

ndarray.size¶

Total number of elements in the array (the product of all entries of shape)

import numpy as np
B= np.array([1, 2, 3])
print(B.ndim)
print(B.shape)
print(B.size)

1
(3,)
3

A= np.array([[1, 2, 3], [4, 5, 6]])
print(A.ndim)
print(A.shape)
print(A.size)

E=np.array([[1, 2, 3,4], [4, 5, 6,7], [7,8,9,10]])
print(E.shape)
E

2
(2, 3)
6
(3, 4)

array([[ 1,  2,  3,  4],
       [ 4,  5,  6,  7],
       [ 7,  8,  9, 10]])

C= np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(C.ndim)
print(C.shape)
print(C.size)

3
(2, 2, 3)
12

ndarray.dtype¶

Data type of the array elements

ndarray.itemsize¶

Size in bytes of a single array element

ndarray.data¶

Buffer that actually stores the array elements

B= np.array([1, 2, 3])
print(B.dtype)
print(B.itemsize)
print(B.data)

int32
4
<memory at 0x00000194673B2040>

A= np.array([[1, 2, 3], [4, 5, 6]])
print(A.dtype)
print(A.itemsize)
print(A.data)

int32
4
<memory at 0x0000019465F77040>

C= np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(C.dtype)
print(C.itemsize)
print(C.data)

int32
4
<memory at 0x000001946588DE50>

min/max/sum/mean¶

b= np.array([1, 2, 3, 4, 5, 6])
print(b.max(), end=" ")
print(b.min(), end=" ")
print(b.sum(), end=" ")
print(b.mean(), end=" ")

6 1 21 3.5

c= np.array([[1, 2, 3], [4, 5, 6]])
c

array([[1, 2, 3],
       [4, 5, 6]])

c.sum(axis=0) # sum down each column (collapses the rows)

array([5, 7, 9])

c.sum(axis=1) # sum across each row (collapses the columns)

array([ 6, 15])
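One way to remember what axis means is that the axis passed to sum is the one that disappears from the shape. The sketch below uses a 3-D array chosen only for illustration.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

# Summing over an axis removes that axis from the result's shape
shapes = [arr.sum(axis=axis).shape for axis in range(arr.ndim)]
print(shapes)

# keepdims=True keeps the summed axis with length 1, which helps later broadcasting
kept = arr.sum(axis=1, keepdims=True)
print(kept.shape)
```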

!conda install -c conda-forge opencv

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

import cv2

import cv2
print(cv2.__version__)

import cv2

image = cv2.imread("aa/balloon.jpg", cv2.IMREAD_COLOR)
cv2.imshow("Ball", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

import cv2

image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Gray", image_gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

# number of dimensions
print(image.ndim)
print(image_gray.ndim)

3
2

# shape: the size along each dimension, as a tuple (rows, columns, channels for the color image)
print(image.shape)
print(image_gray.shape)

(230, 219, 3)
(230, 219)

Accessing NumPy array elements¶

import numpy as np
A= np.array([1,2,3])
A

array([1, 2, 3])

print(A.shape)

(3,)

for i in range(A.size):
    print(A[i])

1
2
3

B= np.array([[1,2,3], [4,5,6]])
print(B)

[[1 2 3]
 [4 5 6]]

B[1]

array([4, 5, 6])

print(B[0,2])

3

C= np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(C[1,0,2])

9

import numpy as np
C= np.arange(24).reshape(2,3,4)
print(C.shape)

(2, 3, 4)

print(np.arange(10000). reshape(100,100))

[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]

import sys 
np.set_printoptions(threshold=sys.maxsize)
print(np.arange(200). reshape(2,100))

[[  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
   18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
   36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
   54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
   72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
   90  91  92  93  94  95  96  97  98  99]
 [100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
  118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
  136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
  154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
  172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189
  190 191 192 193 194 195 196 197 198 199]]

import timeit
timeit.timeit('B**2', setup='import numpy as np; B=np.arange(100)')

0.6282845999994606

import timeit
timeit.timeit('[i**2 for i in A]', setup='A=range(100)')

20.08197300000029

import numpy as np
a=np.array([1,2,3,4]) 
print(a.shape)

(4,)

a.shape=1,4

print(a)

[[1 2 3 4]]

a.shape

(1, 4)

a.shape=4,1

print(a)

[[1]
 [2]
 [3]
 [4]]

print(a.shape)

(4, 1)

Creating arrays and changing their shape¶

import numpy as np
A=np.zeros((2,3))
print(A)
print(A.dtype)
# the default dtype is float64

[[0. 0. 0.]
 [0. 0. 0.]]
float64

import numpy as np
A=np.zeros((2,3), dtype='uint8')
print(A)
print(A.dtype)

[[0 0 0]
 [0 0 0]]
uint8

import numpy as np
A=np.ones((2,3))
print(A)
print(A.dtype)

[[1. 1. 1.]
 [1. 1. 1.]]
float64

import numpy as np
A=np.ones((2,3), dtype='uint8')
print(A)
print(A.dtype)

[[1 1 1]
 [1 1 1]]
uint8

np.empty(): allocating array memory without initializing it¶

C=np.empty((5,5))
print(C)
print()
print(C.dtype)

[[6.23042070e-307 4.67296746e-307 1.69121096e-306 8.34441742e-308
  1.37961302e-306]
 [9.34593493e-307 1.33511290e-306 1.33511969e-306 1.69120416e-306
  1.78022342e-306]
 [8.34449382e-308 1.06811422e-306 1.00132143e-307 8.34423068e-308
  8.90071135e-308]
 [1.11257937e-307 1.78022342e-306 2.44776798e-307 1.69119330e-306
  1.69122046e-306]
 [1.86921958e-306 6.89804133e-307 1.11261162e-306 8.34443015e-308
  2.22813476e-312]]

float64

np.random.random()¶

Generates random floats in the half-open interval [0.0, 1.0)

import numpy as np 
D= np.random.random((3,3))
print(D)

[[0.2642987  0.05553899 0.24423954]
 [0.32598247 0.52201816 0.66689007]
 [0.45552892 0.47999297 0.35326953]]

np.random.randint(low, high, (rows, columns))¶

Generates random integers from low (inclusive) up to high (exclusive)

import numpy as np 
E= np.random.randint(1,10,(2,3))
print(E)

[[4 4 1]
 [8 9 9]]

Functions for creating arrays of evenly spaced values¶

np.arange(start, stop, step)
np.arange(start, stop)
np.arange(stop)

# build an array from 0 up to (not including) 50 in steps of 5
A=np.arange(0,50,5)
print(A)

# build an array from 0.1 up to (not including) 2.5 in steps of 1
B=np.arange(0.1,2.5,1)
print(B)

# build an array of integers with 0 <= x < 10
C=np.arange(0,10)
print(C)

[ 0  5 10 15 20 25 30 35 40 45]
[0.1 1.1 2.1]
[0 1 2 3 4 5 6 7 8 9]

np.linspace()¶

Picks the requested number of evenly spaced values from the given range (the end value is included)
np.linspace(start, stop, number of samples)
np.linspace(start, stop)

A=np.linspace(0,10,10)
print(A.size)
print(A)

10
[ 0.          1.11111111  2.22222222  3.33333333  4.44444444  5.55555556
  6.66666667  7.77777778  8.88888889 10.        ]

# if the number of samples is not given, the default is 50
B= np.linspace(0,10)
print(B.size)
print(B)

50
[ 0.          0.20408163  0.40816327  0.6122449   0.81632653  1.02040816
  1.2244898   1.42857143  1.63265306  1.83673469  2.04081633  2.24489796
  2.44897959  2.65306122  2.85714286  3.06122449  3.26530612  3.46938776
  3.67346939  3.87755102  4.08163265  4.28571429  4.48979592  4.69387755
  4.89795918  5.10204082  5.30612245  5.51020408  5.71428571  5.91836735
  6.12244898  6.32653061  6.53061224  6.73469388  6.93877551  7.14285714
  7.34693878  7.55102041  7.75510204  7.95918367  8.16326531  8.36734694
  8.57142857  8.7755102   8.97959184  9.18367347  9.3877551   9.59183673
  9.79591837 10.        ]

# 1-D array of 16 elements, 0 through 15

A=np.arange(16)

B= A.reshape(4,4)

print(A)
print(B)
print(A.shape)
print(B.shape)
print(B.base)
print(B.base is A) # True: B shares the memory that stores A's data

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
(16,)
(4, 4)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
True

B[0]=-1
print(A)
print(B)

[-1 -1 -1 -1  4  5  6  7  8  9 10 11 12 13 14 15]
[[-1 -1 -1 -1]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

C=B.reshape(2,8).copy() # copy() gives C its own data, so changing C does not affect B
C[0]= 0
print(C)
print(B)

[[ 0  0  0  0  0  0  0  0]
 [ 8  9 10 11 12 13 14 15]]
[[-1 -1 -1 -1]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
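Whether two arrays share data can also be checked directly instead of inferring it from printed values. The sketch below repeats the reshape and copy steps with np.shares_memory; the array names mirror the cells above but the block is self-contained.

```python
import numpy as np

A = np.arange(16)
B = A.reshape(4, 4)          # a view: same underlying data as A
C = B.reshape(2, 8).copy()   # an independent copy

view_shares = np.shares_memory(A, B)
copy_shares = np.shares_memory(B, C)

C[0] = 0                     # modifies only the copy
print(view_shares, copy_shares)
print(A[:4])
```

A stays untouched after writing to C, whereas writing to B would change A as well.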

C=np.arange(16)
print(C)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]

D=C.reshape(8,-1)
print(D)

[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]]

E= C.reshape(-1,8)
print(E)

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]]

ravel()¶

Returns the given array flattened into a 1-D array

A= np.array([[1,2], [3,4]])
print(A)

[[1 2]
 [3 4]]

B=A.ravel()
print(B)
print(B.shape)

[1 2 3 4]
(4,)

print(B.base is A)

True

A=np.array([[[1,2], [3,4]], [[1,2], [3,4]]])
print(A)

[[[1 2]
  [3 4]]

 [[1 2]
  [3 4]]]

B= A.ravel() # any number of dimensions is flattened to 1-D
print(B)
print(B.shape)

[1 2 3 4 1 2 3 4]
(8,)

np.newaxis¶

Adds a new axis of length 1, increasing the number of dimensions by one

a= np.array([1,2,3])
a=a[:, np.newaxis]
print(a.shape)
print(a)

(3, 1)
[[1]
 [2]
 [3]]

a= np.array([1,2,3])
a=a[np.newaxis, :]
print(a.shape)
print(a)

(1, 3)
[[1 2 3]]

hstack (join horizontally) / vstack (join vertically)¶

import numpy as np 
A= np.array([[1,2], [3,4]])
B= np.array([[1,0], [0,1]])
print(A)
print(B)
C=np.hstack((A,B))
print(C)
D= np.vstack((A,B))
print(D)

[[1 2]
 [3 4]]
[[1 0]
 [0 1]]
[[1 2 1 0]
 [3 4 0 1]]
[[1 2]
 [3 4]
 [1 0]
 [0 1]]

column_stack()¶

Stacks 1-D arrays as the columns of a 2-D array

a=np.array([1,2,3])
b=np.array([4,5,6])
c=np.array([7,8,9])
D=np.column_stack((a,b,c))
print(D)

[[1 4 7]
 [2 5 8]
 [3 6 9]]

concatenate((array1, array2), axis=1)¶

Joins arrays along the given axis
axis=0 : stacks along the rows (vertically)
axis=1 : joins along the columns (horizontally)

import numpy as np 
A= np.array([[1,2], [3,4]])
B= np.array([[1,0], [0,1]])

C= np.concatenate((A,B), axis=0) # axis=0: stack vertically
print(C)
print()
D= np.concatenate((A,B), axis=1) # axis=1: join horizontally
print(D)

[[1 2]
 [3 4]
 [1 0]
 [0 1]]

[[1 2 1 0]
 [3 4 0 1]]

hsplit/ vsplit¶

Splitting an array horizontally or vertically

B= np.arange(18).reshape(3,6)
print(B)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]]

C=np.hsplit(B,3)
print(C)

[array([[ 0,  1],
       [ 6,  7],
       [12, 13]]), array([[ 2,  3],
       [ 8,  9],
       [14, 15]]), array([[ 4,  5],
       [10, 11],
       [16, 17]])]

b=np.hsplit(B, (2,4))
print(b)

[array([[ 0,  1],
       [ 6,  7],
       [12, 13]]), array([[ 2,  3],
       [ 8,  9],
       [14, 15]]), array([[ 4,  5],
       [10, 11],
       [16, 17]])]

B= np.arange(18).reshape(6,3)
print(B)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]
 [15 16 17]]

b=np.vsplit(B, (2,4))
print(b)

[array([[0, 1, 2],
       [3, 4, 5]]), array([[ 6,  7,  8],
       [ 9, 10, 11]]), array([[12, 13, 14],
       [15, 16, 17]])]

balance=8000
def deposit(money):
    global balance # declare that the module-level (global) variable balance is being modified
    balance +=money

def inquire():
    print("잔액은 {}원 입니다.".format(balance))

deposit(1000) # deposit
inquire() # check the balance

잔액은 9000원 입니다.

class Account:
    def __init__(self, balance):
        self.balance=balance
    def deposit(self, money):
        self.balance +=money
    def inquire(self):
        print("잔액은 {}원 입니다.".format(self.balance))

kb=Account(80000)
kb.deposit(10000)
kb.inquire()

잔액은 90000원 입니다.

kakao=Account(100000)
kakao.deposit(200000)
kakao.inquire()

잔액은 300000원 입니다.

class Human:
    def __init__(self, name, age):
        self.age=age
        self.name=name 
    def intro(self): 
        print(str(self.age)+"살 "+self.name+"입니다.")

class Student(Human): # Student inherits from the parent class Human
    def __init__(self, name, age, student_id):
        super().__init__(name,age)
        self.student_id=student_id
    def intro(self):
        super().intro() 
        print("학번 : "+str(self.student_id))
    def study(self):
        print("파이썬 공부: class")

# create objects

kim = Human("홍길동", 37)
kim.intro()

37살 홍길동입니다.

oh=Student("김영희", 31, 2020009)
oh.intro()
oh.study()

31살 김영희입니다.
학번 : 2020009
파이썬 공부: class

class Date:
    def __init__(self, month):
        self.month=month    
    def getmonth(self):
        return self.month
    def setmonth(self, month):
        if 1<=month <=12:
            self.month=month

today= Date(9)
today.setmonth(19)
print(today.getmonth())

9

today= Date(9)
today.month=15
print(today.month)

15

class Date: # month is exposed through a property, so assignments to it go through setmonth
    def __init__(self, month):
        self.inner_month= month    
    def getmonth(self):
        return self.inner_month
    def setmonth(self, month):
        if 1<=month<=12:
            self.inner_month=month
    month=property(getmonth, setmonth) # property: routes outside reads and writes of month through getmonth and setmonth

today=Date(9)
today.month=15
print(today.month)

9

today=Date(9)
today.inner_month=15
print(today.month)

15

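class Date: # a name starting with __ is name-mangled, so it cannot be referenced directly from outside the class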
    def __init__(self, month):
        self.__month=month    
    def getmonth(self):
        return self.__month
    def setmonth(self, month):
        if 1<=month <=12:
            self.__month=month
    month=property(getmonth, setmonth)

today=Date(9)
today.month=15
print(today.month)

9

today=Date(9)
today.inner_month=15
print(today.month)

9
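The same validation can also be written with the @property decorator, which is the more common spelling of property(getmonth, setmonth). This is a sketch of an equivalent class, not a change to the examples above; the single-underscore attribute name _month is a convention chosen here for illustration.

```python
class Date:
    def __init__(self, month):
        self._month = month  # stored as given, as in the examples above

    @property
    def month(self):
        """Read access: today.month"""
        return self._month

    @month.setter
    def month(self, value):
        """Write access: today.month = value, accepted only for 1..12"""
        if 1 <= value <= 12:
            self._month = value

today = Date(9)
today.month = 15      # rejected by the setter, month stays 9
print(today.month)
today.month = 12      # accepted
print(today.month)
```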

class Car:
    count=0 
    def __init__(self, name):
        self.name= name
        Car.count +=1
    
    @classmethod
    def outcount(cls):
         print(cls.count)

k3= Car("K3")
k5= Car("K5")
Car.outcount()

2

# static method: takes neither self nor cls and is called on the class itself
class Car:
    @staticmethod
    def hello():
        print("안전운전")
    count=0 
    def __init__(self, name):
        self.name= name
        Car.count +=1
    
    @classmethod
    def outcount(cls):
         print(cls.count)

Car.hello()

안전운전

# operator methods
class Human:
    def __init__(self, name, age):
        self.age= age
        self.name= name
    def __eq__(self, other):
        return self.age== other.age and self.name == other.name

kim= Human("짱구", 25)
ch=Human("짱구", 25)
lee= Human("철수", 26)

print( kim== ch)
print( kim== lee)

True
False

# special methods
class Human:
    def __init__(self, name, age):
        self.age= age
        self.name= name
    def __str__(self): 
        return "이름{}, 나이{}". format(self.name, self.age)
    def __len__(self): 
        return self.age

kim=Human("짱구", 25)
print(kim)
print(len(kim))

이름짱구, 나이25
25

f=0.1 
sum=0
for i in range(100):
    sum+=f 
print(sum)

9.99999999999998

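# Decimal: a class that represents decimal numbers exactly, without binary rounding error
from decimal import Decimal
Decimal(123)            # from an integer
Decimal('3.14')         # from a decimal string
Decimal('3.14e3')       # from a string in exponent notation
Decimal((0,(3,1,4), -2)) # from a (sign, digits, exponent) tuple: 3.14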

Decimal('3.14')

from decimal import Decimal
f=Decimal('0.1') 
sum=0
for i in range(100):
    sum+=f 
print(sum)

10.0
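The difference between the two sums comes from how 0.1 is stored. The sketch below compares float and Decimal totals on a smaller case and shows why the string form of the constructor matters.

```python
from decimal import Decimal

float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.1")] * 10)

# The float total is close to 1.0 but not exactly equal to it
print(float_total == 1.0)
# The Decimal total is exact
print(decimal_total == Decimal("1.0"))
# Decimal(0.1) inherits the binary approximation of the float 0.1
print(Decimal(0.1) == Decimal("0.1"))
```

Passing a string to Decimal keeps the value exact, while passing a float carries its rounding error along.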

from fractions import Fraction
a=Fraction(1,3)
print(a)
b=Fraction(8,14)
print(b)

1/3
4/7

from fractions import Fraction
a=Fraction(2,3)
b=Fraction(3,5)
c=a+b
print(c) # arithmetic between Fractions gives a Fraction

19/15

d= c+0.1 # arithmetic between a Fraction and a float gives a float
print(d)

1.3666666666666667

array(typecode, [initial values])

import array
ar = array.array('i', [33, 44, 55, 67, 89])

for a in ar: 
    print(a, end=",")
print()
ar.append(100)
del ar[0]
for a in ar: 
    print(a, end=",")
print()
print("ar[1]: ", ar[1])
print("ar[2:4]: ", ar[2:4])

33,44,55,67,89,
44,55,67,89,100,
ar[1]:  55
ar[2:4]:  array('i', [67, 89])

score=[88,95,70,100,59]
for no, s in enumerate(score, 1):
    print(str(no)+ "번 학생의 성적 :", s)

1번 학생의 성적 : 88
2번 학생의 성적 : 95
3번 학생의 성적 : 70
4번 학생의 성적 : 100
5번 학생의 성적 : 59

enumerate(score, n)
: yields (index, element) tuples, with the index starting at n
ex) (1, 88), (2, 95), (3, 70), (4, 100), (5, 59)

yoil=["월", "화", "수", "목", "금", "토", "일"]
food=["갈비탕", "순대국", "김밥", "삼겹살", "짜장면"]
menu= zip(yoil, food)
for y, f in menu:
    print("{}요일 메뉴: {}".format(y,f))

월요일 메뉴: 갈비탕
화요일 메뉴: 순대국
수요일 메뉴: 김밥
목요일 메뉴: 삼겹살
금요일 메뉴: 짜장면

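zip(list1, list2,...)
: combines several collections into one
: pairs up the corresponding elements of the lists and yields them as tuples
: if the lists differ in length, it stops at the end of the shorter one
("월", "갈비탕"), ("화", "순대국"), ("수", "김밥"), ("목", "삼겹살"), ("금", "짜장면")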

dict(zip(yoil,food))

{'월': '갈비탕', '화': '순대국', '수': '김밥', '목': '삼겹살', '금': '짜장면'}
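When the unmatched items should not be dropped, the standard library offers itertools.zip_longest, which pads the shorter input instead of stopping. The lists below are English stand-ins for the weekday and menu lists, and the fill value "(no menu)" is an arbitrary choice for illustration.

```python
from itertools import zip_longest

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
foods = ["galbitang", "sundae-guk", "gimbap", "samgyeopsal", "jajangmyeon"]

short = list(zip(days, foods))                                # stops after 5 pairs
full = list(zip_longest(days, foods, fillvalue="(no menu)"))  # keeps all 7 days

print(len(short), len(full))
print(full[-1])
```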

# any / all
adult= [True, True, True, True]
print("any:",any(adult)) # True if at least one element is True
print("all:",all(adult)) # True only if every element is True

adult= [True, False, False, False]
print("any:",any(adult)) 
print("all:",all(adult))

any: True
all: True
any: True
all: False

filter(condition function, target list)
: filter keeps only the elements of the list for which the condition function returns True.

# filter
def flunk(s):
    return s<60 

score =[45,89,72,53,99]
for s in filter(flunk, score):
    print(s)

45
53

map(transform function, target list)
: map calls the transform function on every element and yields the new values (in Python 3 the result is an iterator, not a list)

def half(s):
    return s / 2 
score =[45,89,72,53,94]
for s in map(half, score):
    print(s, end=",")

22.5,44.5,36.0,26.5,47.0,

new_score= map(half, score)
for s in new_score:
    print(s, end=",")

22.5,44.5,36.0,26.5,47.0,

def total(s,b):
    return s+b 
score =[45,89,72,53,94]
bonus= [2,3,7,0,5]
for s in map(total, score, bonus):
    print(s, end=" ")

47 92 79 53 99

Lambda functions¶

lambda arguments : expression

lambda x:x+1

<function __main__.<lambda>(x)>

# the lambda above written as a regular function
def increase(x):
    return x+1

# the same filter as flunk() above, written with a lambda

score =[45,89,72,53,99]
for s in filter(lambda x: x<60, score):
    print(s)

45
53

# the same mapping as half() above, written with a lambda
score =[45,89,72,53,94]
for s in map(lambda x: x /2, score):
    print(s, end=",")

22.5,44.5,36.0,26.5,47.0,
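Lambdas are also commonly passed as the key argument of sorted. The sketch below uses made-up (name, score) pairs purely for illustration.

```python
# Hypothetical sample data: (name, score) pairs
students = [("Kim", 88), ("Lee", 95), ("Park", 70)]

# key receives each element and returns the value to sort by
by_score = sorted(students, key=lambda s: s[1], reverse=True)
names = [name for name, _ in by_score]

print(names)
```

The original list is left unchanged because sorted returns a new list.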

Deep copy and shallow copy¶

a=3 
b=a
print("a={} b={}".format(a,b))

a=5
print("a={} b={}".format(a,b))

a=3 b=3
a=5 b=3

list1=[1, 2, 3]
list2=list1
list2[1]=100 # change 2 -> 100
print(list1)
print(list2)

[1, 100, 3]
[1, 100, 3]

list1=[1, 2, 3]
list2=list1.copy()
list2[1]=100 # change 2 -> 100
print(list1)
print(list2)

[1, 2, 3]
[1, 100, 3]

list0=["a", "b"]
list1=[list0, 1, 2]
list2=list1.copy()

list2[0][1]= "c" #b ->c 
print(list1)
print(list2)

[['a', 'c'], 1, 2]
[['a', 'c'], 1, 2]

import copy 

list0=["a", "b"]
list1=[list0, 1, 2]
list2=copy.deepcopy(list1)

list2[0][1]= "c" #b ->c 
print(list1)
print(list2)

[['a', 'b'], 1, 2]
[['a', 'c'], 1, 2]
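The difference between the two copies can be seen by checking whether the nested list is the same object. The sketch below repeats the setup with new variable names chosen for illustration.

```python
import copy

inner = ["a", "b"]
original = [inner, 1, 2]

shallow = original.copy()        # new outer list, same inner list object
deep = copy.deepcopy(original)   # new outer list and a new inner list

print(shallow[0] is original[0])  # the nested list is shared
print(deep[0] is original[0])     # the nested list was duplicated
print(deep == original)           # the contents are still equal
```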

list1=[1,2,3] 
list2=list1
list3=list1.copy()

print("1 is 2:", list1 is list2)
print("1 is 3:", list1 is list3)
print("2 is 3:", list2 is list3)

1 is 2: True
1 is 3: False
2 is 3: False

a=1
b=a
print("a={} b={} : {}".format(a,b, a is b))

a=5
print("a={} b={} : {}".format(a,b, a is b))

a=301
b=a
print("a={} b={} : {}".format(a,b, a is b))

b=301
print("a={} b={} : {}".format(a,b, a is b))

a=1 b=1 : True
a=5 b=1 : False
a=301 b=301 : True
a=301 b=301 : False

How to replace a specific word in a text file with another word¶

1) Create the text file¶

f=open('test2.txt', 'w', encoding="utf-8")
f.write(" 중앙재난안전대책본부는 10호 태풍 '하이선'이 북상함에 따라 오늘(6시) 오전 9시를 기해 태풍 위기경보를 '주의'에서 '경계'로, 중대본 비상대응 수위를 1단계에서 2단계로 각각 격상했습니다.\n위기경보는 관심-주의-경계-심각, 중대본 비상대응 수위는 1∼3단계 순으로 단계가 올라갑니다.\n중대본은 비상 2단계 상향 발령에 따라 관계부처와 지방자치단체에 비상근무체계를 강화해 태풍 대응에 모든 역량을 집중해달라고 지시했습니다.\n강한 비바람을 동반한 태풍이 동해안을 따라 북상할 것으로 예상되므로 해안가 저지대와 산사태 위험지역에서는 이날까지 대피명령 등을 활용해 사전대피를 철저히 이행하도록 했습니다.\n또 태풍이 우리나라에 직접적인 영향을 미치는 시간을 고려해 태풍 이동 경로에 있는 지역에서는 공공기관·민간기업의 출퇴근 시간과 학교 등하교 시간을 조정해 달라고 긴급 요청했습니다.\n하이선은 초속 49m의 매우 강한 태풍으로, 내일 오전 9시쯤 부산 동쪽 약 80㎞ 부근 해상에 도달한 뒤 동해안과 울릉도 사이 해상을 지나 밤 9시 북한 청진 남쪽 약 180㎞ 부근 해상으로 올라갈 것으로 전망됩니다.") 
f.close()

2) Read it back¶

f=open("test2.txt", 'r', encoding="utf-8")
print(f.read())
f.close()

중앙재난안전대책본부는 10호 태풍 '하이선'이 북상함에 따라 오늘(6시) 오전 9시를 기해 태풍 위기경보를 '주의'에서 '경계'로, 중대본 비상대응 수위를 1단계에서 2단계로 각각 격상했습니다.
위기경보는 관심-주의-경계-심각, 중대본 비상대응 수위는 1∼3단계 순으로 단계가 올라갑니다.
중대본은 비상 2단계 상향 발령에 따라 관계부처와 지방자치단체에 비상근무체계를 강화해 태풍 대응에 모든 역량을 집중해달라고 지시했습니다.
강한 비바람을 동반한 태풍이 동해안을 따라 북상할 것으로 예상되므로 해안가 저지대와 산사태 위험지역에서는 이날까지 대피명령 등을 활용해 사전대피를 철저히 이행하도록 했습니다.
또 태풍이 우리나라에 직접적인 영향을 미치는 시간을 고려해 태풍 이동 경로에 있는 지역에서는 공공기관·민간기업의 출퇴근 시간과 학교 등하교 시간을 조정해 달라고 긴급 요청했습니다.
하이선은 초속 49m의 매우 강한 태풍으로, 내일 오전 9시쯤 부산 동쪽 약 80㎞ 부근 해상에 도달한 뒤 동해안과 울릉도 사이 해상을 지나 밤 9시 북한 청진 남쪽 약 180㎞ 부근 해상으로 올라갈 것으로 전망됩니다.

f=open("test2.txt", 'r', encoding="utf-8")
while True:
    line=f.readline()
    if not line:
        break
    print(line.replace("\n", ""))
f.close()

 중앙재난안전대책본부는 10호 태풍 '하이선'이 북상함에 따라 오늘(6시) 오전 9시를 기해 태풍 위기경보를 '주의'에서 '경계'로, 중대본 비상대응 수위를 1단계에서 2단계로 각각 격상했습니다.
위기경보는 관심-주의-경계-심각, 중대본 비상대응 수위는 1∼3단계 순으로 단계가 올라갑니다.
중대본은 비상 2단계 상향 발령에 따라 관계부처와 지방자치단체에 비상근무체계를 강화해 태풍 대응에 모든 역량을 집중해달라고 지시했습니다.
강한 비바람을 동반한 태풍이 동해안을 따라 북상할 것으로 예상되므로 해안가 저지대와 산사태 위험지역에서는 이날까지 대피명령 등을 활용해 사전대피를 철저히 이행하도록 했습니다.
또 태풍이 우리나라에 직접적인 영향을 미치는 시간을 고려해 태풍 이동 경로에 있는 지역에서는 공공기관·민간기업의 출퇴근 시간과 학교 등하교 시간을 조정해 달라고 긴급 요청했습니다.
하이선은 초속 49m의 매우 강한 태풍으로, 내일 오전 9시쯤 부산 동쪽 약 80㎞ 부근 해상에 도달한 뒤 동해안과 울릉도 사이 해상을 지나 밤 9시 북한 청진 남쪽 약 180㎞ 부근 해상으로 올라갈 것으로 전망됩니다.
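
The same line-by-line read can also be written with a `with` block and by iterating over the file object, which closes the file automatically. This is a minimal sketch that runs on its own: it writes a small made-up file to a temporary directory instead of using `test2.txt`.

```python
import os
import tempfile

# Write a small made-up file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird")

lines = []
with open(path, "r", encoding="utf-8") as f:   # closed automatically on exit
    for line in f:                              # yields one line at a time
        lines.append(line.rstrip("\n"))         # drop the trailing newline

print(lines)
```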

f=open("test2.txt", 'r', encoding="utf-8")
contents= f.read()
f.close()
word_list= contents.split(" ")   # split on single spaces only
line_list= contents.split("\n")

print("Total characters : ", len(contents)) # including whitespace
print("Total words : ", len(word_list))
print("Total lines : ", len(line_list))

Total characters :  561
Total words :  120
Total lines :  6
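
Note that `split(" ")` splits on single spaces only: a word at the end of one line stays glued to the first word of the next, and a leading or doubled space produces empty strings. A small sketch on a made-up string shows how this differs from `split()` with no argument and from `splitlines()`.

```python
sample = " first line here\nsecond  line"   # made-up text: leading space, double space

by_space = sample.split(" ")     # splits on every single space
by_whitespace = sample.split()   # splits on any run of whitespace, drops empties
lines = sample.splitlines()      # splits on line boundaries

print(len(by_space), by_space)
print(len(by_whitespace), by_whitespace)
print(len(lines), lines)
```

For a word count, `split()` is usually closer to what is intended.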

3) Replacing 태풍 (typhoon) with 햇빛 (sunlight)¶

with open("test2.txt", 'r', encoding='utf-8') as f, open('out_test.txt', 'w', encoding='utf-8') as f_out:
    for line in f:
        if '태풍' in line:
            # replace() substitutes every occurrence at once, so no inner loop is needed
            line = line.replace("태풍", '햇빛')
            print(line)
        # write every line exactly once, whether or not it was changed
        f_out.write(line)

중앙재난안전대책본부는 10호 햇빛 '하이선'이 북상함에 따라 오늘(6시) 오전 9시를 기해 햇빛 위기경보를 '주의'에서 '경계'로, 중대본 비상대응 수위를 1단계에서 2단계로 각각 격상했습니다.

중대본은 비상 2단계 상향 발령에 따라 관계부처와 지방자치단체에 비상근무체계를 강화해 햇빛 대응에 모든 역량을 집중해달라고 지시했습니다.

강한 비바람을 동반한 햇빛이 동해안을 따라 북상할 것으로 예상되므로 해안가 저지대와 산사태 위험지역에서는 이날까지 대피명령 등을 활용해 사전대피를 철저히 이행하도록 했습니다.

또 햇빛이 우리나라에 직접적인 영향을 미치는 시간을 고려해 햇빛 이동 경로에 있는 지역에서는 공공기관·민간기업의 출퇴근 시간과 학교 등하교 시간을 조정해 달라고 긴급 요청했습니다.

하이선은 초속 49m의 매우 강한 햇빛으로, 내일 오전 9시쯤 부산 동쪽 약 80㎞ 부근 해상에 도달한 뒤 동해안과 울릉도 사이 해상을 지나 밤 9시 북한 청진 남쪽 약 180㎞ 부근 해상으로 올라갈 것으로 전망됩니다.
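
Because `str.replace` substitutes every occurrence in one call, the whole file can also be converted in a single pass. The sketch below assumes the file fits in memory. The helper `replace_in_file` is a hypothetical name, not something from the original post, and the sample text is made up so the block runs on its own.

```python
import tempfile
from pathlib import Path

def replace_in_file(src, dst, old, new):
    """Copy src to dst with every `old` replaced by `new`; return the number of replacements."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(text.replace(old, new), encoding="utf-8")
    return text.count(old)

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.txt"
    dst = Path(tmp) / "out.txt"
    # Made-up sample: the second line has no match and must be kept unchanged.
    src.write_text("태풍 경보\n비 소식\n강한 태풍, 태풍 대비\n", encoding="utf-8")
    replaced = replace_in_file(src, dst, "태풍", "햇빛")
    result = dst.read_text(encoding="utf-8")

print(replaced)
print(result)
```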

4) Putting a number in front of each replaced word¶

count=1
with open("test2.txt", 'r', encoding='utf-8') as f, open('out_test.txt', 'w', encoding='utf-8') as f_out:
    for line in f:
        if '태풍' in line:
            for i in range(line.count("태풍")):
                # replace one occurrence at a time so each gets its own number
                line=line.replace("태풍",'{}) 햇빛'.format(count),1)
                count+=1
            print(line)
        # write the finished line once, after all replacements
        f_out.write(line)

중앙재난안전대책본부는 10호 1) 햇빛 '하이선'이 북상함에 따라 오늘(6시) 오전 9시를 기해 2) 햇빛 위기경보를 '주의'에서 '경계'로, 중대본 비상대응 수위를 1단계에서 2단계로 각각 격상했습니다.

중대본은 비상 2단계 상향 발령에 따라 관계부처와 지방자치단체에 비상근무체계를 강화해 3) 햇빛 대응에 모든 역량을 집중해달라고 지시했습니다.

강한 비바람을 동반한 4) 햇빛이 동해안을 따라 북상할 것으로 예상되므로 해안가 저지대와 산사태 위험지역에서는 이날까지 대피명령 등을 활용해 사전대피를 철저히 이행하도록 했습니다.

또 5) 햇빛이 우리나라에 직접적인 영향을 미치는 시간을 고려해 6) 햇빛 이동 경로에 있는 지역에서는 공공기관·민간기업의 출퇴근 시간과 학교 등하교 시간을 조정해 달라고 긴급 요청했습니다.

하이선은 초속 49m의 매우 강한 7) 햇빛으로, 내일 오전 9시쯤 부산 동쪽 약 80㎞ 부근 해상에 도달한 뒤 동해안과 울릉도 사이 해상을 지나 밤 9시 북한 청진 남쪽 약 180㎞ 부근 해상으로 올라갈 것으로 전망됩니다.
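
The numbering can also be done without the inner loop by passing a function to `re.sub`, which is called once per match. This is only a sketch of that alternative. `number_replacements` is a hypothetical helper name, and the sample text is made up.

```python
import itertools
import re

def number_replacements(text, old, new):
    """Replace each occurrence of `old` with 'N) new', numbering from 1."""
    counter = itertools.count(1)
    pattern = re.escape(old)   # treat `old` as literal text, not as a regex
    return re.sub(pattern, lambda m: "{}) {}".format(next(counter), new), text)

sample = "태풍 경보\n비 소식\n강한 태풍, 태풍 대비\n"
numbered = number_replacements(sample, "태풍", "햇빛")
print(numbered)
```

The counter lives inside the function, so the numbering runs across line breaks, just as `count` does in the loop above.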

	survived	pclass	sex	age	sibsp	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
886	0	2	male	27.0	0	0	13.00	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19.0	0	0	30.00	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	NaN	1	2	23.45	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26.0	0	0	30.00	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32.0	0	0	7.75	Q	Third	man	True	NaN	Queenstown	no	True

	age	fare
0	22.0	7.2500
1	38.0	71.2833
2	26.0	7.9250
3	35.0	53.1000
4	35.0	8.0500

	age	fare
0	32.0	17.2500
1	48.0	81.2833
2	36.0	17.9250
3	45.0	63.1000
4	45.0	18.0500

	order_id	quantity	item_name	choice_description	item_price
0	1	1	Chips and Fresh Tomato Salsa	NaN	$2.39
1	1	1	Izze	[Clementine]	$3.39
2	1	1	Nantucket Nectar	[Apple]	$3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	$2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98

	order_id	quantity	item_name	choice_description	item_price
1140	471	1	Bottled Water	NaN	1.09
3361	1348	1	Bottled Water	NaN	1.09
4001	1602	1	Bottled Water	NaN	1.09
3499	1405	1	Bottled Water	NaN	1.09
2545	1009	1	Bottled Water	NaN	1.09
...	...	...	...	...	...
2230	899	1	Chips and Guacamole	NaN	4.45
2220	894	1	Chips and Guacamole	NaN	4.45
2207	890	1	Chips and Guacamole	NaN	4.45
2489	989	1	Chips and Guacamole	NaN	4.45
4616	1832	1	Chips and Guacamole	NaN	4.45

	state	year	pop
0	Ohio	2000	1.5
1	Ohio	2001	1.7
2	Ohio	2002	3.6
3	Nevada	2001	2.4
4	Nevada	2002	2.9
5	Nevada	2003	3.2

	YEA	STA	POP	DEBT	DEB
01	2000	Ohio	1.5	NaN	16.5
02	2001	Ohio	1.7	NaN	16.5
three	2002	Ohio	3.6	NaN	16.5
four	2001	Nevada	2.4	NaN	16.5
five	2002	Nevada	2.9	NaN	16.5
six	2003	Nevada	3.2	NaN	16.5

	age	fare
0	10.0	10.0
1	10.0	10.0
2	10.0	10.0
3	10.0	10.0
4	10.0	10.0
...	...	...
886	10.0	10.0
887	10.0	10.0
888	NaN	10.0
889	10.0	10.0
890	10.0	10.0

	b	d	e
Utah	0.579217	-0.279336	-0.170469
Ohio	-1.724667	-1.571901	-0.108527
Texas	-0.961463	-0.701714	1.134606
Oregon	-0.737585	-0.111093	1.484404

	b	d	e
Utah	0.579217	0.279336	0.170469
Ohio	1.724667	1.571901	0.108527
Texas	0.961463	0.701714	1.134606
Oregon	0.737585	0.111093	1.484404

	0	1	2
a	0.817822	1.620150	0.502513
a	0.954089	0.212788	-0.037256
b	0.996862	-1.087917	0.357842
b	1.299607	-0.104178	-2.045602

	one	two
count	3.000000	2.000000
mean	3.083333	-2.900000
std	3.493685	2.262742
min	0.750000	-4.500000
25%	1.075000	-3.700000
50%	1.400000	-2.900000
75%	4.250000	-2.100000
max	7.100000	-1.300000

Question 1¶

Question 3¶

Question 2¶

Items with the highest unit price¶

pandas¶

pandas data structures¶

1. Series¶

Dictionary ==> Series¶

Series index / values¶

2. DataFrame¶

Setting the row index / column names: pandas.DataFrame(2D array, index=row index array, columns=column name array)¶

Renaming row indexes: DataFrame.rename(index={old index: new index, ...})¶

Renaming columns: DataFrame.rename(columns={old name: new name, ...})¶

.iloc[[rows], [columns]]¶

Nested dictionaries¶

Index¶

Indexing¶

Sorting by index¶

Unique Values, Value Counts, and Membership¶

vectorize¶

Mean, variance, standard deviation¶

Matrix transpose¶

inverse matrix¶

np.linspace(start, stop, num, endpoint=True)¶

Displaying multiple plots¶

surface¶

flat¶

Accessing each element¶

Arithmetic between arrays of different shapes (broadcasting)¶

Installing NumPy¶

- numpy.ndarray¶

ndarray.ndim¶

ndarray.shape¶

ndarray.size¶

ndarray.dtype¶

min/max/sum/mean¶

Accessing NumPy array elements¶

Creating arrays and changing their shape¶

np.empty(): allocating array space without initializing it¶

np.random.random()¶

np.random.randint(low, high, size=(rows, columns))¶

Functions that create arrays of consecutive values¶

np.linspace()¶

ravel()¶

newaxis¶

hstack (join horizontally) / vstack (join vertically)¶

column_stack()¶

concatenate((배열1, 배열2), axis=1)¶

hsplit/ vsplit¶

Lambda functions¶

Deep copy and shallow copy¶
