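The default index is the row number (RangeIndex).
A specific column can be set as the index with DataFrame.set_index('column name').
Row names (row labels) are, in effect, the row index.
By specifying an index (subscript) [] on a pandas.DataFrame or pandas.Series, you can select and get rows, columns, or element values.
To extract an element value from a pandas.DataFrame, to slice columns, or to select rows by a row name, a row number, or a list of them, use at, iat, loc, or iloc.
Getting values from a pandas.Series
- A single row name or row number returns that value; a list or a slice returns the selected values as a pandas.Series.
- With a list or a slice, the result is a pandas.Series even when only one row is extracted.
- With a slice of row names (row labels), the stop row is also included.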

import pandas as pd

df = pd.read_csv('data/sample_pandas_normal.csv')
print(df)

      name  age state  point
0    Alice   24    NY     64
1      Bob   42    CA     92
2  Charlie   18    CA     70
3     Dave   68    TX     70
4    Ellen   24    CA     88
5    Frank   30    NY     57

In [2]:

# The default index is the row number (RangeIndex)
print(df.index)
print('')

# A specific column can be set as the index (set_index returns a new DataFrame)
print('index=name')
df1 = df.set_index('name')
print(df1)
print(df1.index)
print('')

print('index=age')
df2 = df.set_index('age')
print(df2)
print(df2.index)

RangeIndex(start=0, stop=6, step=1)

index=name
         age state  point
name                     
Alice     24    NY     64
Bob       42    CA     92
Charlie   18    CA     70
Dave      68    TX     70
Ellen     24    CA     88
Frank     30    NY     57
Index(['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], dtype='object', name='name')

index=age
        name state  point
age                      
24     Alice    NY     64
42       Bob    CA     92
18   Charlie    CA     70
68      Dave    TX     70
24     Ellen    CA     88
30     Frank    NY     57
Int64Index([24, 42, 18, 68, 24, 30], dtype='int64', name='age')
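
The `age` index above contains 24 twice, which is worth a short illustration. The following is a minimal sketch, assuming the sample data is rebuilt inline instead of being read from the CSV file; the variable names `rows_24` and `restored` are introduced here only for illustration.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the CSV used in this post)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, 42, 18, 68, 24, 30],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    'point': [64, 92, 70, 70, 88, 57],
})

# set_index returns a new DataFrame; df itself keeps its RangeIndex
df2 = df.set_index('age')
print(type(df.index))

# An index does not have to be unique: 24 appears twice here
print(df2.index.is_unique)

# Selecting a duplicated label returns every matching row as a DataFrame
rows_24 = df2.loc[24]
print(rows_24)

# reset_index moves the index back to an ordinary column
restored = df2.reset_index()
print(restored.columns.tolist())
```

Because labels may repeat, a column with unique values (such as `name` in this data) is usually the more convenient choice when single rows are to be looked up by label.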

In [3]:

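# Selecting rows and columns with pandas index references

# By specifying an index (subscript) [] on a pandas.DataFrame or pandas.Series,
# you can select and get rows, columns, or element values.

# For a pandas.DataFrame, the data returned depends on the type of value given inside [].
# [column name] : a single column as a pandas.Series
# [list of column names] : multiple columns as a pandas.DataFrame
# [slice of row names or row numbers] : one or more rows as a pandas.DataFrame

# For a pandas.Series:
# [row name or row number] : the value of a single element, in its own type
# [list of row names or row numbers] : the values of multiple elements as a pandas.Series
# [slice of row names or row numbers] : the values of one or more elements as a pandas.Series

# To extract an element value from a pandas.DataFrame, to slice columns,
# or to select rows by a row name, a row number, or a list of them, use at, iat, loc, or iloc.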

In [4]:

# Getting columns of a pandas.DataFrame
# Specifying a column name (column label) in [] extracts that column as a pandas.Series.

print(df['age'])
print(type(df['age']))

0    24
1    42
2    18
3    68
4    24
5    30
Name: age, dtype: int64
<class 'pandas.core.series.Series'>

In [5]:

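# A column can also be accessed as an attribute, by writing the column name after a dot.
# Note that if the column name clashes with an existing method or attribute name, the existing one takes precedence.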

print(df.age)
print(type(df.age))

0    24
1    42
2    18
3    68
4    24
5    30
Name: age, dtype: int64
<class 'pandas.core.series.Series'>
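
The name clash mentioned above can be seen with a column called `size`, which is also the name of a DataFrame attribute. This is a minimal sketch; the frame `df_m` and its `size` column are hypothetical and are not part of the sample data.

```python
import pandas as pd

# Hypothetical frame whose first column name collides with DataFrame.size
df_m = pd.DataFrame({'size': [1, 2, 3], 'age': [24, 42, 18]})

# Attribute access resolves to the DataFrame.size property (number of elements)
by_attr = df_m.size
print(by_attr)

# Bracket access always refers to the column
by_key = df_m['size']
print(by_key)
```

Bracket access with the column name avoids this ambiguity.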

In [6]:

# Specifying a list of column names extracts the selected columns as a pandas.DataFrame.

print(df[['age', 'point']])
print(type(df[['age', 'point']]))

   age  point
0   24     64
1   42     92
2   18     70
3   68     70
4   24     88
5   30     57
<class 'pandas.core.frame.DataFrame'>

In [7]:

# A slice of column names cannot be used with [].
# A slice is always interpreted as a row selection, so with the default RangeIndex it raises a TypeError.

#print(df['age':'point'])
# TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [age] of <class 'str'>

In [8]:

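# With loc, columns can also be sliced (the stop column is included).
# With iloc, columns can be specified by column number instead of column name (column label).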

print(df.loc[:, 'age':'point'])
print(type(df.loc[:, 'age':'point']))

   age state  point
0   24    NY     64
1   42    CA     92
2   18    CA     70
3   68    TX     70
4   24    CA     88
5   30    NY     57
<class 'pandas.core.frame.DataFrame'>

In [9]:

print(df.iloc[:, [0, 2]])
print(type(df.iloc[:, [0, 2]]))

      name state
0    Alice    NY
1      Bob    CA
2  Charlie    CA
3     Dave    TX
4    Ellen    CA
5    Frank    NY
<class 'pandas.core.frame.DataFrame'>
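
The two cells above differ in one detail that is easy to miss: a label slice with `loc` includes the stop column, while a position slice with `iloc` excludes it. A minimal sketch, assuming the sample data is rebuilt inline; `by_label` and `by_pos` are names chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the CSV used in this post)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, 42, 18, 68, 24, 30],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    'point': [64, 92, 70, 70, 88, 57],
})

# Label slice: both 'age' and 'point' are included
by_label = df.loc[:, 'age':'point']
print(by_label.columns.tolist())

# Position slice: column 3 ('point') is excluded, as in ordinary Python slicing
by_pos = df.iloc[:, 1:3]
print(by_pos.columns.tolist())
```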

In [10]:

# Getting rows of a pandas.DataFrame
# Specifying a slice in [] extracts the rows in that range as a pandas.DataFrame.

print(df[1:4])
print(type(df[1:4]))

      name  age state  point
1      Bob   42    CA     92
2  Charlie   18    CA     70
3     Dave   68    TX     70
<class 'pandas.core.frame.DataFrame'>

In [11]:

# Because it is a slice, a step can also be given, as in start:stop:step.
# This makes it possible to extract only the odd-numbered or only the even-numbered rows.

print(df[::2])
print(type(df[::2]))

      name  age state  point
0    Alice   24    NY     64
2  Charlie   18    CA     70
4    Ellen   24    CA     88
<class 'pandas.core.frame.DataFrame'>

In [12]:

print(df[1::2])
print(type(df[1::2]))

    name  age state  point
1    Bob   42    CA     92
3   Dave   68    TX     70
5  Frank   30    NY     57
<class 'pandas.core.frame.DataFrame'>

In [13]:

# Only a slice works here. A single row number raises an error, because a non-slice key is looked up as a column name.

#print(df[1])
# KeyError: 1

In [14]:

# Even when only one row is selected, the result is a pandas.DataFrame,
# not a pandas.Series.

print(df[1:2])
print(type(df[1:2]))

  name  age state  point
1  Bob   42    CA     92
<class 'pandas.core.frame.DataFrame'>

In [15]:

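# A slice can also be given with row names (row labels) instead of row numbers, provided the index holds those labels.
# With a slice of row names (row labels), the stop row is also included.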

#print(df['Bob':'Ellen'])
#print(type(df['Bob':'Ellen']))
# TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [Bob] of <class 'str'>

# Row names (row labels) are, in effect, the row index, so set the name column as the index first
df1 = df.set_index('name')
print(df1['Bob':'Ellen'])
print(type(df1['Bob':'Ellen']))

         age state  point
name                     
Bob       42    CA     92
Charlie   18    CA     70
Dave      68    TX     70
Ellen     24    CA     88
<class 'pandas.core.frame.DataFrame'>
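
The difference between a label slice and a position slice over rows can be made explicit with `loc` and `iloc`. A minimal sketch, assuming the sample data is rebuilt inline; `by_label` and `by_pos` are names chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the CSV used in this post)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, 42, 18, 68, 24, 30],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    'point': [64, 92, 70, 70, 88, 57],
})
df1 = df.set_index('name')

# Label slice: the stop label 'Ellen' is included (4 rows)
by_label = df1.loc['Bob':'Ellen']
print(by_label.index.tolist())

# Position slice: the stop position 4 is excluded (3 rows)
by_pos = df1.iloc[1:4]
print(by_pos.index.tolist())
```

Writing `loc` or `iloc` states in the code itself whether labels or positions are meant, which plain `[]` does not.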

In [16]:

# With loc or iloc, a single row name or row number can be given to get that row as a pandas.Series,
# and a list can be given to get multiple rows as a pandas.DataFrame.

# loc takes row names.
print(df1.loc['Bob'])
print(type(df1.loc['Bob']))

print(df1.loc[['Bob', 'Ellen']])
print(type(df1.loc[['Bob', 'Ellen']]))
print('=====')

# iloc takes row numbers.
print(df1.iloc[1])
print(type(df1.iloc[1]))

print(df1.iloc[[1, 4]])
print(type(df1.iloc[[1, 4]]))

age      42
state    CA
point    92
Name: Bob, dtype: object
<class 'pandas.core.series.Series'>
       age state  point
name                   
Bob     42    CA     92
Ellen   24    CA     88
<class 'pandas.core.frame.DataFrame'>
=====
age      42
state    CA
point    92
Name: Bob, dtype: object
<class 'pandas.core.series.Series'>
       age state  point
name                   
Bob     42    CA     92
Ellen   24    CA     88
<class 'pandas.core.frame.DataFrame'>

In [17]:

# Getting values from a pandas.Series

s = df1['age']
print(s)

name
Alice      24
Bob        42
Charlie    18
Dave       68
Ellen      24
Frank      30
Name: age, dtype: int64

In [18]:

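# A single row name or row number returns that value; a list or a slice returns the selected values as a pandas.Series.
# With a list or a slice, the result is a pandas.Series even when only one row is extracted.
# With a slice of row names (row labels), the stop row is also included.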

print(s[3])
print(type(s[3]))
print('')
print(s[[1, 3]])
print(type(s[[1, 3]]))
print('')
print(s[1:3])
print(type(s[1:3]))
print('')
print(s[[1]])
print(type(s[[1]]))
print('')
print(s[['Bob']])
print(type(s[['Bob']]))
print('')
print(s['Bob':'Bob'])
print(type(s['Bob':'Bob']))
print('')
print(s['Bob'])
print(type(s['Bob']))

68
<class 'numpy.int64'>

name
Bob     42
Dave    68
Name: age, dtype: int64
<class 'pandas.core.series.Series'>

name
Bob        42
Charlie    18
Name: age, dtype: int64
<class 'pandas.core.series.Series'>

name
Bob    42
Name: age, dtype: int64
<class 'pandas.core.series.Series'>

name
Bob    42
Name: age, dtype: int64
<class 'pandas.core.series.Series'>

name
Bob    42
Name: age, dtype: int64
<class 'pandas.core.series.Series'>

42
<class 'numpy.int64'>

In [19]:

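# When the row names are integers, an integer key is treated as a row name, not a row number, so specifying -1 for the last row raises an error.
# To distinguish row names from row numbers explicitly, use at, iat, loc, and iloc on a pandas.Series as well.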

s_i = s.reset_index(drop=True)
print(s_i)

# Specifying -1 for the last row raises an error.
#print(s_i[-1])
# KeyError: -1

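# If the row names are strings, -1 does not raise an error, because it falls back to a row number.
# (Recent pandas releases warn that this positional fallback is deprecated.)
print(s[-1])
# Use iat to state explicitly that a row number is meant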
print(s_i.iat[-1])

0    24
1    42
2    18
3    68
4    24
5    30
Name: age, dtype: int64
30
30
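
The ambiguity is easier to see when the integer labels do not coincide with the positions. The following is a minimal sketch; the Series `s_int` and its labels 10, 20, 30 are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions
s_int = pd.Series([24, 42, 18], index=[10, 20, 30], name='age')

# loc / at always use the label
print(s_int.loc[10])
print(s_int.at[30])

# iloc / iat always use the position, so -1 means the last element
print(s_int.iloc[0])
print(s_int.iat[-1])

# Slices follow the same rule: a label slice includes the stop, a position slice does not
label_slice = s_int.loc[10:20]
pos_slice = s_int.iloc[0:1]
print(len(label_slice), len(pos_slice))
```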

In [20]:

# Getting element values from a pandas.DataFrame
# An element value can be obtained by first extracting a pandas.Series from the pandas.DataFrame
# and then selecting a value from that pandas.Series.

print(df1['age']['Alice'])
print(df1.age[0])

24
24
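
The same element can also be read in a single step with `at` (labels) or `iat` (positions), which the summary at the top of this post recommends for element access. A minimal sketch, assuming the sample data is rebuilt inline; `v_label` and `v_pos` are names chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the CSV used in this post)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, 42, 18, 68, 24, 30],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    'point': [64, 92, 70, 70, 88, 57],
})
df1 = df.set_index('name')

# at: row label and column label
v_label = df1.at['Alice', 'age']
print(v_label)

# iat: row number and column number
v_pos = df1.iat[0, 0]
print(v_pos)

# loc and iloc accept the same pairs and also return a scalar here
print(df1.loc['Alice', 'age'], df1.iloc[0, 0])
```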

In [21]:

# Slices and lists can also be combined to extract an arbitrary range.

print(df1['Bob':'Dave'][['age', 'point']])
print(type(df1['Bob':'Dave'][['age', 'point']]))

         age  point
name               
Bob       42     92
Charlie   18     70
Dave      68     70
<class 'pandas.core.frame.DataFrame'>
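
The chained selection above can also be written as one `loc` or `iloc` call that takes the row and column selectors together. A minimal sketch, assuming the sample data is rebuilt inline; `sub_label` and `sub_pos` are names chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: same content as the CSV used in this post)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, 42, 18, 68, 24, 30],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    'point': [64, 92, 70, 70, 88, 57],
})
df1 = df.set_index('name')

# Rows 'Bob' through 'Dave' (inclusive) and the columns 'age' and 'point', by label
sub_label = df1.loc['Bob':'Dave', ['age', 'point']]
print(sub_label)

# The same range by position: rows 1 to 3, columns 0 and 2
sub_pos = df1.iloc[1:4, [0, 2]]
print(sub_label.equals(sub_pos))
```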

showery's blog


About the index of pandas.DataFrame and pandas.Series
