Reading and Writing CSV Files with Pandas

8 min read · May 21, 2021

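pandas provides many functions for working with data of different types. This article covers only CSV handling, in two parts: reading files and writing files.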

Reading CSV Files

pd.read_csv('file_path')

ch05_01.csv 
white,red,blue,green,animal
1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
=======================================================================
csvframe = pd.read_csv('ch05_01.csv')
Output:
             white  red  blue  green animal
         0      1    5     2      3    cat
         1      2    7     8      5    dog
         2      3    3     6      7  horse
         3      2    2     8      3   duck
         4      4    4     2      1  mouse
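
As a quick check of what read_csv returns, here is a minimal sketch. It assumes the contents of ch05_01.csv are held in a string and wrapped in io.StringIO, so it runs without the file on disk; with a real file you would pass the path instead.

```python
import io

import pandas as pd

# Same content as ch05_01.csv, kept in memory so the sketch is self-contained.
csv_text = """white,red,blue,green,animal
1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
"""

# read_csv accepts a path or any file-like object.
csvframe = pd.read_csv(io.StringIO(csv_text))

print(csvframe.shape)          # (number of rows, number of columns)
print(list(csvframe.columns))  # labels taken from the first line of the file
print(csvframe.dtypes)         # column types inferred by pandas
```

The first line of the file becomes the column labels, and the row index on the left is generated by pandas.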

pd.read_table('file_path', sep='delimiter')

ch05_02.csv
1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
=======================================================================
csvframe = pd.read_table('ch05_02.csv', sep=',')
Output:
           1  5  2  3    cat
        0  2  7  8  5    dog
        1  3  3  6  7  horse
        2  2  2  8  3   duck
        3  4  4  2  1  mouse

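Note: sep specifies which character is used as the delimiter. Because this file has no header line, its first data row is used as the column labels here.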
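
To see sep at work with a different delimiter, here is a small sketch. The semicolon-separated text is a made-up sample (it is not one of the article's files), and io.StringIO stands in for a file path.

```python
import io

import pandas as pd

# Hypothetical sample data using ';' instead of ','.
semi_text = "1;5;2;3;cat\n2;7;8;5;dog\n3;3;6;7;horse\n"

# read_table splits on tabs by default, so the delimiter is passed explicitly.
from_table = pd.read_table(io.StringIO(semi_text), sep=';', header=None)

# read_csv with the same sep parses the text the same way.
from_csv = pd.read_csv(io.StringIO(semi_text), sep=';', header=None)

print(from_table)
print(from_table.equals(from_csv))
```

The two functions differ mainly in their default delimiter, so once sep is given they can be used interchangeably here.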

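pd.read_csv('file_path', header=None)

Note: header=None tells pandas that the file has no header row, so column labels are added to the table; by default they are integers starting from 0.

pd.read_csv('ch05_02.csv', header=None)
Output:
          0  1  2  3      4   # because header=None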
       0  1  5  2  3    cat
       1  2  7  8  5    dog
       2  3  3  6  7  horse
       3  2  2  8  3   duck
       4  4  4  2  1  mouse
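
The difference between the default behavior and header=None can be checked side by side. This is a minimal sketch using the first three lines of ch05_02.csv held in a string (io.StringIO is only there so the example runs without the file).

```python
import io

import pandas as pd

csv_text = "1,5,2,3,cat\n2,7,8,5,dog\n3,3,6,7,horse\n"

# Default: the first line is consumed as the header.
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data and the labels are generated.
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(with_default))             # one row fewer than the file has lines
print(len(without_header))           # all lines kept as data
print(list(without_header.columns))  # generated integer labels
print(without_header[4].tolist())    # a column is selected by its integer label
```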

pd.read_csv('file_path', names=[...])

pd.read_csv('file_path', names=['white', 'red', 'blue', 'green', 'animal'])
Output:
           white  red  blue  green animal   # names assigned to the columns
        0      1    5     2      3    cat
        1      2    7     8      5    dog
        2      3    3     6      7  horse
        3      2    2     8      3   duck
        4      4    4     2      1  mouse

Note: names assigns a label to each column in the header.
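
One detail worth knowing about names: if the file already has a header line, passing names alone would leave that line in the data, so header=0 is added to replace it. The sketch below compares the two cases; the short header "w,r,b,g,a" is a made-up example, and io.StringIO stands in for real files.

```python
import io

import pandas as pd

cols = ['white', 'red', 'blue', 'green', 'animal']

# File without a header line: names alone is enough.
no_header = "1,5,2,3,cat\n2,7,8,5,dog\n"
frame_a = pd.read_csv(io.StringIO(no_header), names=cols)

# File that already has a header line (hypothetical labels): header=0 makes
# pandas replace the old labels instead of reading them as a data row.
has_header = "w,r,b,g,a\n1,5,2,3,cat\n2,7,8,5,dog\n"
frame_b = pd.read_csv(io.StringIO(has_header), names=cols, header=0)

print(frame_a)
print(frame_b)
```

Both frames end up with the same labels and the same two data rows.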

pd.read_csv('file_path', index_col=['color', 'status'])

ch05_03.csv:
color,status,item1,item2,item3
black,up,3,4,6
black,down,2,6,7
white,up,5,5,5
white,down,3,3,2
white,left,1,2,1
red,up,2,2,2
red,down,1,1,4
=======================================================================
csvframe = pd.read_csv('ch05_03.csv', index_col=['color', 'status'])
Output:
                      item1  item2  item3
        color status                        # color and status form the index
        black up          3      4      6
              down        2      6      7
        white up          5      5      5
              down        3      3      2
              left        1      2      1
        red   up          2      2      2
              down        1      1      4

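Note: when the data is more complex, passing several columns to index_col builds a hierarchical index, so the rows can be grouped at a finer level.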
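
Once the two columns form the index, rows can be selected by level. The following is a minimal sketch with the contents of ch05_03.csv held in a string; the selections shown (loc on the outer level, xs on the inner level) are just two possible ways to use the result.

```python
import io

import pandas as pd

csv_text = """color,status,item1,item2,item3
black,up,3,4,6
black,down,2,6,7
white,up,5,5,5
white,down,3,3,2
white,left,1,2,1
red,up,2,2,2
red,down,1,1,4
"""

frame = pd.read_csv(io.StringIO(csv_text), index_col=['color', 'status'])

print(list(frame.index.names))  # the two index levels

# All rows under the outer label 'white'.
white_rows = frame.loc['white']
print(white_rows)

# All rows whose inner label is 'up', across every color.
up_rows = frame.xs('up', level='status')
print(up_rows)
```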

Processing TXT Files

When processing TXT files, regular expressions (RegExp) are often used to split the lines. The function usually used for this is read_table().

Removing Whitespace (Space)

TXT file (the values are separated by whitespace):
white red blue green
1 5 2 3
2 7 8 5
3 3 6 7
=======================================================================
# Split on whitespace
pd.read_table('file_path', sep=r'\s+', engine='python')
Output:
           white  red  blue  green
        0      1    5     2      3
        1      2    7     8      5
        2      3    3     6      7

Note: \s+ matches one or more whitespace characters, so any run of spaces or tabs is treated as a single delimiter.
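
A regex separator is most useful when the spacing is uneven. The sketch below uses a made-up variant of the sample above, with a mix of multiple spaces and tabs between the values, read from a string through io.StringIO.

```python
import io

import pandas as pd

# Hypothetical sample: columns separated by an uneven mix of spaces and tabs.
txt = "white red   blue\tgreen\n1  5 2\t3\n2 7   8 5\n3\t3 6  7\n"

# A raw string keeps the backslash intact for the regular expression.
frame = pd.read_table(io.StringIO(txt), sep=r'\s+')

print(frame)
```

Every run of whitespace counts as one separator, so the irregular spacing does not produce empty columns.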

Removing Text

TXT file:
000END123AAA122
001END124BBB321
002END125CCC333
=======================================================================
# Use the runs of letters as the delimiter, which removes all the letters.
pd.read_table('file_path', sep=r'\D+', header=None, engine='python')
Output:
          0    1    2
       0  0  123  122
       1  1  124  321
       2  2  125  333

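Note: \D+ (one or more non-digit characters) is used as the delimiter, so every run of non-digit characters in the data is dropped and only the numbers remain.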
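
To make the effect of \D+ easier to see, this sketch first applies the same pattern to a single line with Python's re module, then reads the whole sample with read_table. The text is the sample above held in a string.

```python
import io
import re

import pandas as pd

txt = "000END123AAA122\n001END124BBB321\n002END125CCC333\n"

# What the separator does to one line: split wherever non-digits occur.
parts = re.split(r'\D+', "000END123AAA122")
print(parts)

# A regex separator longer than one character needs the Python parsing engine.
frame = pd.read_table(io.StringIO(txt), sep=r'\D+', header=None, engine='python')
print(frame)
```

The pieces are then parsed as integers, which is why the leading zeros of the first column disappear in the DataFrame.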

Skipping Unwanted Lines
Tool: skiprows=[]

TXT file (lines 0, 1, 3, and 6, counting from 0, will be skipped):
########### LOG FILE ############
This file has been generated by automatic system
white,red,blue,green,animal
12-Feb-2015: Counting of animals inside the house
1,5,2,3,cat
2,7,8,5,dog
13-Feb-2015: Counting of animals outside the house
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
=======================================================================
pd.read_table('file_path', sep=',', skiprows=[0, 1, 3, 6])
Output:
          white  red  blue  green animal
       0      1    5     2      3    cat
       1      2    7     8      5    dog
       2      3    3     6      7  horse
       3      2    2     8      3   duck
       4      4    4     2      1  mouse

Note: lines 0, 1, 3, and 6 are removed (line numbers start at 0).
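
skiprows takes either a list of line numbers or a function that decides line by line. The sketch below reads the log sample from a string and shows both forms; the set name unwanted is only an illustrative choice.

```python
import io

import pandas as pd

log_text = """########### LOG FILE ############
This file has been generated by automatic system
white,red,blue,green,animal
12-Feb-2015: Counting of animals inside the house
1,5,2,3,cat
2,7,8,5,dog
13-Feb-2015: Counting of animals outside the house
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
"""

# Line numbers count from 0 and refer to the raw file, header line included.
by_list = pd.read_csv(io.StringIO(log_text), skiprows=[0, 1, 3, 6])

# The same selection written as a callable that receives each line number.
unwanted = {0, 1, 3, 6}
by_callable = pd.read_csv(io.StringIO(log_text), skiprows=lambda i: i in unwanted)

print(by_list)
print(by_list.equals(by_callable))
```

The callable form can be handy when the lines to drop follow a rule rather than a fixed list.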

Reading a File in Sections

skiprows=[]: the rows to skip.
nrows=number: the total number of rows to read.

pd.read_csv('ch05_02.csv', sep=',', skiprows=[2], nrows=3, header=None)
Output:
          0  1  2  3     4
       0  1  5  2  3   cat
       1  2  7  8  5   dog
       2  2  2  8  3  duck

Note: the row at position 2 (the third line, counting from 0) is skipped, and 3 rows are read in total.
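
How skiprows and nrows combine can be verified with a short sketch. It uses the contents of ch05_02.csv held in a string, and also shows that an integer skiprows (instead of a list) skips that many lines from the top.

```python
import io

import pandas as pd

csv_text = "1,5,2,3,cat\n2,7,8,5,dog\n3,3,6,7,horse\n2,2,8,3,duck\n4,4,2,1,mouse\n"

# List form: drop line 2 only, then read 3 rows from what remains.
part = pd.read_csv(io.StringIO(csv_text), skiprows=[2], nrows=3, header=None)
print(part)

# Integer form: drop the first 2 lines and read everything after them.
tail = pd.read_csv(io.StringIO(csv_text), skiprows=2, header=None)
print(tail)
```

nrows counts the rows that are actually read, so the skipped line does not use up one of the three.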

chunksize=num: read num rows at a time.

chunk = pd.read_csv('ch05_01.csv', chunksize=3)
for piece in chunk:
    print(piece)
Output:
          white  red  blue  green animal
       0      1    5     2      3    cat
       1      2    7     8      5    dog
       2      3    3     6      7  horse
          white  red  blue  green animal
       3      2    2     8      3   duck
       4      4    4     2      1  mouse

Note: three rows are read at a time, so the first chunk holds rows 0, 1, and 2, and the second holds rows 3 and 4.
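
Reading in chunks is typically used to process a file piece by piece instead of loading it all at once. As a minimal sketch, the loop below adds up the red column across chunks; the data is the content of ch05_01.csv held in a string, and the variable names are illustrative.

```python
import io

import pandas as pd

csv_text = """white,red,blue,green,animal
1,5,2,3,cat
2,7,8,5,dog
3,3,6,7,horse
2,2,8,3,duck
4,4,2,1,mouse
"""

total_red = 0
chunk_sizes = []

# With chunksize set, read_csv returns an iterator of DataFrames.
for piece in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    chunk_sizes.append(len(piece))   # rows in this chunk
    total_red += piece['red'].sum()  # running total across chunks

print(chunk_sizes)
print(total_red)
```

Each piece keeps the header and continues the row index where the previous one stopped.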

Writing CSV Files

DataFrame.to_csv('file_path')

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)),
        index=['red', 'blue', 'yellow', 'white'],
        columns=['ball', 'pen', 'pencil', 'paper'])
frame.to_csv('ch05_07.csv')
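
To inspect what to_csv writes without creating a file, the sketch below writes into an in-memory buffer (io.StringIO, an assumption made so the example is self-contained) and reads the text back. It also shows index=False and header=False, which leave the labels out.

```python
import io

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)),
                     index=['red', 'blue', 'yellow', 'white'],
                     columns=['ball', 'pen', 'pencil', 'paper'])

buffer = io.StringIO()
frame.to_csv(buffer)            # index labels are written as the first column
csv_text = buffer.getvalue()
print(csv_text)

# index_col=0 turns that first column back into the index when reading.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)

# Without a path, to_csv returns the text; here labels are left out entirely.
values_only = frame.to_csv(index=False, header=False)
print(values_only)
```

The header line of the written text starts with a comma because the index column has no name.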

na_rep='NaN' (common replacement strings include NULL, 0, and NaN)

ch05_09.csv is as follows:
,ball,mug,paper,pen,pencil
blue,6.0,,,6.0,
green,,,,,
red,,,,,
white,20.0,,,20.0,
yellow,19.0,,,19.0,
=======================================================================
frame3 = pd.read_csv('ch05_09.csv')
frame3.to_csv('ch05_test.csv', na_rep='NaN')
Output:
       ,Unnamed: 0,ball,mug,paper,pen,pencil
       0,blue,6.0,NaN,NaN,6.0,NaN
       1,green,NaN,NaN,NaN,NaN,NaN
       2,red,NaN,NaN,NaN,NaN,NaN
       3,white,20.0,NaN,NaN,20.0,NaN
       4,yellow,19.0,NaN,NaN,19.0,NaN

Note: the empty fields between commas in the data are replaced with the specified string.
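
The Unnamed: 0 column in the output above appears because the color labels were read as an ordinary column and a new integer index was written next to them. A possible variation, sketched here with the contents of ch05_09.csv held in a string, reads the first column as the index and compares the output with and without na_rep.

```python
import io

import pandas as pd

csv_text = """,ball,mug,paper,pen,pencil
blue,6.0,,,6.0,
green,,,,,
red,,,,,
white,20.0,,,20.0,
yellow,19.0,,,19.0,
"""

# index_col=0 keeps the color labels as the index, so no 'Unnamed: 0'
# column and no extra integer index end up in the written text.
frame3 = pd.read_csv(io.StringIO(csv_text), index_col=0)

default_out = frame3.to_csv()             # missing values become empty fields
filled_out = frame3.to_csv(na_rep='NaN')  # missing values become 'NaN'

print(default_out)
print(filled_out)
```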

Reference:

https://www.amazon.com/Python-Data-Analytics-Pandas-Matplotlib-ebook/dp/B07FT6FB6Y

Written by Sharon Peng