Python Pandas to extract rows and columns from a DataFrame
Get number of rows and columns
kodingwindow@kw:~$ python3
...
>>> import pandas as pd
>>> df=pd.read_csv("/home/kodingwindow/kw.csv")
>>> df
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
3  25622348992    Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250
5  25622348994          Aanya    Delhi  1975-08-18       SBI  105000
6  25622348995    James Moore      NaN  1978-06-26      Citi   97800
>>> df.shape
(7, 6)
>>> row,col=df.shape
>>> r,c=df.shape
>>> r
7
>>> c
6
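The row and column counts can also be read without unpacking df.shape. The following is a minimal sketch, assuming a small in-memory DataFrame built from three of the rows above as a stand-in for kw.csv (the file itself is not reproduced here); the names n_rows and n_cols are illustrative only.

```python
import pandas as pd

# Stand-in for kw.csv: three rows copied from the output above.
df = pd.DataFrame({
    "account_no": [25622348989, 25622348990, 25622348991],
    "name": ["James Moore", "Donald Taylor", "Edward Parkar"],
    "amount": [5000, 7000, 95000],
})

n_rows = len(df)           # number of rows
n_cols = len(df.columns)   # number of columns

# Both values agree with the tuple returned by df.shape.
print(n_rows, n_cols)      # 3 3
print(df.shape)            # (3, 3)
```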
Slicing on a DataFrame
>>> df[0:7]
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
3  25622348992    Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250
5  25622348994          Aanya    Delhi  1975-08-18       SBI  105000
6  25622348995    James Moore      NaN  1978-06-26      Citi   97800
>>> df[4:7]
    account_no          name   city         dob    bank  amount
4  25622348993  Marie Peters   Ribe  1967-01-05  DZBank   12250
5  25622348994         Aanya  Delhi  1975-08-18     SBI  105000
6  25622348995   James Moore    NaN  1978-06-26    Citi   97800
>>> df[0::2]  # every second row, starting from row 0
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250
6  25622348995    James Moore      NaN  1978-06-26      Citi   97800
>>> df[0::3]  # every third row, starting from row 0
    account_no         name     city         dob      bank  amount
0  25622348989  James Moore  Phoenix  1985-05-26  Barclays    5000
3  25622348992  Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
6  25622348995  James Moore      NaN  1978-06-26      Citi   97800
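The same row slices can be written with iloc, which selects purely by position and also accepts a column slice. This is a sketch under the assumption of a reduced two-column DataFrame holding the account_no and amount values shown above; the variable names are illustrative.

```python
import pandas as pd

# Reduced stand-in for the DataFrame above (two of its six columns).
df = pd.DataFrame({
    "account_no": [25622348989 + i for i in range(7)],
    "amount": [5000, 7000, 95000, 50000, 12250, 105000, 97800],
})

last_three = df.iloc[4:7]      # same rows as df[4:7]
every_other = df.iloc[::2]     # same rows as df[0::2]
first_col = df.iloc[:, 0:1]    # all rows, first column only

print(list(last_three.index))   # [4, 5, 6]
print(list(every_other.index))  # [0, 2, 4, 6]
print(first_col.shape)          # (7, 1)
```

Writing the slice through iloc makes it explicit that rows are chosen by position rather than by index label, which matters once the index is no longer the default 0, 1, 2, ...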
Retrieving column(s) data
>>> df.columns
Index(['account_no', 'name', 'city', 'dob', 'bank', 'amount'], dtype='object')
>>> df.account_no
0    25622348989
1    25622348990
2    25622348991
3    25622348992
4    25622348993
5    25622348994
6    25622348995
Name: account_no, dtype: int64
>>> df.dob
0    1985-05-26
1    1990-08-20
2    1994-01-29
3    1982-01-14
4    1967-01-05
5    1975-08-18
6    1978-06-26
Name: dob, dtype: object
>>> df[['account_no','name']]
    account_no           name
0  25622348989    James Moore
1  25622348990  Donald Taylor
2  25622348991  Edward Parkar
3  25622348992    Ryan Bakshi
4  25622348993   Marie Peters
5  25622348994          Aanya
6  25622348995    James Moore
>>> df.amount
0      5000
1      7000
2     95000
3     50000
4     12250
5    105000
6     97800
Name: amount, dtype: int64
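Attribute access such as df.amount has a bracket equivalent, df["amount"], which also works for column names that are not valid Python identifiers. The sketch below assumes a three-row stand-in for the DataFrame above and shows that a single label returns a Series while a list of labels returns a DataFrame.

```python
import pandas as pd

# Stand-in for the DataFrame above: three rows, two columns.
df = pd.DataFrame({
    "name": ["James Moore", "Donald Taylor", "Edward Parkar"],
    "amount": [5000, 7000, 95000],
})

as_series = df["amount"]     # single label -> Series, same data as df.amount
as_frame = df[["amount"]]    # list of labels -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```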
Retrieving data using a filter condition
>>> df['amount'].min()
5000
>>> df['amount'].max()
105000
>>> df[df.amount>=50000]
    account_no           name    city         dob   bank  amount
2  25622348991  Edward Parkar  Irvine  1994-01-29  ICICI   95000
3  25622348992    Ryan Bakshi  Mumbai  1982-01-14   Citi   50000
5  25622348994          Aanya   Delhi  1975-08-18    SBI  105000
6  25622348995    James Moore     NaN  1978-06-26   Citi   97800
>>> df[['account_no','name','amount']][df.amount>=50000]
    account_no           name  amount
2  25622348991  Edward Parkar   95000
3  25622348992    Ryan Bakshi   50000
5  25622348994          Aanya  105000
6  25622348995    James Moore   97800
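The filter and the column selection can also be combined in a single loc call, or written as a string expression with DataFrame.query. The following sketch assumes an in-memory copy of four of the columns shown above; the variable names high, high_q and citi_high are illustrative.

```python
import pandas as pd

# In-memory copy of four columns from the DataFrame above.
df = pd.DataFrame({
    "account_no": [25622348989 + i for i in range(7)],
    "name": ["James Moore", "Donald Taylor", "Edward Parkar", "Ryan Bakshi",
             "Marie Peters", "Aanya", "James Moore"],
    "bank": ["Barclays", "Citi", "ICICI", "Citi", "DZBank", "SBI", "Citi"],
    "amount": [5000, 7000, 95000, 50000, 12250, 105000, 97800],
})

# Row filter and column selection in one .loc call.
high = df.loc[df["amount"] >= 50000, ["account_no", "name", "amount"]]

# The same filter expressed with DataFrame.query.
high_q = df.query("amount >= 50000")[["account_no", "name", "amount"]]

# Combine conditions with & (and) or | (or); parenthesise each condition.
citi_high = df.loc[(df["bank"] == "Citi") & (df["amount"] >= 50000)]

print(list(high.index))       # [2, 3, 5, 6]
print(list(citi_high.index))  # [3, 6]
```

Using loc avoids indexing the DataFrame twice in a row, which becomes important when the selection is later used for assignment.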
Getting top and bottom data
>>> df.head()  # head() returns the first 5 rows by default
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
3  25622348992    Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250
>>> df.tail()  # tail() returns the last 5 rows by default
    account_no           name    city         dob    bank  amount
2  25622348991  Edward Parkar  Irvine  1994-01-29   ICICI   95000
3  25622348992    Ryan Bakshi  Mumbai  1982-01-14    Citi   50000
4  25622348993   Marie Peters    Ribe  1967-01-05  DZBank   12250
5  25622348994          Aanya   Delhi  1975-08-18     SBI  105000
6  25622348995    James Moore     NaN  1978-06-26    Citi   97800
>>> df.head(3)
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
>>> df.tail(3)
    account_no          name   city         dob    bank  amount
4  25622348993  Marie Peters   Ribe  1967-01-05  DZBank   12250
5  25622348994         Aanya  Delhi  1975-08-18     SBI  105000
6  25622348995   James Moore    NaN  1978-06-26    Citi   97800
Statistical description of a DataFrame
>>> df.describe()
         account_no         amount
count  7.000000e+00       7.000000
mean   2.562235e+10   53150.000000
std    2.160247e+00   45761.055131
min    2.562235e+10    5000.000000
25%    2.562235e+10    9625.000000
50%    2.562235e+10   50000.000000
75%    2.562235e+10   96400.000000
max    2.562235e+10  105000.000000
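By default describe() summarizes only the numeric columns, which is why just account_no and amount appear above. Statistics for an identifier such as account_no are rarely meaningful, so it is often clearer to describe a single column. A minimal sketch, assuming the amount values shown above are held in a standalone Series:

```python
import pandas as pd

# The amount column from the DataFrame above, as a standalone Series.
amount = pd.Series([5000, 7000, 95000, 50000, 12250, 105000, 97800],
                   name="amount")

summary = amount.describe()   # a Series indexed by statistic name

print(summary["count"])  # 7.0
print(summary["mean"])   # 53150.0
print(summary["50%"])    # 50000.0
```

Individual statistics can be read from the result by label, as shown, or computed directly with methods such as amount.mean() and amount.median().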
What Next?
Python Pandas to index and sort a DataFrame
Python Pandas to fill and drop missing data in a DataFrame
Data Visualization