Python Pandas to extract rows and columns from a DataFrame
Get number of rows and columns
kodingwindow@kw:~$ python3
...
>>> import pandas as pd
>>> df=pd.read_csv("/home/kodingwindow/kw.csv")
>>> df
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
3  25622348992    Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250
5  25622348994          Aanya    Delhi  1975-08-18       SBI  105000
6  25622348995    James Moore      NaN  1978-06-26      Citi   97800

>>> df.shape
(7, 6)

>>> row,col=df.shape
>>> r,c=df.shape
>>> r
7
>>> c
6
Slicing on a DataFrame
>>> df[0:7]
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
3  25622348992    Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250
5  25622348994          Aanya    Delhi  1975-08-18       SBI  105000
6  25622348995    James Moore      NaN  1978-06-26      Citi   97800

>>> df[4:7]
    account_no          name   city         dob    bank  amount
4  25622348993  Marie Peters   Ribe  1967-01-05  DZBank   12250
5  25622348994         Aanya  Delhi  1975-08-18     SBI  105000
6  25622348995   James Moore    NaN  1978-06-26    Citi   97800

>>> df[0::2] # Retrieving alternate rows (starting from 0 to multiple of 2)
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250
6  25622348995    James Moore      NaN  1978-06-26      Citi   97800

>>> df[0::3]
    account_no         name     city         dob      bank  amount
0  25622348989  James Moore  Phoenix  1985-05-26  Barclays    5000
3  25622348992  Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
6  25622348995  James Moore      NaN  1978-06-26      Citi   97800
Retrieving column(s) data
>>> df.columns
Index(['account_no', 'name', 'city', 'dob', 'bank', 'amount'], dtype='object')

>>> df.account_no
0    25622348989
1    25622348990
2    25622348991
3    25622348992
4    25622348993
5    25622348994
6    25622348995
Name: account_no, dtype: int64

>>> df.dob
0    1985-05-26
1    1990-08-20
2    1994-01-29
3    1982-01-14
4    1967-01-05
5    1975-08-18
6    1978-06-26
Name: dob, dtype: object

>>> df[['account_no','name']]
    account_no           name
0  25622348989    James Moore
1  25622348990  Donald Taylor
2  25622348991  Edward Parkar
3  25622348992    Ryan Bakshi
4  25622348993   Marie Peters
5  25622348994          Aanya
6  25622348995    James Moore

>>> df.amount
0      5000
1      7000
2     95000
3     50000
4     12250
5    105000
6     97800
Name: amount, dtype: int64
Retrieving data using query operation
>>> df['amount'].min()
5000

>>> df['amount'].max()
105000

>>> df[df.amount>=50000]
    account_no           name    city         dob   bank  amount
2  25622348991  Edward Parkar  Irvine  1994-01-29  ICICI   95000
3  25622348992    Ryan Bakshi  Mumbai  1982-01-14   Citi   50000
5  25622348994          Aanya   Delhi  1975-08-18    SBI  105000
6  25622348995    James Moore     NaN  1978-06-26   Citi   97800

>>> df[['account_no','name','amount']][df.amount>=50000]
    account_no           name  amount
2  25622348991  Edward Parkar   95000
3  25622348992    Ryan Bakshi   50000
5  25622348994          Aanya  105000
6  25622348995    James Moore   97800
Getting top and bottom data
>>> df.head() # head() will retrieve top 5 rows
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000
3  25622348992    Ryan Bakshi   Mumbai  1982-01-14      Citi   50000
4  25622348993   Marie Peters     Ribe  1967-01-05    DZBank   12250

>>> df.tail() # tail() will retrieve bottom 5 rows
    account_no           name    city         dob    bank  amount
2  25622348991  Edward Parkar  Irvine  1994-01-29   ICICI   95000
3  25622348992    Ryan Bakshi  Mumbai  1982-01-14    Citi   50000
4  25622348993   Marie Peters    Ribe  1967-01-05  DZBank   12250
5  25622348994          Aanya   Delhi  1975-08-18     SBI  105000
6  25622348995    James Moore     NaN  1978-06-26    Citi   97800

>>> df.head(3)
    account_no           name     city         dob      bank  amount
0  25622348989    James Moore  Phoenix  1985-05-26  Barclays    5000
1  25622348990  Donald Taylor   Irvine  1990-08-20      Citi    7000
2  25622348991  Edward Parkar   Irvine  1994-01-29     ICICI   95000

>>> df.tail(3)
    account_no          name   city         dob    bank  amount
4  25622348993  Marie Peters   Ribe  1967-01-05  DZBank   12250
5  25622348994         Aanya  Delhi  1975-08-18     SBI  105000
6  25622348995   James Moore    NaN  1978-06-26    Citi   97800
Statistical description of a DataFrame
>>> df.describe()
         account_no         amount
count  7.000000e+00       7.000000
mean   2.562235e+10   53150.000000
std    2.160247e+00   45761.055131
min    2.562235e+10    5000.000000
25%    2.562235e+10    9625.000000
50%    2.562235e+10   50000.000000
75%    2.562235e+10   96400.000000
max    2.562235e+10  105000.000000
Advertisement