Pandas tricks

Dataframes

Merge dataframes

#on: the key (common column names) of the two dataframes;
#how: inner (keep common values only), outer: keep all values
df = pd.merge(df1, df2, how="inner",on="wavelength")

Concatenate

pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)

Add new row to dataframe

Just assign row to a particular index, using loc:

df.loc[-1] = [2, 3, 4]  # adding a row
df.index = df.index + 1  # shifting index
df = df.sort_index()  # sorting by index
**#add row to end of DataFrame
df.loc[len(df.index)] = [value1, value2, value3, ...]**

Add a column to pandas dataframe

You can use the assign() function to add a new column to the end of a pandas DataFrame: df = df.assig……


You can use the assign() function to add a new column to the end of a pandas DataFrame:

df = df.assign(col_name=[value1, value2, value3, ...])

And you can use the insert() function to add a new column to a specific location in a pandas DataFrame:

df.insert(position, 'col_name', [value1, value2, value3, ...])

The following examples show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23],
                   'assists': [5, 7, 7, 9, 12, 9],
                   'rebounds': [11, 8, 10, 6, 6, 5]})

#view DataFrame
df

points    assists    rebounds
0    25    5    11
1    12    7    8
2    15    7    10
3    14    9    6
4    19    12    6
5    23    9    5

Example 1: Add New Column to End of DataFrame

The following code shows how to add a new column to the end of the DataFrame:

#add 'steals' column to end of DataFrame
df = df.assign(steals=[2, 2, 4, 7, 4, 1])

#view DataFrame
df

points    assists    rebounds steals
0    25    5    11     2
1    12    7    8     2
2    15    7    10     4
3    14    9    6     7
4    19    12    6     4
5    23    9    5     1

Example 2: Add Multiple Columns to End of DataFrame

The following code shows how to add multiple new columns to the end of the DataFrame:

#add 'steals' and 'blocks' columns to end of DataFrame
df = df.assign(steals=[2, 2, 4, 7, 4, 1],
               blocks=[0, 1, 1, 3, 2, 5])

#view DataFrame
df

points    assists    rebounds steals    blocks
0    25    5    11     2    0
1    12    7    8     2    1
2    15    7    10     4    1
3    14    9    6     7    3
4    19    12    6     4    2
5    23    9    5     1    5

Example 3: Add New Column Using Existing Column

The following code shows how to add a new column to the end of the DataFrame, based on the values in an existing column:

#add 'half_pts' to end of DataFrame
df = df.assign(half_pts=lambda x: x.points / 2)

#view DataFrame
df

        points    assists    rebounds half_pts
0    25    5    11     12.5
1    12    7    8     6.0
2    15    7    10     7.5
3    14    9    6     7.0
4    19    12    6     9.5
5    23    9    5     11.5

Example 4: Add New Column in Specific Location of DataFrame

The following code shows how to add a new column by inserting it into a specific location in the DataFrame:

#add 'steals' to column index position 2 in DataFrame
df.insert(2, 'steals', [2, 2, 4, 7, 4, 1])

#view DataFrame
df

points    assists    steals    rebounds
0    25    5    2    11
1    12    7    2    8
2    15    7    4    10
3    14    9    7    6
4    19    12    4    6
5    23    9    1    5

Data

Replace values

# Replace a Single Value
df['Age'] = df['Age'].replace(23, 99)

# Replace Multiple Values
df['Age'] = df['Age'].replace([23, 45], [99, 999])

# Also works in the Entire DataFrame
df = df.replace(23, 99)
df = df.replace([23, 45], [99, 999])

# Replace Multiple Values with a Single Value
df['Age'] = df['Age'].replace([23, 45, 35], 99)

# Using a Dictionary (Dict is passed into to_replace=)
df['Age'] = df['Age'].replace({23:99, 45:999})

# Using a Dictionary for Column Replacements (key:value = column:value)
df = df.replace({'Name': 'Jane', 'Age': 45}, 99)

Index, slicing, filtering & sorting

Reset index

pandas.DataFrame.reset_index — pandas 1.2.4 documentation (pydata.org)

df.reset_index(drop=True)

Set first column as index

df = pd.read_excel(file, header=0, index_col=0)

Select specified columns:

df.filter(["species","bill_length_mm"])

Filtering a column by multiple conditions:

dataFrame.loc[(dataFrame['Salary']>=100000) & (dataFrame['Age']< 40) & (dataFrame['JOB'].str.startswith('D')),
             ['Name','JOB']]
# Select the data by condition: 449.75<data['e loss']<470.25
data[(449.75<data['e loss']) & (data['e loss']<470.25)]
# Select the data by condition: data['e loss'] <  449.75or data['e loss'] > 470.25
data[(449.75<data['e loss']) | (data['e loss']<470.25)]

Slicing

#select the values after row 209
df=df1.iloc[209:]
#select
df=df1.iloc[156:190]

Sorting

df_sort = df.sort_values(by = 'column_1')
df_sort = df.sort_values(by = ['column_1', 'column_2'])

File I/O

Read txt

import pandas as pd
data = pd.read_csv('output_list.txt', header = None)

Skip rows

pandas.read_csv(filepath_or_buffer, skiprows=N, ....)

If the unwanted rows are begin with #, than these rows can be ignored by:

df=pd.read_csv("1-Signal-43135.msa", header=None, sep = ',', comment='#', delimiter=',',dtype=float)

Specify column names

Use names to specify the column names:

df = pd.read_csv(path, header=None, sep = ',', comment='#',
                 delimiter=',', names=('e loss', 'intensity'),
                 dtype=float)

Write data to excel file

Create, write to and save a workbook:

df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
                   index=['row 1', 'row 2'],
                   columns=['col 1', 'col 2'])
df1.to_excel("output.xlsx")

To specify the sheet name:

df1.to_excel("output.xlsx",
             sheet_name='Sheet_name_1')

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

df2 = df1.copy()
with pd.ExcelWriter('output.xlsx') as writer:  
    df1.to_excel(writer, sheet_name='Sheet_name_1')
    df2.to_excel(writer, sheet_name='Sheet_name_2')

ExcelWriter can also be used to append to an existing Excel file:

with pd.ExcelWriter('output.xlsx', engine="openpyxl",
                    mode='a') as writer:  
    df.to_excel(writer, sheet_name='Sheet_name_3')

To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):

df1.to_excel('output1.xlsx', engine='xlsxwriter')

results matching ""

    No results matching ""