Pandas

Complete Pandas Library Explained from Start to End – Chapter Final

Reading Time: 5 minutes
pandas logo

Welcome to the third and final chapter of complete pandas library explained from start to end. If you haven’t read the initial articles, I would recommend you to read Chapter 1 and Chapter 2.

In this final chapter, we will cover the following contents.

  • Aggregating data
  • Grouping data
  • Joining different files
  • Writing data to Files
  • Becoming Pandas Master from here.

This chapter is highly important, Let’s take a deep breath and start.

Aggregating Data

If you want to aggregate the data using one or more data over a specified axis, Pandas have an agg() method for that. Let’s understand using the example of the iris dataset.

Suppose we have a requirement to aggregate the data on the basis of sum, minimum value or maximum value of data columns. The below code of agg() will help to achieve that.



You must be wondering, whats use of this function, why can’t we use describe method for the count, min, max, etc. You are absolutely correct, you can use. But in aggregation, there are tons of default methods which you can use. You can even create your custom aggregation function just like count, sum. Let’s see an example.

Custom Aggregation Functions in Pandas



Grouping data

The group by operation involves splitting the data, applying a function and combining the results.

The above diagram shows the splitting of the input data in groups, then applying the function on each of the group, then combining the results. There are classes from 1 to 4, having different marks. The splitting step involves splitting of each class into four groups i.e. class 1, class 2, … class 4. Then applying function, in the example we applied mean function. The final step involves combining the records.

Hands on Example with iris dataset.



Bonus: Using Custom Function in group by




Joining different files

In real-world problems, the data is rarely a single file. Generally, it is present in different files. We merge all files into a single unit based on our need, after that we continue our analysis.

In this section we will learn how to join different files in pandas.

Before divings, lets see the different types of joins:

  • Inner Join/Inner Merge: In the inner merge, we keep the rows which are common between the left dataframe and right data frame.
  • Left Join/Left Merge: In this merge, we keep the rows of the left dataframe, where there are no matching records in the right dataframe they got replaced by NAN.
  • Right Join/Right Merge: In this merge, we keep the rows of the right dataframe, where there are no matching records in the left dataframe they got replaced by NAN.
  • Full Outer Join/Outer Merge: In this merge, we keep rows of both dataframes, where there are no matching records, they got replaced by NAN.

Syntax:


You need to mention:

  • Left Dataframe
  • Right Dataframe
  • On -> The column which is common in both dataframe, sometimes the column names of left dataframe and right dataframe are different in that case use. left_on for the left dataframe column name and right_on for the right dataframe column name.
  • how -> Specify which join.

Deep Diving to Joins:

Left Merge:

See in the example below, if we have a left join, we will keep all the rows of the left dataframe. If no respective records are found in right dataframe replace the same with NAN. Just like we have Class_ID as 1, there are no matching records in Right dataframe hence it will be replaced with NAN. Also, we will discard the right non-matched records, like Class ID 10 and 11.

Example:




Right Merge

See in the example below, if we have a right join, we will keep all the rows of the right dataframe. If no respective records are found in left dataframe replace the same with NAN. Just like we have Class_ID as 10,11, there are no matching records in left dataframe hence it will be replaced with NAN. Also, we will discard the right non-matched records, like Class ID 1.

Example:




Outer Merge:

In the example below, if we have an outer join, we will keep all the rows of the right dataframe and left dataframe. If no respective records are found in any of the dataframe replace the same with NAN.

Example:




Inner Merge (Default)

If we don’t specify how parameter the panda’s default will use inner merge, which will only consider the common records of both dataframes based on “on” parameter. Please see the image below:

Example:



Writing data to Files

Whether you are doing any data science project or participating in a kaggle competition, the writing of the data to the files is extremely important. There are plenty of options available in pandas for writing files to different extensions csv, excel, json, etc. Just write dataframe.to_csv for CSV to_excel for excel, etc.

Example for writing data to a JSON Format



Becoming Pandas Master from here:

Congrats! You have covered pandas and now you have an in-depth understanding of it. But this is not the end, mastering pandas involves an immense hands on. Since you are ready for the hands-on session.

Please find the below link of hands-on practicing pandas for machine learning

Pandas Handson Exercise – Kaggle

If you have any queries or suggestions. Please drop a comment below. Will happy to help you :).

Leave a Reply