Hello Everyone,

I am pleased to announce that I have launched a new Python series on YouTube.

Please find the course launch video below.

You can also find the attached curriculum:

General: SF Pro Display Light 10pt

Fixed width: Hack 9pt

Small: SF Pro Display 10pt

Toolbar: SF Pro Display 10pt

Menu: SF Pro Display 10pt

Window title: SF Pro Display 10pt

Note: Please install the stable Latte Dock release instead of the git version if you prefer stability.

Latte Dock Settings:

*Separator (to be installed)*

Latte Spacer, Kpple Menu*, Application Title (application name only)*, two Latte Splitters (**), Bluetooth, Network Connections, Power Saving, Audio Volume, Chilli Clock (custom date: ddd d), Search, Sidebar Button

*Separator (to be installed)*

Recently I posted a screenshot of my custom KDE Plasma desktop on Reddit, and it received a great number of upvotes and comments. After the positive feedback, I promised to publish a guide. I am keeping that promise with a video guide, so that even a Linux novice can follow the customization easily.

Before jumping to the video, let's install some prerequisites.

**1. Installing Latte Dock:**

If you already have Latte Dock installed on your system, please skip this step. Otherwise, you can install it for your distribution by visiting the official KDE GitHub repository.

https://github.com/KDE/latte-dock

**2. Downloading Wallpaper:**

You can download the No Man's Sky wallpaper from the link below:

**3. Installing the San Francisco Pro Font:**

You can download the San Francisco Pro font from the link below:

https://github.com/sahibjotsaggu/San-Francisco-Pro-Fonts/blob/master/SF-Pro-Display-Regular.otf

That's all for the prerequisites; now you can follow the steps in the video below.

**Optional Step: Setting up the Super key to invoke the top panel's application menu**

Edit the file `~/.config/kwinrc`

and add the content below:

```
[ModifierOnlyShortcuts]
Meta=org.kde.lattedock,/Latte,org.kde.LatteDock,activateLauncherMenu
```

Reload KWin

`qdbus org.kde.KWin /KWin reconfigure`

If you still have any questions or suggestions, please feel free to drop a comment below. I will be happy to help you :).

Welcome to the third and final chapter of "Complete Pandas Library Explained from Start to End". If you haven't read the earlier articles, I recommend reading Chapter 1 and Chapter 2 first.

In this final chapter, we will cover the following topics:

- Aggregating data
- Grouping data
- Joining different files
- Writing data to Files
- Becoming Pandas Master from here.

This chapter is an important one. Let's take a deep breath and start.

**Aggregating Data**

If you want to aggregate data using one or more operations over a specified axis, pandas has the agg() method for that. Let's understand it using the iris dataset.

Suppose we need to aggregate the data by the sum, minimum, or maximum of the data columns. The agg() call below achieves that.

You may be wondering what the use of this function is: why can't we use the describe() method for count, min, max, etc.? You absolutely can. But agg() supports many built-in aggregations, and you can even create custom aggregation functions to sit alongside count, sum, and the rest. Let's see an example.
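A minimal sketch of such an `agg()` call, using illustrative iris-style column names and values:

```python
import pandas as pd

# A tiny iris-like sample (values are illustrative)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "petal_length": [1.4, 1.5, 4.9, 5.1],
})

# Aggregate every column by sum, min and max in a single call
result = df.agg(["sum", "min", "max"])
print(result)
```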

**Custom Aggregation Functions in Pandas**
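A custom aggregation is any function that maps a column to a single value; it can be passed to `agg()` alongside the built-in names. A sketch under that assumption (the `value_range` helper is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"petal_length": [1.4, 1.5, 4.9, 5.1]})

# A custom aggregation: the spread (max - min) of a column
def value_range(column):
    return column.max() - column.min()

# Mix built-in aggregation names with the custom function
result = df.agg(["min", "max", value_range])
print(result)
```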

**Grouping data**

The group by operation involves splitting the data, applying a function, and combining the results.

The above diagram shows the input data being split into groups, the function applied to each group, and the results combined. There are classes 1 to 4, each with different marks. The splitting step divides the data into four groups: class 1, class 2, ..., class 4. Then a function is applied; in the example, the mean. The final step combines the per-group results.

Hands-on example with the iris dataset.

Bonus: using a custom function in group by.
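The class/marks picture above can be sketched in code like this (the numbers are illustrative, and the custom function in the bonus step is a hypothetical marks spread):

```python
import pandas as pd

# Illustrative marks for classes 1 to 4
df = pd.DataFrame({
    "class": [1, 1, 2, 2, 3, 3, 4, 4],
    "marks": [60, 70, 55, 65, 80, 90, 40, 50],
})

# Split by class, apply mean to each group, combine the results
means = df.groupby("class")["marks"].mean()
print(means)

# Bonus: a custom function in group by - the marks spread per class
spread = df.groupby("class")["marks"].agg(lambda s: s.max() - s.min())
print(spread)
```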

**Joining different files**

In real-world problems, the data is rarely in a single file. Generally it is spread across different files; we merge them into a single unit based on our needs, and then continue the analysis.

In this section we will learn how to join different files in pandas.

Before diving in, let's look at the different types of joins:

**Inner Join/Inner Merge:** keeps only the rows that are common to the left and right dataframes.

**Left Join/Left Merge:** keeps all rows of the left dataframe; where there is no matching record in the right dataframe, the values are filled with NaN.

**Right Join/Right Merge:** keeps all rows of the right dataframe; where there is no matching record in the left dataframe, the values are filled with NaN.

**Full Outer Join/Outer Merge:** keeps the rows of both dataframes; where there is no matching record, the values are filled with NaN.

Syntax:

You need to mention:

- Left Dataframe
- Right Dataframe
- on -> the column that is common to both dataframes. If the column names differ between the two dataframes, use left_on for the left dataframe's column name and right_on for the right dataframe's column name.
- how -> specifies the type of join.
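A sketch of that syntax on the Class_ID example used in the sections below (the dataframe contents are illustrative):

```python
import pandas as pd

left = pd.DataFrame({"Class_ID": [1, 2, 3], "Class_Name": ["A", "B", "C"]})
right = pd.DataFrame({"Class_ID": [2, 3, 10, 11], "Teacher": ["P", "Q", "R", "S"]})

# how= selects the join type; on= names the common column
inner = pd.merge(left, right, on="Class_ID", how="inner")    # keeps IDs 2, 3
left_j = pd.merge(left, right, on="Class_ID", how="left")    # keeps IDs 1, 2, 3
right_j = pd.merge(left, right, on="Class_ID", how="right")  # keeps IDs 2, 3, 10, 11
outer = pd.merge(left, right, on="Class_ID", how="outer")    # keeps all IDs
```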

**Deep Diving to Joins:**

**Left Merge:**

In a left join, we keep all the rows of the left dataframe. If no matching record is found in the right dataframe, the value is replaced with NaN. For example, Class_ID 1 has no matching record in the right dataframe, so its right-hand columns become NaN. Non-matching records from the right dataframe, such as Class_ID 10 and 11, are discarded.

**Example:**

**Right Merge**

In a right join, we keep all the rows of the right dataframe. If no matching record is found in the left dataframe, the value is replaced with NaN. For example, Class_ID 10 and 11 have no matching records in the left dataframe, so their left-hand columns become NaN. Non-matching records from the left dataframe, such as Class_ID 1, are discarded.

**Example:**

**Outer Merge:**

In an outer join, we keep all rows of both the left and right dataframes. Where no matching record is found in either dataframe, the values are replaced with NaN.

**Example:**

**Inner Merge (Default)**

If we don't specify the how parameter, pandas defaults to an inner merge, which keeps only the records common to both dataframes based on the "on" column. Please see the image below:

**Example:**

**Writing data to Files**

Whether you are doing a data science project or participating in a Kaggle competition, writing data out to files is extremely important. Pandas offers plenty of writers for different formats: CSV, Excel, JSON, etc. Just call dataframe.to_csv for CSV, to_excel for Excel, and so on.

Example of writing data to JSON format:
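For instance, `to_json` returns the JSON as a string when no path is given (or writes to disk if one is); a sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# orient="records" produces one JSON object per row
json_text = df.to_json(orient="records")
print(json_text)
```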

**Becoming Pandas Master from here**:

Congrats! You have covered pandas and now have an in-depth understanding of it. But this is not the end: mastering pandas requires immense hands-on practice, and you are now ready for the hands-on session.

Please find below a link for hands-on pandas practice for machine learning:

Pandas Handson Exercise – Kaggle

If you have any queries or suggestions, please drop a comment below. I will be happy to help you :).

This is the second chapter of the series "Complete Pandas Library Explained from Start to End". If you haven't seen the introductory post, I encourage you to do some hands-on practice following the Chapter 1 link.

The contents of chapter 2 are:

- Dataframe Operations for Getting a high-level understanding of Data.
- Different ways to select particular Columns, Rows and Filtering the data.

**1. Dataframe Basic Methods.**

In the previous chapter we learned about the dataframe and series data structures and about importing datasets. After importing your data, getting a high-level understanding of it is the most important next step. Whether you are a Kaggle competition winner or work at a top MNC, every data scientist starts with this high-level look at the data. Pandas provides a collection of dataframe functions that give such an overview.

Let's go back to our previous example and import the iris dataset into a dataframe.
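The import might look like the sketch below; a tiny inline sample stands in for the full dataset so the snippet is self-contained (the column names are assumptions about the CSV's header):

```python
import io
import pandas as pd

# In practice, point read_csv at the iris file or URL, e.g.:
# df = pd.read_csv("iris.csv")

csv_data = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
6.3,3.3,4.7,1.6,versicolor
"""
df = pd.read_csv(io.StringIO(csv_data))
print(df.head())
```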

For retrieving statistical information about a dataframe, we have describe() function.

Syntax:

`df.describe()`

You can clearly see a high-level statistical overview of the data, i.e. the count, mean, min, max, and std of sepal length, sepal width, petal length, and petal width.

Apart from this, some other basic dataframe attributes are:

- **shape** returns a tuple with the dimensionality of the dataframe.
- **ndim** returns the number of dimensions of the underlying data.
- **size** returns the number of elements in the underlying data.
- **dtypes** returns the data type of each column.
- **values** returns the underlying data as an ndarray.
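A quick sketch of those attributes on a small illustrative dataframe:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

print(df.shape)   # (3, 2): 3 rows, 2 columns
print(df.ndim)    # 2
print(df.size)    # 6
print(df.dtypes)  # one dtype per column
print(df.values)  # the underlying data as a NumPy ndarray
```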

**2. Different ways to select particular Columns and Rows.**

**Selecting Columns**

In order to select a particular column in a dataframe, use any of the following syntaxes.

```
#1. Create a list of columns
column_list = ["First","Second"]
#2. Select columns in dataframe passing the column list to df
df[column_list]
#3. Another way using loc
df.loc[:,column_list]
#4 Another way using iloc (This way will accept index of columns)
df.iloc[:,2:5]
```

Note: iloc takes positional slices, i.e. [START_ROW_INDEX:END_ROW_INDEX+1, START_COLUMN_INDEX:END_COLUMN_INDEX+1], whereas loc takes [:, List]: the rows are selected before the comma, and after it we can pass the list of selected column labels.

**Selecting Rows:**

In order to select a particular row in a dataframe, use any of the following syntaxes.

```
# 1. using numerical indexes - iloc
df.iloc[START_INDEX:END_INDEX+1, :]

# 2. using labels as index - loc (works like this when the default index is used)
row_index_to_select = [0, 1, 2, 3]
df.loc[row_index_to_select]
```

loc is used when labels are the index; we can look up rows by label directly, which makes searching really fast.

Please follow the below example for better clarity.

For loc, let's make sepal length the index, and then search for the rows whose sepal length is 5.0.
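That lookup could be sketched as follows (iris-style columns and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.0, 5.1, 5.0, 6.3],
    "petal_length": [1.4, 1.5, 1.6, 4.9],
})

# Make sepal_length the index, then look the label up directly with loc
indexed = df.set_index("sepal_length")
rows = indexed.loc[5.0]
print(rows)
```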

**Filtering data**

In real-world examples, you may need to filter records by a value that is not in the index. Boolean filtering achieves that.

For example, suppose we need to select the records in the iris dataset whose petal length is greater than 5 and sepal length is greater than 6.

NOTE: Please use brackets around each condition, otherwise you will encounter an error in the expression.
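A sketch of that filter on illustrative iris-style data:

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 6.3, 7.0, 6.5],
    "petal_length": [1.4, 4.9, 5.6, 5.2],
})

# Each condition needs its own brackets because & binds tighter than >
selected = df[(df["petal_length"] > 5) & (df["sepal_length"] > 6)]
print(selected)
```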

Congrats! You have finished the second chapter. Now, you can read the final chapter.

If you have any queries or suggestions, please leave a comment below. I will help you as soon as possible.

Making data easier to read, preprocessing it, and removing noisy data are a data scientist's day-to-day tasks.

Pandas is an open-source library used by machine learning practitioners for data analysis and manipulation.

If you are starting your machine learning journey, you will come across the buzzword "Pandas". So I will explain the complete pandas library from beginning to end.

I have divided the post into three chapters: Chapter 1, Chapter 2, and the Final Chapter.

**Contents of Chapter 1:**

- Why we need Pandas Library.
- Introduction to Data frames and Series
- Different ways to Import a Dataset in Pandas

1. **Why we need Pandas Library**:

The initial steps of machine learning are gathering the data and then preparing it. Pandas makes that data analysis and manipulation easier. Internally, the pandas library is built on top of NumPy, and it integrates with Matplotlib for plotting.

When we import data from different sources, we may need to join it together in a single place, do some statistical analysis, and deal with missing or noisy data. Pandas can do all of this for you; the library is pretty helpful.

**Importing the library:**

`import pandas as pd `

**2. Introduction to Data frame and Series.**

Before we dive into pandas practically, let's understand its data structures: the dataframe and the series.

**Series:**

A series is a one-dimensional array holding data of any one type, i.e. int, string, float, Python objects, etc.

*Syntax of Series:*

`series = pd.Series(data= YOUR_DATA , index= INDEX)`

The index plays an important role, as it provides the axis labels for the data. The length of the data should equal the length of the index. **Note:** It's okay if you don't specify an index; in that case pandas creates an automatic index for you with values [0, 1, 2, ..., N-1], where N is the length of the data.

**Tips:**

You can specify the series data and index individually using lists, or you can pass a Python dict of key-value pairs: the keys become the index and the values become the data points.

**Examples:**

**Way 1** : **(Series Created with Index and Data)**

**Way 2** : **(Series Created with dict having key as index and values as data points)**

**Way 3: (Series without index)**
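The three ways above can be sketched with illustrative data:

```python
import pandas as pd

# Way 1: index and data passed separately
s1 = pd.Series(data=[10, 20, 30], index=["a", "b", "c"])

# Way 2: a dict - keys become the index, values the data points
s2 = pd.Series({"a": 10, "b": 20, "c": 30})

# Way 3: no index - pandas assigns 0, 1, 2, ... automatically
s3 = pd.Series([10, 20, 30])

print(s1, s2, s3, sep="\n")
```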

**Note:** Try adding two series; you will see that elements with matching indexes get added together.

**Example** that you can try out:

Have a question: what if we add two series whose indexes differ? Let's try it out.

Since all the indexes are different, the result will contain only null values.
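Both cases can be tried with a short sketch (values are illustrative):

```python
import pandas as pd

a = pd.Series({"x": 1, "y": 2})
b = pd.Series({"x": 10, "y": 20})
print(a + b)  # matching indexes: the values are added

c = pd.Series({"p": 1, "q": 2})
print(a + c)  # no index in common: every result is NaN
```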

**Data frames:**

A dataframe is a 2-dimensional labelled data structure with columns of potentially different data types. You can think of it as a spreadsheet with columns and rows, where each column can hold a different data type. We can also say that a dataframe is a collection of series.

*Syntax:*

`pd.DataFrame(data= DATA_GOES_HERE)`

**Example:**

As you can clearly see, we passed a collection of series to the dataframe, specifying the column names.
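That construction might look like the following sketch (the iris-style column names are assumptions):

```python
import pandas as pd

# A collection of series, keyed by the column names
data = {
    "sepal_length": pd.Series([5.1, 4.9, 6.3]),
    "petal_length": pd.Series([1.4, 1.5, 4.9]),
}
df = pd.DataFrame(data=data)
print(df)
```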

Now you are done with the basic data structures of Pandas.

Before we head towards importing datasets into pandas: dataframes have a head function, `df.head()`, which returns the top 5 rows of the dataframe. You can change this "5" to another number, say 10, with `df.head(10)`.

**3. Different ways to Import a Dataset in Pandas**

Now that we have covered the basics: in the real world we have to read data in various formats, so we will learn how to import data in those formats into a pandas dataframe.

Just a recap: a series holds 1-dimensional data, and a dataframe holds 2-dimensional labelled data with columns.

**Importing a CSV File.**

`df = pd.read_csv("https://URLGoesHere")`

**Importing an Excel File.**

`df = pd.read_excel("https://remote_url")`

Similarly we can import the data of various formats. Different functions are available in pandas such as:

*read_clipboard, read_feather, read_html, read_json, read_sas, read_sql, read_table etc.*

**Example:**

Congrats! You have covered the first chapter of the pandas series.

If you have any comments or suggestions, please drop a comment below.

I often see many questions from people (mostly self-learners) newly entering this growing field: "How can we start to learn machine learning?" and "What curriculum should every self-learner follow?"

I have already created a separate post on how to learn machine learning; this post answers the second question, i.e. what is a machine learning curriculum for self-learners?

There are plenty of curriculums on the internet already, but I consulted various experienced data scientists in my network. After summing up all of the discussion, I am creating this curriculum for you. You can follow it in order to master machine learning.

**Note:** This covers only machine learning, not deep learning.

I have divided the curriculum into three levels.

- Level 1 (Complete Novice).
- Level 2 (Knows Mathematics but not Data Science basics).
- Level 3 (Knows Mathematics, Data Science basics but not ML).

You can iterate between the levels according to your level of expertise.

I am specifying only a high-level curriculum for Levels 1 and 2, as they are the prerequisites for learning machine learning.

The first level involves learning mathematics.

**Learning Mathematics:**

- Linear Algebra
- Calculus
- Probability and Statistics

If you are wondering where to learn. Please follow this [Link]

The second level involves learning Data Science.

For additional resources , Please follow this [Link]

Machine Learning Curriculum

- Importing the Dataset + Practice
- Exploratory Data Analysis + Practice
- Data Preprocessing + Practice
- Handling Missing Data + Practice
- Feature Scaling and Selection + Practice
- Scikit Learn Library
- Bias Variance Tradeoff
- Introduction to Supervised Learning + Practice
- Linear Regression with one variable + Practice
- Linear Regression with multiple variables and Regularization + Practice
- SVMs + Practice
- Logistic Regression + Practice
- Naive Bayes + Practice
- Decision Tree + Practice
- Introduction to Ensembles + Practice
- Random Forests + Practice
- K-Nearest Neighbour + Practice
- PCA + Practice
- Introduction Unsupervised Learning
- Clustering – DBScan, KMeans + Practice
- Cross-Validation and Grid Search CV + Practice
- Stochastic Gradient Descent for Classification and Regression + Practice
- Time Series Analysis + Practice
- Bagging and Boosting Techniques + Practice
- XGBoost, CatBoost, LightGBM + Practice
- Kaggle Ex-Competition Practice

References: [SciKit Learn Documentation]

I will post a series of intuitions and Jupyter notebooks in upcoming posts using the prefix "ML-Series"; you can also find them in the category "Machine Learning Series". Stay tuned for more updates.

If you have any queries or suggestions, please feel free to drop a comment below.


"Machine learning is one of the growing fields, where a computer learns to perform a specific task without human intervention."

Generally, learning machine learning is not very difficult if you know the prerequisites. So let's start with the prerequisites required for learning machine learning.

**Pre-Requisites:**

1. The first and foremost prerequisite is "**motivation**". If you have enough motivation within you to learn machine learning, congrats! You fulfil the first prerequisite criterion.

**Note:** Most websites start directly with the other prerequisites, not realising that this is the most essential criterion.

2. **Mathematics:**

Mathematics is the backbone of machine learning. You need to be comfortable with numbers to excel at machine learning. Whether you are a computer programmer or a student, you will need to brush up on your mathematics.

**Topics of Mathematics:**

- Linear Algebra
- Calculus
- Probability & Statistics.

Have a question: where to learn mathematics?

There are different types of learners: some prefer to learn from books, some from MOOCs, etc. So I am listing some books and MOOCs for mathematics.

**Best MOOCs for Mathematics:**

Linear Algebra – Khan Academy (Course)

3Blue1Brown – Essence of Calculus (Course).

Statistics and Probability – Khan Academy (Course)

**Best Books for Mathematics:**

Introduction to Linear Algebra by Gilbert Strang (Book)

Naked Statistics by Charles Wheelan (Book)

Calculus by Michael Spivak (Book)

3. Now pick a programming language of your choice. If you are a complete novice, choose Python; it's really easy and powerful. If you already know a programming language, you are good for this step.

After getting a grip on a programming language, start with data science in Python: NumPy, Seaborn, Matplotlib, Pandas, scikit-learn, etc.

**Best MOOCS:**

Data Science With Python – University of Michigan (Coursera)

Introduction to Data Analysis (Udacity)

**Best Books:**

Python Data Science Handbook: Essential Tools for Working with Data

**Machine Learning**

Now you are done with all the prerequisites. Let’s start with Machine Learning.

There are two aspects of machine learning: **theory** and **practice**. If you know the intuition or theory behind an algorithm, the practical implementation in Python (scikit-learn) is very simple.

The best course on the internet for developing intuition behind ML is Andrew Ng's Stanford YouTube playlist, not the Coursera one. Don't worry too much about the mathematics; just try to learn the intuition behind each algorithm.

For practical work, Machine Learning A-Z is best.

Best MOOC for theory: Andrew Ng's Stanford lectures [Link]

Best MOOC for practice: Machine Learning A-Z [Link]

**Bonus Tips**

- Don't focus too much on the syntax of the code; just go with the flow. With time, you will become an expert.
- Start contributing on Kaggle as soon as you cover the data science basics.
- Don't lose motivation; you can't become a master of machine learning overnight :). No course that claims to teach machine learning in 24 hours will help you.
- Talk to data science people and make connections.

I connected with many data scientists and, by majority vote, created a curriculum for the machine learning journey. Please follow this post: "Ensemble Machine Learning Curriculum [Created by ML Experts] for Self-Learners".

If you still have queries, please feel free to drop a comment below.
