Reviewing Udacity's Introduction To Machine Learning Course

By Raj Rajhans, March 26th, 2020

I recently completed Udacity’s free UD120 course, Introduction to Machine Learning. This post is a summary of what the course has to offer.

Here is the link to my GitHub repo which includes all code files for the mini projects done in the course.

1. Introduction to Machine Learning

The course started with an introduction to machine learning and its two main families of algorithms, supervised and unsupervised, and where each might be applicable. The basic differences between the algorithms in each family, and their typical applications, were also discussed.

2. Naive Bayes Classification

The first algorithm covered was the Naive Bayes classifier, with a mini project on classifying emails according to their author.
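To make the idea concrete, here is a minimal plain-Python sketch of a Naive Bayes text classifier. It assumes a multinomial word-count model with add-one (Laplace) smoothing, which is not necessarily the variant the course uses; the author labels and example emails are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count words per label; returns (word_counts, class_counts, vocabulary)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return word_counts, class_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen (label, word) pairs never give log(0).
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training emails and author labels (not course data).
emails = [
    "please review the contract",
    "contract terms attached",
    "lunch on friday",
    "friday party plans",
]
authors = ["author_a", "author_a", "author_b", "author_b"]
model = train_naive_bayes(emails, authors)
```

Calling `predict(model, "contract review")` picks the author whose training emails used those words most often.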

3. Support Vector Machines (SVMs)

In the third lesson, the SVM algorithm was introduced. The email author mini project from the previous lesson was reimplemented using an SVM, and the two algorithms were compared on accuracy, training time, and prediction time.

4. Decision Trees

In the fourth lesson, the decision tree algorithm for classification was introduced. A similar comparison of accuracy, training time, and prediction time was done on the email author classification mini project.

5. Datasets and Questions

This part of the course explained the importance of datasets in machine learning. The Enron Corpus was introduced, along with a discussion of the questions one should ask of a dataset to get a feel for it.

6. Regression

The linear regression algorithm was introduced, along with a mini project on predicting Enron employees’ bonuses from their salaries. Metrics for evaluating a regression, such as SSE (sum of squared errors) and R squared, were discussed.
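As a rough illustration of these two metrics, the sketch below fits a least-squares line in plain Python and computes SSE and R squared by hand. The function names and the numbers are made up for this example; they are not Enron data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def sse(y_true, y_pred):
    """Sum of squared errors between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """R^2 = 1 - SSE / SST, where SST is the squared deviation from the mean."""
    mean_y = sum(y_true) / len(y_true)
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - sse(y_true, y_pred) / sst


# Made-up values for illustration only.
salaries = [1.0, 2.0, 3.0, 4.0]
bonuses = [1.0, 2.0, 3.0, 4.0]
rough_predictions = [1.5, 1.5, 3.5, 3.5]

slope, intercept = fit_line(salaries, bonuses)
error = sse(bonuses, rough_predictions)
score = r_squared(bonuses, rough_predictions)
```

SSE grows with the number of points, so it is hard to compare across datasets of different sizes, while R squared is normalized by the spread of the target values.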

7. Outliers

Outliers and their significance were introduced, along with methods for cleaning them. The mini project involved identifying outliers in the Enron employees’ salary and bonus data.
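One common cleaning strategy is to fit a model, drop the points with the largest residual errors, and refit. The sketch below shows only the dropping step; the function name and the 10% cutoff are assumptions for this example, and the data is made up.

```python
def clean_outliers(predictions, xs, ys, drop_fraction=0.1):
    """Return (x, y, error) tuples with the largest-error points removed."""
    points = [(x, y, abs(y - p)) for p, x, y in zip(predictions, xs, ys)]
    points.sort(key=lambda point: point[2])  # smallest residual first
    keep = len(points) - int(len(points) * drop_fraction)
    return points[:keep]


# Made-up data: the prediction is y = x, and one point is far off the line.
xs = list(range(10))
ys = [0, 1, 2, 3, 100, 5, 6, 7, 8, 9]
predictions = xs

cleaned = clean_outliers(predictions, xs, ys)
```

After cleaning, the regression would be refit on the remaining points only.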

8. Clustering

Unsupervised learning was introduced along with algorithms like k-means clustering. In the mini project for this lesson, clustering was applied to the Enron employees’ salary data.
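For intuition, k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Below is a minimal plain-Python sketch for 2-D points; the function names, the fixed starting centroids, and the data are assumptions for this example (a library implementation would normally choose the starting centroids itself).

```python
def nearest(point, centroids):
    """Index of the centroid closest to a 2-D point (squared Euclidean distance)."""
    distances = [(point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centroids]
    return distances.index(min(distances))


def kmeans(points, initial_centroids, max_iterations=10):
    """Run assignment/update rounds until the centroids stop moving."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                updated.append((sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster)))
            else:
                updated.append(old)  # leave the centroid of an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Made-up points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids = kmeans(points, [(0, 0), (10, 10)])
labels = [nearest(p, centroids) for p in points]
```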

9. Feature Selection

This lesson covered identifying the most important features of your data using human intuition as well as algorithms.

10. Feature Scaling

This lesson covered how to preprocess data with feature scaling to improve your algorithms. The MinMaxScaler in sklearn was introduced.
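Min-max scaling maps each value of a feature to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A plain-Python sketch of that formula follows; the function name and the sample values are made up, and raising an error on a constant feature is a choice made for this example.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # The formula would divide by zero for a constant feature.
        raise ValueError("cannot rescale a constant feature")
    return [(v - lowest) / (highest - lowest) for v in values]


# Made-up feature values for illustration.
scaled = min_max_scale([115.0, 140.0, 175.0])
```

Scaling matters most for algorithms that compare distances across features, since an unscaled feature with a large range would otherwise dominate.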

11. Text Learning

This lesson covered using text data in machine learning. Concepts like bag of words, stemming, and sklearn’s TfidfVectorizer were introduced. In the mini project, text learning techniques were applied to the employee emails in the Enron Corpus.
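A bag-of-words representation turns each document into a vector of word counts over a shared vocabulary, ignoring word order. The sketch below does this in plain Python with a naive lowercase-and-split tokenizer, which is an assumption for this example; the documents are made up.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, vectors), where each vector holds per-word counts."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Made-up documents for illustration.
vocabulary, vectors = bag_of_words(["the meeting is today", "the the deal"])
```

TF-IDF builds on these counts by down-weighting words that appear in many documents.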

12. Principal Components Analysis

PCA was introduced, and its use in feature selection and in unsupervised learning was discussed. In the mini project, PCA was used on images of the past 10 presidents of the USA to classify a new image.

13. Validation

This lesson introduced validating a machine learning algorithm: splitting your dataset into training and testing parts using sklearn, cross-validation, and grid search cross-validation (GridSearchCV in sklearn) for parameter tuning.
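To show what k-fold cross-validation does with the data, here is a plain-Python sketch that splits sample indices into k contiguous folds, each serving once as the test set. The function name is made up for this example, and no shuffling is done; real data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(5, 2))
```

A model would be trained and scored once per fold, and the scores averaged to get a steadier estimate than a single train/test split gives.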

14. Evaluation Metrics

Metrics for evaluating a machine learning algorithm, such as accuracy, precision, recall, and F1 score, were introduced.
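These metrics can be computed by hand from the counts of true positives, false positives, and false negatives. The plain-Python sketch below uses made-up labels; the function names and the convention of returning 0.0 when a denominator is zero are choices made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up labels: 3 positives out of 8 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this example the accuracy looks moderate even though the classifier finds only one of the three positives, which is why precision and recall are worth reporting on imbalanced data.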

Verdict

Udacity’s UD120 is a really good course that gives a high-level introduction to many concepts in machine learning, with a lot of hands-on practice through mini projects in each lesson that use scikit-learn in Python. As a beginner in machine learning, I found it really helpful. The only problem is that many of the sklearn functions used in the course have been deprecated in recent versions of sklearn (for example, the sklearn.cross_validation module has been replaced by sklearn.model_selection), so you will have to research the alternatives yourself.