# Data Analysis with Python – Full Course for Beginners (Numpy, Pandas, Matplotlib, Seaborn)

August 6, 2024

Learn Data Analysis with Python in this comprehensive tutorial for beginners, with exercises included!

NOTE: Check description for updated Notebook links.

Data Analysis has been around for a long time, but up until a few years ago, it was practiced using closed, expensive and limited tools like Excel or Tableau. Python, SQL and other open libraries have changed Data Analysis forever.

In this tutorial you’ll learn the whole process of Data Analysis: reading data from multiple sources (CSVs, SQL, Excel, etc.), processing it with NumPy and Pandas, visualizing it with Matplotlib and Seaborn, and cleaning it to create reports.

Additionally, we’ve included a thorough Jupyter Notebook tutorial, and a quick Python reference to refresh your programming skills.

💻 Course created by Santiago Basulto from DataWars

🔗 Check out all Data Science courses from DataWars: https://datawars.io/ref=fcc

⚠️ Note: Instead of loading the notebooks on notebooks.ai, use Google Colab. Here are instructions on loading a notebook directly from GitHub into Google Colab: https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb#scrollTo=K-NVg7RjyeTk

⭐️ Course Contents ⭐️

⌨️ Part 1: Introduction

What is Data Analysis? Why Python? What other options are there? What’s the cycle of a Data Analysis project? What’s the difference between Data Analysis and Data Science?

🔗 Slides for this section: https://docs.google.com/presentation/d/1XXhVx2a7z2GrG5qddIyLFk4T_5s5mmdqSptDGBD9hWk/edit?usp=sharing

⌨️ Part 2: Real Life Example of a Python/Pandas Data Analysis project (00:11:11)

A demonstration of a real-life data analysis project using Python, Pandas, SQL and Seaborn. Don’t worry, we’ll dig deeper in the following sections.

🔗 Notebooks: https://github.com/rmotr-curriculum/FreeCodeCamp-Pandas-Real-Life-Example

⌨️ Part 3: Jupyter Notebooks Tutorial (00:30:50)

A step-by-step tutorial to learn how to use Jupyter Notebooks.

🔗 Twitter Cheat Sheet: https://twitter.com/rmotr_com/status/1122176794696847361

🔗 Notebooks: https://github.com/rmotr-curriculum/ds-content-interactive-jupyterlab-tutorial

⌨️ Part 4: Intro to NumPy (01:04:58)

Learn why NumPy was such an important library for the data-processing world in Python. Learn about low level details of computations and memory storage, and why tools like Excel will always be limited when processing large volumes of data.

🔗 Notebooks: https://github.com/rmotr-curriculum/freecodecamp-intro-to-numpy
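
To make the memory-storage point concrete, here is a minimal sketch (not taken from the course notebooks) comparing the size of a plain Python integer object with NumPy elements of a chosen width. The sample values are arbitrary, and the exact size of a Python `int` depends on the interpreter.

```python
import sys

import numpy as np

# A plain Python int is a full object, so it carries per-object overhead.
python_int_bytes = sys.getsizeof(5)

# NumPy lets us choose the width of each element explicitly.
small = np.array([1, 2, 3, 4], dtype=np.int8)
large = np.array([1, 2, 3, 4], dtype=np.int64)

print("Python int object:", python_int_bytes, "bytes")
print("int8 element:", small.itemsize, "byte, array data:", small.nbytes, "bytes")
print("int64 element:", large.itemsize, "bytes, array data:", large.nbytes, "bytes")
```

Picking the smallest dtype that fits the data is one reason NumPy arrays stay compact at large volumes.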

⌨️ Part 5: Intro to Pandas (01:57:08)

Pandas is arguably the most important library for Data Processing in the Python world. Learn how it works and how its main data structure, the DataFrame, compares to other tools like spreadsheets or the DataFrames used for Big Data.

🔗 Notebooks: https://github.com/rmotr-curriculum/freecodecamp-intro-to-pandas

⌨️ Part 6: Data Cleaning (02:47:18)

Learn the different types of issues that we’ll face with our data: null values, invalid values, statistical outliers, etc, and how to clean them.

🔗 Notebooks: https://github.com/rmotr-curriculum/data-cleaning-rmotr-freecodecamp
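
As a rough illustration of the issues listed above, the sketch below counts null values, flags an invalid value, and fills the gaps. The sample data and the cutoff of 120 for a plausible age are assumptions made for this example, not values from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing age and one impossible age.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dan"],
    "age": [29.0, np.nan, 290.0, 41.0],
})

# 1. Null values: count them per column.
null_counts = df.isna().sum()

# 2. Invalid values: treat ages above the assumed cutoff as missing.
df.loc[df["age"] > 120, "age"] = np.nan

# 3. Fill the remaining gaps with the median of the valid ages.
cleaned = df.fillna({"age": df["age"].median()})

print(null_counts)
print(cleaned)
```

Whether to drop, fill, or flag such rows depends on the dataset; the median fill here is only one option.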

⌨️ Part 7: Reading Data from other sources (03:25:15)

🔗 Notebooks: https://github.com/rmotr-curriculum/RDP-Reading-Data-with-Python-and-Pandas

⌨️ Part 8: Python Recap (03:55:19)

If your Python or coding skills are rusty, check out this section for a quick recap of Python’s main features and control flow structures.

🔗 Notebooks: https://github.com/rmotr-curriculum/ds-content-python-under-10-minutes

—

Learn to code for free and get a developer job: https://www.freecodecamp.org

Read hundreds of articles on programming: https://freecodecamp.org/news

## Comments (37)

## @freecodecamp

⚠️ Note: Instead of loading the notebooks on notebooks.ai, use Google Colab. Here are instructions on loading a notebook directly from GitHub into Google Colab: https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb#scrollTo=K-NVg7RjyeTk

The code links in the description have been updated to the content stored on GitHub.

## @fifadroids3382

from where I can get csv files and resources that you are using in your tutorials ?

## @Goti2005

is this relevant in 2024?

## @adityavulli3650

38:00

## @oyelaranpaul

I love this session, please i want to ask if I can get link to the sales dataset used. thanks

## @prithaguha2918

I am getting an error in Jupyter saying head is not recognized, please help

## @lethukuthulastayieh9249

What's the name of the IDE used?

## @trivenim6877

The notebook for Python recap is not available in the link provided. Can you please provide it?

## @84poudyal24

2:47:18

## @84poudyal24

1:57:08

## @84poudyal24

1:34:00

## @charugoyal4782

where can i find the data you are using that is sales data ?

## @mohsenkalani

I never could imagine to find such an invaluable complete course for FREE in YouTube. I can not find words to appreciate.

## @YouTubeShortsyt

1:29:12 if u want to skip the low-level numpy

## @shaderone07

18:15 – I tried to run it but it threw an error saying non-numeric columns are also being selected.

this helped :

```python
import pandas as pd

def drop_non_numeric_columns(df):
    """Drops all non-numeric columns from a pandas DataFrame.

    Args:
        df (pd.DataFrame): The DataFrame to process.

    Returns:
        pd.DataFrame: A new DataFrame containing only numeric columns.
    """
    # Get all numeric data types
    numeric_types = ['int64', 'float64']

    # Select columns with numeric data types
    numeric_columns = df.select_dtypes(include=numeric_types).columns

    # Drop all other columns (effectively non-numeric)
    df_numeric = df[numeric_columns]

    return df_numeric

# Example usage
sales_numeric = drop_non_numeric_columns(sales.copy())  # Make a copy to avoid modifying original
sales_numeric

corr = sales_numeric.corr()
corr
```

I just started so i don't know whether is the 'right' way..but it works.
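
A likely cause of this error is that, from pandas 2.0 on, `DataFrame.corr()` no longer drops non-numeric columns by default. A shorter alternative, sketched here with a made-up stand-in for the course's `sales` DataFrame (the column values are hypothetical), is to pass `numeric_only=True` or to select every numeric dtype with `include="number"`:

```python
import pandas as pd

# Hypothetical stand-in for the course's `sales` DataFrame.
sales = pd.DataFrame({
    "Country": ["Canada", "France", "Canada", "France"],
    "Unit_Cost": [10, 20, 30, 40],
    "Profit": [5, 9, 16, 20],
})

# Option 1: ask corr() to ignore non-numeric columns (pandas 1.5 or later).
corr = sales.corr(numeric_only=True)

# Option 2: select every numeric dtype first, not only int64 and float64.
corr_alt = sales.select_dtypes(include="number").corr()

print(corr)
```

Both options should give the same correlation matrix; the second also works on older pandas versions.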

## @littlecodely1265

Can you please! provide csv file on which u guys are explaining so that we can practice it

## @sayanbose9287

1. Table of Contents 1:45

2. Introduction 2:52

2.1 What is data analysis 2:52

2.2 Data analysis tools 4:38

2.3 Data analysis process 7:31

2.4 Data Analysis vs Data Science 8:56

2.5 Python and PyData Ecosystem 9:28

2.6 Python data analysis vs Excel 9:46

3. Real example data analysis with Python: getting a sense of what you can learn from this course 11:00

4. How to use Jupyter Notebooks 30:50

5. Intro to NumPy 1:04:58

5.1 Low-level basis: binary numbers, memory footprint 1:09:32

5.2 Python is not memory efficient to store numbers since it wraps everything into objects. Whereas in NumPy, we can select the number of bits to represent numbers 1:22:50

5.3 NumPy can compute arrays faster than Python 1:24:58

5.4 NumPy tutorial: NumPy arrays, matrices 1:29:47

5.5 Memory footprint and performance: Python vs NumPy 1:53:14

6. Intro to Pandas: getting, processing and visualizing data 1:56:58

6.1 Pandas data structure: Series 1:58:41

6.2 We can change the index of Pandas series and this is fundamentally different from NumPy arrays 2:02:55

6.3 The upper limit of slicing in Pandas series is included, whereas, in NumPy, the limit is excluded 2:07:55

6.4 Pandas data structure: DataFrames 2:14:36

6.5 Most operations in Pandas are immutable 2:29:10

6.7 Reading external data 2:36:47

6.8 Pandas plotting 2:44:41

7. Data cleaning 2:47:18

7.1 Handling missing data 2:51:40

7.2 Cleaning invalid values 3:03:17

7.3 Handling duplicated data 3:06:09

7.4 Handling text data 3:11:05

7.5 Data visualization 3:13:41

7.6 Matplotlib global API 3:14:25

7.7 Matplotlib OOP API 3:18:27

8. Working with data from(/to) SQL, CSV, txt, API etc. 3:25:15

8.1 Python methods for working with files 3:26:37

8.2 Python methods for working with CSV files 3:29:33

8.3 Pandas methods for working with CSV files 3:30:05

8.4 Python methods for working with SQL 3:36:17

8.5 Pandas methods for working with SQL 3:38:58

8.6 Pandas methods for working with HTML 3:43:09

8.7 Pandas methods for working with Excel files 3:49:56

9. Python recap 3:55:18
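
Items 6.3 and 6.5 in the outline above can be checked with a short sketch. The values and index labels are arbitrary; note that the inclusive upper bound applies to label-based slicing, while positional slicing with `iloc` excludes it just like NumPy.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30, 40])
ser = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# NumPy positional slice: the upper bound is excluded.
arr_slice = arr[0:2]          # positions 0 and 1

# Pandas label slice: the upper bound is included.
label_slice = ser["a":"c"]    # labels a, b and c

# Pandas positional slice (iloc) behaves like NumPy: upper bound excluded.
pos_slice = ser.iloc[0:2]

# Most operations return a new object and leave the original untouched.
dropped = ser.drop("a")

print(arr_slice, list(label_slice), list(pos_slice), len(ser), len(dropped))
```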

## @i4imranmurtaza1

Really junk of knowledge

## @mariamnawaz9855

22:20

## @leonabbassi5888

Worst Data Analysis tutorial, not for beginners

## @Opoliades

There is a small typo when generating the dynamic plot with Bokeh: the keyword argument is no longer "legend" but "legend_label", and the code snippet cannot run without that change. Timestamp in video: 1:00:15

https://youtu.be/r-uOLxNrNk8?t=3615

## @irfanncp

I am doing this tutorial for the first time. My question is how to connect to 'data/skala.db'. Do I have to download this database first, or can I connect directly? Please guide me.
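
A SQLite database is an ordinary file, so `sqlite3.connect()` needs that file to exist at the given path relative to where the notebook runs; pointing it at a path that does not exist creates a new, empty database instead. The sketch below uses an in-memory database with a hypothetical `rental` table so it runs on its own. With the course material you would pass the path of the downloaded `.db` file instead of `":memory:"`.

```python
import sqlite3

import pandas as pd

# ":memory:" keeps this sketch self-contained; with the real course data,
# pass the path to the downloaded database file here instead.
conn = sqlite3.connect(":memory:")

# Hypothetical table standing in for the real database contents.
conn.execute(
    "CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, film TEXT, rate REAL)"
)
conn.executemany(
    "INSERT INTO rental (film, rate) VALUES (?, ?)",
    [("Film A", 2.99), ("Film B", 4.99)],
)
conn.commit()

# read_sql runs the query and returns the rows as a DataFrame.
rentals = pd.read_sql("SELECT * FROM rental", conn, index_col="rental_id")
conn.close()

print(rentals)
```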

## @thecsslife

I'm following to 17:10 and all the sales['Unit_Cost'].plot() plots are plotting in one single graph. How do I get separate renderings for each plot?
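
One possible explanation is that several `.plot()` calls in the same cell draw on the same Matplotlib axes. A minimal sketch of one way around it, assuming a made-up stand-in for the `sales` DataFrame, is to create a separate figure for each chart and pass its axes through `ax=`:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs as a script; omit in Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the course's `sales` DataFrame.
sales = pd.DataFrame({"Unit_Cost": [10, 20, 20, 30, 45, 45, 60]})

# Create a separate figure and axes for each chart and pass it via `ax=`.
fig1, ax1 = plt.subplots()
sales["Unit_Cost"].plot(kind="box", vert=False, ax=ax1)

fig2, ax2 = plt.subplots()
sales["Unit_Cost"].plot(kind="hist", ax=ax2)
```

Running each `.plot()` call in its own notebook cell should have a similar effect.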

## @ObinnaPaschalN

This guy is just wasting people's time. He was just scrolling the screen up and down.

"SHOW HOW"

Show us Practically HOW TO DO IT

DO IT PRACTICALLY WHY PEOPLE WATCH!!!

Instead of the full course which didn't add any value you would have chosen a single topic from the full course and show us the practical how to do it.

You are just there speaking only. No impact.

## @bbnoni3051

While I was trying to do the exercise in VS Code, I encountered this prompt: "Enter the URL of the running Jupyter server".

## @mortenvonsildskjde7847

Is it possible to land a data analysis / science job through datawars?

## @facundonieto1598

Continue from Conditional Selection (boolean arrays) 2:08:15

## @russfox77910105

I think the crypto api is not available anymore, which is a shame as I was trying to follow along….

## @moneyhustler1487

the course is kinda of messy, he doesn't teach the code, and he just speaks about what is happening after executing. this is not for beginners