# Learn Data Science Tutorial – Full Course for Beginners

Learn Data Science in this full tutorial course for absolute beginners. Data science is considered the “sexiest job of the 21st century.” You’ll learn the important elements of data science. You’ll be introduced to the principles, practices, and tools that make data science the powerful medium for critical insight in business and research. You’ll have a solid foundation for future learning and applications in your work. With data science, you can do what you want to do, and do it better. This course covers the foundations of data science, data sourcing, coding, mathematics, and statistics.

💻 Course created by Barton Poulson from datalab.cc.

🔗 Check out the datalab.cc YouTube channel: https://www.youtube.com/user/datalabcc

🔗 Watch more free data science courses at http://datalab.cc/

⭐️ Course Contents ⭐️

⌨️ Part 1: Data Science: An Introduction: Foundations of Data Science

– Welcome (1.1)

– Demand for Data Science (2.1)

– The Data Science Venn Diagram (2.2)

– The Data Science Pathway (2.3)

– Roles in Data Science (2.4)

– Teams in Data Science (2.5)

– Big Data (3.1)

– Coding (3.2)

– Statistics (3.3)

– Business Intelligence (3.4)

– Do No Harm (4.1)

– Methods Overview (5.1)

– Sourcing Overview (5.2)

– Coding Overview (5.3)

– Math Overview (5.4)

– Statistics Overview (5.5)

– Machine Learning Overview (5.6)

– Interpretability (6.1)

– Actionable Insights (6.2)

– Presentation Graphics (6.3)

– Reproducible Research (6.4)

– Next Steps (7.1)

⌨️ Part 2: Data Sourcing: Foundations of Data Science (1:39:46)

– Welcome (1.1)

– Metrics (2.1)

– Accuracy (2.2)

– Social Context of Measurement (2.3)

– Existing Data (3.1)

– APIs (3.2)

– Scraping (3.3)

– New Data (4.1)

– Interviews (4.2)

– Surveys (4.3)

– Card Sorting (4.4)

– Lab Experiments (4.5)

– A/B Testing (4.6)

– Next Steps (5.1)

⌨️ Part 3: Coding (2:32:42)

– Welcome (1.1)

– Spreadsheets (2.1)

– Tableau Public (2.2)

– SPSS (2.3)

– JASP (2.4)

– Other Software (2.5)

– HTML (3.1)

– XML (3.2)

– JSON (3.3)

– R (4.1)

– Python (4.2)

– SQL (4.3)

– C, C++, & Java (4.4)

– Bash (4.5)

– Regex (5.1)

– Next Steps (6.1)

⌨️ Part 4: Mathematics (4:01:09)

– Welcome (1.1)

– Elementary Algebra (2.1)

– Linear Algebra (2.2)

– Systems of Linear Equations (2.3)

– Calculus (2.4)

– Calculus & Optimization (2.5)

– Big O (3.1)

– Probability (3.2)

⌨️ Part 5: Statistics (4:44:03)

– Welcome (1.1)

– Exploration Overview (2.1)

– Exploratory Graphics (2.2)

– Exploratory Statistics (2.3)

– Descriptive Statistics (2.4)

– Inferential Statistics (3.1)

– Hypothesis Testing (3.2)

– Estimation (3.3)

– Estimators (4.1)

– Measures of Fit (4.2)

– Feature Selection (4.3)

– Problems in Modeling (4.4)

– Model Validation (4.5)

– DIY (4.6)

– Next Step (5.1)

Thank you ❤. I'm not a techy person. However, the way you handle the material is so smooth, so compelling that my fears have disappeared and I'm ready to start learning.

Where do we use math in this? Why do people say you should know math to learn data science?

Finished the full 6-hour course. Very informative. I was thinking of learning R for my research analysis. Before that, I thought I needed some foundation in data science, and I saw this video. Thanks a lot ❤

Amazing course! I'm having a turn in my career after 10 years of school teaching, and this course is just what I needed to complement my studies in DS. Just an erratum: at 2:13:20 you mentioned "data de novo" as your personal expression for the concept of "new data". I don't know if you meant for this expression to come from Portuguese or Latin, but if you took it from Portuguese, the appropriate expression would be "Nova Data" or "Novo Dado". In Portuguese, "de novo" means "again", or to do something one more time.

## @kaykwanu

🎯 Key Takeaways for quick navigation:

00:02 Data Science Creativity

02:48 Data Inclusivity Insight

03:42 Data Science Demand

08:07 Data Science Ingredients

11:49 Data Science Pathway

19:34 Data Science Roles

Diverse Data Science

Teamwork Makes Unicorns

Data Science vs. BI

Privacy, Anonymity, Proprietary

Copyright, Data Security

Potential Bias, Overconfidence

01:05:08 Statistical models utility.

01:06:01 Machine learning overview.

01:09:08 Clear communication crucial.

01:14:10 Simplify presentation graphics.

01:19:33 Actionable insights importance.

Clear, simple charts

Storytelling with data

Reproducible research

01:46:31 Metrics & Methods Balance

01:48:24 Accuracy Metrics Overview

01:51:00 Social Context Awareness

01:54:14 Data Sourcing Methods

02:01:23 Utilizing APIs in Data Retrieval

APIs simplify web data

Scraping retrieves web data

Mind copyright laws

Experimental Research Benefits: Random assignment minimizes confounds.

Challenges of Experimentation: Training, time-consuming, expensive.

A/B Testing Overview: Compare webpage versions for optimization.

A/B Testing Tools: Optimizely, VWO for statistical analysis.

Data Sourcing: Explore, consider vendors, create new data.

Importance of Spreadsheets: Ubiquitous, versatile, essential for data manipulation.

Tidy Data Concept: Structured format crucial for analysis.

Tableau for Visualization: Powerful, insightful, available in free version.

Download Tableau, Install

Bring in Data

Create Graphs

03:08:10 Collaborative OSF Analysis

03:09:09 Diverse Software Choices

03:18:43 Web Data Basics

Structure Data with JSON

R: Language of Data

Python: General Purpose

SQL: Language of Databases

C/C++/Java: Fast, Reliable

Bash: Command Line

Command line interaction predates monitors.

Shells wrap around computer interaction.

Bash and PowerShell are common shells.

Bash utilities focus on simplicity.

Regular expressions are powerful search tools.

Mathematics is vital for data science.

Algebra is foundational in data science.

Linear algebra is key for manipulating data.

04:10:26 Matrix representation explained.

04:12:14 Linear algebra benefits.

04:17:34 Graphical system solutions.

04:21:10 Derivative calculation.

04:28:14 Maximizing revenue.

04:29:59 Optimize Price Revenue

04:31:41 Big O Growth

04:44:03 Arithmetic Probability

04:49:04 Test result probability: 81.6%

04:49:57 Positive test: 32.1%

04:57:37 Explore data thoroughly

05:07:48 Robust statistics stability

05:09:10 Resampling principle explanation

05:10:06 Transforming variables concept

05:26:55 Hypothesis Testing Basics

05:28:17 False Positive Concept

05:29:13 False Negative Concept

05:31:06 Critiques of Hypothesis Testing

05:31:55 Hypothesis Testing Value

05:32:49 Estimation Introduction

05:33:42 Confidence Intervals Overview

05:36:03 Accuracy vs Precision

05:37:21 Interpreting Confidence Intervals

05:40:52 Estimators Overview

05:46:08 Measures of Fit Explanation

05:47:01 R²: Measure variance.

05:47:30 -2 Log-likelihood: Nested model fit.

05:47:55 Model variations: AIC, BIC.

05:48:24 Chi-squared: Observed vs. expected.

05:48:53 Feature selection: Reduce overfitting.

05:49:19 Multicollinearity: Predictor overlap.

05:50:12 P values: Individual predictor significance.

05:50:40 Betas: Standardized coefficients.

05:51:10 Newer methods: Dominance, Commonality, Relative Importance.

05:51:40 Common modeling problems: Non-Normality, Non-Linearity, Multicollinearity, Missing Data.

05:52:09 Dimensionality: Reducing variables.

05:52:38 Model validation: Bayes, Replication, Holdout, Cross-Validation.

05:53:07 DIY attitude: Start now.

05:53:36 Beware critics: Mistakes happen.

05:53:56 Data value: All data matters.

05:54:05 Continuous improvement mindset.

05:54:42 Explore and analyze.

05:55:01 Domain expertise matters.

05:55:20 Start now.

05:54:05 Additional conceptual courses.

05:54:05 Practical hands-on tutorials.

05:54:05 "Write what you know".

05:54:05 Domain expertise importance.

05:54:05 You don't have to be perfect.

05:54:05 Just get started.

Made with HARPA AI
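
As a rough illustration of the "Positive test: 32.1%" takeaway above, here is a minimal Python sketch of Bayes' theorem applied to a diagnostic test. The prevalence, sensitivity, and false-positive rate below are assumed values, picked only because they reproduce the 32.1% figure; the inputs actually used in the video are not stated in this list, and the function name is hypothetical.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Probability of having the disease AND testing positive
    true_positives = prevalence * sensitivity
    # Probability of NOT having the disease AND still testing positive
    false_positives = (1 - prevalence) * false_positive_rate
    # Share of all positive results that are true positives
    return true_positives / (true_positives + false_positives)


# Assumed inputs (illustrative only, not taken from the video):
# 5% of people have the disease, the test catches 90% of real cases,
# and 10% of healthy people get a false positive.
p = posterior_given_positive(
    prevalence=0.05, sensitivity=0.90, false_positive_rate=0.10
)
print(f"P(disease | positive test) = {p:.1%}")
```

Under these assumptions, most positive results come from the much larger healthy group, which is why a positive test leaves the probability of disease well below the test's 90% sensitivity.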

🎯 Key Takeaways for quick navigation:

02:48 🌐 Data Science is inclusive analysis, involving all data to provide the most insightful answers to research questions.

14:33 🌐 Data Science involves diverse skills and backgrounds, encompassing coding, statistics, math, and domain expertise, making it a compelling career alternative.

23:09 🦄 Data Science Diversity: Data science is diverse, involving people with different goals, skills, and experiences working in various contexts, making it a rich and interconnected field.

24:34 🤖 Unicorn Analogy: The term "unicorn" is used in data science to describe a mythical data scientist with universal abilities in coding, statistics, design, business, and management. However, in reality, such individuals are rare, and collaboration among specialists is more common.

35:25 📊 Data Science vs. Statistics: Data science and statistics share common procedures, but they differ in training backgrounds, goals, and contexts, highlighting their conceptual distinctions despite overlapping elements.

47:08 🧠 Data science analyses are limited simplifications; humans are essential for interpretation and application. Overconfidence in algorithmic results can lead to incorrect conclusions.

48:25 🌐 Data science projects can't be neutral; algorithms reflect the biases of their creators. Good judgment is vital for the quality and success of a data science project.

58:25 🧮 Math is foundational in data science; understanding procedures, addressing issues, and some manual calculations are essential for informed decisions.

01:10:01 📈 When conducting data analysis, focus on maximizing the story to maximize value. Clearly align the narrative with specific goals, especially when answering client queries.

01:13:47 📊 In presenting data analysis, adhere to the principle of being minimally sufficient. Embrace simplicity, as emphasized by quotes like "Everything should be made as simple as possible, but not simpler."

01:19:33 🎯 Data science is goal-focused. When communicating results, provide specific, justifiable next steps based on the analysis. Consider the social, political, and economic context for actionable insights.

[01:34:28 URL](https://youtu.be/ua-CiDNNj30?t=5668s) 📘 Use narrative methods like Jupyter Notebooks or RMarkdown to document and share the data analysis process, allowing for transparency and understanding of conclusions.

[01:36:46 URL](https://youtu.be/ua-CiDNNj30?t=5806s) 🚀 Next steps after the tutorial: Explore coding in R or Python, dive into data visualization, brush up on statistics and math, explore machine learning, and consider community involvement in data science conferences and projects.

[01:39:27 URL](https://youtu.be/ua-CiDNNj30?t=5967s) 🌐 Data science is democratic and essential for everyone. Encourages learning to work with data intelligently and sensitively, emphasizing its fundamental importance for all.

02:12:52 🕵️♀️ Data scraping from webpages, PDFs, images, and media when no API is available. Code scraping examples in R and Python. Emphasizes respect for copyright and privacy to avoid legal issues.

[02:16:06 URL](https://youtu.be/ua-CiDNNj30?t=9766s) 🎙️ Interviews are valuable for new situations or audiences, offering open-ended information. Structured interviews have predetermined questions, while unstructured interviews are more conversational and varied.

[02:18:26 URL](https://youtu.be/ua-CiDNNj30?t=11006s) 📊 Surveys are effective for obtaining data quickly, but clarity in question wording and response scales is crucial. Beware of bias, and ensure the questions align with the audience's understanding.

[02:25:42 URL](https://youtu.be/ua-CiDNNj30?t=15442s) ⚗️ Laboratory experiments are crucial for determining cause and effect, requiring specialized training. They offer reliable information but can be time-consuming and expensive.

02:40:02 🧹 Spreadsheets excel in tasks like data browsing, sorting, rearranging, finding/replacing, formatting, transposing, tracking changes, creating pivot tables, and arranging output for consumption.

02:41:51 🔍 When working with spreadsheets, maintaining "Tidy Data" is crucial for easy transfer between programs. Tidy Data involves having equivalent columns for variables and rows for cases, ensuring one sheet per file, and maintaining a consistent level of measurement per file.

02:54:39 📊 SPSS (Statistical Package for the Social Sciences) is a desktop program used in academic and business research, known for its point-and-click interface and drop-down menus. The program is available for free for students, with paid versions for others.

02:59:14 📊 SPSS provides various options for data analysis, including descriptive statistics and visualization tools like stem-and-leaf plots and box plots.

03:02:22 🆕 JASP, a free and open-source alternative to SPSS, offers intuitive features, replicability, and includes Bayesian approaches.

03:09:09 💻 A wide range of data science tools, including SAS, Stata, MATLAB, Wolfram Alpha, RapidMiner, KNIME, SOFA Statistics, and more, are discussed with considerations for functionality, ease of use, community support, and cost.

03:34:45 🐍 Python, a general-purpose language, is popular in data science. It has a vast community, multiple versions (2.x and 3.x), and interfaces like Jupyter. Python's strength lies in numerous packages, including NumPy, Pandas, and scikit-learn.

03:40:12 💽 SQL, the language of databases, is crucial in data science. It excels in relational databases (RDBMS), with popular choices like Oracle, SQL Server, MySQL, and PostgreSQL. SQL minimizes data redundancy and is often used via GUIs like SQL Developer.

03:43:44 📊 SQL commands: Learn essential SQL commands, including SELECT, FROM, WHERE, and ORDER BY, for efficient data extraction and organization from relational databases.

03:44:41 ⚙️ Data Science Languages: C, C++, and Java serve as foundational languages in data science, particularly for the back end, offering speed and reliability.

04:06:51 🧮 Algebra is crucial to data science, enabling the combination of scores and various manipulations. Linear algebra, also known as matrix algebra, is the next step, representing data with vectors and matrices, making computations more efficient.

04:29:59 📈 Lowering the cost by 20%, from $500 to $400 per year, can increase sales by 33%, leading to a 7% increase in total revenue.

04:35:51 📊 Understanding Big O helps optimize algorithms, considering time and space complexity, crucial for efficient data processing.

04:41:19 🎲 Probability calculations involve adding or multiplying probabilities, considering overlaps and conditional probabilities.

04:49:31 🔄 Positive test result doesn't guarantee disease; Bayes theorem crucial for accurate probability calculations.

04:51:23 🎯 Focus on goals in data science; understand procedures, diagnose problems, and prioritize meaning.

05:09:40 📊 Tukey's ladder of powers helps transform skewed data; exploring numerical distributions aids in understanding data stability, outliers, and skewness.

05:11:04 📈 Descriptive statistics involve center, spread, and shape. Common measures include mode, median, mean for center; range, interquartile range, variance, and standard deviation for spread.

05:19:02 🔄 Measures of spread (range, interquartile range, variance, and standard deviation) have pros and cons; variance and standard deviation are less intuitive but more useful in data science.

05:20:47 📊 Understanding the shape of the distribution (symmetrical, skewed, unimodal, bimodal, uniform, u-shaped) is crucial for interpreting numerical summaries like mean and standard deviation.

05:26:01 📊 Inferential statistics involve sampling data from populations, adjusting for sampling error; common approaches include hypothesis testing (null and alternative hypotheses) and estimation.

05:51:10 🔄 Multicollinearity, the association between predictors, poses challenges in regression analysis. Methods like stepwise regression, commonality analysis, dominance analysis, and relative importance weights help address multicollinearity issues.

05:51:40 📊 Non-Normality, Non-Linearity, Multicollinearity, and Missing Data are common problems in modeling. Skewed distributions, outliers, and mixed distributions impact the assumptions of statistical procedures. Strategies include data transformation, polynomial terms, and reducing variables.

05:52:38 📉 Model Validation is crucial for assessing the generalizability of statistical models. Approaches like Bayesian methods, replication, holdout validation, and cross-validation help evaluate model performance on different data sets.

05:53:36 🛠️ Adopt a DIY (Do It Yourself) attitude in data science. Emphasize the importance of getting started, align methods with goals, focus on usability, and beware of trolls and critics. Acknowledge that no analysis is perfect, but the goal is to add value to the understanding of the data.

Made with HARPA AI
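
The pricing takeaway at 04:29:59 above can be checked with a few lines of arithmetic. The Python sketch below is an illustration under stated assumptions: the baseline of 300 subscriptions is a hypothetical number (any baseline gives the same percentage), and the "33%" sales increase is treated as exactly one third.

```python
# Prices quoted in the takeaway: $500 per year lowered to $400 per year
old_price = 500.0
new_price = 400.0

# Hypothetical baseline sales; assumed, not taken from the video
old_units = 300.0
# "33%" increase in sales, treated here as exactly one third (assumption)
sales_increase = 1 / 3
new_units = old_units * (1 + sales_increase)

# Revenue is simply price times units sold
old_revenue = old_price * old_units
new_revenue = new_price * new_units

price_change = new_price / old_price - 1
revenue_change = new_revenue / old_revenue - 1

print(f"Price change:   {price_change:+.0%}")
print(f"Revenue change: {revenue_change:+.1%}")
```

Because revenue is price times quantity, the two changes multiply: 0.8 × 4/3 ≈ 1.067, which is the roughly 7% gain mentioned in the takeaway.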

