🔬🖥📚💖💖💖
Data science is an interdisciplinary field that includes knowledge of:
- Programming in languages such as:
  - Python (Recommended)
  - R
  - Scala
  - Julia

Note: For now, most of the resources are for Python, but resources for other languages such as R will come soon. Also, you don't need to learn every option listed under a topic; just pick one technology or skill. The most popular option is usually the one labeled as recommended, but feel free to choose whichever is most convenient for you. 😀🤞

- An understanding of:
  - Data Structures and Algorithms
  - Object-Oriented Programming
  - Functional Programming
- Version Control
  - Git (Recommended)
  - GitLab (Git hosting platform)
  - Bitbucket (Git hosting platform)
- Data Wrangling -> Please see the Data Analyst Roadmap
- Data Visualization -> Please see the Data Analyst Roadmap
- Visual Storytelling -> Please see the Data Analyst Roadmap
- Probability and Statistics -> Please see the Data Analyst Roadmap
- Calculus
  - Derivatives & Integrals
- Linear Algebra
  - Matrices & Eigenvalues
- Relational Database Systems
  - MySQL
  - PostgreSQL
- Non-Relational Databases (a.k.a. NoSQL)
  - MongoDB (Recommended)
  - Cassandra
- Machine Learning Algorithms -> Please see the Machine Learning Roadmap
  - Supervised Learning (Recommended)
  - Unsupervised Learning
    - Clustering
  - Reinforcement Learning
- Deep Learning -> Please see the Deep Learning Roadmap
  - Artificial Neural Networks
  - Convolutional Neural Networks
  - Recurrent Neural Networks
  - Deep Learning Platforms
    - TensorFlow
    - PyTorch (Recommended)
    - Keras
    - CNTK
- Big Data at Scale
  - Spark (PySpark)
    - Spark MLlib (Recommended)
  - Hadoop Ecosystem
    - HBase
    - MapReduce
    - ZooKeeper
    - YARN
    - Pig
    - Hive
    - Sqoop
    - Oozie
- Data Engineering -> Please see the Data Engineering Roadmap
- Software Engineering Best Practices
  - Writing Pythonic code that follows the PEP 8 style guide
  - Testing and debugging
- Machine Learning Deployment Fundamentals (ML/DevOps)
  - Flask --> Backend
  - FastAPI --> Backend
  - Streamlit --> Front-end (Recommended)
  - Heroku (Recommended), AWS, Azure --> Deployment platforms

Because data science spans so many fields, some of its specializations overlap, and some of their subfields can be studied separately. Such is the case for the Data Analyst, Machine Learning Engineer, Deep Learning Engineer and Data Engineer roles. If you are a beginner, a good first step is to follow the Data Analyst roadmap, then proceed to the Data Science, Machine Learning and Deep Learning roadmaps, and finally the Data Engineering roadmap.

Here, you will find resources to learn Data Science that are not included in the other subfield roadmaps. If you do not see what you are looking for, please take a look at the resources in the other roadmaps.

Version Control

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | A Git & GitHub crash course | course link | An introduction to Git and GitHub for beginners. |

Data Structures & Algorithms, Object-Oriented Programming, Functional Programming

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | Data Structures & Algorithms from Udacity | course link | In this course, you will learn data structures and algorithms by solving 80+ practice problems. |
| 2.0 | Data Structures by William Fiset | course link | This course teaches data structures to beginners, using high-quality animations to represent the data structures visually. |
| 3.0 | OOP by MIT OpenCourseWare | course link | Introduction to Computer Science and Programming in Python. |
| 4.0 | Functional Programming from Real Python | course link | A playlist that focuses primarily on `filter`, `map` and `reduce`. |
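
The Real Python playlist centres on `filter`, `map` and `reduce`. As a minimal sketch (the function name and sample data are hypothetical), here is how the three combine into one small pipeline: keep the even numbers, square them, then fold the squares into a total.

```python
from functools import reduce

def sum_of_even_squares(numbers):
    """Square every even number and add the results together."""
    evens = filter(lambda n: n % 2 == 0, numbers)  # keep even numbers only
    squares = map(lambda n: n * n, evens)          # square each remaining number
    # reduce folds the squares into a single total, starting from 0
    return reduce(lambda total, n: total + n, squares, 0)

print(sum_of_even_squares([1, 2, 3, 4, 5, 6]))  # 56, i.e. 4 + 16 + 36
```

`filter` and `map` return lazy iterators, so no intermediate list is built. In everyday Python the same computation is often written as `sum(n * n for n in numbers if n % 2 == 0)`.
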

Calculus

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | Calculus from Khan Academy | course link | A complete course including limits, derivatives & integrals. |
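
A derivative is the limit of a difference quotient, and a definite integral is the limit of a sum of thin slices. The sketch below approximates both numerically in plain Python; the helper names `derivative` and `integral`, the step size `h` and the slice count `n` are illustrative choices, not taken from any of the courses.

```python
import math

def derivative(f, x, h=1e-5):
    """Approximate f'(x) with the central difference (f(x + h) - f(x - h)) / 2h."""
    return (f(x + h) - f(x - h)) / (2 * h)

def integral(f, a, b, n=1000):
    """Approximate the integral of f from a to b with the trapezoidal rule on n slices."""
    width = (b - a) / n
    inner = sum(f(a + i * width) for i in range(1, n))  # interior points count fully
    return width * (0.5 * f(a) + inner + 0.5 * f(b))    # endpoints count half

# d/dx sin(x) = cos(x), so the slope at 0 should be very close to 1
print(derivative(math.sin, 0.0))
# The exact integral of x^2 from 0 to 1 is 1/3
print(integral(lambda x: x ** 2, 0.0, 1.0))
```

Shrinking `h` or growing `n` improves accuracy only up to a point: a very small `h` makes `f(x + h) - f(x - h)` lose precision to floating-point cancellation.
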

Linear Algebra

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | Khan Academy Linear Algebra | course link | Covers all topics in a first-year college linear algebra course. |
| 2.0 | Linear Algebra | course link | A YouTube course in linear algebra for machine learning. |
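
An eigenvalue λ of a square matrix A is a number for which some non-zero vector v satisfies A v = λ v. For a 2x2 matrix the eigenvalues are the roots of the characteristic polynomial det(A - λI) = 0, which the minimal sketch below solves with the quadratic formula. The function name is hypothetical, and it only handles real eigenvalues.

```python
import math

def eigenvalues_2x2(matrix):
    """Return the real eigenvalues of [[a, b], [c, d]], largest first.

    They are the roots of x^2 - (a + d) x + (ad - bc) = 0,
    i.e. x^2 - trace * x + determinant = 0.
    """
    (a, b), (c, d) = matrix
    trace = a + d
    det = a * d - b * c
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        raise ValueError("eigenvalues are complex")
    root = math.sqrt(discriminant)
    return ((trace + root) / 2, (trace - root) / 2)

A = [[2, 1], [1, 2]]
print(eigenvalues_2x2(A))  # (3.0, 1.0)

# Check the defining property A v = 3 v for the eigenvector v = [1, 1]
v = [1, 1]
Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
print(Av)  # [3, 3]
```

For larger matrices you would normally call a library routine such as `numpy.linalg.eig` rather than solve the characteristic polynomial by hand.
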

Relational Database Systems

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | MySQL course by Programming with Mosh | course link | A complete course of MySQL for beginners. |
| 2.0 | Intro to Relational DB by Coursera | course link | Hands-on experience working with a relational database using MySQL Workbench. |
| 3.0 | PostgreSQL course by Amigoscode | course link | A complete introductory course to PostgreSQL. |
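
The core relational ideas (tables, primary and foreign keys, `JOIN` and `GROUP BY`) are the same in MySQL and PostgreSQL. To keep the sketch runnable without a database server, it swaps in SQLite through Python's built-in `sqlite3` module; the `customers`/`orders` schema and the rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a MySQL/PostgreSQL server here
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
cur = conn.cursor()

# Two related tables: every order points at a customer through customer_id
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL)"
)
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])
cur.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                [(1, 10.0), (1, 5.5), (2, 20.0)])
conn.commit()

# JOIN the two tables and aggregate: total spend per customer, largest first
rows = cur.execute(
    "SELECT c.name, SUM(o.amount) AS total"
    " FROM customers AS c JOIN orders AS o ON o.customer_id = c.id"
    " GROUP BY c.name ORDER BY total DESC"
).fetchall()
print(rows)  # [('Grace', 20.0), ('Ada', 15.5)]
conn.close()
```

The SQL carries over to MySQL and PostgreSQL almost unchanged. The main difference is connecting through a driver for that server, and many such drivers use `%s` rather than `?` as the query placeholder.
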

Non-Relational Databases (NoSQL)

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | Introduction to MongoDB by Coursera | course link | An introductory course that teaches the fundamentals of MongoDB, including MongoDB's document data model, importing data into a cluster, and working with the CRUD API and the Aggregation Framework. |
| 2.0 | Introduction to NoSQL by w3resource | course link | An introduction to NoSQL concepts such as ACID, distributed systems and scalability, plus a comparison of SQL vs. NoSQL systems. |

Big Data at Scale

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | Data Analysis using PySpark by Coursera | course link | Use PySpark alongside Colab to handle distributed data processing. |
| 2.0 | Spark by Simplilearn | course link | An introductory course to Apache Spark. |
| 3.0 | Hadoop tutorial by Frank Kane | course link | A Hadoop ecosystem tutorial for beginners. |
| 4.0 | Hadoop Platform and Application Framework by Coursera | course link | A complete Hadoop tutorial for beginners. |
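
MapReduce, listed above under the Hadoop ecosystem, splits a job into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines each group. The sketch below simulates the classic word-count job inside a single Python process to show that data flow; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values seen for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

docs = ["Spark and Hadoop", "Hadoop MapReduce", "spark"]
print(word_count(docs))  # {'spark': 2, 'and': 1, 'hadoop': 2, 'mapreduce': 1}
```

On a real cluster the framework runs many mappers and reducers in parallel on different machines and performs the shuffle over the network; the split into independent map and reduce functions is what makes that parallelism possible.
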

Software Engineering Best Practices

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | Code style tutorial by The Hitchhiker's Guide to Python | tutorial link | An introduction to writing more Pythonic code. |
| 2.0 | Pythonic code for DS | webcast link | Michael Kennedy explains how to write Pythonic code for data science. |
| 3.0 | Unit testing by DataCamp | course link | Learn how to write unit tests for your Data Science projects in Python using pytest. |
| 4.0 | Unit testing by ProgrammingKnowledge | tutorial link | A tutorial that introduces unit testing in Python. |
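
pytest collects functions whose names start with `test_` from files named `test_*.py` and reports every bare `assert` that fails, so a test needs no special base class. A minimal sketch, assuming a hypothetical preprocessing helper `min_max_scale` saved with its tests in `test_preprocessing.py`:

```python
# test_preprocessing.py (hypothetical file name; pytest discovers test_*.py files)

def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column has no spread; return zeros instead of dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def test_scales_extremes_to_zero_and_one():
    assert min_max_scale([10, 20, 30]) == [0.0, 0.5, 1.0]

def test_constant_column_does_not_divide_by_zero():
    assert min_max_scale([5, 5, 5]) == [0.0, 0.0, 0.0]
```

Running `pytest` in that folder executes both test functions. Writing the edge-case test first is a cheap way to catch bugs such as division by zero before they reach a model.
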

Machine Learning Deployment Fundamentals

| Index | Course Name | Link | Description |
|---|---|---|---|
| 1.0 | ML Deployments by Krish Naik | playlist link | A playlist to learn ML deployments using Heroku, AWS and GCloud. |
| 2.0 | Streamlit tutorials by JCharisTech | course link | A complete playlist showing the full stack for ML production. |
| 3.0 | ML serving with Tiangolo, FastAPI's creator | tutorial link | Build an ML API from scratch using FastAPI. |
| 4.0 | Serving ML models by JCharisTech | course link | Serving ML models as an API with FastAPI. |
| 5.0 | Deploying ML models by Coursera | course link | Deploying ML models. |
| 6.0 | ML models using Flask by Analytics Vidhya | tutorial link | How to deploy ML models using Flask. |

Data Science Specializations

| Index | Specialization Name | Link | Description |
|---|---|---|---|
| 1.0 | Complete Data Science Bootcamp | link | This is a great introductory course to Data Science. It is very friendly and geared towards beginners. It covers topics such as statistical analysis, NumPy, pandas, Matplotlib, scikit-learn and TensorFlow. |
| 2.0 | Data Scientist Nanodegree | link | This specialization, created by Udacity, focuses on real-world data science experience with projects designed by industry experts. You will learn to run data pipelines, design experiments and deploy models to the cloud. |
| 3.0 | IBM Data Science Professional Certificate | link | This specialization covers courses that will provide you with the latest job-ready tools and skills, including open source tools and libraries, databases, SQL, data visualization, data analysis, statistical analysis, predictive modeling and machine learning algorithms. |
| 4.0 | Applied Data Science with Python Specialization | link | This specialization from the University of Michigan provides hands-on projects using Python toolkits such as pandas, Matplotlib, scikit-learn, NLTK and NetworkX to gain insights from data. |
More to come! 🔔🔔🔔
