Data Science Roadmap

🔬🖥📚💖💖💖

Data science is an interdisciplinary field that includes knowledge of:

  • Programming in languages such as:
    • Python (Recommended)
    • R
    • Scala
    • Julia

Note: For now, most resources are in Python, but resources for other languages such as R will come soon. You don't need to learn everything; just pick one technology or skill per category. The most popular option is usually the one labeled as recommended, but feel free to choose whichever is most convenient for you. 😀🤞

  • An understanding of:

    • Data Structures and Algorithms
    • Object Oriented Programming
    • Functional Programming
  • Version Control

    • Git (Recommended)
    • Gitlab
    • Bitbucket
  • Data Wrangling -> Please see the Data Analyst Roadmap

  • Data Visualization -> Please see the Data Analyst Roadmap

  • Visual Storytelling -> Please see the Data Analyst Roadmap

  • Probability and Statistics -> Please see the Data Analyst Roadmap

  • Calculus

    • Including Derivatives & Integrals
  • Linear Algebra

    • Matrices & Eigenvalues
  • Relational Database Systems

    • MySQL
    • PostgreSQL
  • Non-Relational Databases, a.k.a. NoSQL

    • MongoDB (Recommended)
    • Cassandra
  • Machine Learning Algorithms -> Please see the Machine Learning Roadmap

    • Supervised (Recommended)
    • Unsupervised
    • Clustering
    • Reinforcement
  • Deep Learning -> Please see the Deep Learning Roadmap

    • Artificial Neural Networks
    • Convolutional Neural Networks
    • Recurrent Neural Networks
    • Deep Learning Platforms
      • TensorFlow
      • PyTorch (Recommended)
      • Keras
      • CNTK
  • Big Data at Scale

    • Spark (PySpark)
    • Hadoop Ecosystem
      • HBase
      • MapReduce
      • ZooKeeper
      • Spark MLlib (Recommended)
      • YARN
      • Pig
      • Hive
      • Sqoop
      • Oozie
  • Data Engineering -> Please see the Data Engineering Roadmap

  • Software Engineering Best Practices

    • Writing Pythonic code by following the PEP 8 style guide
    • Testing and debugging
  • Machine Learning Deployment Fundamentals (ML/DevOps)

    • Flask --> Backend
    • FastAPI --> Backend
    • Streamlit --> Front-end (Recommended)
    • Using Heroku (Recommended), AWS, Azure --> Deployment platforms

Because the field is this broad, some data science specializations overlap, and some of their subfields can be studied separately. Such is the case for the Data Analyst, Machine Learning Engineer, Deep Learning Engineer and Data Engineer roles. If you are a beginner, one of the first steps to take is to follow the Data Analyst roadmap, then proceed to the Data Science, Machine Learning and Deep Learning roadmaps, and finally the Data Engineering roadmap.

Here you will find resources for learning data science that are not included in the roadmaps of the other subfields. If you do not see what you are looking for, please take a look at the other roadmap resources available.




1.0 Version Control


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | A Git & GitHub crash course | course link | An introduction to Git and GitHub for beginners. |

2.0 Data Structures and Algorithms, Object Oriented and Functional Programming


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | Data Structures & Algorithms from Udacity | course link | In this course you will learn data structures and algorithms by solving 80+ practice problems. |
| 2.0 | Data Structures by William Fiset | course link | This course teaches data structures to beginners using high-quality animations to represent the data structures visually. |
| 3.0 | OOP by MIT OpenCourseWare | course link | Introduction to Computer Science and Programming in Python. |
| 4.0 | Functional Programming from Real Python | course link | A playlist that focuses primarily on filter, map and reduce. |
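
As a small taste of the three functions that the functional programming playlist focuses on, here is a minimal sketch in Python. The temperature readings and variable names are made up for illustration and are not taken from any of the courses above.

```python
from functools import reduce

# Hypothetical sample data: daily temperatures in degrees Celsius.
readings_c = [18.5, 21.0, 25.5, 30.0, 16.0]

# map: apply a function to every element (Celsius -> Fahrenheit).
readings_f = list(map(lambda c: c * 9 / 5 + 32, readings_c))

# filter: keep only the elements that satisfy a predicate.
warm_days = list(filter(lambda c: c >= 21.0, readings_c))

# reduce: fold the whole sequence into a single value (here, the sum).
total = reduce(lambda acc, c: acc + c, readings_c, 0.0)
mean_c = total / len(readings_c)

print(readings_f)
print(warm_days)
print(mean_c)
```

In everyday Python, list comprehensions and the built-in `sum` often replace `map`, `filter` and `reduce`, but the three functions are worth knowing because the same ideas reappear in Spark and other data tools.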

3.0 Calculus


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | Calculus from Khan Academy | course link | A complete course including limits, derivatives & integrals. |
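
To connect derivatives and integrals to code, the sketch below approximates both numerically using only the standard library. The helper names (`derivative`, `integral`, `square`) and the step sizes are illustrative choices, not something prescribed by the course.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def integral(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step


def square(x):
    return x * x


# For f(x) = x^2 the exact answers are f'(3) = 6 and the integral over [0, 3] = 9.
slope_at_3 = derivative(square, 3.0)
area_0_to_3 = integral(square, 0.0, 3.0)

print(slope_at_3, area_0_to_3)
```

Both results are approximations, so they match the exact values only up to a small numerical error that shrinks as `h` gets smaller (within floating-point limits) and `n` gets larger.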

4.0 Linear Algebra


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | Khan Academy Linear Algebra | course link | Covers all topics in a first-year college linear algebra course. |
| 2.0 | Linear Algebra | course link | A YouTube course in linear algebra for machine learning. |
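
For a concrete feel of matrices and eigenvalues, here is a minimal pure-Python sketch for the 2x2 case, based on the characteristic polynomial. The function names and the example matrix are assumptions made for illustration; for real work you would more likely reach for a library routine such as `numpy.linalg.eig`.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]].

    Solves the characteristic polynomial
    lambda^2 - trace * lambda + det = 0.
    """
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("matrix has complex eigenvalues")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


# Hypothetical example matrix.
matrix = [[2.0, 1.0], [1.0, 2.0]]
lam_big, lam_small = eigenvalues_2x2(2.0, 1.0, 1.0, 2.0)

# [1, 1] is an eigenvector for the larger eigenvalue, so A v equals lambda * v.
v = [1.0, 1.0]
print(lam_big, lam_small, matvec(matrix, v))
```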

5.0 Relational Database Systems


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | MySQL course by Programming with Mosh | course link | A complete course of MySQL for beginners. |
| 2.0 | Intro to Relational DB by Coursera | course link | Getting hands-on experience working with a relational database using MySQL Workbench. |
| 3.0 | Course from Amigoscode | course link | A complete introductory course for PostgreSQL. |
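
If you want to try basic SQL before installing MySQL or PostgreSQL, the sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in. Note the assumption: SQLite is not one of the systems listed in this roadmap, and the `courses` table and its rows are invented for illustration. The `CREATE TABLE`, `INSERT` and `SELECT ... GROUP BY` ideas carry over, although dialect details differ between systems.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema, for illustration only.
conn.execute(
    "CREATE TABLE courses ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, topic TEXT NOT NULL)"
)

# Parameterized inserts avoid building SQL strings by hand.
conn.executemany(
    "INSERT INTO courses (name, topic) VALUES (?, ?)",
    [
        ("Intro to SQL", "databases"),
        ("Joins in depth", "databases"),
        ("Limits", "calculus"),
    ],
)

# Aggregate: how many courses per topic?
rows = conn.execute(
    "SELECT topic, COUNT(*) FROM courses GROUP BY topic ORDER BY topic"
).fetchall()
conn.close()

print(rows)
```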

6.0 NoSQL Databases


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | Introduction to MongoDB by Coursera | course link | Introductory course that teaches the fundamentals of MongoDB, including MongoDB's document data model, importing data into a cluster, and working with the CRUD API and the Aggregation Framework. |
| 2.0 | Introduction to NoSQL by w3resource | course link | Introduction to NoSQL concepts such as ACID, distributed systems and scalability, plus a comparison of SQL vs NoSQL systems. |

7.0 Big Data at Scale


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | Data Analysis using PySpark by Coursera | course link | Use PySpark alongside Colab to handle distributed data processing. |
| 2.0 | Spark by Simplilearn | course link | Introductory course to Apache Spark. |
| 3.0 | Hadoop tutorial by Frank Kane | course link | Hadoop ecosystem tutorial for beginners. |
| 4.0 | Hadoop Platform and Application Framework by Coursera | course link | Complete Hadoop tutorial for beginners. |
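
To get an intuition for the MapReduce model before touching a cluster, here is a single-machine word count sketched in plain Python. This is only an illustration of the map, shuffle and reduce phases; it does not use the Hadoop or Spark APIs, and the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input: in a real job each line could live on a different machine.
lines = ["spark and hadoop", "hadoop and hive"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

On a real cluster the map and reduce steps run in parallel across many machines, and the framework handles the shuffle, scheduling and fault tolerance for you.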

8.0 Software Engineering Best Practices


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | Code style tutorial by The Hitchhiker's Guide to Python | tutorial link | An introduction to writing more Pythonic code. |
| 2.0 | Pythonic code for DS | webcast link | Michael Kennedy explains how to write Pythonic code for data science. |
| 3.0 | Unit testing by DataCamp | course link | Learn how to write unit tests for your data science projects in Python using pytest. |
| 4.0 | Unit testing by ProgrammingKnowledge | tutorial link | A tutorial that will introduce you to unit testing in Python. |
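
As a quick preview of what unit testing looks like in a data science project, here is a minimal sketch that uses the standard library's `unittest` module. The courses above may use pytest instead; the function under test, `min_max_scale`, is a hypothetical helper written for this example.

```python
import unittest


def min_max_scale(values):
    """Scale numbers linearly into the range [0, 1]."""
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        # A constant input has no spread, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(min_max_scale([3, 3]), [0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            min_max_scale([])


# Run the tests programmatically; `python -m unittest` works as well.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MinMaxScaleTest)
result = unittest.TextTestRunner().run(suite)
```

Each test checks one behavior, including the edge cases (constant and empty input), which is usually where data-cleaning code breaks first.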

9.0 Machine Learning Algorithms Deployment


| Index | Course Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | ML Deployments by Krish Naik | playlist link | A playlist to learn ML deployments using Heroku, AWS and GCloud. |
| 2.0 | Streamlit tutorials by JCharisTech | course link | A complete playlist showing the full stack for ML production. |
| 3.0 | ML serving with Tiangolo, FastAPI's creator | tutorial link | Build an ML API from scratch using FastAPI. |
| 4.0 | Serving ML models by JCharisTech | course link | Serving ML models as an API with FastAPI. |
| 5.0 | Deploying ML models by Coursera | course link | Deploying ML models. |
| 6.0 | ML models using Flask by Analytics Vidhya | tutorial link | How to deploy ML models using Flask. |

Data Science Specializations


| Index | Specialization Name | Link | Description |
| --- | --- | --- | --- |
| 1.0 | Complete Data Science Bootcamp | link | This is a great introductory course to data science. It is very friendly and geared towards beginners. It covers topics such as statistical analysis, NumPy, pandas, Matplotlib, scikit-learn and TensorFlow. |
| 2.0 | Data Scientist Nanodegree | link | This specialization created by Udacity focuses on real-world data science experience with projects designed by industry experts. You will learn to run data pipelines, design experiments and deploy models to the cloud. |
| 3.0 | IBM Data Science Professional Certificate | link | This specialization covers courses that will provide you with the latest job-ready tools and skills, including open source tools and libraries, databases, SQL, data visualization, data analysis, statistical analysis, predictive modeling and machine learning algorithms. |
| 4.0 | Applied Data Science with Python Specialization | link | This specialization from the University of Michigan provides hands-on projects using Python toolkits such as pandas, Matplotlib, scikit-learn, NLTK and NetworkX to gain insights from data. |

More to come! 🔔🔔🔔