In the 21st century, advances in computer science, the development of intelligent machines, and the generation of immense amounts of data have led to new fields of study that have quickly become buzzwords: Data Science and Machine Learning. From simple tasks like sales prediction to ambitious projects like self-driving cars, more and more is becoming possible with the algorithms and techniques of Data Science.
Realising the potential of this field, many students in and around universities are motivated and enthusiastic to pursue it. The internet has become a hub of innumerable resources to guide them, but that abundance often leads to more confusion than clarity. With video lectures, online courses, languages and software packages, books, practice platforms, and so on, the path is not well defined.
Hence we have tried to curate a well-structured, resource-rich path for anyone who wants to try their hand at this field. Keep in mind that it will require dedicated effort and time, but in the end it’s all worth it.
Probability and statistics will help you understand the fundamentals behind Machine Learning algorithms, so a good understanding of them is important. You can follow Probability and Statistics for Data Science (Series) on Medium, a 6-blog series that will help you with the basics of probability and statistics.
To understand Deep Learning techniques, one must be comfortable with calculus and matrices. The 3Blue1Brown lecture series will help you develop a good understanding of these topics. You can check out Gilbert Strang’s Linear Algebra course on MIT OpenCourseWare if you really want to explore the field.
Machine Learning is perhaps the most important aspect of Data Science. This is the step where most beginners quit. But if you persist, there’s nothing stopping you!
Estimated Time: 25–30 days. These books cover the fundamentals of machine learning and all the major topics; side by side, you can get started with implementing ML algorithms.
ISLR is a theory- and math-intensive book whose code is written in R, so you may want to refer to the book’s Python conversion here. Raschka, on the other hand, is more interactive in the sense that implementation and theory go hand in hand.
Some common Python libraries such as sklearn, numpy, pandas, matplotlib and scipy will come in handy when implementing ML algorithms. We strongly suggest that you not worry much about them: instead of setting aside time to learn them separately, just start with the code from either of the two books and pick up these libraries along the way.
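As a rough illustration of what "starting with the code" can look like, here is a minimal sketch that trains and evaluates a classifier with sklearn. It is not taken from either book; the choice of the built-in iris dataset, logistic regression, and the 75/25 split are our own assumptions for the example.

```python
# A minimal sketch, not taken from either book: the dataset (iris),
# the model (logistic regression) and the 75/25 split are assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset as numpy arrays: features X, labels y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the model on the training split only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out split.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

The same load, split, fit, evaluate pattern carries over to most sklearn models, so swapping in a different estimator is usually a one-line change.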
Estimated Time: 10–15 days. For people who don’t like reading books, MOOCs are a good alternative. You can follow either of two courses: Andrew Ng’s CS229 Machine Learning or Harvard’s CS109 Data Science.
Having completed the ML part, you are now ready to start participating in competitions where you can test your skills in competitive data science. However, don’t get stuck; participating in competitions is more like a sport that brings learning and growth along with fun and excitement.
Fundamentally, deep learning is a part of machine learning. But given the popularity and the innumerable resources focused on this, it is apt to treat this as a separate domain.
Estimated Time: 20–25 days. This is a 5-course specialization that will help you with the fundamentals of deep learning, the various techniques used, and an introduction to Computer Vision and Natural Language Processing.
Don’t forget to apply for financial aid well in advance so that you can undertake the assignments as well as the quizzes.
Estimated Time: 15–20 days. This is DSG’s in-house starter book for people who want to implement as well as learn Deep Learning theory from scratch in PyTorch.
We strongly suggest you read this. It is available as a GitHub repo named d2l-PyTorch. Have a look at it if you want to build an understanding of Deep Learning concepts in less than 20 days without any prior knowledge. We welcome any contributions to the repo! :)
You can go for either the PyTorch Tutorials or the TensorFlow Tutorials. Pick one and get good at it, since ultimately it is the implementation that matters.
The above resources will help you with the fundamentals of different aspects of Data Science. But given the depth and breadth of the field, there’s always more to learn. So here are the resources you can refer to if you want to explore various aspects such as statistics, computer vision, natural language processing, etc.
50 Challenging Problems in Probability
CS 231n: Convolutional Neural Networks for Visual Recognition (Strongly Recommended)
CS 224n: Deep Learning for Natural Language Processing (Strongly Recommended)
Note: If you have any doubts, you can refer to blogs on Analytics Vidhya and Medium.
Remember, Google is your best friend! :D
This is certainly not the end; it is more likely the starting point for all the topics one can dive into and explore after completing the above path.