15 Days for reviewing data science
As a graduating student in Data Science without much experience in the field, I want to write something for people with a similar background on how to review data science and get ready for interviews. Here is my methodology, adapted from the book Soft Skills: The Software Developer’s Life Manual.
Before we start, it is very important to know where you stand in the data science field: what is your current stage, what kind of applicant do you want to be, and what are your strengths and weaknesses? If you are completely new to data science, this guide isn’t for you. I may put together a detailed assessment form later if I have time, but if at least five of the statements below apply to you, then this guide is exactly what you want.
- You know the term data science, and you have your own understanding of it.
- You know at least one programming language, such as R, Python, JavaScript or Java.
- You know the difference between a stack and a queue, and you know what a binary tree is.
- You know SQL and can write queries to pull data from a database or merge data from different tables.
- You know how to clean data by subsetting, replacing and merging.
- You have learned at least one machine learning framework (sklearn etc.) and know how to use it.
- You know the basics of statistical inference, A/B testing or research design.
- You are familiar with linear algebra, calculus, statistics and probability.
- You have heard of regression, classification and clustering before and know the basic idea behind each.
- You have the enthusiasm to use some of the tools I mentioned above to solve real questions.
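If you want a quick self-check on the SQL item in the list above, here is a minimal sketch using Python's built-in sqlite3 module. The users and orders tables, their columns and the rows in them are all made up for illustration; the point is only to show the kind of "merge data from different tables" query I have in mind.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'US'), (2, 'CA'), (3, 'US');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 12.5);
""")

# LEFT JOIN keeps users without orders; COALESCE turns a NULL sum into 0.
query = """
SELECT u.country,
       COUNT(o.order_id)          AS n_orders,
       COALESCE(SUM(o.amount), 0) AS revenue
FROM users AS u
LEFT JOIN orders AS o ON o.user_id = u.user_id
GROUP BY u.country
ORDER BY u.country;
"""
rows = conn.execute(query).fetchall()
for country, n_orders, revenue in rows:
    print(country, n_orders, revenue)
```

If you can read this query and explain why a LEFT JOIN is used instead of an INNER JOIN, you can probably count the SQL item as true for you.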
Day 1: Get the big picture, determine scope, define success, find resources
OK, it seems like there is a lot of work to be done today, but don’t be afraid. I will guide you through the list. First, what exactly is data science? Wikipedia has the following definition: Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.
The scope of this data science review is to go over all of its fields, with special concentration on the subjects that are most likely to appear in an interview. As a result, you should not go too deep into one subject and lose the big picture. You can choose to go deeper into one of them once you are familiar with the big picture, but now is not the time to do so.
I define success as the following:
- Without any references, you can write basic and compound SQL queries to search with conditions, create new variables and modify sources, and you know how to apply basic optimization techniques.
- You can solve basic probability and statistics quizzes, explain how to run an A/B test and design the research criteria.
- You can solve easy and medium algorithm questions using common data structures.
- You know several algorithms for regression, classification and clustering: you can prove the theory behind them, and you know when and how to use them and what their pros and cons are.
- Given a business or real-life case, you can extract the key information from it and translate it into a data-related problem statement.
- Continuing from the point above, you can design and implement a pipeline to solve the problem using the frameworks you have learned, and you can explain the overall process in a clear manner.
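To make the A/B testing goal in the list above a bit more concrete, here is a minimal sketch of one common approach, a two-proportion z-test for comparing conversion rates. This is my own illustration rather than the only way to analyze an A/B test: the function name two_proportion_z_test and the sample numbers are hypothetical, and it assumes samples large enough for the normal approximation to hold.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A).

    conv_a, conv_b: number of conversions in each group
    n_a, n_b: number of users in each group
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 100/1000 conversions in control, 130/1000 in treatment.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

In an interview, being able to state the null hypothesis, explain why the pooled rate is used and say what you would decide from the p-value matters more than memorizing the formula.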
You will want some resources to help you reach these goals, and here are some of them.