I completed the data science specialization from IBM on Coursera last month. Finishing all of its courses took me several months and a lot of work. Now I would like to share on this blog some of the knowledge I gained through it. First, I want to answer a very simple question: what is data science? We hear a lot about data science lately, but for most people it remains a very abstract concept.
What is data science?
The concept of data science is relatively new. People really began talking about data science around 2009, and the field has grown increasingly popular since then. However, data science has much in common with more traditional fields such as statistics. Its development is closely tied to new ways of collecting and storing data, and of working with it. We now have tons of data available, and data science enables us to analyze this data and make sense of it. Another important point is the use of algorithms in data analysis. Data science is not just about figures, though: understanding the story behind the data and communicating the results is also an important part of the job.
What is a data scientist?
As data science is a new field, it is difficult to define what a data scientist is. Many data scientists studied and began working in other fields, such as statistics, mathematics, IT, engineering or business. When they were in school, you could not graduate in data science because the field did not exist yet. The simplest definition is that a data scientist is someone who works with data. Some definitions are more restrictive, though, confining data science to machine learning. A good data scientist is able to process a huge amount of data quickly and analyze the results, but this is not the only important thing in the job. A data scientist has to be curious and ask good questions, because if you cannot ask good questions, you won’t know what you should analyze. A data scientist should also have storytelling skills and be able to communicate their results.
What is machine learning?
Machine learning is a field of computer science that enables computers to learn without being explicitly programmed. A machine learning model is trained by going through data iteratively: it learns from the data and detects patterns in it. Machine learning can be used for a variety of tasks, such as object recognition, summarization, or recommendation.
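To make "going through data iteratively" a bit more concrete, here is a minimal Python sketch of one common training procedure, batch gradient descent, fitting a straight line y = w * x + b to a few made-up points. The toy data, the function name `train_linear`, the learning rate and the number of passes are all illustrative assumptions; real models and libraries use far more sophisticated setups.

```python
# Hypothetical toy data generated from y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]


def train_linear(xs, ys, learning_rate=0.02, epochs=2000):
    """Fit y = w * x + b approximately, using batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):  # each epoch is one full pass over the data
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train_linear(xs, ys)
print(f"learned w = {w:.3f}, b = {b:.3f}")
```

Nobody tells the program that the answer is w = 2 and b = 1: each pass over the data nudges w and b in the direction that reduces the error, and after enough passes they end up very close to the values used to generate the points.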
For instance, let’s say you have data about the patients of a hospital who were treated for the same disease. You have information about the patients (age, gender, weight, and so on), and you know which patients relapsed and which did not. Machine learning algorithms will enable you to detect patterns and better predict whether new patients are likely to relapse.
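The relapse example can be sketched as a small classifier. The Python code below uses logistic regression, one common choice for yes/no predictions (the example above does not name a specific algorithm), trained with batch gradient descent. The patient records are invented, the features are limited to age and weight (gender is left out to keep it short), and all function names are hypothetical.

```python
import math

# Hypothetical training data: (age, weight_kg, relapsed). These values are
# invented for illustration and are NOT real patient records.
PATIENTS = [
    (20, 70, 0), (30, 85, 0), (40, 60, 0), (45, 75, 0),
    (55, 75, 1), (60, 60, 1), (70, 85, 1), (80, 70, 1),
]


def sigmoid(t):
    """Numerically stable logistic function mapping any real number to (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_scaler(rows):
    """Compute the mean and (population) standard deviation of each column."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
            for col, m in zip(columns, means)]
    return means, stds


def scale(row, means, stds):
    """Standardize one feature row so each feature has mean 0 and std 1."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit logistic regression with batch gradient descent on log loss."""
    n, d = len(xs), len(xs[0])
    coef, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_coef, grad_bias = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            # Error = predicted relapse probability minus the true label
            error = sigmoid(sum(c * xi for c, xi in zip(coef, x)) + bias) - y
            for j in range(d):
                grad_coef[j] += error * x[j] / n
            grad_bias += error / n
        coef = [c - learning_rate * g for c, g in zip(coef, grad_coef)]
        bias -= learning_rate * grad_bias
    return coef, bias


def predict_relapse(age, weight_kg, coef, bias, means, stds):
    """Estimated probability that a new patient relapses."""
    x = scale([age, weight_kg], means, stds)
    return sigmoid(sum(c * xi for c, xi in zip(coef, x)) + bias)


features = [[age, weight] for age, weight, _ in PATIENTS]
labels = [relapsed for _, _, relapsed in PATIENTS]
means, stds = fit_scaler(features)
coef, bias = train_logistic([scale(f, means, stds) for f in features], labels)

p_older = predict_relapse(65, 80, coef, bias, means, stds)
p_younger = predict_relapse(35, 80, coef, bias, means, stds)
print(f"P(relapse | age 65, 80 kg) = {p_older:.2f}")
print(f"P(relapse | age 35, 80 kg) = {p_younger:.2f}")
```

In this invented dataset, relapses occur only in patients older than 50, and both groups share the same set of body-weight values, so the fitted model relies on age and gives weight a coefficient close to zero. The algorithm finds this pattern from the data rather than from a rule written by hand.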