Data Science: Essentials

Robert McKeon Aloe
4 min read · Feb 4, 2020


This is for those who are new to data science or interested in getting involved. These are the attributes I think are key for me to function as a data scientist. You may or may not agree with me, and that’s okay. Data science is an emerging field, so it has been interesting to watch the field try to find itself among an ever more diverse crowd of data-related job functions.

Fundamental knowledge

Image from https://es.iith.ac.in
  • An engineering or science field of study gives a great leg up.
  • Data science can be applied to any kind of data in any field, but having a primary field of study before data science helps you decide how to look at data, determine whether you have the right data, and figure out how to get the right data. Think of it like partial fractions: if you don’t know how to solve partial fractions, you can’t solve differential equations using Fourier Transforms.
  • Do you understand statistics? Do you know what a t-test is? Two-tailed t-test? Statistical significance? Sample population? Normal distribution?
  • Standard Deviation: Do you understand what standard deviation is, how it is used, and how it is related to a normal distribution?
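
To make the statistics questions above concrete, here is a minimal sketch using only Python's standard library. It is not a method from this article, just one way to illustrate the ideas: the function names (sample_std, normal_coverage, welch_t_statistic) are made up for this example. It computes a sample standard deviation, shows how standard deviation ties to the normal distribution (about 68% of values fall within one standard deviation of the mean, about 95% within two), and computes the statistic for a Welch two-sample t-test. Turning that statistic into a p-value needs the t distribution, which the standard library lacks; in practice something like scipy.stats.ttest_ind with equal_var=False would handle that step.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    return statistics.stdev(values)


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def welch_t_statistic(a, b):
    """Welch's t statistic for two independent samples (variances may differ)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


if __name__ == "__main__":
    # Hypothetical measurements, invented purely for illustration.
    group_a = [2, 4, 4, 4, 5, 5, 7, 9]
    group_b = [3, 5, 5, 6, 6, 7, 8, 10]
    print("std of group_a:", sample_std(group_a))
    print("within 1 sigma:", normal_coverage(1))
    print("within 2 sigma:", normal_coverage(2))
    print("Welch t:", welch_t_statistic(group_a, group_b))
```

A large absolute t statistic suggests the two sample means differ by more than their spread would explain, but whether that is statistically significant still depends on the sample sizes and the significance level you choose.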

Data

Created by Author
  • Do you have data?
  • Design of Experiment (DOE): Do you know how to get more useful data?
  • Do you know how to clean data?
  • Do you know how to check data after data collection?
  • Do you know how to analyze data?
  • Do you know how to present data outside of the curse of knowledge?
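
As a rough illustration of the cleaning and checking questions above, here is a small sketch in plain Python. The data, the function names (clean, flag_outliers), and the choice of technique are all assumptions for this example: it drops missing or non-finite readings, then flags suspicious values using a modified z-score based on the median and the median absolute deviation, with the commonly used 0.6745 scaling constant and a 3.5 cutoff. Other checks may suit your data better.

```python
import math
import statistics


def clean(readings):
    """Drop entries that are missing (None) or not finite numbers (NaN, inf)."""
    cleaned = []
    for value in readings:
        if value is None:
            continue
        number = float(value)
        if math.isfinite(number):
            cleaned.append(number)
    return cleaned


def flag_outliers(values, threshold=3.5):
    """Return values whose modified z-score (median/MAD based) exceeds threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return []
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical sensor readings with a gap, a NaN, and one suspicious value.
    raw = [9.8, 10.1, None, 10.0, float("nan"), 9.9, 10.2, 55.0]
    usable = clean(raw)
    print("usable readings:", usable)
    print("flagged for review:", flag_outliers(usable))
```

Flagged values are candidates for review, not automatic deletions: knowing your field is what tells you whether a strange reading is a sensor glitch or the most interesting point in the dataset.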

Analyzing/Presenting Data

Remember: You are bound by the curse of knowledge, which means that you are so deep in the data you’re analyzing that it may be difficult to communicate the results concisely to non-experts. Most execs spend less than a few minutes a month looking at your work, so they don’t have the depth you do, and you need to make that message clear.

The best plot is the one where the conclusion is clear and the audience draws the same conclusion as you without prompting. Here are some good methods you should know when it comes to data presentation:

1. Created by Author, 2. Matlab Example

Tools

Created by author: jet color scheme for Matlab logo
  1. Programming Language: Matlab, Python, or R
  2. Data Inspection: Numbers and/or Excel
  3. Data Presentation: Keynote and/or Powerpoint
  4. Scripting: Bash, Python, Perl, etc.

Advanced Knowledge

The advent of AI, machine learning, and computer vision has begun to really affect our lives. Therefore, I think it is important to understand the fundamentals of these techniques to be an effective data scientist because most data will be filtered through these methods before analysis.

From https://stackoverflow.com/questions/47531863/why-key-is-not-unique-in-mapreduce-function
  1. Neural Networks (NN) and/or Convolutional Neural Networks (CNN)
  2. Map Reduce
  3. Natural Language Processing
  4. General computer vision or signal processing
  5. K-means Clustering
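
To give a feel for one item on this list, here is a minimal sketch of k-means clustering in plain Python, using the standard assign-then-update loop (Lloyd's algorithm). The function name, the sample points, and the choice of starting centroids are all assumptions made for illustration; real projects usually rely on a library implementation with smarter initialization. It needs Python 3.8 or later for math.dist.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical 2-D points forming two loose groups.
    points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
              (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
    centers, groups = kmeans(points, centroids=[points[0], points[3]])
    print("centers:", centers)
    print("groups:", groups)
```

The result depends on the starting centroids and on the number of clusters you ask for, which is exactly the kind of fundamental worth understanding before trusting data that has already passed through such a method.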

In Conclusion

These skills do not develop overnight or over the course of a few weeks or months. Usually, they take years to mature as you’re given opportunities to improve them. Good opportunities are challenging problems with no clear or obvious end in sight, and solving them requires hard work, persistent energy, and trudging onwards through the depths of despair.

