Cheat Sheet: Get started with Data Science and Python

This blog post is a short cheat sheet for getting started with data science and Python; it is not a complete list. The post covers some useful main topics to get familiar with:

  • Basic tooling
  • Using existing models
  • Build your own models and machine learning
  • Deep learning
  • Useful learning resources

1. Basic tooling

2. Existing models

Natural Language Processing (NLP) can be a good place to start. You can begin with spaCy, an open-source Natural Language Processing library for Python, and check out the following topics:

3. Build models and machine learning

Machine learning algorithms build a model based on sample data, known as training data, to make predictions or decisions without being explicitly programmed to do so.
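To make "building a model from training data" concrete, here is a minimal sketch in plain Python. It is not how any particular library does it: it uses a very simple technique (nearest centroid), and the two features and the tiny data set are made up for illustration.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Learn one centroid (mean feature vector) per label from the training data."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical training data: [number of links, number of exclamation marks] per message.
train_x = [[5, 7], [4, 6], [6, 8], [0, 1], [1, 0], [0, 0]]
train_y = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = fit_centroids(train_x, train_y)
print(predict(model, [5, 5]))  # a message the model has never seen
print(predict(model, [0, 2]))
```

Nobody wrote a rule such as "more than three links means spam"; the decision comes from the centroids learned from the labeled examples.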

  • scikit-learn provides simple and efficient tools for predictive data analysis.
  • Supervised learning trains on labeled historical data to make predictions for new data (for example, a spam filter learns from messages already marked as spam to decide which future messages are spam).
  • Unsupervised learning is a type of algorithm that learns patterns from untagged data.
    • Term frequency–inverse document frequency (tf-idf) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It is used in the context of LDA and NMF.
    • Latent Dirichlet allocation (LDA) is a generative statistical model that explains a set of observations through unobserved groups; each group explains why some parts of the data are similar.
    • Non-negative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements.
  • Accuracy and precision: how often does the model deliver the right result? Accuracy is the share of all predictions that are correct; precision is the share of positive predictions that are actually positive.
  • A confusion matrix is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix).
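The tf-idf statistic from the list above can be sketched in a few lines of plain Python. This assumes one common variant (tf = term count divided by document length, idf = log of the number of documents divided by the number of documents containing the term) and a made-up three-sentence corpus. Libraries such as scikit-learn use a smoothed variant, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini corpus, for illustration only.
corpus = [
    "python is great for data science",
    "python is great for web development",
    "deep learning needs data",
]
scores = tf_idf(corpus)
# The highest-weighted term of the first document is the one that is rare elsewhere.
print(max(scores[0], key=scores[0].get))
```

Words that appear in several documents ("python", "data") get a low weight, while a word that is specific to one document ("science") gets a high one.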
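Accuracy, precision and the confusion matrix are easier to grasp with numbers. The following is a small sketch in plain Python for a two-class problem; the labels and predictions are invented, and "spam" is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count the four cells of a 2x2 confusion matrix for the given positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1  # predicted positive, really positive
        elif guess == positive:
            fp += 1  # predicted positive, really negative
        elif truth == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn


# Hypothetical true labels and model predictions.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="spam")
accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted spam that really is spam

print("confusion matrix [[tp, fn], [fp, tn]]:", [[tp, fn], [fp, tn]])
print("accuracy:", accuracy)
print("precision:", round(precision, 3))
```

Here the model is right in 6 of 8 cases (accuracy 0.75), but only 2 of the 3 messages it flagged as spam really are spam (precision about 0.667).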

4. Get started with deep learning

  • Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised, or unsupervised.
    • Long short-term memory (LSTM) is a type of recurrent neural network designed to learn long-term dependencies in sequential data.
    • The rectifier (ReLU) is an activation function that passes positive input through unchanged and outputs zero for negative input.
    • Keras is an open-source software library that provides a Python interface for artificial neural networks.
    • A recurrent neural network is a class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes.
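The rectifier and the recurrent "cycle" from the list above can be shown together in a toy sketch. This is plain Python, not Keras, and it uses a single hidden unit with hand-picked weights (`w_in`, `w_rec` and `bias` are illustrative assumptions; a real network learns them during training, and recurrent layers often use other activations such as tanh).

```python
def relu(x):
    """Rectifier: keep positive values, replace negative values with zero."""
    return max(0.0, x)


def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One step of a single-unit recurrent cell.

    The previous output h_prev is fed back in next to the new input x,
    which is the cycle that makes the network recurrent.
    """
    return relu(w_in * x + w_rec * h_prev + bias)


def run_rnn(inputs, w_in=1.0, w_rec=0.5, bias=0.0):
    """Feed a sequence through the cell and collect the hidden state after each step."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states


print(run_rnn([1.0, 0.0, -2.0, 1.0]))  # prints [1.0, 0.5, 0.0, 1.0]
```

In the second step the input is 0.0, yet the output is 0.5 because the previous state still has an effect. In the third step the negative sum is cut off to 0.0 by the rectifier.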

5. Useful learning resources


I hope this was useful to you. Let’s see what’s next!

Greetings,

Thomas

#ai, #cheatsheet, #datascience
