Future Technology: Machine Learning

Level:

Beginner

Date:

04.03 - 08.03.19

16.09 - 20.09.19

Duration:

5 days

Sign up for this course

Audience

In this course you will learn how to analyse, solve and implement Machine Learning problems in Python. Although the course covers the functions, libraries, etc. used, it does not cover the details of the language. For this reason, you should have a basic knowledge of Python.

Content

Day 1: Installation and Introduction

Introducing ML (Machine Learning):

  • What is Machine Learning?
  • Applications (Financial Forecasting, Profiling, Text Mining, Image Recognition, etc.)
  • Supervised Learning vs. Unsupervised Learning
  • Other applications: Information Retrieval, Optimization, Graph Analysis
  • Exploratory Analysis. What is it and why is it important?

Installation (Windows, Linux)

Installation of PyCharm

An introduction to Python for ML (Machine Learning):

  • Introduction to Python
  • Introduction to essential libraries for ML:
    • numpy
    • Pandas
    • scikit-learn
    • matplotlib

Overview of PyCharm:

  • Basic functions
  • Useful Shortcuts
  • Debug Mode
  • Data View

Day 2: Exploratory Analysis

Guidelines on how to proceed with an exploratory analysis:

  • What problem do I have? Is there a target variable?
  • What do my data look like? Which variables are correlated? Are all relationships important? Which criterion is used to determine whether the relationship found (correlation) is important or not? Are the criteria only statistical?
  • What statistical methods are available to perform an exploratory analysis? Correlation tests, correlation by variable type (categorical-categorical, categorical-continuous, continuous-continuous), correlation matrices, PCA, Correspondence Analysis, etc.
  • Visualization as an important component – graphs as part of exploratory analysis: How can insights be drawn from graphs? Analysis of simple and complex graphs.

The group will work out the answers to these questions together. At the same time, participants will learn how to implement these points in Python. With regard to the technical aspects, the course will cover the following topics:

  • Loading data (CSV, SQL, JSON, etc.)
  • Data processing (subsets, aggregations, etc.)
  • Running statistical methods
  • Graphs
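
The technical steps listed above (loading, processing, running a statistical method) can be illustrated with a minimal sketch. It uses only the Python standard library so that it runs without extra installs; in the course itself, Pandas would be the natural tool for this. The inline CSV data, the column names (`region`, `ad_spend`, `sales`) and the helper `pearson` are hypothetical and exist only for illustration.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical sample data, inlined so the sketch is self-contained.
RAW_CSV = """region,ad_spend,sales
north,10,25
north,20,45
south,15,32
south,30,65
"""

# Loading data: parse the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Data processing: aggregate total sales per region.
sales_by_region = defaultdict(float)
for row in rows:
    sales_by_region[row["region"]] += float(row["sales"])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Running a statistical method: continuous-continuous correlation.
ad_spend = [float(r["ad_spend"]) for r in rows]
sales = [float(r["sales"]) for r in rows]
correlation = pearson(ad_spend, sales)

print(dict(sales_by_region))
print(round(correlation, 3))
```

A correlation close to 1 in this toy data only says that the two columns move together; whether that relationship matters is exactly the kind of question the group discusses on this day.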

Day 3: Supervised Learning

  • Main concepts in Supervised Learning:
    • Training, Testing and Validation
    • Cross-validation
    • Overfitting and Underfitting
    • Generation of additional variables
  • Supervised learning in Python:
    • Regressions
    • Naive Bayes
    • Classification Trees & Random Forests
    • SVM
    • GBM
    • Neural Networks
  • Performance measurements:
    • R²
    • Mean Squared/Absolute Error
    • Confusion Matrix
    • ROC and AUC
  • Technical aspects in Python:
    • Train/test splitting
    • Create additional variables
    • Training models
    • Predictions
    • Performance measurements
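
As a rough illustration of the technical steps above (split, train, predict, measure), here is a minimal sketch in plain Python. It is not the course material: the synthetic data, the 75/25 split ratio and the helper `fit_line` are assumptions made for the example, and in practice a library such as scikit-learn would handle these steps.

```python
import random

# Hypothetical synthetic data: y is roughly 2*x + 1 plus a little noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + rng.uniform(-1.0, 1.0) for x in xs]

# Train/test splitting: shuffle the indices and hold out 25% for testing.
indices = list(range(len(xs)))
rng.shuffle(indices)
split = int(0.75 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]


def fit_line(x_vals, y_vals):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(x_vals) / len(x_vals)
    mean_y = sum(y_vals) / len(y_vals)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_vals, y_vals))
    sxx = sum((x - mean_x) ** 2 for x in x_vals)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training the model on the training portion only.
slope, intercept = fit_line(
    [xs[i] for i in train_idx], [ys[i] for i in train_idx]
)

# Predictions on the held-out test data.
y_test = [ys[i] for i in test_idx]
y_pred = [slope * xs[i] + intercept for i in test_idx]

# Performance measurements: mean squared error and R2 on the test set.
ss_res = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
mean_test = sum(y_test) / len(y_test)
ss_tot = sum((t - mean_test) ** 2 for t in y_test)
mse = ss_res / len(y_test)
r2 = 1.0 - ss_res / ss_tot

print(f"slope={slope:.2f} intercept={intercept:.2f} mse={mse:.3f} r2={r2:.4f}")
```

Measuring the error on data the model never saw during training is what makes overfitting visible; evaluating on the training rows alone would give an overly optimistic picture.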

The third day will start with an insight into the main concepts and techniques of Supervised Learning. Because this course has a strong practical orientation, the participants will then be divided into small groups in which they practise different Supervised Learning models. Each team receives the same initial data and decides for itself which additional variables should be created. The teams share their experiences and reasoning with the others. The performance of the trained models will be tested with new, unseen data and subsequently analysed and discussed together.
The consultant is available to answer questions throughout the process. Through this exercise, the participants not only work on a “real” Machine Learning problem but also learn how to solve it in Python.

Day 4: Unsupervised Learning + Text Mining

  • Unsupervised Learning:
    • K-means
    • Hierarchical Clustering & Heatmap
    • Principal Component Analysis / Correspondence Analysis
  • Introducing Text Mining and Information Retrieval
    • Applications (Sentiment Analysis, IR, etc.)
    • From words to numbers:
      • Pre-processing (punctuation, lowercase, etc.)
      • Porter stemming
      • Lemmatization
      • N-grams
      • Numeric matrices from text
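
The "from words to numbers" steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the two example sentences and the helpers `preprocess` and `ngrams` are hypothetical, and stemming and lemmatization are left out because they would normally come from a library such as NLTK.

```python
import string
from collections import Counter

# Hypothetical example documents.
docs = [
    "The movie was great, really great!",
    "The movie was boring.",
]


def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and split into tokens."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokenized = [preprocess(d) for d in docs]

# Vocabulary: every distinct word, sorted for a stable column order.
vocabulary = sorted({tok for toks in tokenized for tok in toks})

# Numeric matrix from text: one row per document, one column per word,
# each cell holding the raw count of that word in that document.
matrix = []
for toks in tokenized:
    counts = Counter(toks)
    matrix.append([counts[word] for word in vocabulary])

bigrams = ngrams(tokenized[1], 2)

print(vocabulary)
print(matrix)
print(bigrams)
```

Each row of `matrix` is a simple bag-of-words vector; once text is in this form, the supervised and unsupervised methods from the previous days can be applied to it.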

On day 4, two topics will be dealt with separately (morning & afternoon). Each module is organised as follows:

  • Theoretical introduction
  • Examples in Python
  • Analysis of the results
  • Independent development of similar tasks by the participants
  • Joint analysis and interpretation of the results

The Python libraries used are explained, for example:

  • scikit-learn clustering
  • NLTK (Natural Language Toolkit)

Day 5: From laboratory to production

The morning is devoted to deploying Machine Learning in a production environment.

  • Online vs. Offline Predictions
  • Data collection strategies – Response times
  • APIs in Machine Learning
  • Batch predictions
  • Reports
  • Dashboards

For each topic the corresponding Python tools are explained:

  • Flask
  • Jupyter
  • Dash

In the afternoon, the course concludes with a summary of the topics covered:

  • Overview of topics
  • Discussion: How do the topics relate to each other?
  • Final Q&A

