Learn how to make a computer acquire concepts from data without being explicitly programmed. Acquire knowledge of the main methods of supervised and unsupervised machine learning, and discuss their properties and criteria of applicability. Acquire the ability to formulate a problem correctly, choose the appropriate algorithm, and carry out an experimental analysis to evaluate the results obtained. Attention is given to the practical implementation of the methods introduced, through examples of use in different application scenarios.

Curriculum

Programme

Introduction and general concepts; what is machine learning; definitions; supervised and unsupervised learning; regression and clustering; Univariate linear regression; representation; the hypothesis function; the choice of the parameters of the hypothesis function; the cost function; the Gradient Descent algorithm; the choice of the alpha parameter (learning rate);
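As an illustration of the univariate topics listed above, the following is a minimal sketch of the hypothesis function, the squared-error cost function and batch Gradient Descent. It is written in Python rather than Octave, and the function names, the toy data, the value of alpha and the number of iterations are all assumptions chosen for the example, not course material.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = (1 / 2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Batch gradient descent with simultaneous update of both parameters."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [hypothesis(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Both parameters are updated from gradients computed at the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
final_cost = cost(theta0, theta1, xs, ys)
```

With a much larger alpha the updates on this data would overshoot and diverge, which is why the choice of the alpha parameter appears as a topic of its own.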

Multivariate linear regression; vector notation of the hypothesis function and of the cost function; Gradient Descent algorithm for the multivariate case; matrix notation; feature scaling and normalization; polynomial regression; the Normal Equation for multivariate regression; final notes on the comparison between the Gradient Descent algorithm and the Normal Equation;

Logistic Regression; binary classification; representation of the hypotheses; the logistic function; the decision boundary; the cost function for logistic regression; the gradient descent algorithm for logistic regression; analytic derivation of the gradient of the cost function for logistic regression; notes on the implementation in Octave of the cost function and of the gradient descent algorithm in the case of logistic regression; considerations on advanced optimization methods; multi-class classification; the one-vs-all method;

Regularization; the problem of overfitting / underfitting (i.e. high variance / high bias); modification of the cost function; the regularization parameter; regularization of linear regression; the gradient descent algorithm with regularization; the regularized normal equation; logistic regression with regularization; History of neural networks; AI and connectionism; the perceptron; Rosenblatt's learning rule; learning of Boolean functions; the limits of the perceptron;

Neural networks; motivations; neurons; neuroplasticity and the one-learning-algorithm hypothesis; model representation; the neuron as a logistic unit; the weight matrix; the bias; the activation function; forward propagation; vector version; NNs as an extension of logistic regression; calculation of the Boolean functions AND, OR, NOT, XNOR; multiclass classification with Neural Networks;

Neural Network Learning; cost function of a Multi Layer Perceptron; the Backpropagation algorithm; Intuition and formalization;

Neural Network learning; Error BackPropagation Algorithm (scalar version, vector version); Notes on implementation; rolling and unrolling of the parameters for passing the weight matrix into Octave; Gradient checking by calculating the numerical approximate gradient; initialization of weights and symmetry breaking; The ALVINN network (an autonomous driving system);
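As an illustration of gradient checking, the sketch below compares a two-sided numerical approximation of the gradient with a hand-derived analytic gradient. It is written in Python rather than Octave; the cost function `J`, the test point and the value of `epsilon` are hypothetical choices made for the example, and a small quadratic stands in for a network cost to keep it short.

```python
def numerical_gradient(f, theta, epsilon=1e-4):
    """Two-sided finite-difference approximation of the gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += epsilon
        minus[i] -= epsilon
        grad.append((f(plus) - f(minus)) / (2 * epsilon))
    return grad


def J(theta):
    """Hypothetical cost J(theta) = theta_0^2 + 3 * theta_0 * theta_1."""
    return theta[0] ** 2 + 3 * theta[0] * theta[1]


def analytic_gradient(theta):
    """Gradient of J derived by hand; this is the quantity being checked."""
    return [2 * theta[0] + 3 * theta[1], 3 * theta[0]]


theta = [1.5, -2.0]
approx = numerical_gradient(J, theta)
exact = analytic_gradient(theta)
# A large gap between the two would point to a bug in the analytic gradient.
max_diff = max(abs(a - e) for a, e in zip(approx, exact))
```

In practice the numerical check is usually run once on a small problem and then switched off, since it needs two cost evaluations per parameter.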

Machine Learning Diagnostics; Evaluating a Learning Algorithm; The test set error; Model selection: training, validation and test sets; The concepts of Bias and Variance; Regularization and Bias / Variance; Choosing the regularization parameter; Putting it all together: diagnostic method; Learning curves; Machine Learning system design; Debugging a learning algorithm; Diagnosing Neural Networks; Model selection; Error analysis; The importance of numerical evaluation; Error Metrics for Skewed Classes; Precision / Recall and Accuracy; Trading Off Precision and Recall; The F1 score; Data for Machine Learning; Designing a high accuracy learning system; Rationale for large data;
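As an illustration of why accuracy can mislead on skewed classes, the following minimal Python sketch computes precision, recall, the F1 score and accuracy from binary labels. The function names and the toy labels are assumptions made for the example; returning 0.0 when a denominator is zero is also a convention chosen here.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Convention assumed here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of examples classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical skewed data: only 2 positives among 10 examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10

model_scores = precision_recall_f1(y_true, y_pred)
trivial_scores = precision_recall_f1(y_true, always_negative)
model_accuracy = accuracy(y_true, y_pred)
trivial_accuracy = accuracy(y_true, always_negative)
```

On this data the classifier that always predicts the negative class reaches the same accuracy as the model, while its F1 score is zero.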

Support Vector Machines; SVM Cost function; SVMs as Large Margin Classifiers; Kernels; choice of landmarks; choice of the parameters C and sigma; Multi-class Classification with SVM; Comparison between Logistic Regression and SVM, and between NN and SVM;

Clustering; the K-means algorithm; cluster assignment step; move centroids step; optimization objective; choosing the number of clusters, the elbow method; Dimensionality Reduction; Principal Component Analysis; Motivation I: Data compression; Motivation II: data visualization - Problem Formulation; Goal of PCA; The role of Singular Value Decomposition in the PCA algorithm; Reconstruction from compressed representation; Algorithm for choosing k; Advice for Applying PCA; The most common use of PCA; Misuse of PCA;
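As an illustration of the two alternating steps of K-means named above (cluster assignment and move centroids), here is a minimal Python sketch. The function names, the toy points and the fixed initial centroids are assumptions made for the example; in practice the initial centroids are usually chosen at random, and keeping the old centroid for an empty cluster is only one possible convention.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """Cluster assignment step: index of the closest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def move_centroids(points, assignments, centroids):
    """Move centroids step: each centroid becomes the mean of its assigned points."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            updated.append(old)  # assumed convention: an empty cluster keeps its centroid
            continue
        updated.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
        )
    return updated


def k_means(points, centroids, max_iterations=100):
    """Alternate the two steps until the centroids stop moving."""
    assignments = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        updated = move_centroids(points, assignments, centroids)
        if updated == centroids:
            break
        centroids = updated
        assignments = assign_clusters(points, centroids)
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial_centroids = [(0.0, 0.0), (10.0, 10.0)]
centroids, assignments = k_means(points, initial_centroids)
```

Each of the two steps can only lower or keep the optimization objective (the mean squared distance of the points to their centroids), which is why the loop can stop as soon as the centroids no longer change.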

Anomaly Detection; Problem motivation; Density estimation; Gaussian distribution; Anomaly Detection; Gaussian distribution; Parameter estimation; The Anomaly Detection Algorithm; Anomaly Detection vs. Supervised Learning; Multivariate Gaussian Distribution; Recommender Systems; Collaborative Filtering; Motivation; Problem Formulation; Content Based Recommendations; Notation; Optimization objective; Gradient descent update; Low Rank Matrix Factorization;

Learning with large datasets; Online learning; Stochastic gradient descent; Mini-batch gradient descent; Checking for convergence; Map reduce and data parallelism;

Machine Learning pipeline; the OCR system; ceiling analysis; Laboratory: exercise related to Recommender Systems;

Core Documentation

J. Watt, R. Borhani, A. K. Katsaggelos. Machine Learning Refined. Cambridge Univ. Press, 2016.

Reference Bibliography

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
R. O. Duda, P. E. Hart, D. G. Stork. Pattern Classification. John Wiley & Sons, 2001.

Type of evaluation

Mid-term evaluation + Final examination