Neural networks and learning systems – I [E9 253: Spring 2019] | Physical Nano-Memories, Signal and Information Processing Laboratory

Instructor:

Shayan Srinivasa Garani,

[Tuesday, Thursday 11:30 am – 1:00 pm, DESE classroom]

Office Hours: Wednesday 4:00 – 5:00 pm

Prerequisites:
Familiarity with digital signal processing and engineering mathematics at the undergraduate level.

Course Syllabus:

Introduction, models of a neuron, neural networks as directed graphs, network architectures (feed-forward, feedback, etc.), learning processes, learning tasks, perceptron, perceptron convergence theorem, relationship between perceptron and Bayes classifiers, batch perceptron algorithm, modeling through regression: linear, logistic for multiple classes, multilayer perceptron (MLP), batch and online learning, derivation of the back-propagation algorithm, XOR problem, role of the Hessian in online learning, annealing and optimal control of learning rate, approximations of functions, universal approximation theorem, cross-validation, network pruning and complexity regularization, convolution networks, nonlinear filtering, Cover’s theorem and pattern separability, the interpolation problem, RBF networks, hybrid learning procedure for RBF networks, kernel regression and relationship to RBFs, support vector machines, optimal hyperplane for linear separability, optimal hyperplane for non-separable patterns, SVM as a kernel machine, design of SVMs, XOR problem revisited, robustness considerations for regression, representer theorem, introduction to regularization theory, Hadamard’s condition for well-posedness, Tikhonov regularization, regularization networks, generalized RBF networks, estimation of the regularization parameter, etc., L1 regularization basics, algorithms and extensions, principal component analysis: Hebbian-based PCA, kernel-based PCA, kernel Hebbian algorithm, deep MLPs, deep auto-encoders, stacked denoising auto-encoders.

Reference Books:

S. Haykin, Neural Networks and Learning Machines, Pearson Press.
K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press.

Grading Policy:

Homeworks: 30%
Midterm: 20%
Project: 30%
Final exam: 20%

Homeworks:

Exams:

Exam #1 – Solutions

Papers to read:

Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6:861–867, 1993.
G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, December 1989.

Announcements:

Project Presentation: 3rd May, 08:30 to 11:00, DESE Classroom
Final exam: 30th April, 09:00 to 12:00, DESE Classroom
Midterm 1: Thursday, 28th February, 11:30 – 13:00.
Homework 2:
- Problem 1 – To be submitted in class on 26th February.
- Problems 2 and 3 – To be submitted in class on 5th March.

Week	Topics Covered	Lecture Notes
Week 1	The human brain; Introduction to Neural Networks; Models of a neuron; Feedback and network architectures; Knowledge representation; Prior information and invariances	NNLS_Week_01
Week 2	Learning processes; Perceptron; Batch perceptron algorithm; Perceptron and Bayes classifier	NNLS_Week_02
Week 3	Linear regression; Logistic regression; Multi-layer perceptron	NNLS_Week_03
Week 4	Multi-layer perceptron; Back propagation; XOR problem	NNLS_Week_04
Week 5	Universal approximation theorem; Complexity regularization and cross-validation; Convolutional Neural Networks (CNN); Cover’s Theorem	NNLS_Week_05
Week 6	Multivariate interpolation problem; Radial basis functions (RBF); Recursive least squares algorithm; Comparison of RBF with MLP; Kernel regression using RBFs; Kernel Functions; Basics of constrained optimization	NNLS_Week_06
Week 7	Optimization with equality constraint; Optimization with inequality constraint; Support Vector Machines (SVM); Optimal hyperplane for linearly separable patterns; Quadratic optimization for finding optimal hyperplane	NNLS_Week_07
Week 8	Optimal hyperplane for non-linearly separable patterns; Inner product kernel and Mercer’s theorem; Optimal design of an SVM; ε-insensitive loss function; XOR problem revisited using SVMs; Hilbert Space	NNLS_Week_08
Week 9	Reproducing Kernel Hilbert Space; Representer Theorem; Generalized applicability of the representer theorem; Regularization Theory; Euler-Lagrange Equation; Regularization Networks	NNLS_Week_09
Week 10	Generalized RBF networks; XOR problem revisited using RBF; Structural Risk Minimization; Bias-Variance Dilemma; Estimation of regularization parameters	NNLS_Week_10
Week 11	Basics of L1 regularization; Grafting; VC dimension; Autoencoders; Denoising Autoencoders	NNLS_Week_11
Week 12	Kernel PCA; Hebbian-based maximum eigenfilter	NNLS_Week_12