Training feedforward neural networks using orthogonal iteration of the Hessian eigenvectors

Hunter, Andrew (2000) Training feedforward neural networks using orthogonal iteration of the Hessian eigenvectors. In: International Joint Conference on Neural Networks, 24-27 July 2000, Como, Italy.

Documents
EQUALOrthogonalIteration.PDF (PDF, 35Kb, restricted to repository staff only)
06192173.pdf (PDF, 530Kb, publisher version)
Official URL: http://doi.ieeecomputersociety.org/10.1109/IJCNN.2...

Abstract

Introduction
Training algorithms for Multilayer Perceptrons optimize the set of W weights and biases, w, so as to minimize an error function, E, applied to a set of N training patterns. The well-known back-propagation algorithm combines an efficient method of estimating the gradient of the error function in weight space, ∇E = g, with a simple gradient descent procedure to adjust the weights, Δw = -ηg. More efficient algorithms maintain the gradient estimation procedure, but replace the update step with a faster non-linear optimization strategy [1].
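As a point of reference, here is a minimal sketch of that plain gradient-descent update; the names gradient_descent_step, error_grad and eta are illustrative assumptions, not taken from the paper.

    import numpy as np

    def gradient_descent_step(w, error_grad, eta=0.01):
        """One back-propagation-style update: Delta w = -eta * g."""
        g = error_grad(w)       # gradient of the error E in weight space
        return w - eta * g      # simple gradient descent adjustment

    # Example: minimizing E(w) = ||w||^2 / 2, whose gradient is simply w
    w = np.ones(5)
    for _ in range(100):
        w = gradient_descent_step(w, error_grad=lambda w: w, eta=0.1)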
Efficient non-linear optimization algorithms are based upon second order approximation [2]. When sufficiently close to a minimum, the error surface is approximately quadratic, with its shape determined by the Hessian matrix. Bishop [1] presents a detailed discussion of the properties and significance of the Hessian matrix. In principle, if sufficiently close to a minimum, it is possible to move directly to the minimum using the Newton step, -H⁻¹g.
In practice, the Newton step is not used, as H⁻¹ is very expensive to evaluate; in addition, when not sufficiently close to a minimum, the Newton step may be disastrously poor. Second order algorithms either build up an approximation to H⁻¹, or construct a search strategy that implicitly exploits its structure without evaluating it; they also either take precautions to prevent steps that lead to a deterioration in error, or explicitly reject such steps.
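To make the Newton step and the kind of safeguard mentioned above concrete, the sketch below (an illustration under assumed callables error, grad and hessian, not the paper's method) solves H p = -g directly instead of forming H⁻¹, and rejects a step that worsens the error in favour of a small gradient step.

    import numpy as np

    def safeguarded_newton_step(w, error, grad, hessian, eta=1e-3):
        """Take the Newton step -H^{-1} g, with a simple illustrative safeguard."""
        g = grad(w)
        H = hessian(w)
        try:
            p = np.linalg.solve(H, -g)   # Newton direction; avoids explicitly forming H^{-1}
        except np.linalg.LinAlgError:
            p = -g                       # singular Hessian: fall back to the gradient direction
        w_new = w + p
        if error(w_new) >= error(w):     # reject a step that leads to a deterioration in error
            w_new = w - eta * g
        return w_new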
In applying non-linear optimization algorithms to neural networks, a key consideration is the high-dimensional
nature of the search space. Neural networks with thousands of weights are not uncommon. Some algorithms have
O(W²) or O(W³) memory or execution times, and are hence impracticable in such cases. It is desirable to identify
algorithms that have limited memory requirements, particularly algorithms where one may trade memory usage
against convergence speed.
The paper describes a new training algorithm that has scalable memory requirements, which may range from O(W)
to O(W²), although in practice the useful range is limited to lower complexity levels. The algorithm is based upon a
novel iterative estimation of the principal eigen-subspace of the Hessian, together with a quadratic step estimation
procedure.
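The paper's procedure is not reproduced here, but the following sketch shows the general idea it builds on: orthogonal (subspace) iteration driven by Hessian-vector products, so that only a W x k basis is stored rather than the full W x W Hessian, followed by a quadratic (Newton-like) step restricted to the estimated eigen-subspace. All names, the choice of k, and the use of a generic hvp routine are assumptions for illustration.

    import numpy as np

    def orthogonal_iteration(hvp, W, k, n_iter=10, seed=0):
        """Estimate an orthonormal basis V (W x k) of the Hessian's principal
        eigen-subspace, using only Hessian-vector products hvp(v) = H @ v."""
        rng = np.random.default_rng(seed)
        V, _ = np.linalg.qr(rng.standard_normal((W, k)))
        for _ in range(n_iter):
            HV = np.column_stack([hvp(V[:, j]) for j in range(k)])  # H @ V, one column at a time
            V, _ = np.linalg.qr(HV)                                 # re-orthogonalize
        return V

    def quadratic_step_in_subspace(g, V, hvp):
        """Minimize the local quadratic model over span(V): a Newton-like step
        whose memory cost is O(W*k) rather than O(W^2)."""
        HV = np.column_stack([hvp(V[:, j]) for j in range(V.shape[1])])
        H_k = V.T @ HV                  # k x k projected Hessian
        g_k = V.T @ g                   # projected gradient
        alpha = np.linalg.solve(H_k, -g_k)
        return V @ alpha                # step Delta w lies in the estimated eigen-subspace

With k basis vectors the storage cost is O(Wk), which interpolates between the O(W) and O(W²) extremes described above.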
It is shown that the new algorithm has convergence time comparable to conjugate gradient descent, and may be
preferable if early stopping is used, as it converges more quickly during the initial phases.
Section 2 overviews the principles of second order training algorithms. Section 3 introduces the new algorithm.
Section 4 discusses some experiments to confirm the algorithm's performance; Section 5 concludes the paper.

Item Type: Conference or Workshop Item (Paper)
Keywords: Neural networks, Algorithms
Subjects: G Mathematical and Computer Sciences > G730 Neural Computing
Divisions: College of Science > School of Computer Science
ID Code: 1901
Deposited By: Tammie Farley
Deposited On: 25 Jun 2009 12:26
Last Modified: 13 Mar 2013 08:32
