Hunter, Andrew (2000) Training feedforward neural networks using orthogonal iteration of the Hessian eigenvectors. In: International Joint Conference of Neural Networks, 24-27 July 2000, Como, Italy.
Full content URL: http://doi.ieeecomputersociety.org/10.1109/IJCNN.2...
Abstract
Introduction
Training algorithms for Multilayer Perceptrons optimize the set of W weights and biases, w, so as to minimize an
error function, E, applied to a set of N training patterns. The well-known back propagation algorithm combines an
efficient method of estimating the gradient of the error function in weight space, $\nabla E = g$, with a simple gradient
descent procedure to adjust the weights, $\Delta w = -\eta g$. More efficient algorithms maintain the gradient estimation
procedure, but replace the update step with a faster nonlinear optimization strategy [1].
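As a minimal sketch of the gradient descent update described above (each step moves the weights against the gradient, scaled by a learning rate), the following applies it to a toy two-weight quadratic error. The error function, the learning rate `eta`, the starting point, and all function names are illustrative assumptions, not taken from the paper; for a real Multilayer Perceptron the gradient would come from back propagation.

```python
# Gradient descent on a toy quadratic error (illustrative assumptions throughout).

def error(w):
    # Hypothetical error E(w) = 0.5 * (w0^2 + 10 * w1^2), with its minimum at the origin.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def gradient(w):
    # Analytic gradient g of the toy error above.
    return [w[0], 10.0 * w[1]]

def gradient_descent(w, eta=0.05, steps=200):
    # Repeatedly apply the update w <- w - eta * g.
    for _ in range(steps):
        g = gradient(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

w_final = gradient_descent([1.0, 1.0])
```

In this toy problem the steeply curved direction (`w1`) converges much faster than the shallow one (`w0`), which is the kind of imbalance that second order methods aim to correct.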
Efficient nonlinear optimization algorithms are based upon second order approximation [2]. When sufficiently
close to a minimum the error surface is approximately quadratic, the shape being determined by the Hessian matrix.
Bishop [1] presents a detailed discussion of the properties and significance of the Hessian matrix. In principle, if
sufficiently close to a minimum it is possible to move directly to the minimum using the Newton step, $-H^{-1}g$.
In practice, the Newton step is not used as $H^{-1}$ is very expensive to evaluate; in addition, when not sufficiently close
to a minimum, the Newton step may cause a disastrously poor step to be taken. Second order algorithms either build
up an approximation to $H^{-1}$, or construct a search strategy that implicitly exploits its structure without evaluating it;
they also either take precautions to prevent steps that lead to a deterioration in error, or explicitly reject such steps.
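To illustrate why the Newton step is attractive on a quadratic surface, the sketch below takes a single step on an exactly quadratic error and lands on the minimum. The Hessian `H`, the vector `b`, the starting point, and the helper names are hypothetical values chosen for illustration; a real network's error surface is only approximately quadratic near a minimum, and this sketch has none of the safeguards against poor steps mentioned above.

```python
# One Newton step on an exactly quadratic error E(w) = 0.5 * w^T H w - b^T w.
# H, b and the starting point are illustrative assumptions.

H = [[4.0, 1.0],
     [1.0, 3.0]]   # symmetric positive definite Hessian
b = [1.0, 2.0]

def gradient(w):
    # g = H w - b for the quadratic error above.
    return [H[0][0] * w[0] + H[0][1] * w[1] - b[0],
            H[1][0] * w[0] + H[1][1] * w[1] - b[1]]

def solve_2x2(A, g):
    # Solve A x = g by Cramer's rule; avoids forming the inverse explicitly.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * g[0] - A[0][1] * g[1]) / det,
            (A[0][0] * g[1] - A[1][0] * g[0]) / det]

def newton_step(w):
    # The step is minus the solution of H x = g.
    x = solve_2x2(H, gradient(w))
    return [wi - xi for wi, xi in zip(w, x)]

w_start = [5.0, -3.0]
w_new = newton_step(w_start)
```

For W weights the linear solve costs O(W³) time and the Hessian needs O(W²) storage, which is the expense the paragraph above refers to.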
In applying nonlinear optimization algorithms to neural networks, a key consideration is the high-dimensional
nature of the search space. Neural networks with thousands of weights are not uncommon. Some algorithms have
$O(W^2)$ or $O(W^3)$ memory or execution times, and are hence impracticable in such cases. It is desirable to identify
algorithms that have limited memory requirements, particularly algorithms where one may trade memory usage
against convergence speed.
The paper describes a new training algorithm that has scalable memory requirements, which may range from O(W)
to $O(W^2)$, although in practice the useful range is limited to lower complexity levels. The algorithm is based upon a
novel iterative estimation of the principal eigensubspace of the Hessian, together with a quadratic step estimation
procedure.
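The general idea of iteratively estimating a principal eigensubspace can be sketched with textbook orthogonal iteration: repeatedly multiply a small set of vectors by the Hessian and re-orthonormalise them. The sketch below is not the paper's algorithm, whose details are not given in this introduction. It assumes a toy gradient with a known diagonal Hessian, approximates Hessian-vector products by finite differences of the gradient so the Hessian is never stored, and uses hypothetical names throughout.

```python
import math

# Textbook orthogonal iteration for the leading Hessian eigenvectors.
# The toy gradient, eps, and the start vectors are illustrative assumptions.

def gradient(w):
    # Toy gradient whose Hessian is diag(10, 3, 1, 0.5).
    curvatures = [10.0, 3.0, 1.0, 0.5]
    return [c * wi for c, wi in zip(curvatures, w)]

def hessian_vector_product(w, v, eps=1e-4):
    # Finite-difference estimate of H v from two gradient evaluations.
    g0 = gradient(w)
    g1 = gradient([wi + eps * vi for wi, vi in zip(w, v)])
    return [(a - c) / eps for a, c in zip(g1, g0)]

def dot(a, c):
    return sum(x * y for x, y in zip(a, c))

def orthonormalise(vectors):
    # Gram-Schmidt: remove components along earlier vectors, then normalise.
    basis = []
    for v in vectors:
        for q in basis:
            proj = dot(q, v)
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        basis.append([vi / norm for vi in v])
    return basis

def orthogonal_iteration(w, start_vectors, iterations=50):
    Q = orthonormalise(start_vectors)
    for _ in range(iterations):
        Q = orthonormalise([hessian_vector_product(w, q) for q in Q])
    # Rayleigh quotients q^T H q estimate the corresponding eigenvalues.
    eigenvalues = [dot(q, hessian_vector_product(w, q)) for q in Q]
    return Q, eigenvalues

w = [0.3, -0.2, 0.1, 0.4]
Q, eigenvalues = orthogonal_iteration(
    w, [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]])
```

Storing k vectors of length W costs O(kW) memory, so in a scheme of this kind the choice of k is one way memory could be traded between O(W) and O(W²).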
It is shown that the new algorithm has convergence time comparable to conjugate gradient descent, and may be
preferable if early stopping is used as it converges more quickly during the initial phases.
Section 2 overviews the principles of second order training algorithms. Section 3 introduces the new algorithm.
Section 4 discusses some experiments to confirm the algorithm's performance; section 5 concludes the paper.
Item Type:  Conference or Workshop contribution (Paper) 

Keywords:  Neural networks, Algorithms 
Subjects:  G Mathematical and Computer Sciences > G730 Neural Computing 
Divisions:  College of Science > School of Computer Science 
ID Code:  1901 
Deposited By:  Tammie Farley 
Deposited On:  25 Jun 2009 12:26 
Last Modified:  13 Mar 2013 08:32 