Abstract
Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1} = \bar z_\ell + \mathrm{Re}\sum_{k=1}^{K} \bar b_{\ell k}\, e^{\mathrm{i}\omega_{\ell k}\bar z_\ell} + \mathrm{Re}\sum_{k=1}^{K} \bar c_{\ell k}\, e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
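As an illustration of the layer recursion stated above, the following is a minimal NumPy sketch of the forward pass only. It is not the training method proposed in the paper. Several choices are assumptions made for the sketch: the state $\bar z_\ell$ is taken to be a real scalar per sample (the recursion writes $\omega_{\ell k}\bar z_\ell$ without a dot product), the initial state is set to zero, and the frequencies and amplitudes are drawn from standard normal distributions as placeholders rather than from the optimal distribution the paper derives. The function name `residual_rff_forward` and all variable names are hypothetical.

```python
import numpy as np


def residual_rff_forward(x, omega, omega_prime, b, c, z0=None):
    """Evaluate z_{l+1} = z_l + Re sum_k b_lk e^{i w_lk z_l} + Re sum_k c_lk e^{i w'_lk . x}.

    x:           (N, d) real inputs
    omega:       (L, K) real frequencies acting on the scalar state z_l
    omega_prime: (L, K, d) real frequencies acting on the input x
    b, c:        (L, K) complex amplitudes
    z0:          (N,) initial state; defaults to zero (an assumption of this sketch)
    """
    x = np.asarray(x, dtype=float)
    n_samples = x.shape[0]
    z = np.zeros(n_samples) if z0 is None else np.asarray(z0, dtype=float).copy()
    n_layers = omega.shape[0]
    for l in range(n_layers):
        # Fourier features of the current state, shape (N, K)
        state_feat = np.exp(1j * np.outer(z, omega[l]))
        # Fourier features of the input, shape (N, K)
        input_feat = np.exp(1j * (x @ omega_prime[l].T))
        # Residual update: keep only the real part of each weighted sum
        z = z + np.real(state_feat @ b[l]) + np.real(input_feat @ c[l])
    return z


# Placeholder parameters (standard normal draws, NOT the optimal distribution)
rng = np.random.default_rng(0)
L, K, d, N = 4, 8, 2, 5
omega = rng.normal(size=(L, K))
omega_prime = rng.normal(size=(L, K, d))
b = (rng.normal(size=(L, K)) + 1j * rng.normal(size=(L, K))) / K
c = (rng.normal(size=(L, K)) + 1j * rng.normal(size=(L, K))) / K
x = rng.normal(size=(N, d))

z_L = residual_rff_forward(x, omega, omega_prime, b, c)  # one real value per sample
```

The network uses $KL$ nodes in total, $K$ per layer over $L$ layers, which is the node count used in the comparison with the one-hidden-layer estimate.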