Statistics Seminar

Andrew Barron, Yale University
Upper and lower risk bounds for high-dimensional ridge function combinations including neural networks

Wednesday, April 12, 2017 - 4:15pm
Biotech G01

Let $f$ be a function of $d$ variables with variation $v_f$ with respect to a class of smooth ridge functions with $\ell_1$ control on their internal parameter vectors. For a general noise setting, we show that the statistical risk $E\|\hat{f} - f\|^2$ is not more than $v_f \{(\log d)/n\}^{1/3}$, to within a constant factor, where $n$ is the sample size and $\hat{f}$ is either a penalized least squares estimator or a greedily obtained version of such an estimator using linear combinations of the specified smooth ridge functions (e.g. using sinusoidal, spline, or sigmoidal activation functions, as arise in single hidden layer neural nets). Our risk bound is effective even when the dimension $d$ is much larger than the available sample size, as long as $d = \exp\{o(n)\}$. In this setting these are among the first results to provide favorable risk control for this flexible class of very high-dimensional nonlinear regression models. When the dimension is larger than the cube root of the sample size, this quantity improves on the more familiar risk bound $v_f \{d \log(n/d)/n\}^{1/2}$, also investigated here. Similar lower bounds on the minimax risk, of order $\{(\log d)/n\}^{1/2}$, are obtained. Thus the upper and lower bounds on the optimal risk are of the form $(\log d)/n$ raised to a fractional power between 1/3 and 1/2. This is joint research with Jason Klusowski.
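
As a heuristic check of the cube-root threshold (ignoring logarithmic factors and the common factor $v_f$, so this is a simplified comparison of the two stated rates rather than part of the formal results), note that
$$\left(\frac{d}{n}\right)^{1/2} \ge \left(\frac{1}{n}\right)^{1/3} \iff d^{1/2} \ge n^{1/2 - 1/3} = n^{1/6} \iff d \ge n^{1/3},$$
so once $d$ exceeds roughly $n^{1/3}$, the familiar $v_f \{d \log(n/d)/n\}^{1/2}$ bound is dominated, up to logarithmic factors, by the $v_f \{(\log d)/n\}^{1/3}$ bound.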