A two-layer network, y = sig(W*x), z = sig(V*y), undergoing supervised training to separate blue and red points; adapted from Olshausen, 2010.
“…Trained networks… contract space at the center of decision volumes and expand space in the vicinity of decision boundaries” – Nayebi & Ganguli, 2017
In order to correctly classify these data points in a 2D space as red or blue, a supervised two-layer network needs three feature vectors in the first layer (W) (Olah, 2014). If the projections onto those three feature vectors are plotted in a representation space where the features are orthogonal (right), we can observe how the network learns to warp the 2D input-space lattice in the 3D representation space so that classification becomes possible with a plane. Convolutional networks do something like this in high dimensions to classify images, and measurements of the local curvature of the input space reveal how individual neurons and layers contribute to successful recognition.
The network is described by these equations for the two layers: y = sig(W*x), z = sig(V*y), where sig is a sigmoid nonlinearity.
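The two equations above can be sketched as a forward pass in NumPy. The weight matrices below are hypothetical, untrained stand-ins (the source does not give numerical values): W lifts a 2D input onto three first-layer feature vectors in a 3D representation space, and V reads that representation out as a single red/blue decision value.

```python
import numpy as np

def sig(a):
    """Elementwise sigmoid nonlinearity."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical, untrained weights for illustration only:
# W (3x2) projects a 2D input onto three feature vectors,
# V (1x3) maps the 3D representation to one decision value.
W = np.array([[ 1.0, -1.0],
              [-1.0,  1.0],
              [ 1.0,  1.0]])
V = np.array([[ 1.0,  1.0, -1.0]])

def forward(x):
    y = sig(W @ x)   # y = sig(W*x): 2D input -> 3D representation
    z = sig(V @ y)   # z = sig(V*y): 3D representation -> class value
    return y, z

x = np.array([0.5, -0.5])   # one 2D input point
y, z = forward(x)
print(y.shape, z.shape)     # (3,) (1,)
```

After training (e.g. with gradient descent on a classification loss), the learned W warps the 2D input lattice in the 3D space of y so that a single plane, defined by V, can separate the two classes.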