ldctbench.methods.qae.network
Model(args)
Bases: Module
Quadratic Autoencoder (QAE)
In the paper, the authors describe their initialization as follows:

> As far as the Q-AE is concerned, parameters \(w_r\) and \(w_g\) of each layer were randomly initialized with a truncated Gaussian function, \(b_g\) are set to 1 for all the layers. In this way, quadratic term \((w_r x^T + b_r)(w_g x^T + b_g)\) turns into linear term \((w_r x^T + b_r)\). The reason why we use such initialization is because quadratic terms should not be pre-determined, they should be learned in the training. \(b_r\) and \(c\) were set to 0 initially for all the layers. \(w_b\) was set to 0 here, we will discuss the influence of \(w_b\) on the network in the context of direct initialization and transfer learning later.
In our experiments, the network diverges with these settings. In their source code, the authors themselves commented out the lines that initialize \(W_g\) as truncated normal and initialize it with zeros instead. We follow their official GitHub implementation and initialize \(W_g\) as zero.
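The effect of this initialization can be checked directly: with \(w_g = 0\), \(b_g = 1\), \(b_r = 0\), \(w_b = 0\), and \(c = 0\), a quadratic neuron of the form \((w_r x^T + b_r)(w_g x^T + b_g) + w_b (x^2)^T + c\) collapses to the linear map \(w_r x^T\). The sketch below is illustrative only (plain NumPy with a standard normal in place of the truncated Gaussian; the function and variable names are our own, not from the QAE code):

```python
import numpy as np

rng = np.random.default_rng(0)


def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c):
    """One quadratic neuron: (w_r·x + b_r)(w_g·x + b_g) + w_b·x² + c."""
    return (w_r @ x + b_r) * (w_g @ x + b_g) + w_b @ (x * x) + c


d = 8  # toy input dimension
w_r = rng.normal(scale=0.1, size=d)  # random (truncated normal in the paper)
w_g = np.zeros(d)                    # zero, following the official code
b_g = 1.0                            # so (w_g·x + b_g) = 1 at initialization
b_r, c = 0.0, 0.0
w_b = np.zeros(d)

x = rng.normal(size=d)
# At this initialization the quadratic neuron reduces to the linear term w_r·x
assert np.isclose(quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c), w_r @ x)
```

The quadratic interaction is thus switched off at the start and only emerges as \(w_g\) and \(w_b\) move away from zero during training.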