Commit eb6e759

Author: Yoshua Bengio
Merge branch 'master' of github.com:lisa-lab/DeepLearningTutorials
2 parents a5fa087 + 4930d1f, commit eb6e759

4 files changed

Lines changed: 33 additions & 34 deletions

code/SdA.py

Lines changed: 6 additions & 12 deletions
@@ -207,6 +207,8 @@ def __init__(self, n_visible= 784, n_hidden= 500, input= None):
 # Equation (3)
 self.z = T.nnet.sigmoid(T.dot(self.y, self.W_prime) + self.b_prime)
 # Equation (4)
+# note : we sum over the size of a datapoint; if we are using minibatches,
+# L will be a vector, with one entry per example in minibatch
 self.L = - T.sum( self.x*T.log(self.z) + (1-self.x)*T.log(1-self.z), axis=1 )
 # note : L is now a vector, where each element is the cross-entropy cost
 # of the reconstruction of the corresponding example of the
@@ -235,9 +237,7 @@ class SdA():
 """
 
 def __init__(self, input, n_ins, hidden_layers_sizes, n_outs):
-""" This class is costum made for a three layer SdA, and therefore
-is created by specifying the sizes of the hidden layers of the
-3 dAs used to generate the network.
+""" This class is made to support a variable number of layers.
 
 :param input: symbolic variable describing the input of the SdA
 
@@ -262,17 +262,13 @@ def __init__(self, input, n_ins, hidden_layers_sizes, n_outs):
 # input size is that of the previous layer
 # input is the output of the last layer inserted in our list
 # of layers `self.layers`
-print i
-print theano.pp(self.layers[-1].hidden_values)
 layer = dA( hidden_layers_sizes[i-1], \
 hidden_layers_sizes[i], \
 input = self.layers[-1].hidden_values )
 self.layers += [layer]
 
 
 self.n_layers = len(self.layers)
-print '------------------------------------------'
-print theano.pp(self.layers[-1].hidden_values)
 # now we need to use same weights and biases to define an MLP
 # We can simply use the `hidden_values` of the top layer, which
 # computes the input that we would normally feed to the logistic
@@ -304,7 +300,7 @@ def errors(self, y):
 
 
 
-def sgd_optimization_mnist( learning_rate=0.1, pretraining_epochs = 10, \
+def sgd_optimization_mnist( learning_rate=0.1, pretraining_epochs = 15, \
 pretraining_lr = 0.1, training_epochs = 1000, dataset='mnist.pkl.gz'):
 """
 Demonstrate stochastic gradient descent optimization for a multilayer
@@ -359,7 +355,7 @@ def shared_dataset(data_xy):
 
 # construct the logistic regression class
 classifier = SdA( input=x, n_ins=28*28, \
-hidden_layers_sizes = [700, 700, 700], n_outs=10)
+hidden_layers_sizes = [1000, 1000, 1000], n_outs=10)
 
 ## Pre-train layer-wise
 for i in xrange(classifier.n_layers):
@@ -385,7 +381,7 @@ def shared_dataset(data_xy):
 # go through the training set
 for batch_index in xrange(n_train_batches):
 c = layer_update(batch_index)
-print 'Pre-training layer %i, epoch %d'%(i,epoch),c
+print 'Pre-training layer %i, epoch %d'%(i,epoch),c[0]
 
 
 
@@ -460,10 +456,8 @@ def shared_dataset(data_xy):
 iter = epoch * n_train_batches + minibatch_index
 
 if (iter+1) % validation_frequency == 0:
-print cost_ij
 cost_ij = []
 validation_losses = [validate_model(i) for i in xrange(n_valid_batches)]
-print validation_losses
 this_validation_loss = numpy.mean(validation_losses)
 print('epoch %i, minibatch %i/%i, validation error %f %%' % \
 (epoch, minibatch_index+1, n_train_batches, \
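
Note on the cost comments added in this file: the new comments say that summing over `axis=1` yields one cross-entropy value per example, and that the minibatch cost is the mean of those values. The following is a minimal plain-Python sketch of that reduction, not the Theano code from the commit; the function names `per_example_cross_entropy` and `minibatch_cost` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def per_example_cross_entropy(x_batch, z_batch):
    """Return one cross-entropy value per row (example) of the minibatch.

    x_batch holds the inputs, z_batch the reconstructions; both are lists
    of rows with entries in (0, 1) for z and in [0, 1] for x.
    """
    losses = []
    for x_row, z_row in zip(x_batch, z_batch):
        # Sum over the components of a single datapoint
        # (this corresponds to axis=1 in the diff above).
        loss = -sum(
            x * math.log(z) + (1 - x) * math.log(1 - z)
            for x, z in zip(x_row, z_row)
        )
        losses.append(loss)
    return losses


def minibatch_cost(x_batch, z_batch):
    """Average the per-example losses to get a single scalar cost."""
    losses = per_example_cross_entropy(x_batch, z_batch)
    return sum(losses) / len(losses)


# Hypothetical minibatch of three examples with two components each.
x_batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z_batch = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

losses = per_example_cross_entropy(x_batch, z_batch)
cost = minibatch_cost(x_batch, z_batch)
print(len(losses), cost)
```

The list `losses` plays the role of `self.L` (one entry per example) and `cost` the role of `self.cost`.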

doc/SdA.txt

Lines changed: 25 additions & 19 deletions
@@ -13,7 +13,7 @@ tutorial with a short digression on :ref:`autoencoders`
 and then move on to how classical
 autoencoders are extended to denoising autoencoders (:ref:`dA`).
 Throughout the following subchapters we will stick as close as possible to
-the original paper ( [Vincent08]_ ).
+the original paper ( [Vincent08] ).
 
 
 .. _autoencoders:
@@ -103,9 +103,15 @@ signal :
 
 .. code-block:: python
 
-self.y = T.nnet.sigmoid(T.dot(x, self.W ) + self.b)
-z = T.nnet.sigmoid(T.dot(self.y, self.W_prime) + self.b_prime)
-self.L = - T.sum( x*T.log(z) + (1-x)*T.log(1-z), axis=1 )
+self.y = T.nnet.sigmoid(T.dot(self.x, self.W ) + self.b)
+self.z = T.nnet.sigmoid(T.dot(self.y, self.W_prime) + self.b_prime)
+# note : we sum over the size of a datapoint; if we are using minibatches,
+# L will be a vector, with one entry per example in minibatch
+self.L = - T.sum( self.x*T.log(self.z) + (1-self.x)*T.log(1-self.z), axis=1 )
+# note : L is now a vector, where each element is the cross-entropy cost
+# of the reconstruction of the corresponding example of the
+# minibatch. We need to compute the average of all these to get
+# the cost of the minibatch
 self.cost = T.mean(self.L)
 
 Training the autoencoder consist now in updating the parameters ``W``,
@@ -121,7 +127,7 @@ cost is minimized.
 
 Note that for the stacked denoising autoencoder we will not use the
 ``train`` function as defined here, this is here just to illustrate how
-the autoencoder would work. In [Bengio07]_ autoencoders are used to
+the autoencoder would work. In [Bengio07] autoencoders are used to
 build deep networks.
 
 
@@ -136,7 +142,7 @@ This can be understood from different perspectives
 stochastic operator perspective,
 bottom-up -- information theoretic perspective,
 top-down -- generative model perspective ), all of which are explained in
-[Vincent08]_.
+[Vincent08].
 
 
 To convert the autoencoder class into a denoising autoencoder one, all we
@@ -192,14 +198,14 @@ The final denoising autoencoder class becomes :
 if input == None :
 # we use a matrix because we expect a minibatch of several examples,
 # each example being a row
-x = T.dmatrix(name = 'input')
+self.x = T.dmatrix(name = 'input')
 else:
-x = input
+self.x = input
 
-tilde_x = theano_rng.binomial( x.shape, 1, 0.9) * x
-self.y = T.nnet.sigmoid(T.dot(tilde_x, self.W ) + self.b)
-z = T.nnet.sigmoid(T.dot(self.y, self.W_prime) + self.b_prime)
-self.L = - T.sum( x*T.log(z) + (1-x)*T.log(1-z), axis=1 )
+self.tilde_x = theano_rng.binomial( self.x.shape, 1, 0.9) * self.x
+self.y = T.nnet.sigmoid(T.dot(self.tilde_x, self.W ) + self.b)
+self.z = T.nnet.sigmoid(T.dot(self.y, self.W_prime) + self.b_prime)
+self.L = - T.sum( self.x*T.log(self.z) + (1-self.x)*T.log(1-self.z), axis=1 )
 # note : L is now a vector, where each element is the cross-entropy cost
 # of the reconstruction of the corresponding example of the
 # minibatch. We need to compute the average of all these to get
@@ -209,7 +215,7 @@ The final denoising autoencoder class becomes :
 # we will need the hidden layer obtained from the uncorrupted
 # input when for example we will pass this as input to the layer
 # above
-self.hidden_values = T.nnet.sigmoid( T.dot(x, self.W) + self.b)
+self.hidden_values = T.nnet.sigmoid( T.dot(self.x, self.W) + self.b)
 
 
 
@@ -433,11 +439,11 @@ TODO
 References
 ++++++++++
 
-.. [Vincent08] Vincent, P., Larochelle H., Bengio Y. and Manzagol P.A.
-(2008). Extracting and Composing Robust Features with Denoising
-Autoencoders. ICML'08, pp. 1096 - 1103
+.. [Vincent08] Vincent, P., Larochelle H., Bengio Y. and Manzagol P.A. `Extracting and Composing Robust Features with Denoising Autoencoders`_. Proceedings of the Twenty-fifth International Confrence on Machine Learning (ICML'08), pages 1096 - 1103, ACM, 2008
 
-.. [Bengio07] Bengio Y., Lamblin P., Popovici D. and Larochelle H.
-(2007). Greedy Layer-Wise Training of Deep Networks. NIPS'06, pp
-153-160
+.. [Bengio07] Bengio Y., Lamblin P., Popovici D. and Larochelle H. `Greedy Layer-Wise Training of Deep Networks`_. Advances in Neural Information Processing Systems 19 (NIPS'06), pages 153-160, MIT Press 2007
 
+
+.. _Extracting and Composing Robust Features with Denoising Autoencoders: http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/217
+
+.. _Greedy Layer-Wise Training of Deep Networks: http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/190
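
Note on the denoising step touched in this file: the expression `theano_rng.binomial( self.x.shape, 1, 0.9) * self.x` multiplies the input by a 0/1 mask, so each entry is kept with probability 0.9 and zeroed otherwise, while the loss and `hidden_values` still use the uncorrupted `self.x`. The sketch below mimics only the masking in plain Python, under the assumption that an independent draw per entry is an adequate stand-in; `corrupt`, `keep_prob`, and the sample batch are hypothetical names, not part of the tutorial code.

```python
import random


def corrupt(x_batch, keep_prob=0.9, rng=None):
    """Zero each entry independently with probability 1 - keep_prob.

    Returns a new list of rows; the clean x_batch is left untouched so it
    can still serve as the reconstruction target.
    """
    rng = rng or random.Random()
    return [
        [x if rng.random() < keep_prob else 0.0 for x in row]
        for row in x_batch
    ]


# Hypothetical minibatch: two examples with four components each.
x_batch = [[0.2, 0.7, 1.0, 0.4], [0.9, 0.1, 0.3, 0.6]]

# A seeded generator makes the corruption repeatable across runs.
tilde_x = corrupt(x_batch, keep_prob=0.9, rng=random.Random(0))

# Every corrupted entry is either the original value or zero.
for clean_row, noisy_row in zip(x_batch, tilde_x):
    print(clean_row, noisy_row)
```

Keeping `x_batch` unchanged mirrors the reason the commit stores `self.x` and `self.tilde_x` as separate attributes: the encoder sees the corrupted input, but the cross-entropy is measured against the clean one.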

doc/contents.txt

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Contents
 logreg
 mlp
 lenet
+SdA
 rbm
 dbn
 dae
-sdae

doc/index.txt

Lines changed: 1 addition & 2 deletions
@@ -14,7 +14,6 @@ Contents
 logreg
 mlp
 lenet
+SdA
 rbm
 dbn
-dae
-sdae
