@@ -58,10 +58,10 @@ loss function as being the negative log-likelihood.
 
 .. math::
     \mathcal{L}(\theta, \mathcal{D}) = \frac{1}{N} \sum_{x^{(i)} \in
-    \mathcal{D}} \log\ p(x^{(i)}). \\
+    \mathcal{D}} \log\ p(x^{(i)})\\
     \ell (\theta, \mathcal{D}) = - \mathcal{L} (\theta, \mathcal{D})
 
-using the stochastic gradient :math:`\frac{\partial - \log p(x^{(i)})}{\partial
+using the stochastic gradient :math:`- \frac{\partial \log p(x^{(i)})}{\partial
 \theta}`, where :math:`\theta` are the parameters of the model.
 
 
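To make the loss and its stochastic gradient concrete, here is a minimal NumPy sketch (Python 3). It assumes a hypothetical toy model of independent binary units with logits `theta`, for which `log p(x) = sum_j x_j * theta_j - softplus(theta_j)` and the per-example gradient `-d log p(x)/d theta` is `sigmoid(theta) - x`. It computes the mean log-likelihood L, the loss ℓ = -L, and runs plain SGD one example at a time. The model, the synthetic data, and the learning rate are illustrative assumptions, not part of the tutorial.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def log_p(x, theta):
    """log p(x) for independent binary units with logits theta.

    Works for one example (shape (d,)) or a batch (shape (N, d)).
    """
    return x @ theta - np.logaddexp(0.0, theta).sum()


def mean_log_likelihood(data, theta):
    """L(theta, D) = (1/N) * sum_i log p(x^(i))."""
    return log_p(data, theta).mean()


def stochastic_grad(x, theta):
    """-d log p(x)/d theta for a single example x."""
    # d log p / d theta_j = x_j - sigmoid(theta_j); the loss gradient is its negative
    return sigmoid(theta) - x


rng = np.random.default_rng(0)
true_p = np.array([0.1, 0.3, 0.6, 0.9])
data = (rng.random((500, 4)) < true_p).astype(float)  # synthetic binary data

theta = np.zeros(4)
lr = 0.005
loss_before = -mean_log_likelihood(data, theta)  # ell = -L
for epoch in range(40):
    for i in rng.permutation(len(data)):
        theta -= lr * stochastic_grad(data[i], theta)  # one SGD step per example
loss_after = -mean_log_likelihood(data, theta)
print(f"loss {loss_before:.4f} -> {loss_after:.4f}")
```

Because each step uses a single example the updates are noisy, but with a small constant learning rate `sigmoid(theta)` settles close to the per-unit data mean, which is the maximum-likelihood solution for this toy model.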
@@ -97,7 +97,7 @@
 .. math::
     :label: free_energy_grad
 
-    - \frac{\partial - \log p(x)}{\partial \theta}
+    - \frac{\partial \log p(x)}{\partial \theta}
     &= \frac{\partial \mathcal{F}(x)}{\partial \theta} -
     \sum_{\tilde{x}} p(\tilde{x}) \
     \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta}.
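The free-energy form of the gradient can be checked numerically on a model small enough to enumerate. The sketch below assumes a hypothetical toy energy-based model over x in {0,1}^3 with F(x; θ) = -θ·φ(x), where the feature map `phi` (an illustrative choice) collects unary and pairwise terms. It evaluates the ``free_energy_grad`` equation by summing over all 8 states and compares the result with a central finite difference of -log p(x) = F(x) + log Z.

```python
import itertools

import numpy as np


def phi(x):
    # Hypothetical features: three unary terms and three pairwise products
    x0, x1, x2 = x
    return np.array([x0, x1, x2, x0 * x1, x0 * x2, x1 * x2], dtype=float)


def free_energy(x, theta):
    return -theta @ phi(x)


def dF_dtheta(x):
    # F is linear in theta, so its gradient is simply -phi(x)
    return -phi(x)


STATES = [np.array(s) for s in itertools.product([0, 1], repeat=3)]


def neg_log_p(x, theta):
    # -log p(x) = F(x) + log Z, with Z summed over all 2^3 states
    log_z = np.logaddexp.reduce([-free_energy(s, theta) for s in STATES])
    return free_energy(x, theta) + log_z


def grad_free_energy_form(x, theta):
    # dF(x)/dtheta - sum_{x~} p(x~) dF(x~)/dtheta
    neg_f = np.array([-free_energy(s, theta) for s in STATES])
    p = np.exp(neg_f - np.logaddexp.reduce(neg_f))
    expected = sum(p_s * dF_dtheta(s) for p_s, s in zip(p, STATES))
    return dF_dtheta(x) - expected


def grad_finite_diff(x, theta, eps=1e-5):
    # Central differences of -log p(x), one parameter at a time
    g = np.zeros_like(theta)
    for k in range(len(theta)):
        d = np.zeros_like(theta)
        d[k] = eps
        g[k] = (neg_log_p(x, theta + d) - neg_log_p(x, theta - d)) / (2 * eps)
    return g


theta = np.random.default_rng(1).normal(size=6)
x = np.array([1, 0, 1])
analytic = grad_free_energy_form(x, theta)
numeric = grad_finite_diff(x, theta)
print(np.max(np.abs(analytic - numeric)))
```

The exact sum over states is what makes this check possible; its cost grows as 2^d, which is why the tutorial turns to a sampled approximation.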
@@ -124,7 +124,7 @@ denoted as :math:`\mathcal{N}`. The gradient can then be written as:
 .. math::
     :label: bm_grad
 
-    \frac{\partial - \log p(x)}{\partial \theta}
+    - \frac{\partial \log p(x)}{\partial \theta}
     &\approx
     \frac{\partial \mathcal{F}(x)}{\partial \theta} -
     \frac{1}{|\mathcal{N}|}\sum_{\tilde{x} \in \mathcal{N}} \
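The ``bm_grad`` approximation replaces the sum over all states with an average over a set N of samples drawn from p. The sketch below assumes a hypothetical toy model F(x; θ) = -θ·x whose units are independent Bernoulli(sigmoid(θ_j)), so exact samples can be drawn directly and the exact gradient -x + sigmoid(θ) is available for comparison. An RBM would instead need the Gibbs chain discussed later in the tutorial.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def dF_dtheta(x):
    # For F(x; theta) = -theta . x the gradient is -x (row-wise on a batch)
    return -x


def mc_grad(x, theta, n_samples, rng):
    """dF(x)/dtheta - (1/|N|) * sum over sampled x~ of dF(x~)/dtheta."""
    # Exact samples: each unit is an independent Bernoulli(sigmoid(theta_j))
    samples = (rng.random((n_samples, theta.size)) < sigmoid(theta)).astype(float)
    return dF_dtheta(x) - dF_dtheta(samples).mean(axis=0)


theta = np.array([-1.0, 0.0, 2.0])
x = np.array([1.0, 0.0, 1.0])
exact = dF_dtheta(x) + sigmoid(theta)  # -x + E[x~], with E[x~] = sigmoid(theta)

rng = np.random.default_rng(0)
for n in (10, 100, 10_000):
    err = np.max(np.abs(mc_grad(x, theta, n, rng) - exact))
    print(f"|N| = {n:>6}: max abs error {err:.4f}")
```

With exact samples the estimate is unbiased and its error shrinks roughly like 1/sqrt(|N|), so small sample sets give noisy but usable gradients.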
@@ -213,12 +213,12 @@ following log-likelihood gradients for an RBM with binary units:
 .. math::
     :label: rbm_grad
 
-    \frac {\partial{- \log p(v)}} {\partial W_{ij}} &=
+    - \frac{\partial \log p(v)}{\partial W_{ij}} &=
     E_v[p(h_i|v) \cdot v_j]
-    - v^{(i)}_j \cdot sigm(W_i \cdot v^{(i)} + c_i)
-    \frac {\partial{- \log p(v)}} {\partial c_i} &=
-    E_v[p(h_i|v)] - sigm(W_i \cdot v^{(i)}) \\
-    \frac {\partial{- \log p(v)}} {\partial b_j} &=
+    - v^{(i)}_j \cdot sigm(W_i \cdot v^{(i)} + c_i) \\
+    - \frac{\partial \log p(v)}{\partial c_i} &=
+    E_v[p(h_i|v)] - sigm(W_i \cdot v^{(i)} + c_i) \\
+    - \frac{\partial \log p(v)}{\partial b_j} &=
     E_v[p(v_j|h)] - v^{(i)}_j
 
 For a more detailed derivation of these equations, we refer the reader to the
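The three ``rbm_grad`` gradients can be verified on a tiny RBM by enumerating every visible configuration, which gives E_v exactly. This NumPy sketch assumes a hypothetical RBM with 4 visible and 3 hidden units and lays W out as (n_hidden, n_visible) so that W_i is row i, matching the indices in the equations (the Theano code uses ``T.dot(v, self.W)``, i.e. (n_visible, n_hidden)). It uses the binary-RBM free energy F(v) = -b·v - Σ_i softplus(c_i + W_i·v), keeps the hidden bias c_i inside the sigmoid for both the W_ij and c_i terms, and compares the analytic gradients with finite differences of -log p(v).

```python
import itertools

import numpy as np


def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))


def free_energy(v, W, b, c):
    # Binary RBM: F(v) = -b.v - sum_i softplus(c_i + W_i . v)
    return -b @ v - np.logaddexp(0.0, c + W @ v).sum()


def visible_states(n_vis):
    # All 2^n_vis binary visible vectors, one per row
    return np.array(list(itertools.product([0.0, 1.0], repeat=n_vis)))


def neg_log_p(v, W, b, c):
    # -log p(v) = F(v) + log Z, with Z summed over every visible state
    log_z = np.logaddexp.reduce([-free_energy(s, W, b, c) for s in visible_states(len(b))])
    return free_energy(v, W, b, c) + log_z


def rbm_grads(v, W, b, c):
    """-d log p(v) w.r.t. W, c and b, with the model expectation computed by enumeration."""
    states = visible_states(len(b))
    neg_f = np.array([-free_energy(s, W, b, c) for s in states])
    p = np.exp(neg_f - np.logaddexp.reduce(neg_f))  # p(v~) for each state
    ph_model = sigm(states @ W.T + c)               # p(h_i=1 | v~), shape (S, n_hid)
    ph_data = sigm(W @ v + c)                       # p(h_i=1 | v), shape (n_hid,)
    gW = (p[:, None] * ph_model).T @ states - np.outer(ph_data, v)
    gc = p @ ph_model - ph_data
    gb = p @ states - v                             # E[v~_j] equals E[p(v_j=1 | h)]
    return gW, gc, gb


def finite_diff(f, x, eps=1e-5):
    # Central differences, one coordinate at a time
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        d = np.zeros_like(x)
        d[idx] = eps
        g[idx] = (f(x + d) - f(x - d)) / (2 * eps)
    return g


rng = np.random.default_rng(0)
n_vis, n_hid = 4, 3
W = rng.normal(scale=0.5, size=(n_hid, n_vis))  # W[i, j] = W_ij: hidden i, visible j
b = rng.normal(scale=0.5, size=n_vis)           # visible biases b_j
c = rng.normal(scale=0.5, size=n_hid)           # hidden biases c_i
v = np.array([1.0, 0.0, 1.0, 1.0])

gW, gc, gb = rbm_grads(v, W, b, c)
num_W = finite_diff(lambda W_: neg_log_p(v, W_, b, c), W)
num_c = finite_diff(lambda c_: neg_log_p(v, W, b, c_), c)
num_b = finite_diff(lambda b_: neg_log_p(v, W, b_, c), b)
print(np.abs(gW - num_W).max(), np.abs(gc - num_c).max(), np.abs(gb - num_b).max())
```

Enumeration costs 2^n_visible states, so this only serves as a correctness check; during training the model expectation is approximated with samples instead.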
@@ -396,10 +396,8 @@ with Eqs. :eq:`rbm_propup` - :eq:`rbm_propdown`. The code is as follows:
 
 .. code-block:: python
 
-
     def propup(self, vis):
-        ''' This function propagates the visible units activation upwards to
-        the hidden units '''
-        return T.nnet.sigmoid(T.dot(v, self.W) + self.hbias)
+        ''' This function propagates the visible units activation upwards to the hidden units '''
+        return T.nnet.sigmoid(T.dot(vis, self.W) + self.hbias)
 
     def sample_h_given_v(self, v0_sample):
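For readers without Theano, a minimal NumPy sketch of ``propup`` and ``sample_h_given_v`` follows. It assumes the same weight layout as the Theano code, W of shape (n_visible, n_hidden), draws the binary hidden sample with `numpy.random.Generator.binomial(n=1, p=h1_mean)`, and returns the mean and the sample as a list, like the tutorial's method. The `sigmoid` helper, the weights, and the batch are made up for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def propup(vis, W, hbias):
    """p(h=1 | v) for a mini-batch; W has shape (n_visible, n_hidden)."""
    return sigmoid(vis @ W + hbias)


def sample_h_given_v(v0_sample, W, hbias, rng):
    """Return [h1_mean, h1_sample] for a batch of visible vectors."""
    h1_mean = propup(v0_sample, W, hbias)
    # One Bernoulli draw per hidden unit, cast back to the input's float dtype
    h1_sample = rng.binomial(n=1, p=h1_mean).astype(v0_sample.dtype)
    return [h1_mean, h1_sample]


rng = np.random.default_rng(0)
n_visible, n_hidden, batch = 6, 4, 5
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
hbias = np.zeros(n_hidden)
v = (rng.random((batch, n_visible)) < 0.5).astype(np.float64)

h_mean, h_sample = sample_h_given_v(v, W, hbias, rng)
print(h_mean.shape, h_sample.shape)
```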
@@ -414,9 +412,8 @@ with Eqs. :eq:`rbm_propup` - :eq:`rbm_propdown`. The code is as follows:
                 dtype = theano.config.floatX)
         return [h1_mean, h1_sample]
 
-    def propdown(self.hid):
-        '''This function propagates the hidden units activation downwards to
-        the visible units'''
+    def propdown(self, hid):
+        '''This function propagates the hidden units activation downwards to the visible units'''
         return T.nnet.sigmoid(T.dot(hid,self.W.T) + self.vbias)
 
     def sample_v_given_h(self, h0_sample):
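``propdown`` reuses the same weight matrix transposed, because the RBM shares W between the upward and downward passes. The NumPy sketch below mirrors ``propdown`` and a ``sample_v_given_h`` that returns the visible mean and a binary sample. The (n_visible, n_hidden) layout follows the Theano code; the `sigmoid` helper, the bias values, and the random inputs are assumptions for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def propdown(hid, W, vbias):
    """p(v=1 | h); the same W (n_visible, n_hidden) is used transposed."""
    return sigmoid(hid @ W.T + vbias)


def sample_v_given_h(h0_sample, W, vbias, rng):
    """Return [v1_mean, v1_sample] for a batch of hidden states."""
    v1_mean = propdown(h0_sample, W, vbias)
    # Bernoulli sample via a uniform draw: P(u < m) = m for u ~ U[0, 1)
    v1_sample = (rng.random(v1_mean.shape) < v1_mean).astype(h0_sample.dtype)
    return [v1_mean, v1_sample]


rng = np.random.default_rng(0)
n_visible, n_hidden, batch = 6, 4, 5
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
vbias = np.full(n_visible, -1.0)
h = (rng.random((batch, n_hidden)) < 0.5).astype(np.float64)

v_mean, v_sample = sample_v_given_h(h, W, vbias, rng)
print(v_mean.shape, v_sample.shape)
```

Comparing a uniform draw against the mean is equivalent to a single binomial trial with n=1; either form gives a Bernoulli sample per unit.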
@@ -724,6 +721,22 @@ been shown to lead to a better generative model ([Tieleman08]_).
 
         print 'Training epoch %d, cost is '%epoch, numpy.mean(mean_cost)
 
+        # Plot filters after each training epoch
+        plotting_start = time.clock()
+        # Construct image from the weight matrix
+        image = PIL.Image.fromarray(tile_raster_images(
+            X=rbm.W.value.T, img_shape=(28, 28), tile_shape=(10, 10),
+            tile_spacing=(1, 1)))
+        image.save('filters_at_epoch_%i.png' % epoch)
+        plotting_stop = time.clock()
+        plotting_time += (plotting_stop - plotting_start)
+
+    end_time = time.clock()
+
+    pretraining_time = (end_time - start_time) - plotting_time
+
+    print ('Training took %f minutes' % (pretraining_time / 60.))
+
 Once the RBM is trained, we can then use the ``gibbs_vhv`` function to implement
 the Gibbs chain required for sampling. We initialize the Gibbs chain starting
 from test examples (although we could as well pick it from the training set)
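The filter plot turns each column of W (one hidden unit's 784 incoming weights) into a 28×28 tile. The sketch below uses a hypothetical, simplified ``tile_filters`` in place of the tutorial's ``tile_raster_images``, whose full behavior is not shown here, and times training while excluding plotting time, as the added code does. It uses `time.perf_counter` because `time.clock` was deprecated in Python 3.3 and removed in 3.8. Saving is left out to avoid a PIL dependency; it would be `PIL.Image.fromarray(image).save(...)` as in the tutorial.

```python
import time

import numpy as np


def tile_filters(X, img_shape, tile_shape, tile_spacing=(1, 1)):
    """Hypothetical, simplified stand-in for tile_raster_images.

    Each row of X is one flattened filter. Filters are rescaled to [0, 255]
    individually and placed row-major on a tile_shape grid, separated by
    tile_spacing pixels of zeros.
    """
    h, w = img_shape
    rows, cols = tile_shape
    sh, sw = tile_spacing
    out = np.zeros((rows * (h + sh) - sh, cols * (w + sw) - sw), dtype=np.uint8)
    for k in range(min(len(X), rows * cols)):
        f = X[k].reshape(img_shape).astype(float)
        span = f.max() - f.min()
        f = (f - f.min()) / span if span > 0 else np.zeros_like(f)
        r, col = divmod(k, cols)
        y0, x0 = r * (h + sh), col * (w + sw)
        out[y0:y0 + h, x0:x0 + w] = np.round(255 * f).astype(np.uint8)
    return out


rng = np.random.default_rng(0)
W = rng.normal(size=(784, 100))  # stand-in for rbm.W, shape (n_visible, n_hidden)

plotting_time = 0.0
start_time = time.perf_counter()
for epoch in range(2):
    # ... one training epoch would run here ...
    plotting_start = time.perf_counter()
    image = tile_filters(W.T, img_shape=(28, 28), tile_shape=(10, 10))
    plotting_time += time.perf_counter() - plotting_start
pretraining_time = (time.perf_counter() - start_time) - plotting_time
print(f"Training took {pretraining_time / 60.0:.6f} minutes")
```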