more info about batchsizes

pascanur · pascanur · commit 5e4b30366136 · 2010-03-28T12:23:35.000-04:00
diff --git a/doc/gettingstarted.txt b/doc/gettingstarted.txt
@@ -398,8 +398,18 @@ With large :math:`B`, time is wasted in reducing the variance of the gradient
 estimator, that time would be better spent on additional gradient steps.
 An optimal :math:`B` is model-, dataset-, and hardware-dependent, and can be
 anywhere from 1 to maybe several hundreds.  In the tutorial we set it to 20, 
-but this choice is almost arbitrary (though harmless). All code-blocks
-above show pseudocode of how the algorithm looks like. Implementing such 
+but this choice is almost arbitrary (though harmless). 
+
+.. note::
+
+    If you are training for a fixed number of epochs, the minibatch size becomes important
+    because it controls the number of updates done to your parameters. Training the same model
+    for 10 epochs with a batch size of 1 performs 20 times as many updates as training for
+    the same 10 epochs with a batch size of 20, so the results can differ substantially. Keep
+    this in mind when switching between batch sizes, and be prepared to re-tune the other
+    parameters according to the batch size used.
+    
+All code-blocks above show pseudocode of what the algorithm looks like. Implementing such an
 algorithm in Theano can be done as follows : 
 
 .. code-block:: python