\section{Using TMVA}
\label{sec:usingtmva}
A typical TMVA classification or regression analysis consists of two independent phases:
the {\em training} phase, where the multivariate methods are trained, tested and
evaluated, and an {\em application} phase, where the chosen methods are applied to
the concrete classification or regression problem they have been trained for.
An overview of the code flow for these two phases as implemented in the
examples \code{TMVAClassification.C}\index{TMVAClassification} and
\code{TMVAClassificationApplication.C}\index{TMVAClassificationApplication}
(for classification -- see Sec.~\ref{sec:examplejob}), and
\code{TMVARegression.C}\index{TMVARegression} and
\code{TMVARegressionApplication.C}\index{TMVARegressionApplication} (for regression)
are sketched in Fig.~\ref{fig:TMVAflow}. From a technical point of view, multiclass
classification differs little from two-class classification; differences are
highlighted only where necessary.
In the training phase, the communication of the user with the data sets and the
MVA methods is performed via a \code{Factory} object, created at the beginning of
the program. The TMVA Factory provides member functions to specify the training
and test data sets, to register the discriminating input and -- in case of
regression -- target variables, and to
book the multivariate methods. Subsequently the Factory calls for
training, testing and the evaluation of the booked MVA methods. Specific
result (``weight'') files are created after the training phase by each booked
MVA method.
The application of training results to a data set with unknown sample composition
(classification) / target value (regression) is governed by the \code{Reader} object.
During initialisation, the user registers the input variables\footnote
{
This somewhat redundant operation is required to verify the correspondence between
the Reader analysis and the weight files used.
}
together with their local memory addresses, and books the MVA methods that were found
to be the most appropriate after evaluating the training results. As booking argument, the
name of the weight file is given. For each method, the weight file provides the full
and consistent configuration according to the training setup and results. Within the
event loop, the input variables are updated for each event and the MVA response
values are computed. Some methods also provide the computation of errors.
For standalone use of the trained MVA methods, TMVA also generates lightweight
C++ response classes, which contain the encoded information from the weight files
so that these are not required anymore (\cf\ Sec.~\ref{sec:usingtmva:standaloneClasses}).
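Schematically, the application phase just described may look as follows, where all
tree, variable, method and weight file names are placeholders:
\begin{codeexample}
\begin{tmvacode}
// Sketch of the application phase (all names are placeholders)
TMVA::Reader* reader = new TMVA::Reader( "!Color:!Silent" );

// Register the input variables with their local memory addresses
Float_t var1, var2;
reader->AddVariable( "var1", &var1 );
reader->AddVariable( "var2", &var2 );

// Book a trained method via its weight file
reader->BookMVA( "MyMVAMethod", "weights/<JobName>_<MethodName>.weights.xml" );

// In the event loop: update the variables, then compute the response
var1 = ...; var2 = ...;
Double_t mvaValue = reader->EvaluateMVA( "MyMVAMethod" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Schematic sketch of the application phase
     using the \code{Reader} (variable, method and weight file names are
     placeholders). The full \code{Reader} interface is described later in
     this section.}
\end{codeexample}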
\begin{figure}[p]
\begin{center}
\includegraphics[width=0.50\textwidth]{plots/TMVAnalysisFlow}
\hspace{-0.65cm}
\includegraphics[width=0.516\textwidth]{plots/TMVAppFlow}
\end{center}
\vspace{-0.5cm}
\caption[.]{\underline{Left:} Flow (top to bottom) of a typical TMVA
training application.
The user script can be a ROOT macro, C++ executable, python
script or similar. The user creates a ROOT \code{TFile},
which is used by the TMVA Factory to store output histograms
and trees. After creation by the user, the Factory organises the
user's interaction with the TMVA modules. It is the only TMVA object
directly created and owned by the user. First the discriminating
variables that must be \code{TFormula}-compliant functions of
branches in the training trees are registered. For regression also
the target variable must be specified. Then, selected MVA methods
are booked through a type identifier and a user-defined unique name,
and configuration options are specified via an option string.
The TMVA analysis proceeds by consecutively calling the training,
testing and performance evaluation methods of the Factory. The training
results for all booked methods are written to custom weight files in
XML format and the evaluation histograms are stored in the output file.
They can be analysed with specific macros that come with TMVA (\cf\
Tables~\ref{pgr:scripttable1} and \ref{pgr:scripttable2}). \\
\underline{Right:} Flow (top to bottom) of a typical
TMVA analysis application. The MVA methods qualified by the preceding
training and evaluation step are now used to classify data of unknown
signal and background composition or to predict a regression target.
First, a \code{Reader} class object is created, which
serves as interface to the method's response, just as was the Factory
for the training and performance evaluation. The discriminating variables
and references to locally declared memory placeholders are registered
with the Reader. The variable names and types must be equal to those
used for the training. The selected MVA methods are booked with their
weight files in the argument, which fully configures them. The user
then runs the event loop, where for each event the values of the input
variables are copied to the reserved memory addresses, and the MVA
response values (and in some cases errors) are computed.
\index{Factory}\index{Reader}\index{TMVA analysis flow}
}
\label{fig:TMVAflow}
\end{figure}
\subsection{The TMVA Factory\index{Factory}}
\label{sec:usingtmva:factory}
The TMVA training phase begins by instantiating a \code{Factory} object
with configuration options listed in Option-Table~\ref{opt:factory}.
\begin{codeexample}
\begin{tmvacode}
TMVA::Factory* factory
= new TMVA::Factory( "<JobName>", outputFile, "<options>" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Instantiating a Factory class object. The first
argument is the user-defined job name that will reappear in the names of
the weight files containing the training results. The second argument is the
pointer to a writable \code{TFile} output file created by the user, where
control and performance histograms are stored. }
\end{codeexample}
% ======= input option table ==========================================
\begin{option}[t]
\input optiontables/Factory.tex
\caption[.]{\optionCaptionSize
Configuration options reference for class: {\em Factory}.
Coloured output is switched on by default, except when running ROOT in batch
mode (\ie, when the '\code{-b}' option of the CINT interpreter is invoked).
The list of transformations contains a default set of data preprocessing steps
for test and visualisation purposes only. The usage of preprocessing transformations
in conjunction with MVA methods must be configured when booking the methods.
}
\label{opt:factory}
\end{option}
% =====================================================================
\subsubsection{Specifying training and test data\index{Factory!specifying input data (trees)}}
The input data sets used for training and testing of the multivariate methods
need to be handed to the Factory. TMVA supports ROOT \code{TTree} and derived
\code{TChain} objects as well as text files. If ROOT trees are used for classification
problems, the signal and background events can be located in the same or in different
trees. Data trees can be provided specifically for the purpose of training, for testing, or for both. In the latter case the Factory splits the tree into one part for training and another for testing (see also Section~\ref{sec:PreparingTrainingTestData}).
Overall weights can be specified for the signal and background training data
(the treatment of event-by-event weights is discussed below).
Specifying {\bf classification training and test data} in ROOT tree format with signal
and background events being located in different trees:
\begin{codeexample}
\begin{tmvacode}
// Get the signal and background trees from TFile source(s);
// multiple trees can be registered with the Factory
TTree* sigTree = (TTree*)sigSrc->Get( "<YourSignalTreeName>" );
TTree* bkgTreeA = (TTree*)bkgSrc->Get( "<YourBackgrTreeName_A>" );
TTree* bkgTreeB = (TTree*)bkgSrc->Get( "<YourBackgrTreeName_B>" );
TTree* bkgTreeC = (TTree*)bkgSrc->Get( "<YourBackgrTreeName_C>" );
// Set the event weights per tree (these weights are applied in
// addition to individual event weights that can be specified)
Double_t sigWeight = 1.0;
Double_t bkgWeightA = 1.0, bkgWeightB = 0.5, bkgWeightC = 2.0;
// Register the trees
factory->AddSignalTree ( sigTree, sigWeight );
factory->AddBackgroundTree( bkgTreeA, bkgWeightA );
factory->AddBackgroundTree( bkgTreeB, bkgWeightB );
factory->AddBackgroundTree( bkgTreeC, bkgWeightC );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Registration of signal and background ROOT trees
read from \code{TFile} sources. Overall signal and background weights
per tree can also be specified.
The \code{TTree} object may be replaced by a \code{TChain}. The trees will later be split by the Factory into subsamples used for training and testing. }
\end{codeexample}
Specifying {\bf classification training and test data} in ROOT tree format with signal
and background events being located in the same tree:
\begin{codeexample}
\begin{tmvacode}
TTree* inputTree = (TTree*)source->Get( "<YourTreeName>" );
TCut signalCut = ...; // how to identify signal events
TCut backgrCut = ...; // how to identify background events
factory->SetInputTrees( inputTree, signalCut, backgrCut );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Registration of a single ROOT tree containing the
input data for signal {\em and} background, read from a \code{TFile} source.
The \code{TTree} object may be replaced by a \code{TChain}. The cuts
identify the event species.}
\end{codeexample}
Specifying {\bf classification data} in ROOT tree format with signal
and background training/test events being located in separate trees:
\begin{codeexample}
\begin{tmvacode}
#include "TMVA/Types.h"
// Get the signal and background training and test trees from TFile source(s);
TTree* sigTreeTrain = (TTree*)sigSrc->Get( "<YourSignalTrainTreeName>" );
TTree* bkgTreeTrain = (TTree*)bkgSrc->Get( "<YourBackgrTrainTreeName>" );
TTree* sigTreeTest = (TTree*)sigSrc->Get( "<YourSignalTestTreeName>" );
TTree* bkgTreeTest = (TTree*)bkgSrc->Get( "<YourBackgrTestTreeName>" );
// Set the event weights (these weights are applied in
// addition to individual event weights that can be specified)
Double_t sigWeight = 1.0;
Double_t bkgWeight = 1.0;
// Register the trees
factory->AddSignalTree ( sigTreeTrain, sigWeight, TMVA::Types::kTraining);
factory->AddBackgroundTree( bkgTreeTrain, bkgWeight, TMVA::Types::kTraining);
factory->AddSignalTree ( sigTreeTest, sigWeight, TMVA::Types::kTesting);
factory->AddBackgroundTree( bkgTreeTest, bkgWeight, TMVA::Types::kTesting);
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Registration of signal and background ROOT trees
read from \code{TFile} sources.
The first two trees are specified to be used only for training, the other two only for testing. Please note that specifying the tree type (training/testing) requires the inclusion of the header file \code{TMVA/Types.h}.}
\end{codeexample}
Specifying {\bf classification training and test data} in text format:
\begin{codeexample}
\begin{tmvacode}
// Text file format (available types: 'F' and 'I')
// var1/F:var2/F:var3/F:var4/F
// 0.21293 -0.49200 -0.58425 -0.70591
// ...
TString sigFile = "signal.txt"; // text file for signal
TString bkgFile = "background.txt"; // text file for background
Double_t sigWeight = 1.0; // overall weight for all signal events
Double_t bkgWeight = 1.0; // overall weight for all background events
factory->SetInputTrees( sigFile, bkgFile, sigWeight, bkgWeight );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Registration of signal and background text files used for training and testing.
Names and types of the input variables are given in the first line,
followed by the values.}
\end{codeexample}
\clearpage
Specifying {\bf regression training and test data} in ROOT tree format:
\begin{codeexample}
\begin{tmvacode}
factory->AddRegressionTree( regTree, weight );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Registration of a ROOT tree containing the
input and target variables. An overall weight per tree can also be specified.
The \code{TTree} object may be replaced by a \code{TChain}.
}
\end{codeexample}
Rather than having only global weighting factors for individual input
trees, which allow one to scale them to the same ``luminosity'', individual
event weights can be applied as well. These weights should be
available event-by-event, \ie, as a column or a function of columns of
the input data sets. To specify the weights to be used for the
training use the command:\index{Factory!specifying event weights}
\begin{codeexample}
\begin{tmvacode}
factory->SetWeightExpression( "<YourWeightExpression>" );
\end{tmvacode}
or if you have different expressions (variables) used as weights in the signal and background
trees:
\begin{tmvacode}
factory->SetSignalWeightExpression( "<YourSignalWeightExpression>" );
factory->SetBackgroundWeightExpression( "<YourBackgroundWeightExpression>" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Specification of individual weights for the
training events. The expression must be a function of variables present in
the input data set.}
\end{codeexample}
\subsubsection{Negative event weights\index{Negative Event Weights}}
\label{sec:NegativeEventWeights}
In next-to-leading order Monte Carlo generators, events with
(unphysical) negative weights may occur in some phase space
regions. Such events are often troublesome to deal with, and it
depends on the concrete implementation of the MVA method, whether or
not they are treated properly. Among those methods that correctly
incorporate events with negative weights are likelihood
and multi-dimensional probability density estimators, but also
decision trees. A summary of this feature for all TMVA methods is given in
Table~\ref{tab:methodStatus}. In cases where a method does {\em not}
properly treat events with negative weights, it is advisable to ignore
such events for the training -- but to include them in the performance
evaluation to not bias the results. This can be explicitly requested for
each MVA method via the boolean configuration option \code{IgnoreNegWeightsInTraining}
(\cf\ Option Table~\ref{opt:mva::methodbase} on
page~\pageref{opt:mva::methodbase}).
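As an illustration, such a request might look as follows (the method type and the
remaining options are placeholders):
\begin{codeexample}
\begin{tmvacode}
// Example: ignore negative-weight events during the training only
factory->BookMethod( TMVA::Types::kMLP, "MLP",
                     "<YourOtherOptions>:IgnoreNegWeightsInTraining" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Example of booking a method with
     \code{IgnoreNegWeightsInTraining} requested. The method type and the
     remaining configuration options are placeholders chosen for
     illustration only.}
\end{codeexample}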
\subsubsection{Defining input variables, spectators and targets\index{Factory!selecting input variables}}
\label{sec:defineVariables}
The variables in the input trees used to train the MVA methods are registered
with the Factory using the \code{AddVariable} method. It takes the variable name
(string), which must have a correspondence in the input ROOT tree or input text file,
and optionally a number type (\code{'F'} (default) and \code{'I'}). The type is used
to inform the method whether a variable takes continuous floating point or discrete
values.\footnote
{
For example for the projective likelihood method, a histogram out of discrete
values would not (and should not) be interpolated between bins.
}
Note that \code{'F'} indicates {\em any} floating point type, \ie, \code{float}
{\em and} \code{double}. Correspondingly, \code{'I'} stands for integer,
{\em including} \code{int}, \code{short}, \code{char}, and the corresponding
\code{unsigned} types. Hence, if a variable in the input tree is \code{double},
it should be declared \code{'F'} in the \code{AddVariable} call.
It is possible to specify variable expressions, just as for the \code{TTree::Draw}
command (the expression is interpreted as a \code{TTreeFormula}, including the use
of arrays). Expressions may be abbreviated for more concise screen output (and plotting)
purposes by defining shorthand-notation {\em labels} via the assignment operator \code{:=}.
In addition, two more arguments may be inserted into the \code{AddVariable}
call, allowing the user to specify {\em titles} and {\em units} for the input variables
for displaying purposes.
The following code example summarises the possible ways to declare an input variable:
\begin{codeexample}
\begin{tmvacode}
factory->AddVariable( "<YourDiscreteVar>", 'I' );
factory->AddVariable( "log(<YourFloatingVar>)", 'F' );
factory->AddVariable( "SumLabel := <YourVar1>+<YourVar2>", 'F' );
factory->AddVariable( "<YourVar3>", "Pretty Title", "Unit", 'F' );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Declaration of variables used to train the
MVA methods. Each variable is specified by its name in the training
tree (or text file), and optionally a type (\code{'F'} for
floating point and \code{'I'} for integer; \code{'F'} is the default if
nothing is given). Note that \code{'F'} indicates {\em any} floating point
type, \ie, \code{float} {\em and} \code{double}. Correspondingly, \code{'I'}
stands for integer, {\em including} \code{int}, \code{short}, \code{char},
and the corresponding \code{unsigned} types. Hence, even if a variable in
the input tree is \code{double}, it should be declared \code{'F'} here.
Here, \code{YourDiscreteVar} takes discrete values and is thus declared
as an integer. Just as in the \code{TTree::Draw} command, it
is also possible to specify expressions of variables. The \code{:=} operator
defines labels (third row), used for shorthand notation in screen outputs
and plots. It is also possible to define titles and units for the variables
(fourth row), which are used for plotting. If labels {\em and} titles are
defined, labels are used for abbreviated screen outputs, and titles for plotting.
}
\label{ce:addvariable}
\end{codeexample}
It is possible to define {\em spectator variables}\index{Spectator variables}, which are
part of the input data set, but which are not used in the MVA training, test nor during
the evaluation. They are copied into the \code{TestTree}, together with the used input
variables and the MVA response values for each event, where the spectator variables can
be used for correlation tests or others. Spectator variables are declared as follows:
\begin{codeexample}
\begin{tmvacode}
factory->AddSpectator( "<YourSpectatorVariable>" );
factory->AddSpectator( "log(<YourSpectatorVariable>)" );
factory->AddSpectator( "<YourSpectatorVariable>", "Pretty Title", "Unit" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Various ways to declare a spectator variable, not
participating in the MVA analysis, but written into the final \code{TestTree}.
}
\end{codeexample}
For a regression problem, the target variable is defined similarly, without however
specifying a number type:
\begin{codeexample}
\begin{tmvacode}
factory->AddTarget( "<YourRegressionTarget1>" );
factory->AddTarget( "log(<YourRegressionTarget2>)" );
factory->AddTarget( "<YourRegressionTarget3>", "Pretty Title", "Unit" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Various ways to declare the target variables used
to train a multivariate regression method. If the MVA method supports
multi-target (multidimensional)
regression\index{Regression!multi-target (multidimensional)},
more than one regression target can be defined.
}
\end{codeexample}
\subsubsection{Preparing the training and test
data\index{Factory!preparing training and test data}}
\label{sec:PreparingTrainingTestData}
The input events that are handed to the Factory are internally copied
and split into one {\em training} and one {\em test} ROOT tree. This
guarantees a statistically independent evaluation of the MVA
algorithms based on the test sample.\footnote { A fully unbiased
training and evaluation requires at least three statistically
independent data sets. See comments in Footnote~\ref{ftn:training}
on page~\pageref{ftn:training}. } The numbers of events used in
both samples are specified by the user. They must not exceed the
entries of the input data sets. In case the user has provided a ROOT
tree, the event copy can (and should) be accelerated by disabling all
branches not used by the input variables.
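A sketch of how this can be done with the standard ROOT
\code{TTree::SetBranchStatus} call (branch names are placeholders):
\begin{codeexample}
\begin{tmvacode}
// Speed up the internal event copy by deactivating unused branches
inputTree->SetBranchStatus( "*", 0 );          // first disable all branches
inputTree->SetBranchStatus( "<YourVar1>", 1 ); // then re-enable only the
inputTree->SetBranchStatus( "<YourVar2>", 1 ); // branches actually used
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Sketch of disabling unused branches
     before handing the tree to the Factory (branch names are placeholders).
     All branches entering the input variables, spectators and weight
     expressions must remain enabled.}
\end{codeexample}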
It is possible to apply selection requirements (cuts) upon the input
events. These requirements can depend on any variable present in the
input data sets, \ie, they are not restricted to the variables used by
the methods. The full command is as follows:
\begin{codeexample}
\begin{tmvacode}
TCut preselectionCut = "<YourSelectionString>";
factory->PrepareTrainingAndTestTree( preselectionCut, "<options>" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Preparation of the internal TMVA
training and test trees. The sizes (number of events) of these trees
are specified in the configuration option string. For classification
problems, they can be set individually for signal and
background. Note that the preselection cuts are applied before the
training and test samples are created, \ie, the tree sizes apply to
numbers of {\em selected} events. It is also possible to choose
among different methods to select the events entering the training
and test trees from the source trees. All options are described in
Option-Table~\ref{opt:datasetfactory}. See also the text for further
information.}
\label{ce:treePreparation}
\end{codeexample}
For {\bf classification}, the numbers of signal and background events
used for training and testing are specified in the configuration
string by the variables \code{nTrain_Signal},
\code{nTrain_Background}, \code{nTest_Signal} and
\code{nTest_Background} (for example,
\code{"nTrain_Signal=5000:nTrain_Background=5000:nTest_Signal=4000:nTest_Background=5000"}).
The default value (zero) signifies that all available events are
taken, \eg, if \code{nTrain_Signal=5000} and \code{nTest_Signal=0},
and if the total signal sample has 15000 events, then 5000 signal
events are used for training and the remaining 10000 events are used
for testing. If \code{nTrain_Signal=0} and \code{nTest_Signal=0}, the
signal sample is split in half for training and testing. The same
rules apply to background. Since zero is default, not specifying
anything corresponds to splitting the samples in two halves.
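As an illustration, a preparation call requesting 5000 signal and background
training events each, with the remaining selected events used for testing, might
read (\code{preselectionCut} as in Code Example~\ref{ce:treePreparation}):
\begin{codeexample}
\begin{tmvacode}
factory->PrepareTrainingAndTestTree( preselectionCut,
   "nTrain_Signal=5000:nTrain_Background=5000:\
nTest_Signal=0:nTest_Background=0" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Example configuration string for the
     training and test sample sizes in a classification problem. With the
     test numbers left at zero, all remaining selected events are used for
     testing.}
\end{codeexample}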
For {\bf regression}, only the sizes of the training and test samples are given, \eg,
\code{"nTrain_Regression=0:nTest_Regression=0"}, so that one half of the input
sample is used for training and the other half for testing. If a tree is handed
to the Factory explicitly as a training tree, its events can only be used for training; the same holds for test trees.
The option \code{SplitMode} defines how the training and test samples
are selected from the source trees. With \code{SplitMode=Random},
events are selected randomly. With \code{SplitMode=Alternate}, events
are chosen in alternating turns for the training and test samples as
they occur in the source trees until the desired numbers of training
and test events are selected. The training and test samples should
contain the same number of events for each event class.
In the \code{SplitMode=Block} mode the
first \code{nTrain_Signal} and \code{nTrain_Background}
(classification), or \code{nTrain_Regression} events (regression) of
the input data set are selected for the training sample, and the next
\code{nTest_Signal} and \code{nTest_Background} or
\code{nTest_Regression} events comprise the test data. This is usually
not desired for data that contains varying conditions over the range
of the data set. For the \code{Random} selection mode, the seed of the
random generator can be set. With \code{SplitSeed=0} the generator
returns a different random number series every time. The default seed
of 100 ensures that the same training and test samples are used each time
TMVA is
run (as does any other seed apart from 0). The option \code{MixMode}
defines the order in which the training events of the different classes
are combined into the training sample, and likewise the order in which they appear in the test sample. The available options for
\code{MixMode} are the same as for \code{SplitMode}. By default, the same
option is chosen for the \code{MixMode} as given in \code{SplitMode}. Again,
with \code{MixMode=Random}, the order of the events in the samples is random.
With \code{MixMode=Alternate} subsequent events are always of the next class
(e.g. 0, 1, 2, 3, 0, 1, 2, 3, $\cdots$). With \code{MixMode=Block} all events
of one class are inserted in a block into the training/test samples (e.g.
0, 0, $\cdots$, 0, 1, 1, $\cdots$, 1, 2, 2, $\cdots$, 2, $\cdots$ ).
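These options might, for example, be combined as follows (with
\code{preselectionCut} as in Code Example~\ref{ce:treePreparation}):
\begin{codeexample}
\begin{tmvacode}
factory->PrepareTrainingAndTestTree( preselectionCut,
   "nTrain_Signal=0:nTrain_Background=0:\
SplitMode=Random:SplitSeed=0:MixMode=Random" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Example of random splitting and mixing
     with a non-reproducible random sequence (\code{SplitSeed=0}). Since the
     event numbers are left at zero, the samples are split in two halves.}
\end{codeexample}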
In some cases event weights are given by Monte Carlo generators, and
may turn out to be overall very small or large numbers. To avoid
artifacts due to this, TMVA can internally renormalise the signal and
background training(!) weights such that their respective sums of
effective (weighted) events is equal. This is the default
renormalisation and it can be modified with the configuration option
\code{NormMode} (\cf\ Table~\ref{opt:datasetfactory}). Possible
settings are: \code{None}: no renormalisation is applied (the weights
are used as given); \code{NumEvents}: the training events are
renormalised such that the sums of event weights of the signal and
background events, respectively, equal the numbers of events \code{Ns, Nb}
requested in the call
\code{Factory::PrepareTrainingAndTestTree("", "nTrain_Signal=Ns:nTrain_Background=Nb:...")};
\code{EqualNumEvents} (default): the event weights are renormalised such
that the sum of all weighted signal training events equals the sum of all
weights of the background training events. Note that this renormalisation only
affects the training events as the training of some classifiers is sensitive to the
relative amount of signal and background in the training data. On the other hand, the
background or signal efficiency of the trained classifier as determined from the test
sample is independent of the relative abundance of signal and background events.
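For instance, to use the event weights exactly as given, the renormalisation can
be switched off as follows (the remaining options are placeholders):
\begin{codeexample}
\begin{tmvacode}
factory->PrepareTrainingAndTestTree( preselectionCut,
                                     "NormMode=None:<YourOtherOptions>" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Example of switching off the internal
     renormalisation of the training weights via \code{NormMode=None} (the
     remaining configuration options are placeholders).}
\end{codeexample}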
% ======= input option table ==========================================
\begin{option}[p]
\input optiontables/DataSetFactory.tex
\caption[.]{\optionCaptionSize Configuration options reference in call
\code{Factory::PrepareTrainingAndTestTree(..)}. For regression,
\code{nTrain_Signal} and \code{nTest_Signal} are replaced by
\code{nTrain_Regression} and \code{nTest_Regression}, respectively,
and \code{nTrain_Background} and \code{nTest_Background} are
removed. See also Code-Example~\ref{ce:treePreparation} and
comments in the text. }
\label{opt:datasetfactory}
\end{option}
% =====================================================================
\clearpage
\subsubsection{Booking MVA methods\index{Factory!booking MVA methods}}
\label{sec:usingtmva:booking}
All MVA methods are booked via the Factory by specifying the method's
type, plus a unique name chosen by the user, and a set of specific
configuration options encoded in a string qualifier.\footnote { In the
TMVA package all MVA methods are derived from the abstract interface
\code{IMethod} and the base class \code{MethodBase}. } If the same
method type is booked several times with different options (which is
useful to compare different sets of configurations for optimisation
purposes), the specified names must be different to distinguish the
instances and their weight files. A booking example for the likelihood
method is given in Code Example~\ref{codeex:factoryBooking}
below. Detailed descriptions of the configuration options are given in
the corresponding tools and MVA sections of this Users Guide, and
booking examples for most of the methods are given in
Appendix~\ref{sec:appendix:booking}. With the MVA booking the
initialisation of the Factory is complete and no MVA-specific actions
are left to do. The Factory takes care of the subsequent training,
testing and evaluation of the MVA methods.
\begin{codeexample}
\begin{tmvacode}
factory->BookMethod( TMVA::Types::kLikelihood, "LikelihoodD",
"!H:!V:!TransformOutput:PDFInterpol=Spline2:\
NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmooth=5:\
NAvEvtPerBin=50:VarTransform=Decorrelate" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Example booking of the likelihood
method. The first argument is a unique type enumerator (the
available types can be looked up in \code{src/Types.h}), the second
is a user-defined name which must be unique among all booked MVA
methods, and the third is a configuration option string that is
specific to the method. For options that are not explicitly set in
the string, default values are used; these are printed to standard
output. The syntax of the options should be clear from the above
example. Individual options are separated by a ':'. Boolean
variables can be set either explicitly as
\code{MyBoolVar=True/False}, or just via
\code{MyBoolVar/!MyBoolVar}. All specific options are explained in
the tools and MVA sections of this Users Guide. There is no
difference in the booking of methods for classification or
regression applications. See Appendix~\ref{sec:appendix:booking} on
page~\pageref{sec:appendix:booking} for a complete booking list of
all MVA methods in TMVA.}
\label{codeex:factoryBooking}
\end{codeexample}
\subsubsection{Help option for MVA booking\index{Help!method-specific help messages}
\index{Help!booking options}
\index{Help!MVA method optimisation}}
\label{sec:usingtmva:gettingHelp}
Upon request via the configuration option "\code{H}" (see code example above) the TMVA
methods print concise help messages. These include a brief description of the
algorithm, a performance assessment, and hints for setting the most important
configuration options. The messages can also be invoked with the command
{\tt factory->PrintHelpMessage("<MethodName>")}.
\subsubsection{Training the MVA methods\index{Training MVA methods}}
\label{sec:usingtmva:training}
The training of the booked methods is invoked by the command:
\begin{codeexample}
\begin{tmvacode}
factory->TrainAllMethods();
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Executing the MVA training via the Factory.}
\end{codeexample}
The training results are stored in the weight files\index{Weight
files}, which are saved in the directory \code{weights} (created if
it does not exist).\footnote { The default weight file directory
name can be modified from the user script through the global
configuration variable
\code{(TMVA::gConfig().GetIONames()).fWeightFileDir}. } The weight
files are named \code{Jobname_MethodName.weights.<extension>}, where
the job name has been specified at the instantiation of the Factory,
and \code{MethodName} is the unique method name specified in the
booking command. Each method writes a custom weight file in XML format
(extension is \code{xml})\index{Weight files!XML format}, where the
configuration options, controls and training results for the method
are stored.
\subsubsection{Testing the MVA methods\index{Testing multivariate methods}}
The trained MVA methods are applied to the test data set and provide
scalar outputs according to which an event can be classified as either
signal or background, or which estimate the regression
target.\footnote { In classification mode, TMVA discriminates signal
from background in data sets with unknown composition of these two
samples. In frequent use cases the background (sometimes also the
signal) consists of a variety of different populations with
characteristic properties, which could call for classifiers with
more than two discrimination classes. However, in practice it is
usually possible to fight the backgrounds sequentially by training
individual classifiers for each background source, and applying
consecutive requirements to these. Since TMVA 4, the framework directly
supports multi-class classification. However, some MVA
methods have not yet been prepared for it. } The MVA outputs are
stored in the test tree (\code{TestTree}) to which a column is added
for each booked method. The tree is eventually written to the output
file and can be directly analysed in a ROOT session. The testing of
all booked methods is invoked by the command:
\begin{codeexample}
\begin{tmvacode}
factory->TestAllMethods();
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Executing the validation (testing) of the MVA
methods via the Factory.}
\end{codeexample}
\subsubsection{Evaluating the
MVA methods\index{Evaluating MVA methods}\index{Performance evaluation}}
\label{sec:usingtmva:evaluation}
The Factory and data set classes of TMVA perform a preliminary
property assessment of the input variables used by the MVA methods,
such as computing correlation coefficients and ranking the variables
according to their separation (for classification), or according to
their correlations with the target variable(s) (for regression). The
results are printed to standard output.
The performance evaluation in terms of signal efficiency, background rejection,
faithful estimation of a regression target, etc., of the trained and tested MVA
methods is invoked by the command:
\begin{codeexample}
\begin{tmvacode}
factory->EvaluateAllMethods();
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Executing the performance evaluation
via the Factory.}
\end{codeexample}
The performance measures differ between classification and regression
problems. They are summarised below.
\subsubsection{Classification performance evaluation}
After training and testing, the linear correlation coefficients among
the classifier outputs are printed. In addition, overlap matrices are
derived (and printed) for signal and background that determine the
fractions of signal and background events that are identically classified
by each pair of classifiers. This is useful when two classifiers have
similar performance, but a significant fraction of non-overlapping
events. In such a case a combination of the classifiers (\eg, in a
{\em Committee} classifier) could improve the performance (this can be
extended to any combination of any number of classifiers).
The optimal method to be used for a specific analysis strongly depends on the
problem at hand and no general recommendations can be given. To ease the choice
TMVA computes a number of benchmark quantities that assess the performance of the
methods on the independent test sample. For classification these are
\begin{itemize}
\item The {\bf signal efficiency at three representative background
efficiencies} (the efficiency is equal to $1-{\rm rejection}$)
obtained from a cut on the classifier output. Also given is the area
under the background-rejection versus signal-efficiency curve (the
larger the area, the better the performance).\index{Performance
evaluation!background rejection vs. signal efficiency}
\item The {\bf separation}\index{Performance evaluation!separation} \Separation
of a classifier $y$, defined by the integral~\cite{Cornelius}
\beq
\Separation =
\frac{1}{2} \int\frac{\left(\yPDFS(y) - \yPDFB(y)\right)^2}{\yPDFS(y) + \yPDFB(y)} dy\,,
\eeq
where $\yPDFS$ and $\yPDFB$ are the signal and background PDFs of $y$,
respectively (\cf\ Sec.~\ref{sec:otherRepresentations}).
The separation is zero for identical signal and background
shapes, and it is one for shapes with no overlap.
\item The discrimination {\bf significance}\index{Performance
evaluation!significance} of a classifier, defined by the difference
between the classifier means for signal and background divided by
the quadratic sum of their root-mean-squares.
\end{itemize}
The results of the evaluation are printed to standard output. Smooth
background rejection/efficiency versus signal efficiency curves are
written to the output ROOT file, and can be plotted using custom
macros (see Sec.~\ref{sec:rootmacros}).
\subsubsection{Regression performance evaluation}
Ranking for regression is based on the correlation strength between
the input variables or MVA method response and the regression
target. Several correlation measures are implemented in TMVA to
capture and quantify nonlinear dependencies. Their results are printed
to standard output.
\begin{itemize}
\item The {\bf Correlation}\index{Correlation} between two random variables $X$ and $Y$ is
usually measured with the correlation coefficient $\rho$, defined by
\beq
\label{eqn:corrCoeff}
\rho(X,Y) = \frac{{\rm cov}(X,Y)}{\sigma_X \sigma_Y}. \eeq
The correlation coefficient is symmetric in $X$ and $Y$, lies
within the interval $[-1,1]$, and quantifies by definition a
linear relationship. Thus $\rho = 0$ holds for independent
variables, but the converse is not true in general. In
particular, higher order functional or non-functional
relationships may not, or only marginally, be reflected in
the value of $\rho$ (see Fig.~\ref{fig:correlationTypes}).
\item The {\bf correlation ratio}\index{Correlation ratio} is defined by
\beq
\label{eqn:corrRatio}
\eta^2(Y|X) = \frac{\sigma^2_{E(Y|X)}} {\sigma^2_Y}\,,
\eeq
where
\beq
\label{eqn:condExp}
E(Y|X) = \int y \ P(y|x) \ dy\,, \eeq is the conditional
expectation of $Y$ given $X$ with the associated conditional
probability density function $P(Y|X)$. The correlation ratio
$\eta^2$ is in general not symmetric and its value lies
within $[0,1]$, according to how well the data points can be
fitted with a linear or nonlinear regression curve. Thus
non-functional correlations cannot be accounted for by the
correlation ratio. The following relations can be derived for
$\eta^2$ and the squared correlation coefficient
$\rho^2$~\cite{kendall:stuart:ord:arnold:1999:2A}:
\begin{itemize}
\item[$\circ$] $\rho^2 = \eta^2=1$, if $X$ and $Y$ are in a
strict linear functional relationship.
\item[$\circ$] $\rho^2 \leq \eta^2=1$, if $X$ and $Y$ are in a
strict nonlinear functional relationship.
\item[$\circ$] $\rho^2 = \eta^2 < 1$, if there is no strict
functional relationship but the regression of $X$ on $Y$ is
exactly linear.
\item[$\circ$] $\rho^2 < \eta^2 < 1$, if there is no strict
functional relationship but some nonlinear regression curve is
a better fit than the best linear fit.
\end{itemize}
Some characteristic examples and their corresponding values for $\eta^2$ are
shown in Fig.~\ref{fig:correlationTypes}. In the special case, where all data
points take the same value, $\eta$ is undefined.
\begin{figure}[t]
\begin{center}
\includegraphics[width=6.2cm]{plots/linDep} \hspace{0.3cm}
\includegraphics[width=6.2cm]{plots/funcDep} \\\vspace{+0.2cm}
\includegraphics[width=6.2cm]{plots/nonFuncDep} \hspace{0.3cm}
\includegraphics[width=6.2cm]{plots/noDep}
\end{center}
\vspace{-0.7cm}
\caption{Various types of correlations between two random variables
and their corresponding values for the correlation coefficient
$\rho$, the correlation ratio $\eta$, and mutual information
$I$. Linear relationship (upper left), functional relationship
(upper right), non-functional relationship (lower left), and
independent variables (lower right).}
\label{fig:correlationTypes}
\end{figure}
\item {\bf Mutual information} makes it possible to detect any predictable
relationship between two random variables, be it of functional or
non-functional form. It is defined by~\cite{citeulike:165404}
\beq
\label{eqn:MI}
I(X,Y) = \sum_{X,Y}P(X,Y) \ln \frac{P(X,Y)}{P(X) P(Y)}\,,
\eeq
where $P(X,Y)$ is the joint probability density function of
the random variables $X$ and $Y$, and $P(X)$, $P(Y)$ are the
corresponding marginal probabilities. Mutual information
originates from information theory and is closely related to
entropy which is a measure of the uncertainty associated with
a random variable. It is defined by
\beq
\label{eqn:MIH}
H(X) = - \sum_{X}P(X) \ln {P(X)}\,,
\eeq
where $X$ is a discrete random variable and $P(X)$ its
probability distribution. The connection between the two
quantities is given by the following transformation
\begin{align}
I(X,Y) &= \sum_{X,Y}P(X,Y) \ln \frac{P(X,Y)}{P(X) P(Y)}\\
&= \sum_{X,Y}P(X,Y) \ln \frac{P(X|Y)}{P(X)}\\
&= -\sum_{X,Y}P(X,Y) \ln P(X) + \sum_{X,Y}P(X,Y) \ln P(X|Y)\\
&= -\sum_{X}P(X) \ln P(X) - \left(-\sum_{X,Y}P(X,Y) \ln P(X|Y)\right) \\
&=H(X) - H(X|Y)\,,
\end{align}
where $H(X|Y)$ is the conditional entropy of $X$ given $Y$. Thus
mutual information is the reduction of the uncertainty in
variable $X$ due to the knowledge of $Y$. Mutual information is
symmetric and non-negative. In the case of two
completely independent variables $I(X,Y)$ is zero.
For experimental measurements the joint and marginal probability
density functions are a priori unknown and must be approximated,
for example by binning the data or by kernel density
estimation techniques (see, \eg,
\cite{PhysRevE.52.2318}). Consequently, the values of $I(X,Y)$
for a given data set will strongly depend on the statistical
power of the sample and the chosen binning parameters.
For the purpose of ranking variables from data sets of equal
statistical power and identical binning, however, we assume that
the evaluation from a simple two-dimensional histogram without
further smoothing is sufficient.
\end{itemize}
A comparison of the correlation coefficient $\rho$, the correlation
ratio $\eta$, and mutual information $I$ for linearly correlated
two-dimensional Gaussian toy MC simulations is shown in
Table~\ref{tab:compLinToys}.
\begin{table}[t]
\begin{tabularx}{1.0\linewidth}{lXXXXXXXXXXX}
\hline
&&&&&&&&&&&\\[\BD]
$\rho_{\rm PDF}$ & 0.0 & 0.1 & 0.2 & 0.3 & 0.4 & 0.5 & 0.6 & 0.7 & 0.8 & 0.9 & 0.9999\\[\AD]
\hline
&&&&&&&&&&&\\[\BD]
$\rho$ & 0.006& 0.092& 0.191& 0.291& 0.391& 0.492& 0.592& 0.694& 0.795& 0.898& 1.0\\
$\eta^2$& 0.004& 0.012& 0.041& 0.089& 0.156& 0.245& 0.354& 0.484& 0.634& 0.806& 1.0\\
$I$ & 0.093& 0.099& 0.112& 0.139& 0.171& 0.222& 0.295& 0.398& 0.56& 0.861& 3.071\\[\AD]
\hline
\end{tabularx}
\caption{Comparison of the correlation coefficient $\rho$, correlation ratio $\eta$, and
mutual information $I$ for two-dimensional Gaussian toy Monte-Carlo distributions
with linear correlations as indicated ($20000~{\rm data~points}/100\times100~{\rm bins}$).}
\label{tab:compLinToys}
\end{table}
\subsubsection{Overtraining\index{Overtraining}}
\label{sec:usingtmva:overtraining}
Overtraining occurs when too many model parameters of an algorithm
are adjusted to too few data points, leaving the problem with too few
degrees of freedom. The sensitivity to overtraining
therefore depends on the MVA method. For example, a Fisher (or {\em
linear}) discriminant can hardly ever be overtrained, whereas,
without the appropriate counter measures, boosted decision trees
usually suffer from at least partial overtraining, owing to their
large number of nodes. Overtraining leads to an apparent increase in
the classification or regression performance over the objectively
achievable one, if measured on the training sample, and to an
effective performance decrease when measured with an independent test
sample. A convenient way to detect overtraining and to measure its
impact is therefore to compare the performance results between
training and test samples. Such a test is performed by TMVA with the
results printed to standard output.
Various method-specific solutions to counteract overtraining
exist. For example, binned likelihood reference distributions are
smoothed before interpolating their shapes, or unbinned kernel density
estimators smear each training event before computing the PDF; neural
networks steadily monitor the convergence of the error estimator
between training and test samples\footnote {
\label{ftn:training}
Proper training and validation requires three statistically
independent data sets: one for the parameter optimisation, another
one for the overtraining detection, and the last one for the
performance validation. In TMVA, the last two samples have been
merged to increase statistics. The (usually insignificant) bias
introduced by this on the evaluation results does not affect the
analysis as far as classification cut efficiencies or the
regression resolution are independently validated with data. }
suspending the training when the test-sample error has passed its minimum;
the number of nodes in boosted decision trees can be reduced by
removing insignificant ones (``tree pruning''), etc.
\subsubsection{Other representations of MVA outputs for classification: probabilities and probability integral transformation ({\em Rarity})}
\label{sec:otherRepresentations}
In addition to the MVA response value \yMVA of a classifier, which is
typically used to place a cut for the classification of an event as
either signal or background, or which could be used in a subsequent
likelihood fit, TMVA also provides the classifier's signal and
background PDFs, $\yPDFSB$. The PDFs can be used to derive
classification probabilities for individual events, or to compute any
kind of transformation of the output, of which the {\em probability
integral transformation} (Rarity) is implemented in TMVA.
\begin{itemize}
\item {\bf Classification probability}:\index{Classification
probability} The techniques used to estimate the shapes of the PDFs
are those developed for the likelihood classifier (see
Sec.~\ref{sec:likelihood:description} for details) and can be
customised individually for each method (the control options are
given in Sec.~\ref{sec:tmvaClassifiers}). The probability for event
$i$ to be of signal type is given by\index{Signal probability} \beq
\label{eq:proba}
\proba(i) = \frac{\fS \cdot\yPDFS(i)}{\fS\cdot \yPDFS(i) + (1
- \fS)\cdot\yPDFB(i)}\,, \eeq where $\fS=\NS/(\NS+\NB)$ is
the expected signal fraction, and $\NSB$ is the expected
number of signal (background) events (default is
$\fS=0.5$).\footnote { The $\proba$ distributions may exhibit
a somewhat peculiar structure with frequent narrow
peaks. They are generated by regions of classifier output
values in which $\yPDFS\propto\yPDFB$ for which $\proba$
becomes a constant. }
\item {\bf Probability Integral Transformation}:\index{Rarity}
The probability integral transformation $\Rarity(y)$ of a classifier $y$ is given by the integral~\cite{Rarity}
\beq
\label{eq:rarity}
\Rarity(y) = \intl_{-\infty}^{y}\yPDFB(y^\prime)\,d
y^\prime~, \eeq which is defined such that $\Rarity(y_B)$
for background events is uniformly distributed between 0 and
1, while signal events cluster towards 1. The signal
distributions can thus be directly compared among the
various classifiers. The stronger the peak towards 1, the
better the discrimination. Another useful aspect of the
probability integral transformation is that it directly visualises
deviations of a test background (which could be physics data) from
the training sample as a departure from uniformity.
The probability integral transformation distributions of the Likelihood and Fisher classifiers for the example
used in Sec.~\ref{sec:quickstart}
are plotted in Fig.~\ref{fig:usingtmva:rarity}. Since Fisher performs better
(\cf\ Fig.~\ref{fig:usingtmva:rejBvsS} on page~\pageref{fig:usingtmva:rejBvsS}),
its signal distribution is more strongly peaked towards 1. By construction, the
background distributions are uniform within statistical fluctuations.
\end{itemize}
The probability and probability integral transformation distributions can be plotted with dedicated macros,
invoked through corresponding GUI buttons.
\begin{figure}[t]
\begin{center}
\includegraphics[width=0.50\textwidth]{plots/Rarity-Likelihood}
\hspace{-0.3cm}
\includegraphics[width=0.50\textwidth]{plots/Rarity-Fisher}
\end{center}
\vspace{-0.5cm}
\caption[.]{Example plots for classifier probability integral transformation distributions for signal and
background events from the academic test sample. Shown are
likelihood (left) and Fisher (right).}
\label{fig:usingtmva:rarity}
\end{figure}
% =============================================================================
% === Cross validation
% =============================================================================
\input CrossValidation
\subsection{ROOT macros to plot training, testing and evaluation
results\index{ROOT!macros}}
\label{sec:rootmacros}
TMVA provides simple GUIs (\code{TMVAGui.C} and
\code{TMVARegGui.C}\index{Graphical user interface (GUI)}, see
Fig.~\ref{fig:tmvagui}), which interface ROOT macros that visualise
the various steps of the training analysis. The macros are
respectively located in \code{TMVA/macros/} (Sourceforge.net
distribution) and {\tt \$}\code{ROOTSYS/tmva/test/} (ROOT
distribution), and can also be executed from the command line. They
are described in Tables~\ref{pgr:scripttable1} and
\ref{pgr:scripttable2}. All plots drawn are saved as {\em png} files
(or optionally as {\em eps} or {\em gif} files) in the macro
subdirectory \code{plots}, which is created if it does not exist.
The binning and histogram boundaries for some of the histograms
created during the training, testing and evaluation phases are
controlled via the global singleton class \code{TMVA::Config}. They
can be modified as follows:
\begin{codeexample}
\begin{tmvacode}
// Modify settings for the variable plotting
(TMVA::gConfig().GetVariablePlotting()).fTimesRMS = 8.0;
(TMVA::gConfig().GetVariablePlotting()).fNbins1D = 60;
(TMVA::gConfig().GetVariablePlotting()).fNbins2D = 300;
// Modify the binning in the ROC curve (for classification only)
(TMVA::gConfig().GetVariablePlotting()).fNbinsXOfROCCurve = 100;
// For file name settings, modify the struct TMVA::Config::IONames
(TMVA::gConfig().GetIONames()).fWeightFileDir = "myWeightFileDir";
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Modifying global parameter
settings for the plotting of the discriminating input variables. The
values given are the TMVA defaults. Consult the class files
\href{http://tmva.svn.sourceforge.net/viewvc/tmva/trunk/TMVA/src/Config.h?view=markup}{Config.h}
and
\href{http://tmva.svn.sourceforge.net/viewvc/tmva/trunk/TMVA/src/Config.cxx?view=markup}{Config.cxx}
for all available global configuration variables and their default
settings, respectively. Note that the additional parentheses are
mandatory when used in CINT.}
\label{ce:gconfig}
\end{codeexample}
\begin{table}[p]
\begin{programtable}
variables.C & Plots the signal and background MVA input variables (training sample).
The second argument sets the directory, which determines the
preprocessing type (\code{InputVariables_Id} for default identity
transformation, \cf\ Sec.~\ref{sec:variableTransform}). The third
argument is a title, and the fourth argument is a flag whether or not
the input variables served a regression analysis. \\
correlationscatter.C & Plots superimposed scatters and profiles for all pairs of input
variables used during the training phase (separate plots for
signal and background in case of classification). The arguments
are as above. \\
correlations.C & Plots the linear correlation matrices for the input variables in the
training sample (distinguishing signal and background for classification). \\
mvas.C & Plots the classifier response distributions of the test sample for
signal and background. The second argument (\code{HistType=0,1,2,3})
also allows plotting the probability (1) and probability integral transformation (2) distributions of
the classifiers, as well as a comparison of the output distributions
between test and training samples.
Plotting of probability and probability integral transformation requires