DABEST is a package for Data Analysis with Bootstrapped ESTimation
+
Estimation statistics is a simple framework that avoids the pitfalls of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one’s experiment/intervention, as opposed to a false dichotomy engendered by P values.
+
An estimation plot has two key features.
+
+
It presents all datapoints as a swarmplot, which orders each point to display the underlying distribution.
+
It presents the effect size as a bootstrap 95% confidence interval on separate but aligned axes.
+
+
DABEST powers estimationstats.com, allowing everyone access to high-quality estimation plots.
+
+
+
Requirements
+
Python 3.11 is recommended. DABEST has also been tested with Python 3.10 and onwards.
+
The following packages are also required (listed with their minimum versions):
Alpha testers from the Claridge-Chang lab: Sangyu Xu, Xianyuan Zhang, Farhan Mohammad, Jurga Mituzaitė, Stanislav Ott, Tayfun Tumkaya, Jonathan Anns, Nicole Lee and Yishan Mai.
+
DizietAsahi (@DizietAsahi) with PR #86: Fix bugs in slopegraph and reference line keyword parsing.
Mason Malone (@MasonM) with PR #30: Fix plot error when effect size is 0.
+
Matthew Edwards (@mje-nz) with PR #71: Specify dependencies correctly in setup.py.
+
Adam Nekimken (@anekimken) with PR #73: Implement inset axes so estimation plots can be plotted on a pre-determined matplotlib Axes object.
+
Marin Manuel (@MarinManuel) with PR #109: Fixed bug preventing non-string columns from being used.
+
Mike Lotinga (@mlotinga): Helped with addition of jitter and the adjusted p-value calculation, both of which are included in the v2025.03.27 release.
Copyright (c) 2016-2023, Joses W. Ho All rights reserved.
+
Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the disclaimer below) provided that the following conditions are met:
+
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+ * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY’S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+def bootstrap(
+ x1:array, # The data in a one-dimensional array form. Only x1 is required. If x2 is given, the bootstrapped summary difference between the two groups (x2-x1) is computed. NaNs are automatically discarded.
+ x2:array=None, # The data in a one-dimensional array form. Only x1 is required. If x2 is given, the bootstrapped summary difference between the two groups (x2-x1) is computed. NaNs are automatically discarded.
+ paired:bool=False, # Whether or not x1 and x2 are paired samples. If 'paired' is None then the data will not be treated as paired data in the subsequent calculations. If 'paired' is 'baseline', then in each tuple of x, other groups will be paired up with the first group (as control). If 'paired' is 'sequential', then in each tuple of x, each group will be paired up with the previous group (as control).
+ stat_function:callable=mean, # The summary statistic called on data.
+ smoothboot:bool=False, # Taken from seaborn.algorithms.bootstrap. If True, performs a smoothed bootstrap (draws samples from a kernel density estimate).
+ alpha_level:float=0.05, # Denotes the likelihood that the confidence interval produced does not include the true summary statistic. When alpha = 0.05, a 95% confidence interval is produced.
+ reps:int=5000, # Number of bootstrap iterations to perform.
+):
+
+
Computes the summary statistic and a bootstrapped confidence interval.
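The resampling idea behind this function can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DABEST's implementation: it builds a simple percentile interval, it ignores the `paired` and `smoothboot` options, and the helper name `percentile_bootstrap_ci` and its `seed` argument are hypothetical.

```python
import random
import statistics


def percentile_bootstrap_ci(x1, x2=None, stat_function=statistics.mean,
                            alpha_level=0.05, reps=5000, seed=12345):
    """Hypothetical helper: percentile bootstrap CI for stat(x1), or for
    stat(x2) - stat(x1) when x2 is given (unpaired resampling)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Resample each group with replacement, keeping its original size.
        r1 = rng.choices(x1, k=len(x1))
        if x2 is None:
            estimates.append(stat_function(r1))
        else:
            r2 = rng.choices(x2, k=len(x2))
            estimates.append(stat_function(r2) - stat_function(r1))
    estimates.sort()
    # Read the alpha/2 and 1 - alpha/2 quantiles off the sorted estimates.
    low = estimates[int((alpha_level / 2) * reps)]
    high = estimates[min(reps - 1, int((1 - alpha_level / 2) * reps))]
    if x2 is None:
        summary = stat_function(x1)
    else:
        summary = stat_function(x2) - stat_function(x1)
    return summary, low, high


# Illustrative data only.
control = [2.1, 2.5, 1.9, 2.3, 2.8, 2.0]
test = [3.0, 3.4, 2.9, 3.6, 3.1, 3.3]
summary, low, high = percentile_bootstrap_ci(control, test, reps=2000)
```

Fixing the seed makes the interval reproducible from run to run, which is the same reason `summary_ci_1group` exposes a `random_seed` argument.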
+def summary_ci_1group(
+ x:np.array, # A numerical iterable.
+ func, # The function to be applied to x.
+ resamples:int=5000, # The number of bootstrap resamples to be taken of func(x).
+ alpha:float=0.05, # Denotes the likelihood that the confidence interval produced _does not_ include the true summary statistic. When alpha = 0.05, a 95% confidence interval is produced.
+ random_seed:int=12345, # `random_seed` is used to seed the random number generator during bootstrap resampling. This ensures that the confidence intervals reported are replicable.
+ sort_bootstraps:bool=True, args:VAR_POSITIONAL, kwargs:VAR_KEYWORD
+): # `summary`: float.
+ The outcome of func(x).
+`func`: function.
+ The function applied to x.
+`bca_ci_low`: float
+`bca_ci_high`: float.
+ The bias-corrected and accelerated confidence interval, for the
+ given alpha.
+`bootstraps`: array.
+ The bootstraps used to generate the confidence interval.
+ These will be sorted in ascending order if `sort_bootstraps`
+ was True.
+
+
Given an array-like x, returns func(x), and a bootstrap confidence interval of func(x).
+def compute_meandiff_bias_correction(
+ bootstraps, # A numerical iterable, comprising bootstrap resamples of the effect size.
+ effsize, # The effect size for the original sample.
+): # The bias correction value for the given bootstraps
+and effect size.
+
+
Computes the bias correction required for the BCa method of confidence interval construction.
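In the standard BCa construction, the bias correction is the standard-normal quantile of the fraction of bootstrap estimates that fall below the observed effect size. The sketch below shows that textbook definition; the helper name `bias_correction_sketch` is hypothetical, and whether DABEST handles edge cases (a fraction of exactly 0 or 1) the same way is an assumption.

```python
from statistics import NormalDist


def bias_correction_sketch(bootstraps, effsize):
    """Hypothetical helper: BCa bias correction z0 for a set of bootstraps."""
    n_below = sum(1 for b in bootstraps if b < effsize)
    proportion = n_below / len(bootstraps)
    # The normal quantile function is only defined on the open interval (0, 1).
    if not 0.0 < proportion < 1.0:
        raise ValueError("fraction of bootstraps below effsize must be in (0, 1)")
    return NormalDist().inv_cdf(proportion)


# Half of the bootstraps lie below the observed value, so no correction is needed.
z0_centered = bias_correction_sketch([0.1, 0.2, 0.3, 0.4], 0.25)
# Three quarters lie below it, so the correction is positive.
z0_shifted = bias_correction_sketch([1, 2, 3, 4], 3.5)
```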
+def compute_delta2_bootstrapped_diff(
+ x1:np.ndarray, # Control group 1
+ x2:np.ndarray, # Test group 1
+ x3:np.ndarray, # Control group 2
+ x4:np.ndarray, # Test group 2
+ is_paired:str=None, resamples:int=5000, random_seed:int=12345, proportional:bool=False
+)->tuple:
+
+
Bootstraps the effect size deltas’ g or proportional delta-delta
DABEST v2025.03.27
+==================
+
+Good morning!
+The current time is Tue Mar 25 10:08:38 2025.
+
+The unpaired mean difference between control and test is 0.5 [95%CI 0.00172, 1.04].
+The p-value of the two-sided permutation t-test is 0.0758, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`
+
+
+
This is simply the mean of the control group subtracted from the mean of the test group.
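The mean difference, and the label-reshuffling p-value mentioned in the report above, can be sketched as follows. This is an illustration of a generic two-sided permutation test on the mean difference, not DABEST's own code; the helper name `permutation_pvalue` and its `seed` argument are hypothetical.

```python
import random
import statistics


def permutation_pvalue(control, test, reshuffles=5000, seed=12345):
    """Hypothetical helper: mean difference and a two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = statistics.mean(test) - statistics.mean(control)
    pooled = list(control) + list(test)
    n_control = len(control)
    extreme = 0
    for _ in range(reshuffles):
        # Reassign the control/test labels at random.
        rng.shuffle(pooled)
        diff = (statistics.mean(pooled[n_control:])
                - statistics.mean(pooled[:n_control]))
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / reshuffles


# Illustrative data only.
control = [1.0, 1.2, 0.9, 1.1, 1.3]
test = [1.6, 1.4, 1.8, 1.5, 1.7]
observed, p_value = permutation_pvalue(control, test)
```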
/Users/jonathananns/GitHub/DABEST-python/dabest/_stats_tools/effsize.py:82: UserWarning: Using median as the statistic in bootstrapping may result in a biased estimate and cause problems with BCa confidence intervals. Consider using a different statistic, such as the mean.
+When plotting, please consider using percetile confidence intervals by specifying `ci_type='pct'`. For detailed information, refer to https://github.com/ACCLAB/DABEST-python/issues/129
+
+ warnings.warn(message=mes1+mes2, category=UserWarning)
+
+
+
DABEST v2025.03.27
+==================
+
+Good morning!
+The current time is Tue Mar 25 10:08:39 2025.
+
+The unpaired median difference between control and test is 0.5 [95%CI -0.0401, 1.04].
+The p-value of the two-sided permutation t-test is 0.103, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+To get the results of all valid statistical tests, use `.median_diff.statistical_tests`
+
+
+
This is the median difference between the control group and the test group.
+
If the comparison(s) are unpaired, median_diff is computed with the following equation:
Using median difference as the statistic in bootstrapping may result in a biased estimate and cause problems with BCa confidence intervals. Consider using mean difference instead.
+
When plotting, consider using percentile confidence intervals instead of BCa confidence intervals by specifying ci_type = 'percentile' in .plot().
+
For detailed information, please refer to Issue 129.
DABEST v2025.03.27
+==================
+
+Good morning!
+The current time is Tue Mar 25 10:08:39 2025.
+
+The unpaired Cohen's d between control and test is 0.471 [95%CI -0.0405, 0.973].
+The p-value of the two-sided permutation t-test is 0.0758, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+To get the results of all valid statistical tests, use `.cohens_d.statistical_tests`
+
+
+
Cohen’s d is the mean of the control group subtracted from the mean of the test group, divided by a standard deviation computed from the two groups.
+
If paired is None, then the comparison(s) are unpaired; otherwise the comparison(s) are paired.
+
If the comparison(s) are unpaired, Cohen’s d is computed with the following equation:
+
\[d = \frac{\overline{x}_{Test} - \overline{x}_{Control}} {\text{pooled standard deviation}}\]
+
For paired comparisons, Cohen’s d is given by
+
\[d = \frac{\overline{x}_{Test} - \overline{x}_{Control}} {\text{average standard deviation}}\]
+
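where \(\overline{x}\) is the mean of the respective group of observations, \({Var}_{x}\) denotes the variance of that group, \(n\) is the size of that group,
+
\[\text{pooled standard deviation} = \sqrt{ \frac{(n_{control} - 1) \cdot {Var}_{control} + (n_{test} - 1) \cdot {Var}_{test}} {n_{control} + n_{test} - 2}}\]
+
and
+
\[\text{average standard deviation} = \sqrt{ \frac{{Var}_{control} + {Var}_{test}} {2}}\]
+
The sample variance (and standard deviation) uses N-1 degrees of freedom. This is an application of Bessel’s correction, and yields the unbiased sample variance.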
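A minimal sketch of both variants, using the pooled standard deviation formula from the `two_group_difference` docstring and the average standard deviation given above. The helper name `cohens_d_sketch` is hypothetical; `statistics.variance` already applies Bessel's correction (N-1).

```python
import math
import statistics


def cohens_d_sketch(control, test, paired=False):
    """Hypothetical helper: Cohen's d for unpaired or paired comparisons."""
    mean_diff = statistics.mean(test) - statistics.mean(control)
    var_control = statistics.variance(control)  # N-1 degrees of freedom
    var_test = statistics.variance(test)
    if paired:
        # Average standard deviation.
        sd = math.sqrt((var_control + var_test) / 2)
    else:
        # Pooled standard deviation, weighted by each group's degrees of freedom.
        n_c, n_t = len(control), len(test)
        sd = math.sqrt(((n_c - 1) * var_control + (n_t - 1) * var_test)
                       / (n_c + n_t - 2))
    return mean_diff / sd


# Illustrative data only.
control = [1, 2, 3, 4, 5]
test = [3, 4, 5, 6, 7]
d_unpaired = cohens_d_sketch(control, test)
d_paired = cohens_d_sketch(control, test, paired=True)
```

With equal group sizes the two standard deviations coincide, so both calls return the same value here; they diverge once the groups differ in size.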
DABEST v2025.03.27
+==================
+
+Good morning!
+The current time is Tue Mar 25 10:08:41 2025.
+
+The unpaired Cohen's h between control and test is 0.0 [95%CI -0.563, 0.474].
+The p-value of the two-sided permutation t-test is 0.799, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+To get the results of all valid statistical tests, use `.cohens_h.statistical_tests`
+
+
+
Cohen’s h uses the information of proportion in the control and test groups to calculate the distance between two proportions.
+
It can be used to describe the difference between two proportions as “small”, “medium”, or “large”.
+
It can be used to determine if the difference between two proportions is “meaningful”.
+
A directional Cohen’s h is computed with the following equation:
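In its standard definition, the directional Cohen's h is the difference between the arcsine-transformed proportions, h = 2·arcsin(√p_test) − 2·arcsin(√p_control). The sketch below applies that definition to binary (0/1) data; the helper name `cohens_h_sketch` is hypothetical, and it is an illustration rather than DABEST's implementation.

```python
import math


def cohens_h_sketch(control, test):
    """Hypothetical helper: directional Cohen's h for two series of 0s and 1s."""
    p_control = sum(control) / len(control)
    p_test = sum(test) / len(test)
    # Arcsine transformation of each proportion.
    phi_control = 2 * math.asin(math.sqrt(p_control))
    phi_test = 2 * math.asin(math.sqrt(p_test))
    return phi_test - phi_control


# Illustrative data only: 25% vs 75% "successes".
control = [0, 0, 0, 1]
test = [1, 1, 1, 0]
h = cohens_h_sketch(control, test)
```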
DABEST v2025.03.27
+==================
+
+Good morning!
+The current time is Tue Mar 25 10:08:41 2025.
+
+The unpaired Hedges' g between control and test is 0.465 [95%CI -0.04, 0.96].
+The p-value of the two-sided permutation t-test is 0.0758, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+To get the results of all valid statistical tests, use `.hedges_g.statistical_tests`
+
+
+
Hedges’ g is cohens_d corrected for bias via multiplication with the following correction factor:
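The correction factor is spelled out in the `two_group_difference` docstring as J(n) = gamma(n/2) / (sqrt(n/2) · gamma((n − 1)/2)), with n = n1 + n2 − 2. A minimal sketch of that factor follows; the helper name `hedges_correction_factor` is hypothetical, and evaluating it in log space is a choice made here for numerical safety, not something the source prescribes.

```python
import math


def hedges_correction_factor(n1, n2):
    """Hypothetical helper: J(n) for n = n1 + n2 - 2 (requires n >= 2)."""
    n = n1 + n2 - 2
    # Log space avoids overflow in gamma() for large samples.
    log_j = (math.lgamma(n / 2)
             - 0.5 * math.log(n / 2)
             - math.lgamma((n - 1) / 2))
    return math.exp(log_j)


j_small = hedges_correction_factor(2, 2)    # n = 2
j_large = hedges_correction_factor(10, 10)  # n = 18
```

The factor is below 1 and approaches 1 as the samples grow, so Hedges' g shrinks Cohen's d most strongly for small samples.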
DABEST v2025.03.27
+==================
+
+Good morning!
+The current time is Tue Mar 25 10:08:41 2025.
+
+The unpaired Cliff's delta between control and test is 0.28 [95%CI -0.0111, 0.544].
+The p-value of the two-sided permutation t-test is 0.061, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+To get the results of all valid statistical tests, use `.cliffs_delta.statistical_tests`
+
+
+
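Cliff’s delta is a measure of ordinal dominance, i.e. how often the values from the test sample are larger than values from the control sample.
+
\[\text{Cliff's delta} = \frac{\#({x}_{test} > {x}_{control}) - \#({x}_{test} < {x}_{control})} {{N}_{Test} \times {N}_{Control}}\]
+
where \(\#\) denotes the number of times a value from the test sample exceeds (or is lesser than) values in the control sample.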
+
Cliff’s delta ranges from -1 to 1; it can also be thought of as a measure of the degree of overlap between the two samples. An attractive aspect of this effect size is that it does not make any assumptions about the underlying distributions that the samples were drawn from.
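The pairwise counting behind Cliff's delta can be sketched directly. This is a brute-force illustration of the standard definition (every test value compared against every control value); the helper name `cliffs_delta_sketch` is hypothetical.

```python
def cliffs_delta_sketch(control, test):
    """Hypothetical helper: Cliff's delta by exhaustive pairwise comparison."""
    greater = sum(1 for t in test for c in control if t > c)
    lesser = sum(1 for t in test for c in control if t < c)
    # Ties contribute to neither count but still count as pairs.
    return (greater - lesser) / (len(test) * len(control))


# Illustrative data only.
delta_overlap = cliffs_delta_sketch([1, 2, 3], [2, 3, 4])
delta_separate = cliffs_delta_sketch([1, 2, 3], [4, 5, 6])
```

Completely separated samples give 1 (or -1 in the other direction), while heavily overlapping samples give values near 0.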
DABEST v2025.03.27
+==================
+
+Good morning!
+The current time is Tue Mar 25 10:08:42 2025.
+
+The unpaired Hedges' g between W Placebo and M Placebo is 1.74 [95%CI 1.09, 2.33].
+The p-value of the two-sided permutation t-test is 0.0, calculated for legacy purposes only.
+
+The unpaired Hedges' g between W Drug and M Drug is 1.33 [95%CI 0.632, 1.98].
+The p-value of the two-sided permutation t-test is 0.0, calculated for legacy purposes only.
+
+The delta g between Placebo and Drug is -0.651 [95%CI -1.53, 0.21].
+The p-value of the two-sided permutation t-test is 0.0694, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+To get the results of all valid statistical tests, use `.hedges_g.statistical_tests`
+
+
+
Delta g is an effect size that applies only to experiments with a 2-by-2 arrangement, where two independent variables, A and B, each have two categorical values, 1 and 2. It calculates hedges_g for delta-delta statistics.
A class to compute and store the delta-delta statistics for experiments with a 2-by-2 arrangement where two independent variables, A and B, each have two categorical values, 1 and 2. The data is divided into two pairs of two groups, and a primary delta is first calculated as the mean difference between each of the pairs:
where \(\overline{X}_{A_{i}, B_{j}}\) is the mean of the sample with A = i and B = j, \(\Delta\) is the mean difference between two samples.
+
A delta-delta value is then calculated as the mean difference between the two primary deltas:
+
\[\Delta_{\Delta} = \Delta_{2} - \Delta_{1}\]
+
and a delta g value is calculated as the mean difference between the two primary deltas divided by the standard deviation of the delta-delta value, which is calculated from a pooled variance of the 4 samples:
where \(s\) is the standard deviation and \(n\) is the sample size.
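The description above can be sketched in a few lines. Several points are assumptions made for illustration: the primary deltas are taken within each level of B (A = 2 minus A = 1), the pooled variance weights each sample by its degrees of freedom, and no further small-sample correction is applied. The helper name `delta_delta_sketch` is hypothetical, and DABEST's own computation may differ in detail.

```python
import math
import statistics


def delta_delta_sketch(a1_b1, a2_b1, a1_b2, a2_b2):
    """Hypothetical helper: delta-delta and delta g for a 2-by-2 design."""
    # Primary deltas: mean difference within each level of B (assumed pairing).
    delta_1 = statistics.mean(a2_b1) - statistics.mean(a1_b1)
    delta_2 = statistics.mean(a2_b2) - statistics.mean(a1_b2)
    delta_delta = delta_2 - delta_1
    # Pooled variance of the four samples, weighted by degrees of freedom.
    groups = (a1_b1, a2_b1, a1_b2, a2_b2)
    pooled_var = (sum((len(g) - 1) * statistics.variance(g) for g in groups)
                  / sum(len(g) - 1 for g in groups))
    delta_g = delta_delta / math.sqrt(pooled_var)
    return delta_delta, delta_g


# Illustrative data only: every sample has variance 1.
dd, dg = delta_delta_sketch([1, 2, 3], [2, 3, 4], [1, 2, 3], [4, 5, 6])
```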
+
+
Example: delta-delta
+
+
import numpy as np
+import pandas as pd
+from scipy.stats import norm
+import dabest
+
+np.random.seed(9999)  # Fix the seed so the results are replicable.
+N = 20
+# Create samples
+y = norm.rvs(loc=3, scale=0.4, size=N*4)
+y[N:2*N] = y[N:2*N] + 1
+y[2*N:3*N] = y[2*N:3*N] - 0.5
+# Add a `Treatment` column
+t1 = np.repeat('Placebo', N*2).tolist()
+t2 = np.repeat('Drug', N*2).tolist()
+treatment = t1 + t2
+# Add a `Rep` column as the first variable for the 2 replicates of experiments done
+rep = []
+for i in range(N*2):
+    rep.append('Rep1')
+    rep.append('Rep2')
+# Add a `Genotype` column as the second variable
+wt = np.repeat('W', N).tolist()
+mt = np.repeat('M', N).tolist()
+wt2 = np.repeat('W', N).tolist()
+mt2 = np.repeat('M', N).tolist()
+genotype = wt + mt + wt2 + mt2
+# Add an `id` column for paired data plotting.
+id = list(range(0, N*2))
+id_col = id + id
+# Combine all columns into a DataFrame.
+df_delta2 = pd.DataFrame({'ID': id_col,
+                          'Rep': rep,
+                          'Genotype': genotype,
+                          'Treatment': treatment,
+                          'Y': y
+                          })
+unpaired_delta2 = dabest.load(data=df_delta2, x=["Genotype", "Genotype"], y="Y", delta2=True, experiment="Treatment")
+unpaired_delta2.mean_diff.plot();
+
+
DABEST v2025.03.27
+==================
+
+Good afternoon!
+The current time is Mon Sep 1 16:03:47 2025.
+
+The weighted-average unpaired mean differences is 0.0336 [95%CI -0.136, 0.236].
+The p-value of the two-sided permutation t-test is 0.736, calculated for legacy purposes only.
+
+5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated.
+Any p-value reported is the probability of observing the effect size (or greater),
+assuming the null hypothesis of zero difference is true.
+For each p-value, 5000 reshuffles of the control and test labels were performed.
+
+
+
As of version 2023.02.14, weighted delta can only be calculated for mean difference, and not for standardized measures such as Cohen’s d.
+
Details about the calculated weighted delta are accessed as attributes of the mini_meta class. See the minimetadelta for details on usage.
+
Refer to Chapter 10 of the Cochrane handbook for further information on meta-analysis: https://training.cochrane.org/handbook/current/chapter-10
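The generic inverse-variance method covered in that handbook chapter weights each estimate by the reciprocal of its variance. The sketch below shows only that general scheme; the helper name `inverse_variance_weighted_mean` is hypothetical, and how DABEST derives the variance of each mean difference for its weighted delta is not shown here and should not be inferred from this example.

```python
def inverse_variance_weighted_mean(effects, variances):
    """Hypothetical helper: generic inverse-variance weighted average."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Illustrative numbers only: the more precise estimate (variance 0.01)
# pulls the weighted average towards 0.2.
weighted = inverse_variance_weighted_mean([0.2, 0.4], [0.01, 0.04])
```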
+def two_group_difference(
+ control:list|tuple| np.ndarray, # Accepts lists, tuples, or numpy ndarrays of numeric types.
+ test:list|tuple| np.ndarray, # Accepts lists, tuples, or numpy ndarrays of numeric types.
+ is_paired:NoneType=None, # If not None, returns the paired Cohen's d
+ effect_size:str='mean_diff', # Any one of the following effect sizes: ["mean_diff", "median_diff", "cohens_d", "hedges_g", "cliffs_delta"]
+)->float: # The desired effect size.
+
+
Computes the following metrics for control and test:
+
- Unstandardized mean difference
+- Standardized mean differences (paired or unpaired)
+ * Cohen's d
+ * Hedges' g
+- Median difference
+- Cliff's Delta
+- Cohen's h (distance between two proportions)
mean_diff: This is simply the mean of `control` subtracted from
+    the mean of `test`.
+
+cohens_d: This is the mean of control subtracted from the
+    mean of test, divided by the pooled standard deviation
+    of control and test. The pooled SD is computed as:
+
+               (n1 - 1) * var(control) + (n2 - 1) * var(test)
+        sqrt ( ---------------------------------------------- )
+                               (n1 + n2 - 2)
+
+    where n1 and n2 are the sizes of control and test
+    respectively.
+
+hedges_g: This is Cohen's d corrected for bias via multiplication
+    with the following correction factor:
+
+                         gamma(n/2)
+        J(n) = ------------------------------
+               sqrt(n/2) * gamma((n - 1) / 2)
+
+    where n = (n1 + n2 - 2).
+
+median_diff: This is the median of `control` subtracted from the
+    median of `test`.
+def cohens_d(
+ control:list|tuple| np.ndarray, test:list|tuple| np.ndarray,
+ is_paired:str=None, # If not None, the paired Cohen's d is returned.
+)->float:
+
+
Computes Cohen’s d for test vs. control.
The sample variance (and standard deviation) uses N-1 degrees of freedom. This is an application of Bessel’s correction, and yields the unbiased sample variance.
Computes Cohen’s h for test vs. control.
+
Notes:
+
+
The input data are assumed to be binary, i.e. a series of 0s and 1s, accompanied by a dict mapping the 0s and 1s to the actual labels, e.g. {1: “Smoker”, 0: “Non-smoker”}.
Computes Hedges’ g for test vs. control. It first computes Cohen’s d, then calculates a correction factor based on the total degrees of freedom using the gamma function.