
Complete Results

These results are based on the Stanley (2017) data-generating mechanism with a total of 324 conditions.

Average Performance

Method performance measures are aggregated across all simulated conditions to provide an overall impression of method performance. However, keep in mind that a method with a high overall ranking is not necessarily the “best” method for a particular application. To select a suitable method, also consider the non-aggregated performance measures in the conditions most relevant to your application.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RoBMA (PSMA) 4.975 1 RoBMA (PSMA) 5.293
2 AK (AK1) 6.765 2 AK (AK1) 6.707
3 SM (3PSM) 7.191 3 SM (3PSM) 7.105
4 FMA (default) 8.185 4 FMA (default) 8.191
5 WLS (default) 8.198 5 WLS (default) 8.204
6 WAAPWLS (default) 8.324 6 WAAPWLS (default) 8.380
7 puniform (star) 9.414 7 puniform (star) 9.364
8 SM (4PSM) 9.448 8 SM (4PSM) 9.435
9 WILS (default) 9.688 9 WILS (default) 9.627
10 RMA (default) 10.012 10 RMA (default) 10.031
11 trimfill (default) 10.065 11 trimfill (default) 10.093
12 PEESE (default) 10.367 12 PEESE (default) 10.398
13 PETPEESE (default) 10.401 12 PETPEESE (default) 10.398
14 EK (default) 11.491 14 AK (AK2) 11.429
15 PET (default) 11.549 15 EK (default) 11.481
16 AK (AK2) 11.636 16 PET (default) 11.540
17 pcurve (default) 12.568 17 pcurve (default) 12.577
18 puniform (default) 13.244 18 puniform (default) 13.219
19 mean (default) 13.886 19 mean (default) 13.935

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.
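The relationship between RMSE, bias, and empirical SE can be sketched in plain Python (an illustration with made-up numbers, not the simulation code): with the population (1/n) standard deviation, RMSE² decomposes exactly into bias² plus empirical SE².

```python
import math

def bias(estimates, true_effect):
    # average signed deviation of the estimates from the true effect
    return sum(e - true_effect for e in estimates) / len(estimates)

def empirical_se(estimates):
    # population (1/n) standard deviation of the estimates across runs
    mean = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))

def rmse(estimates, true_effect):
    # square root of the average squared deviation from the true effect
    return math.sqrt(sum((e - true_effect) ** 2 for e in estimates) / len(estimates))

estimates = [0.35, 0.42, 0.28, 0.51, 0.39]  # hypothetical meta-analytic estimates
true_effect = 0.30

# RMSE combines bias and empirical SE: RMSE^2 = bias^2 + empirical SE^2
lhs = rmse(estimates, true_effect) ** 2
rhs = bias(estimates, true_effect) ** 2 + empirical_se(estimates) ** 2
assert abs(lhs - rhs) < 1e-12
```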

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 SM (3PSM) 6.451 1 SM (3PSM) 6.386
2 SM (4PSM) 6.880 2 SM (4PSM) 6.941
3 AK (AK1) 7.562 3 AK (AK1) 7.583
4 RoBMA (PSMA) 7.580 4 RoBMA (PSMA) 7.765
5 puniform (star) 8.340 5 puniform (star) 8.324
6 PETPEESE (default) 8.750 6 PETPEESE (default) 8.784
7 EK (default) 8.941 7 EK (default) 8.954
8 PET (default) 8.954 8 PET (default) 8.966
9 WAAPWLS (default) 9.059 9 WAAPWLS (default) 9.154
10 PEESE (default) 9.583 10 PEESE (default) 9.636
11 WLS (default) 10.105 11 WLS (default) 10.207
12 FMA (default) 10.120 12 FMA (default) 10.222
13 WILS (default) 11.210 13 AK (AK2) 10.590
14 AK (AK2) 11.414 14 WILS (default) 11.213
15 puniform (default) 11.864 15 puniform (default) 11.870
16 RMA (default) 11.867 16 RMA (default) 11.895
17 trimfill (default) 12.164 17 trimfill (default) 12.244
18 pcurve (default) 12.806 18 pcurve (default) 12.840
19 mean (default) 14.086 19 mean (default) 14.160

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RMA (default) 3.380 1 RMA (default) 3.238
2 WLS (default) 3.975 2 WLS (default) 3.821
3 FMA (default) 3.981 3 FMA (default) 3.827
4 AK (AK1) 5.694 4 AK (AK1) 5.833
5 WAAPWLS (default) 7.241 5 WAAPWLS (default) 7.167
6 trimfill (default) 7.386 6 trimfill (default) 7.343
7 RoBMA (PSMA) 7.664 7 RoBMA (PSMA) 7.994
8 mean (default) 8.235 8 mean (default) 8.157
9 SM (3PSM) 9.802 9 SM (3PSM) 9.793
10 pcurve (default) 11.000 10 pcurve (default) 10.920
11 PEESE (default) 11.299 11 PEESE (default) 11.293
12 WILS (default) 11.627 12 WILS (default) 11.611
13 puniform (default) 12.009 13 puniform (default) 11.951
14 puniform (star) 12.043 14 puniform (star) 12.046
15 AK (AK2) 12.917 15 AK (AK2) 13.318
16 SM (4PSM) 13.485 16 SM (4PSM) 13.457
17 PETPEESE (default) 14.052 17 PETPEESE (default) 14.068
18 EK (default) 15.840 18 EK (default) 15.815
19 PET (default) 15.886 19 PET (default) 15.864

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the average empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RoBMA (PSMA) 4.000 1 RoBMA (PSMA) 4.296
2 SM (3PSM) 5.327 2 SM (3PSM) 5.435
3 puniform (star) 6.123 3 puniform (star) 6.160
4 AK (AK1) 6.269 4 AK (AK1) 6.210
5 SM (4PSM) 7.309 5 SM (4PSM) 6.994
6 WAAPWLS (default) 9.037 6 WAAPWLS (default) 9.182
7 WLS (default) 9.741 7 WLS (default) 9.796
8 RMA (default) 9.802 8 RMA (default) 9.824
9 EK (default) 9.920 9 EK (default) 9.870
10 trimfill (default) 10.235 10 trimfill (default) 10.302
11 PETPEESE (default) 10.389 11 AK (AK2) 10.367
12 PEESE (default) 10.438 12 PETPEESE (default) 10.392
13 AK (AK2) 10.701 13 PEESE (default) 10.500
14 PET (default) 10.997 14 PET (default) 10.938
15 puniform (default) 11.704 15 puniform (default) 11.719
16 WILS (default) 11.926 16 WILS (default) 11.867
17 FMA (default) 12.056 17 FMA (default) 12.065
18 mean (default) 14.457 18 mean (default) 14.509
19 pcurve (default) 19.000 19 pcurve (default) 19.000

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.
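A common definition of this measure, and presumably the one used here, is the Gneiting–Raftery interval score for a central (1 − α) interval: the interval width plus a penalty of (2/α) times the distance by which the interval misses the true value. A minimal sketch:

```python
def interval_score(lower, upper, true_value, alpha=0.05):
    # width term, plus a penalty scaled by 2/alpha whenever the true
    # value falls outside the interval (Gneiting-Raftery interval score)
    score = upper - lower
    if true_value < lower:
        score += (2 / alpha) * (lower - true_value)
    elif true_value > upper:
        score += (2 / alpha) * (true_value - upper)
    return score

# an interval that covers the truth is scored by its width alone;
# missing the truth adds a penalty on top of the width
assert interval_score(0.0, 0.4, 0.2) < interval_score(0.0, 0.4, 0.5)
```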

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.902 1 RoBMA (PSMA) 0.896
2 SM (4PSM) 0.845 2 SM (4PSM) 0.833
3 AK (AK2) 0.801 3 AK (AK2) 0.783
4 puniform (star) 0.765 4 puniform (star) 0.765
5 SM (3PSM) 0.743 5 SM (3PSM) 0.736
6 EK (default) 0.715 6 EK (default) 0.715
7 PET (default) 0.688 7 PET (default) 0.688
8 PETPEESE (default) 0.681 8 PETPEESE (default) 0.681
9 AK (AK1) 0.608 9 AK (AK1) 0.607
10 puniform (default) 0.541 10 puniform (default) 0.542
11 PEESE (default) 0.524 11 PEESE (default) 0.524
12 WAAPWLS (default) 0.510 12 WAAPWLS (default) 0.510
13 trimfill (default) 0.497 13 trimfill (default) 0.497
14 WILS (default) 0.494 14 WILS (default) 0.494
15 RMA (default) 0.493 15 RMA (default) 0.493
16 WLS (default) 0.481 16 WLS (default) 0.481
17 FMA (default) 0.380 17 FMA (default) 0.380
18 mean (default) 0.366 18 mean (default) 0.366
19 pcurve (default) NaN 19 pcurve (default) NaN

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 FMA (default) 2.373 1 FMA (default) 2.346
2 WILS (default) 3.188 2 WILS (default) 3.139
3 WLS (default) 3.969 3 WLS (default) 3.904
4 WAAPWLS (default) 5.972 4 WAAPWLS (default) 5.883
5 trimfill (default) 7.105 5 trimfill (default) 7.015
6 RMA (default) 7.148 6 RMA (default) 7.049
7 mean (default) 8.312 7 mean (default) 8.309
8 RoBMA (PSMA) 9.043 8 RoBMA (PSMA) 9.037
9 PEESE (default) 9.074 9 PEESE (default) 9.096
10 AK (AK1) 9.191 10 AK (AK1) 9.201
11 SM (3PSM) 10.608 11 SM (3PSM) 10.682
12 puniform (default) 11.438 12 puniform (default) 11.404
13 PETPEESE (default) 11.469 13 PETPEESE (default) 11.515
14 puniform (star) 12.509 14 puniform (star) 12.688
15 AK (AK2) 13.698 15 AK (AK2) 13.864
16 SM (4PSM) 14.080 16 SM (4PSM) 13.898
17 PET (default) 14.991 17 PET (default) 15.056
18 EK (default) 16.259 18 EK (default) 16.343
19 pcurve (default) 19.000 19 pcurve (default) 19.000

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of CI width values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 RoBMA (PSMA) 5.355 1 RoBMA (PSMA) 4.807
2 AK (AK2) 2.939 2 AK (AK2) 2.293
3 SM (4PSM) 2.219 3 SM (4PSM) 2.238
4 puniform (star) 2.163 4 puniform (star) 2.163
5 SM (3PSM) 1.998 5 SM (3PSM) 1.984
6 EK (default) 1.840 6 EK (default) 1.840
7 PET (default) 1.838 7 PET (default) 1.838
8 PETPEESE (default) 1.834 8 PETPEESE (default) 1.834
9 puniform (default) 1.728 9 puniform (default) 1.667
10 AK (AK1) 1.493 10 AK (AK1) 1.445
11 WILS (default) 1.370 11 WILS (default) 1.370
12 RMA (default) 1.161 12 RMA (default) 1.161
13 PEESE (default) 1.109 13 PEESE (default) 1.109
14 WAAPWLS (default) 1.056 14 WAAPWLS (default) 1.056
15 WLS (default) 1.019 15 WLS (default) 1.019
16 trimfill (default) 0.953 16 trimfill (default) 0.953
17 mean (default) 0.849 17 mean (default) 0.849
18 FMA (default) 0.800 18 FMA (default) 0.800
19 pcurve (default) NaN 19 pcurve (default) NaN

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.
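As a sketch (illustrative numbers, not from the study), the positive likelihood ratio is the probability of a significant result under the alternative (power) divided by the probability of a significant result under the null (type I error rate):

```python
import math

def log_positive_lr(power, type1_error):
    # LR+ = P(significant | H1) / P(significant | H0)
    return math.log(power / type1_error)

# e.g. 80% power at a 5% type I error rate: LR+ = 16, log LR+ > 0,
# so a significant result substantially raises the odds of H1
assert log_positive_lr(0.80, 0.05) > 0
```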

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 SM (3PSM) -4.898 1 AK (AK2) -6.215
2 RoBMA (PSMA) -4.722 2 SM (3PSM) -4.947
3 PETPEESE (default) -4.721 3 SM (4PSM) -4.828
4 AK (AK2) -4.622 4 RoBMA (PSMA) -4.737
5 EK (default) -4.570 5 PETPEESE (default) -4.721
6 PET (default) -4.570 6 EK (default) -4.570
7 SM (4PSM) -4.438 7 PET (default) -4.570
8 puniform (star) -4.287 8 puniform (star) -4.287
9 PEESE (default) -3.817 9 PEESE (default) -3.817
10 WILS (default) -3.661 10 WILS (default) -3.661
11 puniform (default) -3.597 11 puniform (default) -3.603
12 trimfill (default) -3.502 12 trimfill (default) -3.502
13 AK (AK1) -3.458 13 AK (AK1) -3.461
14 WLS (default) -3.393 14 WLS (default) -3.393
15 RMA (default) -3.312 15 RMA (default) -3.312
16 FMA (default) -3.220 16 FMA (default) -3.220
17 WAAPWLS (default) -3.095 17 WAAPWLS (default) -3.095
18 mean (default) -2.700 18 mean (default) -2.700
19 pcurve (default) NaN 19 pcurve (default) NaN

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.022 1 RoBMA (PSMA) 0.031
2 AK (AK2) 0.082 2 SM (4PSM) 0.125
3 SM (4PSM) 0.112 3 AK (AK2) 0.156
4 PET (default) 0.237 4 PET (default) 0.237
5 EK (default) 0.237 5 EK (default) 0.237
6 puniform (star) 0.242 6 puniform (star) 0.242
7 PETPEESE (default) 0.269 7 PETPEESE (default) 0.269
8 SM (3PSM) 0.282 8 SM (3PSM) 0.288
9 WILS (default) 0.373 9 WILS (default) 0.373
10 PEESE (default) 0.541 10 PEESE (default) 0.541
11 puniform (default) 0.544 11 puniform (default) 0.542
12 AK (AK1) 0.556 12 AK (AK1) 0.557
13 WAAPWLS (default) 0.573 13 WAAPWLS (default) 0.573
14 RMA (default) 0.603 14 RMA (default) 0.603
15 WLS (default) 0.612 15 WLS (default) 0.612
16 trimfill (default) 0.615 16 trimfill (default) 0.615
17 mean (default) 0.688 17 mean (default) 0.688
18 FMA (default) 0.720 18 FMA (default) 0.720
19 pcurve (default) NaN 19 pcurve (default) NaN

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 FMA (default) 0.990 1 FMA (default) 0.990
2 WLS (default) 0.981 2 WLS (default) 0.981
3 RMA (default) 0.980 3 RMA (default) 0.980
4 trimfill (default) 0.979 4 trimfill (default) 0.979
5 AK (AK1) 0.970 5 AK (AK2) 0.977
6 mean (default) 0.965 6 AK (AK1) 0.970
7 WAAPWLS (default) 0.956 7 mean (default) 0.965
8 PEESE (default) 0.951 8 WAAPWLS (default) 0.956
9 AK (AK2) 0.949 9 PEESE (default) 0.951
10 SM (3PSM) 0.936 10 SM (3PSM) 0.945
11 puniform (default) 0.911 11 puniform (default) 0.913
12 WILS (default) 0.898 12 SM (4PSM) 0.900
13 PETPEESE (default) 0.884 13 WILS (default) 0.898
14 puniform (star) 0.876 14 PETPEESE (default) 0.884
15 SM (4PSM) 0.869 15 puniform (star) 0.876
16 EK (default) 0.852 16 EK (default) 0.852
17 PET (default) 0.851 17 PET (default) 0.851
18 RoBMA (PSMA) 0.832 18 RoBMA (PSMA) 0.834
19 pcurve (default) NaN 19 pcurve (default) NaN

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Conditional on Method Convergence)

The results below are conditional on method convergence. Note that the methods might differ in convergence rate and are therefore not compared on the same data sets.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing empirical standard error across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Raincloud plot showing interval scores across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Replacement in Case of Non-Convergence)

The results below incorporate method replacement to handle non-convergence. If a method fails to converge, its results are replaced with the results from a simpler method (e.g., random-effects meta-analysis without publication bias adjustment). This emulates what a data analyst may do in practice in case a method does not converge. However, note that these results do not correspond to “pure” method performance as they might combine multiple different methods. See Method Replacement Strategy for details of the method replacement specification.
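The replacement logic can be sketched as a simple fallback chain. The function and method names below are hypothetical illustrations, not the study's actual implementation; see Method Replacement Strategy for the real specification.

```python
def estimate_with_fallback(data, methods):
    """Return the result of the first method in an ordered list that converges.

    `methods` is an ordered list of fitting functions (hypothetical API);
    each returns a result dict, or None on non-convergence.
    """
    for fit in methods:
        result = fit(data)
        if result is not None:  # treat None as non-convergence
            return result
    return None  # no method in the chain converged

# hypothetical fitters: the bias-adjusted method fails to converge,
# so its result is replaced by the simpler random-effects model
def fit_selection_model(data):
    return None  # pretend the selection model did not converge

def fit_random_effects(data):
    return {"method": "RMA", "estimate": sum(data) / len(data)}

result = estimate_with_fallback([0.2, 0.4, 0.3],
                                [fit_selection_model, fit_random_effects])
assert result["method"] == "RMA"
```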

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing empirical standard error across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Raincloud plot showing interval scores across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

Subset: Standardized Mean Difference Effect Sizes

These results are based on the Stanley (2017) data-generating mechanism with a total of 1 condition.

Average Performance

Method performance measures are aggregated across all simulated conditions to provide an overall impression of method performance. However, keep in mind that a method with a high overall ranking is not necessarily the “best” method for a particular application. To select a suitable method, also consider the non-aggregated performance measures in the conditions most relevant to your application.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.057 1 RoBMA (PSMA) 0.057
2 AK (AK2) 0.067 2 AK (AK2) 0.083
3 WILS (default) 0.095 3 WILS (default) 0.095
4 PEESE (default) 0.105 4 PEESE (default) 0.105
5 WAAPWLS (default) 0.108 5 WAAPWLS (default) 0.108
6 PETPEESE (default) 0.109 6 PETPEESE (default) 0.109
7 trimfill (default) 0.111 7 trimfill (default) 0.111
8 FMA (default) 0.112 8 FMA (default) 0.112
8 WLS (default) 0.112 8 WLS (default) 0.112
10 EK (default) 0.119 10 EK (default) 0.119
11 PET (default) 0.119 11 PET (default) 0.119
12 RMA (default) 0.123 12 RMA (default) 0.123
13 SM (3PSM) 0.130 13 SM (3PSM) 0.126
14 mean (default) 0.146 14 mean (default) 0.146
15 AK (AK1) 0.163 15 AK (AK1) 0.158
16 SM (4PSM) 0.202 16 SM (4PSM) 0.203
17 pcurve (default) 0.467 17 pcurve (default) 0.420
18 puniform (default) 0.648 18 puniform (default) 0.545
19 puniform (star) 90.749 19 puniform (star) 90.749

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.006 1 RoBMA (PSMA) 0.006
2 WILS (default) 0.007 2 WILS (default) 0.007
3 AK (AK2) 0.010 3 AK (AK2) 0.011
4 PET (default) 0.015 4 PET (default) 0.015
5 EK (default) 0.015 5 EK (default) 0.015
6 SM (4PSM) -0.023 6 SM (4PSM) -0.022
7 SM (3PSM) 0.023 7 SM (3PSM) 0.025
8 PETPEESE (default) 0.036 8 PETPEESE (default) 0.036
9 PEESE (default) 0.057 9 PEESE (default) 0.057
10 AK (AK1) 0.060 10 AK (AK1) 0.061
11 trimfill (default) 0.069 11 trimfill (default) 0.069
12 WAAPWLS (default) 0.076 12 WAAPWLS (default) 0.076
13 FMA (default) 0.085 13 FMA (default) 0.085
13 WLS (default) 0.085 13 WLS (default) 0.085
15 puniform (default) 0.088 15 RMA (default) 0.103
16 RMA (default) 0.103 16 puniform (default) 0.103
17 mean (default) 0.125 17 mean (default) 0.125
18 pcurve (default) 0.322 18 pcurve (default) 0.260
19 puniform (star) -9.762 19 puniform (star) -9.762

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RMA (default) 0.038 1 RMA (default) 0.038
2 mean (default) 0.042 2 mean (default) 0.042
3 FMA (default) 0.043 3 FMA (default) 0.043
3 WLS (default) 0.043 3 WLS (default) 0.043
5 RoBMA (PSMA) 0.045 5 RoBMA (PSMA) 0.045
6 trimfill (default) 0.046 6 trimfill (default) 0.046
7 WAAPWLS (default) 0.047 7 WAAPWLS (default) 0.047
8 PEESE (default) 0.058 8 PEESE (default) 0.058
9 AK (AK2) 0.059 9 WILS (default) 0.063
10 WILS (default) 0.063 10 AK (AK2) 0.076
11 PETPEESE (default) 0.079 11 PETPEESE (default) 0.079
12 EK (default) 0.095 12 EK (default) 0.095
13 PET (default) 0.095 13 PET (default) 0.095
14 SM (3PSM) 0.111 14 SM (3PSM) 0.107
15 AK (AK1) 0.113 15 AK (AK1) 0.109
16 SM (4PSM) 0.200 16 SM (4PSM) 0.201
17 pcurve (default) 0.290 17 pcurve (default) 0.286
18 puniform (default) 0.550 18 puniform (default) 0.448
19 puniform (star) 89.867 19 puniform (star) 89.867

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.271 1 RoBMA (PSMA) 0.271
2 AK (AK2) 0.508 2 AK (AK2) 0.641
3 puniform (star) 0.914 3 SM (4PSM) 0.891
4 EK (default) 1.027 4 puniform (star) 0.914
5 SM (3PSM) 1.058 5 EK (default) 1.027
6 PET (default) 1.097 6 SM (3PSM) 1.042
7 SM (4PSM) 1.140 7 PET (default) 1.097
8 PETPEESE (default) 1.536 8 PETPEESE (default) 1.536
9 WILS (default) 1.658 9 WILS (default) 1.658
10 PEESE (default) 1.834 10 PEESE (default) 1.834
11 WAAPWLS (default) 2.206 11 WAAPWLS (default) 2.206
12 trimfill (default) 2.263 12 trimfill (default) 2.263
13 WLS (default) 2.545 13 WLS (default) 2.545
14 RMA (default) 2.805 14 RMA (default) 2.805
15 FMA (default) 3.110 15 FMA (default) 3.110
16 AK (AK1) 3.353 16 AK (AK1) 3.157
17 puniform (default) 4.061 17 puniform (default) 3.890
18 mean (default) 4.067 18 mean (default) 4.067
19 pcurve (default) NaN 19 pcurve (default) NaN

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.950 1 RoBMA (PSMA) 0.950
2 SM (4PSM) 0.924 2 SM (4PSM) 0.911
3 AK (AK2) 0.902 3 AK (AK2) 0.904
4 puniform (star) 0.830 4 puniform (star) 0.830
5 SM (3PSM) 0.807 5 SM (3PSM) 0.802
6 EK (default) 0.744 6 EK (default) 0.744
7 PETPEESE (default) 0.718 7 PETPEESE (default) 0.718
8 PET (default) 0.716 8 PET (default) 0.716
9 AK (AK1) 0.666 9 AK (AK1) 0.665
10 PEESE (default) 0.573 10 PEESE (default) 0.573
11 WAAPWLS (default) 0.571 11 WAAPWLS (default) 0.571
12 trimfill (default) 0.550 12 trimfill (default) 0.550
13 RMA (default) 0.546 13 RMA (default) 0.546
14 WLS (default) 0.537 14 WLS (default) 0.537
15 puniform (default) 0.531 15 puniform (default) 0.532
16 WILS (default) 0.524 16 WILS (default) 0.524
17 FMA (default) 0.414 17 FMA (default) 0.414
18 mean (default) 0.376 18 mean (default) 0.376
19 pcurve (default) NaN 19 pcurve (default) NaN

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 FMA (default) 0.079 1 FMA (default) 0.079
2 mean (default) 0.105 2 mean (default) 0.105
3 WILS (default) 0.130 3 WILS (default) 0.130
4 WLS (default) 0.134 4 WLS (default) 0.134
5 WAAPWLS (default) 0.149 5 WAAPWLS (default) 0.149
6 trimfill (default) 0.167 6 trimfill (default) 0.167
7 PEESE (default) 0.171 7 PEESE (default) 0.171
8 RMA (default) 0.173 8 RMA (default) 0.173
9 RoBMA (PSMA) 0.191 9 RoBMA (PSMA) 0.191
10 PETPEESE (default) 0.235 10 PETPEESE (default) 0.235
11 AK (AK2) 0.255 11 puniform (star) 0.278
12 puniform (star) 0.278 12 PET (default) 0.300
13 PET (default) 0.300 13 SM (3PSM) 0.321
14 SM (3PSM) 0.366 14 EK (default) 0.367
15 EK (default) 0.367 15 puniform (default) 0.453
16 puniform (default) 0.588 16 AK (AK2) 0.464
17 SM (4PSM) 0.992 17 SM (4PSM) 0.655
18 AK (AK1) 1.982 18 AK (AK1) 1.781
19 pcurve (default) NaN 19 pcurve (default) NaN

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 RoBMA (PSMA) 5.324 1 RoBMA (PSMA) 5.324
2 AK (AK2) 3.027 2 AK (AK2) 2.486
3 SM (4PSM) 2.431 3 SM (4PSM) 2.441
4 puniform (star) 2.007 4 puniform (star) 2.007
5 SM (3PSM) 1.871 5 SM (3PSM) 1.852
6 PET (default) 1.818 6 PET (default) 1.818
7 EK (default) 1.818 7 EK (default) 1.818
8 PETPEESE (default) 1.677 8 PETPEESE (default) 1.677
9 AK (AK1) 1.339 9 AK (AK1) 1.313
10 WILS (default) 1.130 10 WILS (default) 1.130
11 RMA (default) 1.060 11 RMA (default) 1.060
12 PEESE (default) 0.977 12 PEESE (default) 0.977
13 puniform (default) 0.962 13 puniform (default) 0.966
14 WAAPWLS (default) 0.946 14 WAAPWLS (default) 0.946
15 WLS (default) 0.882 15 WLS (default) 0.882
16 trimfill (default) 0.825 16 trimfill (default) 0.825
17 mean (default) 0.651 17 mean (default) 0.651
18 FMA (default) 0.566 18 FMA (default) 0.566
19 pcurve (default) NaN 19 pcurve (default) NaN

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 RoBMA (PSMA) -5.344 1 AK (AK2) -6.371
2 PETPEESE (default) -5.031 2 RoBMA (PSMA) -5.344
3 AK (AK2) -4.939 3 SM (4PSM) -5.289
4 PET (default) -4.925 4 PETPEESE (default) -5.031
5 EK (default) -4.925 5 SM (3PSM) -4.949
6 SM (3PSM) -4.896 6 PET (default) -4.925
7 SM (4PSM) -4.804 7 EK (default) -4.925
8 puniform (star) -4.121 8 puniform (star) -4.121
9 puniform (default) -3.885 9 puniform (default) -3.892
10 PEESE (default) -3.691 10 PEESE (default) -3.691
11 WAAPWLS (default) -3.494 11 WAAPWLS (default) -3.494
12 trimfill (default) -3.454 12 trimfill (default) -3.454
13 AK (AK1) -3.453 13 AK (AK1) -3.453
14 WLS (default) -3.274 14 WLS (default) -3.274
15 WILS (default) -3.233 15 WILS (default) -3.233
16 RMA (default) -3.200 16 RMA (default) -3.200
17 FMA (default) -3.049 17 FMA (default) -3.049
18 mean (default) -3.038 18 mean (default) -3.038
19 pcurve (default) NaN 19 pcurve (default) NaN

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.
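Given a method's power and type I error rate, both (log) likelihood ratios reduce to simple quotients. A sketch with hypothetical rates (80% power, 5% type I error; not figures from the tables above):

```python
import math

def log_likelihood_ratios(power, type_i_error):
    """Log positive and negative likelihood ratios of a significance test.
    log PLR = log(power / type I error); useful methods have log PLR > 0.
    log NLR = log((1 - power) / (1 - type I error)); useful methods have
    log NLR < 0."""
    return (math.log(power / type_i_error),
            math.log((1.0 - power) / (1.0 - type_i_error)))

log_plr, log_nlr = log_likelihood_ratios(power=0.80, type_i_error=0.05)
# log_plr ≈ 2.77: a significant result strongly favours the alternative.
# log_nlr ≈ -1.56: a non-significant result favours the null.
```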

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.025 1 RoBMA (PSMA) 0.025
2 AK (AK2) 0.073 2 SM (4PSM) 0.109
3 SM (4PSM) 0.095 3 AK (AK2) 0.136
4 PET (default) 0.259 4 PET (default) 0.259
5 EK (default) 0.259 5 EK (default) 0.259
6 puniform (star) 0.268 6 puniform (star) 0.268
7 PETPEESE (default) 0.296 7 PETPEESE (default) 0.296
8 SM (3PSM) 0.308 8 SM (3PSM) 0.314
9 WILS (default) 0.410 9 WILS (default) 0.410
10 PEESE (default) 0.568 10 PEESE (default) 0.568
11 AK (AK1) 0.579 11 AK (AK1) 0.579
12 WAAPWLS (default) 0.590 12 WAAPWLS (default) 0.590
13 puniform (default) 0.615 13 puniform (default) 0.612
14 RMA (default) 0.623 14 RMA (default) 0.623
15 WLS (default) 0.634 15 WLS (default) 0.634
16 trimfill (default) 0.638 16 trimfill (default) 0.638
17 mean (default) 0.723 17 mean (default) 0.723
18 FMA (default) 0.753 18 FMA (default) 0.753
19 pcurve (default) NaN 19 pcurve (default) NaN

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 mean (default) 0.999 1 mean (default) 0.999
2 FMA (default) 0.999 2 FMA (default) 0.999
3 puniform (default) 0.997 3 puniform (default) 0.997
4 AK (AK1) 0.990 4 AK (AK1) 0.990
5 WLS (default) 0.989 5 WLS (default) 0.989
6 RMA (default) 0.989 6 RMA (default) 0.989
7 trimfill (default) 0.988 7 trimfill (default) 0.988
8 WAAPWLS (default) 0.972 8 AK (AK2) 0.983
9 PEESE (default) 0.969 9 WAAPWLS (default) 0.972
10 AK (AK2) 0.968 10 PEESE (default) 0.969
11 SM (3PSM) 0.960 11 SM (3PSM) 0.965
12 PETPEESE (default) 0.931 12 SM (4PSM) 0.938
13 EK (default) 0.905 13 PETPEESE (default) 0.931
13 PET (default) 0.905 14 EK (default) 0.905
15 SM (4PSM) 0.903 14 PET (default) 0.905
16 WILS (default) 0.901 16 WILS (default) 0.901
17 RoBMA (PSMA) 0.898 17 RoBMA (PSMA) 0.898
18 puniform (star) 0.886 18 puniform (star) 0.886
19 pcurve (default) NaN 19 pcurve (default) NaN

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.
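Type I error and power are the same rejection proportion computed under different data-generating hypotheses. A minimal sketch with invented p-values (not from the simulation):

```python
alpha = 0.05  # nominal significance level

# Invented p-values for illustration only.
p_null = [0.30, 0.02, 0.80, 0.45, 0.60]   # runs simulated under the null
p_alt  = [0.01, 0.04, 0.20, 0.003, 0.06]  # runs simulated under the alternative

# Type I error: rejection rate when the null is true (1 of 5 runs).
type_i_error = sum(p < alpha for p in p_null) / len(p_null)  # 0.2

# Power: rejection rate when the alternative is true (3 of 5 runs).
power = sum(p < alpha for p in p_alt) / len(p_alt)           # 0.6
```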

By-Condition Performance (Conditional on Method Convergence)

The results below are conditional on method convergence. Note that the methods might differ in convergence rate and are therefore not compared on the same data sets.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Values lower than -0.5 or larger than 0.5 are visualized as -0.5 and 0.5 respectively.

Raincloud plot showing empirical SE across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Values larger than 100 are visualized as 100.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Replacement in Case of Non-Convergence)

The results below incorporate method replacement to handle non-convergence. If a method fails to converge, its results are replaced with the results from a simpler method (e.g., random-effects meta-analysis without publication bias adjustment). This emulates what a data analyst might do in practice when a method does not converge. However, note that these results do not correspond to “pure” method performance, as they might combine multiple different methods. See Method Replacement Strategy for details of the method replacement specification.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Values lower than -0.5 or larger than 0.5 are visualized as -0.5 and 0.5 respectively.

Raincloud plot showing empirical SE across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Values larger than 100 are visualized as 100.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

Subset: Log Odds Ratio Effect Sizes

These results are based on the Stanley (2017) data-generating mechanism with a total of 1 condition.

Average Performance

Method performance measures are aggregated across all simulated conditions to provide an overall impression of method performance. However, keep in mind that a method with a high overall ranking is not necessarily the “best” method for a particular application. To select a suitable method for your application, consider also non-aggregated performance measures in conditions most relevant to your application.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.206 1 RoBMA (PSMA) 0.238
2 EK (default) 0.263 2 EK (default) 0.263
3 PET (default) 0.264 3 PET (default) 0.264
4 AK (AK2) 0.265 4 SM (3PSM) 0.273
5 WILS (default) 0.274 5 WILS (default) 0.274
6 SM (3PSM) 0.274 6 PETPEESE (default) 0.297
7 PETPEESE (default) 0.297 7 SM (4PSM) 0.308
8 PEESE (default) 0.313 8 PEESE (default) 0.313
9 SM (4PSM) 0.316 9 trimfill (default) 0.348
10 trimfill (default) 0.348 10 WAAPWLS (default) 0.352
11 WAAPWLS (default) 0.352 11 FMA (default) 0.372
12 FMA (default) 0.372 12 WLS (default) 0.372
13 WLS (default) 0.372 13 RMA (default) 0.391
14 RMA (default) 0.391 14 mean (default) 0.501
15 mean (default) 0.501 15 AK (AK2) 1.061
16 pcurve (default) 1.293 16 pcurve (default) 1.127
17 AK (AK1) 1.581 17 puniform (default) 1.339
18 puniform (default) 1.713 18 AK (AK1) 1.429
19 puniform (star) 157.696 19 puniform (star) 157.696

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method.
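RMSE decomposes into bias and empirical SE: squared RMSE equals squared bias plus the variance of the estimates (exactly so when both use the same n denominator). A sketch with invented estimates (not values from this study):

```python
import math

true_effect = 0.2
estimates = [0.15, 0.30, 0.22, 0.05, 0.28]  # invented meta-analytic estimates
n = len(estimates)

# Bias: average signed deviation from the true effect.
bias = sum(e - true_effect for e in estimates) / n

# RMSE: root of the average squared deviation from the true effect.
rmse = math.sqrt(sum((e - true_effect) ** 2 for e in estimates) / n)

# With the n (rather than n - 1) denominator for the SD,
# rmse**2 == bias**2 + sd**2 holds exactly.
mean_est = sum(estimates) / n
sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / n)
assert abs(rmse ** 2 - (bias ** 2 + sd ** 2)) < 1e-12
```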

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 SM (4PSM) 0.160 1 puniform (default) -0.069
2 EK (default) 0.171 2 SM (4PSM) 0.166
3 PET (default) 0.171 3 EK (default) 0.171
4 RoBMA (PSMA) 0.184 4 PET (default) 0.171
5 SM (3PSM) 0.214 5 RoBMA (PSMA) 0.204
6 PETPEESE (default) 0.220 6 PETPEESE (default) 0.220
7 AK (AK2) 0.231 7 SM (3PSM) 0.222
8 puniform (default) -0.240 8 WILS (default) 0.240
9 WILS (default) 0.240 9 AK (AK1) 0.253
10 AK (AK1) 0.248 10 AK (AK2) 0.259
11 PEESE (default) 0.284 11 PEESE (default) 0.284
12 trimfill (default) 0.329 12 trimfill (default) 0.329
13 WAAPWLS (default) 0.335 13 WAAPWLS (default) 0.335
14 FMA (default) 0.356 14 FMA (default) 0.356
15 WLS (default) 0.356 15 WLS (default) 0.356
16 RMA (default) 0.375 16 RMA (default) 0.375
17 mean (default) 0.464 17 mean (default) 0.464
18 pcurve (default) 0.933 18 pcurve (default) 0.698
19 puniform (star) -3.661 19 puniform (star) -3.661

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 FMA (default) 0.058 1 FMA (default) 0.058
2 WLS (default) 0.058 2 WLS (default) 0.058
3 trimfill (default) 0.060 3 trimfill (default) 0.060
4 RMA (default) 0.061 4 RMA (default) 0.061
5 RoBMA (PSMA) 0.064 5 WAAPWLS (default) 0.067
6 WAAPWLS (default) 0.067 6 WILS (default) 0.075
7 WILS (default) 0.075 7 PEESE (default) 0.088
8 AK (AK2) 0.075 8 RoBMA (PSMA) 0.091
9 PEESE (default) 0.088 9 SM (3PSM) 0.095
10 SM (3PSM) 0.100 10 mean (default) 0.113
11 mean (default) 0.113 11 PETPEESE (default) 0.134
12 PETPEESE (default) 0.134 12 EK (default) 0.141
13 EK (default) 0.141 13 PET (default) 0.142
14 PET (default) 0.142 14 SM (4PSM) 0.170
15 SM (4PSM) 0.179 15 pcurve (default) 0.843
16 pcurve (default) 0.812 16 AK (AK2) 0.861
17 AK (AK1) 1.373 17 puniform (default) 1.175
18 puniform (default) 1.516 18 AK (AK1) 1.222
19 puniform (star) 156.858 19 puniform (star) 156.858

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 EK (default) 3.581 1 EK (default) 3.581
2 PET (default) 3.689 2 PET (default) 3.689
3 RoBMA (PSMA) 3.887 3 RoBMA (PSMA) 4.717
4 SM (4PSM) 4.723 4 SM (4PSM) 4.768
5 SM (3PSM) 5.619 5 SM (3PSM) 5.735
6 puniform (star) 5.831 6 puniform (star) 5.831
7 AK (AK2) 5.959 7 puniform (default) 6.525
8 PETPEESE (default) 6.582 8 PETPEESE (default) 6.582
9 WILS (default) 7.531 9 WILS (default) 7.531
10 PEESE (default) 7.667 10 PEESE (default) 7.667
11 puniform (default) 7.877 11 trimfill (default) 9.544
12 trimfill (default) 9.543 12 WAAPWLS (default) 9.694
13 WAAPWLS (default) 9.694 13 FMA (default) 10.660
14 FMA (default) 10.660 14 WLS (default) 10.776
15 WLS (default) 10.776 15 RMA (default) 10.804
16 RMA (default) 10.804 16 AK (AK2) 12.194
17 mean (default) 13.014 17 mean (default) 13.014
18 AK (AK1) 44.040 18 AK (AK1) 32.961
19 pcurve (default) NaN 19 pcurve (default) NaN

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.661 1 RoBMA (PSMA) 0.625
2 puniform (default) 0.592 2 puniform (default) 0.588
3 EK (default) 0.574 3 EK (default) 0.574
4 PET (default) 0.548 4 PET (default) 0.548
5 PETPEESE (default) 0.496 5 PETPEESE (default) 0.496
6 SM (4PSM) 0.454 6 SM (4PSM) 0.443
7 puniform (star) 0.440 7 puniform (star) 0.440
8 SM (3PSM) 0.423 8 SM (3PSM) 0.407
9 AK (AK2) 0.394 9 WILS (default) 0.343
10 WILS (default) 0.343 10 AK (AK1) 0.319
11 AK (AK1) 0.321 11 mean (default) 0.314
12 mean (default) 0.314 12 AK (AK2) 0.297
13 PEESE (default) 0.277 13 PEESE (default) 0.277
14 trimfill (default) 0.231 14 trimfill (default) 0.231
15 RMA (default) 0.226 15 RMA (default) 0.226
16 FMA (default) 0.213 16 FMA (default) 0.213
17 WAAPWLS (default) 0.205 17 WAAPWLS (default) 0.205
18 WLS (default) 0.200 18 WLS (default) 0.200
19 pcurve (default) NaN 19 pcurve (default) NaN

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 WILS (default) 0.221 1 WILS (default) 0.221
2 WLS (default) 0.235 2 WLS (default) 0.235
3 WAAPWLS (default) 0.249 3 WAAPWLS (default) 0.249
4 FMA (default) 0.250 4 FMA (default) 0.250
5 RoBMA (PSMA) 0.271 5 RoBMA (PSMA) 0.271
6 trimfill (default) 0.274 6 trimfill (default) 0.274
7 RMA (default) 0.298 7 RMA (default) 0.298
8 PEESE (default) 0.299 8 PEESE (default) 0.299
9 AK (AK2) 0.322 9 SM (3PSM) 0.332
10 SM (3PSM) 0.344 10 puniform (star) 0.407
11 puniform (star) 0.407 11 PETPEESE (default) 0.457
12 PETPEESE (default) 0.457 12 SM (4PSM) 0.503
13 PET (default) 0.536 13 PET (default) 0.536
14 SM (4PSM) 0.569 14 EK (default) 0.658
15 EK (default) 0.658 15 mean (default) 1.505
16 mean (default) 1.505 16 puniform (default) 2.751
17 puniform (default) 4.072 17 AK (AK2) 5.859
18 AK (AK1) 38.095 18 AK (AK1) 26.987
19 pcurve (default) NaN 19 pcurve (default) NaN

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 RoBMA (PSMA) 5.471 1 puniform (default) 4.298
2 puniform (default) 4.600 2 RoBMA (PSMA) 2.869
3 puniform (star) 2.747 3 puniform (star) 2.747
4 AK (AK2) 2.708 4 SM (3PSM) 2.476
5 SM (3PSM) 2.473 5 PETPEESE (default) 2.425
6 PETPEESE (default) 2.425 6 WILS (default) 2.268
7 WILS (default) 2.268 7 AK (AK1) 1.940
8 AK (AK1) 2.070 8 EK (default) 1.922
9 EK (default) 1.922 9 PET (default) 1.913
10 PET (default) 1.913 10 AK (AK2) 1.782
11 FMA (default) 1.678 11 FMA (default) 1.678
12 PEESE (default) 1.605 12 PEESE (default) 1.605
13 mean (default) 1.593 13 mean (default) 1.593
14 RMA (default) 1.539 14 RMA (default) 1.539
15 WLS (default) 1.530 15 WLS (default) 1.530
16 WAAPWLS (default) 1.469 16 SM (4PSM) 1.478
17 trimfill (default) 1.432 17 WAAPWLS (default) 1.469
18 SM (4PSM) 1.425 18 trimfill (default) 1.432
19 pcurve (default) NaN 19 pcurve (default) NaN

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 WILS (default) -5.267 1 AK (AK2) -5.801
2 puniform (star) -4.910 2 WILS (default) -5.267
3 SM (3PSM) -4.904 3 SM (3PSM) -4.940
4 PEESE (default) -4.288 4 puniform (star) -4.910
5 FMA (default) -3.859 5 PEESE (default) -4.288
6 WLS (default) -3.837 6 FMA (default) -3.859
7 AK (AK2) -3.785 7 WLS (default) -3.837
8 RMA (default) -3.730 8 RMA (default) -3.730
9 trimfill (default) -3.682 9 trimfill (default) -3.682
10 PETPEESE (default) -3.556 10 PETPEESE (default) -3.556
11 AK (AK1) -3.477 11 AK (AK1) -3.488
12 EK (default) -3.238 12 EK (default) -3.238
13 PET (default) -3.236 13 PET (default) -3.236
14 SM (4PSM) -3.063 14 SM (4PSM) -3.100
15 puniform (default) -2.518 15 puniform (default) -2.517
16 RoBMA (PSMA) -2.388 16 RoBMA (PSMA) -2.460
17 WAAPWLS (default) -1.596 17 WAAPWLS (default) -1.596
18 mean (default) -1.432 18 mean (default) -1.432
19 pcurve (default) NaN 19 pcurve (default) NaN

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.003 1 puniform (default) 0.017
2 puniform (default) 0.009 2 puniform (star) 0.053
3 puniform (star) 0.053 3 PETPEESE (default) 0.072
4 PETPEESE (default) 0.072 4 PET (default) 0.075
5 PET (default) 0.075 5 EK (default) 0.076
6 EK (default) 0.076 6 RoBMA (PSMA) 0.080
7 SM (3PSM) 0.089 7 SM (3PSM) 0.096
8 WILS (default) 0.099 8 WILS (default) 0.099
9 AK (AK2) 0.130 9 SM (4PSM) 0.238
10 SM (4PSM) 0.237 10 AK (AK2) 0.257
11 PEESE (default) 0.334 11 PEESE (default) 0.334
12 AK (AK1) 0.385 12 AK (AK1) 0.387
13 mean (default) 0.420 13 mean (default) 0.420
14 trimfill (default) 0.444 14 trimfill (default) 0.444
15 WAAPWLS (default) 0.445 15 WAAPWLS (default) 0.445
16 WLS (default) 0.446 16 WLS (default) 0.446
17 RMA (default) 0.450 17 RMA (default) 0.450
18 FMA (default) 0.468 18 FMA (default) 0.468
19 pcurve (default) NaN 19 pcurve (default) NaN

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 FMA (default) 0.958 1 FMA (default) 0.958
2 WLS (default) 0.950 2 AK (AK2) 0.958
3 RMA (default) 0.948 3 WLS (default) 0.950
4 trimfill (default) 0.943 4 RMA (default) 0.948
5 WAAPWLS (default) 0.896 5 trimfill (default) 0.943
6 AK (AK1) 0.893 6 WAAPWLS (default) 0.896
7 WILS (default) 0.888 7 AK (AK1) 0.894
8 AK (AK2) 0.886 8 WILS (default) 0.888
9 PEESE (default) 0.882 9 PEESE (default) 0.882
10 SM (3PSM) 0.850 10 SM (3PSM) 0.871
11 puniform (star) 0.842 11 puniform (star) 0.842
12 mean (default) 0.835 12 mean (default) 0.835
13 SM (4PSM) 0.742 13 SM (4PSM) 0.759
14 PETPEESE (default) 0.708 14 PETPEESE (default) 0.708
15 EK (default) 0.654 15 EK (default) 0.654
16 PET (default) 0.652 16 PET (default) 0.652
17 puniform (default) 0.591 17 puniform (default) 0.597
18 RoBMA (PSMA) 0.584 18 RoBMA (PSMA) 0.596
19 pcurve (default) NaN 19 pcurve (default) NaN

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Conditional on Method Convergence)

The results below are conditional on method convergence. Note that the methods might differ in convergence rate and are therefore not compared on the same data sets.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Values lower than -0.5 or larger than 0.5 are visualized as -0.5 and 0.5 respectively.

Raincloud plot showing empirical SE across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Values larger than 100 are visualized as 100.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Replacement in Case of Non-Convergence)

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Values lower than -0.5 or larger than 0.5 are visualized as -0.5 and 0.5 respectively.

Raincloud plot showing empirical SE across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Values larger than 0.5 are visualized as 0.5.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Values larger than 100 are visualized as 100.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.
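Both likelihood ratios follow directly from power and the type I error rate; a sketch with illustrative values (not from the benchmark):

```r
# Illustrative operating characteristics of a hypothetical method:
power       <- 0.80  # P(significant | alternative true)
type1_error <- 0.05  # P(significant | null true)

# Positive LR: how much a significant result raises the odds of H1 vs H0.
plr <- power / type1_error              # 0.80 / 0.05 = 16
# Negative LR: how much a non-significant result lowers those odds.
nlr <- (1 - power) / (1 - type1_error)  # 0.20 / 0.95, roughly 0.21
```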

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

Power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. Higher power indicates a better method.
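Type I error rate and power are the same rejection-rate computation applied to runs simulated under the null and under the alternative, respectively; a sketch with hypothetical p-values:

```r
# Hypothetical p-values from runs simulated under H0 and under H1:
p_null <- c(0.030, 0.200, 0.450, 0.700, 0.900)
p_alt  <- c(0.001, 0.010, 0.030, 0.080, 0.002)

type1_error <- mean(p_null < 0.05)  # proportion of false rejections: 1/5
power       <- mean(p_alt  < 0.05)  # proportion of correct rejections: 4/5
```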

Session Info

This report was compiled on Thu Oct 23 14:05:31 2025 (UTC) using the following computational environment:

## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_1.4.0                   ggdist_3.3.3                  
## [3] ggplot2_4.0.0                  PublicationBiasBenchmark_0.1.0
## 
## loaded via a namespace (and not attached):
##  [1] generics_0.1.4       sandwich_3.1-1       sass_0.4.10         
##  [4] xml2_1.4.0           stringi_1.8.7        lattice_0.22-7      
##  [7] httpcode_0.3.0       digest_0.6.37        magrittr_2.0.4      
## [10] evaluate_1.0.5       grid_4.5.1           RColorBrewer_1.1-3  
## [13] fastmap_1.2.0        jsonlite_2.0.0       crul_1.6.0          
## [16] urltools_1.7.3.1     httr_1.4.7           purrr_1.1.0         
## [19] viridisLite_0.4.2    textshaping_1.0.4    jquerylib_0.1.4     
## [22] Rdpack_2.6.4         cli_3.6.5            rlang_1.1.6         
## [25] triebeard_0.4.1      rbibutils_2.3        withr_3.0.2         
## [28] cachem_1.1.0         yaml_2.3.10          tools_4.5.1         
## [31] memoise_2.0.1        kableExtra_1.4.0     curl_7.0.0          
## [34] vctrs_0.6.5          R6_2.6.1             clubSandwich_0.6.1  
## [37] zoo_1.8-14           lifecycle_1.0.4      stringr_1.5.2       
## [40] fs_1.6.6             htmlwidgets_1.6.4    ragg_1.5.0          
## [43] pkgconfig_2.0.3      desc_1.4.3           osfr_0.2.9          
## [46] pkgdown_2.1.3        bslib_0.9.0          pillar_1.11.1       
## [49] gtable_0.3.6         Rcpp_1.1.0           glue_1.8.0          
## [52] systemfonts_1.3.1    xfun_0.53            tibble_3.3.0        
## [55] rstudioapi_0.17.1    knitr_1.50           farver_2.1.2        
## [58] htmltools_0.5.8.1    labeling_0.4.3       svglite_2.2.2       
## [61] rmarkdown_2.30       compiler_4.5.1       S7_0.2.0            
## [64] distributional_0.5.0