
Complete Results

These results are based on the Stanley (2017), Alinaghi (2018), Bom (2019), and Carter (2019) data-generating mechanisms, with a total of 1665 conditions.

Average Performance

Method performance measures are aggregated across all simulated conditions to provide an overall impression of method performance. Keep in mind, however, that a method with a high overall ranking is not necessarily the “best” method for a particular application. To select a suitable method for your application, also consider the non-aggregated performance measures in the conditions most relevant to your application.
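
The “Mean Rank” columns in the tables below come from ranking the methods within each condition and then averaging those ranks across conditions. A minimal sketch of that aggregation, with invented values for three hypothetical methods:

```python
import numpy as np

# Hypothetical RMSE values for three methods (columns) in four
# simulated conditions (rows); lower is better. Values are invented.
rmse = np.array([
    [0.10, 0.12, 0.30],
    [0.20, 0.15, 0.25],
    [0.05, 0.06, 0.04],
    [0.40, 0.35, 0.50],
])

# Rank the methods within each condition (1 = best), then average the
# ranks across conditions; this is what a "Mean Rank" column reports.
# (Ties are broken arbitrarily here; a real analysis would typically
# use fractional ranks.)
ranks = rmse.argsort(axis=1).argsort(axis=1) + 1
mean_rank = ranks.mean(axis=0)   # one value per method
```

Ranking within conditions first makes methods comparable even when the outcome scale differs across data-generating mechanisms.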

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 AK (AK1) 6.632 1 AK (AK1) 6.678
2 RoBMA (PSMA) 7.240 2 RoBMA (PSMA) 7.416
3 WAAPWLS (default) 7.920 3 WAAPWLS (default) 7.933
4 FMA (default) 8.452 4 FMA (default) 8.465
5 WLS (default) 8.459 5 WLS (default) 8.472
6 trimfill (default) 8.968 6 trimfill (default) 8.986
7 PEESE (default) 9.306 7 PEESE (default) 9.332
8 SM (3PSM) 9.360 8 SM (3PSM) 9.388
9 PETPEESE (default) 9.865 9 PETPEESE (default) 9.888
10 WILS (default) 10.067 10 WILS (default) 10.079
11 puniform (star) 10.318 11 puniform (star) 10.335
12 RMA (default) 11.079 12 RMA (default) 11.096
13 EK (default) 11.960 13 AK (AK2) 11.421
14 AK (AK2) 12.008 14 EK (default) 11.983
15 PET (default) 12.095 15 PET (default) 12.120
16 SM (4PSM) 12.129 16 SM (4PSM) 12.169
17 pcurve (default) 12.500 17 pcurve (default) 12.516
18 MAIVE (default) 13.745 18 MAIVE (default) 13.772
19 puniform (default) 14.079 19 puniform (default) 14.097
20 mean (default) 15.187 20 mean (default) 15.235
21 MAIVE (WAIVE) 17.279 21 MAIVE (WAIVE) 17.268

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.
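
As a sketch, RMSE can be computed from simulated estimates like this; the true effect, bias, and error distribution are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.3

# Hypothetical meta-analytic estimates from 1,000 simulation runs of a
# method that is slightly biased (+0.05) and noisy (SD 0.1).
estimates = true_effect + rng.normal(0.05, 0.1, size=1_000)

# RMSE: square root of the mean squared deviation from the true effect.
rmse = np.sqrt(np.mean((estimates - true_effect) ** 2))
```

Here the RMSE should land near sqrt(0.05² + 0.1²) ≈ 0.112, reflecting both the bias and the sampling variability.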

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 WAAPWLS (default) 8.069 1 WAAPWLS (default) 8.167
2 PETPEESE (default) 8.398 2 PETPEESE (default) 8.455
3 AK (AK1) 8.438 3 AK (AK1) 8.510
4 PEESE (default) 8.783 4 PEESE (default) 8.844
5 SM (3PSM) 8.987 5 SM (3PSM) 8.968
6 RoBMA (PSMA) 9.219 6 RoBMA (PSMA) 9.184
7 EK (default) 9.688 7 EK (default) 9.747
8 PET (default) 9.771 8 PET (default) 9.829
9 puniform (star) 9.801 9 puniform (star) 9.865
10 WLS (default) 9.990 10 WLS (default) 10.082
11 FMA (default) 9.994 11 FMA (default) 10.086
12 SM (4PSM) 10.354 12 SM (4PSM) 10.407
13 WILS (default) 11.002 13 AK (AK2) 10.828
14 trimfill (default) 11.316 14 WILS (default) 11.050
15 MAIVE (default) 11.725 15 trimfill (default) 11.390
16 AK (AK2) 11.852 16 MAIVE (default) 11.768
17 RMA (default) 13.156 17 RMA (default) 13.223
18 puniform (default) 13.771 18 puniform (default) 13.813
19 pcurve (default) 13.902 19 pcurve (default) 13.933
20 MAIVE (WAIVE) 14.345 20 MAIVE (WAIVE) 14.342
21 mean (default) 16.270 21 mean (default) 16.340

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RMA (default) 3.868 1 RMA (default) 3.726
2 AK (AK1) 5.204 2 AK (AK1) 5.356
3 WLS (default) 5.484 3 WLS (default) 5.376
4 FMA (default) 5.489 4 FMA (default) 5.381
5 trimfill (default) 6.962 5 trimfill (default) 6.914
6 mean (default) 7.911 6 mean (default) 7.850
7 pcurve (default) 8.172 7 pcurve (default) 8.132
8 RoBMA (PSMA) 8.721 8 WAAPWLS (default) 8.713
9 WAAPWLS (default) 8.801 9 RoBMA (PSMA) 9.406
10 PEESE (default) 10.856 10 PEESE (default) 10.828
11 SM (3PSM) 11.297 11 SM (3PSM) 11.257
12 puniform (default) 11.446 12 puniform (default) 11.413
13 WILS (default) 12.605 13 WILS (default) 12.551
14 puniform (star) 12.814 14 puniform (star) 12.763
15 PETPEESE (default) 13.629 15 PETPEESE (default) 13.612
16 AK (AK2) 13.950 16 AK (AK2) 14.072
17 SM (4PSM) 14.705 17 SM (4PSM) 14.652
18 EK (default) 15.742 18 EK (default) 15.708
19 PET (default) 15.822 19 PET (default) 15.790
20 MAIVE (default) 15.831 20 MAIVE (default) 15.832
21 MAIVE (WAIVE) 19.424 21 MAIVE (WAIVE) 19.402

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the average empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.
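
Bias, empirical SE, and RMSE fit together via the identity RMSE² = bias² + empirical SE² (with the population-SD convention). A small sketch with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.3

# Hypothetical estimates from many runs: bias +0.05, sampling SD 0.1.
estimates = true_effect + rng.normal(0.05, 0.1, size=100_000)

bias = np.mean(estimates) - true_effect   # average deviation from truth
emp_se = np.std(estimates)                # spread across runs (ddof=0)
rmse = np.sqrt(np.mean((estimates - true_effect) ** 2))
```

With `np.std`'s default `ddof=0` the decomposition is exact; with the sample-SD convention it holds up to a factor of n/(n-1).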

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RoBMA (PSMA) 5.772 1 RoBMA (PSMA) 5.941
2 AK (AK1) 6.519 2 AK (AK1) 6.529
3 SM (3PSM) 7.151 3 SM (3PSM) 7.250
4 puniform (star) 7.521 4 puniform (star) 7.615
5 WAAPWLS (default) 8.259 5 WAAPWLS (default) 8.317
6 trimfill (default) 9.195 6 SM (4PSM) 9.137
7 SM (4PSM) 9.358 7 trimfill (default) 9.233
8 PEESE (default) 10.011 8 PEESE (default) 10.045
9 WLS (default) 10.113 9 AK (AK2) 10.147
10 PETPEESE (default) 10.392 10 WLS (default) 10.168
11 AK (AK2) 10.598 11 PETPEESE (default) 10.403
12 RMA (default) 10.993 12 RMA (default) 11.036
13 EK (default) 11.112 13 EK (default) 11.118
14 MAIVE (default) 11.702 14 MAIVE (default) 11.700
15 PET (default) 12.071 15 PET (default) 12.073
16 WILS (default) 12.321 16 WILS (default) 12.312
17 FMA (default) 12.841 17 FMA (default) 12.874
18 puniform (default) 12.884 18 puniform (default) 12.903
19 MAIVE (WAIVE) 14.234 19 MAIVE (WAIVE) 14.204
20 mean (default) 16.499 20 mean (default) 16.541
21 pcurve (default) 20.977 21 pcurve (default) 20.977

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.
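
The exact scoring rule is not restated on this page; a common choice with exactly these properties is the Gneiting–Raftery interval score, sketched below for a central 95% interval (the formula is an assumption, not taken from the study):

```python
def interval_score(lower, upper, truth, alpha=0.05):
    """Interval score of a central (1 - alpha) confidence interval.

    Width plus a 2/alpha-per-unit penalty for missing the true value;
    lower scores are better. This is the Gneiting-Raftery (2007)
    definition, assumed here for illustration.
    """
    score = upper - lower
    if truth < lower:
        score += (2 / alpha) * (lower - truth)
    elif truth > upper:
        score += (2 / alpha) * (truth - upper)
    return score

# An interval that covers the truth pays only its width ...
covering = interval_score(0.1, 0.5, truth=0.3)   # 0.4
# ... while missing the truth by 0.1 adds (2 / 0.05) * 0.1 = 4.0.
missing = interval_score(0.1, 0.5, truth=0.6)    # 4.4
```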

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.800 1 RoBMA (PSMA) 0.798
2 AK (AK2) 0.795 2 AK (AK2) 0.769
3 SM (4PSM) 0.765 3 SM (4PSM) 0.760
4 puniform (star) 0.733 4 puniform (star) 0.733
5 SM (3PSM) 0.733 5 SM (3PSM) 0.728
6 MAIVE (default) 0.695 6 MAIVE (default) 0.695
7 MAIVE (WAIVE) 0.647 7 MAIVE (WAIVE) 0.647
8 EK (default) 0.641 8 EK (default) 0.641
9 PETPEESE (default) 0.629 9 PETPEESE (default) 0.629
10 PET (default) 0.620 10 PET (default) 0.620
11 AK (AK1) 0.609 11 AK (AK1) 0.609
12 WAAPWLS (default) 0.582 12 WAAPWLS (default) 0.582
13 trimfill (default) 0.544 13 trimfill (default) 0.543
14 PEESE (default) 0.526 14 PEESE (default) 0.526
15 WILS (default) 0.504 15 WILS (default) 0.504
16 puniform (default) 0.484 16 puniform (default) 0.484
17 WLS (default) 0.464 17 WLS (default) 0.464
18 RMA (default) 0.457 18 RMA (default) 0.457
19 FMA (default) 0.342 19 FMA (default) 0.342
20 mean (default) 0.260 20 mean (default) 0.260
21 pcurve (default) NaN 21 pcurve (default) NaN

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 FMA (default) 2.381 1 FMA (default) 2.342
2 WLS (default) 3.884 2 WLS (default) 3.842
3 WILS (default) 4.932 3 WILS (default) 4.912
4 PEESE (default) 7.142 4 PEESE (default) 7.129
5 mean (default) 7.180 5 mean (default) 7.178
6 WAAPWLS (default) 7.534 6 WAAPWLS (default) 7.492
7 RMA (default) 7.733 7 RMA (default) 7.689
8 trimfill (default) 7.775 8 trimfill (default) 7.698
9 AK (AK1) 8.858 9 AK (AK1) 8.859
10 PETPEESE (default) 9.278 10 PETPEESE (default) 9.318
11 RoBMA (PSMA) 10.820 11 RoBMA (PSMA) 10.892
12 puniform (default) 11.416 12 puniform (default) 11.426
13 SM (3PSM) 12.494 13 SM (3PSM) 12.565
14 PET (default) 13.507 14 PET (default) 13.601
15 puniform (star) 13.659 15 puniform (star) 13.826
16 EK (default) 14.740 16 AK (AK2) 14.833
17 AK (AK2) 15.248 17 EK (default) 14.843
18 SM (4PSM) 15.479 18 SM (4PSM) 15.454
19 MAIVE (default) 16.941 19 MAIVE (default) 17.033
20 MAIVE (WAIVE) 18.543 20 MAIVE (WAIVE) 18.611
21 pcurve (default) 20.977 21 pcurve (default) 20.977

95% CI width is the average length of the 95% confidence interval for the true effect. A narrower average 95% CI indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of CI width values on the corresponding outcome scale.
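
Both CI measures come from the same simulated intervals: coverage is the fraction of intervals containing the truth, width is the average upper-minus-lower distance. A sketch with idealized normal intervals (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
true_effect = 0.3
n_runs = 10_000

# Hypothetical 95% CIs from repeated runs: a normally distributed
# estimate plus/minus 1.96 standard errors.
estimates = rng.normal(true_effect, 0.1, size=n_runs)
se = np.full(n_runs, 0.1)
lower = estimates - 1.96 * se
upper = estimates + 1.96 * se

# Coverage: fraction of intervals containing the truth (ideally ~0.95).
coverage = np.mean((lower <= true_effect) & (true_effect <= upper))
# Width: average length of the intervals (2 * 1.96 * 0.1 = 0.392 here).
mean_width = np.mean(upper - lower)
```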

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 RoBMA (PSMA) 3.703 1 RoBMA (PSMA) 3.577
2 AK (AK2) 2.124 2 AK (AK2) 1.735
3 MAIVE (default) 1.579 3 MAIVE (default) 1.579
4 PETPEESE (default) 1.532 4 PETPEESE (default) 1.532
5 PET (default) 1.515 5 PET (default) 1.515
6 EK (default) 1.515 6 EK (default) 1.515
7 puniform (default) 1.515 7 puniform (default) 1.501
8 puniform (star) 1.325 8 puniform (star) 1.325
9 SM (3PSM) 1.325 9 SM (3PSM) 1.321
10 AK (AK1) 1.215 10 AK (AK1) 1.205
11 SM (4PSM) 1.156 11 SM (4PSM) 1.185
12 MAIVE (WAIVE) 1.111 12 MAIVE (WAIVE) 1.111
13 RMA (default) 0.998 13 RMA (default) 0.998
14 WAAPWLS (default) 0.945 14 WAAPWLS (default) 0.945
15 trimfill (default) 0.922 15 trimfill (default) 0.922
16 WILS (default) 0.871 16 WILS (default) 0.871
17 PEESE (default) 0.843 17 PEESE (default) 0.843
18 WLS (default) 0.790 18 WLS (default) 0.790
19 FMA (default) 0.503 19 FMA (default) 0.503
20 mean (default) 0.487 20 mean (default) 0.487
21 pcurve (default) NaN 21 pcurve (default) NaN

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 PETPEESE (default) -4.626 1 AK (AK2) -4.661
2 EK (default) -4.496 2 PETPEESE (default) -4.626
3 PET (default) -4.496 3 EK (default) -4.496
4 WAAPWLS (default) -4.042 4 PET (default) -4.496
5 PEESE (default) -3.890 5 WAAPWLS (default) -4.042
6 SM (3PSM) -3.560 6 PEESE (default) -3.890
7 WLS (default) -3.450 7 SM (3PSM) -3.593
8 trimfill (default) -3.445 8 WLS (default) -3.450
9 MAIVE (default) -3.394 9 trimfill (default) -3.446
10 puniform (default) -3.374 10 MAIVE (default) -3.394
11 RoBMA (PSMA) -3.331 11 puniform (default) -3.376
12 AK (AK1) -3.277 12 RoBMA (PSMA) -3.332
13 puniform (star) -3.208 13 AK (AK1) -3.281
14 AK (AK2) -3.158 14 puniform (star) -3.208
15 RMA (default) -3.121 15 RMA (default) -3.121
16 FMA (default) -3.058 16 FMA (default) -3.058
17 WILS (default) -3.037 17 WILS (default) -3.037
18 SM (4PSM) -2.636 18 SM (4PSM) -2.873
19 mean (default) -2.503 19 mean (default) -2.503
20 MAIVE (WAIVE) -1.547 20 MAIVE (WAIVE) -1.547
21 pcurve (default) NaN 21 pcurve (default) NaN

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.
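
Both likelihood ratios follow directly from a method's rejection rates: power under the alternative and the type I error rate under the null. A sketch with textbook values (0.80 power, 0.05 type I error; not taken from the tables above):

```python
import math

# Hypothetical rejection rates for one method.
power = 0.80         # rejection rate when the alternative is true
type1_error = 0.05   # rejection rate when the null is true

# Positive LR: odds multiplier for H1 after a significant result.
plr = power / type1_error              # 16.0
# Negative LR: odds multiplier for H1 after a non-significant result.
nlr = (1 - power) / (1 - type1_error)  # ~0.21

log_plr = math.log(plr)  # > 0 for a useful test
log_nlr = math.log(nlr)  # < 0 for a useful test
```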

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.102 1 RoBMA (PSMA) 0.106
2 AK (AK2) 0.125 2 MAIVE (WAIVE) 0.176
3 MAIVE (WAIVE) 0.176 3 AK (AK2) 0.237
4 SM (4PSM) 0.245 4 SM (4PSM) 0.248
5 PET (default) 0.257 5 PET (default) 0.257
6 EK (default) 0.257 6 EK (default) 0.257
7 MAIVE (default) 0.264 7 MAIVE (default) 0.264
8 PETPEESE (default) 0.270 8 PETPEESE (default) 0.270
9 SM (3PSM) 0.277 9 SM (3PSM) 0.280
10 puniform (star) 0.293 10 puniform (star) 0.293
11 WILS (default) 0.391 11 WILS (default) 0.391
12 WAAPWLS (default) 0.523 12 WAAPWLS (default) 0.523
13 PEESE (default) 0.546 13 PEESE (default) 0.546
14 AK (AK1) 0.581 14 AK (AK1) 0.581
15 trimfill (default) 0.586 15 trimfill (default) 0.586
16 puniform (default) 0.608 16 puniform (default) 0.607
17 WLS (default) 0.621 17 WLS (default) 0.621
18 RMA (default) 0.622 18 RMA (default) 0.622
19 FMA (default) 0.772 19 FMA (default) 0.772
20 mean (default) 0.779 20 mean (default) 0.779
21 pcurve (default) NaN 21 pcurve (default) NaN

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 mean (default) 0.990 1 mean (default) 0.990
2 FMA (default) 0.989 2 FMA (default) 0.989
3 WLS (default) 0.976 3 WLS (default) 0.976
4 RMA (default) 0.974 4 RMA (default) 0.974
5 AK (AK1) 0.969 5 AK (AK1) 0.969
6 trimfill (default) 0.965 6 trimfill (default) 0.965
7 PEESE (default) 0.953 7 PEESE (default) 0.953
8 puniform (default) 0.939 8 puniform (default) 0.939
9 WAAPWLS (default) 0.934 9 WAAPWLS (default) 0.934
10 PETPEESE (default) 0.893 10 PETPEESE (default) 0.893
11 EK (default) 0.873 11 AK (AK2) 0.885
12 PET (default) 0.873 12 EK (default) 0.873
13 WILS (default) 0.864 13 PET (default) 0.873
14 SM (3PSM) 0.828 14 WILS (default) 0.864
15 AK (AK2) 0.812 15 SM (3PSM) 0.835
16 puniform (star) 0.808 16 puniform (star) 0.808
17 MAIVE (default) 0.779 17 MAIVE (default) 0.779
18 SM (4PSM) 0.754 18 SM (4PSM) 0.772
19 RoBMA (PSMA) 0.703 19 RoBMA (PSMA) 0.706
20 MAIVE (WAIVE) 0.498 20 MAIVE (WAIVE) 0.498
21 pcurve (default) NaN 21 pcurve (default) NaN

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Conditional on Method Convergence)

The results below are conditional on method convergence. Note that the methods might differ in convergence rate and are therefore not compared on the same data sets.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing empirical SE across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A narrower average 95% CI indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Replacement in Case of Non-Convergence)

The results below incorporate method replacement to handle non-convergence: if a method fails to converge, its results are replaced with those from a simpler method (e.g., a random-effects meta-analysis without publication bias adjustment). This emulates what a data analyst might do in practice when a method does not converge. Note, however, that these results do not reflect “pure” method performance, as they may combine results from multiple methods. See Method Replacement Strategy for details of the method replacement specification.
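
The replacement logic can be sketched as a simple try-in-order fallback. The function, the `ConvergenceError` type, and both stand-in methods below are illustrative, not the study's actual replacement chain:

```python
class ConvergenceError(Exception):
    """Raised when an adjustment method fails to converge (illustrative)."""

def estimate_with_fallback(dataset, methods):
    """Try each method in order; return the first converged result.

    `methods` is an ordered sequence of callables ending with a simple
    estimator that is expected to always converge. All names here are
    illustrative stand-ins.
    """
    for fit in methods:
        try:
            return fit(dataset)
        except ConvergenceError:
            continue
    raise RuntimeError("no method converged")

def selection_model(data):   # stand-in for a complex adjustment method
    raise ConvergenceError("did not converge")

def random_effects(data):    # stand-in for the simple fallback estimator
    return sum(data) / len(data)

result = estimate_with_fallback([0.2, 0.4], [selection_model, random_effects])
```

Here the complex method fails, so the fallback's estimate (the plain average, 0.3) is reported in its place.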

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing empirical SE across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A narrower average 95% CI indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

Subset: Publication Bias Present

These results are based on the Stanley (2017), Alinaghi (2018), Bom (2019), and Carter (2019) data-generating mechanisms with a total of 1143 conditions in which publication bias is present.

Average Performance

Method performance measures are aggregated across all simulated conditions to provide an overall impression of method performance. Keep in mind, however, that a method with a high overall ranking is not necessarily the “best” method for a particular application. To select a suitable method for your application, also consider the non-aggregated performance measures in the conditions most relevant to your application.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 AK (AK1) 6.812 1 AK (AK1) 6.802
2 RoBMA (PSMA) 7.892 2 RoBMA (PSMA) 8.115
3 WAAPWLS (default) 8.104 3 WAAPWLS (default) 8.117
4 PEESE (default) 8.926 4 trimfill (default) 8.934
5 trimfill (default) 8.931 5 PEESE (default) 8.955
6 FMA (default) 9.160 6 FMA (default) 9.164
7 WLS (default) 9.164 7 WLS (default) 9.169
8 PETPEESE (default) 9.169 8 PETPEESE (default) 9.183
9 WILS (default) 9.677 9 WILS (default) 9.686
10 SM (3PSM) 10.033 10 SM (3PSM) 10.059
11 puniform (star) 10.527 11 puniform (star) 10.531
12 pcurve (default) 11.071 12 AK (AK2) 11.037
13 EK (default) 11.085 13 pcurve (default) 11.085
14 PET (default) 11.215 14 EK (default) 11.106
15 AK (AK2) 11.555 15 PET (default) 11.240
16 SM (4PSM) 12.421 16 SM (4PSM) 12.460
17 RMA (default) 12.692 17 RMA (default) 12.720
18 puniform (default) 13.221 18 puniform (default) 13.240
19 MAIVE (default) 13.465 19 MAIVE (default) 13.477
20 mean (default) 16.636 20 mean (default) 16.690
21 MAIVE (WAIVE) 16.963 21 MAIVE (WAIVE) 16.951

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 PETPEESE (default) 8.015 1 PETPEESE (default) 8.048
2 WAAPWLS (default) 8.118 2 WAAPWLS (default) 8.210
3 RoBMA (PSMA) 8.571 3 RoBMA (PSMA) 8.487
4 AK (AK1) 8.762 4 AK (AK1) 8.789
5 PEESE (default) 8.876 5 PEESE (default) 8.927
6 SM (3PSM) 9.185 6 SM (3PSM) 9.158
7 EK (default) 9.407 7 EK (default) 9.442
8 WILS (default) 9.454 8 WILS (default) 9.507
9 PET (default) 9.538 9 PET (default) 9.575
10 puniform (star) 9.705 10 puniform (star) 9.780
11 trimfill (default) 9.988 11 trimfill (default) 10.045
12 SM (4PSM) 10.448 12 SM (4PSM) 10.498
13 FMA (default) 10.730 13 AK (AK2) 10.675
14 WLS (default) 10.737 14 FMA (default) 10.818
15 AK (AK2) 11.499 15 WLS (default) 10.825
16 pcurve (default) 12.471 16 pcurve (default) 12.501
17 MAIVE (default) 12.668 17 MAIVE (default) 12.706
18 puniform (default) 12.914 18 puniform (default) 12.974
19 MAIVE (WAIVE) 14.523 19 MAIVE (WAIVE) 14.500
20 RMA (default) 14.961 20 RMA (default) 15.044
21 mean (default) 18.226 21 mean (default) 18.283

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RMA (default) 3.972 1 RMA (default) 3.817
2 AK (AK1) 4.770 2 AK (AK1) 4.837
3 WLS (default) 5.483 3 WLS (default) 5.350
4 FMA (default) 5.487 4 FMA (default) 5.354
5 trimfill (default) 6.640 5 trimfill (default) 6.584
6 pcurve (default) 7.114 6 pcurve (default) 7.061
7 mean (default) 7.591 7 mean (default) 7.507
8 WAAPWLS (default) 8.873 8 WAAPWLS (default) 8.765
9 RoBMA (PSMA) 9.836 9 PEESE (default) 10.623
10 PEESE (default) 10.698 10 RoBMA (PSMA) 10.841
11 puniform (default) 11.138 11 puniform (default) 11.090
12 SM (3PSM) 11.991 12 SM (3PSM) 11.907
13 WILS (default) 12.922 13 WILS (default) 12.836
14 puniform (star) 13.302 14 puniform (star) 13.199
15 AK (AK2) 13.785 15 PETPEESE (default) 13.727
16 PETPEESE (default) 13.794 16 AK (AK2) 14.148
17 SM (4PSM) 15.194 17 SM (4PSM) 15.105
18 EK (default) 15.492 18 EK (default) 15.435
19 PET (default) 15.556 19 PET (default) 15.503
20 MAIVE (default) 15.810 20 MAIVE (default) 15.788
21 MAIVE (WAIVE) 19.379 21 MAIVE (WAIVE) 19.348

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the average empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RoBMA (PSMA) 6.260 1 RoBMA (PSMA) 6.462
2 AK (AK1) 6.892 2 AK (AK1) 6.856
3 puniform (star) 7.453 3 puniform (star) 7.530
4 SM (3PSM) 7.522 4 SM (3PSM) 7.628
5 WAAPWLS (default) 8.458 5 WAAPWLS (default) 8.505
6 trimfill (default) 9.039 6 trimfill (default) 9.064
7 SM (4PSM) 9.576 7 SM (4PSM) 9.380
8 PEESE (default) 9.828 8 PETPEESE (default) 9.842
9 PETPEESE (default) 9.855 9 PEESE (default) 9.861
10 AK (AK2) 10.218 10 AK (AK2) 9.883
11 EK (default) 10.228 11 EK (default) 10.229
12 WLS (default) 10.870 12 WLS (default) 10.927
13 PET (default) 11.171 13 PET (default) 11.171
14 MAIVE (default) 11.460 14 MAIVE (default) 11.436
15 WILS (default) 11.735 15 WILS (default) 11.700
16 puniform (default) 11.859 16 puniform (default) 11.871
17 RMA (default) 12.553 17 RMA (default) 12.616
18 FMA (default) 13.078 18 FMA (default) 13.108
19 MAIVE (WAIVE) 13.843 19 MAIVE (WAIVE) 13.798
20 mean (default) 17.689 20 mean (default) 17.719
21 pcurve (default) 20.975 21 pcurve (default) 20.975

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.
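A minimal sketch of how such a score can be computed, assuming the standard Gneiting–Raftery interval score (whether the simulation uses exactly this variant is an assumption):

```python
def interval_score(lower, upper, truth, alpha=0.05):
    """Interval score for a (1 - alpha) confidence interval:
    the interval's width, plus a penalty of 2/alpha per unit by
    which the true value falls outside the interval. Lower is better."""
    score = upper - lower
    if truth < lower:
        score += (2 / alpha) * (lower - truth)
    elif truth > upper:
        score += (2 / alpha) * (truth - upper)
    return score

# Covered true value: score equals the width.
print(interval_score(0.1, 0.5, truth=0.3))  # 0.4
# Missed true value: width plus a steep miss penalty.
print(interval_score(0.1, 0.5, truth=0.6))  # about 4.4
```

Note how a narrow interval that misses the truth can score far worse than a wide interval that covers it, which is exactly the width-coverage trade-off the measure is designed to capture.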

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.759 1 RoBMA (PSMA) 0.756
2 AK (AK2) 0.754 2 SM (4PSM) 0.717
3 SM (4PSM) 0.722 3 AK (AK2) 0.716
4 SM (3PSM) 0.688 4 puniform (star) 0.688
5 puniform (star) 0.688 5 SM (3PSM) 0.682
6 MAIVE (default) 0.632 6 MAIVE (default) 0.632
7 EK (default) 0.611 7 EK (default) 0.611
8 PETPEESE (default) 0.599 8 PETPEESE (default) 0.599
9 MAIVE (WAIVE) 0.588 9 MAIVE (WAIVE) 0.588
10 PET (default) 0.588 10 PET (default) 0.588
11 AK (AK1) 0.526 11 AK (AK1) 0.526
12 WAAPWLS (default) 0.523 12 WAAPWLS (default) 0.523
13 trimfill (default) 0.485 13 trimfill (default) 0.484
14 WILS (default) 0.479 14 WILS (default) 0.479
15 puniform (default) 0.478 15 puniform (default) 0.478
16 PEESE (default) 0.467 16 PEESE (default) 0.467
17 WLS (default) 0.393 17 WLS (default) 0.393
18 RMA (default) 0.358 18 RMA (default) 0.358
19 FMA (default) 0.288 19 FMA (default) 0.288
20 mean (default) 0.148 20 mean (default) 0.148
21 pcurve (default) NaN 21 pcurve (default) NaN

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.
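As an illustration (a hypothetical sketch, not the simulation code), coverage is a simple proportion across runs:

```python
import numpy as np

def ci_coverage(lowers, uppers, truth):
    """Proportion of simulation runs whose confidence interval
    contains the true effect."""
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)
    return float(np.mean((lowers <= truth) & (truth <= uppers)))

# Three hypothetical runs; two of the intervals contain truth = 0.3:
print(ci_coverage([0.1, 0.2, 0.4], [0.5, 0.3, 0.6], truth=0.3))
```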

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 FMA (default) 2.363 1 FMA (default) 2.314
2 WLS (default) 3.983 2 WLS (default) 3.936
3 WILS (default) 5.682 3 WILS (default) 5.666
4 mean (default) 6.830 4 mean (default) 6.827
5 PEESE (default) 6.983 5 PEESE (default) 6.973
6 WAAPWLS (default) 7.619 6 WAAPWLS (default) 7.577
7 trimfill (default) 7.803 7 trimfill (default) 7.736
8 RMA (default) 7.847 8 RMA (default) 7.811
9 AK (AK1) 8.273 9 AK (AK1) 8.274
10 PETPEESE (default) 9.147 10 PETPEESE (default) 9.185
11 puniform (default) 11.190 11 puniform (default) 11.220
12 RoBMA (PSMA) 11.511 12 RoBMA (PSMA) 11.596
13 SM (3PSM) 12.882 13 SM (3PSM) 12.978
14 PET (default) 13.290 14 PET (default) 13.388
15 puniform (star) 13.714 15 puniform (star) 13.885
16 EK (default) 14.536 16 AK (AK2) 14.575
17 AK (AK2) 15.143 17 EK (default) 14.639
18 SM (4PSM) 15.780 18 SM (4PSM) 15.808
19 MAIVE (default) 16.612 19 MAIVE (default) 16.717
20 MAIVE (WAIVE) 18.395 20 MAIVE (WAIVE) 18.480
21 pcurve (default) 20.975 21 pcurve (default) 20.975

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of CI width values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 RoBMA (PSMA) 3.072 1 RoBMA (PSMA) 2.925
2 AK (AK2) 1.862 2 puniform (default) 1.707
3 puniform (default) 1.712 3 PETPEESE (default) 1.411
4 PETPEESE (default) 1.411 4 PET (default) 1.402
5 PET (default) 1.402 5 EK (default) 1.401
6 EK (default) 1.401 6 AK (AK2) 1.346
7 MAIVE (default) 1.258 7 MAIVE (default) 1.258
8 SM (3PSM) 1.058 8 puniform (star) 1.057
9 puniform (star) 1.057 9 SM (3PSM) 1.047
10 SM (4PSM) 0.857 10 SM (4PSM) 0.891
11 AK (AK1) 0.816 11 AK (AK1) 0.814
12 WILS (default) 0.780 12 WILS (default) 0.780
13 trimfill (default) 0.710 13 trimfill (default) 0.710
14 WAAPWLS (default) 0.694 14 WAAPWLS (default) 0.694
15 MAIVE (WAIVE) 0.658 15 MAIVE (WAIVE) 0.658
16 RMA (default) 0.601 16 RMA (default) 0.601
17 PEESE (default) 0.592 17 PEESE (default) 0.592
18 WLS (default) 0.520 18 WLS (default) 0.520
19 FMA (default) 0.267 19 FMA (default) 0.267
20 mean (default) 0.143 20 mean (default) 0.143
21 pcurve (default) NaN 21 pcurve (default) NaN

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 PETPEESE (default) -4.416 1 PETPEESE (default) -4.416
2 PET (default) -4.295 2 PET (default) -4.295
3 EK (default) -4.295 3 EK (default) -4.295
4 WAAPWLS (default) -3.646 4 AK (AK2) -4.176
5 PEESE (default) -3.433 5 WAAPWLS (default) -3.646
6 puniform (default) -3.371 6 PEESE (default) -3.433
7 RoBMA (PSMA) -2.917 7 puniform (default) -3.370
8 MAIVE (default) -2.868 8 RoBMA (PSMA) -2.912
9 SM (3PSM) -2.841 9 SM (3PSM) -2.882
10 trimfill (default) -2.827 10 MAIVE (default) -2.868
11 WLS (default) -2.818 11 trimfill (default) -2.827
12 AK (AK2) -2.697 12 WLS (default) -2.818
13 AK (AK1) -2.628 13 AK (AK1) -2.629
14 puniform (star) -2.613 14 puniform (star) -2.613
15 WILS (default) -2.580 15 WILS (default) -2.580
16 RMA (default) -2.441 16 RMA (default) -2.441
17 FMA (default) -2.367 17 FMA (default) -2.367
18 SM (4PSM) -1.917 18 SM (4PSM) -2.168
19 mean (default) -1.692 19 mean (default) -1.692
20 MAIVE (WAIVE) -1.366 20 MAIVE (WAIVE) -1.366
21 pcurve (default) NaN 21 pcurve (default) NaN

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.
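Analogously, the negative likelihood ratio compares the two non-rejection rates; an illustrative sketch with hypothetical inputs:

```python
import math

def log_negative_lr(power, type1_error):
    """Log negative likelihood ratio:
    log( P(non-significant | effect) / P(non-significant | no effect) ),
    i.e., log((1 - power) / (1 - type I error rate)). Below 0 is informative."""
    return math.log((1 - power) / (1 - type1_error))

# A method with 80% power and a 5% type I error rate:
print(log_negative_lr(0.80, 0.05))  # about -1.56
```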

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.124 1 RoBMA (PSMA) 0.129
2 AK (AK2) 0.150 2 MAIVE (WAIVE) 0.220
3 MAIVE (WAIVE) 0.220 3 SM (4PSM) 0.284
4 SM (4PSM) 0.279 4 PET (default) 0.293
5 PET (default) 0.293 5 EK (default) 0.294
6 EK (default) 0.294 6 AK (AK2) 0.307
7 PETPEESE (default) 0.311 7 PETPEESE (default) 0.311
8 SM (3PSM) 0.318 8 SM (3PSM) 0.323
9 puniform (star) 0.327 9 puniform (star) 0.327
10 MAIVE (default) 0.328 10 MAIVE (default) 0.328
11 WILS (default) 0.404 11 WILS (default) 0.404
12 puniform (default) 0.618 12 puniform (default) 0.618
13 WAAPWLS (default) 0.650 13 WAAPWLS (default) 0.650
14 PEESE (default) 0.664 14 PEESE (default) 0.664
15 trimfill (default) 0.706 15 trimfill (default) 0.706
16 AK (AK1) 0.731 16 AK (AK1) 0.731
17 WLS (default) 0.762 17 WLS (default) 0.762
18 RMA (default) 0.782 18 RMA (default) 0.782
19 FMA (default) 0.883 19 FMA (default) 0.883
20 mean (default) 0.931 20 mean (default) 0.931
21 pcurve (default) NaN 21 pcurve (default) NaN

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.
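Both the type I error rate and power are rejection rates, computed from the same quantity under different true states; a minimal hypothetical sketch:

```python
import numpy as np

def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulation runs that reject the null hypothesis.
    Under a true null effect this is the type I error rate; under a
    true non-null effect it is the power."""
    return float(np.mean(np.asarray(p_values, dtype=float) < alpha))

# Four hypothetical p-values; two fall below alpha = 0.05:
print(rejection_rate([0.01, 0.20, 0.04, 0.70]))  # 0.5
```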

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 mean (default) 0.996 1 mean (default) 0.996
2 FMA (default) 0.991 2 FMA (default) 0.991
3 RMA (default) 0.979 3 RMA (default) 0.979
4 WLS (default) 0.978 4 WLS (default) 0.978
5 AK (AK1) 0.974 5 AK (AK1) 0.974
6 trimfill (default) 0.971 6 trimfill (default) 0.971
7 PEESE (default) 0.953 7 PEESE (default) 0.953
8 WAAPWLS (default) 0.937 8 WAAPWLS (default) 0.937
9 puniform (default) 0.929 9 puniform (default) 0.929
10 PETPEESE (default) 0.886 10 PETPEESE (default) 0.886
11 EK (default) 0.865 11 AK (AK2) 0.870
12 PET (default) 0.865 12 EK (default) 0.865
13 WILS (default) 0.839 13 PET (default) 0.865
14 SM (3PSM) 0.789 14 WILS (default) 0.839
15 AK (AK2) 0.777 15 SM (3PSM) 0.798
16 puniform (star) 0.766 16 puniform (star) 0.766
17 MAIVE (default) 0.743 17 MAIVE (default) 0.743
18 SM (4PSM) 0.711 18 SM (4PSM) 0.732
19 RoBMA (PSMA) 0.665 19 RoBMA (PSMA) 0.669
20 MAIVE (WAIVE) 0.475 20 MAIVE (WAIVE) 0.475
21 pcurve (default) NaN 21 pcurve (default) NaN

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Conditional on Method Convergence)

The results below are conditional on method convergence. Note that the methods might differ in convergence rate and are therefore not compared on the same data sets.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.
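The way RMSE combines bias and variability can be made concrete with a short sketch (illustrative Python, not the simulation code; inputs are hypothetical). RMSE squared equals squared bias plus the population-variance form of the empirical SE:

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean square error of the meta-analytic estimates
    around the true effect, across simulation runs."""
    e = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((e - truth) ** 2)))

est, truth = [0.2, 0.4, 0.6], 0.3
bias = np.mean(est) - truth          # 0.1
variance = np.var(est)               # n denominator (ddof = 0)
# Decomposition: RMSE^2 = bias^2 + variance
print(rmse(est, truth) ** 2, bias ** 2 + variance)
```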

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing empirical standard error across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the average empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Replacement in Case of Non-Convergence)

The results below incorporate method replacement to handle non-convergence. If a method fails to converge, its results are replaced with the results from a simpler method (e.g., random-effects meta-analysis without publication bias adjustment). This emulates what a data analyst may do in practice in case a method does not converge. However, note that these results do not correspond to “pure” method performance as they might combine multiple different methods. See Method Replacement Strategy for details of the method replacement specification.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing empirical standard error across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the average empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Raincloud plot showing interval score across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

Subset: Publication Bias Absent

These results are based on Stanley (2017), Alinaghi (2018), Bom (2019), and Carter (2019) data-generating mechanisms with a total of 522 conditions.

Average Performance

Method performance measures are aggregated across all simulated conditions to provide an overall impression of method performance. However, keep in mind that a method with a high overall ranking is not necessarily the “best” method for a particular application. To select a suitable method for your application, consider also non-aggregated performance measures in conditions most relevant to your application.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RoBMA (PSMA) 5.810 1 RoBMA (PSMA) 5.885
2 AK (AK1) 6.239 2 AK (AK1) 6.406
3 FMA (default) 6.902 3 FMA (default) 6.933
4 WLS (default) 6.916 4 WLS (default) 6.946
5 WAAPWLS (default) 7.517 5 WAAPWLS (default) 7.531
6 RMA (default) 7.546 6 RMA (default) 7.540
7 SM (3PSM) 7.885 7 SM (3PSM) 7.918
8 trimfill (default) 9.050 8 trimfill (default) 9.102
9 puniform (star) 9.862 9 puniform (star) 9.906
10 PEESE (default) 10.140 10 PEESE (default) 10.157
11 WILS (default) 10.921 11 WILS (default) 10.939
12 PETPEESE (default) 11.391 12 PETPEESE (default) 11.433
13 SM (4PSM) 11.489 13 SM (4PSM) 11.531
14 mean (default) 12.013 14 mean (default) 12.048
15 AK (AK2) 13.000 15 AK (AK2) 12.262
16 EK (default) 13.875 16 EK (default) 13.902
17 PET (default) 14.023 17 PET (default) 14.048
18 MAIVE (default) 14.356 18 MAIVE (default) 14.418
19 pcurve (default) 15.628 19 pcurve (default) 15.649
20 puniform (default) 15.956 20 puniform (default) 15.973
21 MAIVE (WAIVE) 17.969 21 MAIVE (WAIVE) 17.964

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 AK (AK1) 7.730 1 AK (AK1) 7.898
2 WAAPWLS (default) 7.962 2 WAAPWLS (default) 8.073
3 WLS (default) 8.354 3 WLS (default) 8.454
4 FMA (default) 8.383 4 FMA (default) 8.483
5 SM (3PSM) 8.556 5 SM (3PSM) 8.550
6 PEESE (default) 8.580 6 PEESE (default) 8.663
7 RMA (default) 9.203 7 RMA (default) 9.236
8 PETPEESE (default) 9.238 8 PETPEESE (default) 9.347
9 MAIVE (default) 9.659 9 MAIVE (default) 9.715
10 puniform (star) 10.010 10 puniform (star) 10.052
11 SM (4PSM) 10.149 11 SM (4PSM) 10.209
12 PET (default) 10.282 12 PET (default) 10.387
13 EK (default) 10.303 13 EK (default) 10.414
14 RoBMA (PSMA) 10.638 14 RoBMA (PSMA) 10.711
15 mean (default) 11.989 15 AK (AK2) 11.163
16 AK (AK2) 12.626 16 mean (default) 12.084
17 MAIVE (WAIVE) 13.954 17 MAIVE (WAIVE) 13.994
18 trimfill (default) 14.224 18 trimfill (default) 14.335
19 WILS (default) 14.391 19 WILS (default) 14.429
20 puniform (default) 15.646 20 puniform (default) 15.649
21 pcurve (default) 17.036 21 pcurve (default) 17.067

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RMA (default) 3.640 1 RMA (default) 3.527
2 WLS (default) 5.487 2 WLS (default) 5.433
3 FMA (default) 5.492 3 FMA (default) 5.439
4 AK (AK1) 6.153 4 RoBMA (PSMA) 6.264
5 RoBMA (PSMA) 6.280 5 AK (AK1) 6.490
6 trimfill (default) 7.669 6 trimfill (default) 7.634
7 mean (default) 8.613 7 WAAPWLS (default) 8.600
8 WAAPWLS (default) 8.642 8 mean (default) 8.603
9 SM (3PSM) 9.778 9 SM (3PSM) 9.833
10 pcurve (default) 10.490 10 pcurve (default) 10.475
11 PEESE (default) 11.203 11 PEESE (default) 11.276
12 puniform (star) 11.747 12 puniform (star) 11.808
13 WILS (default) 11.912 13 WILS (default) 11.927
14 puniform (default) 12.121 14 puniform (default) 12.121
15 PETPEESE (default) 13.268 15 PETPEESE (default) 13.360
16 SM (4PSM) 13.632 16 SM (4PSM) 13.661
17 AK (AK2) 14.310 17 AK (AK2) 13.906
18 MAIVE (default) 15.875 18 MAIVE (default) 15.927
19 EK (default) 16.291 19 EK (default) 16.307
20 PET (default) 16.406 20 PET (default) 16.420
21 MAIVE (WAIVE) 19.523 21 MAIVE (WAIVE) 19.521

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the average empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of empirical standard error values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 RoBMA (PSMA) 4.705 1 RoBMA (PSMA) 4.799
2 AK (AK1) 5.703 2 AK (AK1) 5.814
3 SM (3PSM) 6.337 3 SM (3PSM) 6.421
4 RMA (default) 7.577 4 RMA (default) 7.577
5 puniform (star) 7.669 5 puniform (star) 7.801
6 WAAPWLS (default) 7.824 6 WAAPWLS (default) 7.904
7 WLS (default) 8.456 7 WLS (default) 8.506
8 SM (4PSM) 8.881 8 SM (4PSM) 8.605
9 trimfill (default) 9.534 9 trimfill (default) 9.603
10 PEESE (default) 10.412 10 PEESE (default) 10.448
11 AK (AK2) 11.431 11 AK (AK2) 10.726
12 PETPEESE (default) 11.569 12 PETPEESE (default) 11.632
13 MAIVE (default) 12.232 13 MAIVE (default) 12.280
14 FMA (default) 12.322 14 FMA (default) 12.360
15 EK (default) 13.048 15 EK (default) 13.065
16 WILS (default) 13.603 16 WILS (default) 13.653
17 mean (default) 13.895 17 mean (default) 13.960
18 PET (default) 14.044 18 PET (default) 14.050
19 MAIVE (WAIVE) 15.088 19 MAIVE (WAIVE) 15.092
20 puniform (default) 15.128 20 puniform (default) 15.161
21 pcurve (default) 20.981 21 pcurve (default) 20.981

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of interval score values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.888 1 RoBMA (PSMA) 0.888
2 AK (AK2) 0.879 2 AK (AK2) 0.879
3 SM (4PSM) 0.860 3 SM (4PSM) 0.856
4 MAIVE (default) 0.835 4 MAIVE (default) 0.835
5 puniform (star) 0.832 5 puniform (star) 0.832
6 SM (3PSM) 0.831 6 SM (3PSM) 0.830
7 AK (AK1) 0.791 7 AK (AK1) 0.790
8 MAIVE (WAIVE) 0.776 8 MAIVE (WAIVE) 0.776
9 WAAPWLS (default) 0.711 9 WAAPWLS (default) 0.711
10 EK (default) 0.706 10 EK (default) 0.706
11 PETPEESE (default) 0.695 11 PETPEESE (default) 0.695
12 PET (default) 0.689 12 PET (default) 0.689
13 RMA (default) 0.675 13 RMA (default) 0.675
14 trimfill (default) 0.673 14 trimfill (default) 0.673
15 PEESE (default) 0.656 15 PEESE (default) 0.656
16 WLS (default) 0.619 16 WLS (default) 0.619
17 WILS (default) 0.557 17 WILS (default) 0.557
18 mean (default) 0.505 18 mean (default) 0.505
19 puniform (default) 0.497 19 puniform (default) 0.499
20 FMA (default) 0.461 20 FMA (default) 0.461
21 pcurve (default) NaN 21 pcurve (default) NaN

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Mean Rank Rank Method Mean Rank
1 FMA (default) 2.420 1 FMA (default) 2.404
2 WILS (default) 3.289 2 WILS (default) 3.262
3 WLS (default) 3.669 3 WLS (default) 3.636
4 WAAPWLS (default) 7.347 4 WAAPWLS (default) 7.308
5 RMA (default) 7.485 5 RMA (default) 7.423
6 PEESE (default) 7.490 6 PEESE (default) 7.469
7 trimfill (default) 7.715 7 trimfill (default) 7.615
8 mean (default) 7.944 8 mean (default) 7.948
9 RoBMA (PSMA) 9.308 9 RoBMA (PSMA) 9.351
10 PETPEESE (default) 9.565 10 PETPEESE (default) 9.611
11 AK (AK1) 10.140 11 AK (AK1) 10.142
12 SM (3PSM) 11.646 12 SM (3PSM) 11.661
13 puniform (default) 11.910 13 puniform (default) 11.875
14 puniform (star) 13.538 14 puniform (star) 13.695
15 PET (default) 13.981 15 PET (default) 14.067
16 SM (4PSM) 14.820 16 SM (4PSM) 14.678
17 EK (default) 15.186 17 EK (default) 15.289
18 AK (AK2) 15.479 18 AK (AK2) 15.398
19 MAIVE (default) 17.661 19 MAIVE (default) 17.724
20 MAIVE (WAIVE) 18.866 20 MAIVE (WAIVE) 18.898
21 pcurve (default) 20.981 21 pcurve (default) 20.981

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of CI width values on the corresponding outcome scale.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 RoBMA (PSMA) 5.048 1 RoBMA (PSMA) 4.967
2 AK (AK2) 2.632 2 AK (AK2) 2.489
3 MAIVE (default) 2.263 3 MAIVE (default) 2.263
4 MAIVE (WAIVE) 2.079 4 MAIVE (WAIVE) 2.079
5 AK (AK1) 2.066 5 AK (AK1) 2.038
6 puniform (star) 1.896 6 SM (3PSM) 1.903
7 SM (3PSM) 1.894 7 puniform (star) 1.896
8 RMA (default) 1.842 8 RMA (default) 1.842
9 SM (4PSM) 1.794 9 SM (4PSM) 1.814
10 PETPEESE (default) 1.790 10 PETPEESE (default) 1.790
11 EK (default) 1.759 11 EK (default) 1.759
12 PET (default) 1.758 12 PET (default) 1.758
13 WAAPWLS (default) 1.480 13 WAAPWLS (default) 1.480
14 PEESE (default) 1.380 14 PEESE (default) 1.380
15 trimfill (default) 1.375 15 trimfill (default) 1.375
16 WLS (default) 1.367 16 WLS (default) 1.367
17 mean (default) 1.222 17 mean (default) 1.222
18 puniform (default) 1.095 18 WILS (default) 1.065
19 WILS (default) 1.065 19 puniform (default) 1.063
20 FMA (default) 1.006 20 FMA (default) 1.006
21 pcurve (default) NaN 21 pcurve (default) NaN

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Log Value Rank Method Log Value
1 SM (3PSM) -5.093 1 AK (AK2) -5.601
2 PETPEESE (default) -5.073 2 SM (3PSM) -5.111
3 EK (default) -4.924 3 PETPEESE (default) -5.073
4 PET (default) -4.924 4 EK (default) -4.924
5 WAAPWLS (default) -4.886 5 PET (default) -4.924
6 PEESE (default) -4.863 6 WAAPWLS (default) -4.886
7 WLS (default) -4.799 7 PEESE (default) -4.863
8 trimfill (default) -4.763 8 WLS (default) -4.799
9 AK (AK1) -4.662 9 trimfill (default) -4.764
10 RMA (default) -4.572 10 AK (AK1) -4.670
11 FMA (default) -4.531 11 RMA (default) -4.572
12 MAIVE (default) -4.517 12 FMA (default) -4.531
13 puniform (star) -4.477 13 MAIVE (default) -4.517
14 mean (default) -4.232 14 puniform (star) -4.477
15 RoBMA (PSMA) -4.216 15 SM (4PSM) -4.377
16 SM (4PSM) -4.170 16 mean (default) -4.232
17 AK (AK2) -4.051 17 RoBMA (PSMA) -4.227
18 WILS (default) -4.013 18 WILS (default) -4.013
19 puniform (default) -3.380 19 puniform (default) -3.388
20 MAIVE (WAIVE) -1.933 20 MAIVE (WAIVE) -1.933
21 pcurve (default) NaN 21 pcurve (default) NaN

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 RoBMA (PSMA) 0.053 1 RoBMA (PSMA) 0.054
2 AK (AK2) 0.071 2 MAIVE (WAIVE) 0.077
3 MAIVE (WAIVE) 0.077 3 AK (AK2) 0.090
4 MAIVE (default) 0.118 4 MAIVE (default) 0.118
5 SM (4PSM) 0.167 5 SM (4PSM) 0.167
6 PET (default) 0.172 6 PET (default) 0.172
7 EK (default) 0.172 7 EK (default) 0.172
8 PETPEESE (default) 0.176 8 PETPEESE (default) 0.176
9 SM (3PSM) 0.183 9 SM (3PSM) 0.183
10 puniform (star) 0.216 10 puniform (star) 0.216
11 WAAPWLS (default) 0.233 11 WAAPWLS (default) 0.233
12 AK (AK1) 0.236 12 AK (AK1) 0.236
13 RMA (default) 0.255 13 RMA (default) 0.255
14 PEESE (default) 0.276 14 PEESE (default) 0.276
15 WLS (default) 0.296 15 WLS (default) 0.296
16 trimfill (default) 0.310 16 trimfill (default) 0.310
17 WILS (default) 0.361 17 WILS (default) 0.361
18 mean (default) 0.430 18 mean (default) 0.430
19 FMA (default) 0.518 19 FMA (default) 0.518
20 puniform (default) 0.587 20 puniform (default) 0.582
21 pcurve (default) NaN 21 pcurve (default) NaN

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Conditional on Convergence
Replacement if Non-Convergence
Rank Method Value Rank Method Value
1 FMA (default) 0.984 1 FMA (default) 0.984
2 mean (default) 0.978 2 mean (default) 0.978
3 WLS (default) 0.971 3 WLS (default) 0.971
4 RMA (default) 0.963 4 RMA (default) 0.963
5 AK (AK1) 0.960 5 puniform (default) 0.960
6 puniform (default) 0.959 6 AK (AK1) 0.960
7 trimfill (default) 0.953 7 trimfill (default) 0.953
8 PEESE (default) 0.952 8 PEESE (default) 0.952
9 WAAPWLS (default) 0.927 9 WAAPWLS (default) 0.927
10 WILS (default) 0.918 10 WILS (default) 0.918
11 SM (3PSM) 0.911 11 AK (AK2) 0.915
12 PETPEESE (default) 0.910 12 SM (3PSM) 0.914
13 puniform (star) 0.898 13 PETPEESE (default) 0.910
14 EK (default) 0.889 14 puniform (star) 0.898
15 PET (default) 0.889 15 EK (default) 0.889
16 AK (AK2) 0.883 16 PET (default) 0.889
17 MAIVE (default) 0.857 17 SM (4PSM) 0.858
18 SM (4PSM) 0.843 18 MAIVE (default) 0.857
19 RoBMA (PSMA) 0.784 19 RoBMA (PSMA) 0.785
20 MAIVE (WAIVE) 0.547 20 MAIVE (WAIVE) 0.547
21 pcurve (default) NaN 21 pcurve (default) NaN

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Conditional on Method Convergence)

The results below are conditional on method convergence. Note that the methods might differ in convergence rate and are therefore not compared on the same data sets.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval width across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

By-Condition Performance (Replacement in Case of Non-Convergence)

The results below incorporate method replacement to handle non-convergence. If a method fails to converge, its results are replaced with the results from a simpler method (e.g., random-effects meta-analysis without publication bias adjustment). This emulates what a data analyst may do in practice in case a method does not converge. However, note that these results do not correspond to “pure” method performance as they might combine multiple different methods. See Method Replacement Strategy for details of the method replacement specification.

Raincloud plot showing convergence rates across different methods

Raincloud plot showing RMSE (Root Mean Square Error) across different methods

RMSE (Root Mean Square Error) is an overall summary measure of estimation performance that combines bias and empirical SE. RMSE is the square root of the average squared difference between the meta-analytic estimate and the true effect across simulation runs. A lower RMSE indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average RMSE is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of RMSE values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

Bias is the average difference between the meta-analytic estimate and the true effect across simulation runs. Ideally, this value should be close to 0. Methods are compared using condition-wise ranks. Direct comparison using the average bias is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing bias across different methods

The empirical SE is the standard deviation of the meta-analytic estimate across simulation runs. A lower empirical SE indicates less variability and better method performance. Methods are compared using condition-wise ranks. Direct comparison using the empirical standard error is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval width across different methods

The interval score measures the accuracy of a confidence interval by combining its width and coverage. It penalizes intervals that are too wide or that fail to include the true value. A lower interval score indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the interval score is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of bias values on the corresponding outcome scale.

Raincloud plot showing 95% confidence interval coverage across different methods

95% CI coverage is the proportion of simulation runs in which the 95% confidence interval contained the true effect. Ideally, this value should be close to the nominal level of 95%.

Raincloud plot showing 95% confidence interval width across different methods

95% CI width is the average length of the 95% confidence interval for the true effect. A lower average 95% CI length indicates a better method. Methods are compared using condition-wise ranks. Direct comparison using the average 95% CI width is not possible because the data-generating mechanisms differ in the outcome scale. See the DGM-specific results (or subresults) to see the distribution of 95% CI width values on the corresponding outcome scale.

Raincloud plot showing positive likelihood ratio across different methods

The positive likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a positive likelihood ratio greater than 1 (or a log positive likelihood ratio greater than 0). A higher (log) positive likelihood ratio indicates a better method.

Raincloud plot showing negative likelihood ratio across different methods

The negative likelihood ratio is an overall summary measure of hypothesis testing performance that combines power and type I error rate. It indicates how much a non-significant test result changes the odds of the alternative hypothesis versus the null hypothesis. A useful method has a negative likelihood ratio less than 1 (or a log negative likelihood ratio less than 0). A lower (log) negative likelihood ratio indicates a better method.

Raincloud plot showing Type I Error rates across different methods

The type I error rate is the proportion of simulation runs in which the null hypothesis of no effect was incorrectly rejected when it was true. Ideally, this value should be close to the nominal level of 5%.

Raincloud plot showing statistical power across different methods

The power is the proportion of simulation runs in which the null hypothesis of no effect was correctly rejected when the alternative hypothesis was true. A higher power indicates a better method.

Session Info

This report was compiled on Mon Mar 16 19:18:06 2026 (UTC) using the following computational environment

## R version 4.5.3 (2026-03-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_1.4.0                   ggdist_3.3.3                  
## [3] ggplot2_4.0.2                  PublicationBiasBenchmark_0.2.0
## 
## loaded via a namespace (and not attached):
##  [1] generics_0.1.4       sandwich_3.1-1       sass_0.4.10         
##  [4] xml2_1.5.2           stringi_1.8.7        lattice_0.22-9      
##  [7] httpcode_0.3.0       digest_0.6.39        magrittr_2.0.4      
## [10] evaluate_1.0.5       grid_4.5.3           RColorBrewer_1.1-3  
## [13] fastmap_1.2.0        jsonlite_2.0.0       crul_1.6.0          
## [16] urltools_1.7.3.1     httr_1.4.8           purrr_1.2.1         
## [19] viridisLite_0.4.3    textshaping_1.0.5    jquerylib_0.1.4     
## [22] Rdpack_2.6.6         cli_3.6.5            rlang_1.1.7         
## [25] triebeard_0.4.1      rbibutils_2.4.1      withr_3.0.2         
## [28] cachem_1.1.0         yaml_2.3.12          otel_0.2.0          
## [31] tools_4.5.3          memoise_2.0.1        kableExtra_1.4.0    
## [34] curl_7.0.0           vctrs_0.7.1          R6_2.6.1            
## [37] clubSandwich_0.6.2   zoo_1.8-15           lifecycle_1.0.5     
## [40] stringr_1.6.0        fs_1.6.7             htmlwidgets_1.6.4   
## [43] ragg_1.5.1           pkgconfig_2.0.3      desc_1.4.3          
## [46] osfr_0.2.9           pkgdown_2.2.0        bslib_0.10.0        
## [49] pillar_1.11.1        gtable_0.3.6         Rcpp_1.1.1          
## [52] glue_1.8.0           systemfonts_1.3.2    xfun_0.56           
## [55] tibble_3.3.1         rstudioapi_0.18.0    knitr_1.51          
## [58] farver_2.1.2         htmltools_0.5.9      labeling_0.4.3      
## [61] svglite_2.2.2        rmarkdown_2.30       compiler_4.5.3      
## [64] S7_0.2.1             distributional_0.6.0