Many Labs 2: Investigating Variation in Replicability Across Samples and Settings

Richard Klein, Michelangelo Vianello, Fred Hasselman, Byron G. Adams, Reginald B. Adams, Sinan Alper, Mark Aveyard, Jordan Axt, Mayowa T. Babalola, Štěpán Bahník, Rishtee Batra, Mihály Berkics, Michael J. Bernstein, Daniel R. Berry, Olga Białobrzeska, Evans Dami Binan, Konrad Bocian, Mark J. Brandt, Robert Busching, Anna Cabak Rédei, Huajian Cai, Fanny Cambier, Katarzyna Cantarero, Cheryl L. Carmichael, Francisco Cerić, Jesse Chandler, Jen‐Ho Chang, Armand Chatard, Eva E. Chen, Winnee Cheong, David C. Cicero, Sharon Coen, Jennifer A. Coleman, Brian Collisson, Morgan Conway, Katherine S. Corker, Paul Curran, Fiery Cushman, Zubairu Kwambo Dagona, Ilker Dalgar, Anna Dalla Rosa, William E. Davis, Maaike de Bruijn, Leander De Schutter, Thierry Devos, Marieke de Vries, Canay Doğulu, Nerisa Dozo, Kristin Nicole Dukes, Yarrow Dunham, Kevin Durrheim, Charles R. Ebersole, John E. Edlund, Anja Eller, Alexander Scott English, Carolyn Finck, Natalia Frankowska, Miguel-Ángel Freyre, Michael Friedman, Elisa Maria Galliani, Joshua C. Gandi, Tanuka Ghoshal, Steffen R. Giessner, Tripat Gill, Timo Gnambs, Ángel Gómez, Roberto González, Jesse Graham, Jon Grahe, Ivan Grahek, Eva G. T. Green, Kakul Hai, Matthew Haigh, Elizabeth L. Haines, Michael P. Hall, Marie E. Heffernan, Joshua A. Hicks, Petr Houdek, Jeffrey R. Huntsinger, Ho Phi Huynh, Hans IJzerman, Yoel Inbar, Åse Innes-Ker, William Jiménez‐Leal, Melissa-Sue John, Jennifer A. Joy-Gaba, Roza Gizem Kamiloglu, Heather Barry Kappes, Serdar Karabatı, Haruna Karick, Victor N. Keller, Anna Kende, Nicolas Kervyn, Goran Knežević, Carrie Kovacs, Lacy E. Krueger, German Kurapov, Jamie Kurtz, Daniël Lakens, Ljiljana B. Lazarević
2018 · Advances in Methods and Practices in Psychological Science · 987 citations

Abstract

We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen’s ds were 0.60 for the original findings and 0.15 for the replications. The effect sizes were small (< 0.20) in 16 of the replications (57%), and 9 effects (32%) were in the direction opposite that of the original effect. Across settings, the Q statistic indicated significant heterogeneity in 11 (39%) of the replication effects, and most of those were among the findings with the largest overall effect sizes; only 1 effect that was near zero in the aggregate showed significant heterogeneity according to this measure. Only 1 effect had a tau value greater than .20, an indication of moderate heterogeneity. Eight others had tau values near or slightly above .10, an indication of slight heterogeneity. Moderation tests indicated that very little heterogeneity was attributable to the order in which the tasks were performed or whether the tasks were administered in lab versus online. Exploratory comparisons revealed little heterogeneity between Western, educated, industrialized, rich, and democratic (WEIRD) cultures and less WEIRD cultures (i.e., cultures with relatively high and low WEIRDness scores, respectively). Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
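For context on the heterogeneity measures reported above, the following is a minimal, illustrative Python sketch (not the authors' analysis code) of how Cochran's Q and a DerSimonian-Laird tau estimate can be computed from per-sample effect sizes and their sampling variances; all input values below are hypothetical.

import numpy as np

def heterogeneity(effects, variances):
    """Return (Q, tau) for a set of per-sample effect-size estimates.

    effects:   per-sample effect estimates (e.g., Cohen's d per sample)
    variances: their sampling variances
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances                      # inverse-variance (fixed-effect) weights
    mu = np.sum(w * effects) / np.sum(w)     # pooled fixed-effect estimate
    Q = np.sum(w * (effects - mu) ** 2)      # Cochran's Q statistic
    k = effects.size
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)       # DerSimonian-Laird between-sample variance
    return Q, np.sqrt(tau2)

# Hypothetical per-sample estimates for one replication protocol
Q, tau = heterogeneity([0.05, 0.40, -0.10, 0.55, 0.20],
                       [0.010, 0.012, 0.009, 0.015, 0.011])
print(f"Q = {Q:.2f}, tau = {tau:.3f}")  # tau > .20 would count as moderate heterogeneity by the article's yardstick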

Keywords

Statistics, Replication (statistics), Statistical significance, Moderation, Statistic, Mathematics, Variation (astronomy), Interaction, Psychology, Demography, Econometrics, Physics

Related Publications

Small Telescopes

This article introduces a new approach for evaluating replication results. It combines effect-size estimation with hypothesis testing, assessing the extent to which the replicat...

2015 · Psychological Science · 752 citations
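As a rough illustration of the "small telescopes" logic described above (not code from either article), the Python sketch below computes d33%, the effect size a hypothetical original two-group study had 33% power to detect, and uses a normal approximation to ask whether a hypothetical replication estimate falls significantly below it; the sample sizes, effect values, and the normal-approximation test are assumptions for illustration only.

import numpy as np
from scipy.stats import norm
from statsmodels.stats.power import TTestIndPower

def small_telescope_test(n_orig_per_group, d_rep, n_rep_per_group, alpha=0.05):
    # d33%: the effect the original two-group design could detect with 33% power
    d33 = TTestIndPower().solve_power(effect_size=None, nobs1=n_orig_per_group,
                                      alpha=alpha, power=0.33, ratio=1.0)
    # Approximate standard error of the replication's Cohen's d
    n1 = n2 = n_rep_per_group
    se = np.sqrt((n1 + n2) / (n1 * n2) + d_rep ** 2 / (2 * (n1 + n2)))
    z = (d_rep - d33) / se
    p_smaller = norm.cdf(z)  # one-sided: is d_rep significantly below d33%?
    return d33, p_smaller

d33, p = small_telescope_test(n_orig_per_group=30, d_rep=0.10, n_rep_per_group=500)
print(f"d33% = {d33:.2f}; p(replication < d33%) = {p:.4f}")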

Publication Info

Year: 2018
Type: article
Volume: 1
Issue: 4
Pages: 443-490
Citations: 987
Access: Closed

Citation Metrics

987 (OpenAlex)

Cite This

Richard Klein, Michelangelo Vianello, Fred Hasselman, et al. (2018). Many Labs 2: Investigating Variation in Replicability Across Samples and Settings. Advances in Methods and Practices in Psychological Science, 1(4), 443-490. https://doi.org/10.1177/2515245918810225

Identifiers

DOI: 10.1177/2515245918810225