Raffaello Seri's webpage

AI-assisted teams outperform AI-led teams but not human-only teams in assessing research reproducibility in quantitative social science

Large Language Models (LLMs) such as ChatGPT are transforming how scientists conduct and validate research, offering promise as tools to improve scientific reproducibility. However, computational reproducibility and error detection remain expensive and labor-intensive. We experimentally test how collaboration between researchers and LLM assistants influences the reproduction of quantitative social science findings across different levels of AI autonomy. We randomly assigned 288 researchers to 103 teams working under three conditions: human-only, AI-assisted (using ChatGPT as a collaborative tool), or AI-led (ChatGPT operating with minimal human oversight). Teams reproduced published results from leading social science journals, detected coding errors, and proposed robustness checks. Human-only and AI-assisted teams achieved comparable reproduction rates (94% vs. 91%) and performed similarly on most outcomes, except human-only teams identified significantly more major coding errors. Both substantially outperformed AI-led teams, which achieved only a 37% reproduction rate, detected fewer errors across all categories, proposed weaker robustness checks, and required more time. This autonomous approach, however, likely represents only a lower bound of AI capabilities. Despite rapid model advances, expert human judgment currently remains indispensable for reliable empirical verification. While AI assistance did not degrade most outcomes, it provided no measurable advantages and was associated with reduced detection of major errors. However, the 37% autonomous reproduction rate indicates that AI could provide value in settings where scale or cost constraints preclude human review of papers, even though general-purpose LLMs offer no immediate advantages for human-supervised verification.

Model Trustworthiness and Modeler Responsibility in Economic Agent-Based Modeling practices: a Meta-Analytical approach

Establishing credibility and trustworthiness is essential in Economic Agent-Based Modeling (ABM), where clear epistemic standards cannot be defined a priori. In this paper, we first review the notions of trustworthiness and credibility in modeling. We then introduce a framework that emphasizes the modeler’s epistemic responsibility to ensure coherence between modeling purposes, strategies and targets. We examine the challenges in assessing model reliability that arise from the interaction of conceptual, algorithmic and computational constituents, and we propose a meta-analytical approach to enhance model consistency by conceptualizing ABMs as iterated analogies. Our analysis outlines strategies for improving model accessibility and reliability while highlighting the modeler’s role in preventing mistargeting and misuse. This research provides a normative basis for justifying the credibility of both idealized and targetless models by promoting transparency and consistency between model design and intended purposes.

Investigating the analytical robustness of the social and behavioural sciences

The same dataset can be analysed in different justifiable ways to answer the same research question, potentially challenging the robustness of empirical science. In this crowd initiative, we investigated the degree to which research findings in the social and behavioural sciences are contingent on analysts’ choices. We examined a stratified random sample of 100 studies published between 2009 and 2018, in which, for one claim per study, at least five reanalysts independently reanalysed the original data. The statistical appropriateness of the reanalyses was assessed in peer evaluations, and the robustness indicators were inspected along a range of research characteristics and study designs. We found that 34% of the independent reanalyses yielded the same result (within a tolerance region of ±0.05 Cohen’s *d*) as the original report; with a four times broader tolerance region, this indicator increased to 57%. Of the reanalyses conducted, 74% reached the same conclusion as the original investigation, 24% yielded no effects or inconclusive results and 2% reported the opposite effect. This exploratory study indicates that the common single-path analyses in social and behavioural research should not be simply assumed to be robust to alternative analyses. Therefore, we recommend the development and use of practices to explore and communicate this neglected source of uncertainty.
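
The tolerance-region indicator described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the effect sizes are hypothetical, and the "four times broader" region is taken here to mean ±0.20 Cohen's *d*.

```python
def same_result(original_d, reanalysis_d, tolerance=0.05):
    """True if the reanalysis effect lies within original_d +/- tolerance."""
    return abs(reanalysis_d - original_d) <= tolerance


def share_within_tolerance(original_d, reanalysis_ds, tolerance=0.05):
    """Fraction of reanalyses whose effect size matches the original."""
    hits = sum(same_result(original_d, d, tolerance) for d in reanalysis_ds)
    return hits / len(reanalysis_ds)


# Hypothetical effect sizes (Cohen's d) for one claim and five reanalyses.
original = 0.30
reanalyses = [0.28, 0.41, 0.33, 0.05, 0.36]

narrow = share_within_tolerance(original, reanalyses, tolerance=0.05)
broad = share_within_tolerance(original, reanalyses, tolerance=0.20)
print(f"within ±0.05: {narrow:.0%}, within ±0.20: {broad:.0%}")
```

Widening the region can only keep or increase the share of matching reanalyses, which is consistent with the move from 34% to 57% reported in the abstract.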

The Analogical Roots of Agent-Based Modeling in Economics and Social Sciences: The case of Innovation Dynamics

Agent-based modeling (ABM) is a simulation technique that has been increasingly integrated into the economic discipline in order to understand complex systems. However, most everyday research activities rely on researchers' consensus concerning practical choices about modeling strategies, the computational boundaries under scrutiny and the extent of empirical validation. Particularly lacking are reflections on the semantic construction of conceptual models. The paper reviews existing theoretical frameworks leading to an understanding of ABM as a technique in which the cognitive processing instantiated by the instrument is distributed across different modeling layers, including conceptual, algorithmic and computational ones, which can be interpreted as an interlinked set of analogies. It then introduces a framework for assessing ABM conceptual adequacy and tests it on two families of models in the field of economics of innovation, revealing several modeling constraints.

Valorisation of Organic Waste Through Black Soldier Fly: On the Way of a Real Circular Bioeconomy Process

The transition from a linear to a circular production system involves transforming waste (such as the organic fraction of municipal solid waste, OFMSW) into valuable resources. Insect-mediated bioconversion, particularly using black soldier fly (BSF) larvae, can offer a promising opportunity to convert OFMSW into protein-rich biomass. However, current regulatory restrictions limit the use of insect proteins for animal feed, prompting the exploration of other applications, such as the production of bioplastics. Here, we explored an innovative and integrated circular supply chain model which aims to valorise the OFMSW through BSF larvae for the production of biobased materials with high technological value. BSF larvae reared on the OFMSW showed excellent growth performance and bioconversion rate of the substrate. The use of well-suited extraction methods allowed the isolation of high-purity lipid, protein, and chitin fractions, suitable building blocks to produce biobased materials. In particular, the protein fraction was used to develop biodegradable plastic films which showed potential for replacing traditional petroleum-based materials, with the promise of being fully recycled back to amino acids, thus promoting a circular economy process. Socioeconomic analysis highlighted values generated along the entire supply chain, and life cycle assessment pointed out that lipid extraction was the most challenging step. Implementation of more sustainable methods is thus needed to reduce the overall environmental impact of the proposed chain. In conclusion, this study represents a proof of concept gathering evidence to support the feasibility of an alternative supply chain that can promote circular economy while valorising organic waste.

Asymptotic Distributions of Covering and Separation Measures on the Hypersphere

We consider measures of covering and separation that are expressed through maxima and minima of distances between points of a hypersphere. We investigate the behavior of these measures when applied to a sample of independent and uniformly distributed points. In particular, we derive their asymptotic distributions when the number of points diverges. These results can be useful as a benchmark against which deterministic point sets can be evaluated. Whenever possible, we supplement the rigorous derivation of these limiting distributions with some heuristic reasoning based on extreme value theory. As a by-product, we provide a proof for a conjecture on the hole radius associated with a facet of the convex hull of points distributed on the hypersphere.
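
To make the two kinds of measure concrete, here is a minimal sketch that draws independent uniform points on the sphere and computes a separation distance (minimum over pairs) and a Monte Carlo lower bound on a covering radius (maximum over probe locations of the distance to the nearest sample point). Everything in it is an assumption made for illustration: the function names are hypothetical, distances are Euclidean (chordal) rather than geodesic, and the exact definitions used in the paper may differ.

```python
import numpy as np


def uniform_on_sphere(n, dim, rng):
    """Draw n points uniformly on the unit sphere in R^dim (normalized Gaussians)."""
    x = rng.standard_normal((n, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def separation(points):
    """Minimum Euclidean distance between two distinct unit vectors."""
    points = np.asarray(points, dtype=float)
    gram = points @ points.T
    np.fill_diagonal(gram, -np.inf)  # ignore each point's distance to itself
    # For unit vectors, |x - y|^2 = 2 - 2 <x, y>.
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * gram.max())))


def covering_radius_lower_bound(points, probes):
    """Largest distance from any probe to its nearest sample point.

    The true covering radius is a supremum over the whole sphere, so a
    finite set of probes can only give a lower bound.
    """
    points = np.asarray(points, dtype=float)
    probes = np.asarray(probes, dtype=float)
    nearest = (probes @ points.T).max(axis=1)  # inner product with nearest point
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * nearest.min())))


rng = np.random.default_rng(0)
sample = uniform_on_sphere(500, 3, rng)
probes = uniform_on_sphere(20_000, 3, rng)
print("separation:", separation(sample))
print("covering radius (lower bound):", covering_radius_lower_bound(sample, probes))
```

Repeating the computation over many independent samples gives an empirical distribution of each measure, which is the kind of random benchmark against which a deterministic point set could be compared.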

Examining the context sensitivity of research findings from archival data

This initiative systematically examined the extent to which a large set of archival research findings generalizes across contexts. We repeated the key analyses for 29 original strategic management effects in the same context (direct reproduction) as well as in 52 novel time periods and geographies; 45% of the reproductions returned results matching the original reports, as did 55% of tests in different spans of years and 40% of tests in novel geographies. Some original findings were associated with multiple new tests. Reproducibility was the best predictor of generalizability: for the findings that proved directly reproducible, 84% emerged in other available time periods and 57% emerged in other geographies. Overall, only limited empirical evidence emerged for context sensitivity. In a forecasting survey, independent scientists were able to anticipate which effects would find support in tests in new samples.

Computing the Asymptotic Distribution of Second-order $U$- and $V$-statistics

Under general conditions, the asymptotic distribution of degenerate second-order $U$- and $V$-statistics is an (infinite) weighted sum of $\chi^2$ random variables whose weights are the eigenvalues of an integral operator associated with the kernel of the statistic. The behavior of the statistic in terms of power can also be characterized through the eigenvalues and the eigenfunctions of the same integral operator. No general algorithm seems to be available to compute these quantities starting from the kernel of the statistic. An algorithm is proposed to approximate (as precisely as needed) the asymptotic distribution and the power of the test statistics, and to build several measures of performance for tests based on $U$- and $V$-statistics. The algorithm uses the Wielandt–Nyström method of approximation of an integral operator based on quadrature, and can be used with several methods of numerical integration. An extensive numerical study shows that the Wielandt–Nyström method based on Clenshaw–Curtis quadrature performs very well both for the eigenvalues and the eigenfunctions.
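
The quadrature idea can be illustrated with a minimal sketch. It is not the algorithm of the paper: it uses Gauss–Legendre nodes (available in NumPy) instead of Clenshaw–Curtis, and the kernel is a textbook choice made here for illustration, namely the Cramér–von Mises kernel for uniform data on $[0,1]$, $h(x,y) = (x^2+y^2)/2 - \max(x,y) + 1/3$, whose eigenvalues are known to be $1/(j\pi)^2$. The function names are hypothetical.

```python
import numpy as np


def cvm_kernel(x, y):
    """Cramér–von Mises kernel under the uniform distribution on [0, 1]."""
    return (x**2 + y**2) / 2.0 - np.maximum(x, y) + 1.0 / 3.0


def nystrom_eigenvalues(kernel, n_nodes=200):
    """Approximate the eigenvalues of f -> int_0^1 kernel(x, y) f(y) dy.

    The integral is replaced by a quadrature rule with nodes x_i and
    weights w_i; the eigenvalues of the symmetrized matrix
    sqrt(w_i) * kernel(x_i, x_j) * sqrt(w_j) approximate those of the operator.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    x = 0.5 * (nodes + 1.0)  # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights        # rescale weights accordingly
    sqrt_w = np.sqrt(w)
    matrix = sqrt_w[:, None] * kernel(x[:, None], x[None, :]) * sqrt_w[None, :]
    return np.linalg.eigvalsh(matrix)[::-1]  # sorted in decreasing order


approx = nystrom_eigenvalues(cvm_kernel)
exact = 1.0 / (np.pi * np.arange(1, 6)) ** 2
for j, (a, e) in enumerate(zip(approx[:5], exact), start=1):
    print(f"lambda_{j}: approx {a:.6f}, exact {e:.6f}")
```

Symmetrizing with the square roots of the weights keeps the matrix symmetric, so a symmetric eigensolver can be used. The approximate eigenvalues could then serve as weights in a truncated weighted sum of $\chi^2$ variables to approximate the limiting distribution.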

Asymptotic Properties of the Plug-in Estimator of the Discrete Entropy under Dependence

We consider the estimation of the entropy of a discretely-supported time series through a plug-in estimator. We provide a correction of the bias and we study the asymptotic properties of the estimator. We show that the widely-used correction proposed by Roulston (1999) is incorrect as it does not remove the $O\left(N^{-1}\right)$ part of the bias while ours does. We provide the asymptotic distribution and we show that it differs when the values taken by the marginal distribution of the process are equiprobable (a situation that we call *degeneracy*) and when they are not. We introduce estimators of the bias, the variance and the distribution under degeneracy and we study the estimation error. Finally, we propose a goodness-of-fit test based on entropy and give two motivations for it. The theoretical results are supported by specific numerical examples.
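
For readers unfamiliar with the plug-in estimator, the sketch below computes it from the empirical frequencies of a discrete series. As a point of comparison it also applies the classical Miller–Madow correction, which adds $(m-1)/(2N)$ for $m$ observed symbols and is justified for independent observations. This is an assumption made for illustration only: it is not the correction derived in the paper for dependent series, and the function names are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy(series):
    """Plug-in (maximum likelihood) entropy estimate in nats."""
    n = len(series)
    counts = Counter(series)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(series):
    """Plug-in estimate plus the Miller–Madow term (m - 1) / (2N).

    Classical first-order correction for independent observations;
    it is NOT the correction proposed in the paper for dependent data.
    """
    n = len(series)
    m = len(set(series))  # number of distinct symbols actually observed
    return plugin_entropy(series) + (m - 1) / (2 * n)


series = ["a", "b", "a", "c", "b", "a", "a", "b"]
print("plug-in:", plugin_entropy(series))
print("Miller–Madow:", miller_madow_entropy(series))
```

The plug-in estimator is biased downward in finite samples, which is why corrections of order $N^{-1}$ such as the one studied in the abstract are of interest.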

On the quest for defining organisational plasticity: a community modelling experiment

Purpose – This viewpoint article is concerned with an attempt to advance organisational plasticity (OP) modelling concepts by using a novel community modelling framework (PhiloLab) from the social simulation community to drive the process of idea generation. In addition, the authors want to feed back their experience with PhiloLab as they believe that this way of idea generation could also be of interest to the wider evidence-based human resource management (EBHRM) community. Design/methodology/approach – The authors used some workshop sessions to brainstorm new conceptual ideas in a structured and efficient way with a multidisciplinary group of 14 (mainly academic) participants using PhiloLab. This is a tool from the social simulation community, which stimulates and formally supports discussions about philosophical questions of future societal models by means of developing conceptual agent-based simulation models. This was followed by an analysis of the qualitative data gathered during the PhiloLab sessions, feeding into the definition of a set of primary axioms of a plastic organisation. Findings – The PhiloLab experiment helped with defining a set of primary axioms of a plastic organisation, which are presented in this viewpoint article. The results indicated that the problem was rather complex, but it also showed good potential for an agent-based simulation model to tackle some of the key issues related to OP. The experiment also showed that PhiloLab was very useful in terms of knowledge and idea gathering. Originality/value – Through information gathering and open debates on how to create an agent-based simulation model of a plastic organisation, the authors could identify some of the characteristics of OP and start structuring some of the parameters for a computational simulation. With the outcome of the PhiloLab experiment, the authors are paving the way towards future exploratory computational simulation studies of OP.