spotlight poster @ ICLR!

Date:

Fede’s work got selected as a spotlight poster @ the International Conference on Learning Representations!

Adolfi F, Vilas MG and Wareham T (2025). The Computational Complexity of Circuit Discovery for Inner Interpretability.

Available here.

Many proposed applications of neural networks in machine learning, cogni- tive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable al- gorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and faithfulness at a time when we lack under- standing of the complexity properties of the problems they are deployed to solve. To address this, we study circuit discovery with classical and parameterized com- putational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries for mech- anistic explanation, and propose a formal framework for their analysis; (3) we use it to settle the complexity of many query variants and relaxations of practical in- terest on multi-layer perceptrons. Our findings reveal a challenging complexity landscape. Many queries are intractable, remain fixed-parameter intractable rela- tive to model/circuit features, and inapproximable under additive, multiplicative, and probabilistic approximation schemes. To navigate this landscape, we prove there exist transformations to tackle some of these hard problems with better- understood heuristics, and prove the tractability or fixed-parameter tractability of more modest queries which retain useful affordances. This framework allows us to understand the scope and limits of interpretability queries, explore viable options, and compare their resource demands on existing and future architectures.

Recommended citation: Adolfi F, Vilas MG and Wareham T (2025). The Computational Complexity of Circuit Discovery for Inner Interpretability.