Curriculum vitæ: Felix Dangel


Bio: Felix Dangel is an incoming assistant professor at Concordia University in the Department of Computer Science and Software Engineering, and an Associate Academic Member of Mila. His research advances ML algorithms and software by exploiting quantities beyond the gradient, such as curvature information.

Before that, he was a postdoctoral researcher at the Vector Institute in Toronto, obtained a PhD in Computer Science from the University of Tübingen, and earned Master's and Bachelor's degrees in Physics from the University of Stuttgart.

Positions & Education

Since 2026 Assistant Professor, Concordia University, Department of Computer Science & Software Engineering, Associate member at Mila – Quebec Artificial Intelligence Institute, Montreal. Research topics: Automatic Differentiation and Learning Algorithms Beyond the Gradient
2023–2026 Postdoctoral researcher, Vector Institute, Toronto. With: Yaoliang Yu, Roger Grosse. Research statement: Deep Learning Needs More Than Just the Gradient
2018–2023 PhD in Computer Science, Max Planck Institute for Intelligent Systems & University of Tübingen. Advisor: Philipp Hennig. Thesis: Backpropagation Beyond the Gradient
2017–2018 Research assistant, University of Stuttgart. Host: Institute for Theoretical Physics 1. Paper: Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
2015–2017 MSc in Physics, University of Stuttgart. Advisor: Holger Cartarius. Thesis: Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
2012–2015 BSc in Physics, University of Stuttgart. Advisor: Günter Wunner. Thesis: Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates

Publications

Equal contributions are marked with *.

2026 (pre-print) Sketching Low-Rank Plus Diagonal Matrices. A. Fernandez, F. Dangel, P. Hennig, F. Schneider (arXiv)
2026 (ICLR) Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature. A. Porrello, P. Buzzega, F. Dangel, T. Sommariva, R. Salami, L. Bonicelli, S. Calderara
2026 (ICLR) Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization. W. Lin, S. C. Lowe, F. Dangel, R. Eschenhagen, Z. Xu, R. B. Grosse (arXiv)
2026 (AISTATS) Efficient Bilevel Optimization with KFAC-Based Hypergradients. D. Liao, F. Dangel, Y. Yu
2025 (pre-print) Kronecker-factored Approximate Curvature (KFAC) From Scratch. F. Dangel*, T. Weber*, B. Mucsányi*, R. Eschenhagen (arXiv, code)
2025 (NeurIPS) Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization. A. Guzmán-Cordero, F. Dangel, G. Goldshlager, M. Zeinhofer (arXiv)
2025 (NeurIPS) Collapsing Taylor Mode Automatic Differentiation. F. Dangel*, T. Siebert*, M. Zeinhofer, A. Walther (arXiv, code)
2025 (pre-print) Position: Curvature Matrices Should Be Democratized via Linear Operators. F. Dangel*, R. Eschenhagen*, W. Ormaniec, A. Fernandez, L. Tatzel, A. Kristiadi (arXiv, code)
2025 (pre-print) Spectral-factorized Positive-definite Curvature Learning for NN Training. W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. E. Turner, R. B. Grosse (arXiv)
2025 (ICML spotlight) Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator. Y. X. Li, F. Dangel, D. Tam, C. Raffel (pdf)
2025 (ICML spotlight) Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It. M. F. da Silva, F. Dangel, S. Oore (pdf, arXiv)
2025 (ICLR spotlight) What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis. W. Ormaniec, F. Dangel, S. P. Singh (pdf, arXiv, video)
2024 (NeurIPS) Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks. F. Dangel*, J. Mueller*, M. Zeinhofer* (pdf, arXiv, video, poster, code, slides)
2024 (NeurIPS) Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods. F. Dangel (pdf, arXiv, code, video, poster, slides)
2024 (ICML workshop) Lowering PyTorch's Memory Consumption for Selective Differentiation. S. Bhatia, F. Dangel (pdf, arXiv, code, poster, bug report)
2024 (ICML) Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning. M. Elsayed, H. Farrahi, F. Dangel, R. Mahmood (pdf, arXiv, code)
2024 (ICML) Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective. W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. Turner, A. Makhzani (pdf, arXiv, poster, code)
2024 (ICML) Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC. W. Lin*, F. Dangel*, R. Eschenhagen, K. Neklyudov, A. Kristiadi, R. Turner, A. Makhzani (pdf, arXiv, code, poster)
2023 (pre-print) On the Disconnect Between Theory and Practice of Overparametrized Neural Networks. J. Wenger, F. Dangel, A. Kristiadi (pdf, arXiv)
2023 (NeurIPS) The Geometry of Neural Nets' Parameter Spaces Under Reparametrization. A. Kristiadi, F. Dangel, P. Hennig (pdf, arXiv)
2023 (PhD thesis) Backpropagation Beyond the Gradient. F. Dangel (pdf, source, template)
2022 (TMLR) ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure. F. Dangel*, L. Tatzel*, P. Hennig (pdf, journal, arXiv, code, www)
2021 (NeurIPS) Cockpit: A Practical Debugging Tool for Training Deep Neural Networks. F. Schneider*, F. Dangel*, P. Hennig (pdf, conference, arXiv, code, www, video)
2020 (ICLR spotlight) BackPACK: Packing more into backprop. F. Dangel*, F. Kunstner*, P. Hennig (pdf, conference, arXiv, code, www, video)
2020 (AISTATS) Modular Block-diagonal Curvature Approximations for Feedforward Architectures. F. Dangel, S. Harmeling, P. Hennig (pdf, conference, arXiv, code, video)
2018 (Phys. Rev. A) Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model. F. Dangel*, M. Wagner*, H. Cartarius, J. Main, G. Wunner (pdf, journal, arXiv)
2018 (Acta Polytechnica) Numerical calculation of the complex Berry phase in non-Hermitian systems. M. Wagner, F. Dangel, H. Cartarius, J. Main, G. Wunner (pdf, journal, arXiv)
2017 (Master thesis) Bosonic many-body systems with topologically nontrivial phases subject to gain and loss. F. Dangel (pdf)
2015 (Bachelor thesis) Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates. F. Dangel (pdf, German only)

Talks & Media Appearances

2025 Invited talk, 'Overparametrization, Regularization, Identifiability and Uncertainty in Machine Learning' workshop, Mathematisches Forschungsinstitut Oberwolfach. "Accelerating Taylor mode automatic differentiation" (slides)
2024 Vector Research Day. "Convolutions and More as Einsum" (slides, video)
2024 Short one-on-one presentation for Geoffrey Hinton at a Swedish TV event (footage: 1, 2)
2024 Invited talk, Cerebras Systems seminar and Graham Taylor's group meeting. "Convolutions Through The Lens of Tensor Networks" (slides)
2024 Tutorial, Colin Raffel's group meeting. "Large-scale Linear Algebra with Curvature Matrices" (notes)
2023 Invited talk, Perimeter Institute Machine Learning Initiative seminar. "Deep Learning Convolutions Through the Lens of Tensor Networks" (recording, slides)
2022 Invited talk, ELISE Theory Workshop on ML Fundamentals 2022, EURECOM, Sophia Antipolis
2019 Invited talk, seminar for Integrated Engineering students, DHBW CAS, Heilbronn
2017 Conference talk, DPG Spring Meeting of the Atomic Physics and Quantum Optics section, Mainz

Workshops & Events

2025 Poster presentation, OPT25 workshop at NeurIPS. "Understanding and Improving Shampoo via Kullback–Leibler Minimization"
2024 Poster presentation, WANT workshop at ICML. "Lowering PyTorch's Memory Consumption for Selective Differentiation"
2023 Poster presentation, OPT23 workshop at NeurIPS. "Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets" (poster)
2022 Poster presentation, ELLIS Doctoral Symposium (EDS) 2022, Alicante (poster)
2022 Poster presentation, ELLIS Theory Workshop 2022, Arenzano
2022 Session chair, Franco-German Research and Innovation Network on AI
2021 Co-organizer, ELLIS Doctoral Symposium (EDS) 2021, Tübingen
2015 Participant, "Ferienakademie 2015" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; talk on Lattice Boltzmann methods
2014 Participant, "Ferienakademie 2014" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; talk on NMR & MRI

Teaching

2018–2022 Teaching assistant (course design, supervision, and evaluation of 7 courses in total), software development practicals, University of Tübingen. Details: Three PhD students supervise ~15 students who develop a machine learning prediction system for the German soccer league over the course of one term (example). The workload per student is ~180 hours, with a strong focus on good software development practices.

Supervision

Since 2025 Research project co-supervisor: Nikita Dhawan (PhD student, University of Toronto) with Roger Grosse. Function space distance approximations for transformers
Since 2025 Research project co-supervisor: Runshi Yang (undergraduate student, University of Toronto) with Wu Lin. Kronecker-factored curvature approximations for influence functions
Since 2024 Research project co-supervisor: Disen Liao (PhD student, University of Waterloo) with Yaoliang Yu. Bi-level optimization problems for AI safety (AISTATS 2026 paper)
Since 2023 Research project co-supervisor: Marvin F. da Silva (PhD student, Dalhousie University) with Sageev Oore. Symmetry-agnostic algorithms for deep learning (ICML 2025 spotlight)
2024–2025 Research project co-supervisor: Andrés Guzmán-Cordero (Master student, University of Amsterdam; after: PhD student, Université de Montréal and Mila) with Marius Zeinhofer and Gil Goldshlager. Accelerating second-order optimizers for physics-informed neural networks (NeurIPS 2025 paper)
2024–2025 Thesis project co-supervisor: Yu Xin Li (undergraduate student, University of Toronto) with Derek Tam and Colin Raffel. Gradient accumulators as Fisher proxies (ICML 2025 spotlight)
2024–2025 Master thesis co-supervisor: Weronika Ormaniec (Master student, ETH Zürich; after: PhD student, ETH Zürich) with Sidak Pal Singh. Theoretical investigation of the Hessian and loss landscape of transformers (ICLR 2025 spotlight + Silver Medal of ETH Zürich for outstanding thesis)
2023–2024 Research project supervisor: Samarth Bhatia (undergraduate student, IIT Delhi). Randomized and selective autodiff (ICML 2024 workshop paper)
2022 Summer internship supervisor: Elisabeth Knigge (high school student). Visualization methods to make deep learning optimization more accessible to non-experts.
2021–2022 Research assistant supervisor: Jessica Bader (Master student, University of Tübingen; after: PhD student, Helmholtz Munich/Technical University of Munich). Extending BackPACK with Kronecker-factorized curvature for Bayesian deep learning, including an interface for negative log-likelihood losses to support KFAC and enable Laplace approximations via laplace-torch
2021 Master thesis supervisor: Tim Schäfer (Master student, University of Tübingen; after: PhD student, University of Tübingen). Extending BackPACK to ResNets and recurrent architectures
2020–2021 Research assistant supervisor: Shrisha Bharadwaj (Master student, University of Tübingen; after: PhD student, Max Planck Institute for Intelligent Systems). Improving BackPACK’s code quality (tests, docstrings) and extending convolution support from 2D to 1D and 3D.
2019–2020 Research project supervisor: Paul Fischer (Master student, University of Tübingen; after: PhD student, University of Lucerne). Hessian backpropagation for batch normalization
2019 Bachelor thesis supervisor: Christian Meier (University of Tübingen). Activity prediction in smart home environments via Markov models

Reviewing

2026 Area chair: ICML
2025 Area chair: ICML. Reviewer: NeurIPS (complimentary registration award as 'top reviewer'), ICML CODEML workshop
2024 Reviewer: ICML, ICLR, AISTATS, NeurIPS (complimentary registration award as 'top reviewer'), NeurIPS OPT workshop
2023 Reviewer: NeurIPS, Vector Scholarship in Artificial Intelligence
2022 Reviewer: ICML, NeurIPS HITY workshop
2021 Reviewer: ICML, NeurIPS, JMLR
2020 Reviewer: ICML, NeurIPS

EDI Involvement

2026 Mentor: Jethro Odeyemi (PhD student, University of Saskatchewan), Indigenous and Black Engineering and Technology (IBET) PhD Project
2025 Mentor: Darren Dahunsi (PhD student, University of Alberta), Indigenous and Black Engineering and Technology (IBET) PhD Project
2024 Mentor, Indigenous and Black Engineering and Technology (IBET) PhD Project
2024 Mentor, NeurIPS Black in AI workshop
2023 Representative for Vector Institute and mentor, NeurIPS Black in AI workshop and Q&A session
