Curriculum vitæ: Felix Dangel
Bio: Felix Dangel is an incoming assistant professor at Concordia University in the Department of Computer Science and Software Engineering, and an Associate Academic Member of Mila.
His research advances machine learning algorithms and software by considering quantities beyond the gradient, such as curvature information.
His research interests include
- Neural network training algorithms inspired by second-order methods (Shampoo, SOAP, K-FAC),
- Efficient computation and applications of higher-order derivatives in scientific ML (Taylor mode, PINNs),
- Information geometry (Fisher information), curvature approximations, and their application outside optimization (model merging, training data attribution, unlearning, bi-level problems for safety).
Previously, he was a postdoctoral researcher at the Vector Institute in Toronto, obtained a PhD in Computer Science from the University of Tübingen, and earned Master's and Bachelor's degrees in Physics from the University of Stuttgart.
| Since 2026 | Assistant Professor, Concordia University, Department of Computer Science & Software Engineering; Associate member at Mila – Quebec Artificial Intelligence Institute, Montreal. Research topics: Automatic Differentiation and Learning Algorithms Beyond the Gradient |
| 2023–2026 | Postdoctoral researcher, Vector Institute, Toronto. With: Yaoliang Yu, Roger Grosse. Research statement: Deep Learning Needs More Than Just the Gradient |
| 2018–2023 | PhD in Computer Science, Max Planck Institute for Intelligent Systems & University of Tübingen. Advisor: Philipp Hennig. Thesis: Backpropagation Beyond the Gradient |
| 2017–2018 | Research assistant, University of Stuttgart. Host: Institute for Theoretical Physics 1. Paper: Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model |
| 2015–2017 | MSc in Physics, University of Stuttgart. Advisor: Holger Cartarius. Thesis: Bosonic many-body systems with topologically nontrivial phases subject to gain and loss |
| 2012–2015 | BSc in Physics, University of Stuttgart. Advisor: Günter Wunner. Thesis: Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates |
Equal contributions are marked with *.
| 2026 (pre-print) | Sketching Low-Rank Plus Diagonal Matrices. A. Fernandez, F. Dangel, P. Hennig, F. Schneider (arXiv) |
| 2026 (ICLR) | Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature. A. Porrello, P. Buzzega, F. Dangel, T. Sommariva, R. Salami, L. Bonicelli, S. Calderara |
| 2026 (ICLR) | Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization. W. Lin, S. C. Lowe, F. Dangel, R. Eschenhagen, Z. Xu, R. B. Grosse (arXiv) |
| 2026 (AISTATS) | Efficient Bilevel Optimization with KFAC-Based Hypergradients. D. Liao, F. Dangel, Y. Yu |
| 2025 (pre-print) | Kronecker-factored Approximate Curvature (KFAC) From Scratch. F. Dangel*, T. Weber*, B. Mucsányi*, R. Eschenhagen (arXiv, code) |
| 2025 (NeurIPS) | Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization. A. Guzmán-Cordero, F. Dangel, G. Goldshlager, M. Zeinhofer (arXiv) |
| 2025 (NeurIPS) | Collapsing Taylor Mode Automatic Differentiation. F. Dangel*, T. Siebert*, M. Zeinhofer, A. Walther (arXiv, code) |
| 2025 (pre-print) | Position: Curvature Matrices Should Be Democratized via Linear Operators. F. Dangel*, R. Eschenhagen*, W. Ormaniec, A. Fernandez, L. Tatzel, A. Kristiadi (arXiv, code) |
| 2025 (pre-print) | Spectral-factorized Positive-definite Curvature Learning for NN Training. W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. E. Turner, R. B. Grosse (arXiv) |
| 2025 (ICML spotlight) | Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator. Y. X. Li, F. Dangel, D. Tam, C. Raffel (pdf) |
| 2025 (ICML spotlight) | Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It. M. F. da Silva, F. Dangel, S. Oore (pdf, arXiv) |
| 2025 (ICLR spotlight) | What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis. W. Ormaniec, F. Dangel, S. P. Singh (pdf, arXiv, video) |
| 2024 (NeurIPS) | Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks. F. Dangel*, J. Mueller*, M. Zeinhofer* (pdf, arXiv, video, poster, code, slides) |
| 2024 (NeurIPS) | Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods. F. Dangel (pdf, arXiv, code, video, poster, slides) |
| 2024 (ICML workshop) | Lowering PyTorch's Memory Consumption for Selective Differentiation. S. Bhatia, F. Dangel (pdf, arXiv, code, poster, bug report) |
| 2024 (ICML) | Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning. M. Elsayed, H. Farrahi, F. Dangel, R. Mahmood (pdf, arXiv, code) |
| 2024 (ICML) | Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective. W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. Turner, A. Makhzani (pdf, arXiv, poster, code) |
| 2024 (ICML) | Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC. W. Lin*, F. Dangel*, R. Eschenhagen, K. Neklyudov, A. Kristiadi, R. Turner, A. Makhzani (pdf, arXiv, code, poster) |
| 2023 (pre-print) | On the Disconnect Between Theory and Practice of Overparametrized Neural Networks. J. Wenger, F. Dangel, A. Kristiadi (pdf, arXiv) |
| 2023 (NeurIPS) | The Geometry of Neural Nets' Parameter Spaces Under Reparametrization. A. Kristiadi, F. Dangel, P. Hennig (pdf, arXiv) |
| 2023 (PhD thesis) | Backpropagation Beyond the Gradient. F. Dangel (pdf, source, template) |
| 2022 (TMLR) | ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure. F. Dangel*, L. Tatzel*, P. Hennig (pdf, journal, arXiv, code, www) |
| 2021 (NeurIPS) | Cockpit: A Practical Debugging Tool for Training Deep Neural Networks. F. Schneider*, F. Dangel*, P. Hennig (pdf, conference, arXiv, code, www, video) |
| 2020 (ICLR spotlight) | BackPACK: Packing more into backprop. F. Dangel*, F. Kunstner*, P. Hennig (pdf, conference, arXiv, code, www, video) |
| 2020 (AISTATS) | Modular Block-diagonal Curvature Approximations for Feedforward Architectures. F. Dangel, S. Harmeling, P. Hennig (pdf, conference, arXiv, code, video) |
| 2018 (Phys. Rev. A) | Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model. F. Dangel*, M. Wagner*, H. Cartarius, J. Main, G. Wunner (pdf, journal, arXiv) |
| 2018 (Acta Polytechnica) | Numerical calculation of the complex Berry phase in non-Hermitian systems. M. Wagner, F. Dangel, H. Cartarius, J. Main, G. Wunner (pdf, journal, arXiv) |
| 2017 (Master thesis) | Bosonic many-body systems with topologically nontrivial phases subject to gain and loss. F. Dangel (pdf) |
| 2015 (Bachelor thesis) | Mikroskopische Beschreibung eines Einkoppelprozesses für PT-symmetrische Bose-Einstein-Kondensate. F. Dangel (pdf, German only) |
| 2025 | Poster presentation, OPT25 workshop at NeurIPS. "Understanding and Improving Shampoo via Kullback–Leibler Minimization" |
| 2024 | Poster presentation, WANT workshop at ICML. "Lowering PyTorch's Memory Consumption for Selective Differentiation" |
| 2023 | Poster presentation, OPT23 workshop at NeurIPS. "Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets" (poster) |
| 2022 | Poster presentation, ELLIS Doctoral Symposium (EDS) 2022, Alicante (poster) |
| 2022 | Poster presentation, ELLIS Theory Workshop 2022, Arenzano |
| 2022 | Session chair, Franco-German Research and Innovation Network on AI |
| 2021 | Co-organizer, ELLIS Doctoral Symposium (EDS) 2021, Tübingen |
| 2015 | Participant, "Ferienakademie 2015" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; gave a talk on Lattice Boltzmann methods |
| 2014 | Participant, "Ferienakademie 2014" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; gave a talk on NMR & MRI |
| 2018–2022 | Teaching assistant (course design, supervision, and evaluation of 7 courses in total), software development practicals, University of Tübingen. Details: three PhD students supervise ~15 students who develop a machine learning prediction system for the German soccer league over the course of one term (example). The workload per student is ~180 hours, with a strong focus on teaching good software development practices. |
| Since 2025 | Research project co-supervisor: Nikita Dhawan (PhD student, University of Toronto), with Roger Grosse. Function space distance approximations for transformers |
| Since 2025 | Research project co-supervisor: Runshi Yang (undergraduate student, University of Toronto), with Wu Lin. Kronecker-factored curvature approximations for influence functions |
| Since 2024 | Research project co-supervisor: Disen Liao (PhD student, University of Waterloo), with Yaoliang Yu. Bi-level optimization problems for AI safety (AISTATS 2026 paper) |
| Since 2023 | Research project co-supervisor: Marvin F. da Silva (PhD student, Dalhousie University), with Sageev Oore. Symmetry-agnostic algorithms for deep learning (ICML 2025 spotlight) |
| 2024–2025 | Research project co-supervisor: Andrés Guzmán-Cordero (Master's student, University of Amsterdam; subsequently PhD student, Université de Montréal and Mila), with Marius Zeinhofer and Gil Goldshlager. Accelerating second-order optimizers for physics-informed neural networks (NeurIPS 2025 paper) |
| 2024–2025 | Thesis project co-supervisor: Yu Xin Li (undergraduate student, University of Toronto), with Derek Tam and Colin Raffel. Gradient accumulators as Fisher proxies (ICML 2025 spotlight) |
| 2024–2025 | Master's thesis co-supervisor: Weronika Ormaniec (Master's student, ETH Zürich; subsequently PhD student, ETH Zürich), with Sidak Pal Singh. Theoretical investigation of the Hessian and loss landscape of transformers (ICLR 2025 spotlight + Silver Medal of ETH Zürich for an outstanding thesis) |
| 2023–2024 | Research project supervisor: Samarth Bhatia (undergraduate student, IIT Delhi). Randomized and selective autodiff (ICML 2024 workshop paper) |
| 2022 | Summer internship supervisor: Elisabeth Knigge (high school student). Visualization methods to make deep learning optimization more accessible to non-experts |
| 2021–2022 | Research assistant supervisor: Jessica Bader (Master's student, University of Tübingen; subsequently PhD student, Helmholtz Munich/Technical University of Munich). Extending BackPACK with Kronecker-factorized curvature for Bayesian deep learning, including an interface for negative log-likelihood losses to support KFAC and enable Laplace approximations via laplace-torch |
| 2021 | Master's thesis supervisor: Tim Schäfer (Master's student, University of Tübingen; subsequently PhD student, University of Tübingen). Extending BackPACK to ResNets and recurrent architectures |
| 2020–2021 | Research assistant supervisor: Shrisha Bharadwaj (Master's student, University of Tübingen; subsequently PhD student, Max Planck Institute for Intelligent Systems). Improving BackPACK's code quality (tests, docstrings) and extending convolution support from 2D to 1D and 3D |
| 2019–2020 | Research project supervisor: Paul Fischer (Master's student, University of Tübingen; subsequently PhD student, University of Lucerne). Hessian backpropagation for batch normalization |
| 2019 | Bachelor's thesis supervisor: Christian Meier (University of Tübingen). Activity prediction in smart home environments via Markov models |
| 2026 | Area chair: ICML |
| 2025 | Area chair: ICML. Reviewer: NeurIPS (complimentary registration award as 'top reviewer'), ICML CODEML workshop |
| 2024 | Reviewer: ICML, ICLR, AISTATS, NeurIPS (complimentary registration award as 'top reviewer'), NeurIPS OPT workshop |
| 2023 | Reviewer: NeurIPS, Vector Scholarship in Artificial Intelligence |
| 2022 | Reviewer: ICML, NeurIPS HITY workshop |
| 2021 | Reviewer: ICML, NeurIPS, JMLR |
| 2020 | Reviewer: ICML, NeurIPS |
| 2026 | Mentor: Jethro Odeyemi (PhD student, University of Saskatchewan), Indigenous and Black Engineering and Technology (IBET) PhD Project |
| 2025 | Mentor: Darren Dahunsi (PhD student, University of Alberta), Indigenous and Black Engineering and Technology (IBET) PhD Project |
| 2024 | Mentor, Indigenous and Black Engineering and Technology (IBET) PhD Project |
| 2024 | Mentor, NeurIPS Black in AI workshop |
| 2023 | Representative for the Vector Institute and mentor, NeurIPS Black in AI workshop and Q&A session |