Curriculum vitæ: Felix Dangel


(Download as PDF)

Bio: I am an incoming assistant professor at Concordia University in the Department of Computer Science and Software Engineering, and at Mila. My research advances ML algorithms and software by considering quantities beyond the gradient, such as curvature information. I am interested in neural network training algorithms (Shampoo, SOAP, K-FAC, …), automatic differentiation (Taylor mode, PINNs, …), information geometry (Fisher information, …), as well as curvature approximations and their applications outside optimization (model merging, training data attribution, unlearning, bi-level problems for safety, …). Previously, I completed a postdoc at the Vector Institute in Toronto, obtained my PhD in Computer Science from the University of Tübingen, and earned Master's and Bachelor's degrees in Physics from the University of Stuttgart.

Education

2026 – now   Assistant Professor, Concordia University, Department of Computer Science & Software Engineering
             Associate member at Mila – Quebec Artificial Intelligence Institute, Montreal
             Research topics: Automatic Differentiation and Learning Algorithms Beyond the Gradient

2023 – 2026  Postdoctoral researcher, Vector Institute, Toronto
             Research statement: Deep Learning Needs More Than Just the Gradient
             With: Yaoliang Yu, Roger Grosse

2018 – 2023  PhD in Computer Science, Max Planck Institute for Intelligent Systems & University of Tübingen
             Thesis: Backpropagation Beyond the Gradient
             Advisor: Philipp Hennig

2017 – 2018  Researcher, University of Stuttgart
             Paper: Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
             Host: Institute for Theoretical Physics 1

2015 – 2017  MSc in Physics, University of Stuttgart
             Thesis: Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
             Advisor: Holger Cartarius

2012 – 2015  BSc in Physics, University of Stuttgart
             Thesis: Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates
             Advisor: Günter Wunner

Publications

  • [pre-print 2025] Sketching Low-Rank Plus Diagonal Matrices
    A. Fernandez, F. Dangel, P. Hennig, F. Schneider (arXiv)
  • [pre-print 2025] Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization
    W. Lin, S. C. Lowe, F. Dangel, R. Eschenhagen, Z. Xu, R. B. Grosse (arXiv)
  • [pre-print 2025] Kronecker-factored Approximate Curvature (KFAC) From Scratch
    F. Dangel*, T. Weber*, B. Mucsányi*, R. Eschenhagen (arXiv | code)
  • [NeurIPS 2025] Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization
    A. Guzmán-Cordero, F. Dangel, G. Goldshlager, M. Zeinhofer (arXiv)
  • [NeurIPS 2025] Collapsing Taylor Mode Automatic Differentiation
    F. Dangel*, T. Siebert*, M. Zeinhofer, A. Walther (arXiv | code)
  • [pre-print 2025] Position: Curvature Matrices Should Be Democratized via Linear Operators
    F. Dangel*, R. Eschenhagen*, W. Ormaniec, A. Fernandez, L. Tatzel, A. Kristiadi (arXiv | code)
  • [pre-print 2025] Spectral-factorized Positive-definite Curvature Learning for NN Training
    W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. E. Turner, R. B. Grosse (arXiv)
  • [ICML 2025 spotlight] Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
    Y. X. Li, F. Dangel, D. Tam, C. Raffel (pdf)
  • [ICML 2025 spotlight] Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
    M. F. da Silva, F. Dangel, S. Oore (pdf | arXiv)
  • [ICLR 2025 spotlight] What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
    W. Ormaniec, F. Dangel, S. P. Singh (pdf | arXiv | video)
  • [NeurIPS 2024] Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
    F. Dangel*, J. Mueller*, M. Zeinhofer* (pdf | arXiv | video | poster | code | slides)
  • [NeurIPS 2024] Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods
    F. Dangel (pdf | arXiv | code | video | poster | slides)
  • [ICML 2024 workshop] Lowering PyTorch's Memory Consumption for Selective Differentiation
    S. Bhatia, F. Dangel (pdf | arXiv | code | poster | bug report)
  • [ICML 2024] Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
    M. Elsayed, H. Farrahi, F. Dangel, R. Mahmood (pdf | arXiv | code)
  • [ICML 2024] Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
    W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. Turner, A. Makhzani (pdf | arXiv | poster | code)
  • [ICML 2024] Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
    W. Lin*, F. Dangel*, R. Eschenhagen, K. Neklyudov, A. Kristiadi, R. Turner, A. Makhzani (pdf | arXiv | code | poster)
  • [pre-print 2023] On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
    J. Wenger, F. Dangel, A. Kristiadi (pdf | arXiv)
  • [NeurIPS 2023] The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
    A. Kristiadi, F. Dangel, P. Hennig (pdf | arXiv)
  • [PhD thesis 2023] Backpropagation Beyond the Gradient
    F. Dangel (pdf | source | template)
  • [TMLR 2022] ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
    F. Dangel*, L. Tatzel*, P. Hennig (pdf | journal | arXiv | code | www)
  • [NeurIPS 2021] Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
    F. Schneider*, F. Dangel*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [ICLR 2020 spotlight] BackPACK: Packing more into backprop
    F. Dangel*, F. Kunstner*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [AISTATS 2020] Modular Block-diagonal Curvature Approximations for Feedforward Architectures
    F. Dangel, S. Harmeling, P. Hennig (pdf | conference | arXiv | code | video)
  • [Phys. Rev. A 2018] Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
    F. Dangel*, M. Wagner*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Acta Polytechnica 2018] Numerical calculation of the complex Berry phase in non-Hermitian systems
    M. Wagner*, F. Dangel*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Master thesis 2017] Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
    F. Dangel (pdf)
  • [Bachelor thesis 2015] Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates
    F. Dangel (pdf, German only)

Talks & Workshops

  • [2025] I gave an invited talk about accelerating Taylor mode automatic differentiation at the workshop 'Overparametrization, Regularization, Identifiability and Uncertainty in Machine Learning' hosted at the Mathematisches Forschungsinstitut Oberwolfach (slides)
  • [2024] I presented my NeurIPS paper "Convolutions and More as Einsum" at the Vector Research Day in November (slides)
  • [2024] I prepared a short presentation on my joint work with Marvin F. da Silva and Sageev Oore for Geoffrey Hinton as part of a Swedish TV event in October (footage: 1, 2)
  • [2024] Invited talk at Cerebras Systems seminar (June 2024) and Graham Taylor's group meeting (July 2024) on "Convolutions Through The Lens of Tensor Networks" (slides)
  • [2024] I gave a tutorial on "Large-scale Linear Algebra with Curvature Matrices" in Colin Raffel's group meeting in April (notes)
  • [NeurIPS 2023] Workshop poster presentation at NeurIPS OPT23 on "Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets" (poster)
  • [2023] Invited talk at Perimeter Institute Machine Learning Initiative seminar (December 2023) titled "Deep Learning Convolutions Through the Lens of Tensor Networks" (recording, slides)
  • [2022] Poster presentation at the ELLIS Doctoral Symposium (EDS) 2022 in Alicante (poster)
  • [2022] Invited talk at the ELISE Theory Workshop on ML Fundamentals 2022 at EURECOM in Sophia Antipolis
  • [2022] Poster presentation at the ELLIS Theory Workshop 2022 in Arenzano
  • [2022] Session chair at the Franco-German Research and Innovation Network on AI, June 2022
  • [2021] Co-organization of the ELLIS Doctoral Symposium (EDS) 2021 in Tübingen, held in 2022 in Alicante
  • [2019] Invited DL overview talk, seminar for Integrated Engineering students, DHBW CAS in Heilbronn
  • [2017] Talk at the 2017 DPG Spring Meeting of the Atomic Physics and Quantum Optics section in Mainz
  • [2015] Participation in the "Ferienakademie 2015" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; talk on Lattice Boltzmann methods
  • [2014] Participation in the "Ferienakademie 2014" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; talk on NMR & MRI

Teaching, Mentoring, Reviewing & Community Service
