\(\def\mymacro{{\mathbf{\alpha,\beta,\gamma}}}\)
\(\def\va{{\mathbf{a}}}\)
\(\def\vb{{\mathbf{b}}}\)
\(\def\vc{{\mathbf{c}}}\)
\(\def\vd{{\mathbf{d}}}\)
\(\def\ve{{\mathbf{e}}}\)
\(\def\vf{{\mathbf{f}}}\)
\(\def\vg{{\mathbf{g}}}\)
\(\def\vh{{\mathbf{h}}}\)
\(\def\vi{{\mathbf{i}}}\)
\(\def\vj{{\mathbf{j}}}\)
\(\def\vk{{\mathbf{k}}}\)
\(\def\vl{{\mathbf{l}}}\)
\(\def\vm{{\mathbf{m}}}\)
\(\def\vn{{\mathbf{n}}}\)
\(\def\vo{{\mathbf{o}}}\)
\(\def\vp{{\mathbf{p}}}\)
\(\def\vq{{\mathbf{q}}}\)
\(\def\vr{{\mathbf{r}}}\)
\(\def\vs{{\mathbf{s}}}\)
\(\def\vt{{\mathbf{t}}}\)
\(\def\vu{{\mathbf{u}}}\)
\(\def\vv{{\mathbf{v}}}\)
\(\def\vw{{\mathbf{w}}}\)
\(\def\vx{{\mathbf{x}}}\)
\(\def\vy{{\mathbf{y}}}\)
\(\def\vz{{\mathbf{z}}}\)
\(\def\vmu{{\mathbf{\mu}}}\)
\(\def\vsigma{{\mathbf{\sigma}}}\)
\(\def\vtheta{{\mathbf{\theta}}}\)
\(\def\vzero{{\mathbf{0}}}\)
\(\def\vone{{\mathbf{1}}}\)
\(\def\vell{{\mathbf{\ell}}}\)
\(\def\mA{{\mathbf{A}}}\)
\(\def\mB{{\mathbf{B}}}\)
\(\def\mC{{\mathbf{C}}}\)
\(\def\mD{{\mathbf{D}}}\)
\(\def\mE{{\mathbf{E}}}\)
\(\def\mF{{\mathbf{F}}}\)
\(\def\mG{{\mathbf{G}}}\)
\(\def\mH{{\mathbf{H}}}\)
\(\def\mI{{\mathbf{I}}}\)
\(\def\mJ{{\mathbf{J}}}\)
\(\def\mK{{\mathbf{K}}}\)
\(\def\mL{{\mathbf{L}}}\)
\(\def\mM{{\mathbf{M}}}\)
\(\def\mN{{\mathbf{N}}}\)
\(\def\mO{{\mathbf{O}}}\)
\(\def\mP{{\mathbf{P}}}\)
\(\def\mQ{{\mathbf{Q}}}\)
\(\def\mR{{\mathbf{R}}}\)
\(\def\mS{{\mathbf{S}}}\)
\(\def\mT{{\mathbf{T}}}\)
\(\def\mU{{\mathbf{U}}}\)
\(\def\mV{{\mathbf{V}}}\)
\(\def\mW{{\mathbf{W}}}\)
\(\def\mX{{\mathbf{X}}}\)
\(\def\mY{{\mathbf{Y}}}\)
\(\def\mZ{{\mathbf{Z}}}\)
\(\def\mStilde{\mathbf{\tilde{\mS}}}\)
\(\def\mGtilde{\mathbf{\tilde{\mG}}}\)
\(\def\mGoverline{{\mathbf{\overline{G}}}}\)
\(\def\mBeta{{\mathbf{\beta}}}\)
\(\def\mPhi{{\mathbf{\Phi}}}\)
\(\def\mLambda{{\mathbf{\Lambda}}}\)
\(\def\mSigma{{\mathbf{\Sigma}}}\)
\(\def\tA{{\mathbf{\mathsf{A}}}}\)
\(\def\tB{{\mathbf{\mathsf{B}}}}\)
\(\def\tC{{\mathbf{\mathsf{C}}}}\)
\(\def\tD{{\mathbf{\mathsf{D}}}}\)
\(\def\tE{{\mathbf{\mathsf{E}}}}\)
\(\def\tF{{\mathbf{\mathsf{F}}}}\)
\(\def\tG{{\mathbf{\mathsf{G}}}}\)
\(\def\tH{{\mathbf{\mathsf{H}}}}\)
\(\def\tI{{\mathbf{\mathsf{I}}}}\)
\(\def\tJ{{\mathbf{\mathsf{J}}}}\)
\(\def\tK{{\mathbf{\mathsf{K}}}}\)
\(\def\tL{{\mathbf{\mathsf{L}}}}\)
\(\def\tM{{\mathbf{\mathsf{M}}}}\)
\(\def\tN{{\mathbf{\mathsf{N}}}}\)
\(\def\tO{{\mathbf{\mathsf{O}}}}\)
\(\def\tP{{\mathbf{\mathsf{P}}}}\)
\(\def\tQ{{\mathbf{\mathsf{Q}}}}\)
\(\def\tR{{\mathbf{\mathsf{R}}}}\)
\(\def\tS{{\mathbf{\mathsf{S}}}}\)
\(\def\tT{{\mathbf{\mathsf{T}}}}\)
\(\def\tU{{\mathbf{\mathsf{U}}}}\)
\(\def\tV{{\mathbf{\mathsf{V}}}}\)
\(\def\tW{{\mathbf{\mathsf{W}}}}\)
\(\def\tX{{\mathbf{\mathsf{X}}}}\)
\(\def\tY{{\mathbf{\mathsf{Y}}}}\)
\(\def\tZ{{\mathbf{\mathsf{Z}}}}\)
\(\def\gA{{\mathcal{A}}}\)
\(\def\gB{{\mathcal{B}}}\)
\(\def\gC{{\mathcal{C}}}\)
\(\def\gD{{\mathcal{D}}}\)
\(\def\gE{{\mathcal{E}}}\)
\(\def\gF{{\mathcal{F}}}\)
\(\def\gG{{\mathcal{G}}}\)
\(\def\gH{{\mathcal{H}}}\)
\(\def\gI{{\mathcal{I}}}\)
\(\def\gJ{{\mathcal{J}}}\)
\(\def\gK{{\mathcal{K}}}\)
\(\def\gL{{\mathcal{L}}}\)
\(\def\gM{{\mathcal{M}}}\)
\(\def\gN{{\mathcal{N}}}\)
\(\def\gO{{\mathcal{O}}}\)
\(\def\gP{{\mathcal{P}}}\)
\(\def\gQ{{\mathcal{Q}}}\)
\(\def\gR{{\mathcal{R}}}\)
\(\def\gS{{\mathcal{S}}}\)
\(\def\gT{{\mathcal{T}}}\)
\(\def\gU{{\mathcal{U}}}\)
\(\def\gV{{\mathcal{V}}}\)
\(\def\gW{{\mathcal{W}}}\)
\(\def\gX{{\mathcal{X}}}\)
\(\def\gY{{\mathcal{Y}}}\)
\(\def\gZ{{\mathcal{Z}}}\)
\(\def\sA{{\mathbb{A}}}\)
\(\def\sB{{\mathbb{B}}}\)
\(\def\sC{{\mathbb{C}}}\)
\(\def\sD{{\mathbb{D}}}\)
\(\def\sF{{\mathbb{F}}}\)
\(\def\sG{{\mathbb{G}}}\)
\(\def\sH{{\mathbb{H}}}\)
\(\def\sI{{\mathbb{I}}}\)
\(\def\sJ{{\mathbb{J}}}\)
\(\def\sK{{\mathbb{K}}}\)
\(\def\sL{{\mathbb{L}}}\)
\(\def\sM{{\mathbb{M}}}\)
\(\def\sN{{\mathbb{N}}}\)
\(\def\sO{{\mathbb{O}}}\)
\(\def\sP{{\mathbb{P}}}\)
\(\def\sQ{{\mathbb{Q}}}\)
\(\def\sR{{\mathbb{R}}}\)
\(\def\sS{{\mathbb{S}}}\)
\(\def\sT{{\mathbb{T}}}\)
\(\def\sU{{\mathbb{U}}}\)
\(\def\sV{{\mathbb{V}}}\)
\(\def\sW{{\mathbb{W}}}\)
\(\def\sX{{\mathbb{X}}}\)
\(\def\sY{{\mathbb{Y}}}\)
\(\def\sZ{{\mathbb{Z}}}\)
\(\def\E{{\mathbb{E}}}\)
\(\def\jac{{\mathbf{\mathrm{J}}}}\)
\(\def\argmax{{\mathop{\mathrm{arg}\,\mathrm{max}}}}\)
\(\def\argmin{{\mathop{\mathrm{arg}\,\mathrm{min}}}}\)
\(\def\Tr{{\mathop{\mathrm{Tr}}}}\)
\(\def\diag{{\mathop{\mathrm{diag}}}}\)
\(\def\vec{{\mathop{\mathrm{vec}}}}\)
\(\def\Kern{{\mathop{\mathrm{Kern}}}}\)
\(\def\llbracket{⟦}\)
\(\def\rrbracket{⟧}\)

Curriculum vitæ: Felix Dangel


(Download as pdf)

I am a Postdoc at the Vector Institute in Toronto. My goal is to build practical second-order optimizers that accelerate the training of neural nets for language modelling and scientific ML, hand in hand with advancing automatic differentiation techniques and integrating them into next-generation ML libraries. More broadly, these techniques transfer to many other tasks built on second-order Taylor expansions (model merging and compression, uncertainty quantification, training data attribution, bi-level optimization, …). During my PhD, I extended gradient backpropagation to efficiently extract higher-order information about the loss landscape of neural nets and used it to improve their training. Before that, I studied physics at the University of Stuttgart, with a main interest in simulating quantum many-body systems with tensor networks.

Education & Positions

2025–now   Postdoctoral researcher, Vector Institute, Toronto
           With: Prof. Dr. Roger Grosse

2023–2025  Postdoctoral researcher, Vector Institute, Toronto
           Research statement: Deep Learning Needs More Than Just the Gradient
           With: Prof. Dr. Yaoliang Yu

2018–2023  PhD in Computer Science, Max Planck Institute for Intelligent Systems & University of Tübingen
           Thesis: Backpropagation beyond the Gradient
           Advisor: Prof. Dr. Philipp Hennig

2017–2018  Researcher, University of Stuttgart
           Paper: Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
           Host: Institute for Theoretical Physics 1

2015–2017  MSc in Physics, University of Stuttgart
           Thesis: Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
           Advisor: PD Dr. Holger Cartarius

2012–2015  BSc in Physics, University of Stuttgart
           Thesis: Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates
           Advisor: Prof. Dr. Günter Wunner

Publications

  • [pre-print 2025] Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization
    A. Guzmán-Cordero, F. Dangel, G. Goldshlager, M. Zeinhofer (arXiv)
  • [pre-print 2025] Collapsing Taylor Mode Automatic Differentiation
    F. Dangel*, T. Siebert*, M. Zeinhofer, A. Walther (arXiv)
  • [pre-print 2025] Position: Curvature Matrices Should Be Democratized via Linear Operators
    F. Dangel*, R. Eschenhagen*, W. Ormaniec, A. Fernandez, L. Tatzel, A. Kristiadi (arXiv | code)
  • [pre-print 2025] Spectral-factorized Positive-definite Curvature Learning for NN Training
    W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. E. Turner, R. B. Grosse (arXiv)
  • [ICML 2025 spotlight] Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
    Y. X. Li, F. Dangel, D. Tam, C. Raffel (pdf)
  • [ICML 2025 spotlight] Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
    M. F. da Silva, F. Dangel, S. Oore (pdf | arXiv)
  • [ICLR 2025 spotlight] What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
    W. Ormaniec, F. Dangel, S. P. Singh (pdf | arXiv | video)
  • [NeurIPS 2024] Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
    F. Dangel*, J. Mueller*, M. Zeinhofer* (pdf | arXiv | video | poster | code | slides)
  • [NeurIPS 2024] Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods
    F. Dangel (pdf | arXiv | code | video | poster | slides)
  • [ICML 2024 workshop] Lowering PyTorch's Memory Consumption for Selective Differentiation
    S. Bhatia, F. Dangel (pdf | arXiv | code | poster | bug report)
  • [ICML 2024] Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
    M. Elsayed, H. Farrahi, F. Dangel, R. Mahmood (pdf | arXiv | code)
  • [ICML 2024] Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
    W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. Turner, A. Makhzani (pdf | arXiv | poster | code)
  • [ICML 2024] Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
    W. Lin*, F. Dangel*, R. Eschenhagen, K. Neklyudov, A. Kristiadi, R. Turner, A. Makhzani (pdf | arXiv | code | poster)
  • [pre-print 2023] On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
    J. Wenger, F. Dangel, A. Kristiadi (pdf | arXiv)
  • [NeurIPS 2023] The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
    A. Kristiadi, F. Dangel, P. Hennig (pdf | arXiv)
  • [PhD thesis 2023] Backpropagation Beyond the Gradient
    F. Dangel (pdf | source | template)
  • [TMLR 2022] ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
    F. Dangel*, L. Tatzel*, P. Hennig (pdf | journal | arXiv | code | www)
  • [NeurIPS 2021] Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
    F. Schneider*, F. Dangel*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [ICLR 2020 spotlight] BackPACK: Packing more into backprop
    F. Dangel*, F. Kunstner*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [AISTATS 2020] Modular Block-diagonal Curvature Approximations for Feedforward Architectures
    F. Dangel, S. Harmeling, P. Hennig (pdf | conference | arXiv | code | video)
  • [Phys. Rev. A 2018] Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
    F. Dangel*, M. Wagner*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Acta Polytechnica 2018] Numerical calculation of the complex Berry phase in non-Hermitian systems
    M. Wagner*, F. Dangel*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Master thesis 2017] Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
    F. Dangel (pdf)
  • [Bachelor thesis 2015] Microscopic description of a coupling process for PT-symmetric Bose-Einstein condensates
    F. Dangel (pdf, in German)

Talks & Workshops

  • [2025] I gave an invited talk on accelerating Taylor-mode automatic differentiation at the workshop "Overparametrization, Regularization, Identifiability and Uncertainty in Machine Learning", hosted at the Mathematisches Forschungsinstitut Oberwolfach (slides)
  • [2024] I presented my NeurIPS paper "Convolutions and More as Einsum" at the Vector Research Day in November (slides)
  • [2024] I gave a short presentation to Geoffrey Hinton on my joint work with Marvin F. da Silva and Sageev Oore for a Swedish TV event in October (footage: 1, 2)
  • [2024] Invited talk at Cerebras Systems seminar (June 2024) and Graham Taylor's group meeting (July 2024) on "Convolutions Through The Lens of Tensor Networks" (slides)
  • [2024] I gave a tutorial on "Large-scale Linear Algebra with Curvature Matrices" in Colin Raffel's group meeting in April (notes)
  • [NeurIPS 2023] Workshop poster presentation at NeurIPS OPT23 on "Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets" (poster)
  • [2023] Invited talk at Perimeter Institute Machine Learning Initiative seminar (December 2023) titled "Deep Learning Convolutions Through the Lens of Tensor Networks" (recording, slides)
  • [2022] Poster presentation at the ELLIS Doctoral Symposium (EDS) 2022 in Alicante (poster)
  • [2022] Invited talk at the ELISE Theory Workshop on ML Fundamentals 2022 at EURECOM in Sophia Antipolis
  • [2022] Poster presentation at the ELLIS Theory Workshop 2022 in Arenzano
  • [2022] Session chair at the Franco-German Research and Innovation Network on AI, June 2022
  • [2021] Co-organization of the ELLIS Doctoral Symposium (EDS) 2021 in Tübingen, held in 2022 in Alicante
  • [2019] Invited DL overview talk, seminar for Integrated Engineering students, DHBW CAS in Heilbronn
  • [2017] Talk at the 2017 DPG Spring Meeting of the Atomic Physics and Quantum Optics section in Mainz
  • [2015] Participation in the "Ferienakademie 2015" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; talk about Lattice Boltzmann methods
  • [2014] Participation in the "Ferienakademie 2014" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; talk about NMR & MRI

Teaching, Mentoring, Reviewing & Community Service

Created: 2025-06-04 Wed 16:55
