Curriculum vitæ: Felix Dangel
Felix Dangel is a postdoctoral researcher at the Vector Institute in Toronto. His goal is to build practical second-order optimizers that speed up the training of deep neural networks, and, more broadly, to develop practical algorithms that use higher-order information for deep learning tasks built on second-order Taylor expansions (model merging and compression, uncertainty quantification, training data attribution, bi-level optimization, …). During his PhD with Philipp Hennig at the University of Tübingen and the Max Planck Institute for Intelligent Systems, he extended gradient backpropagation to efficiently extract higher-order information about the loss landscape of neural networks and used it to improve their training. Before that, he studied physics at the University of Stuttgart, with a main interest in simulating quantum many-body systems with tensor networks. He is passionate about
- developing automatic differentiation tricks for efficiently extracting richer deep learning quantities, such as second-order and per-sample information, and integrating this functionality into machine learning libraries,
- using these quantities to build better algorithms, in particular (but not only) for optimization, and
- releasing code that empowers the community (see for example cockpit, backpack, vivit, curvlinops, singd, einconv, sirfshampoo, unfoldNd; a short usage sketch follows below).
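To give a flavor of this functionality, here is a minimal sketch (assuming recent releases of torch and backpack-for-pytorch; the exact API may differ between versions) that extracts per-sample gradients with BackPACK, one of the quantities standard backpropagation does not expose:

```python
# Minimal sketch (assumed recent torch and backpack-for-pytorch versions):
# extract per-sample gradients in one backward pass with BackPACK.
import torch
from backpack import backpack, extend
from backpack.extensions import BatchGrad

X, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
model = extend(torch.nn.Linear(10, 2))            # make the model BackPACK-aware
loss_func = extend(torch.nn.CrossEntropyLoss())   # same for the loss function

loss = loss_func(model(X), y)
with backpack(BatchGrad()):                       # request per-sample gradients
    loss.backward()

for param in model.parameters():
    print(param.grad_batch.shape)                 # (batch_size, *param.shape)
```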
Education
now | Postdoctoral researcher, Vector Institute, Toronto
-- | Research statement: Deep Learning Needs More Than Just the Gradient
2023 | With: Prof. Dr. Yaoliang Yu
2023 | PhD in Computer Science, Max Planck Institute for Intelligent Systems & University of Tübingen |
-- | Thesis: Backpropagation beyond the Gradient |
2018 | Advisor: Prof. Dr. Philipp Hennig
2018 | Researcher, University of Stuttgart |
-- | Paper: Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model |
2017 | Host: Institute for Theoretical Physics 1 |
2017 | MSc in Physics, University of Stuttgart |
-- | Thesis: Bosonic many-body systems with topologically nontrivial phases subject to gain and loss |
2015 | Advisor: PD Dr. Holger Cartarius
2015 | BSc in Physics, University of Stuttgart
-- | Thesis: Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates
2012 | Advisor: Prof. Dr. Günter Wunner
Publications
- [pre-print 2024] What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
  W. Ormaniec, F. Dangel, S. P. Singh
- [pre-print 2024] Fast Fractional Natural Gradient Descent using Learnable Spectral Factorizations
  W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. E. Turner, R. B. Grosse
- [pre-print 2024] Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
  M. F. da Silva, F. Dangel, S. Oore
- [NeurIPS 2024] Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
  F. Dangel*, J. Mueller*, M. Zeinhofer* (pdf | arXiv | video)
- [NeurIPS 2024] Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods
  F. Dangel (pdf | arXiv | code | video)
- [ICML 2024 workshop] Lowering PyTorch's Memory Consumption for Selective Differentiation
  S. Bhatia, F. Dangel (pdf | arXiv | code | poster | bug report)
- [ICML 2024] Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
  M. Elsayed, H. Farrahi, F. Dangel, R. Mahmood (pdf | arXiv | code)
- [ICML 2024] Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
  W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. Turner, A. Makhzani (pdf | arXiv | poster | code)
- [ICML 2024] Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
  W. Lin*, F. Dangel*, R. Eschenhagen, K. Neklyudov, A. Kristiadi, R. Turner, A. Makhzani (pdf | arXiv | code | poster)
- [pre-print 2023] On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
  J. Wenger, F. Dangel, A. Kristiadi (pdf | arXiv)
- [NeurIPS 2023] The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
  A. Kristiadi, F. Dangel, P. Hennig (pdf | arXiv)
- [PhD thesis 2023] Backpropagation Beyond the Gradient
  F. Dangel (pdf | source | template)
- [TMLR 2022] ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
  F. Dangel*, L. Tatzel*, P. Hennig (pdf | journal | arXiv | code | www)
- [NeurIPS 2021] Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
  F. Schneider*, F. Dangel*, P. Hennig (pdf | conference | arXiv | code | www | video)
- [ICLR 2020 spotlight] BackPACK: Packing more into backprop
  F. Dangel*, F. Kunstner*, P. Hennig (pdf | conference | arXiv | code | www | video)
- [AISTATS 2020] Modular Block-diagonal Curvature Approximations for Feedforward Architectures
  F. Dangel, S. Harmeling, P. Hennig (pdf | conference | arXiv | code | video)
- [Phys. Rev. A 2018] Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
  F. Dangel*, M. Wagner*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
- [Acta Polytechnica 2018] Numerical calculation of the complex Berry phase in non-Hermitian systems
  M. Wagner*, F. Dangel*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
- [Master thesis 2017] Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
  F. Dangel (pdf)
- [Bachelor thesis 2015] Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates (Mikroskopische Beschreibung eines Einkoppelprozesses für PT-symmetrische Bose-Einstein-Kondensate)
  F. Dangel (pdf, German only)
Talks & Workshops
- [2024] Invited talks at the Cerebras Systems seminar (June 2024) and Graham Taylor's group meeting (July 2024) on "Convolutions Through The Lens of Tensor Networks" (slides)
- [NeurIPS 2023] Represented the Vector Institute as a mentor at the Black in AI workshop's Q&A session
- [NeurIPS 2023] Workshop poster presentation at NeurIPS OPT23 on "Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets" (poster)
- [2023] Invited talk at Perimeter Institute Machine Learning Initiative seminar (December 2023) titled "Deep Learning Convolutions Through the Lens of Tensor Networks" (recording, slides)
- [2022] Poster presentation at the ELLIS Doctoral Symposium (EDS) 2022 in Alicante (poster)
- [2022] Invited talk at the ELISE Theory Workshop on ML Fundamentals 2022 at EURECOM in Sophia Antipolis
- [2022] Poster presentation at the ELLIS Theory Workshop 2022 in Arenzano
- [2022] Session chair at the Franco-German Research and Innovation Network on AI, June 2022
- [2021] Co-organization of the ELLIS Doctoral Symposium (EDS), originally planned for 2021 in Tübingen and eventually held in 2022 in Alicante
- [2019] Invited DL overview talk, seminar for Integrated Engineering students, DHBW CAS in Heilbronn
- [2017] Talk at the 2017 DPG Spring Meeting of the Atomic Physics and Quantum Optics section in Mainz
- [2015] Participation in the "Ferienakademie 2015" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; gave a talk on Lattice Boltzmann methods
- [2014] Participation in the "Ferienakademie 2014" summer school in Sarntal (Northern Italy), organized by TU Munich, the University of Stuttgart, and FAU Erlangen; gave a talk on NMR & MRI
Teaching, Reviewing & Community Service
- [2018-2022] Felix taught seven iterations of software development practicals. In these courses, three PhD students supervise ~15 students who develop a machine learning prediction system for the German soccer league over the course of one term (example). The workload per student is ~180 hours, with a strong focus on teaching good software development practices.
- Felix has worked with various students on different projects:
- [2024] Weronika Ormaniec (Master's student, co-supervised with Sidak Pal Singh) worked on gaining insights into the loss landscape of transformers by theoretically investigating their Hessian.
- [2023-2024] Samarth Bhatia (Master's student) worked on randomized automatic differentiation for convolutions and wrote an ICML workshop paper identifying a sub-optimality in PyTorch's automatic differentiation.
- [2022] Elisabeth Knigge (high school student, summer internship) worked on making deep learning optimization methods more approachable to non-experts through visualization.
- [2021-2022] Jessica Bader (research assistant) worked on broadening BackPACK's support for Kronecker-factorized curvature for Bayesian deep learning. She wrote the interface for negative log-likelihood losses to support KFAC, enabling applications of the Laplace approximation via the laplace-torch library.
- [2021] Tim Schäfer (Master thesis) added support for ResNets and recurrent architectures to BackPACK. The underlying converter that makes these architectures compatible can be enabled through an optional argument when extending the model.
- [2020-2021] Shrisha Bharadwaj (research assistant) improved BackPACK's code quality through additional tests and docstrings, and extended its support for two-dimensional convolutions to 1d and 3d.
- [2019-2020] Paul Fischer (research project) implemented and analyzed Hessian backpropagation for batch normalization, exploiting structural knowledge to speed up its Hessian-vector product, which can be slow (page 7).
- [2019] Christian Meier (Bachelor thesis): Activity prediction in smart home environments via Markov models.
- He has reviewed for top-tier machine learning conferences and journals:
- Advances in Neural Information Processing Systems (NeurIPS) (2020, 2021, 2022 (HITY workshop), 2023, 2024, 2024 (OPT workshop))
- International Conference on Machine Learning (ICML) (2020, 2021, 2022, 2024)
- International Conference on Learning Representations (ICLR) (2024)
- Journal of Machine Learning Research (JMLR) (2021)
- International Conference on Artificial Intelligence and Statistics (AISTATS) (2024)
- Served as reviewer for the Vector Scholarship in Artificial Intelligence 2023-2024