
Curriculum vitæ: Felix Dangel


(Download as pdf)

Felix Dangel is a postdoctoral researcher at the Vector Institute in Toronto. His goal is to build practical second-order optimizers that speed up the training of deep nets, and more broadly to develop practical algorithms that use higher-order information for deep learning tasks built on second-order Taylor expansions (model merging and compression, uncertainty quantification, training data attribution, bi-level optimization, …). During his PhD with Philipp Hennig at the University of Tübingen and the Max Planck Institute for Intelligent Systems, he extended gradient backpropagation to efficiently extract higher-order information about the loss landscape of neural nets and improve their training. Before that, he studied physics at the University of Stuttgart, with a focus on simulating quantum many-body systems with tensor networks.
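For context, the second-order Taylor expansion that these applications build on approximates the loss \(\mathcal{L}\) around the current parameters \(\theta_0\) through its gradient and Hessian:

\[
\mathcal{L}(\theta) \approx \mathcal{L}(\theta_0) + \nabla \mathcal{L}(\theta_0)^\top (\theta - \theta_0) + \tfrac{1}{2} (\theta - \theta_0)^\top \nabla^2 \mathcal{L}(\theta_0) (\theta - \theta_0).
\]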

Education & Positions

2023–now Postdoctoral researcher, Vector Institute, Toronto
-- Research statement: Deep Learning Needs More Than Just the Gradient
-- With: Prof. Dr. Yaoliang Yu

2018–2023 PhD in Computer Science, Max Planck Institute for Intelligent Systems & University of Tübingen
-- Thesis: Backpropagation Beyond the Gradient
-- Advisor: Prof. Dr. Philipp Hennig

2017–2018 Researcher, University of Stuttgart
-- Paper: Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
-- Host: Institute for Theoretical Physics 1

2015–2017 MSc in Physics, University of Stuttgart
-- Thesis: Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
-- Advisor: PD Dr. Holger Cartarius

2012–2015 BSc in Physics, University of Stuttgart
-- Thesis: Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates
-- Advisor: Prof. Dr. Günter Wunner

Publications

  • [pre-print 2024] Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
    F. Dangel*, J. Mueller*, M. Zeinhofer* (pdf | arXiv | video)
  • [ICML 2024 workshop] Lowering PyTorch's Memory Consumption for Selective Differentiation
    S. Bhatia, F. Dangel (pdf | arXiv | code | poster)
  • [ICML 2024] Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
    M. Elsayed, H. Farrahi, F. Dangel, R. Mahmood (pdf | arXiv | code)
  • [ICML 2024] Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
    W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. Turner, A. Makhzani (pdf | arXiv | poster | code)
  • [ICML 2024] Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
    W. Lin*, F. Dangel*, R. Eschenhagen, K. Neklyudov, A. Kristiadi, R. Turner, A. Makhzani (pdf | arXiv | code | poster)
  • [pre-print 2023] On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
    J. Wenger, F. Dangel, A. Kristiadi (pdf | arXiv)
  • [pre-print 2023] Convolutions Through the Lens of Tensor Networks
    F. Dangel (pdf | arXiv | code | video)
  • [NeurIPS 2023] The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
    A. Kristiadi, F. Dangel, P. Hennig (pdf | arXiv)
  • [PhD thesis 2023] Backpropagation Beyond the Gradient
    F. Dangel (pdf | source | template)
  • [TMLR 2022] ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
    F. Dangel*, L. Tatzel*, P. Hennig (pdf | journal | arXiv | code | www)
  • [NeurIPS 2021] Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
    F. Schneider*, F. Dangel*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [ICLR 2020 spotlight] BackPACK: Packing more into backprop
    F. Dangel*, F. Kunstner*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [AISTATS 2020] Modular Block-diagonal Curvature Approximations for Feedforward Architectures
    F. Dangel, S. Harmeling, P. Hennig (pdf | conference | arXiv | code | video)
  • [Phys. Rev. A 2018] Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
    F. Dangel*, M. Wagner*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Acta Polytechnica 2018] Numerical calculation of the complex Berry phase in non-Hermitian systems
    M. Wagner*, F. Dangel*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Master thesis 2017] Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
    F. Dangel (pdf)
  • [Bachelor thesis 2015] Mikroskopische Beschreibung eines Einkoppelprozesses für PT-symmetrische Bose-Einstein-Kondensate (Microscopic description of a coupling process for PT-symmetric Bose-Einstein condensates)
    F. Dangel (pdf, German only)

Talks & Workshops

Teaching, Reviewing & Community Service

  • [2018-2022] Felix taught seven iterations of software development practicals. In these courses, three PhD students supervise ~15 students whose task is to develop a machine learning prediction system for the German soccer league over the course of one term (example). The overall workload per student is ~180 hours, with a strong focus on teaching good software development practices.
  • Felix has worked with various students on different projects:
    • [2023-2024] Samarth Bhatia (Master's student) worked on randomized automatic differentiation for convolutions and wrote an ICML workshop paper identifying a sub-optimality in PyTorch's automatic differentiation.
    • [2022] Elisabeth Knigge (high school student, summer internship) worked on making deep learning optimization methods more approachable to non-experts through visualization. By combining Tübingen's interesting topography with optimization methods, she created intriguing wall art for the Tübingen AI building.
    • [2021-2022] Jessica Bader (research assistant) worked on broadening BackPACK's support for Kronecker-factored curvature in Bayesian deep learning. She wrote the interface that lets negative log-likelihood losses support KFAC, enabling Laplace approximations via the laplace-torch library (see the sketch after this list).
    • [2021] Tim Schäfer (Master's thesis), now a PhD student with Anna Levina, added support for ResNets and recurrent architectures to BackPACK. The underlying converter that makes these architectures compatible can be enabled through an optional argument when extending the model, as shown in the sketch below.
    • [2020-2021] Shrisha Bharadwaj (research assistant) improved BackPACK's code quality through additional tests and docstrings, and extended its support for two-dimensional convolutions to the 1d and 3d cases.
    • [2019-2020] Paul Fischer (research project), now a PhD student with Christian Baumgartner, implemented and analyzed Hessian backpropagation for batch normalization, exploiting structural knowledge to speed up its Hessian-vector product, which can be slow (page 7).
    • [2019] Christian Meier (Bachelor thesis): Activity prediction in smart home environments via Markov models.
  • He has reviewed for top-tier machine learning conferences and journals.
  • He served as a reviewer for the Vector Scholarship in Artificial Intelligence (2023-2024).
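The BackPACK and laplace-torch workflows mentioned in the items above can be illustrated with a minimal sketch; the model, loss, and data below are toy placeholders for illustration, not code from the projects themselves:

import torch
from torch.utils.data import DataLoader, TensorDataset
from backpack import backpack, extend
from backpack.extensions import KFAC
from laplace import Laplace  # laplace-torch

# Toy classifier and synthetic data (placeholders for illustration).
model = torch.nn.Sequential(torch.nn.Linear(20, 10))
loss_func = torch.nn.CrossEntropyLoss()
X, y = torch.randn(32, 20), torch.randint(0, 10, (32,))

# BackPACK: extend the model and loss, then extract Kronecker-factored
# curvature (KFAC) during a single backward pass. The optional converter
# argument mentioned above makes e.g. ResNets compatible.
model = extend(model, use_converter=True)
loss_func = extend(loss_func)
with backpack(KFAC()):
    loss_func(model(X), y).backward()
kfac_factors = [p.kfac for p in model.parameters()]  # Kronecker factors

# laplace-torch: fit a Kronecker-factored Laplace approximation to the
# weight posterior and query the posterior predictive.
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)
la = Laplace(model, "classification", subset_of_weights="all", hessian_structure="kron")
la.fit(train_loader)
predictive = la(X)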
