Curriculum vitæ: Felix Dangel

(Download as pdf)

Felix Dangel is a postdoctoral researcher at the Vector Institute in Toronto. His goal is to build practical second-order optimizers that speed up the training of deep neural nets and, more broadly, to develop practical algorithms that use higher-order information for deep learning tasks built on second-order Taylor expansions (model merging and compression, uncertainty quantification, training data attribution, bi-level optimization, …). During his PhD with Philipp Hennig at the University of Tübingen and the Max Planck Institute for Intelligent Systems, he extended gradient backpropagation to efficiently extract higher-order information about the loss landscape of neural nets and improve their training. Before that, he studied physics at the University of Stuttgart, with a main interest in simulating quantum many-body systems with tensor networks.

Education

2023–now   Postdoctoral researcher, Vector Institute, Toronto
           Research statement: Deep Learning Needs More Than Just the Gradient
           With: Prof. Dr. Yaoliang Yu

2018–2023  PhD in Computer Science, Max Planck Institute for Intelligent Systems & University of Tübingen
           Thesis: Backpropagation beyond the Gradient
           Advisor: Prof. Dr. Philipp Hennig

2017–2018  Researcher, University of Stuttgart
           Paper: Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
           Host: Institute for Theoretical Physics 1

2015–2017  MSc in Physics, University of Stuttgart
           Thesis: Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
           Advisor: PD Dr. Holger Cartarius

2012–2015  BSc in Physics, University of Stuttgart
           Thesis: Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates
           Advisor: Prof. Dr. Günter Wunner

Publications

  • [pre-print 2024] What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
    W. Ormaniec, F. Dangel, S. P. Singh
  • [pre-print 2024] Fast Fractional Natural Gradient Descent using Learnable Spectral Factorizations
    W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. E. Turner, R. B. Grosse
  • [pre-print 2024] Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
    M. F. da Silva, F. Dangel, S. Oore
  • [NeurIPS 2024] Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
    F. Dangel*, J. Mueller*, M. Zeinhofer* (pdf | arXiv | video)
  • [NeurIPS 2024] Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods
    F. Dangel (pdf | arXiv | code | video)
  • [ICML 2024 workshop] Lowering PyTorch's Memory Consumption for Selective Differentiation
    S. Bhatia, F. Dangel (pdf | arXiv | code | poster | bug report)
  • [ICML 2024] Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
    M. Elsayed, H. Farrahi, F. Dangel, R. Mahmood (pdf | arXiv | code)
  • [ICML 2024] Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
    W. Lin, F. Dangel, R. Eschenhagen, J. Bae, R. Turner, A. Makhzani (pdf | arXiv | poster | code)
  • [ICML 2024] Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
    W. Lin*, F. Dangel*, R. Eschenhagen, K. Neklyudov, A. Kristiadi, R. Turner, A. Makhzani (pdf | arXiv | code | poster)
  • [pre-print 2023] On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
    J. Wenger, F. Dangel, A. Kristiadi (pdf | arXiv)
  • [NeurIPS 2023] The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
    A. Kristiadi, F. Dangel, P. Hennig (pdf | arXiv)
  • [PhD thesis 2023] Backpropagation Beyond the Gradient
    F. Dangel (pdf | source | template)
  • [TMLR 2022] ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
    F. Dangel*, L. Tatzel*, P. Hennig (pdf | journal | arXiv | code | www)
  • [NeurIPS 2021] Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
    F. Schneider*, F. Dangel*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [ICLR 2020 spotlight] BackPACK: Packing more into backprop
    F. Dangel*, F. Kunstner*, P. Hennig (pdf | conference | arXiv | code | www | video)
  • [AISTATS 2020] Modular Block-diagonal Curvature Approximations for Feedforward Architectures
    F. Dangel, S. Harmeling, P. Hennig (pdf | conference | arXiv | code | video)
  • [Phys. Rev. A 2018] Topological invariants in dissipative extensions of the Su-Schrieffer-Heeger model
    F. Dangel*, M. Wagner*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Acta Polytechnica 2018] Numerical calculation of the complex Berry phase in non-Hermitian systems
    M. Wagner*, F. Dangel*, H. Cartarius, J. Main, G. Wunner (pdf | journal | arXiv)
  • [Master thesis 2017] Bosonic many-body systems with topologically nontrivial phases subject to gain and loss
    F. Dangel (pdf)
  • [Bachelor thesis 2015] Mikroskopische Beschreibung eines Einkoppelprozesses für PT-symmetrische Bose-Einstein-Kondensate (Microscopic description of a coupling process for \(\mathcal{PT}\)-symmetric Bose-Einstein condensates)
    F. Dangel (pdf, German only)

Talks & Workshops

Teaching, Reviewing & Community Service

Created: 2024-10-02 Wed 11:43
