Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers

BlackboxNLP 2024
Amit Ben-Artzy, Roy Schwartz