(11/08) MEGA: Moving Average Equipped Gated Attention

Hello!

The second seminar in the “Gauss Jr. Colloquium” series has been scheduled.

Dr. Chunting Zhou will give a talk on her latest paper, Mega: Moving Average Equipped Gated Attention. This seminar is a great opportunity to learn about the latest research on sequence models. Details are given below.

Thank you!

Mega: Moving Average Equipped Gated Attention

Date & Time

2022-11-08 09:30 AM - 11:00 AM

Abstract

The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences. In this talk, I will introduce Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with (exponential) moving average to incorporate inductive bias of position-aware local dependencies into the position-agnostic attention mechanism. We further propose a variant of Mega that offers linear time and space complexity yet yields only minimal quality loss, by performing chunk-wise attention on input sequences. Extensive experiments on a wide range of sequence modeling benchmarks, including the Long Range Arena, neural machine translation, autoregressive language modeling, and image and speech classification, show that Mega achieves significant improvements over other sequence models, including variants of Transformers and recent state space models.
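To make the moving-average idea in the abstract concrete, here is a minimal sketch of the classical scalar exponential moving average, y_t = alpha * x_t + (1 - alpha) * y_{t-1}. It is not the Mega implementation: the function name, the zero initial state, and the single fixed `alpha` are illustrative assumptions. It only shows why such an average favors nearby positions, since older inputs are weighted by decaying powers of (1 - alpha).

```python
from typing import List


def exponential_moving_average(xs: List[float], alpha: float) -> List[float]:
    """Classical EMA with a zero initial state (an assumption of this sketch).

    y_t = alpha * x_t + (1 - alpha) * y_{t-1}
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    ys: List[float] = []
    prev = 0.0  # assumed initial state y_0 = 0
    for x in xs:
        # The newest input gets weight alpha; all earlier history is
        # scaled by (1 - alpha), so its influence decays exponentially.
        prev = alpha * x + (1.0 - alpha) * prev
        ys.append(prev)
    return ys


if __name__ == "__main__":
    print(exponential_moving_average([1.0, 1.0, 1.0], alpha=0.5))
```

A larger `alpha` makes the output follow the most recent input more closely, while a smaller `alpha` smooths over a longer history.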

Bio

Chunting Zhou is a Research Scientist at Meta AI. She received her PhD in May 2022 from the Language Technologies Institute at Carnegie Mellon University, where she was advised by Graham Neubig. Her research aims to improve the robustness (with respect to distribution shift) and efficiency (in terms of data and inference) of natural language processing (NLP) systems, and she has published multiple papers at top-tier ML and NLP conferences. During her PhD, Chunting received a CMU Presidential Fellowship in LTI and a D. E. Shaw Zenith Fellowship.

This post is licensed under CC BY 4.0 by the author.
