about

I am a staff research scientist at Google DeepMind based in Mountain View, California. I work on Gemini pretraining. Previously I was a PhD student at the Language Technology Lab, University of Cambridge, supervised by Professor Nigel Collier. I worked on topics such as pretraining for image-to-code capabilities (see MatCha, DePlot, Pix2Struct) and evaluations of multimodal LLMs (see VSR, MaRVL). My co-authored works won best paper awards at EMNLP 2021 (MaRVL) and EACL 2023 (WinoDict).

education

University of Cambridge
MPhil (2020) and PhD student (2020-2023), Computation, Cognition and Language
University of Waterloo
Bachelor of Mathematics (2019), Computer Science

selected publications (see all publications)

The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
Fredrik Carlsson, Fangyu Liu, Daniel_Ward, Murathan Kurfali, Joakim Nivre
ICLR’25 (The 13th International Conference on Learning Representations), April 2025

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini Team, Google
Technical Report’24

Gemini: A Family of Highly Capable Multimodal Models
Gemini Team, Google
Technical Report’23

DePlot: One-shot visual language reasoning by plot-to-table translation
Fangyu Liu*, Julian Martin Eisenschlos*, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun
ACL’23-Findings

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos
ACL’23

Visual Spatial Reasoning
Fangyu Liu, Guy Emerson, Nigel Collier
TACL’23

WinoDict: Probing language models for in-context word acquisition
Julian Martin Eisenschlos, Jeremy R. Cole, Fangyu Liu, William W. Cohen
EACL’23
Best Paper Award

Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour
Fangyu Liu, Julian Martin Eisenschlos, Jeremy R. Cole, Nigel Collier
AACL-IJCNLP’22

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, and Ivan Vulić
ICML’22

Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
Fangyu Liu, Yunlong Jiao, Jordan Massiah, Emine Yilmaz, Serhii Havrylov
ICLR’22

Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu*, Emanuele Bugliarello*, Edoardo Maria Ponti, Siva Reddy, Nigel Collier, Desmond Elliott
EMNLP’21
Best Long Paper Award

Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders
Fangyu Liu, Ivan Vulić, Anna Korhonen, Nigel Collier
EMNLP’21

Self-Alignment Pretraining for Biomedical Entity Representations
Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella, Nigel Collier
NAACL’21

services

conference senior PC member / (senior) area chair: ACL 2025, EMNLP 2023, IJCNLP-AACL 2023, IJCAI 2021

conference PC member/reviewer: NeurIPS (2022-2023), ICLR (2024-2025), ACL (2021-2025), EACL (2023-2024), EMNLP (2022-2024), ICML (2022-2024), IJCAI (2021-2023), ACL Roling Review (Oct. 2021 - present), AAAI (2021-2023), WACV (2021)

journal reviewer: PNAS, Nature Communications, JAIR, Neural Networks, IEEE TNNLS, IEEE TSE, IEEE TKDE, IEEE/ACM TASLP, Neurocomputing, NCAA

reviewer awards: ACL 2021 Outsanding Reviewer, AAAI 2021 top 25% PC member

workshop organiser: 1st workshop on Multilingual Multimodal Learning @ACL 2022

volunteer: EMNLP 2022, ACL 2022, EMNLP 2021, ACL-IJCNLP 2021, NAACL 2021, AAAI 2021

(stale) news (that I am not actively updating)

[Dec 6th, 2023] We released Gemini ♊. Here’s some highlights of its MM reasoning capabilities.

[May 4th, 2023] Thrilled that our work WinoDict (by Julian, Jeremy, me, and William) received Best Paper Award at EACL 2023!

[May 1st, 2023] Am in Dubrovnik 🇭🇷 for EACL’23!

[Dec 20th, 2022] We released MatCha🍵 and DePlot📊, visual language understanding on plots and charts with plot derendering pretraining and LLM reasoning!

[Dec 1st, 2022] I’ll be going to Abu Dhabi (🇦🇪) in person for EMNLP’22. Feel free to drop me a line if you’d like to chat :)

[May 15th, 2022] I’m attending ACL 2022 (Dublin, Ireland 🇮🇪) in person. Let’s grab a coffee and chat!

[Mar 17th, 2022] One paper on sentence representaiton accepted to ICLR 2022; four papers on knowledge probing, KB completion, word translation, and text generation accepted to ACL 2022 main conference.

[Oct 29th, 2021] MaRVL won the Best Long Paper Award at EMNLP 2021! Congrats to my dear collaborators: Emanuele Bugliarello (co-lead), Edoardo Maria Ponti, Siva Reddy, Nigel Collier, Desmond Elliott.

[Sep 28th, 2021] Releasing Trans-Encoder: unsupervised knowledge distillation from a pretrained language model to itself, by alternating between its bi- and cross-encoder forms.

[Aug 27th, 2021] Four papers accepted to the main conference of EMNLP 2021. See you in Punta Cana 🇩🇴 (~~hopefully?~~yes!).

[Aug 27th, 2021] SapBERT is integrated into NVIDIA’s deep learning toolkit NeMo as its entity linking module (thank you NVIDIA!). They even wrote a tutorial – check out this Google Colab.

[May 6th, 2021] A cross-lingual extension of SapBERT will appear at ACL-IJCNLP 2021.

[April 16th, 2021] Happy to have given a talk about SapBERT (our recent NAACL paper) at AstraZeneca’s NLP seminar. Here are the slides.

[April 15th, 2021] We released Mirror-BERT, a fast, effective, and self-supervised approach for transforming masked language models to universal language encoders.

mail

~~fl399 [at] cam [dot] ac [dot] uk~~ liufangyu [at] google [dot] com