I am a Research Scientist at Google DeepMind based in Mountain View, California, where I help build multimodal large language models with frontier capabilities (e.g., MatCha, DePlot, Pix2Struct). Previously I was a PhD student at the Language Technology Lab, University of Cambridge, supervised by Professor Nigel Collier. My co-authored works won best paper awards at EMNLP 2021 (MaRVL) and EACL 2023 (WinoDict).


  • [May 4th, 2023] Thrilled that our work WinoDict (by Julian, Jeremy, me, and William) received the Best Paper Award at EACL 2023!
  • [May 1st, 2023] Am in Dubrovnik 🇭🇷 for EACL’23!
  • [Dec 20th, 2022] We released MatCha🍵 and DePlot📊, visual language understanding on plots and charts with plot derendering pretraining and LLM reasoning!
  • [Dec 1st, 2022] I’ll be going to Abu Dhabi (🇦🇪) in person for EMNLP’22. Feel free to drop me a line if you’d like to chat :)
  • [May 15th, 2022] I’m attending ACL 2022 (Dublin, Ireland 🇮🇪) in person. Let’s grab a coffee and chat!
  • [Mar 17th, 2022] One paper on sentence representation accepted to ICLR 2022; four papers on knowledge probing, KB completion, word translation, and text generation accepted to the ACL 2022 main conference.
  • [Oct 29th, 2021] MaRVL won the Best Long Paper Award at EMNLP 2021! Congrats to my dear collaborators: Emanuele Bugliarello (co-lead), Edoardo Maria Ponti, Siva Reddy, Nigel Collier, Desmond Elliott.
  • [Sep 28th, 2021] Releasing Trans-Encoder: unsupervised knowledge distillation from a pretrained language model to itself, by alternating between its bi- and cross-encoder forms.
  • [Aug 27th, 2021] Four papers accepted to the main conference of EMNLP 2021. See you in Punta Cana 🇩🇴 (hopefully? yes!).
  • [Aug 27th, 2021] SapBERT is integrated into NVIDIA’s deep learning toolkit NeMo as its entity linking module (thank you NVIDIA!). They even wrote a tutorial – check out this Google Colab.
  • [May 6th, 2021] A cross-lingual extension of SapBERT will appear at ACL-IJCNLP 2021.
  • [April 16th, 2021] Happy to have given a talk about SapBERT (our recent NAACL paper) at AstraZeneca’s NLP seminar. Here are the slides.
  • [April 15th, 2021] We released Mirror-BERT, a fast, effective, and self-supervised approach for transforming masked language models to universal language encoders.


University of Cambridge
MPhil (2020) and PhD student (2020-2023), Computation, Cognition and Language
University of Waterloo
Bachelor of Mathematics (2019), Computer Science

selected publications (see all publications)

DePlot: One-shot visual language reasoning by plot-to-table translation
Fangyu Liu*, Julian Martin Eisenschlos*, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos

Visual Spatial Reasoning
Fangyu Liu, Guy Emerson, Nigel Collier

WinoDict: Probing language models for in-context word acquisition
Julian Martin Eisenschlos, Jeremy R. Cole, Fangyu Liu, William W. Cohen
Best Paper Award

Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour
Fangyu Liu, Julian Martin Eisenschlos, Jeremy R. Cole, Nigel Collier

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, and Ivan Vulić

Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
Fangyu Liu, Yunlong Jiao, Jordan Massiah, Emine Yilmaz, Serhii Havrylov

Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu*, Emanuele Bugliarello*, Edoardo Maria Ponti, Siva Reddy, Nigel Collier, Desmond Elliott
Best Long Paper Award

Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders
Fangyu Liu, Ivan Vulić, Anna Korhonen, Nigel Collier

Self-Alignment Pretraining for Biomedical Entity Representations
Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella, Nigel Collier


  • workshop organiser: 1st workshop on Multilingual Multimodal Learning @ACL 2022
  • conference senior PC member / area chair: EMNLP 2023, IJCNLP-AACL 2023, IJCAI 2021
  • conference PC member/reviewer: NeurIPS (2022-2023), ACL (2021-2023), EACL (2023), EMNLP (2022-2023), ICML (2022-2023), IJCAI (2021-2023), ACL Rolling Review (Oct. 2021 - present), AAAI (2021-2023), WACV (2021)
  • reviewer awards: ACL 2021 Outstanding Reviewer, AAAI 2021 top 25% PC member
  • journal reviewer: JAIR, Neural Networks, IEEE TNNLS, IEEE TSE, IEEE/ACM TASLP, Neurocomputing, NCAA
  • volunteer: EMNLP 2022, ACL 2022, EMNLP 2021, ACL-IJCNLP 2021, NAACL 2021, AAAI 2021


fl399 [at] cam [dot] ac [dot] uk
liufangyu [at] google [dot] com