I am a third-year Ph.D. student in the Department of Computer Science at the University of California, Los Angeles, where I am fortunate to be advised by Professor Wei Wang.

Prior to that, I obtained my bachelor’s degree in Computer Science from Tsinghua University, where I was fortunate to be advised by Prof. Jie Tang.

My research interests include machine learning for biology and natural language processing.

  • Derive complex, information-rich properties (e.g., structural characteristics) from primary properties [first principle].
  • Model molecular interactions (protein, ligand, DNA) to make drug discovery more targeted.
  • Bridge biological data with user-friendly data modalities (texts, images) using large language models (LLMs).

🔥 News

  • 2024.10: 🎉🎉 One paper accepted at a NeurIPS workshop.
  • 2024.10: 🎉🎉 Three papers accepted at EMNLP.
  • 2024.07: 🎉🎉 Started an internship at Amazon Web Services.

📝 Publications

EMNLP 2024 Main
CPPLM

Large Language Models Can Be Contextual Privacy Protection Learners

Yijia Xiao, Yiqiao Jin, Yushi Bai, Yue Wu, Xianjun Yang, Xiao Luo, Wenchao Yu, Xujiang Zhao, Yanchi Liu, Haifeng Chen, Wei Wang, Wei Cheng

Abstract: We introduce CPPLM (Contextual Privacy Protection Fine-Tuning for LLM), which emphasizes instruction-based tuning with positive and negative examples, enabling LLMs to capture knowledge while preserving privacy.

MLSB, NeurIPS 2024
RNA-GPT

RNA-GPT: Multimodal Generative System for RNA Sequence Understanding

Yijia Xiao, Edward Sun, Yiqiao Jin, Wei Wang

Abstract: RNA-GPT combines RNA sequence encoders with state-of-the-art LLMs for precise representation alignment, streamlining RNA research by providing accurate responses to RNA queries.

Research Benchmark
LogicVista

LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts

Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang

Abstract: LogicVista is an evaluation benchmark designed to assess logical reasoning capabilities of MLLMs in visual contexts, encompassing multiple logical reasoning tasks and capabilities.

ArXiv Preprint
ProteinGPT

ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding

Yijia Xiao, Edward Sun, Yiqiao Jin, Qifan Wang, Wei Wang

Abstract: ProteinGPT enables comprehensive protein analysis by allowing users to upload sequences and structures, providing contextually relevant responses to streamline protein research.

Hugging Face demo: https://huggingface.co/spaces/AI-BIO/ProteinGPT-Llama3

Pretrain@KDD 2021

Modeling protein using large-scale pretrain language model

Yijia Xiao, Jiezhong Qiu, Ziang Li, Chang-Yu Hsieh, Jie Tang

Abstract: Introducing ProteinLM, a suite of large-scale protein language models comprising 3 billion parameters. ProteinLM enhances contact prediction accuracy from 36% to 75%, showcasing its efficiency in capturing evolutionary data. Our resources are accessible to the public at https://github.com/THUDM/ProteinLM.

Wikipedia [Wikipedia: Wen Su]

NeurIPS 2023

Benchmarking foundation models with language-model-as-an-examiner

Yushi Bai, Jiahao Ying, Yixin Cao, Xin Lv, Yuze He, Xiaozhi Wang, Jifan Yu, Kaisheng Zeng, Yijia Xiao, Haozhe Lyu, Jiayin Zhang, Juanzi Li, Lei Hou

Abstract: We propose Language-Model-as-an-Examiner, a novel benchmarking method that uses an LM as a knowledgeable examiner to construct datasets and evaluate other models.

🎖 Honors and Awards

  • 2021 Research Excellence Scholarship (Top 2%, 3 / 230), Tsinghua University.
  • 2020 Silver Medal, ICPC Asia East Continent Final.
  • 2020 Gold Medal, ICPC Asia Regional Contest.
  • 2019 First Prize, Chinese Collegiate Physics Olympiad.
  • 2017 National Bronze, Chinese Physics Olympiad.

📖 Education

  • Ph.D. Student, Computer Science, University of California, Los Angeles, 2022 - Present
  • Bachelor, Computer Science and Technology, Tsinghua University, 2018 - 2022

💬 Invited Talks

  • 2024, Delivered a talk on applications of machine learning in biomedical settings at dkNET.
  • 2022, Delivered a talk on efficient pre-training of large-scale protein language models at BioMap.
  • 2021, Delivered a talk on the progress and applications of pre-trained protein models to AI start-ups at Beijing Academy of Artificial Intelligence.

💻 Internships