About-CN
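Bio:
Wentao Zhang is an assistant professor, researcher, and doctoral supervisor at the International Center for Machine Learning Research, Peking University. His research interests are data-centric machine learning, graph machine learning, machine learning systems, and AI4Science. Over the past five years he has published more than 40 papers in machine learning (ICML, NeurIPS, ICLR), data mining (KDD, WWW), and databases (SIGMOD, VLDB, ICDE), and has received several best paper awards (e.g., the WWW’22 Best Student Paper Award as first author and the APWeb-WAIM’23 Best Paper Runner Up Award as corresponding author). He has led or contributed to several open-source machine learning systems, including the large-scale graph learning system SGL, the distributed machine learning system Angel, and the black-box optimization system OpenBox. His honors include being the only Apple Scholar in the Asia-Pacific region in 2021 and the Yunfan Award (云帆奖) at the 2022 World Artificial Intelligence Conference.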
Email: wentao.zhang@pku.edu.cn
WeChat: z1299799152
Office: Room 217, Jingyuan Courtyard 6, Peking University
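Two PhD rotation positions at the International Center for Machine Learning Research are still open for Fall 2023 admission!
Positions are still open for Fall 2024 admission for application-based PhD students and master's students (the Big Data master's program at the PKU School of Mathematical Sciences). I also recruit interns on a long-term basis (remote, off-campus internships are possible). If you are interested, please contact me directly!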
Research Interests
Data-centric ML: annotation, augmentation, imbalance, noise, distillation, out-of-distribution, heterogeneity, and privacy.
Graph ML: graph neural network, graph representation learning.
ML Systems: large-scale distributed training, AutoML, and data-centric ML platform.
Interdisciplinary Application: AI4Science (e.g., drugs and proteins) and AI4Industry (e.g., recommender systems and anomaly detection).
A summary of my recent work:
- Data-centric ML on graph: how to prepare data of high quality and sufficient quantity for graph ML?
  - Data annotation
    - Better efficiency [ALG, SIGMOD 21]
    - Model free [Grain, VLDB 21]
    - Noise handling [RIM, NeurIPS 21, Spotlight]
    - Simplifying the labeling task [IGP, ICLR 22]
  - Feature engineering (complex model -> better features + simple model)
    - Feature/label smoothing + simple model [NDLS, NeurIPS 21, Spotlight]
    - Unsupervised and non-parametric feature smoothing [NAFS, ICML 22]
    - Graph-based MLP deployed at Tencent [GAMLP, KDD 22]
    - Inference at large scale [NAI, ICDE 24]
    - Experimental evaluation [AIR, KDD 22]
  - Data distillation
    - Offline distillation [RDD, SIGMOD 20]
    - Online distillation [ROD, KDD 21]
- ML Systems: how to make machine learning faster and easier?
  - Distributed ML & AutoML
    - Distributed NAS on graph [PaSca, WWW 22, Best Student Paper Award]
    - Deep and flexible NAS on graph [DF-GNAS, ICML 22]
    - Scalable graph learning [SGL]
    - Distributed graph learning [Angel Graph]
    - End-to-End AutoML [MindWare, VLDB 21]
    - Black box optimization [OpenBox, KDD 21]
    - Large-scale hyper-parameter tuning [Hyper-Tune, VLDB 22]
    - Distributed GNN training [The First Survey of Distributed GNN Training, arXiv 22]
- Interdisciplinary Application: how to use machine learning in real applications?
  - AI4Industry
    - GNN-based recommendation [The First Survey of GNN-based RS, CSUR 22]
    - GNN-based recommendation system deployed at Taobao [Zoomer, ICDE 22]
  - AI4Science
    - Diffusion models [The First Survey of Diffusion Models, CSUR 23]
    - AutoML for biology [AutoDC, Bioinformatics 22]
What's New
- 2023-10: One paper is accepted by ICDE 2024.
- 2023-10: 🏆 We won the Best Paper Runner Up Award at APWeb-WAIM 2023.
- 2023-09: One paper is accepted by ACM Computing Surveys 2023.
- 2023-09: One paper is accepted by NeurIPS 2023.
- 2023-08: One paper is accepted by VLDB 2024.
- 2023-08: One paper is accepted by APWeb-WAIM 2023.
- 2023-08: One paper is accepted by CIKM 2023.
- 2023-08: Our book about diffusion models is now available.
- 2023-05: One paper is accepted by TKDE 2023.
- 2023-05: One paper is accepted by SIGKDD 2023.
- 2023-05: One paper is accepted by VLDB 2023.
- 2023-03: One paper is accepted by SIGMOD 2023.
- 2022-11: One paper is accepted by AAAI 2023.
- 2022-11: One paper is accepted by ICDE 2023.
- 2022-10: One paper is accepted by VLDBJ 2022.
- 2022-09: One paper is accepted by NeurIPS 2022.
- 2022-09: I was awarded the Rising Star award (云帆奖-明日之星) at the World AI Conference 2022.
- 2022-06: I am honored to deliver the valedictorian speech for the class of 2022 in CS at PKU.
- 2022-06: I received my Ph.D. degree in computer science from Peking University with the Outstanding Doctoral Dissertation Award.
- 2022-05: One paper is accepted by the journal VLDBJ 2022.
- 2022-05: Four papers are accepted by the conference SIGKDD 2022.
- 2022-05: Two first-author papers have been accepted by ICML 2022.
- 2022-05: One paper related to AutoML has been accepted by Bioinformatics 2022.
- 2022-04: 🏆 We won the Best Student Paper Award at WWW 2022!
- 2022-04: We released the first version of our scalable graph learning toolkit, SGL.
- 2022-03: One paper was selected as a Best Paper Award nominee at WWW 2022. The corresponding PaSca system (integrated into SGL) will be open-sourced next month!
- 2022-03: One paper as corresponding author, related to GNN-based recommendation, has been accepted by the journal ACM Computing Surveys 2022.
- 2022-01: One paper related to graph-based recommendation has been accepted by the conference ICDE 2022.
- 2022-01: One paper as first author, related to graph data annotation, has been accepted by the conference ICLR 2022.
- 2022-01: One paper related to our large-scale hyper-parameter tuning system has been accepted by the conference VLDB 2022.
- 2022-01: I accepted the invitation to serve as Program Committee member of the Research Track of ACM SIGKDD 2022.
- 2022-01: One paper as first author, related to our scalable graph NAS system, has been accepted by the conference WWW 2022.
- 2021-12: Our OpenBox team won the “Outstanding Winner” at the openGCC contest in CCF ChinaSoft 2021. Congratulations!
- 2021-09: Two papers as first author, related to scalable graph learning and graph data annotation, have been accepted by the conference NeurIPS 2021 with Spotlight (< 3%).
- 2021-08: We propose GAMLP, a scalable and efficient graph model, which ranks #1 on the three largest public ogbn graphs (i.e., ogbn-papers100M, ogbn-products, and ogbn-mag)! See the leaderboards here.
- 2021-07: One paper as first author, related to large-scale graph data selection, has been accepted by the conference VLDB 2021.
- 2021-07: One paper as co-first author, related to deep GNN, has been accepted by the journal TKDE 2021.
- 2021-06: One paper as third author, related to our AutoML system VolcanoML, has been accepted by the conference VLDB 2021.
- 2021-05: Three papers, related to sparse graphs, graph decomposition, and our black-box optimization (BBO) system OpenBox, have been accepted by the conference SIGKDD 2021.
- 2021-03: I was awarded the Apple Scholars in AI/ML PhD fellowship, as the only recipient in China. Many thanks to Apple!
- 2021-03: One paper as first author has been accepted by the conference SIGMOD 2021. Looking forward to the meeting in Xi’an this summer!
Contributed Open-source Projects
- Angel: a high-performance distributed machine learning and graph computing platform, jointly designed by Tencent and PKU.
- SGL: a scalable graph learning toolkit for extremely large graph datasets.
- MindWare: a powerful AutoML system, which automates feature engineering, algorithm selection and hyperparameter tuning.
- OpenBox: an efficient open-source system designed for solving generalized black-box optimization (BBO) problems.
Selected Awards
- 🏆 Best Paper Runner Up Award, APWeb-WAIM 2023.
- Rising Star (云帆奖-明日之星), World AI Conference, 2022.
- 🏆 Best Student Paper Award of WWW 2022 (1/1822, the second WWW Best Student Paper from China), 2022
- IVADO Postdoctoral Fellowship, Canada
- Outstanding Doctoral Dissertation Award, Peking University (Sole winner in Computer Software and Theory), 2022
- Outstanding Graduate of Beijing, China, 2022
- Candidate of May 4th Medal (Each School recommends 1 candidate, highest honor in PKU), 2022
- The Big Data Expo Leading Technology Achievement Award, China International Big Data Industry Expo (Angel Graph project), 2022
- Candidate of People of the Year (1 person in EECS, and 42 people in PKU), 2021
- Merit Student of Beijing (2 people in EECS, and 58 people in PKU), 2021
- Apple PhD Fellowship (1 person in China, and 15 people in the world), 2021
- National Scholarship (Top 1% in PKU), 2019, 2021
- Baidu Scholarship Nominee (20 people in the world), 2021
Selected Competitions
- Outstanding Winner of the openGCC contest in CCF ChinaSoft (1/3814), 2021
- Rank #1 in Open Graph Benchmark, 2021
- Outstanding Winner of the BDIC Big Data Competition (1/575), 2018
Selected Program Committee Member and Reviewer
- Database and Data Management:
  - ICDE 2023
  - DASFAA 2022
  - VLDBJ 2022, 2023
- Machine Learning:
  - ICML 2021, 2022, 2023
  - NeurIPS 2022, 2023
  - ICLR 2024
  - JMLR 2023
  - Machine Learning 2023
  - LoG 2024
- Data Mining:
  - SIGKDD 2021, 2022, 2023
  - SDM 2024
  - WWW 2022
  - DASFAA 2022, 2023, 2024
  - IEEE TKDE 2022
  - IEEE TNNLS 2022
  - PAKDD 2023, 2024
- Others:
  - ICCV 2023
  - CVPR 2023, 2024
  - SCIS 2023
Invited Talks
I am happy to give a talk if you are interested in my work. 😊
- Model Degradation Hinders Deep Graph Neural Networks.
  - KDD’22, 2022.08
- Graph Attention Multi-Layer Perceptron.
  - KDD’22, 2022.08
- NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning.
  - AI Time [News]
  - ICML’22, Virtual, 2022.07
  - Jiqizhixin, Virtual, 2022.07 [News] [Slides]
- Deep and Flexible Graph Neural Architecture Search.
  - ICML’22, Virtual, 2022.07
  - Jiqizhixin, Virtual, 2022.07
- Towards Large Scale Graph Learning: Data, Model and System. 《大规模图学习:数据、模型与系统》
  - THU, Virtual, 2023.02
  - PKU, Virtual, 2023.02
  - SUSTech, 2023.01
  - HKUST (Guangzhou), Virtual, 2022.04 [News]
  - Stanford, Virtual, 2021.11
  - Mila, Virtual, 2021.09
- Towards Automated Graph Learning. 《自动化图机器学习》 [Doc]
  - HKUST, Virtual, 2022.11 [News]
  - NUDT, Virtual, 2022.07
  - HUST, 2022.08
  - Zhejiang University, 2022.08
- Information Gain Propagation: A New Way to Graph Active Learning with Soft Labels. 《软标签场景下的图主动学习》
  - AI Time, Virtual, 2022.06 [News]
  - ICLR’22, Virtual, 2022.04
- Data-centric ML on Graph.
  - UvA, 2022.04
  - PKU, 2023.05
  - HKUST, 2023.04
- Towards Data-Centric ML. 《数据驱动的机器学习》
  - Apple Research, 2022.06
  - RUC, 2023.06
  - SEU, 2023.07
  - PKU, 2023.08
- Valedictorian Speech. 《北京大学计算机系2022级毕业生代表致辞》
  - CS of PKU, 2022.06 [News]
- PaSca: A Graph Neural Architecture Search System under the Scalable Paradigm. 《可扩展性的图神经结构搜索系统》
  - DGL Team, Amazon, Virtual, 2022.07
  - CSU, Virtual, 2022.07
  - CCF, Virtual, 2022.06 [News] [Slides]
  - DataFun, Virtual, 2022.06 [Slides]
  - MLNLP, Virtual, 2022.06 [News] [Slides] [Video]
  - InfoQ, Tencent Cloud, Virtual, 2022.06 [News]
  - WWW’22, Virtual, 2022.04 [Slides]
  - Data Platform, Tencent, Virtual, 2022.05
- Towards Large-scale Graph Machine Learning. 《大规模图机器学习》 [Doc]
  - HKUST, Virtual, 2022.08 (in preparation)
  - LOGs, Virtual, 2022.07 [Video]
- How to Do Research? 《浅谈科研》
  - Apple Research, Virtual, 2021.12
  - PKU, Virtual, 2021.12 [News-1, News-2] [Slides]
- The Scalability of Large-scale Graph Machine Learning. 《大规模图机器学习的可扩展性》
  - Tencent Big Data, Virtual, 2022.04
  - NeurIPS, Virtual, 2021.12
  - 4Paradigm, Virtual, 2021.12
  - AI Drive, 2021.12 [Video] [News] [Slides]
- RIM: Reliable Influence-based Active Learning on Graphs.
  - NeurIPS, Virtual, 2021.12
  - NeurIPS MeetUp China, 2021.12 [News] [Slides]
- A Survey of GNN Systems. 《GNN系统调研》
  - Tencent, Virtual, 2021.12 [Slides]
- Graph Attention Multi-Layer Perceptron. 《图注意力多层感知器》
  - DataFun, Virtual, 2021.10 [News] [Slides]