Name: THUKEG
Type: Organization
Bio: ChatGLM, GLM-4, CogVLM, Agent, CodeGeeX, CogView, CogVideo | CogDL, GraphMAE, AMiner | Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University
Twitter: thukeg
Location: FIT Building, Tsinghua University
Blog: https://huggingface.co/THUDM
THUKEG's Projects
LongAlign: A Recipe for Long Context Alignment Encompassing Data, Training, and Evaluation
[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
LVBench: An Extreme Long Video Understanding Benchmark
Official PyTorch Implementation for MathGLM
Source code and dataset for KDD 2020 paper "Understanding Negative Sampling in Graph Representation Learning"
Ongoing research training transformer models at scale
MRT: Tracing the Evolution of Scientific Publications (TKDE 2021)
MSAGPT
The multilingual variant of GLM, a general language model trained with an autoregressive blank infilling objective
Paper list of NLP for recommender systems
Source code and dataset for KDD 2019 paper "OAG: Toward Linking Large-scale Heterogeneous Entity Graphs"
A heterogeneous entity-augmented academic language model based on Open Academic Graph (OAG)
pix2struct version of open_clip
A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too".
An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
A Pre-training Framework for Adaptive Learning in MOOCs
Source code and dataset for IJCAI 2019 paper "ProNE: Fast and Scalable Network Representation Learning"
Protein Language Model
RecDCL: Dual Contrastive Learning for Recommendation (WWW'24, Oral)
Source code and dataset for TKDE'22 paper "Region or Global? A Principle for Negative Sampling in Graph-based Recommendation"
The official implementation of "Relay Diffusion: Unifying diffusion process across resolutions for image synthesis" [ICLR 2024 Spotlight]