
Gen AI Resources

A collection of the most useful Gen AI tools, materials, publications, and reports.

Foundation Papers

| Title | Model | Publication Date | Code | Organization |
|---|---|---|---|---|
| Attention Is All You Need | Transformer | Dec 2017 | | Google |
| Improving Language Understanding by Generative Pre-Training | GPT | Jun 2018 | | OpenAI |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | BERT | May 2019 | | Google |
| On the Opportunities and Risks of Foundation Models | | Jul 2022 | | Center for Research on Foundation Models (CRFM) & Stanford Institute for Human-Centered Artificial Intelligence (HAI) |
| Language Models are Unsupervised Multitask Learners | GPT-2 | Feb 2019 | Code | OpenAI |
| Learning Transferable Visual Models From Natural Language Supervision | CLIP | Feb 2021 | Code | OpenAI |
| Evaluating Large Language Models Trained on Code | Codex | Jul 2021 | | OpenAI |
| Competition-Level Code Generation with AlphaCode | AlphaCode | Feb 2022 | | DeepMind |
| Adding Conditional Control to Text-to-Image Diffusion Models | ControlNet | Feb 2023 | Code | Stanford University |
| CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | CodeGen | March 2022 | Code | Salesforce |

Recent Papers

| Title | Short Name | Date | Institution | Code (if available) |
|---|---|---|---|---|
| Training language models to follow instructions with human feedback | InstructGPT | March 2022 | OpenAI | |
| High-Resolution Image Synthesis with Latent Diffusion Models | Stable Diffusion | April 2022 | Heidelberg University & Runway | |
| Hierarchical Text-Conditional Image Generation with CLIP Latents | DALL·E 2 | April 2022 | OpenAI | |
| Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | RLHF | Jun 2022 | Anthropic | |
| Language Models are Few-Shot Learners | GPT-3 | May 2020 | OpenAI | |
| WebGPT: Browser-assisted question-answering with human feedback | WebGPT | Dec 2021 | OpenAI | |
| Robust Speech Recognition via Large-Scale Weak Supervision | Whisper | Sep 2022 | OpenAI | Code |
| LLaMA: Open and Efficient Foundation Language Models | LLaMA | Feb 2023 | Meta | |
| Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models | Visual ChatGPT | March 2023 | Microsoft | Code |
| Consistency Models | | March 2023 | OpenAI | Code |
| Language Is Not All You Need: Aligning Perception with Language Models | Kosmos-1 | March 2023 | Microsoft | |
| GPT-4 Technical Report | GPT-4 | March 2023 | OpenAI | |
| BloombergGPT: A Large Language Model for Finance | BloombergGPT | March 2023 | Bloomberg | |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | HuggingGPT | April 2023 | Microsoft | Code |
| Segment Anything | SAM | April 2023 | Meta | Code |
| Instruction Tuning with GPT-4 | | April 2023 | Microsoft | Code |
| Generative Agents: Interactive Simulacra of Human Behavior | | April 2023 | Stanford & Google | Code |
| Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca | Chinese LLaMA | April 2023 | Microsoft | Code |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | Pythia | April 2023 | Many | |
| DINOv2: Learning Robust Visual Features without Supervision | DINOv2 | April 2023 | Meta | |
| Shap-E: Generating Conditional 3D Implicit Functions | Shap-E | May 2023 | OpenAI | Code |
| StarCoder: may the source be with you | StarCoder | May 2023 | Many | Code |
| Large Language Models are Zero-Shot Rankers for Recommender Systems | | May 2023 | Renmin University, WeChat, UC San Diego | |
| IMAGEBIND: One Embedding Space To Bind Them All | IMAGEBIND | May 2023 | Meta | |
| DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining | DoReMi | May 2023 | Google & Stanford University | |
| Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | DragGAN | May 2023 | Many | Code |
| VOYAGER: An Open-Ended Embodied Agent with Large Language Models | VOYAGER | May 2023 | NVIDIA & Many | Code |
| Large Language Models as Tool Makers | | May 2023 | DeepMind, Princeton University, Stanford University | Code |
| SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions | SELF-INSTRUCT | May 2023 | Many | Code |
| LIMA: Less Is More for Alignment | LIMA | May 2023 | Meta, CMU & Many | |
| GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | GPT4Tools | May 2023 | Tsinghua University & Many | Code |
| UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild | UniControl | May 2023 | Salesforce, Stanford University, Northeastern University | Code |
| QLoRA: Efficient Finetuning of Quantized LLMs | QLoRA | May 2023 | University of Washington | Code |
| AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback | AlpacaFarm | May 2023 | Stanford University | |
| STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | STEVE-1 | June 2023 | University of Toronto | |
| MIND2WEB: Towards a Generalist Agent for the Web | MIND2WEB | June 2023 | Ohio State University | Code |
| StyleDrop: Text-to-Image Generation in Any Style | StyleDrop | June 2023 | Google | Code |
| Simple and Controllable Music Generation | MusicGen | June 2023 | Meta | Code |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | Orca | June 2023 | Microsoft | |
| TryOnDiffusion: A Tale of Two UNets | TryOnDiffusion | June 2023 | University of Washington, Google | |
| WizardLM: Empowering Large Language Models to Follow Complex Instructions | WizardLM | June 2023 | Microsoft, Peking University | Code |
| Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | Voicebox | June 2023 | Meta | |
| DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | DragDiffusion | June 2023 | National University of Singapore & ByteDance | |
| Textbooks Are All You Need | phi-1 | June 2023 | Microsoft | |
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | VoxPoser | June 2023 | Stanford University | |
| FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | FlashAttention-2 | July 2023 | Stanford University | |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Llama 2 | July 2023 | Meta | |
| FACTOOL: Factuality Detection in Generative AI, A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios | FACTOOL | July 2023 | CMU & Many | Code |
| SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | SDXL | July 2023 | Stability AI | Code |
| Learning to Model the World with Language | Dynalang | July 2023 | UC Berkeley | |
| Universal and Transferable Adversarial Attacks on Aligned Language Models | | July 2023 | CMU | |
| Frontier AI Regulation: Managing Emerging Risks to Public Safety | | July 2023 | OpenAI | |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | RT-2 | July 2023 | Google DeepMind | |
| AgentBench: Evaluating LLMs as Agents | AgentBench | Aug 2023 | Tsinghua University, OSU, UC Berkeley | |
| Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization | Retroformer | Aug 2023 | Salesforce | |
| LongLoRA: Efficient Fine-Tuning of Long-Context Large Language Models | LongLoRA | Sep 2023 | CUHK, MIT, NVIDIA | |
| CodePlan: Repository-level Coding using LLMs and Planning | CodePlan | Sep 2023 | Microsoft | |
| MemGPT: Towards LLMs as Operating Systems | MemGPT | Oct 2023 | UC Berkeley | |
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | Self-RAG | Oct 2023 | University of Washington, Allen Institute for AI, IBM | |
| LCM-LoRA: A Universal Stable-Diffusion Acceleration Module | LCM-LoRA | Nov 2023 | Tsinghua University, Hugging Face | |
| Music ControlNet: Multiple Time-Varying Controls for Music Generation | | Nov 2023 | CMU, Adobe | Code |
| Make Pixels Dance: High-Dynamic Video Generation | | Nov 2023 | ByteDance | Code |
| The Chosen One: Consistent Characters in Text-to-Image Diffusion Models | | Nov 2023 | Google & Many | |
| Orca 2: Teaching Small Language Models How to Reason | Orca 2 | Nov 2023 | Microsoft | |
| GAIA: A Benchmark for General AI Assistants | GAIA | Nov 2023 | FAIR (Meta), Hugging Face, AutoGPT | |
| Llemma: An Open Language Model for Mathematics | Llemma | Dec 2023 | Princeton University & Many | |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Mamba | Dec 2023 | CMU, Princeton University | |
| Magicoder: Source Code Is All You Need | Magicoder | Dec 2023 | UIUC, Tsinghua University | |
| Kandinsky 3.0 Technical Report | Kandinsky | Dec 2023 | Sber AI, AIRI | |
| LLM360: Towards Fully Transparent Open-Source LLMs | LLM360 | Dec 2023 | Petuum, MBZUAI, CMU | |
| DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding | DocLLM | Dec 2023 | JPMorgan | |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | | Jan 2024 | Apple | |
| Levels of AGI: Operationalizing Progress on the Path to AGI | | Jan 2024 | Google DeepMind | |
| Mixtral of Experts | | Jan 2024 | Mistral AI | |
| TrustLLM: Trustworthiness in Large Language Models | TrustLLM | Jan 2024 | Many | |
| Tuning Language Models by Proxy | | Jan 2024 | University of Washington | |
| Scalable Pre-training of Large Autoregressive Image Models | | Jan 2024 | Apple | Code |
| Self-Rewarding Language Models | | Jan 2024 | Meta | |
| Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation | Mobile ALOHA | Jan 2024 | Stanford University | Code |

Important Reports

| Report | Link | Date | Institution |
|---|---|---|---|
| Stanford AI Index Report 2023 | Link | | Stanford |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Link | | Microsoft |
| A Survey of Large Language Models | Link | April 2023 | Renmin University, China & University of Montreal, Canada |
| Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond | Link | | Amazon & many others |
| A Cookbook of Self-Supervised Learning | Link | | Meta & many others |
| Let's Verify Step by Step | Link | May 2023 | OpenAI |
| A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering | Link | May 2023 | Kyung Hee University & Many |
| A Comprehensive Survey on Segment Anything Model for Vision and Beyond | Link | May 2023 | Hong Kong University of Science and Technology & Many |
| On the Design Fundamentals of Diffusion Models: A Survey | Link | June 2023 | Durham University |
| Open LLM Leaderboard | Link | Updated in real time | Hugging Face |
| A Survey on Multimodal Large Language Models | Link | June 2023 | CST & Many |
| Recent Advancements in End-to-End Autonomous Driving using Deep Learning: A Survey | Link | July 2023 | IIT |
| A Survey on Evaluation of Large Language Models | Link | July 2023 | Jilin University |
| Challenges and Applications of Large Language Models | Link | July 2023 | UCL & Many |
| A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges | Link | Sep 2023 | University of Oxford & Many |
| Large Language Models in Finance: A Survey | Link | Sep 2023 | Columbia University & Many |
| A Survey on Video Diffusion Models | Link | Oct 2023 | Fudan University & Many |
| Learn From Model Beyond Fine-Tuning: A Survey | Link | Oct 2023 | Wuhan University & Many |
| A Survey on Multimodal Large Language Models for Autonomous Driving | Link | Nov 2023 | Purdue University & Many |
| Green Edge AI: A Contemporary Survey | Link | Dec 2023 | Nanjing University |
| Efficient Large Language Models: A Survey | Link | Dec 2023 | Ohio State University |
| Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives | Link | Dec 2023 | Tsinghua University |
| Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code | Link | Dec 2023 | Shanghai Jiao Tong University, Ant Group |
| A Survey of Reinforcement Learning from Human Feedback | Link | Dec 2023 | MCML Munich, Germany |
| Diffusion Models, Image Super-Resolution and Everything: A Survey | Link | Jan 2024 | Many |
| Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects | Link | Jan 2024 | The Chinese University of Hong Kong & Many |
| AI Alignment: A Comprehensive Survey | Link | Jan 2024 | Peking University & Many |
| Large Language Models for Robotics: Opportunities, Challenges, and Perspectives | Link | Jan 2024 | Many |

Important Projects

MidJourney

Alpaca Open Source Code Stanford March 2023

Dolly Open Source Code Databricks March 2023 Note: OK to use commercially

Vicuna Open Source Code UC Berkeley, CMU, Stanford, and UC San Diego March 2023

ChatPDF March 2023

Bard Google March 2023

LangChain Community Effort March 2023

Microsoft 365 Copilot Microsoft March 2023

AutoGPT Community Effort April 2023

Grounded SAM IDEA April 2023

DeepSpeed Chat Microsoft April 2023

AgentGPT Community Effort April 2023

MiniGPT King Abdullah University of Science and Technology April 2023

DeepFloyd IF Stability AI April 2023

OpenLLaMA Berkeley May 2023

SoftVC VITS Singing Voice Conversion Community May 2023

Falcon TII May 2023

FinGPT Columbia University June 2023

UltraLM Tsinghua University June 2023

ChatLaw Peking University June 2023

LMFlow Hong Kong University of Science and Technology June 2023

GPT Store OpenAI Jan 2024

AIGC Courses

COS597G Understanding Large Language Models Princeton 2022

CS324 Large Language Models Stanford 2023

ChatGPT, LangChain and DS Courses DeepLearning.AI Jun 2023

Large Multimodal Models: Notes on CVPR 2023 Tutorial Microsoft Jun 2023

Very Useful Source Code

OpenAI Cookbook
LlamaIndex
PrivateGPT
llama.cpp
Petals
FlexGen
Flowise
Candle
ChatGPT Next Web

Main LLM Development Tips, Updated June 21, 2023

1. Data is still king - LLMs are great but if you don't have quality clean data you won’t go far.

2. Smaller models can be just as good as larger general models at specific tasks. And cheaper!

3. Fine-tuning is becoming cheaper.

4. Evaluation of LLMs is very hard - feels very subjective still.

5. Managed APIs are expensive.

6. "Traditional" ML isn't going anywhere.

7. Memory matters - for both serving and training.

8. Information retrieval with vector databases is becoming a standard pattern (see the sketch after this list).

9. Start with prompt engineering and push it to its limits before fine-tuning with smaller models.

10. Use agents/chains only when necessary. They are unruly.

11. Latency is critical for a good user experience.

12. Privacy is critical.
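A minimal sketch of the vector-database retrieval pattern from tip 8, assuming the `sentence-transformers` and `faiss-cpu` packages are installed; the toy corpus, prompt template, and `build_prompt` helper are illustrative, not any particular product's API.

```python
# Sketch: embed documents, index them, and retrieve context for a prompt.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "QLoRA fine-tunes quantized LLMs with low-rank adapters.",
    "Llama 2 is released by Meta with a community license.",
    "FlashAttention-2 improves attention throughput on GPUs.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

# Inner product on normalized vectors is cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

def retrieve(query: str, k: int = 2) -> list[str]:
    q = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [docs[i] for i in ids[0]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does QLoRA reduce fine-tuning cost?"))
```

The same pattern works with any embedding model and vector store; the point is to push retrieval plus prompting as far as possible before reaching for fine-tuning.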

Main LLM Development Tips, Updated July 16, 2023

1. Prompting vs. fine-tuning: prompt engineering is still not reliable enough (it is sensitive to the exact wording of the prompt), while supervised fine-tuning (SFT) remains a stable and efficient method.

2. The gap between open-source models and GPT-4 still comes down to the strength of the base model. Answer styles can look similar, but professional content and reasoning capability differ greatly; a better base model is still the key. Llama 2 is about to be released and may permit commercial use. Pretrained models are evaluated mainly on a large suite of tasks.

3. SFT vs. PPO: training and using PPO is still difficult, although it can genuinely improve results. There is a lot of academic research but not many commercial applications yet. SFT combined with good data can replace PPO in most cases, and fine-tuning clearly improves tool use and comprehensive summarization for specific domains.

4. Key points for SFT data: diversity. Not only should the content and perspective of the questions be diverse, but also the style of questioning. For answers, accuracy and truthfulness matter, and so do the expected format and style (for example, clear and well organized).

5. On the link between pretraining and fine-tuning: when the fine-tuning dataset is small, consider mixing in some pretraining data to add stability. Many pipelines also add a stage between pretraining and fine-tuning, namely continued pretraining, which mainly trains on the domain you care about to strengthen it. For example, to build a dedicated coder, mix code and general data during pretraining, then train on code data during continued pretraining.

6. OpenLLaMA pretraining compute: hundreds of GPUs for one to two months.

7. LangChain plus a vector DB is essentially retrieval and tool use.

8. Fine-tuning vs. vector DB: fine-tuning is more about absorbing large amounts of information, while a vector DB is more about specific data details. The two do not conflict, and data retrieved from the vector DB can even be turned into fine-tuning examples.

9. Optimizing the combination of fine-tuning and a vector DB: have the LLM generate related keywords or SQL to retrieve relevant data from the database or knowledge base, let the LLM summarize the result, and then put the whole trace into the fine-tuning dataset; this greatly improves the effect (see the sketch after this list). Keyword matching can be taught with a dataset of a few thousand examples and is crucial for the tool-use ability gained from fine-tuning. A pretrained model is not a tool user by default; it needs a large amount of fine-tuning data to learn that one task requires search and another requires a mathematical model.

10. When keyword matching runs into more complex structured problems, the LLM can generate SQL or Python to solve them.

11. Many people use natural language as a data-science interface: the LLM generates SQL to query relational databases, and this scenario can also be improved by fine-tuning.

12. The reasoning, decision-making, and error-correction abilities of an LLM still depend on the base model. Pretraining data usually contains some of this material, but it has not been specialized; if you care about traditional decision-making problems, you can strengthen the ability with targeted rewards and labels during fine-tuning.

13. Multimodality is a key direction, and the open-source community is finding it is not that difficult. The current mainstream method adds multimodality during fine-tuning rather than pretraining. If you have the compute and data, multimodal fine-tuning is better because it does not lose information, whereas converting images to text first does. Diffusion models are still the mainstream for image generation because they need less compute and fewer parameters, but autoregressive models have also shown good image-generation ability (PaLM-E first tokenizes the image and then lets the LLM generate it); if hardware improves further, autoregressive multimodal generation may take over because the model is simpler.

14. The long-tail problem (medicine, autonomous driving, and other critical scenarios) can be improved by collecting long-tail-like data from simulated scenarios, usually accident data. Non-real-time scenarios such as medicine may still require human oversight.
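A minimal sketch of the pipeline in tips 9-11, assuming an OpenAI-style chat client and a local SQLite database; the table schema, model name, file path, and helper functions are illustrative assumptions, not a prescribed setup.

```python
# Sketch: LLM writes SQL for a question, the query runs against a relational DB,
# the LLM summarizes the result, and the whole trace is kept as an SFT example.
import json
import sqlite3
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
SCHEMA = "sales(region TEXT, month TEXT, revenue REAL)"  # hypothetical table

def ask_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def answer_with_sql(question: str, db_path: str = "analytics.db") -> dict:
    sql = ask_llm(
        f"Schema: {SCHEMA}\n"
        f"Write a single SQLite SELECT statement that answers: {question}\n"
        "Return only the SQL."
    )
    rows = sqlite3.connect(db_path).execute(sql).fetchall()
    summary = ask_llm(
        f"Question: {question}\nSQL result rows: {rows}\n"
        "Summarize the answer in one sentence."
    )
    # The full trace becomes one supervised fine-tuning example (tip 9).
    return {"instruction": question, "sql": sql, "rows": rows, "output": summary}

example = answer_with_sql("Which region had the highest revenue in June?")
print(json.dumps(example, default=str, indent=2))
```

Collecting a few thousand such traces and fine-tuning on them is what teaches the model when to reach for search, SQL, or a mathematical tool.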

Main LLM Development Tips, Updated July 23, 2023

1. For SFT, data quality and diversity (across domains and tasks) are key, along with the way of training. More data is good when its quality can be assured; when it cannot, less is more. How tasks are generated affects the examples, and task-level diversity is important because it defines the intent of the instructions. The model's abilities are essentially learned during pretraining; alignment teaches it the format for communicating with the user. During fine-tuning, even if you only have a small amount of genuinely useful data (say 10 examples), it is best to grow the set to around 1,000, which gives the model a strong enough signal to learn from.


2. Basic observations on Llama 2: it lacks code, logical reasoning, math, multilingual, and multimodal abilities, and the alignment is overdone; the ecosystem should form quickly (many efforts are in progress). Models around the 30B scale show stronger task generalization, the 7B is still relatively weak, and no noticeable progress on this front is visible from Llama 1 to Llama 2.


3. The SFT results of Llama 2 are very good. Rejection sampling helps: have the previous model generate multiple samples, rank them, and feed the good ones back into training (see the sketch after this list).


4. Continued training on code may require guiding process supervision, which makes it easier for the model to learn. Unseen domains are mainly added to the model through continued training, not fine-tuning (fine-tuning is more about adjusting the alignment format). Better base models may require less data for alignment. With direct continued training, code is easier to fit and the loss drops lower, and additional code training does not hurt the model's original natural-language ability; if the data is not enough, just run a few more passes over it.


5. Real user query distributions do not match the benchmarks we usually use; many companies own such distributions, which is already a barrier to entry. During annotation, labelers need to act according to the assumed persona for the data to be good.


6. On improving Llama 2's reasoning-intensive abilities: its pretraining and fine-tuning did not target them. 2T tokens may still be far from enough; it could take 4T or 5T tokens, or several passes over the most important data. There is still a lot of headroom in Llama 2 and no sign of saturation. It actually contains a lot of knowledge that has not been brought out, which is an alignment problem.


7. Data mixture and continued training: when doing continued training, it is best to mix general data with your domain data, which reduces the chance of destroying the model's original abilities. LLaMA's web data mixture is already excellent; methods like DoReMi or DRO (distributionally robust optimization) do not fill gaps, they simply take care of the worst-performing parts, which is not the mixture you actually expect.


8. Comparing and combining SFT and RLHF: SFT is a subset of, and a prelude to, RLHF. There are no case-by-case comparative studies of the two yet.


9. MoE increases model capacity without changing latency. Since model size cannot be increased indefinitely, it is a way to add capacity. Adding MoE initially costs performance, so you first need to train back up to the original level.


10. Since Llama 2 lacks mathematical ability, one way to deploy it in applications is as an agent that, when necessary, calls a model with strong mathematical ability to solve the problem.
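A minimal sketch of the rejection-sampling (best-of-n) loop from tip 3, assuming a Hugging Face causal language model as the current policy; the model name is illustrative (gated checkpoints require access), and `reward_score` is a hypothetical placeholder for a separately trained reward model.

```python
# Sketch: sample n candidate responses, keep the highest-reward one, and store
# it as new SFT data for the next training round.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def reward_score(prompt: str, answer: str) -> float:
    # Placeholder: a real reward model would score helpfulness/harmlessness here.
    return -len(answer)  # toy stand-in that prefers concise answers

def best_of_n(prompt: str, n: int = 4) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
        max_new_tokens=128,
        num_return_sequences=n,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        tokenizer.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs
    ]
    # Rank the samples and keep the best one (the "reject" step).
    return max(candidates, key=lambda a: reward_score(prompt, a))

prompt = "Explain rejection sampling for LLM alignment briefly."
sft_example = {"prompt": prompt, "response": best_of_n(prompt)}
```

Repeating this loop over many prompts, then fine-tuning on the kept responses, is the feed-back step the tip describes.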

