
grepx

This repository will serve as a collection of my findings, useful concepts, and coding practices related to building applications using Large Language Models (LLMs) and knowledge graphs. It will include several mini-projects, each focused on different aspects of these technologies, providing practical insights and solutions.

Evolution of Neural Network Architectures to Transformers

Neural networks have evolved significantly:

CNNs: Used for image recognition by capturing spatial hierarchies.
RNNs: Handled sequential data but struggled with long-range dependencies.
Seq2Seq: Improved translation tasks but remained limited by sequence length.
Deep Belief Networks: Introduced unsupervised, layer-wise pre-training.
BC/GAIL: Enhanced policy learning in reinforcement learning.
Transformers: Revolutionized the field by enabling parallel processing and effective long-range dependency handling through self-attention.

Transformers, introduced by Google in 2017, revolutionized neural networks by using self-attention mechanisms instead of RNNs or CNNs. This allows them to handle long-range dependencies more effectively, process data in parallel, and significantly boost performance in NLP tasks. The core idea is to use attention mechanisms to weigh the importance of different input tokens, improving tasks like translation, summarization, and question-answering. Their efficiency and accuracy have made Transformers the foundation for state-of-the-art models like BERT and GPT.

Traditional models like RNNs struggle with relationships between elements in a sequence because they process data sequentially, making it hard to capture dependencies over long distances. Transformers handle these dependencies more effectively by using attention mechanisms, allowing them to weigh the importance of all elements in a sequence simultaneously, regardless of their position. This results in better performance on tasks that require understanding context over long spans, such as language translation and text summarization.

Architecture of transformers

(diagram: Transformer architecture overview)

Example of how this architecture works:

Original Text: "I am eating apple and then will go to sport, since it is healthy, what is problem with you?"

Step-by-Step Transformer Processing:

Tokenization:

Tokens: "I", "am", "eating", "apple", "and", "then", "will", "go", "to", "sport", "since", "it", "is", "healthy", "what", "is", "problem", "with", "you", "?"

Embedding:

Convert tokens into dense vectors. E.g., "I" -> [0.1, 0.3, ...], "am" -> [0.2, 0.4, ...]

Positional Encoding:

Add positional information to each embedding. E.g., Position 0: [0.1+sin(0), 0.3+cos(0), ...], Position 1: [0.2+sin(1), 0.4+cos(1), ...]

Self-Attention Mechanism:

Compute Query, Key, and Value vectors for each token. Dot product of Queries and Keys: calculate attention scores, e.g., Score("I", "am") = Query("I") · Key("am"). Apply softmax: normalize the scores, e.g., Softmax([Score("I", "I"), Score("I", "am"), ...]). Weighted sum of Values: compute new contextual representations, e.g., Attention("I") = Sum(Value vectors weighted by attention scores). A minimal code sketch of this step follows the walkthrough.

Feed-Forward Layers:

Pass the attention outputs through fully connected layers. E.g., dense layers refine the contextual representation of each word.

Decoder:

Uses the encoded input to generate the output sequence, attending at each step to the previously generated tokens and the encoder's output. E.g., it might produce "I will go to the sport because it is healthy" or another context-aware continuation.
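
To make the self-attention step above concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The token count, embedding size, and random weight matrices are invented for illustration; this is a toy sketch, not a full Transformer layer.

```python
# A minimal sketch of scaled dot-product self-attention in NumPy.
# Token count, embedding size, and random weights are invented for illustration.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) token embeddings, positional encoding already added."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # project tokens into Query/Key/Value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # dot product of Queries and Keys, scaled
    weights = softmax(scores, axis=-1)         # normalize so each row of scores sums to 1
    return weights @ V                         # weighted sum of Values per token

# Toy example: 4 tokens (say "I", "am", "eating", "apple"), d_model = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # stand-in embeddings
W_q, W_k, W_v = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)                               # (4, 8): one contextual vector per token
```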

Why One-Hot Encoding Falls Short and Embeddings Are the Solution

One-hot encoding represents words as sparse, high-dimensional vectors in which each word gets a unique vector with a single non-zero entry. This method has significant limitations:

High Dimensionality: One-hot vectors can become extremely large, especially for languages with extensive vocabularies, leading to increased computational requirements.
No Semantic Information: One-hot encoding treats each word as completely independent, ignoring any semantic relationships between words.
Sparse Representations: The vectors are sparse, containing mostly zeros, which can be inefficient for many machine learning algorithms.

Advantages of Embeddings

Embeddings, such as Word2Vec or GloVe, address these issues by representing words as dense, lower-dimensional vectors. These vectors capture semantic relationships between words, allowing for more efficient and meaningful computations. For instance, in an embedding space, similar words are located closer together, enabling models to better understand context and improve overall performance in natural language processing tasks.
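
A tiny numerical illustration of the contrast: one-hot vectors are mutually orthogonal, so every pair of words looks equally unrelated, while dense embeddings expose similarity. The three-word vocabulary and the 4-dimensional embedding values below are made up, not taken from a trained model.

```python
# One-hot vs. dense embeddings: one-hot pairs are always orthogonal (similarity 0),
# while dense vectors can encode similarity. Vocabulary and values are invented.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab = ["king", "queen", "apple"]

# One-hot: one dimension per word, a single 1 and zeros elsewhere.
one_hot = np.eye(len(vocab))
print(cosine(one_hot[0], one_hot[1]))        # 0.0 -> "king" and "queen" look unrelated

# Hypothetical 4-dimensional dense embeddings (as if learned by Word2Vec/GloVe).
emb = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.78, 0.70, 0.12, 0.04]),
    "apple": np.array([0.05, 0.10, 0.90, 0.80]),
}
print(cosine(emb["king"], emb["queen"]))     # close to 1 -> semantically similar
print(cosine(emb["king"], emb["apple"]))     # much lower -> semantically distant
```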

Different types of LLM apps for AI engineers

(diagrams: LLM application types)

Agentic Reasoning design pattern

(diagrams: agentic reasoning design patterns, flow engineering, and AlphaCodium)

Nice links:

llm story

LLM history from zero to now: https://fullstackdeeplearning.com/llm-bootcamp/spring-2023/llm-foundations/

Challenges with LLM applications: https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/free_courses/Applied_LLMs_Mastery_2024/week1_part1_foundations.md

prompt engineering

Video training for prompt engineering: https://learn.deeplearning.ai/courses/chatgpt-prompt-eng/lesson/1/introduction

Prompt engineering: https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/free_courses/Applied_LLMs_Mastery_2024/week2_prompting.md

RAG


Old School RAG with Vector DB

RAG app with vector db: https://learn.deeplearning.ai/courses/building-applications-vector-databases/lesson/1/introduction
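
As a rough companion to the course linked above, here is a minimal, library-free sketch of the classic retrieve-then-generate loop: embed documents, look up the most similar ones for a query, and stuff them into the prompt. The hash-based embed() function, the in-memory index, and the call_llm() stub are placeholders, not a real embedding model, vector database, or LLM API.

```python
# A minimal, illustrative retrieve-then-generate (RAG) loop. embed(), the
# in-memory index, and call_llm() are placeholders for a real embedding model,
# vector database, and LLM API.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic 'embedding' -- swap in a real embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

documents = [
    "Transformers use self-attention to model long-range dependencies.",
    "RAG augments an LLM prompt with documents retrieved from a vector store.",
    "Fine-tuning adapts a pretrained model's weights to a specific task.",
]
index = np.stack([embed(d) for d in documents])   # stand-in for a vector DB

def retrieve(query: str, k: int = 2) -> list:
    scores = index @ embed(query)                 # cosine similarity (unit-length vectors)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def call_llm(prompt: str) -> str:
    return "(LLM response would go here)"         # placeholder for a real LLM call

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("What does RAG add to an LLM prompt?"))
```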

Fine-tuning

fine tuning course: https://learn.deeplearning.ai/courses/finetuning-large-language-models/lesson/6/training-

fine tuning review: https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/free_courses/Applied_LLMs_Mastery_2024/week3_finetuning_llms.md

nice link for fine-tuning: https://www.youtube.com/watch?v=eC6Hd1hFvos

RAG with Langchain: https://learn.deeplearning.ai/courses/langchain/lesson/5/question-and-answer

Langchain vs Llamaindex

(diagram: LangChain vs LlamaIndex comparison)
