svpino / llm

A bunch of experiments using Large Language Models


llm's Introduction

Hey! 👋

My name is Santiago, and I'm a Computer Scientist focusing on applied Machine Learning. I have more than two decades of experience building software to solve exciting and —sometimes— hard problems.

I care deeply about unlocking the power of technology for individuals and businesses, so they can use computers in a way that was previously unrealistic for them. I find joy in ambiguity and feel more engaged when working on problems that can't be solved by merely searching the web or reading a book. Bonus points if I can collaborate with a team of like-minded engineers!

This is where I've worked during the last 10 years

DIRECTOR OF COMPUTER VISION SOLUTIONS

Levatas. Oct 2018 - Present

I lead a small team of software developers and machine learning engineers in the development of Levatas' flagship product, a platform to augment the productization of machine learning models with human reviews, where I'm the main contributor to the core Python engine. Other than that, the majority of my work centers around building solutions to help operationalize machine learning models running on AWS.

DIRECTOR OF ENGINEERING | ENGINEER MANAGER | SENIOR SOFTWARE ENGINEER

Levatas. Oct 2010 – Oct 2018.

I joined Levatas in 2010 as a Senior Software Engineer, moved to Engineer Manager three years later, before becoming the director of the department in May 2016. During this time, I had the opportunity to help deliver software solutions for companies like IBM, Dell, and HSBC. I moved through the entire development stack and touched projects every step of the way, from ideation to final delivery.

My education

MASTER OF SCIENCE (M.Sc.), COMPUTER SCIENCE

Machine Learning Specialization. Georgia Institute of Technology, 2019. 4.0 GPA

I was a Teaching Assistant for Graduate Algorithms for two semesters. Coursework: Software Architecture and Design, Software Development Process, Algorithms, Software Testing, Machine Learning, Reinforcement Learning, Database Systems, Computer Vision, Human-Computer Interaction.

BACHELOR OF SCIENCE (B.S.), INFORMATION TECHNOLOGY

University of Camaguey, Cuba. 2004. 3.84 GPA

Coursework: Software Engineering, Data Structures and Algorithms, Database Design, File Structures, Artificial Intelligence, Information Technology.

OTHER CERTIFICATIONS

I've also accumulated other certifications from 2001 to 2015 covering OO Concepts, Java, C, JavaScript, and C# development.

My most relevant technical skills

These are the highlights of the skills I consider to be the keystone of my abilities:

I have a lot of experience designing and architecting systems of different sizes and complexity.
I've become an excellent Python 🐍 developer. And this happened after years dedicated to Java.
I have a lot of experience with Machine Learning using TensorFlow.
I have been focusing mainly on AWS. It's the place where most of my work goes to serve its purpose.
I have substantial experience dealing with relational databases (mainly MySQL), and non-relational ones (Amazon's DynamoDB, Google's Firestore, and MongoDB.)
I've done a lot of front-end development, and at some point, I was pretty good using Angular.

Some of the things I've built

It's hard to decide what things should make it into this list, so I'm opting for a combination of private and public projects where I've participated over the last ten years.

A Python library that orchestrates a workflow of images between different services deployed in AWS.
An application that connects to Spot's cameras, and makes the robot react to visual clues.
A process using OpenCV and TensorFlow to analyze a video feed and flag break-ins into an amusement park.
A library to generate RSS 2.0 feeds in Python.
A project to run TensorFlow Object Detection models on SageMaker.
A very simple and fully responsive file system-based blog engine.
Some really cool and interesting projects during my Master's that have become popular solutions to their respective problems.
A full Android application to follow stock tickers from different markets worldwide.

Other accomplishments

I love to write. I contribute articles about Python to the Real Python website.
I won a bunch of medals competing nationally in Computer Science. Since that time I've loved algorithms and data structures.
I have a beautiful family. For sure my best accomplishment by far.

📫 You can find me on Twitter or LinkedIn.


llm's Issues

[Not an issue] Would like to know which tool is used to create the diagrams

Hi Santiago,

Thank you for the very insightful guide. I have a quick question that is not related to the code: which tool did you use to generate the diagrams here, and also the ones in the notebooks, if you don't mind sharing? I find them really neat and would like to create something similar for my project if possible. At first glance, the diagrams look like they were built with Excalidraw, but not quite. Of course, if you are not comfortable sharing this info, that's okay too.

Thank you again, and looking forward to more of your guides and courses!

TypeError: eval() arg 1 must be a string, bytes or code object

Why am I getting this error for this cell?
if os.path.exists("embedded_sentences.csv"):
    dataset = pd.read_csv("embedded_sentences.csv")
    dataset["embedding"] = dataset.apply(eval).apply(np.array)
else:
    dataset["embedding"] = dataset["sentence"].apply(get_embedding)
    dataset.to_csv("embedded_sentences.csv", index=False)
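
A likely cause, sketched under assumptions: `dataset.apply(eval)` calls `eval` once per column and hands it a whole `pandas.Series`, which is not a string, bytes, or code object. Applying the parser to the `embedding` column alone avoids that. The sketch below swaps in `ast.literal_eval` as a safer stand-in for `eval`, and uses a tiny in-memory CSV with made-up values in place of `embedded_sentences.csv`.

```python
import ast
import io

import numpy as np
import pandas as pd

# Stand-in for embedded_sentences.csv; the rows are made-up illustration data.
csv_text = (
    "sentence,embedding\n"
    '"hello","[0.1, 0.2, 0.3]"\n'
    '"world","[0.4, 0.5, 0.6]"\n'
)
dataset = pd.read_csv(io.StringIO(csv_text))

# Parse only the "embedding" column: each cell is a string like "[0.1, 0.2, 0.3]".
# literal_eval turns it into a list, and np.array turns the list into a vector.
dataset["embedding"] = dataset["embedding"].apply(ast.literal_eval).apply(np.array)

print(type(dataset["embedding"].iloc[0]))
print(dataset["embedding"].iloc[0].shape)
```

In the original cell, the equivalent change would be to call `.apply(...)` on `dataset["embedding"]` instead of on `dataset`.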

Issue with DocArrayInMemorySearch

I've created the virtual environment, activated it, and installed the requirements. Is this a Python version error, or do I need a different package version?

Python Version: 3.12.2
pip version: 24.0

When attempting to run this part of the notebook:

from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)

I'm getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[9], line 3
      1 from langchain_community.vectorstores import DocArrayInMemorySearch
----> 3 vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)

File ~/Development/llm-local/.venv/lib/python3.12/site-packages/langchain_core/vectorstores.py:528, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
    526 texts = [d.page_content for d in documents]
    527 metadatas = [d.metadata for d in documents]
--> 528 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

File ~/Development/llm-local/.venv/lib/python3.12/site-packages/langchain_community/vectorstores/docarray/in_memory.py:68, in DocArrayInMemorySearch.from_texts(cls, texts, embedding, metadatas, **kwargs)
     46 @classmethod
     47 def from_texts(
     48     cls,
   (...)
     52     **kwargs: Any,
     53 ) -> DocArrayInMemorySearch:
     54     """Create an DocArrayInMemorySearch store and insert data.
     55 
     56     Args:
   (...)
     66         DocArrayInMemorySearch Vector Store
     67     """
---> 68     store = cls.from_params(embedding, **kwargs)
...
---> 46         return Generic.__class_getitem__.__func__(cls, item)  # type: ignore
     47         # this do nothing that checking that item is valid type var or str
     48     if not issubclass(item, BaseDoc):

AttributeError: 'builtin_function_or_method' object has no attribute '__func__'
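
To help answer the version question, a small sketch like the one below can collect the interpreter and package versions to attach to a report. The list of distribution names is an assumption: the traceback shows `langchain_core` and `langchain_community` paths, and `BaseDoc` suggests `docarray`; `pydantic` is included only as a guess at a relevant dependency.

```python
import platform
from importlib import metadata

# Distribution names assumed relevant to the traceback above (not confirmed).
PACKAGES = ["langchain-community", "langchain-core", "docarray", "pydantic"]


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(PACKAGES)
print("python", platform.python_version())
for name, version in report.items():
    print(name, version or "not installed")
```

Running this inside the activated virtual environment shows exactly which versions the notebook kernel is using.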

[Not an issue] Using giskard with LLaMA3 as opposed to OpenAI (default)

Hi Santiago, thanks for the tutorials and the resources! I wanted to ask for help using giskard to evaluate a LLaMA3 model (or any open-source model). I am stuck at the KnowledgeBase part, as I haven't figured out how to specify the model to giskard. Thanks again!

DocArrayInMemorySearch -> similarity_search_with_score returns incorrect results with Ollama

Hi, thanks for your videos and the code.

I'm working through your two videos, the one for this repo and the youtube-rag one.

For some reason I can't make the DocArrayInMemorySearch work correctly with the basic example. Here's the simplified code merged from both repos:

import os
from dotenv import load_dotenv
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama

load_dotenv()

# Initialize Ollama embeddings
MODEL = "llama3"
model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

# Build an in-memory vector store from a list of sentences
vectorstore1 = DocArrayInMemorySearch.from_texts(
    [
        "Mary's sister is Susana",
        "John and Tommy are brothers",
        "Patricia likes white cars",
        "Pedro's mother is a teacher",
        "Lucia drives an Audi",
        "Mary has two siblings",
    ],
    embedding=embeddings,
)

print(vectorstore1.similarity_search_with_score(query="Who is Mary's sister?", k=6))

The results are as follows (split with new lines for easier reading):

[
    (Document(page_content="Pedro's mother is a teacher"), 0.4350104103848039),
    (Document(page_content='Mary has two siblings'), 0.43119987668775467),
    (Document(page_content='John and Tommy are brothers'), 0.41273142441302735),
    (Document(page_content='Patricia likes white cars'), 0.3569403395446856),
    (Document(page_content="Mary's sister is Susana"), 0.3464697744599006),
    (Document(page_content='Lucia drives an Audi'), 0.22815817605634237)
]

Why don't the similarity scores and the order match what is shown in your videos and the notebooks? For example, (Document(page_content="Mary's sister is Susana"), 0.3464697744599006) is the 5th result when it should be the first one.

I'm using python 3.10.12.

Thanks!
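
One way to reason about this result: assuming the store ranks documents by cosine similarity between the query embedding and each document embedding (an assumption about its default metric), the order depends entirely on the vectors the embedding model produces. A different embedding model can therefore rank the same sentences differently. The sketch below shows the ranking step on toy 2-D vectors invented for illustration; they are not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration; real embeddings have far more dimensions.
query = [1.0, 0.0]
documents = {
    "close to the query": [0.9, 0.1],
    "in between": [0.5, 0.5],
    "far from the query": [0.1, 0.9],
}

# Score every document against the query, then sort from most to least similar.
ranked = sorted(
    ((text, cosine_similarity(query, vector)) for text, vector in documents.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

Comparing the output of the same query with two different embedding models is a direct way to check whether the model, rather than the vector store, explains the unexpected order.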

No module named 'langchain_openai'

I pip installed langchain openai and did everything as you did, but I keep getting this error:
ModuleNotFoundError: No module named 'langchain_openai'
It seems that there is a problem importing langchain.
Any solution?
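
A quick check that may narrow this down: the import name `langchain_openai` comes from the PyPI distribution `langchain-openai`, which is separate from the `langchain` and `openai` packages. The error can also appear when the package was installed into a different interpreter than the one running the notebook. The sketch below, with the hypothetical helper `is_importable`, reports which interpreter is in use and whether the module can be found from it.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


module_name = "langchain_openai"
print("interpreter:", sys.executable)
if is_importable(module_name):
    print(module_name, "is importable")
else:
    # Install with the same interpreter that runs this code.
    print(f"{module_name} not found; try: {sys.executable} -m pip install langchain-openai")
```

Running it in the same notebook kernel that raises the error gives the most useful answer.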