Sorry this isn't complete - I haven't had as much time as I would have liked. I think the big-picture points are the most important ones. Mainly, the analogies between the needs/strategies of runtimes and LLMs could be clearer and more front-loaded.
set up the problem of bottlenecks generally, and in terms of both runtimes and LLMs, so the reader is prepared to take in this strategy
or maybe even before this section, set up the juxtaposition - the reader needs to grok the analogy/intersections between the two structures before they can appreciate the parallels in strategies/applications.
also, i feel like this strategy is even more general than kernels/runtimes... in liam's group we used this strategy all the time: i would often make more approximate models for online inference and design, then use slower inference with a more accurate or structured model offline. it's not 1:1, but it's not clear to me that this is unique enough to demonstrate that LLMs are a building block of a computer system... it seems general to any computational system (i.e. a program that does an analysis vs. a program that is a runtime).
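To make the pattern i'm gesturing at concrete, here is a minimal sketch (the models are toy stand-ins, not anyone's actual code): a cheap approximate model serves online queries with low latency, while a slower, more faithful model refines the same answers offline.

```python
# Toy illustration of the online/offline split described above.
# Both "models" are hypothetical stand-ins for real inference code.

def fast_model(x):
    """Cheap approximation, good enough for interactive use."""
    return round(x * x)  # crude: integer-rounded square

def accurate_model(x):
    """Slow but faithful model, run offline in batches."""
    return x * x

def online_inference(x):
    # Low-latency path: answer now, tolerate approximation error.
    return fast_model(x)

def offline_refinement(batch):
    # High-accuracy path: recompute carefully when latency doesn't matter.
    return [accurate_model(x) for x in batch]

quick = online_inference(1.4)          # approximate answer, immediately
exact = offline_refinement([1.4])[0]   # refined answer, later
```

The point being: the "fast path now, slow path later" trade-off shows up in any computational system, not just kernels.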
Registers store small amounts of intermediate data
are they intermediates? i sort of see them as the representations/instantiations of the data that are actually used for computation - the registers are the things that actually go into and out of the CPU. they are the real thing.
The vLLM inference server that uses this technique
is it crucial here that this LLM is handling two requests at once? it would be nice to know how this type of system is set up, since that informs what's at issue here - if these were two LLMs on different machines, they would not have a problem. in this case, is it one LLM process or multiple? curious to get a little context on that to help me see the parallel to the runtime case.
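For what i mean by "one process or multiple": a sketch of the single-process shape i'm imagining (this is illustrative only, not vLLM's actual API) - one engine object interleaves decode steps for several requests that share the same memory pool, which is exactly where the contention question arises.

```python
# Hypothetical sketch: one engine process multiplexing two requests.
# If each request lived in its own process on its own machine, there
# would be nothing to schedule between them.
from collections import deque

class Engine:
    def __init__(self):
        # All running requests share this one engine's memory.
        self.running = deque()

    def add(self, request_id, max_new):
        self.running.append({"id": request_id, "done": 0, "max": max_new})

    def step(self):
        # One "forward pass" advances every running request by one token.
        finished = []
        for req in self.running:
            req["done"] += 1  # pretend we decoded one token
            if req["done"] >= req["max"]:
                finished.append(req)
        for req in finished:
            self.running.remove(req)
        return [req["id"] for req in finished]

engine = Engine()
engine.add("A", max_new=2)
engine.add("B", max_new=3)
order = []
while engine.running:
    order += engine.step()  # requests finish as their budgets run out
```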
The MemGPT pattern uses prompting and careful design of tools
for me, this section is by far the one that actually talks about using an LLM like a runtime... somehow this one needs either to go first or, probably best, last... it could be used as an opportunity to anchor the whole thing?
boom. this is key. again - this section feels so much more relevant... if you are going to compare and contrast, this section is the compare; the others are almost more contrast?
Interruptibility and event-driven-ness are key features of biological agents as well.
To quote Copilot's suggested completion of this paragraph:
"If you poke a cat, it will stop what it's doing and respond to the poke."
:)
The event of that suggestion's appearance while drafting this post
interrupted my drafting task, triggering me to reflect on it
and then resume my drafting task with a new idea for what to say next.
I like this, but it's worded awkwardly atm - maybe something like: "The suggestion's appearance interrupted my drafting, triggered a moment of reflection, and then I resumed drafting with a new idea for what to say next."
It is clear that if LLMs are to become the cognitive kernels
of agents in any way worthy of the name,
they will need a similar architecture.
weeeeeeee
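Since this is the part i find most compelling, here's the interrupt-then-resume shape in miniature: an agent works through a task, but pending events preempt it, get handled, and the task resumes carrying whatever the handler produced. Names are illustrative, not any particular framework's API.

```python
# Sketch of an interruptible, event-driven agent loop: events preempt
# the current task, are handled, and then the task resumes.
import queue

events = queue.Queue()

def handle(event):
    """Interrupt handler: react to the event, maybe yielding a new idea."""
    return f"idea from {event}"

def run_task(steps):
    log = []
    for step in steps:
        # Check for pending events before each unit of work - cooperative
        # interruption, like a runtime checking for signals at safepoints.
        while not events.empty():
            log.append(handle(events.get()))  # handle, then resume
        log.append(f"did {step}")
    return log

events.put("copilot-suggestion")  # the "poke the cat" moment
trace = run_task(["draft-para-1", "draft-para-2"])
```

The handler's output lands in the same log the task is building, which is the "resume with a new idea" part of the quote above.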
i think move this section to the top. give us the goods up front... i'm worried people on the fringes of this convo will not have a hook soon enough. for me, as you clearly know, this part is the hook.
cognitive kernels
❤️
Whether your background is in systems or ML,
you'll find a lot of value in thinking
"across the divide" and applying your knowledge to the other field.