charlesfrye.github.io's Issues

first-half-comments

Sorry this isn't complete, haven't had as much time as I would have liked. I think the big picture points are most important. Mainly, I think the analogies between the needs/strategies of runtimes and LLMs could be clearer and more frontloaded.

It is clear that large language models --

  • maybe keep it framed as a question? i'm still skeptical at this point, and saying "it is clear" makes me react poorly.

but the builders of those past blocks have many more,

  • awkward wording

  • set up the problem of bottlenecks generally and in terms of both runtimes and LLMs so we can be prepared to take in this strategy
  • or maybe even before this section setup the juxtaposition - the reader needs to grok the analogy/intersections between the two structures before they can appreciate the parallels in strategies/applications.
  • also, i feel like this strategy is even more general than kernels/runtimes... like in liam's group we used this strategy all the time... i would often make more approximate models for online inference and design, and then use slower inference with a more accurate or structured model offline. it's not 1:1, but it's not clear to me that this is something unique enough to demonstrate that LLMs are a building block of a computer system... this is more general to any computational system (i.e. a program that does an analysis vs. a program that is a runtime).

Registers store small amounts of intermediate data

  • are they intermediates? i sort of see them as the representations/instantiations of the data that are actually used for computation. the registers are the things that go into and out of the CPU. they are the actual thing

they can be manipulated -- e.g. by a compiler pass -- without concern

  • i don't know what a compiler pass means exactly so hard for me to take anything away from this. not sure if i'm a representative sample.
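  • for readers in the same boat: a compiler pass is just one walk over the program's representation that rewrites it. a toy sketch using Python's stdlib `ast` module (my own illustration, nothing from the post):

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """A toy compiler pass: one walk over the syntax tree that
    rewrites constant arithmetic like `2 * 3` into `6`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, bottom-up
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            if isinstance(node.op, ast.Mult):
                return ast.Constant(node.left.value * node.right.value)
            if isinstance(node.op, ast.Add):
                return ast.Constant(node.left.value + node.right.value)
        return node

tree = ast.parse("x = 2 * 3 + y")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(folded))  # -> x = 6 + y
```

    constant folding is one classic pass; register allocation, which is presumably what the post is gesturing at, is another pass in the same mold.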

How might this pattern show up in LLMs?

  • this is good. this helps the reader prepare to connect ideas. more of this.

...To be continued.

more-shababo-comments

which were precious even before the Browser Nation attacked.

  • isn't it true that memory only became an issue when chrome was released and used one process per tab? this is relevant i think.

Memory must be allocated, and those allocations must be tracked and managed,

  • this opening is great. clearly explains what part of a kernel we are going to apply to an LLM!

used to store the intermediate values of attention computations,

  • i'm immediately curious how these "intermediate" values relate to the register stuff

This cache converts a quadratic-time operation into a linear-time one,

  • a reference would be good here
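  • in lieu of a reference, here's a toy numpy sketch (my own, assuming standard single-head attention) of the shape of the claim: with a cache, each new token appends one row of keys/values and attends one query against the cache, instead of recomputing keys and values for every previous token at every step:

```python
import numpy as np

def attend(q, K, V):
    """Single-head attention for one query against all cached keys/values."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d = 8
rng = np.random.default_rng(0)
K_cache = np.empty((0, d))  # grows by one row per generated token
V_cache = np.empty((0, d))

for step in range(4):
    # stand-ins for this token's projected key, value, and query
    k, v, q = rng.normal(size=(3, d))
    # Without a cache, every step would recompute keys and values for
    # ALL previous tokens (quadratic work per token); with the cache,
    # each step appends one row and makes one pass over the cache
    # (linear work per token).
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)

print(K_cache.shape)  # -> (4, 8)
```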

The vLLM inference server that uses this technique

  • is it crucial here that this LLM is handling two requests at once? would be nice to know how this type of system is set up, since that informs what's at issue here - like, if these were two LLMs on different machines, they would not have a problem. in this case, is it one LLM process on the CPU, or multiple? curious to get a little bit of context on that to help me see the parallel to the runtime case.

  • 0x0BF4eva!!!!!
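  • to sketch my own mental model of the setup (invented names, not vLLM's actual API): one serving process holds a shared pool of fixed-size KV-cache blocks, and each request carries a per-request "block table" mapping its tokens to physical blocks - exactly a page table. that's why two concurrent requests in one process are the interesting case:

```python
# Toy sketch of paged KV caching (my own illustration, not vLLM code):
# fixed-size blocks come from a shared pool, and each request's block
# table maps its logical token positions to physical blocks.

BLOCK_SIZE = 4  # tokens per block (assumed for illustration)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # shared pool of physical blocks
        self.tables = {}                     # request -> list of block ids
        self.lengths = {}                    # request -> tokens stored

    def append_token(self, req):
        """Reserve cache space for one more token of `req`, allocating a
        fresh physical block only at block boundaries."""
        n = self.lengths.get(req, 0)
        if n % BLOCK_SIZE == 0:  # current block is full (or first token)
            self.tables.setdefault(req, []).append(self.free.pop())
        self.lengths[req] = n + 1

    def release(self, req):
        """Request finished: its blocks return to the shared pool."""
        self.free.extend(self.tables.pop(req, []))
        self.lengths.pop(req, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(5):
    cache.append_token("request-A")  # two concurrent requests interleave
    cache.append_token("request-B")  # on the same shared pool of blocks
print(cache.tables)  # each request holds 2 blocks (5 tokens, 4 per block)
```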

akin to the problems of addressing more than one person as your BFF.

  • oh, you had this planned...

https://github.com/charlesfrye/charlesfrye.github.io/blob/60cbe411b102bab34c14bf491b50f0439417fb35/_posts/2023-11-01-llm-cpu.markdown?plain=1#L357C1-L357C13

  • "analogically".... maybe it's a word, i'm not even gonna look it up. sounds like not a word.

The MemGPT pattern uses prompting and careful design of tools

  • for me, this section is by far the one that actually talks about using an LLM like a runtime... somehow this one needs to either go first, or probably best, go last... but it could be used as an opportunity to anchor the whole thing?
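  • to check my understanding of the pattern, a minimal sketch (invented names, not MemGPT's real API): the context window is "main memory", and tool calls page information in from external storage, with the runtime appending tool results back into the prompt:

```python
# Hypothetical sketch of the MemGPT-style pattern; all names here are
# my own invention for illustration.

ARCHIVE = {"user_name": "Charles"}  # "disk": storage outside the context window

def memory_read(key):
    """Tool the model can call to page data into its context."""
    return ARCHIVE.get(key, "<not found>")

def run_turn(context, model_step):
    """One agent turn: the model may emit a tool call instead of an answer;
    the runtime executes it and appends the result to the context."""
    while True:
        action = model_step(context)
        if action.startswith("CALL memory_read "):
            key = action.split()[-1]
            context.append(f"TOOL RESULT: {memory_read(key)}")
        else:
            return action

# A scripted stand-in for the LLM, just to show the control flow:
script = iter(["CALL memory_read user_name", "Hello, Charles!"])
reply = run_turn(["USER: greet me by name"], lambda ctx: next(script))
print(reply)  # -> Hello, Charles!
```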

This style is associated more with agent patterns

  • not surprising that the most agent like situation is the most runtime like... though i'm still hot on the trail of understanding why i'm not surprised

The LLM is prompted to retrieve the information it needs and then add it to its prompt --

  • also reminds me of caching or speculative processing! maybe more than the previous example(s)

is the event-driven style:

  • boom. this is key. again - this section feels so much more relevant... if you are going to compare and contrast... this section is the compare, the others are more the contrast, almost?

Interruptibility and event-driven-ness are key features of biological agents as well.
To quote Copilot's suggested completion of this paragraph:
"If you poke a cat, it will stop what it's doing and respond to the poke."

  • :)

The event of that suggestion's appearance while drafting this post
interrupted my drafting task, triggering me to reflect on it
and then resume my drafting task with a new idea for what to say next.

  • I like this but worded awkwardly atm
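  • the loop i'm imagining here (my own sketch, not from the post): the agent works through a task but checks for events between steps, handles any interruption, then resumes - the shape an interruptible, event-driven agent would need:

```python
# Minimal event-driven agent loop (invented for illustration):
# interruptions arriving mid-task preempt the next work step,
# get handled, and then the task resumes where it left off.

def run_agent(task_steps, interruptions):
    """interruptions: {step_index: event} -- events that arrive mid-task."""
    log = []
    for i, step in enumerate(task_steps):
        if i in interruptions:
            log.append(f"handle:{interruptions[i]}")  # preempt the task...
        log.append(f"work:{step}")                    # ...then resume it
    return log

log = run_agent(["draft-1", "draft-2", "draft-3"], {1: "copilot-suggestion"})
print(log)
# -> ['work:draft-1', 'handle:copilot-suggestion', 'work:draft-2', 'work:draft-3']
```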

It is clear that if LLMs are to become the cognitive kernels
of agents in any way worthy of the name,
they will need a similar architecture.

  • weeeeeeee
  • i think move this section to the top. give us the goods up front... i'm worried people on the fringes of this convo will not have a hook soon enough. for me, as you clearly know, this part is the hook.

cognitive kernels

  • ❤️

Whether your background is in systems or ML,
you'll find a lot of value in thinking
"across the divide" and your knowledge to the other field.

  • some grammar issue or missing words here?
