tairov / llama2.mojo
Inference Llama 2 in one file of pure 🔥
Home Page: https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
License: MIT License
On a Mac M1, the build fails:
from read import BufReader, File
^
mojo: error: failed to parse the provided Mojo
Version:
% mojo --version
mojo 0.4.0 (9e33b013)
Hi there,
awesome port and demonstrator. Have you compared the performance of vectorize and vectorize_unroll?
While tinkering around with demanding algorithms, I saw that by unrolling the partial loop 12x I got a 10% performance increase. Maybe enough to beat cpp? 😁
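For reference, a minimal sketch of how the two calls were used side by side, assuming the 0.x-era algorithm API (later releases folded unrolling into vectorize itself); the buffer and body function here are made up for illustration:

from algorithm import vectorize, vectorize_unroll
from memory.unsafe import DTypePointer
from sys.info import simdwidthof

alias nelts = simdwidthof[DType.float32]()

fn scale_inplace(data: DTypePointer[DType.float32], size: Int, factor: Float32):
    @parameter
    fn body[width: Int](i: Int):
        data.simd_store[width](i, data.simd_load[width](i) * factor)

    # Plain vectorized loop:
    vectorize[nelts, body](size)
    # Alternative: same loop with the vector body unrolled 12x, the factor from the comment above:
    # vectorize_unroll[nelts, 12, body](size)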
Hi,
The HuggingFace demo of this project is not working. It says:
Build failed with exit code: 1
...
--> ERROR: process "/bin/sh -c curl https://get.modular.com | MODULAR_AUTH=$AUTH_KEY sh - && modular install mojo" did not complete successfully: exit code: 1
The fast llama2 inference is really helpful to us, but how can I add a REST API to the Mojo code, preferably one compatible with the OpenAI interface? I don't know much about Mojo; I do know how to use Python Flask. Can you help me?
Hi
I'm following the instructions to get started and whenever I run:
mojo llama2.mojo stories15M.bin -s 100 -n 256 -t 0.5 -i "Mojo is a language"
it crashes with the following info:
num parallel workers: 8 SIMD width: 16
Please submit a bug report to https://github.com/modularml/mojo/issues and include the crash backtrace along with all the relevant source codes.
Stack dump:
0. Program arguments: mojo llama2.mojo stories15M.bin -s 100 -n 256 -t 0.5 -i "Mojo is a language"
#0 0x0000000102becfd8 llvm_strlcpy (~/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x1000ccfd8)
#1 0x0000000102beb138 llvm_strlcpy (~/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x1000cb138)
#2 0x0000000102bed678 llvm_strlcpy (~/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x1000cd678)
#3 0x0000000180a69a24 (/usr/lib/system/libsystem_platform.dylib+0x18046da24)
#4 0x000000028000e7a0
#5 0x0000000102f8a5a8 llvm_strlcpy (~/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x10046a5a8)
#6 0x0000000102b41e94 _mh_execute_header (~/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x100021e94)
#7 0x0000000102b25bd0 _mh_execute_header (~/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x100005bd0)
#8 0x00000001806b90e0
[15981:92870:20240318,112922.885502:WARNING crash_report_exception_handler.cc:257] UniversalExceptionRaise: (os/kern) failure (5)
zsh: segmentation fault mojo llama2.mojo stories15M.bin -s 100 -n 256 -t 0.5 -i "Mojo is a language"
System info
MacBook Pro, 14-inch, 2021. macOS 14.2.1 (23C71)
mojo 24.1.0 (55ec12d6)
modular 0.5.2 (6b3a04fd)
I found this interesting project via the 'AI Anywhere' channel on YouTube. I've installed Modular and Mojo, and successfully run your test on an underpowered mini computer with only a 1.5 GHz 4-core Intel Celeron CPU running Ubuntu 20.04.6, and it achieved 32.5 tok/s.
I'm an LLM newbie, so my questions may appear stupid!! Can this project be run with other models?
I tried the following:
mojo llama2.mojo /home/ezyweb/Public/chatpdf1/models/llama-2-7b-chat.Q4_K_M.gguf -s 100 -n 256 -t 0.5 -i "What is Llama 2"
And got the result:
num hardware threads: 4 SIMD vector width: 8 checkpoint size: 4081004224 [ 3891 MB ] Killed
Is that likely an under resourced hardware issue or is the project not compatible with .gguf models?
> MOJO_PYTHON_LIBRARY="/Users/shroominic/dev/miniforge3/lib" mojo llama2.mojo stories110M.bin -i "hello"
num parallel workers: 10 SIMD width: 8
Stack dump:
0. Program arguments: mojo llama2.mojo stories110M.bin -i hello
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 mojo 0x0000000100f79990 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 56
1 mojo 0x0000000100f77af0 llvm::sys::RunSignalHandlers() + 112
2 mojo 0x0000000100f7a02c SignalHandler(int) + 344
3 libsystem_platform.dylib 0x000000018dd06a24 _sigtramp + 56
4 libsystem_platform.dylib 0x0000000280058ac8 _sigtramp + 4063568092
5 libsystem_platform.dylib 0x000000028005807c _sigtramp + 4063565456
6 mojo 0x00000001012ca24c M::KGEN::ExecutionEngine::runProgram(llvm::StringRef, llvm::StringRef, llvm::function_ref<M::ErrorOrSuccess (void*)>) + 1156
7 mojo 0x0000000100ed3c64 run(M::State const&) + 3980
8 mojo 0x0000000100ebcb2c main + 1672
9 dyld 0x000000018d97ff28 start + 2236
[79626:9834787:20231019,201441.806415:WARNING crash_report_exception_handler.cc:257] UniversalExceptionRaise: (os/kern) failure (5)
[1] 79624 segmentation fault MOJO_PYTHON_LIBRARY="/Users/shroominic/dev/miniforge3/lib" mojo llama2.mojo -i
Not sure what this all means, but I am just trying to run this on my MacBook Pro based on the README.md ...
I installed mojo and I am able to run basic hello-world scripts, and I've set the path to my conda base env.
Here's another try, with the LLVM symbolizer and the smaller model:
Stack dump:
0. Program arguments: mojo llama2.mojo stories15M.bin -s 42 -m 256 -t 0.5 -i hello -z tokenizer.bin
#0 0x0000000100729990 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Users/shroominic/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x1000c5990)
#1 0x0000000100727af0 llvm::sys::RunSignalHandlers() (/Users/shroominic/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x1000c3af0)
#2 0x000000010072a02c SignalHandler(int) (/Users/shroominic/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x1000c602c)
#3 0x000000018dd06a24 (/usr/lib/system/libsystem_platform.dylib+0x18042ea24)
#4 0x0000000280052c7c
#5 0x0000000280051fc4
#6 0x0000000100a7a24c M::KGEN::ExecutionEngine::runProgram(llvm::StringRef, llvm::StringRef, llvm::function_ref<M::ErrorOrSuccess (void*)>) (/Users/shroominic/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x10041624c)
#7 0x0000000100683c64 run(M::State const&) (/Users/shroominic/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x10001fc64)
#8 0x000000010066cb2c main (/Users/shroominic/.modular/pkg/packages.modular.com_mojo/bin/mojo+0x100008b2c)
#9 0x000000018d97ff28
[35841:10187750:20231020,185712.713361:WARNING crash_report_exception_handler.cc:257] UniversalExceptionRaise: (os/kern) failure (5)
[1] 35839 segmentation fault MOJO_PYTHON_LIBRARY= LLVM_SYMBOLIZER_PATH= mojo llama2.mojo stories15M.bin -s
Do you have plans to support InternLM-7B & InternLM-20B?
(https://github.com/InternLM/InternLM)
We'd love to provide technical support and other forms of assistance where needed.
Thanks!
The error occurs when running llama2.mojo.
My environment: Python 3.10 (Ubuntu 22.04)
Console information below:
mojo llama2.mojo stories15M.bin -s 100 -n 256 -t 0.5 -i "Llama is an animal"
num hardware threads: 192 SIMD vector width: 16
Unhandled exception caught during execution: An error occurred in Python.
mojo: error: execution exited with a non-zero result: 1
Hi,
Maybe a stupid question, but I couldn't find a way to execute shell commands in the Mojo Playground's console or notebook. How did you manage to run this project in the Mojo Playground?
Thanks!
Hi, really impressive work.
Unfortunately, I couldn't run your code. Your code reads 'tokenizer.bin', but that file isn't provided. Please tell me where to find it.
Hi @tairov,
I love your work so much and thank you for your contribution.
I think Mojo is very promising in the near future. And I am wondering: do you have any plans to port Stable Diffusion models to Mojo? Or do you know of someone currently doing this?
Best,
Linh
Ok, I installed mojo, cloned your repo, and ran the test. It works, congrats! But how does all of this relate to Llama? Nothing happened when I tried to run Llama 2 itself:
alex@NLDW4-5-20-11:~/ai/llama2.mojo$ mojo llama2.mojo /ai/llama.cpp/models/ggml-model-q4_1.bin -s 100 -n 256 -t 0.5 -i "Llama is an animal"
num hardware threads: 12
SIMD vector width: 16
checkpoint size: 4238459520
Killed
alex@NLDW4-5-20-11:~/ai/llama2.mojo$ mojo llama2.mojo ~/ai/llama.cpp/models/ggml-model-q4_1.bin -s 100 -n 256 -t 4 -i "Llama is an animal"
num hardware threads: 12
SIMD vector width: 16
checkpoint size: 4238459520
Killed
I don't know what -t 0.5 means (I suppose threads); I've been trying -t 4 and again without results.
The clue here is how to run Llama 2 using this new language called Mojo. And if you made a Mojo wrapper for the Llama/Llama 2 models, please provide instructions on how to run the model using this wrapper.
Thank you.
Thanks for your fantastic project. Out of curiosity, I tried to build it as a binary. It seemed to build at first, but it didn't work: it showed a message telling me to set the Python path. But after I set that environment variable, a segmentation fault occurred. I think it came from the mojo builder, maybe. My environment is WSL on Windows 11.
I had some errors when I tried to run the Docker version. When I fixed the first, the second appeared:
I'll submit a pull request with the changes that work for me: #70
My Setup:
Replicating Steps:
Inside Repo Directory:
docker build --build-arg AUTH_KEY=MY_MODULAR_KEY -t llama2.mojo .
Terminal Print (Error 1):
17.15 Setting up modular (0.2.1) ...
17.16 Processing triggers for libc-bin (2.31-0ubuntu9.12) ...
17.18 sh: 80: [[: not found
17.18 __ __ _ _
17.18 | \/ | ___ __| |_ _| | __ _ _ __
17.18 | |\/| |/ _ \ / _` | | | | |/ _` | '__|
17.18 | | | | (_) | (_| | |_| | | (_| | |
17.18 |_| |_|\___/ \__,_|\__,_|_|\__,_|_|
17.18
17.18 Welcome to the Modular CLI!
17.18 For info about this tool, type "modular --help".
17.18
17.18 To install Mojo🔥, type "modular install mojo".
17.18
17.18 For Mojo documentation, see https://docs.modular.com/mojo.
17.18 To chat on Discord, visit https://discord.gg/modular.
17.18 To report issues, go to https://github.com/modularml/mojo/issues.
21.66 modular: error: please run `modular auth` before attempting to install a package
------
Dockerfile:52
--------------------
51 |
52 | >>> RUN curl https://get.modular.com | MODULAR_AUTH=$AUTH_KEY sh - \
53 | >>> && modular install mojo
54 |
--------------------
ERROR: failed to solve: process "/bin/sh -c curl https://get.modular.com | MODULAR_AUTH=$AUTH_KEY sh - && modular install mojo" did not complete successfully: exit code: 1
Terminal Print (Error 2):
=> ERROR [ 6/15] RUN modular install mojo 40.6s
------
> [ 6/15] RUN modular install mojo:
40.33 The virtual environment was not created successfully because ensurepip is not
40.33 available. On Debian/Ubuntu systems, you need to install the python3-venv
40.33 package using the following command.
40.33
40.33 apt install python3.8-venv
40.33
40.33 You may need to use sudo with that command. After installing the python3-venv
40.33 package, recreate your virtual environment.
40.33
40.33 Failing command: ['/home/user/.modular/pkg/packages.modular.com_mojo/venv/bin/python3', '-Im', 'ensurepip', '--upgrade', '--default-pip']
40.33
40.56 modular: error: failed to run python:
40.56 # Found release for https://packages.modular.com/mojo @ 0.4.0
40.56 # Installing to /home/user/.modular/pkg/packages.modular.com_mojo
40.56 # Downloading artifacts. Please wait...
40.56 # Downloads complete, setting configs...
40.56 # Configs complete, running post-install hooks...
------
Dockerfile:54
--------------------
52 | RUN curl https://get.modular.com | MODULAR_AUTH=$AUTH_KEY sh -
53 | RUN modular auth $AUTH_KEY
54 | >>> RUN modular install mojo
55 |
56 | RUN useradd -m -u 1000 user
--------------------
ERROR: failed to solve: process "/bin/sh -c modular install mojo" did not complete successfully: exit code: 1
#Unhandled exception caught during execution: String is not convertible to integer.
The failing line is rng_seed = atol(args[i + 1]). It is the line for if args[i] == "-s":, but it sits in the -i branch:
if args[i] == "-i":
    prompt = args[i + 1]
    rng_seed = atol(args[i + 1])  # line for if args[i] == "-s":
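A minimal sketch of the intended branches (names taken from the snippet above; the surrounding argument loop is assumed from llama2.c's convention):

if args[i] == "-s":
    rng_seed = atol(args[i + 1])  # atol raises if the value is not an integer
elif args[i] == "-i":
    prompt = args[i + 1]          # the prompt string must never be passed to atol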
as title
I am also working on a port of llama2 to Mojo. Your port is excellent; I'm just doing it myself for the sake of learning, probably sticking closer to the C source, and will see how it goes. I'm currently struggling with the tokenizer in Andrej's C code, and taking a look at your code, I wonder if there is a (probably not problematic) memory leak in the str_concat method. I can't see right now that the memory allocated there is freed at any point... I might be completely wrong, just thought to drop you a line.
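For context, a reconstruction of the pattern in question, pieced together from the memcpy calls visible in the compiler output later on this page; PointerString and string_length are assumed names, not the repo's exact code:

fn str_concat(s1: PointerString, s2: PointerString) -> PointerString:
    var l1 = string_length(s1)
    var l2 = string_length(s2)
    var str = PointerString.alloc(l1 + l2 + 1)
    memcpy[UInt8](str, s1, l1)
    memcpy[UInt8](str.offset(l1), s2, l2)
    str.store(l1 + l2, 0)  # NUL terminator
    return str  # the caller owns this buffer; without a matching free() it leaks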
With mojo 24.2.1 (58157dc0)
in a Google Colab env
Running
mojo llama2.mojo stories15M.bin -s 100 -n 256 -t 0.5 -i "Mojo is a language"
yields:
/root/llama2.mojo/llama2.mojo:2:47: error: package 'algorithm' does not contain 'unroll'
from algorithm import vectorize, parallelize, unroll
^
/root/llama2.mojo/llama2.mojo:173:18: error: no matching function in call to 'memcpy'
memcpy[UInt8](str, s1, l1)
~~~~~~~~~~~~~^~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: expected at most 2 positional arguments, got 3
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: expected at most 2 positional arguments, got 3
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: callee expects 0 parameters, but 1 was specified
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: failed to infer implicit parameter 'type' of argument 'dest' type 'DTypePointer'
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:174:18: error: no matching function in call to 'memcpy'
memcpy[UInt8](str.offset(l1), s2, l2)
~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: expected at most 2 positional arguments, got 3
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: expected at most 2 positional arguments, got 3
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: callee expects 0 parameters, but 1 was specified
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: failed to infer implicit parameter 'type' of argument 'dest' type 'DTypePointer'
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:208:49: error: use of unknown declaration 'DynamicVector', 'fn' declarations require explicit variable declarations
inout array: PointerStrings, inout indices: DynamicVector[Int], low: Int, high: Int
^~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:212:31: error: unexpected token in expression
for jj in range(low, high):
^
/root/llama2.mojo/llama2.mojo:212:31: error: statements must start at the beginning of a line
for jj in range(low, high):
^
/root/llama2.mojo/llama2.mojo:236:49: error: use of unknown declaration 'DynamicVector', 'fn' declarations require explicit variable declarations
inout array: PointerStrings, inout indices: DynamicVector[Int], low: Int, high: Int
^~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:296:25: error: use of unknown declaration 'DynamicVector'
var sorted_indices: DynamicVector[Int]
^~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:517:10: error: 'Tensor[f32]' value has no attribute 'simd_store'
a.simd_store[_nelts](j, a.simd_load[_nelts](j) + b.simd_load[_nelts](j))
~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:531:35: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_load'
tmp.accumulate(x.offset(j).simd_load[_nelts](0) ** 2)
~~~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:542:25: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_load'
var val = weight.simd_load[_nelts](j) * ss * x.simd_load[_nelts](j)
~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:542:55: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_load'
var val = weight.simd_load[_nelts](j) * ss * x.simd_load[_nelts](j)
~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:543:20: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_store'
o.offset(j).simd_store[_nelts](0, val)
~~~~~~~~~~~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:569:20: error: 'Tensor[f32]' value has no attribute 'simd_load'
var val = x.simd_load[_nelts](start + ii).reduce_max()
~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:579:29: error: 'Tensor[f32]' value has no attribute 'simd_load'
var val = math.exp(x.simd_load[_nelts](start + ii) - max_val)
~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:580:10: error: 'Tensor[f32]' value has no attribute 'simd_store'
x.simd_store[_nelts](start + ii, val)
~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:589:10: error: 'Tensor[f32]' value has no attribute 'simd_store'
x.simd_store[_nelts](start + ii, x.simd_load[_nelts](start + ii) / ssum)
~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:598:20: error: 'StaticTuple' parameter #0 has 'AnyRegType' type, but value has type 'Int'
C: StaticTuple[n, BufferPtrFloat32],
^
/root/llama2.mojo/llama2.mojo:1:1: note: 'StaticTuple' declared here
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:600:20: error: 'StaticTuple' parameter #0 has 'AnyRegType' type, but value has type 'Int'
B: StaticTuple[n, BufferPtrFloat32],
^
/root/llama2.mojo/llama2.mojo:1:1: note: 'StaticTuple' declared here
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:606:31: error: 'StaticTuple' parameter #0 has 'AnyRegType' type, but value has type 'Int'
var tmp = StaticTuple[n, Accumulator[DType.float32, nelts]]()
^
/root/llama2.mojo/llama2.mojo:1:1: note: 'StaticTuple' declared here
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:616:22: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_load'
var a = A.simd_load[_nelts](j)
~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:730:26: error: no matching function in call to 'memcpy'
memcpy[DType.float32](state.x.data(), content_row, dim)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: expected at most 2 positional arguments, got 3
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: expected at most 2 positional arguments, got 3
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: failed to infer implicit parameter 'type' of argument 'dest' type 'Pointer'
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:1:1: note: candidate not viable: callee expects 0 parameters, but 1 was specified
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:794:32: error: 'Tensor[f32]' value has no attribute 'simd_load'
state.q.simd_load[_nelts](q_offset + i)
~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:795:42: error: 'Tensor[f32]' value has no attribute 'simd_load'
* state.key_cache.simd_load[_nelts](k_offset + i)
~~~~~~~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:818:39: error: 'Tensor[f32]' value has no attribute 'simd_load'
var xbi = state.xb.simd_load[_nelts](
~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:820:46: error: 'Tensor[f32]' value has no attribute 'simd_load'
) + a * state.value_cache.simd_load[_nelts](v_offset + i)
~~~~~~~~~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:821:29: error: 'Tensor[f32]' value has no attribute 'simd_store'
state.xb.simd_store[_nelts](xb_offset + i, xbi)
~~~~~~~~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:846:38: error: 'Tensor[f32]' value has no attribute 'simd_load'
var initial_hb = state.hb.simd_load[_nelts](i)
~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:850:21: error: 'Tensor[f32]' value has no attribute 'simd_store'
state.hb.simd_store[_nelts](i, hbi * state.hb2.simd_load[_nelts](i))
~~~~~~~~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:881:32: error: invalid call to 'rand': missing 1 required positional argument: 'size'
var r = rand[DType.float32](1)
~~~~~~~~~~~~~~~~~~~^~~
/root/llama2.mojo/llama2.mojo:1:1: note: function declared here
from algorithm import sum
^
/root/llama2.mojo/llama2.mojo:890:29: error: use of unknown declaration 'DynamicVector', 'fn' declarations require explicit variable declarations
fn bpe_encode(inout tokens: DynamicVector[Int], text: String, inout tok: Tokenizer):
^~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:891:32: error: unexpected token in expression
for pos in range(len(text)):
^
/root/llama2.mojo/llama2.mojo:891:32: error: statements must start at the beginning of a line
for pos in range(len(text)):
^
/root/llama2.mojo/llama2.mojo:940:9: error: use of unknown declaration 'print_no_newline'
print_no_newline(chr(str2num(d1) * 16 + str2num(d2)))
^~~~~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:945:9: error: use of unknown declaration 'print_no_newline'
print_no_newline(chr(s[p].to_int()))
^~~~~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:1044:25: error: use of unknown declaration 'DynamicVector', 'fn' declarations require explicit variable declarations
var prompt_tokens = DynamicVector[Int]()
^~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:49:31: error: 'DTypePointer[T, 0]' value has no attribute 'simd_load'
var newVal = self.data.simd_load[_width]() + val
~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:50:18: error: 'DTypePointer[T, 0]' value has no attribute 'simd_store'
self.data.simd_store[_width](newVal)
~~~~~~~~~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:54:25: error: 'DTypePointer[T, 0]' value has no attribute 'simd_load'
return self.data.simd_load[width]().reduce_add()
~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:111:26: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_load'
return self._data.simd_load[nelts](idx)
~~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:124:26: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_load'
return self._data.simd_load[nelts](indices[0] * self._shape[1] + indices[1])
~~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:127:26: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_load'
return self._data.simd_load[1](idx)
~~~~~~~~~~^~~~~~~~~~
/root/llama2.mojo/llama2.mojo:130:26: error: 'DTypePointer[f32, 0]' value has no attribute 'simd_store'
return self._data.simd_store[nelts](idx, val)
~~~~~~~~~~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:305:31: error: use of unknown declaration 'DynamicVector', 'fn' declarations require explicit variable declarations
self.sorted_indices = DynamicVector[Int]()
^~~~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:382:40: error: 'List[SIMD[si8, 1]]' value has no attribute '_steal_ptr'
var int32_ptr = config_data_raw._steal_ptr().bitcast[DType.int32]()
~~~~~~~~~~~~~~~^~~~~~~~~~~
/root/llama2.mojo/llama2.mojo:469:27: error: 'List[SIMD[si8, 1]]' value has no attribute '_steal_ptr'
var data = tmp._steal_ptr().bitcast[DType.float32]()
~~~^~~~~~~~~~~
/root/.modular/pkg/packages.modular.com_max/bin/mojo: error: failed to parse the provided Mojo
~/src/AI/mojo/llama2.mojo$ mojo llama2.mojo tl-chat.bin \
-r falcon \
-z tok_tl-chat.bin \
-n 256 -t 0 -s 100 -i "<|im_start|>user\nGive me a python function to generate Fibonacci sequence<|im_end|>\n<|im_start|>assistant\n"
num hardware threads: 12
SIMD vector width: 16
checkpoint size: 4400767004 [ 4196 MB ]
n layers: 22
vocab size: 32003
<|im_start|>user
Give me a python function to generate Fibonacci sequence<|im_end|>
<|im_start|>assistant
¿Quiero debera.io|efes<|
|- [aquíntena|
|-|re|re|
|-|
|-ichas|[estructurañiñu|implementa.py|
|esínda|
¿Quiero|
|Olahi|
Does anyone know how to resolve this?
Are these speed comparisons all in CPU mode? Can we add a comparison with GPU?
Also, if you want to train, you'd want to do the training in Mojo too. Is it necessary to add training-related code in the same way? Will rewriting be time-consuming?
Stumbled on this while trying to run the code on WSL Ubuntu
num parallel workers: 2 SIMD width: 16
checkpoint size: 60816028 [ 57 MB ] | n layers: 6 | vocab size: 32000
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
I downloaded the repo and was super happy to see the story model work!
Then I looked down, saw the chat model, and went and installed it via the wget that was provided in the readme.
But when I try to run it, this happened:
username@username:~/mojo/llama2.mojo$ mojo llama2.mojo tl-chat.bin \
-r falcon \
-z tok_tl-chat.bin \
-n 256 -t 0 -s 100 -i "<|im_start|>user\nGive me a python function to generate Fibonacci sequence<|im_end|>\n<|im_start|>assistant\n"
num hardware threads: 4
SIMD vector width: 16
Killed
(sorry, I accidentally opened the issue before I finished typing it 😢 )
Actually, this even happens if I follow all the instructions, download again and all, in a new folder
I remember Llama 2 uses grouped-query attention. In llama2.c, I found there are kv_heads and kv_dim.
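How they relate in llama2.c, sketched in Mojo with hypothetical config field names: each key/value head has the same head_size as the query heads, but there are only n_kv_heads of them, so the cached K/V rows are kv_dim wide instead of dim wide:

var head_size = config.dim // config.n_heads
var kv_dim = (config.dim * config.n_kv_heads) // config.n_heads  # == n_kv_heads * head_size
# With n_kv_heads == n_heads this reduces to kv_dim == dim (no grouped-query attention).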
After having installed mojo (working) and llama2 as described, running mojo llama2.mojo
on Ubuntu 22.04 with 16 cores, I get:
llama2.mojo $ mojo llama2.mojo
num hardware threads: 16 SIMD vector width: 16
checkpoint size: 60816028
Unhandled exception caught during execution: An error occurred in Python.
mojo: error: execution exited with a non-zero result: 1
It might be worth turning on discussions. It would be helpful to discuss performance improvements so there is a history of what people have tried and any benchmarks run.
I have enabled llama2.c to run the TinyLlama 1.1B chat model on my repo.
It reads the TinyLlama 1.1B model to run it.
We can update the benchmark now.
Idk what your plan is with this project, so I just wanted to ask if you want to grow it and advance it by enabling more features.
We could create different TODO issues for the features, to enable work by the community.
If you don't want to grow it, maybe we could create a community fork building on top of it.
I really like the idea of doing inference in Mojo, so I'm really grateful for this project, and I think this could be a good opportunity to learn more about Mojo by building some features :)
hey team, incredible work being done here.
Wondering if you only support .bin models, or whether it would also work with gguf-quantized models.
If not, then that's a real feature request. Most everyone uses gguf models nowadays, as they are easier to run on consumer-grade hardware.
thanks.
I know it only came out yesterday but ;-)
I'm trying to make llama2.mojo work with TinyLlama-1.1B, which is a GQA model and not a tied-embedding model.
I have now finished converting the model and modifying part of llama2.mojo (following llama.cpp and llama.c).
I have noticed that our tokenizer is not stable compared with the HuggingFace tokenizer.
I spent some time investigating why the parallelized + vectorized version of matmul is slower than the only-vectorized one.
Older matmul examples showed that multi-core + vector was faster. Still, for me, the matmul notebook example on Playground and the matmul example from the repo, run on a GitHub Codespaces instance (4 cores, 16 GB), showed that the multi-core version was slower.
I tried two commands: mojo examples/matmul.mojo, and mojo build examples/matmul.mojo + running the binary. They had the same results: multi-core slower. In addition, using htop, I also made sure that the multi-core version utilizes all cores.
I found this PR, modularml/mojo#742, where you can see that the vector-width value you get from simdwidthof is multiplied. In the case of the GitHub Codespaces instance, my base value from simdwidthof was 8; I benchmarked higher values like 16 (2x), 32 (4x), and 64 (8x). You can see the results below:
I believe adjusting the nelts value should bring additional speed-ups (see the sketch after the line reference below).
Line 24 in 86a34c9
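A minimal sketch of the kind of change meant here, assuming the 0.x-era sys.info API; the 4x multiplier is just one of the values to benchmark, as in modularml/mojo#742:

from sys.info import simdwidthof

alias base_width = simdwidthof[DType.float32]()  # 8 on the Codespaces instance above
alias nelts = 4 * base_width                     # benchmark 2x, 4x, 8x per machine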
CPU details:
System information:
OS : linux
CPU : znver3
Arch : x86_64-unknown-linux-gnu
Num Cores : 4
CPU Features: avx2
mojo 1.0.0+601 installed from Canonical IS Snaps
FileNotFoundError: [Errno 2] No such file or directory: 'juju': 'juju'
/home/y3rawat/.modular/pkg/packages.modular.com_mojo/bin/mojo /home/y3rawat/mojo/llama2.mojo
/home/y3rawat/mojo/llama2.mojo:10:6: error: unable to locate module 'read'
from read import BufReader, File
^
/home/y3rawat/.modular/pkg/packages.modular.com_mojo/bin/mojo: error: failed to parse the provided Mojo
(python38) y3rawat@y3rawat-ASUS-TUF-Gaming-F15-FX507ZC4-FX507ZC4:~/mojo$