
Comments (8)

awan-10 commented on June 2, 2024

@delock - FYI. Created this issue so we can track and fix it. Please work with folks assigned on this issue.


lekurile commented on June 2, 2024

Hello @delock,

Thank you for raising this issue. I ran a local vllm benchmark with the microsoft/Phi-3-mini-4k-instruct model using the following code:

# Run benchmark
python ./run_benchmark.py \
        --model microsoft/Phi-3-mini-4k-instruct \
        --tp_size 1 \
        --num_replicas 1 \
        --max_ragged_batch_size 768 \
        --mean_prompt_length 2600 \
        --mean_max_new_tokens 60 \
        --stream \
        --backend vllm \
        --overwrite_results

### Generate the plots
python ./src/plot_th_lat.py --data_dirs results_vllm/

echo "Find figures in ./plots/ and log outputs in ./results/"

I also had to add the "--trust-remote-code" argument to the vllm_cmd here:
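Roughly, the change looks like the following; this is an illustrative sketch only, and the actual vllm_cmd construction in the benchmark's server launch code may differ:

# Illustrative sketch only -- the real vllm_cmd in the benchmark may be built differently.
vllm_cmd = [
    "python", "-m", "vllm.entrypoints.api_server",
    "--model", "microsoft/Phi-3-mini-4k-instruct",
    "--tensor-parallel-size", "1",
    "--trust-remote-code",  # needed for models that ship custom modeling code, e.g. Phi-3
]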

Here's the resulting plot:

To reproduce the issue you show above, can you please provide a reproduction script so I can test on my end?

To answer your question:

Is it possible to run this script to benchmark a local API server? I'm thinking of running vllm serving with a separate command and using this benchmark to test the API server that vllm started, so I would have better control over how the vllm server is started and could see all the error messages from the vllm server if it fails.

We can update the benchmarking script to accept an additional argument with information about an existing local server; when it is provided, the script will not stand up a new server, but will instead target the existing server using that information.
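For example, something along these lines; the flag and field names below are a hypothetical sketch, not necessarily what the final script will use:

# Hypothetical sketch of the proposed option -- names are illustrative only.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--existing-server",
    type=str,
    default=None,
    help="host:port of an already-running inference server; skip launching a new one",
)
args = parser.parse_args()

if args.existing_server:
    # Point the benchmark clients at this address instead of a freshly launched server.
    host, port = args.existing_server.split(":")
else:
    # Current behavior: stand up a new server, run the benchmark, then shut it down.
    pass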


delock commented on June 2, 2024

@awan-10 @lekurile Thanks for starting this thread. I hit this error when I tried to run this example on a Xeon server with CPU. I suspect this is a configuration issue. Currently, I plan to modify the script to run the client code only and start the server from a separate command line, so I will be able to see more error messages and get a better understanding.


delock commented on June 2, 2024

Hi @lekurile,

Now I can start the server from a separate command line and run the benchmark against it with a reduced test size (max batch 128, avg prompt 128) to start with.

However, I hit the following error during post-processing, which I suspect is due to the transformers version. Which transformers version are you using? Mine is transformers==4.40.1.

Traceback (most recent call last):
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/./run_benchmark.py", line 44, in <module>
    run_benchmark()
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/./run_benchmark.py", line 36, in run_benchmark
    print_summary(client_args, response_details)
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/src/utils.py", line 235, in print_summary
    ps = get_summary(vars(args), response_details)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/src/postprocess_results.py", line 80, in get_summary
    [
  File "/home/gma/DeepSpeedExamples/benchmarks/inference/mii/src/postprocess_results.py", line 81, in <listcomp>
    (len(get_tokenizer().tokenize(r.prompt)) + len(get_tokenizer().tokenize(r.generated_tokens)))
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 396, in tokenize
    return self.encode_plus(text=text, text_pair=pair, add_special_tokens=add_special_tokens, **kwargs).tokens()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3037, in encode_plus
    return self._encode_plus(
           ^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gma/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]


lekurile commented on June 2, 2024

Hi @delock,

I'm using transformers==4.40.1 as well.

After #895 was committed to the repo, I'm seeing the same error on my end as well.

  File "/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

Can you please try checking out fab5d06 (one commit prior, detached HEAD) and running again? I'll look into this PR and see whether we need to revert it.
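For context, this TypeError is what the fast tokenizer raises when tokenize() receives something other than a single string, which matches get_tokenizer().tokenize(r.generated_tokens) being handed a list of streamed token strings after #895. That's my reading of the traceback, not yet confirmed; a minimal sketch of the suspected failure mode (gpt2 is used here purely for illustration):

# Minimal sketch of the suspected failure mode (gpt2 used purely for illustration).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

tokenizer.tokenize("some generated text")       # OK: input is a single string

streamed = ["some", " generated", " text"]      # what a streaming path may hand back
tokenizer.tokenize(streamed)                    # TypeError: TextEncodeInput must be Union[...]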

Thanks,
Lev


lekurile commented on June 2, 2024

@delock, here's the PR fixing the tokens_per_sec metric to work for both the streaming and non-streaming cases:
#897

You should be able to get past your error above with this PR, but I'm curious if you're seeing any failures still.
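Conceptually, the metric just needs to count generated tokens the same way whether the response holds a list of streamed token strings or a single generated string. A rough sketch of that idea, with illustrative names rather than the exact code from #897:

# Rough sketch only -- names are illustrative, not the actual code in PR #897.
def tokens_per_sec(generated, tokenizer, elapsed_sec):
    if isinstance(generated, list):
        # Streaming case: we already have individual token strings.
        num_tokens = len(generated)
    else:
        # Non-streaming case: a single string that still needs tokenizing.
        num_tokens = len(tokenizer.tokenize(generated))
    return num_tokens / elapsed_sec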


delock commented on June 2, 2024

Yes, with the latest version the benchmark can go forward now. I'll see whether it continues to completion.

> @delock, here's the PR fixing the tokens_per_sec metric to work for both the streaming and non-streaming cases: #897
>
> You should be able to get past your error above with this PR, but I'm curious if you're seeing any failures still.


delock commented on June 2, 2024

Hi @lekurile, the benchmark now proceeds but hits some other errors when running on CPU. I'll check with the vllm CPU engineers to investigate these errors. I also submitted a PR adding a flag that allows starting the server from a separate command line:
#900

