Comments (6)

MatthewShao avatar MatthewShao commented on July 23, 2024 1

It turns out I was using the wrong tokenizer, taken from hf-internal-testing/tiny-random-llama. Fixed it with the correct tokenizer file, thanks everyone.
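A quick way to catch this class of mismatch before launching, sketched here as a hypothetical stdlib-only helper (not part of text-generation-inference): check that the special tokens the model relies on actually exist in the tokenizer.json being shipped.

```python
import json

def special_tokens_missing(tokenizer_json_path, specials=("<unk>", "<s>", "</s>")):
    """Return the special tokens absent from a tokenizer.json file.

    A tokenizer file copied from a different model (here, the one from
    hf-internal-testing/tiny-random-llama) can leave tokens like <unk>
    or </s> out of the vocabulary, which breaks eos/unk lookups at
    serve time.
    """
    with open(tokenizer_json_path) as f:
        data = json.load(f)
    vocab = data.get("model", {}).get("vocab", {})
    added = {t["content"] for t in data.get("added_tokens", [])}
    return [s for s in specials if s not in vocab and s not in added]
```

Printing the result of this check before starting the launcher would have flagged the wrong tokenizer immediately.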

from text-generation-inference.

MatthewShao avatar MatthewShao commented on July 23, 2024

I also tried launching without flash attention and without the shard-count argument: CUDA_VISIBLE_DEVICES=4,5,6,7 text-generation-launcher --port 8080 --model-id decapoda-research/llama-7b, and got the following error:

Traceback (most recent call last):
  File "/opt/miniconda/envs/text-generation/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/cli.py", line 55, in serve
    server.serve(model_id, revision, sharded, quantize, uds_path)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/server.py", line 135, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize))
  File "/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/grpc_interceptor/server.py", line 153, in invoke_intercept_method
    return await self.intercept(
> File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/interceptor.py", line 20, in intercept
    return await response
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/server.py", line 42, in Prefill
    batch = self.model.batch_type.from_pb(
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 80, in from_pb
    stopping_criteria = StoppingCriteria.from_pb(
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/utils/tokens.py", line 162, in from_pb
    tokenizer.eos_token_id,
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_base.py", line 1132, in eos_token_id
    return self.convert_tokens_to_ids(self.eos_token)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_fast.py", line 260, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_base.py", line 1141, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_fast.py", line 260, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_base.py", line 1141, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
......
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_fast.py", line 260, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9-linux-x86_64.egg/transformers/tokenization_utils_base.py", line 1141, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
RecursionError: maximum recursion depth exceeded
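The RecursionError comes from a mutual recursion visible in the repeating frames above: convert_tokens_to_ids falls back to unk_token_id for unknown tokens, and unk_token_id in turn calls convert_tokens_to_ids on the unk token. If the unk token itself is missing from the vocabulary (as with the mismatched tokenizer file), the two keep calling each other. A minimal sketch of the pattern, not the real transformers code:

```python
class BrokenTokenizer:
    """Simplified illustration of the mutual recursion in the traceback."""

    def __init__(self, vocab, unk_token="<unk>"):
        self.vocab = vocab
        self.unk_token = unk_token

    @property
    def unk_token_id(self):
        # Looking up the unk id goes through the generic token lookup...
        return self.convert_tokens_to_ids(self.unk_token)

    def convert_tokens_to_ids(self, token):
        if token in self.vocab:
            return self.vocab[token]
        # ...and the generic lookup falls back to unk_token_id for
        # unknown tokens. If "<unk>" is itself absent from the vocab,
        # these two methods call each other forever.
        return self.unk_token_id

tok = BrokenTokenizer({"hello": 0})  # vocab without "<unk>"
try:
    tok.convert_tokens_to_ids("</s>")
except RecursionError:
    print("maximum recursion depth exceeded, as in the traceback")
```

With a correct tokenizer file the unk token is in the vocabulary, so the fallback terminates on the first recursive step.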

OlivierDehaene avatar OlivierDehaene commented on July 23, 2024
NotImplementedError: Could not run 'c10d::allreduce_' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'c10d::allreduce_' is only available for these backends: [CPU, CUDA, SparseCPU, SparseCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

Thanks for reporting! This must be because of the weight loading logic of the sharded model. I will check why it fails in this case.

@Narsil, (#166 (comment)) am I doing something wrong here to trigger a max recursion in the tokenizer conversion?

OlivierDehaene avatar OlivierDehaene commented on July 23, 2024

FLASH_ATTENTION=1 text-generation-launcher --num-shard 2 --port 8080 --max-total-tokens 2048 --model-id decapoda-research/llama-7b-hf seems to work for me.

decapoda-research/llama-7b might not have the correct weight layout.

Narsil avatar Narsil commented on July 23, 2024

@Narsil, (#166 (comment)) am I doing something wrong here to trigger a max recursion in the tokenizer conversion?

Not sure. Is there any way I can reproduce? @ArthurZucker, since I remember you touching the @property there.

ArthurZucker avatar ArthurZucker commented on July 23, 2024

I'll have a look, but seeing @OlivierDehaene's comment, I'm not sure I should really dive deep.
