lucidrains / ring-attention-pytorch

License: MIT License

Topics: attention-mechanism, efficient-attention, long-context, distributed-attention
ring-attention-pytorch's Introduction

Ring Attention - Pytorch

Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch.

It basically splits the data across the sequence dimension (instead of batch) and applies ring reduce to the processing of the tiles of the attention matrix, flash attention style.
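
To make the ring reduce concrete, below is a minimal single-process sketch (all shapes hypothetical): the ring is simulated by iterating over key/value chunks in ring order, and partial attention results are combined across ring steps with the same online log-sum-exp accumulation flash attention uses.

import torch
import torch.nn.functional as F

# simulate a ring of 4 ranks by sharding the sequence dimension
ring_size = 4
q, k, v = (torch.randn(1, 1024, 64) for _ in range(3))
scale = q.shape[-1] ** -0.5

q_chunks = q.chunk(ring_size, dim = 1)
kv_chunks = list(zip(k.chunk(ring_size, dim = 1), v.chunk(ring_size, dim = 1)))

outs = []
for rank, qc in enumerate(q_chunks):
    acc = torch.zeros_like(qc)                           # running weighted sum of values
    row_max = torch.full(qc.shape[:-1], float('-inf'))   # running row maximum
    row_sum = torch.zeros(qc.shape[:-1])                 # running softmax denominator

    for step in range(ring_size):
        # the key/value chunk that would arrive on this ring pass
        kc, vc = kv_chunks[(rank + step) % ring_size]
        sim = torch.einsum('b i d, b j d -> b i j', qc, kc) * scale
        new_max = torch.maximum(row_max, sim.amax(dim = -1))
        correction = (row_max - new_max).exp()           # rescale earlier partial results
        p = (sim - new_max.unsqueeze(-1)).exp()
        row_sum = row_sum * correction + p.sum(dim = -1)
        acc = acc * correction.unsqueeze(-1) + torch.einsum('b i j, b j d -> b i d', p, vc)
        row_max = new_max

    outs.append(acc / row_sum.unsqueeze(-1))

# the sharded ring computation matches full attention
full = F.scaled_dot_product_attention(q.unsqueeze(1), k.unsqueeze(1), v.unsqueeze(1)).squeeze(1)
assert torch.allclose(torch.cat(outs, dim = 1), full, atol = 1e-5)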

I believe some form of this is being used for the 1-10 million token context of the latest Gemini; the other possibility would be unpublished improvements on top of RMT.

In addition, the repository also contains the logic for Striped Attention, a follow-up paper that permutes the sequence for better workload balancing in autoregressive transformers.

It also contains support for grouped query attention, popularized by the Llama series of models. This will further save on communication costs during the ring reduce.
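
The communication saving is visible from tensor sizes alone: only the keys and values travel around the ring, so fewer key/value heads shrink every ring pass proportionally. A rough illustration, with hypothetical shapes:

import torch

batch, seq_chunk, dim_head = 1, 512, 64

# standard multi-head attention: 8 key/value heads cross the ring each pass
kv_mha = torch.randn(2, batch, seq_chunk, 8, dim_head, dtype = torch.float16)

# grouped query attention: 8 query heads share 2 key/value heads
kv_gqa = torch.randn(2, batch, seq_chunk, 2, dim_head, dtype = torch.float16)

print(kv_mha.numel() * kv_mha.element_size())   # 1048576 bytes per ring pass
print(kv_gqa.numel() * kv_gqa.element_size())   # 262144 bytes, 4x less communication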

Appreciation

  • A16Z Open Source AI Grant Program for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source current artificial intelligence research

  • Tri Dao for all his tremendous hard work maintaining Flash Attention over the last year or two, on which the CUDA version in this repository depends

  • Phil Tillet for Triton, without which the forward ring flash attention CUDA kernel would have taken an order of magnitude more work.

Install

$ pip install ring-attention-pytorch

Usage

import torch
from ring_attention_pytorch import RingAttention

attn = RingAttention(
    dim = 512,
    dim_head = 64,
    heads = 8,
    causal = True,
    auto_shard_seq = True,
    ring_attn = True,
    ring_seq_size = 512
)

tokens = torch.randn(1, 1024, 512)
attended = attn(tokens)

assert attended.shape == tokens.shape
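
Striped attention can be toggled on the same module. A hedged sketch, assuming the constructor accepts a striped_ring_attn flag (the name appears in the codebase and mirrors the test script's --striped-ring-attn option):

attn = RingAttention(
    dim = 512,
    dim_head = 64,
    heads = 8,
    causal = True,
    auto_shard_seq = True,
    ring_attn = True,
    striped_ring_attn = True,   # assumed constructor flag, mirroring assert.py's --striped-ring-attn
    ring_seq_size = 512
)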

Test

First install requirements

$ pip install -r requirements.txt

Then, for example, testing autoregressive striped ring attention on CUDA would be

$ python assert.py --use-cuda --causal --striped-ring-attn

Todo

  • make it work with derived causal mask based on rank and chunk sizes

  • modify flash attention to output intermediates and figure out backwards with recompute and ring passes

  • functions for splitting the sequence evenly among ranks, either within attention function, or in the external ring transformer wrapper

  • basic test case with two processes and check for equivalent output and gradients

  • testing

    • make sure key padding mask works
    • make sure causal mask works
    • rotary embeddings, with proper key/value offset depending on ring rank
  • striped attention

    • add the permutation logic before and after the transformer
    • add causal masking logic - account for sub bucketing by flash attention
  • fix issue with ring attention when flash buckets > 1

  • move flash attention back to key / value column traversal on outer loop and save on ring communication

    • backwards
    • forwards
  • fix rotary positions for striped ring attention when flash buckets > 1

  • allow for variable ring passes per layer, for local -> global attention in ring transformer as one goes up the layers.

  • when doing ring passes, alternate between designated send and receive buffers (see the sketch after this list)

  • instead of max ring passes, allow specifying the lookback in terms of sequence length, and derive the number of flash attention buckets + ring passes from that

  • ability to have ring size < world size, sharding the batch and sequence, and doing ring reduce with the correct set of ranks

  • add flash attention kernel version in the presence of cuda

    • for forwards, use modified Triton flash attention forwards that outputs row sums, maxes, and exponentiated weighted sum
    • for backwards, use Tri's flash attention kernels, accumulate dq, dk, dv across rings
    • refactor to have naive ring+flash attention work with (batch, seq, head, dim)
    • handle key padding mask for forwards by translating mask to bias
    • figure out how Tri handles key padding mask for backwards
    • scale output of flash attention forwards on the last ring pass reduce
    • verify backwards working in a100 runpod
    • dk, dv need to be float32, while kv needs to be float16. see if both can be cast to int before being stacked and ring passed all in one go, then reinterpreted back to float32 and float16
    • prevent an unnecessary tl.load on the first ring pass
    • cuda backwards pass must have same dq, dk, dv as naive
  • fix naive flash attention backwards

  • validate cuda causal and striped ring attention works

  • make sure cuda striped attention works for multiple buckets, otherwise flash attention is ineffective

  • for cuda striped attention, for backwards hack, pad the extra token once and index out when passing into Tri's cuda kernel

  • find a machine with 8 GPUs and test with a quarter million tokens first

  • see for cuda version whether softmax_D can be computed once and cached over the ring reduce. go for modified triton backwards if not

  • think about how to craft a special Dataset that shards across sequence length (take into account labels for cross entropy loss) for ring transformer training

  • add ring attention to Tri's flash attention implementation. find some cuda ring reduce impl

  • figure out how to pytest distributed pytorch

  • use sdp context manager to validate when it is possible to use ring_flash_attn_cuda, otherwise assert out

  • improvise a variant where each machine keeps compressed summary tokens, and only those summary tokens are ring passed past some given distance
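
For the send/receive buffer item above, here is a minimal sketch of what a double-buffered ring pass could look like with torch.distributed point-to-point ops; the function and variable names are illustrative, not the repository's internal API:

import torch
import torch.distributed as dist

def ring_pass(x, receive_buffer):
    # send the current tensor to the right neighbor while receiving from the
    # left into a preallocated buffer; returning the pair swapped lets the two
    # buffers alternate roles each pass, avoiding a fresh allocation per step
    rank, world = dist.get_rank(), dist.get_world_size()
    ops = [
        dist.P2POp(dist.isend, x, (rank + 1) % world),
        dist.P2POp(dist.irecv, receive_buffer, (rank - 1) % world),
    ]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return receive_buffer, x   # received tensor becomes current, old tensor is the next buffer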

Citations

@article{Liu2023RingAW,
    title    = {Ring Attention with Blockwise Transformers for Near-Infinite Context},
    author   = {Hao Liu and Matei Zaharia and Pieter Abbeel},
    journal  = {ArXiv},
    year     = {2023},
    volume   = {abs/2310.01889},
    url      = {https://api.semanticscholar.org/CorpusID:263608461}
}
@article{Brandon2023StripedAF,
    title   = {Striped Attention: Faster Ring Attention for Causal Transformers},
    author  = {William Brandon and Aniruddha Nrusimha and Kevin Qian and Zachary Ankner and Tian Jin and Zhiye Song and Jonathan Ragan-Kelley},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2311.09431},
    url     = {https://api.semanticscholar.org/CorpusID:265220849}
}
@article{Dao2022FlashAttentionFA,
    title   = {FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
    author  = {Tri Dao and Daniel Y. Fu and Stefano Ermon and Atri Rudra and Christopher Ré},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2205.14135}
}
@article{dao2023flashattention2,
    title   = {Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
    author  = {Dao, Tri},
    year    = {2023}
}
@article{Tillet2019TritonAI,
    title   = {Triton: an intermediate language and compiler for tiled neural network computations},
    author  = {Philippe Tillet and H. Kung and D. Cox},
    journal = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages},
    year    = {2019}
}
@article{Ainslie2023GQATG,
    title   = {GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints},
    author  = {Joshua Ainslie and James Lee-Thorp and Michiel de Jong and Yury Zemlyanskiy and Federico Lebrón and Sumit K. Sanghai},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2305.13245},
    url     = {https://api.semanticscholar.org/CorpusID:258833177}
}

The Bitter Lesson - Richard Sutton


ring-attention-pytorch's Issues

inference for open LLM

Can this be used to extend the architecture of an open-source LLM from Hugging Face and run inference with ring attention?

Comment about use of all gather

Hi Phil!

Hope you're doing well. As you saw with Gemini Pro 1.5, which works on 1 million tokens, open source has some work to do to catch up :D Porting Ring Attention to PyTorch is definitely the first step towards that.

@rwightman made an interesting comment on your current approach to implementing Ring Attention, thought it would be useful to share that with you: https://twitter.com/wightmanr/status/1758275957557719308. Basically Ross had to implement something similar to make the SigLIP loss function work, leveraging neighbour exchange instead of allgather.
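
For context, the pattern being referenced, sketched with torch.distributed (assuming an initialized process group): all_gather materializes all world_size chunks on every rank at once, while a neighbour exchange only ever holds one extra chunk in flight.

import torch
import torch.distributed as dist

def neighbour_exchange(x):
    # ring-style alternative to all_gather: send to the right neighbour and
    # receive from the left, so peak memory is one extra chunk rather than
    # world_size chunks materialized simultaneously
    rank, world = dist.get_rank(), dist.get_world_size()
    received = torch.empty_like(x)
    reqs = dist.batch_isend_irecv([
        dist.P2POp(dist.isend, x, (rank + 1) % world),
        dist.P2POp(dist.irecv, received, (rank - 1) % world),
    ])
    for req in reqs:
        req.wait()
    return received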

Btw, if your implementation is done, I would like to leverage it to port the LWM model that came out 2 days ago (https://github.com/LargeWorldModel/LWM). I would port the model to the Hugging Face Transformers library by adding an LWMForCausalLM class. Since the weights are open-sourced, I can convert them to the Transformers format.

Btw are you still active on any Discord channel?

Cheers,

Niels

Connection closed by peer

When I set use_cuda = True and set

  os.environ['MASTER_ADDR'] = '127.0.0.1'
  os.environ['MASTER_PORT'] = '1234'

the error is as follows:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/app/assert.py", line 87, in start
    ring_out = ddp_ring_attention_net(seq)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1509, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1345, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/ring_attention_pytorch/ring_attention.py", line 568, in forward
    x = attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/ring_attention_pytorch/ring_attention.py", line 368, in forward
    out = ring_flash_attn_cuda(
  File "<@beartype(ring_attention_pytorch.ring_flash_attention_cuda.ring_flash_attn_cuda) at 0x7fdd828c4ee0>", line 214, in ring_flash_attn_cuda
  File "/app/ring_attention_pytorch/ring_flash_attention_cuda.py", line 752, in ring_flash_attn_cuda
    return ring_flash_attn_cuda_(q, k, v, mask, causal, bucket_size, ring_reduce_col, striped_ring_attn, max_lookback_seq_len, ring_size)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 551, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/app/ring_attention_pytorch/ring_flash_attention_cuda.py", line 526, in forward
    for (ring_rank, is_last), ((kv, mask), (receive_kv, receive_mask)) in ring_pass_fn(kv, mask, receive_buffers = (receive_kv, receive_mask), max_iters = max_ring_passes, ring_size = ring_size):
  File "/app/ring_attention_pytorch/ring.py", line 127, in all_ring_pass
    new_tensor, new_receive_buffer = one_ring_pass(tensor, receive_buffer, ring_size)
  File "/app/ring_attention_pytorch/ring.py", line 88, in ring_pass
    send_and_receive_(x, receive_buffer, circular_rank_right(ring_size = ring_size), circular_rank_left(ring_size = ring_size))
  File "/app/ring_attention_pytorch/ring.py", line 69, in send_and_receive_
    dist.recv(receive_buffer, receive_from_rank)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1680, in recv
    pg.recv([tensor], src, tag).wait()
RuntimeError: [/opt/pytorch/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [172.24.0.2]:34966

Cross Attention variant?

Hi there,

Sorry if this is a stupid issue but I was wondering if it would be possible to apply Ring Attention to Cross Attention? I was thinking of using RingFlashAttentionCUDAFunction directly but it seems like the transformer block itself has modifications.

Thanks

ValueError: Invalid expression '[ True]', must be integers

/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Traceback (most recent call last):
  File "/Users/defalt/Desktop/Athena/research/Gemini/gemini_block.py", line 18, in <module>
    out = model(x)  # Apply the model to the input tensor
          ^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/defalt/Desktop/Athena/research/Gemini/gemini_torch/model.py", line 101, in forward
    x = self.attn(x)
        ^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ring_attention_pytorch/ring_attention.py", line 228, in forward
    q, k, v = rearrange('b n (qkv h d) -> qkv b h n d', qkv, qkv = 3, h = self.heads)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/lru_cache.py", line 70, in inner
    graph = construct_graph(*args, backend=backend, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/lru_cache.py", line 20, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/lru_cache.py", line 45, in construct_graph
    output_tracers = func(*args, **kwargs, backend=einx.backend.tracer)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/op/rearrange.py", line 118, in rearrange
    exprs_in, exprs_out = parse(description, *[einx.param.get_shape(tensor) for tensor in tensors], cse=cse, **parameters)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/lru_cache.py", line 20, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/op/rearrange.py", line 59, in parse
    + [einx.expr.Equation(k, np.asarray(v)[..., np.newaxis], depth1=None, depth2=None) for k, v in parameters.items()],
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/op/rearrange.py", line 59, in <listcomp>
    + [einx.expr.Equation(k, np.asarray(v)[..., np.newaxis], depth1=None, depth2=None) for k, v in parameters.items()],
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/expr/util.py", line 36, in __init__
    self.expr2 = _input_expr(expr2)
                 ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/einx/expr/util.py", line 29, in _input_expr
    raise ValueError(f"Invalid expression '{expr}', must be integers")
ValueError: Invalid expression '[ True]', must be integers

8 A100s

Willing to give you 8 A100s for this, lmk through email

I'm doing an image generation experiment, but my script outputs a JSON file. How do I train a Transformer model to generate a pixel representation of an image?

I'm doing an experiment with image generation, but my script outputs a JSON file. How can I train a transformer model on it?

import cv2
import json
import numpy as np
import os
from PIL import Image

def image_to_text(image_path, text_path):
    # Read the image and convert to grayscale
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    # Apply thresholding to get a binary image
    _, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    
    # Find contours
    contours, _ = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    contours_list = [contour.flatten().tolist() for contour in contours]  # Flatten and convert to list

    # Convert image to list of pixel values
    pixels = image.flatten().tolist()

    # Save mode and size
    mode = image.shape[2] if len(image.shape) == 3 else 1  # channel count: 3 if color, 1 if grayscale
    size = image.shape[:2]

    # Write pixel data and contour information to text file
    with open(text_path, 'w') as text_file:
        json.dump({'mode': mode, 'size': size, 'pixels': pixels, 'contours': contours_list}, text_file)

def text_to_image(text_path, output_image_path):
    # Read pixel data and contour information from text file
    with open(text_path, 'r') as text_file:
        data = json.load(text_file)
        mode = data['mode']
        size = tuple(data['size'])
        pixels = data['pixels']
        contours_list = data['contours']

    # Reconstruct the image from the pixel information
    image_array = np.array(pixels, dtype=np.uint8)
    if mode == 1:
        image_array = image_array.reshape(size[0], size[1])  # Grayscale
    else:
        image_array = image_array.reshape(size[0], size[1], mode)  # Color

    img = Image.fromarray(image_array)
    img.save(output_image_path)

    # Reconstruct the image contours
    contours = [np.array(contour).reshape(-1, 1, 2) for contour in contours_list]  # Reshape to contour format
    img_contours = cv2.imread(output_image_path)
    cv2.drawContours(img_contours, contours, -1, (0, 255, 0), 2)
    cv2.imwrite(output_image_path, img_contours)


def batch_process(input_folder, output_folder_text, output_folder_images):
    # Ensure the output folders exist
    if not os.path.exists(output_folder_text):
        os.makedirs(output_folder_text)
    if not os.path.exists(output_folder_images):
        os.makedirs(output_folder_images)
    
    # Iterate over all image files in the folder
    for filename in os.listdir(input_folder):
        if filename.lower().endswith(('.jpg', '.png', '.jpeg')):  # handle common image formats
            print(f"Processing {filename}...")
            image_path = os.path.join(input_folder, filename)
            base_filename = os.path.splitext(filename)[0]
            text_path = os.path.join(output_folder_text, base_filename + '.txt')
            output_image_path = os.path.join(output_folder_images, filename)
            
            # Image to text
            image_to_text(image_path, text_path)
            
            # Text to image
            text_to_image(text_path, output_image_path)

# Usage example
input_folder = 'D:/llama2.c-master/1/images'
output_folder_text = 'D:/llama2.c-master/1/text'
output_folder_images = 'D:/llama2.c-master/1/imagesout'

batch_process(input_folder, output_folder_text, output_folder_images)

@lucidrains Can you help me?
