

Omost

Omost is a project to convert an LLM's coding capability into image generation (or, more accurately, image composition) capability.

The name Omost (pronounced: almost) has two meanings: 1) every time you use Omost, your image is almost there; 2) the O stands for "omni" (multi-modal) and "most" means we want to get the most out of it.

Omost provides LLMs that write code to compose image visual content on Omost's virtual Canvas agent. This Canvas can then be rendered by specific implementations of image generators to actually generate images.

Currently, we provide 3 pretrained LLM models based on variations of Llama3 and Phi3 (see also the model notes at the end of this page).

All models are trained with mixed data of (1) ground-truth annotations from several datasets, including Open Images, (2) data extracted by automatically annotating images, (3) reinforcement from DPO (Direct Preference Optimization, using "whether the code can be compiled by Python 3.10 or not" as a direct preference), and (4) a small amount of tuning data from OpenAI GPT-4o's multi-modal capability.

Get Started

You can just use the official HuggingFace space.

Or, you can deploy locally with the steps below (requires an Nvidia GPU with 8GB VRAM):

git clone https://github.com/lllyasviel/Omost.git
cd Omost
conda create -n omost python=3.10
conda activate omost
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
python gradio_app.py

(Note that the quantized LLMs require bitsandbytes - some 9XX, 10XX, or 20XX GPUs may have trouble running it. If that happens, just use our official HuggingFace space.)

Screenshots

(All with random seed 12345)

a ragged man wearing a tattered jacket in the nineteenth century:

image

image

the greatest escape after the apocalypse, the hope of all humanity:

image

image

jurassic dinosaur battle:

image

image

the supreme ruler with endless magic on a gothic throne, with undead armies kneeling:

image

image

a funny cartoon batman fights joker:

image

image

Example Transcripts (with Conversational Editing)

(Below is a raw transcript from the default model, with random seed 123456.)

(If GitHub's code block rendering makes the long lines hard to read, click here to read the raw text.)

User

generate an image of the fierce battle of warriors and the dragon

Assistant

# Initialize the canvas
canvas = Canvas()

# Set a global description for the canvas
canvas.set_global_description(
    description='A fierce battle between warriors and a dragon.',
    detailed_descriptions=[
        'In this intense scene, a group of fierce warriors is engaged in an epic battle with a mighty dragon.',
        'The warriors, clad in armor and wielding swords and shields, are positioned on the left side of the image.',
        'Their expressions are determined and focused, reflecting their resolve to defeat the dragon.',
        'The dragon, with its massive wings spread wide and its fiery breath illuminating the scene, dominates the center of the image.',
        'Its scales glisten with a metallic sheen, and its eyes burn with a fierce intensity.',
        'The background is a dark, stormy sky with flashes of lightning, adding to the drama and tension of the battle.',
        'The ground is littered with debris and the remnants of previous battles, enhancing the sense of a long and brutal conflict.',
        'The overall atmosphere is one of chaos and intensity, with the warriors and the dragon locked in a fierce struggle for survival.',
    ],
    tags='battle, warriors, dragon, fierce, armor, swords, shields, determined, focused, epic, intense, metallic, glistening, fiery breath, stormy sky, lightning, debris, conflict, struggle, survival, chaos, tension, drama, wings, scales, eyes, burning, illuminated',
    HTML_web_color_name='darkslategray',
)

# Add a group of fierce warriors.
canvas.add_local_description(
    location='on the left',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=5.0,
    description='A group of fierce warriors.',
    detailed_descriptions=[
        'The warriors, clad in gleaming armor, are positioned on the left side of the image.',
        'They are armed with swords, shields, and spears, ready for battle.',
        'Their faces are set with determination and focus, reflecting their resolve to defeat the dragon.',
        'Some warriors are in mid-action, swinging their swords or shields, while others stand ready to strike.',
        'The armor they wear is intricately designed, with symbols and patterns that signify their rank and bravery.',
        'The ground beneath them is uneven and littered with debris, adding to the sense of a chaotic and intense battle.',
    ],
    tags='warriors, armor, swords, shields, spears, determined, focused, mid-action, intricate design, symbols, patterns, rank, bravery, uneven ground, debris, chaotic, intense, battle',
    atmosphere='Determined and focused, ready for the fierce battle.',
    style='Highly detailed and dynamic, capturing the intensity of the warriors.',
    quality_meta='High resolution with intricate details and dynamic poses.',
    HTML_web_color_name='darkgoldenrod',
)

# Add a mighty dragon.
canvas.add_local_description(
    location='in the center',
    offset='no offset',
    area='a large square area',
    distance_to_viewer=7.0,
    description='A mighty dragon.',
    detailed_descriptions=[
        'The dragon is a massive creature, dominating the center of the image with its wide-spread wings and fiery breath.',
        'Its scales glisten with a metallic sheen, reflecting the light from its fiery breath.',
        "The dragon's eyes burn with a fierce intensity, and its teeth are sharp and menacing.",
        'The wings of the dragon are powerful and spread wide, casting shadows over the battlefield.',
        'The dragon’s body is muscular and covered in protective scales, with a long, sinewy tail that adds to its formidable appearance.',
        'The fiery breath of the dragon illuminates the scene, casting a reddish glow over the warriors and the battlefield.',
    ],
    tags='dragon, massive, wings, fiery breath, glistening scales, metallic sheen, fierce eyes, sharp teeth, powerful wings, shadows, battlefield, muscular body, protective scales, sinewy tail, formidable, reddish glow, illumination',
    atmosphere='Intense and menacing, with a powerful presence.',
    style='Epic and dramatic, emphasizing the grandeur and danger of the dragon.',
    quality_meta='High resolution with dramatic lighting and detailed textures.',
    HTML_web_color_name='firebrick',
)

# Add a stormy sky with flashes of lightning.
canvas.add_local_description(
    location='on the top',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=10.0,
    description='A stormy sky with flashes of lightning.',
    detailed_descriptions=[
        'The background of the image is a dark, stormy sky filled with swirling clouds and flashes of lightning.',
        'The sky is turbulent, with clouds dark and foreboding, adding to the dramatic tension of the battle.',
        'The lightning flashes illuminate the scene, casting sharp, brief lights over the warriors and the dragon.',
        'The stormy sky creates a sense of chaos and unpredictability, heightening the intensity of the battle below.',
        'The overall atmosphere is one of impending doom and relentless conflict, with the storm mirroring the fierce struggle between the warriors and the dragon.',
    ],
    tags='stormy sky, dark clouds, lightning, turbulent, foreboding, dramatic tension, illumination, chaos, unpredictability, intensity, impending doom, relentless conflict, battle, warriors, dragon, swirling clouds, sharp lights, brief lights',
    atmosphere='Chaotic and intense, mirroring the fierce battle below.',
    style='Dramatic and turbulent, emphasizing the conflict and tension.',
    quality_meta='High resolution with dynamic lighting and detailed cloud textures.',
    HTML_web_color_name='midnightblue',
)

# Add a debris-covered battlefield.
canvas.add_local_description(
    location='on the bottom',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=5.0,
    description='A debris-covered battlefield.',
    detailed_descriptions=[
        'The ground of the battlefield is littered with debris, remnants of previous battles.',
        'Broken weapons, shattered shields, and scattered armor pieces are strewn across the battlefield.',
        'The terrain is uneven, with patches of mud and dirt, adding to the sense of a long and brutal conflict.',
        'The debris-covered battlefield enhances the chaotic and intense atmosphere of the scene, reflecting the ferocity and duration of the battle.',
        'The overall appearance is one of destruction and turmoil, with the remnants of previous battles serving as a grim reminder of the ongoing struggle.',
    ],
    tags='battlefield, debris, broken weapons, shattered shields, scattered armor, uneven terrain, mud, dirt, brutal conflict, chaos, intensity, destruction, turmoil, previous battles, ongoing struggle, remnants, ferocity, duration',
    atmosphere='Chaotic and intense, reflecting the ferocity of the battle.',
    style='Detailed and realistic, emphasizing the destruction and chaos.',
    quality_meta='High resolution with detailed textures and realistic debris.',
    HTML_web_color_name='darkolivegreen',
)

User

change the dragon to a dinosaur

Assistant

# Initialize the canvas
canvas = Canvas()

# Set a global description for the canvas
canvas.set_global_description(
    description='A fierce battle between warriors and a dinosaur.',
    detailed_descriptions=[
        'In this intense scene, a group of fierce warriors is engaged in an epic battle with a mighty dinosaur.',
        'The warriors, clad in armor and wielding swords and shields, are positioned on the left side of the image.',
        'Their expressions are determined and focused, reflecting their resolve to defeat the dinosaur.',
        'The dinosaur, with its massive body and powerful legs, dominates the center of the image.',
        'Its scales glisten with a metallic sheen, and its eyes burn with a fierce intensity.',
        'The background is a dark, stormy sky with flashes of lightning, adding to the drama and tension of the battle.',
        'The ground is littered with debris and the remnants of previous battles, enhancing the sense of a long and brutal conflict.',
        'The overall atmosphere is one of chaos and intensity, with the warriors and the dinosaur locked in a fierce struggle for survival.',
    ],
    tags='battle, warriors, dinosaur, fierce, armor, swords, shields, determined, focused, epic, intense, metallic, glistening, stormy sky, lightning, debris, conflict, struggle, survival, chaos, tension, drama, powerful legs, scales, eyes, burning, illuminated',
    HTML_web_color_name='darkslategray',
)

# Add a group of fierce warriors.
canvas.add_local_description(
    location='on the left',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=5.0,
    description='A group of fierce warriors.',
    detailed_descriptions=[
        'The warriors, clad in gleaming armor, are positioned on the left side of the image.',
        'They are armed with swords, shields, and spears, ready for battle.',
        'Their faces are set with determination and focus, reflecting their resolve to defeat the dinosaur.',
        'Some warriors are in mid-action, swinging their swords or shields, while others stand ready to strike.',
        'The armor they wear is intricately designed, with symbols and patterns that signify their rank and bravery.',
        'The ground beneath them is uneven and littered with debris, adding to the sense of a chaotic and intense battle.',
    ],
    tags='warriors, armor, swords, shields, spears, determined, focused, mid-action, intricate design, symbols, patterns, rank, bravery, uneven ground, debris, chaotic, intense, battle',
    atmosphere='Determined and focused, ready for the fierce battle.',
    style='Highly detailed and dynamic, capturing the intensity of the warriors.',
    quality_meta='High resolution with intricate details and dynamic poses.',
    HTML_web_color_name='darkgoldenrod',
)

# Add a mighty dinosaur.
canvas.add_local_description(
    location='in the center',
    offset='no offset',
    area='a large square area',
    distance_to_viewer=7.0,
    description='A mighty dinosaur.',
    detailed_descriptions=[
        'The dinosaur is a massive creature, dominating the center of the image with its powerful legs and fierce roar.',
        'Its scales glisten with a metallic sheen, reflecting the light from the stormy sky.',
        "The dinosaur's eyes burn with a fierce intensity, and its teeth are sharp and menacing.",
        'The powerful legs of the dinosaur are muscular and add to its formidable appearance.',
        'The ground shakes beneath its feet, enhancing the sense of a long and brutal conflict.',
        'The overall appearance is one of destruction and turmoil, with the dinosaur adding to the chaotic atmosphere.',
    ],
    tags='dinosaur, massive, powerful legs, fierce roar, glistening scales, metallic sheen, fierce eyes, sharp teeth, muscular legs, formidable, ground shaking, destruction, turmoil, chaos, battle',
    atmosphere='Intense and menacing, with a powerful presence.',
    style='Epic and dramatic, emphasizing the grandeur and danger of the dinosaur.',
    quality_meta='High resolution with dramatic lighting and detailed textures.',
    HTML_web_color_name='firebrick',
)

# Add a stormy sky with flashes of lightning.
canvas.add_local_description(
    location='on the top',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=10.0,
    description='A stormy sky with flashes of lightning.',
    detailed_descriptions=[
        'The background of the image is a dark, stormy sky filled with swirling clouds and flashes of lightning.',
        'The sky is turbulent, with clouds dark and foreboding, adding to the dramatic tension of the battle.',
        'The lightning flashes illuminate the scene, casting sharp, brief lights over the warriors and the dinosaur.',
        'The stormy sky creates a sense of chaos and unpredictability, heightening the intensity of the battle below.',
        'The overall atmosphere is one of impending doom and relentless conflict, with the storm mirroring the fierce struggle between the warriors and the dinosaur.',
    ],
    tags='stormy sky, dark clouds, lightning, turbulent, foreboding, dramatic tension, illumination, chaos, unpredictability, intensity, impending doom, relentless conflict, battle, warriors, dinosaur, swirling clouds, sharp lights, brief lights',
    atmosphere='Chaotic and intense, mirroring the fierce battle below.',
    style='Dramatic and turbulent, emphasizing the conflict and tension.',
    quality_meta='High resolution with dynamic lighting and detailed cloud textures.',
    HTML_web_color_name='midnightblue',
)

# Add a debris-covered battlefield.
canvas.add_local_description(
    location='on the bottom',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=5.0,
    description='A debris-covered battlefield.',
    detailed_descriptions=[
        'The ground of the battlefield is littered with debris, remnants of previous battles.',
        'Broken weapons, shattered shields, and scattered armor pieces are strewn across the battlefield.',
        'The terrain is uneven, with patches of mud and dirt, adding to the sense of a long and brutal conflict.',
        'The debris-covered battlefield enhances the chaotic and intense atmosphere of the scene, reflecting the ferocity and duration of the battle.',
        'The overall appearance is one of destruction and turmoil, with the remnants of previous battles serving as a grim reminder of the ongoing struggle.',
    ],
    tags='battlefield, debris, broken weapons, shattered shields, scattered armor, uneven terrain, mud, dirt, brutal conflict, chaos, intensity, destruction, turmoil, previous battles, ongoing struggle, remnants, ferocity, duration',
    atmosphere='Chaotic and intense, reflecting the ferocity of the battle.',
    style='Detailed and realistic, emphasizing the destruction and chaos.',
    quality_meta='High resolution with detailed textures and realistic debris.',
    HTML_web_color_name='darkolivegreen',
)

(The user can click the render button to actually render an image from any of those Canvas instances.)

image

image

image

Symbols

All Omost LLMs are trained to obey the following symbols

class Canvas:
    def set_global_description(
            self, 
            description: str, 
            detailed_descriptions: list[str], 
            tags: str, 
            HTML_web_color_name: str
    ):
        pass

    def add_local_description(
            self, 
            location: str, 
            offset: str, 
            area: str, 
            distance_to_viewer: float, 
            description: str, 
            detailed_descriptions: list[str], 
            tags: str, 
            atmosphere: str, 
            style: str, 
            quality_meta: str, 
            HTML_web_color_name: str
    ):
        assert location in [
            "in the center", 
            "on the left", 
            "on the right", 
            "on the top", 
            "on the bottom", 
            "on the top-left", 
            "on the top-right", 
            "on the bottom-left", 
            "on the bottom-right"
        ]
        assert offset in [
            "no offset", 
            "slightly to the left", 
            "slightly to the right", 
            "slightly to the upper", 
            "slightly to the lower", 
            "slightly to the upper-left", 
            "slightly to the upper-right", 
            "slightly to the lower-left", 
            "slightly to the lower-right"
        ]
        assert area in [
            "a small square area", 
            "a small vertical area", 
            "a small horizontal area", 
            "a medium-sized square area", 
            "a medium-sized vertical area", 
            "a medium-sized horizontal area", 
            "a large square area", 
            "a large vertical area", 
            "a large horizontal area"
        ]
        assert distance_to_viewer > 0
        pass

During training, the above symbols are associated with specific concepts and use cases related to image generation.

The design goal is to make this code easy for LLMs to learn, and also easy for diffusion models to handle.

Let's break down each part:

Function: Canvas.set_global_description and Canvas.add_local_description

Both functions attach descriptions to images. The parameters have the same meanings in both, with add_local_description having more fields than set_global_description.

set_global_description annotates the entire image, while add_local_description annotates a part of the image.

Parameter: description and detailed_descriptions

Let us introduce a concept called "sub-prompt". If a prompt is less than 75 tokens and is self-contained, describing one thing without relying on other prompts, we call it a "sub-prompt".

The description is a sub-prompt, and the detailed_descriptions is a list of sub-prompts.

Note that each sub-prompt is strictly less than 75 tokens (and typically less than 40 tokens), so you can safely encode them with any CLIP text encoder without worrying about the truncation position affecting the semantics.

The sub-prompt design also allows for better text encoding based on greedy merging. For example, if you have

sub-prompt A: 25 tokens
sub-prompt B: 35 tokens
sub-prompt C: 5 tokens
sub-prompt D: 60 tokens
sub-prompt E: 15 tokens
sub-prompt F: 25 tokens

and since every sub-prompt is promised to be self-contained and to describe one thing independently, we can use a greedy method to merge them into bags like

bag 1 {A, B, C} : 65 tokens
bag 2 {D} : 60 tokens
bag 3 {E, F} : 40 tokens

where each bag is less than 75 tokens and can be encoded by any CLIP in one pass (and the results then concatenated).

Encoding texts in this way ensures that the text encoder never makes semantic truncation mistakes.

One may ask: if all sub-prompts are less than 75 tokens with independent semantics, why not just encode them without merging and then concatenate? This is mainly because we want the text embedding to be more coherent. For example, let's say sub-prompt A is "a man" while sub-prompt B is "handsome, professional"; merging them before encoding gives a more mixed text embedding with the coherent features of a handsome, professional man.

All Omost LLMs are trained to give strictly well-defined sub-prompts. You can make use of these definitions to design lossless text encoding methods.
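Below is a minimal sketch of such a greedy merge, assuming a hypothetical count_tokens helper (for example, the length of the CLIP tokenizer's output for a string); it illustrates the idea and is not the exact implementation in this repo.

def merge_sub_prompts(sub_prompts, count_tokens, limit=75):
    # Greedily pack self-contained sub-prompts into bags of strictly less than `limit` tokens.
    # `count_tokens` is a hypothetical helper, e.g. the CLIP tokenizer's token count for a string.
    bags, current, current_len = [], [], 0
    for sub_prompt in sub_prompts:
        n = count_tokens(sub_prompt)
        if current and current_len + n >= limit:
            bags.append(' '.join(current))
            current, current_len = [], 0
        current.append(sub_prompt)
        current_len += n
    if current:
        bags.append(' '.join(current))
    return bags  # each bag can be encoded by CLIP in one pass, then concatenated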

Parameter: location, offset, area

These three parameters define a bounding box. Note that they must obey

assert location in [
    "in the center", 
    "on the left", 
    "on the right", 
    "on the top", 
    "on the bottom", 
    "on the top-left", 
    "on the top-right", 
    "on the bottom-left", 
    "on the bottom-right"
]
assert offset in [
    "no offset", 
    "slightly to the left", 
    "slightly to the right", 
    "slightly to the upper", 
    "slightly to the lower", 
    "slightly to the upper-left", 
    "slightly to the upper-right", 
    "slightly to the lower-left", 
    "slightly to the lower-right"
]
assert area in [
    "a small square area", 
    "a small vertical area", 
    "a small horizontal area", 
    "a medium-sized square area", 
    "a medium-sized vertical area", 
    "a medium-sized horizontal area", 
    "a large square area", 
    "a large vertical area", 
    "a large horizontal area"
]

First we divide a canvas into 3*3=9 locations:

image

Then we further divide each location into 3*3 offsets, resulting in 9*9=81 positions:

image

Using these positions as centers, we further define 9 types of bounding boxes:

image

We can see that this method allows 9*9*9=729 different bounding boxes, covering almost all common possible locations of an object in the image.
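As an illustration of how these phrases can be turned into a normalized bounding box (the cell centers, offset sizes, and box extents below are assumptions, not the repo's exact geometry):

LOCATIONS = {  # approximate centers of the 3*3 grid, in normalized canvas coordinates
    'in the center': (0.5, 0.5), 'on the left': (0.17, 0.5), 'on the right': (0.83, 0.5),
    'on the top': (0.5, 0.17), 'on the bottom': (0.5, 0.83),
    'on the top-left': (0.17, 0.17), 'on the top-right': (0.83, 0.17),
    'on the bottom-left': (0.17, 0.83), 'on the bottom-right': (0.83, 0.83),
}

OFFSETS = {  # small shifts around a location, purely illustrative
    'no offset': (0.0, 0.0),
    'slightly to the left': (-0.1, 0.0), 'slightly to the right': (0.1, 0.0),
    'slightly to the upper': (0.0, -0.1), 'slightly to the lower': (0.0, 0.1),
    'slightly to the upper-left': (-0.1, -0.1), 'slightly to the upper-right': (0.1, -0.1),
    'slightly to the lower-left': (-0.1, 0.1), 'slightly to the lower-right': (0.1, 0.1),
}

AREAS = {  # (width, height) in normalized canvas units, purely illustrative
    'a small square area': (0.25, 0.25), 'a small vertical area': (0.2, 0.35),
    'a small horizontal area': (0.35, 0.2),
    'a medium-sized square area': (0.4, 0.4), 'a medium-sized vertical area': (0.3, 0.55),
    'a medium-sized horizontal area': (0.55, 0.3),
    'a large square area': (0.6, 0.6), 'a large vertical area': (0.45, 0.8),
    'a large horizontal area': (0.8, 0.45),
}

def to_bounding_box(location, offset, area):
    # Returns (x0, y0, x1, y1) in [0, 1], clipped to the canvas.
    cx, cy = LOCATIONS[location]
    dx, dy = OFFSETS[offset]
    w, h = AREAS[area]
    cx, cy = cx + dx, cy + dy
    x0, y0 = max(0.0, cx - w / 2), max(0.0, cy - h / 2)
    x1, y1 = min(1.0, cx + w / 2), min(1.0, cy + h / 2)
    return x0, y0, x1, y1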

One may ask why this is necessary - why not just let the LLMs learn pixel indices or x, y coordinates, which should be much more accurate? Below are several of my notes:

  1. I have tried several representations, including pixel indices like {x=32, y=16, w=58, h=99}, margin pixels like {left=32, right=15, top=27, bottom=33}, percentage pixel indices like {x=0.124, y=0.65, w=0.335, h=0.251}, and percentage margins like {left=0.251, right=0.154, top=0.254, bottom=0.441}. The result is that open-source LLMs are really not very good at learning these representations, even Llama3 (perhaps GPT-4o can learn them). Sometimes it works, sometimes it gives completely random numbers. Note that our problem is very different from MLLMs: vision-LLMs usually have image embeddings as inputs, and in that case estimating a numeric position is like a look-up-table problem that can somewhat be learned, but in our case the LLM needs to generate every composition from scratch without the help of any image embedding to look up.
  2. But natural language like "on the right", "slightly to the top-right", "a small vertical area", etc. works very well. The model converges very fast and the learning is stable. It aligns with the pretrained knowledge of LLMs very well.
  3. I have also tried adding some special tokens to represent spatial locations and training the embedding layers. But that model is very difficult to train and debug. Also, the token-embedding-based method needs a lot of hyperparameter tuning every time we change the LLM - for example, when changing from Llama3 to Phi, if we use the token-embedding method, we need to design the training parameters again.
  4. The number 9*9*9=729 is not really a small number from the perspective of bounding box proposals. These can also be called ROIs (regions of interest), and some older semantic segmentation approaches use an RPN (Region Proposal Network) to produce a similar number (<1000) of regions.
  5. Most region-guided diffusion methods are coarse-level methods (like multi-diffusion, attention couple, and GLIGEN), and they do not need pixel-perfect regions.
  6. These are very personal observations - if you are working on similar multi-modal LLM research, using pixel indices is completely okay and worth trying, and other training methods can probably also achieve a robust system.

Parameter: distance_to_viewer and HTML_web_color_name

The distance_to_viewer can be viewed as relative depth. Note that its absolute value is not reliable at all (because open-source LLMs are not very good at producing image-space numbers) and it should only be used for sorting elements into background-to-foreground layers.

You can always use distance_to_viewer to sort all local elements before rendering them with a diffusion model. The global description can always be viewed as the farthest-away background layer.
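For example, assuming the local descriptions have been collected into a list of dicts (a hypothetical structure, not the repo's internal representation), the back-to-front rendering order is simply:

# Render the farthest elements first; the global description is the farthest layer of all.
render_order = sorted(local_descriptions, key=lambda d: d['distance_to_viewer'], reverse=True)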

The HTML_web_color_name is one of these:

possible_HTML_web_color_names = {  # r, g, b
    'aliceblue': (240, 248, 255), 'antiquewhite': (250, 235, 215), 'aqua': (0, 255, 255),
    'aquamarine': (127, 255, 212), 'azure': (240, 255, 255), 'beige': (245, 245, 220),
    'bisque': (255, 228, 196), 'black': (0, 0, 0), 'blanchedalmond': (255, 235, 205), 'blue': (0, 0, 255),
    'blueviolet': (138, 43, 226), 'brown': (165, 42, 42), 'burlywood': (222, 184, 135),
    'cadetblue': (95, 158, 160), 'chartreuse': (127, 255, 0), 'chocolate': (210, 105, 30),
    'coral': (255, 127, 80), 'cornflowerblue': (100, 149, 237), 'cornsilk': (255, 248, 220),
    'crimson': (220, 20, 60), 'cyan': (0, 255, 255), 'darkblue': (0, 0, 139), 'darkcyan': (0, 139, 139),
    'darkgoldenrod': (184, 134, 11), 'darkgray': (169, 169, 169), 'darkgrey': (169, 169, 169),
    'darkgreen': (0, 100, 0), 'darkkhaki': (189, 183, 107), 'darkmagenta': (139, 0, 139),
    'darkolivegreen': (85, 107, 47), 'darkorange': (255, 140, 0), 'darkorchid': (153, 50, 204),
    'darkred': (139, 0, 0), 'darksalmon': (233, 150, 122), 'darkseagreen': (143, 188, 143),
    'darkslateblue': (72, 61, 139), 'darkslategray': (47, 79, 79), 'darkslategrey': (47, 79, 79),
    'darkturquoise': (0, 206, 209), 'darkviolet': (148, 0, 211), 'deeppink': (255, 20, 147),
    'deepskyblue': (0, 191, 255), 'dimgray': (105, 105, 105), 'dimgrey': (105, 105, 105),
    'dodgerblue': (30, 144, 255), 'firebrick': (178, 34, 34), 'floralwhite': (255, 250, 240),
    'forestgreen': (34, 139, 34), 'fuchsia': (255, 0, 255), 'gainsboro': (220, 220, 220),
    'ghostwhite': (248, 248, 255), 'gold': (255, 215, 0), 'goldenrod': (218, 165, 32),
    'gray': (128, 128, 128), 'grey': (128, 128, 128), 'green': (0, 128, 0), 'greenyellow': (173, 255, 47),
    'honeydew': (240, 255, 240), 'hotpink': (255, 105, 180), 'indianred': (205, 92, 92),
    'indigo': (75, 0, 130), 'ivory': (255, 255, 240), 'khaki': (240, 230, 140), 'lavender': (230, 230, 250),
    'lavenderblush': (255, 240, 245), 'lawngreen': (124, 252, 0), 'lemonchiffon': (255, 250, 205),
    'lightblue': (173, 216, 230), 'lightcoral': (240, 128, 128), 'lightcyan': (224, 255, 255),
    'lightgoldenrodyellow': (250, 250, 210), 'lightgray': (211, 211, 211), 'lightgrey': (211, 211, 211),
    'lightgreen': (144, 238, 144), 'lightpink': (255, 182, 193), 'lightsalmon': (255, 160, 122),
    'lightseagreen': (32, 178, 170), 'lightskyblue': (135, 206, 250), 'lightslategray': (119, 136, 153),
    'lightslategrey': (119, 136, 153), 'lightsteelblue': (176, 196, 222), 'lightyellow': (255, 255, 224),
    'lime': (0, 255, 0), 'limegreen': (50, 205, 50), 'linen': (250, 240, 230), 'magenta': (255, 0, 255),
    'maroon': (128, 0, 0), 'mediumaquamarine': (102, 205, 170), 'mediumblue': (0, 0, 205),
    'mediumorchid': (186, 85, 211), 'mediumpurple': (147, 112, 219), 'mediumseagreen': (60, 179, 113),
    'mediumslateblue': (123, 104, 238), 'mediumspringgreen': (0, 250, 154),
    'mediumturquoise': (72, 209, 204), 'mediumvioletred': (199, 21, 133), 'midnightblue': (25, 25, 112),
    'mintcream': (245, 255, 250), 'mistyrose': (255, 228, 225), 'moccasin': (255, 228, 181),
    'navajowhite': (255, 222, 173), 'navy': (0, 0, 128), 'navyblue': (0, 0, 128),
    'oldlace': (253, 245, 230), 'olive': (128, 128, 0), 'olivedrab': (107, 142, 35),
    'orange': (255, 165, 0), 'orangered': (255, 69, 0), 'orchid': (218, 112, 214),
    'palegoldenrod': (238, 232, 170), 'palegreen': (152, 251, 152), 'paleturquoise': (175, 238, 238),
    'palevioletred': (219, 112, 147), 'papayawhip': (255, 239, 213), 'peachpuff': (255, 218, 185),
    'peru': (205, 133, 63), 'pink': (255, 192, 203), 'plum': (221, 160, 221), 'powderblue': (176, 224, 230),
    'purple': (128, 0, 128), 'rebeccapurple': (102, 51, 153), 'red': (255, 0, 0),
    'rosybrown': (188, 143, 143), 'royalblue': (65, 105, 225), 'saddlebrown': (139, 69, 19),
    'salmon': (250, 128, 114), 'sandybrown': (244, 164, 96), 'seagreen': (46, 139, 87),
    'seashell': (255, 245, 238), 'sienna': (160, 82, 45), 'silver': (192, 192, 192),
    'skyblue': (135, 206, 235), 'slateblue': (106, 90, 205), 'slategray': (112, 128, 144),
    'slategrey': (112, 128, 144), 'snow': (255, 250, 250), 'springgreen': (0, 255, 127),
    'steelblue': (70, 130, 180), 'tan': (210, 180, 140), 'teal': (0, 128, 128), 'thistle': (216, 191, 216),
    'tomato': (255, 99, 71), 'turquoise': (64, 224, 208), 'violet': (238, 130, 238),
    'wheat': (245, 222, 179), 'white': (255, 255, 255), 'whitesmoke': (245, 245, 245),
    'yellow': (255, 255, 0), 'yellowgreen': (154, 205, 50)
}

By combining distance_to_viewer and HTML_web_color_name, you can draw a very coarse image of the composition. For example, if the LLM works well, "a green bottle in front of a red bottle on a wood table in a dark room" should make it possible for you to compute an image like:

image

You can use this image as an initial latent and use denoise strength like 0.95 to 0.99 to generate the image.

Or, if you do not like this and still prefer to let diffusion models generate from a zero-mean latent (even when you know that most diffusion models have tsnr problems), you can ignore this image or just use it as a debugger.
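As a minimal sketch (not the repo's renderer), such a coarse composition image can be computed by painting each element's box, farthest first, using the color table above; the element dicts and the normalized 'bbox' field here are assumptions for illustration (for example produced by a helper like to_bounding_box above):

import numpy as np

def coarse_composition(global_color, elements, height=1024, width=1024):
    # elements: list of dicts with 'distance_to_viewer', 'HTML_web_color_name',
    # and a normalized 'bbox' = (x0, y0, x1, y1) - a hypothetical structure.
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    canvas[:, :] = possible_HTML_web_color_names[global_color]
    # Paint the farthest elements first so that nearer elements overwrite them.
    for e in sorted(elements, key=lambda e: e['distance_to_viewer'], reverse=True):
        x0, y0, x1, y1 = e['bbox']
        canvas[int(y0 * height):int(y1 * height), int(x0 * width):int(x1 * width)] = \
            possible_HTML_web_color_names[e['HTML_web_color_name']]
    return canvas  # can be VAE-encoded as an initial latent, or just inspected as a debugger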

Besides, the layer sorting can also be useful in some very special attention formulation - we will discuss this later.

Parameter: tags and atmosphere and style and quality_meta

The tags field is designed as a possible replacement for the description, since many diffusion models prefer tags. If used with anime models, one may hard-code some logic to replace every "girl" with "1girl". If used with Pony, then probably always hard-code adding "score_9, score_8 ..." to this field.
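For example, a hypothetical post-processing hook for such model-specific tag conventions might look like this (the flavor names and prefixes are assumptions, not part of Omost):

def adapt_tags(tags: str, flavor: str) -> str:
    # Hypothetical helper: adjust Omost tags for model-specific prompt conventions.
    if flavor == 'anime':
        tags = tags.replace('girl', '1girl')
    elif flavor == 'pony':
        tags = 'score_9, score_8_up, score_7_up, ' + tags
    return tags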

The atmosphere, style, and quality_meta are experimental parameters without very specific use cases. Currently we can just treat them as sub-prompts and include them in the greedy merging of sub-prompt bags. In my experiments this improves the atmosphere and quality a bit.

A Baseline Renderer

In this repo, we provide a baseline renderer for Omost LLMs based on attention manipulation.

Regional Prompter

As of 2024, if we want to achieve a region-guided diffusion system, some possible options are:

  1. multi-diffusion / mixture-of-diffusers: these methods run the UNet on different locations and then merge the estimated epsilon or x0 using weights or masks for the different regions.
  2. attention decomposition: let's say attention is like y=softmax(q@k)@v; then one can achieve attention decomposition like y=mask_A * softmax(q@k_A)@v_A + mask_B * softmax(q@k_B)@v_B, where mask_A, k_A, v_A are the mask, k, v for region A, and mask_B, k_B, v_B are the mask, k, v for region B. This method usually yields slightly better image quality than (1), and some people call it Attention Couple or Region Prompter Attention Mode. But this method has a caveat: the mask only makes regional attention numerically possible; it does not force the UNet to really attend to those regions. That is to say, the attention is indeed masked, but there is no promise that the attention softmax will really be activated in the masked area, and there is also no promise that the attention softmax will never be activated outside the masked area.
  3. attention score manipulation: this is a more advanced method compared to (2). It directly manipulates the attention scores to make sure that the activations inside each mask are encouraged and those outside the masks are discouraged. The formulation is like y=softmax(modify(q@k))@v, where modify() is a complicated non-linear function with many normalizations and tricks to change the score distributions. This method goes beyond simple masked attention to really make sure those layers get the wanted activations. A typical example is Dense Diffusion.
  4. gradient optimization: since the attention can tell us which part corresponds to which prompt, we can split prompts into segments and then get the attention activations for each prompt segment. Then we compare those activations with external masks to compute a loss function and backpropagate the gradients. These methods are usually very high quality but VRAM-hungry and very slow. Typical methods are BoxDiff and Attend-and-Excite.
  5. Use external control models like GLIGEN and InstanceDiffusion. Those methods give the highest benchmark performance on region following but will also introduce some style offset to the base model, since they are trained parameters. Also, those methods need to convert prompts to vectors and usually do not support prompts of arbitrary length (but one can use them together with other attention methods to achieve arbitrary length).
  6. Some more possible layer-based options like layerdiffuse and mulan.

In this repo I wrote a baseline formulation based on (3). I consider this parameter-free formulation a very standard baseline implementation that introduces almost zero style offset or quality degradation. In the future we may consider training some parameterized methods for Omost.

Let's consider an extremely simplified image with only 2*2=4 pixels:

image

Then we have three prompts "two cats", "a black cat", "a white cat", and we have their masks:

image

Then we can draw this attention score table:

image

where the upward arrow means that we want to encourage the activation, while the downward arrow means we want to suppress that activation.

This manipulation directly modifies the attention scores and computes all prompt conditions in one single SDP attention pass. (See also the code for more details.)
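As a heavily simplified sketch of the idea (not the actual modify() used in this repo, which involves more normalizations and tricks), one can add a bias to the attention scores so that in-mask query-key pairs are encouraged and out-of-mask pairs are discouraged before a single softmax:

import torch

def region_biased_attention(q, k, v, region_mask, strength=4.0):
    # q: (B, H, Lq, D) queries over image tokens; k, v: (B, H, Lk, D) over text tokens.
    # region_mask: (B, 1, Lq, Lk), 1 where an image token should attend to a text token.
    # A minimal illustration of attention score manipulation, not Omost's exact formulation.
    scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
    bias = strength * (2.0 * region_mask - 1.0)  # +strength inside the mask, -strength outside
    return torch.softmax(scores + bias, dim=-1) @ v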

Prompt Prefix Tree

In this repo, I also included another trick that I find improves prompt understanding a lot. Let's call it the Prompt Prefix Tree. The motivation is that, since all our prompts are now sub-prompts that can be merged arbitrarily (recall that all sub-prompts are strictly less than 75 tokens and typically less than 40 tokens, describe independent concepts, and can be arbitrarily merged into common prompts for CLIP to encode), finding a better method to merge those sub-prompts may improve the results and prompt interpretation.

For example, below is a tree structure of global/local overall/detailed descriptions.

image

The idea is that, since all sub-prompts can be merged arbitrarily, we can use the paths in this tree graph as prompts.

For example, the path below gives the prompt "A cat and a dog. The cat on sofa."

image

Note that we can use this together with greedy sub-prompt bag merging when a path exceeds 75 tokens. And, if a path still has room for more sub-prompts, the greedy sub-prompt bag merging will also take care of it. Again, since all sub-prompts describe independent concepts, the greedy sub-prompt bag merging never makes semantic truncation mistakes. So satisfying!
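A minimal sketch of the idea, assuming a hypothetical nested-dict tree in which each node holds one sub-prompt and children refine their parent (this is an illustration, not the repo's data structure):

def tree_paths(node, prefix=()):
    # Each root-to-leaf path becomes one prompt made of merged sub-prompts.
    prefix = prefix + (node['sub_prompt'],)
    children = node.get('children', [])
    if not children:
        return [' '.join(prefix)]
    paths = []
    for child in children:
        paths.extend(tree_paths(child, prefix))
    return paths

tree = {
    'sub_prompt': 'A cat and a dog.',
    'children': [
        {'sub_prompt': 'The cat on sofa.'},
        {'sub_prompt': 'The dog on floor.'},
    ],
}
print(tree_paths(tree))
# ['A cat and a dog. The cat on sofa.', 'A cat and a dog. The dog on floor.']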

Model Notes

Currently, we provide 3 models (you can get them by adding the prefix https://huggingface.co/lllyasviel/ to the below names):

omost-llama-3-8b
omost-dolphin-2.9-llama3-8b
omost-phi-3-mini-128k

And their quant versions:

omost-llama-3-8b-4bits
omost-dolphin-2.9-llama3-8b-4bits
omost-phi-3-mini-128k-8bits

Some notes:

  1. The recommended quant for omost-llama-3-8b is 4 bits, and for omost-phi-3-mini-128k (3.8B) it is 8 bits. They all fit in 8GB VRAM without offloading. The performance degradation caused by quantization is very minimal and I personally never observed any evidence of degradation. However, quantizing omost-phi-3-mini-128k to 4 bits is not recommended, since I noticed some obvious performance degradation. The 4-bit inference of omost-phi-3-mini-128k should be viewed as a last resort for extreme cases when you really do not have a more capable GPU.
  2. My user study shows that omost-llama-3-8b-4bits > omost-dolphin-2.9-llama3-8b-4bits > omost-phi-3-mini-128k-8bits. So in most cases one should just use omost-llama-3-8b-4bits.
  3. The omost-llama-3-8b and omost-phi-3-mini-128k are trained with filtered safe data without NSFW or inappropriate content. See (4) if you need a different option.
  4. The omost-dolphin-2.9-llama3-8b is trained with all data WITHOUT any filtering. You must apply your own safety alignment methods if you expose any service based on omost-dolphin-2.9-llama3-8b to the public.
  5. Note that the filtering in (3) is not because of any policy - the reason is that I noticed slight instability in the training gradients of those models, since they are pretrained with instruct following regulated by safety alignment, causing the performance to degrade a bit. But the instruct following of omost-dolphin-2.9-llama3-8b is pretrained with community efforts and does not have this problem.
  6. The 128k context length of omost-phi-3-mini-128k cannot be trusted. Its performance will degrade a lot after the tokens reach about 8k. One should just view it as a model with about 8k context length.
  7. A model with 8k context length can do about 5 to 6 rounds of conversational editing. If you are about to run out of token length, use the UI to modify your message and respond again (this can be done an unlimited number of times).
  8. All models are fully trained on our H100 clusters at fp16 precision without any tricks like quantization or Q-LoRA. The optimizer is Adam without any tricks.
  9. You must also follow the licenses of Llama-3 and Phi-3.
  10. You can request us to train on other LLMs if reasonable and necessary.

Cite

@Misc{omost,
  author = {Omost Team},
  title  = {Omost GitHub Page},
  year   = {2024},
}

Related Work

Also read ...

DOCCI: Descriptions of Connected and Contrasting Images

(RPG-DiffusionMaster) Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models and Self-correcting LLM-controlled Diffusion Models

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

sd-webui-regional-prompter

(please open issue or email me if you want to add more links here)


omost's Issues

Training another model

Hello! Thank you for your work!
Could you please train Omost on phi-3-medium, or provide the code and dataset for self-training with Q-LoRA?

Prompt prefix

It would be nice to be able to set a prefix for models like Pony that need "score_9, score_8_up, score_7_up, score_6_up,".

Whenever I run the program I get this error

(Omost) C:\Users\aj214\Desktop\Omost>python gradio_app.py
C:\Users\aj214\Desktop\Omost\lib_omost\pipeline.py:64: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)
Unload to CPU: CLIPTextModel
Unload to CPU: UNet2DConditionModel
Unload to CPU: AutoencoderKL
Unload to CPU: CLIPTextModel
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Traceback (most recent call last):
File "C:\Users\aj214\Desktop\Omost\gradio_app.py", line 75, in
llm_model = AutoModelForCausalLM.from_pretrained(
File "C:\Users\aj214.conda\envs\Omost\lib\site-packages\transformers\models\auto\auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "C:\Users\aj214.conda\envs\Omost\lib\site-packages\transformers\modeling_utils.py", line 3703, in from_pretrained
hf_quantizer.validate_environment(device_map=device_map)
File "C:\Users\aj214.conda\envs\Omost\lib\site-packages\transformers\quantizers\quantizer_bnb_8bit.py", line 86, in validate_environment
raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.

I am using an Nvidia RTX 2050 with 4 GB VRAM and 8 GB of shared GPU memory.

Issues with multi-round chat with the model "omost-phi-3-mini-128k"

Good idea and project!
I tested with "omost-phi-3-mini-128k" and "SG161222/RealVisXL_V4.0"; environment as below:
Windows 11 x64, Python 3.11.9, torch 2.3.0 + cu121, 24GB GPU, and the other recommended requirements.
A single-round prompt works correctly, but on the second round of chat based on your sample, the LLM appends some erroneous generation as below:
2024-06-03_150149

After clicking "New Chat" and replaying only the sample prompt, the error occurs again as below:
2024-06-03_152223

The command window's log is as below:

D:\AITest\Omost\lib_omost\pipeline.py:64: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)
configuration_phi3.py: 10.4kB [00:00, 1.04MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
  • configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_phi3.py: 73.8kB [00:00, 1.76MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
  • modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
flash-attention package not found, consider installing for better performance: No module named 'flash_attn'.
Current flash-attenton does not support window_size. Either upgrade or use attn_implementation='eager'.
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.44s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Running on local URL: http://127.0.0.1:18888

To create a public link, set share=True in launch().
Chat begin: generate an image of the fierce battle of warriors and the dragon
Chat end at 161.23 seconds: generate an image of the fierce battle of warriors and the dragon
100%|████████████████████████████████████████████████████████████████████████████████| 25/25 [01:01<00:00, 2.47s/it]
Chat begin: change the dragon to a dinosaur
Chat end at 361.10 seconds: change the dragon to a dinosaur
Last assistant response is not valid canvas: Response does not contain codes!

Please help me analyze the reasons for the errors described above, thanks a lot.

Is there a technical report coming?

Hi, thanks for your awesome work here. May I know if a technical report or preprint paper will be released? If so, when?

2 gpu error

There are two GPUs (3060 12GB, 3090 24GB), and I get these errors:

-RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

-Last assistant response is not valid canvas: expected string or bytes-like object, got 'NoneType'

Can I specify which GPU to use?

replaced LLM by MoE model

To expand on this: replace the LLM with an MoE model in which each individual small model corresponds to a different art style. Each time an image is drawn, different experts and weights are called to combine different styles, which could not only reduce the performance requirements of the LLM but also provide more varied styles.

Can be used with LoRA

I tried Omost and I think it's one of the best there is today.

Is there a possibility that this can be used in ForgeUI? Or Fooocus?
And can LoRA(s) be added?

It is not powerful enough? COULD NOT GENERATE a "Floating building (Mosque) in the sky"

I tried this prompt:
Generate an image of a majestic mosque in the sky
Then I rendered the image, it was a mosque on the ground (with a river or lake next to it)
I then used this prompt:
make the mosque IN THE SKY floating

Then I asked it to generate the image again, and it was still a mosque on the ground, next to a lake (it changed slightly)

This is part of the generated text:

detailed_descriptions=[
'The central feature of this image is a majestic mosque, floating in the sky.',
'The mosque is an architectural marvel, with towering minarets and intricately designed domes.',
'The walls of the mosque are adorned with ornate patterns and calligraphy, reflecting Islamic artistry.',
'The mosque is bathed in golden sunlight, which enhances its beauty and gives it a divine appearance.',
],

I could not capture this "floating" concept, unfortunately.

Is this project related to other installations, or should it be installed as a standalone?

Hello
I saw the showcase; it shows a UI where prompts are being modified, I suppose.
I would like to know:

  • Do I need to install LLMs from other UIs/programs?
  • Do I need to connect it to other programs, such as Forge?
    Or can I simply follow the instructions from this repo without having to worry about any other installation, and in the end be able to run a new UI that has this Omost LLM feature?

In that case, isn't it better to connect it to forge? I am confused.

8GB Nvidia VRAM,torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.50 GiB. GPU

2070S-8G, screenshot: QQ截图20240602063228
Unload to CPU: CLIPTextModel
Load to GPU: UNet2DConditionModel
100%|██████████| 20/20 [00:19<00:00, 1.05it/s]
Unload to CPU: UNet2DConditionModel
Load to GPU: AutoencoderKL
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\queueing.py", line 528, in process_events
    response = await route_utils.call_process_api(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\blocks.py", line 1908, in process_api
    result = await self.call_function(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\blocks.py", line 1485, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 808, in wrapper
    response = f(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\Omost\Omost\gradio_app.py", line 235, in diffusion_fn
    pixels = vae.decode(latents).sample
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl.py", line 303, in decode
    decoded = self._decode(z, return_dict=False)[0]
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\diffusers\models\autoencoders\autoencoder_kl.py", line 276, in _decode
    dec = self.decoder(z)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\diffusers\models\autoencoders\vae.py", line 337, in forward
    sample = up_block(sample, latent_embeds)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\diffusers\models\unets\unet_2d_blocks.py", line 2750, in forward
    hidden_states = upsampler(hidden_states)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\diffusers\models\upsampling.py", line 180, in forward
    hidden_states = self.conv(hidden_states)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.50 GiB. GPU

Error when starting gradio_app.py

Hello, I tried to start the app on my Windows PC, but it seems my Nvidia GPU is not good enough (RTX 2060 6GB). I saw there was an 8GB requirement, but I thought there would be a way to switch to a smaller model like omost-phi-3-mini-128k, or even omost-llama-3-8b-4bits. However, I don't see a way to use them; is there a command line option to add? And would that solve my issue?

Thank you

(omost) C:\Users\xxx\Documents\Omost>python gradio_app.py
tokenizer/tokenizer_config.json: 100%|██████████| 737/737 [00:00<00:00, 768kB/s]
D:\Anaconda\envs\omost\lib\site-packages\huggingface_hub\file_download.py:157: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\xxx\Documents\Omost\hf_download\hub\models--SG161222--RealVisXL_V4.0. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
tokenizer/vocab.json: 100%|██████████| 1.06M/1.06M [00:00<00:00, 1.32MB/s]
tokenizer/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 988kB/s]
tokenizer/special_tokens_map.json: 100%|██████████| 472/472 [00:00<?, ?B/s]
tokenizer_2/tokenizer_config.json: 100%|██████████| 725/725 [00:00<?, ?B/s]
tokenizer_2/vocab.json: 100%|██████████| 1.06M/1.06M [00:00<00:00, 1.43MB/s]
tokenizer_2/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 927kB/s]
tokenizer_2/special_tokens_map.json: 100%|██████████| 460/460 [00:00<?, ?B/s]
text_encoder/config.json: 100%|██████████| 560/560 [00:00<?, ?B/s]
model.fp16.safetensors: 100%|██████████| 246M/246M [00:30<00:00, 8.04MB/s]
text_encoder_2/config.json: 100%|██████████| 570/570 [00:00<?, ?B/s]
model.fp16.safetensors: 100%|██████████| 1.39G/1.39G [02:35<00:00, 8.95MB/s]
vae/config.json: 100%|██████████| 602/602 [00:00<00:00, 591kB/s]
diffusion_pytorch_model.fp16.safetensors: 100%|██████████| 167M/167M [00:21<00:00, 7.89MB/s]
unet/config.json: 100%|██████████| 1.68k/1.68k [00:00<?, ?B/s]
diffusion_pytorch_model.fp16.safetensors: 100%|██████████| 5.14G/5.14G [10:34<00:00, 8.09MB/s]
C:\Users\xxx\Documents\Omost\lib_omost\pipeline.py:64: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)
Unload to CPU: CLIPTextModel
Unload to CPU: UNet2DConditionModel
Unload to CPU: AutoencoderKL
Unload to CPU: CLIPTextModel
config.json: 100%|██████████| 1.20k/1.20k [00:00<00:00, 1.23MB/s]
D:\Anaconda\envs\omost\lib\site-packages\huggingface_hub\file_download.py:157: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\xxx\Documents\Omost\hf_download\hub\models--lllyasviel--omost-llama-3-8b-4bits. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
model.safetensors.index.json: 100%|██████████| 132k/132k [00:00<00:00, 804kB/s]
model-00001-of-00002.safetensors: 100%|██████████| 4.65G/4.65G [07:54<00:00, 9.81MB/s]
model-00002-of-00002.safetensors: 100%|██████████| 1.05G/1.05G [01:55<00:00, 9.13MB/s]
Downloading shards: 100%|██████████| 2/2 [09:50<00:00, 295.08s/it]
Traceback (most recent call last):
  File "C:\Users\xxx\Documents\Omost\gradio_app.py", line 75, in <module>
    llm_model = AutoModelForCausalLM.from_pretrained(
  File "D:\Anaconda\envs\omost\lib\site-packages\transformers\models\auto\auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "D:\Anaconda\envs\omost\lib\site-packages\transformers\modeling_utils.py", line 3703, in from_pretrained
    hf_quantizer.validate_environment(device_map=device_map)
  File "D:\Anaconda\envs\omost\lib\site-packages\transformers\quantizers\quantizer_bnb_4bit.py", line 85, in validate_environment
    raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.

Torch Error ?

After running the installation steps and launching the Gradio app, I get this error. How do I resolve it?

C:\Users\admin\anaconda3\envs\omost\lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Exception in thread Thread-9 (generate):
Traceback (most recent call last):
File "C:\Users\admin\anaconda3\envs\omost\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\admin\anaconda3\envs\omost\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\transformers\generation\utils.py", line 2397, in _sample
outputs = self(
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
outputs = self.model(
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\admin\anaconda3\envs\omost\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **

Thanks

ValueError: When localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost.

(Omost_3.10.11) PS C:\Windows\system32> python C:\Colony\Omost\gradio_app.py
C:\Colony\Omost\lib_omost\pipeline.py:64: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)
Unload to CPU: AutoencoderKL
Unload to CPU: CLIPTextModel
Unload to CPU: CLIPTextModel
Unload to CPU: UNet2DConditionModel
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.10s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You shouldn't move a model that is dispatched using accelerate hooks.
Unload to CPU: LlamaForCausalLM
Running on local URL: http://0.0.0.0:7860
Traceback (most recent call last):
File "C:\Colony\Omost\gradio_app.py", line 382, in
demo.queue().launch(inbrowser=True, server_name='0.0.0.0')
File "C:\ProgramData\miniconda3\envs\Omost_3.10.11\lib\site-packages\gradio\blocks.py", line 2375, in launch
raise ValueError(
ValueError: When localhost is not accessible, a shareable link must be created. Please set share=True or check your proxy settings to allow access to localhost.
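
The traceback points at the launch call on line 382 of gradio_app.py. Below is a minimal standalone sketch of the two options the error message offers, requesting a share link or binding to an address the browser can actually reach; the toy Blocks UI is only a placeholder, not Omost's interface.

# Hypothetical sketch of the launch options the error asks for (standalone demo, not Omost).
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("placeholder UI")

demo.queue().launch(
    inbrowser=True,
    server_name="127.0.0.1",  # reachable as localhost; gradio_app.py uses '0.0.0.0'
    share=True,               # or create a public *.gradio.live link when localhost is blocked
)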

[Feature] Use LoRA on specified regions

Thanks for your work. Omost is a pretty clever idea. If I understand correctly, the idea is to have an LLM automate regional prompting. Therefore, in theory, we could ask the chatbot to use a LoRA on specified regions (e.g. an ao dai LoRA on the Vietnamese-person region). Could you add this feature if possible?

Can Omost use other LLM APIs?

Current situation: on a single RTX 3090, generating the prompt takes about 3 minutes and rendering the image takes about 15 seconds.
My guess is that the local llama3-8b is simply too slow. Is there support for a standard OpenAI-style API? Looking forward to a reply, and thanks again to the author team for this great work!
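
Not an existing Omost feature, but as a rough sketch of what an OpenAI-style backend could look like: send the request to a chat-completions endpoint and treat the reply as canvas code. The base_url, model name, and system prompt below are placeholders, and the returned text would still need to be parsed and executed by Omost's Canvas machinery.

# Hypothetical sketch: swap the local llama3-8b for an OpenAI-compatible endpoint.
# Everything named here (base_url, model, system prompt) is a placeholder; the API key
# is read from the OPENAI_API_KEY environment variable by the client.
from openai import OpenAI

client = OpenAI(base_url="https://api.openai.com/v1")  # any OpenAI-compatible server works

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You write Python code that composes an image on Omost's Canvas."},
        {"role": "user", "content": "a cozy cabin in a snowy forest at dusk"},
    ],
)

canvas_code = response.choices[0].message.content
print(canvas_code)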

Compatibility issues with other models

Does Omost support being used with other models, such as ControlNet?
Or does Omost's image generation method only work with its own internal image generator, without allowing other models to join?

Computer freezes after some time

Everything runs smoothly (if slowly) for 20-30 minutes before my computer freezes and I'm forced to hard restart. I'm using an NVIDIA 4090 on Windows 10.

Loading doesn't seem right.

D:\BaiduNetdiskDownload\Omost20240604\venv\lib\site-packages\transformers\utils\hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
D:\BaiduNetdiskDownload\Omost20240604\lib_omost\pipeline.py:64: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)
Unload to CPU: AutoencoderKL
Unload to CPU: CLIPTextModel
Unload to CPU: UNet2DConditionModel
Unload to CPU: CLIPTextModel
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Downloading shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1830.37it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.16s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You shouldn't move a model that is dispatched using accelerate hooks.
Unload to CPU: LlamaForCausalLM
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().

Tags Generate Indefinitely

For some prompts, the tags will generate indefinitely. For example, the prompt:
A school of fish with exactly 17 fish of 3 unique species.
will result in the following generation:

# Initialize the canvas
canvas = Canvas()

# Set a global description for the canvas
canvas.set_global_description(
    description='A school of fish with exactly 17 fish of 3 unique species.',
    detailed_descriptions=[
        'This image captures a mesmerizing school of fish with a precise number of 17 fish, representing three distinct species.',
        'The fish are swimming in unison, creating a dynamic and captivating scene.',
        'The first species is characterized by its vibrant blue and yellow stripes, with a distinctive fin shape.',
        'The second species has a sleek silver body with a distinctive black stripe running along its length.',
        'The third species is notable for its iridescent scales that shimmer in the light, giving it a magical appearance.',
        'The background is a clear, blue ocean, with sunlight filtering through the water, adding a sense of depth and tranquility to the image.',
        'The entire scene is framed in a way that emphasizes the unity and movement of the fish, while also highlighting their unique characteristics.',
    ],
    tags='school of fish, 17 fish, three species, blue ocean, sunlight, dynamic, captivating, vibrant, iridescent, silver, black stripe, blue 
and yellow stripes, distinctive fins, unison, depth, tranquility, unity, movement, unique characteristics, ocean, fish, marine life, aquatic, 
nature, wildlife, underwater, sunlight filtering, shimmer, magical, ocean scene, framed scene, ocean background, clear water, depth, 
oceanic, marine, aquatic scene, water, ocean life, oceanic scene, blue ocean, sunlight filtering, aquatic life, oceanic scene, oceanic, 
nature, wildlife, underwater, shimmer, magical, oceanic, aquatic, oceanic scene, oceanic, 
nature, wildlife, underwater, shimmer, magical, oceanic, aquatic, oceanic scene, oceanic, 
nature, wildlife, underwater, shimmer, magical, oceanic, aquatic, oceanic scene, oceanic,
...

where the tags repeat for an indefinite amount of time. This is using the default settings for generation.
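
One possible mitigation while this behavior exists, assuming you are willing to edit the generation call yourself: cap max_new_tokens and add a mild repetition penalty. The values below are illustrative assumptions, not Omost's defaults.

# Hedged sketch: bound the LLM generation so a runaway tag list cannot grow forever.
# The model id matches the 4-bit checkpoint used elsewhere on this page; the parameter
# values are illustrative, not Omost's shipped settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lllyasviel/omost-llama-3-8b-4bits"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "A school of fish with exactly 17 fish of 3 unique species."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=2048,      # hard cap on output length
    repetition_penalty=1.1,   # discourage repeating already-generated tokens
    do_sample=True,
    temperature=0.6,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))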

Training Data for LLM

Hi, it was mentioned in README.md that the LLMs were trained to produce this code-like structure. Is this an open-source dataset, or was it created manually? If so, what are the stats, e.g. how many samples?


Thanks

No text result

Hey Guys,

When launching locally, and typing my request the program looks like it is loading and generating, but it suddenly stops with the following debug error in the CLI:

Exception in thread Thread-9 (generate):
Traceback (most recent call last):
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1009, in _bootstrap_inner
self.run()
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2397, in _sample
outputs = self(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
outputs = self.model(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 968, in forward
layer_outputs = decoder_layer(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 713, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 649, in forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: cutlassF: no kernel found to launch!
Traceback (most recent call last):
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.anext()
File "C:\Users\timde\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 625, in anext
return await anyio.to_thread.run_sync(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\Users\timde\Omost\gradio_app.py", line 164, in chat_fn
for text in streamer:
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\streamers.py", line 223, in next
value = self.text_queue.get(timeout=self.timeout)
File "C:\Users\timde\AppData\Local\Programs\Python\Python310\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Last assistant response is not valid canvas: expected string or bytes-like object
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
Exception in thread Thread-10 (generate):
(the same traceback as above repeats, again ending in RuntimeError: cutlassF: no kernel found to launch! and _queue.Empty)
Last assistant response is not valid canvas: expected string or bytes-like object

Anyone any suggestions on how to fix this issue when running Omost locally?
Thanks in advance.
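
One thing worth trying, with the caveat that this is not an official fix and depends on your torch/transformers versions and GPU: the failing call is torch's scaled_dot_product_attention, so avoiding the fused attention kernels sometimes sidesteps this error.

# Hedged sketch of two workarounds for "cutlassF: no kernel found to launch!".
# Whether either applies depends on your torch/transformers versions; the model id
# simply mirrors the 4-bit checkpoint referenced elsewhere on this page.
import torch
from transformers import AutoModelForCausalLM

# Option 1: ask transformers for the plain (eager) attention implementation instead of SDPA
# (requires a transformers version that accepts the attn_implementation kwarg).
llm_model = AutoModelForCausalLM.from_pretrained(
    "lllyasviel/omost-llama-3-8b-4bits",
    torch_dtype=torch.float16,
    attn_implementation="eager",
    device_map="auto",
)

# Option 2: keep SDPA but force PyTorch to fall back to the math kernel.
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(True)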

Could not install it ERROR: "ImportError: cannot import name 'Undefined' from 'pydantic.fields' "

Hello, I could not even finish the installation and I got this:

....
from fastapi import params
File "C:\Users\FFFF\Desktop\Forge\stable-diffusion-webui-forge\venv\lib\site-packages\fastapi\params.py", line 4, in
from pydantic.fields import FieldInfo, Undefined
ImportError: cannot import name 'Undefined' from 'pydantic.fields' (C:\Users\FFF\Desktop\Forge\stable-diffusion-webui-forge\venv\lib\site-packages\pydantic\fields.py)

What's wrong?

IP-Adapter support?

Would be cool to have support for IP-Adapter added. As the pipeline inherits from StableDiffusionXLImg2ImgPipeline, which supports IP-Adapter, I assume it should be quite simple to add support for it? ...but given the custom implementation of the sampling process, I am not so sure. Perhaps something like this?

# Compute the IP-Adapter image embeddings once and hand them to the UNet via
# added_cond_kwargs, mirroring what StableDiffusionXLImg2ImgPipeline does in its own __call__.
if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
    image_embeds = self.prepare_ip_adapter_image_embeds(
        ip_adapter_image,
        ip_adapter_image_embeds,
        device,
        batch_size * num_images_per_prompt,
        self.do_classifier_free_guidance,
    )
    added_cond_kwargs["image_embeds"] = image_embeds

Using cached gradio-4.31.5-py3-none-any.whl.metadata (15 kB)
ERROR: Could not find a version that satisfies the requirement bitsandbytes==0.43.1 (from versions: 0.31.8, 0.32.0, 0.32.1, 0.32.2, 0.32.3, 0.33.0, 0.33.1, 0.34.0, 0.35.0, 0.35.1, 0.35.2, 0.35.3, 0.35.4, 0.36.0, 0.36.0.post1, 0.36.0.post2, 0.37.0, 0.37.1, 0.37.2, 0.38.0, 0.38.0.post1, 0.38.0.post2, 0.38.1, 0.39.0, 0.39.1, 0.40.0, 0.40.0.post1, 0.40.0.post2, 0.40.0.post3, 0.40.0.post4, 0.40.1, 0.40.1.post1, 0.40.2, 0.41.0, 0.41.1, 0.41.2, 0.41.2.post1, 0.41.2.post2, 0.41.3, 0.41.3.post1, 0.41.3.post2, 0.42.0)
ERROR: No matching distribution found for bitsandbytes==0.43.1

I get this when I try to install it

Which directory should the models go in?

Which models are needed, and which directory should each of them be placed in? When the network is unreliable, the automatic model download keeps getting interrupted. Could you provide a list of model download URLs and where each model should be placed, so that people with poor connectivity can deploy more easily? Many thanks!
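
There is no official offline list in this thread, but as a rough sketch for unstable networks, the two repositories visible in the console logs elsewhere on this page can be pre-fetched into Omost's hf_download cache with huggingface_hub; the cache path and repository list below are assumptions based on those logs.

# Hedged pre-download sketch: fetch the checkpoints the logs show Omost pulling
# (the RealVisXL SDXL weights and the 4-bit LLM) into ./hf_download/hub before
# launching gradio_app.py. Interrupted downloads can simply be re-run; files that
# already exist in the cache are skipped.
from huggingface_hub import snapshot_download

for repo_id in ["SG161222/RealVisXL_V4.0", "lllyasviel/omost-llama-3-8b-4bits"]:
    snapshot_download(repo_id=repo_id, cache_dir="./hf_download/hub")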

Without gradio?

Thank you for your excellent work. Could you provide a version without gradio? The current version is somewhat unfriendly to developers.

Adding related work on LLM-assisted generation and editing

Hi, thanks for your contribution to this awesome work! We also study this topic, leveraging LLM planning to generate images with complex scenes. The paper is Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following (CVPR 2024), with open-sourced code. Furthermore, we make further efforts to use it for modifying the generated image with progressive adjustments or chat-based editing (also discussed in this repo). We hope our exploration can help further develop the editing function in this repository, e.g. maintaining ID consistency and understanding complex editing instructions with an LLM.

Our framework is illustrated as follows:
image

Can you create a Cohere AYA 23 8B LLM version?

Is it possible to consider SFT-ing the Cohere AYA 23 8B model, so that prompts can be written in 23 languages? This would make prompting easier and let the project cover more user groups.

Trying to run locally but I've got an error instantly on generation

(omost) D:\Omost>python gradio_app.py
D:\Omost\lib_omost\pipeline.py:64: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)
Unload to CPU: CLIPTextModel
Unload to CPU: CLIPTextModel
Unload to CPU: UNet2DConditionModel
Unload to CPU: AutoencoderKL
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.31s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You shouldn't move a model that is dispatched using accelerate hooks.
Unload to CPU: LlamaForCausalLM
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().
You shouldn't move a model that is dispatched using accelerate hooks.
Load to GPU: LlamaForCausalLM
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
Exception in thread Thread-11 (generate):
Traceback (most recent call last):
File "C:\Users\X.conda\envs\omost\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\X.conda\envs\omost\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\X\AppData\Roaming\Python\Python310\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\X.conda\envs\omost\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
File "C:\Users\X.conda\envs\omost\lib\site-packages\transformers\generation\utils.py", line 2397, in _sample
outputs = self(
File "C:\Users\X\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\X.conda\envs\omost\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\X.conda\envs\omost\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
outputs = self.model(
File "C:\Users\X\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\X.conda\envs\omost\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\X.conda\envs\omost\lib\site-packages\transformers\models\llama\modeling_llama.py", line 940, in forward
causal_mask = self._update_causal_mask(
File "C:\Users\X.conda\envs\omost\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1061, in _update_causal_mask
causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Traceback (most recent call last):
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.anext()
File "D:\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\utils.py", line 625, in anext
return await anyio.to_thread.run_sync(
File "C:\Users\X.conda\envs\omost\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\X.conda\envs\omost\lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\Users\X.conda\envs\omost\lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\Users\X.conda\envs\omost\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\X\AppData\Roaming\Python\Python310\site-packages\torch\utils_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "D:\Omost\gradio_app.py", line 164, in chat_fn
for text in streamer:
File "C:\Users\X.conda\envs\omost\lib\site-packages\transformers\generation\streamers.py", line 223, in next
value = self.text_queue.get(timeout=self.timeout)
File "C:\Users\X.conda\envs\omost\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Last assistant response is not valid canvas: expected string or bytes-like object
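
The traceback shows a torch installed under AppData\Roaming (user site-packages) being imported from inside a conda env, and this particular RuntimeError is typical of older torch builds, so a hedged first step is simply to check which torch is actually loaded and whether it supports the exact op that fails; if it does not, reinstalling torch with the command from the README would be the likely remedy.

# Diagnostic sketch (assumption: an older user-site torch is shadowing the env's torch).
# It prints which torch gets imported and whether the exact op from the traceback works.
import torch

print("torch version :", torch.__version__, "| CUDA build:", torch.version.cuda)
print("imported from :", torch.__file__)

try:
    torch.triu(torch.ones(2, 2, dtype=torch.bfloat16, device="cuda"), diagonal=1)
    print("bfloat16 triu on CUDA: OK")
except Exception as exc:
    print("bfloat16 triu on CUDA fails:", exc)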

[Bug] LLM hallucinates and repeats the same sequence nonstop

All settings are default. The prompt is "Hatsune Miku lying on the beach with a guitar".

LLM output:

# Initialize the canvas
canvas = Canvas()

# Set a global description for the canvas
canvas.set_global_description(
    description='Hatsune Miku lying on the beach with a guitar',
    detailed_descriptions=[
        'Hatsune Miku is depicted lying on a beautiful beach, enjoying the serene atmosphere while playing a guitar.',
        'The beach is sandy with gentle waves lapping at the shore.',
        'The sky is clear with a few fluffy clouds, and the sun is shining brightly.',
        'Hatsune Miku is dressed in her iconic turquoise twintail outfit, with her hair flowing freely.',
        'She is relaxed, with one hand behind her head and the other arm resting on the sand while holding a guitar.',
        'The ocean in the background is a stunning shade of blue, and the horizon is visible.',
        'There are a few seashells scattered around her.',
        'The overall scene is peaceful and idyllic, capturing a perfect moment of relaxation, joy, and creativity.',
    ],
    tags='Hatsune Miku, beach, turquoise twintail outfit, serene, ocean, sandy shore, clear sky, fluffy clouds, sun, relaxed, seashells, peaceful, idyllic, horizon, waves, seashells, blue ocean, joy, relaxation, sandy beach, flowing hair, iconic outfit, guitar, music, creativity, playing guitar',
    HTML_web_color_name='skyblue',
)

# Add hatsune miku lying on the beach with a guitar
canvas.add_local_description(
    location='in the center',
    offset='no offset',
    area='a medium-sized square area',
    distance_to_viewer=2.0,
    description='Hatsune Miku lying on the beach with a guitar',
    detailed_descriptions=[
        'Hatsune Miku is lying on the beach, dressed in her iconic turquoise twintail outfit.',
        'Her hair flows freely in the breeze, and she appears relaxed, with one hand behind her head and the other arm resting on the sand while holding a guitar.',
        'She is surrounded by a few scattered seashells, and her presence adds a touch of whimsy and charm to the serene beach scene.',
    ],
    tags='Hatsune Miku, turquoise twintail outfit, relaxed, seashells, flowing hair, beach, whimsical, charming, serene, iconic, sand, breeze, hair, whimsical, charming, serene, iconic, sand, breeze, hair, whimsical, charming, serene, iconic, sand, breeze, hair, whimsical, charming, serene, iconic, sand, breeze, hair,

You can observe that the sequence "whimsical, charming, serene, iconic, sand, breeze, hair" clearly repeats nonstop.

[BUG] Attention mask and pad token not set

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
C:\Users\sildenafil\.conda\envs\omost\lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
You shouldn't move a model that is dispatched using accelerate hooks.

It appears to run fine, but I'm not sure what these messages mean.
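
These messages come from transformers' generate and are usually harmless for single-prompt generation. For completeness, a hedged sketch of how one would silence them when calling generate directly (gradio_app.py may already handle this differently):

# Hedged sketch: pass an explicit attention_mask and pad_token_id so generate() stops
# warning. The model id mirrors the 4-bit checkpoint used on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lllyasviel/omost-llama-3-8b-4bits"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("a cozy cabin in a snowy forest", return_tensors="pt").to(model.device)
output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # silences the attention-mask warning
    pad_token_id=tokenizer.eos_token_id,      # silences the pad_token_id message
    max_new_tokens=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))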

Which is better?

omost-llama-3-8b
omost-dolphin-2.9-llama3-8b
omost-phi-3-mini-128k

Which is better?

Last assistant response is not valid canvas: expected string or bytes-like object

Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().
You shouldn't move a model that is dispatched using accelerate hooks.
Load to GPU: LlamaForCausalLM
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
Exception in thread Thread-11 (generate):
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in bootstrap_inner
self.run()
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self.target(*self.args, **self.kwargs)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self.sample(
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2397, in sample
outputs = self(
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module.old_forward(*args, **kwargs)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
outputs = self.model(
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module.old_forward(*args, **kwargs)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 940, in forward
causal_mask = self.update_causal_mask(
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1061, in update_causal_mask
causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Traceback (most recent call last):
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.anext()
File "X:\Omost\chat_interface.py", line 554, in stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 625, in anext
return await anyio.to_thread.run_sync(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "X:\Omost\gradio_app.py", line 164, in chat_fn
for text in streamer:
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\streamers.py", line 223, in next
value = self.text_queue.get(timeout=self.timeout)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\queue.py", line 179, in get
raise Empty
queue.Empty
Last assistant response is not valid canvas: expected string or bytes-like object
Traceback (most recent call last):
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.anext()
File "X:\Omost\chat_interface.py", line 554, in stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\john
\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.anext()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 625, in anext
return await anyio.to_thread.run_sync(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "X:\Omost\gradio_app.py", line 116, in chat_fn
np.random.seed(int(seed))
File "numpy\random\mtrand.pyx", line 4806, in numpy.random.mtrand.seed
File "numpy\random\mtrand.pyx", line 250, in numpy.random.mtrand.RandomState.seed
File "mt19937.pyx", line 168, in numpy.random.mt19937.MT19937.legacy_seeding
File "mt19937.pyx", line 182, in numpy.random.mt19937.MT19937.legacy_seeding
ValueError: Seed must be between 0 and 2**32 - 1
Last assistant response is not valid canvas: expected string or bytes-like object
(the same ValueError: Seed must be between 0 and 2**32 - 1 traceback repeats for the next attempt)
Last assistant response is not valid canvas: expected string or bytes-like object
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
Exception in thread Thread-12 (generate):
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2397, in _sample
outputs = self(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
outputs = self.model(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 940, in forward
causal_mask = self._update_causal_mask(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1061, in _update_causal_mask
causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
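(A note on the RuntimeError above: older torch builds have no CUDA kernel for torch.triu on bfloat16 tensors, and newer transformers releases build the Llama causal mask with torch.triu, so generation fails when the LLM is loaded in bfloat16. Two workarounds are commonly reported: upgrade torch to a release with bfloat16 triu support on CUDA, or load the model in float16 instead. A sketch of the latter, assuming the LLM is loaded roughly the way gradio_app.py does it; the exact call and model name there may differ:

import torch
from transformers import AutoModelForCausalLM

# Hypothetical load mirroring gradio_app.py; the model id is assumed from
# the project's HuggingFace releases. float16 instead of bfloat16 sidesteps
# the missing bf16 triu CUDA kernel on older torch builds.
llm = AutoModelForCausalLM.from_pretrained(
    'lllyasviel/omost-llama-3-8b',
    torch_dtype=torch.float16,
    device_map='auto',
)

This is a workaround, not the project's recommendation; upgrading torch is the cleaner fix.)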
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.__anext__()
File "X:\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 625, in __anext__
return await anyio.to_thread.run_sync(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "X:\Omost\gradio_app.py", line 164, in chat_fn
for text in streamer:
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\streamers.py", line 223, in __next__
value = self.text_queue.get(timeout=self.timeout)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Last assistant response is not valid canvas: expected string or bytes-like object
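(A note on the _queue.Empty above: it is a downstream symptom rather than a separate bug. The generate thread crashed with the BFloat16 error, so the TextIteratorStreamer's internal queue never receives any tokens and its get(timeout=...) call times out. A minimal illustration of that failure mode, with hypothetical names, just to show why the consumer reports Empty when the producer dies:

from queue import Queue, Empty
from threading import Thread

token_queue = Queue()

def producer():
    # Stand-in for the generation thread that crashed with the bf16 error.
    raise RuntimeError('generation failed')

Thread(target=producer).start()

try:
    token_queue.get(timeout=5)  # mirrors streamer.text_queue.get(timeout=self.timeout)
except Empty:
    print('no tokens arrived; fix the generation thread, not the streamer')

Fixing the BFloat16 error makes this traceback disappear as well.)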
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
Exception in thread Thread-13 (generate):
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2397, in _sample
outputs = self(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
outputs = self.model(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 940, in forward
causal_mask = self._update_causal_mask(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1061, in _update_causal_mask
causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.__anext__()
File "X:\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 625, in __anext__
return await anyio.to_thread.run_sync(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "X:\Omost\gradio_app.py", line 164, in chat_fn
for text in streamer:
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\streamers.py", line 223, in __next__
value = self.text_queue.get(timeout=self.timeout)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Last assistant response is not valid canvas: expected string or bytes-like object
