starlightvision's Introduction

🌌 Starlight Vision 🚀

🪐 Starlight Vision is a multi-modal AI model designed to generate novel videos from text, images, or video clips. It is built on a cascade of 3D U-Net diffusion models (a base model followed by a super-resolution model) and aims to synthesize realistic video content for applications such as movie production, advertising, and virtual reality. 🎥

🌟 Features

  • 📝 Generate videos from text descriptions
  • 🌃 Convert images into video sequences
  • 📼 Extend existing video clips with novel content
  • 🔮 High-quality output with customizable resolution
  • 🧠 Easy-to-use API for quick integration

📦 Installation

To install Starlight Vision, simply use pip:

pip install starlight-vision
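
If the install fails or you are unsure whether it succeeded, a quick check like the one below can help narrow things down. This is a minimal sketch, not part of the library: `missing_modules` is a hypothetical helper, and the module names `torch` and `starlight_vision` are taken from the imports in the Quick Start code. It only uses the standard library.

```python
import importlib.util


def missing_modules(names=("torch", "starlight_vision")):
    """Return the subset of top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If `torch` is reported as missing, installing PyTorch first is a reasonable next step, since the Quick Start code imports it directly.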

🎬 Quick Start

After training is complete, you can install Starlight Vision and start generating videos. The following code trains a small two-stage cascade on random data and then samples from it:

import torch
from starlight_vision import Unet3D, ElucidatedStarlight, StarlightTrainer

unet1 = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8)).cuda()

unet2 = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8)).cuda()

# elucidated starlight, which contains the unets above (the base unet and the super-resolution unet)

starlight = ElucidatedStarlight(
    unets = (unet1, unet2),
    image_sizes = (16, 32),
    random_crop_sizes = (None, 16),
    temporal_downsample_factor = (2, 1),        # in this example, the first unet would receive the video temporally downsampled by 2x
    num_sample_steps = 10,
    cond_drop_prob = 0.1,
    sigma_min = 0.002,                          # min noise level
    sigma_max = (80, 160),                      # max noise level, double the max noise level for upsampler
    sigma_data = 0.5,                           # standard deviation of data distribution
    rho = 7,                                    # controls the sampling schedule
    P_mean = -1.2,                              # mean of log-normal distribution from which noise is drawn for training
    P_std = 1.2,                                # standard deviation of log-normal distribution from which noise is drawn for training
    S_churn = 80,                               # parameters for stochastic sampling - depends on dataset, Table 5 in paper
    S_tmin = 0.05,
    S_tmax = 50,
    S_noise = 1.003,
).cuda()

texts = [
    'a whale breaching from afar',
    'young girl blowing out candles on her birthday cake',
    'fireworks with blue and green sparkles',
    'dust motes swirling in the morning sunshine on the windowsill'
]

videos = torch.randn(4, 3, 10, 32, 32).cuda() # (batch, channels, time / video frames, height, width)

# feed videos into starlight, training each unet in the cascade
# for this example, only training unet 1

trainer = StarlightTrainer(starlight)

# You can also ignore time when training on video initially, which was shown to improve results in the video-ddpm paper.
# Eventually the 3D unet will be trainable with either images or video.
# Research shows it is essential (with current data regimes) to train first on text-to-image.
# That probably won't be true in another decade: all big data becomes small data.

trainer(videos, texts = texts, unet_number = 1, ignore_time = False)
trainer.update(unet_number = 1)

videos = trainer.sample(texts = texts, video_frames = 20) # extrapolating to 20 frames from training on 10 frames

videos.shape # (4, 3, 20, 32, 32)
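
The `sigma_min`, `sigma_max`, `rho`, `P_mean`, and `P_std` arguments above match the parameter names used in the EDM ("elucidated") diffusion formulation. Assuming `ElucidatedStarlight` follows that formulation (this README does not confirm it), the sketch below shows how such parameters are conventionally turned into a sampling schedule and a training noise distribution. The function names `karras_sigmas` and `sample_training_sigma` are hypothetical and are not part of the `starlight_vision` API.

```python
import math
import random


def karras_sigmas(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced by the EDM rho-schedule.

    sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    inv_rho = 1.0 / rho
    hi = sigma_max ** inv_rho
    lo = sigma_min ** inv_rho
    return [
        (hi + i / (num_steps - 1) * (lo - hi)) ** rho
        for i in range(num_steps)
    ]


def sample_training_sigma(p_mean=-1.2, p_std=1.2, rng=random):
    """Draw one training noise level with ln(sigma) ~ Normal(p_mean, p_std^2)."""
    return math.exp(rng.gauss(p_mean, p_std))


# Schedule for the base unet, using the values from the example above.
sigmas = karras_sigmas(10, sigma_min=0.002, sigma_max=80.0, rho=7.0)
print([round(s, 4) for s in sigmas])
```

A larger `rho` concentrates more of the steps at low noise levels. For the upsampler in the example, the same sketch would be called with `sigma_max=160.0`.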

🤝 Contributing

We welcome contributions from the community! If you'd like to contribute, please follow these steps:

  1. 🍴 Fork the repository on GitHub
  2. 🌱 Create a new branch for your feature or bugfix
  3. 📝 Commit your changes and push the branch to your fork
  4. 🚀 Create a pull request and describe your changes

📄 License

Starlight Vision is released under the Apache License. See the LICENSE file for more details.

🗺️ Roadmap

The following roadmap outlines our plans for future development and enhancements to Starlight Vision. We aim to achieve these milestones through a combination of research, development, and collaboration with the community.

🚀 Short-term Goals

  • Improve text-to-video synthesis by incorporating advanced natural language understanding techniques
  • Train on LAION-5B and video datasets
  • Enhance the quality of generated videos through the implementation of state-of-the-art generative models
  • Optimize the model for real-time video generation on various devices, including mobile phones and edge devices
  • Develop a user-friendly web application that allows users to generate videos using Starlight Vision without any programming knowledge
  • Create comprehensive documentation and tutorials to help users get started with Starlight Vision

🌌 Medium-term Goals

  • Integrate advanced style transfer techniques to allow users to customize the visual style of generated videos
  • Develop a plugin for popular video editing software (e.g., Adobe Premiere, Final Cut Pro) that enables users to utilize Starlight Vision within their existing workflows
  • Enhance the model's ability to generate videos with multiple scenes and complex narratives
  • Improve the model's understanding of object interactions and physics to generate more realistic videos
  • Expand the supported input formats to include audio, 3D models, and other media types

🌠 Long-term Goals

  • Enable users to control the generated video with more granular parameters, such as lighting, camera angles, and object placement
  • Incorporate AI-driven video editing capabilities that automatically adjust the pacing, color grading, and transitions based on user preferences
  • Develop an API for real-time video generation that can be integrated into virtual reality, augmented reality, and gaming applications
  • Investigate methods for training Starlight Vision on custom datasets to generate domain-specific videos
  • Foster a community of researchers, developers, and artists to collaborate on the continued development and exploration of Starlight Vision's capabilities

Join Agora

Agora is advancing humanity with state-of-the-art AI models like Starlight. Join us and leave your mark on the history books!

https://discord.gg/sbYvXgqc

🙌 Acknowledgments

This project is inspired by state-of-the-art research in video synthesis, such as the paper "Structure and Content-Guided Video Synthesis with Diffusion Models", and leverages the power of deep learning frameworks like PyTorch.

We would like to thank the researchers, developers, and contributors who have made this project possible. 💫

Contributors

kyegomez

starlightvision's Issues

Would there be a demo?

Would you be open to releasing a demo version of your work? It would give us a chance to explore and understand your work more thoroughly. Thank you for your consideration.

Cannot install it

I tried to use "pip install starlight-vision" but it failed. What are the requirements for this to work?