
A website for exploring the Metropolitan Museum of Art's collection with Generative Adversarial Networks

Home Page: https://gen.studio

License: Creative Commons Attribution 4.0 International

Gen Studio

Gen Studio is a prototype created by collaborators from The Metropolitan Museum of Art (The Met), Microsoft, and the Massachusetts Institute of Technology (MIT). The site lets users visually explore Generative Adversarial Networks (GANs) for high-resolution art creation.

Try it for yourself and create your own unique neural artworks here!

You can also learn more about the collaboration and The Met’s Open Access Program.

Goal of the Project

Our goal is to inspire others to explore the world of art and the world of AI. We hope that publishing this website will help anyone with an internet connection discover and create works of art they love. Art captures humanity's intricate structure, from great struggles and triumphs to the beautifully commonplace. We hope that by using the GAN as a tool, you too can feel the rush of creation that has driven so many great artists and creators throughout the ages. This project was prototyped by a passionate group during a two-day hackathon and brought to life by some amazing students and engineers over the following month. Thank you for trying our website, and please feel free to send us feedback at [email protected].

The Website

We created an online experience called Gen Studio where users can explore generated, dreamlike images created by AI. These images are produced by a GAN, which can randomly sample from the space of possible artworks. We then take this trained GAN and invert it to create an algorithm capable of finding the closest match, in the GAN's 'mind', to a real artwork in The Met. This allows us not just to create random works, but to interpolate between real artworks in the collection.

We present two simple ways of exploring the GAN's underlying 140-dimensional space of art. Our first visualization lets users explore a two-dimensional slice of the vast "latent" space of the GAN. Users can move throughout this space and see how the GAN's generated images change based on proximity to real images in The Met's collection. Our second visualization gives the user precise control over how to blend different works together. We hope that users use this site to learn more about the inferred visual structure underlying The Met's collection and to create and recombine artwork that draws from a variety of styles, materials, and forms.
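Under the hood, blending amounts to decoding a weighted combination of latent vectors. Here is a minimal sketch of that idea; the generator `G` and the inverted latents (`z_vase`, `z_portrait`, `z_textile`) are hypothetical names, not identifiers from this repo:

```python
import numpy as np

def blend(latents, weights):
    """Convex combination of latent vectors, weighted by the UI sliders."""
    w = np.asarray(weights, dtype=np.float32)
    w = w / w.sum()                                  # normalize so weights sum to 1
    return sum(wi * zi for wi, zi in zip(w, latents))

# Hypothetical usage: decode a 50/30/20 blend of three inverted artworks.
# image = G(blend([z_vase, z_portrait, z_textile], [0.5, 0.3, 0.2]))
```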

Once you have found an inspiring piece, you can explore related works through an immersive visual search experience. To create this experience, we used a microservice architecture of deep networks, Azure services, and blob storage. We used Visual Studio Code to develop a Flask API that serves the GAN from an Azure Kubernetes Service (AKS) cluster powered by NVIDIA GPUs.
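As a rough illustration of what such a service looks like, here is a minimal Flask endpoint that decodes a latent vector into a PNG. This is a sketch, not the production API: `load_generator` is a hypothetical helper, and the tensor shapes and value ranges are assumptions.

```python
import io

import numpy as np
import torch
from flask import Flask, request, send_file
from PIL import Image

app = Flask(__name__)
generator = load_generator()  # hypothetical: returns a trained torch.nn.Module
generator.eval()

@app.route("/generate", methods=["POST"])
def generate():
    # Expect a JSON body like {"z": [...]} holding one 140-dim latent vector.
    z = torch.tensor(request.json["z"], dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        img = generator(z)[0]  # assumed output: (3, H, W) tensor in [-1, 1]
    arr = ((img.permute(1, 2, 0).cpu().numpy() + 1.0) * 127.5).astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format="PNG")
    buf.seek(0)
    return send_file(buf, mimetype="image/png")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```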

Deep Learning Techniques

Generative Adversarial Networks (GANs)

GANs are a special kind of deep network capable of modeling distributions of data. More explicitly, GANs can learn to "sample" or "create" new data that looks like an existing dataset.

In Gen Studio we use a GAN to sample from the space of art in The Metropolitan Museum of Art's Open Access Collection.

A GAN is made up of two dueling deep networks: a generator and a discriminator. The generator aims to create new art, and the discriminator aims to critique the art and distinguish it from existing art. Both networks are trained in competition until the generator can fool the discriminator and create realistic works of art.
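In code, one round of this competition looks roughly like the following PyTorch sketch, where `G`, `D`, the optimizers, and the batch of real images are placeholders standing in for the real (much larger) networks:

```python
import torch
import torch.nn.functional as F

def train_step(G, D, real_images, opt_G, opt_D, latent_dim=140):
    """One round of the generator/discriminator competition."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)   # assume D outputs one logit per image
    fake_labels = torch.zeros(batch, 1)

    # 1. Train the discriminator to tell real art from generated art.
    opt_D.zero_grad()
    fake = G(torch.randn(batch, latent_dim)).detach()  # detach: don't update G here
    loss_D = (F.binary_cross_entropy_with_logits(D(real_images), real_labels)
              + F.binary_cross_entropy_with_logits(D(fake), fake_labels))
    loss_D.backward()
    opt_D.step()

    # 2. Train the generator to fool the discriminator.
    opt_G.zero_grad()
    loss_G = F.binary_cross_entropy_with_logits(
        D(G(torch.randn(batch, latent_dim))), real_labels)
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```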

GAN Inversion with Semantic Loss

GANs are great at generating new works of art, but we wanted to see if the GAN could recreate existing works in The Met's collection. To achieve this, we used a technique called neural network inversion. Instead of learning the weights of the generator network, we keep those weights fixed and instead optimize the latent noise vector to maximize the similarity between the GAN's output and a target work. We discovered that to be successful, one needs to match both the target image's pixels and its "high-level" semantic content.
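A minimal sketch of this inversion, assuming a frozen generator `G` and using a truncated pretrained ResNet-50 as the semantic feature extractor; the loss weights and optimizer settings are illustrative, not the values we used:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Truncated pretrained ResNet-50: everything up to (and including) the
# global average pool, giving a 2048-dim "semantic" descriptor per image.
features = torch.nn.Sequential(*list(resnet50(pretrained=True).children())[:-1]).eval()
for p in features.parameters():
    p.requires_grad_(False)

def invert(G, target, latent_dim=140, steps=1000, lr=0.05, sem_weight=1.0):
    """Find the latent z whose decoded image best matches `target` (1, 3, H, W)."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = G(z)
        pixel_loss = F.mse_loss(out, target)                     # low-level match
        sem_loss = F.mse_loss(features(out), features(target))   # high-level match
        (pixel_loss + sem_weight * sem_loss).backward()
        opt.step()
    return z.detach()
```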

GAN Latent Space Traversal

To explore the spaces between objects in our GAN, we first invert the objects to get their positions in "latent" space. This latent space is learned by the network, and each point in it corresponds to a unique artwork when mapped through the generator network. To interpolate between points we use plain-old linear interpolation, though depending on the noise you train your GAN with, you might get better results from spherical interpolation (slerp): samples from a high-dimensional Gaussian concentrate near a sphere, so a straight line between two latents can pass through low-probability regions the generator was never trained on.
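Both interpolation schemes fit in a few lines; this sketch assumes NumPy latent vectors `z0` and `z1`:

```python
import numpy as np

def lerp(z0, z1, t):
    """Plain linear interpolation between two latent vectors."""
    return (1 - t) * z0 + t * z1

def slerp(z0, z1, t):
    """Spherical interpolation: follow the great circle between z0 and z1."""
    omega = np.arccos(np.clip(
        np.dot(z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)), -1.0, 1.0))
    if np.isclose(omega, 0.0):        # (near-)parallel vectors: fall back to lerp
        return lerp(z0, z1, t)
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

# Hypothetical usage: a 10-frame walk between two inverted artworks.
# frames = [G(slerp(z_a, z_b, t)) for t in np.linspace(0, 1, 10)]
```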

Reverse Image Search

To create a reverse image search engine, we first map The Met's images into a space where distance is more meaningful, namely the output of a truncated pretrained ResNet50 model. In this space, images that seem similar to us are close together. Furthermore, their positions are roughly invariant to small image transformations like scaling, brightness changes, and rotations. This is starkly opposed to pixel space, where imperceptibly small transformations like scaling or rotating can completely change the distance between images. Once we have all of The Met's images featurized, we build an efficient nearest-neighbor lookup structure, a forest of random-projection trees similar in spirit to a k-d tree. This lets us look up approximate nearest neighbors in feature space without comparing our image to the entirety of The Met's collection. At each node of this tree, we store a pointer to the Met image so that we can quickly return it to the caller. We use the annoy library to build these fast nearest-neighbor indexes.
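A condensed sketch of this pipeline, assuming torchvision's pretrained ResNet-50 and the annoy library; `met_image_paths` is a hypothetical list of local image files:

```python
import torch
from annoy import AnnoyIndex
from PIL import Image
from torchvision import models, transforms

# Truncated pretrained ResNet-50 as the feature extractor (2048-dim output).
backbone = torch.nn.Sequential(
    *list(models.resnet50(pretrained=True).children())[:-1]).eval()
prep = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def featurize(path):
    with torch.no_grad():
        x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
        return backbone(x).flatten().numpy()

index = AnnoyIndex(2048, "angular")            # angular ~ cosine distance
for i, path in enumerate(met_image_paths):     # hypothetical list of image files
    index.add_item(i, featurize(path))         # item id i points back to image i
index.build(10)                                # 10 trees: speed/accuracy tradeoff

# Query: the ten collection images closest to "query.jpg" in feature space.
neighbors = index.get_nns_by_vector(featurize("query.jpg"), 10)
```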

Architecture

Frontend:

The frontend serves as the portal to the backend services. The website is built using React for the layout and styling, and Plotly for the map exploration UX. The content is hosted as an Azure Web App using Azure App Services.

Backend:

The backend for the site uses the following services:

  • Azure Kubernetes Service (AKS) with GPU nodes
  • Azure Storage
  • Azure Databricks (with MMLSpark)
  • Azure Computer Vision API
  • Azure Search

These services make it possible to generate new images in real-time and are responsible for the website’s interactive look and feel. AKS dramatically streamlines the path to production services on GPU hardware. The AKS cluster provides backend GPU compute for the GAN evaluation service, and the reverse image search service.

A core challenge we encountered was how to explore the latent space around existing Met artworks. The network inversion technique described in the techniques section above is time intensive. To keep the site responsive, we pre-computed inversions for thousands of images from The Met collection and stored the results in Azure Storage, letting us treat them as fast static assets.
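For example, with the current azure-storage-blob SDK, the pre-computation might look like the sketch below; the connection string, container name, and the `invert`/`met_images` helpers are hypothetical stand-ins (see the inversion sketch above for `invert`):

```python
import io

import numpy as np
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
container = service.get_container_client("latents")    # hypothetical container name

for object_id, image in met_images:                    # hypothetical (id, tensor) pairs
    z = invert(G, image).numpy()                       # slow step, done offline
    buf = io.BytesIO()
    np.save(buf, z)                                    # serialize the latent vector
    buf.seek(0)
    container.upload_blob(name=f"{object_id}.npy", data=buf, overwrite=True)
```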

To create the image search experience, we loaded the entire open access collection into an Azure Databricks cluster. We used Microsoft Machine Learning for Apache Spark (MMLSpark) to enrich these images with annotations from the Azure Computer Vision API. We then pushed these enriched documents to the Azure Search Service.

Development:

To invert the GAN and train the nearest neighbor models we recommend you use a GPU machine. We used a GPU Azure Data Science Virtual Machine.

To enrich The Met images with computer vision annotations, we used an Apache Spark cluster backed by Azure Databricks with MMLSpark. We used the Azure Search writer for Spark to push the enriched images into Azure Search.

Get started

The following docs walk through how to deploy and set up your own APIs and website:

  1. Build the website
  2. Deploy the APIs
  3. Build the Search Index

Contributors

Special thanks to the entire Gen Studio team including:

  • Mark Hamilton – Software Engineer, Microsoft Applied AI – lead
  • Chris Hoder – AI Project Manager, Microsoft Applied AI – lead
  • Matthew Ritchie – Dasha Zhukova Distinguished Visiting Artist at the MIT Center for Art, Science & Technology, MIT – lead
  • Sarah Schwettmann, Brain + Cognitive Sciences, Knowledge Futures Group, MIT – lead
  • Kim Benzel, Curator in Charge, Department of Ancient Near Eastern Art, The Met
  • Julie Arslanoglu, Research Scientist, Department of Scientific Research, The Met
  • Darius Bopp, MIT Student Extern with Microsoft Garage
  • Diana Nguyen, MIT Student Extern with Microsoft Garage
  • Elaine Zhang, MIT Student Extern with Microsoft Garage
  • Casey Hong, Software Engineer, Microsoft AI Rotation Program
  • Dalitso Banda, Software Engineer, Microsoft AI Rotation Program
  • Manon Knoertzer, Program Manager, Microsoft AI Rotation Program
  • Karthik Rajendran, Data Scientist, Microsoft AI Rotation Program
  • Botong Ma, Graduate Student, Electrical Engineering and Computer Science, MIT
  • Alonso Salas Infante, MIT Student Extern with Microsoft Garage
  • Gillian Belto, MIT Student Extern with Microsoft Garage
  • SJ Klein, Underlayer, Knowledge Futures Group, MIT
  • Luke Hewitt, Graduate Student, Brain and Cognitive Sciences, MIT
  • Maddie Cusimano, Graduate Student, Brain and Cognitive Sciences, MIT
  • Wilson Lee, Senior Software Engineer, Microsoft

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.


