
hmm34 / reducto

Project II for 3460:635 Advanced Algorithms, which is composed of two parts. Part I uses singular value decomposition (SVD) to compress PGM images. Part II uses principal component analysis (PCA) to reduce the dimensionality of high-dimensional datasets.

C 59.44% C++ 40.49% Objective-C 0.06%

reducto's Introduction

reducto

The goal of this project is to use linear algebra matrix operations to shrink large datasets. The specific requirements consist of two major parts: image compression and dimensionality reduction. Image compression is accomplished by using singular value decomposition (SVD), and high-dimensional dataset dimensionality reduction is accomplished by using principal component analysis (PCA).

Part I: Image Compression

The most important operation used in this section is singular value decomposition (SVD). A singular value decomposition of an m by n matrix M is a factorization of the form M=UΣV*, where

  • U is an m by m unitary matrix
  • Σ is an m by n rectangular diagonal matrix with nonnegative numbers on the diagonal, and
  • V* is the transpose of a unitary matrix V.
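The factorization above can be checked numerically. The sketch below (using NumPy, purely illustrative; the project itself is implemented in C/C++) computes a full SVD of a small matrix, rebuilds the rectangular diagonal Σ, and verifies that M = UΣV*:

```python
import numpy as np

# A small 4x5 matrix standing in for a grayscale image (values are arbitrary).
M = np.array([[234.,  23.,  12., 245.,  22.],
              [ 11.,  22.,  11.,  22.,   1.],
              [  1.,   2.,   3.,   4.,   5.],
              [  6.,   7.,   8.,   9.,  10.]])

# full_matrices=True gives U (m x m) and V* (n x n); s holds the diagonal of Σ.
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the m x n rectangular diagonal Σ and confirm M = U Σ V*.
Sigma = np.zeros(M.shape)
Sigma[:min(M.shape), :min(M.shape)] = np.diag(s)
assert np.allclose(M, U @ Sigma @ Vt)
```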

Uses

reducto 1 image.pgm

Saves the necessary information from the ASCII PGM image in a binary encoding, without preserving comments. 2 bytes store the width of the image, 2 bytes store the height of the image, 1 byte stores the maximum grey level, and 1 byte stores the grey level of each pixel. See issue #6.

For example, if the image.pgm contains:

P2
#my image
4 5
255
234 23 12 245 22 11 22 11 22 1
1 2 3 4 5 6 7 8 9 10

then the file will be saved using 2 + 2 + 1 + (4 * 5) = 25 bytes. The comment is not preserved. The output is a binary file named image_b.pgm.
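The byte arithmetic above can be reproduced with Python's `struct` module. This is a sketch, not reducto's actual encoder: the byte order (little-endian here) and field layout are assumptions for illustration.

```python
import struct

width, height, maxgrey = 4, 5, 255
pixels = [234, 23, 12, 245, 22, 11, 22, 11, 22, 1,
          1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two unsigned shorts for width and height, one byte for the grey scale
# maximum, then one byte per pixel: 2 + 2 + 1 + width*height bytes total.
blob = struct.pack("<HHB", width, height, maxgrey) + bytes(pixels)
assert len(blob) == 2 + 2 + 1 + width * height  # 25 bytes
```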

reducto 2 image_b.pgm

Reverses the previous command. Converts a binary file, image_b.pgm, to an ASCII PGM file. The original comment is not preserved, and the new comment is "# Back off! I will make you teensy!". The output file is saved as image2.pgm. See issue #7.

reducto 3 header.txt SVD.txt k

Saves the necessary information for the approximated image in a binary format. The input includes the header of the original image, the three matrices U, Σ, V, and an integer k representing the rank of the approximation. The saved file is named image_b.pgm.SVD; the image it encodes is recoverable and can then be viewed using any PGM viewer. The goal is to save storage.

The header.txt contains three integers (width, height, grey scale levels). The SVD.txt contains three matrices (U, Σ, V). See issue #8.
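Rebuilding the rank-k approximation from U, Σ, and V can be sketched with NumPy (illustrative only; the function name is hypothetical, not part of reducto). Keeping only the first k columns of U, k singular values, and k rows of V* stores m·k + k + k·n numbers instead of m·n, which is where the savings come from:

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A in the least-squares sense
    (Eckart-Young theorem), built from the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```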

reducto 4 image_b.pgm.SVD

Reverses the previous command. Takes a PGM image that was compressed using SVD and outputs an uncompressed PGM image, named image_k.pgm, that can be viewed using any PGM viewer. See issue #9.

reducto 5 image.pgm

Possible extra credit opportunity. Takes a PGM image in ASCII format and produces the header and SVD files that can be used for step 3. Employs SVD on the original matrix A (which is the image) to produce three matrices U, S, and V, which are written to image_svd.txt. The width, height, and maximum grayscale values from the original image are preserved and written to image_header.txt.

Part II: Dimensionality Reduction

The most important algorithm used in this section is principal component analysis (PCA). It is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is commonly used to reduce dimensionality for visualization of high-dimensional datasets. See issue #5.
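The transformation can be sketched in a few lines of NumPy (an illustration of the standard technique, not reducto's own code): center each feature, take the SVD of the centered data, and project onto the leading right singular vectors, which are the principal components.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto its top principal components.
    X: (n_samples, n_features) array. Returns (n_samples, n_components)."""
    Xc = X - X.mean(axis=0)        # center each feature at zero
    # SVD of the centered data: the rows of Vt are the principal components,
    # ordered by decreasing variance explained.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```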

Testing

test storage

Runs multiple test cases of image compression and decompression: the first two reducto commands on images of varying sizes, and the last two commands, which use SVD with varying levels of rank approximation k and a relaxed norm of the error matrix.

test timing

Runs multiple timing metrics using the aforementioned commands for image compression and decompression, based on the size of the original image as well as the compressed size.

reducto's People

Contributors: hmm34, evanpurkhiser

Watchers: Drew Guarnera

reducto's Issues

Part II - Use PCA on a large dataset

The objective of this part is to understand how PCA works, the connection between SVD and PCA, and the application of PCA to dimensionality reduction. You are required to pick a dataset that is big and interesting to demonstrate your understanding of PCA. Make sure you give us the background of your data and tell us what you can conclude from the PCA results.

Apart from the wiki, some good places to start would be:

Report - Describe SVD and PCA

Describe singular value decomposition and principal component analysis. A compare-and-contrast of the two methods of dimensionality reduction would be a useful framing. Since we don't have to implement these two components, describing what they are and showing that we understand them is crucial.
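One concrete point of contact worth making in the report: PCA via the eigendecomposition of the covariance matrix and PCA via the SVD of the centered data give the same answer, since the covariance eigenvalues are the squared singular values divided by n − 1. A quick numerical check (illustrative, synthetic data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((100, 6))
Xc = X - X.mean(axis=0)

# Route 1: eigenvalues of the sample covariance matrix, largest first.
cov = Xc.T @ Xc / (len(X) - 1)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Route 2: singular values of the centered data. The covariance eigenvalues
# are the squared singular values divided by (n - 1).
s = np.linalg.svd(Xc, compute_uv=False)
assert np.allclose(eigvals, s**2 / (len(X) - 1))
```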

Choose a library that implements SVD and/or PCA

We're not required to implement these methods ourselves, but can choose from a convenient and hopefully heavily documented third-party library that does it all for us! Some options include:

I'm sure there are others, but these were the first and most promising search results. From my understanding, most if not all of these build on BLAS. Though why go lower level when a high-level option already exists?

Report - Theoretical and empirical analysis

Theorize how long the SVD and PCA implementations will take and the expected size reduction, and compare this against the empirical results from the testing metrics. Discuss the differences and create lovely, colorful charts that are not scatterplots.
