lewkoo / dataminingvizproject

A COMP 4710 project on visualizing data mining results

C++ 48.52% Objective-C 2.87% C 47.74% JavaScript 0.01% Objective-C++ 0.19% Processing 0.01% Shell 0.03% D 0.57% CSS 0.05% Arduino 0.01% Python 0.01%

dataminingvizproject's Introduction

DataMiningVizProject

A COMP 4710 project on visualizing data mining results.

A COMP 4520 project on advancing & improving the visualization of Data Mining results

Goal

The goal of this project is to research and explore new ideas for processing and visualizing mined data. The project consists of two parts: 1) A pre-processor application that takes the raw mined data and produces a visualization file. 2) A player application that loads the visualization file, displays the data and lets the user interact with it.

Requirements

This project is designed to be as self-contained as possible. A suitable version of OpenFrameworks is included in the repository, so the only requirement is Microsoft Visual Studio 2010. We only guarantee a successful start on the Ultimate edition of VS 2010, but other editions should not be a problem. The project has never been tested with any other version of Visual Studio (e.g. 2008, 2012 or 2013).

We have tested this on : Windows 7, Visual Studio 2010 Ultimate.

To Run

  1. Clone the repo
  2. Open the ConeViz folder
  3. Open 'ConeViz.sln' in Visual Studio
  4. Go to 'Build' -> 'Clean Solution'
  5. Go to 'Build' -> 'Build Solution'
  6. After successful compilation, run the project

dataminingvizproject's People

Contributors

lewkoo, mrcoby

Watchers

Juan David Hincapié-Ramos

dataminingvizproject's Issues

Clustering

  1. Modify the "add itemset to a level" method to record the minimum and maximum frequency on a per-level basis
  2. Introduce an "isClustered" flag for the level. If true, use different location calculation and drawing routines (later, interaction routines as well)
  3. Introduce a slider to adjust the clustering level (cluster by frequency)
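
A minimal sketch of steps 1 and 2, assuming C++11 and hypothetical `Itemset` and `Level` types (the real classes in the repository may look different). The `clusterIndex` helper is also an assumption: it shows one way a slider value (`clusterCount`) could split the recorded frequency range into equal-width bands.

```cpp
#include <algorithm>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for the project's Itemset class.
struct Itemset {
    std::string name;
    int frequency;
};

// Hypothetical stand-in for the project's Level class.
class Level {
public:
    bool isClustered = false;  // when true, use the clustered layout/drawing path
    int minFrequency = std::numeric_limits<int>::max();
    int maxFrequency = std::numeric_limits<int>::min();
    std::vector<Itemset> itemsets;

    // Adds an itemset and keeps the per-level frequency range up to date.
    void addItemset(const Itemset& itemset) {
        itemsets.push_back(itemset);
        minFrequency = std::min(minFrequency, itemset.frequency);
        maxFrequency = std::max(maxFrequency, itemset.frequency);
    }

    // Maps a frequency to one of clusterCount equal-width frequency bands.
    int clusterIndex(int frequency, int clusterCount) const {
        if (itemsets.empty() || maxFrequency == minFrequency || clusterCount <= 1) {
            return 0;
        }
        double t = double(frequency - minFrequency) / double(maxFrequency - minFrequency);
        int index = static_cast<int>(t * clusterCount);
        return std::max(0, std::min(index, clusterCount - 1));
    }
};
```

With frequencies 10, 30 and 50 on one level and four bands, the itemsets land in bands 0, 2 and 3.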

Count the number of items in the levels on dataset read

  1. Make sure that you support any number (n) of levels

Possible solution - create a Levels class that holds a vector of Itemsets. When reading the itemsets, build the levels structure so you can easily recalculate the locations of the spheres on a level-by-level basis. This will also simplify the rotation of the circles.
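
A rough sketch of what such a Levels class could look like. Everything here is an assumption for illustration: the `Point3` and `Itemset` types, the method names, and the choice to lay each level out on a circle in the XZ plane. The `rotation` argument shows why per-level storage makes rotating a circle cheap.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Point3 { float x, y, z; };

// Hypothetical stand-in for the project's Itemset class.
struct Itemset {
    std::string name;
    Point3 location;
};

class Levels {
public:
    // Places an itemset on the given level, growing the structure as needed,
    // so any number of levels is supported.
    void add(std::size_t level, const std::string& name) {
        if (level >= levels.size()) {
            levels.resize(level + 1);
        }
        levels[level].push_back(Itemset{name, Point3{0.0f, 0.0f, 0.0f}});
    }

    std::size_t levelCount() const { return levels.size(); }
    std::size_t itemCount(std::size_t level) const { return levels.at(level).size(); }
    const std::vector<Itemset>& at(std::size_t level) const { return levels.at(level); }

    // Spreads the itemsets of one level evenly on a circle at height y.
    // Changing `rotation` (in radians) rotates the whole circle.
    void layoutLevel(std::size_t level, float radius, float y, float rotation = 0.0f) {
        std::vector<Itemset>& items = levels.at(level);
        const float twoPi = 6.28318530718f;
        for (std::size_t i = 0; i < items.size(); ++i) {
            float angle = rotation + twoPi * float(i) / float(items.size());
            items[i].location = Point3{radius * std::cos(angle), y, radius * std::sin(angle)};
        }
    }

private:
    std::vector<std::vector<Itemset>> levels;
};
```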

Implement a mapping function of mouse coordinates to 3D world coordinates

Use this link:

http://forum.openframeworks.cc/index.php?topic=11936.0

ofEasyCam is an ofCamera, which is an ofNode, and it has the worldToScreen() function, which converts your 3D coordinates to screen coordinates.

greetings ascorbin

hi dP,
check the pointPickerExample in openframeworks/examples/3D
It deals exactly with what you're asking.

best regards

The worldToScreen and screenToWorld functions did it.

Just a note: these functions are members of ofEasyCam and not of ofNode.
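
To make the idea behind those two functions concrete, here is a stand-alone sketch of the math only. It is not the openFrameworks implementation: `PinholeCamera` is a hypothetical camera fixed at the origin, looking down -Z with Y up, and the convention that the screen point's `z` holds the distance in front of the camera is this sketch's own choice.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical minimal camera: sits at the origin, looks down -Z, Y is up.
struct PinholeCamera {
    float fovYRadians;     // vertical field of view
    float viewportWidth;   // in pixels
    float viewportHeight;  // in pixels

    // World point -> screen pixel (x, y); z carries the distance in front of the camera.
    // The point must be in front of the camera (world.z < 0).
    Vec3 worldToScreen(const Vec3& world) const {
        float depth = -world.z;
        float halfHeight = std::tan(fovYRadians * 0.5f) * depth;
        float halfWidth = halfHeight * (viewportWidth / viewportHeight);
        float ndcX = world.x / halfWidth;   // -1 .. 1 across the viewport
        float ndcY = world.y / halfHeight;  // -1 .. 1, up is positive
        return Vec3{(ndcX + 1.0f) * 0.5f * viewportWidth,
                    (1.0f - ndcY) * 0.5f * viewportHeight,  // screen Y grows downwards
                    depth};
    }

    // Screen pixel plus a chosen depth -> world point (inverse of worldToScreen).
    Vec3 screenToWorld(const Vec3& screen) const {
        float depth = screen.z;
        float halfHeight = std::tan(fovYRadians * 0.5f) * depth;
        float halfWidth = halfHeight * (viewportWidth / viewportHeight);
        float ndcX = screen.x / viewportWidth * 2.0f - 1.0f;
        float ndcY = 1.0f - screen.y / viewportHeight * 2.0f;
        return Vec3{ndcX * halfWidth, ndcY * halfHeight, -depth};
    }
};
```

A mouse position alone does not fix a 3D point; a depth has to be chosen as well, which is why the screen point carries a third component.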

Implement a refresh method

  1. Is called on update
  2. Is executed only if a flag requestRefresh is set to true
  3. Re-loads the itemset, does the initial set up, resets flags and adjustable values
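
A small sketch of the flag-guarded refresh described above. The class name `Visualization`, the `clusterLevel` value and the `loadCount` counter are placeholders for illustration; only the `requestRefresh` flag comes from the ticket.

```cpp
// Hypothetical app class; in the real project this logic would live in the
// openFrameworks app's update() method.
class Visualization {
public:
    bool requestRefresh = false;  // set to true to ask for a refresh
    float clusterLevel = 0.5f;    // example of a user-adjustable value
    int loadCount = 0;            // counts reloads, only here to make the sketch observable

    // Called once per frame; does nothing unless a refresh was requested.
    void update() {
        if (requestRefresh) {
            refresh();
        }
    }

private:
    // Re-loads the itemsets, redoes the initial set up, resets flags and values.
    void refresh() {
        loadItemsets();
        clusterLevel = 0.5f;
        requestRefresh = false;
    }

    // Placeholder for the real loading and set up code.
    void loadItemsets() { ++loadCount; }
};
```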

Create an OS X project, link the source files

This might be a tricky one, but referencing the source files shouldn't be that bad. Basically, the goal is to be able to build and run the same code in Xcode on OS X and in Visual Studio on Windows.

Look into .PLY technology

  1. What is it?
  2. Who came up with it?
  3. Where is it used?
  4. How do you go about generating one?
  5. Performance testing - get some sample .ply files and load them into an OF example project (where they load a figure of a bunny). Make sure you test with HUGE files - no less than 4 million points, preferably more, go as far as you can.
  6. What are the limitations of this technology?
  7. Is there a way to include any extra data into it?

Compile your answers into a WIKI page, make sure you understand them.
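
As a starting point for questions 4 and 7, here is a sketch that writes an ASCII .ply point cloud. The header layout (`ply`, `format ascii 1.0`, `element vertex`, `property ...`, `end_header`) follows the PLY format. The extra `frequency` property is a hypothetical example of attaching our own data to each vertex, and whether a given loader keeps or ignores such a property has not been checked here.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One point of the cloud; `frequency` is a hypothetical extra per-vertex value.
struct PlyVertex {
    float x, y, z;
    float frequency;
};

// Writes the vertices as an ASCII PLY file: a text header followed by one line per vertex.
void writeAsciiPly(std::ostream& out, const std::vector<PlyVertex>& vertices) {
    out << "ply\n"
        << "format ascii 1.0\n"
        << "element vertex " << vertices.size() << "\n"
        << "property float x\n"
        << "property float y\n"
        << "property float z\n"
        << "property float frequency\n"
        << "end_header\n";
    for (const PlyVertex& v : vertices) {
        out << v.x << ' ' << v.y << ' ' << v.z << ' ' << v.frequency << '\n';
    }
}

// Convenience wrapper that returns the file contents as a string.
std::string toAsciiPly(const std::vector<PlyVertex>& vertices) {
    std::ostringstream out;
    writeAsciiPly(out, vertices);
    return out.str();
}
```

For the performance test, the same function can be pointed at a `std::ofstream` and fed a few million generated vertices.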

Implement a Data parser

Dude, do this. I know it is boring, but it is necessary. Feel free to modify the Itemset and Level class if it needs to accommodate extra data (ex. frequency). Do not worry about the performance of the parser, as it will be loading the data only once.

Also, I don't think it is a big deal if you modify the format of the data (like removing the headers and so on). Just make sure you commit here and then we are good.

The parser must take the same arguments as it does now and return void.

I have committed the files to the repository so you can just pull and start working.

We need code, man. Words are good, but code is better.
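
Since code is better than words, here is a sketch of a parser. The data format is a loud assumption: one itemset per line as `<items> <frequency>` (for example `ab 42`), with one character per item, so the level is derived from the name length. The real data files and the real parser signature may differ; the only thing taken from the ticket is that the function returns void.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the project's Itemset class.
struct Itemset {
    std::string name;
    int frequency;
};

// Reads "<items> <frequency>" lines (assumed format) and fills `levels`,
// where levels[k] holds the itemsets with k + 1 items. Returns void, as requested.
void parseItemsets(std::istream& input, std::vector<std::vector<Itemset>>& levels) {
    std::string line;
    while (std::getline(input, line)) {
        std::istringstream fields(line);
        Itemset itemset;
        if (!(fields >> itemset.name >> itemset.frequency)) {
            continue;  // skip headers, blank lines and anything malformed
        }
        std::size_t level = itemset.name.size() - 1;  // assumes one character per item
        if (level >= levels.size()) {
            levels.resize(level + 1);
        }
        levels[level].push_back(itemset);
    }
}
```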

Look into Edge Bundling

This stuff is amazing. If we can incorporate it into our structure, that will greatly improve clarity. Thanks to Pourang for suggesting the idea.

Thoughts on showing big data

This is tricky, but doable. From my experience with OpenGL, real-time rendering of more than 8,000 spheres starts to become sluggish and stops being interactive, even on fast hardware (we are talking $2,000 dual-GPU kind of hardware). That means that if we have 1,000,000 itemsets and only 8,000 spheres at our disposal, each sphere will represent 1,000,000 / 8,000 = 125 itemsets, which is pretty good. We can cluster them by frequency. We need a good way to expand that data into a more detailed representation... I will talk to Pourang about that part and see if he has any ideas. It won't make sense to build another cone for clustered spheres (because they are all on the same level); however, it might make sense to reuse the code and create a temporary level (a circle) that appears instead of the cone while the user selects an exact itemset. I don't know about this yet, we will need to see the data first.
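
The calculation above, plus one possible way to cluster by frequency, can be sketched like this. The function names are made up for illustration, and the chunking strategy (sort by frequency, cut into equally sized runs) is just one assumption about how the clustering could work.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// How many itemsets each sphere has to represent (rounded up).
std::size_t itemsetsPerSphere(std::size_t itemsetCount, std::size_t sphereBudget) {
    if (sphereBudget == 0) {
        return 0;
    }
    return (itemsetCount + sphereBudget - 1) / sphereBudget;
}

// Sorts the frequencies and cuts them into consecutive chunks, one chunk per sphere,
// so that itemsets with similar frequency end up in the same cluster.
std::vector<std::vector<int>> clusterByFrequency(std::vector<int> frequencies,
                                                 std::size_t sphereBudget) {
    std::vector<std::vector<int>> clusters;
    std::size_t chunk = itemsetsPerSphere(frequencies.size(), sphereBudget);
    if (chunk == 0) {
        return clusters;  // nothing to cluster, or no spheres available
    }
    std::sort(frequencies.begin(), frequencies.end());
    for (std::size_t start = 0; start < frequencies.size(); start += chunk) {
        std::size_t end = std::min(start + chunk, frequencies.size());
        clusters.emplace_back(frequencies.begin() + start, frequencies.begin() + end);
    }
    return clusters;
}
```

`itemsetsPerSphere(1000000, 8000)` gives the 125 from the paragraph above.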

I am sure that when presented with big data, our visualization will break in many ways - in terms of representation, performance and interactivity. However, that is just one of the challenges, and there are existing techniques we can employ to make it nicer, faster and more user friendly. I think that will be a big part of this project. Unfortunately, the sheer amount of data is stopping us from doing some really cool things, but that's just part of the deal.

Interactivity: path highlighting

This will be the last implemented feature. Essentially, there are two parts to this algorithm. I will try to highlight them here as I go through them.

  1. First, we need to select an itemset. Easy. Just click and the sphere closest to your mouse pointer becomes the selected sphere. If a sphere is an itemset, this is easy - just take the name of that itemset and create a collection of VizElements where that name occurs, using an algorithm similar to that of set Connections, which determines the connections between itemsets. Great. Easy. With clusters it is a bit more tricky, because they contain a lot of itemsets and the user needs to select a specific one. I will try to implement a full-screen ofxUI selection element listing all the itemsets in the selected cluster. I will also touch on this subject in the report.

  2. Once the itemset is selected, a path through the tree covering every single occurrence of that itemset will be highlighted. Maybe provide an option to hide the other lines to avoid clutter. Either way, this part is easy.
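
Both parts can be sketched as follows for the simple (non-clustered) case. The `Sphere` struct is a hypothetical stand-in for a VizElement whose position has already been projected to screen coordinates, and the occurrence search uses a plain substring match on the name, which is an assumption about how names are compared.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical element: an itemset name plus its already-projected screen position.
struct Sphere {
    std::string name;
    float screenX;
    float screenY;
};

// Part 1: returns the index of the sphere closest to the mouse, or -1 if there are none.
int pickClosest(const std::vector<Sphere>& spheres, float mouseX, float mouseY) {
    int best = -1;
    float bestDistanceSquared = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < spheres.size(); ++i) {
        float dx = spheres[i].screenX - mouseX;
        float dy = spheres[i].screenY - mouseY;
        float distanceSquared = dx * dx + dy * dy;
        if (distanceSquared < bestDistanceSquared) {
            bestDistanceSquared = distanceSquared;
            best = static_cast<int>(i);
        }
    }
    return best;
}

// Part 2: collects the indices of all spheres whose name contains the selected name.
std::vector<std::size_t> findOccurrences(const std::vector<Sphere>& spheres,
                                         const std::string& selectedName) {
    std::vector<std::size_t> occurrences;
    for (std::size_t i = 0; i < spheres.size(); ++i) {
        if (spheres[i].name.find(selectedName) != std::string::npos) {
            occurrences.push_back(i);
        }
    }
    return occurrences;
}
```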

Draw the lines between spheres

  1. Have a function that accepts 2 levels - top and bottom. For every itemset in the top level, find all the itemsets in the bottom level that contain the current itemset (use the string method contains).

If contains -> draw a line.

Ex. for 'a' -> 'ab' and 'ac' will be your matches. Draw a line between a and ab, and between a and ac.

Code to draw a line:

ofLine(topItemset.getLocation(), bottomItemset.getLocation());
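
A sketch of the matching step, without the drawing. One caveat worth noting: a plain substring check would miss 'ac' inside 'abc', so this version checks the items one by one instead, assuming each item is a single character. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// True if every item (character) of `smaller` also occurs in `bigger`.
// Assumes one character per item, e.g. "ac" is contained in "abc".
bool containsAllItems(const std::string& bigger, const std::string& smaller) {
    for (char item : smaller) {
        if (bigger.find(item) == std::string::npos) {
            return false;
        }
    }
    return true;
}

// Returns (top index, bottom index) pairs; each pair is one line to draw.
std::vector<std::pair<std::size_t, std::size_t>> findConnections(
        const std::vector<std::string>& top, const std::vector<std::string>& bottom) {
    std::vector<std::pair<std::size_t, std::size_t>> connections;
    for (std::size_t t = 0; t < top.size(); ++t) {
        for (std::size_t b = 0; b < bottom.size(); ++b) {
            if (containsAllItems(bottom[b], top[t])) {
                connections.push_back(std::make_pair(t, b));
            }
        }
    }
    return connections;
}
```

Each returned pair would then be passed to the line-drawing call shown above.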

Data file parser

Begin with some simple functionality, such as checking that the file exists and can be opened, listing all the available files, and checking the file for data consistency (this last one can be skipped).

At the moment, I am interested in how fast we can just read through the damn thing...
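
A minimal sketch for the "can it be opened" check and for timing a plain read-through, using only the standard library. The function names are made up, and the timing covers line-by-line reading only, without any parsing.

```cpp
#include <chrono>
#include <fstream>
#include <string>

// Returns the number of lines in the file, or -1 if it does not exist or cannot be opened.
long countLines(const std::string& path) {
    std::ifstream file(path);
    if (!file.is_open()) {
        return -1;
    }
    long lines = 0;
    std::string line;
    while (std::getline(file, line)) {
        ++lines;
    }
    return lines;
}

// Measures how long one pass over the file takes, in seconds.
double secondsToRead(const std::string& path) {
    auto start = std::chrono::steady_clock::now();
    countLines(path);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```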

Line drawing threshold

Add a slider that lets the user define a threshold for the connection lines. A line between two itemsets should be drawn only if their frequency reaches the user-specified threshold. This helps reduce clutter from links that are low in frequency, i.e. that do not occur often.
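
One possible reading of this rule, sketched below: a line is kept only if both of its endpoints reach the threshold. That interpretation is an assumption (the ticket could also mean the frequency of the larger itemset only), and the `Connection` struct is hypothetical.

```cpp
#include <vector>

// Hypothetical record of one candidate line and the frequencies at its two ends.
struct Connection {
    int topFrequency;
    int bottomFrequency;
};

// Assumed rule: both itemsets must reach the threshold for the line to be drawn.
bool passesThreshold(const Connection& connection, int threshold) {
    return connection.topFrequency >= threshold && connection.bottomFrequency >= threshold;
}

// Keeps only the connections that should be drawn for the current slider value.
std::vector<Connection> visibleConnections(const std::vector<Connection>& all, int threshold) {
    std::vector<Connection> visible;
    for (const Connection& connection : all) {
        if (passesThreshold(connection, threshold)) {
            visible.push_back(connection);
        }
    }
    return visible;
}
```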

Color coding and size adjustments for the spheres

Color coding:

Implement a function that takes the actual frequency of an itemset and maps it to a normalized RGB color, where:

If frequency == minsup, the sphere is silver/grayish.

If the frequency is close to minsup, choose a color that is close to green. As the frequency moves away from minsup, the color becomes more yellow, then orange, and finally red. Then apply that color to the sphere.

Sphere size:

Sphere size must also be a function of the frequency of the itemset.
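
A sketch of both mappings. The exact silver value, the linear green-to-yellow-to-red ramp, the use of the maximum frequency as the upper end of the scale, and the linear radius interpolation are all assumptions made for illustration.

```cpp
#include <algorithm>

// Normalized RGB color, each channel in 0..1.
struct Color { float r, g, b; };

// How far the frequency is above minsup, as a value in 0..1.
float normalizedFrequency(int frequency, int minsup, int maxFrequency) {
    if (maxFrequency <= minsup) {
        return 0.0f;
    }
    float t = float(frequency - minsup) / float(maxFrequency - minsup);
    return std::max(0.0f, std::min(t, 1.0f));
}

// minsup -> silver; just above minsup -> green; then yellow, orange and finally red.
Color frequencyToColor(int frequency, int minsup, int maxFrequency) {
    if (frequency <= minsup || maxFrequency <= minsup) {
        return Color{0.75f, 0.75f, 0.75f};  // silver/grayish
    }
    float t = normalizedFrequency(frequency, minsup, maxFrequency);
    if (t < 0.5f) {
        return Color{t * 2.0f, 1.0f, 0.0f};       // green towards yellow
    }
    return Color{1.0f, (1.0f - t) * 2.0f, 0.0f};  // yellow through orange to red
}

// Sphere radius grows linearly with the frequency between minRadius and maxRadius.
float frequencyToRadius(int frequency, int minsup, int maxFrequency,
                        float minRadius, float maxRadius) {
    float t = normalizedFrequency(frequency, minsup, maxFrequency);
    return minRadius + t * (maxRadius - minRadius);
}
```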

Set up the project as a stand alone Visual Studio 2010 project

  1. The project should run out-of-the-box (i.e. all the static libs and DLLs must be in the repository). Anyone must be able to clone the project, open it up and run it without any prerequisites besides Visual Studio 2010.

  2. OpenFrameworks 0.7.4 (or 0.8) should be included in the project and be set up.

  3. Include some basic 3D rendering code, with ofEasyCam (use the ofEasyCam example)

Look into Unity

As suggested by Juan, Unity is a very modern and very easy to use environment for 3D programming. While I am skeptical about the freedom it might offer me in terms of coding, I do like the idea of writing less code in a generally more intuitive 3D development tool.

Unity, at least at the moment, gives me 2 advantages that I can see right from the start:

  1. Unity is easy to use and easy to program. Is it easy to learn to a level high enough for me to do what I might need to do? I don't know.
  2. Unity is new and contemporary, and that is very important in a research project like this - it is important to show that the technologies were chosen carefully and with prior knowledge of the field (in this case, 3D programming). Also, every time you choose a framework, you play a guessing game - it might make your life a lot easier or a lot harder. The trouble is that you never know which it is going to be. But Unity is worth exploring, hence this ticket.
