nasa-jpl-memex / memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data

License: BSD 2-Clause "Simplified" License

Languages: Python 43.91%, Shell 2.45%, HTML 19.79%, CSS 16.40%, JavaScript 16.31%, Nginx 0.78%, Batchfile 0.37%
Topics: memex-explorer miniconda anaconda crawler dashboard domain-discovery nutch ache apache tika

memex-explorer's People

Contributors

ahmadia, amfarrell, anthonytw, aterrel, brittainhard, chdoig, chrismattmann, kdodia, lewismc, nipurndoshi, purg, quasiben, rrgirish, shivikathapar, tbpalsulich

memex-explorer's Issues

db_add_crawl bug

This happened when I started a Nutch crawl that had no data model:

Traceback (most recent call last):
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask_restful/__init__.py", line 262, in error_router
    return original_handler(e)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask_restful/__init__.py", line 262, in error_router
    return original_handler(e)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/brittainchristopherhard/Documents/memex-explorer/app/views.py", line 162, in add_crawl
    crawl = db_add_crawl(project, form, seed_filename)
  File "/Users/brittainchristopherhard/Documents/memex-explorer/app/db_api.py", line 73, in db_add_crawl
    data_model_id=form.data_model.data.id,
AttributeError: 'NoneType' object has no attribute 'id'

@kdodia, @chdoig
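The traceback shows that `form.data_model.data` is `None` when no data model is selected, so reading `.id` from it fails. A minimal sketch of a guard follows; the helper name `data_model_id_or_none` is hypothetical, and whether the crawl table accepts a null `data_model_id` is an assumption that would need checking against the schema.

```python
def data_model_id_or_none(data_model):
    """Return data_model.id, or None when no data model was selected.

    Hypothetical helper: `data_model` stands in for form.data_model.data
    from the traceback above.
    """
    if data_model is None:
        return None
    return data_model.id
```

Under that assumption, the failing line in `db_add_crawl` could pass `data_model_id=data_model_id_or_none(form.data_model.data)` instead of dereferencing the value directly.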

Add basic ache crawler test

Check that ACHE, as installed by memex-explorer, runs and crawls correctly. We will need to set up a dummy web service to crawl against.
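One possible shape for the dummy web service is a tiny local HTTP server that serves a fixed set of linked pages, so a test knows exactly what a crawler should find. This is only a sketch in Python 3 using the standard library; the page contents and the names `PAGES`, `DummyHandler`, `dummy_site`, and `fetch` are made up for illustration, and wiring a real crawler to it is left out.

```python
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical two-page site: the root links to one leaf page.
PAGES = {
    "/": b'<html><body><a href="/page1">page1</a></body></html>',
    "/page1": b"<html><body>leaf page</body></html>",
}


class DummyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


@contextmanager
def dummy_site():
    """Serve PAGES on a free local port for the duration of the block."""
    server = HTTPServer(("127.0.0.1", 0), DummyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield "http://127.0.0.1:%d" % server.server_address[1]
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def fetch(url):
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()
```

A test would enter `with dummy_site() as base_url:`, point the crawler's seed list at `base_url`, and then assert that both pages were fetched.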

Backbone Integration

So there are a lot of ways this application can be streamlined through the use of Backbone. Setting up projects, for example, can be done on one page instead of on multiple pages that require multiple page refreshes.

It might be worthwhile to discuss (1) the extent to which we want Backbone to supplant Flask in areas such as project creation and (2) areas of the project that could benefit from the inclusion of Backbone.

Just something to discuss for the near future.

Refactor Image Space

After talking with @chdoig, we decided on the following:

Images will be stored in a central directory, indexed by ID. Each image should have a parent project and at least one linked crawl.

Clicking on the "Image Space" application in a particular project should return all corresponding images.

Add basic nutch testing

Check that Nutch, as installed by memex-explorer, runs and crawls correctly. We will need to set up a dummy web service to crawl against.

Delete project behavior

Update the delete-project behavior so that, when you delete a project, all of that project's resources move to your Home project. This involves changing each resource's project_id to the id of the project with project.name=="Home" and then removing the deleted project from the project table.
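The reassign-then-delete step could look roughly like the sketch below. It uses the standard-library `sqlite3` module rather than the application's ORM, and it assumes a `project` table with `id` and `name` columns and a `crawl` table with a `project_id` column; the real schema may differ, and other resource tables would need the same update.

```python
import sqlite3


def delete_project(conn, project_id):
    """Move a project's crawls to the "Home" project, then delete the project.

    Table and column names are assumptions made for this sketch.
    """
    home = conn.execute("SELECT id FROM project WHERE name = 'Home'").fetchone()
    if home is None:
        raise LookupError('no project named "Home"')
    home_id = home[0]
    if project_id == home_id:
        raise ValueError("the Home project cannot be deleted")
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "UPDATE crawl SET project_id = ? WHERE project_id = ?",
            (home_id, project_id),
        )
        conn.execute("DELETE FROM project WHERE id = ?", (project_id,))
```

Running both statements in one transaction avoids a state where the project is gone but its resources still point at it.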

Dashboard takes too long to start

We are currently not displaying any plots with frontierpages.csv data, but the dashboard page still depends on that file existing and containing data before it will display. The frontierpages file takes a long time to be generated (about 10 minutes in one test). I tried removing the frontierpages.csv dependency from the code, but was still getting an error.

The ultimate solution probably involves digging into the ACHE code and providing better output: a streaming API instead of a bulk write to files every X seconds. For now, remove the frontierpages.csv dependency.
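One way to loosen the dependency is to treat the file as optional, so the dashboard renders with whatever data already exists. The sketch below assumes the file is plain CSV and that an empty result is acceptable to the plotting code; the function name `read_rows_if_ready` is hypothetical.

```python
import csv
import os


def read_rows_if_ready(path):
    """Return the CSV rows at `path`, or [] if the file is missing or empty."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as handle:
        return list(csv.reader(handle))
```

The dashboard view could then skip any plot whose input comes back as an empty list instead of failing the whole page.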

Responsive design?

The application looks odd when someone looks at it on a small screen. It's not really built for "Responsive Design" ©.

So, do we want to make it so it looks good on everything? Phones, tablets, etc? This shouldn't be too difficult to do, but it will take me some time.

A good way to look at this is with the web inspector in chrome (command + alt + j, then click the button that looks like a phone).

Let me know what you guys think. Do we want our customers to be able to use this thing on tablets and/or smartphones?

Image Space table link to image compare not working

Error message:

  File "/Users/cdoig/anaconda/envs/foo2/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/cdoig/work/test_memex/memex-explorer/app/views.py", line 417, in compare
    img = get_image(image_name)
TypeError: get_image() takes exactly 2 arguments (1 given)

Install not complete

When I followed the install instructions today, I got the following error upon visiting the entry page:

http://0.0.0.0:5000/
 * Running on http://0.0.0.0:5000/
 * Restarting with reloader
http://0.0.0.0:5000/
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET / HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/aterrel/workspace/apps/memex/memex-viewer/app/views.py", line 64, in index
    return render_template('index.html')
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/templating.py", line 126, in render_template
    ctx.app.update_template_context(context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 716, in update_template_context
    context.update(func())
  File "/Users/aterrel/workspace/apps/memex/memex-viewer/app/views.py", line 41, in inject_crawls
    crawls = Crawl.query.all()
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2320, in all
    return list(self)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2438, in __iter__
    return self._execute_and_instances(context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2453, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
    exc_info
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (OperationalError) no such table: crawl u'SELECT crawl.id AS crawl_id, crawl.name AS crawl_name, crawl.endpoint AS crawl_endpoint, crawl.description AS crawl_description \nFROM crawl' ()
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=style.css HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=jquery.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=debugger.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=ubuntu.ttf HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=console.png HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=source.png HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:52] "GET /?__debugger__=yes&cmd=resource&f=console.png HTTP/1.1" 200 -
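The `no such table: crawl` error suggests the database schema was never created before the first request. With Flask-SQLAlchemy the usual remedy is to call `db.create_all()` once after the models are defined; whether the install steps were meant to do that is not stated here. The same idea is sketched below with the standard-library `sqlite3` module. The column names come from the failing query above, while the column types and the function name `all_crawls` are assumptions.

```python
import sqlite3

# Columns taken from the SELECT in the error message; types are guesses.
CRAWL_SCHEMA = """
CREATE TABLE IF NOT EXISTS crawl (
    id INTEGER PRIMARY KEY,
    name TEXT,
    endpoint TEXT,
    description TEXT
)
"""


def all_crawls(conn):
    """Create the crawl table if it is missing, then return every row."""
    conn.execute(CRAWL_SCHEMA)
    return conn.execute(
        "SELECT id, name, endpoint, description FROM crawl"
    ).fetchall()
```

On a fresh database this returns an empty list rather than raising `OperationalError`.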

Add Summary Statistics

Ideas:

  • How long the crawl has been running
  • How many pages have been fetched so far
  • Average harvest rate
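The three ideas above could be computed along these lines. This is a rough Python 3 sketch: it assumes "harvest rate" means the fraction of fetched pages judged relevant, which is the usual focused-crawling sense but is not defined in this issue, and the function and parameter names are hypothetical.

```python
from datetime import datetime


def crawl_summary(started_at, now, pages_fetched, relevant_pages):
    """Summarize a crawl: elapsed time, pages fetched, and harvest rate.

    Harvest rate is taken to be relevant_pages / pages_fetched (an assumption).
    """
    elapsed_seconds = (now - started_at).total_seconds()
    harvest_rate = relevant_pages / pages_fetched if pages_fetched else 0.0
    return {
        "elapsed_seconds": elapsed_seconds,
        "pages_fetched": pages_fetched,
        "harvest_rate": harvest_rate,
    }
```

For a running crawl, `now` would be `datetime.now()`, and the two page counts would come from whatever the crawler reports.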
