datahuborg / datahub

An experimental hosted platform (GitHub-like) for organizing, managing, sharing, collaborating, and making sense of data.

Home Page: https://datahub.csail.mit.edu

License: MIT License

Python 29.21% HTML 20.26% Shell 0.77% JavaScript 34.73% CSS 9.36% Makefile 0.34% C++ 0.08% Go 0.07% Java 0.21% Objective-C 4.51% Thrift 0.13% Batchfile 0.31%

datahub's People

Contributors

anantb, b-carter, dnsserver, famien, hariharsubramanyam, jharia, justinanderson, kxzhang, rogertangos, sirrice, ygina

datahub's Issues

Tables should be accessible by links WITHOUT username

I would like to be able to link people to tables/repositories without using their usernames:

for example, instead of
https://datahub.csail.mit.edu/browse/USERNAME/REPONAME/table/TABLENAME

use this and have username inferred by their login status.
https://datahub.csail.mit.edu/browse/REPONAME/table/TABLENAME

This is a thing that kept bugging me during getfit.
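
One way the request above could work is sketched below in Python. This is only an illustration, not DataHub's routing code: the function name `resolve_browse_path` and the `logged_in_user` parameter are hypothetical, and the sketch assumes the two URL shapes quoted in the issue are the only ones to handle.

```python
def resolve_browse_path(path, logged_in_user=None):
    """Split a /browse/... path into (owner, repo, table).

    Accepts /browse/OWNER/REPO/table/TABLE as well as the shorter
    /browse/REPO/table/TABLE, where the owner is taken from the session.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] != "browse":
        raise ValueError("not a browse path: %r" % path)
    parts = parts[1:]

    # Long form: the owner is spelled out in the URL.
    if len(parts) == 4 and parts[2] == "table":
        owner, repo, _, table = parts
        return owner, repo, table

    # Short form: infer the owner from whoever is logged in.
    if len(parts) == 3 and parts[1] == "table":
        if logged_in_user is None:
            raise ValueError("login required to infer the username")
        repo, _, table = parts
        return logged_in_user, repo, table

    raise ValueError("unrecognized browse path: %r" % path)
```

The two forms differ in segment count, so they cannot be confused with each other even if a repository happens to be called "table".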

DML "cards" don't escape characters

When a user creates a card using DML, they are asked to give it a name. Currently, names that contain spaces are not escaped, so they cannot be mapped to URLs and the card links break.
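
A minimal Python sketch of the kind of escaping that would avoid this. The path layout `/REPO/card/NAME` and the helper name `card_path` are assumptions made for illustration; only `urllib.parse.quote` and `unquote` are real standard-library calls.

```python
from urllib.parse import quote, unquote


def card_path(repo, card_name):
    """Build a URL path for a card, percent-encoding every segment.

    The /REPO/card/NAME layout is hypothetical; the point is that the
    name is escaped before it is placed in the URL.
    """
    # safe="" also escapes "/", so a name cannot add extra path segments.
    return "/%s/card/%s" % (quote(repo, safe=""), quote(card_name, safe=""))


def card_name_from_segment(segment):
    """Recover the original card name from an encoded path segment."""
    return unquote(segment)
```

With this, a name such as "weekly totals" becomes the segment `weekly%20totals` and decodes back to the original name on the way in.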

ImportError: No module named datahub

/home/ubuntu/datahub/src/apps/dbwipes/views.py:6: DeprecationWarning: the md5 module is deprecated; use hashlib instead
import md5

Internal Server Error: /
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 103, in get_response
    resolver_match = resolver.resolve(request.path_info)
  File "/usr/local/lib/python2.7/dist-packages/django/core/urlresolvers.py", line 319, in resolve
    for pattern in self.url_patterns:
  File "/usr/local/lib/python2.7/dist-packages/django/core/urlresolvers.py", line 347, in url_patterns
    patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)
  File "/usr/local/lib/python2.7/dist-packages/django/core/urlresolvers.py", line 342, in urlconf_module
    self._urlconf_module = import_module(self.urlconf_name)
  File "/usr/local/lib/python2.7/dist-packages/django/utils/importlib.py", line 35, in import_module
    __import__(name)
  File "/home/ubuntu/datahub/src/browser/urls.py", line 171, in <module>
    url(r'^apps/dbwipes/', include('dbwipes.urls')), # dbwipes app
  File "/usr/local/lib/python2.7/dist-packages/django/conf/urls/__init__.py", line 25, in include
    urlconf_module = import_module(urlconf_module)
  File "/usr/local/lib/python2.7/dist-packages/django/utils/importlib.py", line 35, in import_module
    __import__(name)
  File "/home/ubuntu/datahub/src/apps/dbwipes/urls.py", line 2, in <module>
    import views
  File "/home/ubuntu/datahub/src/apps/dbwipes/views.py", line 16, in <module>
    from service.handler import DataHubHandler
  File "/home/ubuntu/datahub/src/service/handler.py", line 7, in <module>
    from datahub import DataHub
ImportError: No module named datahub
[27/Apr/2015 03:19:12] "GET / HTTP/1.1" 500 122244
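
The DeprecationWarning at the top of this log is separate from the ImportError, but it has a simple fix: the standalone md5 module was replaced by hashlib. A minimal sketch of the replacement follows; the helper name `md5_hex` is made up for illustration, and how dbwipes actually uses the digest is not shown in the log.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a text string using hashlib."""
    # hashlib works on bytes, so encode the text first.
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```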

Layout should be a compiled header

The current layout.html is not a compiled template (it's a raw template). In the future we would compile it so that the static files are separated out (they can be served through either a CDN or a fast web server like nginx) and the template handling is put behind a reverse proxy. I'll put up a task for it, but it's not super important.

Anant

In datahub's layout.html we're importing 30+ header files. Many of these are for specific applications (e.g. the terminal), and are hosted locally instead of being loaded from a CDN. Is this really necessary?

My impression is that it'd be faster and cleaner to put these in the application headers, and to load them from CDNs where possible.

Albert Carter

Go Thrift code generation throws errors

This seems to be a missing (non-vital) formatting package. Unfortunately, I'm unfamiliar with both Go and Thrift.

https://golang.org/cmd/gofmt/

Here's the error:

$ cd src/examples/go
$ ./setup.sh

sh: gofmt: command not found
WARNING - Running 'gofmt -w /Users/arcarter/code/datahub/src/examples/go/gen-go/src///datahub/datahub.go' failed.
sh: gofmt: command not found
WARNING - Running 'gofmt -w /Users/arcarter/code/datahub/src/examples/go/gen-go/src///datahub/ttypes.go' failed.
sh: gofmt: command not found
WARNING - Running 'gofmt -w /Users/arcarter/code/datahub/src/examples/go/gen-go/src///datahub/constants.go' failed.
sh: gofmt: command not found
WARNING - Running 'gofmt -w /Users/arcarter/code/datahub/src/examples/go/gen-go/src///datahub/account/account_service-remote/account_service-remote.go' failed.
sh: gofmt: command not found
WARNING - Running 'gofmt -w /Users/arcarter/code/datahub/src/examples/go/gen-go/src///datahub/account/accountservice.go' failed.
sh: gofmt: command not found
WARNING - Running 'gofmt -w /Users/arcarter/code/datahub/src/examples/go/gen-go/src///datahub/account/ttypes.go' failed.
sh: gofmt: command not found
WARNING - Running 'gofmt -w /Users/arcarter/code/datahub/src/examples/go/gen-go/src///datahub/account/constants.go' failed.
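
The warnings only say that gofmt is not on the PATH. A setup script could check for such tools up front and report them once, instead of failing per file. The sketch below shows the idea in Python; the helper name `missing_tools` is hypothetical and this is not part of the repository's setup script.

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # gofmt is the tool named in the warnings; thrift is the code generator.
    missing = missing_tools(["thrift", "gofmt"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
```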

responses should be gzipped

Doing big queries over HTTP is incredibly time consuming. Gzipping is pretty straightforward, and would significantly speed up load time.
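
To give a rough sense of the saving, the Python sketch below compresses a JSON payload with the standard gzip module. The row data is invented for illustration. In a Django deployment the usual route would be the built-in `django.middleware.gzip.GZipMiddleware` or compression at the web server; whether either fits DataHub's setup is an assumption.

```python
import gzip
import json


def gzip_payload(payload):
    """Serialize a payload to JSON and return (raw_bytes, gzipped_bytes)."""
    raw = json.dumps(payload).encode("utf-8")
    return raw, gzip.compress(raw)


if __name__ == "__main__":
    # Invented, repetitive rows that stand in for a large query result.
    rows = [{"id": i, "device": "phone", "steps": 1000} for i in range(1000)]
    raw, compressed = gzip_payload(rows)
    print("raw bytes: %d, gzipped bytes: %d" % (len(raw), len(compressed)))
```

Query results tend to be repetitive (the same column names on every row), which is the kind of data gzip handles well.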

Views cannot be deleted

{"error": "\"viewtest\" is not a table\nHINT: Use DROP VIEW to remove a view.\n"}

table_delete in browser.views will need to determine if the table_name passed is a view, and it will have to call related methods in manager.py and pg.py
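
A minimal sketch of the branching described above, in Python. The function name `drop_statement` and the `is_view` flag are hypothetical; the real change would live in browser.views, manager.py and pg.py, whose signatures are not shown here. How `is_view` gets determined (for example by a lookup in information_schema.views) is left out.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"%s"' % name.replace('"', '""')


def drop_statement(repo, name, is_view):
    """Build DROP VIEW or DROP TABLE depending on what the relation is."""
    kind = "VIEW" if is_view else "TABLE"
    return "DROP %s %s.%s" % (kind, quote_ident(repo), quote_ident(name))
```

For the view in the error message, `drop_statement("repo", "viewtest", True)` yields `DROP VIEW "repo"."viewtest"`.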

Console is difficult to copy and paste into

When using the console, you sometimes have to click more than once and try to paste more than once before any text appears.

I'm using OSX and chrome.

This is possibly one of the simplest and most frustrating parts of datahub... because no one wants to be editing SQL in a terminal.

Manage a collection of queries

Maybe save create view statements as cards, or save all queries as temporary cards? It's too hard to retrieve queries that have previously been run successfully.

Ctrl+c to interrupt console processes

Manipulating large datasets in the terminal causes it to hang. The only way to interrupt is to force quit the tab. It should be possible to keep listening for ctrl+c and interrupt the process if the user desires.

Google Gadgets Like Apps

Users might be able to add javascript applications that affect tables through the DataHub API, for example Hands On Table
@karger

new account email addresses are case sensitive

account email addresses are case sensitive. It's possible to create an account with [email protected], and then another with [email protected], and then give them separate usernames and passwords.

in objective-c:

[account_client create_account:username email:@"[email protected]" password:password repo_name:@"getfit" app_id:appID app_token:appToken];

and then create another account:

[account_client create_account:username email:@"[email protected]" password:password repo_name:@"getfit" app_id:appID app_token:appToken];
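
One common fix is to normalize the address before the uniqueness check. The Python sketch below shows this with an in-memory registry. The class `AccountRegistry` and the example.com addresses are placeholders, not DataHub's account service, and lowercasing the whole address is an assumption about the desired policy.

```python
def normalize_email(email):
    """Trim whitespace and lowercase the address so case variants collide."""
    return email.strip().lower()


class AccountRegistry(object):
    """Toy stand-in for an account store, keyed by normalized email."""

    def __init__(self):
        self._username_by_email = {}

    def create_account(self, username, email):
        key = normalize_email(email)
        if key in self._username_by_email:
            raise ValueError("an account already exists for %s" % key)
        self._username_by_email[key] = username
        return key
```

With this check in place, the second create_account call above would be rejected instead of creating a duplicate account.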

Dependencies?

Seriously, add a requirements.txt or setup.py or just a list to the README.

Apps should be listed in app center

The "Apps Center" link on the top nav bar currently points to root. There should either be a list of installed apps, or the link should be removed.

disconnect stack in console

It should be possible to disconnect from the current repo in the terminal. In postgres, this should be with the disconnect command and/or ctrl+d, but I'm not sure how other databases manage it.

Add DataQ back into DataHub

Currently, the DataQ app is not accessible via the DataHub user interface. Add a button to table-browse-template.html to launch the app.

LIMIT statements do not work in repo interface

DataHub automatically appends LIMIT and OFFSET clauses to every SQL statement run in the repository interface (http://datahub.csail.mit.edu/browse/USERNAME/REPONAME/) to support pagination. As a result, a statement that already has its own LIMIT, such as select * from getfit.deviceinfo LIMIT 1; which should work, instead returns an error:

{"error": "syntax error at or near \"LIMIT\"\nLINE 1: select * from getfit.deviceinfo LIMIT 1 LIMIT 50 OFFSET 0\n ^\n"}
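
One possible way around the doubled LIMIT is to wrap the user's statement in a subquery and paginate the outer query. The Python sketch below shows the idea; the function name `paginate` and the alias `paged` are made up, and the approach only applies to SELECT statements. It is not how DataHub currently builds its queries.

```python
def paginate(sql, limit=50, offset=0):
    """Wrap a SELECT in a subquery so pagination does not clash with its own LIMIT."""
    # Drop surrounding whitespace and a trailing semicolon before wrapping.
    inner = sql.strip().rstrip(";").strip()
    return "SELECT * FROM (%s) AS paged LIMIT %d OFFSET %d" % (
        inner, int(limit), int(offset))
```

The statement from this report would become `SELECT * FROM (select * from getfit.deviceinfo LIMIT 1) AS paged LIMIT 50 OFFSET 0`, which keeps the user's LIMIT 1 intact.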

Client Throws "DBException(message:User matching query does not exist. Lookup parameters were {'username': None})"

The following piece of code throws the above error:

this.transport = new THttpClient("http://datahub.csail.mit.edu/service");
this.protocol = new TBinaryProtocol(transport);
this.client = new DataHub.Client(protocol);

this.con_params = new ConnectionParams();
this.con_params.setUser("anantb");
this.con_params.setPassword("anant");
this.conn = this.client.open_connection(con_params);

ResultSet updatelogExists = this.client.execute_sql(this.conn, "select * from anantb.test.demo", null);

Gives the following error:

DBException(message:User matching query does not exist. Lookup parameters were {'username': None})...

Would appreciate a fix for this! Thanks!

Test Issue

I'm linking github and Jira issues. This is a test to see if Jira picks up new github issues. Apologies for the notifications.

Allow cross-user table joins

I'm copying a few of the frequently requested features/fixes from jira onto Github, just so people know that we're aware of the issues, and are working on them.

u'prefixed strings' being returned by thrift

reported by @karger:

In javascript I'm using the thrift api to send datahub a sql query that
includes the "array_agg" operator:
"select count(prequest.hilulim.uid) as num, prequest.hilulim.title as
title ,array_agg(prequest.names.name) as names from prequest.hilulim
join ...."

The aggregated column is being returned by the thrift api as a string
encoding of an array of unicode strings, which is really weird, i.e., the
value in the cell is the string "[u'Fiana Sara Eber', u'David Karger']".
Why am I getting this python-language syntax exposed in theoretically
language-agnostic thrift? Why isn't it coming back as an array of
strings? Why the unicode encoding? Will it always be a unicode
encoding? Do I need to parse it myself? Unfortunately, since it's a
python encoded string, I can't hand it to JSON.parse().
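
The reported value looks like Python 2's repr() of a list of unicode objects, which suggests (as an assumption, since the server code is not shown here) that the cell is converted to text with str(). A minimal Python sketch of a language-neutral alternative is to serialize list-valued cells as JSON. The function name `encode_cell` is hypothetical.

```python
import json


def encode_cell(value):
    """Encode a result cell as text, using JSON for array-valued cells."""
    if isinstance(value, (list, tuple)):
        # JSON output can be read by JSON.parse() on the client.
        return json.dumps(list(value))
    return str(value)
```

With this, the aggregated names would arrive as `["Fiana Sara Eber", "David Karger"]`, which a javascript client can parse directly.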

Client/Connection protocol is verbose

It takes a large number of lines to create a client. Maybe this could be done with one line and assume a default http client. If the user wanted a TCP client, they could specify.

Console cannot list views

Go to the console, then ls reponame

Base tables will show, but views won't.

This happens because the list_tables method now only lists base tables. Unfortunately, I'm having trouble getting Thrift (0.9.2) to generate up-to-date javascript code which will allow a javascript client.ls_views function.
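
Independently of the Thrift generation problem, the server-side listing could return both kinds of relation in one query. The Python sketch below builds such a query against PostgreSQL's information_schema.tables. The function name `list_relations_query` is hypothetical, the `%s` placeholders assume a psycopg2-style driver, and treating a repo as a schema is an assumption based on the repo.table names used elsewhere in these issues.

```python
def list_relations_query(repo, include_views=True):
    """Return (sql, params) that list base tables, and optionally views, in a repo."""
    kinds = ["BASE TABLE"]
    if include_views:
        kinds.append("VIEW")
    placeholders = ", ".join(["%s"] * len(kinds))
    sql = (
        "SELECT table_name, table_type FROM information_schema.tables "
        "WHERE table_schema = %s AND table_type IN (" + placeholders + ") "
        "ORDER BY table_name"
    )
    return sql, [repo] + kinds
```

Returning table_type alongside the name would let the console mark which entries are views.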
