databrewery / cubes

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

Home Page: http://cubes.databrewery.org

License: Other

Python 99.11% HTML 0.56% Vim Script 0.33%

olap data data-warehouse sql multidimensional-analysis cube data-analysis

cubes's Introduction

Cubes - Online Analytical Processing Framework for Python

Cubes is a light-weight Python framework and set of tools for Online Analytical Processing (OLAP), multidimensional analysis and browsing of aggregated data.

Focus on data analysis, in human way

Overview

The purpose is to provide a framework that gives analysts, or any application end-users, an understandable and natural way of presenting multidimensional data. One of the main features is the logical model, which serves as an abstraction over the physical data to provide an end-user layer.

Features:

OLAP and aggregated browsing (default backend is for relational database - ROLAP)
multidimensional analysis
logical view of analysed data - how analysts look at data, how they think of data, not how the data are physically implemented in the data stores
hierarchical dimensions (attributes that have hierarchical dependencies, such as category-subcategory or country-region)
localizable metadata and data
SQL query generator for multidimensional aggregation queries
OLAP server – HTTP server based on Flask Blueprint, can be easily integrated into your application.

Download

The current recommended version is 1.1.x. It has not been tagged yet, so please use the master branch. This version includes SQL backend support out of the box; other backends (e.g. MongoDB) have been moved to separate projects. This branch (currently master) will soon be tagged as the 1.1 release.

Previous stable version was 1.0.1. This version included all backend types, but no further development will be done on this branch.

Documentation

Latest documentation

Examples

See examples directory in the source code repository for simple examples and use-cases.

See https://github.com/DataBrewery/cubes-examples for more complex examples.

Models

For cubes models see https://github.com/DataBrewery/cubes-models

Development

Source code is in a Git repository on GitHub

git clone git://github.com/DataBrewery/cubes

After you've cloned, you might want to install all of the development dependencies.

pip install -e .[dev]

Build the documentation like so:

cd doc
make help
make html

Outputs will go in doc/_*.

Requirements

Python 2.7, or Python 3.4.1 and later

Most of the requirements are soft (optional) and need to be satisfied only if certain parts of cubes are being used.

SQLAlchemy from http://www.sqlalchemy.org/ version >= 0.7.4 - for SQL backend
Flask from http://flask.pocoo.org/ for Slicer server
Jinja2 from http://jinja.pocoo.org/docs/ for HTML presenters

Support

If you have questions, problems or suggestions, you can send a message to the Google group cubes-discuss.

IRC channel #databrewery on server irc.freenode.net

Report bugs using GitHub issue tracking.

Development

If you are browsing the code and you find something that:

is over-complicated or not obvious
is redundant
can be done in a more Pythonic way

... please let it be known.

Authors

Cubes is written and maintained by Stefan Urbanek (@Stiivi on Twitter) and various contributors. See AUTHORS file for more information.

License

Cubes is licensed under MIT license. For full license see the LICENSE file.

cubes's People

Contributors

Stargazers

Watchers

Forkers

trumant code6 marconilabs rjchacko deytao adnam smoothdeveloper m0nocle sepastian tmu ioggstream maurodoglio akolechkin elmarcoh craigteegarden ovnicraft mrcrabby thieman daemon13 jjmontesl kewlcherry perryhau dariogt lukehan squarespace willu47 digitalsatori direvius troyscott intery89 funkygao kamilchm nd0ut gregjurman bbelchak alberts hartym ebunt alexmiao samof76 galihrivanto matthieuriolo hewen1990 rquevedo adieyal eece-23 jibecompany rgruebel nonsleepr tlevine angventesworks 0xack13 obsh koenvo meyerson r0k3 jinsongbian juaneschutte zjuwangfei eokyere michalskop ahoy-jon obken magicjohnson chiller luismoralesalonso 6si bachatero slefoll tengfei1010 xunyou kjing juracy jromer94 justinleoye dwa stancikcom aswanikarteek salticus ltvolks dtheodor zejn dustinromey micdm clham robin900 kchudy ltaylor-digmap lf8289 ramilexe winggynonly allanvieira jdiazvera noyeitan higward sayiho hwl-bi jell0720 ubreddy cesarmarinhorj

cubes's Issues

Generate cross-tab aggregation result

Add the ability to format the JSON output so that it can easily be used in a cross-tab. Possible JSON output:

{ "page": ..., "row-headers": ..., "column-headers": ..., "rows": ... }

or something like that (need to match to some table-display JS framework).
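A minimal sketch of how such a structure could be produced from flat aggregation records. The function name to_crosstab and the record keys are hypothetical and not part of Cubes; the "page" key is left out because the issue does not define it.

```python
def to_crosstab(records, row_key, column_key, measure):
    """Pivot flat aggregation records into a cross-tab dictionary."""
    row_headers = []
    column_headers = []
    cells = {}
    for record in records:
        row, column = record[row_key], record[column_key]
        if row not in row_headers:
            row_headers.append(row)
        if column not in column_headers:
            column_headers.append(column)
        cells[(row, column)] = record[measure]
    # One list per row header; None marks a combination with no data.
    rows = [[cells.get((r, c)) for c in column_headers] for r in row_headers]
    return {
        "row-headers": row_headers,
        "column-headers": column_headers,
        "rows": rows,
    }


# Hypothetical sample data for illustration only.
records = [
    {"year": 2010, "category": "a", "amount": 10},
    {"year": 2010, "category": "b", "amount": 20},
    {"year": 2011, "category": "a", "amount": 30},
]
table = to_crosstab(records, "year", "category", "amount")
```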

Allow dimension objects in Model.__init__

Model.__init__ dimensions expects a dictionary of dictionaries. Allow values to be Dimension objects as well.

Add optional SQL debugging output

Allow logging of the SQL statements issued to SQLAlchemy. Either use SQLAlchemy logging or just print the generated statement.
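For the logging option, one possible approach uses only the standard logging module: SQLAlchemy reports issued statements on the "sqlalchemy.engine" logger at INFO level. The helper name enable_sql_debug is an assumption for illustration, not an existing Cubes function.

```python
import logging


def enable_sql_debug(enabled=True):
    """Turn logging of statements issued through SQLAlchemy on or off."""
    # basicConfig() is a no-op if the application already configured logging.
    logging.basicConfig()
    logger = logging.getLogger("sqlalchemy.engine")
    logger.setLevel(logging.INFO if enabled else logging.WARNING)
    return logger


sql_logger = enable_sql_debug()
```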

Allow slicer to get model from backend

Current situation: model has to be specified in the slicer configuration

Proposal: allow backend to provide the model.

Silly example: if slicer backend is used (remember "Inception" (2010)?) then the model can be received from the URL. We do not need to provide one.

Inconsistency between SQL denormalizer and browser with flat dimensions

Situation: the physical model contains a column year and a dimension called date. The dimension is currently flat, containing only the attribute year. Recent cube changes allow browsing flat dimensions just by dimension name, which would be date in this case. However, we want to browse it as a non-flat dimension, that is, as date.year.

Slicer config section for backends should be [workspace]

Current state: too much unnecessary freedom and inconsistency in slicer configuration section for backends.

To do:

remove backend_section
read configuration from [workspace]
silently accept [db] and [backend]
issue warning when [db] or [backend] is used

Advantage: [workspace] name consistent with how it is referred in the code (create_workspace() & workspace subclasses)
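The to-do list above could be sketched roughly as follows, using the standard configparser module. The function name workspace_options is hypothetical; the sketch accepts the legacy sections but warns about them.

```python
import configparser
import warnings


def workspace_options(config):
    """Return backend options, preferring [workspace] over legacy sections."""
    if config.has_section("workspace"):
        return dict(config.items("workspace"))
    for legacy in ("db", "backend"):
        if config.has_section(legacy):
            # Still accepted, but the user is told to rename the section.
            warnings.warn(
                "Section [%s] is deprecated, use [workspace]" % legacy,
                DeprecationWarning,
            )
            return dict(config.items(legacy))
    return {}


config = configparser.ConfigParser()
config.read_string("[db]\nurl = sqlite:///data.sqlite\n")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    options = workspace_options(config)
```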

Flatten namespace

Problem: There is no practical reason to make users import different modules; it might even be confusing. Currently the module namespace exists only for taxonomy reasons.

Make it a requirement to import just cubes: the user will get everything except backends. backends should be the only publicly visible sub-module of cubes.

That does not mean that the modules will be removed, just that users will NOT be encouraged to use them. Rules:

no example should show use of cubes sub-modules, except backends
documentation should contain reference for 'un-moduled' objects
the documentation should note that a user who, for any reason, wants only one Cubes module may still import it directly.

Considered a usability change.

Users should not have to ask "What package is this class/function in?"

Add documentation about aggregations

Add text from blog post about aggregations into official documentation. Describe differences and slice description ("why?" and "how?")

Backend should provide keywords as list of features

Each backend or rather browser should provide metadata about itself: list of capabilities as keywords, such as:

remainder - if the backend provides remainder information on limit
ordering - if backend can order results or not
...

Slicer should be able to load modules

Problem: Slicer can use only backends that are provided by the Cubes framework

Proposed functionality: allow list of modules to be specified in the slicer configuration file. Slicer will load them. Modules might contain custom backends and the backends will be accessible by the slicer.

Options:

[modules]
module1=
module2=

or:

[server]
load=module1,module2
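A rough sketch of the second option, assuming a comma-separated load key in the [server] section. The function name load_modules is hypothetical, and standard library modules stand in for custom backend modules.

```python
import configparser
import importlib


def load_modules(config):
    """Import every module listed in the [server] load option."""
    if not config.has_option("server", "load"):
        return []
    raw = config.get("server", "load")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return [importlib.import_module(name) for name in names]


config = configparser.ConfigParser()
# json and csv are placeholders for modules that would contain backends.
config.read_string("[server]\nload = json, csv\n")
modules = load_modules(config)
```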

Implement normalized star browser

Implement a browser for normalized star schema: fact with keys to dimension tables(views).

Allow more drilldowns in HTTP API in one drilldown parameter

Allow &drilldown=date|donor ... in HTTP API:

To specify dimension only:

drilldown=dim|dim|dim
drilldown=date|donor

or to specify level:

drilldown=date:year|donor
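Parsing such a parameter could look like the sketch below. The function name parse_drilldown and the returned tuple shape are assumptions for illustration.

```python
def parse_drilldown(value):
    """Parse 'date:year|donor' into (dimension, level) pairs.

    The level is None when only the dimension is given.
    """
    result = []
    for item in value.split("|"):
        if not item:
            continue
        dimension, _, level = item.partition(":")
        result.append((dimension, level or None))
    return result


drilldown = parse_drilldown("date:year|donor")
```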

Add slicer model checker

Add checking of model against physical schema.

Slicer interface:

slicer test config.ini

Checks:

model validation
check each measure and attribute, whether it exists in the schema

Requirements: backends should implement a validation function, preferably in the workspace.
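The second check could be sketched as follows. The data shapes are assumptions: mappings is taken to be a dictionary from logical attribute to "table.column", and schema a dictionary from table name to a set of column names. The function name missing_columns is hypothetical.

```python
def missing_columns(mappings, schema):
    """Return (attribute, physical) pairs whose column is not in the schema."""
    missing = []
    for attribute, physical in sorted(mappings.items()):
        table, _, column = physical.rpartition(".")
        if column not in schema.get(table, ()):
            missing.append((attribute, physical))
    return missing


# Hypothetical model mappings and physical schema for illustration.
mappings = {
    "amount": "ft_sales.amount",
    "date.year": "dm_date.year",
    "date.month_name": "dm_date.month_name",
}
schema = {"dm_date": {"year", "month"}, "ft_sales": {"amount"}}
problems = missing_columns(mappings, schema)
```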

Change designated way of using Cubes in docs

Add create_workspace() to documentation examples and state that it is the designated way of using cubes.
prefer loading the model from a file over programmatic model creation
state that programmatic model creation is for advanced users and for creating generators

Create JavaScript front-end library for Slicer

Create a JavaScript framework for the Slicer Server.

Functionality

set slicer URL
load model (with locale)
model queries
perform all aggregation functions

Add unit testing for slicer server

Create unit tests for Slicer server HTTP requests, at least: aggregate, facts, dimension values and model.

Fix invalid cross-references in documentation

Documentation (both in /doc and in *.py sources) has to be checked for validity of cross-references: classes, functions, packages, ... (for example api/backends). Some are invalid, which results in missing links in the documentation and might be frustrating for developers.

Change from dict to list in model definition file

Do not store dimensions, cubes and any other list of named objects as dictionary, use list instead, to preserve ordering.

Question: Should we allow both or not?
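If both forms were allowed, a loader could normalize them into one ordered list. This is only a sketch of that option; the function name named_objects is hypothetical.

```python
def named_objects(definition):
    """Normalize a dict of dicts or a list of dicts into a list of dicts."""
    if isinstance(definition, dict):
        # The key becomes the object's name; order follows the dictionary.
        return [dict(body, name=name) for name, body in definition.items()]
    return [dict(item) for item in definition]


from_dict = named_objects({"date": {"levels": ["year", "month"]}})
from_list = named_objects([{"name": "date", "levels": ["year", "month"]}])
```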

Restructure documentation - separate reference

Documentation is now mixed in one heap: for example, explanation of principles mixed with class references is confusing for newbies and information-overloading for those who just want to write code. It needs a little bit of restructuring: split it into two or three parts, such as Tutorial/Explanatory and Reference/API.

SQL: Allow tables to be in different schemas

Current state: all tables of the star/snowflake schema have to be in one database schema (oracle, postgres, ...). This is considered a bug.

Solution: recognise schemas in mappings, either by dot '.' separation or by using "schemaname.tablename" => {schema: "schemaname", table: "tablename"}

This should go to the AttributeMapper

(reported/suggested by gauthier @ IRC)
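The dot-separated variant could be resolved with a small helper like the one below. The name split_table_reference and the default_schema parameter are assumptions, not part of the AttributeMapper.

```python
def split_table_reference(reference, default_schema=None):
    """Split 'schemaname.tablename' into its schema and table parts."""
    schema, _, table = reference.rpartition(".")
    return {"schema": schema or default_schema, "table": table}


qualified = split_table_reference("datamart.dm_date")
unqualified = split_table_reference("dm_date")
```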

Model.cubes should be a list not a dict

Model.cubes is a dictionary, it should be a list. To get a cube by name one should use: Model.cube(name)
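A sketch of the proposed interface, with cubes reduced to plain dictionaries for brevity. ModelSketch is a stand-in class, not the actual cubes Model.

```python
class ModelSketch(object):
    """Keeps cubes in an ordered list and looks them up by name."""

    def __init__(self, cubes=None):
        self.cubes = list(cubes or [])

    def cube(self, name):
        for cube in self.cubes:
            if cube["name"] == name:
                return cube
        raise KeyError("Unknown cube '%s'" % name)


model = ModelSketch([{"name": "sales"}, {"name": "contracts"}])
contracts = model.cube("contracts")
```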

Generate SQL schema based on model

Create a SQL schema from logical model. Options:

generate star schema with same naming as model attributes, one table per dimension
use the mappings and reverse-generate the physical schema

Suggestion by gauthier (#databrewery IRC)

Verify if slicer works with virtualenv

Cubes Slicer server was never tested with wsgi+virtualenv, try it. If it does not work, fix it to make it work.

Problem with denormalizing localizable model

When trying to denormalize localized model:

Traceback (most recent call last):
  File "denormalize.py", line 53, in <module>
    tool.create_view()
  File "denormalize.py", line 50, in create_view
    builder.create_view(VIEW_NAME, schema = DATAMART_SCHEMA, index = True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 120, in create_view
    self._create_view_expression()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 101, in _create_view_expression
    self._collect_columns()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 245, in _collect_columns
    self.columns.append(self._select_column(attribute, locale))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 294, in _select_column
    % (localized_alias, table_name, field_name) )
cubes.model.ModelError: Mapped column 'date.month_name.en.en' does not exist (as dm_date.month_name.en_en)

NameError: global name 'sys' is not defined (Flask example)

Running flask example:

sebastian@sebastian-laptop:~/src/cubes/examples/sandbox/flask_dimension_browser$ python application.py
Traceback (most recent call last):
  File "application.py", line 107, in <module>
    initialize_model()
  File "application.py", line 100, in initialize_model
    view_prefix="vft_")
  File "/usr/local/lib/python2.7/dist-packages/cubes-0.8.1-py2.7.egg/cubes/util.py", line 346, in create_workspace
    backend = get_backend(backend_name)
  File "/usr/local/lib/python2.7/dist-packages/cubes-0.8.1-py2.7.egg/cubes/util.py", line 311, in get_backend
    backend = sys.modules.get("cubes.backends."+backend_name)
NameError: global name 'sys' is not defined
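The traceback suggests that the module containing get_backend uses sys without importing it. A minimal sketch of the likely fix, reproducing only the failing line from the traceback (the rest of the real function is not shown in the report and is not guessed here):

```python
import sys  # the import that the traceback indicates is missing


def get_backend(backend_name):
    # Same lookup as the failing line; returns None if the backend
    # module has not been imported yet.
    return sys.modules.get("cubes.backends." + backend_name)


backend = get_backend("not_loaded_backend")
```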

Refactor collection of dimensions in Model.add_cube()

Model.add_cube() requires refactoring and sanity check. The collection of dimensions seems to be too implicit.

No more details, this is just a reminder.

Refactor localization in Slicer

Rewrite localization in Slicer to use localization dictionary (already in app), request model by locale from App for each request (do not cache model).

References: server.slicer.Slicer.__init__(), server.controllers.application_controller.ApplicationController._localize_model (and __init__())

Library dependencies in README file

The README file needs a list of dependencies.

Reflect removal of Cell aggregations in documentation

Cell is now a "model" object and has no actions such as aggregations. Reflect this in documentation.

Python version required in README file

It is necessary to add the Python version required for version 0.6.

Create SQL helpers

Create SQL helpers:

load CSV into DB (generic way)
generic denormalizer
schema-to-model - generate logical model automatically from schema

Reason: newbies in particular might not have the proper tools or the knowledge to do that; they just want to get it done.

Rules for helpers:

do NOT try to cover all possible cases, just the most common ones
attach "appropriateness" notice: when it is OK to use helper and when the user should use some more "professional" way of doing it

Consider using brewery.

Idea for this issue comes from #30.

This issue affects project adoption, which should be considered kind of important.

Slicer is not recognized as an internal or external command

Hi Stefan,
I tried to run the hello_world example on Win7, but got stuck at running 'slicer serve slicer.ini' with the above error message.
I found a "Slicer" file (without an exe or py extension) in C:\Python27\Scripts which contains the following code:

#!C:\Python27\python.exe
# EASY-INSTALL-SCRIPT: 'cubes==0.8.1','slicer'
__requires__ = 'cubes==0.8.1'
import pkg_resources
pkg_resources.run_script('cubes==0.8.1', 'slicer')

I tried both 'pip install cubes' and 'python setup.py install'. Am I missing something here?

Backend catalogue

Current state: backends have to be modules

Allow a backend to be anything with a specified interface and provide a way to:

register the backend
get the backend by name

Affected code: create_workspace()
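One way such a catalogue could work is a simple name-to-object registry. The names register_backend and backend_by_name, and the DummyBackend class, are hypothetical.

```python
_backends = {}


def register_backend(name, backend):
    """Make any object with the backend interface available by name."""
    _backends[name] = backend


def backend_by_name(name):
    try:
        return _backends[name]
    except KeyError:
        raise ValueError("Unknown backend '%s'" % name)


class DummyBackend(object):
    """Placeholder showing that a backend no longer has to be a module."""

    def create_workspace(self, model, **config):
        return (model, config)


register_backend("dummy", DummyBackend())
found = backend_by_name("dummy")
```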

Backend-related cube attributes should be grouped

Backend related attributes of cube, such as mappings, fact, dimension_prefix ... should be put into some dictionary which should be named Cube.backend or Cube.physical.

create_workspace() should take kwargs instead of dict

Current state: backends.*.create_workspace(model, config) takes a dictionary with configuration. This is fine for "non-user" calls of this function, such as from the server. There might be situations where a workspace has to be created programmatically, and then it is more convenient to pass arguments as to any other function.

Solution: The signature should be backends.*.create_workspace(model, **config)
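The sketch below illustrates why the keyword form serves both callers: a user passes plain keyword arguments, while server code can still unpack an existing dictionary. The function body is a placeholder, not the real implementation.

```python
def create_workspace(model, **config):
    # Placeholder body: a real backend would build a workspace object here.
    return {"model": model, "options": config}


# Programmatic use with ordinary keyword arguments.
direct = create_workspace("model", url="sqlite:///data.sqlite")

# Server-style use: the configuration dictionary is unpacked.
server_config = {"url": "sqlite:///data.sqlite"}
unpacked = create_workspace("model", **server_config)
```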

All attributes should be of Attribute not strings

Current status: Inconsistent. Currently there is a mix of cubes.model.Attribute instances and strings, which causes problems in places where the full attribute is needed. It also causes confusion about how to construct a full attribute reference.

It should be consistent all over the model and its parts. cubes.model.Attribute instance is needed for getting full name and some localisation purposes.

Create denormalizer based on StarBrowser

Create a new denormalizer based on the new StarBrowser.

This replaces Issue #27.

Presenters and formatters (visualisation idea)

Add presenters/formatters module which will be able to format data for various visualisation libraries.

Also needed:

tables (any javascript table libs suggestions?)

Description

Presenter: presents whole result
Formatter: formats one value (like a table cell)

API suggestion:

get_presenter(name) - create a presenter instance, e.g. get_presenter("highcharts")
Presenter.generate(object) - generate presentation of an object

Notes:

HTTP query would include a presenter key
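A minimal sketch of the suggested API. Only get_presenter(name) and generate(object) come from the suggestion above; the "text_table" presenter and the registry dictionary are made up for illustration.

```python
class TextTablePresenter(object):
    """Presents a whole result as tab-separated text."""

    def generate(self, rows):
        return "\n".join("\t".join(str(v) for v in row) for row in rows)


# Hypothetical catalogue of presenter classes keyed by name.
_presenters = {"text_table": TextTablePresenter}


def get_presenter(name):
    """Create a presenter instance by name."""
    return _presenters[name]()


output = get_presenter("text_table").generate([["year", "amount"], [2010, 10]])
```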

Update documentation about aggregations

doc/aggregation.rst needs update - it is invalid(!). Use examples from blog posts and from tutorials. Currently it contains very obsolete information.

Check all examples to reflect "preferred way"

Check whether all examples are using cubes in a way that is the preferred (that is: using create_workspace() and other new additions).

Make sure that the new examples work with new star browser.

Restructure test suite

Restructure the test suite so that:

each module has a corresponding test file
(at least) each public function has at least one test case

create_table_from_csv fails with MySql

I tried to modify the hello_world example to use MySql, but get the following error, suggesting that create_table_from_csv doesn't use valid ddl for MySql?

$ python prepare_data.py 
loading data...
Traceback (most recent call last):
  File "prepare_data.py", line 31, in <module>
    create_id=True    
  File "/Library/Python/2.7/site-packages/cubes-0.8.0-py2.7.egg/cubes/tutorial/sql.py", line 37, in create_table_from_csv
    table.create()
  File "/Library/Python/2.7/site-packages/sqlalchemy/schema.py", line 583, in create
    checkfirst=checkfirst)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 2234, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1904, in _run_visitor
    **kwargs).traverse_single(element)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/visitors.py", line 86, in traverse_single
    return meth(obj, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/ddl.py", line 86, in visit_table
    self.connection.execute(schema.CreateTable(table))
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1405, in execute
    params)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1490, in _execute_ddl
    compiled = ddl.compile(dialect=dialect)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/expression.py", line 1724, in compile
    return self._compiler(dialect, bind=bind, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/schema.py", line 2872, in _compiler
    return dialect.ddl_compiler(dialect, self, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 699, in __init__
    self.string = self.process(self.statement)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 718, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/visitors.py", line 59, in _compiler_dispatch
    return getter(visitor)(self, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/compiler.py", line 1389, in visit_create_table
    not first_pk
  File "/Library/Python/2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1369, in get_column_specification
    self.dialect.type_compiler.process(column.type)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 764, in process
    return type_._compiler_dispatch(self)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/visitors.py", line 59, in _compiler_dispatch
    return getter(visitor)(self, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/compiler.py", line 1727, in visit_string
    return self.visit_VARCHAR(type_)
  File "/Library/Python/2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1656, in visit_VARCHAR
    self.dialect.name)
sqlalchemy.exc.CompileError: (in table 'ft_irbd_balance', column 'category'): VARCHAR requires a length on dialect mysql

Move model creation defaults into a utility function

"Explicit is better than implicit."

Current state: During creation of model objects, such as dimensions or cubes, some defaults are applied. For example in the dimension:

If no levels are specified during initialization, then dimension name is considered flat, with single attribute.
If no hierarchy is specified and levels are specified, then default hierarchy will be created from order of levels
If no levels are specified, then one level is created, with name default and dimension will be considered flat

This should not be enforced, as some users might not expect that. Also it might come into a conflict later. This should be moved to some utility function/method:

model.apply_defaults()

Model objects should be kept in the state in which the user created them, whether the model is valid or not. That's what model validation is there for. After using model.apply_defaults() the model should be valid, unless there are really serious issues.
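A sketch of what such a utility could do for a single dimension, with the dimension reduced to a plain dictionary. The dictionary keys are assumptions; the defaults follow the rules listed above, and the input is left untouched.

```python
import copy


def apply_defaults(dimension):
    """Return a copy of a dimension description with defaults made explicit."""
    result = copy.deepcopy(dimension)
    if not result.get("levels"):
        # Flat dimension: one level named "default" with a single attribute.
        result["levels"] = [
            {"name": "default", "attributes": [result["name"]]}
        ]
    if not result.get("hierarchy"):
        # Default hierarchy follows the order of levels.
        result["hierarchy"] = [level["name"] for level in result["levels"]]
    return result


original = {"name": "year"}
explicit = apply_defaults(original)
```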

Add server output format for charting JS libs

Add the ability for the Slicer server to generate aggregation results formatted for a JS charting library, so that they can be used directly. Options: Highcharts, Google Charts, and many more.

Note: This feature should belong to cubes.js, but it might be a good idea to have it here for "bootstrapping" the idea.

Add possibility to choose browsing backend

Currently the Slicer server expects only the default SQL backend to be used. Create a way to:

specify backend by some identifier
provide configuration to the backend

Reuse idea from cubes.backends.sql.SQLWorkspace:

Workspace should take model as an argument, rest of args are optional
Workspace should provide method browser_for_cube(cube)

Use mapper from star browser in denormalized browser

Rewrite parts of sql denormalized browser and denormalizer to use new Mapper from star browser. This will fix inconsistency and possible bugs in #14

Cell cutting refactoring and improvement

There should be better cell cutting. Currently there are three ways for creating a cell:

initialization from list of cuts
slicing through one dimension point with Cell.slice(dimension, path)
slicing with multiple cuts with Cell.multi_slice(cuts)

Issues:

The two slicing methods have similar names but different approaches
If one wants to add a single cut to the cell, the only way is to use Cell.multi_slice([cut]), which is neither nice nor intuitive

Proposal:

keep initialization with cuts
add method for adding a single cut: slice(cut)
add methods for slicing: slice_point(), slice_set(), slice_range() (?)

Still unsure about:

should Cell be mutable or immutable?
method naming/verbs: cut vs. slice. "Slice & Dice" is common in OLAP terminology; however, a slice is rather a part of the cube, whereas a "cut" is a kind of point/line/boundary definition of how the cell slice should be created from the cube.
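A sketch of the proposed slice(cut) method under the immutable option, which the issue leaves open. CellSketch is a stand-in class, and a cut is simplified to a (dimension, path) tuple.

```python
class CellSketch(object):
    """Immutable cell: slicing returns a new cell instead of modifying."""

    def __init__(self, cuts=None):
        self.cuts = tuple(cuts or ())

    def slice(self, cut):
        """Return a new cell with the cut added.

        An existing cut on the same dimension is replaced.
        """
        kept = [c for c in self.cuts if c[0] != cut[0]]
        return CellSketch(kept + [cut])


whole = CellSketch()
by_year = whole.slice(("date", (2010,)))
by_other_year = by_year.slice(("date", (2011,)))
```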

Error running hello_world example

I installed version 0.8; when I run slicer to test the hello_world example, I get this error.

ovnicraft-macbook:hello_world ovnicraft$ slice
sliceprint slicer
ovnicraft-macbook:hello_world ovnicraft$ slicer serve slicer.ini
Traceback (most recent call last):
  File "/usr/local/bin/slicer", line 12, in <module>
    import argparse
ImportError: No module named argparse

Deprecate Dimension.default_hierarchy

Current status: Dimension.default_hierarchy is a computed property. There is a lot of code that checks whether a hierarchy argument is specified or not and then decides what hierarchy to use.

Solution: Dimension.hierarchy() now accepts None as a valid argument (set as default) and returns default_hierarchy in that case.

TO-DO: go through all the code and remove default_hierarchy.
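The described behaviour can be sketched as follows. DimensionSketch and its constructor arguments are stand-ins for illustration, not the actual cubes Dimension class.

```python
class DimensionSketch(object):
    def __init__(self, hierarchies, default_name):
        self.hierarchies = hierarchies
        self.default_name = default_name

    def hierarchy(self, name=None):
        """Return the named hierarchy, or the default one for None."""
        if name is None:
            name = self.default_name
        return self.hierarchies[name]


dimension = DimensionSketch(
    {"ymd": ["year", "month", "day"], "yq": ["year", "quarter"]},
    default_name="ymd",
)
# Callers no longer need to check for a missing hierarchy argument.
default = dimension.hierarchy()
```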

SQL helper for date data type

Create SQL helper methods or integrate in SQL browsers to break down date data types into date dimension hierarchy levels.
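On the Python side, such a breakdown could look like the sketch below. The function name date_levels and the chosen level names are assumptions; an SQL-side helper would do the equivalent with date functions of the database.

```python
import datetime


def date_levels(value):
    """Break a date down into typical date dimension hierarchy levels."""
    return {
        "year": value.year,
        "quarter": (value.month - 1) // 3 + 1,
        "month": value.month,
        "day": value.day,
    }


levels = date_levels(datetime.date(2012, 5, 17))
```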
