databrewery / cubes

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

Home Page: http://cubes.databrewery.org

License: Other

Python 99.11% HTML 0.56% Vim Script 0.33%
olap data data-warehouse sql multidimensional-analysis cube data-analysis

cubes's Introduction

Cubes - Online Analytical Processing Framework for Python

Join the chat at https://gitter.im/DataBrewery/cubes

Cubes is a light-weight Python framework and set of tools for Online Analytical Processing (OLAP), multidimensional analysis and browsing of aggregated data.

Focus on data analysis, in human way

Overview

The purpose is to provide a framework that gives analysts, or any application end-user, an understandable and natural way of presenting multidimensional data. One of the main features is the logical model, which serves as an abstraction over the physical data to provide an end-user layer.

Features:

  • OLAP and aggregated browsing (the default backend is for relational databases - ROLAP)
  • multidimensional analysis
  • logical view of analysed data - how analysts look at data, how they think of data, not how the data are physically implemented in the data stores
  • hierarchical dimensions (attributes that have hierarchical dependencies, such as category-subcategory or country-region)
  • localizable metadata and data
  • SQL query generator for multidimensional aggregation queries
  • OLAP server - HTTP server based on Flask Blueprint that can be easily integrated into your application.

Download

The currently recommended version is 1.1.x. It has not been tagged yet, so please use the master branch. This version includes SQL backend support out of the box; other backends (e.g. MongoDB) have been moved to separate projects. This branch (currently master) will soon be tagged as the 1.1 release.

The previous stable version was 1.0.1. It included all backend types, but no further development will be done on that branch.

Documentation

Latest documentation

Examples

See examples directory in the source code repository for simple examples and use-cases.

See https://github.com/DataBrewery/cubes-examples for more complex examples.

Models

For cubes models see https://github.com/DataBrewery/cubes-models

Development

Source code is in a Git repository on GitHub:

git clone https://github.com/DataBrewery/cubes.git

After you've cloned, you might want to install all of the development dependencies.

pip install -e .[dev]

Build the documentation like so:

cd doc
make help
make html

Outputs will go in doc/_*.

Requirements

Python 2 (>= 2.7) or Python 3 (>= 3.4.1)

Most of the requirements are soft (optional) and need to be satisfied only if certain parts of cubes are being used.

Support

If you have questions, problems or suggestions, you can send a message to the Google group cubes-discuss.

IRC channel #databrewery on server irc.freenode.net

Report bugs using github issue tracking.

Development

If you are browsing the code and you find something that:

  • is over-complicated or not obvious
  • is redundant
  • can be done in a more Pythonic way

... please let it be known.

Authors

Cubes is written and maintained by Stefan Urbanek (@Stiivi on Twitter) and various contributors. See the AUTHORS file for more information.

License

Cubes is licensed under MIT license. For full license see the LICENSE file.

cubes's People

Contributors

alberts, charlesfleche, christian-proust, devvmh, diwu1989, gitter-badger, jjmontesl, khaledto, longhotsummer, ltvolks, magicjohnson, marcinn, martinbjeldbak, meyerson, micdm, michalskop, mintbridge, nonsleepr, pktippa, pudo, puhrez, rafaelchefe, rberlew, rgruebel, robin900, sepastian, stiivi, thieman, tlevine, trumant

cubes's Issues

Generate cross-tab aggregation result

Add the ability to format the output JSON so that it can be easily used in a cross-tab. Possible JSON output:

{ "page": ..., "row-headers": ..., "column-headers": ..., "rows": ... }

or something like that (need to match to some table-display JS framework).
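
A minimal sketch of how such a payload could be assembled from flat aggregation records. The function name crosstab, the record layout and the field names are hypothetical and not part of Cubes; the "page" key is left out because the issue does not define it.

```python
def crosstab(records, row_key, column_key, measure):
    """Pivot flat records into row headers, column headers and a value grid."""
    row_headers = []
    column_headers = []
    values = {}
    for record in records:
        row, column = record[row_key], record[column_key]
        if row not in row_headers:
            row_headers.append(row)
        if column not in column_headers:
            column_headers.append(column)
        values[(row, column)] = record[measure]
    # Missing row/column combinations become None (null in JSON).
    rows = [[values.get((row, column)) for column in column_headers]
            for row in row_headers]
    return {"row-headers": row_headers,
            "column-headers": column_headers,
            "rows": rows}


# Hypothetical aggregation result: one record per (year, category) pair.
records = [
    {"year": 2010, "category": "a", "amount": 10},
    {"year": 2010, "category": "b", "amount": 20},
    {"year": 2011, "category": "a", "amount": 30},
]
table = crosstab(records, "year", "category", "amount")
```

The resulting dictionary can be passed to json.dumps() as is.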

Allow slicer to get model from backend

Current situation: model has to be specified in the slicer configuration

Proposal: allow backend to provide the model.

Silly example: if slicer backend is used (remember "Inception" (2010)?) then the model can be received from the URL. We do not need to provide one.

Inconsistency between SQL denormalizer and browser with flat dimensions

Situation: the physical model contains a column year and a dimension called date. The dimension is currently flat, containing only the attribute year. Recent cube changes allow browsing flat dimensions just by dimension name, which would be date in this case. However, we want to browse it as a non-flat dimension, that is, as date.year.

Slicer config section for backends should be [workspace]

Current state: too much unnecessary freedom and inconsistency in slicer configuration section for backends.

To do:

  • remove backend_section
  • read configuration from [workspace]
  • silently accept [db] and [backend]
  • issue warning when [db] or [backend] is used

Advantage: [workspace] name consistent with how it is referred in the code (create_workspace() & workspace subclasses)
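
A rough sketch of the to-do list above, using only the standard library configparser. The helper name workspace_options and the sample url option are assumptions made for illustration; this is not the Slicer implementation.

```python
import configparser
import warnings

# Old section names that should still be accepted, with a warning.
LEGACY_SECTIONS = ("db", "backend")


def workspace_options(config):
    """Return backend options from [workspace], falling back to legacy sections."""
    if config.has_section("workspace"):
        return dict(config.items("workspace"))
    for section in LEGACY_SECTIONS:
        if config.has_section(section):
            warnings.warn("[%s] is deprecated, use [workspace]" % section,
                          DeprecationWarning)
            return dict(config.items(section))
    return {}


# Example: an old-style configuration that still uses [db].
config = configparser.ConfigParser()
config.read_string("[db]\nurl = sqlite:///data.sqlite\n")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    options = workspace_options(config)
```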

Flatten namespace

Problem: There is no practical reason to make the user import different modules; it might even be confusing. Currently the module namespace is there just for taxonomy reasons.

Make it a requirement to import just cubes: the user will get everything except backends. backends should be the only publicly visible sub-module of cubes.

That does not mean that the modules will be removed, just users will NOT be encouraged to use them. Rules:

  • no example should show use of cubes sub-modules, except backends
  • documentation should contain reference for 'un-moduled' objects
  • the documentation should note that a user who has a reason to import only one Cubes module may still do so.

Considered as usability change.

Users should not have to ask "What package this class/function is in?"

Backend should provide keywords as list of features

Each backend or rather browser should provide metadata about itself: list of capabilities as keywords, such as:

  • remainder - if the backend provides remainder information on limit
  • ordering - if backend can order results or not
    ...
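
One possible shape for this, sketched with hypothetical class names (Browser, ExampleSQLBrowser and SimpleBrowser are illustrations, not existing Cubes classes): each browser class carries a set of feature keywords that callers can query.

```python
class Browser:
    """Base class: a browser advertises its capabilities as keywords."""
    features = frozenset()

    def supports(self, feature):
        """Return True if this browser declares the given feature keyword."""
        return feature in self.features


class ExampleSQLBrowser(Browser):
    # Can report the remainder on limit and can order results.
    features = frozenset(["remainder", "ordering"])


class SimpleBrowser(Browser):
    # Can order results, but provides no remainder information.
    features = frozenset(["ordering"])
```

A server could then check browser.supports("remainder") before promising remainder information to a client.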

Slicer should be able to load modules

Problem: Slicer can use only backends that are provided by the Cubes framework

Proposed functionality: allow list of modules to be specified in the slicer configuration file. Slicer will load them. Modules might contain custom backends and the backends will be accessible by the slicer.

Options:

[modules]
module1=
module2=

or:

[server]
load=module1,module2
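
A minimal sketch of the second option, assuming a comma-separated load key in [server]. The function name load_modules is hypothetical, and the standard library modules json and csv stand in for custom backend modules.

```python
import configparser
import importlib


def load_modules(config):
    """Import every module named in the comma-separated [server] load option."""
    names = config.get("server", "load", fallback="")
    return [importlib.import_module(name.strip())
            for name in names.split(",") if name.strip()]


# Example configuration; json and csv are placeholders for backend modules.
config = configparser.ConfigParser()
config.read_string("[server]\nload = json, csv\n")
modules = load_modules(config)
```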

Add slicer model checker

Add checking of model against physical schema.

Slicer interface:

slicer test config.ini

Checks:

  • model validation
  • check each measure and attribute, whether it exists in the schema

Requirements: backends should implement a validation function, preferably in the workspace.

Change designated way of using Cubes in docs

  • Add create_workspace() to documentation examples and state that it is the designated way of using cubes.
  • Prefer loading the model from a file over programmatic model creation
  • State that programmatic model creation is for advanced users and for creating generators

Fix invalid cross-references in documentation

Documentation (both in /doc and in *.py sources) has to be checked for validity of cross-references: classes, functions, packages, ... (for example api/backends). Some references are invalid, which results in missing links in the documentation and might be frustrating for developers.

Restructure documentation - separate reference

Documentation is now mixed in one heap: for example, the explanation of principles interleaved with class references is confusing for newbies and information-overloading for those who just want to write code. It needs a little bit of restructuring: split it into two or three parts, such as Tutorial/Explanatory and Reference/API.

SQL: Allow tables to be in different schemas

Current state: all tables of the star/snowflake schema have to be in one database schema (oracle, postgres, ...). This is considered a bug.

Solution: recognise schemas in mappings, either by dot '.' separation or use "schemaname.tablename" => {schema:"schemaname", table:"tablename"}

This should go to the AttributeMapper

(reported/suggested by gauthier @ IRC)
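
A small sketch of the dot-separated variant. The function name split_table_reference and the default_schema parameter are hypothetical; the real change would belong in the AttributeMapper.

```python
def split_table_reference(reference, default_schema=None):
    """Split 'schemaname.tablename' into a schema/table dictionary."""
    schema, separator, table = reference.rpartition(".")
    if not separator:
        # No dot in the reference: the table lives in the default schema.
        return {"schema": default_schema, "table": reference}
    return {"schema": schema, "table": table}


qualified = split_table_reference("datamart.dm_date")
unqualified = split_table_reference("dm_date", default_schema="public")
```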

Generate SQL schema based on model

Create a SQL schema from logical model. Options:

  • generate star schema with same naming as model attributes, one table per dimension
  • use the mappings and reverse-generate the physical schema

Suggestion by gauthier (#databrewery IRC)

Problem with denormalizing localizable model

When trying to denormalize localized model:

Traceback (most recent call last):
  File "denormalize.py", line 53, in <module>
    tool.create_view()
  File "denormalize.py", line 50, in create_view
    builder.create_view(VIEW_NAME, schema = DATAMART_SCHEMA, index = True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 120, in create_view
    self._create_view_expression()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 101, in _create_view_expression
    self._collect_columns()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 245, in _collect_columns
    self.columns.append(self._select_column(attribute, locale))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cubes-0.7.0-py2.7.egg/cubes/backends/sql/builder.py", line 294, in _select_column
    % (localized_alias, table_name, field_name) )
cubes.model.ModelError: Mapped column 'date.month_name.en.en' does not exist (as dm_date.month_name.en_en)

NameError: global name 'sys' is not defined (Flask example)

Running flask example:

sebastian@sebastian-laptop:~/src/cubes/examples/sandbox/flask_dimension_browser$ python application.py
Traceback (most recent call last):
  File "application.py", line 107, in <module>
    initialize_model()
  File "application.py", line 100, in initialize_model
    view_prefix="vft_")
  File "/usr/local/lib/python2.7/dist-packages/cubes-0.8.1-py2.7.egg/cubes/util.py", line 346, in create_workspace
    backend = get_backend(backend_name)
  File "/usr/local/lib/python2.7/dist-packages/cubes-0.8.1-py2.7.egg/cubes/util.py", line 311, in get_backend
    backend = sys.modules.get("cubes.backends."+backend_name)
NameError: global name 'sys' is not defined
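
The traceback suggests that cubes/util.py uses sys without importing it. A minimal illustration of the lookup with the import in place; the function name loaded_backend_module is hypothetical, and this is not the actual Cubes source.

```python
import sys  # the import whose absence triggers the NameError above


def loaded_backend_module(backend_name):
    """Return the already imported backend module, or None if it is not loaded."""
    return sys.modules.get("cubes.backends." + backend_name)


missing = loaded_backend_module("no_such_backend")
```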

Refactor localization in Slicer

Rewrite localization in Slicer to use localization dictionary (already in app), request model by locale from App for each request (do not cache model).

References: server.slicer.Slicer.__init__(), server.controllers.application_controller.ApplicationController._localize_model (and __init__())

Create SQL helpers

Create SQL helpers:

  • load CSV into DB (generic way)
  • generic denormalizer
  • schema-to-model - generate the logical model automatically from a schema

Reason: newbies in particular might not have the proper tools or the knowledge to do this themselves; they just want to get it done.

Rules for helpers:

  • do NOT try to cover all possible cases, just the most common ones
  • attach "appropriateness" notice: when it is OK to use helper and when the user should use some more "professional" way of doing it

Consider using brewery.

Idea for this issue comes from #30.

This issue affects project adoption, which should be considered kind of important.

Slicer is not recognized as an internal or external command

Hi Stefan,
I tried to run the hello_world example on Win7, but got stuck at running 'slicer serve slicer.ini' with the above error message.
I found a "Slicer" file (without an exe or py extension) in C:\Python27\Scripts which contains the following code:

#!C:\Python27\python.exe
# EASY-INSTALL-SCRIPT: 'cubes==0.8.1','slicer'
__requires__ = 'cubes==0.8.1'
import pkg_resources
pkg_resources.run_script('cubes==0.8.1', 'slicer')

I tried both 'pip install cubes' and 'python setup.py install'. Am I missing something here?

Backend catalogue

Current state: backends have to be modules

Allow a backend to be anything with the specified interface and provide a way to:

  1. register the backend
  2. get the backend by name

Affected code: create_workspace()
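
A bare-bones sketch of such a catalogue. The names register_backend and lookup_backend, and the ExampleBackend class, are hypothetical; any object with the required interface could be registered.

```python
# Name -> backend object; a backend no longer has to be a module.
_backends = {}


def register_backend(name, backend):
    """Make a backend available under the given name."""
    _backends[name] = backend


def lookup_backend(name):
    """Return the backend registered under name, or raise LookupError."""
    try:
        return _backends[name]
    except KeyError:
        raise LookupError("Unknown backend: %s" % name)


class ExampleBackend:
    """Placeholder for anything that implements the backend interface."""


register_backend("example", ExampleBackend)
backend = lookup_backend("example")
```

create_workspace() could then resolve the backend through the lookup function instead of inspecting modules.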

create_workspace() should take kwargs instead of dict

Current state: backends.*.create_workspace(model, config) takes a dictionary with the configuration. This is fine for "non-user" calls of this function, such as from the server. There might be situations where it is desirable to create a workspace programmatically, and then it is more convenient to pass arguments as to any other function.

Solution: The signature should be backends.*.create_workspace(model, **config)
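
To illustrate the difference, here is a stand-in function with the proposed signature. It only echoes its arguments; the option names url and schema are examples, not a statement about what any backend accepts.

```python
def create_workspace(model, **config):
    """Stand-in for a backend's create_workspace(); returns what it received."""
    return {"model": model, "options": config}


# Programmatic call with plain keyword arguments.
workspace = create_workspace("model.json",
                             url="sqlite:///data.sqlite",
                             schema="datamart")

# Server-style call: an existing configuration dictionary is unpacked.
server_config = {"url": "sqlite:///data.sqlite"}
workspace_from_dict = create_workspace("model.json", **server_config)
```

Both call styles work with the same signature, so the server does not lose anything by the change.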

All attributes should be of Attribute not strings

Current status: Inconsistent. Currently there is a mix of cubes.model.Attribute instances and strings, which causes problems in places where the full attribute is needed. It also causes confusion about how to construct a full attribute reference.

Attributes should be consistent all over the model and its parts. A cubes.model.Attribute instance is needed for getting the full name and for some localisation purposes.

Presenters and formatters (visualisation idea)

Add presenters/formatters module which will be able to format data for various visualisation libraries.

Also needed:

  • tables (any javascript table libs suggestions?)

Description

  • Presenter: presents whole result
  • Formatter: formats one value (like a table cell)

API suggestion:

  • get_presenter(name) - create a presenter instance, e.g. get_presenter("highcharts")
  • Presenter.generate(object) - generate presentation of an object

Notes:

  • HTTP query would include a presenter key
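
A sketch of the suggested API. Only get_presenter(name) and Presenter.generate(object) come from the suggestion above; the TextTablePresenter class and the "text_table" key are hypothetical examples.

```python
class Presenter:
    """Presents a whole result."""

    def generate(self, obj):
        raise NotImplementedError


class TextTablePresenter(Presenter):
    """Example presenter: renders rows as tab-separated text lines."""

    def generate(self, obj):
        return "\n".join("\t".join(str(value) for value in row) for row in obj)


# Name -> presenter class; an HTTP presenter key could be looked up here.
_presenters = {"text_table": TextTablePresenter}


def get_presenter(name):
    """Create a presenter instance by name."""
    return _presenters[name]()


text = get_presenter("text_table").generate([[1, 2], [3, 4]])
```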

Check all examples to reflect "preferred way"

Check whether all examples use cubes in the preferred way (that is, using create_workspace() and other new additions).

Make sure that the new examples work with new star browser.

Restructure test suite

Restructure the test suite so that:

  • each module has a corresponding test file
  • each public function (at least) has at least one test case

create_table_from_csv fails with MySQL

I tried to modify the hello_world example to use MySQL, but I get the following error, which suggests that create_table_from_csv doesn't generate valid DDL for MySQL.

$ python prepare_data.py 
loading data...
Traceback (most recent call last):
  File "prepare_data.py", line 31, in <module>
    create_id=True    
  File "/Library/Python/2.7/site-packages/cubes-0.8.0-py2.7.egg/cubes/tutorial/sql.py", line 37, in create_table_from_csv
    table.create()
  File "/Library/Python/2.7/site-packages/sqlalchemy/schema.py", line 583, in create
    checkfirst=checkfirst)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 2234, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1904, in _run_visitor
    **kwargs).traverse_single(element)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/visitors.py", line 86, in traverse_single
    return meth(obj, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/ddl.py", line 86, in visit_table
    self.connection.execute(schema.CreateTable(table))
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1405, in execute
    params)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1490, in _execute_ddl
    compiled = ddl.compile(dialect=dialect)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/expression.py", line 1724, in compile
    return self._compiler(dialect, bind=bind, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/schema.py", line 2872, in _compiler
    return dialect.ddl_compiler(dialect, self, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 699, in __init__
    self.string = self.process(self.statement)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 718, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/visitors.py", line 59, in _compiler_dispatch
    return getter(visitor)(self, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/compiler.py", line 1389, in visit_create_table
    not first_pk
  File "/Library/Python/2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1369, in get_column_specification
    self.dialect.type_compiler.process(column.type)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 764, in process
    return type_._compiler_dispatch(self)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/visitors.py", line 59, in _compiler_dispatch
    return getter(visitor)(self, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/sql/compiler.py", line 1727, in visit_string
    return self.visit_VARCHAR(type_)
  File "/Library/Python/2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1656, in visit_VARCHAR
    self.dialect.name)
sqlalchemy.exc.CompileError: (in table 'ft_irbd_balance', column 'category'): VARCHAR requires a length on dialect mysql

Move model creation defaults into a utility function

"Explicit is better than implicit."

Current state: During creation of model objects, such as dimensions or cubes, some defaults are applied. For example in the dimension:

  • If no levels are specified during initialization, then dimension name is considered flat, with single attribute.
  • If no hierarchy is specified and levels are specified, then default hierarchy will be created from order of levels
  • If no levels are specified, then one level is created, with name default and dimension will be considered flat

This should not be enforced, as some users might not expect that. Also it might come into a conflict later. This should be moved to some utility function/method:

model.apply_defaults()

Model objects should be kept in the state in which the user created them, whether the model is valid or not. That is what model validation is there for. After calling model.apply_defaults() the model should be valid unless there are really serious issues.
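
A simplified sketch of the idea, using a hypothetical stand-in Dimension class rather than the real cubes model: the constructor stores exactly what it is given, and defaults are applied only on request.

```python
class Dimension:
    """Stand-in dimension that keeps its state exactly as created."""

    def __init__(self, name, levels=None, hierarchy=None):
        self.name = name
        self.levels = list(levels or [])
        self.hierarchy = hierarchy

    def apply_defaults(self):
        """Fill in the defaults described above, explicitly and on demand."""
        if not self.levels:
            # No levels given: one level named "default" (flat dimension).
            self.levels = ["default"]
        if self.hierarchy is None:
            # No hierarchy given: derive it from the order of levels.
            self.hierarchy = list(self.levels)
        return self


date = Dimension("date")
untouched = (list(date.levels), date.hierarchy)
date.apply_defaults()
```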

Add server output format for charting JS libs

Add the ability for the Slicer server to generate aggregation results formatted for a JS charting library, so that they can be used directly. Options: Highcharts, Google Charts, and many more.

Note: This feature should belong to the cubes.js, but might be good idea to have it here for "bootstrapping" the idea.

Add possibility to choose browsing backend

Currently the Slicer server expects only the default SQL backend to be used. Create a way to:

  1. specify backend by some identifier
  2. provide configuration to the backend

Reuse idea from cubes.backends.sql.SQLWorkspace:

  • Workspace should take model as an argument, rest of args are optional
  • Workspace should provide method browser_for_cube(cube)
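
The two rules could look roughly like this. ExampleWorkspace and ExampleBrowser are hypothetical stand-ins, not cubes.backends.sql.SQLWorkspace itself, and the url option is only an example.

```python
class ExampleBrowser:
    """Stand-in browser bound to one cube."""

    def __init__(self, cube, **options):
        self.cube = cube
        self.options = options


class ExampleWorkspace:
    """Takes the model as the only required argument; the rest is optional."""

    def __init__(self, model, **options):
        self.model = model
        self.options = options

    def browser_for_cube(self, cube):
        """Return a browser for the given cube, sharing the workspace options."""
        return ExampleBrowser(cube, **self.options)


workspace = ExampleWorkspace("model.json", url="sqlite:///data.sqlite")
browser = workspace.browser_for_cube("sales")
```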

Cell cutting refactoring and improvement

There should be better cell cutting. Currently there are three ways for creating a cell:

  • initialization from list of cuts
  • slicing through one dimension point with Cell.slice(dimension, path)
  • slicing with multiple cuts with Cell.multi_slice(cuts)

Issues:

  • The two slicing methods have similar names but take different approaches
  • If one wants to add a single cut to the cell, the only way is to use Cell.multi_slice([cut]), which is neither nice nor intuitive

Proposal:

  • keep initialization with cuts
  • add method for adding a single cut: slice(cut)
  • add methods for slicing: slice_point(), slice_set(), slice_range() (?)

Still unsure about:

  • should Cell be mutable or immutable?
  • method naming/verbs: cut vs. slice. "Slice & Dice" is common in OLAP terminology, however slice is more part of the cube where "cut" is kind of point/line/boundary definition how the cell slice should be created from the cube.
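
A sketch of the proposal under one of the open choices, an immutable Cell. The tuple representation of cuts is an assumption made for illustration; slice_set() would follow the same pattern.

```python
class Cell:
    """Immutable cell: every slicing method returns a new Cell."""

    def __init__(self, cuts=()):
        self.cuts = tuple(cuts)

    def slice(self, cut):
        """Return a new cell with one more cut; the original is unchanged."""
        return Cell(self.cuts + (cut,))

    def slice_point(self, dimension, path):
        """Cut through a single point of a dimension."""
        return self.slice(("point", dimension, tuple(path)))

    def slice_range(self, dimension, from_path, to_path):
        """Cut a range between two paths of a dimension."""
        return self.slice(("range", dimension, tuple(from_path), tuple(to_path)))


full_cube = Cell()
year_2010 = full_cube.slice_point("date", [2010])
first_half = year_2010.slice_range("date", [2010, 1], [2010, 6])
```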

Error running hello_world example

I installed version 0.8. When I run slicer to test the hello_world example, I get this error.

ovnicraft-macbook:hello_world ovnicraft$ slice
sliceprint slicer
ovnicraft-macbook:hello_world ovnicraft$ slicer serve slicer.ini
Traceback (most recent call last):
  File "/usr/local/bin/slicer", line 12, in <module>
    import argparse
ImportError: No module named argparse

Deprecate Dimension.default_hierarchy

Current status: Dimension.default_hierarchy is a computed property. A lot of code checks whether a hierarchy argument is specified and then decides which hierarchy to use.

Solution: Dimension.hierarchy() now accepts None as a valid argument (set as default) and returns default_hierarchy in that case.

TO-DO: go through all the code and remove default_hierarchy.
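
A simplified illustration of the described behaviour with a hypothetical stand-in class; the constructor arguments are assumptions, only hierarchy(None) returning the default comes from the issue.

```python
class Dimension:
    """Stand-in dimension with named hierarchies and one default."""

    def __init__(self, hierarchies, default_name):
        self._hierarchies = dict(hierarchies)
        self._default_name = default_name

    def hierarchy(self, name=None):
        """Return the named hierarchy, or the default one when name is None."""
        if name is None:
            name = self._default_name
        return self._hierarchies[name]


date = Dimension({"ymd": ["year", "month", "day"],
                  "yqm": ["year", "quarter", "month"]},
                 default_name="ymd")
default = date.hierarchy()
quarterly = date.hierarchy("yqm")
```

Callers no longer need their own "if hierarchy is None" branches; they can pass the argument straight through.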

SQL helper for date data type

Create SQL helper methods or integrate in SQL browsers to break down date data types into date dimension hierarchy levels.
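
In plain Python the breakdown could look like the sketch below. The function name date_levels and the chosen level names are assumptions; an SQL helper would express the same parts with the database's date functions.

```python
import datetime


def date_levels(value):
    """Break a date down into typical date dimension hierarchy levels."""
    return {
        "year": value.year,
        "quarter": (value.month - 1) // 3 + 1,
        "month": value.month,
        "day": value.day,
    }


levels = date_levels(datetime.date(2012, 8, 15))
```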
