powerbi-vcs's Introduction

Project status

We're really busy at the moment (Jan 2018) and have put development of this on hold until we start to need it internally (which is likely to be a few months). If you're interested in using this, you have a few options:

File Format Specification

The PBIX and PBIT files are Open Packaging Conventions files. Within a PBIX container there are two binary files of particular note, which would require further conversion for storage within a VCS. Some of this work can be skipped by saving the file as a PBIT.

Binary blob format specifications

These can be used to further enhance the converters, if anyone ever has the time. There is no guarantee these formats are exact or current. The specifications are intended for the streams embedded in Excel files; however, the formats are closely related (and may be identical).


NOTE: this is not yet ready to be used!

Introduction

Power BI does not (currently) support integration with source control, which is a real pain, most notably because *.pbi{tx} cannot be diffed and merged. This means:

  • repos blow up quickly, as even with minor changes, the entire *.pbi{tx} file is saved again.
  • it's hard to collaborate as changes from two developers can't be merged, and hence changes must be made one after the other (so you're effectively limited to a single full-time developer per report).

This repo aims to improve this as much as possible (without tweaking Power BI itself) until Power BI itself supports this.

That's right ... this is only a temporary hack, and should be treated as such.

It abuses the fact that *.pbi{tx} files are (nearly) just (double) ZIP compressed folders which follow a specific structure.
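Because the container is ZIP-based, Python's standard zipfile module can look inside it. The following is a minimal sketch, not this tool's own code: the function names and the tiny in-memory demo archive are illustrative assumptions, and a real *.pbit will contain more entries than the two shown.

```python
import io
import zipfile


def list_entries(container):
    """Return (name, uncompressed size) for each entry in a ZIP-based container.

    `container` may be a path to a *.pbit / *.pbix file or a binary file object.
    """
    with zipfile.ZipFile(container) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def make_demo_archive():
    """Build a tiny in-memory ZIP standing in for a real file (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("Report/Layout", '{"sections": []}')
        zf.writestr("DataModelSchema", '{"name": "demo"}')
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, size in list_entries(make_demo_archive()):
        print(f"{size:>8}  {name}")
```

Passing a real path, e.g. list_entries("apples.pbit"), should work the same way as long as the file opens as a ZIP archive.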

Installation [TODO]

Install Python 3 (I recommend Anaconda if you're using Windows). Until someone writes the install script, just run the pbivcs.py file directly.

What do I get (currently)?

Say you've just made some changes to your Power BI file apples.pbix and you want to add it to version control. First, you'll need to export it as a Power BI Template i.e. apples.pbit, and then extract it into a VCS-friendly format:

pbivcs -x apples.pbit apples.pbit.vcs

will extract your apples.pbit into the VCS-friendly format at apples.pbit.vcs. If you choose, you can [TODO] automatically check that this will compress into a valid pbit. Then, for example

git commit -a -m "apples are so awesome"

will (assuming you've set it up as outlined below):

  1. [TODO] check you haven't accidentally forgotten to export a new pbit from your pbix
  2. commit apples.pbit.vcs to git
  3. (optionally) ignore apples.pbit and apples.pbix or [TODO] replace them with checksums (or a link to a file store of all versions, depending on your CI). TODO: can we actually create an apples.pbit.history folder to contain these? This would ensure no pbit is ever overwritten.

Now, suppose your colleague had also made a change to the same report. Then a git pull and git diff might show something like this:

...
-           "value": "Apples are yummy",
+           "value": "Apples are awesome",
...

and you can see that your colleague has just changed the title. There aren't any major conflicts, so you can happily git merge and merge your work.

You then want to make another change, so you need to compress the VCS-friendly format back to your pbit, which is as easy as

pbivcs -c apples.pbit.vcs apples.pbit

(and yes, since you're super careful, you can control how overwrites etc. happen).

Git textconv driver support

This option dumps the extracted file contents to standard output, which gives better diffs in git for files that were committed in the binary PBIT or PBIX format.

Add to repo .gitattributes file:

*.pbit diff=pbit
*.pbix diff=pbit

Add to global or local .gitconfig file:

[diff "pbit"]
	textconv = pbivcs -s

Git will then run its diffs on the extracted file content. Textconv diffs are only a visual guide and can't be used to merge changes, but they give better insight into what has changed in the Power BI report.

Documentation of git textconv drivers [https://git.wiki.kernel.org/index.php/Textconv]
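For context, git calls a textconv program with a single argument (the path of the file to convert) and diffs whatever the program writes to standard output. The sketch below shows that contract with a stand-alone script; it is not how pbivcs -s is implemented, and the function name, the header format and the handling of binary entries are all assumptions made for illustration.

```python
import sys
import zipfile


def render_as_text(container):
    """Render a ZIP-based container as diff-friendly text."""
    lines = []
    with zipfile.ZipFile(container) as zf:
        # Sort entry names so the output order is stable between versions.
        for name in sorted(zf.namelist()):
            data = zf.read(name)
            lines.append(f"=== {name} ===")
            try:
                lines.extend(data.decode("utf-8").splitlines())
            except UnicodeDecodeError:
                # Binary entries are summarised rather than dumped.
                lines.append(f"<binary, {len(data)} bytes>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # git invokes a textconv program with one argument: the file to convert.
    sys.stdout.write(render_as_text(sys.argv[1]))
```

Entries that are text in an encoding other than UTF-8 would be reported as binary by this sketch, so a real driver would need per-entry decoding rules.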

Other cool features

  • TODO: change control: we'll attempt to keep this as up-to-date with Power BI as possible. The version of this tool that was used will be saved in any extraction/compression process, to allow (in theory) this tool to work on a complete git history, regardless of the Power BI versions used. (Provided this tool always functioned.)
  • TODO: everything's configurable to your level of comfort (e.g. always overwrite files, or check first, etc.)
  • lots of configuration. There are built-in defaults (conservative, safe ones), but you can also specify your own defaults (in a hierarchy of .pbivcs.conf files), as well as using environment variables and command line arguments. See below [TODO]

What don't I get?

  • unthinking automation (at least for now):
    • you still need to manually export a *.pbit from your *.pbix
    • you have to run scripts before/after the git actions. If this solution proves to be robust, we may automate this somewhat with git hooks or filters, but I'm wary of the bugs these may introduce into the user experience.

Configuration

We use ConfigArgParse, which means pbivcs has the following configurations:

  • built-in defaults (which tend to be safe and conservative)
  • your own .pbivcs.conf files
  • environment variables
  • command line arguments

where each level takes precedence over the one before. The main use is the .pbivcs.conf files, which let you customise the behaviour without having to enter the options at the command line. These files must sit in one of the directories on the path to your input file. E.g. if you run pbivcs -x /path/to/my/file.pbit then the following configuration files will be used (if they exist):

  • /.pbivcs.conf
  • /path/.pbivcs.conf
  • /path/to/.pbivcs.conf
  • /path/to/my/.pbivcs.conf

where each one takes precedence over the one preceding it. Usually this means you would set a global .pbivcs.conf at the root of your project, but you can add further ones in other parts of the project if you want different behaviour for the odd report.
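The lookup order described above can be sketched as follows. This is an illustration under assumptions rather than the tool's actual code: the function name is hypothetical, and PurePosixPath is used only so the example behaves the same on every platform (real code would more likely resolve a pathlib.Path and check which files exist).

```python
from pathlib import PurePosixPath

CONF_NAME = ".pbivcs.conf"


def candidate_conf_files(input_file):
    """List the config files that could apply to `input_file`, lowest precedence first."""
    parents = list(PurePosixPath(input_file).parents)  # nearest directory first
    return [str(directory / CONF_NAME) for directory in reversed(parents)]


if __name__ == "__main__":
    # Prints the four paths listed above, from / down to /path/to/my.
    for path in candidate_conf_files("/path/to/my/file.pbit"):
        print(path)
```

Returning the list with the lowest-precedence file first means a loader can read the files in order and let later values overwrite earlier ones.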

Roadmap

  • figure out how to export *.pbit from *.pbix automatically
  • support other VCS ...
  • some git utility scripts e.g. to remove old *.pbix from repo and rebuild it as if we'd been using this tool the whole way along (i.e. replace *.pbit with the extracted version so we can hence track diffs)
  • automate git somewhat with hooks or filters

Contributing

TODO

License

See ./license.md.

TODO (before 'release')

  • argparse etc.
  • provision script that sets up a given repo: provide git template .gitignore and .gitattributes (e.g. to ignore *.pbix or smudge them to a checksum, and ignore changed modifiedTime etc. in diffs).
  • tests ... how?
  • change control ... save version of tool used?
  • after compressing, test that the decompressed version is valid (by opening in Power BI Desktop)?
  • complete install instructions inc. conda environment

Discussion

What about Power BI support?

Good question. Unfortunately, there are no indications of when this will be provided by Power BI.

What about git filters?

Sure you could do something like zippey. However, I think (?) this requires you to map a single file (*.pbit) to a single other file (in whatever format). While you could do something like in zippey (concatenating them all etc.) it'd start getting messy (especially with it still containing binary content e.g. images), and I'm not a fan. I also don't really like the idea of using automated filters (at least until I know more about how these are used in git).

Why not automate with git hooks?

Firstly, git hooks aren't shared between repos. Not a major issue, just saying.

Secondly, I don't know how things would behave in all situations. E.g. if you add the *.pbit and a hook runs to convert it to the VCS format. What then happens if you want to make a change to it? Anyway, if someone knows better, let me know (or submit a PR).

Tests

  • check that configargparse and use of config files behaves as expected

powerbi-vcs's People

Contributors

andasm, bartbroere, kcaswick, kodonnell, naiquevin

powerbi-vcs's Issues

Any Idea where / How Bookmarks deserialize?

Hey Amazing work, Having fun. Any idea where bookmarks are in the vcs output??
I can't seem to find them, likely I'm doing something wrong...but I'm not finding them. Seems this might be by design, as a TODO was to check the complement of files. Just guessing.

A script to retrieve only the relevant changes

Hi!

From what I've seen, the most relevant changes of the .pbit file are contained in the *.m file content and certain parts of the DataModelSchema file.

The .m files seem quite simple, however all of the "modifiedTime" timestamps and similar changes in DataModelSchema file really overcrowd the output.

Have you had any luck with simplifying the git log -p output of the relevant changes to the file?

It seems hard justifying the use of the extraction if locating the changes is that difficult.

P.S. This might be a noob question coming from a git beginner, but I started considering version control just for the team-work on Power BI files :)

Mutating JSON?

Hi this is kinda off topic, but I'm at my wits end and looking for advice.

I'd like to update a JSON object in the vcs representation (replace a page with another page).
My Question is. Is anyone doing this? And how?

Details:
I've been trying to mutate (change) the array of objects in the sections object of Report\Layout
I've been using Python's jsonpath_rw with its extensions to update the JSON document. (HERE)

But although I can reference objects with JsonPath successfully - down to the individual ReportPage.
which is a JSON array element of Report\Layout (sections).

I'm not having luck mutating the matched object found with the jsonpath. Seems I can mutate / update the root, with a search of '$' but anything else doesn't change the document and I'm kinda a newbie with all this.

I can get the contents of Report\Layouts sections[] both as a list of ReportPageObjects (my term) and just the sections object that are the parents to the list of ReportPageObjects.
But I'm not able to mutate (basically replace a existing ReportPageObject)

And it's likely bc all this is new to me.... any advice is welcome.

Thank you!

Dotnet port

Hi,

I've created a dotnet port of this project. https://github.com/Togusa09/powerbi-vcs-dotnet

I was wanting to improve the git diffs for powerbi files by providing it the extracted file contents to diff on. Your scripts were the best resource I could find regarding the makeup of the pbit/pbix files, so I've ported your scripts to c# as a base for me to work off.

The compression back to pbit/pbix is still incomplete in the port, but I intend to finish it when I get a chance.

Regards,

Ben

Error with Python 2.7.13

cannot run the script with python 2.7.13
Also tested with python 3.6.2.

Traceback (most recent call last):
  File "pbivcs.py", line 123, in <module>
    if args['extract']:
TypeError: 'Namespace' object has no attribute '__getitem__'
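
The traceback is consistent with indexing an argparse.Namespace as if it were a dict: a Namespace exposes parsed options as attributes, and vars() returns them as a plain dict. A minimal sketch of both working forms follows; the parser here is a hypothetical stand-in (the flag definition is an assumption, and only the name extract is taken from the traceback).

```python
import argparse

# Hypothetical stand-in parser; only the "extract" name is taken from the traceback.
parser = argparse.ArgumentParser()
parser.add_argument("-x", "--extract", action="store_true")
args = parser.parse_args(["-x"])

# A Namespace stores options as attributes ...
extract_via_attribute = args.extract

# ... and vars() exposes the same values as a plain dict for key-based access.
extract_via_dict = vars(args)["extract"]

# Subscripting the Namespace directly is what raises TypeError.
try:
    args["extract"]
except TypeError:
    subscripting_failed = True
else:
    subscripting_failed = False
```

Either args.extract or vars(args)['extract'] avoids the error on both Python 2.7 and Python 3.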

handle `modifiedTime` nicely

See #7 - often small changes get obscured by e.g. modifiedTime changing. Can we handle this nicely so that e.g. it's not shown in the diff? I think there might be ways using filters to strip out patterns and hence ignore them from the diff.
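
One possible approach, sketched here under assumptions, is to normalise the extracted JSON before it is diffed by dropping volatile keys. Nothing below is part of the tool: the function names and the VOLATILE_KEYS set are hypothetical, and modifiedTime is the only key name taken from the discussion above.

```python
import json

VOLATILE_KEYS = {"modifiedTime"}  # extend with other noisy keys as needed


def strip_volatile(node, keys=VOLATILE_KEYS):
    """Return a copy of a parsed JSON value with volatile keys removed at any depth."""
    if isinstance(node, dict):
        return {k: strip_volatile(v, keys) for k, v in node.items() if k not in keys}
    if isinstance(node, list):
        return [strip_volatile(item, keys) for item in node]
    return node


def normalise(text):
    """Parse JSON text, drop volatile keys, and re-serialise deterministically."""
    return json.dumps(strip_volatile(json.loads(text)), indent=2, sort_keys=True)


if __name__ == "__main__":
    before = '{"name": "Sales", "modifiedTime": "2018-01-01T00:00:00"}'
    after = '{"name": "Sales", "modifiedTime": "2018-02-02T00:00:00"}'
    # The two versions differ only in a timestamp, so they normalise identically.
    print(normalise(before) == normalise(after))
```

Such a function would suit a diff-only view (for example inside a textconv driver); the stored files themselves should keep the timestamps if Power BI needs them when the pbit is rebuilt.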
