powerbi-table-scraper

Python tool for scraping Power BI tables into an Excel or CSV file using Selenium. The tool can be run as a console application or with a GUI.

GUI Screenshot

Prerequisites

Installation

  1. Clone the repository:
git clone https://github.com/holstt/powerbi-table-scraper.git
cd powerbi-table-scraper
  2. Install dependencies using Poetry:
poetry install

If you do not use Poetry, a requirements.txt file is also provided.

Configuration

To set up configuration:

  1. Rename config.example.yml to config.yml.

  2. Update the config.yml file with your specific settings.

  • To switch between GUI and console mode, set the mode value to either gui or console.
  • Only the config section matching the selected mode (gui or console) is used. The other section is ignored, so you can leave it in the file to make switching modes easier.
# EXAMPLE CONFIG FILE

mode: gui # REQUIRED: Options: gui or console

should_uncheck_filter: true # OPTIONAL (default=false): Find checkbox filter and uncheck all checkboxes before scraping
max_rows: null # OPTIONAL (default=None): Set a maximum number of rows to scrape (e.g. for reducing scraping time during testing)

console:
    url: https://app.powerbi.com/XXXXX # REQUIRED: URL to the Power BI report that should be scraped
    is_headless: true # OPTIONAL (default=true): 'true' hides the browser window during scraping
    output_format: excel # OPTIONAL (default=excel): Options: excel, csv
    output_path: ./table.xlsx # OPTIONAL (default="./table.xlsx"): File extension should match the output_format (i.e. .xlsx for excel and .csv for csv)

gui:
    language: en # OPTIONAL (default=en): Options: en, da
    program_name: Power BI Table Scraper # OPTIONAL (default=Power BI Table Scraper): The program name that should be displayed in the GUI

    # Default values in the GUI. Can be changed by the user.
    default_values:
        url: https://app.powerbi.com/XXXXX # OPTIONAL (default=None): URL to the Power BI report that should be scraped
        is_headless: true # OPTIONAL (default=true): 'true' hides the browser window during scraping
        output_format: excel # OPTIONAL (default=excel): Options: excel, csv
        output_path: null # OPTIONAL (default=None): Unless a default path is specified here, the user must browse for a valid path before running the scraper. File extension should match the output_format (i.e. .xlsx for excel and .csv for csv)

The minimum required configuration for console mode is:

mode: console
console:
    url: https://app.powerbi.com/XXXXX

and for the GUI mode:

mode: gui
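
To make the required and optional console settings concrete, here is a minimal sketch of how a parsed config could be validated and filled with the defaults documented in the example file. It is not the tool's actual loader: the function name resolve_console_settings and the constants are hypothetical, and the sketch assumes the YAML has already been parsed into a plain dict (for example with PyYAML's yaml.safe_load).

```python
from pathlib import Path

# Defaults taken from the comments in the example config file.
CONSOLE_DEFAULTS = {
    "is_headless": True,
    "output_format": "excel",
    "output_path": "./table.xlsx",
}

# File extension expected for each output format.
EXTENSIONS = {"excel": ".xlsx", "csv": ".csv"}


def resolve_console_settings(config: dict) -> dict:
    """Validate a parsed config dict and apply the documented console defaults.

    Hypothetical helper for illustration; the real tool may validate differently.
    """
    mode = config.get("mode")
    if mode not in ("gui", "console"):
        raise ValueError(f"mode must be 'gui' or 'console', got {mode!r}")
    if mode != "console":
        raise ValueError("this sketch only resolves console settings")

    console = config.get("console") or {}
    if not console.get("url"):
        raise ValueError("console.url is required in console mode")

    # Treat null values like missing keys so the defaults still apply.
    provided = {key: value for key, value in console.items() if value is not None}
    settings = {**CONSOLE_DEFAULTS, **provided}

    output_format = settings["output_format"]
    if output_format not in EXTENSIONS:
        raise ValueError(f"output_format must be 'excel' or 'csv', got {output_format!r}")

    # The extension of output_path should match output_format.
    suffix = Path(settings["output_path"]).suffix.lower()
    if suffix != EXTENSIONS[output_format]:
        raise ValueError(
            f"output_path should end with {EXTENSIONS[output_format]} "
            f"for output_format {output_format!r}"
        )
    return settings


# The minimal console configuration, as a parsed dict.
minimal = {"mode": "console", "console": {"url": "https://app.powerbi.com/XXXXX"}}
settings = resolve_console_settings(minimal)
print(settings["output_format"], settings["output_path"])
```

In this sketch, setting output_format to csv without also setting output_path raises an error, because the default path ends in .xlsx. Whether the tool itself is that strict is an assumption.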

Usage

After setting up ./config.yml, run:

python main.py

In GUI mode, follow the on-screen instructions. In console mode, scraping starts automatically using the settings defined in config.yml.

Creating a Standalone Executable with PyInstaller

To create a standalone executable of the tool, run the following command:

pyinstaller --onefile --noconsole --name="Power BI Table Scraper" ./src/main.py

The executable is created in the ./dist folder. Remember to place the config.yml file in the same folder as the executable. The tool can then be run without installing Python or any dependencies.
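
The requirement to keep config.yml next to the executable can be illustrated with a short sketch of how a PyInstaller-built program might locate that file. This is an assumption about the lookup, not the tool's confirmed implementation, and the function name config_path is hypothetical. It relies on PyInstaller setting sys.frozen in bundled applications, where sys.executable points to the built executable.

```python
import sys
from pathlib import Path


def config_path(filename: str = "config.yml") -> Path:
    """Return where the config file is expected to be (illustrative sketch)."""
    if getattr(sys, "frozen", False):
        # Running as a PyInstaller bundle: look next to the executable,
        # not in the temporary folder that --onefile unpacks into.
        base = Path(sys.executable).resolve().parent
    else:
        # Running from source: use the current working directory,
        # matching the ./config.yml path used in the Usage section.
        base = Path.cwd()
    return base / filename


print(config_path())
```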
