This project forked from terraform-google-modules/terraform-google-bigquery

This module allows you to create opinionated Google Cloud Platform BigQuery datasets and tables.

Home Page: https://registry.terraform.io/modules/terraform-google-modules/bigquery/google

License: Apache License 2.0

Ruby 11.67% Makefile 6.71% Shell 14.83% HCL 66.79%

terraform-google-bigquery

This module allows you to create opinionated Google Cloud Platform BigQuery datasets and tables. This will allow the user to programmatically create an empty table schema inside of a dataset, ready for loading. Additional user accounts and permissions are necessary to begin querying the newly created table(s).

Compatibility

This module is meant for use with Terraform 0.12. If you haven't upgraded and need a Terraform 0.11.x-compatible version of this module, the last released version intended for Terraform 0.11.x is 1.0.0.
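
If a root configuration should fail fast on an older Terraform release, one option is to declare the constraint explicitly. This is a minimal sketch; the `>= 0.12` bound is an assumption that mirrors the compatibility note above, and the module may enforce its own, stricter constraint.

```hcl
terraform {
  # Assumed lower bound, taken from the compatibility note above.
  required_version = ">= 0.12"
}
```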

Upgrading

The current version is 4.X. The following guides are available to assist with upgrades:

Usage

Basic usage of this module is as follows:

module "bigquery" {
  source  = "terraform-google-modules/bigquery/google"
  version = "~> 4.4"

  dataset_id                  = "foo"
  dataset_name                = "foo"
  description                 = "some description"
  project_id                  = "<PROJECT ID>"
  location                    = "US"
  default_table_expiration_ms = 3600000

  tables = [
    {
      table_id          = "foo",
      schema            = "<PATH TO THE SCHEMA JSON FILE>",
      time_partitioning = {
        type                     = "DAY",
        field                    = null,
        require_partition_filter = false,
        expiration_ms            = null,
      },
      expiration_time = null,
      clustering      = ["fullVisitorId", "visitId"],
      labels = {
        env      = "dev"
        billable = "true"
        owner    = "joedoe"
      },
    },
    {
      table_id          = "bar",
      schema            = "<PATH TO THE SCHEMA JSON FILE>",
      time_partitioning = null,
      expiration_time   = 2524604400000, # 2050/01/01
      clustering        = [],
      labels = {
        env      = "devops"
        billable = "true"
        owner    = "joedoe"
      },
    }
  ],

  views = [
    {
      view_id        = "barview",
      use_legacy_sql = false,
      # The closing heredoc marker must sit on a line of its own.
      query          = <<-EOF
        SELECT
          column_a,
          column_b
        FROM
          `project_id.dataset_id.table_id`
        WHERE
          approved_user = SESSION_USER()
      EOF
      labels = {
        env      = "devops"
        billable = "true"
        owner    = "joedoe"
      }
    }
  ]
  dataset_labels = {
    env      = "dev"
    billable = "true"
  }
}

Functional examples are included in the examples directory.

Variable tables detailed description

The tables variable should be provided as a list of objects with the following keys:

{
  table_id = "some_id"                        # Unique table ID (used as both the ID and the friendly name of the table).
  schema = "path/to/schema.json"              # Path to the schema json file.
  time_partitioning = {                       # Set it to `null` to omit partitioning configuration for the table.
        type                     = "DAY",     # The only type supported is DAY, which will generate one partition per day based on data loading time.
        field                    = null,      # The field used to determine how to create a time-based partition. If time-based partitioning is enabled without this value, the table is partitioned based on the load time. Set it to `null` to omit configuration.
        require_partition_filter = false,     # If set to true, queries over this table require a partition filter that can be used for partition elimination to be specified. Set it to `null` to omit configuration.
        expiration_ms            = null,      # Number of milliseconds for which to keep the storage for a partition.
      },
  clustering = ["fullVisitorId", "visitId"]   # Specifies column names to use for data clustering. Up to four top-level columns are allowed, and should be specified in descending priority order. Partitioning should be configured in order to use clustering.
  expiration_time = 2524604400000             # The time when this table expires, in milliseconds since the epoch. If set to `null`, the table will persist indefinitely.
  labels = {                                  # A mapping of labels to assign to the table.
      env      = "dev"
      billable = "true"
    }
}
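
As an illustration of the keys described above, the following sketch partitions a table on a column rather than on load time. The dataset, table, column names, and schema path are hypothetical; it also assumes that `event_date` is a DATE or TIMESTAMP column in the schema file, which BigQuery requires for column-based time partitioning.

```hcl
module "bigquery_partitioned" {
  source  = "terraform-google-modules/bigquery/google"
  version = "~> 4.4"

  project_id = "<PROJECT ID>"
  dataset_id = "events" # hypothetical dataset ID

  tables = [
    {
      table_id = "page_views",              # hypothetical table ID
      schema   = "schemas/page_views.json", # hypothetical path to the schema file
      time_partitioning = {
        type                     = "DAY",
        field                    = "event_date", # hypothetical partitioning column
        require_partition_filter = true,         # queries must filter on the partition column
        expiration_ms            = "7776000000", # 90 days, in milliseconds
      },
      clustering      = ["user_id"], # hypothetical clustering column
      expiration_time = null,        # the table itself never expires
      labels = {
        env = "dev"
      },
    },
  ]
}
```

Every key of the object is set, including those left as `null`, because the variable is typed as a list of objects with a fixed set of attributes.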

Variable views detailed description

The views variable should be provided as a list of objects with the following keys:

{
  view_id = "some_id"                                                # Unique view ID; also used as the friendly name.
  query = "SELECT user_id, name FROM `project_id.dataset_id.table`"  # The SELECT query that defines the view. The tables it references must already exist.
  use_legacy_sql = false                                             # Whether to use legacy SQL (true) or standard SQL (false).
  labels = {                                                         # A mapping of labels to assign to the view.
      env      = "dev"
      billable = "true"
  }
}
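
Because a view query must name an existing table, it can help to build the fully qualified table name from the same values that are passed to the module. The sketch below does this with `locals`; all identifiers are hypothetical. It only keeps the names consistent and does not by itself guarantee creation order.

```hcl
locals {
  # Hypothetical identifiers; replace with real values.
  project_id = "<PROJECT ID>"
  dataset_id = "foo"
  table_id   = "users"
}

module "bigquery" {
  source  = "terraform-google-modules/bigquery/google"
  version = "~> 4.4"

  project_id = local.project_id
  dataset_id = local.dataset_id

  tables = [
    {
      table_id          = local.table_id,
      schema            = "<PATH TO THE SCHEMA JSON FILE>",
      time_partitioning = null,
      clustering        = [],
      expiration_time   = null,
      labels            = {},
    },
  ]

  views = [
    {
      view_id        = "users_view", # hypothetical view ID
      use_legacy_sql = false,
      # The table reference is assembled from the locals above.
      query  = "SELECT user_id, name FROM `${local.project_id}.${local.dataset_id}.${local.table_id}`",
      labels = {},
    },
  ]
}
```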

A detailed example with authorized views can be found here.

Features

This module provisions a dataset and a list of tables with associated JSON schemas and views from queries.

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| access | An array of objects that define dataset access for one or more entities. | `any` | `[{ "role": "roles/bigquery.dataOwner", "special_group": "projectOwners" }]` | no |
| dataset_id | Unique ID for the dataset being provisioned. | `string` | n/a | yes |
| dataset_labels | Key-value pairs in a map for dataset labels. | `map(string)` | `{}` | no |
| dataset_name | Friendly name for the dataset being provisioned. | `string` | `null` | no |
| default_table_expiration_ms | Default TTL, in milliseconds, of tables in the dataset. | `number` | `null` | no |
| delete_contents_on_destroy | (Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present. | `bool` | `null` | no |
| description | Dataset description. | `string` | `null` | no |
| encryption_key | Default encryption key to apply to the dataset. Defaults to null (Google-managed). | `string` | `null` | no |
| external_tables | A list of objects which include table_id, expiration_time, external_data_configuration, and labels. | `list(object({ table_id = string, autodetect = bool, compression = string, ignore_unknown_values = bool, max_bad_records = number, schema = string, source_format = string, source_uris = list(string), csv_options = object({ quote = string, allow_jagged_rows = bool, allow_quoted_newlines = bool, encoding = string, field_delimiter = string, skip_leading_rows = number }), google_sheets_options = object({ range = string, skip_leading_rows = number }), hive_partitioning_options = object({ mode = string, source_uri_prefix = string }), expiration_time = string, labels = map(string) }))` | `[]` | no |
| location | The regional location for the dataset. Only US and EU are allowed in this module. | `string` | `"US"` | no |
| project_id | Project where the dataset and tables are created. | `string` | n/a | yes |
| tables | A list of objects which include table_id, schema, clustering, time_partitioning, expiration_time, and labels. | `list(object({ table_id = string, schema = string, clustering = list(string), time_partitioning = object({ expiration_ms = string, field = string, type = string, require_partition_filter = bool }), expiration_time = string, labels = map(string) }))` | `[]` | no |
| views | A list of objects which include view_id, query, use_legacy_sql, and labels. | `list(object({ view_id = string, query = string, use_legacy_sql = bool, labels = map(string) }))` | `[]` | no |
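
The dataset-level inputs can be combined as in the following sketch. The first `access` entry repeats the documented default. The second entry is an assumption: it presumes the module forwards the `group_by_email` key to the `access` block of the underlying `google_bigquery_dataset` resource, and the group address is hypothetical.

```hcl
module "bigquery" {
  source  = "terraform-google-modules/bigquery/google"
  version = "~> 4.4"

  project_id = "<PROJECT ID>"
  dataset_id = "foo"
  location   = "EU"

  # Allow the dataset to be destroyed even when it still contains tables.
  delete_contents_on_destroy = true

  access = [
    {
      # Documented default: project owners are dataset owners.
      role          = "roles/bigquery.dataOwner"
      special_group = "projectOwners"
    },
    {
      # Assumed key and hypothetical group: read-only access for analysts.
      role           = "roles/bigquery.dataViewer"
      group_by_email = "analysts@example.com"
    },
  ]
}
```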

Outputs

| Name | Description |
|------|-------------|
| bigquery_dataset | BigQuery dataset resource. |
| bigquery_external_tables | Map of BigQuery external table resources being provisioned. |
| bigquery_tables | Map of BigQuery table resources being provisioned. |
| bigquery_views | Map of BigQuery view resources being provisioned. |
| external_table_ids | Unique IDs for any external tables being provisioned. |
| external_table_names | Friendly names for any external tables being provisioned. |
| project | Project where the dataset and tables are created. |
| table_ids | Unique ID for the table being provisioned. |
| table_names | Friendly name for the table being provisioned. |
| view_ids | Unique ID for the view being provisioned. |
| view_names | Friendly name for the view being provisioned. |
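
Module outputs can be re-exported from the root configuration. A brief sketch, assuming the module block is named `bigquery` as in the Usage section; the output names on the left are hypothetical.

```hcl
output "dataset_table_ids" {
  description = "IDs of the tables created by the bigquery module."
  value       = module.bigquery.table_ids
}

output "dataset_view_ids" {
  description = "IDs of the views created by the bigquery module."
  value       = module.bigquery.view_ids
}
```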

Requirements

These sections describe requirements for using this module.

Software

The following dependencies must be available:

Service Account

A service account with the following roles must be used to provision the resources of this module:

  • BigQuery Data Owner: roles/bigquery.dataOwner

The Project Factory module and the IAM module may be used in combination to provision a service account with the necessary roles applied.
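
For small setups, the same result can be sketched directly with Google provider resources instead of the modules mentioned above. The account ID and resource names are hypothetical, and the sketch assumes the caller is allowed to create service accounts and set IAM policy on the project.

```hcl
resource "google_service_account" "bigquery_provisioner" {
  project      = "<PROJECT ID>"
  account_id   = "bigquery-provisioner" # hypothetical account ID
  display_name = "BigQuery provisioner"
}

# Grant the role required by this module to the service account.
resource "google_project_iam_member" "bigquery_data_owner" {
  project = "<PROJECT ID>"
  role    = "roles/bigquery.dataOwner"
  member  = "serviceAccount:${google_service_account.bigquery_provisioner.email}"
}
```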

Script Helper

A helper script for configuring a Service Account is located at ./helpers/setup-sa.sh.

APIs

A project with the following APIs enabled must be used to host the resources of this module:

  • BigQuery JSON API: bigquery-json.googleapis.com

The Project Factory module can be used to provision a project with the necessary APIs enabled.
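
Alternatively, the API can be enabled on an existing project with the `google_project_service` resource. This is a sketch only: the service name is copied from the list above, and the resource name is hypothetical.

```hcl
resource "google_project_service" "bigquery" {
  project = "<PROJECT ID>"
  service = "bigquery-json.googleapis.com" # service name as listed above

  # Leave the API enabled if this resource is later removed from the configuration.
  disable_on_destroy = false
}
```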

Contributing

Refer to the contribution guidelines for information on contributing to this module.

Contributors

aaron-lane, averbuks, morgante, tdigangi, ingwarr, umairidris, release-please[bot], yunus, alexander-rondon, cloud-foundation-bot, diloreto, thiagonache, paulyy-y, soggycactus, kpeder, ivankorn, hsorellana, erjohnso, bharathkkb
