stellar / stellar-etl

Stellar ETL will enable real-time analytics on the Stellar network

License: Apache License 2.0

Go 99.82% Makefile 0.08% Dockerfile 0.10%
stellar stellar-network stellar-lumens etl-pipeline data-analysis etl-framework ethereum bitcoin blockchain

stellar-etl's Introduction

Stellar ETL

The Stellar-ETL is a data pipeline that allows users to extract data from the history of the Stellar network.

Table of Contents

  • Exporting the Ledger Chain
    • Docker
    • Manual Installation
  • Command Reference
    • Bucket List Commands
    • History Archive Commands
    • Stellar Core Commands
    • Utility Commands
  • Schemas
  • Extensions

Exporting the Ledger Chain

Docker

  1. Download the latest version of Docker
  2. Pull the stellar-etl Docker image: docker pull stellar/stellar-etl
  3. Run the Docker image with the desired stellar-etl command: docker run stellar/stellar-etl stellar-etl [etl-command] [etl-command arguments]
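
For example, a run that exports ledgers to a file visible on the host might look like the following (export_ledgers and its flags are described in the Command Reference below; the /data mount point is only an illustration of one way to retrieve the output file from the container):

> docker run -v "$(pwd)":/data stellar/stellar-etl stellar-etl export_ledgers \
--start-ledger 1000 --end-ledger 1064 --output /data/exported_ledgers.txt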

Manual Installation

  1. Install Golang v1.19.0 or later: https://golang.org/dl/

  2. Ensure that your Go bin has been added to the PATH env variable: export PATH=$PATH:$(go env GOPATH)/bin

  3. Download and install Stellar-Core v19.0.0 or later: https://github.com/stellar/stellar-core/blob/master/INSTALL.md

  4. Run go get github.com/stellar/stellar-etl to install the ETL

  5. Run export commands to export information about the ledger
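
For example, a possible end-to-end session (a sketch: on Go 1.17 and later the module-aware go install form is generally needed in place of go get, assuming the main package sits at the module root):

> go install github.com/stellar/stellar-etl@latest
> stellar-etl export_ledgers --start-ledger 1000 --end-ledger 1064 --output exported_ledgers.txt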

Command Reference

Every command accepts a -h parameter, which provides a help screen containing information about the command, its usage, and its flags.

Commands have the option to read from testnet with the --testnet flag, from futurenet with the --futurenet flag, and default to reading from mainnet when no network flag is set.

NOTE: Adding both flags will default to testnet. Each stellar-etl command can only run from one network at a time.
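
For example, to run a ledger export against testnet, only the extra flag is needed (export_ledgers and its flags are documented below):

> stellar-etl export_ledgers --testnet --start-ledger 1000 \
--end-ledger 1064 --output exported_ledgers.txt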



Bucket List Commands

These commands use the bucket list in order to ingest large amounts of data from the history of the Stellar ledger. If you are trying to read large amounts of information in order to catch up to the current state of the ledger, these commands provide a good way to catch up quickly. However, they don't allow for custom start-ledger values. For updating within a user-defined range, see the Stellar Core commands.

NOTE: In order to get information within a specified ledger range for bucket list commands, see the export_ledger_entry_changes command.


export_accounts

> stellar-etl export_accounts --end-ledger 500000 --output exported_accounts.txt

Exports historical account data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get account information within a specified ledger range, see the export_ledger_entry_changes command.


export_offers

> stellar-etl export_offers --end-ledger 500000 --output exported_offers.txt

Exports historical offer data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get offer information within a specified ledger range, see the export_ledger_entry_changes command.


export_trustlines

> stellar-etl export_trustlines --end-ledger 500000 --output exported_trustlines.txt

Exports historical trustline data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get trustline information within a specified ledger range, see the export_ledger_entry_changes command.


export_claimable_balances

> stellar-etl export_claimable_balances --end-ledger 500000 --output exported_claimable_balances.txt

Exports claimable balances data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get claimable balances information within a specified ledger range, see the export_ledger_entry_changes command.


export_pools

> stellar-etl export_pools --end-ledger 500000 --output exported_pools.txt

Exports historical liquidity pools data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get liquidity pools information within a specified ledger range, see the export_ledger_entry_changes command.


export_signers

> stellar-etl export_signers --end-ledger 500000 --output exported_signers.txt

Exports historical account signers data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get account signers information within a specified ledger range, see the export_ledger_entry_changes command.


export_contract_data

> stellar-etl export_contract_data --end-ledger 500000 --output export_contract_data.txt

Exports historical contract data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get contract data information within a specified ledger range, see the export_ledger_entry_changes command.


export_contract_code

> stellar-etl export_contract_code --end-ledger 500000 --output export_contract_code.txt

Exports historical contract code data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get contract code information within a specified ledger range, see the export_ledger_entry_changes command.


export_config_settings

> stellar-etl export_config_settings --end-ledger 500000 --output export_config_settings.txt

Exports historical config settings data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get config settings information within a specified ledger range, see the export_ledger_entry_changes command.


export_ttl

> stellar-etl export_ttl --end-ledger 500000 --output export_ttl.txt

Exports historical expiration data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get expiration information within a specified ledger range, see the export_ledger_entry_changes command.



History Archive Commands

These commands export information using the history archives. This allows users to provide a start and end ledger range. The commands in this category export a list of everything that occurred within the provided range. All of the ranges are inclusive.

NOTE: All commands except export_ledgers and export_assets also require Captive Core to export data.


export_ledgers

> stellar-etl export_ledgers --start-ledger 1000 \
--end-ledger 500000 --output exported_ledgers.txt

This command exports ledgers within the provided range.


export_transactions

> stellar-etl export_transactions --start-ledger 1000 \
--end-ledger 500000 --output exported_transactions.txt

This command exports transactions within the provided range.


export_operations

> stellar-etl export_operations --start-ledger 1000 \
--end-ledger 500000 --output exported_operations.txt

This command exports operations within the provided range.


export_effects

> stellar-etl export_effects --start-ledger 1000 \
--end-ledger 500000 --output exported_effects.txt

This command exports effects within the provided range.


export_assets

> stellar-etl export_assets \
--start-ledger 1000 \
--end-ledger 500000 --output exported_assets.txt

Exports the assets that are created from payment operations over a specified ledger range.


export_trades

> stellar-etl export_trades \
--start-ledger 1000 \
--end-ledger 500000 --output exported_trades.txt

Exports trade data within the specified range to an output file.


export_diagnostic_events

> stellar-etl export_diagnostic_events \
--start-ledger 1000 \
--end-ledger 500000 --output export_diagnostic_events.txt

Exports diagnostic events data within the specified range to an output file.



Stellar Core Commands

These commands require a Stellar Core instance that is v19.0.0 or later. The commands use the Core instance to retrieve information about changes from the ledger. These changes can be in the form of accounts, offers, trustlines, claimable balances, liquidity pools, or account signers.

As the Stellar network grows, the Stellar Core instance has to catch up on an increasingly large amount of information. This catch-up process can add some overhead to the commands in this category. To avoid this overhead, prefer processing larger ranges instead of many small ones, or use unbounded mode.


export_ledger_entry_changes

> stellar-etl export_ledger_entry_changes --start-ledger 1000 \
--end-ledger 500000 --output exported_changes_folder/

This command exports ledger changes within the provided ledger range. Flags can filter which ledger entry types are exported. If no data type flags are set, then by default all types are exported. If any are set, it is assumed that the others should not be exported.

Changes are exported in batches of a size defined by the batch-size flag. By default, the batch-size parameter is set to 64 ledgers, which corresponds to roughly a five-minute period of time (ledgers close about every five seconds, so 64 ledgers × ~5 s ≈ 320 s). This batch size is convenient because checkpoint ledgers are created every 64 ledgers. Checkpoint ledgers act as anchoring points for the nodes on the network, so it is beneficial to export in multiples of 64.

This command has two modes: bounded and unbounded.

Bounded

If both a start and end ledger are provided, then the command runs in a bounded mode. This means that once all the ledgers in the range are processed and exported, the command shuts down.

Unbounded

If only a start ledger is provided, then the command runs in an unbounded fashion starting from the provided ledger. In this mode, Stellar Core connects to the Stellar network and processes new changes as they occur. Since the changes are continually exported in batches, this process can be run continually in the background to avoid the overhead of stopping and starting new Stellar Core instances.
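
For example, an unbounded run that keeps exporting batches of 64 ledgers from ledger 1000 onward might look like this (a sketch using the flags described above; depending on your setup, additional flags pointing at your Stellar Core binary and configuration may also be needed):

> stellar-etl export_ledger_entry_changes --start-ledger 1000 \
--batch-size 64 --output exported_changes_folder/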


export_orderbooks (unsupported)

> stellar-etl export_orderbooks --start-ledger 1000 \
--end-ledger 500000 --output exported_orderbooks_folder/

NOTE: This is an experimental feature and is currently unsupported.

This command exports orderbooks within the provided ledger range. Since exporting complete orderbooks at every single ledger would require an excessive amount of storage space, the output is normalized. Each batch that is exported contains multiple files, namely: dimAccounts.txt, dimOffers.txt, dimMarkets.txt, and factEvents.txt. The dim files relate a data structure to an ID. dimMarkets, for example, contains the buying and selling assets of a market, as well as the ID for that market. That ID is used in other places as a replacement for the full market information. This normalization process saves a significant amount of space (roughly 90% in our benchmarks). The factEvents file connects ledger numbers to the offer IDs that were present at that ledger.

Orderbooks are exported in batches of a size defined by the batch-size flag. By default, the batch-size parameter is set to 64 ledgers, which corresponds to a five-minute period of time. This batch size is convenient because checkpoint ledgers are created every 64 ledgers. Checkpoint ledgers act as anchoring points in that once they are available, so are the previous 63 ledgers, so it is beneficial to export in multiples of 64.

This command has two modes: bounded and unbounded.

Bounded

If both a start and end ledger are provided, then the command runs in a bounded mode. This means that once all the ledgers in the range are processed and exported, the command shuts down.

Unbounded

If only a start ledger is provided, then the command runs in an unbounded fashion starting from the provided ledger. In this mode, the Stellar Core connects to the Stellar network and processes new orderbooks as they occur on the network. Since the changes are continually exported in batches, this process can be continually run in the background in order to avoid the overhead of closing and starting new Stellar Core instances.



Utility Commands

get_ledger_range_from_times

> stellar-etl get_ledger_range_from_times \
--start-time 2019-09-13T23:00:00+00:00 \
--end-time 2019-09-14T13:35:10+00:00 --output exported_range.txt

This command takes in a start and end time and converts them to a ledger range. The ledger range that is returned will be the smallest possible ledger range that completely covers the provided time period.



Schemas

See https://github.com/stellar/stellar-etl/blob/master/internal/transform/schema.go for the schemas of the data structures that are output by the ETL.
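
As a purely illustrative sketch (the struct name, fields, and tags below are hypothetical; the authoritative definitions live in schema.go), the transformed records are Go structs tagged for serialization, along these lines:

package transform

import "time"

// LedgerOutput is a hypothetical example of a transformed record destined for
// BigQuery; see internal/transform/schema.go for the real field list.
type LedgerOutput struct {
	Sequence         uint32    `json:"sequence"`
	LedgerHash       string    `json:"ledger_hash"`
	TransactionCount int32     `json:"transaction_count"`
	ClosedAt         time.Time `json:"closed_at"`
}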



Extensions

This section covers some possible extensions or further work that can be done.

Adding New Commands

In general, in order to add new commands, you need to add these files:

  • export_new_data_structure.go in the cmd folder
    • This file can be generated with cobra by calling: cobra add {command}
    • This file will parse flags, create output files, get the transformed data from the input package, and then export the data (a minimal sketch of such a command appears at the end of this section).
  • export_new_data_structure_test.go in the cmd folder
    • This file will contain some tests for the newly added command. The runCLI function does most of the heavy lifting. All the tests need is the command arguments to test and the desired output.
    • Test data should be stored in the testdata/new_data_structure folder
  • new_data_structure.go in the internal/input folder
    • This file will contain the methods needed to extract the new data structure from wherever it is located. This may be the history archives, the bucket list, or a captive core instance.
    • This file should extract the data and transform it, and return the transformed data.
    • If working with captive core, the methods need to work in the background. There should be methods that export batches of data and send them to a channel. There should be other methods that read from the channel and transform the data so it can be exported.
  • new_data_structure.go in the internal/transform folder
    • This file will contain the methods needed to transform the extracted data into a form that is suitable for BigQuery.
    • The struct definition for the transformed object should be stored in schemas.go in the internal/transform folder.

A good number of common methods are already written and stored in the util package.
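
As a rough, self-contained sketch of the cobra pattern such a command follows (this is not code from the repository: the struct, flag defaults, and placeholder loop stand in for the real calls into internal/input and internal/transform):

// Hypothetical example of a new export command; names and defaults are illustrative.
package main

import (
	"encoding/json"
	"log"
	"os"

	"github.com/spf13/cobra"
)

// transformedEntry stands in for a struct that would live in internal/transform/schemas.go.
type transformedEntry struct {
	LedgerSequence uint32 `json:"ledger_sequence"`
	Value          string `json:"value"`
}

func main() {
	var startLedger, endLedger uint32
	var outputPath string

	cmd := &cobra.Command{
		Use:   "export_new_data_structure",
		Short: "Exports the new data structure over a ledger range",
		RunE: func(cmd *cobra.Command, args []string) error {
			outFile, err := os.Create(outputPath)
			if err != nil {
				return err
			}
			defer outFile.Close()

			encoder := json.NewEncoder(outFile)
			// In the real ETL this loop would call into internal/input and
			// internal/transform; here it just writes placeholder records,
			// one JSON object per line.
			for seq := startLedger; seq <= endLedger; seq++ {
				entry := transformedEntry{LedgerSequence: seq, Value: "placeholder"}
				if err := encoder.Encode(entry); err != nil {
					return err
				}
			}
			return nil
		},
	}

	cmd.Flags().Uint32Var(&startLedger, "start-ledger", 2, "first ledger to export")
	cmd.Flags().Uint32Var(&endLedger, "end-ledger", 2, "last ledger to export")
	cmd.Flags().StringVar(&outputPath, "output", "exported_new_data_structure.txt", "output file")

	if err := cmd.Execute(); err != nil {
		log.Fatal(err)
	}
}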

stellar-etl's People

Contributors

2opremio, acharb, cayod, chowbao, debnil, dependabot[bot], edualvess, erika-sdf, isaiah-turner, jacekn, kanwalpreetd, laysabit, lucaszanotelli, sfsf9797, sydneynotthecity, tamirms, vitoravancini


stellar-etl's Issues

cmd: implement command for exporting trades

What

We need to implement the planned functionality for exporting trades. The command will use the transform package functions for transforming data. Part of #53.

Why

This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated trade data.

Investigate dependencies between commands

What

Some CLI commands are related, and we may need some way to ensure that our results stay in sync. For example, we don't want the information we export about trades to indicate an offer is completely filled while that offer is represented as unfilled in our table. We need to look into what data is exported in each command to ensure that everything is in sync.

Why

If our data becomes out of sync, then it hurts every project relying on the BigQuery table. It likely would be difficult to fix out of sync data as well.

Implement CLI commands

We will need to implement the functionality for the CLI commands. This involves retrieving information from the history archives based on command line arguments, transforming the data that is received using the transform package, and then exporting the data. In addition, all the planned flags should be functional.

  • accounts (#54)
  • ledgers (#55)
  • offers (#56)
  • operations (#57)
  • trades (#58)
  • transactions (#59)
  • trustlines (#60)

Implement transform function for offer data

What

We need to implement a function that transforms the information we get about offers from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting offer data from the CLI.

Add ids from Horizon where applicable

What

We need to implement the horizon ids for ledgers, transactions, operations, and offers. Horizon uses an internal ToID package, which we should replicate.

Why

Right now, the ETL cannot output identification that is consistent with Horizon’s ids. We would like ecosystem wide consistency so that users who build off different ingestion engines see the same results, so we need to implement Horizon's id system.
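
For reference, Horizon's toid scheme (to the best of our understanding; treat the bit layout below as an assumption to verify against the Horizon source) packs the ledger sequence, transaction application order, and operation index into a single 64-bit integer:

package main

import "fmt"

// toID sketches Horizon's total-order id layout as we understand it:
// 32 bits of ledger sequence, 20 bits of transaction application order,
// and 12 bits of operation index packed into one int64. This is an
// illustration, not Horizon's implementation.
func toID(ledgerSeq, txOrder, opOrder int32) int64 {
	return int64(ledgerSeq)<<32 | int64(txOrder)<<12 | int64(opOrder)
}

func main() {
	// Example: the 3rd operation of the 2nd transaction applied in ledger 500000.
	fmt.Println(toID(500000, 2, 3))
}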

Implement transform function for transaction data

What

We need to implement a function that transforms the information we get about transactions from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting transaction information from the CLI.

Implement transform function for trade data

What

We need to implement a function that transforms the information we get about trades from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting trade information from the CLI.

cmd: set up operations command

We want to set up an operations command, which will eventually get operations in a specified range. This will live in cmd/operations.go and correspond to the history_operations table in BigQuery.

Review core data structures in Stellar.

Reviewing the core data structures in the Stellar protocol will help guide an understanding of the functions needed in the CLI tool. It's also important to compare these with the data structures used in other blockchains, like unspent transaction outputs (UTXOs), because that will help understand the differences between this CLI tool and the existing ones.

Useful links include:

  • the Horizon API (here)
  • the developer guide (here)
  • the accounts model (here)
  • comparison between utxos and accounts (here)

cmd: set up trustlines command

We want to set up a trustlines command, which will eventually get changes to trustlines in a specified timerange. This will live in cmd/trustlines.go and correspond to the trust_lines table in BigQuery.

cmd: implement command for exporting transactions

What

We need to implement the planned functionality for exporting transactions. The command will use the transform package functions for transforming data. Part of #53.

Why

This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated transaction data.

Map out trades input and output data structures for CLI

What

We need to map out what information we get about trades from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what information about trades we can get from the ingestion system and what information may have to be reconstructed or omitted.

Ensure that type casts don't lose information

What

Some type casts in the transform package may be invalid. Casting from uint32 to int32 could result in a loss of information, since uint32s have a higher maximum value. Similarly, casting from int64 to int32 could result in a loss of information. We need to ensure these casts do not happen, likely by changing the type of the field in the output struct to ensure it can hold all the information.

Why

Losing data in type casts makes the ETL transform functions unreliable. We want to have an accurate and complete picture of the data.
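
A hedged illustration of the kind of guard this implies (not code from the repository):

package main

import (
	"fmt"
	"math"
)

// safeUint32ToInt32 refuses casts that would overflow int32 instead of
// silently wrapping around.
func safeUint32ToInt32(v uint32) (int32, error) {
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d does not fit in int32", v)
	}
	return int32(v), nil
}

func main() {
	if _, err := safeUint32ToInt32(math.MaxUint32); err != nil {
		fmt.Println(err) // value 4294967295 does not fit in int32
	}
}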

Map out offer input and output data structures for CLI

What

We need to map out what information we get about offers from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what information about offers we can get from the ingestion system and what information may have to be reconstructed or omitted.

Dockerize CLI

What

We need to put the CLI into a docker container.

Why

Docker makes deploying and running the CLI across different operating systems and hardware easier.

cmd: implement command for exporting accounts

What

We need to implement the planned functionality for exporting accounts. The command will use the transform package functions for transforming data. Part of #53.

Why

This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated account data.

Implement transform function for operations data

What

We need to implement a function that transforms the information we get about operations from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting operations data from the CLI.

cmd: set up accounts command

We want to set up an accounts command, which will eventually get accounts in a specified timerange. This will live in cmd/accounts.go and correspond to the accounts table in BigQuery.

Implement transform package functions

What

Once the input and output data structures are mapped out, we need to implement the various transform functions. These functions should be defined in an internal directory as part of a package called transform. Each transform function should also have an associated unit test.

  • accounts (#34)
  • assets (#44)
  • ledgers (#35)
  • offers (#36)
  • operations (#37)
  • trades (#38)
  • transactions (#45)
  • trustlines (#39)

Why

Implementing the transforms is a key part of the ETL. It will ensure that we can successfully pass data from the ingestion system into BigQuery.

Map out operations input and output data structures for CLI

What

We need to map out what information we get about operations from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what information about operations we can get from the ingestion system and what information may have to be reconstructed or omitted.

Map out account input and output data structures for CLI

What

We need to map out what information we get about accounts from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what account information is easily accessible from the ingestion system and what information may have to be reconstructed or omitted.

Implement id alternatives

What

We will implement alternative methods of identification for the various data structures. Some of these are already implemented and require no additional work.

  • Ledgers: use sequence number
  • Transactions: use transaction hash
  • Operations: use operation hash
  • Assets: use asset code, issuer and type
  • Accounts: use account address
  • Offers: already have ids
  • Trades: no id needed
  • Trustlines: no id needed

Why

We currently are unable to recreate any of the ids that are in the BigQuery tables. In the future that may change, since the horizon team does have a way of assigning ids that we may be able to access. For now, we need to have replacements for ids so that we can connect tables together.

Map out asset input and output data structures for CLI

What

We need to map out what information we get about assets from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what asset information is easily accessible from the ingestion system and what information may have to be reconstructed or omitted.

Map out transaction input and output data structures for CLI

What

We need to map out what information we get about transactions from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what transaction information is easily accessible from the ingestion system and what information may have to be reconstructed or omitted.

Map out trustline input and output data structures for CLI

What

We need to map out what information we get about trustlines from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what information about trustlines we can get from the ingestion system and what information may have to be reconstructed or omitted.

Implement transform function for ledger data

What

We need to implement a function that transforms the information we get about ledgers from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting ledger information from the CLI.

Map out input and output data structures for CLI

After outlining the individual commands, we should think about the data transformation that this CLI tool will do. Specifically, how are the data structures from the ingestion system transformed to the outputs sent to BigQuery? Investigating the former will build familiarity with the ingestion system. It's also likely more technically complex than the latter, which will most likely be a basic serialization/deserialization using an appropriate existing framework.

So, with that in mind, here's a checklist of the different data structures to track. In comments on this issue, you should indicate (1) which function from the ingestion system's ledgerbackend package you would call in the implementation; (2) the data structure that call results in; and (3) the struct you eventually want to output to BQ. This will also help us figure out which command-line tools have more direct implementations from the ingestion system, and which require more work.

  • accounts (#23)
  • assets (#42)
  • ledgers (#24)
  • offers (#25)
  • operations (#26)
  • trades (#27)
  • transactions (#43)
  • trustlines (#28)

Review command-line interfaces for existing Blockchain ETL.

Reviewing the command-line interfaces for existing Blockchain ETL projects will help inform intuition for the tool's desired functionality. In particular, it'll be helpful to think about the core data structures of those projects, and how they compare to Stellar's. Consider the Bitcoin ETL as a good example. Note that these are implemented in Python, while our planned tool is in Go.

cmd: implement command for exporting offers

What

We need to implement the planned functionality for exporting offers. The command will use the transform package functions for transforming data. Part of #53.

Why

This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated offer data.

Set up Go repo.

Before starting any implementation, we must initially set up a Go repository (using modules).

cmd: set up history accounts command

We want to set up a history accounts command, which will eventually get account history in a specified timerange. This will live in cmd/history_accounts.go.

Review system architecture for existing Blockchain ETL projects.

Reviewing the system architecture for existing Blockchain ETL projects is an essential first step in the Stellar ETL project. This will build an intuition for the system as a whole, help understand how the various pieces fit together, and contribute to the design of a minimal yet effective command-line tool.

The necessary information is in section 2 here.

Investigate having a mandatory long-running stellar core

What

We need to figure out if it is reasonable to require the user to set up and run a stellar-core instance. We need to determine when it is necessary (possibly not needed for ledgers, transactions, and operations).

Why

Captive core takes a while to start up, so we can't run a new instance every time the CLI command is called. We need to have a core instance running in the background that can handle all exports.

Figure out the serialization methods of other blockchain ETLs

What

Look at existing blockchain ETL projects and figure out the serialization methods they use to output data. Also, figure out what additional configuration files or libraries they use for serialization.

Why

We need a serialization method for the Stellar ETL, and by looking at existing ETLs we can get an idea of the best method for us.

cmd: set up ledgers command

We want to set up a ledgers command, which will eventually get ledgers in a specified timerange. This will live in cmd/ledgers.go and correspond to the history_ledgers table in BigQuery.

cmd: set up trades command

We want to set up a trades command, which will eventually get trades in a specified timerange. This will live in cmd/trades.go and correspond to the trades table in BigQuery.

Implement transform function for trustline data

What

We need to implement a function that transforms the information we get about trustlines from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting trustline information from the CLI.

cmd: implement command for exporting trustlines

What

We need to implement the planned functionality for exporting trustlines. The command will use the transform package functions for transforming data. Part of #53.

Why

This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated trustline data.

cmd: set up offers command

We want to set up an offers command, which will eventually get offers in a specified timerange. This will live in cmd/offers.go and correspond to the offers table in BigQuery.

Understand the tools for deployment and storage.

Understanding the tools for deployment and storage will help build the broader context and vocabulary for the project. While it'll be mostly helpful later, when we deploy the tool, understanding how it will be used will help the design of the tool.

These include:

cmd: implement command for exporting ledgers

What

We need to implement the planned functionality for exporting ledgers. The command will use the transform package functions for transforming data. Part of #53.

Why

This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated ledger data.

Map out airflow tasks

What

We need to plan out the airflow tasks that will run the ETL. These tasks will handle the preparation and execution of the CLI tool. See the bitcoin ETL DAGs as an example.

Why

Airflow allows us to manage and monitor the ETL's runs cleanly. Mapping out the tasks makes the implementation easier.

Map out ledger input and output data structures for CLI

What

We need to map out what information we get about ledgers from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.

Why

This issue helps us know what ledger information we can get from the ingestion system and what information may have to be reconstructed or omitted.

Specify the data structures and functions for the CLI.

Specifying the data structures and functions for the command-line tool will significantly help implementation.

This comes in the following concrete deliverables:

  • List of tables and their schemas: this mostly follows from the existing BigQuery, but reviewing it will help clarify the data model.
  • Software design document: basically, specify the various data structures you need. This can be replaced by the below.
  • Sketch out the overall architecture of the CLI tool, with function headers that are unimplemented.

This comes after gaining the context of the tool (#1, #2, #3).

cmd: set up transactions command

We want to set up a transactions command, which will eventually get transactions in a specified timerange. This will live in cmd/transactions.go and correspond to the history_transactions table in BigQuery.

Implement transform function for asset data

What

We need to implement a function that transforms the information we get about assets from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting asset information from the CLI.

Implement transform function for account data

What

We need to implement a function that transforms the information we get about accounts from the new ingestion system into a form suitable for BigQuery. Part of #33.

Why

This function will be used when exporting account information from the CLI.

cmd: implement command for exporting operations

What

We need to implement the planned functionality for exporting operations. The command will use the transform package functions for transforming data. Part of #53.

Why

This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated operation data.
