stellar / stellar-etl

Stellar ETL will enable real-time analytics on the Stellar network.

License: Apache License 2.0
We need to implement a function that transforms the information we get about trustlines from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting trustline information from the CLI.
Some type casts in the transform package may be invalid. Casting from uint32 to int32 can lose information, since uint32 has a higher maximum value than int32. Similarly, casting from int64 to int32 can lose information. We need to ensure these casts cannot overflow, most likely by widening the type of the field in the output struct so that it can hold the full range of values.
Losing data in type casts makes the ETL transform functions unreliable. We want an accurate and complete picture of the data.
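As a minimal sketch of the safer direction (the helper name is hypothetical, not code from the repo), a checked conversion surfaces the overflow instead of silently truncating:

```go
package transform

import (
	"fmt"
	"math"
)

// checkedInt32 (hypothetical helper) converts a uint32 to an int32,
// returning an error instead of silently wrapping when the value
// exceeds math.MaxInt32.
func checkedInt32(v uint32) (int32, error) {
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d overflows int32", v)
	}
	return int32(v), nil
}
```

Widening the output field to int64 avoids the check entirely for uint32 inputs, since every uint32 value fits in an int64.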
We need to figure out whether it is reasonable to require the user to set up and run a stellar-core instance, and determine when one is necessary (it may not be needed for ledgers, transactions, and operations).
Captive core takes a while to start up, so we can't launch a new instance every time a CLI command is called. We need a core instance running in the background that can handle all exports.
Once the input and output data structures are mapped out, we need to implement the various transform functions. These functions should be defined in an internal directory as part of a package called transform. Each transform function should also have an associated unit test.
Implementing the transforms is a key part of the ETL. It will ensure that we can successfully pass data from the ingestion system into BigQuery.
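As a rough sketch of the intended shape (the types and fields here are placeholders, not the final interface), each transform maps an ingested value to a flat output struct, with a unit test alongside it in a _test.go file:

```go
package transform

import (
	"testing"
	"time"
)

// ingestLedger stands in for whatever the ingestion system hands us;
// the real input type comes from the stellar/go packages.
type ingestLedger struct {
	Sequence  uint32
	CloseTime int64
}

// LedgerOutput is a placeholder for the flattened row sent to BigQuery.
type LedgerOutput struct {
	Sequence uint32
	ClosedAt time.Time
}

// TransformLedger converts an ingested ledger into its BigQuery form.
func TransformLedger(in ingestLedger) (LedgerOutput, error) {
	return LedgerOutput{
		Sequence: in.Sequence,
		ClosedAt: time.Unix(in.CloseTime, 0).UTC(),
	}, nil
}

// TestTransformLedger would live in transform/transform_ledger_test.go.
func TestTransformLedger(t *testing.T) {
	out, err := TransformLedger(ingestLedger{Sequence: 100, CloseTime: 1590000000})
	if err != nil {
		t.Fatal(err)
	}
	if out.Sequence != 100 {
		t.Errorf("got sequence %d, want 100", out.Sequence)
	}
}
```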
We want to set up a history accounts command, which will eventually get account history in a specified time range. This will live in cmd/history_accounts.go.
We need to plan out the Airflow tasks that will run the ETL. These tasks will handle the preparation and execution of the CLI tool. See the Bitcoin ETL DAGs for an example.
Airflow allows us to manage and monitor the ETL's runs cleanly. Mapping out the tasks makes the implementation easier.
We want to set up an offers command, which will eventually get offers in a specified time range. This will live in cmd/offers.go and correspond to the offers table in BigQuery.
Reviewing the system architecture for existing Blockchain ETL projects is an essential first step in the Stellar ETL project. This will build an intuition for the system as a whole, help understand how the various pieces fit together, and contribute to the design of a minimal yet effective command-line tool.
The necessary information is in section 2 here.
We need to map out what information we get about assets from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what asset information is easily accessible from the ingestion system and what information may have to be reconstructed or omitted.
Reviewing the core data structures in the Stellar protocol will help guide an understanding of the functions needed in the CLI tool. It's also important to compare these with the data structures used in other blockchains, like Bitcoin's unspent transaction outputs, because that comparison will clarify the differences between this CLI tool and the existing ones.
Useful links include:
Specifying the data structures and functions for the command-line tool will significantly help implementation.
This comes in the following concrete deliverables:
This comes after gaining the context of the tool (#1, #2, #3).
We need to implement a function that transforms the information we get about transactions from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting transaction information from the CLI.
Before starting any implementation, we must set up a Go repository (using Go modules).
We need to map out what information we get about transactions from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what transaction information is easily accessible from the ingestion system and what information may have to be reconstructed or omitted.
We want to set up a trustlines command, which will eventually get changes to trustlines in a specified time range. This will live in cmd/trustlines.go and correspond to the trust_lines table in BigQuery.
We need to implement the planned functionality for exporting accounts. The command will use the transform package functions for transforming data. Part of #53.
This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated account data.
We need to map out what information we get about trades from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what information about trades we can get from the ingestion system and what information may have to be reconstructed or omitted.
After outlining the individual commands, we should think about the data transformation that this CLI tool will do. Specifically, how are the data structures from the ingestion system transformed into the outputs sent to BigQuery? Investigating the ingestion-side structures will build familiarity with the ingestion system. It's also likely more technically complex than the output side, which will most likely be a basic serialization/deserialization using an appropriate existing framework.
So, with that in mind, here's a checklist of the different data structures to track. In comments on this issue, you should indicate (1) which function from the ingestion system's ledgerbackend package you would call in the implementation; (2) the data structure that call returns; and (3) the struct you eventually want to output to BigQuery. This will also help us figure out which command-line tools have more direct implementations from the ingestion system, and which require more work.
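To make the expected format concrete, here is what one checklist entry could look like; items (1) and (2) are assumptions about the ingestion API used purely for illustration, and the struct fields are placeholders:

```go
package transform

import "time"

// Hypothetical checklist entry for ledgers:
//
//	(1) ingestion call: ledgerbackend.LedgerBackend.GetLedger(sequence)
//	(2) resulting type: xdr.LedgerCloseMeta
//	(3) struct to output to BigQuery:
type LedgerRow struct {
	Sequence         uint32    `json:"sequence"`
	LedgerHash       string    `json:"ledger_hash"`
	TransactionCount int32     `json:"transaction_count"`
	ClosedAt         time.Time `json:"closed_at"`
}
```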
We want to set up a transactions command, which will eventually get transactions in a specified time range. This will live in cmd/transactions.go and correspond to the history_transactions table in BigQuery.
We need to implement a function that transforms the information we get about trades from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting trade information from the CLI.
We need to implement the planned functionality for exporting offers. The command will use the transform package functions for transforming data. Part of #53.
This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated offer data.
We need to implement a function that transforms the information we get about offers from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting offer data from the CLI.
We must set up the initial command-line tool, using the cobra package. As a reference, this roughly corresponds to main.go and cmd/root.go in the Ticker project.
This is after the Go repo is set up (#6).
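A minimal sketch of that scaffolding, assuming the module lives at github.com/stellar/stellar-etl (the description strings are placeholders):

```go
// main.go
package main

import "github.com/stellar/stellar-etl/cmd"

func main() {
	cmd.Execute()
}
```

```go
// cmd/root.go
package cmd

import (
	"log"

	"github.com/spf13/cobra"
)

var rootCmd = &cobra.Command{
	Use:   "stellar-etl",
	Short: "Exports Stellar network data for analytics pipelines",
}

// Execute runs the root command; each subcommand (ledgers, transactions,
// and so on) registers itself against rootCmd in its own file.
func Execute() {
	if err := rootCmd.Execute(); err != nil {
		log.Fatal(err)
	}
}
```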
We want to set up a ledgers command, which will eventually get ledgers in a specified time range. This will live in cmd/ledgers.go and correspond to the history_ledgers table in BigQuery.
We need to implement the planned functionality for exporting operations. The command will use the transform package functions for transforming data. Part of #53.
This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated operation data.
We will need to implement the functionality for the CLI commands. This involves retrieving information from the history archives based on command-line arguments, transforming the retrieved data using the transform package, and then exporting the data. In addition, all the planned flags should be functional.
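As a hedged sketch of how one command's Run function might tie retrieval, transformation, and export together (the retrieval function and flag names are invented for illustration, and rootCmd refers to the root-command sketch above):

```go
// cmd/ledgers.go (sketch)
package cmd

import (
	"encoding/json"
	"log"
	"os"

	"github.com/spf13/cobra"
)

// ledgerRow is a placeholder; the real struct would come from the
// transform package.
type ledgerRow struct {
	Sequence uint32 `json:"sequence"`
}

// getLedgers is a stand-in for retrieval from the history archives.
func getLedgers(start, end uint32) []ledgerRow {
	rows := make([]ledgerRow, 0, end-start+1)
	for seq := start; seq <= end; seq++ {
		rows = append(rows, ledgerRow{Sequence: seq})
	}
	return rows
}

var ledgersCmd = &cobra.Command{
	Use:   "ledgers",
	Short: "Export ledgers in a given sequence range",
	Run: func(cmd *cobra.Command, args []string) {
		start, _ := cmd.Flags().GetUint32("start")
		end, _ := cmd.Flags().GetUint32("end")

		// Retrieve, transform, then export each row as one JSON line.
		enc := json.NewEncoder(os.Stdout)
		for _, row := range getLedgers(start, end) {
			if err := enc.Encode(row); err != nil {
				log.Fatal(err)
			}
		}
	},
}

func init() {
	ledgersCmd.Flags().Uint32("start", 0, "first ledger sequence to export")
	ledgersCmd.Flags().Uint32("end", 0, "last ledger sequence to export")
	rootCmd.AddCommand(ledgersCmd)
}
```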
We need to implement the planned functionality for exporting transactions. The command will use the transform package functions for transforming data. Part of #53.
This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated transaction data.
We need to implement a function that transforms the information we get about operations from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting operations data from the CLI.
We will implement alternative methods of identification for the various data structures. Some of these are already implemented and require no additional work.
We are currently unable to recreate any of the IDs that are in the BigQuery tables. That may change in the future, since the Horizon team does have a way of assigning IDs that we may be able to access. For now, we need replacements for the IDs so that we can connect tables together.
We need to implement the Horizon IDs for ledgers, transactions, operations, and offers. Horizon uses an internal toid package, which we should replicate.
Right now, the ETL cannot output identifiers that are consistent with Horizon's IDs. We would like ecosystem-wide consistency, so that users who build on different ingestion engines see the same results; to get there, we need to implement Horizon's ID system.
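For reference, Horizon's total-order IDs pack a ledger sequence, transaction application order, and operation index into a single int64 (32, 20, and 12 bits respectively, as we understand the layout); a replica would look roughly like:

```go
package toid

// ID packs the three components into one 64-bit total-order ID:
// 32 bits of ledger sequence, 20 bits of transaction application
// order, and 12 bits of operation index. The bit widths here mirror
// our reading of Horizon's internal toid package.
func ID(ledgerSeq, txOrder, opIndex int32) int64 {
	return int64(ledgerSeq)<<32 | int64(txOrder)<<12 | int64(opIndex)
}
```

Under this layout the ID of a ledger itself is ID(seq, 0, 0), so rows from different tables can be joined on consistent identifiers.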
Understanding the tools for deployment and storage will help build the broader context and vocabulary for the project. While it'll be mostly helpful later, when we deploy the tool, understanding how it will be used will help the design of the tool.
These include:
Look at existing blockchain ETL projects and identify the serialization methods they use to output data. Also note any additional configuration files or libraries they use for serialization.
We need a serialization method for the Stellar ETL, and by looking at existing ETLs we can get an idea of the best method for us.
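For context, BigQuery load jobs accept newline-delimited JSON, which the Go standard library produces directly; a minimal sketch with a placeholder row type:

```go
package main

import (
	"encoding/json"
	"os"
)

// row is a placeholder record; real exports would use the transform
// package's output structs.
type row struct {
	Sequence uint32 `json:"sequence"`
	Hash     string `json:"hash"`
}

func main() {
	// json.Encoder writes one JSON object per line, which is exactly
	// the newline-delimited JSON format BigQuery load jobs accept.
	enc := json.NewEncoder(os.Stdout)
	for _, r := range []row{{1, "abc"}, {2, "def"}} {
		if err := enc.Encode(r); err != nil {
			panic(err)
		}
	}
}
```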
Once the volume metric is added to the market tracker (#3), we need to add a dashboard with it to Grafana.
We need to map out what information we get about ledgers from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what ledger information we can get from the ingestion system and what information may have to be reconstructed or omitted.
We need to map out what information we get about offers from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what information about offers we can get from the ingestion system and what information may have to be reconstructed or omitted.
We need to map out what information we get about operations from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what information about operations we can get from the ingestion system and what information may have to be reconstructed or omitted.
We want to set up an operations command, which will eventually get operations in a specified range. This will live in cmd/operations.go and correspond to the history_operations table in BigQuery.
We need to put the CLI into a Docker container.
Docker makes it easier to deploy and run the CLI across different operating systems and hardware.
We need to map out what information we get about trustlines from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what information about trustlines we can get from the ingestion system and what information may have to be reconstructed or omitted.
We need to implement the planned functionality for exporting trades. The command will use the transform package functions for transforming data. Part of #53.
This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated trade data.
We want to set up an accounts command, which will eventually get accounts in a specified time range. This will live in cmd/accounts.go and correspond to the accounts table in BigQuery.
We need to implement the planned functionality for exporting trustlines. The command will use the transform package functions for transforming data. Part of #53.
This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated trustline data.
We need to map out what information we get about accounts from the new ingestion system. We also need to know what the output to BigQuery should look like. Part of #19.
This issue helps us know what account information is easily accessible from the ingestion system and what information may have to be reconstructed or omitted.
Reviewing the command-line interfaces for existing Blockchain ETL projects will help build intuition about the tool's desired functionality. In particular, it'll be helpful to think about the core data structures of those projects, and how they compare to Stellar's. Consider the Bitcoin ETL as a good example. Note that these are implemented in Python, while our planned tool is in Go.
We need to implement the planned functionality for exporting ledgers. The command will use the transform package functions for transforming data. Part of #53.
This is one step towards having a functional pipeline for data transfer. The command will provide users with a way to access regularly updated ledger data.
We need to implement a function that transforms the information we get about accounts from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting account information from the CLI.
We need to implement a function that transforms the information we get about ledgers from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting ledger information from the CLI.
Some CLI commands are related, and we may need some way to ensure that their results stay in sync. For example, we don't want the exported trade data to indicate that an offer is completely filled while that same offer is represented as unfilled in our offers table. We need to look into what data each command exports to ensure that everything stays consistent.
If our data becomes out of sync, it hurts every project relying on the BigQuery tables, and out-of-sync data would likely be difficult to fix after the fact.
We want to set up a trades command, which will eventually get trades in a specified time range. This will live in cmd/trades.go and correspond to the trades table in BigQuery.
We need to implement a function that transforms the information we get about assets from the new ingestion system into a form suitable for BigQuery. Part of #33.
This function will be used when exporting asset information from the CLI.