eosnetworkfoundation / chicken-dance
Chicken Dance distributed replay of transactions
License: MIT License
Depends on Issue #20 Nginx setup. Will be configured in Nginx.
Create a basic username/password authentication
Add a cron job for replay nodes to get new jobs. It must include a lock file to prevent more than one job from running at the same time on a host.
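A minimal sketch of the lock-file idea, assuming the cron entry invokes a Python wrapper on a Linux host. The helper name `single_instance` and the lock path are hypothetical and not taken from the repository; the sketch uses an advisory `flock` so that the lock disappears automatically if the process dies.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this process acquired the lock, False if another holder exists."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock path; a cron job would run this script periodically.
    with single_instance("/tmp/replay-node.lock") as acquired:
        if acquired:
            print("lock acquired, safe to fetch and run a job")
        else:
            print("another job is already running on this host, exiting")
```

Because the kernel releases the lock when the file descriptor closes, a crashed job does not leave a stale lock behind, which is one reason to prefer this over checking for the existence of a file.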
When the orchestrator starts up, the replay hosts need to connect back to the orchestrator. The current security group settings only allow replay nodes to connect to the private IP of the orchestrator node.
There are several ways to address this problem
UI Work only
We have both the expected and actual integrity hashes, along with the start and end block numbers. It is not yet clear what report people want to see when the full replay has completed.
When the service tries to update a configuration that doesn't exist, all you see in the log output is '404'. Need to improve this error message to say something like "config at block height X does not exist".
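One way the more descriptive message could look, as a sketch only. The in-memory `configs` mapping keyed by block height, the `ConfigNotFound` exception, and the `update_config` function are all hypothetical names for illustration; the real service's storage and handler structure are not shown in this issue.

```python
class ConfigNotFound(LookupError):
    """Raised when an update targets a configuration that does not exist."""


def update_config(configs, block_height, **changes):
    """Apply changes to the config at block_height, or fail with a clear message."""
    if block_height not in configs:
        # Include the missing key in the message instead of a bare '404'.
        raise ConfigNotFound(
            f"404 Not Found: config at block height {block_height} does not exist"
        )
    configs[block_height].update(changes)
    return configs[block_height]


if __name__ == "__main__":
    configs = {1000: {"expected_integrity_hash": None}}
    try:
        update_config(configs, 2000, expected_integrity_hash="abc")
    except ConfigNotFound as err:
        print(err)
```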
Transferred from AntelopeIO/leap#1427 original author bhazzard
Add a new meta-data configuration to specify the chain. Initially these would be
Clean up output from replay script. So if you watched it from the command line it would make more sense. This includes
This is a low priority
By default, expected integrity hashes are overwritten. Provide the --use-configured-hash option when starting the orchestration service. When enabled
Another option, --protect-configured-hash, is similar. It ensures that an expected hash that already has a value cannot be overwritten with None. It still allows a new hash value to replace the old value.
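The --protect-configured-hash rule described above can be sketched as a small pure function. The function name and signature are assumptions made for illustration and are not the service's actual code; only the behavior stated in the issue is modeled.

```python
def resolve_expected_hash(current, incoming, protect_configured_hash=False):
    """Return the expected hash to store after an update request.

    With protection enabled, an existing value is never replaced by None,
    but a new non-empty value still replaces the old one.
    """
    if protect_configured_hash and incoming is None and current is not None:
        return current
    return incoming


if __name__ == "__main__":
    print(resolve_expected_hash("OLD", None, protect_configured_hash=True))
    print(resolve_expected_hash("OLD", "NEW", protect_configured_hash=True))
    print(resolve_expected_hash("OLD", None))
```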
Takes arguments for start_block_id, end_block_id, snapshot_path, snapshot_type, and expected_integrity_hash, and generates configuration for leap versions 3.1, 3.2, 4.0, and 5.0.
Alternatively, takes arguments for a config file and a slice-id, and generates the same configuration by parsing the file for the specified index.
The expected integrity hash depends on another replay job completing. Sometimes that job is slow and the expected integrity hash does not arrive in time. The empty hash causes the job to be listed as a failure. These HASH-MISMATCH jobs should be retested, and labeled as COMPLETE if their hashes match.
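A sketch of the retest pass, assuming jobs are available as dictionaries. The field names `status`, `expected_integrity_hash`, and `actual_integrity_hash` are hypothetical; only the status labels HASH-MISMATCH and COMPLETE come from the issue text.

```python
def retest_hash_mismatch(jobs):
    """Relabel HASH-MISMATCH jobs as COMPLETE once both hashes exist and agree."""
    for job in jobs:
        if job["status"] != "HASH-MISMATCH":
            continue
        expected = job.get("expected_integrity_hash")
        actual = job.get("actual_integrity_hash")
        # A still-missing expected hash stays a mismatch until it arrives.
        if expected and actual and expected == actual:
            job["status"] = "COMPLETE"
    return jobs


if __name__ == "__main__":
    jobs = [
        {"status": "HASH-MISMATCH", "expected_integrity_hash": "AA", "actual_integrity_hash": "AA"},
        {"status": "HASH-MISMATCH", "expected_integrity_hash": None, "actual_integrity_hash": "BB"},
    ]
    for job in retest_hash_mismatch(jobs):
        print(job["status"])
```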
Transferred from AntelopeIO/leap#1421 original author bhazzard
Requires a single stable orchestration instance with a fixed IP address. The idea is to enable authentication via GitHub OAuth and check for membership in a specific GitHub group. This would provide a way to log into the orchestration service without a new username/password.
Support secure HTTPS access. Caveat: this will only support a single fixed orchestration node, used to test production runs.
Customized runs cannot be done over HTTPS. Customized runs will require building a configuration file and spinning up a new orchestration service. A new orchestration service will not have a pre-determined HTTP address and will not have a host domain in DNS.
Once DNS is set up, run Let's Encrypt to generate certificates for a secure connection.
UI Work only
Set up log rotation for /home/ubuntu/orch-complete-timings.log on the orchestration node.
Transferred from AntelopeIO/leap#1441 original author BenjaminGormanPMP
Steps to Reproduce
LowEndOrchestrator
# setup aws cli
sudo apt update
sudo apt install unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip
sudo ./aws/install
--iam-instance-profile is removed.
aws ec2 run-instances --image-id ami-053b0d53c279acc90 \
--instance-type t2.micro \
--security-group-ids sg-04f895bd6442b69b5 \
--key-name aws-chicken-start \
--subnet-id subnet-0b9517f11e9684b3b \
--iam-instance-profile Arn=arn:aws:iam::087045697350:instance-profile/ReplayOrchestration \
--user-data file:///home/ubuntu/replay-test/scripts/instance-bootstrap.sh \
--dry-run
Following leap issue 1420 and preceding replay-test issue 11, this ticket is to expand the IAM policy in the chickens-prod AWS account to enable @ericpassmore and team to grant zero-trust access of S3 bucket(s) to their EC2 instances.
Create a stable, fixed public IP address for the main orchestration node. Any other orchestration nodes spun up will not be covered, and their public IP addresses should be considered unstable.
Create a POST for the integrity hash after the snapshot loads. The POST updates the configuration and persists the configuration file.
Page to manage jobs.
If a job is running, option to stop it
If no job is running, option to start a job: enter number of hosts, default 80% of the number of slices
If no job is running, option to create a job
input job name
drop-down of available configs
option to overwrite leap version
option to copy and paste in a new config
Transferred from AntelopeIO/leap#1422 original author bhazzard
Due to a bug, we need an HTTP service to inform replay nodes of run options and gather status information from nodes.
Get Call
{
job_number: num,
snapshot_loc: string,
blocklog_loc: string,
last_block: number
}
Put Call
{ status:[ongoing|stopped|error|finished], block_number: last_block_num_processed }
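The two payloads above could be modeled as follows. This is a sketch under assumptions: the class names `JobAssignment` and `StatusUpdate` are hypothetical, the field names and the four status values are copied from the issue, and the example locations are placeholders rather than real paths.

```python
import json
from dataclasses import asdict, dataclass

VALID_STATUSES = {"ongoing", "stopped", "error", "finished"}


@dataclass
class JobAssignment:
    """Body returned by the GET call: what a replay node should run."""
    job_number: int
    snapshot_loc: str
    blocklog_loc: str
    last_block: int


@dataclass
class StatusUpdate:
    """Body sent with the PUT call: progress reported by a replay node."""
    status: str
    block_number: int

    def __post_init__(self):
        if self.status not in VALID_STATUSES:
            raise ValueError(
                f"status must be one of {sorted(VALID_STATUSES)}, got {self.status!r}"
            )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    job = JobAssignment(1, "snapshot-location", "blocklog-location", 500000)
    print(json.dumps(asdict(job)))
    print(json.dumps(asdict(StatusUpdate("ongoing", 250000))))
```

Validating the status on construction keeps malformed updates from reaching whatever stores job state.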
Set up Nginx or Apache to proxy the orchestration webservice, for better HTTP support, keepalive, and authentication/access control.
Create basic HTML and CSS for the homepage. This includes:
### Tasks
- [ ] foreground, background, accent, shadow colors
- [ ] fonts
- [ ] media sizes for phone 360x800, desktop 1280x720, HD 1920x1080
- [ ] padding and margins
- [ ] text sizes: header, body, strong
- [ ] nav cards
- [ ] lists
There will be a single fixed orchestration service accessible by HTTP and a domain name. Depends on Issue #35
This issue covers access control for an API to reach this single host, start a pre-configured run, and collect status on that run.
Transferred from AntelopeIO/leap#1429 original author bhazzard
Enable a replay node to pick up a job, process it, clean out state, and then pick up a new job and process it. This is needed because some jobs will take longer than others, and we want to leverage all our compute power if there are any jobs WAITING_4_WORKER.
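The pick-up, process, clean, repeat cycle might be structured like the sketch below. The three callables `fetch_job`, `run_job`, and `clean_state` are hypothetical stand-ins for whatever the replay node actually does to request a job, replay the blocks, and wipe local state.

```python
def work_until_idle(fetch_job, run_job, clean_state):
    """Keep taking jobs until none are waiting; always clean state between jobs."""
    completed = []
    while True:
        job = fetch_job()  # expected to return None when nothing is waiting
        if job is None:
            break
        try:
            run_job(job)
            completed.append(job)
        finally:
            # Clean up even if the job failed, so the next job starts fresh.
            clean_state()
    return completed


if __name__ == "__main__":
    waiting = ["job-1", "job-2", "job-3"]
    done = work_until_idle(
        fetch_job=lambda: waiting.pop(0) if waiting else None,
        run_job=lambda job: print(f"replaying {job}"),
        clean_state=lambda: print("state cleaned"),
    )
    print(done)
```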
We want to scale up the amount of compute. Our current launch template for replay has a subnet with a specific placement zone. We want to explore ways to get more capacity by spinning up nodes in many placement zones.
Then automate a batch of 5, then automate a larger run.
Currently shows HTTP 200, Success. Needs a more descriptive message.
Transferred from AntelopeIO/leap#1441 original author BenjaminGormanPMP
Transferred from AntelopeIO/leap#1430 original author bhazzard
New Top Nav HTML and CSS. Static and served by front-end Nginx, not by webservice.
### Tasks
- [ ] first pass icons and html in repo
- [ ] orchestration bootstrap copy in HTML, CSS, Icons
- [ ] visual test inspection
Transferred from AntelopeIO/leap#1423 original author bhazzard
All installation of software from deb packages. Installation of config from this repo
When job status is set to "STARTING", update the job start time. Currently the start time is set when the orchestration webservice initializes.
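A small sketch of the intended behavior, assuming a job object with a status setter. The `Job` class, its attributes, and the `now` parameter are hypothetical; only the status names STARTING and WAITING_4_WORKER appear in these issues.

```python
from datetime import datetime, timezone


class Job:
    """Hypothetical job record that stamps its start time on STARTING."""

    def __init__(self):
        self.status = "WAITING_4_WORKER"
        self.start_time = None  # not set at service initialization

    def set_status(self, status, now=None):
        if status == "STARTING":
            # Record when work actually begins, not when the service booted.
            self.start_time = now or datetime.now(timezone.utc)
        self.status = status


if __name__ == "__main__":
    job = Job()
    print(job.start_time)
    job.set_status("STARTING")
    print(job.start_time is not None)
```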
Must do full run in less than 1 day. Ideally complete in 16 hours or less. Transferred from https://github.com/AntelopeIO/leap/issues/1431 original author bhazzard
Example Report below
JOB TIMING ALL TIMES IN MINUTES
-------------------------------
Number of Jobs: 222
Average: 99.73
Standard Deviation: 160.04
Median: 61.77
75th Percentile: 90.98
90th Percentile: 171.06
Longest Running Job 1333.6 mins
LONG RUNNING JOBS TOP 90%
-------------------------
Job 140648764956560 running time 171.1
Job 140648764956848 running time 177.4
Job 140648764956944 running time 212.58
Job 140648764957040 running time 230.88
Job 140648764957136 running time 240.62
Job 140648764957232 running time 248.12
Job 140648764957328 running time 247.9
Job 140648764957424 running time 285.37
Job 140648764957616 running time 273.65
Job 140648764957520 running time 302.13
Job 140648764957760 running time 252.17
Job 140648764957712 running time 281.08
Job 140648764957856 running time 267.17
Job 140648764957808 running time 522.9
Job 140648764957952 running time 493.6
Job 140648764958048 running time 477.83
Job 140648764958240 running time 474.8
Job 140648764957904 running time 742.62
Job 140648764958144 running time 804.1
Job 140648764958000 running time 1006.08
Job 140648764958432 running time 669.4
Job 140648764958336 running time 852.53
Job 140648764958096 running time 1333.6
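A report with the same fields could be computed roughly as below. This is a sketch, not the script that produced the example: it assumes sample standard deviation and inclusive linear-interpolation percentiles, which may differ from the method actually used, and the function name `timing_report` is hypothetical.

```python
import statistics


def timing_report(job_minutes):
    """Summarize job run times (minutes), keyed by job id."""
    times = sorted(job_minutes.values())
    # 99 cut points; index 74 is the 75th percentile, index 89 the 90th.
    cuts = statistics.quantiles(times, n=100, method="inclusive")
    p75, p90 = cuts[74], cuts[89]
    return {
        "count": len(times),
        "average": statistics.mean(times),
        "stddev": statistics.stdev(times),
        "median": statistics.median(times),
        "p75": p75,
        "p90": p90,
        "longest": times[-1],
        # Jobs at or above the 90th percentile are listed as long running.
        "long_running": {job: t for job, t in job_minutes.items() if t >= p90},
    }


if __name__ == "__main__":
    sample = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}
    for key, value in timing_report(sample).items():
        print(f"{key}: {value}")
```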
search for matching blocks log
A little script to run in the background and update the job with the head_block_num every few minutes. Will allow you to see progress as blocks update.
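The background updater could be shaped like this sketch. How the head block number is read and how the job is updated are left as injected callables (`get_head_block_num` and `update_job` are hypothetical names), since neither mechanism is described in the issue.

```python
import threading


def report_progress(get_head_block_num, update_job, interval_seconds, stop_event):
    """Push the current head block number to the job until asked to stop."""
    updates = 0
    while not stop_event.is_set():
        update_job(get_head_block_num())
        updates += 1
        # wait() returns early if the event is set, so shutdown is prompt.
        stop_event.wait(interval_seconds)
    return updates


if __name__ == "__main__":
    stop = threading.Event()
    heads = iter([100, 200, 300])
    seen = []

    def record(block_num):
        seen.append(block_num)
        if len(seen) == 3:
            stop.set()

    report_progress(lambda: next(heads), record, 0, stop)
    print(seen)
```

In a real run the function would be started in a daemon thread or a separate process with an interval of a few minutes.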
Create Summary Progress Page
### Tasks
- [ ] Construct Layout
- [ ] Color Change Progress Bar (attribution)
- [ ] Top Bar for blocks processed and progress
- [ ] Second Bar icon key for jobs: succeeded, failed, total
- [ ] Update progress stats python to produce HTML
- [ ] Third Content Card with stats (avg, median, stddev of times)
- [ ] Fourth simple list of failed jobs, larger font size
EX: progress bar https://codepen.io/alvarotrigo/pen/vYeNpjj
Create a Single Item Detail Page
Add classes to html
add Top Grey Banner Style
Visual Inspection
Transferred from AntelopeIO/leap#1426 original author bhazzard
Create items listing page.
### Tasks
- [ ] Change HTML structure to allow float
- [ ] Visual inspect multiple columns depending on screen width
When archiving blocks.log, compress it. When retrieving blocks.log, uncompress it.
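A sketch of the compress-on-archive and uncompress-on-retrieve steps. The choice of gzip is an assumption, since the issue does not name a compression format, and the function names are hypothetical. Streaming with `shutil.copyfileobj` avoids loading a large blocks.log into memory.

```python
import gzip
import shutil


def archive_blocks_log(src_path, archive_path):
    """Write a gzip-compressed copy of blocks.log to archive_path."""
    with open(src_path, "rb") as src, gzip.open(archive_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def retrieve_blocks_log(archive_path, dest_path):
    """Restore an uncompressed blocks.log from a gzip archive."""
    with gzip.open(archive_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    import os
    import tempfile

    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "blocks.log")
    with open(original, "wb") as handle:
        handle.write(b"example block data" * 1000)
    archive_blocks_log(original, original + ".gz")
    retrieve_blocks_log(original + ".gz", os.path.join(workdir, "restored.log"))
    print(os.path.getsize(original), os.path.getsize(original + ".gz"))
```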
UI Work Only