chicken-dance's People

Contributors

ericpassmore

chicken-dance's Issues

Cron for replay node jobs

Add a cron job for replay nodes to get new jobs. Must include lock file to prevent more than one job running at the same time on host.
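
One way to implement the lock file is sketched below in Python. It assumes a Linux host and uses `fcntl.flock`, so a second cron invocation gives up immediately rather than waiting. The function names and the lock path passed in are hypothetical and not taken from the repo.

```python
import fcntl


def try_acquire_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


def run_once(lock_path, job):
    """Run job() only if no other invocation on this host holds the lock."""
    lock_file = try_acquire_lock(lock_path)
    if lock_file is None:
        return False  # another run is in progress; the next cron tick retries
    try:
        job()
        return True
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()
```

The kernel releases the lock when the process exits, so a crashed job does not leave a stale lock behind.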

Share Orchestrator Private IP with Replay Nodes

When the orchestrator starts up, the replay hosts need to connect back to it. The current security group settings only allow replay nodes to connect to the private IP of the orchestrator node.

There are several ways to address this problem:

DNS

  • You can use DNS to map a name to a private IP address; the replay hosts would simply connect using this DNS name.
  • You can use DNS to map a name to a public IP address, and change the security group settings to allow access from the public interfaces of the replay nodes.

AWS CLI

  • You can script the orchestrator node to spin up the replay nodes and then share its IP address with them.
    • could share by passing an argument into a script
    • could share by placing the private IP into a file on the replay node

Better Error Message for Config Update

When the service tries to update a configuration that doesn't exist, all you see in the log output is '404'. Improve this error message to say something like "config at block height X does not exist".
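
A minimal sketch of the kind of message mapping this asks for. The function name and its arguments are hypothetical; the actual service code may structure this differently.

```python
def describe_update_failure(status_code, block_height):
    """Turn a bare HTTP status from a config update into a readable log line."""
    if status_code == 404:
        return f"config at block height {block_height} does not exist"
    # Fall back to a generic message that still names the block height.
    return (
        f"config update at block height {block_height} "
        f"failed with HTTP status {status_code}"
    )
```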

Clean up Replay Script Output

Clean up the output from the replay script so that it makes more sense when watched from the command line. This includes:

  • directing verbose output to log away from console
  • echoing out status as progress is made

This is a low priority.

[REPLAY-2] Orchestration Service supports option to prefer configured integrity hashes

By default, expected integrity hashes are overwritten. Provide a --use-configured-hash option when starting the orchestration service. When enabled:

  • If the configuration has an expected integrity hash, it will be used to validate the replay.
  • The replay script will understand this option and skip the step of posting back the start block integrity hash.
  • The replay script will instead load the snapshot directly and sync until the end block.
  • If the configuration does not have a value for the expected integrity hash, the normal process will continue as before.
  • The replay script will load the snapshot, terminate nodeos, start read-only nodes, and post back the start block integrity hash.
  • Following that, the nodes will continue to sync to the end block and continue normal operations.

Another option, --protect-configured-hash, is similar. It ensures that an expected hash with a value cannot be overwritten with None. It still allows new hash values to be set over the old value.
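
The two options can be read as the small decision rules sketched below. This is an interpretation of the description above, not the service's actual code; the function names and step labels are hypothetical.

```python
def replay_steps(configured_hash, use_configured_hash=False):
    """List the replay steps implied by --use-configured-hash."""
    if use_configured_hash and configured_hash is not None:
        # Trust the configured hash: no post-back of the start block hash.
        return ["load snapshot", "sync to end block"]
    return [
        "load snapshot",
        "terminate nodeos",
        "start read-only node",
        "post start block integrity hash",
        "sync to end block",
    ]


def next_expected_hash(current, incoming, protect_configured_hash=False):
    """Return the expected hash to store after an update request."""
    if protect_configured_hash and current is not None and incoming is None:
        return current  # refuse to replace a real value with None
    return incoming  # new values (or unprotected None) overwrite the old one
```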

[REPLAY-9] Generate Multi-Version Manifest

Takes arguments for start_block_id, end_block_id, snapshot_path, snapshot_type, and expected_integrity_hash, and generates configuration for leap versions 3.1, 3.2, 4.0, and 5.0.

Alternatively, takes arguments for a config file and a slice-id, and generates the same by parsing the file for the specified index.
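
A rough sketch of both modes follows. The argument names come from the description above; the `leap_version` key, the function names, and the assumption that the config file parses to a list of entries indexed by slice-id are all hypothetical.

```python
LEAP_VERSIONS = ["3.1", "3.2", "4.0", "5.0"]


def build_manifest(start_block_id, end_block_id, snapshot_path,
                   snapshot_type, expected_integrity_hash,
                   versions=LEAP_VERSIONS):
    """Return one configuration entry per leap version for the same block range."""
    return [
        {
            "start_block_id": start_block_id,
            "end_block_id": end_block_id,
            "snapshot_path": snapshot_path,
            "snapshot_type": snapshot_type,
            "expected_integrity_hash": expected_integrity_hash,
            "leap_version": version,  # hypothetical key name
        }
        for version in versions
    ]


def build_manifest_from_config(config_entries, slice_id):
    """Build the same manifest from an already-parsed config and a slice index."""
    entry = config_entries[slice_id]
    return build_manifest(
        entry["start_block_id"],
        entry["end_block_id"],
        entry["snapshot_path"],
        entry["snapshot_type"],
        entry.get("expected_integrity_hash"),
    )
```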

Summary report should re-test HASH-MISMATCH jobs

The expected integrity hash depends on another replay job completing. Sometimes that job is slow and the expected integrity hash does not arrive in time. The empty hash causes the job to be listed as a failure. These HASH-MISMATCH jobs should be retested, and labeled as COMPLETE if their hashes do match.
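
The retest could look roughly like the sketch below, run when the summary report is generated. The status labels come from this issue; the job dictionary layout and the field names `expected_integrity_hash` and `actual_integrity_hash` are assumptions.

```python
def retest_hash_mismatches(jobs):
    """Relabel HASH-MISMATCH jobs as COMPLETE when both hashes are now present and equal."""
    for job in jobs:
        if job["status"] != "HASH-MISMATCH":
            continue
        expected = job.get("expected_integrity_hash")
        actual = job.get("actual_integrity_hash")
        # An empty or missing hash on either side is still treated as a mismatch.
        if expected and actual and expected == actual:
            job["status"] = "COMPLETE"
    return jobs
```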

[REPLAY-10] OAuth Login to Orchestration Service via GitHub

Requires a single stable orchestration instance with a fixed IP address. The idea is to enable authentication via GitHub OAuth and check for membership in a specific GitHub group. This would provide a way to log into the orchestration service without a new username/password.

[REPLAY-5] HTTPS access DNS A-records and certificate

Support secure HTTPS access. Caveat: this will only support a single fixed orchestration node, used to test production runs.

Customized runs cannot be done over HTTPS. Customized runs require building a configuration file and spinning up a new orchestration service. A new orchestration service will not have a pre-determined HTTP address and will not have a host domain in DNS.

Once DNS is set up, run Let's Encrypt to generate certificates for a secure connection.

EC2 Run Instance does not work when IAM role specified

Steps to Reproduce

  1. enter chicken-dance aws organization
  2. start an instance using the template LowEndOrchestrator
  3. ssh into the instance and run the following to set up the aws cli
# set up aws cli
sudo apt update
sudo apt install unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip
sudo ./aws/install
  4. Using the dry-run option, try to start a new node. This only works when --iam-instance-profile is removed.
aws ec2 run-instances --image-id ami-053b0d53c279acc90 \
--instance-type t2.micro \
--security-group-ids sg-04f895bd6442b69b5 \
--key-name aws-chicken-start \
--subnet-id subnet-0b9517f11e9684b3b \
--iam-instance-profile Arn=arn:aws:iam::087045697350:instance-profile/ReplayOrchestration \
--user-data file:///home/ubuntu/replay-test/scripts/instance-bootstrap.sh \
--dry-run

Fixed IP Address for main Orchestration Service

Create a stable, fixed public IP address for the main orchestration node. Any other orchestration nodes spun up will not be covered, and their public IP addresses should be considered unstable.

[REPLAY-16] UI: Manage Run Page

Page to manage jobs.

  • If a job is running, option to stop it

  • If no job is running, option to start a job: enter number of hosts, default 80% of the number of slices

  • If no job is running, option to create a job

  • input job name

  • drop down of config available

  • option to overwrite leap version

  • option to copy and paste in new config

Create orchestration service

Due to a bug, we need an HTTP service to inform replay nodes of run options and to gather status information from the nodes.

Tasks

Get Call

{
  job_number: number,
  snapshot_loc: string,
  blocklog_loc: string,
  last_block: number
}

Put Call
{ status:[ongoing|stopped|error|finished], block_number: last_block_num_processed }
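
The two payloads could be modeled as in the sketch below. Field names and the four status values are taken from the shapes above; the class and function names are hypothetical, and JSON as the wire format is an assumption.

```python
import json
from dataclasses import asdict, dataclass

VALID_STATUSES = {"ongoing", "stopped", "error", "finished"}


@dataclass
class JobAssignment:
    """Body returned by the Get call."""
    job_number: int
    snapshot_loc: str
    blocklog_loc: str
    last_block: int


@dataclass
class StatusUpdate:
    """Body sent by the Put call."""
    status: str
    block_number: int  # last block number processed

    def __post_init__(self):
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def parse_job(body):
    """Decode a Get response body into a JobAssignment."""
    return JobAssignment(**json.loads(body))


def encode_status(update):
    """Encode a StatusUpdate as the Put request body."""
    return json.dumps(asdict(update))
```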

[REPLAY-12] UI: Create Home Page CSS/HTML

Create basic HTML and CSS for the homepage. This includes:

### Tasks
- [ ] foreground, background, accent, shadow colors
- [ ] fonts 
- [ ] media sizes for phone 360x800, desktop 1280x720, HD 1920x1080
   - [ ] padding and margins 
   - [ ] text sizes: header, body, strong
   - [ ] nav cards 
   - [ ] lists

[REPLAY-6] API authentication to orchestration service

There will be a single fixed orchestration service accessible by HTTP and a domain name. Depends on Issue #35

This issue covers access control for an API to reach this single host, start a pre-configured run, and collect status on that run.

Enable multiple runs on a replay node

Enable a replay node to pick up a job, process it, clean out state, then pick up a new job and process it. This is needed because some jobs take longer than others, and we want to use all our compute power while any jobs are WAITING_4_WORKER.
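
The loop described above might be structured as follows. The three callables are hypothetical stand-ins for whatever the replay node uses to request a job from the orchestrator, run the replay, and wipe local state.

```python
def run_worker(fetch_job, process_job, clean_state):
    """Keep taking jobs until none are waiting; return the jobs that finished."""
    completed = []
    while True:
        job = fetch_job()  # expected to return None when no job is WAITING_4_WORKER
        if job is None:
            break
        try:
            process_job(job)
            completed.append(job)
        finally:
            # Clean out state even if the job failed, so the host is never left dirty.
            clean_state()
    return completed
```

An exception from `process_job` still propagates after cleanup; whether a failed job should stop the node or be skipped is a design choice this sketch leaves open.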

[REPLAY-20] Manual Update Blocklogs on S3

Tasks

Then automate a batch of 5, then automate a larger run.

[REPLAY-11] UI: Create Top Nav Component

New Top Nav HTML and CSS. Static and served by front-end Nginx, not by webservice.

### Tasks
- [ ] first pass icons and html in repo
- [ ] orchestration bootstrap copy in HTML, CSS, Icons 
- [ ] visual test inspection 

Create two AWS hosts for replay

Transferred from AntelopeIO/leap#1423; original author bhazzard.

Install all software from deb packages. Install config from this repo.

Tasks

Generate Statistics on Job Times

Example Report below

JOB TIMING ALL TIMES IN MINUTES
-------------------------------
Number of Jobs: 222
Average: 99.73
Standard Deviation: 160.04
Median: 61.77
75th Percentile: 90.98
90th Percentile: 171.06
Longest Running Job 1333.6 mins

LONG RUNNING JOBS TOP 90%
-------------------------
Job 140648764956560 running time 171.1
Job 140648764956848 running time 177.4
Job 140648764956944 running time 212.58
Job 140648764957040 running time 230.88
Job 140648764957136 running time 240.62
Job 140648764957232 running time 248.12
Job 140648764957328 running time 247.9
Job 140648764957424 running time 285.37
Job 140648764957616 running time 273.65
Job 140648764957520 running time 302.13
Job 140648764957760 running time 252.17
Job 140648764957712 running time 281.08
Job 140648764957856 running time 267.17
Job 140648764957808 running time 522.9
Job 140648764957952 running time 493.6
Job 140648764958048 running time 477.83
Job 140648764958240 running time 474.8
Job 140648764957904 running time 742.62
Job 140648764958144 running time 804.1
Job 140648764958000 running time 1006.08
Job 140648764958432 running time 669.4
Job 140648764958336 running time 852.53
Job 140648764958096 running time 1333.6
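
A sketch of how the figures in such a report could be computed with Python's standard `statistics` module. The report does not say which standard deviation or percentile method it uses, so the sample standard deviation and the "inclusive" quantile method here are assumptions, as are the function names.

```python
import statistics


def summarize(times_minutes):
    """Compute the summary figures for a collection of job times in minutes."""
    ordered = sorted(times_minutes)
    # quantiles(n=100) returns 99 cut points; index 74 is the 75th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "count": len(ordered),
        "average": statistics.mean(ordered),
        "stddev": statistics.stdev(ordered),  # sample standard deviation
        "median": statistics.median(ordered),
        "p75": cuts[74],
        "p90": cuts[89],
        "longest": ordered[-1],
    }


def long_running(jobs, threshold):
    """Return (job_id, minutes) pairs at or above the threshold, in input order."""
    return [(job_id, minutes) for job_id, minutes in jobs if minutes >= threshold]
```

Passing the 90th percentile from `summarize` as the threshold to `long_running` gives a list in the spirit of the second section of the report.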

[REPLAY-15] UI: Summary Progress

Create Summary Progress Page

### Tasks
- [ ] Construct Layout
- [ ] Color Change Progress Bar (attribution)
- [ ] Top bar for blocks processed and progress
- [ ] Second bar: icon key for jobs (succeeded, failed, total)
- [ ] Update progress stats python to produce HTML
- [ ] Third Content Card with stats (avg, median, stddev of times)
- [ ] Fourth: simple list of failed jobs, larger font size

EX: progress bar https://codepen.io/alvarotrigo/pen/vYeNpjj
