darklang / classic-dark

Darklang stable version - currently on darklang.com/classic

Home Page: https://darklang.com/classic

License: Other

Dockerfile 0.52% Shell 2.75% JavaScript 1.57% PLpgSQL 0.02% HTML 0.18% CSS 0.25% SCSS 1.57% Python 1.77% F# 44.68% GLSL 0.02% ReScript 44.86% TypeScript 1.57% HCL 0.25%
Topics: editor

classic-dark's People

Contributors

9ae, athinanarof, br1anchen, cooleydw494, dependabot-preview[bot], dependabot[bot], dstrelau, ellenchisa, ellenchisa2, ianconnolly, ismith, jceipek, jonathan-laurent, jwalter, kate-grant, kberridge, korede-ta, loganmac, mariajdab, oceanoak, pbiggar, pingiun, pirbo, posalusa24, samstokes, stachudotnet, startling, sydnoteboom, xtopherbrandt, zshannon


classic-dark's Issues

Deprecate Static Assets Plan

  • collect a list of users who have previously uploaded static assets
    (there are 544 canvases with static assets as of Mar 3, 2023)

Limiting new usages

  • store "upload allowlist"
    (k8s? hard-coded?)
  • update code: only users who are in that list can upload static assets (see the sketch below)
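
A minimal sketch of what the allowlist check might look like; the module, function names, and list contents are hypothetical, and where the allowlist actually lives (k8s config vs. hard-coded) is still undecided above:

  // Hypothetical sketch: gate static-asset uploads behind an allowlist.
  module StaticAssetAllowlist =

    // Hard-coded placeholder; could instead be loaded from a k8s-mounted config file.
    let uploadAllowlist : Set<string> =
      Set.ofList [ "some-grandfathered-user" ]   // made-up entry

    let canUploadStaticAssets (username : string) : bool =
      Set.contains username uploadAllowlist

    // At the upload endpoint, reject anyone not on the list.
    let handleUpload (username : string) (doUpload : unit -> 'a) =
      if canUploadStaticAssets username then
        doUpload ()
      else
        failwith "Static asset uploads are deprecated; contact us if you still need access."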

Reducing current usages to 0 (so we can deprecate)

  • store "keep my data list" (just allowing to serve existing assets)
    (k8s? hard-coded?)
  • occasionally, purge assets outside of the "keep my data" list
  • ask users if they're really using the feature
  • every ~6 months, ask users whether we can move them onto the keep-my-data list and off the allow-uploads list

Present options
.....

New plan for trace storage work

DB Clone

  • prototype the DB clone (go through the steps, recording downtime at each)
    • events table (not v2)
      • verify that nothing is querying the events (not v2) table (see the query sketch after this list)
      • look at the events table and see if it's using any foreign keys
        • if any, remove the foreign keys
      • delete the events table from dark-west
    • do the DB clone
    • update the DB clone
      • set zoning to single-zone (applies to both servers and storage)
      • update postgres version to 14
        conclusion: probably brings unnecessary risk
      • turn down the CPUs by ~1/3
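
A rough sketch of the pre-clone checks, assuming the backend talks to Postgres via Npgsql and that a connection string is available from elsewhere; the pg_stat_activity and pg_constraint queries are standard Postgres, but the surrounding code is illustrative only:

  // Illustrative only: check whether anything still touches the old events table,
  // and list any foreign keys on it, before cloning the DB.
  open Npgsql

  let checkEventsTable (connectionString : string) =
    use conn = new NpgsqlConnection(connectionString)
    conn.Open()

    // Any live queries mentioning the events table?
    use activity =
      new NpgsqlCommand(
        "SELECT pid, query FROM pg_stat_activity WHERE query ILIKE '%events%' AND state <> 'idle'",
        conn)
    use reader = activity.ExecuteReader()
    while reader.Read() do
      printfn "active: pid=%d query=%s" (reader.GetInt32(0)) (reader.GetString(1))
    reader.Close()

    // Foreign key constraints on the events table (to be dropped before the clone).
    use fks =
      new NpgsqlCommand(
        "SELECT conname FROM pg_constraint WHERE conrelid = 'events'::regclass AND contype = 'f'",
        conn)
    use fkReader = fks.ExecuteReader()
    while fkReader.Read() do
      printfn "foreign key: %s" (fkReader.GetString(0))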

Drop Events table

  • drop events table in the codebase
    • review usages of "events" in codebase - see if we're missing anything
    • investigate connection to worker_stats_v1
    • write migration script (drop if still exists) to drop events
    • update tests if they were somehow referencing events
    • update clear-canvas script to not reference events table
      (note: apparently we weren't clearing events_v2!)
    • do we need to merge any changes before we drop the events table in prod?
      • Yes.
  • drop events table in production (see the consolidated sketch after this list)
    • set lock_timeout = '1s'
    • set statement_timeout = '1s'
    • alter table events drop constraint events_canvas_id_fkey
    • alter table events drop constraint events_account_id_fkey
    • drop index concurrently if exists idx_events_for_dequeue
    • drop index concurrently if exists idx_events_for_dequeue2
    • truncate events table
    • drop events table
  • merge the migration
  • copy all of the above from stable-dark to dark
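
A consolidated sketch of the production statements listed above, again assuming Npgsql and a connection string defined elsewhere; the statements are taken verbatim from the checklist, and the wrapper is illustrative rather than the actual migration code:

  // Illustrative wrapper around the checklist above. Statements run one at a time,
  // outside an explicit transaction, since DROP INDEX CONCURRENTLY cannot run inside one.
  open Npgsql

  let dropEventsTable (connectionString : string) =
    use conn = new NpgsqlConnection(connectionString)
    conn.Open()
    let run (sql : string) =
      use cmd = new NpgsqlCommand(sql, conn)
      cmd.ExecuteNonQuery() |> ignore
    [ "SET lock_timeout = '1s'"
      "SET statement_timeout = '1s'"
      "ALTER TABLE events DROP CONSTRAINT events_canvas_id_fkey"
      "ALTER TABLE events DROP CONSTRAINT events_account_id_fkey"
      "DROP INDEX CONCURRENTLY IF EXISTS idx_events_for_dequeue"
      "DROP INDEX CONCURRENTLY IF EXISTS idx_events_for_dequeue2"
      "TRUNCATE events"
      "DROP TABLE events" ]
    |> List.iter run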

Get Google to shrink a clone

Goal: determine the amount of downtime

  • make a clone of our DB
  • set lock_timeout = '1s'
  • set statement_timeout = '1s'
  • drop FK on account_id
    alter table events drop constraint events_canvas_id_fkey
  • drop FK on canvas_id
    alter table events drop constraint events_account_id_fkey
  • drop index idx_events_for_dequeue
    drop index concurrently if exists idx_events_for_dequeue
  • drop index idx_events_for_dequeue2
    drop index concurrently if exists idx_events_for_dequeue2
  • drop index index_events_for_stats
  • truncate events table
  • drop events table
  • ask google to shrink it
    (they'll do this synchronously, in real time, during a workday/call)
  • record the downtime for reference: [downtime]
  • lower availability to single-zone
  • lower CPU from 16 vCPUs to 12 vCPUs

Make a plan for doing this against the prod DB

  • plan how to alert customers
    • of expected downtime, etc
  • ...

another day: (pull into another issue)

Cloud storage

  • delete trace-related tests
  • check that 404s continue to work
  • ensure we overwrite cloud storage traces for execute_handler button
  • check if execute_function traces are appropriately merged with a cloud-storage -based trace
  • garbage collection - set an object lifecycle policy on the bucket or on the trace objects (see the sketch after this list)
  • ensure pusher is supported
  • do walkthrough and check it all works
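
A minimal sketch of what a bucket-level lifecycle rule could look like, assuming the Google.Cloud.Storage.V1 .NET client; the bucket name and retention period are made up, and whether the rule should apply to the whole bucket or only to trace objects is still an open question above:

  // Illustrative only: delete objects some number of days after upload.
  open Google.Cloud.Storage.V1
  open Google.Apis.Storage.v1.Data

  let setTraceLifecycle (bucketName : string) (maxAgeDays : int) =
    let client = StorageClient.Create()
    let bucket = client.GetBucket(bucketName)
    let rule =
      Bucket.LifecycleData.RuleData(
        Action = Bucket.LifecycleData.RuleData.ActionData(Type = "Delete"),
        Condition = Bucket.LifecycleData.RuleData.ConditionData(Age = System.Nullable maxAgeDays))
    bucket.Lifecycle <- Bucket.LifecycleData(Rule = ([| rule |] :> System.Collections.Generic.IList<_>))
    client.UpdateBucket(bucket) |> ignore

  // e.g. setTraceLifecycle "dark-traces" 30   // hypothetical bucket name and retention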

monitoring

  • schedule a weekly call/meeting to review usage, for 4 weeks; at the end of that period, decide what to do next
    • check table sizes (see the query sketch after this list)
    • check costs
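
The table-size check can lean on the standard Postgres size functions; a sketch, again assuming Npgsql and a connection string from elsewhere:

  // Illustrative: print each user table with its total on-disk size, largest first.
  open Npgsql

  let printTableSizes (connectionString : string) =
    use conn = new NpgsqlConnection(connectionString)
    conn.Open()
    use cmd =
      new NpgsqlCommand(
        "SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
         FROM pg_catalog.pg_statio_user_tables
         ORDER BY pg_total_relation_size(relid) DESC",
        conn)
    use reader = cmd.ExecuteReader()
    while reader.Read() do
      printfn "%s: %s" (reader.GetString(0)) (reader.GetString(1))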

migrate existing canvases

  • upload to both simultaneously
  • fetch and upload existing trace data for existing canvases/handlers
  • possibly automatically switch LD flag once this is done
  • switch all users to only use uploaded storage data

Maybe later?

  • turn on private IPs (requires DB downtime)

DB maintenance

Announcing intentions

  • announce intentions in #status
  • tweet
  • mastodon
  • post in #general
  • 3 hr downtime

Prep

  • have devcontainer built on latest stable-dark

Once it starts

  • post to #status again right away
  • reply to intentions tweet
  • reply to intentions toot
  • post in #general
  • update status.darklang.com (via betterstatus)
  • turn off pagerduty or phones
  • turn off bwdserver (via k8s)
    • kubectl scale --replicas=0 -n darklang deployment/bwdserver-deployment
  • leave apiserver up (though requests will largely fail once they try to reach the DB)
  • turn off cronchecker (turn crons down to 0 in kubernetes, restart)
    • kubectl scale --replicas=0 -n default deployment/cronchecker-deployment
  • turn off queueworker
    • kubectl scale --replicas=0 -n darklang deployment/queueworker-deployment

Operations

  • sit with Google while they shrink the DB from 10TB (2 steps, ~2.5 hr)
  • reduce availability and CPU (16 -> 12)
  • don't upgrade DB version
  • throughout, record activity in #operations

Turning things back on

  • update some LaunchDarkly settings to 16
  • reenable pagerduty
  • turn up bwdserver to 2
    • kubectl scale --replicas=2 -n darklang deployment/bwdserver-deployment
  • check apiserver
  • change cronchecker to 1
    • kubectl scale --replicas=1 -n default deployment/cronchecker-deployment
  • turn on queueworker
    • kubectl scale --replicas=2 -n darklang deployment/queueworker-deployment

Announce done

  • announce in #status
  • announce done in #general
  • mastodon
  • tweet done

Shift + scroll in canvas should scroll horizontally

It is standard behaviour that scrolling the mouse wheel while holding the Shift key scrolls horizontally. Try it on normal web pages in Chrome, or really anything that uses normal scroll behaviours.

The canvases in Dark ought to do this too, but don't: the Shift key is currently ignored.

Reduce trace costs

Our OpenTelemetry provider is raising its prices, so we should reduce how many events we send.

Currently, we're using about 1.2B events and the next lowest threshold is 450M.

They are currently split:

cloudsql-proxy 0.11%
kubernetes-bwd-nginx 0.15%
kubernetes-bwd-ocaml 57.03% (1.13B)
kubernetes-garbagecollector 38.02% (376M)
kubernetes-metrics 4.69% (45M)

Within kubernetes-bwd-ocaml, the split is:

BwdServer | 608,015,209
QueueWorker | 354,919,048
ApiServer | 66,742,393
CronChecker | 38,742,278
other  | 5,528,954

Note the numbers don't add up because we had a big month for BwdServer due to an anomaly.

To address this:

  • use TraceRatio samplers on each service (20% for BwdServer, 20% for QueueWorker, 100% for others); see the sketch after this list
    • write code
    • merge to dark repo
    • backport to classic-dark repo
    • merge & deploy
    • add flags to LaunchDarkly.
      • add flags
      • BwdServer
      • QueueWorker
      • check it works
    • Reduce plan
  • use honeycomb sampling for garbagecollector (5% should be fine, I'd be surprised if we ever look at this again)
    • merge change
    • check it worked
  • disable k8s metrics (we get this from google cloud anyway)
    • merge change
    • check it worked
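
A sketch of what the per-service ratio sampler might look like, assuming the F# services configure tracing via the .NET OpenTelemetry SDK; the 20% figure comes from the list above, while the function name and the idea of a LaunchDarkly-driven percentage are hypothetical:

  // Illustrative: sample a fixed ratio of traces, keeping children consistent with parents.
  open OpenTelemetry
  open OpenTelemetry.Trace

  // samplePercent would come from a LaunchDarkly flag (hypothetical), e.g. 20.0 for BwdServer.
  let buildTracerProvider (serviceName : string) (samplePercent : float) =
    Sdk.CreateTracerProviderBuilder()
       .AddSource(serviceName)
       // ParentBased keeps a whole trace together: children follow the parent's sampling decision.
       .SetSampler(ParentBasedSampler(TraceIdRatioBasedSampler(samplePercent / 100.0)))
       .Build()

  // e.g. buildTracerProvider "BwdServer" 20.0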

Overall, this should reduce us from 1.8B in March to:

  • BwdServer: 121M
  • QueueWorker: 71M
  • ApiServer: 67M
  • CronChecker: 39M
  • kubernetes-bwd-ocaml other: 6M
  • garbagecollector: 18M

Overall, around 350M.
