sscplus-data-fetch's Introduction

sscplus-data-fetch

An app that fetches data from the Drupal REST API (from SSC Plus) and writes it to a storage device.

infrastructure

Check out the infrastructure project, then run the following commands:

cd live/sandbox/ssplus-data-fetch
terragrunt plan --terragrunt-source ~/git/sscplus-data-fetch/terraform

If prompted for the package.zip location, create the archive first and then provide its directory path:

zip package.zip function_app.py requirements.txt host.json

dev setup

How developers should set up their work computers to run this project.

virtual env

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt --upgrade

If needed, press Ctrl+Shift+P in VSCode, type Python: Create environment... and follow the instructions.

running the app

Make sure you install the Azure Functions extension for Visual Studio Code.

Then press F5.

The old way to run the app was python function/__init__.py.

troubleshooting

I had an issue where the trigger wasn't detected in the V2 model. I had to modify my local.settings.json to include this property (see documentation about it):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
    ...
  }
}
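
A quick way to confirm the flag is present before pressing F5 is to read the settings file and check it. This is a minimal sketch, not part of the project: the function name has_worker_indexing is hypothetical, and it assumes the setting holds a comma-separated list of feature flags.

```python
import json
from pathlib import Path


def has_worker_indexing(settings_path: Path) -> bool:
    """Return True if the settings file enables the EnableWorkerIndexing flag."""
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    flags = settings.get("Values", {}).get("AzureWebJobsFeatureFlags", "")
    # Assumption: the value is a comma-separated list of feature flags.
    return "EnableWorkerIndexing" in [flag.strip() for flag in flags.split(",")]


if __name__ == "__main__":
    path = Path("local.settings.json")
    if path.exists():
        print("worker indexing enabled:", has_worker_indexing(path))
    else:
        print("local.settings.json not found in the current directory")
```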

documentation

sscplus-data-fetch's People

Contributors

guillaumeturcotte

sscplus-data-fetch's Issues

Ensure a backup is made before the index is modified ...

Description

The mechanism that builds the index with new metadata modifies the current latest index in the storage account in place and uploads the changes. We should at least create a backup of the index before modifying it, in case something goes wrong. We should also test the changes (load the index, etc.) and add failsafes, so that when we re-save the index we know the changes were successful.

success criteria

  • ensure we create backup before modifying the index on the timed trigger
  • keep at least X (default 10) past indexes as backups...
  • add validation check before re-saving index, make sure it can be loaded, etc.
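
The three criteria above could be sketched as follows. This is only an illustration under stated assumptions: it works on a local file rather than the storage account, the names backup_index and update_index are hypothetical, and the modify and validate callables stand in for whatever builds and loads the real index.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def backup_index(index_path: Path, backup_dir: Path, keep: int = 10) -> Path:
    """Copy the current index into backup_dir, then prune the oldest backups."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    backup_path = backup_dir / f"{index_path.stem}-{stamp}{index_path.suffix}"
    shutil.copy2(index_path, backup_path)
    # Timestamped names sort chronologically, so the oldest backups come first.
    backups = sorted(backup_dir.glob(f"{index_path.stem}-*{index_path.suffix}"))
    for old in backups[: max(len(backups) - keep, 0)]:
        old.unlink()
    return backup_path


def update_index(
    index_path: Path,
    backup_dir: Path,
    modify: Callable[[Path], None],
    validate: Callable[[Path], None],
    keep: int = 10,
) -> None:
    """Back up the index, modify a temporary copy, validate it, then swap it in."""
    backup_index(index_path, backup_dir, keep)
    candidate = index_path.with_name(index_path.name + ".tmp")
    shutil.copy2(index_path, candidate)
    try:
        modify(candidate)
        validate(candidate)  # expected to raise if the index cannot be loaded
    except Exception:
        candidate.unlink(missing_ok=True)  # the live index is left untouched
        raise
    candidate.replace(index_path)
```

Working on a copy means a failed modification or validation never touches the live index, and the backup remains available even if the final swap goes wrong.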

SSC Plus Data fetch Initial work

  • Get access to the dev drupal site (https://plus-dev.ssc-spc.gc.ca/en)
  • Get access to DEV Drupal API: https://plus-dev.ssc-spc.gc.ca/rest/all-ids
  • Test that the flow works from a developer's perspective and from where it will run in the cloud
  • code initial infrastructure for the project in the cloud
  • add code that reads all ids json payload
  • add code that iterates over all ids and pulls a single json payload for each (page in en and fr)
  • add code that splits the page into two processed files (1 en and 1 fr for the index building)
  • Re-index newly acquired data in the azure durable function
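
The fetch-and-split steps in this list might look roughly like the sketch below. The payload shapes are assumptions, not taken from the API: it supposes the all-ids payload is a list of ids and that each page payload is keyed by language code. The fetch_page callable, split_page, and fetch_and_split are hypothetical names, and authentication is left out entirely.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List

LANGUAGES = ("en", "fr")


def split_page(page: Dict[str, Any]) -> Dict[str, Any]:
    """Return one document per language from a bilingual page payload.

    Assumption: the payload is keyed by language code ("en", "fr").
    """
    return {lang: page[lang] for lang in LANGUAGES if lang in page}


def fetch_and_split(
    ids: Iterable[Any],
    fetch_page: Callable[[Any], Dict[str, Any]],
    out_dir: Path,
) -> List[Path]:
    """Fetch each page by id and write one processed file per language."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for page_id in ids:
        page = fetch_page(page_id)  # hypothetical callable wrapping the REST call
        for lang, doc in split_page(page).items():
            path = out_dir / f"{page_id}-{lang}.json"
            path.write_text(json.dumps(doc, ensure_ascii=False), encoding="utf-8")
            written.append(path)
    return written
```

Passing the fetcher in as a callable keeps the loop testable without network access; the real implementation would wrap the authenticated HTTP request.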

What we need

Current access to https://plus-dev.ssc-spc.gc.ca/rest/all-ids is secured behind SSC's F5 and uses https://login.microsoft.com authentication via browser/device MS credentials (on the 163gc domain).

I am writing code (currently in Python) to access this API programmatically, but in its current state there is no proper way to do so. I need help from someone with enough permissions to change the authentication provider portion of the SSC Plus F5 MySSC Dev configuration.

  • I can easily use MSAL to authenticate/authorize myself with the app.
  • Could use a private or public client with OIDC, but at the moment it seems disabled (I tried with my own user and it's disabled).
  • Or any other method that you find preferable.
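
For reference, the MSAL option could be sketched with a public client and the device code flow, as below. This would only work once a suitable client is enabled on the provider side, and whether the F5 accepts a bearer token this way is an assumption. CLIENT_ID, AUTHORITY, and SCOPES are placeholders, not real values, and the function names are hypothetical.

```python
from typing import Any, Dict, List

# Placeholders only: none of these values come from the project.
CLIENT_ID = "00000000-0000-0000-0000-000000000000"
AUTHORITY = "https://login.microsoftonline.com/<tenant-id>"
SCOPES = ["api://<app-id>/.default"]


def acquire_token_by_device_code(
    client_id: str, authority: str, scopes: List[str]
) -> Dict[str, Any]:
    """Sign in with the device code flow and return MSAL's result dictionary."""
    import msal  # third-party package: pip install msal

    app = msal.PublicClientApplication(client_id, authority=authority)
    flow = app.initiate_device_flow(scopes=scopes)
    if "user_code" not in flow:
        raise RuntimeError(f"Could not start the device flow: {flow}")
    print(flow["message"])  # tells the user where to enter the code
    return app.acquire_token_by_device_flow(flow)  # blocks until sign-in completes


def bearer_headers(result: Dict[str, Any]) -> Dict[str, str]:
    """Build an Authorization header from a successful MSAL result."""
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "authentication failed"))
    return {"Authorization": f"Bearer {result['access_token']}"}
```

The headers returned by bearer_headers would then be attached to each request against the all-ids endpoint.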

Scripted prod data fetch and indexing

Prerequisite to start this card: access to production data.

  • Scripted data fetch and indexing (for ease and maybe automation)
  • Contact Daniel Brouseau
  • time based trigger
  • Code that supports adding documents to and removing them from the vector index
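
One simple way to decide which documents to add and remove on each timed run is to compare the ids already in the index with the ids just fetched. This is a sketch of that idea only: plan_index_changes is a hypothetical name, the ids are assumed to be comparable strings, and documents whose content changed under the same id are not detected.

```python
from typing import Iterable, Set, Tuple


def plan_index_changes(
    indexed_ids: Iterable[str], fetched_ids: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (ids to add, ids to remove) so the index matches the fetched data."""
    indexed, fetched = set(indexed_ids), set(fetched_ids)
    to_add = fetched - indexed      # present upstream, missing from the index
    to_remove = indexed - fetched   # still in the index, gone upstream
    return to_add, to_remove


if __name__ == "__main__":
    add, remove = plan_index_changes(["1", "2", "3"], ["2", "3", "4"])
    print("add:", sorted(add), "remove:", sorted(remove))
```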
