Giter Site home page Giter Site logo

finlab_data_center's Introduction

Finlab_Data_Center by Django

Purpose

Use Django framework to make a data-center website, in order to schedule crawlers or ETL functions by django-Q, which is a multiprocessing task queue for Django and has high performance and easy way to controll in django-admin website. In order to monitor ETL processes, we store the hook result(log) in django_admin database, then we could use these data to push and analysis info to our web or third party.

Data Flow

Data center basic env framework

MySQL Auth

mysql -u root
CREATE USER 'finlab'@'localhost' IDENTIFIED BY 'finlab';
GRANT ALL PRIVILEGES ON * . * TO 'finlab'@'localhost';

# change passward
ALTER USER 'finlab'@'localhost' IDENTIFIED BY 'passward';

TW_stock_setup

git clone https://gitlab.com/finlab_company_class/tw_stock.git
cp -r tw_stock/data finlab_data_center/tw_stock_class/

GCP Cloud SQL:

add IP4(search) in gcp sql connection

Django command

static file

use for django admin css no show in gunicorn server,run this command at first time in local use python manage.py collectstatic

Set env and install module

virtualenv venv
source venv/bin/activate
# primary module
pip install -r requirements-to-freeze.txt
# related module in requirements.txt
pip freeze > requirements.txt

DB related command

Only admin_db router could be migrate, because we need elastic in stock database.Use raw sql to define tables in stock database, remember to record raw sql files in sql-records directory.

# default setting is dev
python manage.py migrate --database=admin_db
python manage.py migrate --settings=finlab_data_center.settings.pro --database=admin_db

Create superuser

python manage.py createsuperuser --database=admin_db

Run jupyter by django-plus kernal

python manage.py shell_plus --notebook

add below code in ipynb file to connect django env

import sys,os
sys.path.append("..")
import django
django.setup()

Run python shell

python manage.py shell

Run Server

# only test
python manage.py runserver
# use gunicorn in production for security
gunicorn finlab_data_center.wsgi:application

Start app

python manage.py startapp us_data

use legacy db to export django-model format in model file python manage.py inspectdb --database=us_db > us_data/models.py

Create New Table Flow

  1. Define raw sql, and paste raw sql in records dir.

  2. Use inspectdb to print model codes in django ex:

     # show on terminal
     python manage.py inspectdb --database=us_db
     # export to modelfile
     python manage.py inspectdb --database=us_db > us_data/models.py
    
  3. Write crawlers and use CrawlerProcess.specified_date_crawl or SqlImporter.add_to_sql to create and test init data,don't use CrawlerProcess.auto_update_crawl firstly.

Crawlers

use etl/import_sql.py to control crawlers to import data.

Demo

  1. tw_data tw_data

  2. us_data us_data

Run django-Q

python manage.py qcluster

Use task func

we have two task funcs(to process for time series or non time series) and one hook func.

Schedule board

CRUD for tasks

args only accept str or number

File framework for app

  1. crawlers.py: write crawlers
  2. tasks.py: write schedule task functions
  3. models.py: define data format.

Notifications

use email to sent schdule daily report

Server(Gunicorn)

Deploy

Docker command(see /scripts)

build、tag、push

docker build -t finlab-data-center -f Dockerfile .

docker tag finlab-data-center asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3

docker push asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3

run

docker run --env PORT=8001 --env DJANGO_ENV=pro -it -p 8001:8001  finlab-data-center

use volume,Synchronous update with local, no need to rebuild

docker run --env PORT=8001 --env DJANGO_ENV=pro -it -p 8001:8001 -v "$(pwd):/app" finlab-data-center /bin/bash

Create Cluster

# init
gcloud config set project rare-mender-288209
gcloud config set compute/zone asia-east1-a

gcloud container clusters create finlab-micro-service --num-nodes=1

gcloud container clusters get-credentials finlab-micro-service

kubectl create deployment finlab-data-center --image=asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3

kubectl expose deployment finlab-data-center --type LoadBalancer \
  --port 80 --target-port 8001

kubectl delete service finlab-data-center

gcloud container clusters delete finlab-data-center
1.use secret to set env args, sql proxy host must be localhost
kubectl create secret generic cloudsql-db-credentials --from-literal=DBACCOUNT=finlab_admin --from-literal=DBPASSWORD=qetuadgj --from-literal=DBHOST=127.0.0.1
3.set iam service account key
gcloud iam service-accounts keys create ~/key.json \
  --iam-account [email protected]

kubectl create secret generic cloudsql-instance-credentials --from-file=service_account.json="/Users/benbilly3/key.json"

first deploy(use k8s_deploy.yaml)

kubectl create -f k8s_deploy.yaml

rolling update(change image version)

kubectl set image deployment/finlab-data-center finlab-data-center=asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3

Other notes

  1. config_json: note env args like db connection, api_key
  2. db_router.py: control multiple db connection
  3. settings dir: default env setting is dev, both two env are inherited from base.py. django-controller is setting in base.py.
  4. CHANGELOG.md: record work comments gor everyversion
  5. /scripts:record the command.sh

Reference

Django

Django-Q

Django-Q introduce in PyCon

Asynchronous tasks in Django with Django Q

GKE Asynchronous tasks in Django with Django Q

finlab_data_center's People

Contributors

benbilly3 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.