
h1st-ai / h1st


Power Tools for AI Engineers With Deadlines

Home Page: https://h1st.ai

License: Other

Languages: Python 2.41%, Jupyter Notebook 97.59%

Topics: data-science, explainability, home-automation, time-series, collaboration, cybersecurity, cold-start, autonomous-vehicles, automl, avionics

h1st's People

Contributors

aht, arianaluongpham, ctn, enymuss, floer32, hiro-v, khoama, loc-aitomatic, luongthevinh, nhanitvn, nqbao, nubs01, phamhoangtuan, phanhongan, shiti, tgithubj, thangtp, thevinhluong102, timroz24, tqhuyen, trile, vophihungvn, zooeyn


h1st's Issues

Better git workflow with better build and release CD pipelines

Is your feature request related to a problem? Please describe.

Recently I created PR #158, which was approved and merged to the main branch. Even though everything went well in the PR itself, the merge commit that landed on main triggered another GitHub Actions job that immediately builds and publishes to TestPyPI and PyPI. This job then failed because it could not upload a wheel file with the same name as a previously deployed version. I understand that I could create another PR to bump the package version so the package can be built and released properly. But I find this very strange:

  • First of all, I don't think every commit that lands on main should be built and pushed to PyPI immediately. That would cause many unnecessary version changes, even if we somehow tagged each package version with the exact date and time at which the CI job ran.
  • Second, publishing our package to TestPyPI and then to PyPI immediately afterwards in the same pipeline does not add any value. Why push the package to TestPyPI first if we don't plan to run any tests on the test package?
  • And finally, I see that we create a tag automatically at the end of the pipeline so users can find the source code for each h1st version released on PyPI. But what about the changelog for each version? Where and how can our users easily find the changes between h1st versions?

Describe the solution you'd like

Inspired by other popular Python packages, I think we should change our git workflow and improve the build and release CD pipelines to solve this problem:

  1. No longer trigger the build and release of the h1st Python package on every merge to main. Instead, trigger this pipeline only on release creation: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#release
  2. No longer publish our package to TestPyPI, since it does not add any value.
  3. Use the Releases feature of GitHub to track each release version, with automatic changelogs generated from the commit history, so our users have a central place to see what changed between h1st versions. Creating a release will also trigger the build and release CD pipeline, so it only runs when it is actually needed.
  4. Adopt a stricter merge policy for main: only allow squashed commits that link to the original PR, so the git commit history on main stays clean and readable. This also lets GitHub's automatic changelogs work well.
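
As a rough illustration of point 4, the sketch below turns squashed commit subjects that end in a PR reference into a changelog section. The function name build_changelog, the "(#123)" subject convention, and the sample subjects are assumptions made for illustration; this is not how GitHub's own release-notes generator is implemented.

```python
import re

# Assumed convention: a squashed commit subject ends with "(#<PR number>)".
PR_SUFFIX = re.compile(r"^(?P<title>.*?)\s*\(#(?P<pr>\d+)\)$")


def build_changelog(version, subjects):
    """Render a plain-text changelog section from squashed commit subjects."""
    lines = [f"h1st {version}", ""]
    for subject in subjects:
        subject = subject.strip()
        match = PR_SUFFIX.match(subject)
        if match:
            lines.append(f"  • {match['title']} (PR #{match['pr']})")
        else:
            # Commits without a PR reference are kept, but flagged.
            lines.append(f"  • {subject} (no linked PR)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical version label and commit subjects, for illustration only.
    print(build_changelog("X.Y.Z", ["Fix model loading (#101)", "Update docs"]))
```

With a squash-only policy every entry would carry a PR number, so the "no linked PR" branch mostly serves as a check that the policy is being followed.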

Describe alternatives you've considered
None

Additional context
None

Improve h1st output for explainable and trustable A.I

Is your feature request related to a problem? Please describe.
Power Tools for AI Engineers With Deadlines means quick prototyping, which may lead to unexpected logical errors. There should be an embedded tool that visualizes the data and model or shows users an analysis of them.

Describe the solution you'd like

  • There is a trust folder inside h1st that defines many concepts (not yet covered in the documentation) and uses shap and lime. It should be documented, including what it can help with.
  • Some recently published tools provide such capabilities: explainerdashboard, Shapash, DALEX.
  • Add causal inference? This would shine in industrial cases where data is limited or sparse.

Describe alternatives you've considered
Some stated above, TBD

Additional context

  • Improve h1st output, especially when incorporating human knowledge into the framework to produce models.

Precheck h1st installation and testing on different OS, Python version, and CPU architecture combinations

Is your feature request related to a problem? Please describe.
My colleagues, my customers, and I get frustrated when trying to install h1st on our machines, because we run into many kinds of errors. The cost in time and labor is high, and we have to wait a long time for fixes.

Describe the solution you'd like
A GitHub Actions workflow that runs on each pull request and checks every combination of:

  • Python version: 3.8 to 3.10
  • OS: macOS, Windows, Linux
  • CPU architecture: ARM or x86 (assuming we are all using 64-bit)
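
To make the size of this matrix concrete, the sketch below enumerates the combinations listed above. The architecture labels arm64 and x86_64 and the function name build_matrix are illustrative assumptions, not names taken from an existing h1st workflow.

```python
from itertools import product

# Versions are kept as strings so that "3.10" is never confused with 3.1.
PYTHON_VERSIONS = ["3.8", "3.9", "3.10"]
OPERATING_SYSTEMS = ["macOS", "Windows", "Linux"]
ARCHITECTURES = ["arm64", "x86_64"]  # assumed labels for "ARM or x86" (64-bit)


def build_matrix():
    """Return every (python, os, arch) combination as a dict."""
    return [
        {"python": py, "os": os_name, "arch": arch}
        for py, os_name, arch in product(
            PYTHON_VERSIONS, OPERATING_SYSTEMS, ARCHITECTURES
        )
    ]


if __name__ == "__main__":
    for job in build_matrix():
        print(job)
```

Three Python versions, three operating systems, and two architectures give 18 jobs per pull request, which is worth keeping in mind when weighing runner cost.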

Describe alternatives you've considered

  • Cirrus CI: It supports macOS ARM containers, but there is no Intel Mac container.
  • Circle CI: ARM support is not generally available yet (beta).
  • Jenkins: We would have to self-host the runner, which would be a lot more trouble.

Additional context

  • The current GitHub Actions implementation does not cover Mac M1 or Windows ARM (because such runners are currently expensive to run), but we can cover them later.

Release changelog

Is your feature request related to a problem? Please describe.

  • I find it frustrating to check what has changed in each h1st release in terms of features and bug fixes.
  • There is no changelog in the h1st GitHub repository; we would expect something similar to the PyTorch releases. Also, the h1st page on PyPI does not reflect the GitHub release page.

Describe the solution you'd like

  • Release based on git tag, automatically
  • Release nightly version on weekly basis

Describe alternatives you've considered
No.

Additional context

  • Very crucial to new and existing users.

CI failing due to deprecated Poetry installation

Describe the bug
CI fails when setting up the environment and installing Poetry. This is probably because the Poetry installation procedure has changed completely.

To Reproduce
Steps to reproduce the behavior:

  1. Go to #185
  2. See CI test result

[Screenshot: 2023-02-24, 8:06:52 AM]

[Screenshot: 2023-02-24, 8:07:09 AM]

Improve model repository behavior

From backlog list:

  • Currently, the model repository only provides a unique ID for each model version. Users should be able to refer to a version by a user-defined tag.
  • The model repository should allow users to view all the versions and all the tags.
  • Logging should tell the user which model is being loaded, to improve the user experience (currently, it only logs the ID being loaded).
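
The three backlog items can be sketched together with a small in-memory repository. Everything here (the class TaggedModelRepository and its persist, load, list_versions, and list_tags methods) is hypothetical and only illustrates the requested behavior; it is not the existing h1st model repository API.

```python
import logging
import uuid

logger = logging.getLogger(__name__)


class TaggedModelRepository:
    """Hypothetical in-memory repository; not the h1st model repository API."""

    def __init__(self):
        self._versions = {}  # version ID -> model object
        self._tags = {}      # user-defined tag -> version ID

    def persist(self, model, tag=None):
        """Store a model and return its generated version ID."""
        version_id = uuid.uuid4().hex
        self._versions[version_id] = model
        if tag is not None:
            self._tags[tag] = version_id
        return version_id

    def load(self, version_or_tag):
        """Load by tag or by version ID, logging which model is loaded."""
        version_id = self._tags.get(version_or_tag, version_or_tag)
        model = self._versions[version_id]
        logger.info("Loading %s (version %s)", type(model).__name__, version_id)
        return model

    def list_versions(self):
        """Return all known version IDs."""
        return list(self._versions)

    def list_tags(self):
        """Return a copy of the tag -> version ID mapping."""
        return dict(self._tags)
```

Resolving a tag to a version ID first means the rest of the loading path can stay keyed on IDs, so tags remain a thin layer on top of the current behavior.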

Remove jupyter notebook inside h1st codebase

Is your feature request related to a problem? Please describe.

  • Jupyter notebooks are heavy in terms of size.
  • If h1st wants to have examples, they should live in a separate h1st-example.

Describe the solution you'd like

  • Drop all Jupyter notebooks from the h1st codebase.

Describe alternatives you've considered
No

Additional context

  • The codebase would be leaner.

Ensemble

Problem statement

Currently, users who want to use h1st.stackensemble need to write many lines of client code.

Proposal

Given trained h1st.models, build a stack ensemble without requiring the user to write much client code.
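
A minimal sketch of what a one-call helper could look like is shown below. It assumes each trained model exposes a predict method that maps a list of rows to a list of predictions; that interface, the helper name stack_predict, and the toy Threshold and MajorityVote classes are assumptions for illustration, and the real h1st.stackensemble interface may differ.

```python
def stack_predict(base_models, meta_model, rows):
    """Feed each base model's predictions to the meta-model as features."""
    # One list of predictions per base model, each aligned with `rows`.
    per_model = [model.predict(rows) for model in base_models]
    # Transpose so that each row gets one feature per base model.
    meta_features = [list(features) for features in zip(*per_model)]
    return meta_model.predict(meta_features)


class Threshold:
    """Toy base model: predicts 1 when feature `index` exceeds `cutoff`."""

    def __init__(self, index, cutoff):
        self.index = index
        self.cutoff = cutoff

    def predict(self, rows):
        return [int(row[self.index] > self.cutoff) for row in rows]


class MajorityVote:
    """Toy stand-in for a trained meta-model: 1 if at least half vote 1."""

    def predict(self, rows):
        return [int(2 * sum(row) >= len(row)) for row in rows]


if __name__ == "__main__":
    models = [Threshold(0, 0.5), Threshold(1, 0.5)]
    print(stack_predict(models, MajorityVote(), [[1.0, 1.0], [0.0, 0.0]]))
```

In a real stack ensemble the meta-model would be trained on held-out base-model predictions; the fixed voting rule here only keeps the example self-contained.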

Embedded graph visualization for model planning and execution

Is your feature request related to a problem? Please describe.
h1st is meant to be Power Tools for AI Engineers With Deadlines, as it helps incorporate human knowledge with data to build models quickly. I find it frustrating to debug the modeling classes. There should be some kind of visualization.

Describe the solution you'd like

  • A model produced with h1st should come with a picture of its actual graph-based execution. An interesting example has been written in a notebook.
  • All of the steps can be shown in a graph-based visualization.

Describe alternatives you've considered
Not yet

Additional context

  • Will improve the user experience when using h1st

"pip3 install h1st" fails on 32-bit Python 3.7

Describe the bug
When using 32-bit Python 3.7 on Windows 10, running "pip3 install h1st" produces the following error:
PS C:\users\VietNV\AppData\Local\Programs\Python\Python37-32\Scripts> ./pip3 install h1st
Requirement already satisfied: h1st in c:\users\vietnv\appdata\local\programs\python\python37-32\lib\site-packages\h1st-2020.8-py3.7.egg (2020.8)
Collecting NumPy<1.19,>=1.18.4
Using cached numpy-1.18.5-cp37-cp37m-win32.whl (10.8 MB)
Collecting Pandas<1.2,>=1.0.4
Using cached pandas-1.1.3-cp37-cp37m-win32.whl (7.8 MB)
Collecting PyArrow>=0.17.1
Using cached pyarrow-1.0.1.tar.gz (1.3 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Requirement already satisfied: cloudpickle==1.3.0 in c:\users\vietnv\appdata\local\programs\python\python37-32\lib\site-packages (from h1st) (1.3.0)
ERROR: Could not find a version that satisfies the requirement Ray==0.8.6 (from h1st) (from versions: none)
ERROR: No matching distribution found for Ray==0.8.6 (from h1st)
PS C:\users\VietNV\AppData\Local\Programs\Python\Python37-32\Scripts>

To Reproduce
Steps to reproduce the behavior:

  1. Go to Windows PowerShell
  2. Run "pip3 install h1st"
  3. See error

Expected behavior
If the h1st module does not work with 32-bit Python, please mention this in the user guide.

Desktop (please complete the following information):

  • OS: Win 10 Home
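
In the log above, pip reports "from versions: none" for Ray==0.8.6, which suggests that no matching distribution was available for this 32-bit interpreter. A user guide could include a quick check like the sketch below. The function names and the wording of the warning are illustrative assumptions.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def check_64bit():
    """Return a warning message on 32-bit Python, or None on 64-bit."""
    bits = interpreter_bits()
    if bits < 64:
        version = f"{sys.version_info.major}.{sys.version_info.minor}"
        return (
            f"Python {version} is a {bits}-bit build; "
            "some dependencies may not provide packages for it."
        )
    return None


if __name__ == "__main__":
    print(check_64bit() or "64-bit interpreter detected")
```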

Example notebooks can't be run

The third cell of examples/Forecasting/notebooks/forecast.ipynb:

prepared_data = m.prep_data(m.load_data())

This fails because data is missing.
FileNotFoundError: [Errno 2] No such file or directory: './data/train.csv'


In examples/AutomotiveCybersecurity/notebooks/Automotive Cybersecurity - Cold Start Problem.ipynb

DATA_LOCATION = "COMING-SOON"
df = pd.read_parquet('%s/train/attacks/20181113_Driver1_Trip1-0.parquet' % DATA_LOCATION)

This gets an OSError because there's no data.
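
Until the data is published, the notebooks could at least fail with a clearer message. The sketch below checks for the file before loading it; the helper name require_data and the message text are hypothetical and not part of the example notebooks.

```python
from pathlib import Path


def require_data(path):
    """Return `path` as a Path, or raise a descriptive error if it is missing."""
    data_path = Path(path)
    if not data_path.exists():
        raise FileNotFoundError(
            f"Example data not found at '{data_path}'. "
            "This example needs its dataset to be downloaded before it can run."
        )
    return data_path


if __name__ == "__main__":
    try:
        require_data("./data/train.csv")
    except FileNotFoundError as error:
        print(error)
```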

Implement integration of data exploration/analysis as a “plugin”

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Better installation guide for h1st user

Is your feature request related to a problem? Please describe.
I find the instructions difficult to follow when installing h1st on different machines.

  • OS: Windows, macOS (x86/M1), Linux
    The current documentation is here.
    What is expected: a step-by-step guide that is as easy to follow as the PyTorch installation guide.

Describe the solution you'd like
Learn from PyTorch and apply the same approach to h1st; that would dramatically improve the user experience when installing h1st. First impressions are crucial.

Describe alternatives you've considered
No

Additional context

  • My customer and I sometimes find it difficult to install the system dependencies required by h1st.

Remove fsspec in pyproject.toml

Is your feature request related to a problem? Please describe.

  • fsspec is listed in pyproject.toml but is unused. Even the Local Storage inside h1st (here) does not use it.

Describe the solution you'd like

  • Remove fsspec from pyproject.toml

Describe alternatives you've considered
No

Additional context

Feature request: Refactor Model/ Modeller

Problem Statement

Goals

Non-Goals

Proposed Solution

Deliverables

  • Design decision/documentation
  • Code implementation

Resource Plan

  • Owner: @ctn (design)
  • User: @vuonghoainam (Aitomatic k1st framework) and other customers
  • Member: @vuonghoainam

Add type annotations to h1st

Is your feature request related to a problem? Please describe.

  • It's standard for open source Python frameworks to have type annotations.
  • Users can add user-defined validators in advance to avoid headaches from running into the wrong data type.
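
As an illustration of both points, the sketch below combines type annotations with a user-defined validator that fails early on bad input. The TrainingBatch class and count_positive function are hypothetical and do not exist in h1st.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingBatch:
    """Hypothetical input container; not an h1st class."""

    features: List[List[float]]
    labels: List[int]

    def __post_init__(self) -> None:
        # User-defined validator: fail early on mismatched or mistyped data.
        if len(self.features) != len(self.labels):
            raise ValueError("features and labels must have the same length")
        if not all(isinstance(label, int) for label in self.labels):
            raise TypeError("labels must be integers")


def count_positive(batch: TrainingBatch) -> int:
    """Count the labels equal to 1."""
    return sum(1 for label in batch.labels if label == 1)


if __name__ == "__main__":
    print(count_positive(TrainingBatch([[0.1], [0.2]], [1, 0])))
```

The annotations are checked by static tools such as mypy, while the validator in __post_init__ runs at construction time; the two are complementary.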

Describe the solution you'd like

Describe alternatives you've considered

Additional context
No

H1st not installable on Py3.10

Currently, installing H1st with Python 3.10 fails with a Poetry-related error, even when the Python requirement range is extended to cover Python 3.10.

The error message says either ModuleNotFoundError: No module named 'distutils.util' or ModuleNotFoundError: No module named 'poetry'

"pip3 install h1st" using Python 3.9 does not work on Windows 10

Describe the bug
On Windows 10 with Python 3.9, running "pip3 install h1st" shows the following error:
ERROR: Command errored out with exit status 1:
command: 'c:\python39\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\VietNV\AppData\Local\Temp\pip-install-sak903ey\h1st\setup.py'"'"'; __file__='"'"'C:\Users\VietNV\AppData\Local\Temp\pip-install-sak903ey\h1st\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\VietNV\AppData\Local\Temp\pip-pip-egg-info-y9mm4mm8'
cwd: C:\Users\VietNV\AppData\Local\Temp\pip-install-sak903ey\h1st
Complete output (7 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\VietNV\AppData\Local\Temp\pip-install-sak903ey\h1st\setup.py", line 18, in <module>
long_description = f.read()
File "c:\python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 507: character maps to <undefined>
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Command Prompt
  2. Run "pip3 install h1st"
  3. See error

Expected behavior
It should install h1st successfully without any error.

Desktop (please complete the following information):

  • OS: win 10 home
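
The traceback shows setup.py reading the long description with the Windows default codec (cp1252), which has no character for byte 0x9d. A likely fix is to open the file with an explicit UTF-8 encoding, as sketched below. The helper names and the sample text are assumptions for illustration; the actual content of the h1st setup.py is not shown in this report.

```python
import tempfile
from pathlib import Path


def read_long_description(path):
    """Read a README as UTF-8 regardless of the platform's default encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def round_trip_demo():
    """Write text containing curly quotes as UTF-8 and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        readme = Path(tmp) / "README.md"
        # A right curly quote is E2 80 9D in UTF-8; 0x9D is undefined in cp1252.
        readme.write_text("\u201cPower Tools\u201d", encoding="utf-8")
        return read_long_description(readme)


if __name__ == "__main__":
    print(len(round_trip_demo()))
```

Passing encoding="utf-8" makes the read independent of the machine's locale, which is why the same setup.py can work on Linux and macOS yet fail on Windows without it.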

Universal way to work with files in remote object storage

Is your feature request related to a problem? Please describe.

  • Currently h1st uses s3fs, defined in pyproject.toml, and S3 storage.
  • Does h1st need support for GCS, Azure Blob, Aliyun OSS, etc.?
  • The vision of h1st is Power Tools for AI Engineers With Deadlines, so we should do this inside this project to boost data scientists' productivity.

Describe the solution you'd like

  • A simple but elegant client implementation for working with remote S3-compatible object storage in all clouds.
  • My suggestion is to integrate cloudpathlib, which is very actively maintained.
  • The installation should be split into cloud-specific extras, e.g. h1st[gcp], h1st[azure], h1st[aws], h1st[all]

Describe alternatives you've considered

  • The fsspec family
  • Not including a remote S3-compatible client inside h1st, although this option would need more examples?
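
To show how cloud-specific extras could surface to users, the sketch below maps a storage URI to the extra it would need. The scheme-to-extra mapping and the function name required_extra are assumptions for illustration; none of this exists in h1st today.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the proposed optional extra.
SCHEME_TO_EXTRA = {"s3": "aws", "gs": "gcp", "az": "azure"}


def required_extra(uri):
    """Return the extra needed for `uri`, or None for local paths."""
    scheme = urlparse(uri).scheme
    if scheme in ("", "file"):
        return None
    try:
        return SCHEME_TO_EXTRA[scheme]
    except KeyError:
        raise ValueError(f"Unsupported storage scheme: {scheme!r}") from None


if __name__ == "__main__":
    for uri in ["s3://bucket/model.pkl", "gs://bucket/model.pkl", "data/train.csv"]:
        extra = required_extra(uri)
        hint = f"pip install h1st[{extra}]" if extra else "no extra needed"
        print(uri, "->", hint)
```

A check like this lets the library raise an actionable install hint when a user passes a URI for a cloud whose extra is not installed.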

Additional context

Update documentation

Is your feature request related to a problem? Please describe.
I find it frustrating to use h1st, as I need to go into the test folder to find out how to use its features. This is wrong.

Describe the solution you'd like

  • Each modeller should have at least one working example, even if it just uses the Iris dataset
  • Add real examples?

Describe alternatives you've considered
TBD

Additional context

  • Improve how new and existing h1st users use it in actual projects

Remove tensorflow from h1st dependencies

What ?

Currently, h1st requires tensorflow = ">=2.10.0" as one of its dependencies, but we should remove tensorflow from the h1st dependency list.

Why ?

tensorflow is not easy to install correctly on all architectures/platforms

TensorFlow is a machine learning framework that supports multiple operating systems (Windows, Mac, Linux) and most Python versions. But for performance reasons, it also has to support many different hardware acceleration methods (CPU/GPU/TPU/NPU) and compute architectures (x86, ARM, IBM Power). To take advantage of all these platforms, it has to depend on hardware-specific frameworks and SDKs such as CUDA, cuDNN, TensorRT (for NVIDIA GPUs), MKL-DNN, oneAPI (for Intel CPUs), ROCm (for AMD CPUs/GPUs/APUs), Metal, Core ML (for Apple CPUs/GPUs/Neural Engine), and so on. These SDKs are often dynamically linked to the TensorFlow binary and require another level of dependency management of their own before TensorFlow can use them properly.

Building, maintaining, testing, and releasing all of these variations of the package is already a huge challenge for the TensorFlow team, and they have already had to defer the work to multiple third parties such as AWS, Intel, and NVIDIA. This makes it very difficult to install TensorFlow on multiple platforms; just take a quick look at the official installation instructions from TensorFlow. You can see that each platform has its own installation command: some use conda, some use pip, and some even have their own separate Python package.

It is crucial for users to install the correct variation of TensorFlow so they can take full advantage of their hardware, and there is no way for Poetry (our package manager) to accommodate all of those variations. We should therefore not force h1st users onto a generic tensorflow>=2.10.0 requirement.

tensorflow is a heavy package that is huge in size

Looking at the PyPI pages that list all of the tensorflow wheel files, we can see that the majority of the wheel files, across platforms and Python versions, are larger than 400 MB. If we count all the other dependencies that tensorflow requires, the total can easily reach 500 MB. After unpacking for installation, it can take close to a whole GB of space in most cases.

tensorflow does not get used very often, even internally at Aitomatic

Most of our work uses scikit-learn, not tensorflow. For more complex models, pytorch is increasingly being adopted, both by the wider world and by our own engineers, instead of tensorflow. So why make tensorflow a mandatory dependency of every single project that uses h1st when it just sits there as dead weight, costing us time, disk space, and network transfer?

What if users want to use their own tensorflow version?

Many legacy ML systems still use tensorflow<=1.14, so by requiring tensorflow = ">=2.10.0" in h1st, we lose compatibility with all of those systems and their potential users. This will slow down the adoption of H1st.

tensorflow can be safely removed from the h1st codebase altogether

If we look for where tensorflow is used in the h1st codebase, we can quickly see that it is only used for saving and loading MLModels that originate from TensorFlow.

This saving and loading logic can be implemented by h1st users to support any type of model that they need to save and load alongside each h1st MLModel. We already did this internally to support pytorch models as well. It is pretty easy to do ;)

How ?

We should remove tensorflow from the h1st dependency list, along with any saving and loading code for tensorflow.
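
For users who still want framework-specific save and load logic in their own code, one common pattern is to treat the framework as an optional import. The sketch below is only an illustration of that pattern; the helper names optional_import and require are hypothetical and not part of h1st.

```python
import importlib
import importlib.util


def optional_import(module_name):
    """Import and return a top-level module if it is installed, else None."""
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


def require(module_name):
    """Return the module, or raise an ImportError with an actionable message."""
    module = optional_import(module_name)
    if module is None:
        raise ImportError(
            f"'{module_name}' is not installed. Install the build that "
            "matches your platform, then retry."
        )
    return module


if __name__ == "__main__":
    # "json" ships with Python; a missing framework would return None instead.
    print(optional_import("json") is not None)
```

With this approach the framework is imported only when a model that needs it is actually saved or loaded, so projects that never touch it pay no installation cost.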

Any alternatives ?

In theory, one can provide a conda repo with every variation of the package, similar to what pytorch did here.

Even then:

  • It will still be limited by the platforms and architectures that conda supports.
  • It takes a massive amount of time and resources. Both Google and Meta, with their massive resources, have already given up on building, maintaining, and releasing every variation by themselves.

I think we should consider this alternative impractical and not bother trying it. Each user should decide for themselves how they want to install tensorflow and how they want to use it with h1st.

Additional context

Same suggestion was made in #105
