
iusztinpaul / energy-forecasting

789 stars · 14 watchers · 177 forks · 4.2 MB

🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials

Home Page: https://www.pauliusztin.me/courses/the-full-stack-7-steps-mlops-framework

License: MIT License

Python 95.71% Dockerfile 1.73% Shell 2.56%
airflow batch-processing feature-store github-actions mlops python cicd docker fastapi gcp

energy-forecasting's Introduction

🤖 Paul Iusztin


Senior Machine Learning Engineer โ€ข MLOps โ€ข Contractor ~ Helping startups engineer production-ready ML/AI systems.


About Me

I am a senior machine learning engineer and contractor with 5+ years of experience. I design and implement modular, scalable, and production-ready machine learning systems for startups worldwide.

🔥 My true passion is machine learning engineering.

💛 Secretly in love with software engineering.

🎨 I enjoy sharing my knowledge by creating content about designing and productionizing ML systems.

🪙 I am a blockchain and investing enthusiast.

👱🏻 Because I am not 100% a robot, I am also excited about self-development, psychology, cooking, hiking, skiing and my favorite 🐈‍⬛ cats.


👉 Check out The Full Stack 7-Steps MLOps Framework hands-on free course, where you will learn how to design, train, serve, and monitor an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials.


About my primary skills

Python · PyTorch · Pandas · NumPy · OpenCV · Linux · Docker · Git · AWS · PostgreSQL · Redis · FastAPI

โšซ๏ธ My top technologies โšซ๏ธ โšซ๏ธ My top interests โšซ๏ธ
โ€ข Python, SQL
โ€ข PyTorch, Scikit-Learn
โ€ข NumPy, Pandas
โ€ข AWS, GCP
โ€ข Docker, FastAPI, Airflow, Kafka, Spark
โ€ข DVC, Weights & Biases, MLFlow
โ€ข PostgreSQL, Elasticsearch, Redis
โ€ข MLOps
โ€ข generative AI
โ€ข computer vision
โ€ข recommender systems
โ€ข batch & online serving
โ€ข continuous training & monitoring
โ€ข REST API, gRPC & streaming design
โ€ข cloud & microservices
โ€ข distributed systems

💬 Do you need machine learning solutions for your business? Let's discuss!


Let's connect ↓

LinkedIn · Medium · Substack · Gmail · Twitter




↳ Subscribe to my weekly newsletter: Decoding ML


🎨 Creating content takes me a lot of time. If you enjoyed my work, you could support me by:

  1. joining Medium through my referral link, which supports me at no extra cost while giving you limitless access to Medium's rich collection of stories;
  2. buying me a coffee.


Thank you ✌🏼!

energy-forecasting's People

Contributors

fuenal, gao-hongnan, iusztinpaul, kevmo, kurtispykes


energy-forecasting's Issues

Error while running training-pipeline because it cannot find libomp.dylib

First of all, great work Paul!
I am running on Apple M1.
I was able to run the feature-pipeline without any issues. However, when I run
python -m training_pipeline.hyperparameter_tuning
as part of training-pipeline, I get the following error:

OSError: dlopen(/Users/vatulparakh/Library/Caches/pypoetry/virtualenvs/training-pipeline-8avOwJ3R-py3.9/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so, 0x0006): Library not loaded: '/usr/local/opt/libomp/lib/libomp.dylib'
Referenced from: '/Users/vatulparakh/Library/Caches/pypoetry/virtualenvs/training-pipeline-8avOwJ3R-py3.9/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so'
Reason: tried: '/usr/local/opt/libomp/lib/libomp.dylib' (no such file), '/usr/lib/libomp.dylib' (no such file)

It looks like it cannot find the libomp library, although I have this file in the following location:
/opt/homebrew/opt/libomp/lib
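
For anyone hitting the same wall: on Apple Silicon, Homebrew installs libomp under /opt/homebrew, while the LightGBM wheel hardcodes /usr/local/opt/libomp/lib/libomp.dylib. A minimal workaround sketch, assuming libomp is (or will be) installed via Homebrew; adjust the paths to your machine:

    # install libomp if it is missing
    brew install libomp

    # expose the ARM copy at the Intel path baked into lib_lightgbm.so
    sudo mkdir -p /usr/local/opt/libomp/lib
    sudo ln -sf /opt/homebrew/opt/libomp/lib/libomp.dylib /usr/local/opt/libomp/lib/libomp.dylib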

ml_pipeline DAG not running

Hi Paul,

Many thanks for the amazing course! The issue I encounter is that the ml_pipeline DAG is not running in Airflow (see screenshots).


When running 'docker compose up airflow-init', I don't get an error message, only a warning that I don't know how to handle (screenshot attached).

I would appreciate your support to solve this. Thanks so much in advance!

Improve Poetry Environment Setup and Documentation.

This potential PR introduces several improvements to the setup and documentation of the
Poetry environment:

  • Poetry Environment Setup Scripts: Added scripts to facilitate the
    setting up of the Poetry environment, making it easier for users to get
    started.

  • Documentation Update: Updated the README.md in the feature-pipeline
    directory to reflect the setup of the Poetry environment. This makes it
    easier for new users to understand how to set up and use the Poetry
    environment.

Here's one other proposal:

  • Hopsworks Project Name: Consider adding the Hopsworks project name to the .env
    file. I tried to create the project name as energy_consumption but, unexpectedly, Hopsworks
    said the project name is taken. Let me know if this proposal is a sensible change.

The changes can be viewed here and here for the scripts.

Is it possible to follow this course on a Windows machine?

I am a noob at Linux.

99% of everything I have done in Machine Learning has been via Anaconda on Windows (yes I know, also noob).

Can this course be done on Windows, or would it be a headache?

Do you recommend I follow this on a virtual Linux machine?

Homebrew overwriting Path, making poetry dysfunctional

Hi, thanks @iusztinpaul and @kurtispykes for all your hard work here and for helping newbies like me.

I am stuck at the installation step. I downloaded Poetry using curl -sSL https://install.python-poetry.org | python3 - and changed the PATH as well. It worked fine until I installed Homebrew to install Apache. This changed the PATH in .zprofile / .bashrc, and now I do not know how to go about this. Would appreciate any help here.

P.S. I got this error: zsh: command not found: poetry
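
A likely fix, assuming Poetry was installed with the official installer (which places the binary in ~/.local/bin): re-append Poetry's bin directory to the PATH after Homebrew's edits and reload the profile.

    # Homebrew's shellenv block may shadow earlier PATH entries;
    # re-add Poetry's install location at the end of the profile
    echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zprofile
    source ~/.zprofile
    poetry --version   # should print the Poetry version again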

[Apple M1][Fix] Installing hopsworks

👋

First of all, great job @iusztinpaul & @kurtispykes.
I've seen that this code was tested on Ubuntu. In case anyone tries to reproduce/run this on an Apple M1 chip, you might encounter some issues while installing the hopsworks pip package.

If you get this log:
(error log screenshot attached)


You need to do the following:

  1. Install librdkafka (librdkafka <-- confluent_kafka <-- hopsworks): brew install librdkafka
  2. Check which version brew installed: ls /opt/homebrew/Cellar/librdkafka
  3. Export this env variable: export C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/VERSION/include (replace VERSION with yours)
  4. Export this env variable: export LIBRARY_PATH=/opt/homebrew/Cellar/librdkafka/VERSION/lib (replace VERSION with yours)

Run poetry install or poetry add hopsworks, and it should be OK.
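
The steps above, collected into a single sketch (the VERSION lookup is automated with a shell glob; paths assume a default Homebrew install on Apple Silicon):

    # 1. install librdkafka (librdkafka <-- confluent_kafka <-- hopsworks)
    brew install librdkafka

    # 2-4. point the C toolchain at whichever version brew installed
    LIBRDKAFKA_PREFIX=$(ls -d /opt/homebrew/Cellar/librdkafka/* | tail -n 1)
    export C_INCLUDE_PATH="$LIBRDKAFKA_PREFIX/include"
    export LIBRARY_PATH="$LIBRDKAFKA_PREFIX/lib"

    # retry the install
    poetry add hopsworks   # or: poetry install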

[TypeError] Error while running Feature-pipeline with Airflow

Hi guys, I'm getting an error while running the feature-pipeline on Airflow. I've followed the instructions at https://github.com/iusztinpaul/energy-forecasting#run and got stuck here; the logs look like this:

[2023-10-26, 21:48:13 +07] {process_utils.py:182} INFO - Executing cmd: /tmp/venvyb53_u_a/bin/python /tmp/venvyb53_u_a/script.py /tmp/venvyb53_u_a/script.in /tmp/venvyb53_u_a/script.out /tmp/venvyb53_u_a/string_args.txt /tmp/venvyb53_u_a/termination.log
[2023-10-26, 21:48:13 +07] {process_utils.py:186} INFO - Output:
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - INFO:main:export_end_datetime = 2023-10-26 14:46:59
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - INFO:main:days_delay = 15
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - INFO:main:days_export = 30
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - INFO:main:url = https://drive.google.com/uc?export=download&id=1y48YeDymLurOTUO-GeFOUXVNc9MCApG5
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - INFO:main:feature_group_version = 1
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.pipeline:Extracting data from API.
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - WARNING:feature_pipeline.etl.extract:We clapped 'export_end_reference_datetime' to 'datetime(2023, 6, 30) + datetime.timedelta(days=days_delay)' as the dataset will not be updated starting from July 2023. The dataset will expire during 2023. Check out the following link for more information: https://www.energidataservice.dk/tso-electricity/ConsumptionDE35Hour
[2023-10-26, 21:48:18 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.etl.extract:Data already downloaded at: /opt/**/dags/output/data/ConsumptionDE35Hour.csv
[2023-10-26, 21:48:19 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.pipeline:Successfully extracted data from API.
[2023-10-26, 21:48:19 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.pipeline:Transforming data.
[2023-10-26, 21:48:19 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.pipeline:Successfully transformed data.
[2023-10-26, 21:48:19 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.pipeline:Building validation expectation suite.
[2023-10-26, 21:48:19 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.pipeline:Successfully built validation expectation suite.
[2023-10-26, 21:48:19 +07] {process_utils.py:190} INFO - INFO:feature_pipeline.pipeline:Validating data and loading it to the feature store.
[2023-10-26, 21:48:19 +07] {process_utils.py:190} INFO - Connected. Call .close() to terminate connection gracefully.
[2023-10-26, 21:48:21 +07] {process_utils.py:190} INFO -
[2023-10-26, 21:48:21 +07] {process_utils.py:190} INFO -
[2023-10-26, 21:48:21 +07] {process_utils.py:190} INFO - UserWarning: The installed hopsworks client version 3.2.0 may not be compatible with the connected Hopsworks backend version 3.4.1.
[2023-10-26, 21:48:21 +07] {process_utils.py:190} INFO - To ensure compatibility please install the latest bug fix release matching the minor version of your backend (3.4) by running 'pip install hopsworks==3.4.*'
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO -
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/140438
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - Connected. Call .close() to terminate connection gracefully.
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - Traceback (most recent call last):
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/script.py", line 90, in <module>
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - res = run_feature_pipeline(*arg_dict["args"], **arg_dict["kwargs"])
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/script.py", line 81, in run_feature_pipeline
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - return pipeline.run(
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/lib/python3.9/site-packages/feature_pipeline/pipeline.py", line 61, in run
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - load.to_feature_store(
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/lib/python3.9/site-packages/feature_pipeline/etl/load.py", line 24, in to_feature_store
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - feature_store = project.get_feature_store()
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/lib/python3.9/site-packages/hopsworks/project.py", line 111, in get_feature_store
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - return connection(
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/lib/python3.9/site-packages/hsfs/decorators.py", line 35, in if_connected
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - return fn(inst, *args, **kwargs)
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/lib/python3.9/site-packages/hsfs/connection.py", line 178, in get_feature_store
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - return self._feature_store_api.get(util.rewrite_feature_store_name(name))
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/lib/python3.9/site-packages/hsfs/core/feature_store_api.py", line 35, in get
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - return FeatureStore.from_response_json(
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - File "/tmp/venvyb53_u_a/lib/python3.9/site-packages/hsfs/feature_store.py", line 109, in from_response_json
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - return cls(**json_decamelized)
[2023-10-26, 21:48:23 +07] {process_utils.py:190} INFO - TypeError: __init__() missing 3 required positional arguments: 'hdfs_store_path', 'featurestore_description', and 'inode_id'
[2023-10-26, 21:48:26 +07] {taskinstance.py:1937} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/decorators/base.py", line 221, in execute
return_value = super().execute(context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 395, in execute
return super().execute(context=serializable_context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 192, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 609, in execute_callable
result = self._execute_python_callable_in_subprocess(python_path, tmp_path)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 463, in _execute_python_callable_in_subprocess
raise AirflowException(error_msg) from None
airflow.exceptions.AirflowException: Process returned non-zero exit status 1.
__init__() missing 3 required positional arguments: 'hdfs_store_path', 'featurestore_description', and 'inode_id'
[2023-10-26, 21:48:26 +07] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=ml_pipeline, task_id=run_feature_pipeline, execution_date=20231026T144659, start_date=20231026T144705, end_date=20231026T144826
[2023-10-26, 21:48:26 +07] {standard_task_runner.py:104} ERROR - Failed to execute job 31 for task run_feature_pipeline (Process returned non-zero exit status 1.
__init__() missing 3 required positional arguments: 'hdfs_store_path', 'featurestore_description', and 'inode_id'; 3001)
[2023-10-26, 21:48:26 +07] {local_task_job_runner.py:228} INFO - Task exited with return code 1
[2023-10-26, 21:48:26 +07] {taskinstance.py:2778} INFO - 0 downstream tasks scheduled from follow-on schedule check
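
The UserWarning in the log above points at the likely cause: the client (hopsworks 3.2.0) is older than the backend (3.4.1), so FeatureStore.from_response_json receives fields it does not know. A hedged fix, following the warning's own suggestion (the exact patch release may differ):

    # align the client with the backend's minor version (3.4),
    # inside the environment the pipeline runs with
    pip install "hopsworks==3.4.*"
    # or, if the project is managed with Poetry:
    # poetry add "hopsworks==3.4.*"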

M1 still has issues with LightGBM

When I run bash scripts/install_poetry_macos_m1_chip.sh, I get the following error:

Error: Failed to load cask: librdkafka.rb
Cask 'librdkafka' is unreadable: wrong constant name #<Class:0x00000001362484e8>

When I run LightGBM, I get the following error:

Traceback (most recent call last):
  File "/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/bowen.chen/LLM/energy-forecasting/training-pipeline/training_pipeline/hyperparameter_tuning.py", line 18, in <module>
    from training_pipeline.models import build_model
  File "/Users/bowen.chen/LLM/energy-forecasting/training-pipeline/training_pipeline/models.py", line 1, in <module>
    import lightgbm as lgb
  File "/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/site-packages/lightgbm/__init__.py", line 8, in <module>
    from .basic import Booster, Dataset, Sequence, register_logger
  File "/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/site-packages/lightgbm/basic.py", line 110, in <module>
    _LIB = _load_lib()
  File "/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/site-packages/lightgbm/basic.py", line 101, in _load_lib
    lib = ctypes.cdll.LoadLibrary(lib_path[0])
  File "/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/ctypes/__init__.py", line 460, in LoadLibrary
    return self._dlltype(name)
  File "/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so, 0x0006): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
  Referenced from: <D21A7969-4567-3BC7-94ED-6A9E83AE9D78> /Users/bowen.chen/opt/anaconda3/envs/energy_forecasting_feature_pipeline/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so
  Reason: tried: '/usr/local/opt/libomp/lib/libomp.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/opt/libomp/lib/libomp.dylib' (no such file), '/usr/local/opt/libomp/lib/libomp.dylib' (no such file), '/usr/local/lib/libomp.dylib' (no such file), '/usr/lib/libomp.dylib' (no such file, not in dyld cache)

Is there anything I did wrong?
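
Two observations, hedged: librdkafka is a Homebrew formula, not a cask, so the cask error suggests the script's install command should drop --cask; and besides the symlink workaround from the earlier libomp issue, dyld can be pointed at Homebrew's libomp by leaf name. A sketch, assuming a default Homebrew install under /opt/homebrew:

    # install both as regular formulas
    brew install librdkafka libomp

    # dyld searches DYLD_LIBRARY_PATH by library leaf name before the
    # /usr/local path hardcoded into lib_lightgbm.so
    export DYLD_LIBRARY_PATH="/opt/homebrew/opt/libomp/lib:${DYLD_LIBRARY_PATH:-}"
    python -m training_pipeline.hyperparameter_tuning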

Problem with Poetry and twofish package

When trying to build a Poetry environment in the training_pipeline section of the project, I get the error shown below. It appears that "twofish" is a required dependency of one of the Poetry packages. This is preventing me from progressing in the course.

(error screenshot attached)
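
twofish is a C extension without prebuilt wheels for every platform, so its build typically fails when a compiler toolchain or the Python headers are missing. A hedged sketch of the usual remedy (package names assume Ubuntu; on macOS the Xcode command-line tools play the same role):

    # Linux: compiler + Python headers needed to build the twofish extension
    sudo apt-get update && sudo apt-get install -y build-essential python3-dev

    # macOS equivalent: xcode-select --install

    # then retry inside training-pipeline
    poetry install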

docker compose on M1 Mac

When running the line:

docker compose --env-file .env up --build -d

I am getting the error:

no matching manifest for linux/arm64/v8 in the manifest list entries

I believe this problem occurs because the code is written to run on an Ubuntu system, which defaults to a different image architecture than my Mac.

https://forums.docker.com/t/run-x86-intel-and-arm-based-images-on-apple-silicon-m1-macs/117123

When I try to add a line to the docker-compose.yaml file to specify a different platform, I get an error. Any ideas how to fix this?
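
One workaround that often helps on Apple Silicon, without editing docker-compose.yaml: force Docker to pull and build amd64 images and run them under emulation. A sketch (slower under QEMU, but functional):

    # make docker compose default every service to linux/amd64
    export DOCKER_DEFAULT_PLATFORM=linux/amd64
    docker compose --env-file .env up --build -d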

Task `run_hyperparameter_tuning` never ends

Thanks @iusztinpaul for answering my earlier question. I'm not sure all my steps are correct, so I'm creating this issue.
My pipeline blocks at the train_from_best_config task. When I read the log in Airflow, it says I should run hyperparameter tuning first. The error log is below:

[2023-10-08, 03:16:32 UTC] {process_utils.py:190} INFO -   File "/tmp/venv2ub9pbpz/script.py", line 40, in train_from_best_config
[2023-10-08, 03:16:32 UTC] {process_utils.py:190} INFO -     raise RuntimeError(
[2023-10-08, 03:16:32 UTC] {process_utils.py:190} INFO - RuntimeError: No best config found. Please run hyperparameter tuning first.
[2023-10-08, 03:16:35 UTC] {taskinstance.py:1943} ERROR - Task failed with exception

Then I changed the variable ml_pipeline_should_run_hyperparameter_tuning from False to True, and the task run_hyperparameter_tuning now runs instead of being skipped. But that task seems to never end (as I post this issue, it has been running for 2.5 hours). I tried marking this task as success, but then both tasks upload_best_config and train_from_best_config failed.
The log of the hyperparameter tuning task (it is still running) is:

[2023-10-08, 12:30:05 UTC] {process_utils.py:190} INFO - wandb: 🚀 View run at https://wandb.ai/mlops-cs2215-ch1701/energy_consumption/runs/c7y26bdp
[2023-10-08, 12:37:29 UTC] {process_utils.py:190} INFO - 2023-10-08 12:37:29,817 INFO: Validation MAPE: 0.14
[2023-10-08, 12:37:29 UTC] {process_utils.py:190} INFO - 2023-10-08 12:37:29,817 INFO: Mean fit time: 133.19 s
[2023-10-08, 12:37:29 UTC] {process_utils.py:190} INFO - 2023-10-08 12:37:29,817 INFO: Mean predict time: 11.20 s
[2023-10-08, 12:37:30 UTC] {process_utils.py:190} INFO - wandb: Waiting for W&B process to finish... (success).
[2023-10-08, 12:37:36 UTC] {process_utils.py:190} INFO - wandb: - 0.020 MB of 0.020 MB uploaded (0.000 MB deduped)
wandb: \ 0.020 MB of 0.024 MB uploaded (0.000 MB deduped)
wandb: | 0.024 MB of 0.024 MB uploaded (0.000 MB deduped)
wandb: / 0.024 MB of 0.024 MB uploaded (0.000 MB deduped)
wandb: 🚀 View run experiment_2023-10-08_12-29-58 at: https://wandb.ai/mlops-cs2215-ch1701/energy_consumption/runs/c7y26bdp
[2023-10-08, 12:37:36 UTC] {process_utils.py:190} INFO - wandb: Synced 5 W&B file(s), 1 media file(s), 0 artifact file(s) and 0 other file(s)
[2023-10-08, 12:37:36 UTC] {process_utils.py:190} INFO - wandb: Find logs at: ./wandb/run-20231008_123000-c7y26bdp/logs
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: Agent Starting Run: ibt7dj90 with config:
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	daily_season__manual_selection: ['day_of_week', 'hour_of_day']
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster__estimator__learning_rate: 0.1
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster__estimator__max_depth: 5
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster__estimator__n_estimators: 1000
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster__estimator__n_jobs: -1
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster__estimator__reg_lambda: 0.01
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster_transformers__window_summarizer__lag_feature__lag: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster_transformers__window_summarizer__lag_feature__mean: [[1, 24], [1, 48], [1, 72]]
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster_transformers__window_summarizer__lag_feature__std: [[1, 24], [1, 48]]
[2023-10-08, 12:37:40 UTC] {process_utils.py:190} INFO - wandb: 	forecaster_transformers__window_summarizer__n_jobs: 1
[2023-10-08, 12:37:41 UTC] {process_utils.py:190} INFO - wandb: WARNING Ignored wandb.init() arg project when running a sweep.
[2023-10-08, 12:37:41 UTC] {process_utils.py:190} INFO - wandb: WARNING Ignored wandb.init() arg entity when running a sweep.
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: wandb version 0.15.12 is available!  To upgrade, please run:
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb:  $ pip install wandb --upgrade
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: Tracking run with wandb version 0.14.2
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: Run data is saved locally in /opt/***/wandb/run-20231008_123741-ibt7dj90
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: Run `wandb offline` to turn off syncing.
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: Syncing run experiment_2023-10-08_12-37-40
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: ⭐️ View project at https://wandb.ai/mlops-cs2215-ch1701/energy_consumption
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: 🧹 View sweep at https://wandb.ai/mlops-cs2215-ch1701/energy_consumption/sweeps/5ir39mxy
[2023-10-08, 12:37:43 UTC] {process_utils.py:190} INFO - wandb: 🚀 View run at https://wandb.ai/mlops-cs2215-ch1701/energy_consumption/runs/ibt7dj90

When I mark it as success, the two failing tasks have the logs below:

[2023-10-08, 10:01:13 UTC] {process_utils.py:190} INFO -   File "/tmp/venvjc13ytim/script.py", line 37, in <module>
[2023-10-08, 10:01:13 UTC] {process_utils.py:190} INFO -     res = upload_best_config(*arg_dict["args"], **arg_dict["kwargs"])
[2023-10-08, 10:01:13 UTC] {process_utils.py:190} INFO -   File "/tmp/venvjc13ytim/script.py", line 34, in upload_best_config
[2023-10-08, 10:01:13 UTC] {process_utils.py:190} INFO -     best_config.upload(sweep_id=last_sweep_metadata["sweep_id"])
[2023-10-08, 10:01:13 UTC] {process_utils.py:190} INFO - TypeError: 'NoneType' object is not subscriptable
[2023-10-08, 10:01:14 UTC] {taskinstance.py:1943} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/decorators/base.py", line 221, in execute
    return_value = super().execute(context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 395, in execute
    return super().execute(context=serializable_context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 192, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 602, in execute_callable
    result = self._execute_python_callable_in_subprocess(python_path, tmp_path)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 463, in _execute_python_callable_in_subprocess
    raise AirflowException(error_msg) from None
airflow.exceptions.AirflowException: Process returned non-zero exit status 1.
'NoneType' object is not subscriptable
[2023-10-08, 10:01:14 UTC] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=ml_pipeline, task_id=upload_best_config, execution_date=20231008T060000, start_date=20231008T095947, end_date=20231008T100114
[2023-10-08, 10:01:14 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 121 for task upload_best_config (Process returned non-zero exit status 1.
'NoneType' object is not subscriptable; 10084)
[2023-10-08, 10:01:14 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 1
[2023-10-08, 10:01:14 UTC] {taskinstance.py:2784} INFO - 1 downstream tasks scheduled from follow-on schedule check

and

[2023-10-08, 10:02:36 UTC] {process_utils.py:190} INFO -   File "/tmp/venvg8qg89hr/script.py", line 40, in train_from_best_config
[2023-10-08, 10:02:36 UTC] {process_utils.py:190} INFO -     raise RuntimeError(
[2023-10-08, 10:02:36 UTC] {process_utils.py:190} INFO - RuntimeError: No best config found. Please run hyperparameter tuning first.
[2023-10-08, 10:02:37 UTC] {taskinstance.py:1943} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/decorators/base.py", line 221, in execute
    return_value = super().execute(context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 395, in execute

So where am I wrong, and how should I fix it? I've pasted the ml_pipeline graph below for clarity, in case you need it.
(DAG graph screenshot attached)

'feature_group_id' error while running pipeline.py

Hey Paul,
I am getting the following error while running
python3 -m feature_pipeline.pipeline

File "/Users/vatulparakh/Downloads/energy-forecasting/venv/lib/python3.9/site-packages/hsfs/feature_group.py", line 656, in expectation_suite
self._expectation_suite = ExpectationSuite(**expectation_suite)
TypeError: __init__() got an unexpected keyword argument 'feature_group_id'

This was running just fine last week. But now, I see this error. How do I resolve this?
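
This looks like the same client/backend drift as the Airflow TypeError issue above: the managed Hopsworks backend was upgraded and now returns fields (here feature_group_id) that the pinned client does not recognize, which would explain why it worked last week. A hedged sketch of the fix (the right version depends on what the backend reports):

    # upgrade the Hopsworks client stack to match the managed backend
    pip install --upgrade hopsworks
    # or pin the minor version the backend warning recommends, e.g.:
    # pip install "hopsworks==3.4.*"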

Permissions error in training_pipeline.hyperparameter_tuning

I've rerun the feature-pipeline steps and re-logged into wandb as my user with the API key in .env. I tried pre-creating the project with 'open' access rights, with the same effect.

python -m training_pipeline.hyperparameter_tuning

Connected. Call .close() to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/560278
Connected. Call .close() to terminate connection gracefully.
wandb: Currently logged in as: jodyhuntatx. Use wandb login --relogin to force relogin
wandb: ERROR Error while calling W&B API: permission denied (<Response [403]>)
Problem at: /home/demo/AIML/energy-forecasting/training-pipeline/training_pipeline/utils.py 117 init_wandb_run
wandb: ERROR It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 403: Forbidden)
Traceback (most recent call last):
File "/home/demo/.pyenv/versions/3.9.19/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/demo/.pyenv/versions/3.9.19/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/demo/AIML/energy-forecasting/training-pipeline/training_pipeline/hyperparameter_tuning.py", line 174, in
fire.Fire(run)
File "/home/demo/.cache/pypoetry/virtualenvs/training-pipeline-Z8Uy0jSz-py3.9/lib/python3.9/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/demo/.cache/pypoetry/virtualenvs/training-pipeline-Z8Uy0jSz-py3.9/lib/python3.9/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/demo/.cache/pypoetry/virtualenvs/training-pipeline-Z8Uy0jSz-py3.9/lib/python3.9/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/demo/AIML/energy-forecasting/training-pipeline/training_pipeline/hyperparameter_tuning.py", line 50, in run
y_train, _, X_train, _ = load_dataset_from_feature_store(
File "/home/demo/AIML/energy-forecasting/training-pipeline/training_pipeline/data.py", line 31, in load_dataset_from_feature_store
with init_wandb_run(
File "/home/demo/AIML/energy-forecasting/training-pipeline/training_pipeline/utils.py", line 117, in init_wandb_run
run = wandb.init(
File "/home/demo/.cache/pypoetry/virtualenvs/training-pipeline-Z8Uy0jSz-py3.9/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1206, in init
raise e
File "/home/demo/.cache/pypoetry/virtualenvs/training-pipeline-Z8Uy0jSz-py3.9/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1187, in init
run = wi.init()
File "/home/demo/.cache/pypoetry/virtualenvs/training-pipeline-Z8Uy0jSz-py3.9/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 786, in init
raise error
wandb.errors.CommError: It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 403: Forbidden)
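
A 403 from wandb.init() usually means the entity/project pair passed to the run does not belong to the logged-in account. A hedged checklist, assuming the .env keys are named WANDB_ENTITY / WANDB_PROJECT (verify the names against your own .env):

    # re-authenticate and confirm which account the API key belongs to
    wandb login --relogin

    # the entity must be YOUR username or team, not the course author's
    export WANDB_ENTITY="<your-wandb-username-or-team>"
    export WANDB_PROJECT="energy_consumption"
    python -m training_pipeline.hyperparameter_tuning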

Hopsworks can't create projects, can't find API keys

There seems to be a change in the Hopsworks UI: now I can't create new projects, as there's no Create Project button.
(screenshot attached)

Also, I tried to create an API key from here

(screenshot attached)

but I kept on getting this error

RestAPIError: Metadata operation error: (url: https://c.app.hopsworks.ai/hopsworks-api/api/project). Server response: 
HTTP code: 401, HTTP reason: Unauthorized, body: b'{"errorCode":320002,"errorMsg":"Api key not found"}', error code: 320002, error msg: Api key not found, user msg: 

Any ideas?

Error while running ml_pipeline DAG

@iusztinpaul
Getting the following error while running the ml_pipeline DAG. I am on Mac M1.

3fbd685c2f7e
*** Found local files:
*** * /opt/airflow/logs/dag_id=ml_pipeline/run_id=manual__2023-08-21T13:03:22.018425+00:00/task_id=run_feature_pipeline/attempt=1.log
[2023-08-21, 13:03:25 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: ml_pipeline.run_feature_pipeline manual__2023-08-21T13:03:22.018425+00:00 [queued]>
[2023-08-21, 13:03:25 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: ml_pipeline.run_feature_pipeline manual__2023-08-21T13:03:22.018425+00:00 [queued]>
[2023-08-21, 13:03:25 UTC] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
[2023-08-21, 13:03:25 UTC] {taskinstance.py:1327} INFO - Executing <Task(PythonVirtualenvDecoratedOperator): run_feature_pipeline> on 2023-08-21 13:03:22.018425+00:00
[2023-08-21, 13:03:25 UTC] {standard_task_runner.py:57} INFO - Started process 622 to run task
[2023-08-21, 13:03:25 UTC] {standard_task_runner.py:84} INFO - Running: ['***', 'tasks', 'run', 'ml_pipeline', 'run_feature_pipeline', 'manual__2023-08-21T13:03:22.018425+00:00', '--job-id', '7', '--raw', '--subdir', 'DAGS_FOLDER/ml_pipeline_dag.py', '--cfg-path', '/tmp/tmp1avv9lk']
[2023-08-21, 13:03:25 UTC] {standard_task_runner.py:85} INFO - Job 7: Subtask run_feature_pipeline
[2023-08-21, 13:03:25 UTC] {task_command.py:410} INFO - Running <TaskInstance: ml_pipeline.run_feature_pipeline manual__2023-08-21T13:03:22.018425+00:00 [running]> on host 3fbd685c2f7e
[2023-08-21, 13:03:25 UTC] {taskinstance.py:1547} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='' AIRFLOW_CTX_DAG_ID='ml_pipeline' AIRFLOW_CTX_TASK_ID='run_feature_pipeline' AIRFLOW_CTX_EXECUTION_DATE='2023-08-21T13:03:22.018425+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-08-21T13:03:22.018425+00:00'
[2023-08-21, 13:03:25 UTC] {process_utils.py:181} INFO - Executing cmd: /usr/local/bin/python -m virtualenv /tmp/venvh_7hgdul --system-site-packages --python=python3.9
[2023-08-21, 13:03:25 UTC] {process_utils.py:185} INFO - Output:
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - created virtual environment CPython3.9.2.final.0-64 in 574ms
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - creator CPython3Posix(dest=/tmp/venvh_7hgdul, clear=False, no_vcs_ignore=False, global=True)
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/***/.local/share/virtualenv)
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - added seed packages: pip==23.1, setuptools==67.6.1, wheel==0.40.0
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
[2023-08-21, 13:03:27 UTC] {process_utils.py:181} INFO - Executing cmd: /tmp/venvh_7hgdul/bin/pip install -r /tmp/venvh_7hgdul/requirements.txt
[2023-08-21, 13:03:27 UTC] {process_utils.py:185} INFO - Output:
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - Traceback (most recent call last):
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/bin/pip", line 5, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.cli.main import main
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/cli/main.py", line 9, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.cli.autocompletion import autocomplete
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/cli/autocompletion.py", line 10, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.cli.main_parser import create_main_parser
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/cli/main_parser.py", line 9, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.build_env import get_runnable_pip
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/build_env.py", line 19, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.cli.spinners import open_spinner
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/cli/spinners.py", line 9, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.utils.logging import get_indentation
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/utils/logging.py", line 29, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.utils.misc import ensure_dir
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/utils/misc.py", line 44, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from pip._internal.locations import get_major_minor_version
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/locations/__init__.py", line 66, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from . import _distutils
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - File "/tmp/venvh_7hgdul/lib/python3.9/site-packages/pip/_internal/locations/_distutils.py", line 20, in <module>
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - from distutils.cmd import Command as DistutilsCommand
[2023-08-21, 13:03:27 UTC] {process_utils.py:189} INFO - ModuleNotFoundError: No module named 'distutils.cmd'
[2023-08-21, 13:03:27 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/decorators/base.py", line 220, in execute
return_value = super().execute(context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 374, in execute
return super().execute(context=serializable_context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 181, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 575, in execute_callable
pip_install_options=self.pip_install_options,
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/python_virtualenv.py", line 99, in prepare_virtualenv
execute_in_subprocess(pip_cmd)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/process_utils.py", line 170, in execute_in_subprocess
execute_in_subprocess_with_kwargs(cmd, cwd=cwd)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/process_utils.py", line 193, in execute_in_subprocess_with_kwargs
raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['/tmp/venvh_7hgdul/bin/pip', 'install', '-r', '/tmp/venvh_7hgdul/requirements.txt']' returned non-zero exit status 1.
[2023-08-21, 13:03:27 UTC] {taskinstance.py:1350} INFO - Marking task as FAILED. dag_id=ml_pipeline, task_id=run_feature_pipeline, execution_date=20230821T130322, start_date=20230821T130325, end_date=20230821T130327
[2023-08-21, 13:03:27 UTC] {standard_task_runner.py:109} ERROR - Failed to execute job 7 for task run_feature_pipeline (Command '['/tmp/venvh_7hgdul/bin/pip', 'install', '-r', '/tmp/venvh_7hgdul/requirements.txt']' returned non-zero exit status 1.; 622)
[2023-08-21, 13:03:27 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 1
[2023-08-21, 13:03:27 UTC] {taskinstance.py:2653} INFO - 0 downstream tasks scheduled from follow-on schedule check
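
Note the version mix in the traceback: Airflow itself runs on Python 3.7 (/home/airflow/.local/lib/python3.7/...), while the operator builds a Python 3.9 virtualenv whose interpreter lacks distutils. A hedged sketch of one repair, assuming a Debian-based Airflow image (the package name may vary by base image); alternatively, rebuild the image so Airflow runs on the same Python 3.9 the DAG requests:

    # inside the Airflow image (e.g. a Dockerfile RUN step or docker exec):
    # provide distutils for the python3.9 used by PythonVirtualenvOperator
    apt-get update && apt-get install -y python3.9-distutils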


Lesson 3: poetry install of training-pipeline (0.1.0) fails

So I configured the PyPI credentials:

sudo apt install -y apache2-utils
pip install passlib
mkdir ~/.htpasswd
htpasswd -sc ~/.htpasswd/htpasswd.txt energy-forecasting
poetry config repositories.my-pypi http://localhost
poetry config http-basic.my-pypi energy-forecasting <password>

That gave me this; the password seems to be missing, but I'm not sure whether that's expected:

โฏ cat ~/.config/pypoetry/auth.toml
[http-basic.my-pypi]
username = "energy-forecasting"

I set up the PyPI server separately:

docker run -p 80:8080 -v ~/.htpasswd:/data/.htpasswd pypiserver/pypiserver:latest run -P .htpasswd/htpasswd.txt --overwrite

I built and published in training-pipeline, but poetry install in batch-predictions-pipeline failed:

Using python3 (3.9.10)
Installing dependencies from lock file

Package operations: 1 install, 0 updates, 0 removals

 • Installing training-pipeline (0.1.0): Failed

 RuntimeError

 Retrieved digest for link training_pipeline-0.1.0.tar.gz(md5:ddbaa...) not in poetry.lock metadata {'sha256:acf4ba6c...', 'sha256:b5db651b07...'}

 at ~/.local/share/pypoetry/venv/lib/python3.10/site-packages/poetry/installation/chooser.py:117 in _get_links
     113│
     114│             selected_links.append(link)
     115│
     116│         if links and not selected_links:
   → 117│             raise RuntimeError(
     118│                 f"Retrieved digest for link {link.filename}({h}) not in poetry.lock"
     119│                 f" metadata {hashes}"
     120│             )
     121│
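
The digest mismatch is a consequence of republishing the tarball with --overwrite: the private index now serves an artifact whose hash no longer matches what poetry.lock recorded. A hedged sketch of the usual recovery (repository name my-pypi as configured above):

    # drop Poetry's cached copy of the stale artifact
    poetry cache clear my-pypi --all

    # refresh the lock so it records the hash of the republished tarball
    poetry lock --no-update
    poetry install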

Lesson 1: feature pipeline

I am using WSL to go through the course, but I am getting stuck at `python -m feature_pipeline.pipeline`, which fails with
`ModuleNotFoundError: No module named '_bz2'`. I have tried reinstalling, but the error does not go away. Could it be that my Python is not installed correctly?
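
_bz2 is a C extension of the standard library, so this usually means the interpreter was compiled (e.g. by pyenv) before the bz2 headers were present. A hedged sketch for Ubuntu/WSL, assuming pyenv manages the Python version:

    # install the headers the interpreter needs at build time
    sudo apt-get update && sudo apt-get install -y libbz2-dev

    # rebuild the interpreter so _bz2 is compiled in
    pyenv install --force 3.9.18   # use whichever 3.9.x the course expects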

Permission error running DAG

Running the ml_pipeline DAG on airflow failed at the run_feature_pipeline stage.

[2023-07-10, 20:04:07 UTC] {process_utils.py:189} INFO -   Created wheel for pyhopshive: filename=PyHopsHive-0.6.4.1.dev0-py3-none-any.whl size=48570 sha256=cc58cad2e6520348f8a554bfec9d584a5de309dfce6d8e656801672923f0ed2d
[2023-07-10, 20:04:07 UTC] {process_utils.py:189} INFO -   Stored in directory: /home/***/.cache/pip/wheels/36/91/cc/cb4be3581fd6cd95b73d2b229d66c8975c2d0a1087fa8703b2
[2023-07-10, 20:04:07 UTC] {process_utils.py:189} INFO - Successfully built fire hopsworks avro hsml hsfs twofish thrift future pyhopshive
[2023-07-10, 20:04:08 UTC] {process_utils.py:189} INFO - Installing collected packages: wcwidth, twofish, pytz, pyhumps, pure-eval, ptyprocess, pickleshare, javaobj-py3, fastjsonschema, executing, dataclasses, confluent-kafka, backcall, zipp, urllib3, tzlocal, typing-extensions, traitlets, tqdm, toolz, termcolor, six, ruamel.yaml.clib, rpds-py, python-dotenv, pyparsing, PyMySQL, pygments, pycryptodomex, pycparser, pyasn1, prompt-toolkit, platformdirs, pexpect, parso, packaging, numpy, multidict, mock, mistune, markupsafe, jsonpointer, jmespath, idna, greenlet, future, fastavro, entrypoints, decorator, colorama, Click, charset-normalizer, certifi, avro, attrs, yarl, thrift, sqlalchemy, scipy, ruamel.yaml, requests, referencing, python-dateutil, pyasn1-modules, pyarrow, orderedmultidict, matplotlib-inline, jupyter-core, jsonpatch, jinja2, jedi, importlib-metadata, fire, cffi, asttokens, stack-data, pyjks, pandas, jsonschema-specifications, furl, cryptography, botocore, s3transfer, pyhopshive, jsonschema, Ipython, nbformat, boto3, altair, hsml, great_expectations, hsfs, hopsworks, feature_pipeline
[2023-07-10, 20:04:31 UTC] {process_utils.py:189} INFO - Successfully installed Click-8.1.4 Ipython-8.14.0 PyMySQL-1.1.0 altair-4.2.2 asttokens-2.2.1 attrs-23.1.0 avro-1.11.0 backcall-0.2.0 boto3-1.28.1 botocore-1.31.1 certifi-2023.5.7 cffi-1.15.1 charset-normalizer-3.2.0 colorama-0.4.6 confluent-kafka-1.9.0 cryptography-41.0.1 dataclasses-0.6 decorator-5.1.1 entrypoints-0.4 executing-1.2.0 fastavro-1.7.3 fastjsonschema-2.17.1 feature_pipeline-0.1.0 fire-0.5.0 furl-2.1.3 future-0.18.3 great_expectations-0.14.13 greenlet-2.0.2 hopsworks-3.2.0 hsfs-3.2.0 hsml-3.2.0 idna-3.4 importlib-metadata-6.8.0 javaobj-py3-0.4.3 jedi-0.18.2 jinja2-3.0.3 jmespath-1.0.1 jsonpatch-1.33 jsonpointer-2.4 jsonschema-4.18.0 jsonschema-specifications-2023.6.1 jupyter-core-5.3.1 markupsafe-2.0.1 matplotlib-inline-0.1.6 mistune-3.0.1 mock-5.0.2 multidict-6.0.4 nbformat-5.9.1 numpy-1.25.1 orderedmultidict-1.0.1 packaging-23.1 pandas-1.5.3 parso-0.8.3 pexpect-4.8.0 pickleshare-0.7.5 platformdirs-3.8.1 prompt-toolkit-3.0.39 ptyprocess-0.7.0 pure-eval-0.2.2 pyarrow-12.0.1 pyasn1-0.5.0 pyasn1-modules-0.3.0 pycparser-2.21 pycryptodomex-3.18.0 pygments-2.15.1 pyhopshive-0.6.4.1.dev0 pyhumps-1.6.1 pyjks-20.0.0 pyparsing-2.4.7 python-dateutil-2.8.2 python-dotenv-1.0.0 pytz-2023.3 referencing-0.29.1 requests-2.31.0 rpds-py-0.8.10 ruamel.yaml-0.17.17 ruamel.yaml.clib-0.2.7 s3transfer-0.6.1 scipy-1.11.1 six-1.16.0 sqlalchemy-2.0.18 stack-data-0.6.2 termcolor-2.3.0 thrift-0.16.0 toolz-0.12.0 tqdm-4.65.0 traitlets-5.9.0 twofish-0.3.0 typing-extensions-4.7.1 tzlocal-5.0.1 urllib3-1.26.16 wcwidth-0.2.6 yarl-1.9.2 zipp-3.16.0
[2023-07-10, 20:04:31 UTC] {process_utils.py:189} INFO - 
[2023-07-10, 20:04:31 UTC] {process_utils.py:189} INFO - [notice] A new release of pip is available: 23.1 -> 23.1.2
[2023-07-10, 20:04:31 UTC] {process_utils.py:189} INFO - [notice] To update, run: /tmp/venvx9zfic8n/bin/python -m pip install --upgrade pip
[2023-07-10, 20:04:31 UTC] {process_utils.py:181} INFO - Executing cmd: /tmp/venvx9zfic8n/bin/python /tmp/venvx9zfic8n/script.py /tmp/venvx9zfic8n/script.in /tmp/venvx9zfic8n/script.out /tmp/venvx9zfic8n/string_args.txt
[2023-07-10, 20:04:31 UTC] {process_utils.py:185} INFO - Output:
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - Traceback (most recent call last):
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/usr/lib/python3.9/pathlib.py", line 1312, in mkdir
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     self._accessor.mkdir(self, mode)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - FileNotFoundError: [Errno 2] No such file or directory: '/home/hud/projects/energy-forecasting/output'
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - 
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - During handling of the above exception, another exception occurred:
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - 
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - Traceback (most recent call last):
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/usr/lib/python3.9/pathlib.py", line 1312, in mkdir
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     self._accessor.mkdir(self, mode)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - FileNotFoundError: [Errno 2] No such file or directory: '/home/hud/projects/energy-forecasting'
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - 
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - During handling of the above exception, another exception occurred:
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - 
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - Traceback (most recent call last):
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/usr/lib/python3.9/pathlib.py", line 1312, in mkdir
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     self._accessor.mkdir(self, mode)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - FileNotFoundError: [Errno 2] No such file or directory: '/home/hud/projects'
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - 
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - During handling of the above exception, another exception occurred:
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - 
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - Traceback (most recent call last):
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/tmp/venvx9zfic8n/script.py", line 89, in <module>
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     res = run_feature_pipeline(*arg_dict["args"], **arg_dict["kwargs"])
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/tmp/venvx9zfic8n/script.py", line 59, in run_feature_pipeline
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     from feature_pipeline import utils, pipeline
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/tmp/venvx9zfic8n/lib/python3.9/site-packages/feature_pipeline/utils.py", line 5, in <module>
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     from feature_pipeline import settings
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/tmp/venvx9zfic8n/lib/python3.9/site-packages/feature_pipeline/settings.py", line 44, in <module>
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/usr/lib/python3.9/pathlib.py", line 1316, in mkdir
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     self.parent.mkdir(parents=True, exist_ok=True)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/usr/lib/python3.9/pathlib.py", line 1316, in mkdir
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     self.parent.mkdir(parents=True, exist_ok=True)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/usr/lib/python3.9/pathlib.py", line 1316, in mkdir
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     self.parent.mkdir(parents=True, exist_ok=True)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -   File "/usr/lib/python3.9/pathlib.py", line 1312, in mkdir
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO -     self._accessor.mkdir(self, mode)
[2023-07-10, 20:04:32 UTC] {process_utils.py:189} INFO - PermissionError: [Errno 13] Permission denied: '/home/hud'

So far, from what I've read, sudo chmod 777 ./logs ./plugins should solve any permission problems, but it didn't work here. Any ideas? /home/hud/projects/energy-forecasting is the root folder.
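
Since the failing mkdir targets /home/hud (a host path that does not exist inside the container), this looks like a UID/path mismatch rather than a logs/plugins permission problem. The official Airflow compose file reads AIRFLOW_UID for exactly this; a hedged sketch:

    # run the containers as your host user so bind-mounted paths are writable
    # (run from the directory containing docker-compose.yaml)
    echo "AIRFLOW_UID=$(id -u)" >> .env
    docker compose down && docker compose up --build -d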

Pipeline fails because requirements.txt cannot be installed

Hello, I am running the pipeline for the first time, following the guide at https://github.com/iusztinpaul/energy-forecasting#run. When I click the RUN button in Airflow, the pipeline fails at the first task: run_feature_pipeline. I located the log, and it says:

[2023-10-07, 03:57:40 UTC] {process_utils.py:190} INFO - [notice] A new release of pip is available: 23.2 -> 23.2.1
[2023-10-07, 03:57:40 UTC] {process_utils.py:190} INFO - [notice] To update, run: /tmp/venvhe33wlw9/bin/python -m pip install --upgrade pip
[2023-10-07, 03:57:40 UTC] {taskinstance.py:1943} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/decorators/base.py", line 221, in execute
    return_value = super().execute(context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 395, in execute
    return super().execute(context=serializable_context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 192, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 594, in execute_callable
    prepare_virtualenv(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/python_virtualenv.py", line 99, in prepare_virtualenv
    execute_in_subprocess(pip_cmd)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/process_utils.py", line 171, in execute_in_subprocess
    execute_in_subprocess_with_kwargs(cmd, cwd=cwd)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/process_utils.py", line 194, in execute_in_subprocess_with_kwargs
    raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['/tmp/venvhe33wlw9/bin/pip', 'install', '-r', '/tmp/venvhe33wlw9/requirements.txt']' returned non-zero exit status 1.
[2023-10-07, 03:57:40 UTC] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=ml_pipeline, task_id=run_feature_pipeline, execution_date=20231007T034621, start_date=20231007T035737, end_date=20231007T035740
[2023-10-07, 03:57:40 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 82 for task run_feature_pipeline (Command '['/tmp/venvhe33wlw9/bin/pip', 'install', '-r', '/tmp/venvhe33wlw9/requirements.txt']' returned non-zero exit status 1.; 69)
[2023-10-07, 03:57:40 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 1
[2023-10-07, 03:57:40 UTC] {taskinstance.py:2784} INFO - 0 downstream tasks scheduled from follow-on schedule check

Following this bug, I think the problem comes from the venv created by Airflow. Has anyone met this issue, and how can I fix it?
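
The traceback only shows pip's exit status; the first real error is printed earlier in the same task log. One way to surface it, sketched with an assumed host-side logs mount (adjust the path to wherever your compose file bind-mounts ./logs):

    # the actual pip failure is above the CalledProcessError in the task log
    grep -rn "ERROR" logs/dag_id=ml_pipeline/*/task_id=run_feature_pipeline/ | head -n 20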

Hopsworks project name

Hi, you mentioned naming the project energy_consumption specifically on Hopsworks, but when trying to create a project with that name, I get the message "This project name is already taken."

Can you help me out here? What can be done?
