
citylearn's Introduction

CityLearn

CityLearn is an open source Farama Foundation Gymnasium environment for the implementation of Multi-Agent Reinforcement Learning (RL) for building energy coordination and demand response in cities. A major challenge for RL in demand response is the ability to compare algorithm performance. Thus, CityLearn facilitates and standardizes the evaluation of RL agents such that different algorithms can be easily compared with each other.

[Figure: Demand-response]

Environment Overview

CityLearn includes energy models of buildings and distributed energy resources (DER) including air-to-water heat pumps, electric heaters and batteries. A collection of building energy models makes up a virtual district (a.k.a neighborhood or community). In each building, space cooling, space heating and domestic hot water end-use loads may be independently satisfied through air-to-water heat pumps. Alternatively, space heating and domestic hot water loads can be satisfied through electric heaters.

[Figure: CityLearn]

Installation

Install the latest release from PyPI with pip:

pip install CityLearn

Documentation

Refer to the docs for documentation of the CityLearn API.


citylearn's Issues

[BUG] Inconsistent update function

Issue Description

citylearn/agents/base.py calls update as:
self.update(observations, actions, rewards, next_observations, done=done)
but citylearn/agents/q_learning.py defines update as:
def update(self, observations: List[List[float]], actions: List[List[float]], reward: List[float], next_observations: List[List[float]])
which does not accept a done argument, resulting in this error:

File ~/.conda/envs/citytest310c/lib/python3.10/site-packages/citylearn/agents/base.py:155, in Agent.learn(self, episodes, deterministic, deterministic_finish, logging_level)
    153 # update
    154 if not deterministic:
--> 155     self.update(observations, actions, rewards, next_observations, done=done)
    156 else:
    157     pass

TypeError: TabularQLearning.update() got an unexpected keyword argument 'done'

This happens when running examples/citylearn_rlem23_tutorial.ipynb with a current version of CityLearn.

Expected Behavior

The example should work with a current version of CityLearn.

Actual Behavior

The above error.

Steps to Reproduce

Use the code in examples/citylearn_rlem23_tutorial.ipynb with citylearn 2.1.0

Environment

  • CityLearn version: 2.1.0
  • Operating System: linux
  • Python version: 3.10

Possible Solution

Either remove done=done from the call in base.py, or add back the done parameter that was removed in 3b562b9.
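A minimal sketch of the second option, assuming the q_learning.py method simply regains an optional done argument so the call in base.py resolves (the actual Q-table update logic is omitted):

from typing import List

class TabularQLearning:  # sketch only; the real class lives in citylearn/agents/q_learning.py
    def update(
        self, observations: List[List[float]], actions: List[List[float]],
        reward: List[float], next_observations: List[List[float]],
        done: bool = None,  # restored keyword so base.py's update(..., done=done) no longer fails
    ):
        # the existing Q-table update would remain unchanged here
        pass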

Other information

I realize the example is targeted to version 1.8.0, but I am not sure how you would use TabularQLearning in version 2.1.0 without triggering this issue.

Thanks.

[FEATURE REQUEST] Ability to override internal auto-definition of action and observation space limits with user defined limits

Is your feature request related to a problem? Please describe.
The method of internally estimating action and observation space limits, though generalized enough, does not always provide the best limits.

Describe the solution you'd like

  • observation and action space limits should be optional initialization parameters.

Describe alternatives you've considered
NIL

Additional context
NIL

Upload weight file in submission?

In last year's competition, we were allowed to submit an optional file for model weights and policy params. There is no information on this for this year's competition. Can we submit a pre-trained agent, i.e. include the weight file?

Add support for stable baselines3

As a CityLearn user, I want to be able to take advantage of Stable Baselines3's reliable implementations of RL algorithms to enable me to easily evaluate my environment on a diverse set of algorithms and benchmark their performance.

Changes can be made to the environment as long as the acceptance criteria below are met.

Acceptance Criteria

  • Setup works for the RL algorithms that make use of Box gym.space.
  • Setup works for n building environment when env.central_agent = True (single agent controls all buildings)
  • Setup works for n building environment when env.central_agent = False (independent multi-agent i.e. each building has its own agent and agents do not share information)
  • Setup does not disrupt the compatibility of the environment with CityLearn’s RBC, SAC, and MARLISA implementations in citylearn/agents.
  • The test_environment.py module runs without error.
  • The example.ipynb notebook runs without error.
  • The example.ipynb notebook provides an example implementation of using at least one of the Stable Baselines3 algorithms for n buildings in central and non-central agent scenarios.

References

  1. Using Custom Environments with Stable Baselines3.
  2. Stable Baselines3 RL Algorithms.
  3. Stable Baselines3 Developer Guide.

MARLISA agent states

Make sure net_electricity_consumption is always an active state when using MARLISA agents and that the agent algorithm is aware of the position of net_electricity_consumption in the list of observation values. This is important since net_electricity_consumption needs to be identified as the predicted value for the internal regression model.
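A minimal sketch of the kind of check this implies, assuming observation_names is the list of active observation names for one building:

def net_electricity_consumption_index(observation_names):
    # fail loudly if the observation is inactive; otherwise return its position
    # so the internal regression model can be pointed at the right column
    assert 'net_electricity_consumption' in observation_names, \
        'net_electricity_consumption must be an active observation for MARLISA'
    return observation_names.index('net_electricity_consumption')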

Question about state dimension

Hi, thank you for sharing this repo.

I was trying to experiment with the CityLearn environment and the MARLISA agent, and I found that the dimension of the states varies with different commands.

For example, env.observation_space.shape[0] returns 91; however, when I do env.reset() the state dimension is (28, 9), where I think the 9 is the number of buildings. Furthermore, if I save the states in the replay buffer of one single building, the dimension becomes 36.

I am quite confused: what state dimension was used in the CityLearn challenge, the MARLISA paper, etc.?

Battery degradation causing soc > 1.0

When the battery is already at 100%, the capacity at the next time step will be less than the capacity at which it reached 100% because of degradation at the previous time step, so the normalized soc > 1.0. When calculating the max input/output power from the power curve at normalized soc > 1.0, it outputs the value as if soc << 1.0 (maximum output). This will be the case until the soc loss from the loss coefficient brings the soc below the degraded capacity, which happens within a few time steps.

This is most obvious with a random action agent, but an intelligent agent could learn the behavior over time and just avoid sending large discharge actions when soc == 1.0.

To fix the bug, make sure this line always evaluates to 0.0 <= normalized_soc <= 1.0.
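A minimal sketch of that guard, assuming soc is the stored energy and capacity is the (possibly degraded) capacity used for normalization:

normalized_soc = min(max(soc / capacity, 0.0), 1.0)  # clamp into [0.0, 1.0] so the power curve lookup stays valid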

Battery.energy_balance() applies efficiency losses twice

Battery.energy_balance() starts with a call to super().energy_balance().

In both calls, efficiency penalties are applied to the energy balance, so they are applied twice. I think that is wrong, but it should be an easy fix in the Battery module.
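Illustrative arithmetic only (not the actual Battery code), assuming the efficiency penalty is meant to be applied once per charge:

efficiency = 0.9
requested_charge = 1.0                              # kWh requested
applied_once = requested_charge * efficiency        # 0.90 kWh, the intended balance
applied_twice = requested_charge * efficiency ** 2  # 0.81 kWh, what a double application produces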

[BUG] SAC Agent normalization AssertionError

Issue Description

When I run:
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.sac import SAC as RLAgent

dataset_name = 'baeda_3dem'
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=10)
model = RLAgent(env)
model.learn(episodes=2, deterministic_finish=True)
"

I get this:
"
obs: [-2.44929360e-16 1.00000000e+00 1.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 5.40640817e-01 8.41253533e-01
2.39775427e-01 -1.93590194e-08 6.18397155e-01 3.73208446e-01
0.00000000e+00 5.10551796e-01 0.00000000e+00 0.00000000e+00
4.57856874e-09 4.57856874e-09 4.57856874e-09 4.57856874e-09
4.95913909e-01 0.00000000e+00 0.00000000e+00 3.75607514e-01]
mean: None
std: None
"
with the trace back:
"---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_normalized_observations(self, index, observations)
230 try:
--> 231 return (np.array(observations, dtype = float) - self.norm_mean[index])/self.norm_std[index]
232 except:

TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'

During handling of the above exception, another exception occurred:

AssertionError Traceback (most recent call last)
in
5 env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=10)
6 model = RLAgent(env)
----> 7 model.learn(episodes=2, deterministic_finish=True)
8

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/base.py in learn(self, episodes, keep_env_history, env_history_directory, deterministic, deterministic_finish, logging_level)
139
140 while not self.env.done:
--> 141 actions = self.predict(observations, deterministic=deterministic)
142
143 # apply actions to citylearn_env

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in predict(self, observations, deterministic)
188
189 if self.time_step > self.end_exploration_time_step or deterministic:
--> 190 actions = self.get_post_exploration_prediction(observations, deterministic)
191
192 else:

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_post_exploration_prediction(self, observations, deterministic)
204 for i, o in enumerate(observations):
205 o = self.get_encoded_observations(i, o)
--> 206 o = self.get_normalized_observations(i, o)
207 o = torch.FloatTensor(o).unsqueeze(0).to(self.device)
208 result = self.policy_net[i].sample(o)

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_normalized_observations(self, index, observations)
236 print('std:',self.norm_std[index])
237 print(self.time_step, self.standardize_start_time_step, self.batch_size, len(self.replay_buffer[0]))
--> 238 assert False
239
240 def get_encoded_observations(self, index: int, observations: List[float]) -> npt.NDArray[np.float64]:

AssertionError:
"

Expected Behavior

Please describe what you expected to happen.

Actual Behavior

Please describe what actually happened.

Steps to Reproduce

Please provide detailed steps to reproduce the issue.

Environment

  • CityLearn version: 2.0b2
  • Operating System: macOS
  • Python version: 3.8

Possible Solution

If you have any ideas for how to fix the issue, please describe them here.

Additional Notes

Please provide any additional information that may be helpful in resolving this issue.

question about how to integrate prediction and control model

I have a question about how to integrate prediction and control models together.

I think that with a prediction model we can measure the sensor and control inputs and then predict future energy consumption. For the control problem, we generate an optimal control decision based on the state observation. As we can see, there are two tracks, prediction and control:
https://www.aicrowd.com/challenges/neurips-2023-citylearn-challenge.

Based on my knowledge, I think the prediction task is to learn a simulator that could be used for RL training.

When I went through this demo:

from citylearn.agents.rbc import BasicRBC as RBCAgent
from citylearn.citylearn import CityLearnEnv, EvaluationCondition
import citylearn
dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000)
model = RBCAgent(env)
model.learn(episodes=4)

## print cost functions at the end of episode
kpis = model.env.evaluate(baseline_condition=EvaluationCondition.WITHOUT_STORAGE_BUT_WITH_PARTIAL_LOAD_AND_PV)
kpis = kpis.pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
print(kpis)
print(citylearn.data.DataSet.get_names())

I think it directly trains the model on the dataset. I am a bit confused: should it build a prediction model first and then train the RL agent? I think it is trained directly on the dataset here: https://github.com/intelligent-environments-lab/CityLearn/tree/master/citylearn/data. So, do we really need a prediction model?

AttributeError

I get an error when I run the example module for RBC and SAC: AttributeError: type object 'CostFunction' has no attribute 'net_electricity_consumption'.

When I run the quickstart example, it runs fine without any error.

Thank you

[BUG] CityLearnEnv.evaluate() raises AttributeError

Issue Description

Downloading the quickstart jupyter notebook resulted in an attribute error after running the second code cell. The error originates from the evaluate function, where, apparently, a building in the schema doesn't have the 'net_electricity_consumption_without_storage_and_partial_load' attribute.

Expected Behavior

Expected an output of the cost functions.

Actual Behavior

Got the following error:
CityLearnEnv.evaluate() raises AttributeError: 'Building' object has no attribute 'net_electricity_consumption_without_storage_and_partial_load'.

Steps to Reproduce

  1. pip install CityLearn==2.0.0

from citylearn.citylearn import CityLearnEnv
from citylearn.agents.rbc import BasicRBC as RBCAgent

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000)
model = RBCAgent(env)
model.learn(episodes=1)

# print cost functions at the end of episode
kpis = model.env.evaluate().pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)

Environment

  • CityLearn version: 2.0.0
  • Operating System: Windows 11
  • Python version: 3.8.5

Possible Solution

x

Additional Notes

Tested it outside of the jupyter notebook as well, and got the same error. I played around by testing some different agents, but it didn't make a difference.

Changing the dataset to 'baeda_3dem' got rid of the error. This gave me the impression the error likely has to do with the schema.json.

However, given that the error happened through basic use of the citylearn package I wouldn't be surprised if the mistake was on my part.

[FEATURE REQUEST] RLlib single-agent and multi-agent wrapper

Is your feature request related to a problem? Please describe.
I want to be able to use the RLlib library with CityLearn.

Describe the solution you'd like

  • Wrapper that provides interface between RLlib and CityLearn
  • Examples in quickstart.ipynb to use single-agent and multi-agent wrappers

Describe alternatives you've considered
NIL

Additional context
NIL

compatibility issue with latest gym (0.26.1) and stable-baselines3 (2.0.0)

Issue Description
I am encountering an error while running the code provided in the official CityLearn documentation. I have not modified a single line of the code, and I'm using the exact code snippet provided.

Expected Behavior
I expected the code to run without errors, as it's directly taken from the official CityLearn documentation.

Actual Behavior
I am facing the following error:
ValueError: not enough values to unpack (expected 2, got 1)

Steps to Reproduce
Install CityLearn (version 2.0b4), Stable Baselines3 (version 2.0.0), and Gym (version 0.26.1).
Run the code provided in the official CityLearn documentation: https://www.citylearn.net/quickstart.html

Code:
from stable_baselines3.sac import SAC
from citylearn.citylearn import CityLearnEnv
from citylearn.wrappers import NormalizedObservationWrapper, StableBaselines3Wrapper

dataset_name = 'baeda_3dem'
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000)
env = NormalizedObservationWrapper(env)
env = StableBaselines3Wrapper(env)
model = SAC('MlpPolicy', env)
model.learn(total_timesteps=env.time_steps*2)

# evaluate
observations = env.reset()

while not env.done:
    actions, _ = model.predict(observations, deterministic=True)
    observations, _, _, _ = env.step(actions)

kpis = env.evaluate().pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)

Environment
CityLearn version: 2.0b4
Operating System: Windows
Python version: 3.10

Possible Solution
I have tried various solutions and referred to the official documentation, but I am unable to find a compatible version combination that resolves this issue. I'm looking for guidance from the community.

Additional Notes
I'm following the instructions exactly as provided in the official documentation, so I'm puzzled as to why I'm encountering this issue. If anyone has experience with these libraries and can provide guidance or suggestions, I would greatly appreciate it.

Thank you for your time and assistance!

Question: Mapping discrete actions to continuous space

I've trained a PPO using 20 discrete actions (controlling electrical SOC) and I'm trying to explain what each action does. Is there a way to map them back to the continuous space [-1.0, 1.0]? I'm assuming action 0 is a full charge, equivalent to 1.0, and action 19 maps to -1.0, but how can I map out the intermediate values?

Thanks!
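One hedged way to recover the mapping, assuming the wrapper spreads the 20 bins evenly over [-1.0, 1.0] (whether bin 0 lands on -1.0 or +1.0 depends on how the discretization was set up):

def bin_to_continuous(index, n_bins=20, low=-1.0, high=1.0):
    # linear interpolation from bin index to the continuous action value
    return low + index * (high - low) / (n_bins - 1)

print(bin_to_continuous(0))   # -1.0
print(bin_to_continuous(10))  # ~0.05, an intermediate value
print(bin_to_continuous(19))  # 1.0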

One question about Marlisa training.

Thank you for sharing the repo. I have a question about reproducing the MARLISA results.
In the paper, it says:

MARLISA performed constrained random action exploration for the first 250 days of the simulation. Then, it performed an exploration-exploitation process using SAC to maximize the expected rewards and the entropy of the policy (for 300 more days). Finally, after 550 days into the simulation, MARLISA started to evaluate the stochastic policy deterministically (by choosing the mean value of the policy rather than sampling from it).

So, in order to reproduce MARLISA, I have to:

  1. Set 'start_training': 6000 and 'exploration_period': 6000 (250 days * 24 hours), and 'safe_exploration': False, since with True exploration is controlled by the RBC
  2. When the timestep is 13200 (the 550th day), set 'is_evaluate' = True so that MARLISA follows a deterministic policy

Am I right?

Error while running citylearn_sb3.py

On running the line
env = ss.pettingzoo_env_to_vec_env_v1(citylearn_pettingzoo_env)

I get the error as
AssertionError: observation spaces not consistent. Perhaps you should wrap with supersuit.aec_wrappers.pad_observations?

When I change it to
env = ss.pad_observations_v0(citylearn_pettingzoo_env)

creating the env does not throw any error,

but on calling the model, it throws this error:

File "C:\Users\anuj\Anaconda3\envs\city_challenge\lib\site-packages\stable_baselines3\common\vec_env\util.py", line 74, in obs_space_info
shapes[key] = box.shape

AttributeError: 'function' object has no attribute 'shape'

Does anyone know how we can train models from SB3 for CityLearn?
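For what it's worth, newer CityLearn releases expose wrappers that avoid the supersuit/PettingZoo path entirely; a quickstart-style sketch of that route (the same pattern appears elsewhere in this issue tracker, central-agent setup assumed):

from stable_baselines3.sac import SAC
from citylearn.citylearn import CityLearnEnv
from citylearn.wrappers import NormalizedObservationWrapper, StableBaselines3Wrapper

env = CityLearnEnv('citylearn_challenge_2022_phase_1', central_agent=True)
env = NormalizedObservationWrapper(env)  # scale observations before they reach SB3
env = StableBaselines3Wrapper(env)       # expose the single-agent gym interface SB3 expects
model = SAC('MlpPolicy', env)
model.learn(total_timesteps=env.time_steps * 2)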

High ram usage

Hi, I am interested in this project and I have tried running main.py. I experience high RAM usage that causes my whole computer to freeze. Is that normal? How do I make use of my CUDA-enabled GPU for computation? Is there a memory leak? Please advise, thank you. I am using Ubuntu 18.04.

[BUG]

Issue Description

In energy_model.py line 554, the assertion should be: assert 0 <= loss_coefficient <= 1, 'loss_coefficient must be >= 0 and <= 1.'

Testing

As a CityLearn developer, I want to know when a functionality in CityLearn fails after a change has been made to the source code or its dependencies' versions so that I can easily debug the problem and release a new CityLearn version.

Acceptance Criteria

  • Tests for all methods in the CityLearn Python package.
  • Workflow in the workflow directory that runs tests each time a push is made to the master branch and returns an output that can be checked to see failing tests.

References

  1. Getting Started With Testing in Python.
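A minimal pytest-style sketch of the kind of test these criteria describe, assuming the bundled 'citylearn_challenge_2022_phase_1' dataset, a list-of-spaces action_space and the four-value step API used by this CityLearn version:

from citylearn.citylearn import CityLearnEnv

def test_environment_steps_without_error():
    env = CityLearnEnv('citylearn_challenge_2022_phase_1', central_agent=True, simulation_end_time_step=24)
    observations = env.reset()

    while not env.done:
        actions = [space.sample() for space in env.action_space]  # random but valid actions
        observations, rewards, done, info = env.step(actions)

    assert env.done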

[BUG] citylearn.reward_function.MARL

Hi, I think there is a bug in citylearn.reward_function.MARL.

Issue Description

When I use the MARL reward function in citylearn.reward_function as my reward function, the bug appears at line 64 of that file.

Expected Behavior

It should give the maximum number between 0 and district_electricity_consumption.

Actual Behavior

TypeError: 'numpy.float64' object cannot be interpreted as an integer

Steps to Reproduce

Just replace the reward function with MARL, and the bug appears.

Environment

  • CityLearn version: 1.8.0
  • Operating System: Ubuntu 22.04
  • Python version: 3.8.0

Possible Solution

Simply add parentheses inside np.nanmax so it receives a single tuple argument:

reward = np.sign(building_electricity_consumption)*0.01*building_electricity_consumption**2*np.nanmax((0, district_electricity_consumption))

[BUG]

Issue Description

I'm getting an error when I try to reproduce the examples/tutorial.ipynb operation.

error

Traceback (most recent call last):
File "E:\PycharmProjects\citylearn\tutorail.py", line 709, in
_ = tql_model.learn(episodes=tql_episodes)
File "E:\PycharmProjects\citylearn\venv\lib\site-packages\citylearn\agents\base.py", line 150, in learn
next_observations, rewards, done, _ = self.env.step(actions)
File "E:\PycharmProjects\citylearn\venv\lib\site-packages\gym\core.py", line 319, in step
return self.env.step(action)
File "E:\PycharmProjects\citylearn\venv\lib\site-packages\gym\core.py", line 456, in step
return self.env.step(self.action(action))
File "E:\PycharmProjects\citylearn\venv\lib\site-packages\gym\core.py", line 456, in step
return self.env.step(self.action(action))
File "E:\PycharmProjects\citylearn\venv\lib\site-packages\gym\core.py", line 380, in step
observation, reward, terminated, truncated, info = self.env.step(action)
File "E:\PycharmProjects\citylearn\venv\lib\site-packages\gym\core.py", line 380, in step
observation, reward, terminated, truncated, info = self.env.step(action)
ValueError: not enough values to unpack (expected 5, got 4)

Environment

The environment is the same as the tutorial configuration.
Python: 3.9

Possible Solution

I tried to change the number of unpacked variables in base.py to match the source code, but it raises another error.

What should I do?
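What the traceback points at, in miniature: gym >= 0.26 wrappers unpack five values from step() (observation, reward, terminated, truncated, info), while this CityLearn release's environment still returns the older four-value tuple, so the unpack inside gym/core.py fails. A version-agnostic caller can be sketched as below, though the practical fix is usually to install the gym version this CityLearn release pins:

step_result = env.step(actions)

if len(step_result) == 5:   # new gym/gymnasium API
    observations, reward, terminated, truncated, info = step_result
    done = terminated or truncated
else:                       # old four-value API returned by this CityLearn version
    observations, reward, done, info = step_result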

Datasets for climate zones 1-4 are missing carbon_intensity.csv files

The datasets in climate zones 1-4 are missing carbon_intensity.csv files; thus, the env can't be initialized with climate zones 1-4. Also, the paper "MARLISA: Multi-Agent Reinforcement Learning with Iterative Sequential Action Selection for Load Shaping of Grid-Interactive Connected Buildings" says that the climate zone 2A dataset contains five years of data, but in fact there are no climate zone datasets containing more than five years in this project. I am confused about the datasets used in the papers. I would appreciate it if you could upload a dataset explanation document and the missing carbon_intensity.csv files.

[FEATURE REQUEST] Adding Vehicle batteries to the environment

Dear all,

First of all, thank you for creating and maintaining such an interesting open source OpenAI Gym environment for MARL as a way to standardize development in the area. I'm currently working on a V2G optimization multi-agent architecture and, while doing the state-of-the-art research, I came across CityLearn. As far as I understand, a significant set of assets is already implemented for the OpenAI Gym, including stationary batteries. But I think it would be interesting to add vehicle batteries and their specific modeling.

For example, adding specificities such as state of charge (SOC) on arrival, requested SOC of the EV at departure, requested departure hour, typical arrival and departure date time, maximum EV charger efficiency, among others. I think citylearn.energy_model.Battery already models a big part of the batteries, so I think adding V2G to the environment would be a very interesting step forward.

Are there any plans to implement such elements?

Best Regards,
Tiago Fonseca

[FEATURE REQUEST] Upgrade CityLearn to later Python version and gymnasium environment

Is your feature request related to a problem? Please describe.

  • I am unable to use CityLearn with its current setup with newer versions of stable-baselines3 and RLlib.
  • Also, later Python versions do not successfully install CityLearn given its current requirements, possibly limiting its reach and use in Colab for some users.

Describe the solution you'd like

  • Upgrade CityLearn to work with Python versions as new as 3.12
  • Upgrade the environment to a Gymnasium environment

Describe alternatives you've considered
NIL

Additional context
NIL

[FEATURE REQUEST] EV and EV charger model, and EV dataset for simulating EV loads

Is your feature request related to a problem? Please describe.
Need to be able to consider EV loads for V2G and G2V applications.

Describe the solution you'd like
Integrate the work by @calofonseca in a future CityLearn release:

  • Create a pull request to merge the latest stable branch in CityLearnEVs with the master branch in CityLearn.
  • Fill in the PR template and make sure to run tests confirming the code works before merging with the master branch.

Describe alternatives you've considered
NIL

Additional context
NIL

Experience bad reward

Hi, I am running main.py. I printed the reward and observed that the agent is getting inconsistent rewards. Is that normal, or am I missing something?

Reduce simulation and RL training time

As a CityLearn developer, I want to speed up the training of RL agents so that I can use fewer HPC resources for simulations, train for longer episodes and scale up my district size.

This enhancement only applies to the internally defined RL agents in CityLearn:

  1. sac.py
  2. marlisa.py

The citylearn.py, building.py and energy_model.py can also benefit from source code optimization for speed.

One approach can be to profile the simulation of the SAC agent example in example.ipynb.
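A small sketch of that profiling approach using the standard library, assuming the decentralized SAC setup used in the quickstart-style snippets elsewhere in this document:

import cProfile
import pstats

from citylearn.citylearn import CityLearnEnv
from citylearn.agents.sac import SAC as RLAgent

env = CityLearnEnv('citylearn_challenge_2022_phase_1', central_agent=False, simulation_end_time_step=1000)
model = RLAgent(env)

cProfile.run('model.learn(episodes=1)', 'sac_profile.stats')                 # profile one training episode
pstats.Stats('sac_profile.stats').sort_stats('cumulative').print_stats(20)  # print the 20 biggest hotspots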

Acceptance Criteria

  • Quantify the improvement in training between CityLearn v.1.4.4 and subsequent optimized version.

References

  1. Python Profilers: https://docs.python.org/3/library/profile.html
  2. How do I profile a Python script?: https://stackoverflow.com/questions/582336/how-do-i-profile-a-python-script

api error

When I run
python main.py
I get this error; it seems the interface is broken, please fix it:

Traceback (most recent call last):
File "/Users/matthewd/PycharmProjects/CityLearn/main.py", line 37, in
agents = Agent(**params_agent)
TypeError: init() got an unexpected keyword argument 'observation_spaces'

ModuleNotFoundError: No module named 'rewards'

Hi, when I create a CityLearn environment with env = CityLearnEnv(schema='.data/citylearn_challenge_2022_phase_1/schema.json'), it raises this error. Here is my CityLearn installation command: pip install git+https://github.com/intelligent-environments-lab/CityLearn.git@citylearn_2022

SB3 example error: observation spaces not consistent

Hi,

I tried to run the "citylearn_sb3.py" file, but received the error below:

File "anaconda3/envs/citylearn/lib/python3.8/site-packages/supersuit/vector/markov_vector_wrapper.py", line 22, in __init__
    assert all(
AssertionError: observation spaces not consistent. Perhaps you should wrap with `supersuit.aec_wrappers.pad_observations`?

There might be a problem with the supersuit version. Which version do you use?

Thank you.

Dimension error within the environment

Hi everyone,

I was doing some experimentation and I think I faced an issue that is outside of my control. When getting the reward from the env.step method I get this error:

ValueError: operands could not be broadcast together with shapes (8760,) (8761,) 

It was raised when trying to compute the net_electricity_consumption function (the original) in building.py at line 344.

Could anyone give me a hint about how to solve this issue?

Thanks in advance!

Fatal issue in Central Agent

s.append(building.sim_results[state_name][self.time_step])

When using the central agent, the line referenced above breaks the code because it can't recognize the electrical_storage_soc state. When modifying the line so that it accepts that state name, we get a state size of 102. But when doing env.reset(), we get a state size of 93. So electrical_storage_soc is missing from the state, which is expected.
However, in building_loader you have excluded electrical_storage_soc, which explains why adding the same conditional fixes the bug when creating the agent (the bug comes from reset), yet the state space is still 102 while env.reset() still returns 93.
Since you understand the environment better, I must be missing something silly.

Please lmk if you find the issue.

Where can we find a winner solution for CityLearn Challenge 2022

Hi,

I tried to find the solution on this website. However, when I clicked 'view' on the leaderboard winner and then clicked the repository link, the REPO_URL had expired and I couldn't find any solution.

Where can we find a winning solution for the CityLearn Challenge 2022, or solutions for previous challenges? Or is there a competition solution paper available?

Thank you!!

[BUG] env.observation_names does not provide names for the entire observation space

Issue Description

The list of observation names is smaller than the observation space returned by the environment. My observation space has 31 elements, but I only have names for 28, and I don't know which are unnamed.

Expected Behavior

I expect the environment would provide the name of each observation/feature in the observation space, so there'd be a name for every feature in the observation space.

Actual Behavior

The environment does not provide the name of each observation/feature in the observation space, as there are fewer names than features.

Steps to Reproduce

from citylearn.citylearn import CityLearnEnv
from citylearn.wrappers import NormalizedObservationWrapper, StableBaselines3Wrapper, DiscreteActionWrapper
from citylearn.data import DataSet

dataset_name = 'citylearn_challenge_2022_phase_1'
schema = DataSet.get_schema(dataset_name)
env = CityLearnEnv(schema, 
        central_agent=True, 
        buildings='building_1')
env = DiscreteActionWrapper(env)
env = NormalizedObservationWrapper(env)
env = StableBaselines3Wrapper(env)

print(len(env.observation_names)) #28
print(env.observation_space.shape[0])  #31

Environment

  • CityLearn version: 2.0b5
  • Operating System: Win 10
  • Python version: 3.10.12

Possible Solution

Sorry, no idea

Additional Notes

env.observation_names lists the observations which are active in the schema, but the env lists a larger observation space.

[FEATURE REQUEST] Create thermostat class

Is your feature request related to a problem? Please describe.
Need a way to define thermostat operation and schedules especially when there is occupant interaction.

Describe the solution you'd like
Create a thermostat class with basic functions like update setpoint, apply hold, sense occupancy, revert hold.
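A rough sketch of what such a class could look like; the names and fields below are assumptions for illustration, not an existing CityLearn API:

class Thermostat:
    def __init__(self, setpoint: float):
        self.setpoint = setpoint
        self._scheduled_setpoint = setpoint
        self.hold = False
        self.occupied = True

    def update_setpoint(self, setpoint: float):
        # schedule- or agent-driven setpoint change, ignored while a hold is active
        self._scheduled_setpoint = setpoint
        if not self.hold:
            self.setpoint = setpoint

    def apply_hold(self):
        self.hold = True  # freeze the current setpoint, e.g. after an occupant override

    def revert_hold(self):
        self.hold = False
        self.setpoint = self._scheduled_setpoint  # fall back to the scheduled setpoint

    def sense_occupancy(self, occupied: bool):
        self.occupied = occupied  # hook for occupant-interaction logic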

Describe alternatives you've considered
Implementing the thermostat logic directly in the building class when the setpoint is updated, but this does not generalize well, especially for custom implementations.

Additional context
NIL

compatibility issue

I created a python=3.11 environment, and when I run:

pip install CityLearn

it gives the following issue:

[error screenshots]

Pip package not properly working

Good morning,
I am working on a fully automated integration of CityLearn with StableBaselines3 and other Gym agents.

Installing citylearn with pip:
pip install citylearn
I have found several problems and I'd like to request some help.

Problem in testing data

The folders on the "data" folder:

  • citylearn_challenge_2020_climate_zone_1
  • citylearn_challenge_2020_climate_zone_2
  • citylearn_challenge_2020_climate_zone_3
  • citylearn_challenge_2020_climate_zone_4
  • citylearn_challenge_2021
  • citylearn_challenge_2022_phase_1

have the wrong name for the weather.csv file; it should be weather_data.csv according to the schema.json in the same folders.

Problem with observations,

Executing the environment created with

from citylearn.citylearn import CityLearnEnv
env = CityLearnEnv(schema="citylearn_challenge_2020_climate_zone_1")

There are NaN values at the end of every vector of the observation:

print(observation)
>>>
[
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 21.88, 40.35, 70.91, 11.447196, 1.0, 0.0, 0.0, 0.0, nan], 
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 22.94, 33.11, 11.41, 1.0, 0.0, 0.0, 0.0, nan], 
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 21.05, 43.22, 7.61, 0.9813014190740775, 0.0, 0.0, nan],
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 20.92, 41.61, 1.55, 3.815732, 0.9813886486921994, 0.0, 0.0, nan], 
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 22.57, 41.81, 16.8, 2.3848325, 1.0, 0.0, 0.0, 0.0, nan],
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 21.95, 43.22, 12.8, 1.907866, 1.0, 0.0, 0.0, 0.0, nan],
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 23.18, 41.62, 12.3, 0.9266858922799234, 0.0, 0.0, 0.0, nan],
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 22.94, 41.5, 21.0, 0.9904502885063199, 0.0, 0.0, 0.0, nan],
[1, 4, 9, 11.04, 16.74, 12.48, 7.76, 80.12, 55.33, 67.31, 85.33, 115.19, 164.94, 0.0, 88.31, 16.88, 413.68, 0.0, 450.45, 0.5453005045, 23.1, 41.84, 10.1, 1.0, 0.0, 0.0, 0.0, nan]
]

and that leads to the following error if I use it with stable_baselines3 PPO:

Traceback (most recent call last):
  File "\citylearn_playground\citylearn_sb3.py", line 70, in <module>
    agent.learn(total_timesteps=100)
  File "\citylearn_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\citylearn_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\citylearn_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 172, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "\citylearn_playground\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "\citylearn_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 590, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "\citylearn_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 606, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(mean_actions, self.log_std)
  File "\citylearn_playground\venv\lib\site-packages\stable_baselines3\common\distributions.py", line 153, in proba_distribution
    self.distribution = Normal(mean_actions, action_std)
  File "\citylearn_playground\venv\lib\site-packages\torch\distributions\normal.py", line 56, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "\citylearn_playground\venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 9)) of distribution Normal(loc: torch.Size([1, 9]), scale: torch.Size([1, 9])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan]], device='cuda:0')

Process finished with exit code 1

Differences between the repository and pip

The package installed with pip is considerably different from the one found in this repository. Is the pip package not "official"?
What is the recommended way to install CityLearn for usage?

Annex

This is the class I am using to transform the action and observation space for gym. If there is an official or better way, I would like to ask for a bit of help.

from citylearn.citylearn import CityLearnEnv
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
import gym
import numpy as np

class EnvCityGym(gym.Env):

    def __init__(self, env):
        self.env = env
        self.num_envs = 1
        # get the number of buildings
        self.num_buildings = len(env.action_spaces)
        self.act_lows = np.array([])
        self.act_highs = np.array([])
        for uid in env.buildings_states_actions:
            #print(env.buildings_states_actions[uid]["actions"])
            #print(sum(env.buildings_states_actions[uid]["actions"].values()))
            self.act_lows = np.concatenate((self.act_lows, np.array([-1] * sum(env.buildings_states_actions[uid]["actions"].values())),))
            self.act_highs = np.concatenate((self.act_highs, np.array([1] * sum(env.buildings_states_actions[uid]["actions"].values())),))
        # define action and observation space
        #log.debug(self.act_lows)
        #log.debug(self.act_highs)
        self.action_space = gym.spaces.Box(low=self.act_lows,
                                           high=self.act_highs, dtype=np.float32)

        self.obs_lows = np.array([])
        self.obs_highs = np.array([])
        for obs_box in env.observation_spaces:
            self.obs_lows = np.concatenate((self.obs_lows, obs_box.low))
            self.obs_highs = np.concatenate((self.obs_highs, obs_box.high))

        self.observation_space = gym.spaces.Box(low=self.obs_lows, high=self.obs_highs,
                                                dtype=np.float32)

    def reset(self):
        obs = self.env.reset()

        observation = self.get_observation(obs)

        return observation

    def get_observation(self, obs):
        obs_list = np.array([])
        for obs_box in obs:
            obs_list = np.concatenate((obs_list, obs_box))
        print(obs)
        #obs_list = np.nan_to_num(obs_list) #This removes the nan from the observation but does not solve the issue
        print(obs_list)
        return obs_list

    def step(self, action):
        action = [[act] for act in action]
        obs, reward, done, info = self.env.step(action)
        observation = self.get_observation(obs)
        return observation, sum(reward), done, info

    def render(self, mode='human'):
        return self.env.render(mode)



if __name__ == "__main__":
    import torch as th

    th.autograd.set_detect_anomaly(True)

    city_env = CityLearnEnv(schema="citylearn_challenge_2020_climate_zone_1")
    env = EnvCityGym(city_env)
    agent = PPO(policy=MlpPolicy, env=env)
    agent.learn(total_timesteps=100)


    state = env.reset()
    done = False

    action, coordination_vars = agent.select_action(state)
    while not done:
        next_state, reward, done, _ = env.step(action)
        action_next, coordination_vars_next = agent.select_action(next_state)
        coordination_vars = coordination_vars_next
        state = next_state
        action = action_next

    env.cost()

[FEATURE REQUEST] Is there a way to access the planned states and actions?

Is your feature request related to a problem? Please describe.
I am trying to test a pricing algorithm using CityLearn platform. I am wondering if there are ways to access the planned states and actions of each building, or whether the environment has planned states and actions?

Describe the solution you'd like

Describe alternatives you've considered
I have considered randomly initializing planned states for each building if the environment doesn't have them already.

Additional context
Thank you!

[BUG] stable-baselines3 version incompatible with gym environment

Issue Description


ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_23588\1249709747.py in
8 env = StableBaselines3Wrapper(env)
9 model = SAC('MlpPolicy', env)
---> 10 model.learn(total_timesteps=env.time_steps*2)
11
12 # evaluate

e:\Anaconda\envs\pc\lib\site-packages\stable_baselines3\sac\sac.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, progress_bar)
311 tb_log_name=tb_log_name,
312 reset_num_timesteps=reset_num_timesteps,
--> 313 progress_bar=progress_bar,
314 )
315

e:\Anaconda\envs\pc\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, progress_bar)
304 reset_num_timesteps,
305 tb_log_name,
--> 306 progress_bar,
307 )
308

e:\Anaconda\envs\pc\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py in _setup_learn(self, total_timesteps, callback, reset_num_timesteps, tb_log_name, progress_bar)
287 reset_num_timesteps,
288 tb_log_name,
--> 289 progress_bar,
290 )
291

e:\Anaconda\envs\pc\lib\site-packages\stable_baselines3\common\base_class.py in _setup_learn(self, total_timesteps, callback, reset_num_timesteps, tb_log_name, progress_bar)
422 assert self.env is not None
423 # pytype: disable=annotation-type-mismatch
--> 424 self._last_obs = self.env.reset() # type: ignore[assignment]
425 # pytype: enable=annotation-type-mismatch
426 self._last_episode_starts = np.ones((self.env.num_envs,), dtype=bool)

e:\Anaconda\envs\pc\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py in reset(self)
74 def reset(self) -> VecEnvObs:
75 for env_idx in range(self.num_envs):
---> 76 obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])
77 self._save_obs(env_idx, obs)
78 # Seeds are only used once

e:\Anaconda\envs\pc\lib\site-packages\stable_baselines3\common\monitor.py in reset(self, **kwargs)
81 raise ValueError(f"Expected you to pass keyword argument {key} into reset")
82 self.current_reset_info[key] = value
---> 83 return self.env.reset(**kwargs)
84
85 def step(self, action: ActType) -> Tuple[ObsType, SupportsFloat, bool, bool, Dict[str, Any]]:

e:\Anaconda\envs\pc\lib\site-packages\shimmy\openai_gym_compatibility.py in reset(self, seed, options)
239 )
240
--> 241 obs = self.gym_env.reset()
242
243 if self.render_mode == "human":

e:\Anaconda\envs\pc\lib\site-packages\gym\core.py in reset(self, **kwargs)
321 def reset(self, **kwargs) -> Tuple[ObsType, dict]:
322 """Resets the environment with kwargs."""
--> 323 return self.env.reset(**kwargs)
324
325 def render(

e:\Anaconda\envs\pc\lib\site-packages\gym\core.py in reset(self, **kwargs)
377 def reset(self, **kwargs):
378 """Resets the environment, returning a modified observation using :meth:self.observation."""
--> 379 obs, info = self.env.reset(**kwargs)
380 return self.observation(obs), info
381

e:\Anaconda\envs\pc\lib\site-packages\gym\core.py in reset(self, **kwargs)
321 def reset(self, **kwargs) -> Tuple[ObsType, dict]:
322 """Resets the environment with kwargs."""
--> 323 return self.env.reset(**kwargs)
324
325 def render(

e:\Anaconda\envs\pc\lib\site-packages\gym\core.py in reset(self, **kwargs)
321 def reset(self, **kwargs) -> Tuple[ObsType, dict]:
322 """Resets the environment with kwargs."""
--> 323 return self.env.reset(**kwargs)
324
325 def render(

e:\Anaconda\envs\pc\lib\site-packages\gym\core.py in reset(self, **kwargs)
377 def reset(self, **kwargs):
378 """Resets the environment, returning a modified observation using :meth:self.observation."""
--> 379 obs, info = self.env.reset(**kwargs)
380 return self.observation(obs), info
381

ValueError: not enough values to unpack (expected 2, got 1)

I ran quickstart.ipynb without making any changes and it throws this error. May I ask why?

The quickstart.ipynb is at https://github.com/intelligent-environments-lab/CityLearn/blob/master/examples/quickstart.ipynb

Expected Behavior

Environment

  • CityLearn version:2.0b3
  • Operating System:win11
  • Python version:1.7.16

@kingsleynweye Kingsley Nweye

Question regarding evaluation

Hello, thanks for providing this environment.

I took part in the 2022 challenge and I am looking at the 2021 environment. I'm confused about the right way to evaluate the agent. In the 2022 challenge we have the env.evaluate() function; in the 2021 environment it seems that the env.cost() method is used to evaluate the agent (from the challenge page), but it doesn't seem to exist anymore.

Do we have to use the cost functions in the citylearn.cost_function file and implement our own cost function?

Thank you !

[BUG] custom_module not found when trying to update the schema after defining a CustomReward class

Issue Description

After defining my own reward function and updating the schema in the source code following this link: https://www.citylearn.net/overview/reward_function.html?highlight=custom_module

I keep getting a "ModuleNotFoundError: No module named 'custom_module'" error when defining env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=WINDOW*14).

Expected Behavior

Please describe what you expected to happen.

Actual Behavior

Please describe what actually happened.

Steps to Reproduce

After following the above link, I run:

dataset_name = 'citylearn_challenge_2022_phase_1'
WINDOW = 24
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=WINDOW*14)

This would give the error
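For context, the documentation page linked above is assumed to register the reward in the schema as custom_module.CustomReward, so Python must be able to import a module literally named custom_module when the environment is constructed. One hedged workaround is to save the reward class to a custom_module.py file next to the notebook and make sure that directory is importable:

import os
import sys

# make the working directory importable so custom_module.py can be found
sys.path.insert(0, os.getcwd())

import custom_module  # should now resolve if custom_module.py sits next to the notebook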

Environment

  • CityLearn version: 2.0b2
  • Operating System: macOS 13.2.1 (22D68)
  • Python version:3.8.3

Possible Solution

If you have any ideas for how to fix the issue, please describe them here.

Additional Notes

Please provide any additional information that may be helpful in resolving this issue.

MARLISA Error

I am using the CityLearn MARLISA example with my data, and I get an error as you can see in the picture. I checked all my input data and there are no NaN or infinity values.
So, the NaN or infinity value must be produced at marlisa.py line 300, as seen in the error list.
I have worked on solving this error for several days but could not. Could you please help me with this? How can I fix this error?
[error screenshot]

CityLearn competition missing info on the online evaluation

Hi, I noticed that for the competition we are given building_info and observation_space in the online evaluation; you could, for example, see it here.

I added this in the OrderEnforcingWrapper

class OrderEnforcingAgent:
    """
    Emulates order enforcing wrapper in Pettingzoo for easy integration
    Calls each agent step with agent in a loop and returns the action
    """
    def __init__(self):
        self.num_buildings = None
        self.agent = UserAgent()
        self.action_space = None
    
    def register_reset(self, observation):
        """Get the first observation after env.reset, return action""" 
        action_space = observation["action_space"]
        self.action_space = [dict_to_action_space(asd) for asd in action_space]
        obs = observation["observation"]
        self.num_buildings = len(obs)

        print(f'building_info_in_keys : {"building_info" in observation.keys()}')

If I look at the logs I see this:

Warning: Gym version v0.24.1 has a number of critical issues with `gym.make` such that environment observation and action spaces are incorrectly evaluated, raising incorrect errors and warning . It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
/srv/conda/envs/notebook/lib/python3.8/site-packages/sklearn/linear_model/_least_angle.py:34: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  method='lar', copy_X=True, eps=np.finfo(np.float).eps,
/srv/conda/envs/notebook/lib/python3.8/site-packages/sklearn/decomposition/_lda.py:28: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  EPS = np.finfo(np.float).eps
/srv/conda/envs/notebook/lib/python3.8/site-packages/sklearn/ensemble/_gb.py:33: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  from ._gradient_boosting import predict_stages
2022-08-15 21:48:35.016 | INFO     | aicrowd_gym.clients.base_oracle_client:register_agent:210 - Registering agent with oracle...
2022-08-15 21:48:35.020 | SUCCESS  | aicrowd_gym.clients.base_oracle_client:register_agent:226 - Registered agent with oracle
building_info_in_keys : False
Device:cpu
building_info_in_keys : False
Device:cpu
building_info_in_keys : False
Device:cpu
building_info_in_keys : False
Device:cpu
building_info_in_keys : False
Device:cpu
building_info_in_keys : False
Device:cpu

Will they be added at some point in time, or will we have to use only what is passed?
Thank you in advance.

[BUG] The day_type returned by get_periodic_observation_metadata function in building.py might be wrong

Issue Description

Hi, I think the day_type range returned by the get_periodic_observation_metadata function in building.py might be wrong.

Current Code

def get_periodic_observation_metadata(self) -> Mapping[str, int]:
    r"""Get periodic observation names and their minimum and maximum values for periodic/cyclic normalization.

    Returns
    -------
    periodic_observation_metadata : Mapping[str, int]
        Observation low and high limits.
    """

    return {
        'hour': range(1, 25), 
        'day_type': range(1, 9), 
        'month': range(1, 13)
    }

Possible Solution

According to the description written in the documentation, day of week ranges from 1 (Monday) through 7 (Sunday). I think the correct range of day_type should be 1 to 7. This can be verified since I cannot find 8 in day_type when I check the building CSV files.

    def get_periodic_observation_metadata(self) -> Mapping[str, int]:
        r"""Get periodic observation names and their minimum and maximum values for periodic/cyclic normalization.

        Returns
        -------
        periodic_observation_metadata : Mapping[str, int]
            Observation low and high limits.
        """

        return {
            'hour': range(1, 25), 
            'day_type': range(1, 8), 
            'month': range(1, 13)
        }

TypeError when running model.learn(episodes=1, deterministic_finish=True)

Issue Description

When I try to reproduce the results from the Quickstart section (Decentralized-Independent SAC), I get a TypeError with this detailed message:

"---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
8 env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=1000)
9 model = RLAgent(env)
---> 10 model.learn(episodes=1, deterministic_finish=True)
11
12 # print cost functions at the end of episode

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/base.py in learn(self, episodes, keep_env_history, env_history_directory, deterministic, deterministic_finish, logging_level)
139
140 while not self.env.done:
--> 141 actions = self.predict(observations, deterministic=deterministic)
142
143 # apply actions to citylearn_env

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in predict(self, observations, deterministic)
183
184 if self.time_step > self.end_exploration_time_step or deterministic:
--> 185 actions = self.get_post_exploration_prediction(observations, deterministic)
186
187 else:

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_post_exploration_prediction(self, observations, deterministic)
199 for i, o in enumerate(observations):
200 o = self.get_encoded_observations(i, o)
--> 201 o = self.get_normalized_observations(i, o)
202 o = torch.FloatTensor(o).unsqueeze(0).to(self.device)
203 result = self.policy_net[i].sample(o)

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_normalized_observations(self, index, observations)
224 def get_normalized_observations(self, index: int, observations: List[float]) -> npt.NDArray[np.float64]:
225 # try:
--> 226 return (np.array(observations, dtype = float) - self.norm_mean[index])/self.norm_std[index]
227 # except:
228 # # print("unable to get normalized observations")

TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'"

Expected Behavior

Function get_normalized_observations is supposed to normalize the observations.

Actual Behavior

I printed norm_mean and norm_std within get_normalized_observations from sac.py and found that they are all None, from the initialization.

Steps to Reproduce

I just copied the code from Quickstart section:
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.sac import SAC as RLAgent

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=1000)
model = RLAgent(env)
model.learn(episodes=2, deterministic_finish=True)

Environment

  • CityLearn version: 1.8
  • Operating System: OS
  • Python version: 3

Possible Solution

If you have any ideas for how to fix the issue, please describe them here.

Additional Notes

Please provide any additional information that may be helpful in resolving this issue.
