nrel / flasc

A rich floris-driven suite for SCADA analysis

Home Page: https://nrel.github.io/flasc/

License: BSD 3-Clause "New" or "Revised" License

Python 2.82% Jupyter Notebook 97.18%
scada data analysis model validation floris wind farm control

flasc's Introduction

FLORIS-Based Analysis for SCADA Data (FLASC)

Note: Further documentation is available at https://nrel.github.io/flasc/

Description

FLASC provides a rich suite of analysis tools for SCADA data filtering, analysis, wind farm model validation, field experiment design, and field experiment monitoring.

The repository is built around NREL's in-house FLORIS wind farm model, available at https://github.com/nrel/floris. FLASC also relies heavily on the energy ratio, among other metrics, to quantify wake losses in synthetic and historical data, to calibrate turbine northing, and to estimate model parameters.

For technical questions or concerns, please email [email protected].

Installation

We recommend installing this repository in a separate virtual environment. After creating a new virtual environment, clone this repository to your local system and install it locally using pip. The command for this is pip install -e flasc.

If installing for development, follow the developer install instructions.

Documentation

Documentation is provided via the included examples folders as well as online documentation.

Engaging on GitHub

FLASC leverages the following GitHub features to coordinate support and development efforts:

  • Discussions: Collaborate to develop ideas for new use cases, features, and software designs, and get support for usage questions
  • Issues: Report potential bugs and well-developed feature requests
  • Projects: Include current and future work on a timeline and assign a person to "own" it

Your feedback is crucial in this environment, as it helps identify areas for enhancement, resolve issues, and ensure the project meets the needs of its users. By sharing your insights and suggestions, you contribute to the project's evolution and success.

Generally, the first entry point for the community will be within one of the categories in Discussions. Ideas is a great spot to develop the details for a feature request. Q&A is where to get usage support. Show and tell is a free-form space to show off the things you are doing with FLORIS.

License

BSD 3-Clause License

Copyright (c) 2024, Alliance for Sustainable Energy LLC, All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

flasc's People

Contributors

bartdoekemeijer, bayc, christiannvaughn, misi9170, paulf81, rafmudaf

flasc's Issues

conversion problem between pd.Timestamp and np.datetime64

I ran into a bug in flasc/optimization.py while running example a_06a, where I got the following error on line 112 in find_timeshift_between_dfs:
while current_time < max_time:

TypeError: Cannot compare tz-naive and tz-aware timestamps

pd.Timestamp objects generally have a reference to a timezone, while np.datetime64 objects do not.
In the first iteration this is not a problem as both current_time and max_time are defined from the timestamp in the dataframe which in the example are pd.Timestamp objects.
The problem arises in the while loop after current_time gets assigned a np.datetime64 in line 205.

I could fix this by changing the lines 109-112 to:

    current_time = np.datetime64(min_time)
    output_list = []
    print('Estimating required timeshift for df1.')
    while current_time < np.datetime64(max_time):
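An alternative to casting with np.datetime64, sketched here under stated assumptions, is to normalize every value to a tz-naive UTC pd.Timestamp before comparing. The helper name to_naive_utc is hypothetical and not part of FLASC, and it assumes that tz-naive inputs are already expressed in UTC.

```python
import numpy as np
import pandas as pd


def to_naive_utc(value):
    """Return `value` as a tz-naive pd.Timestamp expressed in UTC.

    Hypothetical helper, not part of FLASC. Tz-naive inputs are assumed
    to already be in UTC and are returned without a timezone shift.
    """
    ts = pd.Timestamp(value)
    if ts.tzinfo is not None:
        # Convert to UTC first, then drop the timezone information
        ts = ts.tz_convert("UTC").tz_localize(None)
    return ts


aware = pd.Timestamp("2020-01-01 01:00", tz="Europe/Amsterdam")
naive = np.datetime64("2020-01-01T00:30")

# An ordering comparison between tz-naive and tz-aware timestamps raises
try:
    pd.Timestamp(naive) < aware
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

# After normalization both values are tz-naive and can be compared
naive_is_later = to_naive_utc(naive) > to_naive_utc(aware)
```

Normalizing once, where min_time and max_time are defined, would keep the loop condition itself free of conversions.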

Change behavior for dealing with NaN/Null measurements

Code changed inadvertently in pandas->polars conversion. Propose to submit a pull request which reverts to former behavior for dealing with NaNs on turbines within a set of reference or test turbines. Specifically:

  • Change to default behavior that computes an average so long as any turbine is not nan
  • Allow an optional behavior to require all
  • Add 1-2 tests to ensure this behavior remains consistent going forward

Discussed in #119

Originally posted by Bartdoekemeijer September 5, 2023
The recent Polars merge in develop has added a specific change to how NaNs are dealt with.

When FLASC relied on Pandas, we defined pow_ref as the nanmean of the reference turbines, thus where NaNs were ignored and we just took the average of all non-NaN values. The same holds for pow_test. This means that if one reference/test turbine had a NaN value, that turbine would just be ignored and the power reference/test was just calculated with the remaining turbines.

Now with Polars, FLASC enforces that the measurements of all reference turbines must be valid. The same thing holds for all test turbines. This is achieved in energy_ratio.py and in energy_ratio_output.py through the command:

df_ = df_.filter(pl.all_horizontal(pl.col(ref_cols + test_cols + ws_cols + wd_cols).is_not_null()))

I think this is particularly (and unnecessarily) restrictive. Note that it's fine to keep NaNs in our energy ratio analysis, so long as those NaNs are mirrored in the FLORIS predictions, so that we're still comparing apples to apples. This is the case by default in interpolate_floris_from_df_approx, see:

mirror_nans=True,

My solution to this would be to either remove the is_not_null() filter command in energy_ratio.py and in energy_ratio_output.py, or just turn it into an option that is disabled by default (my preference). I could probably be persuaded to enable it by default.

What do you think, @paulf81, @misi9170?
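The two behaviors discussed above can be illustrated with a small pandas sketch. This is not the FLASC or Polars implementation; the function name mean_power, the require_all flag, and the column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def mean_power(df, cols, require_all=False):
    """Row-wise mean power over `cols` (hypothetical helper).

    require_all=False: nanmean-like, ignores NaN turbines in each row.
    require_all=True: returns NaN for any row where a turbine is NaN.
    """
    values = df[cols]
    mean = values.mean(axis=1, skipna=True)
    if require_all:
        # Keep the mean only where every turbine has a valid measurement
        mean = mean.where(values.notna().all(axis=1))
    return mean


df = pd.DataFrame(
    {
        "pow_000": [1.0, np.nan, np.nan],
        "pow_001": [3.0, 5.0, np.nan],
    }
)

pow_ref_any = mean_power(df, ["pow_000", "pow_001"])
pow_ref_all = mean_power(df, ["pow_000", "pow_001"], require_all=True)
```

In the second row only one turbine reports a value, so the default keeps that value while require_all=True discards the row.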

problem with default value in calc_floris_approx_table()

The function calc_floris_approx_table() in flasc/floris_tools.py takes ti_array as an input.
Its default value is None (which is also the value used in example a_07a, where I encountered the problem).

However, the function as written does not work unless ti_array contains at least one value.
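A common pattern for this kind of optional array argument is to resolve None to an explicit fallback before the array is used. The sketch below is generic and does not reproduce calc_floris_approx_table(); the helper name and the fallback turbulence intensity of 0.06 are assumptions chosen only for illustration.

```python
import numpy as np


def as_1d_array_or_default(values, default):
    """Return `values` as a 1-D float array, or `default` if `values` is None.

    Hypothetical helper, not part of FLASC.
    """
    if values is None:
        values = default
    return np.atleast_1d(np.asarray(values, dtype=float))


# Illustrative fallback only; an appropriate value depends on the site
ti_from_none = as_1d_array_or_default(None, default=0.06)
ti_from_list = as_1d_array_or_default([0.03, 0.08], default=0.06)
```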

[Feature]: Re-organize FLASC examples slightly

Description

FLASC examples are currently split into the root-level folders examples_artificial_data and examples_smarteole, but some files needed for these examples also live in flasc/examples, which breaks the convention of the self-contained examples folder in floris. Propose to reorganize a little for clarity.

A smaller point might be to number the notebooks in examples_artificial_data to clarify which should be run first.

Related URLs

No response

[BUG] return_index_mapping option needed in df_downsample

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

A recent feature that updated df_downsample() removed the return_index_mapping flag and the corresponding data_indices output, breaking this functionality for some users.

Expected Behavior

df_downsample() should take return_index_mapping as a flag and, when this flag is True, return an array data_indices that maps indices in the high resolution input data to indices in the downsampled output data.
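To make the expected output concrete, here is a minimal pandas sketch of downsampling with an index mapping. It is not the df_downsample() implementation; the function name, the fixed "time" column, and the use of a plain mean are assumptions.

```python
import numpy as np
import pandas as pd


def downsample_with_mapping(df, window="10min"):
    """Average `df` over fixed time windows and return an index mapping.

    Hypothetical sketch. data_indices[i] is the row of the downsampled
    frame that input row i contributes to.
    """
    # Label every input row with the start of the window it falls in
    window_start = df["time"].dt.floor(window)
    # Integer code per input row, plus the sorted unique window starts
    data_indices, unique_starts = pd.factorize(window_start, sort=True)
    df_out = df.drop(columns="time").groupby(data_indices).mean()
    df_out.insert(0, "time", unique_starts)
    return df_out, data_indices


df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2023-01-01 00:00",
                "2023-01-01 00:05",
                "2023-01-01 00:10",
                "2023-01-01 00:15",
            ]
        ),
        "ws_000": [1.0, 3.0, 5.0, 7.0],
    }
)

df_ds, data_indices = downsample_with_mapping(df, window="10min")
```

With the mapping available, a user can trace any downsampled row back to the high-resolution rows that produced it.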

Steps To Reproduce

No response

Environment

- OS:
- pip environment (can be retrieved with `pip list`):

Anything else?

No response

[Feature]: Reduce precision of dataframes

Description

FLASC often works with large amounts of data. This becomes particularly apparent if the user is working with 1Hz data rather than 10-minute averages. To reduce file size and memory usage, dataframes can be formatted to an appropriate precision.

Namely:

  • Almost all floats can be saved as 32-bit floats rather than 64-bit floats, since the additional precision will drown in measurement noise anyway.
  • According to @paulf81 strings are better saved as categories or chars, not just objects
  • True/False columns should be saved as booleans, not integers

This can generally reduce the size of the dataframes by typically 50%, sometimes more. A simple function can be written and added to dataframe_manipulations to reduce the precision of any dataframe.
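A minimal sketch of such a function is shown below. The name reduce_precision and the explicit bool_cols argument are assumptions; the columns that hold true/false flags are passed in explicitly rather than guessed from their values.

```python
import numpy as np
import pandas as pd
from pandas.api import types as ptypes


def reduce_precision(df, bool_cols=()):
    """Return a copy of `df` with more compact dtypes (hypothetical helper).

    - float columns are cast to 32-bit floats
    - string/object columns are cast to categories
    - columns listed in `bool_cols` are cast to booleans
    """
    out = df.copy()
    for col in out.columns:
        if col in bool_cols:
            out[col] = out[col].astype(bool)
        elif ptypes.is_float_dtype(out[col]):
            out[col] = out[col].astype(np.float32)
        elif ptypes.is_object_dtype(out[col]) or ptypes.is_string_dtype(out[col]):
            out[col] = out[col].astype("category")
    return out


df = pd.DataFrame(
    {
        "ws_000": [8.0, 9.5],
        "status": ["ok", "fault"],
        "is_operating": [1, 0],
    }
)
df_small = reduce_precision(df, bool_cols=["is_operating"])
```

Note that casting to 32-bit floats changes stored values slightly, so any exact-equality checks downstream would need a tolerance.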

Related URLs

No response

[Feature]: Rework examples around open Smarteole database

Description

A dataset from a wake steering campaign has been made publicly available. This data is provided by Engie from the Wind Farm Control campaign at Sole du Moulin Vieux and is available here:

https://zenodo.org/record/7342466

Further @ejsimley points us to code on how to download data from zenodo:

https://github.com/charlie9578/OpenOA/blob/feature/cubico-projects/examples/project_Cubico.py

The dataset should not need much filtering but can probably still be used to show some pre-processing syntax. The best opportunities, however, are for showing SCADA / FLORIS analysis using real data.

@Bartdoekemeijer wanted to flag you on this to bring

Related URLs

No response

[BUG] <smarteole example notebook 06 code cell 8 wd col name>

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

In the 06 example notebook in the examples_smarteole folder, cell 8 raises:


AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_11804\4103650034.py in ?()
1 # Limit the data to this region
----> 2 df_scada = df_scada[(df_scada.wd_smarteole > (start_of_offset - 20)) &
3 (df_scada.wd_smarteole < (end_of_offset + 20))]

c:\ProgramData\miniconda3\envs\flasc_tests\Lib\site-packages\pandas\core\generic.py in ?(self, name)
6200 and name not in self._accessors
6201 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6202 ):
6203 return self[name]
-> 6204 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'wd_smarteole'

Copilot suggests the column may have been renamed from 'wd_smarteole' to just 'wd'.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- pip environment (can be retrieved with `pip list`):

Anything else?

No response

[Feature]: Turn examples into a separate repository

Description

The way we work with FLASC is that we usually create one Git repository for each wind farm that we do analysis on. Taking an example repository for the Shell-owned OWEZ wind farm, we typically have a folder structure like:

wfc_owez
  - tests/
      - __init__.py
      - test_wfc_owez.py
  - wfc_owez/
      - __init__.py
      - wfc_owez_cc_floris_model.yaml
      - wfc_owez_gch_floris_model.yaml
      - models.py
      - sql_interface.py
      - windrose.py
  - raw_data_processing/
      - a_00_initial_download.py
      - a_01_to_common_format_df.py
      - a_02_basic_filters.py
      - a_03_check_sensor_stuck_faults.py
      - a_04_wspowercurve_filtering.py
      - a_05a_plot_faults_with_layout.py
      - a_05b_cross_compare_wd_measurement_calibrations.py
      - a_06a_determine_timeshift_datasources.py
      - a_06b_apply_timeshift_dfs.py
      - a_07a_estimate_wd_bias_per_turbine.py
      - a_07b_wd_bias_to_df.py
      - a_08_plot_energy_ratios.py
  - energy_ratio_analysis/
      - energy_ratio_in_turbine_array.py
      - energy_ratio_vs_wd.py
      - estimate_heterogeneity_from_energy_ratios.py
  - yield_calculations/
      - calculate_aep.py
  - .gitignore
  - setup.py
  - setup.cfg

Behind the scenes, we have often copied over this folder structure from an existing wind farm repository to create new repositories, but this has been hidden from the public. Our current examples folder insufficiently represents this structure and can make it complicated for new users to build well-structured repositories and analyses.

I would like to create a template repository like this, flasc_windfarm_example, which contains this folder structure and readily includes the examples. This would either completely replace the examples folder in flasc or supplement it. When creating a new repository, one can clone this repository and adjust it for the user's wind farm. This also gives us much greater freedom in the examples that we build. Specifically, since these repositories are Python installable with pip, it allows us to build classes and functions inside the models.py library that modularly construct, for example, FlorisInterface objects with different combinations of turbine curtailment. For example, consider a function that takes a timestamp and returns the right FlorisInterface object for that moment (e.g., to account for noise-reduced operation at certain times of day).

If we were to additionally make flasc available in PyPI, we can include any particular version of flasc in the requirements in the setup.py and the right version for that repo will automatically be installed for new users starting work on that repo.

@misi9170 and @paulf81, what do you think? And would this replace or supplement the current example folders?

Related URLs

No response

Add templates to issues

The issues are missing the helpful templates that exist on the FLORIS repo, this issue is just to mark we should add them

Decide on binning conventions for FLASC

Description

I was hoping we could use this issue to make a decision on binning conventions in FLASC. Or maybe we have one already and I just need to stick to it! But just in case, I had a proposal that can be incorporated within PR #102 where we use the following conventions everywhere:

  • Conventions apply to binning ws, wd, power, and time
  • If you supply a minimum value, a maximum value, and a step size, the binning assumes you gave the left most and right most edge, and the labels will be the center of every bin (even in time)
  • Or should time be an exception? If it is I prefer left-edge labels, although in FLASC I think we've used right-edge
  • Every function that accepts a min, max and step should alternatively accept the bin_edges directly, but the bin_labels will still be created as before
  • bins will not be an input because it can be more ambiguous than bin_edges
  • overlapped bins accomplished by reducing the bin size to some fraction of the original request and using a rolling operation instead of a groupby. Amount of overlap specified using an input like percentage overlap
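The edge and center-label convention proposed in the list above can be sketched as follows. The function name make_bins is hypothetical, and half-open bins of the form [left, right) are an assumption that the proposal does not specify.

```python
import numpy as np
import pandas as pd


def make_bins(minimum, maximum, step):
    """Return (bin_edges, bin_labels) for a min/max/step request.

    `minimum` and `maximum` are treated as the left-most and right-most
    edges, and each label is the center of its bin. Hypothetical helper.
    """
    n_bins = int(round((maximum - minimum) / step))
    bin_edges = minimum + step * np.arange(n_bins + 1)
    bin_labels = bin_edges[:-1] + step / 2.0
    return bin_edges, bin_labels


bin_edges, bin_labels = make_bins(0.0, 10.0, 2.0)

# Assign wind speeds to bins; right=False makes each bin [left, right)
ws = [0.0, 1.9, 2.0, 9.9]
ws_binned = pd.cut(ws, bins=bin_edges, labels=bin_labels, right=False)
```

Passing bin_edges directly to a function would skip the first step while still producing the same center labels.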

Interested to hear what you all think!

Related URLs

No response

[BUG] Failing tests due to pandas pd.concat

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The test df_time_operations.py is failing due to type errors. The issue has been traced to a pd.concat call in time_operations.py that is meant to add a column but instead appends rows containing that column.
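The difference between the two outcomes comes down to the axis argument of pd.concat. The snippet below is a standalone illustration with made-up column names, not the code from time_operations.py.

```python
import pandas as pd

df = pd.DataFrame({"ws_000": [8.0, 9.0]})
new_col = pd.DataFrame({"ws_001": [7.5, 8.5]})

# Default axis=0 stacks the frames vertically: rows are appended and the
# columns that do not overlap are filled with NaN
stacked = pd.concat([df, new_col])

# axis=1 aligns on the index and adds the new column side by side
side_by_side = pd.concat([df, new_col], axis=1)
```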

Expected Behavior

The test should pass.

Steps To Reproduce

Run pytest df_time_operations.py in the tests directory.

Environment

- OS: Ubuntu 20.04
- pip environment (can be retrieved with `pip list`):
  - pandas 1.4.4
  - numpy 1.21.5
  - scipy 1.8.0

Anything else?

None.

[Feature]: Include methods for calculating total uplift from a controller change

Description

The pull request #107 will convert the current set of energy ratio functions, but not yet incorporate methods for calculating total uplift (as opposed to the included uplift by wind direction). Total uplift is calculated in #66 but the plan is to reproduce equivalent calculations using polars after #107 is merged. This issue is meant to capture the intention to have this included for milestone

Related URLs

No response

[BUG] Dependency conflict between FLASC and OpenOA

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

FLASC currently requires pandas>=1.5. However, OpenOA requires pandas>=0.23.4,<1.3. This raises errors when trying to pip install packages that have a dependency on flasc. In practice, however, I do not experience any issues in FLASC when forcing OpenOA to use pandas==1.5.

Expected Behavior

There should be no dependency conflict.

Steps To Reproduce

  1. Create a new environment.
  2. pip install FLASC
  3. pip install a wind farm package (e.g., using cookiecutter) with a dependency on flasc

Environment

- OS:Ubuntu 20.04
- pip environment (can be retrieved with `pip list`):
`pandas==1.5.3`, `flasc==1.1`, `openoa==2.3`

Anything else?

No response

[Feature]: FLASC on PyPI

Description

I would like to have flasc available through PyPI such that we can include it and have it auto-installed through the setup.py Python module file, e.g., for #48.

Also, it would allow users to download flasc with a single call,
pip install flasc
pip install flasc==1.0

@rafmudaf were you the one to do this for floris? How complicated is this process?

Related URLs

No response

Update requirements

I think that the current requirements in the setup file and separate requirements doc were set by @Bartdoekemeijer to specify a not-too recent package for a few modules to avoid an issue in pandas. I think with the most recent version of pandas we can revert to specifying a minimum version, rather than a precise version or max version. @misi9170 would you take a look?

[Feature]: Increase usage of OpenOA

Description

OpenOA seems like a very mature tool for basic SCADA analysis and operations. FLASC currently has a set of tools that do very similar things as OpenOA and has minor dependencies on OpenOA, but could be much better integrated. Putting in this as a placeholder/reminder.

Related URLs

https://github.com/NREL/OpenOA

Take advantage of KATS outlier detection

Description

KATS (Kits to Analyze Time Series, https://facebookresearch.github.io/Kats/) has some outlier detection functionality that could be applied in our filtering steps without much trouble. This is not urgent, but I wanted to put a pin in the idea: using something like the OutlierDetector described here could be a useful thing to do someday:

https://github.com/facebookresearch/Kats/blob/main/tutorials/kats_202_detection.ipynb

Related URLs

No response

[Feature]: Add improved SQL interaction to FLASC

Description

Earlier versions of the polars conversion included some improvements to SQL interfacing which sped up downloads and made the GUI data explorer more responsive. A little out of scope for the current #107 but should be brought back in separately.

Related URLs

No response

[BUG] Polars versioning

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Having a small issue with polars versioning where versions above 0.19.5 are not able to run on my mac for reasons related to:

pola-rs/polars#12454

And starting with 0.19.13, CI is failing for a polars-test fail, but I can't debug locally because of the earlier issue. Would like to find a longer term solution, but for now setting the requirement to 0.19.5 and I think we can revisit in a few weeks.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- pip environment (can be retrieved with `pip list`):

Anything else?

No response

[Feature]: Detect and correct step changes in northing calibration

The northing calibration tools in FLASC currently optimize for a single wind direction bias across the entire history. It would be great to have a method for detecting when there has been a step change in the northing calibration, as can happen when a yaw encoder resets, and a time-dependent northing calibration correction.

Steps could be:

  • Detect periods of steady northing error and step changes in northing error (possibly using a single bias for all time stamps and outlier detection tools, see #36)
  • Determine biases for each identified period (using existing tools where possible, possibly by separating periods into distinct dataframes to apply the northing calibration methods)
  • Recombine dataframes if necessary
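As a rough illustration of the first two steps, the sketch below segments a series of bias estimates wherever consecutive values jump by more than a threshold, then takes the median per segment. It assumes a bias estimate per time window is already available; the function name and the 10 degree threshold are assumptions, and the plain median is only meaningful when the biases within a segment do not straddle the 0/360 degree wrap.

```python
import numpy as np


def segment_by_step_changes(bias_deg, threshold_deg=10.0):
    """Split a bias time series into segments at large jumps.

    Hypothetical sketch. Returns (segment_ids, segment_bias), where
    segment_ids[i] is the segment of sample i and segment_bias maps each
    segment id to the median bias within it.
    """
    bias_deg = np.asarray(bias_deg, dtype=float)
    # Wrap successive differences to [-180, 180) so 359 -> 1 counts as 2 deg
    diffs = (np.diff(bias_deg) + 180.0) % 360.0 - 180.0
    is_step = np.abs(diffs) > threshold_deg
    # Each detected step starts a new segment
    segment_ids = np.concatenate(([0], np.cumsum(is_step)))
    segment_bias = {
        int(seg): float(np.median(bias_deg[segment_ids == seg]))
        for seg in np.unique(segment_ids)
    }
    return segment_ids, segment_bias


bias_series = [2.0, 3.0, 2.5, 30.0, 31.0, 29.5]
segment_ids, segment_bias = segment_by_step_changes(bias_series)
```

The resulting segment ids could then be used to split the SCADA dataframe into periods before applying the existing northing calibration tools to each one.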

[Feature]: Estimate wind speed from power

Description

Not an immediate need, but I think an interesting feature would be to use the ws/cp tables in floris to provide a wind speed estimator based on reference power. There is a problem of ambiguity at higher wind speeds, but we could either mix in ws data or limit the estimate to below-rated conditions. I think it could be helpful sometimes, so I wanted to open an issue to log the idea.
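The below-rated variant of this idea can be sketched by inverting a power curve with linear interpolation. The example uses a made-up wind speed/power table rather than the floris ws/cp tables, and it assumes the table runs from cut-in to rated with strictly increasing power; the function name is hypothetical.

```python
import numpy as np


def estimate_ws_from_power(power, ws_table, power_table):
    """Estimate wind speed from power by inverting a power curve.

    Hypothetical sketch. `power_table` must be strictly increasing and end
    at rated power. Power at or above rated, or below the first table
    entry, has no unique wind speed and is returned as NaN.
    """
    power = np.asarray(power, dtype=float)
    ws_table = np.asarray(ws_table, dtype=float)
    power_table = np.asarray(power_table, dtype=float)

    ws_est = np.interp(power, power_table, ws_table)
    ambiguous = (power >= power_table[-1]) | (power < power_table[0])
    return np.where(ambiguous, np.nan, ws_est)


# Made-up power curve: wind speed in m/s, power in kW
ws_table = [3.0, 5.0, 7.0, 9.0, 11.0]
power_table = [0.0, 500.0, 1500.0, 3000.0, 5000.0]

ws_est = estimate_ws_from_power([1000.0, 5000.0], ws_table, power_table)
```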

Related URLs

No response

[BUG] Dealing with turbine curtailment in FLASC

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Currently, FLASC marks a turbine's individual measurements as faulty if that turbine is in curtailed operation. That ensures that all non-NaN measurements of that turbine follow the nominal power and thrust curve. However, measurements of turbines neighboring a curtailed turbine remain unaffected in the FLASC analyses. In reality, their power production can be significantly affected because curtailed turbines shed shallower wakes.

If we compare this to FLORIS model predictions which do not account for curtailment, we are not drawing fair comparisons.

Expected Behavior

A solution to this would be to mark the measurements of all turbines as faulty if at least one turbine is in curtailed operation. However, if multiple turbines regularly demonstrate curtailment, this could potentially eliminate a large share of your dataset. A less restrictive solution could be to mark only the measurements of the curtailed turbine and all turbines downstream of it as NaNs.
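The first, most restrictive option can be sketched in a few lines of pandas. The function and column names below are assumptions, and the sketch presumes that a boolean curtailment flag per turbine is already available.

```python
import numpy as np
import pandas as pd


def mask_rows_with_curtailment(df, power_cols, curtailed_cols):
    """Set all power measurements to NaN in rows where any turbine is curtailed.

    Hypothetical sketch, not part of FLASC.
    """
    out = df.copy()
    any_curtailed = out[curtailed_cols].any(axis=1)
    out.loc[any_curtailed, power_cols] = np.nan
    return out


df = pd.DataFrame(
    {
        "pow_000": [100.0, 200.0],
        "pow_001": [150.0, 250.0],
        "is_curtailed_000": [False, True],
        "is_curtailed_001": [False, False],
    }
)

df_masked = mask_rows_with_curtailment(
    df,
    power_cols=["pow_000", "pow_001"],
    curtailed_cols=["is_curtailed_000", "is_curtailed_001"],
)
```

The less restrictive variant would replace the row-wide mask with a per-turbine mask that depends on wind direction and layout, which this sketch does not attempt.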

Another solution is to create FLORIS models for each curtailment situation and generate FLORIS predictions for the SCADA in accordance with which turbines are downrated and which moments in time. This requires knowing when which turbines are in curtailed operation.

It's worthwhile thinking about the best way to approach this.

Steps To Reproduce

N/A

Environment

N/A

Anything else?

No response

[BUG] Documentation is out of date and misleading

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

FLASC documentation has fallen behind and has not been updated. The description of the examples is inaccurate or incomplete, and some documentation (particularly pertaining to the energy_ratio module) needs to be updated to reflect changes made in #80.

Expected Behavior

Updated documentation (and possibly README) to reflect current status of code.

[Feature]: Dealing with faulty (non-circular) averaging in SCADA

Description

FLASC currently assumes that the wind direction in the underlying SCADA data has been correctly time-averaged, i.e., that 360 deg wrapping has been handled for angular variables like wind direction and nacelle heading. However, this is often not the case when working with 10-minute averaged data from an operational asset. It would be great if FLASC could find these obvious outliers and mark them as faulty.
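The failure mode is easy to reproduce with two samples on either side of north. The helper below computes a circular mean from unit vectors; its name is an assumption and it is shown only to contrast with the arithmetic mean.

```python
import numpy as np


def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees, returned in [0, 360)."""
    rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Average the unit vectors, then convert the mean vector back to an angle
    mean_rad = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return np.rad2deg(mean_rad) % 360.0


wd_samples = [350.0, 10.0]

arithmetic = float(np.mean(wd_samples))   # naive average, points south
circular = float(circular_mean_deg(wd_samples))  # close to north (0 or 360)
```

A 10-minute record whose reported mean sits near 180 deg while neighboring records and turbines report near-north directions is a typical signature of non-circular averaging.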

Related URLs

No response

Remove confidence interval label from energy ratio plot

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Currently, energy ratio plots are overly cluttered by the labels on confidence interval bands. Opening this issue to stop labeling them.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- pip environment (can be retrieved with `pip list`):

Anything else?

No response

[BUG] Remove codecov bric-a-brac

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Codecov is always causing tests to fail with pull requests for issues I think pertaining more to github.com/codecov than anything we could fix, think we should find a way to yank it out

Expected Behavior

Not getting a bunch of failing messages from codecov

Steps To Reproduce

No response

Environment

- OS:
- pip environment (can be retrieved with `pip list`):

Anything else?

No response

[Feature]: Quick script to generate artificial data

Description

Certain examples need the artificial data to exist, which requires running some Python scripts in the floris_setup folder and some notebooks in raw_processing. Propose to offer a shortcut script which runs both the scripts and the notebooks in the correct sequence to fast-track setting up flasc.

Related URLs

No response

Change in df_movingaverage wrt to time column

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

This may or may not be what we intended, so I wanted to raise it as a possible issue and see whether we prefer it as is or want to revert. One behavior change when we updated df_movingaverage: the former code used a column called 'time' if one existed and otherwise assumed 'time' was the index. Now 'time' is required to be a column, which caused an error in some processes here.

[Image: screenshot of the code change in df_movingaverage]

The above image shows the responsible change.

I couldn't decide, but maybe it is nice to standardize on time being a column, with any process that wants it as an index responsible for making a local version that way, so I don't assume we want to change it back. Just wanted to flag it for discussion. @Bartdoekemeijer and @misi9170, what do you think?
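For reference, the former behavior described above amounts to a small normalization step at the top of the function. The sketch below is a hypothetical helper, not the original df_movingaverage code, and it assumes that an index holding time is named 'time'.

```python
import pandas as pd


def ensure_time_column(df):
    """Return `df` with 'time' as a regular column (hypothetical helper).

    Accepts 'time' either as a column or as the name of the index.
    """
    if "time" in df.columns:
        return df
    if df.index.name == "time":
        # Move the index into a regular column named 'time'
        return df.reset_index()
    raise KeyError("Expected a 'time' column or an index named 'time'.")


times = pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:10"])
df_col = pd.DataFrame({"time": times, "ws_000": [8.0, 9.0]})
df_idx = df_col.set_index("time")

from_col = ensure_time_column(df_col)
from_idx = ensure_time_column(df_idx)
```

If the team standardizes on time being a column, a helper like this could live with the callers that prefer an index instead of inside df_movingaverage.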

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- pip environment (can be retrieved with `pip list`):

Anything else?

No response
