pvcaptest's Introduction

pvcaptest

Latest Release Latest release version

What is pvcaptest?

pvcaptest is an open source python package created to facilitate capacity testing following the ASTM E2848 standard. The captest module contains a single class, CapData, which provides methods for loading, visualizing, filtering, and regressing capacity testing data. The module also includes functions that take CapData objects as arguments and provide summary data and capacity test results.

Documentation and examples are available on readthedocs including full examples in jupyter notebooks that can be run in the browser without installing anything.

Installation

These instructions assume that you are new to using conda and python. If that is not the case, skip to the last section, which covers installation for users familiar with conda and pip.

The recommended method to install pvcaptest is to create a conda environment for pvcaptest. Installing Anaconda or miniconda will install both python and conda. There is no need to install python separately.

Easiest Option:

  1. Download and install the anaconda distribution. Follow the default installation settings.
  2. On Windows go to the start menu and open the Anaconda prompt under the newly installed Anaconda program. On OSX or Linux open a terminal window.
  3. Install pvcaptest by typing the command conda install -c conda-forge pvcaptest and pressing enter. The -c conda-forge option tells conda to install pvcaptest from the conda forge channel.

This will install the pvcaptest package in the base environment created when Anaconda is installed. This should work and provide you with jupyter notebook and jupyter lab to run pvcaptest in. If you think you will use your Anaconda installation to create and maintain additional environments, the stand-alone environment described in the next section is likely a better choice.

Better long term option:

  1. If you do not already have it installed, download and install the anaconda distribution or miniconda.
  2. Go to the project github page and download the project source to obtain a copy of the environment.yml file. Click the green code button and click 'Download ZIP'.
  3. On Windows go to the start menu and open the Anaconda prompt under the newly installed Anaconda program. On OSX or Linux open a terminal window. Note the path in the prompt for the next step. On Windows this should be something like C:\Users\username\.
  4. Unzip and move the environment.yml file to the folder identified by the path from the previous step.
  5. In your Anaconda prompt or terminal type conda env create -f environment.yml
    and hit enter. Wait a few seconds while conda solves the environment. It will ask whether you want to proceed to install new packages, including pvcaptest. Type y and press enter, then wait for conda to finish installing pvcaptest and the other packages.
  6. Once the installation is complete, conda will print out a command for activating the new environment. Run that command, which should look like conda activate captest_env.

The environment created will include jupyter lab and notebook for you to use pvcaptest in. You can start these using the commands jupyter lab or jupyter notebook.

See the conda documentation for more details on using conda to create and manage environments.

Install for users familiar with conda and pip:

Conda install into an existing environment:

conda install -c conda-forge pvcaptest

If you prefer, you can pip install pvcaptest, but the recommended approach is to use the conda package.

Note: The conda package is named pvcaptest and the pip package is named captest. The project is moving to consistent use of the pvcaptest name, but the package name on pypi will remain as captest.

pvcaptest's People

Contributors

bt-, cwhanse, kandersolar


pvcaptest's Issues

`capdata.load_data` - can adding data to path be made optional

@bt-,
capdata.load_data seems to be the convenience wrapper for adding data to the class instance. However, it seems to prepend 'data' to the path, attempting to redirect the user to reference data from the .tests\data folder.

I would suggest if it is not already there to modify this method or add a different method for the user to add data to the class instance. Here is an example:

import pandas as pd
import os
from pathlib import Path
from captest import capdata as pvc


# Define file path to measured data
sample_data_folder = Path(r'C:\Users\xxxx\Documents\Python Scripts\pvcaptest\tests\data')
measured_data_path = sample_data_folder / 'example_meas_data.csv'

das = pvc.CapData('das')

das.load_data(fname='example_meas_data.csv', source='AlsoEnergy')

das.review_column_groups()

Export Filtered Datasets

Is there a way to export the filtered das and sim datasets to csv files? Are they stored as specific variables?
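A sketch of one way to do this, assuming the filtered dataset is exposed as a pandas DataFrame on the CapData object (`data_filtered` in recent versions, `df_flt` in older ones). The StringIO buffer here only keeps the example self-contained; a file path works the same way:

```python
import io

import pandas as pd

# Stand-in for CapData.data_filtered: the filtered dataset is a plain DataFrame.
filtered = pd.DataFrame(
    {"poa": [805.2, 810.4], "power": [981.0, 988.5]},
    index=pd.date_range("2019-06-01 12:00", periods=2, freq="min"),
)

# to_csv also accepts a path, e.g. filtered.to_csv("das_filtered.csv").
buffer = io.StringIO()
filtered.to_csv(buffer, index_label="Timestamp")

# Round-trip to confirm the output is readable.
buffer.seek(0)
check = pd.read_csv(buffer, index_col="Timestamp", parse_dates=True)
print(check.shape)  # (2, 2)
```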

Need to change abbreviated terms to full words as much as possible

The abbreviations in the code base for functions and variables are too difficult to follow and generally unpythonic.

Big picture: don't shorten things by removing vowels, and don't use multiple abbreviations in a single name, especially when they are super overloaded like reg - register, regular, ? ? ? or inv - inverter? no? crazy.

Recommended updates:

flt = filter
trans = translation or translate, as appropriate
inv = inverted (not sure what this means in terms of translation keys anyway)
cntg_eoy = end_of_year_wrap? could do contiguous_end_of_year, but not sure that's clearer
df_beg = df_start (matches end/df_end then)
ix_ser = ix_series
mnth = month (why take out 1 letter?)
boy = start_of_year (could not parse this without code)
loop_cnt = loop_count
cp_results = captest_results
cprat = captest_ratio
cprat_cpval = ?
reg_fml = regression_fml or regression_formula
reg_trans = regression_variable_translation_table (this is crazy long, but something between the two, maybe regression_vars_table. Maybe regression_vars_dict, and dict should be used most places trans is used?)

arguable:
irrRC_balanced = irr_rc_balanced (this is per the PEP8 standard, but I don't feel strongly about it)

Increase Flexibility and improve CapData.plot()

  • Add keyword arguments for plot width and height
  • Add keyword for number of columns in grid
  • Add argument to control order of returned plots and/or plot subset of plots
  • Add argument to use to combine plots to make new ones, i.e. poa + ghi irradiance or mod + amb temp
  • Add hover tooltip and remove legend. The entire grid of plots fails to show if the legend on one of the plots becomes too tall. Fixed temporarily by increasing plot height. Fix by removing the legend and adding a hover tooltip, or by letting the figure plot_height be set dynamically.
  • Color data by category, so irradiance data is always the same color or from the same palette of colors
  • Remove limitation that prevents plotting categories of data that have more than 10 columns, like inverters

Convert from ipynb to html

Is there an easy way to convert the Jupyter Notebook to a HTML format as a report once the program is fully run?

Improvements to CapTest.scatter_hv()

  • Add 'both' option to create scatter of measured ('das') and modeled ('sim') data.
  • Set colors by data plotted so it is consistent. For example das is blue, sim is orange and those colors are maintained in the plot of both.

Sample Data

Hello,
Thanks a lot for working on this package. It meets some of my needs really well. I wanted to play around and get a better understanding of the code, but I am unable to figure out what format the sample datasets need to be in. I understand you expect .csv files, however could you please add some example files to the repo? It will be easier to follow through with the examples and try a few things.

Thanks
uday

Write test for capdata.filter_pvsyst

filter_pvsyst method does not have a test. Test should check that method is flexible enough to run on available pvsyst columns without failing.

CapData.trans readability

Improve the readability of the CapData translation dictionary, so it is immediately clear which values are assigned to which keys.

environment.yml "scikit-learn" version outdated

The outdated version number caused conflicts while trying to install. By removing the version number in the yml file, conda was able to get the best version available and the install went smoothly.

filter_time passes datetime to loc

When using the filter_time method to select a period centered around a passed test date, the beginning and end of the period are passed to the pandas dataframe loc method as datetimes and should be passed as strings.

test_date = pd.to_datetime(test_date)
...
offset = pd.DateOffset(days=days/2)
start = test_date - offset
end = test_date + offset
...
flt_cd.df = flt_cd.df.loc[start:end, :]
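A minimal sketch of the fix: format the offset datetimes as strings before slicing, so that loc performs partial-string slicing over whole days rather than exact-timestamp lookup:

```python
import pandas as pd

df = pd.DataFrame(
    {"poa": range(10)},
    index=pd.date_range("2019-06-01", periods=10, freq="D"),
)

test_date = pd.to_datetime("2019-06-05")
offset = pd.DateOffset(days=2)
start = test_date - offset
end = test_date + offset

# Convert to strings so loc slices over whole days, inclusive of the end day.
selected = df.loc[start.strftime("%Y-%m-%d"):end.strftime("%Y-%m-%d"), :]
print(len(selected))  # 5
```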

Shortcut for manipulation of translation dictionary to add plots to CapData.plot()

Currently the translation dictionary and the trans_keys attributes of CapData objects can be extended to include additional combinations of the groupings provided by the translation dictionary, for example plotting GHI and POA irradiance on the same plot for comparison. It would be convenient to have syntax for a quick way to create new keys in the translation dictionary that combine the default values and also update the trans_keys attribute.

Captest summary method display default arguments

Adjust the captest summary method to show either 'defaults' or actually display the default arguments when a filtering function is called with only the default arguments. The current behavior is to display nothing in the arguments column.

filter_outliers fails when there are NAs

The filter_outliers function fails when there are NaNs present in either the POA or power columns returned by CapData.rview(['poa', 'power']).

Fitting the elliptic envelope from scikit-learn requires the array passed to be free from NaNs.
This line is where the error occurs: clf_1.fit(X1)

One way to work around this is to filter the DataFrame in self.data_filtered to remove columns where the poa or power contain NaNs, then fit the Elliptic Envelope, and then filter the data_filtered DataFrame for outliers.

Adding the following two lines to the function as shown below is a rough implementation of this approach.
XandY_nona = XandY.dropna()
df = self.data_filtered.loc[XandY_nona.index, :]

Adjusted function:

    XandY_nona = XandY.dropna()
    X1 = XandY_nona.values

    if 'support_fraction' not in kwargs.keys():
        kwargs['support_fraction'] = 0.9
    if 'contamination' not in kwargs.keys():
        kwargs['contamination'] = 0.04

    clf_1 = pvc.sk_cv.EllipticEnvelope(**kwargs)
    clf_1.fit(X1)

    if inplace:
        df = self.data_filtered.loc[XandY_nona.index, :]
        self.data_filtered = df[clf_1.predict(X1) == 1]

This approach should include an argument to switch the additional dropna filtering and it should ideally be recorded in the filtering summary separately. This could be accomplished by calling the below in filter_outliers or asking the user to run this before calling filter_outliers.
meas.filter_custom(pd.DataFrame.dropna, subset=['poa_col', 'power_col'])
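The shape of the workaround can be seen with pandas alone. The predict step below is a stand-in that keeps every row; the real fix would fit sklearn's EllipticEnvelope on XandY_nona.values:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "poa": [800.0, np.nan, 650.0, 640.0],
    "power": [950.0, 940.0, np.nan, 760.0],
})

# 1. Drop rows containing NaNs before fitting.
XandY_nona = data[["poa", "power"]].dropna()

# 2. Stand-in for EllipticEnvelope.predict (1 = inlier, -1 = outlier).
predictions = np.ones(len(XandY_nona), dtype=int)

# 3. Restrict the full frame to the NaN-free index, then apply the mask.
filtered = data.loc[XandY_nona.index, :][predictions == 1]
print(len(filtered))  # 2
```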

Contribution Guide

@bt-,
I wasn't sure if I missed it, but I didn't see a contribution guide. Could you point me to one? This work aligns with my needs and I would like to contribute. Please let me know if you are interested, and I can start taking on some issues.

Thanks
Uday

Computation of spectral corrections in E2848-13 Appendix A

I recently started https://github.com/markcampanelli/pvfit-m, an open source Python project to ensure that the computations of spectral mismatch correction factors are valid, consistent, and transparent. I am reaching out to this project regarding the usage of such computations in E2848-13 (esp. Appendix A). I would like to know of any requirements that are currently unmet, and I would be happy to implement functions that would be valuable here. The project is in a very early stage, and I intend to add significant documentation and, once the API is sufficiently stable, publish a v1.0 module on PyPI under the MIT license.

plot method hover value should be data value not mouse coordinate

The hover tooltip for the bokeh gridplot generated by the plot method currently displays y-axis values as mouse coordinates rather than actual data values. These values are generally close, but it would be much better to display the actual underlying data.

hover.tooltips = [("Name", "$name"),
                  ("Datetime", "@Timestamp{%D %H:%M}"),
                  ("Value", "$y"),]

The $y needs to be changed to use the @ syntax to reference a column of data. This may take some work to relabel columns within the method to handle unwieldy column names.

A bigger issue in making this change may be dealing with the number of different columns displayed in the gridplots generated by the plot method and how to reference them.

Implement unstable irradiance filter

Implement and write tests for the unstable irradiance filtering method as defined by the ASTM standard in section 9.1.7.1

Code for the initial implementation below.

import numpy as np

def unstable_irr(df, thresh=0.05):
    std = np.std(df.values)
    mean = np.mean(df.values)
    return std <= (mean * thresh)

class CapData:
#     @pvc.update_summary
    def filter_unstable_irr(self, freq, agg_method='mean', inplace=True, thresh=0.05, flt=True):
        """Remove periods of unstable irradiance as outlined in ASTM E2848-13 9.1.7.1.

        Calculates the standard deviation and mean for high frequency irradiance values and
        resamples the data to a lower frequency and removes time periods where the standard
        deviation is greater than threshold percent of the mean.

        Parameters
        ----------
        freq : str
            Pandas offset alias to use in resampling data to a lower frequency. Ex: '5min'
            for 5 minute data.
        agg_method : function, str, list or dict
            Passed directly to pandas resampler object. Refer to pandas
            documentation for options.
        thresh : float, default 0.05
            Percentage, as decimal, of the mean used to calculate the threshold
            for the std deviation above which the time interval is removed.
        inplace : bool, default True
            When true updates CapTest object flt_sim or flt_das attribute.
        flt : bool, default True
            When true the unstable irradiance periods are removed.  When false a new
            column 'unstable_irr_flt' of booleans is added to the returned dataframe.
        """

        flt_cd = self._CapTest__flt_setup('das')

        poa_df = flt_cd.rview('poa')

        poa_df_groups = poa_df.resample(freq, label='left', closed='left')
        df_groups = flt_cd.df.resample(freq, label='left', closed='left')

        irr_filter = []
        for key in poa_df_groups.groups.keys():
            irr_filter.append(unstable_irr(poa_df_groups.get_group(key), thresh=thresh))

        if flt:
            flt_cd.df = df_groups.agg(agg_method)[irr_filter]
        else:
            df_resampled = df_groups.agg(agg_method)
            df_resampled['unstable_irr_flt'] = irr_filter
            flt_cd.df = df_resampled

        if inplace:
            self.flt_das = flt_cd
        else:
            return flt_cd
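The stability criterion itself can be exercised on a small series without the CapData machinery; a sketch assuming 1-minute POA data resampled to 5-minute intervals:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2019-06-01 12:00", periods=10, freq="min")
poa = pd.Series([800, 802, 801, 799, 800,   # stable interval
                 400, 900, 300, 850, 200],  # unstable interval
                index=index, dtype=float)

def unstable_irr(values, thresh=0.05):
    """True when the interval is stable: std within thresh * mean."""
    return np.std(values) <= np.mean(values) * thresh

groups = poa.resample("5min", label="left", closed="left")
stable = [unstable_irr(group.values) for _, group in groups]
print(stable)  # [True, False]
```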

Update get_tz_index use of pd.tz_localize

The captest function get_tz_index is using the deprecated errors option of the pandas index method tz_localize.

time_source = time_source.tz_localize(loc['tz'], ambiguous='infer', errors='coerce')

It looks like this should be updated to use nonexistent='NaT' instead.

time_source = time_source.tz_localize(loc['tz'], ambiguous='infer', nonexistent='NaT')
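A quick check of the nonexistent option (note the spelling) on a timestamp skipped by a US DST transition:

```python
import pandas as pd

# 2:30 AM does not exist in US/Eastern on 2019-03-10: clocks jump 2:00 -> 3:00.
idx = pd.DatetimeIndex(["2019-03-10 01:30", "2019-03-10 02:30"])

localized = idx.tz_localize("US/Eastern", nonexistent="NaT")
print(localized.isna().tolist())  # [False, True]
```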

Remove "valueWarning" from type_defs

The bounds_check option of CapData.__series_type causes the plot method to break sensor types into separate plots if some sensors of a type have value warnings and others do not.

Best way to fix may be to adjust so that bounds check does not affect the type definitions, but there is still an option to warn about outliers and/or remove outliers.

The valuesError insert also causes issues with the assignment of a color to each column that was added in v0.3.2.

Add an option to rename column headers to load_das and load_pvsyst

The current approach to dealing with column headers is to keep the column headers in the raw imported data, parse them for categorization, and use the translation dictionary for access (usually avoid needing to use raw column names).

This is ok, but the column names from common DAS providers currently need to be modified for captest to set the translation dictionary correctly, which defeats the purpose.

To solve this I am going to develop function(s) for renaming columns prior to setting the translation dictionary. The pandas dataframe rename method takes a dictionary or function, so this approach will have the inherent flexibility for user specified names.

Implement the daily statistic filter method

Implement and test method to filter data based on a comparison of a threshold value to a daily statistic. Draft implementation below:

import operator

import pandas as pd
operators = {'<': operator.lt,
             '>': operator.gt,
             '<=': operator.le,
             '>=': operator.ge,
             '==': operator.eq}

def filter_daily_stat(capdata, threshold, reg_var='poa', agg_func='max', compare='>='):
    """
    Filter on daily statistic.
    threshold : float
        Value to compare daily statistics against.
    reg_var : string, default 'poa'
        Regression variable name- only 'power', 'poa', 't_amb', 'w_vel'
    agg_func : string
        String aggregation function- min, max, mean etc.
    compare : string
        Any of <, >, <=, >=, == to use in boolean comparison of daily statistic against threshold.
    """
    daily_stat = capdata.rview(reg_var).groupby(pd.Grouper(freq='D', closed='left', label='left')).agg(agg_func)
    daily_bool = operators[compare](daily_stat, threshold)
    hourly_bool = daily_bool.reindex(index=capdata.df.index, method='ffill')
    return capdata.df[hourly_bool.values]
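Stripped of the CapData plumbing, the core of the filter is a daily aggregate compared to a threshold and broadcast back onto the original index; a sketch with a plain DataFrame:

```python
import pandas as pd

index = pd.date_range("2019-06-01", periods=48, freq="h")
df = pd.DataFrame(
    {"poa": [0.0] * 24 + [0.0] * 6 + [900.0] * 12 + [0.0] * 6},
    index=index,
)

# Daily max of POA compared against a threshold, then forward-filled
# back onto the hourly index.
daily_stat = df["poa"].groupby(pd.Grouper(freq="D")).agg("max")
daily_bool = daily_stat >= 500.0
hourly_bool = daily_bool.reindex(index=df.index, method="ffill")
kept = df[hourly_bool.values]
print(len(kept))  # 24: only the second day survives
```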

Create a filter to remove or select a list of days

Implement something similar to the below function as a method of the CapData class to select or remove the data for a list of days.

def filter_days(df, days, drop=False):
    """
    Select timestamps for days passed.

    Parameters
    ----------
    days : list
        List of days to select.
    drop : bool, default False
        When True, remove the listed days instead of selecting them.
    """
    ix_all_days = None
    for day in days:
        ix_day = df.loc[day].index
        if ix_all_days is None:
            ix_all_days = ix_day
        else:
            ix_all_days = ix_all_days.union(ix_day)
    if drop:
        return df.loc[df.index.difference(ix_all_days), :]
    else:
        return df.loc[ix_all_days, :]
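Usage sketch with three days of hourly data; note that selecting a day by its string needs df.loc[day] on recent pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {"poa": range(72)},
    index=pd.date_range("2019-06-01", periods=72, freq="h"),
)

# Build the union index of the requested days via partial-string indexing.
days = ["2019-06-01", "2019-06-03"]
ix_all_days = None
for day in days:
    ix_day = df.loc[day].index
    ix_all_days = ix_day if ix_all_days is None else ix_all_days.union(ix_day)

selected = df.loc[ix_all_days, :]                      # drop=False
dropped = df.loc[df.index.difference(ix_all_days), :]  # drop=True
print(len(selected), len(dropped))  # 48 24
```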

irrRC_balanced documentation

Documentation string should be re-written and expanded. This function contains an algorithm that iteratively solves for a reporting irradiance that meets the common language of finding an irradiance where 40% of points within the filtered dataset are above the reporting irradiance. This algorithm does not directly calculate the 60th percentile of the irradiance data at any point.

filter_time pd.DateOffset will fail with odd number of days

When using the filter_time method to select a time period of a number of days centered on a test date, the method uses the pandas function DateOffset to create an offset to add to and subtract from the test date. This will fail if the days argument is an odd number, because pd.DateOffset expects an integer and days is divided by 2.

Changing

offset = pd.DateOffset(days=days/2)

to

offset = pd.DateOffset(days=days // 2)

should fix the bug.
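A quick check of the floor-division fix with an odd day count:

```python
import pandas as pd

test_date = pd.to_datetime("2019-06-15")
days = 5  # odd: days / 2 would give 2.5, which pd.DateOffset rejects

offset = pd.DateOffset(days=days // 2)
start = test_date - offset
end = test_date + offset
print(start.date(), end.date())  # 2019-06-13 2019-06-17
```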

Column categorization based on data in column

With some contextual information it should be possible to determine the measurement type from the data within each column. The column type determined this way could be checked against a type determined from the column header to validate the type or identify potential mis-labelling issues for investigation.

Need to work on the feature engineering and determine the minimum amount of contextual information required.

Thoughts on implementation:

  • Possibly use the scikit-learn implementation of a decision tree algorithm or build the boolean decision tree by hand
  • Differentiating irradiance and power will require knowing what the AC nameplate of the system is as otherwise these two datasets have the same trends for a plant without performance issues.
  • Filter out very distant outliers to remove data points with erroneous values due to DAS issues prior to calculating features

Ideas for features:

  • timestamp of daily min/max values
  • daily averages for daytime and night time (will need to calculate sunrise and sunset using pvlib)
  • percent difference from predicted clear sky irradiance (would need to identify clear days)
  • max daily length of time values do not change (within a band) for detecting irradiance measurements (readings at night have minimal fluctuation)
  • other summary statistics- mean, mode, percentiles, max, min

Update scatter_hv to allow removal of column of date strings

Update scatter_hv and other methods to allow removing:

ix_ser = all_sensors.index.to_series()
all_sensors['index'] = ix_ser.apply(lambda x: x.strftime('%m/%d/%Y %H %M'))  # noqa: E501

They are only there to create an index of dates as strings, which are used in some of the plotting methods, specifically scatter_hv for the hover tooltip. It should be possible to update the holoviews code to use the index itself instead of a separate column of date strings.
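A sketch of the equivalence: the string column currently built with apply can be produced directly from the DatetimeIndex, so the extra column could be dropped once the holoviews code reads the index:

```python
import pandas as pd

all_sensors = pd.DataFrame(
    {"poa": [800.0, 810.0, 790.0]},
    index=pd.date_range("2019-06-01 12:00", periods=3, freq="h"),
)

# Current approach: a throwaway column of formatted date strings.
ix_ser = all_sensors.index.to_series()
all_sensors["index"] = ix_ser.apply(lambda x: x.strftime("%m/%d/%Y %H %M"))

# Same strings straight from the DatetimeIndex, no extra column needed.
formatted = all_sensors.index.strftime("%m/%d/%Y %H %M")
print(list(formatted) == list(all_sensors["index"]))  # True
```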

Update to work with pvlib v0.7

There are a few changes introduced in pvlib 0.7 that introduced errors in pvcaptest.

The Travis CI requirements file did not have the pvlib version pinned prior to discovering the issues with pvlib 0.7 when Travis ran for name changing pull request #43. I temporarily pinned pvlib to 0.6.3 to avoid errors in #43.

I would like to create a separate branch to address updates for pvlib compatibility.

Looks like the failure is caused by the name changes in the SAM inverter file. Change this line to

cec_inverter = cec_inverters['ABB__MICRO_0_25_I_OUTD_US_208__208V_']

I think pvlib v0.7 will work for you then.

Originally posted by @cwhanse in #43

Allow filter_pvsyst to accept PVsyst shade outputs other than FShdBm

Currently the captest.filter_pvsyst method filters out time intervals with shade based on the PVsyst output FShdBm, a fractional number (0 to 1). PVsyst will also output shading factors expressed as power lost, and it would be convenient for this method to handle those PVsyst outputs as well.
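One way this could look: branch on the output style, since the "unshaded" value differs between a fractional factor and a loss. A sketch with made-up data (ShdLoss is a hypothetical loss-style column name, not an actual PVsyst output):

```python
import pandas as pd

# FShdBm is a fractional factor (1.0 = unshaded); a loss-style output
# would instead be 0 when unshaded.
df = pd.DataFrame({"FShdBm": [1.0, 0.92, 1.0],
                   "ShdLoss": [0.0, 35.0, 0.0]})

def no_shade_mask(df, column, fractional=True):
    """Boolean mask keeping intervals without beam shading."""
    if fractional:                # factor-style output: 1.0 means unshaded
        return df[column] == 1.0
    return df[column] == 0.0      # loss-style output: 0 means unshaded

print(no_shade_mask(df, "FShdBm").tolist())  # [True, False, True]
```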

Clarify capacity ratio and tolerance

Improve the documentation around the captest tolerance and the capacity ratio presented in the results methods- cp_results and res_summary. It should be clear that the ratio is the measured regression result divided by the predicted regression result without any modification to either and that the tolerance is applied to this ratio.

check_all_perc_diff_comb fails due to missing abs in perc_difference function

If both values checked by the perc_difference function are negative, it returns a negative result. This issue causes incorrect results from the check_all_perc_diff_comb function, which tests that the percent difference returned from perc_difference is less than the passed threshold value.

Current code:

def perc_difference(x, y):
    """
    Calculate percent difference of two values.
    """
    if x == y == 0:
        return 0
    else:
        return abs(x - y) / ((x + y) / 2)

Suggested solution:

def perc_difference(x, y):
    """
    Calculate percent difference of two values.
    """
    if x == y == 0:
        return 0
    else:
        return abs((x - y) / ((x + y) / 2))
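A quick demonstration of why the outer abs matters: with two negative inputs the denominator is negative, so abs on the numerator alone flips the sign:

```python
def perc_difference(x, y):
    """Percent difference with abs around the whole expression."""
    if x == y == 0:
        return 0
    return abs((x - y) / ((x + y) / 2))

# Numerator-only abs would give abs(-4 - -6) / ((-4 + -6) / 2) = 2 / -5 = -0.4,
# which then compares as "less than any positive threshold" by accident.
print(perc_difference(-4, -6))  # 0.4
```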

Scipy version in yaml

environment.yml explicitly specifies scipy=1.2.1. However, I'm running with scipy v1.7.1 and it seems to be fine.

Should it be
scipy>=1.2.1 instead?

Rewrite installation section of docs

The installation section of the documentation should be updated to be more clear and explicit and to provide direction.

  • section with detailed directions for new users that are not familiar with pip, conda, environments, terminals, etc.
  • short section for users familiar with conda
  • short section at the end for pip install options

#39 needs to be resolved before publishing new doc directing users to conda install.

Update `filter_pvsyst` to handle underscores

One of the updates to PVsyst added underscores to the inverter loss variables (IL Pmin etc.).

filter_pvsyst should be updated to handle PVsyst inverter loss output variable names with and without underscores.
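One possible approach is to normalize the column names before matching, so both spellings resolve to the same key; a sketch (the helper name is made up):

```python
def normalize_pvsyst_name(name):
    """Strip underscores and spaces so 'IL Pmin' and 'IL_Pmin' compare equal."""
    return name.replace("_", "").replace(" ", "").lower()

columns = ["IL Pmin", "IL_Pmin", "IL Vmax", "FShdBm"]
matches = [c for c in columns if normalize_pvsyst_name(c) == "ilpmin"]
print(matches)  # ['IL Pmin', 'IL_Pmin']
```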

implement temperature correction method

Implement method to apply an adjustment to measured DC power data for temperature. Draft method below:

        def apply_temp_correction(self, power_temp_coeff, column, output_col=None):
            """
            Apply temperature correction to measured power data.

            power_temp_coeff : float
                Module power temperature coefficient as percent per degree celsius.
                Ex. -0.36
            column : str
                Name of column to apply the temperature correction to.
            output_col : str, default None
                Name to use for the new column of temperature corrected data added to the dataframe. By default appends '_temp_adj' to the name of the column that is adjusted.
            """
            if output_col is None:
                new_col_name = column + '_temp_adj'
            else:
                new_col_name = output_col
            # amb_temp_column = self.trans[self.reg_trans['t_amb']]
            self.df_flt[new_col_name] = (1 + (power_temp_coeff / 100) * (25 - self.rview('t_amb').iloc[:, 0])) * self.df_flt[column]
            self.df[new_col_name] = self.df_flt[new_col_name]
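The correction itself, separated from the class plumbing: scale measured power by the coefficient times the deviation from 25 C. A sketch with made-up numbers:

```python
import pandas as pd

power = pd.Series([100.0, 102.0])   # measured power
t_amb = pd.Series([35.0, 30.0])     # temperature, deg C
coeff = -0.36                       # power temperature coefficient, % per deg C

# (1 + coeff/100 * (25 - T)) * P: boosts power measured above 25 C
# for a negative coefficient.
corrected = (1 + (coeff / 100) * (25 - t_amb)) * power
print(corrected.round(2).tolist())  # [103.6, 103.84]
```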
