
aguaclara


aguaclara is a Python package developed by AguaClara Cornell and AguaClara Reach for designing and performing research on AguaClara water treatment plants. The package has several main functionalities:

  • DESIGN of AguaClara water treatment plant components
  • MODELING of physical, chemical, and hydraulic processes in water treatment
  • PLANNING of experimental setup for water treatment research
  • ANALYSIS of data collected by ProCoDA (process control and data acquisition tool)

Installing

The aguaclara package can be installed from PyPI by running the following command on the command line:

pip install aguaclara

To upgrade an existing installation, run

pip install aguaclara --upgrade

Using aguaclara

aguaclara's main functionalities come from several sub-packages.

  1. Core: fundamental physical, chemical, and hydraulic functions and values
  2. Design: modules for designing components of an AguaClara water treatment plant
  3. Research: modules for process modeling, experimental design, and data analysis in AguaClara research

To use aguaclara's registry of scientific units (based on the Pint package), use from aguaclara.core.units import u. Any other function or value in a sub-package can be accessed by importing the package itself:

Example Usage: Design

import aguaclara as ac
from aguaclara.core.units import u

# Design a water treatment plant
plant = ac.Plant(
    q = 40 * u.L / u.s,
    cdc = ac.CDC(coag_type = 'pacl'),
    floc = ac.Flocculator(hl = 40 * u.cm),
    sed = ac.Sedimentor(temp = 20 * u.degC),
    filter = ac.Filter(q = 20 * u.L / u.s)
)

Example Usage: Core

# continued from Example Usage: Design

# Model physical, chemical, and hydraulic properties 
cdc = plant.cdc
coag_tube_reynolds_number = ac.re_pipe(
    FlowRate = cdc.coag_q_max,
    Diam = cdc.coag_tube_id,
    Nu = cdc.coag_nu(cdc.coag_stock_conc, cdc.coag_type)
)

Example Usage: Research

import aguaclara as ac
from aguaclara.core.units import u
import matplotlib.pyplot as plt

# Plan a research experiment
reactor = ac.Variable_C_Stock(
    Q_sys = 2 * u.mL / u.s, 
    C_sys = 1.4 * u.mg / u.L, 
    Q_stock = 0.01 * u.mL / u.s
)
C_stock_PACl = reactor.C_stock()

# Visualize and analyze ProCoDA data
ac.iplot_columns(
    path = "https://raw.githubusercontent.com/AguaClara/team_resources/master/Data/datalog%206-14-2018.xls", 
    columns = [3, 4], 
    x_axis = 0
)
plt.ylabel("Turbidity (NTU)")
plt.xlabel("Time (hr)")
plt.legend(("Influent", "Effluent"))

The package is still undergoing rapid development. As it becomes more stable, a user guide will be written with more detailed tutorials. At the moment, you can find some more examples in specific pages of the API reference.

Contributing

Bug reports, feature requests, documentation updates, and any other enhancements are welcome! To suggest a change, make an issue in the aguaclara GitHub repository.

To contribute to the package as a developer, refer to the Developer Guide.

aguaclara_research's People

Contributors

eak24, fletchapin, hannahsi, wpennock, zoemaisel


aguaclara_research's Issues

Tutorial and Research Organization

After discussing this, Hannah and I had an idea for a new structure for organizing tutorials, report templates, and assignments. @monroews @ethan92429 what are your thoughts? Naming structure is also not final, and suggestions are welcome there as well.

  • aguaclara_orientation repository: general tutorials applicable to all team members, including assignments like the Python tutorial, tutorials on GitHub, and research report templates. Strong tutorials written and reviewed by team leadership.

  • aguaclara_tutorial: slightly more specific tutorials that team members believe would be helpful for their subteam as well as other subteams. Tutorials in this repo would be written by all team members and be available as a resource, but not required learning for everyone. This repository will need to be well organized by larger categories such as research topics, Fabrication, and Going Global.

  • subteam tutorials: tutorials specific to a subteam, written in the wiki section of the subteam's own GitHub repository.

  • aguaclara_research: Wiki would contain information on functions as well as example markdown documents used for running calculations as you would in your report.

creating a pdf of markdown files

Could you provide a method to create a pdf from a markdown file that shows nicely formatted equations? The idea is to have a way for students to submit their assignments in a nicely formatted way. I know there was discussion of this, but I can't find the documentation of how to do this. I'll need to link to that documentation (once it exists) from the course websites where I define assignments.

convert indexes to integers for data extraction

The ftime and Column_of_data functions take column and row indexes as inputs. If a number with a decimal is passed, the functions return an error. Eliminate this error by converting those inputs with int(). The use case is when the row numbers are calculated values that come out as floating-point.
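
A minimal sketch of the proposed fix (the helper name is hypothetical; the real change would go inside ftime and Column_of_data):

```python
def to_index(value):
    """Coerce a possibly floating-point row/column index to int."""
    return int(value)

# A calculated row number often comes out as a float:
rows_per_hour = 3600 / 5              # e.g. a 5 s logging interval -> 720.0
start_row = to_index(2.5 * rows_per_hour)
print(start_row)                      # 1800, as a plain int
```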

accessing functions that are inside floc_model

What does from aguaclara_research.play import * load? It doesn't appear that this loads any of the modules that are inside aguaclara_research.
How do I access functions inside floc_model, for example?
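
A star-import only pulls in names defined (or imported) in that one module; it never loads sibling submodules. A stdlib illustration of the same behavior, plus the hedged assumption that floc_model can simply be imported directly:

```python
import sys

# Importing a package does not import its submodules:
import wsgiref
assert "wsgiref.util" not in sys.modules   # submodule not loaded yet

import wsgiref.util                        # explicit import is required
assert "wsgiref.util" in sys.modules

# By the same logic (assuming the package layout), floc_model would need:
#   import aguaclara_research.floc_model as fm
#   fm.some_function(...)
```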

Testing ProCoDA_Parser.py

I tried both functions again, and I'm pretty sure the current issues are:

  • read_state_with_metafile: It seems to be having trouble reading the metafile in .xls format, but I may be misinterpreting this.
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-9-836ee8f49771> in <module>()
----> 1 pro.read_state_with_metafile(avg, "2", "1", "C:\\Users\\whp28\\Google Drive\\AGUACLARA DRIVE\\AguaClara Grads\\William Pennock\\Meta File.xls", metaids=[20],extension=".xls", units="NTU")

~\aguaclara_research\aguaclara_research\ProCoDA_Parser.py in read_state_with_metafile(func, state, column, path, metaids, extension, units)
    644     outputs = []
    645 
--> 646     metafile = pd.read_csv(path, delimiter='\t', header=None)
    647     metafile = np.array(metafile)
    648 

~\Anaconda\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654 
--> 655         return _read(filepath_or_buffer, kwds)
    656 
    657     parser_f.__name__ = name

~\Anaconda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    409 
    410     try:
--> 411         data = parser.read(nrows)
    412     finally:
    413         parser.close()

~\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1003                 raise ValueError('skipfooter not supported for iteration')
   1004 
-> 1005         ret = self._engine.read(nrows)
   1006 
   1007         if self.options.get('as_recarray'):

~\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1746     def read(self, nrows=None):
   1747         try:
-> 1748             data = self._reader.read(nrows)
   1749         except StopIteration:
   1750             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28765)()

ParserError: Error tokenizing data. C error: Expected 5 fields in line 117, saw 12
  • write_calculations_to_csv: While the function is written to use metaids, I don't see it in the function call (inputs). Once that is resolved, the same challenge with the metafile may present itself.
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-bcdb4af79b07> in <module>()
----> 1 pro.write_calculations_to_csv([avg], ["2"], ["1"], "C:\\Users\\whp28\\Google Drive\\AGUACLARA DRIVE\\AguaClara Grads\\William Pennock\\Meta_File.txt", ["mean"], "test.txt",extension=".xls")

~\aguaclara_research\aguaclara_research\ProCoDA_Parser.py in write_calculations_to_csv(funcs, states, columns, path, headers, out_name, extension)
    760     for i in range(len(headers)):
    761         ids, data = read_state_with_metafile(funcs[i], states[i], columns[i],
--> 762                                              path, extension)
    763         data_agg = np.append(data_agg, [data])
    764 

~\aguaclara_research\aguaclara_research\ProCoDA_Parser.py in read_state_with_metafile(func, state, column, path, metaids, extension, units)
    652         paths = []
    653     for i in range(len(ids)):
--> 654         if ids[i] in meta_ids:
    655             paths.append(metafile[i, 4])
    656     else:

NameError: name 'meta_ids' is not defined

General Recommendations for ProCoDA_Parser.py

Based on my quick glance at the last two functions in ProCoDA_Parser.py, here are a few features I think might be good to add:

  • In the annotations (docstrings) for the functions, it would be good to note with every path variable that directories should be divided by \\ rather than \.
  • It would be helpful if the functions also took as an input a list of the MetaIDs you wanted. Not all experiments may be valid or comparable to one another, so it would be good to be able to input a list of only the experiments you want to analyze in the particular way you are describing with the function.
  • I don't know how much overhead it would add (maybe a lot), but rather than restricting inputs to column indices, it would be nice if users could also pass header strings from their data files that the function would then search for and match to the proper index (e.g., ["Influent Turbidity (NTU)"]). This is useful because the indices in data files shift when a sensor or function is added or removed. So, if a team decides to add a pH probe to their apparatus, processing their previous data has to be done in two steps: one for the old format of data files, and one for the new format.
  • I do not believe the way the functions currently load the metafile will let them read .xls files. Can the function be generalized to do that? Since it will be most convenient for teams to use Excel instead of tab-delimited .txt files, this would streamline things a bit. Otherwise, we'll need to let teams know to save a tab-delimited .txt version of their .xls files. This could become an issue if they save updates to their .xls file but forget to re-save the .txt file.
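
The header-matching idea above can be sketched with the csv module (the header strings and data here are hypothetical, not real ProCoDA output):

```python
import csv
import io

def column_indices(file_obj, names):
    """Return the column index for each requested header string
    in a tab-delimited data file."""
    header = next(csv.reader(file_obj, delimiter="\t"))
    return [header.index(name) for name in names]

# Hypothetical ProCoDA-style datalog:
datalog = ("Day fraction\tInfluent Turbidity (NTU)\tEffluent Turbidity (NTU)\n"
           "0.5\t98.2\t1.3\n")
print(column_indices(io.StringIO(datalog), ["Effluent Turbidity (NTU)"]))  # [2]
```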

Great work, Fletcher! I'm very excited by the promise of this module!

Testing of ProCoDA_Parser.py

So far, this looks like an excellent module, and I'm looking forward to getting it up and running!

I tried the last two functions, and these were the results:

  • read_state_with_metafile:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-b8318104443f> in <module>()
----> 1 pro.read_state_with_metafile(avg, "2", "1", "C:\\Users\\whp28\\Google Drive\\AGUACLARA DRIVE\\AguaClara Grads\\William Pennock\\Meta_File.txt", units="NTU")

~\aguaclara_research\aguaclara_research\ProCoDA_Parser.py in read_state_with_metafile(func, state, column, path, units)
    595 
    596     # use a loop to evaluate each experiment in the metafile
--> 597     for i in range(paths):
    598         # get the range of dates for experiment i
    599         day1 = metafile[i+1, 1]

TypeError: only integer scalar arrays can be converted to a scalar index

I believe the problem was calling range() on a list of strings rather than an integer, but I'm not sure.

  • write_calculations_to_csv:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-37-42eac6674187> in <module>()
----> 1 pro.write_calculations_to_csv([avg], ["2"], ["1"], "C:\\Users\\whp28\\Google Drive\\AGUACLARA DRIVE\\AguaClara Grads\\William Pennock\\Meta_File.txt", ["mean"], "test.txt")

~\aguaclara_research\aguaclara_research\ProCoDA_Parser.py in write_calculations_to_csv(funcs, states, columns, path, headers, out_name)
    684 
    685     data_agg = []
--> 686     for i in range(headers.len()):
    687         ids, data = read_state_with_metafile(funcs[i], states[i], columns[i], path)
    688         data_agg = np.append(data_agg, [data])

AttributeError: 'list' object has no attribute 'len'

It appears that Python lists do not have a .len() method. Maybe it would be okay to just use len(headers)?
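
Both tracebacks point at one-line fixes; a sketch (the paths and headers values are hypothetical):

```python
paths = ["data_6-14-2018.xls", "data_6-15-2018.xls"]   # hypothetical file list
headers = ["mean"]                                     # hypothetical headers list

# read_state_with_metafile: range() needs an integer, not the list itself
for i in range(len(paths)):        # or, more idiomatically: enumerate(paths)
    print(i, paths[i])

# write_calculations_to_csv: lists have no .len() method; use the built-in
for i in range(len(headers)):
    print(headers[i])
```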

missing functions

The following functions are missing from Environmental_Processes_Analysis.
Perhaps eliminate the filedialog.askdirectory() call, since that code doesn't seem to work on Macs. Instead, pass the directory path to the function.

def aeration_data(DO_column):
    """Extract data from a folder containing tab-delimited files of aeration
    data.

    Each file must be the original tab-delimited file, with all text strings
    below the header removed. The file names must be the air flow rates with
    units of micromoles/s; an example file name would be "300.xls", where 300
    is the flow rate in micromoles/s. The function opens a file dialog for the
    user to select the directory containing the data.

    Parameters
    ----------
    DO_column: index of the column that contains the dissolved oxygen
        concentration data.

    Returns
    -------
    filepaths: list of all file paths in the directory, sorted by flow rate
    airflows: sorted numpy array of air flow rates with units of micromole/s
        attached
    DO_data: sorted list of numpy arrays, so each data array can have a
        different length to accommodate short and long experiments
    time_data: sorted list of numpy arrays containing the times with units of
        seconds
    """
    dirpath = filedialog.askdirectory()
    # return the list of files in the directory
    filenames = os.listdir(dirpath)
    # extract the flow rates from the file names and apply units
    airflows = (np.array([i.split('.', 1)[0] for i in filenames])).astype(np.float32)
    # sort airflows and filenames so they are in ascending order of flow rate
    idx = np.argsort(airflows)
    airflows = (np.array(airflows)[idx]) * u.umole / u.s
    filenames = np.array(filenames)[idx]

    filepaths = [os.path.join(dirpath, i) for i in filenames]
    # cycle through the files and extract the column of oxygen concentrations
    # and the corresponding times
    DO_data = [Column_of_data(i, 0, -1, DO_column, 'mg/L') for i in filepaths]
    time_data = [(ftime(i, 0, -1)).to(u.s) for i in filepaths]
    aeration_collection = collections.namedtuple(
        'aeration_results', 'filepaths airflows DO_data time_data')
    return aeration_collection(filepaths, airflows, DO_data, time_data)

def O2_sat(Pressure_air, Temperature):
    """
    This equation is valid for 278 K < T < 318 K.

    Parameters
    ----------
    Pressure_air: air pressure with appropriate units
    Temperature: water temperature with appropriate units

    Returns
    -------
    Saturated oxygen concentration in mg/L
    """
    fraction_O2 = 0.21
    Pressure_O2 = Pressure_air * fraction_O2
    return (Pressure_O2.to(u.atm).magnitude) * u.mg / u.L * \
        np.exp(1727 / Temperature.to(u.K).magnitude - 2.105)

def Gran(data_file_path):
    """Extract the data from a ProCoDA Gran plot file.
    The file must be the original tab-delimited file.

    Parameters
    ----------
    data_file_path: string of the file name or file path.
        If the file is in the working directory, the file name is sufficient.
        Example: data_file_path = 'Reactor_data.txt'

    Returns
    -------
    V_titrant (mL) as numpy array
    ph_data as numpy array (no units)
    V_sample (mL) volume of the original sample that was titrated
    Normality_titrant (mole/L) normality of the acid used to titrate the sample
    V_equivalent (mL) volume of acid required to consume all of the ANC
    ANC (mole/L) acid neutralizing capacity of the sample
    """
    df = pd.read_csv(data_file_path, delimiter='\t', header=5)
    V_t = np.array(pd.to_numeric(df.iloc[0:, 0])) * u.mL
    pH = np.array(pd.to_numeric(df.iloc[0:, 1]))
    df = pd.read_csv(data_file_path, delimiter='\t', header=-1, nrows=5)
    V_S = pd.to_numeric(df.iloc[0, 1]) * u.mL
    N_t = pd.to_numeric(df.iloc[1, 1]) * u.mole / u.L
    V_eq = pd.to_numeric(df.iloc[2, 1]) * u.mL
    ANC_sample = pd.to_numeric(df.iloc[3, 1]) * u.mole / u.L
    Gran_collection = collections.namedtuple(
        'Gran_results', 'V_titrant ph_data V_sample Normality_titrant V_equivalent ANC')
    return Gran_collection(V_titrant=V_t, ph_data=pH, V_sample=V_S,
                           Normality_titrant=N_t, V_equivalent=V_eq, ANC=ANC_sample)
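
As a quick sanity check of the O2_sat correlation above, here is a unit-free version with plain floats in place of Pint quantities (the test temperature is arbitrary):

```python
import math

def O2_sat_mgL(pressure_atm, temp_K):
    """Saturated dissolved oxygen (mg/L); valid for 278 K < T < 318 K."""
    fraction_O2 = 0.21
    return fraction_O2 * pressure_atm * math.exp(1727 / temp_K - 2.105)

# At 1 atm and 295 K (~22 degC), this lands near typical DO saturation
# values of roughly 8-9 mg/L:
print(round(O2_sat_mgL(1.0, 295.0), 2))
```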

peristaltic pump code for experimental design

We need code that walks students through the process of selecting peristaltic pump tubing, stock concentrations, and flow rates. The ideas for this are in the auto tutorial for peristaltic pumps and a Mathcad sheet has code that can be a guide.

The code should include a function that returns uL/rev for the different sizes of peristaltic tubing. That is perhaps the most critical function at this time. The additional analysis can be recreated by the teams as they design their apparatus.
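
A sketch of the shape such a function could take. Note that the tubing sizes and uL/rev numbers below are illustrative placeholders, not vendor data; real values would come from the pump tubing datasheet:

```python
# Placeholder displacement-per-revolution values (uL/rev) keyed by tubing
# size. These numbers are made up for illustration only.
TUBING_UL_PER_REV = {
    13: 50.0,
    14: 210.0,
    16: 800.0,
}

def vol_per_rev(tubing_size):
    """Return the displacement per revolution (uL/rev) for a tubing size."""
    return TUBING_UL_PER_REV[tubing_size]

def pump_rpm(flow_uL_per_s, tubing_size):
    """Pump speed (rev/min) needed to deliver a target flow rate (uL/s)."""
    return 60.0 * flow_uL_per_s / vol_per_rev(tubing_size)
```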

Test and improve documentation of EPA

Add examples to comments and start implementing doctesting in EPA.

Also add in-line comments so code is easier to understand and modify in the future

Add Coagulant Dosing Functions

Hi Hannah, you should have write access to the repository! Once you confirm your coagulant dosing calculation and compare with Fluoride, please add it to the research repository. If there are any other functions you think would be valuable to add, let me know on this issue.

G coil equation is wrong in floc_model

The Gcoil equation has the wrong term raised to the 4th power. Also verify that the base-10 logarithm is used.

$$\overline{G}_{coil} = \bar{G}\left[ 1 + 0.033\left(\log_{10} De\right)^4 \right]^{\frac{1}{2}}$$
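
For reference, a direct transcription of the corrected expression (a sketch; it assumes the 0.033 coefficient and base-10 logarithm shown in the equation above):

```python
import math

def G_coil(G, De):
    """Velocity gradient in a coiled tube, from the straight-tube value G
    and the Dean number De, using a base-10 logarithm."""
    return G * (1 + 0.033 * math.log10(De) ** 4) ** 0.5
```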

Update Contributing Guide

Before the summer is over, this repo should have a strong contributing guide explaining how members can help work on the project. The current guide looks automatically generated and is missing lots of information.

This guide should include both how to add an issue to the repo (for members who will not be directly contributing to the code, but still want changes made) and how to edit the code directly. Make sure to link it from the README as well!
