alphatwirl / alphatwirl

A Python library for summarizing event data into multivariate categorical data

License: BSD 3-Clause "New" or "Revised" License

data-analysis root-cern pandas r data-frame alphatwirl

alphatwirl's Introduction




A Python library for summarizing event data into multivariate categorical data

Description

AlphaTwirl is a Python library that summarizes event data into multivariate categorical data as data frames. Event data, input to AlphaTwirl, are data with one entry (or row) for one event: for example, data in ROOT TTrees with one entry per collision event of an LHC experiment at CERN. Event data are often large—too large to be loaded in memory—because they have as many entries as events. Multivariate categorical data, the output of AlphaTwirl, have one row for one category. They are usually small—small enough to be loaded in memory—because they only have as many rows as categories. Users can, for example, import them as data frames into R and pandas, which usually load all data in memory, and can perform categorical data analyses with a rich set of data operations available in R and pandas.
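The flavour of this reduction can be illustrated with plain pandas (a conceptual illustration of the input and output shapes only, not the AlphaTwirl API; the variables are made up):

import pandas as pd

# Event data: one row per event.
events = pd.DataFrame({
    'njets': [2, 3, 2, 5, 3, 2],
    'met':   [30.0, 120.0, 45.0, 210.0, 95.0, 15.0],
})

# Multivariate categorical data: one row per (njets, met-bin) category.
met_bin = pd.cut(events['met'], bins=[0, 50, 100, 200, 1000])
summary = events.groupby(['njets', met_bin], observed=True).size().rename('n').reset_index()
print(summary)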


Quick start

The quick start can be launched on Binder.


Publication


License

  • AlphaTwirl is licensed under the BSD license.

Contact

alphatwirl's People

Contributors

benkrikler, kreczko, taisakuma


alphatwirl's Issues

update test for run.py

  • to test SIGINT is actually ignored:

    import signal
    signal.signal(signal.SIGINT, signal.SIG_IGN)

  • to test log level and handler are in effect:

    import gzip
    import json
    import logging
    import os
    import sys

    def setup_logging():
        path = 'logging_levels.json.gz'
        if not os.path.isfile(path):
            return
        with gzip.GzipFile(path, 'r') as f:
            loglevel_dict = json.loads(f.read().decode('utf-8'))
        for name, level in loglevel_dict.items():
            logger = logging.getLogger(name)
            logger.setLevel(level)
        handler = logging.StreamHandler(sys.stdout)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logging.getLogger('').addHandler(handler)

Unit tests package dependencies not documented

I was trying to understand why the unit tests were failing when I ran pytest, and it turned out to be because I didn't have the pytest-console-scripts package installed. I realised these were needed by looking in the Travis CI config file, but it would be good to list pytest, pytest-mock, pytest-cov, and pytest-console-scripts either in the requirements.txt file or to mention them explicitly in the README.

Add support for SGE batch systems

I've no idea how much work this would involve, but to bring alphatwirl to more sites, support for batch systems beyond HTCondor would be useful for running components in parallel. In particular, Sun Grid Engine (SGE) batch systems, although somewhat old-fashioned, are still common, for example on lxplus or the Imperial College batch system.

don't skip a test for Summarizer for Python 3

@pytest.mark.skipif(sys.version_info >= (3, 0), reason="requires python 2")
def test_to_tuple_list_key_not_tuple(obj):
    obj.add('A', (12, ))  # the keys are not a tuple
    obj.add(2, (20, ))
    assert [(2, 20), ('A', 12)] == obj.to_tuple_list()

It is skipped because int and str cannot be compared in Python 3:

keys_sorted = sorted(self._results.keys())
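One possible way to make the sort work for mixed key types in Python 3 (a sketch, not the current Summarizer code) is to sort on a type-aware key:

keys = [2, 'A']
keys_sorted = sorted(keys, key=lambda k: (type(k).__name__, k))
print(keys_sorted)   # [2, 'A'], because 'int' < 'str'; values are compared only within a type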

ValueError with duplicate datasets

Release version

0.16.0 (the issue is expected to be present in 0.20.0 as well)

Issue

A ValueError is raised at loop/merge.py#L7 when there are duplicate datasets (two or more dataset entries with the same name attribute).

Description

Duplicate datasets can appear by user error. The final duplicate dataset overwrites all the others in loop/EventDatasetReader.py#L74, but loop/EventDatasetReader.py#L68 is filled with runids from all of the duplicates.

When the code reaches loop/merge.py#L7, it tries to access runids from duplicate datasets that have already been overwritten in loop/EventDatasetReader.py#L74.

Resolution

Everything works fine if there are no duplicate datasets.

However, this could be checked or enforced somewhere earlier, for example as sketched below.
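A minimal sketch of such a check (a hypothetical helper, not existing alphatwirl code), assuming each dataset exposes a name attribute as described above:

from collections import Counter

def assert_unique_dataset_names(datasets):
    # Raise early, before any events are read, if two or more datasets share a name.
    counts = Counter(dataset.name for dataset in datasets)
    duplicates = [name for name, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError('duplicate dataset names: {}'.format(duplicates))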

EventLoop doesn't provide Reader a way to terminate execution elegantly

Problem

The code in the EventLoop class handles the reader (typically a composite reader) and provides it with events from the tree(s) by:

  1. Call reader.begin() method
  2. For each event:
    a. Call reader.event()
  3. Call reader.end()

In each case, the return value of the reader's method is ignored. This makes it awkward for a reader to tell the event loop to terminate early, or, in the case of begin(), not to run at all. begin() can of course raise an exception, but in some cases that is more extreme than really needed.

Solution

Have the return value of each call to the reader's begin, event, and end methods checked, and halt execution if the return value does not evaluate to true.
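A minimal sketch of the idea (not the actual EventLoop code; it assumes that None, the implicit return of existing readers, still means "continue" and that only an explicit False stops the loop):

def run_with_early_termination(reader, events):
    # reader.begin() returning False means "do not run at all"
    if reader.begin(events) is False:
        return reader.end()
    for event in events:
        # reader.event() returning False means "stop looping over events"
        if reader.event(event) is False:
            break
    return reader.end()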

Specific example

I'm running over a large heppy dataset and including the existing RA1 AlphaTools sequence as a single Reader. The heppy dataset is split up using AlphaTwirl and submitted to the Bristol HTCondor batch system. One of the datasets in this input is not supposed to be handled by AlphaTools, and AlphaTools knows this, so it would normally just ignore it. However, this instead causes an exception to be raised, crashing the batch job, which is then re-submitted. To let things finish elegantly, I would prefer to be able to signal to the EventLoop, in the begin() method of the reader that wraps AlphaTools, that the job should not proceed.

test gets stuck in python 3.6 in some circumstances

At commit 7405535, the test gets stuck in some circumstances.

example of how to reproduce

in the top directory of alphatwirl,

pytest -s tests

the test starts normally.

============================================ test session starts =============================================
platform darwin -- Python 3.6.3, pytest-3.4.0, py-1.5.2, pluggy-0.6.0
rootdir: #######, inifile:
plugins: mock-1.6.3, cov-2.5.1, console-scripts-0.1.4, hypothesis-3.38.5
collected 619 items                                                                                          

and all tests pass.

tests/unit/summary/test_Summarizer.py .....s....                                                       [  0%]
tests/unit/summary/test_WeightCalculatorOne.py .                                                       [  0%]
tests/unit/summary/test_convert_key_vals_dict_to_tuple_list.py ......                                  [  0%]
tests/unit/summary/test_parse_indices_config.py .

=================================== 579 passed, 40 skipped in 5.09 seconds =================================

but the test doesn't finish; the shell prompt won't appear.

Hitting Ctrl-C gives:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/sakuma/anaconda3/envs/py3_6-pd0_20/lib/python3.6/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
$

the problem doesn't always happen

The same problem doesn't happen, for example, if

  • the -s option is not used:
    pytest tests
  • a trailing slash is given, as in tests/:
    pytest -s tests/

The same problem doesn't happen in Python 2.7:

platform darwin -- Python 2.7.12, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
rootdir: #######, inifile: pytest.ini
plugins: mock-1.6.3, cov-2.5.1, console-scripts-0.1.3

HTCondorJobSubmitter crashes if problem submitting

When trying to run on lxplus, with my working directory under /eos I observed the following traceback:

Traceback (most recent call last):
  File "/afs/cern.ch/user/b/bkrikler/.local/bin/fast_carpenter", line 11, in <module>
    load_entry_point('fast-carpenter', 'console_scripts', 'fast_carpenter')()
  File "/eos/user/b/bkrikler/CHIP/fast-carpenter/fast_carpenter/__main__.py", line 67, in main
    return process.run(datasets, sequence)
  File "/afs/cern.ch/user/b/bkrikler/.local/lib/python2.7/site-packages/atuproot/atuproot_main.py", line 49, in run
    return self._run(loop)
  File "/afs/cern.ch/user/b/bkrikler/.local/lib/python2.7/site-packages/atuproot/atuproot_main.py", line 101, in _run
    result = loop()
  File "/eos/user/b/bkrikler/CHIP/alphatwirl/alphatwirl/datasetloop/loop.py", line 55, in __call__
    self.reader.read(dataset)
  File "/eos/user/b/bkrikler/CHIP/alphatwirl/alphatwirl/datasetloop/reader.py", line 27, in read
    reader.read(dataset)
  File "/eos/user/b/bkrikler/CHIP/alphatwirl/alphatwirl/loop/EventDatasetReader.py", line 66, in read
    runids = self.eventLoopRunner.run_multiple(eventLoops)
  File "/eos/user/b/bkrikler/CHIP/alphatwirl/alphatwirl/loop/MPEventLoopRunner.py", line 93, in run_multiple
    return self.communicationChannel.put_multiple(eventLoops)
  File "/eos/user/b/bkrikler/CHIP/alphatwirl/alphatwirl/concurrently/CommunicationChannel.py", line 131, in put_multiple
    return self.dropbox.put_multiple(packages)
  File "/eos/user/b/bkrikler/CHIP/alphatwirl/alphatwirl/concurrently/TaskPackageDropbox.py", line 60, in put_multiple
    runids = self.dispatcher.run_multiple(self.workingArea, pkgidxs)
  File "/eos/user/b/bkrikler/CHIP/alphatwirl/alphatwirl/concurrently/HTCondorJobSubmitter.py", line 129, in run_multiple
    njobs = int(regex.search(stdout).groups()[0])
AttributeError: 'NoneType' object has no attribute 'groups'

The underlying problem in this case was that HTCondor submission on lxplus doesn't work below /eos, only under /afs; however, the traceback hid this somewhat. It might be helpful if the match object from the regular expression on line 127 were first checked to be valid and not None. If it is None, then the stdout and stderr could be printed and an exception raised. I suspect that would help a user understand more immediately why the jobs weren't submitted.
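A possible guard, sketched here (this is not the current HTCondorJobSubmitter code; regex, stdout, and stderr are the local names used around line 127):

match = regex.search(stdout)
if match is None:
    # Surface the submission output so the real cause (e.g. submitting from
    # /eos on lxplus) is visible instead of an AttributeError.
    raise RuntimeError(
        'could not parse condor_submit output\n'
        'stdout:\n{}\nstderr:\n{}'.format(stdout, stderr)
    )
njobs = int(match.groups()[0])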

RuntimeWarning in travis

# assert '' == ret.stderr
## commented out because of "RuntimeWarning" in travis
## https://travis-ci.org/alphatwirl/alphatwirl/jobs/409237073
## E AssertionError: assert '' == '/home/travis/miniconda/envs...ok( name,
## *args, **kwds )\n'
## E + /home/travis/miniconda/envs/testenv/lib/ROOT.py:301: RuntimeWarning:
## numpy.dtype size changed, may indicate binary incompatibility. Expected
## 96, got 88

https://travis-ci.org/alphatwirl/alphatwirl/jobs/409237073

handle TChain with no files

events = self.build_events()
self.nevents = len(events)
self._report_progress(0)
self.reader.begin(events)

self.reader.begin(events) can fail if events are built from a TChain with no files.

This could happen if no files are verified here:

if self.config['check_files']:
    paths = self._verify_files(paths, self.config['skip_error_files'])
chain = ROOT.TChain(self.config['tree_name'])
for path in paths:
    chain.Add(path)
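One possible guard, sketched here as an assumption rather than existing code, is to check whether any file was actually added before building the events:

# Sketch: fail with a clear message (or skip the dataset) if the chain is empty.
if chain.GetNtrees() == 0:
    raise RuntimeError('no valid files were added to the TChain "{}"'.format(self.config['tree_name']))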

Extend the contents used for DataFrame filenames

This is based on a series of posts first made to Slack. But for the TL;DR:

  • The TableFileNameComposer class currently creates filenames that contain only the binning scheme for the DFs.
  • Sometimes the same binning scheme will be used while other variables are changed to create a DF (e.g., for systematics, where we change the weighting or corrections).
  • I suggest we have TableFileNameComposer.__call__() receive the full DF config specification, so that any generic filename composer one would like to create has the full amount of information available to it. So on this line, you would just pass in ret:
ret['outFileName'] = self.createOutFileName(ret)

In the meantime, I have a hacky solution to my problem, which can be used instead of TableFileNameComposer:

class WithInsertTableFileNameComposer():
    def __init__(self, composer, inserts):
        self.inserts = inserts
        self.composer = composer
        self.frame_idx = 0

    def __call__(self, columnNames, indices, **kwargs):
        this_insert = self.inserts[self.frame_idx]
        suffix = kwargs.get("suffix", self.composer.default_suffix)
        kwargs["suffix"] = "--{}.{}".format(this_insert, suffix)
        self.frame_idx += 1
        return self.composer(columnNames, indices, **kwargs)

change the way to give a progressReporter to a task

At the moment, a progressReporter is given to a task as an argument:

def _run_task(self, package):
    try:
        result = package.task(progressReporter=self.progressReporter, *package.args, **package.kwargs)
    except TypeError:
        result = package.task(*package.args, **package.kwargs)
    return result

This potentially causes a name conflict if a task already has an argument named progressReporter for a different purpose.

Instead of the Worker giving the reporter to a task, the task should find the reporter if it wants one.

A possible solution: a worker registers a reporter in a registry, e.g., a ProgressReporterAgency or ProgressReporterOffice defined in the module progressbar, and a task gets the reporter from the registry.
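A minimal sketch of that idea (the module and function names here are hypothetical, not the alphatwirl API):

# progressbar/registry.py (hypothetical): a module-level registry.
_reporter = None

def register_progress_reporter(reporter):
    # called by the worker once, instead of passing the reporter to every task
    global _reporter
    _reporter = reporter

def find_progress_reporter():
    # called by a task only if it wants a reporter; None if none was registered
    return _reporter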

change the default datasetColumnName to 'dataset'

at either

avoid mutable default

$ find ./ -name \*.py | xargs grep 'def' | grep '\[ *\]'
./parallel/build.py:def build_parallel(parallel_mode, quiet=True, processes=4, user_modules=[ ],
./concurrently/HTCondorJobSubmitter.py:    def __init__(self, job_desc_extra=[ ], job_desc_dict={}):
./selection/modules/with_count.py:    def __init__(self, name='All', selections=[ ]):
./selection/modules/with_count.py:    def __init__(self, name='Any', selections=[ ]):
./selection/modules/basic.py:    def __init__(self, name='All', selections=[ ]):
./selection/modules/basic.py:    def __init__(self, name='Any', selections=[ ]):
./loop/ReaderComposite.py:    def __init__(self, readers=[]):
$ find ./ -name \*.py | xargs grep 'def' | grep '{ *}'
./parallel/build.py:        logger.warning('unknown parallel_mode "{}", use default "{}"'.format(
./concurrently/HTCondorJobSubmitter.py:    def __init__(self, job_desc_extra=[ ], job_desc_dict={}):
./selection/factories/expand.py:def expand_path_cfg(path_cfg, alias_dict={ }, overriding_kargs={ }):

Possibly more: arguments can be listed across multiple lines, so grep may miss some. The usual fix is sketched below.
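For reference, the usual fix is to default to None and create a fresh list inside; the class and attribute names below only illustrate the pattern:

class All(object):
    def __init__(self, name='All', selections=None):
        if selections is None:
            selections = []   # a fresh list per instance, not one list shared by all calls
        self.name = name
        self.selections = selections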

optimize the sending results from worker nodes for data frames

Currently, the package concurrently uses pickle to send the results from the worker nodes.

Pickle files with data frames tend to become large, and loading large pickles is slow.

Develop an alternative implementation specifically optimized for data frames, for example by using dask, so that loading the results at the interactive node is fast.
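One possible direction, sketched as an assumption rather than an existing alphatwirl feature: each worker writes its summary to a columnar file (Parquet here) in the working area, and the interactive node loads them all lazily with dask. The paths and column names are made up for illustration.

import dask.dataframe as dd
import pandas as pd

# Worker side: write the per-task summary instead of pickling it back.
result = pd.DataFrame({'category': ['a', 'b'], 'n': [10, 3]})
result.to_parquet('workingarea/results/task_0001.parquet')

# Interactive node: read all task results lazily and combine them.
results = dd.read_parquet('workingarea/results/*.parquet')
combined = results.groupby('category')['n'].sum().compute()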

update tests for TaskPackageDropbox

  • update tests for TaskPackageDropbox
    • the current tests only cover limited situations
    • what to vary
      • the number of tasks, e.g., 0, 1, or 5
      • the number of fails before success for each task, for example, [0, 0, 0, 0, 0], [0, 0, 2, 0, 0], [0, 1, 1, 0, 1] if the number of tasks is 5.
      • sequences of runs in poll
        • permutation
        • the number of runs in each poll (a parametrization sketch follows this list)
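One possible way to organize these variations (a sketch of a hypothetical test, not the existing test module):

import pytest

# Each list gives the number of failures before success per task, so its
# length is the number of tasks (0, 1, or 5 as suggested above).
@pytest.mark.parametrize('fails_before_success', [
    [],                   # 0 tasks
    [0],                  # 1 task, succeeds on the first run
    [0, 0, 0, 0, 0],      # 5 tasks, all succeed on the first run
    [0, 0, 2, 0, 0],      # 5 tasks, one fails twice before succeeding
    [0, 1, 1, 0, 1],
])
def test_put_multiple(fails_before_success):
    # build a mock dispatcher whose poll() reports failures and successes
    # according to fails_before_success, then exercise TaskPackageDropbox
    pass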

organize requirements

pointed out in #30 (comment)

We need at least three separate lists of requirements:

  • to use alphatwirl
  • to run the tests
  • to compile the docs

They can be specified in install_requires in setup.py and in requirements files for pip; a possible layout is sketched below.
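A possible layout, sketched under the assumption of setuptools extras; the test dependencies are those mentioned in the issue above, and the docs dependency is only a guess:

from setuptools import setup, find_packages

setup(
    name='alphatwirl',
    packages=find_packages(),
    install_requires=[],   # runtime dependencies would be listed here
    extras_require={
        'tests': ['pytest', 'pytest-mock', 'pytest-cov', 'pytest-console-scripts'],
        'docs': ['sphinx'],   # assumption: Sphinx-built docs
    },
)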

absorb collectors into readers

Related to the comment at https://github.com/alphatwirl/alphatwirl-interface/pull/7#issuecomment-369357995.

This will be a set of surgical changes.

It will help to solve #10.

steps:

should min be applied on bin boundaries rather than values?

In version 0.18.3, the following code produces awkward results:

from qtwirl import qtwirl
filepath = 'https://github.com/alphatwirl/qtwirl/raw/v0.03.1/tests/data/sample_chain_01.root'
results = qtwirl(
    file=filepath, tree_name='tree',
    reader_cfg=dict(keyAttrNames='met', binnings=RoundLog(0.2, 100, min=10, underflow_bin=0)))
print results
  met n nvar
0.000000 102 102
6.309573 0 0
10.000000 53 53
15.848932 77 77
25.118864 100 100
39.810717 141 141
63.095734 153 153
100.000000 166 166
158.489319 114 114
251.188643 78 78
398.107171 14 14
630.957344 2 2
1000.000000 0 0

The 2nd row, i.e. the row with met = 6.309573, shouldn't be there.

It is also misleading: it suggests that there are no events with 6.309573 <= met < 10.0, which is wrong.

This happens because, in RoundLog (and in Round as well), the bin next to the underflow bin is the bin for the min.

An example

>>> b = RoundLog(0.2, 100, min=10, underflow_bin=0)
>>> b(10)
6.309573444801936
>>> b(8)
0

Here is why this happens. Because 10 is greater than or equal to min (in fact, it is exactly the min), it returns the lower edge of the bin. However, because of floating-point rounding, 10 doesn't fall into the interval [10, 10^1.2) but into the one below, [10^0.8, 10). So it returns 10^0.8 = 6.30957. On the other hand, because 8 is below min, it returns underflow_bin.

This is a feature, but it is very uncomfortable. 8 is greater than 6.3; if there is a bin [10^0.8, 10), 8 should fall in that bin.

As long as the bin is determined after the value is compared with min, this can happen.

This uncomfortable situation can be avoided if the minimum is taken to be the lower edge of the bin that the argument min falls in.
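A sketch of that behaviour (not the actual RoundLog implementation; the bin width 0.2 and the boundary 100 match the example above):

import math

def lower_edge(value, width=0.2, boundary=100):
    # lower edge of the log10 bin of the given width, with edges aligned to `boundary`
    x = math.log10(value)
    b = math.log10(boundary)
    return 10 ** (b + math.floor((x - b) / width) * width)

def bin_with_min_on_edge(value, min_val=10, underflow_bin=0):
    edge = lower_edge(value)
    if edge < lower_edge(min_val):   # compare bin edges, not raw values
        return underflow_bin
    return edge

# A value goes to the underflow bin only if its whole bin lies below the bin
# containing min, so a bin can never appear in the output while values that
# belong to it are sent to underflow.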

make it possible to include lambdas in tasks

Currently, it is impossible for a task in concurrently to include a lambda. This is because lambdas aren't picklable.

There are several solutions for serializing lambdas, including dill.

Note that, unlike results, tasks are generally small; serialization and deserialization don't need to be very fast.
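For example, a quick check (a sketch, assuming dill is installed) that dill can round-trip a lambda where the standard pickle module cannot:

import dill

task = lambda x: x * 2          # not picklable with the standard pickle module
payload = dill.dumps(task)      # dill serializes the lambda's code object
restored = dill.loads(payload)
assert restored(21) == 42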

"can't pickle _thread.lock" in ResumableDatasetLoop with SubprocessRunner in Python 3

at 8db62fc, a few commits after v0.24.2

Traceback (most recent call last):
  File "./bdphi-scripts/bdphiROC/twirl_mktbl_heppy.py", line 721, in <module>
    main()
  File "./bdphi-scripts/bdphiROC/twirl_mktbl_heppy.py", line 62, in main
    run(reader_collector_pairs)
  File "./bdphi-scripts/bdphiROC/twirl_mktbl_heppy.py", line 609, in run
    treeName='tree'
  File "./bdphi-scripts/external/atheppy/atheppy/fw.py", line 110, in run
    self._run(loop)
  File "./bdphi-scripts/external/atheppy/atheppy/fw.py", line 285, in _run
    loop()
  File "./bdphi-scripts/external/alphatwirl/alphatwirl/datasetloop/loop.py", line 60, in __call__
    pickle.dump(self.reader, f, protocol=pickle.HIGHEST_PROTOCOL)
TypeError: can't pickle _thread.lock objects

The error occurs in ResumableDatasetLoop:

def __call__(self):
    self.reader.begin()
    for dataset in self.datasets:
        self.reader.read(dataset)
        path = os.path.join(self.workingarea.path, 'reader.p.gz')
        with gzip.open(path, 'wb') as f:
            pickle.dump(self.reader, f, protocol=pickle.HIGHEST_PROTOCOL)
    return self.reader.end()

  • This happens only in Python 3, not in Python 2.7.
  • This happens only in the parallel mode subprocess, not in htcondor.

The proc in SubprocessRunner might not be picklable in Python 3:

proc = subprocess.Popen(
    args,
    stdout=subprocess.PIPE if self.pipe else None,
    stderr=subprocess.PIPE if self.pipe else None,
    cwd=taskdir
)
self.running_procs.append(proc)

  • can try to pickle before self.reader.read(dataset) or self.reader.begin()
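As a quick check of that suspicion (a sketch, not part of alphatwirl): pickling a Popen object directly in Python 3 reproduces the same error, because Popen keeps an internal wait lock:

import pickle
import subprocess
import sys

proc = subprocess.Popen([sys.executable, '--version'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
proc.communicate()
try:
    pickle.dumps(proc)
except TypeError as e:
    print(e)   # in Python 3.6: "can't pickle _thread.lock objects"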
