PAttern MIning (PAMI) is a Python library containing several algorithms to discover user interest-based patterns in a wide spectrum of datasets across multiple computing platforms. Useful links to utilize the services of this library are provided below:
Version 2023.07.07: New algorithms: cuApriori, cuAprioriBit, cuEclat, cuEclatBit, gPPMiner, cuGPFMiner, FPStream, HUPMS, SHUPGrowth; new code to generate synthetic databases
Version 2023.06.20: Fuzzy Partial Periodic, Periodic Patterns in High Utility, Code Documentation, help() function Update
Version 2023.03.01: prefixSpan and SPADE
Total number of algorithms: 83
Features
- Well-tested and production-ready
- Highly optimized to our best effort, light-weight, and energy-efficient
- Proper code documentation
- Ample examples of using various algorithms in the ./notebooks folder
- Works with AI libraries such as TensorFlow, PyTorch, and sklearn
- Supports CUDA and PySpark
- Operating system independence
- Knowledge discovery in static data and streams
- Snappy
- Ease of use
Maintenance
Installation
Installing basic pami package (recommended)
pip install pami
Installing pami package in a GPU machine that supports CUDA
pip install 'pami[gpu]'
Installing pami package in a distributed network environment supporting Spark
pip install 'pami[spark]'
Installing pami package for development purposes
pip install 'pami[dev]'
Installing the complete pami library
pip install 'pami[all]'
Upgrading
pip install --upgrade pami
Uninstallation
pip uninstall pami
Information
pip show pami
Try your first PAMI program
$ python
# first import pami
from PAMI.frequentPattern.basic import FPGrowth as alg
fileURL = "https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv"
minSup = 300
obj = alg.FPGrowth(iFile=fileURL, minSup=minSup, sep='\t')
obj.startMine()
obj.save('frequentPatternsAtMinSupCount300.txt')
frequentPatternsDF=obj.getPatternsAsDataFrame()
print('Total No of patterns: ' + str(len(frequentPatternsDF)))  # print the total number of patterns
print('Runtime: ' + str(obj.getRuntime()))  # measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): '+str(obj.getMemoryUSS()))
Output:
Frequent patterns were generated successfully using frequentPatternGrowth algorithm
Total No of patterns: 4540
Runtime: 8.749667644500732
Memory (RSS): 522911744
Memory (USS): 475353088
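Once the patterns are saved, the output file can be post-processed with plain Python. A minimal sketch, assuming each saved line has the form "item1<sep>item2...:support" (verify against your own output file, as the exact format may vary between PAMI versions):

```python
# Parse a saved pattern file into {itemset-tuple: support}.
# The "items:support" line format is an assumption -- inspect your file first.
def load_patterns(path, sep='\t'):
    patterns = {}
    with open(path) as f:
        for line in f:
            body, _, support = line.strip().rpartition(':')
            patterns[tuple(body.split(sep))] = int(support)
    return patterns

# write a tiny sample file to demonstrate
with open('sample_patterns.txt', 'w') as f:
    f.write('a\tb:420\n')
    f.write('c:310\n')

pats = load_patterns('sample_patterns.txt')
print(pats[('a', 'b')])  # -> 420
```

This lets you filter or rank patterns without re-running the miner.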
Evaluation:
We compared three Python libraries, PAMI, mlxtend, and efficient-apriori, for the Apriori algorithm.
Transactional_T10I4D100K.csv is a transactional database downloaded from PAMI and
used as the input file for all libraries.
The minimum support values and separator are also the same.
The performance of the Apriori algorithm is shown in the graphical results below:
Comparing the Patterns Generated by different Python libraries for the Apriori algorithm:
Evaluating the Runtime of the Apriori algorithm across different Python libraries:
Comparing the Memory Consumption of the Apriori algorithm across different Python libraries:
For more information, we have uploaded the evaluation file in two formats:
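A minimal sketch of the kind of harness such a comparison uses: time one mining call and record peak Python-level memory. Note that PAMI itself reports RSS/USS (which include native allocations), whereas the stdlib tracemalloc used here only tracks Python objects, so absolute memory numbers will differ from the figures above.

```python
import time
import tracemalloc

def profile(mine, *args, **kwargs):
    """Run a mining callable; return (result, runtime in s, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = mine(*args, **kwargs)
    runtime = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, runtime, peak

# toy stand-in for a mining call from any of the three libraries
res, runtime, peak = profile(lambda n: [i * i for i in range(n)], 100_000)
print(len(res), runtime >= 0, peak > 0)  # -> 100000 True True
```

The same `profile` wrapper can be applied to each library's Apriori entry point to produce comparable runtime curves.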
The idea and motivation to develop PAMI came from the Kitsuregawa Lab at the University of Tokyo. Work on PAMI started at the University of Aizu in 2020 and
has been under active development since then.
Getting Help
For any queries, the best place to go is GitHub Issues.
Discussion and Development
Most development-related discussions take place in our GitHub repository. We encourage our team members and contributors to use this platform for a wide range of discussions, including bug reports, feature requests, design decisions, and implementation details.
Contribution to PAMI
We invite and encourage all community members to contribute, report bugs, fix bugs, enhance documentation, propose improvements, and share their creative ideas.
Tutorials
0. Association Rule Mining
Basic
Confidence
Lift
Leverage
1. Pattern mining in binary transactional databases
I'm looking at various repos for the purpose of frequent pattern mining. I found this repo in this article, and I think the repo can be added to this topic for more visibility.
Hey there, nice stuff so far.
I am a bit confused as to why there is no convenient/clear option to acquire the association rules and rank them according to lift after running a basic frequent pattern mining algorithm. Instead, it's hidden inside a separate class that specifically creates association rules, rather than being an extension of any algorithm run. Could you consider adapting this to become a general method of the basic pattern miners?
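For readers hitting the same question, the computation itself is library-agnostic. A hand-rolled sketch of deriving association rules (with confidence and lift) from a dict of frequent itemsets; this only illustrates the math, not PAMI's own association-rule class:

```python
from itertools import combinations

def rules(freq, total, minConf=0.5):
    """freq: {sorted itemset tuple: support count}; total: number of transactions."""
    out = []
    for itemset, sup in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                cons = tuple(sorted(set(itemset) - set(ante)))
                ante = tuple(sorted(ante))
                conf = sup / freq[ante]                 # confidence of ante -> cons
                lift = conf * total / freq[cons]        # lift of the rule
                if conf >= minConf:
                    out.append((ante, cons, conf, lift))
    return sorted(out, key=lambda x: -x[3])             # ranked by lift

freq = {('a',): 60, ('b',): 40, ('a', 'b'): 30}
for ante, cons, conf, lift in rules(freq, total=100):
    print(ante, '->', cons, round(conf, 2), round(lift, 2))
```

Running this prints both rules with lift 1.25; the same post-processing could wrap any miner's pattern output.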
Dear Sir,
I am encountering an indentation error when importing this code. Specifically, there is an indentation issue in the block of code for createTransactional, after the else statement. I kindly request your assistance in resolving this matter promptly.
Thank you for your attention to this matter.
Sincerely,
Ashutosh Kumar
def createTransactional(self, outputFile):
    """
    :Description: Create transactional database

    :param outputFile: str
        Write transactional database into outputFile
    """
    self.outputFile = outputFile
    with open(outputFile, 'w') as f:
        if self.condition not in condition_operator:
            print('Condition error')
        else:
            for tid in self.tids:
                transaction = [item for item in self.items
                               if condition_operator[self.condition](self.inputDF.at[tid, item], self.thresholdValue)]
                if len(transaction) > 1:
                    f.write(f'{transaction[0]}')
                    for item in transaction[1:]:
                        f.write(f'\t{item}')
                elif len(transaction) == 1:
                    f.write(f'{transaction[0]}')
                else:
                    continue
                f.write('\n')
Hello, I am a researcher who recently encountered a problem that requires a sequence pattern mining algorithm, so I found this package, which is perfect. However, I still have some issues using it because there is too little information and documentation on this project; I don't know how to do the visualization or how to switch algorithms. It would be great if there were more manuals, tutorials, etc.
Hi, thank you for developing such a wonderful open-source library for Pattern Mining.
I am using the FPFP algorithm and face some problems:
The data format from the doc (https://udayrage.github.io/PAMI/fuzzyPeriodicFrequentPatternMining.html) does not work
Particularly, for example, each row (transaction) on the website only has one colon (:) separating the item and fuzzy value. However, with this format, the algorithm returns an error (which suggests an additional colon (:) is needed).
....
I have also read your paper (*) and fuzzified the values from the transactional database as written, but it does not seem to match your implemented algorithm (as far as I can tell from inspecting the code).
Could you please provide the correct data format for this FPFP algorithm, as well as an explanation of how to create that format, with a simple example?
Thanks in advance
(*) Kiran, R. Uday, et al. "Discovering fuzzy periodic-frequent patterns in quantitative temporal databases." 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2020.
Thanks for developing this great library! Can we use categorical data for the temporal database scenario? Looking at the example databases, can we only use numeric data variables for all the algorithms?
In this line, it is expected to have a column named "tid." However, the documentation does not mention anything about it, does it? The documentation states: inputDataFrame - the dataframe that needs to be converted into a database.
Furthermore, in the following line, the items are taken from the first column. Is this because it assumes that column index 0 is the timestamp? If I manually remove the timestamp in the dataframe, I will be missing one column.
[/usr/local/lib/python3.10/dist-packages/PAMI/fuzzyCorrelatedPattern/basic/FCPGrowth.py] in _creatingItemSets(self)
420 parts = line.split(":")
421 items = parts[0].split()
--> 422 quantities = parts[2].split()
423 self._transactions.append([x for x in items])
424 self._fuzzyValues.append([x for x in quantities])
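The traceback above shows the parser doing `line.split(":")` and reading `parts[0]` (items) and `parts[2]` (fuzzy values), so each input row apparently needs two colons, i.e. three colon-separated fields. The sketch below mimics that parsing on a sample line; the meaning of the middle field is an assumption (not confirmed by the traceback), so check PAMI's docs or source before relying on it:

```python
# Mimic FCPGrowth._creatingItemSets on one sample row:
# "items : <middle field, meaning assumed> : fuzzy values"
line = "a b c:12 10 14:0.3 0.2 0.1"
parts = line.split(":")
items = parts[0].split()
quantities = parts[2].split()
print(items, quantities)  # -> ['a', 'b', 'c'] ['0.3', '0.2', '0.1']
```

A row with only one colon would make `parts[2]` raise IndexError, which matches the reported behavior.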
if __name__ == "__main__":
    _ap = str()
    if len(_ab._sys.argv) == 7 or len(_ab._sys.argv) == 6:
        if len(_ab._sys.argv) == 7:
            _ap = CMine(_ab._sys.argv[1], _ab._sys.argv[3], _ab._sys.argv[4], _ab._sys.argv[5], _ab._sys.argv[6])
        if len(_ab._sys.argv) == 6:
            _ap = CMine(_ab._sys.argv[1], _ab._sys.argv[3], _ab._sys.argv[4], _ab._sys.argv[5])
        _ap.startMine()
        print("Total number of coverage Patterns:", len(_ap.getPatterns()))
        _ap.save(_ab._sys.argv[2])
        print("Total Memory in USS:", _ap.getMemoryUSS())
        print("Total Memory in RSS:", _ap.getMemoryRSS())
        print("Total ExecutionTime in ms:", _ap.getRuntime())
    else:
        print("Error! The number of input parameters does not match the total number of parameters provided")
When trying to convert a sparse dataframe into a transactional database through the code provided at the link, the following error appears: "AttributeError: module 'PAMI.extras.DF2DB.sparseDF2DB' has no attribute 'sparse2DB'."
First, I simply changed the word sparse2DB to sparseDF2DB, but then a different error appeared: "ValueError: DataFrame constructor not properly called!"
My dataframe was already imported into the Jupyter notebook when I passed it to the function. I also tried saving it, exporting it as an Excel file, and importing it directly into the function, but nothing worked and the error persisted.
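Until the DF2DB issue is resolved, one library-independent workaround is to build the transactional lines yourself from (tid, item, value) triples, which is the shape a sparse dataframe usually flattens to (e.g. via `df.itertuples()`). The column layout and threshold semantics here are assumptions about the input, not PAMI's converter:

```python
from collections import defaultdict

def sparse_to_transactions(triples, threshold=0):
    """triples: iterable of (tid, item, value); keep items with value > threshold."""
    byTid = defaultdict(list)
    for tid, item, value in triples:
        if value > threshold:
            byTid[tid].append(str(item))
    # one tab-separated line per transaction id, in tid order
    return ['\t'.join(items) for _, items in sorted(byTid.items())]

triples = [(1, 'a', 2), (1, 'b', 0), (2, 'b', 5), (2, 'c', 1)]
print(sparse_to_transactions(triples))  # -> ['a', 'b\tc']
```

The resulting lines can be written to a file and fed to any PAMI miner as a transactional database.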