
Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of latent factors - rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.

Home Page: https://lanl.github.io/T-ELF/

License: Other

Python 100.00%
blind-source-separation dimensionality-reduction feature-extraction gpu high-performance-computing hpc latent-variables machine-learning matrix matrix-factorization

t-elf's Introduction

Tensor Extraction of Latent Features (T-ELF)


T-ELF is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL). T-ELF presents an array of customizable software solutions crafted for analysis of datasets. Acting as a comprehensive toolbox, T-ELF specializes in data pre-processing, extraction of latent features, and structuring results to facilitate informed decision-making. Leveraging high-performance computing and cutting-edge GPU architectures, our toolbox is optimized for analyzing large datasets from a diverse set of problems.

At the core of T-ELF are non-negative matrix and tensor factorization solutions for discovering multi-faceted hidden details in data. These solutions feature automatic model determination, which estimates the number of latent factors (the rank). This functionality supports accurate data modeling and the extraction of hidden patterns. The software suite also includes modules for pre-processing and post-processing of data, tailored to tasks such as text mining and Natural Language Processing, along with tools for matrix and tensor analysis and construction.
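To make the idea of non-negative matrix factorization concrete, here is a minimal NumPy sketch using the classic multiplicative-update rules for the Frobenius-norm objective. It is an illustration only and not T-ELF's implementation: the function names `nmf_mu` and `relative_error`, the iteration count, and the synthetic data are all assumptions made for this example. The loop over `k` simply prints the reconstruction error at several ranks; it does not reproduce T-ELF's automatic model determination.

```python
import numpy as np

def nmf_mu(X, k, n_iters=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix X (m x n) as W (m x k) @ H (k x n)
    with multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Random positive initialization (hypothetical choice for this sketch).
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iters):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def relative_error(X, W, H):
    """Relative Frobenius reconstruction error ||X - WH|| / ||X||."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Synthetic non-negative data with a known rank of 4.
rng = np.random.default_rng(42)
true_k = 4
X = rng.random((100, true_k)) @ rng.random((true_k, 200))

errors = {}
for k in range(1, 7):
    W, H = nmf_mu(X, k)
    errors[k] = relative_error(X, W, H)
    print(f"k={k}: relative error = {errors[k]:.4f}")
```

Reconstruction error alone keeps shrinking as `k` grows, which is why a separate criterion is needed to pick the rank.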

T-ELF's adaptability spans a multitude of disciplines, positioning it as a robust AI and data analytics solution. Its proven efficacy extends across fields such as Large-scale Text Mining, High Performance Computing, Computer Security, Applied Mathematics, Dynamic Networks and Ranking, Biology, Material Science, Medicine, Chemistry, Data Compression, Climate Studies, Relational Databases, Data Privacy, Economy, and Agriculture.

Installation

Step 1: Install the Library

Option 1: Install via PIP

conda create --name TELF python=3.11.5
source activate TELF # or <conda activate TELF>
pip install git+https://github.com/lanl/T-ELF.git

Option 2: Install from Source

git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda create --name TELF python=3.11.5
source activate TELF # or <conda activate TELF>
pip install -e . # or <python setup.py install>

Option 3: Install via Conda

git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda env create --file environment_gpu.yml # use <conda env create --file environment_cpu.yml> for CPU only
conda activate TELF_conda
conda develop .

Step 2: Install Spacy NLP model and NLTK Packages

python -m spacy download en_core_web_lg
python -m spacy download en_core_web_trf
python -m nltk.downloader wordnet omw-1.4

Step 3: Install Cupy if using GPU (Optional - skip if you used Option 3 in Step 1)

conda install -c conda-forge cupy

Step 4: Install MPI if using HPC (Optional)

module load <openmpi> # on an HPC node
pip install mpi4py # or <conda install -c conda-forge mpi4py> depending on the system
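After working through the steps above, a quick way to see which packages are importable in the active environment is a small standard-library check like the one below. This is a sketch, not part of T-ELF: the helper `check_modules` and the `OPTIONAL_MODULES` mapping are names made up for this example, and the check only confirms that a module can be found, not that its models or data (such as the spaCy models) are downloaded.

```python
import importlib.util

# Import names of the packages mentioned in the installation steps.
OPTIONAL_MODULES = {
    "spacy": "Step 2 (NLP models)",
    "nltk": "Step 2 (WordNet data)",
    "cupy": "Step 3 (GPU, optional)",
    "mpi4py": "Step 4 (HPC, optional)",
}

def check_modules(names):
    """Return {name: bool} saying whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_modules(["TELF", *OPTIONAL_MODULES])
for name, found in status.items():
    step = OPTIONAL_MODULES.get(name, "Step 1 (library)")
    print(f"{name:8s} {'found' if found else 'missing':8s} {step}")
```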

Jupyter Setup Tutorial for using the examples (Link)

Other Considerations

On some Linux devices, based on how CUDA was configured, you may get an error when using a GPU. Install cudatoolkit to resolve the error:

conda install cudatoolkit
conda install cudnn
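To confirm whether the GPU is actually usable after installing these packages, one option is to try importing CuPy and asking the CUDA runtime for a device count, falling back to NumPy otherwise. The sketch below is an assumption-laden illustration rather than T-ELF code; `get_array_module` is a hypothetical helper name.

```python
import numpy as np

def get_array_module(use_gpu=True):
    """Return cupy if it imports and sees at least one CUDA device, else numpy."""
    if use_gpu:
        try:
            import cupy as cp
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp
        except Exception:
            # Missing CuPy, missing driver, or a misconfigured CUDA install.
            pass
    return np

xp = get_array_module()
a = xp.arange(6, dtype=xp.float32).reshape(2, 3)
print("array backend:", xp.__name__, "| sum:", float(a.sum()))
```

If this prints `numpy` on a machine with a GPU, the CUDA configuration is the likely place to look.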

Capabilities

Please see our 📃 Publications for the capabilities

Modules

TELF.factorization

Method Dense Sparse GPU CPU Multiprocessing HPC Description Example Release Status
NMFk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ NMF with Automatic Model Determination Link
Custom NMFk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Use Custom NMF Functions with NMFk Link
TriNMFk ✔️ ✔️ ✔️ ✔️ ✔️ NMF with Automatic Model Determination for Clusters and Patterns Link
RESCALk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ RESCAL with Automatic Model Determination Link
RNMFk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Recommender NMFk Link
SymNMFk ✔️ ✔️ ✔️ ✔️ ✔️ NMFk with Symmetric Clustering Link
WNMFk ✔️ ✔️ ✔️ ✔️ ✔️ NMFk with weighting - used for recommendation system Link
BNMFk Boolean NMFk 🔜
HNMFk Hierarchical NMFk 🔜
SPLIT NMFk Joint NMFk factorization of multiple data via SPLIT 🔜
SPLIT Transfer Classifier Supervised transfer learning method via SPLIT and NMFk 🔜
CP-ALS Alternating least squares algorithm for canonical polyadic decomposition 🔜
CP-APR Alternating Poisson regression algorithm for canonical polyadic decomposition 🔜
NTDS_FAPG Non-negative Tucker Tensor Decomposition 🔜

TELF.pre_processing

Method Multiprocessing HPC Description Example Release Status
Vulture ✔️ ✔️ Advanced text processing tool for cleaning and NLP Link
Beaver ✔️ ✔️ Fast matrix and tensor building tool for text mining Link
iPenguin Online Semantic Scholar information retrieval tool 🔜
Orca Duplicate author detector for text mining and information retrieval 🔜

TELF.post_processing

Method Description Example Release Status
Peacock Data visualization and generation of actionable statistics 🔜
Wolf Graph centrality and ranking tool 🔜
Fox Report generation tool for text data 🔜
SeaLion Generic report generation tool 🔜

TELF.applications

Method Description Example Release Status
Cheetah Fast search by keywords and phrases Link
Bunny Dataset generation tool for documents and their citations/references 🔜
Termite Knowledge graph building tool 🔜

How to Cite T-ELF?

If you use T-ELF, please cite it.

APA:

Eren, M., Solovyev, N., Barron, R., Bhattarai, M., Truong, D., Boureima, I., Skau, E., Rasmussen, K., & Alexandrov, B. (2023). Tensor Extraction of Latent Features (T-ELF) (Version 0.0.12) [Computer software]. https://doi.org/10.5281/zenodo.10257897

BibTeX:

@software{TELF,
  author = {Eren, Maksim and Solovyev, Nick and Barron, Ryan and Bhattarai, Manish and Truong, Duc and Boureima, Ismael and Skau, Erik and Rasmussen, Kim and Alexandrov, Boian},
  month = oct,
  title = {{Tensor Extraction of Latent Features (T-ELF)}},
  url = {https://github.com/lanl/T-ELF},
  doi = {10.5281/zenodo.10257897},
  year = {2023}
}

Authors

  • Maksim Ekin Eren: Advanced Research in Cyber Systems, Los Alamos National Laboratory (Website)
  • Nicholas Solovyev: Theoretical Division, Los Alamos National Laboratory
  • Ryan Barron: Theoretical Division, Los Alamos National Laboratory
  • Manish Bhattarai: Theoretical Division, Los Alamos National Laboratory
  • Duc Truong: Theoretical Division, Los Alamos National Laboratory
  • Ismael Boureima: Theoretical Division, Los Alamos National Laboratory
  • Erik Skau: Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory
  • Kim Rasmussen: Theoretical Division, Los Alamos National Laboratory
  • Boian S. Alexandrov: Theoretical Division, Los Alamos National Laboratory

Patents

Boian ALEXANDROV, of Santa Fe, New Mexico, Maksim Ekin EREN, of Santa Fe, New Mexico, Manish BHATTARAI, of Albuquerque, New Mexico, Kim Orskov RASMUSSEN, of Santa Fe, New Mexico, and Charles K. NICHOLAS, of Columbia, Maryland, (“Assignor”) DATA IDENTIFICATION AND CLASSIFICATION METHOD, APPARATUS, AND SYSTEM. No. 63/472,188. Triad National Security, LLC. (June 9, 2023).

BS. Alexandrov, LB. Alexandrov, and VG. Stanev et al. 2020. Source identification by non-negative matrix factorization combined with semi-supervised clustering. US Patent S10,776,718 (2020).

Copyright Notice

© 2022. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

LANL C Number: C22048

License

This program is open source under the BSD-3 License. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Developer Test Suite

Developer test suites are located under the tests/ directory. Tests can be run from this folder using python -m pytest *.


t-elf's Issues

Error in plot_NMFk.plot_consensus_mat: ValueError: Argument must be an image or collection in this Axes matplotlib

An intermittent error occurs when plotting the consensus matrix in TELF.factorization.utilities.plot_NMFk.plot_consensus_mat at plt.imshow(C).

Error: ValueError: Argument must be an image or collection in this Axes matplotlib

I believe this might be caused by an issue in matplotlib. I added a temporary hot-fix that skips plotting when the error occurs. If we re-run the code, the missing plots are generated without error.

Replicate:

import os
os.environ["OMP_NUM_THREADS"] = "1" # export OMP_NUM_THREADS=1
os.environ["OPENBLAS_NUM_THREADS"] = "1" # export OPENBLAS_NUM_THREADS=1
os.environ["MKL_NUM_THREADS"] = "1" # export MKL_NUM_THREADS=1
os.environ["VECLIB_MAXIMUM_THREADS"] = "1" # export VECLIB_MAXIMUM_THREADS=1
os.environ["NUMEXPR_NUM_THREADS"] = "1" # export NUMEXPR_NUM_THREADS=1
from TELF.factorization import NMFk
import sys
sys.path.append("../../scripts/")  # make the generate_X helper importable
from generate_X import gen_data

X = gen_data(R=4, shape=[100, 200])["X"]

params = {
    "n_perturbs":12,
    "n_iters":100,
    "epsilon":0.015,
    "n_jobs":-1,
    "init":"nnsvd", 
    "use_gpu":False,
    "save_path":"../../results/", 
    "save_output":True,
    "collect_output":True,
    "predict_k_method":"sill",
    "verbose":True,
    "nmf_verbose":False,
    "transpose":False,
    "sill_thresh":0.8,
    "pruned":True,
    'nmf_method':'nmf_fro_mu', # nmf_fro_mu, nmf_recommender
    "calculate_error":True,
    "predict_k":True,
    "use_consensus_stopping":0,
    "calculate_pac":True,
    "consensus_mat":True,
    "perturb_type":"uniform",
    "perturb_multiprocessing":False,
    "perturb_verbose":False,
    "simple_plot":True
}
Ks = range(1,9,1)
name = "Example_NMFk"
note = "This is an example run of NMFk"

model = NMFk(**params)
results = model.fit(X, Ks, name, note)
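The hot-fix described in this issue (skip the plot when the error appears, since a re-run succeeds) can be sketched as a small guard around any plotting call. This is not the code that was committed to T-ELF: `try_plot` and `flaky_plot` are hypothetical names, and `flaky_plot` only simulates a plotting function that fails once with the reported message.

```python
import logging

logger = logging.getLogger("plot_guard")

def try_plot(plot_fn, *args, retries=1, **kwargs):
    """Call plot_fn; on ValueError, retry up to `retries` times, then skip.

    Returns True if a call succeeded and False if plotting was skipped.
    """
    for attempt in range(retries + 1):
        try:
            plot_fn(*args, **kwargs)
            return True
        except ValueError as err:
            logger.warning("plot attempt %d failed: %s", attempt + 1, err)
    return False

# Hypothetical stand-in for a plotting call that fails on its first invocation.
calls = {"n": 0}

def flaky_plot(data):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ValueError("Argument must be an image or collection in this Axes")

ok = try_plot(flaky_plot, [[0, 1], [1, 0]], retries=1)
print(ok, calls["n"])  # prints: True 2
```

Catching only `ValueError` keeps unrelated failures visible, and the return value lets the caller record which plots were skipped.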
