exascaleinfolab / resmerge

Resolution levels clustering merger with filtering and cluster deduplication. Flattens a hierarchy/list of multiple resolution levels (clusterings) into a single flat clustering (collection), synchronizing the node base and deduplicating the clusters.

License: Apache License 2.0


resmerge - Resolution Level Clusterings Merger with Filtering

Merges multiple clustering resolutions / hierarchy levels into a single flat collection, i.e. flattens the input hierarchy / resolutions specified by files and/or directories. It can also be used to clean up a single clustering (collection of clusters) by deduplicating it and optionally filtering out clusters and nodes by the specified criteria.
Only the unique clusters, regardless of the order of their nodes, are saved to the output file, with optional filtering by cluster size and node base synchronization. The order of nodes in the input clusters is retained. Execution takes linear time, O(N).
resmerge is one of the utilities designed for the PyCaBeM clustering benchmark.
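
The merging behaviour described above can be illustrated with a minimal Python sketch. It is not the resmerge implementation (which is written in C++); the function name `merge_unique`, its parameters, and the rule that a size bound of 0 means "no bound" are assumptions made for illustration only. Clusters are compared through an order-independent key, while the node order of the first occurrence is kept.

```python
from typing import Iterable, List, Sequence


def merge_unique(levels: Iterable[Iterable[Sequence[int]]],
                 min_size: int = 0, max_size: int = 0) -> List[List[int]]:
    """Flatten several clusterings into one list of unique clusters.

    Assumption: a size bound of 0 means "no bound on that side".
    """
    seen = set()   # order-independent keys of the clusters already kept
    merged = []
    for level in levels:
        for cluster in level:
            size = len(cluster)
            if size < min_size or (max_size and size > max_size):
                continue  # outside the requested size range
            key = frozenset(cluster)  # same members in any order -> same key
            if key in seen:
                continue  # duplicate of a cluster kept earlier
            seen.add(key)
            merged.append(list(cluster))  # keep the original node order
    return merged


levels = [
    [[1, 2, 3], [4, 5]],             # a coarse level
    [[3, 2, 1], [4], [5], [4, 5]],   # a finer level repeating two clusters
]
print(merge_unique(levels))              # [[1, 2, 3], [4, 5], [4], [5]]
print(merge_unique(levels, min_size=2))  # [[1, 2, 3], [4, 5]]
```

In this sketch every node membership is hashed once, so the expected running time grows linearly with the total number of memberships in the input.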

Author: (c) Artem Lutov

Deployment

Requirements

There are no requirements for execution or compilation except the standard C++ library.
However, to extend the input options and automatically regenerate the input parsing, the gengetopt application should be installed: $ sudo apt-get install gengetopt.

For the prebuilt executable on Linux Ubuntu 16.04 x64: $ sudo apt-get install libstdc++6.

Compilation

Just execute $ make.
To update/extend the input parameters, modify args.ggo and run GenerateArgparser.sh (it calls gengetopt).

Build errors might occur if the default g++/gcc is older than 5.x.
In that case, install g++-5 and, if necessary, edit the Makefile to replace g++ and gcc with g++-5 and gcc-5.

Usage

Execution Options:

$ ./resmerge -h
resmerge 1.2

Merge multiple clusterings (resolution/hierarchy levels) outputting only the
unique clusters with the optional their filtering by the size and nodes
filtering by the specified base.

Usage: resmerge [OPTIONS] clusterings...

  clusterings...  - clusterings specified by the given files and directories
(non-recursive traversing)

  -h, --help              Print help and exit
  -V, --version           Print version and exit
  -o, --output=STRING     output file name. If a single directory <dirname> is
                            specified then the default output file name is
                            <dirname>.cnl.
                            NOTE: the number of nodes is written to the output
                            file only if the node base synchronization is
                            applied, otherwise 0 is set
                            (default=`clusters.cnl')
  -r, --rewrite           rewrite already existing resulting file or skip the
                            processing  (default=off)
  -b, --btm-size=LONG     bottom margin of the cluster size to process
                            (default=`0')
  -t, --top-size=LONG     top margin of the cluster size to process
                            (default=`0')
  -m, --membership=FLOAT  average expected membership of the nodes in the
                            clusters, > 0, typically >= 1  (default=`1')

 Mode: sync
  Synchronize the node base of the merged clustering
  -s, --sync-base=STRING  synchronize node base with the specified collection

 Mode: exrtact
  Extract the node base from the specified clustering(s)
  -e, --extract-base      extract the node base from the clusterings instead of
                            merging the clusterings  (default=off)
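
The input and output naming rules from the help text (files and directories, non-recursive traversing, `<dirname>.cnl` as the default output for a single directory, `clusters.cnl` otherwise) can be sketched in Python as follows. The helper names `collect_inputs` and `default_output`, the sorted file order, and the handling of a trailing path separator are assumptions of this sketch, not documented behaviour of resmerge.

```python
import os
import tempfile
from typing import List, Sequence


def collect_inputs(paths: Sequence[str]) -> List[str]:
    """Expand directories into their files without descending further."""
    files = []
    for path in paths:
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):  # sorted: sketch's choice
                full = os.path.join(path, name)
                if os.path.isfile(full):  # skip subdirectories: no recursion
                    files.append(full)
        else:
            files.append(path)
    return files


def default_output(paths: Sequence[str]) -> str:
    """Pick the output name used when -o is not given."""
    if len(paths) == 1 and os.path.isdir(paths[0]):
        # normpath drops a trailing separator: "levels/" -> "levels"
        return os.path.normpath(paths[0]) + ".cnl"
    return "clusters.cnl"


with tempfile.TemporaryDirectory() as root:
    levels_dir = os.path.join(root, "levels")
    os.makedirs(os.path.join(levels_dir, "nested"))
    for name in ("level1.cnl", "level2.cnl"):
        open(os.path.join(levels_dir, name), "w").close()
    open(os.path.join(levels_dir, "nested", "ignored.cnl"), "w").close()
    found = [os.path.basename(p) for p in collect_inputs([levels_dir])]
    out_name = os.path.basename(default_output([levels_dir + os.sep]))
    multi_name = default_output([levels_dir, "extra.cnl"])

print(found)       # ['level1.cnl', 'level2.cnl']
print(out_name)    # levels.cnl
print(multi_name)  # clusters.cnl
```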

Examples

Merge clusterings (resolution levels) from <dirname> to <dirname>.cnl:

$ ./resmerge /opt/tests/tmp/resolutions

Deduplicate a single clustering:

$ ./resmerge communs/com-dblp.all.cmty.txt -o communs/com-dblp.all.cmty.dedub.cnl

Extract node base to <filename>_base.cnl:

$ ./resmerge -e /opt/tests/collection.cnl
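
Conceptually, the node base is the set of all nodes that occur in the given clusterings. A minimal Python sketch of that idea follows; the function name `extract_base` is hypothetical, and the sketch works on in-memory lists rather than on .cnl files, whose exact layout is not described here.

```python
from typing import Iterable, Sequence, Set


def extract_base(clusterings: Iterable[Iterable[Sequence[int]]]) -> Set[int]:
    """Collect every node id that appears in any cluster of any clustering."""
    base = set()
    for clusters in clusterings:
        for cluster in clusters:
            base.update(cluster)  # add all members of this cluster
    return base


levels = [
    [[1, 2, 3], [4, 5]],
    [[3, 2, 1], [6]],
]
print(sorted(extract_base(levels)))  # [1, 2, 3, 4, 5, 6]
```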

Merge clusterings, synchronize the node base and output resulting flattened hierarchy/levels to the specified file:

$ ./resmerge -s /opt/tests/levels_nodebase.cnl -o /opt/tests/flatlevs_synced.cnl /opt/tests/levels/ /opt/tests/level_extra.cnl
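
Node base synchronization filters the nodes of the merged clusters by the specified base. The Python sketch below illustrates one way this could work; the function name `sync_to_base` is hypothetical, and dropping clusters that become empty as well as repeating the deduplication after filtering are assumptions of the sketch rather than documented behaviour.

```python
from typing import Iterable, List, Sequence, Set


def sync_to_base(clusters: Iterable[Sequence[int]],
                 base: Set[int]) -> List[List[int]]:
    """Keep only the nodes present in the base, then drop empty and
    duplicate clusters (both steps are assumptions of this sketch)."""
    seen = set()
    synced = []
    for cluster in clusters:
        kept = [node for node in cluster if node in base]  # order retained
        if not kept:
            continue  # no node of this cluster belongs to the base
        key = frozenset(kept)
        if key in seen:
            continue  # filtering made it equal to an earlier cluster
        seen.add(key)
        synced.append(kept)
    return synced


clusters = [[1, 2, 3], [3, 7, 2], [8, 9], [2, 3, 7], [4, 1]]
base = {1, 2, 3, 4}
print(sync_to_base(clusters, base))  # [[1, 2, 3], [3, 2], [4, 1]]
```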

Related Projects

  • Clubmark - A parallel isolation framework for benchmarking and profiling clustering (community detection) algorithms considering overlaps (covers).
  • xmeasures - Extrinsic quality (accuracy) measures evaluation for the overlapping clustering on large datasets: family of mean F1-Score (including clusters labeling), Omega Index (fuzzy version of the Adjusted Rand Index) and standard NMI (for non-overlapping clusters).
  • GenConvNMI - Overlapping NMI evaluation that is compatible with the original NMI and suitable for both overlapping and multi resolution (hierarchical) clustering evaluation.
  • ExecTime - A lightweight resource consumption profiler.

Note: Please star this project if you use it.
