
Comments (4)

nncarlson commented on September 24, 2024

I'm certainly open to it. The use case you describe sounds different from what I intended for the MD5/SHA1 cryptographic hashes in Petaca, but that's not to say it wouldn't be useful. The interface was designed to sequentially feed in arbitrary variables/arrays of intrinsic types (the components of a derived type, for example) in order to obtain a fingerprint of the data for comparison, in the same way one uses md5sum to get the checksum of a file. This is something I seldom use, but I have found it useful for verifying intermediate quantities between two versions of a code in debugging situations. It never occurred to me that one would use MD5/SHA1 as a hash function for hash tables. I agree that it is way too expensive for that, and I think a different interface would be desirable as well. Regarding the unsigned integer issue, the MD5/SHA1 implementations rely on the same wrap-around behavior of 2's complement representation, which requires avoiding certain run-time checks.
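To illustrate the sequential-feed idea described above, here is a minimal sketch in Python using the standard `hashlib` module. It is not Petaca's Fortran interface; the `fingerprint` helper and the three made-up "components" are assumptions for illustration only. It shows that feeding pieces one at a time gives the same digest as hashing their concatenation, and that a small change in one component changes the fingerprint.

```python
import hashlib
import struct


def fingerprint(*chunks: bytes) -> str:
    """Feed byte chunks into MD5 one after another and return the hex digest."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)  # sequential feed, like piping successive data into md5sum
    return h.hexdigest()


# Hypothetical components of a "derived type", packed as raw little-endian bytes.
ids = struct.pack("<3i", 1, 2, 3)        # three 32-bit integers
coords = struct.pack("<2d", 0.5, 1.5)    # two 64-bit reals
label = b"cell"

fp_a = fingerprint(ids, coords, label)   # fed piece by piece
fp_b = fingerprint(ids + coords + label) # fed as one concatenated buffer
fp_c = fingerprint(ids, struct.pack("<2d", 0.5, 1.5000001), label)  # perturbed data

print(fp_a == fp_b)  # same data, same fingerprint
print(fp_a == fp_c)  # perturbed data, different fingerprint
```

Comparing two such digests from two versions of a code is the debugging use described above: if the fingerprints of an intermediate quantity differ, the data differ.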

I use a hash table in one situation (out of necessity; if I understood them better I expect I'd use them more often). I've attached the hash function for it -- a Fibonacci hash I pulled out of Knuth's book. The use case is specialized: it handles short integer arrays (length 2 to 4, the node indices of a face of a mesh cell), with the hash invariant with respect to permutation of the array elements.

facet_hash_type.txt
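The attachment is not reproduced here, so the following is only a rough sketch of what a permutation-invariant Fibonacci (multiplicative) hash could look like, written in Python rather than Fortran. The name `facet_hash`, the `bits` parameter, and the choice of summing the node indices to get an order-independent key are all assumptions, not necessarily what the attached code does. The constant is the 32-bit integer nearest to 2^32 divided by the golden ratio, as used in Knuth's multiplicative hashing.

```python
GOLDEN_32 = 2654435769  # 0x9E3779B9, approximately 2**32 / golden ratio
MASK_32 = 0xFFFFFFFF


def facet_hash(nodes, bits=10):
    """Hash a short list of node indices into the range [0, 2**bits).

    Assumption: the key is the sum of the indices, which makes the result
    independent of the order of the elements.
    """
    key = sum(nodes) & MASK_32
    # Multiply and keep the low 32 bits, mimicking unsigned wrap-around.
    product = (key * GOLDEN_32) & MASK_32
    # The high-order bits of the product are the best mixed, so use those.
    return product >> (32 - bits)


h1 = facet_hash([3, 7, 5])
h2 = facet_hash([7, 5, 3])  # same face, nodes listed in a different order
print(h1 == h2)
```

Note the masking with `MASK_32`: Python integers do not overflow, so the wrap-around that a fixed-width implementation gets for free has to be made explicit. A sum-based key also maps distinct faces such as `[1, 4]` and `[2, 3]` to the same slot, so a table built on it still has to compare the full (sorted) node lists on lookup.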

from petaca.

zbeekman commented on September 24, 2024

Ah, my apologies for misunderstanding the use case; I skimmed the petaca code a few weeks ago and was under the impression map_any_type used a hash table rather than a linked list for the associative array. I was working on a similar associative array implementation at the time, but have since paused my efforts, at least until I can look at your implementation in greater detail; no point reinventing the wheel if I don't have to...

The ability to checksum data is extraordinarily useful, and if you can take advantage of cache locality to perform this when reading or writing the data, even better! You certainly want to use MD5 or better for this use case. In the past I have used HDF5 to take advantage of this functionality as well as some of its built-in compression and MPI-IO capabilities.

As an aside, it might be worthwhile considering using a hash table over a linked list for the map_any_type associative array, especially if the number of input parameters passed around with petaca is expected to be quite large, and if they are going to be retrieved randomly, rather than in the order they get put into the list.
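As a rough illustration of the trade-off, here is a minimal chained hash map sketched in Python. It is not how map_any_type is implemented; the class name `ChainedMap`, its methods, and the parameter names in the usage lines are hypothetical. Each key is hashed to one of a fixed number of buckets, so a lookup scans only that bucket instead of walking the whole list.

```python
class ChainedMap:
    """Tiny associative array using separate chaining (illustrative only)."""

    def __init__(self, nbuckets=64):
        self._buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # Python's built-in hash() picks the bucket for this key.
        return self._buckets[hash(key) % len(self._buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):  # scan one short bucket, not every entry
            if k == key:
                return v
        raise KeyError(key)


params = ChainedMap()
params.set("tolerance", 1.0e-6)
params.set("max-iterations", 100)
params.set("tolerance", 1.0e-8)  # replaces the earlier value
print(params.get("tolerance"))
```

With an unsorted linked list every random lookup costs time proportional to the number of entries; with enough buckets the chains stay short and lookups take roughly constant time on average. For a handful of parameters accessed once, the difference is negligible.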

I'll have to see if I can find more details about Knuth's Fibonacci hash. His book is on my wish list, but it's pretty expensive, so I haven't purchased it yet. Maybe I'll buy it once I'm at a new job.

Anyway, thanks for the useful software and the instructive implementations of various algorithms in modern Fortran!

(Feel free to close this issue, since it seems that you use a singly linked list (SLL) rather than a hash table for the associative array, and I was confused about the purpose of your hash functions.)

nncarlson commented on September 24, 2024

Yeah, for map_any_type I've done just about the dumbest thing possible; I don't even think the linked list is sorted. I've rationalized it by assuming the size will be small, accessed once, etc. But I think it would be great to have the internals redone to use a scalable algorithm, like the containers in Python or the STL -- it's just not one of my areas of expertise. I have other containers that I intend to move into Petaca that face the same issues. Now if someone could contribute in this area ...

zbeekman commented on September 24, 2024

@nncarlson should we close this issue, since I misunderstood your original usage of the hash functions? For fingerprinting data, I think MD5 would be safe enough, and maybe a little bit faster, but SHA is a good choice here and gives stronger protection against collisions... I have not started using Petaca in my own work, so I have not had the time or need to help out with this, but I haven't ruled it out for the future 😄
