Giter Site home page Giter Site logo

Maps about wren HOT 10 CLOSED

wren-lang avatar wren-lang commented on July 28, 2024
Maps

from wren.

Comments (10)

kmarekspartz avatar kmarekspartz commented on July 28, 2024

If keys are restricted to Strings, that greatly simplifies the implementation. The map need not be a hash map, at least not initially.

from wren.

munificent avatar munificent commented on July 28, 2024

Awesome tracking bug!

Syntax

Yes, that's exactly the syntax I had in mind. I don't believe it's ambiguous except for the corner case of an empty map in statement position. In that case, it actually doesn't even matter which one the compiler chooses since both an empty map and an empty block have no effect there.

One thing we'll have to keep in mind is that if we ever allow statements in single-expression blocks (in other words, make things like if an expression) then we'll have to make sure a {} in there does what the user wants and returns an empty map.

The other thing we'll have to be careful about is using identifiers as map keys: {foo: 123}. In JS, that's equivalent to {"foo": 123} but (assuming we want to allow non-string keys) I think in Wren that should be "look up the variable foo and use its value as the key".

Here's an idle thought that just popped into my head. We could also allow a tiny bit of syntax sugar where if you leave off : <value>, it's implicitly : null. That would us a cheap way of approximating sets: {a, b, c} is a map with three keys whose values are all null.

Hash function

I'm not picky or very well-versed in hash functions. In the past I've used FNV2, but I don't have a strong preference as long as it's simple and fast. I think we will want to do something similar to Lua where we don't walk the whole string. We don't want hashing to be O(n)!

One thing I do want to avoid is an identity hash value. Java et. al. rely on having a unique number with each object that they can use as its hash value if no other hash method is provided. But that ends up either wasting space (by storing it in every object) or ruling out copying GC (by using the object's address).

I don't want to do either of those, so we'll have to figure out what the default hash code is for all of the built-in types, and for instances of classes that don't override hash.

Another option is to only provide hash codes for built-in types (numbers, strings, lists, ranges, bools, and null) but not instances of classes. If you want your class to be usable as a map key, you have to provide your own hash implementation. This keeps things simple, but it has an unfortunate consequence: if you're using a class you didn't create and it doesn't implement hash, you're basically stuck.

Dart worked this way for a long time, and it was a huge pain in the ass and eventually the core library was changed to allow all types to be hashable. I think we'll end up wanting the same thing for Wren.

Hash randomization

In general, Wren—like Lua—isn't designed to run potentially-malicious code. We may go in that direction over time, but it adds a lot of complexity. Hash randomization falls under that. However, I don't want to rule it out later. We don't want to paint ourselves into a corner here.

I think we can punt on this for now without too many problems, though. We just have to be careful about how things like the iteration order of items in a map are specified.

Key types

I think we should support keys of arbitrary types. It does add complexity (see above), but it's incredibly useful. Brendan Eich and other JS folks regard this as a historical mistake in JavaScript. They're trying to remedy it by adding symbols.

from wren.

edsrzf avatar edsrzf commented on July 28, 2024

Java et. al. rely on having a unique number with each object that they can use as its hash value if no other hash method is provided. But that ends up either wasting space (by storing it in every object) or ruling out copying GC (by using the object's address).

Doesn't interoperating well with C already pretty much rule out a copying GC? This LuaJIT GC page says that a Lua GC must be non-copying "due to various constraints in the Lua/C API". I'm not sure what those constraints are; maybe it's possible to avoid them.

Assuming a copying GC is possible, that gives us some awkward choices with regards to lists and other mutable types. We can either:

  1. Say that mutable types have no hash method. (This is what Python does, and why it also has immutable tuples that can be used as dictionary keys.)
  2. Tell programmers "be careful" and give mutable types a hash method anyway. (Ruby seems to do this.)

2 seems better in the name of minimalism, but it makes me uncomfortable. I agree that maps should support arbitrary key types in general, though.

List equality is also currently based on identity, so if lists get a hash method that's not based on identity, equality should probably change.

from wren.

munificent avatar munificent commented on July 28, 2024

Doesn't interoperating well with C already pretty much rule out a copying GC?

Nope. All it means is that the C API can't return raw pointers to objects. If it returns some kind of abstract handle or, like Lua, generally doesn't return persistent references to objects at all, you're fine.

Tell programmers "be careful" and give mutable types a hash method anyway.

Yeah, I think we'll have to do that. It's a bit icky, but I think it works fine in practice.

List equality is also currently based on identity, so if lists get a hash method that's not based on identity, equality should probably change.

Yeah, we'll definitely have to keep that in mind.

from wren.

munificent avatar munificent commented on July 28, 2024

OK, I've done a bunch of prototyping and thinking and I think I have enough of an idea to start implementing this. The two hard questions are:

Problems

  1. What kind of keys are allowed?
  2. How are they hashed?

I would like to have a fully general answer here and say you can use any object as a key and it's up to the object's class to determine how it's hashed. That's what most of the "big" languages like Java and C# do.

However, this comes with real costs:

  1. If a class does not define a custom hash function, the language has to provide a reasonable default. I'm not aware of any way to do that that doesn't either spend extra memory in every object in the entire heap or prevent objects from being moved by the garbage collector.

    Right now, Wren does actually have some spare bits laying around we could use for this, but I don't want to count on that and rule out later memory optimizations.

  2. It means the hash table implementation has to call into Wren code. Wren (unlike some other language implementations) doesn't support native methods implemented in C calling methods implemented in Wren.

    That means we'd have to implement pretty much the whole map type in Wren. I did that this morning and it was a fun little exercise, but I'm not crazy about it. A pure C hash table is also needed (for symbol tables, globals, and eventually the module registry) and I don't want to have two implementations, one in each language. Also, a map written in Wren won't have anywhere near the performance of a pure C one.

  3. There's also the general high level problem of what happens when a user uses a mutable type as a hash key. The typical solution is "don't do that", but it's still a concern.

Proposal

Given that, here's the limited solution I'm thinking. By default, maps can only use immutable primitive objects as keys. That means null, bools, nums, strings, and ranges. Also, I think class objects. (We'd hash them by their name.) Everything else is a runtime error.

All of those types have obvious hash code implementations that don't require extra memory and can be implemented in C. They're also conveniently immutable, so this solves all of the above three problems in one fell swoop.

But what if I want to use other objects as keys?

The above limitation keeps the language and core library minimal which is always goal number one, but being general is also important. I have two ideas in mind:

  1. You can always implement your own map type in pure Wren. Since the [] and []= operators can be overloaded, this will be just as pleasant to use as a built-in one, except for the lack of literal support.

    We can do something similar to Sequence/List where we put a bunch of the map methods in an abstract base class that a user-defined hash table can inherit from to re-use.

  2. We might be able to add an optional "toKey" function to Map. I'll have to try this out to see if it works in practice, but I'm thinking you can optionally pass a function when you create a map:

    var map = new Map {|key| ... }

    Whenever an object is used as a key in the map, the function is called with that object. The function is responsible for returning a valid "primitive" key for the object: something that can be used directly as a map key.

    This optional layer would be implemented in Wren, since it's calling other Wren code, but it may be possible to layer things such that normal map usage goes straight to the C code.

Prior art

JavaScript only allows strings for object keys. ES6 is adding a separate Map type but still uses built-in equality semantics for the keys.

Lua does allow objects to override == (the __eq metamethod), but still uses its own built-in hashing. This means you can have two objects where a == b is true, but if you use them both as hash keys, they will act like distinct keys (!).

Guile has something similar to my last suggestion where you can optionally provide a hash function when creating a hash table. That function has to return a numeric hash code. Otherwise, it uses some built-in hashing behavior

I think I'm happy enough with this to start hacking on it, but I would definitely appreciate feedback.

from wren.

munificent avatar munificent commented on July 28, 2024

I made some progress on this. It'll take a few patches before it's all in good shape, so I'm tracking it on this branch: https://github.com/munificent/wren/tree/maps

from wren.

MarcoLizza avatar MarcoLizza commented on July 28, 2024

Awesome! Didn't find the time since yesterday to comment the proposal... and you already implemented it! 👍

I'm gonna try and use it at once!

from wren.

munificent avatar munificent commented on July 28, 2024

Give me another day and I should have it all merged into master. :)

from wren.

munificent avatar munificent commented on July 28, 2024

And by another day, I mean "35 minutes". This is in master now! I haven't done the set syntax sugar I mentioned, but I pulled that out to a separate bug (#137). This one is done!

We probably will need to tweak this once we have larger real world problems (right now it just does linear probing), but it's a good first pass and has the semantics I want.

from wren.

MarcoLizza avatar MarcoLizza commented on July 28, 2024

Good greif! Can't go and rest some time after putting my daughter to sleep that I found you already merged the branch! 💨

Gonna give the Map some test and let you know.

from wren.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.