juliacollections / datastructures.jl Goto Github PK
View Code? Open in Web Editor NEWJulia implementation of Data structures
Home Page: https://juliacollections.github.io/DataStructures.jl/latest/
License: MIT License
Julia implementation of Data structures
Home Page: https://juliacollections.github.io/DataStructures.jl/latest/
License: MIT License
I ported this C red black tree library, (see here for a complete description), and I'd like to know if you're interested in adding it to this package.
https://github.com/pygy/RedBlackTrees.jl
It implements push!()
, delete!()
, in()
, size()
, length()
, show()
and the iterator protocol.
A tree is created using RedBlackTree(some_type)
. You can iterate backwards using for i in RedBlackTrees.Backward(tree) ...
show(tree)
only displays the type and size of the tree. You can see the content of the tree using show(tree.root)
.
On my machine, insertion beats push!(array, item); sort!(array)
from ~3000 items on.
I'll add the possibility to pass custom cmp()
and copy()
functions (copy
for node insertion). Currently, it uses Base.cmp
and doesn't make any copy.
There are probably some more things left to tweak. For example, according to @time
iteration allocates ~ 400 bytes per item in the tree, even though it shouldn't.
It's also the first time I use parametric types, and I may have overused parametric function definitions.
At last, I use an alias RBLNode Union(RBNode{T}, Nothing)
as the type for nodes, with Nothing representing leafs. It may or may not help to use an abstract type and have both RBNode and Leaf depend on it, but it would require some tweaks since the nodes are parametrized, I'd have to keep a typed leaf around, or create one each time I need one, which may slow things down.
It doesn't appear to be possible to have Deques (or data structures that they support) of abstract types. Is this by design?
julia> s=deque(Number)
Deque []
julia> push!(s, 3)
ERROR: no method push!(Deque{Number}, Int64)
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass.
Tests fail, but package loads.
Tests pass.
means that PackageEvaluator found the tests for your package, executed them, and they all passed.
Tests fail, but package loads.
means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with using
worked.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
INFO: Cloning cache of DataStructures from git://github.com/JuliaLang/DataStructures.jl.git
INFO: Installing DataStructures v0.2.14
INFO: REQUIRE updated.
ERROR: wrong number of arguments
in include at boot.jl:238
at /home/idunning/pkgtest/.julia/v0.2/DataStructures/test/test_deque.jl:10
at /home/idunning/pkgtest/.julia/v0.2/DataStructures/runtests.jl:18
INFO: REQUIRE updated.
From the examples:
julia> a = DisjointSets{String}(["a", "b", "c", "d"])
DisjointSets{String}(["c"=>3,"b"=>2,"a"=>1,"d"=>4],IntDisjointSets([1,2,3,4],[0,0,0,0],4))
julia> union!(a, "a", "b")
ERROR: `union!` has no method matching union!(::DisjointSets{String}, ::ASCIIString, ::ASCIIString)
Any reason why the tree.jl types are not currently being exported?
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass.
Package doesn't load.
Tests pass.
means that PackageEvaluator found the tests for your package, executed them, and they all passed.
Package doesn't load.
means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with using
failed.
Special message from @IainNZ: This change may be due to JuliaLang/julia#8712.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("DataStructures")' log
INFO: Installing DataStructures v0.3.4
INFO: Package database updated
>>> 'using DataStructures' log
ERROR: K not defined
in include at ./boot.jl:242
in include_from_node1 at ./loading.jl:128
in include at ./boot.jl:242
in include_from_node1 at ./loading.jl:128
in reload_path at ./loading.jl:152
in _require at ./loading.jl:67
in require at ./loading.jl:52
in require_3B_3948 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
in include at ./boot.jl:242
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:293
in _start at ./client.jl:362
in _start_3B_3776 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/DataStructures/src/defaultdict.jl, in expression starting on line 17
while loading /home/idunning/pkgtest/.julia/v0.4/DataStructures/src/DataStructures.jl, in expression starting on line 46
while loading /home/idunning/pkgtest/.julia/v0.4/DataStructures/testusing.jl, in expression starting on line 2
Julia Version 0.4.0-dev+1249
Commit 23a9373 (2014-10-23 04:46 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
>>> test log
no tests to run
>>> end of log
The root of this problem likely has nothing to do with images, but:
img = Image(fill(Color.color("blue"), (3,3)))
counter([img, img])
yields
Accumulator{RGB{Float64},Int64}([RGB{Float64}(0.0,0.0,1.0)=>18])
It seems to iterate over the individual pixels, rather than over the images. Dicts do not have this issue, as I can build a counter with d = DefaultDict(0); for im=[img, img] d[im]+=1 end
and it works fine.
When you call find_root
on a DisjointSets
object, it returns the internal integer representative for the element. This breaks the rest of the API which assumes that the input is the original element. For example,
julia> using DataStructures
julia> S = DisjointSets{Symbol}([:a,:b,:c])
DisjointSets{Symbol}([:b=>2,:a=>1,:c=>3],IntDisjointSets([1,2,3],[0,0,0],3))
julia> r1 = find_root(S,:a)
1
julia> r2 = find_root(S,:b)
2
julia> union!(S,r1,r2)
ERROR: no method union!(DisjointSets{Symbol}, Int64, Int64)
Is there a way to iterate through a SortedDict without causing memory allocation? (I'm using a SortedDict in an inner loop, and most of the time is spent in the next
function used for iterating.)
Here is a small example:
using DataStructures
function bench()
dict = SortedDict(["New York" => 1788, "Illinois" => 1818])
clear_malloc_data()
@time for trial in 1:10000000
counter = 0
for item in dict
counter += 1 # dummy operation for example purposes only
end
end
end
bench()
I'm running Julia 0.3.4 with --track-allocation=all
. I'm seeing allocations on the for item in dict
line in my script, and in the nexthelper
function in sortedDict.jl
. It seems that 320 bytes are allocated at every iteration.
If you think this is doable and point me in the right direction, I will consider contributing a pull request :)
I forked DataStructures.jl , checked out, and did
ravi@hod:~/juliastackhack/DataStructures.jl$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
ravi@hod:~/juliastackhack/DataStructures.jl$ cd test
julia runtests.jl
/home/ravi/juliastackhack/DataStructures.jl/test/test_deque.jl ...
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
/home/ravi/juliastackhack/DataStructures.jl/test/test_sortedcontainers.jl ...
ERROR: LoadError: LoadError: UndefVarError: pastendsemitoken not defined
in test1 at /home/ravi/juliastackhack/DataStructures.jl/test/test_sortedcontainers.jl:281
in include at ./boot.jl:254
in include_from_node1 at ./loading.jl:263
in anonymous at no file:19
in include at ./boot.jl:254
in include_from_node1 at ./loading.jl:263
in process_options at ./client.jl:308
in _start at ./client.jl:411
while loading /home/ravi/juliastackhack/DataStructures.jl/test/test_sortedcontainers.jl, in expression starting on line 1529
while loading /home/ravi/juliastackhack/DataStructures.jl/test/runtests.jl, in expression starting on line 16
The two WARNING s seem worthy of being looked at as well. Please let me know if I'm doing something incorrectly. AFAIK this is the way to run tests for Julia packages.
I want to ask you if the difference that you mention about the difference in performance for Deque and Vector is noticeably when we are working with a small Deque (Vector) - let's say with size much less than 2K -.
Hi, I'm getting the following error on master
:
julia> d = Deque{Int}();
julia> push!(d,4);
julia> empty!(d)
ERROR: h not defined
in empty! at /home/mike/.julia/v0.3/DataStructures/src/deque.jl:183
Looks like it's just some code that got left behind during refactoring. What's the general approach to submitting PRs for packages? Is Pkg.submit
the right way to do it for something simple like this?
Is this sth that should be fixed (or can be reasonably fixed) in the package or should this wait for better solution for method ambiguity landed in the language?
julia> using DataStructures
Warning: New definition
serialize(Any, DataStructures.HashDict) at /usr/share/julia/site/v0.4/DataStructures/src/hashdict.jl:91
is ambiguous with:
serialize(Base.SerializationState, Any) at serialize.jl:411.
To fix, define
serialize(Base.SerializationState, DataStructures.HashDict)
before the new definition.
Warning: New definition
serialize(Any, DataStructures.HashDict) at /usr/share/julia/site/v0.4/DataStructures/src/hashdict.jl:91
is ambiguous with:
serialize(Base.IO, Any) at serialize.jl:433.
To fix, define
serialize(Base.IO, DataStructures.HashDict)
before the new definition.
Warning: New definition
deserialize(Any, Type{DataStructures.HashDict{#K<:Any, #V<:Any, #O<:Any}}) at /usr/share/julia/site/v0.4/DataStructures/src/hashdict.jl:100
is ambiguous with:
deserialize(Base.SerializationState, DataType) at serialize.jl:666.
To fix, define
deserialize(Base.SerializationState, Type{DataStructures.HashDict{#K<:Any, #V<:Any, #O<:Any}})
before the new definition.
hi. obviously this is just a suggestion, and i am very grateful for this library, but it seems to me that things like Stack and Queue should have types that show their contents, not their implementation. the end user, reading the code, "wants" to see what the stack contains, not how it is implemented.
so, without thinking through any consequences, i'd suggest something more like:
immutable Stack{S}
stack::UnderlyingType{S}
end
rather than the stack{S}
you have at the moment.
I'm looking for a weak object-id dict for JuliaLang/julia/pull/5572. There is one implementation here: JuliaLang/julia/issues/3002. One feature needed is that it also works with immutable keys.
A potentially general way to go about this is to add two features to HashDict:
add_weak_key
function in Base does.)hashindex
function. For object-id it would look:hashindex(key, sz) = (int(hash(object_id(key))) & (sz-1)) + 1
I'm not sure how to go about this: The usual way would be to use dispatch on the functions: hashindex(::Associative, key, sz) = (int(hash(key)) & (sz-1)) + 1
. However, because HashDict
is wrapped this does not work (or at least I cannot see how). Any ideas on how to go about this?
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass.
Tests fail, but package loads.
Tests pass.
means that PackageEvaluator found the tests for your package, executed them, and they all passed.
Tests fail, but package loads.
means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with using
worked.
This error on Julia 0.4 is possibly due to recently merged pull request JuliaLang/julia#8420.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("DataStructures")' log
INFO: Installing DataStructures v0.3.2
INFO: Package database updated
>>> 'using DataStructures' log
Julia Version 0.4.0-dev+842
Commit e5d8c1a (2014-09-29 06:50 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
>>> test log
ERROR: test error in expression: d['c'] == 1
InexactError()
in ht_keyindex2 at /home/idunning/pkgtest/.julia/v0.4/DataStructures/src/hashdict.jl:296
in get! at /home/idunning/pkgtest/.julia/v0.4/DataStructures/src/hashdict.jl:385
in getindex at /home/idunning/pkgtest/.julia/v0.4/DataStructures/src/delegate.jl:11
in anonymous at test.jl:83
in do_test at test.jl:47
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in anonymous at no file:17
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
in _start at ./client.jl:354
in _start_3B_3625 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_defaultdict.jl, in expression starting on line 16
while loading /home/idunning/pkgtest/.julia/v0.4/DataStructures/runtests.jl, in expression starting on line 14
test/test_deque.jl ...
test/test_stack_and_queue.jl ...
test/test_accumulator.jl ...
test/test_classifiedcollections.jl ...
test/test_disjoint_set.jl ...
test/test_binheap.jl ...
test/test_mutable_binheap.jl ...
test/test_defaultdict.jl ...
>>> end of log
Feature requested by Ed. Scheinerman of Hopkins
(doing cool graph theory stuff with julia)
He offers a solution as well
using DataStructures
# This takes a DisjointSets object and returns its ground set
function ground_set{T}(DS::DisjointSets{T})
G = Set{T}()
for item in keys(DS.intmap)
push!(G,item)
end
return G
end
function set_of_sets{T}(DS::DisjointSets{T})
n = num_groups(DS)
GS = ground_set(DS)
# Get root elements
roots = Set{Int}()
for item in GS
r = find_root(DS,item)
push!(roots,r)
end
roots = collect(roots)
# Map root numbers to [1:n]
rootmap = Dict{Int,Int}()
for k=1:n
rootmap[roots[k]] = k
end
# Create an array of sets to hold the parts
parts = Array(Set{T},n)
for k=1:n
parts[k] = Set{T}()
end
# place each ground set item in its appropriate part
for item in GS
index = rootmap[find_root(DS,item)]
push!(parts[index], item)
end
# now take those parts and pack them into a set
P = Set{Set{T}}() # This is a set of sets
for item in parts
push!(P,item)
end
return P
end
Moved from JuliaLang/julia#7310
deserialize is extended in hashdict.jl [1] yet not imported [2]
crashes if julia -p N
Currently, you are providing an isequal
method for various types. In Julia 0.3, you will also need to provide ==
. See JuliaLang/julia#6833.
(In Julia 0.2, ==
called isequal
by default, so providing isequal
was enough. In Julia 0.3, however, these two functions are swapped: isequal
calls ==
by default. So overriding isequal
is not enough to make things like !=
or ==
work. In order to remain backward-compatible with Julia 0.2, you will need to provide both methods.)
in https://github.com/JuliaLang/DataStructures.jl/blob/master/src/hashdict.jl#L74 :
hashindex(key, sz) = (int(hash(key)) & (sz-1)) + 1
the index should be unsigned. this calculation can go wrong since hash(k) returned an unsigned.
additionally, sz is valid only when it is a power of two.
a possible fix i used is:
hashindex(key, sz) = (reinterpret(Int,(hash(key))) & (sz-1)) + 1
but others are possible.
It looks like that Nullable
has been pretty efficient.
I think using this new facility would make the implementation of some data structure more straightforward (e.g. Deque
, linked list, and various trees).
I haven't tested the code, but I'm in the process of implementing a better connected components algorithm for Images and was looking at IntDisjointSets. I'm going to create my own (which rather than using ranks always has the smallest label win), but while looking over the code I noticed a few things that look like bugs:
push!
does not increment s.ngroups
find_root
, union!
are not safe due to the @inbounds
in find_root_impl!
; what if I pass an x > length(parents)
?The code I'm writing implements the latter in the following way (feel free to borrow):
function find_root!(sets::DisjointMinSets, m::Integer)
p = sets.parents[m] # don't use @inbounds here, it might not be safe
if p != m
@inbounds sets.parents[m] = p = _find_root!(sets.parents, p)
end
p
end
# an unsafe variant of the above
function _find_root!(parents::Vector{Int}, m::Int)
@inbounds p = parents[m]
@inbounds if parents[p] != p
parents[x] = p = _find_root!(parents, p)
end
p
end
Also note this extends the path compression to the original element.
I kind of have an itch to dig into a new data structure implementation, so I'm wondering what could be most useful, some ideas are:
I think we could have an entire Trees.jl
package, if the right abstractions could be ironed out, but I'm not feeling quite that ambitious, so it'd be better if there's a specific type that would be deemed highest priority.
lobi@orange4:~/juliarepo$ ../julia04/julia
_
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: http://docs.julialang.org
_ _ _| |_ __ _ | Type "help()" for help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.4.0-dev (2015-05-14 00:12 UTC)
_/ |\__'_|_|_|\__'_| | Commit ff35d98 (0 days old master)
|__/ | x86_64-linux-gnu
julia> Pkg.status()
WARNING: unknown Clang commit c4d18994, metadata may be ahead of package cache
3 required packages:
- Clang 0.0.5+ master
- Gtk 0.8.0
- Winston 0.11.9
13 additional packages:
- BinDeps 0.3.12
- Cairo 0.2.26
- Color 0.4.5
- Compat 0.4.4
- Cxx 0.0.0- master (unregistered)
- DataStructures 0.3.9
- Docile 0.5.0
- FixedPointNumbers 0.0.7
- Graphics 0.1.0
- IniFile 0.2.4
- SHA 0.0.4
- Tk 0.3.3
- URIParser 0.0.5
julia> using DataStructures
Warning: Method definition call(Type{IPv4}, AbstractString) in module Base at socket.jl:35 overwritten in module Compat at /home/lobi/.julia/v0.4/Compat/src/Compat.jl:41.
Warning: Method definition call(Type{IPv6}, AbstractString) in module Base at socket.jl:78 overwritten in module Compat at /home/lobi/.julia/v0.4/Compat/src/Compat.jl:42.
Warning: Method definition isless(T<:Base.IPAddr, T<:Base.IPAddr) in module Base at socket.jl:6 overwritten in module Compat at /home/lobi/.julia/v0.4/Compat/src/Compat.jl:47.
Warning: Method definition call(Type{Dict{K,V}}, Any) in module Base at dict.jl:406 overwritten in module Compat at /home/lobi/.julia/v0.4/Compat/src/Compat.jl:59.
ERROR: LoadError: LoadError: TypeError: apply_type: in Type, expected Type{T}, got Tuple{TypeVar,TypeVar}
in include at ./boot.jl:252
in include_from_node1 at ./loading.jl:134
in reload_path at ./loading.jl:158
in _require at ./loading.jl:70
in require at ./loading.jl:56
in include at ./boot.jl:252
in include_from_node1 at ./loading.jl:134
in reload_path at ./loading.jl:158
in _require at ./loading.jl:70
in require at ./loading.jl:54
while loading /home/lobi/.julia/v0.4/Compat/src/Compat.jl, in expression starting on line 60
while loading /home/lobi/.julia/v0.4/DataStructures/src/DataStructures.jl, in expression starting on line 3
The following code demonstrates a bug in OrderedDict when you delete most of the elements. The issue is that _compact_order in hashdict.jl is not checking bounds in the first 2 loops.
using DataStructures
od = OrderedDict{Int,Int}()
od[1] = 2
ranges = [2:5,6:9,10:13]
for range in ranges
for i = range
od[i] = i+1
end
for i = range
delete!( od, i )
end
end
od[14]=15
When I run "typeof(collect(keys(SortedDict( Dict( ["a","b","c"], [1,2,3] ) ))))", I get "Array{Any,1}" whereas I would expect to get "Array{ASCIIString,1}" which is what "typeof(collect(keys(SortedDict( Dict( ["a","b","c"], [1,2,3] ) ))))" gives me.
I'm using Julia version 0.3.8 and seem to be on commit 79e517 of the DataStructures package.
Right now Deque works in for loops, but Stacks and Queues don't
using DataStructures
s = Stack(Int)
q = Queue(Int)
deq = Deque{Int}()
for x in deq println(x) end
works
but
for x in s println(x) end
ERROR: `start` has no method matching start(::Stack{Deque{Int64}})
in anonymous at no file
julia> for x in q println(x) end
ERROR: MethodError: `start` has no method matching start(::DataStructures.Queue{DataStructures.Deque{Int64}})
in anonymous at no file
I think this is because Stack and Queue, while building on Deque, don't redefine next() etc required for iteration. I thought they'd use the underlying Deque's itrerator protocol [1], but apparently not.
[1]which would still be wrong for one of Stack or Queue or both -if Deque uses random access for iteration - Stack iteration should be LIFO and Queue should have FIFO iteration
You can find this question on stack overflow : http://stackoverflow.com/questions/21926255/traverse-nested-dict-in-julia-lang
While I traverse a nested Dict in Julia, it gives this Error:
ERROR: access to undefined reference
in next at dict.jl:567
Here is the code where you can reproduce this error:
a = [0,19620,7291,32633,9,32513,42455,10045,31964,42455,11767,54]
b = [14318,16405,19,18913,19,8141,18958,12336,7,16588,17358,30]
d = Dict()
for aa in a
for bb in b
if ! haskey(d,aa)
d[aa]=Dict()
end
d[aa][bb] = 0.5
end
end
for k1 in keys(d)
s =0.0
for k2 in keys(d[k1])
s+= d[k1][k2]
end
for k2 in keys(d[k1])
d[k1][k2] = d[k1][k2] / s
end
end
Actually, as long as array b has 11 distinct elements, the error would happen. Also, if
d[k1][k2] = d[k1][k2] / s
become
d[k1][k2] = d[k1][k2] * s
or any other operations, the error disappear.
Any ideas ?
Following on from the removal of the Graphics
module from Base
, the Collections
module doesn't seem to belong there either, and (as far as I can tell) is not used elsewhere in Base
. I propose that the (nice) functionality be moved to the DataStructures
package.
On a related note, in PriorityQueue
, I suggest that enqueue!
and dequeue!
be renamed to push!
and pop!
for consistency, and that pop!
should return not only the value associated with the first key, but also the key itself, in a tuple. (Currently, if the key is needed, it must currently be duplicated inside the value!)
this should illustrate the issue:
julia> s = Stack(Int)
DataStructures.Stack{DataStructures.Deque{Int64}}(Deque [Int64[]])
julia> push!(s, 0)
DataStructures.Stack{DataStructures.Deque{Int64}}(Deque [[0]])
julia> top(s) += 1
ERROR: syntax: invalid assignment location
julia> a = Array(Int, 0)
0-element Array{Int64,1}
julia> push!(a, 1)
1-element Array{Int64,1}:
1
julia> a[end] += 1
2
julia> a
1-element Array{Int64,1}:
2
not pressing - going to switch to arrays instead of stacks.
I'm trying to load one of the default dictionaries on OS X into a Trie, but some keys seem to go missing, and I can't work out the pattern. Here's an example:
#pr-test.jl
using DataStructures
tokenDictionary = Trie{Nothing}()
for token in eachline(STDIN)
tokenDictionary[lowercase(strip(token))] = Nothing()
end
for k in sort(collect(keys(tokenDictionary)))
println(k)
end
Run with the following command:
cat /usr/share/dict/words \
| julia pr-test.jl \
| diff -i /usr/share/dict/words -
I'm using DataStructures 0.3.9
and Julia 0.3.8
. The diff
only shows keys missing from the Julia Trie, not the other way. Am I doing something wrong?
(I'm guessing.)
Leave them in, if @deprecate
depending on VERSION
, add missing methods?
For v0.3, the Set
interface has been changed, so that Set
s are now constructed from iterables. OrderedSet
should make the same change.
(Need to check if this is true for Dict
s as well.)
julia> a = [1,2,3]
3-element Array{Int64,1}:
1
2
3
julia> Set(a)
Set{Int64}({2,3,1})
julia> OrderedSet(a)
OrderedSet{Array{Int64,1}}([1,2,3])
This last one should be:
julia> OrderedSet(a)
OrderedSet{Int64}(1,2,3)
Assigning myself, but I may not get to it soon, so anyone else should feel free to change this. It should be an easy change.
This package provides important support to other packages (e.g. Graphs.jl
that is already under JuliaLang).
I believe it would be useful to move this to JuliaLang, such that more people can contribute to it easily, and this would be the place to look at when people want some data structures for their applications.
Related: https://github.com/kmsquire/OrderedCollections.jl/issues/1#issuecomment-32997226
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass.
Tests fail, but package loads.
Tests pass.
means that PackageEvaluator found the tests for your package, executed them, and they all passed.
Tests fail, but package loads.
means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with using
worked.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
INFO: Installing DataStructures v0.3.0
INFO: Package database updated
Warning: could not import Base.add! into DataStructures
Warning: could not import Base.add! into DataStructures
ERROR: add! not defined
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in anonymous at no file:17
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
in _start at ./client.jl:354
while loading /home/idunning/pkgtest/.julia/v0.3/DataStructures/test/test_accumulator.jl, in expression starting on line 13
while loading /home/idunning/pkgtest/.julia/v0.3/DataStructures/runtests.jl, in expression starting on line 14
INFO: Package database updated
Note this is possibly due to removal of deprecated functions in Julia 0.3-rc1: JuliaLang/julia#7609
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass.
Tests fail, but package loads.
Tests pass.
means that PackageEvaluator found the tests for your package, executed them, and they all passed.
Tests fail, but package loads.
means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with using
worked.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("DataStructures")' log
INFO: Installing DataStructures v0.3.5
INFO: Package database updated
>>> 'using DataStructures' log
Julia Version 0.4.0-dev+1330
Commit 7fdc860 (2014-10-28 03:56 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
>>> test log
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_deque.jl ...
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_stack_and_queue.jl ...
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_accumulator.jl ...
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_classifiedcollections.jl ...
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_disjoint_set.jl ...
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_binheap.jl ...
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_mutable_binheap.jl ...
/home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_defaultdict.jl ...
ERROR: test error in expression: sort(collect(keys(d))) == ['a':'z']
`isless` has no method matching isless(::Char, ::Char)
in sort! at sort.jl:247
in sort! at sort.jl:291
in sort! at sort.jl:296
in sort! at sort.jl:332
in sort at sort.jl:337
in anonymous at test.jl:83
in do_test at test.jl:47
in include at ./boot.jl:242
in include_from_node1 at ./loading.jl:128
in anonymous at no file:17
in include at ./boot.jl:242
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:293
in _start at ./client.jl:362
in _start_3B_3769 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/DataStructures/test/test_defaultdict.jl, in expression starting on line 45
while loading /home/idunning/pkgtest/.julia/v0.4/DataStructures/test/runtests.jl, in expression starting on line 14
INFO: Testing DataStructures
===========================[ ERROR: DataStructures ]============================
failed process: Process(`/home/idunning/julia04/usr/bin/julia /home/idunning/pkgtest/.julia/v0.4/DataStructures/test/runtests.jl`, ProcessExited(1)) [1]
================================================================================
INFO: No packages to install, update or remove
ERROR: DataStructures had test errors
in error at error.jl:21
in test at pkg/entry.jl:719
in anonymous at pkg/dir.jl:28
in cd at ./file.jl:20
in cd at pkg/dir.jl:28
in test at pkg.jl:68
in process_options at ./client.jl:221
in _start at ./client.jl:362
in _start_3B_3769 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
>>> end of log
I've been running tests like this:
julia test/runtests.jl
But when I add the line
@test true == false
into any one of the tests, everything passes.
Clearly, I'm a bit confused. Perhaps I'm running the tests incorrectly?
There are a couple of enhancements that may make heaps in this package more useful:
Ordering
(as is used by the Sort
machinery in Base
).The second proposal could be made possible by a specific Ordering
, but would be more efficient if it were coded for in the data structure.
It may make sense to code these as separate data structures (I haven't looked at the code), implementing the AbstractHeap
interface if that makes sense.
julia> d
DataStructures.OrderedDict{Char,Int64} with 1 entry:
'a' => 1
julia> c
DataStructures.OrderedDict{ASCIIString,Int64} with 2 entries:
"b" => 1
"d" => 4
julia> merge(d,c)
Dict{Any,Int64} with 3 entries:
"b" => 1
'a' => 1
"d" => 4
As the package grows and more data structures come in, it becomes cumbersome to use a single readme page to host all the documentation.
I propose to convert the documentation to Sphinx (similar to that in the Distributions package), and host the documentation on ReadTheDocs.
If there's no objection, I would make a PR later this week.
The current master has a bug in the routine "in(., smd)" where smd is a sortedmultidict. The bug is that the termination test of the loop is misplaced. This may lead to false negatives from the routine (i.e., (k,v) is actually in smd but the `in' routine reports that it is not). This 'in' routine should not be used until the next version of DataStructures.jl is released. The bug is fixed in the branch called 'fix_eltype_again', which hopefully will be merged with the master branch soon.
using DataStructures
set = OrderedSet{Int}()
push!(set,5)
push!(set,8)
push!(set,7)
push!(set,6)
for j in set
println(j)
end
outputs 5, 8, 7, 6.
I was quite surprised by this behavior, as I had assumed that the ordered set would be iterated through in an ordered manner.
Is there a preferred different way to iterate through the set?
I'm implementing suffix trees in Julia for fun and was wondering whether I should cleanup the code and give it a home here.
As was pointed out here, external constructors in Julia typically match the type name exactly, including case (with some exceptions).
I'd like to suggest the following deprecation/renames:
stack
-> Stack
queue
-> Queue
accumulator
-> Accumulator
Not as sure about these:
counter
-> Counter
binary_minheap
-> MinHeap
binary_maxheap
-> MaxHeap
mutable_binary_minheap
-> MutableMinHeap
mutable_binary_maxheap
-> MutableMaxHeap
For the second set of renamings, I'm aware of one precedent in Base
to have an external constructor with a different name (PipeBuffer
creates and IOBuffer
). But I'm not aware of others.
Thoughts? Worthwhile or good enough the way it is?
Shall we move examples/list.jl
and examples/tree.jl
to here from the main repo?
Just stumbled upon this comment in https://github.com/JuliaLang/DataStructures.jl/blob/master/src/defaultdict.jl#L69-L85:
# TODO: remove this if/when minimum julia version is 0.3 or greater
if !applicable(get!, (Dict,))
...
REQUIRE lists 0.3 as minimum version. Just FYI / to keep track of this.
Best shown with the example below.
julia> Set{(Symbol...)}([(:a,)])
Set((Symbol...,)[(:a,)])
julia> OrderedSet{(Symbol...)}([(:a,)])
ERROR: `convert` has no method matching convert(::Type{(Symbol...,)}, ::Array{(Symbol,),1})
Closest candidates are:
convert(::Type{T}, ::T)
convert(::(Type{T<:Top},Type{T<:Top}...), ::(Any,Any...))
convert(::Type{Ptr{Void}}, ::Array{T,N})
...
in setindex! at /usr/share/julia/site/v0.4/DataStructures.jl/src/hashdict.jl:366
in union! at /usr/share/julia/site/v0.4/DataStructures.jl/src/orderedset.jl:31
in call at /usr/share/julia/site/v0.4/DataStructures.jl/src/orderedset.jl:12
julia> Dict{Symbol, (Symbol...)}(:a => (:a,))
Dict{Symbol,(Symbol...,)} with 1 entry:
:a => (:a,)
julia> OrderedDict{Symbol, (Symbol...)}(:a => (:a,))
ERROR: `convert` has no method matching convert(::Type{DataStructures.OrderedDict{Symbol,(Symbol...,)}}, ::Pair{Symbol,(Symbol,)})
Closest candidates are:
convert(::Type{T}, ::T)
convert(::Type{Nullable{T}}, ::T)
convert(::Type{Scical.optics.Focus}, ::Any)
...
in call at ./base.jl:34
I've been using mutable_binary_minheap
for a Dijkstra calculation and am getting reasonable performance for betweenness centrality (which basically does all-pairs Dijkstra) in a large graph:
julia> g = readgraph("/Users/seth/Downloads/pgp2.jgz")
{39796, 301498} directed graph # {number of vertices, number of edges}
julia> @time z = betweenness_centrality(g);
1426 seconds (3768 M allocations: 328 GB, 21.91% gc time)
Realizing that I really didn't need the mutability (I'm just push!
ing and pop!
ping), I switched over to binary_minheap
, thinking that I'd get at least the same, but possibly better, performance. However:
julia> @time z = betweenness_centrality(g);
1743 seconds (8469 M allocations: 385 GB, 28.40% gc time
These results are repeatable (I ran each test 3 times).
Why would binary_minheap
be so much slower than its mutable counterpart, and why would it require more than twice the number of allocations (note that the amount of memory allocated is also different, but maybe not significantly so)?
Thanks for any assistance / insight.
Dear all,
there might be an issue in how OrderedDicts are handled by deepcopy:
julia> d1 = OrderedDict([("hi", 1)])
OrderedDict{ASCIIString,Int64} with 1 entry:
"hi" => 1
julia> d2 = deepcopy(d1)
OrderedDict{ASCIIString,Int64} with 1 entry:
"hi" => 1
julia> d2["hi"] = 2
2
julia> d1
OrderedDict{ASCIIString,Int64} with 1 entry:
"hi" => 2
I would expect a behaviour similar as this one:
julia> d1 = Dict([("hi", 1)])
Dict{ASCIIString,Int64} with 1 entry:
"hi" => 1
julia> d2 = deepcopy(d1)
Dict{ASCIIString,Int64} with 1 entry:
"hi" => 1
julia> d2["hi"] = 2
2
julia> d1
Dict{ASCIIString,Int64} with 1 entry:
"hi" => 1
I am using Julia 0.3.5 and DataStructures 0.3.5.
Let me know if you have an idea where the problem is or if you have an idea for a workaround. Thanks in advance!
Best,
Philipp.
Edit: Added formatting (kms)
`sizehint!` has no method matching sizehint!(::DataStructures.OrderedDict{ASCIIString,ASCIIString}, ::Int64)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.