Giter Site home page Giter Site logo

Comments (32)

Ladicek avatar Ladicek commented on August 27, 2024

I'd gladly investigate the issue, if you provided a way to reproduce. Since the stack trace starts at JarIndexer.createJarIndex, I assume you have a JAR file that you're trying to index. Is that JAR file available somewhere, or is it possible to build it from source that is available somewhere?

from jandex.

attila-vegh avatar attila-vegh commented on August 27, 2024

The jar file is our company build.
I`m sending a screenshot where the "same" object is created twice
Dotname@957 vs DotName2717

image

I think the type is constructed from scala jar, what is a dependency jar for any other library file.
Our jar does not use scala directly.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

The Scala library, at least the one I'm looking at (org.scala:scala-library:2.13.8) doesn't contain scala.Function18$$anonfun$curried$1. I'll need a way to reproduce the issue, I don't see a way around it.

Your screenshot is unfortunately useless. The nameTable in IndexWriterV2 is a HashMap and DotName provides equality and hash code, so the fact that there are 2 different instances of DotName with the same content doesn't mean anything.

from jandex.

attila-vegh avatar attila-vegh commented on August 27, 2024

We are decided to filter out indexing scala jars. Now the indexing is ok

from jandex.

attila-vegh avatar attila-vegh commented on August 27, 2024

The Function is inside the: scala-library-2.9.1.jar

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Thank you! Indeed running java -jar core/target/jandex-3.0.0-SNAPSHOT.jar ~/Downloads/scala-library-2.9.1-1.jar fails with the same exception:

$ java -jar core/target/jandex-3.0.0-SNAPSHOT.jar ~/Downloads/scala-library-2.9.1-1.jar 
java.lang.IllegalStateException: Class not found in class table:scala.collection.TraversableLike$$anonfun$isEmpty$1
        at org.jboss.jandex.IndexWriterV2.positionOf(IndexWriterV2.java:491)
        at org.jboss.jandex.IndexWriterV2.writeTypeEntry(IndexWriterV2.java:801)
        at org.jboss.jandex.IndexWriterV2.writeTypeTable(IndexWriterV2.java:264)
        at org.jboss.jandex.IndexWriterV2.write(IndexWriterV2.java:206)
        at org.jboss.jandex.IndexWriter.write(IndexWriter.java:102)
        at org.jboss.jandex.IndexWriter.write(IndexWriter.java:66)
        at org.jboss.jandex.JarIndexer.createJarIndex(JarIndexer.java:197)
        at org.jboss.jandex.JarIndexer.createJarIndex(JarIndexer.java:88)
        at org.jboss.jandex.Main.getIndex(Main.java:87)
        at org.jboss.jandex.Main.execute(Main.java:67)
        at org.jboss.jandex.Main.main(Main.java:52)

I'll look into it.

In the meantime, I believe what you did seems right, indexing the Scala classes would most likely be useless for you.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Reopening, I can reproduce the issue by indexing a single file: scala/Function2$$anonfun$curried$1$$anonfun$apply$1.class. That should itself work fine, because Jandex is explicitly designed for incomplete classpaths. Yet I get the exception when writing the index through IndexWriterV2.

It also turns out that the hint at 2 different DotName objects is relevant. When the type table is being serialized, it seems the name table contains 2 entries with a key of scala.Function2$$anonfun$curried$1. Those 2 DotNames are differently componentized, and there may be a bug either in the name destructuring logic, or in equality/hash code.

Looking more into it.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

One more thing: I recall we adjusted DotName parsing to account for $ at the beginning or at the end of a name component. Since the class names in question here contain $$, I'm thinking maybe those adjustments are at fault. Need to check.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

The changes we did to account for $ at first/last position don't seem to be related.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

OK so DotName.hashCode seems fine, hash codes of those 2 different DotNames are equal. DotName.equals is more suspicious, but I guess that one of the DotName's core assumptions is that componentization is canonical (there's exactly one componentization for each valid name). In other words, the problem is probably in the code that creates the componentized DotNames in the first place.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

(Side note: to be completely fair, implementing DotNames equality by delegating to DotName.toString's equality would probably be easiest and I'm not convinced it would be much slower :-) )

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

So the 1st DotName is created by parsing a field, whose descriptor is scala/Function2$$anonfun$curried$1. This is componentized by NameTable.convertToName as:

  • {local = "scala", innerClass = false}
  • {local = "Function2$", innerClass = false}
  • {local = "anonfun", innerClass = true}
  • {local = "curried", innerClass = true}
  • {local = "1", innerClass = true}

The same DotName would later be created by parsing a method parameter, whose descriptor is identical, but that is correctly found in the intern pool.

The 2nd DotName is created by parsing a generic signature of that same method, which reads (Lscala/Function2<TT1;TT2;TR;>.$anonfun$curried$1;)V. That is a little strange, but valid.

The generic signature parser first finds a type name of scala.Function2. That name is componentized by NameTable.convertToName as:

  • {local = "scala", innerClass = false}
  • {local = "Function2", innerClass = false}

Then, the generic signature parser finds and parses the type arguments, and then it looks for nested types. Those are delimited by ., so there's just one: $anonfun$curried$1. NameTable.wrap is used to create a DotName whose prefix is the 2-element name shown above and local name is $anonfun$curried$1. In total, it looks like this:

  • {local = "scala", innerClass = false}
  • {local = "Function2", innerClass = false}
  • {local = "$anonfun$curried$1", innerClass = true}

Each case makes perfect sense when taken in isolation. However, looking at both of them at the same time, I'm starting to believe that DotName componentization most likely should NOT be assumed to be canonical. In other words, DotName.equals is wrong when both names are componentized.

from jandex.

attila-vegh avatar attila-vegh commented on August 27, 2024

It is possible to somehow resolve this?

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

It is possible. I'm still working on how to do it best.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Hi @n1hility, could you please comment on how DotName componentization is supposed to work? At the moment, it seems to me that DotName.equals assumes that there's always exactly 1 componentization of any given name, but I've discovered (see above if you have plenty of time) that that's not really the case.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

I realized yesterday evening that the generic signature parser is probably at fault, because it does not componentize the inner class name. I'll check today.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

I came up with a relatively simple change that makes the generic signature parser componentize the inner class name and end up with the following DotName:

  • {local = "scala", innerClass = false}
  • {local = "Function2", innerClass = false}
  • {local = "$anonfun", innerClass = true}
  • {local = "curried", innerClass = true}
  • {local = "1", innerClass = true}

This is almost identical to the DotName created by parsing the descriptor, shown in one of the previous comments. The only difference is the placement of that one extra $.

Fixing that will likely affect the treatment of $ in the first/last position, as I mentioned above. Will continue digging.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

I think I just got enlightened :-) The problem with a descriptor like scala/Function2$$anonfun$curried$1 is that it is literally impossible to know if any given $ is a nesting separator or a true part of the name. Therefore, the only possible canonical componentization of a name like this is to split on every occurence of $, which leads to introduction of empty local names.

We eliminated empty local names previously (as I mentioned several times, by treating $ in the first/last position specially), which leads to ambiguity: should we split greedily (leading to Function2 + $anonfun), or eagerly (leading to Function2$ + anonfun). Any given answer to this question may be wrong, which will manifest in a situation like this, when there's another source of the name which leads to a different componentization (in this case, it's the generic signature).

The only question is, well, now what :-)

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

I have a patch locally that allows (not sure if correctly :-) ) indexing scala-library-2.9.1-1.jar. That patch attempts to restore canonical DotName componentization.

However, looking more at DotName code, it only guards against . separators when constructing componentized instances. It seems $ is allowed to be mixed relatively freely. Sometimes, it's part of the local name, and sometimes, it's a nested type separator. Under that condition, componentization cannot be canonical, and hence DotName.equals() is wrong when both instances are componentized.

from jandex.

n1hility avatar n1hility commented on August 27, 2024

@Ladicek sorry for the delay I will take a look it this tomorrow (today for you). Skimming it sounds like you are on the right track. The intention is that it's not precise (since not enough info), but it should be deterministic.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

@n1hility thanks, I think I've got it now. I need to fix DotName.equals(), which will be interesting (I found in history that it was specifically refactored to avoid allocations etc.), but doable.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

I have a fixed version of DotName.equals that allows successfully indexing the entire scala-library-2.9.1-1.jar. Unfortunately, reading the index back fails, so the fix isn't exactly a fix :-) Will continue digging.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Reproducing the additional failure is possible by indexing one file (scala/collection/parallel/ParSeqLike$Elements$$anonfun$psplit$3$$anon$3.class) and then serializing and deserializing that index.

Of course, it's once again caused by a piece of code that assumes that componentization is canonical. This time, it's the code that reads/writes the name table.

Specifically, these 2 names are serialized next to each other:

  • scala.collection.parallel.ParSeqLike$Elements$$anonfun$psplit
  • scala.collection.parallel.ParSeqLike$Elements$$anonfun$psplit$3

The first is componentized like this:

  • {local = "scala", innerClass = false}
  • {local = "collection", innerClass = false}
  • {local = "parallel", innerClass = false}
  • {local = "ParSeqLike", innerClass = false}
  • {local = "Elements$", innerClass = true}
  • {local = "anonfun", innerClass = true}
  • {local = "psplit", innerClass = true}

While the second is componentized like this:

  • {local = "scala", innerClass = false}
  • {local = "collection", innerClass = false}
  • {local = "parallel", innerClass = false}
  • {local = "ParSeqLike", innerClass = false}
  • {local = "Elements$$anonfun$psplit$3", innerClass = true}

The code that constructs componentized DotNames during deserialization completely miscalculates the name depths and from there, everything goes wrong.

Will think about a fix next week.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

(Adding here just for the fun: after the index reader miscalculates the name depths, it even creates non-existing names such as scala.collection.parallel.ParSeqLike$Elements$$anonfun$psplit$3$3$ :-) )

from jandex.

n1hility avatar n1hility commented on August 27, 2024

@Ladicek ok thanks. it's all yours. One thing to check out is the sorting is correct, since these names are written in a radix tree order where each branch is preceded by its peers and finally the immediate parent (prefix())

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Yea the sorting is correct. It's the assumption that it's possible, when deserializing a name, to get the correct prefix by taking the previously deserialized name and call prefix() on it N times, where N is read from the serialized format of the name. That assumes canonical componentization.

I realized over the weekend that there's a simple fix. Instead of writing a depth of the name, which will later be used to find the prefix, we can just store an index of the prefix in the name table. Since names are written in order, the prefix must have already been written, so we can simply look it up from the table, instead of inspecting the previously deserialized name. In case of huge JARs with 1000s classes, indices can be large numbers and the total index size could increase. That can be mitigated relatively easily: instead of storing an [absolute] index, we can store a [relative] offset. Since offsets would always be negative, we would actually store the offset modified like this: -1 would be stored as 0, -2 would be stored as 1, -3 would be stored as 2, etc. To sum up, if int currentIndex is an index of the currently serialized name and int prefixIndex is an index of the prefix, we would write currentIndex - prefixIndex - 1.

That obviously requires a new persistent format version, so can't be fixed in 2.4.

EDIT: the offset format described above actually can't be used, because a prefix of null requires extra treatment. So we'll use 0 to mean a prefix of null, otherwise we'll store an absolute value of the prefix offset (so currentIndex - prefixIndex).

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Aaaand with the 2 changes described above (fixed DotName.equals when both names are componentized, and fixed serialization/deserialization of name prefix), the entire scala-library-2.9.1-1.jar file can be successfully indexed and the index can be successfully serialized and deserialized again. Yay! :-)

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

I could write a test for the DotName.equals bug relatively easily, but I'm struggling to write a test for the name prefix serialization/deserialization issue. This small snippet of Scala code, compiled with Scala 2.11.12, reproduces the issue:

package test

trait Func[-T1, -T2, +R] {
  def apply(v1: T1, v2: T2): R

  def curried: T1 => T2 => R = {
    (x1: T1) => (x2: T2) => apply(x1, x2)
  }
}

But I'm struggling to write an equivalent Java code that would also exhibit the issue. Will continue digging for some time, not sure what to do if I'm not successful.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

I just realized that I could just put the Scala source code into the test-data submodule and be done with it. There's a Maven plugin for Scala that can be used to compile it. The path of least resistance! :-)

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

OK this is embarrassing. I spent more than a day trying to find a reproducer for the name prefix serialization/deserialization issue, and it looks like my test for DotName.equals reproduces the problem too, if I roundtrip that index. The reproducer is more subtle, because deserializing the index doesn't fail, it just builds a nonsensical index in memory. But that's good enough for me.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Fixes are in #176. Due to the need to change persistent format, this can't be backported to 2.x, unfortunately.

from jandex.

Ladicek avatar Ladicek commented on August 27, 2024

Fixed in #176.

from jandex.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.