Comments (13)

aalmiray avatar aalmiray commented on June 2, 2024 3

🎉 This issue has been resolved in 1.1.0 (Release Notes)

from moditect.

hboutemy avatar hboutemy commented on June 2, 2024 3

@aalmiray I'm happy to confirm that the latest Jackson release, 2.16.0, is now fully reproducible, thanks to this moditect 1.1.0 release:
https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/com/fasterxml/jackson/databind/README.md (will be updated tonight)

from moditect.

aalmiray avatar aalmiray commented on June 2, 2024 2

@hboutemy thank you for that. I can follow up.

from moditect.

cowtowncoder avatar cowtowncoder commented on June 2, 2024 2

Whoa! Update to latest Moditect for Jackson builds paid off.

from moditect.

agentgt avatar agentgt commented on June 2, 2024 2

Yes, this is especially helpful for projects that are annotation processors, since having the module-info in an annotation processor project can be tricky (the processor can accidentally get loaded while compiling itself).

And annotation processors should be reproducible because they get kicked off by the compiler, so there is a security concern there.

@hboutemy has been doing a fantastic job on reproducible builds and deserves a ton of praise for the PR and the reproducible project!

from moditect.

hboutemy avatar hboutemy commented on June 2, 2024 1

Hi @aalmiray, I created PR #211. Can you merge it and plan the next bugfix release, please?

from moditect.

aalmiray avatar aalmiray commented on June 2, 2024

Is it really the archive generator or could it be the timestamp parser at a previous step?

from moditect.

agentgt avatar agentgt commented on June 2, 2024

Per my comments on #185, I don't think it is the timestamp.

I tried building with -Duser.timezone=UTC, which made all the time fields the same in the build, but it still produced a different jar.

I checked the module-info.class files and they are the same.

I will try an isolated test later using FileSystems.newFileSystem.

from moditect.

agentgt avatar agentgt commented on June 2, 2024

Darn my isolated test does not show a problem:

EDIT: I can reproduce it!

First, copy a regular jar made by Maven, call it original.jar, and put it in the CWD.

Now run this test, which will pass, but take note of the hash.

Now run the test again and the hash will change.

It appears that the zip filesystem produces the same result within a single JVM launch, but the result changes across executions.

public class ZipTest {

	@Test
	public void testName()
			throws Exception {
		String hash1 = run();
		System.out.println(hash1);
		String hash2 = run();
		System.out.println(hash2);
		assertEquals(hash1, hash2);

	}
	
	String run() throws Exception {
		var original = Path.of("original.jar");
		var outputJar = Path.of("some.jar");
		Files.copy(original, outputJar, StandardCopyOption.REPLACE_EXISTING);
		Map<String, String> env = new HashMap<>();
		env.put("create", "true");
		byte[] clazz = "Lets use a string".getBytes(StandardCharsets.UTF_8);
		URI uri = URI.create("jar:" + outputJar.toUri());
		Instant timestamp = Instant.ofEpochSecond(1671757006);
		FileTime ft = FileTime.from(timestamp);
		try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
			Path path = zipfs.getPath("module-info.txt");
			Files.write(
					path,
					clazz,
					StandardOpenOption.CREATE,
					StandardOpenOption.WRITE,
					StandardOpenOption.TRUNCATE_EXISTING);
			Files.setLastModifiedTime(path, ft);
		}
		return sha256(outputJar);
		
	}
	
	String sha256(Path path) throws NoSuchAlgorithmException, IOException {
		var bytes = Files.readAllBytes(path);
		MessageDigest digest = MessageDigest.getInstance("SHA-256");
		byte[] hash = digest.digest(bytes);
		return HexFormat.of().formatHex(hash);
	}
}

from moditect.

agentgt avatar agentgt commented on June 2, 2024

Here is an easy way to try it.

Save the below as ZipMain.java

import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;


public class ZipMain {

	public static void main(
			String[] args) {
		try {
			var hash = run(Path.of(args[0]));
			System.out.println(hash);
		}
		catch (Exception e) {
			e.printStackTrace();
		}
	}
	
	static String run(Path original) throws Exception {
		var outputJar = Path.of("output.jar");
		System.out.println("Copying " + original + " to " + outputJar);
		Files.copy(original, outputJar, StandardCopyOption.REPLACE_EXISTING);
		Map<String, String> env = new HashMap<>();
		env.put("create", "true");
		byte[] clazz = "Lets use a string".getBytes(StandardCharsets.UTF_8);
		URI uri = URI.create("jar:" + outputJar.toUri());
		Instant timestamp = Instant.ofEpochSecond(1671757006);
		FileTime ft = FileTime.from(timestamp);
		try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
			Path path = zipfs.getPath("module-info.txt");
			Files.write(
					path,
					clazz,
					StandardOpenOption.CREATE,
					StandardOpenOption.WRITE,
					StandardOpenOption.TRUNCATE_EXISTING);
			Files.setLastModifiedTime(path, ft);
		}
		return sha256(outputJar);
		
	}
	
	static String sha256(Path path) throws NoSuchAlgorithmException, IOException {
		var bytes = Files.readAllBytes(path);
		MessageDigest digest = MessageDigest.getInstance("SHA-256");
		byte[] hash = digest.digest(bytes);
		return HexFormat.of().formatHex(hash);
	}
}

Now run it:

java ZipMain.java some.jar

Take note of hash.

Run it again:

java ZipMain.java some.jar

Different hash.

from moditect.

agentgt avatar agentgt commented on June 2, 2024

If you use Apache Commons Compress:

Replace the run method with this one:

	String run() throws Exception {
		var original = Path.of("original.jar");
		var outputJar = Path.of("some.jar");
		//Files.copy(original, outputJar, StandardCopyOption.REPLACE_EXISTING);
		byte[] clazz = "Lets use a string".getBytes(StandardCharsets.UTF_8);
		Instant timestamp = Instant.ofEpochSecond(1671757006);
		FileTime ft = FileTime.from(timestamp);
		try (
				JarArchiveInputStream jis = new JarArchiveInputStream(Files.newInputStream(original));
				JarArchiveOutputStream jout = new JarArchiveOutputStream(
						Files.newOutputStream(
								outputJar,
								StandardOpenOption.CREATE,
								StandardOpenOption.WRITE,
								StandardOpenOption.TRUNCATE_EXISTING))) {
			ChangeSet cs = new ChangeSet();
			JarArchiveEntry entry = new JarArchiveEntry("module-info.txt");
			entry.setLastModifiedTime(ft);
			cs.add(entry, new ByteArrayInputStream(clazz), true);
			ChangeSetPerformer performer = new ChangeSetPerformer(cs);
			performer.perform(jis, jout);
		}
		return sha256(outputJar);
		
	}

It returns the same hash across executions.

from moditect.

agentgt avatar agentgt commented on June 2, 2024

Here is my current workaround, which does not require a change to moditect. I have moditect generate my module-info.class with the normal add-module-info goal (is there a way to generate module-info.class without updating the jar?).

I then put that module-info.class somewhere and rename it to avoid issues. Then I use Ant to update the jar:

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <id>add-module-info</id>
            <phase>package</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <target>
                <copy file="${project.build.sourceDirectory}/module-info.klass" tofile="${project.build.directory}/antrun/module-info.class" />
                <jar update="true" jarfile="${project.build.directory}/${project.artifactId}-${project.version}.jar" modificationtime="${project.build.outputTimestamp}000">
                  <fileset file="${project.build.directory}/antrun/module-info.class" />
                </jar>
              </target>
            </configuration>
          </execution>
        </executions>
      </plugin>

Ant apparently updates the jar safely, producing the same hash on every run. It doesn't appear to use Apache Commons Compress, but it also does not use the NIO virtual filesystem.

The zip NIO virtual filesystem appears to be the problem. I'm not sure what metadata it's adding, as it's barely a byte's worth of changes according to diffoscope.
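
A rough way to see which per-entry time fields a jar actually carries, offered as a sketch and not as part of moditect: the class name ZipTimes is hypothetical, and it relies only on java.util.zip.ZipInputStream, which reads each entry's local header. Entries that store no access or creation time print null for those fields.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of moditect or any JDK tool.
class ZipTimes {

	// Returns one line per entry listing the three time fields
	// that ZipInputStream parsed from the entry's local header.
	static List<String> describe(Path jar) throws IOException {
		List<String> lines = new ArrayList<>();
		try (InputStream in = Files.newInputStream(jar);
				ZipInputStream zin = new ZipInputStream(in)) {
			ZipEntry entry;
			while ((entry = zin.getNextEntry()) != null) {
				lines.add(entry.getName()
						+ " mtime=" + entry.getLastModifiedTime()
						+ " atime=" + entry.getLastAccessTime()
						+ " ctime=" + entry.getCreationTime());
			}
		}
		return lines;
	}

	public static void main(String[] args) throws IOException {
		// args[0] is the jar to inspect.
		describe(Path.of(args[0])).forEach(System.out::println);
	}
}
```

Running it against two builds of the same jar and diffing the output should show whether only the access or creation times differ.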

Ant's jar code could be something moditect borrows or calls instead of Commons Compress.

I just can't believe I'm the only one experiencing this... it is a big deal because I have an annotation processor library and I absolutely want that one jar to be reproducible for security reasons (since the compiler kicks it off).

@cowtowncoder (it appears Jackson is using moditect) or @gunnarmorling

Have you guys tried running:

mvn clean install 
mvn clean verify artifact:compare

https://maven.apache.org/guides/mini/guide-reproducible-builds.html

from moditect.

hboutemy avatar hboutemy commented on June 2, 2024

@aalmiray today I finally found time to dig into the jackson-databind reproducibility issue, and I used zipdetails to look into the jar file details and find where the non-reproducible bits come from.
Here is the result: https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/com/fasterxml/jackson/databind/jackson-databind-2.15.2.diffoscope
It seems the current code correctly sets the usual modification time but forgets to set the access time and change time. These fields are not displayed by the usual zip tools, but they are stored in the zip file.
I did not study how NIO can set these yet...
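
Following the diagnosis above, one possible way to pin all three timestamps through NIO is BasicFileAttributeView.setTimes, which takes the last-modified, last-access and creation times together (Files.setLastModifiedTime only updates the first). This is a minimal sketch that assumes the zip filesystem provider honours all three values. The class and method names are hypothetical, and this is not necessarily how moditect addresses it.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.Map;

// Hypothetical helper for illustration; names are not taken from moditect.
class FixedEntryTimes {

	// Writes one entry into the jar (creating the jar if needed) and sets
	// its modification, access and creation times to the same fixed value.
	static void writeWithFixedTimes(Path jar, String entryName, byte[] content, FileTime time)
			throws IOException {
		URI uri = URI.create("jar:" + jar.toUri());
		try (FileSystem zipfs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
			Path entry = zipfs.getPath(entryName);
			Files.write(entry, content);
			BasicFileAttributeView view =
					Files.getFileAttributeView(entry, BasicFileAttributeView.class);
			// Argument order: lastModifiedTime, lastAccessTime, createTime.
			view.setTimes(time, time, time);
		}
	}

	public static void main(String[] args) throws IOException {
		// args[0] is the jar to update; the timestamp matches the repro above.
		FileTime ft = FileTime.from(Instant.ofEpochSecond(1671757006L));
		byte[] content = "Lets use a string".getBytes(StandardCharsets.UTF_8);
		writeWithFixedTimes(Path.of(args[0]), "module-info.txt", content, ft);
	}
}
```

Hashing the resulting jar across separate JVM launches, as in the ZipMain repro, would be the way to check whether this removes the difference.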

from moditect.
