Skip to content

bug in tests: Java surefire forking is causing LightGBM to fail with serious C++ issues #122

@AlbertoEAF

Description

@AlbertoEAF

As of #121 LightGBM tests can still fail after the first run with a serious C++ error from the LightGBM side:

[ERROR] munmap_chunk(): invalid pointer
[ERROR] Aborted (core dumped)

After that the following messages will show up:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Openml LightGBM lib 3.0.1-with_model_locale_fix_for_java_and_streaming SUCCESS [  1.659 s]
[INFO] openml-lightgbm-module 1.0.12-SNAPSHOT ............. SUCCESS [  0.378 s]
[INFO] openml-lightgbm 1.0.12-SNAPSHOT .................... FAILURE [  9.691 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  12.312 s
[INFO] Finished at: 2022-05-27T14:48:38+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project openml-lightgbm: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider && /home/alberto.ferreira/.sdkman/candidates/java/8.0.302-open/jre/bin/java -javaagent:/home/alberto.ferreira/.m2/repository/org/jacoco/org.jacoco.agent/0.8.4/org.jacoco.agent-0.8.4-runtime.jar=destfile=/home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/jacoco.exec -jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire/surefirebooter2472734679044173114.jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire 2022-05-27T14-48-29_842-jvmRun1 surefire8810960277057352224tmp surefire_07484658466683880775tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] com.feedzai.openml.provider.lightgbm.LightGBMBinaryClassificationModelTrainerTest
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider && /home/alberto.ferreira/.sdkman/candidates/java/8.0.302-open/jre/bin/java -javaagent:/home/alberto.ferreira/.m2/repository/org/jacoco/org.jacoco.agent/0.8.4/org.jacoco.agent-0.8.4-runtime.jar=destfile=/home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/jacoco.exec -jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire/surefirebooter2472734679044173114.jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire 2022-05-27T14-48-29_842-jvmRun1 surefire8810960277057352224tmp surefire_07484658466683880775tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] com.feedzai.openml.provider.lightgbm.LightGBMBinaryClassificationModelTrainerTest
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:305)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:265)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
[ERROR] 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] 	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] 	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] 	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957)
[ERROR] 	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289)
[ERROR] 	at org.apache.maven.cli.MavenCli.main(MavenCli.java:193)
[ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] 	at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)

How to reproduce

# Go to repo root and run:
mvn clean install -DskipTests
cd openml-lightgbm  # Go to provider directory to run its tests.

Tests will always run with:

mvn clean test

However, they will always fail (at least after the first run) with:

mvn test

Conclusion

surefire settings: How to have mvn test work all the time

By simply changing maven-surefire-plugin's forking settings (see [1]) it's possible to have mvn test work all the time too.

This is the observed behaviour regarding mvn test:

  • default settings fail (forkCount=1, reuseForks=true)
  • disabling forks works (forkCount=0)
  • not reusing forks works (reuseForks=false)

Update: All fork settings can still fail after all. The failure odds change though. The only command that works all the time is still mvn clean test. This means the approaches described below don't fix the issue.

Analysis

Given that disabling forks altogether vs running the tests in a forked process (default) causes the issues, the problem seems caused by Java's test stack and not by the LightGBM C++ side (which always works with mvn clean test anyway).

Note: Upgrading to the latest versions of jacoco 0.8.6 and maven-surefire-plugin to 3.0.0-M6 didn't fix the issue either.

Possible solutions

Now, we can disable forks altogether, instead of forking the process before starting the tests once (default, which is failing); or we can set tests not to reuse forks. The latter option is slower as it forks the process for every test class.

Are there any downsides to doing these approaches? And which of those approaches is best?

References

1. maven-surefire-plugin forking docs

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions