Hello, I am getting this warning:
WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration
java.lang.NullPointerException
at java.lang.String.startsWith(String.java:1385)
at java.lang.String.startsWith(String.java:1414)
at com.databricks.spark.redshift.Utils$$anonfun$3.apply(Utils.scala:102)
at com.databricks.spark.redshift.Utils$$anonfun$3.apply(Utils.scala:98)
at scala.collection.Iterator$class.exists(Iterator.scala:753)
at scala.collection.AbstractIterator.exists(Iterator.scala:1157)
at scala.collection.IterableLike$class.exists(IterableLike.scala:77)
at scala.collection.AbstractIterable.exists(Iterable.scala:54)
at com.databricks.spark.redshift.Utils$.checkThatBucketHasObjectLifecycleConfiguration(Utils.scala:98)
at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:361)
at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:106)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
I have seen this issue reported here before, but it still occurs for me.
I do have a lifecycle configuration on my bucket. I've traced the warning to this piece of code:
def checkThatBucketHasObjectLifecycleConfiguration(
    tempDir: String,
    s3Client: AmazonS3Client): Unit = {
  try {
    val s3URI = createS3URI(Utils.fixS3Url(tempDir))
    val bucket = s3URI.getBucket
    assert(bucket != null, "Could not get bucket from S3 URI")
    val key = Option(s3URI.getKey).getOrElse("")
    val hasMatchingBucketLifecycleRule: Boolean = {
      val rules = Option(s3Client.getBucketLifecycleConfiguration(bucket))
        .map(_.getRules.asScala)
        .getOrElse(Seq.empty)
      rules.exists { rule =>
        // Note: this only checks that there is an active rule which matches the temp directory;
        // it does not actually check that the rule will delete the files. This check is still
        // better than nothing, though, and we can always improve it later.
        rule.getStatus == BucketLifecycleConfiguration.ENABLED && key.startsWith(rule.getPrefix)
      }
    }
    if (!hasMatchingBucketLifecycleRule) {
      log.warn(s"The S3 bucket $bucket does not have an object lifecycle configuration to " +
        "ensure cleanup of temporary files. Consider configuring `tempdir` to point to a " +
        "bucket with an object lifecycle policy that automatically deletes files after an " +
        "expiration period. For more information, see " +
        "https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html")
    }
  } catch {
    case NonFatal(e) =>
      log.warn("An error occurred while trying to read the S3 bucket lifecycle configuration", e)
  }
}

I believe the exception is thrown by this line:
key.startsWith(rule.getPrefix)
I checked the AWS SDK documentation: getPrefix returns null if the prefix was never set via the setPrefix method, so in that case startsWith is called with a null argument, which throws the NullPointerException.
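A quick way to confirm the failure mode (the key and prefix values here are made up for illustration): String.startsWith throws a NullPointerException when its argument is null, which matches the top of the stack trace above.

// Hypothetical values, just to reproduce the failure mode.
val key = "temp-dir/part-00000"
val prefix: String = null // what rule.getPrefix returns when no prefix was set
key.startsWith(prefix)    // throws java.lang.NullPointerException, as in the stack trace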
I have very limited knowledge of the AWS SDK and Scala, so I'm not entirely sure about this.
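For what it's worth, here is a sketch of a null-safe version of the rule check. It is only an illustration, not a tested patch: the helper name is mine, it assumes the same AWS SDK v1 types used above, and it treats a rule with a null prefix as if the prefix were empty, so startsWith never sees null and such a rule matches any key.

import scala.collection.JavaConverters._
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration

// Hypothetical null-safe variant of the rule check (not part of the library).
// A null prefix is replaced with "", so startsWith is never called with null.
def hasMatchingLifecycleRule(s3Client: AmazonS3Client, bucket: String, key: String): Boolean = {
  val rules = Option(s3Client.getBucketLifecycleConfiguration(bucket))
    .map(_.getRules.asScala)
    .getOrElse(Seq.empty)
  rules.exists { rule =>
    rule.getStatus == BucketLifecycleConfiguration.ENABLED &&
      key.startsWith(Option(rule.getPrefix).getOrElse(""))
  }
}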