Description
Issue
When two publishing tasks for two different publications are processed at the same time, the first page ends up in both indexes. This only happens when the items are published simultaneously.
Reproduce
- Start publishing the home page in publication 1
- Start publishing the home page in publication 2 at the same time
- The home page for publication 1 is indexed correctly in SOLR index 1
- The home page for publication 2 is indexed correctly in SOLR index 2
- The home page for publication 1 is also indexed, incorrectly, in SOLR index 2
Setup
We have a setup with a single deployer using DXA and SI4T. As this is a dynamic site, all content is in the broker. Within the deployer's cd_storage.config we have a single storage configuration, id="defaultdb", which is set up as the catch-all for publishing for every publication.
We use the Indexer node within the "defaultdb" storage configuration to configure multiple SOLR URLs, so that different publications can have different SOLR indexes.
Diagnostic
After looking into the logs, the issue is that the instances of SolrIndexer are cached in SearchIndexProcessor in the field INDEXER_HANDLERS. This means that when two publish processes are happening at the same time, they use the same instance of SolrIndexer.
Within SolrIndexer there are four fields: itemRemovals, itemAdds, binaryAdds and itemUpdates.
private ConcurrentHashMap<String, BaseIndexData> itemRemovals = new ConcurrentHashMap<String, BaseIndexData>();
private ConcurrentHashMap<String, SearchIndexData> itemAdds = new ConcurrentHashMap<String, SearchIndexData>();
private ConcurrentHashMap<String, BinaryIndexData> binaryAdds = new ConcurrentHashMap<String, BinaryIndexData>();
private ConcurrentHashMap<String, SearchIndexData> itemUpdates = new ConcurrentHashMap<String, SearchIndexData>();
These maps hold the items waiting to be indexed.
Once items have been added, the commit function takes a snapshot of the maps and sends their contents to the index.
The bug occurs because the itemAdds field can be populated by different publishing processes if those processes have the same storage ID, even if they are for different publications.
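The sharing described above can be illustrated with a simplified sketch. This is not SI4T code: FakeIndexer, handlerFor, simulate and the item ids are hypothetical stand-ins, and the only assumption carried over from the diagnosis is that handlers are cached per storage id and each handler keeps a single itemAdds map.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class SharedIndexerDemo {

    // Hypothetical stand-in for SolrIndexer: one pending-adds map per instance.
    static class FakeIndexer {
        private final Map<String, String> itemAdds = new ConcurrentHashMap<>();

        void addItem(String itemId, String body) {
            itemAdds.put(itemId, body);
        }

        // Returns a snapshot of everything currently pending, whoever queued it.
        Set<String> commit() {
            return new TreeSet<>(itemAdds.keySet());
        }
    }

    // Hypothetical stand-in for INDEXER_HANDLERS: keyed by storage id only.
    static final Map<String, FakeIndexer> HANDLERS = new ConcurrentHashMap<>();

    static FakeIndexer handlerFor(String storageId) {
        return HANDLERS.computeIfAbsent(storageId, id -> new FakeIndexer());
    }

    // Two publish processes for different publications, same storage id.
    static Set<String> simulate() {
        FakeIndexer forPublication1 = handlerFor("defaultdb");
        FakeIndexer forPublication2 = handlerFor("defaultdb"); // same instance

        forPublication1.addItem("pub1-home", "home page, publication 1");
        forPublication2.addItem("pub2-home", "home page, publication 2");

        // The commit made on behalf of publication 2 also sees publication 1's item.
        return forPublication2.commit();
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this sketch the commit made for publication 2 returns both items, which mirrors how the publication 1 home page could reach SOLR index 2.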
Workaround
Defining a specific storage id for each publication within the deployer's cd_storage.config resolves this issue, as the SolrIndexer objects are not shared across different storage ids.
I have implemented this and done basic testing to confirm that the issue no longer occurs. I can now reliably publish two pages at the same time without content being added to the wrong SOLR index.
Possible Solutions
Either the itemAdds map needs to be split up within the SolrIndexer based on the publication each item is targeted at
or
INDEXER_HANDLERS should be keyed by both storage id and publication id.
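As a rough illustration of the second option, the sketch below keys a handler cache by a composite of storage id and publication id. HandlerKey, FakeIndexer and handlerFor are hypothetical names, not part of SI4T, and the real SearchIndexProcessor may need the publication id passed in from wherever the deployer exposes it.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class CompositeKeyHandlers {

    // Hypothetical composite key: storage id plus publication id.
    static final class HandlerKey {
        private final String storageId;
        private final int publicationId;

        HandlerKey(String storageId, int publicationId) {
            this.storageId = storageId;
            this.publicationId = publicationId;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof HandlerKey)) {
                return false;
            }
            HandlerKey that = (HandlerKey) other;
            return publicationId == that.publicationId
                    && storageId.equals(that.storageId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(storageId, publicationId);
        }
    }

    // Hypothetical stand-in for SolrIndexer with its own pending-adds map.
    static class FakeIndexer {
        final Map<String, String> itemAdds = new ConcurrentHashMap<>();
    }

    // Hypothetical replacement for INDEXER_HANDLERS.
    static final Map<HandlerKey, FakeIndexer> HANDLERS = new ConcurrentHashMap<>();

    static FakeIndexer handlerFor(String storageId, int publicationId) {
        return HANDLERS.computeIfAbsent(
                new HandlerKey(storageId, publicationId),
                key -> new FakeIndexer());
    }

    public static void main(String[] args) {
        FakeIndexer first = handlerFor("defaultdb", 1);
        FakeIndexer second = handlerFor("defaultdb", 2);
        System.out.println(first != second);
    }
}
```

With this keying, two publications that share the "defaultdb" storage id would each get their own indexer instance, so their pending items never meet in the same map.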