SearchContextMissingException Failed to execute fetch phase [search/phase/fetch/id] - java

Cluster: I am using Elasticsearch 1.3.1 with 6 nodes on different servers, all connected over a LAN. The bandwidth is high and each node has 45 GB of RAM.
Configuration: The heap size we allocated for each node is 10g. We use the default Elasticsearch configuration, except for the discovery settings, cluster name, node name, and two zones: 3 nodes belong to one zone and the other 3 to the second zone.
Indices: 15; the total size of the indices is 76 GB.
Nowadays I keep seeing the SearchContextMissingException in the DEBUG logs. It smells like some search query is taking too much time to fetch, but I checked the queries and none of them produces a high load on the cluster... I am wondering why this happens.
Issue: Because of this, one by one all the nodes start garbage collecting heavily, which ends in an OOM :(
Here is my exception. Please kindly explain two things:
What is SearchContextMissingException? Why does it happen?
How can we protect the cluster from this type of query?
The Error:
[YYYY-MM-DD HH:mm:ss,039][DEBUG][action.search.type ] [es_node_01] [5031530]
Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [es_node_02][inet[/1x.x.xx.xx:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [5031530]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:480)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:450)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:793)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:782)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

If you can, update to 1.4.2. It fixes some known resilience issues, including cascading failures like the one you describe.
Regardless of that, the default configuration will definitely get you into trouble. At a minimum, you should look at setting up circuit breakers, e.g. for the field data cache.
Here's a snippet lifted from our production configuration. I assume you have also configured the Linux file handle limits correctly: see here
# prevent swapping
bootstrap.mlockall: true
indices.breaker.total.limit: 70%
indices.fielddata.cache.size: 70%
# make elasticsearch work harder to migrate/allocate indices on startup (we have a lot of shards due to logstash); default was 2
cluster.routing.allocation.node_concurrent_recoveries: 8
# enable cors
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/(localhost|kibana.*\.linko\.io)(:[0-9]+)?/
index.query.bool.max_clause_count: 4096

The same error (or debug statement) still occurs in 1.6.0, and is not a bug.
When you create a new scroll request:
SearchResponse scrollResponse = client.prepareSearch(index)
        .setTypes(types)
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setSize(maxItemsPerScrollRequest)
        .setQuery(ElasticSearchQueryBuilder.createMatchAllQuery())
        .execute().actionGet();
String scrollId = scrollResponse.getScrollId();
a new scroll id is created (apart from the scrollId the response is empty). To fetch the results:
long resultCounter = 0L;   // to keep track of the number of results retrieved
Long nResultsTotal = null; // total number of items we will be expecting
do {
    final SearchResponse response = client.prepareSearchScroll(scrollId)
            .setScroll(new TimeValue(600000)).execute().actionGet();
    scrollId = response.getScrollId();  // always continue with the most recent scroll id
    // handle result
    if (nResultsTotal == null) {        // if not initialized
        nResultsTotal = response.getHits().getTotalHits();  // set total number of documents
    }
    resultCounter += response.getHits().getHits().length;   // keep track of the items retrieved
} while (resultCounter < nResultsTotal);
This approach works regardless of the number of shards you have. Another option is to add a break statement when:
boolean breakIf = response.getHits().getHits().length < (nShards * maxItemsPerScrollRequest);
The number of items returned per request is maxItemsPerScrollRequest (per shard!), so we'd expect the number of items requested multiplied by the number of shards. But when we have multiple shards, and one of them runs out of documents while the others do not, the former method will still give us all available documents, whereas the latter will stop prematurely, I expect (haven't tried!).
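For completeness, here is a minimal sketch of how that break condition could be wired into the loop above (assuming nShards holds the number of shards of the index and the other variables are set up as in the earlier snippets):
long resultCounter = 0L;
Long nResultsTotal = null;
while (true) {
    final SearchResponse response = client.prepareSearchScroll(scrollId)
            .setScroll(new TimeValue(600000)).execute().actionGet();
    scrollId = response.getScrollId();
    if (nResultsTotal == null) {
        nResultsTotal = response.getHits().getTotalHits();
    }
    final int hitsInBatch = response.getHits().getHits().length;
    resultCounter += hitsInBatch;
    // ... handle the hits of this batch ...
    // stop early once at least one shard returned fewer items than requested
    if (hitsInBatch < nShards * maxItemsPerScrollRequest || resultCounter >= nResultsTotal) {
        break;
    }
}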
Another way to stop seeing this exception (since it is 'only' DEBUG) is to open the logging.yml file in the config directory of Elasticsearch and change:
action: DEBUG
to
action: INFO
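In a default 1.x logging.yml that setting lives under the logger section, so after the change the relevant excerpt looks roughly like this (your file may differ slightly):
logger:
  # log action execution errors for easier debugging
  action: INFO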

Related

Changing alias in ElasticSearch returns 200 and acknowledged but does not change alias

Using Elasticsearch 8.4.3 with Java 17 and a cluster of 3 nodes, all of which are master-eligible, we start with the following situation:
index products-2023-01-12-0900, which has an alias current-products
We then start a job that creates a new index products-2023-01-12-1520, and at the end, using elastic-rest-client on the client side and the alias API, we make this call:
At 2023-01-12 16:27:26,893:
POST /_aliases
{
  "actions": [
    {
      "remove": {
        "alias": "current-products",
        "index": "products-*"
      }
    },
    {
      "add": {
        "alias": "current-products",
        "index": "products-2023-01-12-1520"
      }
    }
  ]
}
And we get the following response 26 milliseconds later, with HTTP response code 200:
{"acknowledged":true}
But looking at what we end up with, we still have the old index carrying the current-products alias.
I don't understand why this happens, and it does not happen 100% of the time (it happened 2 times out of around 10 indexations).
Is it a known bug, or regular behaviour?
Edit for #warkolm:
GET /_cat/aliases?v before indexation as of now:
alias index filter routing.index routing.search is_write_index
current-products products-2023-01-13-1510 - - - -
It appears that there might be an issue with the way you are updating the alias. When you send a POST request to the _aliases endpoint with "remove" and "add" actions, Elasticsearch updates the alias based on the current state of the indices at the time the request is executed.
However, it is possible that other processes or actions are modifying the indices or aliases at the same time, and this can cause conflicts or inconsistencies. Additionally, when you use the wildcard character (*) in the "index" field of the "remove" action, the alias is removed from all indices that match the pattern, which may not be the intended behaviour.
To avoid this issue, rely on the fact that the actions of a single _aliases call are applied atomically (the alias is only updated if all actions succeed) and, instead of using the wildcard character, explicitly specify the index that you want to remove the alias from.
Here is an example of how you could use the Indices Aliases API to update the alias:
POST /_aliases
{
"actions": [
{ "remove": { "index": "products-2023-01-12-0900", "alias": "current-products" } },
{ "add": { "index": "products-2023-01-12-1520", "alias": "current-products" } }
]
}
This way, the alias is only removed from the specific index products-2023-01-12-0900 and added to the specific index products-2023-01-12-1520. This helps avoid conflicts or inconsistencies caused by other processes or actions that modify the indices or aliases at the same time.
Additionally, it is recommended to stay on a recent Elasticsearch release (8.4.3 or later), as newer versions contain bug fixes that might be relevant to the issue you are facing.
In conclusion, the issue you are encountering may not be a known bug; it is expected behaviour when multiple processes modify the indices or aliases at the same time, and performing the swap as one atomic call while specifying the exact index to remove or add the alias on can help avoid it.
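If you are issuing the call from Java, the same explicit-index swap can be sent with the low-level REST client; here is a minimal sketch (assuming you already have a RestClient instance named restClient, and using the index names from the question):
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

// Sketch only: atomic alias swap that names both indices explicitly (no wildcard).
Request aliasSwap = new Request("POST", "/_aliases");
aliasSwap.setJsonEntity("""
        {
          "actions": [
            { "remove": { "index": "products-2023-01-12-0900", "alias": "current-products" } },
            { "add":    { "index": "products-2023-01-12-1520", "alias": "current-products" } }
          ]
        }
        """);
Response response = restClient.performRequest(aliasSwap); // restClient: your existing low-level client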

How to fix search returning hits for missing data in Liferay?

In Liferay we are looking for articles that satisfy certain conditions by using the following code:
Hits hits = indexSearcherHelper.search(searchContext, query);
The search query we use is defined as:
BooleanFilter filter = new BooleanFilter();
filter.addRequiredTerm(Field.GROUP_ID, globalSiteId);
filter.addRequiredTerm(Field.STATUS, WorkflowConstants.STATUS_APPROVED);
filter.addRequiredTerm("ddmStructureKey", "TEST");
filter.addRequiredTerm("head", true);
MatchAllQuery query = new MatchAllQuery();
query.setPreBooleanFilter(filter);
and this search finds multiple hits.
Then we attempt to get the article like this:
JournalArticleResource journalArticleResource = journalArticleResourceLocalService.getArticleResource(
        GetterUtil.getLong(hits.toList().get(0).get(Field.ENTRY_CLASS_PK)));
JournalArticle article = journalArticleLocalService.getArticle(
        journalArticleResource.getGroupId(), journalArticleResource.getArticleId());
However, this produces the following error:
No JournalArticleResource exists with the primary key 809477.
In 95% of cases this code works as expected. But in some cases (on some environments), the index search appears to find results that are not valid. Why does this happen?
Can it be that the index has some stale records from old, already deleted articles? Do we need to reindex the database?
UPDATE 1: I have observed a very strange behaviour of the index search:
The following code:
for (int counter = 0; counter < 10; counter++)
{
System.out.println(counter);
System.out.println(indexSearcherHelper.search(searchContext, query).toList().size());
}
produces this result:
0
0
1
4
2
7
3
0
4
4
5
7
6
0
7
4
8
7
9
0
In reality there is only 1 result that needs to be found. On all other environments this code finds exactly one result in all 10 searches, since we only added 1 article.
In this case, however, it alternates between finding no results, 4 results and 7 results, repeating the same pattern.
What is going on here? Is the database corrupted? Is it a Liferay bug? How can the same search return a different number of results each time?
(By the way, last year we did a live database migration from one server to another, that is, we migrated the database while Liferay was up and running [not a good idea] in order to reduce the production downtime, so I am afraid that we might be experiencing database corruption here.)
UPDATE 2: as requested in the comments, here is the version of Liferay we are using and an example of a search hit, with the values of some fields modified since this is a production example from a closed-source application.
Version:
Liferay Community Edition Portal 7.0.4 GA5 (Wilberforce / Build 7004 / October 23, 2017)
System.out.println(hits.toList().get(0));
{
ddmTemplateKey=[673861],
entryClassPK=[809477],
ddm__keyword__673858__LActive_hr_HR=[true],
publishDate=[20211116063000],
ddm__keyword__673858__SActive_hr_HR=[false],
ddm__keyword__673858__GNA_en_US_String_sortable=[ne],
ddm__text__673858__OList_hr_HR_String_sortable=[32554651079],
classNameId=[0],
ddm__keyword__673858__SActive_en_US_String_sortable=[false],
ddm__keyword__673858__O_hr_HR_String_sortable=[opis pop upa],
modified_sortable=[1637050218921],
title_hr_HR=[Test ss n],
ddm__keyword__673858__O_en_US=[Opis pop upa],
version=[2.4],
ddm__keyword__673858__B_en_US=[grey],
ddm__keyword__673858__SActive_hr_HR_String_sortable=[false],
ddm__keyword__673858__OAll_en_US_String_sortable=[false],
status=[0],
ddm__keyword__673858__GPA_en_US=[OK],
publishDate_sortable=[1637044200000],
content_hr_HR=[OK 32554651079 NE true Opis pop upa all true Test pop najnoviji Utorak grey false all false /ervices],
ddm__keyword__673858__TR_en_US=[all],
ddm__keyword__673858__B_hr_HR=[grey],
uid=[com.liferay.journal.model.JournalArticle_PORTLET_811280],
localized_title_en_US_sortable=[test ss n],
layoutUuid=[],
ddm__text__673858__OList_en_US=[32554651079],
ddm__keyword__673858__GNA_hr_HR=[NE],
ddm__keyword__673858__TR_en_US_String_sortable=[all],
ddm__keyword__673858__GNA_hr_HR_String_sortable=[ne],
createDate=[20211115132217],
ddm__keyword__673858__OAll_hr_HR_String_sortable=[false],
displayDate_sortable=[1637044200000],
ddm__keyword__673858__O_en_US_String_sortable=[opis pop upa],
entryClassName=[com.liferay.journal.model.JournalArticle],
ddm__keyword__673858__N_en_US=[Test pop najnoviji Utorak],
ddm__keyword__673858__S_hr_HR_String_sortable=[all],
userId=[30588],
localized_title_en_US=[test ss n],
ddm__keyword__673858__N_hr_HR_String_sortable=[test pop najnoviji utorak],
ddm__keyword__673858__OListActive_hr_HR=[true],
ddm__keyword__673858__GPA_hr_HR_String_sortable=[ok],
treePath=[, 673853],
ddm__keyword__673858__B_en_US_String_sortable=[grey],
ddm__keyword__673858__S_hr_HR=[all],
groupId=[20152],
ddm__keyword__673858__B_hr_HR_String_sortable=[grey],
createDate_sortable=[1636982537964],
classPK=[0],
ddm__keyword__673858__S_en_US_String_sortable=[all],
ddm__keyword__673858__GPA_hr_HR=[OK],
scopeGroupId=[20152],
articleId_String_sortable=[809475],
ddm__keyword__673858__OAll_hr_HR=[false],
modified=[20211116081018],
ddm__keyword__673858__LActive_hr_HR_String_sortable=[true],
ddm__keyword__673858__L_hr_HR=[/ervices],
localized_title_hr_HR_sortable=[test ss n],
ddm__keyword__673858__L_en_US=[/ervices],
visible=[true],
ddmStructureKey=[TEST],
ddm__keyword__673858__OAll_en_US=[false],
defaultLanguageId=[hr_HR],
ddm__keyword__673858__L_hr_HR_String_sortable=[/ervices],
viewCount_sortable=[0],
folderId=[673853],
classTypeId=[673858],
ddm__text__673858__OList_hr_HR=[32554651079],
ddm__keyword__673858__TR_hr_HR_String_sortable=[all],
companyId=[20116],
rootEntryClassPK=[809477],
ddm__keyword__673858__LA_en_US_String_sortable=[true],
displayDate=[20211116063000],
ddm__keyword__673858__OListActive_hr_HR_String_sortable=[true],
ddm__keyword__673858__SActive_en_US=[false],
ddm__keyword__673858__OListActive_en_US=[true],
ddm__keyword__673858__LActive_en_US=[true],
content=[OK 32554651079 NE true Opis pop upa all true Test pop najnoviji Utorak grey false all false /ervices],
head=[true],
ddm__keyword__673858__GPA_en_US_String_sortable=[ok],
ddm__keyword__673858__OListActive_en_US_String_sortable=[true],
ratings=[0.0],
expirationDate_sortable=[9223372036854775807],
viewCount=[0],
ddm__text__673858__OList_en_US_String_sortable=[32554651079],
localized_title_hr_HR=[test ss n],
expirationDate=[99950812133000],
ddm__keyword__673858__N_en_US_String_sortable=[test pop najnoviji utorak],
roleId=[20123, 20124, 20126],
ddm__keyword__673858__S_en_US=[all],
articleId=[809475],
ddm__keyword__673858__N_hr_HR=[Test pop najnoviji Utorak],
userName=[tuser%40admin -],
localized_title=[test ss n],
stagingGroup=[false],
headListable=[true],
ddm__keyword__673858__L_en_US_String_sortable=[/ervices],
ddm__keyword__673858__O_hr_HR=[Opis pop upa],
ddm__keyword__673858__TR_hr_HR=[all],
ddm__keyword__673858__GNA_en_US=[NE]
}
You might be using the wrong service; try using journalArticleLocalService instead.
The id of the journal article resource is the id of the journal article plus 1, so if you have more than one article, in most cases it won't produce this error but will return the wrong article.
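For example, something along these lines (a sketch, assuming entryClassPK in the index holds the article's resourcePrimKey, which is the usual mapping, and that getLatestArticle(long resourcePrimKey) is available in your Liferay version):
// Sketch: resolve the hit directly through journalArticleLocalService.
long resourcePrimKey = GetterUtil.getLong(hits.toList().get(0).get(Field.ENTRY_CLASS_PK));
JournalArticle article = journalArticleLocalService.getLatestArticle(resourcePrimKey);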
Perhaps you are hitting some inconsistencies in your Elasticsearch index: JournalArticles that don't exist in the database but do exist in Elasticsearch.
You can double check this and correct it using my Liferay Index Checker, see https://github.com/jorgediaz-lr/index-checker#readme
Once you have installed it, you have to:
Check the "Display orphan index entries" option
Click on "Check Index"
If you have any orphan results, you can remove them by clicking on the "Remove orphans" button.

Why Spark dataframe cache doesn't work here

I just wrote a toy class to test Spark dataframe (actually Dataset since I'm using Java).
Dataset<Row> ds = spark.sql("select id,name,gender from test2.dummy where dt='2018-12-12'");
ds = ds.withColumn("dt", lit("2018-12-17"));
ds.cache();
ds.write().mode(SaveMode.Append).insertInto("test2.dummy");
//
System.out.println(ds.count());
According to my understanding, there are 2 actions, "insertInto" and "count".
I debugged the code step by step; when running "insertInto", I see several lines like:
19/01/21 20:14:56 INFO FileScanRDD: Reading File path: hdfs://ip:9000/root/hive/warehouse/test2.db/dummy/dt=2018-12-12/000000_0, range: 0-451, partition values: [2018-12-12]
When running "count", I still see similar logs:
19/01/21 20:15:26 INFO FileScanRDD: Reading File path: hdfs://ip:9000/root/hive/warehouse/test2.db/dummy/dt=2018-12-12/000000_0, range: 0-451, partition values: [2018-12-12]
I have 2 questions:
1) When there are 2 actions on the same dataframe like above, if I don't call ds.cache or ds.persist explicitly, will the 2nd action always cause the SQL query to be re-executed?
2) If I understand the log correctly, both actions trigger HDFS file reading. Does that mean ds.cache() doesn't actually work here? If so, why doesn't it work?
Many thanks.
It's because you append into the table that ds is created from, so ds needs to be recomputed because the underlying data has changed. In such cases, Spark invalidates the cache. See for example this Jira (https://issues.apache.org/jira/browse/SPARK-24596):
When invalidating a cache, we invalid other caches dependent on this
cache to ensure cached data is up to date. For example, when the
underlying table has been modified or the table has been dropped
itself, all caches that use this table should be invalidated or
refreshed.
Try running ds.count before inserting into the table.
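In other words, something like this (a sketch that simply reorders the actions from the question):
ds.cache();
System.out.println(ds.count());  // runs against the original data and materializes the cache
ds.write().mode(SaveMode.Append).insertInto("test2.dummy");  // the append now happens after the count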
I found that the other answer doesn't work. What I had to do was break the lineage, so that the DataFrame I was writing did not know that one of its sources was the table I was writing to. To break the lineage, I created a copy of the DataFrame using
copy_of_df = sql_context.createDataFrame(df.rdd)
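Since the question uses the Java API, roughly the same lineage break could look like this (a sketch, assuming spark is your SparkSession):
// Rebuild the Dataset from its RDD so the logical plan no longer references test2.dummy.
Dataset<Row> copyOfDs = spark.createDataFrame(ds.javaRDD(), ds.schema());
copyOfDs.write().mode(SaveMode.Append).insertInto("test2.dummy");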

Best practice for SOLR partial index in order to update attributes that change frequently in Hybris

My scenario is like this.
Solr indexing happens for a product, and then the product's approval status is set to unapproved from the Backoffice. After that, when you search from the website for related words placed in the description of the product, or directly for the product code, you get a server error, since the product that was made unapproved is still present in Solr.
If you perform any type of indexing manually from the Backoffice, it works again. But that is not a good solution, since there might be lots of products whose status changes, and it does not take effect instantly. Using a cronjob for indexing is not a fast solution either: you get server errors until the cronjob starts to run.
I would like to update the Solr index instantly for attributes that change frequently, like price, status, etc. For instance, when an attribute changes, is it a good idea to start a partial index immediately from Java code? If so, how (via IndexerService)? As another option, would it be better to make an HTTP request to Solr for the attribute?
In summary, I am looking for the best solution to perform a partial index.
Any ideas?
For this case you need to write two new important Solr configuration parts:
1) A new Solr indexer cronjob that triggers the indexing
2) A new SolrIndexerQuery for indexing with your special requirements.
When you have a look at the default setup that ships with hybris, you see:
INSERT_UPDATE CronJob;code[unique=true];job(code);singleExecutable;sessionLanguage(isocode);active;
;backofficeSolrIndexerUpdateCronJob;backofficeSolrIndexerUpdateJob;false;en;false;
INSERT Trigger;cronJob(code);active;activationTime;year;month;day;hour;minute;second;relative;weekInterval;daysOfWeek;
;backofficeSolrIndexerUpdateCronJob;true;;-1;-1;-1;-1;-1;05;false;0;;
The part above configures when the job should run. You can modify it so that it runs every 5 seconds, for example.
INSERT_UPDATE SolrIndexerQuery; solrIndexedType(identifier)[unique = true]; identifier[unique = true]; type(code); injectCurrentDate[default = true]; injectCurrentTime[default = true]; injectLastIndexTime[default = true]; query; user(uid)
; $solrIndexedType ; $solrIndexedType-updateQuery ; update ; false ; false ; false ; "SELECT DISTINCT {PK} FROM {Product AS p JOIN VariantProduct AS vp ON {p.PK}={vp.baseProduct} } WHERE {p.modifiedtime} >= ?lastStartTimeWithSuccess OR {vp.modifiedtime} >= ?lastStartTimeWithSuccess" ; admin
The second part is the more important one. Here you define which products should be indexed. You can see that the UPDATE query looks for every product that was modified since the last successful run. Here you could write a new FlexibleSearch query with your special requirements.
tl;dr answer: you have to write a new, performant SolrIndexerQuery that can be triggered every 5 seconds.
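If you want to trigger the partial update directly from Java code instead of relying on a cronjob, a heavily hedged sketch could look like the following. It assumes your hybris version exposes IndexerService.updateTypeIndex(FacetSearchConfig, IndexedType, List<PK>) and FacetSearchConfigService.getConfiguration(String); the configuration name is hypothetical, so check the exact API of your platform version:
// Sketch: push a single changed product into the index right after it is modified.
FacetSearchConfig config = facetSearchConfigService.getConfiguration("yourSolrFacetSearchConfig"); // hypothetical name
IndexedType indexedType = config.getIndexConfig().getIndexedTypes().values().iterator().next();
indexerService.updateTypeIndex(config, indexedType, Collections.singletonList(product.getPk()));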

Iterating over every document in Lotus Domino

I'd like to iterate over every document in a (probably big) Lotus Domino database and be able to continue from the last one if the processing breaks (network connection error, application restart, etc.). I don't have write access to the database.
I'm looking for a way where I don't have to download from the server the documents that were already processed. So I have to pass some starting information to the server indicating which document should be the first in the (possibly restarted) processing.
I've checked the AllDocuments property and the DocumentCollection.getNthDocument method, but this property is unsorted, so I guess the order can change between two calls.
Another idea was using a formula query, but it does not seem that ordering is possible with these queries.
The third idea was the Database.getModifiedDocuments method with a corresponding Document.getLastModified call. It seemed good, but it looks to me that the ordering of the returned collection is not documented and is based on creation time instead of last modification time.
Here is a sample code based on the official example:
System.out.println("startDate: " + startDate);
final DocumentCollection documentCollection =
database.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
Document doc = documentCollection.getFirstDocument();
while (doc != null) {
System.out.println("#lastmod: " + doc.getLastModified() +
" #created: " + doc.getCreated());
doc = documentCollection.getNextDocument(doc);
}
It prints the following:
startDate: 2012.07.03 08:51:11 CEDT
#lastmod: 2012.07.03 08:51:11 CEDT #created: 2012.02.23 10:35:31 CET
#lastmod: 2012.08.03 12:20:33 CEDT #created: 2012.06.01 16:26:35 CEDT
#lastmod: 2012.07.03 09:20:53 CEDT #created: 2012.07.03 09:20:03 CEDT
#lastmod: 2012.07.21 23:17:35 CEDT #created: 2012.07.03 09:24:44 CEDT
#lastmod: 2012.07.03 10:10:53 CEDT #created: 2012.07.03 10:10:41 CEDT
#lastmod: 2012.07.23 16:26:22 CEDT #created: 2012.07.23 16:26:22 CEDT
(I don't use any AgentContext here to access the database. The database object comes from a session.getDatabase(null, databaseName) call.)
Is there any way to reliably do this with the Lotus Domino Java API?
If you have access to change the database, or could ask someone to do so, then you should create a view that is sorted on a unique key, or on the modified date, and then just store a "pointer" to the last document processed.
Barring that, you'll have to maintain a list of previously processed documents yourself. In that case you can use the AllDocuments property and just iterate through them. Use GetFirstDocument and GetNextDocument, as they are reportedly faster than GetNthDocument.
Alternatively, you could make two passes: one to gather a list of UNIDs for all documents, which you'll store, and then a second pass to process each document from the list of UNIDs you have (using the GetDocumentByUNID method).
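A rough sketch of that two-pass approach with the Java API could look like this (assuming db is the Database from session.getDatabase, and that loadLastProcessedIndex, saveLastProcessedIndex and process are your own helpers):
void processAllDocuments(Database db) throws NotesException {
    // Pass 1: collect the UNID of every document once and keep that list.
    List<String> unids = new ArrayList<>();
    DocumentCollection all = db.getAllDocuments();
    Document doc = all.getFirstDocument();
    while (doc != null) {
        unids.add(doc.getUniversalID());
        Document next = all.getNextDocument(doc);
        doc.recycle();                     // free the backend object
        doc = next;
    }
    // Pass 2: process document by document, remembering how far we got.
    for (int i = loadLastProcessedIndex(); i < unids.size(); i++) {
        Document d = db.getDocumentByUNID(unids.get(i));
        process(d);                        // your own processing
        d.recycle();
        saveLastProcessedIndex(i + 1);     // persist the restart point
    }
}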
I don't use the Java API, but in LotusScript I would do something like this:
Locate a view displaying all documents in the database. If you want the agent to be really fast, create a new view. The first column should be sorted and could contain the universal ID of the document. The other columns contain all the values you want to read in your agent; in your example that would be the created date and the last modified date.
Your code could then simply loop through the view like this:
lastSuccessful = FunctionToReadValuesSomewhere() ' Returns 0 if empty
Set view = thisdb.GetView("MyLookupView")
Set col = view.AllEntries
Set entry = col.GetFirstEntry
cnt = 0
Do Until entry Is Nothing
    cnt = cnt + 1
    If cnt > lastSuccessful Then
        universalID = entry.ColumnValues(0)
        createDate = entry.ColumnValues(1)
        lastmodifiedDate = entry.ColumnValues(2)
        Call YourFunctionToDoStuff(universalID, createDate, lastmodifiedDate)
        Call FunctionToStoreValuesSomeWhere(cnt, universalID)
    End If
    Set entry = col.GetNextEntry(entry) ' move to the next entry; GetFirstEntry here would loop forever
Loop
Call FunctionToClearValuesSomeWhere()
Simply store the last successful value and universal ID in, say, a text file, an environment variable, or even a profile document in the database.
When you restart the agent, have some code that checks whether the values are blank (then return 0); otherwise return the last successful value.
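Since the question is about the Java API, the same loop translated roughly into Java could look like this (a sketch; MyLookupView is the hypothetical sorted view described above, and the load/store/doStuff helpers are your own code):
void processFromView(Database db) throws NotesException {
    int lastSuccessful = loadLastSuccessful();            // returns 0 if nothing stored yet
    View view = db.getView("MyLookupView");
    ViewEntryCollection col = view.getAllEntries();
    ViewEntry entry = col.getFirstEntry();
    int cnt = 0;
    while (entry != null) {
        cnt++;
        if (cnt > lastSuccessful) {
            Vector<?> values = entry.getColumnValues();
            String universalId = (String) values.get(0);        // sorted first column: UNID
            doStuff(universalId, values.get(1), values.get(2)); // created / last modified columns
            storeLastSuccessful(cnt, universalId);
        }
        ViewEntry next = col.getNextEntry(entry);
        entry.recycle();
        entry = next;
    }
}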
Agents already keep a field to describe documents that they have not yet processed, and these are automatically updated via normal processing.
A better way of doing what you're attempting to do might be to store the results of a search in a profile document. However, if you're trying to relate to documents in a database you do not have write permission to, the only thing you can do is keep a list of the doclinks you've already processed (and any information you need to keep about those documents), or a sister database holding one document for each doclink plus multiple fields related to the processing you've done on them. Then, transfer the lists of IDs and perform the matching on the client to do per-document lookups.
Lotus Notes/Domino databases are designed to be distributed across clients and servers in a replicated environment. In the general case, you do not have a guarantee that starting at a given creation or mod time will bring you consistent results.
If you are 100% certain that no replicas of your target database are ever made, then you can use getModifiedDocuments and then write a sort routine to place (modDateTime,UNID) pairs into a SortedSet or other suitable data structure. Then you can process through the Set, and if you run into an error you can save the modDateTime of the element that you were attempting to process as your restart point. There may be a few additional details for you to work out to avoid duplicates, however, if there are multiple documents with the exact same modDateTime stamp.
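A sketch of that sorting step (assuming, as said above, that no replicas exist; process and saveRestartPoint are your own helpers):
void processModifiedSince(Database db, DateTime startDate) throws NotesException {
    // Collect (lastModified, UNID) pairs, ordered by modification time.
    TreeMap<Date, List<String>> byModDate = new TreeMap<>();
    DocumentCollection modified = db.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
    Document doc = modified.getFirstDocument();
    while (doc != null) {
        Date mod = doc.getLastModified().toJavaDate();
        byModDate.computeIfAbsent(mod, k -> new ArrayList<>()).add(doc.getUniversalID());
        Document next = modified.getNextDocument(doc);
        doc.recycle();
        doc = next;
    }
    // Process in order; after each timestamp is done, remember it as the restart point.
    for (Map.Entry<Date, List<String>> e : byModDate.entrySet()) {
        for (String unid : e.getValue()) {
            process(db.getDocumentByUNID(unid));   // your own processing
        }
        saveRestartPoint(e.getKey());              // persist the restart point
    }
}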
I want to make one final remark. I understand that you are asking about Java, but if you are working on a backup or archiving system for compliance purposes, the Lotus C API has special functions that you really should look at.
