I'd like to iterate over every document in a (probably big) Lotus Domino database and be able to continue from the last processed one if the processing breaks (network connection error, application restart, etc.). I don't have write access to the database.
I'm looking for a way that avoids re-downloading documents from the server that were already processed. So I have to pass some starting information to the server indicating which document should be the first in the (possibly restarted) processing.
I've checked the AllDocuments property and the DocumentCollection.getNthDocument method, but this property is unsorted, so I guess the order can change between two calls.
Another idea was using a formula query, but it does not seem that ordering is possible with these queries.
The third idea was the Database.getModifiedDocuments method together with Document.getLastModified. It seemed promising, but it looks to me like the ordering of the returned collection is undocumented and based on creation time instead of last modification time.
Here is a sample code based on the official example:
System.out.println("startDate: " + startDate);
final DocumentCollection documentCollection =
database.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
Document doc = documentCollection.getFirstDocument();
while (doc != null) {
System.out.println("#lastmod: " + doc.getLastModified() +
" #created: " + doc.getCreated());
doc = documentCollection.getNextDocument(doc);
}
It prints the following:
startDate: 2012.07.03 08:51:11 CEDT
#lastmod: 2012.07.03 08:51:11 CEDT #created: 2012.02.23 10:35:31 CET
#lastmod: 2012.08.03 12:20:33 CEDT #created: 2012.06.01 16:26:35 CEDT
#lastmod: 2012.07.03 09:20:53 CEDT #created: 2012.07.03 09:20:03 CEDT
#lastmod: 2012.07.21 23:17:35 CEDT #created: 2012.07.03 09:24:44 CEDT
#lastmod: 2012.07.03 10:10:53 CEDT #created: 2012.07.03 10:10:41 CEDT
#lastmod: 2012.07.23 16:26:22 CEDT #created: 2012.07.23 16:26:22 CEDT
(I don't use any AgentContext here to access the database. The database object comes from a session.getDatabase(null, databaseName) call.)
Is there any way to reliably do this with the Lotus Domino Java API?
If you have access to change the database, or could ask someone to do so, then you should create a view sorted on a unique key or on modified date, and then just store a "pointer" to the last document processed.
Barring that, you'll have to maintain a list of previously processed documents yourself. In that case you can use the AllDocuments property and just iterate through it. Use GetFirstDocument and GetNextDocument, as they are reportedly faster than GetNthDocument.
Alternatively, you could make two passes: one to gather a list of UNIDs for all documents, which you'll store, and then a second pass to process each document from your stored list (using the GetDocumentByUNID method), as sketched below.
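A minimal Java sketch of that two-pass approach, assuming the methods live in a class where process() and saveCheckpoint() are your own (hypothetical) helpers and the UNID list is persisted between runs:

import lotus.domino.*;
import java.util.ArrayList;
import java.util.List;

// Pass 1: collect every document's UNID and persist the list somewhere durable.
List<String> collectUnids(Database database) throws NotesException {
    List<String> unids = new ArrayList<String>();
    DocumentCollection all = database.getAllDocuments();
    Document doc = all.getFirstDocument();
    while (doc != null) {
        unids.add(doc.getUniversalID());
        Document next = all.getNextDocument(doc);
        doc.recycle(); // release the Domino backend object as you go
        doc = next;
    }
    return unids;
}

// Pass 2: process each stored UNID, resuming after the last checkpoint.
void processUnids(Database database, List<String> unids, int lastDone)
        throws NotesException {
    for (int i = lastDone; i < unids.size(); i++) {
        Document doc = database.getDocumentByUNID(unids.get(i));
        process(doc);          // hypothetical: your per-document logic
        saveCheckpoint(i + 1); // hypothetical: persist progress for restarts
        doc.recycle();
    }
}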
I don't use the Java API, but in Lotusscript, I would do something like this:
Locate a view displaying all documents in the database. If you want the agent to be really fast, create a new view. The first column should be sorted and could contain the Universal ID of the document. The other columns contain all the values you want to read in your agent; in your example, that would be the created date and last modified date.
Your code could then simply loop through the view like this:
lastSuccessful = FunctionToReadValuesSomewhere() ' Returns 0 if empty
Set view = thisdb.GetView("MyLookupView")
Set col = view.AllEntries
Set entry = col.GetFirstEntry
cnt = 0
Do Until entry Is Nothing
    cnt = cnt + 1
    If cnt > lastSuccessful Then
        universalID = entry.ColumnValues(0)
        createDate = entry.ColumnValues(1)
        lastmodifiedDate = entry.ColumnValues(2)
        Call YourFunctionToDoStuff(universalID, createDate, lastmodifiedDate)
        Call FunctionToStoreValuesSomeWhere(cnt, universalID)
    End If
    Set entry = col.GetNextEntry(entry)
Loop
Call FunctionToClearValuesSomeWhere()
Simply store the last successful value and Universal ID in, say, a text file, an environment variable, or even a profile document in the database.
When you restart the agent, have some code that checks whether the values are blank (then return 0); otherwise return the last successful value. A minimal Java sketch of such a checkpoint store follows.
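On the Java side of the question, such a store could look like this (a sketch; the file name and property keys are arbitrary choices for illustration):

import java.io.*;
import java.util.Properties;

// Minimal checkpoint store backed by a local properties file.
class Checkpoint {
    private static final File FILE = new File("processing-checkpoint.properties");

    // Returns 0 if no checkpoint has been written yet.
    static int lastSuccessful() throws IOException {
        if (!FILE.exists()) return 0;
        Properties p = new Properties();
        try (FileInputStream in = new FileInputStream(FILE)) {
            p.load(in);
        }
        return Integer.parseInt(p.getProperty("lastSuccessful", "0"));
    }

    // Persist the position and UNID of the last successfully processed document.
    static void store(int count, String universalId) throws IOException {
        Properties p = new Properties();
        p.setProperty("lastSuccessful", String.valueOf(count));
        p.setProperty("unid", universalId);
        try (FileOutputStream out = new FileOutputStream(FILE)) {
            p.store(out, null);
        }
    }

    // Call once a full run has completed.
    static void clear() {
        FILE.delete();
    }
}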
Agents already keep a field to describe documents that they have not yet processed, and these are automatically updated via normal processing.
A better way of doing what you're attempting might be to store the results of a search in a profile document. However, since you're dealing with documents in a database you do not have write permission to, the only thing you can do is keep a list of the doclinks you've already processed (plus any information you need to keep about those documents), or maintain a sister database holding one document for each doclink plus fields related to the processing you've done on them. Then transfer the lists of IDs and perform the matching on the client to do per-document lookups.
Lotus Notes/Domino databases are designed to be distributed across clients and servers in a replicated environment. In the general case, you do not have a guarantee that starting at a given creation or mod time will bring you consistent results.
If you are 100% certain that no replicas of your target database are ever made, then you can use getModifiedDocuments and then write a sort routine to place (modDateTime, UNID) pairs into a SortedSet or other suitable data structure. Then you can process through the set, and if you run into an error you can save the modDateTime of the element you were attempting to process as your restart point. There may be a few additional details to work out to avoid duplicates if there are multiple documents with the exact same modDateTime stamp; a sketch follows.
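A minimal sketch of that sort routine, assuming it sits in a class where process() and saveRestartPoint() are your own (hypothetical) helpers; a TreeMap keyed by java.util.Date keeps the pairs sorted by modification time, and a list per timestamp absorbs documents sharing the exact same stamp:

import lotus.domino.*;
import java.util.*;

void processModifiedSince(Database database, DateTime startDate)
        throws NotesException {
    // Collect (lastModified, UNID) pairs in sorted order.
    TreeMap<Date, List<String>> byModTime = new TreeMap<Date, List<String>>();
    DocumentCollection modified =
            database.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
    Document doc = modified.getFirstDocument();
    while (doc != null) {
        Date mod = doc.getLastModified().toJavaDate();
        if (!byModTime.containsKey(mod)) {
            byModTime.put(mod, new ArrayList<String>());
        }
        byModTime.get(mod).add(doc.getUniversalID());
        Document next = modified.getNextDocument(doc);
        doc.recycle();
        doc = next;
    }
    // Process in modification order; persist the restart point as you go.
    for (Map.Entry<Date, List<String>> e : byModTime.entrySet()) {
        for (String unid : e.getValue()) {
            process(database.getDocumentByUNID(unid)); // your per-document logic
            saveRestartPoint(e.getKey());              // hypothetical helper
        }
    }
}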
I want to make one final remark. I understand that you are asking about Java, but if you are working on a backup or archiving system for compliance purposes, the Lotus C API has special functions that you really should look at.
Related
I uploaded movie and user data to JanusGraph. Initially I made an index on movieId, but later I realised I need to index movie title as well, since I need to query based on movie title. Without an index on it, I get the warning "Query requires iterating over all vertices". So, I added this code:
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey title = mgmt.getPropertyKey("title");
JanusGraphManagement.IndexBuilder movieNameIndexBuilder = mgmt.buildIndex("title", Vertex.class)
.addKey(title);
movieNameIndexBuilder.unique();
JanusGraphIndex movieTitleIndex = movieNameIndexBuilder.buildCompositeIndex();
mgmt.setConsistency(movieTitleIndex, ConsistencyModifier.LOCK);
mgmt.commit();
I'm still getting the same warning "Query requires iterating over all vertices" when I query on movie title.
Thank you
Got the solution from the JanusGraph Gitter channel:
The index isn't available immediately if the indexed property key was created in a previous management transaction, as JanusGraph might need to reindex the existing data first. That is a process you have to trigger manually. You can read more about this in the Index Management chapter of the docs.
That is why it's recommended to create all indices in the same transaction where you create the property keys, if possible. A sketch of the manual trigger follows.
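Based on the Index Management chapter, the manual trigger might look roughly like this (a sketch assuming the index is named "title" and graph is your JanusGraph instance; the reindex can take a while on large graphs):

import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.core.schema.SchemaAction;
import org.janusgraph.graphdb.database.management.ManagementSystem;

// Wait until the new index is REGISTERED across the cluster...
ManagementSystem.awaitGraphIndexStatus(graph, "title").call();

// ...then reindex the existing data and wait for the job to finish.
JanusGraphManagement mgmt = graph.openManagement();
mgmt.updateIndex(mgmt.getGraphIndex("title"), SchemaAction.REINDEX).get();
mgmt.commit();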
I just wrote a toy class to test Spark dataframe (actually Dataset since I'm using Java).
Dataset<Row> ds = spark.sql("select id,name,gender from test2.dummy where dt='2018-12-12'");
ds = ds.withColumn("dt", lit("2018-12-17"));
ds.cache();
ds.write().mode(SaveMode.Append).insertInto("test2.dummy");
//
System.out.println(ds.count());
According to my understanding, there are 2 actions, "insertInto" and "count".
I debugged the code step by step. When running "insertInto", I see several lines like:
19/01/21 20:14:56 INFO FileScanRDD: Reading File path: hdfs://ip:9000/root/hive/warehouse/test2.db/dummy/dt=2018-12-12/000000_0, range: 0-451, partition values: [2018-12-12]
When running "count", I still see similar logs:
19/01/21 20:15:26 INFO FileScanRDD: Reading File path: hdfs://ip:9000/root/hive/warehouse/test2.db/dummy/dt=2018-12-12/000000_0, range: 0-451, partition values: [2018-12-12]
I have 2 questions:
1) When there are 2 actions on the same dataframe as above, if I don't call ds.cache or ds.persist explicitly, will the 2nd action always cause re-execution of the sql query?
2) If I understand the log correctly, both actions trigger hdfs file reading; does that mean ds.cache() actually doesn't work here? If so, why doesn't it work here?
Many thanks.
It's because you append into the table that ds is created from, so ds needs to be recomputed because the underlying data changed. In such cases, Spark invalidates the cache. See for example this Jira (https://issues.apache.org/jira/browse/SPARK-24596):
When invalidating a cache, we invalid other caches dependent on this
cache to ensure cached data is up to date. For example, when the
underlying table has been modified or the table has been dropped
itself, all caches that use this table should be invalidated or
refreshed.
Try to run the ds.count before inserting into the table; a sketch of the reordered sequence follows.
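Concretely, the reordering might look like this (a sketch reusing the names from the question's snippet):

Dataset<Row> ds = spark.sql("select id,name,gender from test2.dummy where dt='2018-12-12'");
ds = ds.withColumn("dt", lit("2018-12-17"));
ds.cache();

// Materialize the cache BEFORE the insert changes the underlying table.
System.out.println(ds.count());

// The count above no longer has to recompute after the table is modified.
ds.write().mode(SaveMode.Append).insertInto("test2.dummy");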
I found that the other answer doesn't work. What I had to do was break the lineage, so that the df I was writing does not know that one of its sources is the table I am writing to. To break the lineage, I created a copy of the df (in PySpark) using
copy_of_df = sql_context.createDataFrame(df.rdd)
My scenario is like this.
Solr indexing happens for a product, and then the product's approval status is set to unapproved from the backoffice. After that, when you search from the website for related words placed in the product description, or directly for the product code, you get a server error, since the product that was made unapproved is still present in Solr.
If you perform any type of indexing from the backoffice manually, it works again. But that is not a good solution, since there might be lots of products whose status changes, and it does not take effect instantly. Using a cronjob for indexing is not a fast solution either: you get server errors until the cronjob runs.
I would like to update the Solr index instantly for attributes that change frequently, like price, status, etc. For instance, when an attribute changes, is it a good approach to start a partial index immediately in Java code? If so, how (via IndexerService?)? Alternatively, is it a better idea to make an HTTP request to Solr for the attribute?
In summary, I am looking for the best solution to perform a partial index.
Any ideas?
For this case you need to write two new important Solr configuration parts:
1) A new Solr cronjob that triggers the indexing
2) A new SolrIndexerQuery for indexing with your special requirements
When you have a look at the default stuff from hybris you see:
INSERT_UPDATE CronJob;code[unique=true];job(code);singleExecutable;sessionLanguage(isocode);active;
;backofficeSolrIndexerUpdateCronJob;backofficeSolrIndexerUpdateJob;false;en;false;
INSERT Trigger;cronJob(code);active;activationTime;year;month;day;hour;minute;second;relative;weekInterval;daysOfWeek;
;backofficeSolrIndexerUpdateCronJob;true;;-1;-1;-1;-1;-1;05;false;0;;
This part above configures when the job should run. You can modify it so that it runs every 5 seconds, for example.
INSERT_UPDATE SolrIndexerQuery; solrIndexedType(identifier)[unique = true]; identifier[unique = true]; type(code); injectCurrentDate[default = true]; injectCurrentTime[default = true]; injectLastIndexTime[default = true]; query; user(uid)
; $solrIndexedType ; $solrIndexedType-updateQuery ; update ; false ; false ; false ; "SELECT DISTINCT {PK} FROM {Product AS p JOIN VariantProduct AS vp ON {p.PK}={vp.baseProduct} } WHERE {p.modifiedtime} >= ?lastStartTimeWithSuccess OR {vp.modifiedtime} >= ?lastStartTimeWithSuccess" ; admin
The second part here is the more important one: here you define which products should be indexed. You can see that the update job looks for every Product that was modified. Here you could write a new FlexibleSearch query with your special requirements.
tl;dr answer: you have to write a new, performant SolrIndexerQuery that can be triggered every 5 seconds. If you want to trigger the update from Java code instead, see the sketch below.
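If you do want to kick off a partial update from Java when an attribute changes, something along these lines might work. Every bean name, method signature, and the configuration name below is an assumption based on common solrfacetsearch usage, so verify them against your hybris version's API:

// Hedged sketch: trigger a partial update for specific product PKs.
// facetSearchConfigService / indexerService are assumed injected Spring beans.
FacetSearchConfig config =
        facetSearchConfigService.getConfiguration("myFacetSearchConfig"); // assumed name
IndexedType indexedType =
        config.getIndexConfig().getIndexedTypes().values().iterator().next();
indexerService.performUpdate(config, indexedType,
        Collections.singletonList(product.getPk()));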
I've been trying to retrieve the generated commit ID from a reverted Gerrit change using https://github.com/uwolfer/gerrit-rest-java-client but haven't been able to find a way to do so.
One of the ways I've been trying to get this ID is via the related changes list.
The REST API documentation shows that you can use a query to retrieve this list.
How can I retrieve this list using API calls?
Is there another way to retrieve this commit ID?
I want to use this to track reverts and be able to analyze possible impacts this revert has on the project.
Found a way to solve this issue. What I did was add a "&o=MESSAGES" option to the query to retrieve the full change history list, where the revert message gives you the target commit ID.
I then transfer the returned Collection<> into a list so I can easily access all the messages.
Collection<ChangeMessageInfo> messageColl = gerritClient.changes().query("commit:<commitID>&o=MESSAGES").get().get(0).messages;
final List<ChangeMessageInfo> messageList = new ArrayList<>(messageColl);
The revert message is usually the last entry of the change history log.
List of tags that can be used in a similar manner can be found here. You need to scroll down a bit to find the tags.
UPDATE:
Found an even more efficient way of finding the reverted commits.
With the code below you can retrieve the body message below the subject on Gerrit, which in turn makes it possible to extract the commit ID presented in that field.
List<String> revertedCommits = new ArrayList<>();
revertedCommits.add(<commitID>);
String revertedCommit = "unknown";
Map<String, RevisionInfo> revisionInfo = gerritClient.changes()
        .query("commit:" + revertedCommits.get(revertedCommits.size() - 1)
                + "&o=CURRENT_REVISION&o=COMMIT_FOOTERS")
        .get().get(0).revisions;
Pattern p = Pattern.compile(Pattern.quote("This reverts commit ") + "(.*?)" + Pattern.quote("."));
Matcher m = p.matcher(revisionInfo.values().iterator().next().commitWithFooters);
while (m.find()) {
    revertedCommit = m.group(1);
}
This can then be iterated through to find all the reverts connected to the first commit.
Note that I use the "&o=CURRENT_REVISION" and "&o=COMMIT_FOOTERS" tags in the query to access this information. Without these tags, you only get an empty array.
I am new to Lotus. I need to get some info from a Lotus database with Java. I have the database:
Session session = NotesFactory.createSession(host, user, pwd);
Database database = session.getDatabase(server, databaseName);
I have this info:
field - fldContractorCode;
form - form="formAgreement";
For example, the field value is "abcde".
So how can I get info from that database? Do I need to use a search formula? Or what methods do I need to use? Thanks for any help.
UPDATE:
Now I am using this approach:
DocumentCollection collection = DATABASE.search("form=\"formAgreement\"");
Document doc = collection.getFirstDocument();
while (doc != null) {
    doc.getItemValueString("fldContractorCode");
    doc = collection.getNextDocument();
}
And it works fine for me, but I think that way is not very convenient, because to find some document, for example with field="abcd", I need to iterate over the collection every time...
That's why I am asking for some way to find a document by a field value. I also don't understand what a VIEW is in the database and where to get this VIEW name.
In your existing code, you can just change one line:
DocumentCollection collection = DATABASE.search("form=\"formAgreement\" & fldContractorCode=\"abcd\"");
However, this will be slow if the database contains many documents. For best performance, you should consider using Domino Designer to add a new view to your database and using the getDocumentByKey() method suggested in the other answers. If that is not an option, Simon's suggestion of using the FTSearch() method is faster than the Search() method, but only if a full text index exists for the database. It also has a slightly different syntax for the search string; a hedged example follows.
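For illustration, the equivalent full-text query might look like this (a sketch assuming a full-text index exists; FIELD ... CONTAINS is standard Notes full-text syntax, and DATABASE is the object from the question):

// Requires a full-text index on the database.
DocumentCollection results = DATABASE.FTSearch("FIELD fldContractorCode CONTAINS abcd");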
There are a number of ways to get the document.
1. Search for the document from a view, where the first column of the view contains a sorted value of the fldContractorCode.
For example:
String key = "abcde";
View view = db.getView("viewName");
Document doc = view.getDocumentByKey(key, true);
2. You can use the Database FTSearch Method to do a full text search to find the document. You will need the database to have a full text index created.
3. If you know the UNID or notes ID of the document you can use getDocumentByUNID() or getDocumentByID().
Your question is quite broad, so I recommend reading the Infocenter as it details sample code for each method.
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/topic/com.ibm.designer.domino.main.doc/H_NOTESDATABASE_CLASS_JAVA.html
You will have to drill down to the DOCUMENT (not Form) you want to retrieve the field from.
Lotus Notes has a very easy to understand hierarchical way to get to where you want. You will need to instantiate objects in this sequence:
Session
Database
View
Document
Let's say you have a view called $(sysAgreements) that lists all "formAgreement" documents.
Its selection formula would be something like this:
SELECT Form="formAgreement"
To get to the document or documents you want you will do something like this:
Session session = NotesFactory.createSession(host, user, pwd);
Database database = session.getDatabase(server, databaseName);
View view = database.getView("$(sysAgreements)");
Document doc = view.getDocumentByKey(VIEW_KEY);
String fieldContent = doc.getItemValueString("fldContractorCode");
There are several ways to retrieve info from a Notes database; this is one of them. Bear in mind that the key used by Notes to search a view with getDocumentByKey is the first sorted column.
If you want to get multiple documents you can use:
DocumentCollection docCol = view.getAllDocumentsByKey(VIEW_KEY);
and then iterate over it, as in the sketch below.
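A minimal sketch of that iteration, reusing the getFirstDocument/getNextDocument pattern shown earlier in this thread:

DocumentCollection docCol = view.getAllDocumentsByKey(VIEW_KEY);
Document doc = docCol.getFirstDocument();
while (doc != null) {
    System.out.println(doc.getItemValueString("fldContractorCode")); // read the field
    doc = docCol.getNextDocument(doc);
}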
Avoid doing FTSearch because it's slow and a bit painful for Notes; prefer lookups in views.
Also, another powerful source of help is the Notes help. Get the help database from a computer that has the Notes development client installed. But pay attention to which help you're picking: there are 3 help databases in Notes, for the client, development, and administration. Development is the one you want.