I am pretty new to Elastic Search. I have following things in my code OR looking for solution to 'No Node Available Exception' problem in the following scenario.
1) We have ES running on System with 1 Node and 1 Cluster.
2) We have 4 indexes on ES. (Each index has different type of data, Example: Customer
Preference / Customer Address / Customer Interests / Customer Basic Details)
3) We have WebApplication (as webservice) running on Tomcat.
4) We are calling webservice method as Controller's. -This will receive request from
consumers in the form JSON data.
5) Based on that data(Example: If consumers asks for Customer Preference for given
customer Id then we go to 'Customer Preferences' index) we will go to service(using
Spring) layers.
6) In each of the service layer we get TransportClient instance in SingleTon object and
wait for its response and return the result to Controller.
In a Scenario if consumer asks for all 4 types of data for a Customer, and if we ask first for preference, address, interests and basic details in sequence. It works well. But this adds to performance. So we want these things to process and get data parallel.
So we used Spring Task Executors to do this parallel. In that case we get data from one index and others will get 'No Node Available Exception'. Its pretty random to on say on which data we get this problem.
Pleas help me here.
Thanks in advance!....
I had a similar issue, when trying to write data into a same ES node from multiple web applications. I fixed it by creating separate nodes for each.
I suggest you to try these settings of ES
client.transport.sniff=true
sniffOnConnectionFault=true
Also you can get the data from 4 indexes in a single query.
For Customer Preference / Customer Address / Customer Interests / Customer Basic Details.
Example code:
SearchRequestBuilder srb = client
.prepareSearch("preference_index", "address_index", "interests_index", "details_index")
.setTypes("preference_doc", "address_doc", "interests_doc", "details_doc")
.setSearchType(SearchType.DEFAULT);
QueryBuilder boolBuilder = QueryBuilders.boolQuery().should(
QueryBuilders.matchQuery("id_customer", "14"));
SearchResponse response = srb.setSize(4).execute().actionGet();
SearchHit[] docs = response.getHits().getHits();
Related
I have a Apache Camel route between two JPA endpoints:
from("jpa://Data").to("jpa://DataConverted");
I basically want to do two things: fetch and copy data from my Data entity table to a similar dataConverted entity table in another database, and mark my Data entities with data.hasBeenCopied(true), only after successfully copying it though.
My route looks as follows:
from("jpa://Data").process(ex -> {
Data data = ex.getIn().getBody(Data.class);
DataConverted dataConverted = convertData(data);
ex.getMessage().setBody(dataConverted);
})
.recipientList(constant("direct:DataConverted","direct:updateFlag")).end();
from("direct:DataConverted").to("jpa://DataConverted").end;
from("direct:updateFlag").process(ex -> {
DataConverted dataConverted = ex.getIn().getBody(DataConverted.class);
var originalData = myDao.getData(dataConverted.getId());
originalData.setHasBeenCopied(true);
}).to("jpa://Data).end();
This runs without error, however it isn't setting the flag in my original database!
What did work was to call data.setHasBeenCopied(true); in the first process directly after from("jpa://Data") - however, this means that the flag is already set and if something happens with the copy process (e.g. the target database isn't available) the route will crash but the flag will stay set for that one data entity.
Note that I haven't called transacted() on my route as that didn't work out for me (multiple interfering transactions were opened).
Any idea how to proceed? Is Camel unable to update existing data via .to()? I can add my Camel configurations of the endpoints and such if needed, but it would probably get a bit long.
I uploaded movie and user data to janusgraph and initially I made index on movieId but, later I realised I need to index movie title as well. I need to do query based on movie title and without indexing movie id, it's giving me warning "Query requires iterating over all vertices". So, I added the code:
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey title = mgmt.getPropertyKey("title");
JanusGraphManagement.IndexBuilder movieNameIndexBuilder = mgmt.buildIndex("title", Vertex.class)
.addKey(title);
movieNameIndexBuilder.unique();
JanusGraphIndex movieTitleIndex = movieNameIndexBuilder.buildCompositeIndex();
mgmt.setConsistency(movieTitleIndex, ConsistencyModifier.LOCK);
mgmt.commit();
Still I'm getting the same warning "Query requires iterating over all vertices" when I'm querying on movie title.
Thank you
Got the solution from Janusgraph gitter channel:
The index isn't available immediately if the the indexed property key was created in a previous management transaction as JanusGraph might need to reindex existing data now. That is a process you have to trigger manually. You can read more about this in the chapter Index Management of the docs.
That is why it's recommended to create all indices in the same transaction where you create the property keys if possible
My scenario is like this.
Solr Indexing happens for a product and then product approval status is made unapproved from backoffice. After then, when you search the related words that is placed in description of the product or directly product code from website, you get a server error since the product that is made unapproved is still placed in solr.
If you perform any type of indexing from backoffice manually, it works again. But it is not a good solution since there might be lots of products whose status is changed or that is not a solution which happens instantly. If you use cronjob for indexing, that is not a fast solution again.You get server error until cronjob starts to work.
I would like to update solr index instantly for the attributes which changes frequently like price, status, etc. For instance, when an attribute changes, Is it a good way to start partial index immediately in java code? If it is, how? (by IndexerService?). For another solution, Is it a better idea to make http request to solr for the attribute?
In summary, I am looking for the best solution to perform partial index.
Any ideas?
For this case you need to write two new important SOLR-Configuration parts:
1) A new SOLR-Cronjob that trigger the indexing
2) A new SOLR-IndexerQuery for indexing with your special requirements.
When you have a look at the default stuff from hybris you see:
INSERT_UPDATE CronJob;code[unique=true];job(code);singleExecutable;sessionLanguage(isocode);active;
;backofficeSolrIndexerUpdateCronJob;backofficeSolrIndexerUpdateJob;false;en;false;
INSERT Trigger;cronJob(code);active;activationTime;year;month;day;hour;minute;second;relative;weekInterval;daysOfWeek;
;backofficeSolrIndexerUpdateCronJob;true;;-1;-1;-1;-1;-1;05;false;0;;
This part above is to configure when the job should run. You can modify him, that he should run ever 5 seconds for example.
INSERT_UPDATE SolrIndexerQuery; solrIndexedType(identifier)[unique = true]; identifier[unique = true]; type(code); injectCurrentDate[default = true]; injectCurrentTime[default = true]; injectLastIndexTime[default = true]; query; user(uid)
; $solrIndexedType ; $solrIndexedType-updateQuery ; update ; false ; false ; false ; "SELECT DISTINCT {PK} FROM {Product AS p JOIN VariantProduct AS vp ON {p.PK}={vp.baseProduct} } WHERE {p.modifiedtime} >= ?lastStartTimeWithSuccess OR {vp.modifiedtime} >= ?lastStartTimeWithSuccess" ; admin
The second part here is the more important. Here you define which products should be indexed. Here you can see that the UPDATE-Job is looking for every Product that was modified. Here you could write a new FlexibleSearch with your special requirements.
tl;tr Answear: You have to write a new performant solrIndexerQuery that could be trigger every 5 seconds
I'm currently facing very slow/ no response on a collection looking by ID. I have ~ 2 milion of documents in a partitioned collection. If lookup the document using the partitionKey and id the response is immediate
SELECT * FROM c WHERE c.partitionKey=123 AND c.id="20566-2"
if I try using only the id
SELECT * FROM c WHERE c.id="20566-2"
the response never returns, java client seems freezed and I have the same situation using the Data Explorer from Azure Portal. I tried also looking up by another field that isn't the id or the partitionKey and the response always returns. When I try the select from Java client I always set the flag to enable cross partition query.
The next thing to try is to avoid the character "-" in the ID to test if this character blocks the query (anyway I didn't find anything on the documentation)
The issue is related to your Java code. Due to Azure DocumentDB Java SDK wrapped the DocumentDB REST APIs, according to the reference of REST API Query Documents, as #DanCiborowski-MSFT said, the header x-ms-documentdb-query-enablecrosspartition explains your issue reason as below.
Header: x-ms-documentdb-query-enablecrosspartition
Required/Type: Optional/Boolean
Description: If the collection is partitioned, this must be set to True to allow execution across multiple partitions. Queries that filter against a single partition key, or against single-partitioned collections do not need to set the header.
So you need to set True to enable cross partition for querying across multiple partitions without a partitionKey in where clause via pass a instance of class FeedOption to the method queryDocuments, as below.
FeedOptions queryOptions = new FeedOptions();
queryOptions.setEnableCrossPartitionQuery(true); // Enable query across multiple partitions
String collectionLink = collection.getSelfLink();
FeedResponse<Document> queryResults = documentClient.queryDocuments(
collectionLink,
"SELECT * FROM c WHERE c.id='20566-2'", queryOptions);
I'd like iterate over every document in a (probably big) Lotus Domino database and be able to continue it from the last one if the processing breaks (network connection error, application restart etc.). I don't have write access to the database.
I'm looking for a way where I don't have to download those documents from the server which were already processed. So, I have to pass some starting information to the server which document should be the first in the (possibly restarted) processing.
I've checked the AllDocuments property and the DocumentColletion.getNthDocument method but this property is unsorted so I guess the order can change between two calls.
Another idea was using a formula query but it does not seem that ordering is possible with these queries.
The third idea was the Database.getModifiedDocuments method with a corresponding Document.getLastModified one. It seemed good but
it looks to me that the ordering of the returned collection is not documented and based on creation time instead of last modification time.
Here is a sample code based on the official example:
System.out.println("startDate: " + startDate);
final DocumentCollection documentCollection =
database.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
Document doc = documentCollection.getFirstDocument();
while (doc != null) {
System.out.println("#lastmod: " + doc.getLastModified() +
" #created: " + doc.getCreated());
doc = documentCollection.getNextDocument(doc);
}
It prints the following:
startDate: 2012.07.03 08:51:11 CEDT
#lastmod: 2012.07.03 08:51:11 CEDT #created: 2012.02.23 10:35:31 CET
#lastmod: 2012.08.03 12:20:33 CEDT #created: 2012.06.01 16:26:35 CEDT
#lastmod: 2012.07.03 09:20:53 CEDT #created: 2012.07.03 09:20:03 CEDT
#lastmod: 2012.07.21 23:17:35 CEDT #created: 2012.07.03 09:24:44 CEDT
#lastmod: 2012.07.03 10:10:53 CEDT #created: 2012.07.03 10:10:41 CEDT
#lastmod: 2012.07.23 16:26:22 CEDT #created: 2012.07.23 16:26:22 CEDT
(I don't use any AgentContext here to access the database. The database object comes from a session.getDatabase(null, databaseName) call.)
Is there any way to reliably do this with the Lotus Domino Java API?
If you have access to change the database, or could ask someone to do so, then you should create a view that is sorted on a unique key, or modified date, and then just store the "pointer" to the last document processed.
Barring that, you'll have to maintain a list of previously processed documents yourself. In that case you can use the AllDocuments property and just iterate through them. Use the GetFirstDocument and GetNextDocument as they are reportedly faster than GetNthDocument.
Alternatively you could make two passes, one to gather a list of UNIDs for all documents, which you'll store, and then make a second pass to process each document from the list of UNIDs you have (using GetDocumentByUNID method).
I don't use the Java API, but in Lotusscript, I would do something like this:
Locate a view displaying all documents in the database. If you want the agent to be really fast, create a new view. The first column should be sorted and could contain the Universal ID of the document. The other columns contains all the values you want to read in your agent, in your example that would be the created date and last modified date.
Your code could then simply loop through the view like this:
lastSuccessful = FunctionToReadValuesSomewhere() ' Returns 0 if empty
Set view = thisdb.GetView("MyLookupView")
Set col = view.AllEntries
Set entry = col.GetFirstEntry
cnt = 0
Do Until entry is Nothing
cnt = cnt + 1
If cnt > lastSuccessful Then
universalID = entry.ColumnValues(0)
createDate = entry.ColumnValues(1)
lastmodifiedDate = entry.ColumnValues(2)
Call YourFunctionToDoStuff(universalID, createDate, lastmodifiedDate)
Call FunctionToStoreValuesSomeWhere(cnt, universalID)
End If
Set entry = col.GetFirstEntry
Loop
Call FunctionToClearValuesSomeWhere()
Simply store the last successful value and Universal ID in say a text file or environment variable or even profile document in the database.
When you restart the agent, have some code that check if the values are blank (then return 0), otherwise return the last successful value.
Agents already keep a field to describe documents that they have not yet processed, and these are automatically updated via normal processing.
A better way of doing what you're attempting to do might be to store the results of a search in a profile document. However, if you're trying to relate to documents in a database you do not have write permission to, the only thing you can do is keep a list of the doclinks you've already processed (and any information you need to keep about those documents), or a sister database holding one document for each doclink plus multiple fields related to the processing you've done on them. Then, transfer the lists of IDs and perform the matching on the client to do per-document lookups.
Lotus Notes/Domino databases are designed to be distributed across clients and servers in a replicated environment. In the general case, you do not have a guarantee that starting at a given creation or mod time will bring you consistent results.
If you are 100% certain that no replicas of your target database are ever made, then you can use getModifiedDocuments and then write a sort routine to place (modDateTime,UNID) pairs into a SortedSet or other suitable data structure. Then you can process through the Set, and if you run into an error you can save the modDateTime of the element that you were attempting to process as your restart point. There may be a few additional details for you to work out to avoid duplicates, however, if there are multiple documents with the exact same modDateTime stamp.
I want to make one final remark. I understand that you are asking about Java, but if you are working on a backup or archiving system for compliance purposes, the Lotus C API has special functions that you really should look at.