How to create different documents in Elasticsearch with the same ID - Java

Suppose I have 10 instances of an object that has several fields. One field specifies the ID, two other fields hold location data (latitude and longitude), and one field holds date-time information.
In these 10 instances the ID stays the same, but the date-time and location fields change.
Instance1 = id - 123; lat - 58.00; lon - 16.00; date - 2017-07-11 12:19:00
Instance2 = id - 123; lat - 60.00; lon - 17.00; date - 2017-07-11 12:29:00
Instance3 = id - 123; lat - 62.00; lon - 18.00; date - 2017-07-11 12:39:00
Instance4 = id - 123; lat - 64.00; lon - 19.00; date - 2017-07-11 12:49:00
Instance5 = id - 123; lat - 66.00; lon - 20.00; date - 2017-07-11 12:59:00
The above is dummy data, but you can see that the ID stays the same while only the position and time change.
Using Java and spring-data-elasticsearch I am able to put this information into Elasticsearch.
The problem I face is that only one document ends up being created in Elasticsearch.
First a document with ID 123 is created, and then that document is updated 9 times, so the final document has _version 10.
How can I create 10 different documents in Elasticsearch with the same ID?
Another thing to note is that these fields are not all in the same class; the actual data object is quite complex, but as a whole it contains these fields. The data object can also come in different types of instances, but these fields are always present.
If these fields had all been in the same class, I could have used them to generate a hash code, producing a unique value to use as the ID.
Can someone please suggest how I can create different documents, or a single document holding the 10 sub-instances, in Elasticsearch?
Thanks

From the documentation:
_id field: Each document has an _id that uniquely identifies it, which is indexed so that documents can be looked up either with the GET API
or the ids query.
Spring will map the id field in your class to _id in the Elasticsearch document.
_id plays a very specific role in Elasticsearch and cannot accommodate the functionality you describe.
I would suggest using a different field name, such as instanceId. That way the same value can appear in several documents.
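For illustration, here is a minimal sketch of that suggestion with spring-data-elasticsearch (the class, index name and field names are hypothetical, not taken from the original post): keep the shared value in a separate instanceId field and leave the @Id field null, so Elasticsearch generates a fresh _id for every save.

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

// Hypothetical entity: instanceId carries the shared business ID ("123"),
// while the @Id field stays null so Elasticsearch assigns a unique _id.
@Document(indexName = "positions")
public class PositionEntry {

    @Id
    private String id;          // left null; generated by Elasticsearch on save

    private String instanceId;  // the shared ID, e.g. "123"
    private double lat;
    private double lon;
    private String date;

    // getters and setters omitted for brevity
}

With a standard ElasticsearchRepository, each save(entry) then creates a brand-new document with its own _id and _version 1, and all of them can still be found with a query on instanceId.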

Related

GAE JAVA - Getting Object by Property

I have an entity, say User { id (primary_key), phone }, to be stored in the Datastore.
When retrieving it I can use getObjectById(User.class, id) to get the object. Is there a way to get the object by a non-key property, say phone?
As per the documentation, the Datastore creates index entries for the property "phone" too.
How do we use this index to get the result?
You can simply use a JDO query like the following, and the Datastore will query on the non-key phone property (assuming you haven't set it to be unindexed).
Query q = pm.newQuery(User.class, "phone == '1234567890'");
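A slightly fuller sketch of the same idea, using a query parameter instead of a hard-coded literal (the PersistenceManagerFactory wiring is assumed, not shown in the original answer):

import java.util.ArrayList;
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;
import javax.jdo.Query;

public class UserLookup {

    private final PersistenceManagerFactory pmf;  // assumed to be configured elsewhere

    public UserLookup(PersistenceManagerFactory pmf) {
        this.pmf = pmf;
    }

    // Finds users by the non-key "phone" property via the Datastore index.
    @SuppressWarnings("unchecked")
    public List<User> findByPhone(String phone) {
        PersistenceManager pm = pmf.getPersistenceManager();
        try {
            Query q = pm.newQuery(User.class, "phone == phoneParam");
            q.declareParameters("String phoneParam");
            // copy the lazy result list so it stays usable after pm is closed
            return new ArrayList<User>((List<User>) q.execute(phone));
        } finally {
            pm.close();
        }
    }
}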

Neo4j - querying the nodes based on time property with "<" or ">" using Scala/Neo4j API

I am able to query the nodes based on a keyword like this:
val articles = article_content_index.query("article_data: keyword")
But I also need to query the nodes based on the time property. I tried this:
val time = 1000
val articles = article_content_index.query("article_data: keyword AND time >"+time).iterator()
I got a java.lang.NullPointerException. So how do I query the nodes based on the time property?
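For what it's worth, with Neo4j's legacy Lucene indexes a numeric range is usually expressed with QueryContext.numericRange rather than a "time > x" string inside the query text. Here is a sketch in Java (not Scala), assuming the time values were indexed numerically with ValueContext; the index and property names follow the question.

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.index.lucene.QueryContext;
import org.neo4j.index.lucene.ValueContext;

public class ArticleTimeQuery {

    // The "time" property must be indexed as a number for range queries to work
    // (this must happen inside a transaction).
    public static void indexArticle(Index<Node> index, Node article, String content, long time) {
        index.add(article, "article_data", content);
        index.add(article, "time", ValueContext.numeric(time));
    }

    // Returns articles whose "time" value is strictly greater than the given bound.
    public static IndexHits<Node> articlesAfter(GraphDatabaseService db, long time) {
        Index<Node> index = db.index().forNodes("article_content_index");
        return index.query(QueryContext.numericRange("time", time, Long.MAX_VALUE, false, true));
    }
}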

Building a datastructure in Java

I am trying to construct a DTO to ferry data from the data layer to the view layer.
The data is as follows:
There are 7 days (dates can be used as the key in a Map or any other data structure).
The individual dates will contain multiple records.
Each record contains contact details obtained from multiple tables.
One record needs to be constructed from 3 rows in the result table, i.e.:
A record may return three rows with the same values in all columns except the user details, which contain fields like id, name and designation.
When I display the data, I need to show the manager's and assistant manager's names in the same row.
Data Layer
T01 25/12/2012 ABC XYZ Manager
T01 25/12/2012 ABC IJK Asst.manager
Display:
Date 1
TaskID  TaskDeadline  TaskGivenBy  TaskAssignedTo(Manager)  TaskAssignedTo(Asst.Manager)
T01 25/12/2012 ABC XYZ IJK
T02 1/1/2013 BCE WUV MNO
Solution I tried:
Map<Date, Map<Position, Object>>
e.g. Map<25/12/2012, Map<(Manager, object details), (Asst Manager, object details)>>
and then repeat this for each date. But I guess I am storing duplicate data, so I don't think this is an ideal solution.
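One possible shape for the DTO, sketched here purely as an illustration (all class and field names below are hypothetical): collapse the rows that differ only in user details into a single row object, and group those objects by date, so the shared task columns are stored once.

import java.util.Date;

// Hypothetical DTO: one object per displayed row; the shared columns
// (task id, deadline, given by) appear once, and the user-detail rows
// coming back from the database are folded into the two name fields.
public class TaskRow {
    private String taskId;
    private Date deadline;
    private String givenBy;
    private String managerName;      // from the row whose designation is "Manager"
    private String asstManagerName;  // from the row whose designation is "Asst.manager"

    // getters and setters omitted for brevity
}

The view layer could then receive a Map<Date, List<TaskRow>> (one list of rows per day), which avoids the nested maps and the duplicated data the attempted solution worried about.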

Iterating over every document in Lotus Domino

I'd like to iterate over every document in a (probably big) Lotus Domino database and be able to continue from the last one if processing breaks (network connection error, application restart, etc.). I don't have write access to the database.
I'm looking for a way where I don't have to download documents from the server that have already been processed. So I have to pass some starting information to the server indicating which document should be the first in the (possibly restarted) processing.
I've checked the AllDocuments property and the DocumentCollection.getNthDocument method, but this property is unsorted, so I guess the order can change between two calls.
Another idea was using a formula query but it does not seem that ordering is possible with these queries.
The third idea was the Database.getModifiedDocuments method with a corresponding Document.getLastModified call. It seemed good, but it looks to me like the ordering of the returned collection is not documented and is based on creation time instead of last modification time.
Here is a sample code based on the official example:
System.out.println("startDate: " + startDate);
final DocumentCollection documentCollection =
        database.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
Document doc = documentCollection.getFirstDocument();
while (doc != null) {
    System.out.println("#lastmod: " + doc.getLastModified() +
            " #created: " + doc.getCreated());
    doc = documentCollection.getNextDocument(doc);
}
It prints the following:
startDate: 2012.07.03 08:51:11 CEDT
#lastmod: 2012.07.03 08:51:11 CEDT #created: 2012.02.23 10:35:31 CET
#lastmod: 2012.08.03 12:20:33 CEDT #created: 2012.06.01 16:26:35 CEDT
#lastmod: 2012.07.03 09:20:53 CEDT #created: 2012.07.03 09:20:03 CEDT
#lastmod: 2012.07.21 23:17:35 CEDT #created: 2012.07.03 09:24:44 CEDT
#lastmod: 2012.07.03 10:10:53 CEDT #created: 2012.07.03 10:10:41 CEDT
#lastmod: 2012.07.23 16:26:22 CEDT #created: 2012.07.23 16:26:22 CEDT
(I don't use any AgentContext here to access the database. The database object comes from a session.getDatabase(null, databaseName) call.)
Is there any way to reliably do this with the Lotus Domino Java API?
If you have access to change the database, or could ask someone to do so, then you should create a view that is sorted on a unique key, or modified date, and then just store the "pointer" to the last document processed.
Barring that, you'll have to maintain a list of previously processed documents yourself. In that case you can use the AllDocuments property and just iterate through them. Use the GetFirstDocument and GetNextDocument as they are reportedly faster than GetNthDocument.
Alternatively you could make two passes, one to gather a list of UNIDs for all documents, which you'll store, and then make a second pass to process each document from the list of UNIDs you have (using GetDocumentByUNID method).
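A rough Java sketch of that two-pass approach (the Database handle and the storage of the UNID list are assumed; markDone is a placeholder for whatever progress tracking you choose):

import java.util.List;
import lotus.domino.Database;
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;

// Pass 1: collect the UNID of every document once and persist the list.
// Pass 2: work through the stored list, so a restart can skip what is done.
public class TwoPassProcessor {

    void collectUnids(Database database, List<String> unids) throws NotesException {
        DocumentCollection all = database.getAllDocuments();
        Document doc = all.getFirstDocument();
        while (doc != null) {
            unids.add(doc.getUniversalID());
            Document next = all.getNextDocument(doc);
            doc.recycle();          // free the backend handle as we go
            doc = next;
        }
    }

    void processRemaining(Database database, List<String> remainingUnids) throws NotesException {
        for (String unid : remainingUnids) {
            Document doc = database.getDocumentByUNID(unid);
            // ... do the actual per-document work here ...
            doc.recycle();
            // markDone(unid);      // persist progress so a crash can resume here
        }
    }
}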
I don't use the Java API, but in LotusScript I would do something like this:
Locate a view displaying all documents in the database. If you want the agent to be really fast, create a new view. The first column should be sorted and could contain the universal ID of the document. The other columns contain all the values you want to read in your agent; in your example that would be the created date and last modified date.
Your code could then simply loop through the view like this:
lastSuccessful = FunctionToReadValuesSomewhere() ' Returns 0 if empty
Set view = thisdb.GetView("MyLookupView")
Set col = view.AllEntries
Set entry = col.GetFirstEntry
cnt = 0
Do Until entry Is Nothing
    cnt = cnt + 1
    If cnt > lastSuccessful Then
        universalID = entry.ColumnValues(0)
        createDate = entry.ColumnValues(1)
        lastmodifiedDate = entry.ColumnValues(2)
        Call YourFunctionToDoStuff(universalID, createDate, lastmodifiedDate)
        Call FunctionToStoreValuesSomeWhere(cnt, universalID)
    End If
    Set entry = col.GetNextEntry(entry)
Loop
Call FunctionToClearValuesSomeWhere()
Simply store the last successful count and universal ID in, say, a text file, an environment variable, or even a profile document in the database.
When you restart the agent, have some code that checks whether the stored values are blank (and returns 0 if so); otherwise return the last successful value.
Agents already keep a field to describe documents that they have not yet processed, and these are automatically updated via normal processing.
A better way of doing what you're attempting to do might be to store the results of a search in a profile document. However, if you're trying to relate to documents in a database you do not have write permission to, the only thing you can do is keep a list of the doclinks you've already processed (and any information you need to keep about those documents), or a sister database holding one document for each doclink plus multiple fields related to the processing you've done on them. Then, transfer the lists of IDs and perform the matching on the client to do per-document lookups.
Lotus Notes/Domino databases are designed to be distributed across clients and servers in a replicated environment. In the general case, you do not have a guarantee that starting at a given creation or mod time will bring you consistent results.
If you are 100% certain that no replicas of your target database are ever made, then you can use getModifiedDocuments and then write a sort routine to place (modDateTime,UNID) pairs into a SortedSet or other suitable data structure. Then you can process through the Set, and if you run into an error you can save the modDateTime of the element that you were attempting to process as your restart point. There may be a few additional details for you to work out to avoid duplicates, however, if there are multiple documents with the exact same modDateTime stamp.
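A rough Java sketch of that approach, assuming the no-replica caveat above holds (saveRestartPoint stands in for your own persistence of the restart point):

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import lotus.domino.Database;
import lotus.domino.DateTime;
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;

// Sort the modified documents by their own last-modified time before
// processing, so the last successfully processed timestamp can serve as a
// restart point.
public class ModifiedSinceProcessor {

    void process(Database database, DateTime since) throws NotesException {
        DocumentCollection modified =
                database.getModifiedDocuments(since, Database.DBMOD_DOC_DATA);

        // modDateTime -> UNIDs sharing that timestamp (duplicates are possible)
        Map<Date, List<String>> byModified = new TreeMap<Date, List<String>>();
        Document doc = modified.getFirstDocument();
        while (doc != null) {
            Date lastMod = doc.getLastModified().toJavaDate();
            List<String> unids = byModified.get(lastMod);
            if (unids == null) {
                unids = new ArrayList<String>();
                byModified.put(lastMod, unids);
            }
            unids.add(doc.getUniversalID());
            doc = modified.getNextDocument(doc);
        }

        for (Map.Entry<Date, List<String>> entry : byModified.entrySet()) {
            for (String unid : entry.getValue()) {
                Document d = database.getDocumentByUNID(unid);
                // ... per-document work here ...
                d.recycle();
            }
            // saveRestartPoint(entry.getKey());  // resume from here after a crash
        }
    }
}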
I want to make one final remark. I understand that you are asking about Java, but if you are working on a backup or archiving system for compliance purposes, the Lotus C API has special functions that you really should look at.

Database timestamps not matching

I have an action in Struts2 that queries the database for an object and then copies it with a few changes. It then needs to retrieve the new object's ID from the copy and create a file called objectID.txt.
Here is the relevant code:
Action Class:
ObjectVO objectVOcopy = objectService.searchObjects(objectId);
//Set the ID to 0 so a new row is added, instead of the current one being updated
objectVOcopy.setObjectId(0);
Date today = new Date();
Timestamp currentTime = new Timestamp(today.getTime());
objectVOcopy.setTimeStamp(currentTime);
//Add copy to database
objectService.addObject(objectVOcopy);
//Get the copy object's ID from the database
int newObjectId = objectService.findObjectId(currentTime);
File inboxFile = new File(parentDirectory.getParent()+"\\folder1\\folder2\\"+newObjectId+".txt");
ObjectDAO
//Retrieve identifying ID of copy object from database
List<ObjectVO> object = getHibernateTemplate().find("from ObjectVO where timeStamp = ?", currentTime);
return object.get(0).getObjectId();
The problem is that more often than not, the ObjectDAO search method does not return anything. When debugging I've noticed that the Timestamp currentTime passed to it is usually about 1-2 ms off from the value in the database. I have worked around this by changing the Hibernate query to search for objects with a timestamp within 3 ms of the one passed in, but I'm not sure where this discrepancy is coming from. I'm not recalculating currentTime; I'm using the same value to retrieve from the database as I used to write to it. I'm also worried that when I deploy this to another server the discrepancy might be greater. Other than the objectID, this is the only unique identifier, so I need to use it to get the copy object.
Does anyone know why this is occurring, and is there a better workaround than searching through a range? I'm using Microsoft SQL Server 2008 R2, by the way.
Thanks.
The precision of SQL Server's DATETIME data type does not exactly match what you can generate in other languages. SQL Server rounds DATETIME values to the nearest 0.003 seconds, which is why you can say:
DECLARE @d DATETIME = '20120821 23:59:59.997';
SELECT @d;
Result:
2012-08-21 23:59:59.997
Then try:
DECLARE @d DATETIME = '20120821 23:59:59.999';
SELECT @d;
Result:
2012-08-22 00:00:00.000
Since you are using SQL Server 2008 R2, you should make sure to use the DATETIME2 data type instead of DATETIME.
That said, @RedFilter makes a good point - why are you relying on the time stamp when you can use the generated ID instead?
This feels wrong.
Other than the objectID, this is the only unique identifier
Databases have the concept of a unique identifier for a reason. You should really use that to retrieve an instance of your object.
You can use the get method on the Hibernate session and take advantage of the session and second level caches as well.
With your approach you execute a query every time you retrieve your object.
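As a sketch of that suggestion (assuming the DAO extends Spring's HibernateDaoSupport, which the getHibernateTemplate() call in the question implies), the save call already returns the generated identifier, so the timestamp lookup can be dropped entirely:

import java.io.Serializable;
import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

// Sketch only: package name assumes the Hibernate 3 / orm.hibernate3 support
// classes that the question's code appears to be using.
public class ObjectDAO extends HibernateDaoSupport {

    // save() returns the identifier Hibernate generated for the new row,
    // so there is no need to look the copy up again by timestamp.
    public int addObject(ObjectVO objectVOcopy) {
        Serializable generatedId = getHibernateTemplate().save(objectVOcopy);
        return (Integer) generatedId;
    }

    // The primary key retrieves exactly one row and can be served from the
    // session or second-level cache instead of running a query.
    public ObjectVO findObject(int objectId) {
        return (ObjectVO) getHibernateTemplate().get(ObjectVO.class, objectId);
    }
}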
