Solr doesn't overwrite - duplicated uniqueKey entries - java

I have a problem with Solr 5.3.1. My schema is rather simple: the only uniqueKey is "id", a string field that is indexed, stored, required and non-multivalued.
I first add documents with content_type:document_unfinished and then overwrite the same document, with the same id but content_type:document. The document then appears twice in the index. Again, the only uniqueKey is "id", a string; the id originally comes from a MySQL primary key (int).
It also looks like I am not the first one to run into this:
http://lucene.472066.n3.nabble.com/uniqueKey-not-enforced-td4015086.html
http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-td4129651.html
In my case not all documents in the index are duplicated, just some. I initially assumed that a document gets overwritten on commit when the same uniqueKey already exists in the index, but it does not seem to work the way I expected. I do not want to simply update some fields in the document; I want to completely replace it, including all its children.
Some stats: around 350k documents in the index, mostly with childDocuments. The documents are distinguished by a "content_type" field. I used SolrJ to import them like this:
HttpSolrServer server = new HttpSolrServer(url);
server.add(documents); // documents is a Collection<SolrInputDocument>
server.commit();
I always add the whole document with all its children again. It's nothing overly fancy, yet I end up with duplicated documents for the same uniqueKey. Nothing else writes to the index: I only run Solr with the integrated Jetty and never open the Lucene index from Java "manually".
What I did then was delete + insert again. That seemed to work for a while, but then, under some conditions, it started giving this error message:
Parent query yields document which is not matched by parents filter
The document where this happens seems to be completely random; the only pattern is that it is always a childDocument. I do not run anything special: I basically downloaded the Solr package from the website and run it with bin/solr start.
Anyone any ideas?
EDIT 1
I think I found the problem, which seems to be a bug? To reproduce the issue:
I downloaded Solr 5.3.1 onto a Debian machine in VirtualBox and started it with bin/solr start. I added a new core with the basic configset, changing nothing in it: I just copied it over and added the core.
This leads to two documents with the same id in the index:
SolrClient solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
SolrInputDocument inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc_unfinished");
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();
solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc");
SolrInputDocument childDocument = new SolrInputDocument();
childDocument.setField("id","1-1");
childDocument.setField("content_type_s", "subdoc");
inputDocument.addChildDocument(childDocument);
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();
Searching with:
http://192.168.56.102:8983/solr/test1/select?q=*%3A*&wt=json&indent=true
leads to the following output:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "indent": "true",
      "wt": "json",
      "_": "1450078098465"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "content_type_s": "doc_unfinished",
        "_version_": 1520517084715417600
      },
      {
        "id": "1-1",
        "content_type_s": "subdoc"
      },
      {
        "id": "1",
        "content_type_s": "doc",
        "_version_": 1520517084838101000
      }
    ]
  }
}
What am I doing wrong?

Thanks for your feedback! I am writing this as an answer since it would be too long otherwise. I actually got the same response from the mailing list:
Mikhail Khludnev
Hello Sebastian,
Mixing standalone docs and blocks doesn't work. There are plenty of issues open.
On Wed, Mar 9, 2016 at 3:02 PM, Sebastian Riemer
wrote:
Hi,
to actually describe my problem briefly, instead of just linking to the test application: using SolrJ I do the following:
1) Create a new document as a parent and commit
SolrInputDocument parentDoc = new SolrInputDocument();
parentDoc.addField("id", "parent_1");
parentDoc.addField("name_s", "Sarah Connor");
parentDoc.addField("blockJoinId", "1");
solrClient.add(parentDoc);
solrClient.commit();
2) Create a new document with the same unique-id as in 1), with a child document appended
SolrInputDocument parentDocUpdateing = new SolrInputDocument();
parentDocUpdateing.addField("id", "parent_1");
parentDocUpdateing.addField("name_s", "Sarah Connor");
parentDocUpdateing.addField("blockJoinId", "1");
SolrInputDocument childDoc = new SolrInputDocument();
childDoc.addField("id", "child_1");
childDoc.addField("name_s", "John Connor");
childDoc.addField("blockJoinId", "1");
parentDocUpdateing.addChildDocument(childDoc);
solrClient.add(parentDocUpdateing);
solrClient.commit();
3) This results in 2 documents with id="parent_1" in the Solr index.
Is this normal behaviour? I thought the existing document should be updated instead of a new document with the same id being created.
For a full working test application please see the original message.
Best regards,
Sebastian
It seems to be a known issue, and there are several tickets that relate to this, but I am glad there is a way to deal with it: always add the child documents right from the beginning (https://issues.apache.org/jira/browse/SOLR-6096, https://issues.apache.org/jira/browse/SOLR-5211, https://issues.apache.org/jira/browse/SOLR-7606).
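A minimal sketch of that workaround (using the core URL and the blockJoinId field from the examples above): delete the old block first, then always re-add the parent together with all of its children, so a parent never sits in the index without its block:
SolrClient solrClient = new HttpSolrClient("http://localhost:8983/solr/test1");

// remove the old parent and its children (they share the blockJoinId)
solrClient.deleteByQuery("blockJoinId:1");

SolrInputDocument parent = new SolrInputDocument();
parent.addField("id", "parent_1");
parent.addField("name_s", "Sarah Connor");
parent.addField("blockJoinId", "1");

SolrInputDocument child = new SolrInputDocument();
child.addField("id", "child_1");
child.addField("name_s", "John Connor");
child.addField("blockJoinId", "1");

// the child is part of the block right from the beginning
parent.addChildDocument(child);

solrClient.add(parent);
solrClient.commit();
solrClient.close();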

Related

MongoDB Return specified document in Filters

I have this document.
It contains an array of sub-documents named reviews.
I tried this code to get the reviews that were posted by Théo, but it keeps returning the whole document (including all sub-documents in reviews), not just the one I'm specifying with Filters.
Document document = collection.find(Filters.and(Filters.eq("reviews.reviewer_name", "Théo"))).first();
I really can't understand how to get only this specific sub-document. Thanks for any help.
If you're trying to do sub-document queries and only retrieve specific sub-documents, there's no way to do that with Mongo's simple queries. However, you can use the aggregation pipeline to achieve this.
db.collection.aggregate([
  // This is the same as your initial find query; it limits the top-level docs to the ones you are interested in
  { $match: { 'reviews.reviewer_name': 'Théo' } },
  // You can now unwind the results, which makes every sub-document a top-level result
  { $unwind: '$reviews' },
  // Re-match to filter the reviews; this actually drops the unmatched reviews
  { $match: { 'reviews.reviewer_name': 'Théo' } },
  // Now you can use a projection to get the final result you are looking for
  { $project: { reviewer: '$reviews' } }
])
This will return an array of objects with a reviewer property, each element containing a single review. You can then use the pagination stages to trim the results:
db.collection.aggregate([
// ... same stages as above, and then:
{ $limit: 1 },
])
Not sure what the specific data structures would be with the Java driver you are using, but these are the general mongo queries that will do the trick.
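As a hedged sketch with the sync Java driver used in your find call (com.mongodb.client.model.Aggregates, Filters and Projections; collection is the same MongoCollection<Document> as in the question), the pipeline above could look something like this:
// Hedged sketch of the pipeline above with the MongoDB Java driver.
// 'collection' is the MongoCollection<Document> from the question.
Document review = collection.aggregate(Arrays.asList(
        Aggregates.match(Filters.eq("reviews.reviewer_name", "Théo")),    // limit top-level docs
        Aggregates.unwind("$reviews"),                                     // one result per review
        Aggregates.match(Filters.eq("reviews.reviewer_name", "Théo")),    // drop unmatched reviews
        Aggregates.project(Projections.computed("reviewer", "$reviews")), // keep only the review
        Aggregates.limit(1)                                                // same as the $limit stage
)).first();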
If you want to read more about the aggregate pipeline, I recommend checking out the official documentation which is so awesome that I have it opened all day. They should have some Java examples on there.
Best of luck!

Updating QC ALM defect Comments Section using REST API

Using the REST API in Java, I am trying to update QC ALM. I get the value in HTML format when I extract the comments section for a defect. So, if I want to add a comment, do I need to pass HTML content including the previous comments, or is there another way?
I have tried just passing the comment, but it removes all the previous comments, and it does not show the name of the person updating the comment, as it does when going through the GUI.
This should help someone who is new to the HP ALM REST API.
1. To find the available API end-points:
GET /qcbin/rest/resouce-list
2. To get a user's full name:
GET /qcbin/rest/domains/<domain_name>/projects/<project>/customization/users/<user_name>
3. To get defect comments (the request below fetches only defect ID = 1 and outputs the dev-comments field):
GET /qcbin/rest/domains/<domain_name>/projects/<project_name>/defects?query={id[1]}&fields=dev-comments
Sample JSON payload:
PUT /qcbin/rest/domains/<domain_name>/projects/<project>/defects/1
{
  "Fields": [{
    "Name": "dev-comments",
    "values": [{
      "value": "<html><body><span style=\"font-size:14px\">USER FULL NAME <USER_ID>, 2016-06-29:</span></font></b>\n<font color=\"#767676\" style=\"font-family:'hpsimplified-regular' , sans-serif\"><span style=\"font-size:14px\"> </span></font>Comment 1 \n</div> \n</body></html>"
    },
    {
      "value": "<html><body><span style=\"font-size:14px\">USER FULL NAME <USER_ID>, 2016-06-29:</span></font></b>\n<font color=\"#767676\" style=\"font-family:'hpsimplified-regular' , sans-serif\"><span style=\"font-size:14px\"> </span></font>Comment 2 \n</div> \n</body></html>"
    }]
  }]
}
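For reference, a hedged sketch of sending that PUT from plain Java with java.net.HttpURLConnection (the host name is made up, the ALM authentication cookies are assumed to be handled elsewhere, and payload holds the JSON shown above):
// Hedged sketch: PUT the JSON payload above to the defect resource.
// Assumes an already authenticated ALM session (cookie handling not shown).
URL url = new URL("https://alm.example.com/qcbin/rest/domains/DOMAIN/projects/PROJECT/defects/1");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("PUT");
conn.setDoOutput(true);
conn.setRequestProperty("Content-Type", "application/json");
conn.setRequestProperty("Accept", "application/json");
try (OutputStream out = conn.getOutputStream()) {
    out.write(payload.getBytes(StandardCharsets.UTF_8));
}
int status = conn.getResponseCode(); // 200 means the defect was updated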

ElasticSearch script to update if the value not exist

I am trying to update an Elasticsearch document using Java.
My document is as follows
"_source": {
"gender": "male" ,
"names": ["name1"]
}
I need to add more names to the names list, but I want no duplicates. How can I update an array in an ES document without duplicate values?
I tried something like this. But it's not working.
client.prepareUpdate(index,type,id)
.addScriptParam("newobject", "newName")
.setScript("ctx._source.names.contains(newobject) ? ctx.op = \"none\" : ctx._source.names+=newobject ").execute().actionGet();
The idea would be to simply call unique() on the resulting list:
client.prepareUpdate(index,type,id)
.addScriptParam("newobject", "newName")
.setScript("ctx._source.names+=newobject; ctx._source.names = ctx._source.names.unique(); ").execute().actionGet();
Also, for this to work, you need to make sure that dynamic (inline) scripting is enabled in your Elasticsearch configuration.

Serializing as javascript object

I am working on a Spring MVC application and I want to insert JavaScript into the HTML output for analytics purposes. I am only partially familiar with serialization, but I figured it would do the job more nicely than manually constructing a string containing JavaScript.
Would it be possible to generate something like the following snippet? Any pointers would be great!
"emd" : new Date('6/6/2014')
Update:
I need to output a JavaScript object which has many fields, some of which may be complex. Hence, on the backend I am gathering all the data into Java beans and I plan to use the Jackson mapper to convert them to a string that I can output through JSP.
Generating the above snippet does not seem straightforward though; I'm not sure it is even possible. For context, the rest of that JavaScript looks something like this:
Analytics.items["item_123"] = {
    //ratings and reviews
    "rat" : a.b,   //the decimal value for the rating
    "rev" : xxxx,  //integer
    //list of flags that indicate how the product was displayed to the customer
    //add as needed...tracking code will pick up flags as needed when they are available
    "dec" : ["mbe", "green", "recycled"],
    //delivery messaging
    "delivery" : {
        "cd" : new Date(),  //current date
        "offers" : [{
            "type" : "abcd",
            "emd" : new Date('6/6/2014'),
            "weekend" : true
        }]
    }
};
JSON.stringify should do the trick. It will be built into your browser, unless you're using a very old browser, in which case you can use a polyfill.
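If you stay server-side with Jackson, as planned in the question, here is a hedged sketch (class and variable names are made up) of a custom serializer that writes Date fields as JavaScript Date literals; note that the output is then a JavaScript object literal rather than strict JSON:
// Hedged sketch: serialize java.util.Date as a JavaScript "new Date(...)" literal.
public class JsDateSerializer extends JsonSerializer<Date> {
    @Override
    public void serialize(Date value, JsonGenerator gen, SerializerProvider serializers)
            throws IOException {
        String formatted = new SimpleDateFormat("M/d/yyyy").format(value);
        // writeRawValue emits the text unquoted, so the result is JS, not strict JSON
        gen.writeRawValue("new Date('" + formatted + "')");
    }
}

// Registering the serializer and rendering the analytics bean:
ObjectMapper mapper = new ObjectMapper();
SimpleModule module = new SimpleModule();
module.addSerializer(Date.class, new JsDateSerializer());
mapper.registerModule(module);
String js = mapper.writeValueAsString(analyticsItem); // output into the JSP/script tag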

Updating a document in Solr with Java

As everybody knows, the documentation of Solrj in the wiki is pretty poor. I managed to query the index using the CommonsHttpSolrServer, but never with the Embedded version. Anyway, now I'm using the EdgeNGrams to display auto-suggestions, and I have a field "count" in my index, so that I can sort the results by the number of times people queried the element.
What I want to do now is update this "count" field from my Java program, which should be quite easy, I guess? I looked at the test files from the source code, but they are very complicated, and trying to do something similar always failed for me. Maybe by using SolrJ?
Thanks for your help.
Edit:
In my java code, I have:
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
What I expect at this point is that the cores defined in solr.xml are present in the coreContainer, but there is no core there (although defaultCoreName says collection1). My solr.xml file is the same as in the example dir:
<solr persistent="false">
<cores adminPath="/admin/cores" defaultCoreName="collection1">
<core name="collection1" instanceDir="." />
</cores>
</solr>
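As a side note on the edit above, a hedged sketch of the embedded setup (the path is an assumption): the cores declared in solr.xml only show up if solr.solr.home points at the directory that contains that file:
// Hedged sketch: point solr.solr.home at the directory containing solr.xml
// before initializing, otherwise the CoreContainer comes up without cores.
System.setProperty("solr.solr.home", "/path/to/solr/home");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
SolrServer server = new EmbeddedSolrServer(coreContainer, "collection1");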
Modified from this test example. To add a value to Solr and then subsequently modify it, you can do the following:
//add value to Solr
doc = new SolrInputDocument();
doc.addField("id", "A");
doc.addField("value", 10);
client.add(doc);
client.commit();
//query Solr
SolrQuery q = new SolrQuery("id:A");
QueryResponse r = client.query(q);
//update value
SolrDocument oldDoc = r.getResults().get(0);
SolrInputDocument newDoc = new SolrInputDocument();
newDoc.addField("id", oldDoc.getFieldValue("id"));
HashMap<String, Object> map = new HashMap<String, Object>();
map.put("inc", 15);
newDoc.addField("value", map);
client.add(newDoc);
client.commit();
This increments the original value of 10 to 25. You can also "add" to an existing field or simply "set" an existing value by changing the command you put in the HashMap.
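For example, reusing the map from the snippet above:
// overwrite the stored value instead of incrementing it
map.put("set", 42);
// or, for a multi-valued field, append another value instead:
// map.put("add", "extraValue");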
In the end I just store this count in Solr, retrieve it, update it and then run a full update of the document, since it is not possible to update only a single field in Solr; being able to update just the count field would be very handy!
