Updating a document in Solr with Java

As everybody knows, the documentation of SolrJ in the wiki is pretty poor. I managed to query the index using the CommonsHttpSolrServer, but never with the embedded version. Anyway, I'm now using EdgeNGrams to display auto-suggestions, and I have a field "count" in my index so that I can sort the results by the number of times people queried for an element.
What I want to do now is update this "count" field from my Java program, which should be quite easy, I guess? I looked at the test files from the source code, but they are very complicated, and trying to do something similar always failed for me. Maybe by using SolrJ?
Thanks for your help.
Edit:
In my Java code, I have:
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
What I expect to get at this point is the cores defined in solr.xml to be present in the coreContainer, but there is no core there (although defaultCoreName says collection1). My solr.xml file is the same as in the example dir:
<solr persistent="false">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="." />
  </cores>
</solr>
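A possible explanation (my assumption, not something confirmed in the thread): the embedded CoreContainer resolves its Solr home from the solr.solr.home system property, falling back to the current working directory, so if that property does not point at the directory containing solr.xml, no cores are loaded. A minimal sketch for the 3.x-era embedded API, with a hypothetical path:
// Hedged sketch: point Solr home at the directory that contains solr.xml
// before initializing; the path below is a placeholder.
System.setProperty("solr.solr.home", "/path/to/solr/home");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
// The core name must match a <core name="..."/> entry in solr.xml.
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "collection1");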

Modified from this test example. To add a value to Solr and then subsequently modify it, you can do the following:
// add a value to Solr
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "A");
doc.addField("value", 10);
client.add(doc);
client.commit();
// query Solr
SolrQuery q = new SolrQuery("id:A");
QueryResponse r = client.query(q);
// update the value with an atomic "inc" operation
SolrDocument oldDoc = r.getResults().get(0);
SolrInputDocument newDoc = new SolrInputDocument();
newDoc.addField("id", oldDoc.getFieldValue("id"));
HashMap<String, Object> map = new HashMap<String, Object>();
map.put("inc", 15);
newDoc.addField("value", map);
client.add(newDoc);
client.commit();
This increments the original value of 10 to 25. You can also "add" to an existing field or simply "set" an existing value by changing which command you put in the HashMap.
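For example, a hedged variant of the same update that overwrites the stored value instead of incrementing it (the value 42 is arbitrary):
// "set" replaces the stored value; "add" appends to a multi-valued field.
HashMap<String, Object> setMap = new HashMap<String, Object>();
setMap.put("set", 42);
newDoc.addField("value", setMap);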

I finally just store this count in Solr, then retrieve it, update it and re-index the whole document, since it is not possible to update just a single field in Solr, which would be very handy!
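A minimal sketch of that read-modify-rewrite cycle, assuming an already-configured SolrServer named server and a numeric "count" field (both names are illustrative):
// Hedged sketch: fetch the document, bump the counter, re-add under the same id.
SolrDocument current = server.query(new SolrQuery("id:A")).getResults().get(0);
SolrInputDocument replacement = new SolrInputDocument();
replacement.addField("id", current.getFieldValue("id"));
replacement.addField("count", ((Number) current.getFieldValue("count")).intValue() + 1);
server.add(replacement); // same uniqueKey, so the old document is replaced
server.commit();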

Related

Storing values of an array in a separate file as the array values change

So I am running some code which runs over 300k times. Each time this code runs, it returns up to 300k values. I am currently storing the results I get in an ArrayList:
List<List<Object>> thisList = new ArrayList<List<Object>>();
for (int i = 0; i < 300000; i++) {
    thisList.add(new ArrayList<Object>());
}
for (int i = 0; i < 300000; i++) {
    List<Object> result = someCode();
    for (Object obj : result) {
        thisList.get(obj.id).add(obj.value);
    }
}
In this code, every time obj is obtained, it has a value obj.id which specifies the index in the list where obj.value has to be stored.
What would be the most efficient way to store the results elsewhere as the search continues? My code seems to stop working past iteration 400, most likely due to running low on memory. I have considered using a simple text file where each line represents a List<Object>, but from some Googling it seems there is no way to append to a specific line; all suggestions point towards overwriting the entire file. I've never worked with databases before, which is why I am trying to avoid that for now.
Would appreciate it if someone can give me suggestions on what I could do.
Edit: Is there a method which does not use a database, where after each iteration of the outer for loop, the data can be stored?
For example, given a file which currently contains
List 0: obj.value1 obj.value2
List 1: obj.value1 obj.value4
...
List 300000: obj.value3 obj.value8
and result contains
{obj<1, 100>, obj<0, 3>, ...}
where each object is of the form obj<id, value>, the file becomes
List 0: obj.value1 obj.value2 obj.value3
List 1: obj.value1 obj.value4 obj.value100
...
List 300000: obj.value3 obj.value8
You could store it in an XML file using the JAXB API.
Here is a link with a little tutorial on JAXB:
https://dzone.com/articles/using-jaxb-for-xml-with-java
Or you could store it in a JSON file using the json-simple API.
Here's another little tutorial:
https://stackabuse.com/reading-and-writing-json-in-java/
These are the links to download JAXB and json-simple from Maven:
JAXB: https://mvnrepository.com/artifact/javax.xml.bind/jaxb-api
json-simple: https://mvnrepository.com/artifact/com.googlecode.json-simple/json-simple
Hope it'll be useful to you.
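As a rough illustration of the JSON route, here is a hedged sketch with json-simple; the file name, the MyObj type and its id/value fields are assumptions standing in for the asker's objects. It rewrites the whole file on each iteration, since appending to a specific line of a text file in place is not possible:
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;

// Hedged sketch: after each outer iteration, merge the new values into the
// file keyed by id, then write the whole file back out.
static void saveIteration(List<MyObj> result) throws IOException {
    JSONObject lists = new JSONObject();
    File file = new File("results.json"); // hypothetical file name
    if (file.exists()) {
        try (FileReader in = new FileReader(file)) {
            lists = (JSONObject) JSONValue.parse(in);
        }
    }
    for (MyObj obj : result) {
        JSONArray values = (JSONArray) lists.get(String.valueOf(obj.id));
        if (values == null) {
            values = new JSONArray();
            lists.put(String.valueOf(obj.id), values);
        }
        values.add(obj.value);
    }
    try (FileWriter out = new FileWriter(file)) {
        out.write(lists.toJSONString());
    }
}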

What's the difference between Document.addField and Document.setField in solrj?

While adding documents to an index in Solr, I've noticed there are two ways to add data: one is addField, the other is setField. Can you tell me when to use which method?
SolrInputDocument doc = new SolrInputDocument();
doc.setField("field_name", data);
doc.addField("field_name_2", data2);
SolrInputDocument.addField() - adds another value to any existing values for the field. It works like an append.
SolrInputDocument.setField() - overwrites anything that is already there. It discards the existing values and starts with a fresh list of values.
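A small sketch to make the difference concrete (the field name "tags" is just an example):
SolrInputDocument doc = new SolrInputDocument();
doc.addField("tags", "java");
doc.addField("tags", "solr");                   // appends: "tags" now holds [java, solr]
System.out.println(doc.getFieldValues("tags")); // [java, solr]
doc.setField("tags", "lucene");                 // discards both values
System.out.println(doc.getFieldValues("tags")); // [lucene]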

Elasticsearch: update an index document

I need to update an index document in an Elasticsearch index, and this is the code I have implemented. But it is not working; what's wrong, and how should I implement this?
My code:
Map<String, Object> matching_result = null;
for (SearchHit hit : response_text.getHits()) {
    matching_result = hit.getSource();
    String flag_value = matching_result.get("flag").toString(); // note: never used afterwards
    matching_result.put("flag", true);
}
String indexString = JSONConverter.toJsonString(matching_result);
IndexResponse response = client.prepareIndex("index_name", "data").setSource(indexString).execute().actionGet();
boolean created = response.isCreated();
System.out.println("created or updated--------------------->" + created);
System.out.println("flag value==========" + matching_result.get("flag"));
return actual_theme;
(JSONConverter.toJsonString is our library method for converting to a JSON string.)
What is wrong with this query?
Instead of updating the existing document, it creates a new one. I want to change the existing one.
Based on your example code, it looks like by "update" you mean you are trying to replace the entire document. In order to do this, you must specify the id of the document you wish to update.
Using the Java API, in addition to calling setSource on the IndexRequestBuilder, you would also need to supply the id by calling setId. For example:
IndexResponse response = client.prepareIndex("index_name", "data")
        .setSource(indexString)
        .setId("123") // supply the ID of the document you want to replace
        .execute()
        .actionGet();
Otherwise, just so you know, in ES you have the option to do a partial update. That is, only update certain fields in the document. This can be done with a script or by providing a partial document. Have a look at the documentation for the Update API.
In either case, you need to provide ES with the ID for the document you wish to modify.
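For the partial-document route, a hedged sketch against the same (pre-5.x) transport client; the index, type and id values are assumptions carried over from the question, and error handling is omitted:
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

// Only the "flag" field is touched; the rest of the document stays as-is.
UpdateResponse updateResponse = client.prepareUpdate("index_name", "data", "123")
        .setDoc(jsonBuilder()
                .startObject()
                .field("flag", true)
                .endObject())
        .get();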

Solr doesn't overwrite - duplicated uniqueKey entries

I have a problem with Solr 5.3.1. My schema is rather simple: I have one uniqueKey, "id", as a string; indexed, stored and required, non-multivalued.
I first add documents with content_type:document_unfinished and then overwrite the same document, with the same id but another content_type:document. The document is then twice in the index. Again, the only uniqueKey is "id", as a string. The id originally comes from a MySQL primary key (int).
It also looks like this has happened more than once:
http://lucene.472066.n3.nabble.com/uniqueKey-not-enforced-td4015086.html
http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-td4129651.html
In my case not all the documents in the index are duplicated, just some. I was assuming, initially, that they would get overwritten on commit when the same uniqueKey already exists in the index, which doesn't seem to work the way I expected. I do not want to simply update some fields in the document; I want to completely replace it, with all its children.
Some stats: around 350k documents in the index, mostly with childDocuments. The documents are distinguished by a "content_type" field. I used SolrJ to import them this way:
HttpSolrServer server = new HttpSolrServer(url);
server.add(documents); // a Collection<SolrInputDocument>
server.commit();
I am always adding a whole document with all its children again. It's nothing overly fancy. I end up with duplicated documents for the same uniqueKey. There are no side injections: I run only Solr with the integrated Jetty, and I do not open the Lucene index in Java "manually".
What I did then was to delete + insert again. That seemed to work for a while, but then, under some conditions, started giving this error message:
Parent query yields document which is not matched by parents filter
The document where that happens seems to be completely random; the one thing that emerges is that it is always a childDocument. I do not run anything special; I basically downloaded the Solr package from the website and ran it with bin/solr start.
Anyone have any ideas?
EDIT 1
I think I found the problem, which seems to be a bug? To reproduce the issue:
I downloaded Solr 5.3.1 to a Debian in a VirtualBox and started it with bin/solr start. Added a new core with the basic configset; nothing changed in the configset, I just copied it over and added the core.
This leads to two documents with the same id in the index:
// First add: a standalone document without children.
SolrClient solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
SolrInputDocument inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc_unfinished");
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();
// Second add: the same id, this time with a child document attached.
solrClient = new HttpSolrClient("http://192.168.56.102:8983/solr/test1");
inputDocument = new SolrInputDocument();
inputDocument.setField("id", "1");
inputDocument.setField("content_type_s", "doc");
SolrInputDocument childDocument = new SolrInputDocument();
childDocument.setField("id", "1-1");
childDocument.setField("content_type_s", "subdoc");
inputDocument.addChildDocument(childDocument);
solrClient.add(inputDocument);
solrClient.commit();
solrClient.close();
Searching with:
http://192.168.56.102:8983/solr/test1/select?q=*%3A*&wt=json&indent=true
leads to the following output:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "indent": "true",
      "wt": "json",
      "_": "1450078098465"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "content_type_s": "doc_unfinished",
        "_version_": 1520517084715417600
      },
      {
        "id": "1-1",
        "content_type_s": "subdoc"
      },
      {
        "id": "1",
        "content_type_s": "doc",
        "_version_": 1520517084838101000
      }
    ]
  }
}
What am I doing wrong?
Thanks for your feedback! I write this as an answer since it would be too long otherwise. I actually got the same response from the mailing list:
Mikhail Khludnev
Hello Sebastian,
Mixing standalone docs and blocks doesn't work. There are plenty of issues open.
On Wed, Mar 9, 2016 at 3:02 PM, Sebastian Riemer wrote:
Hi,
to actually describe my problem in short, instead of just linking to the test application: using SolrJ I do the following:
1) Create a new document as a parent and commit
SolrInputDocument parentDoc = new SolrInputDocument();
parentDoc.addField("id", "parent_1");
parentDoc.addField("name_s", "Sarah Connor");
parentDoc.addField("blockJoinId", "1");
solrClient.add(parentDoc);
solrClient.commit();
2) Create a new document with the same unique-id as in 1) with a child
document appended
SolrInputDocument parentDocUpdateing = new SolrInputDocument();
parentDocUpdateing.addField("id", "parent_1");
parentDocUpdateing.addField("name_s", "Sarah Connor");
parentDocUpdateing.addField("blockJoinId", "1");
SolrInputDocument childDoc = new SolrInputDocument();
childDoc.addField("id", "child_1");
childDoc.addField("name_s", "John Connor");
childDoc.addField("blockJoinId", "1");
parentDocUpdateing.addChildDocument(childDoc);
solrClient.add(parentDocUpdateing);
solrClient.commit();
3) Results in 2 documents with id="parent_1" in the Solr index
Is this normal behaviour? I thought the existing document should be updated instead of a new document with the same id being generated.
For a full working test application please see the original message.
Best regards,
Sebastian
I think it is a known issue, and there exist several tickets which relate to this, but I am glad that there is a way to deal with it (adding child docs right from the beginning): https://issues.apache.org/jira/browse/SOLR-6096, https://issues.apache.org/jira/browse/SOLR-5211, https://issues.apache.org/jira/browse/SOLR-7606
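A hedged sketch of that workaround, as I read the thread (not an official fix): never index the parent as a standalone document first; give every version of the block the same parent-plus-children shape, starting from the very first add. The placeholder child below is hypothetical:
// First version of the block is already a parent+child structure.
SolrInputDocument parent = new SolrInputDocument();
parent.addField("id", "parent_1");
parent.addField("name_s", "Sarah Connor");
parent.addField("blockJoinId", "1");
SolrInputDocument placeholder = new SolrInputDocument();
placeholder.addField("id", "parent_1_placeholder"); // hypothetical placeholder child
placeholder.addField("blockJoinId", "1");
parent.addChildDocument(placeholder);
solrClient.add(parent);
solrClient.commit();
// Later updates re-add the block in the same parent+children shape,
// so the whole block is replaced consistently.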

Handling special characters with lucene

I haven't found the answer to my problem, so I decided to write my question to get some help.
I use Lucene to index objects in memory (they exist only in my Java code). While processing the code, I index (using WhitespaceAnalyzer) a field with the value objA/4.
My problem starts when I want to find it after the indexation (also using WhitespaceAnalyzer).
When I create a query obj*, I find all objects that start with obj; if I create a query objA/4 I can also find this object.
However, I don't know how to find all objects starting with objA/: when I create a query objA/*, Lucene changes it to obja/* and finds nothing.
I've checked, and "/" is not a special character, so I don't need any "\" preceding it.
So my question is how to ask for all objects that start with objA/ (for example objA/0, objA/1, objA/2, objA/3)?
Are you using QueryParser.escape(String) to escape everything correctly?
The code I'm using:
String node = "objA/*";
Query node_query = MultiFieldQueryParser.parse(node, "nodeName", new WhitespaceAnalyzer());
BooleanQuery bq = new BooleanQuery();
bq.add(node_query, BooleanClause.Occur.MUST);
System.out.println("We're asking for - " + bq);
IndexSearcher looker = new IndexSearcher(rep_index);
Hits hits = looker.search(bq);
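Setting the escaping question aside, one likely cause (my assumption, not confirmed in the thread): the classic QueryParser lowercases wildcard and prefix terms by default, which is exactly what turns objA/* into obja/*. A hedged sketch against the older Lucene API used above:
// Build the parser explicitly so the lowercasing of expanded terms
// (wildcard/prefix queries) can be switched off.
QueryParser parser = new QueryParser("nodeName", new WhitespaceAnalyzer());
parser.setLowercaseExpandedTerms(false); // keeps "objA/*" as typed
Query node_query = parser.parse("objA/*");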
