Search with "Hash sign" in Solr - java

I am new to Solr and am having trouble tuning search relevance.
When I search for "C4902AN#140", results containing only "140" are listed first, and the result containing "C4902AN#140" appears after them. I want the result with "C4902AN#140" to rank above the results that only contain "140".
Thanks in advance!

Check which tokenizer is used in the field type definition in your schema file.
If the field type uses solr.StandardTokenizerFactory, the # character is dropped and the term is split at that point.
OR
Consider boosting the document that contains "C4902AN#140".
You can use the elevate.xml file in the config folder to specify which documents should appear first in the result set for a specific search term.

The analyzer for this field should use KeywordTokenizerFactory, so that the value is not split up and only a single token, the whole value itself, is generated.
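To see why the tokenizer matters, here is a rough plain-Java illustration. It is not Solr code: splitting on non-alphanumeric characters only approximates what a word-breaking tokenizer does to the query, and the class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TokenizerSketch {

    // Rough stand-in for a word-breaking tokenizer: splits on anything that
    // is not a letter or digit, so '#' disappears. This is an approximation
    // for illustration, not Solr's real implementation.
    static List<String> wordBreakTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for a keyword tokenizer: the whole input is a single token.
    static List<String> keywordTokens(String input) {
        return Collections.singletonList(input);
    }

    public static void main(String[] args) {
        System.out.println(wordBreakTokens("C4902AN#140")); // [C4902AN, 140]
        System.out.println(keywordTokens("C4902AN#140"));   // [C4902AN#140]
    }
}
```

With the first approach a document that only contains "140" matches one of the two query tokens, which is how it can end up in the results at all; with a single token only the exact value matches.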

Related

Create a DOCX reading data from Oracle database

I have a student database (Oracle 11g) and need to create a separate module that generates a student's details as a well-formatted Word document. Given a student ID, I need all of the student's information (a kind of biodata) in a presentable docx file. I'm not sure how to start; I have been exploring python-docx and the Java library docx4j. I need suggestions on how to achieve this. Is there a tool that can do it?
Your help is highly appreciated.
You could extract the data from Oracle into an XML format, then use content control data binding in your Word document to bind elements in the XML.
All you need to do is inject the XML into the docx as a custom xml part, and Word will display the results automatically.
docx4j can help you inject the XML. If you don't want to rely on Word to display the results, you can also use docx4j to apply the bindings.
Or you could try simple variable replacement: https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/VariableReplace.java
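As a minimal sketch of the "extract into XML" step, the following builds a small XML document with the JDK's own DOM and transformer classes. The element names (student, id, name) and the sample values are assumptions for illustration; in practice the values would come from your Oracle query, and injecting the result into the docx is left to docx4j.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StudentXmlSketch {

    // Builds <student><id>..</id><name>..</name></student> (hypothetical layout).
    static String toXml(String id, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element student = doc.createElement("student");
            doc.appendChild(student);

            Element idEl = doc.createElement("id");
            idEl.setTextContent(id);
            student.appendChild(idEl);

            Element nameEl = doc.createElement("name");
            nameEl.setTextContent(name);
            student.appendChild(nameEl);

            // Serialize the DOM tree to a string without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build student XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("42", "Jane Doe"));
    }
}
```

Letting the DOM API write the text means special characters in the data are escaped for you.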
If you want a simple way to format your Word document directly from Java, you can try pxDoc.
The example (originally shown in a screenshot) generates a document from an Authors/Books model: however you query the data from your database, it is easy to render it in a well-formatted document.
Regarding your use case, you could also generate a document for all students at once. In the context of the screenshot example:
for (author:library.authors) {
    var filename = 'c:/MyDocuments/'+author.name+'.docx'
    document fileName:filename {
        /** Content of my document */
    }
}

Cannot read the first line of a JSON file in Java

I am trying to read some data from a JSON file that I generated from a MongoDB document. But when trying to read the first entry in the document, I get an exception:
org.json.JSONException: JSONObject["Uhrzeit"] not found.
This only happens with the first entry; reading other entries does not cause an exception.
Using jsonObject.getString("") on any entry that is not the first returns the values as expected.
//Initiate Mongodb and declare the database and collection
MongoClient mongoClient = new MongoClient(new MongoClientURI("mongodb://localhost:27017"));
MongoDatabase feedbackDb = mongoClient.getDatabase("local");
MongoCollection<Document> feedback = feedbackDb.getCollection("RückmeldungenShort");
//gets all documents in a collection. "new Document()" is the filter, that returns all Documents
FindIterable<Document> documents = feedback.find(new Document());
//Iterates over all documents and converts them to JSONObjects for further use
for (Document doc : documents) {
    JSONObject jsonObject = new JSONObject(doc.toJson());
    System.out.print(jsonObject.toString());
    System.out.print(jsonObject.getString("Uhrzeit"));
}
Printing jsonObject.toString() produces the JSON String for testing purposes (in one line):
{
"Ort":"Elsterwerda",
"Wetter-Keyword":"Anderes",
"Feedback\r":"Test Gelb\r",
"Betrag":"Gelb",
"Datum":"18.05.2018",
"Abweichung":"",
"Typ":"Vorhersage",
"_id":{
"$oid":"5b33453b75ef3c23f80fc416"
},
"Uhrzeit":"05:00"
}
Note that the order in which the entries appear is shuffled; the first entry in the database was "Uhrzeit".
This is what it looks like:
The JSON file is valid according to https://jsonformatter.curiousconcept.com/ .
The "Uhrzeit" is even recognized within the JSONObject while in debug mode:
I assumed it might have something to do with the entries themselves, so I switched "Datum" and "Ort" to the first place in the document but that produced the same results.
There are lots of others that have posted on this error message, but it seems to me like they all had slightly different problems.
I imported a .csv with my data into MongoDB and read the documents from there. Somewhere in the process of reading the data, "\r" characters were generated where the line breaks were in my .csv (i.e. at the end of each record). In this case they ended up in the "Feedback" key-value pair (as seen in the last picture).
When checking my output again with another JSON validator, I noticed an "invisible" symbol in my JSON file that caused the key not to be found. This symbol is placed in front of the first key (after the MongoDB _id) when a .csv document is imported into the DB. I imported a corrected version of the .csv into MongoDB and exported it again, and the symbol reappeared.
The problem was that my .csv was in "Windows" format. Converting it to "Unix" format gets rid of the generated "\r" characters. The "invisible" symbol was the UTF-8 byte order mark (BOM) that is added at the beginning of a document. You can re-save your .csv as plain UTF-8 (without BOM) to get rid of it.
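If re-saving the file is not an option, the same cleanup could be done in Java before the import. This is only a sketch under that assumption; the class and method names are hypothetical, and it works on text that has already been decoded as UTF-8, where the BOM shows up as the single character U+FEFF.

```java
public class CsvCleanupSketch {

    // Removes a leading byte order mark (U+FEFF after decoding) and all
    // carriage returns, so that Windows line endings become Unix ones.
    static String clean(String text) {
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        return text.replace("\r", "");
    }

    public static void main(String[] args) {
        String raw = "\uFEFFUhrzeit;Feedback\r\n05:00;Test Gelb\r\n";
        String cleaned = clean(raw);
        // The first header is now exactly "Uhrzeit", without the invisible prefix.
        System.out.println(cleaned.startsWith("Uhrzeit")); // true
    }
}
```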

Aspose LINQ text trims off

I have a Word template for the LINQ Reporting Engine. It has a field <<[ABC]>> that gets its value from a MySQL database.
The field displays comments. Its column type in MySQL is LONGTEXT, so it can store a large amount of text. The problem is that when a report is generated, the text in <<[ABC]>> is cut off: only about 380 characters are printed. Is there a specific limit on how much text a LINQ field can display? And what can I do to print all of the text without any limit?
You can populate <<[ABC]>> with long text; there is no limit on the number of characters. You can test this with the following code example: create a text file, e.g. "text.txt", with some text and execute the code.
DocumentBuilder builder = new DocumentBuilder();
builder.Write("<<[ABC]>>");
ReportingEngine engine = new ReportingEngine();
engine.BuildReport(builder.Document, File.ReadAllText(MyDir + "text.txt"), "ABC");
builder.Document.Save(MyDir + "18.4.docx");
I am working as Developer Evangelist at Aspose.

Why does & become &amp;amp; and how to solve this in XML?

I am using the tag below to query an item from the DB. The item is present in the DB but is not found, because A&M has become A&amp;M instead of A&M. How can I solve this?
<TEA>2720A 100 STATE A&amp;M RD VRAD</TEA>
Backend Java code queries the item from the DB with something like 'select * from aa where tea=2720A 100 STATE A&M RD VRAD' and returns no record, although the item is present in the DB as A&M. This is the exact issue; how can I solve it?
Double encoding: your string is encoded twice.
First encoding: A&M -> A&amp;M
Second encoding: A&amp;M -> A&amp;amp;M
Check your code for this issue.
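The effect is easy to reproduce. The sketch below uses a deliberately minimal escape method (ampersand only, a made-up helper rather than a library call) and applies it twice:

```java
public class DoubleEncodingSketch {

    // Minimal XML escaping of the ampersand only, enough to show the effect.
    static String escapeAmp(String s) {
        return s.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String once = escapeAmp("A&M");
        String twice = escapeAmp(once);
        System.out.println(once);  // A&amp;M
        System.out.println(twice); // A&amp;amp;M
    }
}
```

A parser that unescapes the doubly encoded value once yields A&amp;M, which no longer matches A&M in the database.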
Of course, & is represented in XML as &amp;.
If just text is stored in the DB, for instance extracted from between two tags <name>A&amp;M</name>, then any XML API should give "A&M" to be stored.
If entire XML is stored in the DB, one should store it as-is: "<name>A&amp;M</name>"
The problem arises only when String manipulations are done. Say
String xml = "<name>A&amp;M</name>";
String name = xml.replaceFirst("^.*<name>(.*)</name>.*$", "$1");
name = StringEscapeUtils.unescapeXml(name);
Here Apache Commons' StringEscapeUtils is used. Not unescaping causes trouble.
It probably goes wrong when extracted text (which should be unescaped) is mixed with XML manipulation (DOM) and then placed in an XML structure again. The XML APIs in general return values with the entities unescaped, and escape the XML characters <>&"' to entities when writing.
Be especially careful when editing in HTML, which uses the same entity and therefore does not show the actual characters. Here StringEscapeUtils.unescapeHtml4 comes into play.
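For comparison, here is a small sketch of reading the value through an XML API instead of string manipulation, using the DOM parser that ships with the JDK. The class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlTextSketch {

    // Parses a small XML snippet and returns the root element's text.
    // The parser resolves entities, so no manual unescaping is needed.
    static String readRootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readRootText("<name>A&amp;M</name>")); // A&M
    }
}
```

The returned text can then be passed to the database query as a plain value.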

Splitting word file into multiple smaller word files using OLE Automation from java

I have been using OLE automation from Java to call Word's methods.
I managed to do the following using the OLE automation:
Open word document template file.
Mail merge the word document template with a csv datasource file.
Save mail merged file to a new word document file.
What I need to do now is open the mail-merged file and then, using OLE, programmatically split it into multiple files. For example, if the merged file has 6000 pages and my max-pages-per-file property is set to 3000, I need to create two new Word documents and place the first 3000 pages in one and the last 3000 pages in the other.
In my first attempts I took the number of rows in the csv file and multiplied it by the number of pages in the template to get the total number of pages after the merge. Then I used the merge itself to create the multiple files. The problem, however, is that I cannot calculate exactly how many pages the merged document will have, because in some cases not all of the template's pages (say 9) are used, depending on the data and the merge fields. So during the mail merge one row may produce only 3 pages from the 9-page template, while another produces all 9.
So the only solution is to merge all rows into one document and then split it into multiple documents afterwards, so that each file really contains the exact number of pages set in the property (e.g. 3000) until no pages are left from the original merged file.
I have already tried a few things, using the MSDN site to look up methods and their properties, but have been unable to do this.
In my latest attempts I have been trying to use GoTo to jump to a specific page number and then remove the page. I was going to do this page by page until I reached the point where I want the file to start and then save it as a new file, but I have been unable to do that either.
Can anyone suggest something that could help me out?
Thanks and Regards
Sean
An example of opening a Word file using OLE automation from Java is included below:
// Get the Documents collection and call its Open method.
OleAutomation documentsAutomation = this.getChildAutomation(this.wordAutomation, "Documents");
int[] id = documentsAutomation.getIDsOfNames(new String[]{"Open"});
Variant[] arguments = new Variant[1];
arguments[0] = new Variant(fileName); // where fileName is the absolute path to the docx file
Variant invokeResult = documentsAutomation.invoke(id[0], arguments);

// Helper: returns the automation object behind a named property.
private OleAutomation getChildAutomation(OleAutomation automation, String childName) {
    int[] id = automation.getIDsOfNames(new String[]{childName});
    Variant pVarResult = automation.getProperty(id[0]);
    return pVarResult.getAutomation();
}
Sounds like you've pegged it already. Another approach, which would avoid building and then deleting, would be to look at the parts of your template that can make the biggest difference to the number of pages it produces (that is, where the data can be multi-line). If you then take these fields and look at properties such as font, line spacing and line width, you'll be able to calculate the room your data will take in the template and limit your data at that point. Java FontMetrics can help you with that.
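Whichever way the page count is obtained, the split points themselves are simple arithmetic. The sketch below does not call Word at all; it only computes which 1-based page ranges each output file would get, assuming the total page count of the merged document is already known. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PageRangeSketch {

    // Returns inclusive {firstPage, lastPage} pairs (1-based), each covering
    // at most maxPerFile pages, until all pages are assigned.
    static List<int[]> ranges(int totalPages, int maxPerFile) {
        if (maxPerFile <= 0) {
            throw new IllegalArgumentException("maxPerFile must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int first = 1; first <= totalPages; first += maxPerFile) {
            int last = Math.min(first + maxPerFile - 1, totalPages);
            result.add(new int[]{first, last});
        }
        return result;
    }

    public static void main(String[] args) {
        // 6000 pages with at most 3000 per file: pages 1-3000 and 3001-6000.
        for (int[] r : ranges(6000, 3000)) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

The last file simply gets whatever pages remain, so 7000 pages with a limit of 3000 would give three files, the last one with 1000 pages.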
