Why & becomes &amp;amp; and how to solve this in XML?

I am using the tag below to query the item from the DB. The item is present in the DB but does not show up, because A&M has become A&amp;M instead of A&M. How can I solve this?
<TEA>2720A 100 STATE A&amp;M RD VRAD</TEA>
Backend Java code queries the item from the DB with 'select * from aa where tea=2720A 100 STATE A&M RD VRAD' and gets no record back, but the row is present in the DB as A&M. This is the exact issue; how can I solve it?

Double encoding: your string is encoded twice.
First encoding: A&M -> A&amp;M
Second encoding: A&amp;M -> A&amp;amp;M
Check your code for this issue.
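To make the two steps above concrete, here is a small self-contained Java sketch. The escapeXml helper is a hypothetical, hand-written escaper used only for illustration (in real code an XML API or a library escaper would do this job). Applying it once gives the correct A&amp;M; applying it a second time reproduces A&amp;amp;M.

```java
public class DoubleEncodingDemo {

    // Hypothetical minimal XML escaper, for illustration only.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "A&M";
        String once = escapeXml(raw);   // correct XML text content
        String twice = escapeXml(once); // the double-encoding bug
        System.out.println(once);  // A&amp;M
        System.out.println(twice); // A&amp;amp;M
    }
}
```

The fix is to escape exactly once, at the point where the plain value is written into XML, and to keep the plain value (A&M) everywhere else, including in the database query.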

Representing & in XML is, of course, done with &amp;.
If just the text is stored in the DB, for instance extracted from between two tags <name>A&amp;M</name>, then any XML API should give "A&M" to be stored.
If the entire XML is stored in the DB, it should be stored as-is: "<name>A&amp;M</name>"
The problem arises only when String manipulations are done. Say:
String xml = "<name>A&amp;M</name>";
String name = xml.replaceFirst("^.*<name>(.*)</name>.*$", "$1"); // still "A&amp;M"
name = StringEscapeUtils.unescapeXml(name);                      // now "A&M"
Here Apache Commons StringEscapeUtils is used. Not unescaping is what causes the trouble.
Things probably go wrong when extracted text (which should be unescaped) is mixed with XML manipulation (DOM) and then placed into an XML structure again. XML APIs in general return values with the entities unescaped, and escape the XML characters <>&"' to entities when writing.
Be especially careful when editing HTML, which uses the same entities, so you may be looking at the entity rather than the actual character. Here StringEscapeUtils.unescapeHtml4 comes into play.
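A minimal sketch of the rule above using only the JDK's DOM and Transformer APIs: when reading, parse the XML and hand the plain text to the database; when building XML, give the DOM the plain value and let the serializer escape it once. The class and method names (XmlRoundTrip, readText, writeElement) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlRoundTrip {

    // Reading: the parser resolves the entity &amp; back into a plain '&'.
    static String readText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Writing: hand the DOM the plain value; the serializer escapes '&' once.
    static String writeElement(String tag, String plainValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element element = doc.createElement(tag);
            element.setTextContent(plainValue);
            doc.appendChild(element);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    public static void main(String[] args) {
        String value = readText("<name>A&amp;M</name>");
        System.out.println(value);                       // A&M (store or query with this)
        System.out.println(writeElement("name", value)); // <name>A&amp;M</name>
        // Passing an already-escaped value reproduces the bug:
        System.out.println(writeElement("name", "A&amp;M")); // <name>A&amp;amp;M</name>
    }
}
```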

Related

Search with "Hash sign" in Solr

I am new to Solr and facing problems while optimizing search in Solr.
When I search for "C4902AN#140", results containing "140" are displayed first, and the result with "C4902AN#140" appears later, i.e. after the results containing "140". I want the result with "C4902AN#140" to come before the results containing "140".
Thanks in advance!

You may have to check the tokenizer used in the field type definition in the schema file.
If the field type uses solr.StandardTokenizerFactory, it will remove the # character.
OR
You should consider boosting the document which has "C4902AN#140".
You can use the elevate.xml file in the config folder and specify which document should appear first in the result set for a specific search term.

The analyzer you use for this field should use KeywordTokenizerFactory, so that your whole term is not split into several tokens; only a single token, i.e. the word itself, is generated.
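As a rough illustration of why the tokenizer matters here, the Java sketch below compares a tokenizer that splits on punctuation with one that keeps the whole value as a single token. Both methods are toy stand-ins written for this example, not Lucene's StandardTokenizer or KeywordTokenizer; they only mimic the effect the answers describe for this particular query.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerToy {

    // Toy stand-in for a tokenizer that splits on punctuation such as '#'.
    // NOT Lucene's StandardTokenizer; it only mimics the effect for this input.
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Toy stand-in for KeywordTokenizerFactory: the whole value is one token.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        String query = "C4902AN#140";
        System.out.println(splitOnPunctuation(query)); // [C4902AN, 140]
        System.out.println(keyword(query));            // [C4902AN#140]
    }
}
```

With the split version, a document that merely contains "140" matches one of the two query tokens, which is consistent with the ranking problem in the question; with a single keyword token, only the exact value matches.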

Store base64 encoded string in HBase

I have a very specific requirement of storing PDF data in HBase columns. The source of the data is MongoDB, from where the Base64-encoded data is read, and I need to bulk upload it to an HBase table.
I realized that the Base64-encoded string contains a lot of "\n" characters, which split the entire string into parts. I'm not sure if it is because of this, but when I store the string as it is, using a put:
put.add(Bytes.toBytes(ColFamilyName), Bytes.toBytes(columnName), Bytes.toBytes(data.replaceAll("\n","").toString()));
It is storing only the first line of the entire encoded string. For example, if the actual content was something like this:
"JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovQ3JlYXRvciAoQXBhY2hlIEZPUCBWZXJzaW9uIDEu\n" +
"MSkKL1Byb2R1Y2VyIChBcGFjaGUgRk9QIFZlcnNpb24gMS4xKQovQ3JlYXRpb25EYXRlIChEOjIw\n" +
"MTUwODIyMTIxMjM1KzAzJzAwJykKPj4KZW5kb2JqCjUgMCBvYmoKPDwKICAvTiAzCiAgL0xlbmd0\n" +
it is storing only the first line, which is:
JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovQ3JlYXRvciAoQXBhY2hlIEZPUCBWZXJzaW9uIDEu
in the column. Even after trying to remove the "\n" manually, the output is the same.
Could someone please guide me in the right direction here?

Currently, I am also working on Base64 encoding. As per my understanding, you should try using the
org.apache.hadoop.hbase.util.Base64.encodeBytes(byte[] source, int option)
method, where DONT_BREAK_LINES can be passed as the option.
Please let me know if this works.
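If a dependency on HBase's own Base64 class is not wanted, the JDK's java.util.Base64 (Java 8+) shows the same distinction; this is a swapped-in alternative, not the class the answer refers to. The MIME encoder wraps its output every 76 characters with \r\n, while the basic encoder always produces a single line. The class name and the placeholder data are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64LineBreaks {

    // MIME-style Base64 inserts "\r\n" every 76 characters,
    // giving the kind of multi-line string seen in the question.
    static String encodeWithLineBreaks(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // The basic encoder never inserts line separators, similar in spirit
    // to the DONT_BREAK_LINES option mentioned above.
    static String encodeSingleLine(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for the PDF content (180 bytes).
        byte[] data = "%PDF-1.4 ".repeat(20).getBytes(StandardCharsets.US_ASCII);
        String wrapped = encodeWithLineBreaks(data);
        String single = encodeSingleLine(data);
        System.out.println(wrapped.contains("\r\n")); // true
        System.out.println(single.contains("\n"));    // false
        // The MIME decoder ignores line separators, so the wrapped form still decodes.
        System.out.println(Arrays.equals(Base64.getMimeDecoder().decode(wrapped), data)); // true
    }
}
```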

Managed to solve it. The issue was in reading the Base64-encoded data from the MongoDB source. I read the data from the MongoDB document (DBObject) as:
jsonObj.get("receiptContent").toString().replaceAll("\n","")
and stored it as such in HBase. Even in the Hue HBase browser UI I can see the PDF content now.
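A sketch of the fix above with a sanity check, assuming the value read from MongoDB is a Base64 string with embedded line breaks. It strips all whitespace (the answer removed only "\n"; also removing "\r" is an assumption to cover Windows-style line ends) and then decodes with the strict basic decoder, which rejects line breaks, to confirm the result starts with the %PDF header. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CleanBase64 {

    // Removes every whitespace character (\n, \r, spaces, tabs)
    // so the value stored in HBase is a single line.
    static String cleanBase64(String raw) {
        return raw.replaceAll("\\s+", "");
    }

    // The basic decoder rejects line breaks; every PDF starts with "%PDF".
    static boolean looksLikePdf(String base64) {
        try {
            byte[] bytes = Base64.getDecoder().decode(base64);
            return bytes.length >= 4
                    && new String(bytes, 0, 4, StandardCharsets.US_ASCII).equals("%PDF");
        } catch (IllegalArgumentException e) {
            return false; // not valid single-line Base64
        }
    }

    public static void main(String[] args) {
        // First two lines of the encoded data from the question, with their "\n" breaks.
        String raw = "JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovQ3JlYXRvciAoQXBhY2hlIEZPUCBWZXJzaW9uIDEu\n"
                + "MSkKL1Byb2R1Y2VyIChBcGFjaGUgRk9QIFZlcnNpb24gMS4xKQovQ3JlYXRpb25EYXRlIChEOjIw\n";
        System.out.println(looksLikePdf(raw));              // false: '\n' is rejected
        System.out.println(looksLikePdf(cleanBase64(raw))); // true
    }
}
```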

Missing fields when rendering lotus notes document to RTF,DXL with Java API

I'm attempting to render a Notes document to RTF, then DXL, using the Java API. Once I have the DXL, I'm converting it to HTML with an XSL stylesheet. My goal is to produce an HTML document that displays as closely as possible to how the document renders in the Notes client.
However, computed fields are missing from the rendered RTF and DXL.
Here is the code used to generate the DXL:
private String renderDocumentToDxl(lotus.domino.Document lotusDocument)
throws Exception {
Database db = getDatabase();
lotus.domino.Document tmp = db.createDocument();
RichTextItem rti = tmp.createRichTextItem("Body");
lotusDocument.computeWithForm(true, false);
lotusDocument.save();
lotusDocument.renderToRTItem(rti);
DxlExporter dxlExporter = getSession().createDxlExporter();
dxlExporter.setOutputDOCTYPE(false);
dxlExporter.setConvertNotesBitmapsToGIF(true);
return dxlExporter.exportDxl(tmp);
}
Fields added to the document by the call to computeWithForm are not present in the generated DXL.
Is there any way to get the computed fields into the generated DXL with the Java API? Or is there a better way to generate an HTML representation of a Notes document using the Domino Java API?

I'm not quite clear on your objective. There are two possibilities:
1) You want the items from lotusDocument to exist in tmp, and to be exported as actual tag data in the DXL. Your code does not do this.
2) You want the values of the non-hidden Items from lotusDocument to exist as text within the rich text Body item in tmp, and you want those values to be included within the DXL that is exported from tmp - as text within the tag for the Body item. This should be what your code is doing.
If you expected the former, then that's not what renderToRTItem does. What it does is the latter, i.e., it gives you a snapshot of the values of the items in lotusDocument, but if and only if they would be displayed to a user who opens the document. You do not get the items themselves, and they won't appear separately in the DXL. If that's all you expected, and it's not happening, then something else is going wrong and you haven't given enough information here to figure it out.
If you wanted the former, i.e., the actual items from lotusDocument to exist as separate tag elements within the DXL exported from tmp, then you should be using
lotusDocument.copyAllItems(tmp, true);
or sequences of
Item tmpItem = lotusDocument.getFirstItem(itemName);
tmp.copyItem(tmpItem,"");

You can get the HTML representation of a RichText field with the URL
http://server/db.nsf/view/docunid/RichTextFieldname?OpenField
So, save your tmp document, get its UNID and read the result over HTTP from the URL
http://server/db.nsf/0/tmpdocunid/Body?OpenField
You don't need to call lotusDocument.computeWithForm, as lotusDocument.renderToRTItem already executes the form's input translation and validation formulas.
Be aware that with both methods the form's LotusScript code won't be executed, in case your fields get calculated that way.
In case you can use XPages, this would be an alternative: http://linqed.eu/2014/07/11/getting-html-from-any-richtext-item/

Multiple words not getting searched , not taking space

When I pass a string with a space between the words to the servlet and run the Android application,
an error like this comes up:
03-01 09:32:41.110: E/Excepiton(1301): java.io.FileNotFoundException: http//address of server:8088/First/MyServlet?ads_title=test test&city=Pune
Here ads_title=test test and city = Delhi.
But it works fine when I pass a single-word string,
like ads_title=test
and city = Delhi.
When I run the query in SQL with both values it works, which means the query is fine.
String stringURL="http//address of server:8088/First/MyServlet" +
String.format("?ads_title=%s&city=%s",editText1.getText(),City);
That is where I am passing the values.

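Data sent in a URL must be "encoded" to ensure that it reaches the server intact and is interpreted correctly. Fortunately, Java provides a standard class, URLEncoder, and the encoding recommended by the World Wide Web Consortium is "UTF-8", so encode each parameter value (not the whole URL, which would also escape the ':', '/' and '?' characters):
String stringURL = "http//address of server:8088/First/MyServlet" +
String.format("?ads_title=%s&city=%s", URLEncoder.encode(editText1.getText().toString(), "UTF-8"), URLEncoder.encode(City, "UTF-8"));
(That way you don't have to know the encoding for each special character.)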
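A minimal sketch of encoding each query parameter value separately, assuming Java 10+ for the URLEncoder.encode(String, Charset) overload (on older Java or Android versions, URLEncoder.encode(value, "UTF-8") does the same but declares UnsupportedEncodingException). The host name and the buildUrl helper are placeholders, not the asker's server or code. On Android, editText1.getText() returns an Editable, so call toString() on it before encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryStringBuilder {

    // Encode only the parameter values; the '?', '=' and '&' separators stay literal.
    static String buildUrl(String baseUrl, String adsTitle, String city) {
        return baseUrl
                + "?ads_title=" + URLEncoder.encode(adsTitle, StandardCharsets.UTF_8)
                + "&city=" + URLEncoder.encode(city, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "example.com" is a placeholder host.
        String url = buildUrl("http://example.com:8088/First/MyServlet", "test test", "Pune");
        System.out.println(url);
        // http://example.com:8088/First/MyServlet?ads_title=test+test&city=Pune
    }
}
```

URLEncoder turns a space into "+" (form encoding), which servlet containers decode back to a space in getParameter, so the servlet receives "test test" again.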

I agree with the comments (not sure why they didn't post them as an answer, though): you want to encode your URL parameters so that the space is handled correctly (as + or %20).
Java URL encoding of query string parameters

Trailing null (\x00) characters when writing text to Accumulo

I am trying to write the name of a file into Accumulo. I am using accumulo-core-1.43.
For some reason, certain files seem to be written into Accumulo with trailing \x00 characters at the end of the name. The upload comes through a Java servlet (using the jQuery File Upload plugin). In the servlet, I check the name of the file with System.out.println and it looks normal, and I even tried unescaping the string with
org.apache.commons.lang.StringEscapeUtils.unescapeJava(...);
The actual writing to accumulo looks like this:
Mutation mut = new Mutation(new Text(checkSum));
Value val = new Value(new Text(filename).getBytes());
long timestamp = System.currentTimeMillis();
mut.put(new Text(colFam), new Text(EMPTY_BYTES), timestamp, val);
but nothing unusual showed up there (perhaps \x00 isn't escaped?). But if I do a scan on my table in Accumulo, there will be one or more \x00 in the file name.
The problem this seems to cause is that I return that string within XML when I retrieve a list of files and pass it back to the browser, and the XSL that is supposed to render the information in the XML no longer works when these extra characters are present (not sure why that is the case either).
In Chrome, in the response to these calls, I see three red dots after the file name, and when I hover over them, \u0 pops up (which I think is another representation of 0/null?).
Anyway, I'm just trying to figure out why this happens, or at the very least, how I can filter out \x00 characters before returning the file name in Java. Any ideas?

You are likely using the Hadoop Text class incorrectly; this is not an error in Accumulo. Specifically, you make the mistake in your example above:
Value val = new Value(new Text(filename).getBytes());
You must respect the length provided by the Text class (getLength()): the array returned by getBytes() can be longer than the valid data. See the Text javadoc for more information. If you're using Hadoop 2.2.0, you can use the copyBytes method provided on Text. If you're on an older version of Hadoop where this method doesn't exist yet, you can use something like the ByteBuffer class or the System.arraycopy method to get a copy of the byte[] with the proper limits enforced.
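The Hadoop Text class is not used below so that the sketch stays self-contained; instead, a byte array padded with zeros simulates a backing array that is longer than its valid length. copyValidBytes is the Arrays.copyOf equivalent of the System.arraycopy approach from the answer, and stripTrailingNulls is a hypothetical clean-up for values already stored with trailing \x00. With a real Text object the call would be copyValidBytes(text.getBytes(), text.getLength()).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ValidBytes {

    // Keep only the first `length` bytes of a possibly oversized backing array.
    static byte[] copyValidBytes(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    // Fallback for values already stored with padding: drop trailing '\0' chars.
    static String stripTrailingNulls(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '\0') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // Simulated backing array: 8 valid bytes followed by 3 unused zero bytes.
        byte[] backing = Arrays.copyOf("file.pdf".getBytes(StandardCharsets.UTF_8), 11);
        int length = 8;
        System.out.println(new String(backing, StandardCharsets.UTF_8).length()); // 11
        System.out.println(new String(copyValidBytes(backing, length), StandardCharsets.UTF_8)); // file.pdf
        System.out.println(stripTrailingNulls("file.pdf\0\0\0")); // file.pdf
    }
}
```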

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Related

Search with "Hash sign" in Solr

Store base64 encoded string in HBase

Missing fields when rendering lotus notes document to RTF,DXL with Java API

Multiple words not getting searched , not taking space

Trailing null (\x00) characters when writing text to Accumulo
