I have a very specific requirement: storing PDF data in HBase columns. The source of the data is MongoDB, from which the Base64-encoded data is read, and I need to bulk upload it to an HBase table.
I realized that the Base64-encoded string contains a lot of "\n" characters, which split the string into parts. I am not sure whether this is the cause, but when I store the string using a put:
put.add(Bytes.toBytes(ColFamilyName), Bytes.toBytes(columnName), Bytes.toBytes(data.replaceAll("\n","").toString()));
it stores only the first line of the encoded string. For example,
if the actual content was something like this:
"JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovQ3JlYXRvciAoQXBhY2hlIEZPUCBWZXJzaW9uIDEu\n" +
"MSkKL1Byb2R1Y2VyIChBcGFjaGUgRk9QIFZlcnNpb24gMS4xKQovQ3JlYXRpb25EYXRlIChEOjIw\n" +
"MTUwODIyMTIxMjM1KzAzJzAwJykKPj4KZW5kb2JqCjUgMCBvYmoKPDwKICAvTiAzCiAgL0xlbmd0\n" +
It is storing only the first line which is :
JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovQ3JlYXRvciAoQXBhY2hlIEZPUCBWZXJzaW9uIDEu
in the column. Even after removing the "\n" characters manually, the output is the same.
Could someone please guide me in the right direction here ?
I am also working with Base64 encoding at the moment. As I understand it, you should try the
org.apache.hadoop.hbase.util.Base64.encodeBytes(byte[] source, int option)
method, where DONT_BREAK_LINES can be passed as the option.
Please let me know if this works.
Managed to solve it. The issue was in how the Base64-encoded data was read from the MongoDB source. I now read the value from the MongoDB document (DBObject) as:
jsonObj.get("receiptContent").toString().replaceAll("\n","")
and store it as such in HBase. The PDF content is now visible even in the Hue HBase browser.
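A minimal sketch of the fix described above, using only the JDK's java.util.Base64 rather than the MongoDB or HBase APIs. The class name, the helper stripLineBreaks, and the sample payload are hypothetical; the point is only that removing the line breaks leaves a single-line string that still decodes to the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64LineBreaks {

    /** Removes every whitespace character, including \n and \r, from a Base64 string. */
    static String stripLineBreaks(String base64) {
        return base64.replaceAll("\\s", "");
    }

    public static void main(String[] args) {
        // Hypothetical payload, long enough that the MIME encoder wraps it.
        byte[] original =
                "%PDF-1.4 hypothetical sample payload, long enough to wrap across several lines of output"
                        .getBytes(StandardCharsets.ISO_8859_1);

        // The MIME encoder inserts a line break every 76 characters,
        // which mimics the wrapped data described in the question.
        String wrapped = Base64.getMimeEncoder().encodeToString(original);

        // Strip the breaks before storing, then check the round trip.
        String singleLine = stripLineBreaks(wrapped);
        byte[] decoded = Base64.getDecoder().decode(singleLine);

        System.out.println("contains line break: " + singleLine.contains("\n"));
        System.out.println("starts with %PDF: "
                + new String(decoded, StandardCharsets.ISO_8859_1).startsWith("%PDF"));
    }
}
```

If the stored value has to be decoded later without being cleaned first, Base64.getMimeDecoder() is an alternative, since it ignores line separators while decoding.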
Related
I am reading encoded data from a database. When I decode the data I get something like
%PDF-1.3
23 Obj
xref
10000000 123n
.
.
.
.
%EOF
I am guessing this is the PDF file's metadata together with its data. My question is: how do I create a PDF file out of this that contains only the readable data?
Thanks in advance.
First you need to know what the data in the DB represents. I would suggest connecting to the database with a general-purpose client, for example SQL Developer for Oracle or SQL Server Management Studio for MS SQL Server.
Display the table that has a column of type long, clob or blob and try to save the value to a file. Give the file the extension .pdf. Try to open the saved file and see whether a PDF reader opens it correctly.
If that is the case, saving it from Java is trivial. For example: http://www.astral-consultancy.co.uk/cgi-bin/hunbug/doco.cgi?11120
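As a rough illustration of the "saving it from Java" step, the sketch below writes a byte array to a .pdf file with java.nio.file.Files. The class name, the helper savePdf, and the sample bytes are hypothetical; with JDBC the bytes would typically come from something like ResultSet.getBytes on the blob column, which is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SavePdf {

    /** Writes the raw bytes to the target path unchanged and returns that path. */
    static Path savePdf(byte[] pdfBytes, Path target) {
        try {
            return Files.write(target, pdfBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the bytes read from the database column.
        byte[] fromDb = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);

        Path out = savePdf(fromDb, Files.createTempFile("sample", ".pdf"));
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

The bytes are written as-is on purpose: the xref table and object markers seen after decoding are part of the PDF format, so the file only opens correctly if nothing is filtered out.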
I am using the tag below to query an item from the DB. The item is present in the DB but does not show up, because A&M became A&amp;amp;M instead of A&amp;M. How can I solve this?
<TEA>2720A 100 STATE A&amp;amp;M RD VRAD</TEA>
The backend Java code queries the item from the DB with something like 'select * from aa where tea=2720A 100 STATE A&amp;M RD VRAD' and gets no record back, although the item is present in the DB as A&M. This is the exact issue; how can I solve it?
Double encoding, your string is encoded twice.
First encoding: A&M -> A&amp;M
Second encoding: A&amp;M -> A&amp;amp;M
Check your code for this issue
Of course, & is represented in XML as &amp;.
If just text is stored in the DB, for instance text extracted from between two tags, <name>A&amp;M</name>, then any XML API should give "A&M" to be stored.
If the entire XML is stored in the DB, one should store it as-is: "<name>A&amp;M</name>"
The problem arises only when String manipulations are done. Say
String xml = "<name>A&amp;M</name>";
String name = xml.replaceFirst("^.*<name>(.*)</name>.*$", "$1");
name = StringEscapeUtils.unescapeXml(name);
Here the Apache Commons StringEscapeUtils class is used. Skipping the unescape step is what causes trouble.
It probably goes wrong when text extracted by string manipulation (which still needs unescaping) is mixed with XML manipulation (DOM) and then placed in an XML structure again. The XML APIs in general return values with the entities already unescaped, and escape the XML characters <>&"' to entities when writing.
Be especially careful when editing in HTML, which uses the same entities and does not show the actual characters. Here StringEscapeUtils.unescapeHtml4 comes into play.
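To make the double-encoding effect concrete, here is a small sketch using the JDK's built-in DOM parser instead of string manipulation. The class name EntityDemo and the helper textOf are hypothetical. A correctly escaped element comes back from the parser as the plain text, while a doubly escaped one comes back still carrying one level of escaping, which is the value that then fails to match in the DB query.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EntityDemo {

    /** Parses a small XML document and returns the text of its root element. */
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // The parser resolves entities, so the returned text is unescaped once.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Escaped once: the parser hands back the plain value.
        System.out.println(textOf("<name>A&amp;M</name>"));
        // Escaped twice: one level of escaping survives parsing.
        System.out.println(textOf("<name>A&amp;amp;M</name>"));
    }
}
```

If the second form shows up, the fix belongs where the XML is produced (something escapes an already escaped value), not in an extra unescape call on the reading side.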
I want to display foreign-language characters in Jasper reports. The report passes the text to Java code for RTF formatting. Unfortunately the MySQL database returns an entity-encoded string like the one below (read it with the space removed):
& iacute;
what I want to display is
í
Any suggestions how to do it with java?
text: bebida fría
from database: bebida fr& iacute;a
Those are HTML entities. You can use StringEscapeUtils.unescapeHtml4 from the Apache Commons library.
It still remains to be seen how your RTF handles Unicode.
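For readers who want to see what such an unescape step does, below is a JDK-only sketch. It is a simplified stand-in, not the Apache Commons implementation: the class name EntityUnescape and its tiny entity table are hypothetical and cover only a few Spanish characters plus decimal numeric references. In real code the library call named above is the better choice because it knows the full entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityUnescape {

    // Deliberately tiny, hypothetical table; the real HTML entity list is far longer.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("iacute", "\u00ED");
        NAMED.put("aacute", "\u00E1");
        NAMED.put("eacute", "\u00E9");
        NAMED.put("ntilde", "\u00F1");
        NAMED.put("amp", "&");
    }

    private static final Pattern ENTITY =
            Pattern.compile("&(#\\d{1,7}|[A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named entities and decimal references; leaves anything else untouched. */
    static String unescape(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement = m.group(); // default: keep the entity as written
            if (body.startsWith("#")) {
                int codePoint = Integer.parseInt(body.substring(1));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } else {
                replacement = NAMED.getOrDefault(body, replacement);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("bebida fr&iacute;a"));
    }
}
```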
If I understand your question, then you could use the unicode literal,
System.out.println("bebida fr\u00EDa");
Output is (the requested)
bebida fría
Check the database table encoding. You can also try to encode and decode your string with an explicit charset, so that the platform default is never used:
ByteBuffer encode = StandardCharsets.UTF_8.encode(myString);
String encodedStr = StandardCharsets.UTF_8.decode(encode).toString();
I have an Android app. Basically, the user can search for a car reference number in an EditText, for example 270/30, and retrieve all the details of the car that has the same column value in the database. I'm encoding this EditText value in Android using URLEncoder and decoding it again in the PHP web service code. But the decoded value I'm getting is 027/13 instead of 270/30.
To make it clearer, I am pasting my Java encoding part here:
EditText SearchField=(EditText)findViewById(R.id.editText1);
String SearchValue= SearchField.getText().toString();
Now the encoding code in Asynctask is
data +="&" + URLEncoder.encode("data", "UTF-8") + "="+SearchValue;
Now the PHP part, where I decode the value:
$data = urldecode($_POST['data']);
Please help me how to encode/decode this given format ...
Thanks in Advance
In your posted code you are encoding only the key name data, not the actual search value.
Your PHP side is therefore decoding something that was never encoded. You need to encode the actual search data, like so:
data += "&" + URLEncoder.encode("data", "UTF-8") + "=" + URLEncoder.encode(SearchValue, "UTF-8");
or like this, because the key name data does not need to be encoded:
data += "&data=" + URLEncoder.encode(SearchValue, "UTF-8");
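The encoding step can be checked in isolation with the JDK alone. The sketch below is a hedged illustration, with a hypothetical class name FormEncoding and helper formField; it encodes the key and the value separately, so the = separator itself stays unencoded, and then decodes the value again on the Java side to confirm the round trip.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormEncoding {

    /** Builds one name=value pair for a form-encoded POST body. */
    static String formField(String name, String value) {
        try {
            // Key and value are encoded separately; the '=' between them is not.
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String field = formField("data", "270/30");
        System.out.println(field); // data=270%2F30

        // Decoding only the value part restores the original text.
        String value = field.substring(field.indexOf('=') + 1);
        System.out.println(URLDecoder.decode(value, "UTF-8")); // 270/30
    }
}
```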
We are storing uploaded text files in a SQL Server database. The field type is image.
The files upload and download correctly; what I want to do now is load the actual text content into a String variable directly from the database record.
Can anyone advise on how to do this please?
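It depends on how you read the data from the DB. If you get a byte array, you could use new String(bytes, charset), passing the charset the files were written in (for example StandardCharsets.UTF_8); plain new String(bytes) falls back to the platform default charset.
By the way, why don't you use the CLOB datatype (or the equivalent for your server) for the field? This should normally cause the Java driver to return the String directly.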
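A small sketch of the byte-array route, assuming the driver exposes the image column through ResultSet.getBytes and that the uploaded files are UTF-8. The class name, the helper names, and the choice of charset are assumptions for illustration; the charset has to match whatever the files were actually saved in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ImageColumnText {

    /** Decodes raw column bytes with an explicit charset; a SQL NULL stays null. */
    static String decode(byte[] bytes, Charset charset) {
        return bytes == null ? null : new String(bytes, charset);
    }

    /** Reads the named binary column of the current row and decodes it as text. */
    static String readText(ResultSet row, String column, Charset charset) throws SQLException {
        return decode(row.getBytes(column), charset);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the bytes a driver would return.
        byte[] stored = "bebida fr\u00EDa".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(stored, StandardCharsets.UTF_8));
    }
}
```

Calling readText(rs, "content", StandardCharsets.UTF_8) after rs.next() would then give the file text directly; "content" is a placeholder column name.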