Trailing null (\x00) characters when writing text to Accumulo

Trailing null (\x00) characters when writing text to Accumulo - java

I am trying to write the name of a file into Accumulo. I am using accumulo-core-1.43.
For some reason, certain files seem to be written into Accumulo with trailing \x00 characters at the end of the name. The upload is coming through a Java servlet (using the jquery file upload plugin). In the servlet, I check the name of the file with a System.out.println and it looks normal, and I even tried unescaping the string with
org.apache.commons.lang.StringEscapeUtils.unescapeJava(...);
The actual writing to accumulo looks like this:
Mutation mut = new Mutation(new Text(checkSum));
Value val = new Value(new Text(filename).getBytes());
long timestamp = System.currentTimeMillis();
mut.put(new Text(colFam), new Text(EMPTY_BYTES), timestamp, val);
but nothing unusual showed up there (perhaps \x00 isn't escaped)? But then if I do a scan on my table in accumulo, there will be one or more \x00 in the file name.
The problem this seems to cause is that I return that string within XML when I retrieve a list of files (where it shows up) and pass that back to the browser, the the XSL that is supposed to render the information in the XML no longer works when there's these extra characters (not sure why that is the case either).
In chrome, for the response on these calls, I see that there's three red dots after the file name, and when I hover over it, \u0 pops up (which I think is a different representation of 0/null?).
Anyway, I'm just trying to figure out why this happens, or at the very least, how I can filter out \x00 characters before returning the file in Java. any ideas?

You are likely incorrectly using the Hadoop Text class -- this is not an error with Accumulo. Specifically, you make the mistake in your above example:
Value val = new Value(new Text(filename).getBytes());
You must adhere to the length of provided by the Text class. See the Text javadoc for more information. If you're using Hadoop-2.2.0, you can use the provided copyBytes method on Text. If you're on older version of Hadoop where this method doesn't yet exist, you can use something like the ByteBuffer class or the System.arraycopy method to get a copy of the byte[] with the proper limits enforced.

Related

Creating a text file with java without using absolute path

following the question I asked before How to have my java project to use some files without using their absolute path? I found the solution but another problem popped up in creating text files that I want to write into.here's my code:
private String pathProvider() throws Exception {
//finding the location where the jar file has been located
String jarPath=URLDecoder.decode(getClass().getProtectionDomain().getCodeSource().getLocation().getPath(), "UTF-8");
//creating the full and final path
String completePath=jarPath.substring(0,jarPath.lastIndexOf("/"))+File.separator+"Records.txt";
return completePath;
}
public void writeRecord() {
try(Formatter writer=new Formatter(new FileWriter(new File(pathProvider()),true))) {
writer.format("%s %s %s %s %s %s %s %s %n", whichIsChecked(),nameInput.getText(),lastNameInput.getText()
,idInput.getText(),fieldOfStudyInput.getText(),date.getSelectedItem().toString()
,month.getSelectedItem().toString(),year.getSelectedItem().toString());
successful();
} catch (Exception e) {
failure();
}
}
this works and creates the text file wherever the jar file is running from but my problem is that when the information is been written to the file, the numbers,symbols, and English characters are remained but other characters which are in Persian are turned into question marks. like: ????? 111 ????? ????.although running the app in eclipse doesn't make this problem,running the jar does.
Note:I found the code ,inside pathProvider method, in some person's question.

Your pasted code and the linked question are complete red herrings - they have nothing whatsoever to do with the error you ran into. Also, that protection domain stuff is a hack and you've been told before not to write data files next to your jar files, it's not how OSes (are supposed to) work. Use user.home for this.
There is nothing in this method that explains the question marks - the string, as returned, has plenty of issues (see above), but NOT that it will result in question marks in the output.
Files are fundamentally bytes. Strings are fundamentally characters. Therefore, when you write code that writes a string to a file, some code somewhere is converting chars to bytes.
Make sure the place where that happens includes a charset encoding.
Use the new API (I think you've also been told to do this, by me, in an earlier question of yours) which defaults to UTF-8. Alternatively, specify UTF-8 when you write. Note that the usage of UTF-8 here is about the file name, not the contents of it (as in, if you put persian symbols in the file name, it's not about persian symbols in the contents of the file / in the contents you want to write).
Because you didn't paste the code, I can't give you specific details as there are hundreds of ways to do this, and I do not know which one you used.
To write to a file given a String representing its path:
Path p = Paths.get(completePath);
Files.write("Hello, World!", p);
is all you need. This will write as UTF_8, which can handle persian symbols (because the Files API defaults to UTF-8 if you specify no encoding, unlike e.g. new File, FileOutputStream, FileWriter, etc).
If you're using outdated APIs: new BufferedWriter(new OutputStreamWriter(new FileOutputStream(thePath), StandardCharsets.UTF-8) - but note that this is a resource leak bug unless you add the appropriate try-with-resources.
If you're using FileWriter: FileWriter is broken, never use this class. Use something else.
If you're converting the string on its own, it's str.getBytes(StandardCharsets.UTF_8), not str.getBytes().

How to get only the name of my PDF file

I'm developing a project for college which consist reading a CSV file and converting that to a PDF file. That part is fine, I have already done that.
In the end I need to show the name of the PDF file without the full path of where it was created. In other words, I just want the to show the name.
I search a lot to see if there is a simple method that show the name like Java has to show only the name of the File like
file.getName();

Whenever you use iText to create a PDF file, your code sets the target which usually is an OutputStream. If you use a FileOutputStream there, you know the file it writes to.
Thus, all you have to do to to show the name of the PDF File is to inspect your own code and check which target it sets.

Use getBaseName in Apache Commons IO.
getBaseName
public static String getBaseName(String filename)
Gets the base name, minus the full path and extension, from a full
filename.
This method will handle a file in either Unix or Windows format. The
text after the last forward or backslash and before the last dot is
returned.
a/b/c.txt --> c
a.txt --> a
a/b/c --> c
a/b/c/ --> ""
The output will be the same irrespective of the machine that the code
is running on.
Parameters:
filename - the filename to query, null returns null
Returns:
the name of the file without the path, or an empty string if none exists. Null bytes inside string will be removed
Source: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/FilenameUtils.html#getBaseName(java.lang.String)
If you also need the extension, use getExtension. Which would probably always be .pdf, but you know, it's perfectly valid to have a PDF file without the .pdf filename extension. No sane person would do that but it is better to be prepared for insane users.

Missing fields when rendering lotus notes document to RTF,DXL with Java API

I'm attempting to render a notes document to RTF, then DXL using the Java API. Once I have the DXL, I'm converting it to HTML with an XSL stylesheet. My goal is to produce an HTML document that displays as close as possible to the document rendering in the notes client.
However, computed fields are missing from the rendered RTF and DXL.
Here is the code used to generate the DXL:
private String renderDocumentToDxl(lotus.domino.Document lotusDocument)
throws Exception {
Database db = getDatabase();
lotus.domino.Document tmp = db.createDocument();
RichTextItem rti = tmp.createRichTextItem("Body");
lotusDocument.computeWithForm(true, false);
lotusDocument.save();
lotusDocument.renderToRTItem(rti);
DxlExporter dxlExporter = getSession().createDxlExporter();
dxlExporter.setOutputDOCTYPE(false);
dxlExporter.setConvertNotesBitmapsToGIF(true);
return dxlExporter.exportDxl(tmp);
}
Fields added to the document by the call to computeWithForm are not present in the generated DXL.
Is there any way to get the computed fields into the generated DXL with the Java API? Or is there a better way to generate an HTML representation of a notes document using the domino Java API?

I'm not quite clear on your objective. There are two possibilities:
1) You want the items from lotusDocument to exist in tmp, and to be exported as actual tag data in the DXL. Your code does not do this.
2) You want the values of the non-hidden Items from lotusDocument to exist as text within the rich text Body item in tmp, and you want those values to be included within the DXL that is exported from tmp - as text within the tag for the Body item. This should be what your code is doing.
If you expected the former, then that's not what renderToRTItem does. What it does is the latter. I.e., it gives you a snapshot of the values of the items in lotusDocument - but if and only if they would be displayed to a user who opens the document. You do not get the items themselves, and they won't appear separately in the DXL. If that's all you expected, and it's not happening, then there's something else going wrong and you haven't given enough infornmation here to figure it out.
If you wanted the former, i.e., the actual items from lotusDocument to exist as separate tag elements within the DXL exported from tmp, then you should be using
lotusDocument.copyAllItems(tmp,true);,
or sequences of
Item tmpItem = lotusDocument.getFirstItem(itemName);
tmp.copyItem(tmpItem,"");

You can get the HTML representation of a RichText field with the URL
http://server/db.nsf/view/docunid/RichTextFieldname?OpenField
So, save your tmp document, get the docunid and read the result via http from URL
http://server/db.nsf/0/tmpdocunid/Body?OpenField
You don't need to call lotusDocument.computeWithForm as lotusDocument.renderToRTItem does execute form's input translation and validation formulas already.
Be aware that for both methods form's LotusScript code won't be executed - just in case your fields gets calculated this way.
In case you can use XPages this would be an alternative: http://linqed.eu/2014/07/11/getting-html-from-any-richtext-item/

Failing for Larger Input Files Only: FileServiceFactory getBlobKey throws IllegalArgumentException

I have a Google App Engine App that converts CSV to XML files. It works fine for small XML inputs, but refuses to finalize the file for larger inputed XML. The XML is read from, and the resulting csv files are written to, many times before finalization, over a long-running (multi-day duration) task. My problem is different than this FileServiceFactory getBlobKey throws IllegalArgumentException , since my code works fine both in production and development with small input files. So it's not that I'm neglecting to write to the file before closing/finalizing. However, when I attempt to read from a larger XML file. The input XML file is ~150MB, and the resulting set of 5 CSV files is each much smaller (perhaps 10MB each). I persisted the file urls for the new csv files, and even tried to close them with some static code, but I just reproduce the same error, which is
java.lang.IllegalArgumentException: creation_handle: String properties must be 500 characters or less. Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedSingleValue(DataTypeUtils.java:242)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:207)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:173)
at com.google.appengine.api.datastore.Query$FilterPredicate.<init>(Query.java:900)
at com.google.appengine.api.datastore.Query$FilterOperator.of(Query.java:75)
at com.google.appengine.api.datastore.Query.addFilter(Query.java:351)
at com.google.appengine.api.files.FileServiceImpl.getBlobKey(FileServiceImpl.java:329)
But I know that it's not a String/Text data type issue, since I am already using similar length file service urls for the previous successful attempts with smaller files. It also wasn't an issue for the other stackoverflow post I linked above. I also tried putting one last meaningless write before finalizing, just in case it would help as it did for the other post, but it made no difference. So there's really no way for me to debug this... Here is my file closing code that is not working. It's pretty similar to the Google how-to example at http://developers.google.com/appengine/docs/java/blobstore/overview#Writing_Files_to_the_Blobstore .
log.info("closing out file 1");
try {
//locked set to true
FileWriteChannel fwc1 = fileService.openWriteChannel(csvFile1, true);
fwc1.closeFinally();
} catch (IOException ioe) {ioe.printStackTrace();}
// You can't get the blob key until the file is finalized
BlobKey blobKeyCSV1 = fileService.getBlobKey(csvFile1);
log.info("csv blob storage key is:" + blobKeyCSV1.getKeyString());
csvUrls[i-1] = blobKeyCSV1.getKeyString();
break;
At this point, I just want to finalize my new blob files for which I have the urls, but cannot. How can I get around this issue, and also, what may be the cause? Again, my code works for small files (~60 kB), but the input file of ~150MB fails). Thank you for any advice on what is causing this or how to get around it! Also, how long will my unfinalized files stick around for, before being deleted?

This issue was a bug in the Java MapReduce and Files API, which was recently fixed by Google. Read announcement here: groups.google.com/forum/#!topic/google-appengine/NmjYYLuSizo

Suppress Print Dialog when printing to Microsoft Document Image Writer from Oracle BPM 10g

We have an Oracle BPM 10g activity that:
Reads a form-fill protected Word document template.
Merges data into the fields.
Saves the merged/filled copy to the filesystem.
Prints the document to a selected, pre-defined printer, OR to the default printer.
All of this works fine when printing to a "real" printer. However, there is now a need to output the Word document to TIFF. Attempting to use "Microsoft Document Image Writer" as one of the printer selections does not work as expected. Normally, when printing to the Microsoft Document Image Writer from Word (or any other application) directly, you're prompted for a location to save the resultant file. This prompting does not occur when attempting to print from this particular activity in BPM 10g.
Ideally, we actually would like to bypass the dialog and output the TIFF directly to the filesystem. However, I have not found a way to control this programmatically. That is, being able to specify the destination filename in code. Right now, I'm just trying to get output to the Microsoft Document Image Writer at all, to make sure it works.
So, the bottom line question(s) is/are:
Can this be done? I.e., printing to Microsoft Document Image Writer
If yes, can the file location dialog be suppressed?
How?

You said nothing about the way you're automating Word.
In Word VBA, you may use this sample to print out the active document immediately without showing the print dialog:
Public Sub PrintToXPS()
'Presume that Microsoft XPS Document Writer was already
'set up as ActivePrinter
Dim strFilePath As String
strFilePath = "C:\temp\helloworld.xps"
ActiveDocument.PrintOut Background:=False, outputfilename:=strFilePath
End Sub
There's no need to use the print dialog instead. However, if you want to operate through the dialog object, that can be done in Word using a variable of type Word.Dialog and providing the necessary parameters, e.g.
Dim dlgFilePrint As Word.Dialog
Set dlgFilePrint = Application.Dialogs(wdDialogFilePrint)
dlgFilePrint.Update
dlgFilePrint.PrToFileName = strFilePath
dlgFilePrint.printtofile = True
'add other parameters as needed ...
'lock up parameter names in Word VBA Online Help using "WdWordDialog-Enumeration"
'as key word
dlgFilePrint.Execute
What I did here with the XPS printer, you may of course do also with any other printer.

Thank you, domke consulting.
After more searching, I found this forum post on MSDN.
Adding these registry entries to suppress the dialog box and suppress post-generation output seemed to do the trick:
In HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\MODI\MDI Writer
PrivateFlags = 17 (Decimal)
OpenInMODI = 0 (Decimal)
For our purposes, this seems to work fine if we call the printOut() method with the following relevant arguments (other arguments omitted here for brevity):
document.printOut(outputFileName : "C:\\temp\\fileName.tif", printToFile : true);

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.