Failing for Larger Input Files Only: FileServiceFactory getBlobKey throws IllegalArgumentException

Failing for Larger Input Files Only: FileServiceFactory getBlobKey throws IllegalArgumentException - java

I have a Google App Engine App that converts CSV to XML files. It works fine for small XML inputs, but refuses to finalize the file for larger inputed XML. The XML is read from, and the resulting csv files are written to, many times before finalization, over a long-running (multi-day duration) task. My problem is different than this FileServiceFactory getBlobKey throws IllegalArgumentException , since my code works fine both in production and development with small input files. So it's not that I'm neglecting to write to the file before closing/finalizing. However, when I attempt to read from a larger XML file. The input XML file is ~150MB, and the resulting set of 5 CSV files is each much smaller (perhaps 10MB each). I persisted the file urls for the new csv files, and even tried to close them with some static code, but I just reproduce the same error, which is
java.lang.IllegalArgumentException: creation_handle: String properties must be 500 characters or less. Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedSingleValue(DataTypeUtils.java:242)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:207)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:173)
at com.google.appengine.api.datastore.Query$FilterPredicate.<init>(Query.java:900)
at com.google.appengine.api.datastore.Query$FilterOperator.of(Query.java:75)
at com.google.appengine.api.datastore.Query.addFilter(Query.java:351)
at com.google.appengine.api.files.FileServiceImpl.getBlobKey(FileServiceImpl.java:329)
But I know that it's not a String/Text data type issue, since I am already using similar length file service urls for the previous successful attempts with smaller files. It also wasn't an issue for the other stackoverflow post I linked above. I also tried putting one last meaningless write before finalizing, just in case it would help as it did for the other post, but it made no difference. So there's really no way for me to debug this... Here is my file closing code that is not working. It's pretty similar to the Google how-to example at http://developers.google.com/appengine/docs/java/blobstore/overview#Writing_Files_to_the_Blobstore .
log.info("closing out file 1");
try {
//locked set to true
FileWriteChannel fwc1 = fileService.openWriteChannel(csvFile1, true);
fwc1.closeFinally();
} catch (IOException ioe) {ioe.printStackTrace();}
// You can't get the blob key until the file is finalized
BlobKey blobKeyCSV1 = fileService.getBlobKey(csvFile1);
log.info("csv blob storage key is:" + blobKeyCSV1.getKeyString());
csvUrls[i-1] = blobKeyCSV1.getKeyString();
break;
At this point, I just want to finalize my new blob files for which I have the urls, but cannot. How can I get around this issue, and also, what may be the cause? Again, my code works for small files (~60 kB), but the input file of ~150MB fails). Thank you for any advice on what is causing this or how to get around it! Also, how long will my unfinalized files stick around for, before being deleted?

This issue was a bug in the Java MapReduce and Files API, which was recently fixed by Google. Read announcement here: groups.google.com/forum/#!topic/google-appengine/NmjYYLuSizo

Related

Creating a text file with java without using absolute path

following the question I asked before How to have my java project to use some files without using their absolute path? I found the solution but another problem popped up in creating text files that I want to write into.here's my code:
private String pathProvider() throws Exception {
//finding the location where the jar file has been located
String jarPath=URLDecoder.decode(getClass().getProtectionDomain().getCodeSource().getLocation().getPath(), "UTF-8");
//creating the full and final path
String completePath=jarPath.substring(0,jarPath.lastIndexOf("/"))+File.separator+"Records.txt";
return completePath;
}
public void writeRecord() {
try(Formatter writer=new Formatter(new FileWriter(new File(pathProvider()),true))) {
writer.format("%s %s %s %s %s %s %s %s %n", whichIsChecked(),nameInput.getText(),lastNameInput.getText()
,idInput.getText(),fieldOfStudyInput.getText(),date.getSelectedItem().toString()
,month.getSelectedItem().toString(),year.getSelectedItem().toString());
successful();
} catch (Exception e) {
failure();
}
}
this works and creates the text file wherever the jar file is running from but my problem is that when the information is been written to the file, the numbers,symbols, and English characters are remained but other characters which are in Persian are turned into question marks. like: ????? 111 ????? ????.although running the app in eclipse doesn't make this problem,running the jar does.
Note:I found the code ,inside pathProvider method, in some person's question.

Your pasted code and the linked question are complete red herrings - they have nothing whatsoever to do with the error you ran into. Also, that protection domain stuff is a hack and you've been told before not to write data files next to your jar files, it's not how OSes (are supposed to) work. Use user.home for this.
There is nothing in this method that explains the question marks - the string, as returned, has plenty of issues (see above), but NOT that it will result in question marks in the output.
Files are fundamentally bytes. Strings are fundamentally characters. Therefore, when you write code that writes a string to a file, some code somewhere is converting chars to bytes.
Make sure the place where that happens includes a charset encoding.
Use the new API (I think you've also been told to do this, by me, in an earlier question of yours) which defaults to UTF-8. Alternatively, specify UTF-8 when you write. Note that the usage of UTF-8 here is about the file name, not the contents of it (as in, if you put persian symbols in the file name, it's not about persian symbols in the contents of the file / in the contents you want to write).
Because you didn't paste the code, I can't give you specific details as there are hundreds of ways to do this, and I do not know which one you used.
To write to a file given a String representing its path:
Path p = Paths.get(completePath);
Files.write("Hello, World!", p);
is all you need. This will write as UTF_8, which can handle persian symbols (because the Files API defaults to UTF-8 if you specify no encoding, unlike e.g. new File, FileOutputStream, FileWriter, etc).
If you're using outdated APIs: new BufferedWriter(new OutputStreamWriter(new FileOutputStream(thePath), StandardCharsets.UTF-8) - but note that this is a resource leak bug unless you add the appropriate try-with-resources.
If you're using FileWriter: FileWriter is broken, never use this class. Use something else.
If you're converting the string on its own, it's str.getBytes(StandardCharsets.UTF_8), not str.getBytes().

Merging PDFs with Sejda fails with stream output

Using Sejda 1.0.0.RELEASE, I basically followed the tutorial for splitting a PDF but tried merging instead (org.sejda.impl.itext5.MergeTask, MergeParameters, ...). All works great with the FileTaskOutput:
parameters.setOutput(new FileTaskOutput(new File("/some/path/merged.pdf")));
However I am unable to change this to StreamTaskOutput correctly:
OutputStream os = new FileOutputStream("/some/path/merged.pdf");
parameters.setOutput(new StreamTaskOutput(os));
parameters.setOutputName("merged.pdf");
No error is reported, but the resulting file cannot be read by Preview.app and is approximately 31 kB smaller (out of the ~1.2 MB total result) than the file saved above.
My first idea was: stream is not being closed properly! So I added os.close(); to the end of CompletionListener, still the same problem.
Remarks:
The reason I need to use StreamTaskOutput is that this merge logic will live in a web app, and the merged PDF will be sent directly over HTTP. I could store the temporary file and serve that one, but that is a hack.
Due to licencing issues, I cannot use the iText 5 version of the task.
Edit
Turns out, the reason is that StreamTaskOutput zips the result into a ZIP file! OutputWriterHelper.copyToStream() is the culprit. If I rename merged.pdf to merged.zip, it's a valid ZIP file containing a perfectly valid merged.pdf file!
Could anyone (dear authors of the library) comment on why this is happening?

The idea is that when a task consumes a MultipleOutputTaskParameters producing multiple output documents, the StreamTaskOutput has to group them to be able to write all of them to a stream output. Unfortunately Sejda currently applies the same logic to SingleOutputTaskParameters, hence your issue. We can fix this in Sejda 2.0 because it makes more sense to directly stream the out document in case of SingleOutputTaskParameters. For Sejda 1.x I'm not sure how to address this remaining compatible with the existing behaviour.

Weka CSV loader limit

I'm using Weka for a sentiment analysis project i'm working on. I'm using Weka CSV Loader to load the training instances from a csv file, but for some reason if i want to load more than 70 instances, the program gives me an "java.lang.ArrayIndexOutOfBoundsException: 2" exception. I found that u can give options to Weka CSV Loader
-B
The size of the in memory buffer (in rows).
(default: 100)
this one beeing maybe the one i need to set, to get rid of this error, but i'm not sure how to do this from a Java project. If anyone can help me with this, i would appreciate it greatly
UPDATE: The buffer size change didn't help the problems comes from somewhere else
How i'm using the loader:
private void getTrainingDataset(final String INPUT_FILENAME)
{
try{
//reading the training dataset from CSV file
CSVLoader trainingLoader =new CSVLoader();
trainingLoader.setSource(new File(INPUT_FILENAME));
inputDataset = trainingLoader.getDataSet();
}catch(IOException ex)
{
System.out.println("Exception in getTrainingDataset Method");
}
}
UPDATE: for those who want to know where the exception occurs
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at weka.core.converters.CSVLoader.getInstance(CSVLoader.java:1251)
at weka.core.converters.CSVLoader.readData(CSVLoader.java:866)
at weka.core.converters.CSVLoader.readHeader(CSVLoader.java:1150)
at weka.core.converters.CSVLoader.getStructure(CSVLoader.java:924)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:836)
at sentimentanalysis.SentimentAnalysis.getTrainingDataset(SentimentAnalysis.java:209)
at sentimentanalysis.SentimentAnalysis.trainClassifier(SentimentAnalysis.java:134)
at sentimentanalysis.SentimentAnalysis.main(SentimentAnalysis.java:282)
UPDATE: Even for under 70 instances, after a few, the Classifier also gives an error. Everything works fine for like 10-20 instances but it all goes to shit for more :)

Weka read CSV two times, first pass limited to buffersize (in rows) to extract classes of nominal attributes, the second pass read the entire file.
the classes of each nominal attribute much match the classes of the training set (no more, no less).
increase the value of the buffersize to more than the number of rows
if still an error occurs then look for a class that it is not in the both files.

Trailing null (\x00) characters when writing text to Accumulo

I am trying to write the name of a file into Accumulo. I am using accumulo-core-1.43.
For some reason, certain files seem to be written into Accumulo with trailing \x00 characters at the end of the name. The upload is coming through a Java servlet (using the jquery file upload plugin). In the servlet, I check the name of the file with a System.out.println and it looks normal, and I even tried unescaping the string with
org.apache.commons.lang.StringEscapeUtils.unescapeJava(...);
The actual writing to accumulo looks like this:
Mutation mut = new Mutation(new Text(checkSum));
Value val = new Value(new Text(filename).getBytes());
long timestamp = System.currentTimeMillis();
mut.put(new Text(colFam), new Text(EMPTY_BYTES), timestamp, val);
but nothing unusual showed up there (perhaps \x00 isn't escaped)? But then if I do a scan on my table in accumulo, there will be one or more \x00 in the file name.
The problem this seems to cause is that I return that string within XML when I retrieve a list of files (where it shows up) and pass that back to the browser, the the XSL that is supposed to render the information in the XML no longer works when there's these extra characters (not sure why that is the case either).
In chrome, for the response on these calls, I see that there's three red dots after the file name, and when I hover over it, \u0 pops up (which I think is a different representation of 0/null?).
Anyway, I'm just trying to figure out why this happens, or at the very least, how I can filter out \x00 characters before returning the file in Java. any ideas?

You are likely incorrectly using the Hadoop Text class -- this is not an error with Accumulo. Specifically, you make the mistake in your above example:
Value val = new Value(new Text(filename).getBytes());
You must adhere to the length of provided by the Text class. See the Text javadoc for more information. If you're using Hadoop-2.2.0, you can use the provided copyBytes method on Text. If you're on older version of Hadoop where this method doesn't yet exist, you can use something like the ByteBuffer class or the System.arraycopy method to get a copy of the byte[] with the proper limits enforced.

Unable to print PNG files using Java Print Services (Everything else works fine)

I am using the Java print services to print a PNG file, however it is sending erroneous output to the printer. What actually gets printed (when I use a PNG) is some text saying:
ERROR: /syntaxerror in --%ztokenexec_continue--
Operand stack:
--nostringval-
There seems to be some more text, but that is kind of lost out of the page margins. I am setting the DocFlavor to DocFlavor.INPUT_STREAM.PNG and the specified file is actually an InputStream (Just changing the DoccFlavor to DocFlavor.INPUT_STREAM.PDF and using a pdf file works).
I have also tried it with different PNG files, but the problem persists. For what its worth, even PostScript seems to be working.
The errors that are being printed look quite similar to the gd (or ImageMagick?) errors. So, my best assumption right now is that the conversion from PNG -> PS is failing.
The code is as follows:
PrintService printService = this.getPrintService("My printer name");
final Doc doc = new SimpleDoc(document, DocFlavor.INPUT_STREAM.PNG, null);
final DocPrintJob printJob = printService.createPrintJob();
Here, getPrintService fetches a print service and is fetching a valid one. As for the document, here is how I get it:
File pngFile = new File("/home/rprabhu/temp/myprintfile.png");
FileInputStream document = new FileInputStream(pngFile);
I have no clue why it is going wrong, and I don't see any errors being output to the console as well.
Any help is greatly appreciated. Thanks.

Printing is always a messy business – inevitably so, because you have to worry about tedious details such as the size of a page, the margin sizes, and how many pages you're going to need for your output. As you might expect, the process for printing an image is different from printing text and you may also have the added complication of several printers with different capabilities being available, so with certain types of documents you need to select an appropriate printer.
Please see below links :
http://vineetreynolds.wordpress.com/2005/12/12/silent-print-a-pdf-print-pdf-programmatically/
http://hillert.blogspot.com/2011/12/java-print-service-frustrations.html

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.