I am creating a UI where I ask the user to upload an existing Excel file, and I am getting this error. I have done some searching and tried to use a POIFSFileSystem object before passing in the FileInputStream, but that didn't help. The error occurs on the line that creates the workbook.
This is my code:
public static Cell readExcelFile(byte[] byteFile) throws IOException {
    if (byteFile == null) {
        System.out.println("file is empty");
        return null;
    } else {
        InputStream input = new ByteArrayInputStream(byteFile);
        // The POI class is POIFSFileSystem (org.apache.poi.poifs.filesystem)
        POIFSFileSystem fsPOI = new POIFSFileSystem(input);
        HSSFWorkbook wb = new HSSFWorkbook(fsPOI);
        // continues to read the file
    }
}
I found an answer! As you can see from the code above, I pass my file in as a byte array. Before that, in my POST request, I upload the file as a string and decode it from Base64 into a byte array, and that was where the error occurred: the string wasn't being decoded correctly. I printed out the string version of the file from the POST endpoint, and it consisted of "data:application/vnd.ms-excel;base64,OMBR4K" (followed by the rest of the file data encoded in Base64, which I'm not going to type out in full). Basically, it wasn't decoding correctly because everything before the comma is a plain-text header, not Base64 data.
I split the string, saved everything after the comma in another string, converted that into a byte array and decoded it. Then my file was accepted and I was able to create the workbook. I also had to use .getSheetAt(int) instead of .getSheet("string").
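For anyone hitting the same thing, the fix can be sketched roughly like this. It is only a minimal sketch using java.util.Base64 from the standard library; DataUrlDecoder and decode are made-up names, and it assumes the upload arrives either as a "data:...;base64," string or as bare Base64 with no line breaks.

```java
import java.util.Base64;

final class DataUrlDecoder {
    // Hypothetical helper: strips an optional "data:<mime>;base64," header
    // and decodes the remaining Base64 payload into raw bytes.
    static byte[] decode(String uploaded) {
        int comma = uploaded.indexOf(',');
        // Only treat the text before the comma as a header if it looks like a data URL.
        String payload = (uploaded.startsWith("data:") && comma >= 0)
                ? uploaded.substring(comma + 1)
                : uploaded;
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // "AQID" is the Base64 form of the three bytes 1, 2, 3.
        byte[] bytes = decode("data:application/vnd.ms-excel;base64,AQID");
        System.out.println(bytes.length); // prints 3
    }
}
```

The resulting byte array is what you would then hand to readExcelFile.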
I need to read an Excel (.xls) file that I'm receiving.
I tried the usual charsets (UTF-8, Cp1252, ISO-8859-1, UTF-16LE), but none of them helped; the characters are still malformed.
So I searched and ended up using juniversalchardet. It reported the charset as MacCyrillic, so I used MacCyrillic to read the file, but I still got the same weird outcome.
When I open the file in Excel everything is fine and all the characters are correct; since it's Portuguese, it's full of Ç, ~ and such. But when I open it with Notepad or through Java, the file is all messed up.
However, if I open the file in Excel and then save it again as .txt, it becomes readable.
My method to find the charset:
public static void lerCharset(String fileName) throws IOException {
    byte[] buf = new byte[50000000];
    FileInputStream fis = new FileInputStream(fileName);
    // (1)
    UniversalDetector detector = new UniversalDetector(null);
    // (2)
    int nread;
    while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
        detector.handleData(buf, 0, nread);
    }
    // (3)
    detector.dataEnd();
    // (4)
    String encoding = detector.getDetectedCharset();
    if (encoding != null) {
        System.out.println("Detected encoding = " + encoding);
    } else {
        System.out.println("No encoding detected.");
    }
    // (5)
    detector.reset();
    fis.close();
}
How can I discover the correct charset?
Should I try a different approach, like making my Java code re-save the Excel file and then start reading?
If I'm understanding your question, you're trying to read the Excel file as if it were a text file.
The challenge is that .xls files are actually binary files containing the text, formatting, sheet information, macro information, etc.
You'd need to either save the files as .csv (via Excel before running your program, or through your program directly), upgrade them to .xlsx (a zipped XML format that numerous libraries can read), use a library (such as Apache POI or anything similar), or even query the data out using ADO.
Good luck, and I hope that's what you meant by your question.
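One quick way to convince yourself that a legacy .xls file is binary rather than text is to look at its first eight bytes: the OLE2 compound document container normally used by .xls starts with a fixed signature. A minimal sketch follows; XlsSignature and looksLikeOle2 are made-up names, and a match only tells you the container is OLE2, not that the workbook inside is valid.

```java
import java.util.Arrays;

final class XlsSignature {
    // First eight bytes of an OLE2 compound document, the container used by legacy .xls files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Hypothetical helper: true when the content starts with the OLE2 signature.
    static boolean looksLikeOle2(byte[] content) {
        return content.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(content, OLE2_MAGIC.length), OLE2_MAGIC);
    }
}
```

If this returns true, no charset will make the raw bytes readable as text, and a spreadsheet library is the way to go.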
Code:
WorkbookSettings workbookSettings = new WorkbookSettings();
workbookSettings.setEncoding("Cp1252");
I'm working with Java's encryption library and getting an IllegalBlockSizeException.
I am currently trying to extract database contents in XML file format. During the data dump, I am creating a manifest file with a string that gets encrypted using a key defined in the database.
Later, when the contents of the XML files are loaded into another database, it gets the key from that database and uses it to decrypt the manifest. If the decrypted manifest does not match the original contents, that means the encryption keys in the source and destination databases do not match, and the user is notified of this.
The following is the code. The EncryptionEngine object is a singleton that uses the Java encryption library to abstract away a lot of the details of encryption. Assume that it works correctly, as it's fairly old and mature code.
This is all in a class I've made. First, we have these data members:
private final String encryptedManifestContents;
private final static String DECRYPTED_MANIFEST_CONTENTS = "This file contains the encrypted string for validating data in the dump and load process";
final static String ENCRYPTED_MANIFEST_FILENAME = "manifest.bin";
First the encryption process. The string is encrypted like the following:
final EncryptionEngine encryptionEngine = EncryptionEngine.getInstance();
encryptedManifestContents = encryptionEngine.symmetricEncrypt(DECRYPTED_MANIFEST_CONTENTS); // The contents get converted to bytes via getBytes("UTF-8")
Then written to the manifest file (destination is just a variable holding the file path as a string):
EncryptedManifestUtil encryptedManifestUtil = new EncryptedManifestUtil(); // The class I've created. The constructor is the code above, which just initialized the EncryptionEngine and encrypted the manifest string.
manifestOut = new FileOutputStream(destination + "/" + ENCRYPTED_MANIFEST_FILENAME);
manifestOut.write(encryptedManifestUtil.encryptedManifestContents.getBytes("UTF-8"));
At this point, the encryption process is done. We've taken a String, encrypted it, and written the contents to a file, in that order. Now when someone loads the data, the decryption process starts:
BufferedReader fileReader = new BufferedReader(new FileReader(filename)); // Filename is the manifest's file name and location
final EncryptionEngine encryptionEngine = EncryptionEngine.getInstance();
String decryptedManifest = encryptionEngine.decryptString(fileReader.readLine().getBytes("UTF-8")); // This is a symmetric decrypt
When the decryption happens, it throws this exception:
Caused by: javax.crypto.IllegalBlockSizeException: last block incomplete in decryption
at org.bouncycastle.jcajce.provider.symmetric.util.BaseBlockCipher.engineDoFinal(Unknown Source)
at javax.crypto.Cipher.doFinal(DashoA13*..)
It appears to read and write correctly to the file, but the contents are gibberish to me. The result from the fileReader.readLine() is:
9�Y�������䖷�߾��=Ă��� s7Cx�t�b��_-(�b��LFA���}�6�f����Ps�n�����ʢ�#�� �%��%�5P�p
Thanks for the help.
EDIT: So I changed the way I write to a file.
Recall this line:
encryptedManifestContents = encryptionEngine.symmetricEncrypt(DECRYPTED_MANIFEST_CONTENTS);
The encrypt method first gets the bytes from the input string, then encrypts them, then Base64-encodes the encrypted bytes, and finally converts the Base64 byte array back to a string.
With this in mind, I changed the file writer to a PrintWriter instead of a FileOutputStream and now write the string directly to the file instead of the bytes. Unfortunately, I'm still getting the error. However, there seem to be fewer of the � characters in the resulting String from the read line.
It looks like the problem is with your fileReader.readLine() - you're writing a byte stream to a file, and then reading it back in as a string. Instead, you should either read in a byte stream, e.g. refer to this question, or else use Base64 Encoding to convert your byte array to a string, write it to a file, read it from a file, and convert it back to a byte array.
I believe you are incorrectly using a Reader, which is an object designed to read characters, when you actually want to be dealing strictly in bytes. This is most likely not the entirety of your problem, but if you are writing bytes you should read bytes, not characters.
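To illustrate what both answers are getting at, here is a minimal sketch using only the standard library. BinaryRoundTrip and its two methods are made-up names, and the byte array stands in for whatever your EncryptionEngine produces. The first method pushes the bytes through Base64 text and a file; the second reproduces the bytes -> String -> bytes pattern that mangles arbitrary binary data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class BinaryRoundTrip {
    // Writes the bytes to the file as Base64 text, reads the text back, and decodes it.
    static byte[] viaBase64(Path file, byte[] cipherText) throws IOException {
        String encoded = Base64.getEncoder().encodeToString(cipherText);
        Files.write(file, encoded.getBytes(StandardCharsets.US_ASCII));
        String read = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        return Base64.getDecoder().decode(read.trim());
    }

    // The problematic pattern: interpreting arbitrary bytes as UTF-8 text.
    // Byte sequences that are not valid UTF-8 are replaced, so the result can differ.
    static byte[] viaUtf8String(byte[] cipherText) {
        String asText = new String(cipherText, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }
}
```

The Base64 path returns exactly the bytes it was given, so the block size seen by the cipher is unchanged. The UTF-8 path generally does not, which would explain the "last block incomplete" error.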
I want to directly read a file, put it into a string without storing the file locally. I used to do this with an old project, but I don't have the source code anymore. I used to be able to get the source of my website this way.
However, I don't remember if I did it by "InputStream to String array of lines to String", or if I directly read it into a String.
Was there a function for this, or am I remembering wrong?
(Note: this function would be the PHP equivalent of file_get_contents($path))
You need to use InputStreamReader to convert from a binary input stream to a Reader which is appropriate for reading text.
After that, you need to read to the end of the reader.
Personally I'd do all this with Guava, which has convenience methods for this sort of thing, e.g. CharStreams.toString(Readable).
When you create the InputStreamReader, make sure you supply the appropriate character encoding - if you don't, you'll get junk text out (just like trying to play an MP3 file as if it were a WAV, for example).
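Without Guava, the same idea can be sketched with just the standard library. This is only a minimal sketch; StreamText and readAll are made-up names, and the caller is assumed to know the correct charset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class StreamText {
    // Decodes the whole stream with the given charset and returns it as one String.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[8192];
        // Closing the reader also closes the underlying stream.
        try (Reader reader = new InputStreamReader(in, charset)) {
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

For a website, the stream could come from something like new URL(address).openStream(), and the charset from the response headers if they specify one.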
Check out Apache Commons IO and, for your use case, FileUtils.readFileToString(File file) (it should not be too hard to get a File from the path).
You can use the library or have a look at the code, as it is open source.
There is no direct way to read a File into a String.
But there is a quick alternative - read the File into a Byte array and convert it into a String.
Untested:
File f = new File("/foo/bar");
ByteArrayOutputStream bStream = new ByteArrayOutputStream();
// try-with-resources closes the file stream even if reading fails
try (InputStream fStream = new FileInputStream(f)) {
    for (int data = fStream.read(); data > -1; data = fStream.read()) {
        bStream.write(data);
    }
}
String theResult = new String(bStream.toByteArray(), "UTF-8");
I'm reading a file line by line, like this:
FileReader myFile = new FileReader(file); // file is the uploaded java.io.File
BufferedReader InputFile = new BufferedReader(myFile);
// Read the first line
String currentRecord = InputFile.readLine();
while (currentRecord != null) {
    currentRecord = InputFile.readLine();
}
But if other types of files are uploaded, it will still read their contents. For instance, if the uploaded file is an image, it will output junk characters when reading the file. So my question is: how can I check the file is CSV for sure before reading it?
Checking extension of the file is kind of lame since someone can upload a file that is not CSV but has a .csv extension. Thanks in advance.
Determining the MIME type of a file is not something easy to do, especially if ASCII sections can be mixed with binary ones.
Actually, when you look at how a Java mail system determines the MIME type of an email, it involves reading all the bytes in it and applying some "rules".
Check out MimeUtility.java
If the primary type of this datasource is "text" and if all the bytes in its input stream are US-ASCII, then the encoding is "7bit".
If more than half of the bytes are non-US-ASCII, then the encoding is "base64".
If less than half of the bytes are non-US-ASCII, then the encoding is "quoted-printable".
If the primary type of this datasource is not "text", then if all the bytes of its input stream are US-ASCII, the encoding is "7bit".
If there is even one non-US-ASCII character, the encoding is "base64".
@return "7bit", "quoted-printable" or "base64"
As mentioned by mmyers in a deleted comment, JavaMimeType is supposed to do the same thing, but:
it has been dead since 2006
it does involve reading the whole content:
File file = new File("/home/bibi/monfichieratester");
InputStream inputStream = new FileInputStream(file);
ByteArrayOutputStream byteArrayStream = new ByteArrayOutputStream();
int readByte;
while ((readByte = inputStream.read()) != -1) {
    byteArrayStream.write(readByte);
}
String mimetype = "";
byte[] bytes = byteArrayStream.toByteArray();
MagicMatch m = Magic.getMagicMatch(bytes);
mimetype = m.getMimeType();
So... since you are reading the whole content of the file anyway, you could take advantage of that to determine the type based on that content and your own rules.
Java Mime Magic may be of use. It'll analyse MIME types from files and input streams. I can't vouch for its functionality, however.
This link may provide further info. It provides several different means of determining how to do what you want (or at least something similar).
I would perhaps be tempted to write something specific to your problem domain. e.g. determining the number of comma-separated values per line and rejecting if it's not within certain limits. Then split on the commas and parse each entry according to requirements (e.g. are they doubles/floats/valid Strings - and if strings, what encoding). I think you may have to do this anyway, given that someone may upload a file that starts like a CSV but is corrupted half-way through.
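A domain-specific check along those lines could be sketched like this. It is only a rough sketch: CsvShapeCheck and looksLikeCsv are made-up names, the limits are whatever your data requires, and it deliberately ignores quoted fields that contain commas, so a real CSV parser would be needed for those.

```java
import java.io.BufferedReader;
import java.io.IOException;

final class CsvShapeCheck {
    // Returns true when every non-empty line has the same number of comma-separated
    // fields, that number lies within [minFields, maxFields], and no line contains
    // a NUL character (a cheap hint that the content is binary).
    static boolean looksLikeCsv(BufferedReader reader, int minFields, int maxFields)
            throws IOException {
        int expected = -1;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            if (line.indexOf('\0') >= 0) {
                return false;
            }
            // The -1 limit keeps trailing empty fields, e.g. "a,b," counts as 3.
            int fields = line.split(",", -1).length;
            if (fields < minFields || fields > maxFields) {
                return false;
            }
            if (expected == -1) {
                expected = fields;
            } else if (fields != expected) {
                return false;
            }
        }
        // A file with no data lines at all is rejected too.
        return expected != -1;
    }
}
```

Parsing each field according to its expected type would be the next step after this shape check passes.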
This problem seems to happen inconsistently. We are using a java applet to download a file from our site, which we store temporarily on the client's machine.
Here is the code that we are using to save the file:
URL targetUrl = new URL(urlForFile);
InputStream content = (InputStream)targetUrl.getContent();
BufferedInputStream buffered = new BufferedInputStream(content);
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int letter;
while((letter = buffered.read()) != -1)
fos.write(letter);
fos.close();
Later, I try to access that file by using:
ObjectInputStream keyInStream = new ObjectInputStream(new FileInputStream(savedFile));
Most of the time it works without a problem, but every once in a while we get the error:
java.io.StreamCorruptedException: invalid stream header: 0D0A0D0A
which makes me believe that it isn't saving the file correctly.
I'm guessing that the operations you've done with getContent and BufferedInputStream have treated the file like an ASCII file, which has converted newlines or carriage returns into carriage return + newline (0x0d0a), and that has confused ObjectInputStream (which expects serialized data objects).
If you are using an FTP URL, the transfer may be occurring in ASCII mode.
Try appending ";type=I" to the end of your URL.
Why are you using ObjectInputStream to read it?
As per the javadoc:
An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream.
Probably the error comes from the fact you didn't write it with ObjectOutputStream.
Try reading it with FileInputStream only.
Here's a sample for binary ( although not the most efficient way )
Here's another used for text files.
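To make the javadoc point above concrete, here is a minimal sketch of the pairing it describes, using only java.io. SerializationRoundTrip and its methods are made-up names. Data written with ObjectOutputStream reads back fine, while bytes that did not come from ObjectOutputStream fail the header check in the ObjectInputStream constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationRoundTrip {
    // Serializes a value the way ObjectInputStream expects to find it.
    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Throws StreamCorruptedException if the data does not start with a valid stream header.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

Passing the four bytes 0D 0A 0D 0A to deserialize fails with a StreamCorruptedException, which matches the error in the question and suggests the saved file sometimes starts with something other than serialized data.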
There are 3 big problems in your sample code:
You're not just treating the input as bytes
You're needlessly pulling the entire object into memory at once
You're doing multiple method calls for every single byte read and written -- use the array based read/write!
Here's a redo:
URL targetUrl = new URL(urlForFile);
// URL has no getInputStream(); openStream() returns the raw byte stream
InputStream is = targetUrl.openStream();
File savedFile = File.createTempFile("temp", ".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int count;
byte[] buff = new byte[16 * 1024];
// Copy in chunks instead of one byte at a time
while ((count = is.read(buff)) != -1) {
    fos.write(buff, 0, count);
}
fos.close();
is.close();
You could also step back from the code and check to see if the file on your client is the same as the file on the server. If you get both files on an XP machine, you should be able to use the FC utility to do a compare (check FC's help if you need to run this as a binary compare as there is a switch for that). If you're on Unix, I don't know the file compare program, but I'm sure there's something.
If the files are identical, then you're looking at a problem with the code that reads the file.
If the files are not identical, focus on the code that writes your file.
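If you would rather do the comparison from Java than from the command line, a minimal sketch could look like this. FileCompare and sameBytes are made-up names, and it assumes both files are small enough to load into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class FileCompare {
    // Loads both files fully into memory and compares them byte for byte.
    static boolean sameBytes(Path first, Path second) throws IOException {
        return Arrays.equals(Files.readAllBytes(first), Files.readAllBytes(second));
    }
}
```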
Good luck!