How to read a zip file with multiple encodings in Java

In my Java web application, when I upload a zip file (a thread dump), I get an InputStream in the servlet. I use the Zip4j library to unzip the file and then write the content into a file. The zip file contains text in several encodings (UTF-8, windows-1252, ISO-8859-1, ISO-8859-2, IBM424_rtl). When I open the output file, I see characters like this: Mac OS X 2 € ² ATTR ² ˜
Here is some sample code. How can I fix this issue?
// Using the Zip4j library to uncompress the ZIP input
ZipInputStream zis = new ZipInputStream(iStream);
FileOutputStream zos = new FileOutputStream("output_file.txt");
ByteArrayOutputStream out = new ByteArrayOutputStream();
LocalFileHeader localFileHeader = zis.getNextEntry();
while (localFileHeader != null) {
    if (localFileHeader.isDirectory()) {
        localFileHeader = zis.getNextEntry();
        continue;
    }
    // Appends every file entry (including any __MACOSX/._* metadata entries) to one buffer
    IOUtils.copy(zis, out);
    localFileHeader = zis.getNextEntry();
}
// No charset is given, so the bytes are decoded with the platform default charset
InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(out.toByteArray()));
BufferedReader reader = new BufferedReader(isr);
String str;
while ((str = reader.readLine()) != null) {
    // Custom method that returns the charset of the input string using the Apache Tika library
    String encoding = CharsetDetector.detectCharset(str);
    zos.write(str.getBytes(encoding));
    zos.write("\n".getBytes());
}
isr.close();
reader.close();
zos.close();
zis.close();
// Method used to detect the charset
public static String detectCharset(String text) throws IOException {
    org.apache.tika.parser.txt.CharsetDetector detector = new org.apache.tika.parser.txt.CharsetDetector();
    // text.getBytes() re-encodes the already-decoded string with the platform default charset
    detector.setText(text.getBytes());
    String charset = detector.detect().getName();
    return charset;
}
Note: I am running the application on a Windows machine.
Thanks in advance!
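One possible approach, sketched here under assumptions that go beyond the question: the "Mac OS X ... ATTR" text looks like macOS metadata (the __MACOSX/ folder and AppleDouble "._" files) that was copied out along with the real files, and the code above concatenates all entries and decodes them with a single charset. The sketch below (class and method names are hypothetical) skips those metadata entries and decodes each entry on its own. In place of Tika it uses a strict UTF-8 check with a windows-1252 fallback, which is an assumed stand-in; it cannot separate several encodings mixed inside one entry, and a charset such as IBM424_rtl would still need a real detector.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class PerEntryDecoder {

    // Assumed fallback for bytes that are not valid UTF-8; pick what fits your data
    static final Charset FALLBACK = Charset.forName("windows-1252");

    // Decodes strictly as UTF-8 and falls back to FALLBACK if the bytes are not valid UTF-8
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, FALLBACK);
        }
    }

    // True for macOS metadata: anything under __MACOSX/ and AppleDouble "._" files
    static boolean isMacMetadata(String name) {
        String base = name.substring(name.lastIndexOf('/') + 1);
        return name.startsWith("__MACOSX/") || base.startsWith("._");
    }

    // Reads every file entry and decodes it separately, keyed by entry name
    static Map<String, String> readTextEntries(InputStream in) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory() || isMacMetadata(entry.getName())) {
                    continue;
                }
                // readAllBytes() stops at the end of the current entry
                result.put(entry.getName(), decode(zis.readAllBytes()));
            }
        }
        return result;
    }
}
```

Each decoded string can then be written to the output file with one explicit charset (for example UTF-8), so the output no longer depends on the Windows default.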

Related

How to decode / get the encoding of a file (Power BI Desktop file)

I have the internal file (DataMashup) of a Power BI Desktop report (.pbix), which I am trying to decode.
My aim is to create a Power BI Desktop report and data model programmatically, in any programming language; I am starting with Java.
The files seem to be encoded with some encoding technique.
When I try to detect the file's encoding, it returns windows-1254, but decoding with that charset does not produce readable content.
File f = new File("example.txt");
String[] charsetsToBeTested = {"UTF-8", "windows-1254", "ISO-8859-7"};
CharsetDetector cd = new CharsetDetector();
Charset charset = cd.detectCharset(f, charsetsToBeTested);
if (charset != null) {
    try {
        InputStreamReader reader = new InputStreamReader(new FileInputStream(f), charset);
        int c = 0;
        while ((c = reader.read()) != -1) {
            System.out.print((char) c);
        }
        reader.close();
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
} else {
    System.out.println("Unrecognized charset.");
}
Unzipping the file does not work either:
public void unZipIt(String zipFile, String outputFolder) {
    byte buffer[] = new byte[1024];
    try {
        File folder = new File(outputFolder);
        if (!folder.exists()) {
            folder.mkdir();
        }
        ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile));
        System.out.println(zis);
        // Note: this debug call consumes the first entry, so the loop below starts at the second one
        System.out.println(zis.getNextEntry());
        for (ZipEntry ze = zis.getNextEntry(); ze != null; ze = zis.getNextEntry()) {
            String fileName = ze.getName();
            System.out.println(ze);
            File newFile = new File(outputFolder + File.separator + fileName);
            System.out.println("file unzip : " + newFile.getAbsoluteFile());
            new File(newFile.getParent()).mkdirs();
            FileOutputStream fos = new FileOutputStream(newFile);
            int len;
            while ((len = zis.read(buffer)) > 0) {
                fos.write(buffer, 0, len);
            }
            fos.close();
        }
        zis.closeEntry();
        zis.close();
        System.out.println("Done");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
The file contains a binary header and then XML with UTF-8 specified.
The header data seems to contain a file name (Config/Package.xml), so assuming a zip format is reasonable. With a zip format there would also be binary data (the zip central directory) at the end of the file.
Maybe the file was downloaded via FTP and a text conversion ("\n" to "\r\n") was applied; that would corrupt the zip. Renaming the file to .zip may let you test it with zip tools.
First try the .tar format, which would be logical since the XML is not compressed: add .tar to the file name.
Otherwise, if the content is always UTF-8 XML:
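import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Assumes the XML is UTF-8 and runs from the first "<?xml" to the last ">"
Path f = Paths.get("example.txt");
String start = "<?xml";
String end = ">";
byte[] bytes = Files.readAllBytes(f);
// ISO-8859-1 is a single-byte encoding: one char per byte, so char indices equal byte offsets
String s = new String(bytes, StandardCharsets.ISO_8859_1);
int startI = s.indexOf(start);
if (startI < 0) throw new IllegalStateException("No XML declaration found");
int endI = s.lastIndexOf(end) + end.length();
// Decode only the XML byte range, as UTF-8
String xml = new String(bytes, startI, endI - startI, StandardCharsets.UTF_8);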
You can use the System.IO.Packaging library (.NET) to extract the Power BI data mashup, since the mashup uses the Open Packaging Conventions (OPC) package standard.
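Java has no System.IO.Packaging, but OPC packages are ZIP archives. If, as the first answer suspects, the DataMashup content is a zip package behind a binary header, one way to check this in Java is to search for the zip local-file-header signature ("PK\u0003\u0004") and read entries from that offset. This layout is an assumption, not something confirmed here, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class EmbeddedZipReader {

    // Zip local file header signature: 'P' 'K' 0x03 0x04
    private static final byte[] LOCAL_HEADER = {0x50, 0x4B, 0x03, 0x04};

    // Returns the offset of the first local file header, or -1 if there is none
    static int findZipStart(byte[] data) {
        outer:
        for (int i = 0; i + LOCAL_HEADER.length <= data.length; i++) {
            for (int j = 0; j < LOCAL_HEADER.length; j++) {
                if (data[i + j] != LOCAL_HEADER[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Lists the entry names of the zip data that follows the binary header
    static List<String> listEntries(byte[] data) throws IOException {
        int start = findZipStart(data);
        if (start < 0) {
            throw new IOException("No zip local file header found");
        }
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(data, start, data.length - start))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

If listEntries finds Config/Package.xml, the entry's bytes can be read from the same ZipInputStream and decoded as UTF-8.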

Blank pages in PDF after downloading it from the web

I am trying to download a PDF file with HttpClient. The file is downloaded, but its pages are blank. I can see the response bytes if I print them to the console, but when I write them to a file, the result is a blank PDF.
FileUtils.writeByteArrayToFile(new File(outputFilePath), bytes);
The files have the correct sizes (103 KB and 297 KB, as expected), but the pages are just blank!
I also tried an OutputStream:
FileOutputStream fileOutputStream = new FileOutputStream(outFile);
fileOutputStream.write(bytes);
I also tried writing with UTF-8 encoding:
Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(outFile), "UTF-8"));
String str = new String(bytes, StandardCharsets.UTF_8);
try {
    out.write(str);
} finally {
    out.close();
}
Nothing works for me. Any suggestion is highly appreciated.
Update: I am using DefaultHttpClient.
HttpGet httpget = new HttpGet(targetURI);
HttpResponse response = null;
String htmlContents = null;
try {
    httpget = new HttpGet(url);
    response = httpclient.execute(httpget);
    InputStreamReader dataStream = new InputStreamReader(response.getEntity().getContent());
    byte[] bytes = IOUtils.toByteArray(dataStream);
    ...
You do:
InputStreamReader dataStream = new InputStreamReader(response.getEntity().getContent());
byte[] bytes = IOUtils.toByteArray(dataStream);
As has already been mentioned in the comments, using a Reader class can damage binary data such as PDF files: the InputStreamReader decodes the bytes into characters with a charset (here the platform default), and IOUtils.toByteArray encodes those characters back into bytes, so byte sequences that are not valid in that charset do not survive the round trip. Thus, you should not wrap your content in an InputStreamReader.
Since your content can be used to construct an InputStreamReader, response.getEntity().getContent() returns an InputStream. Such an InputStream can be passed directly to IOUtils.toByteArray.
So:
InputStream dataStream = response.getEntity().getContent();
byte[] bytes = IOUtils.toByteArray(dataStream);
should already work for you!
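To make the difference concrete, here is a small sketch (class and method names are hypothetical, not from the answer). textRoundTrip mimics a Reader/String conversion, using UTF-8 as the assumed charset; with another default charset the exact damage differs. save copies an InputStream, which in the question's code would be response.getEntity().getContent(), to disk as raw bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class BinarySave {

    // Copies the raw bytes to the file without any charset conversion
    static void save(InputStream content, Path target) throws IOException {
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Simulates the Reader/String detour: decode as UTF-8, then encode again
    static byte[] textRoundTrip(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // A PDF file normally starts with "%PDF-"
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

PDF writers commonly put a comment line of high bytes (such as 0xE2 0xE3 0xCF 0xD3) near the start of the file; those bytes are not valid UTF-8, so the round trip replaces them, while save keeps them intact.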
Here is a method I use to download a PDF file from a specific URL. It takes two string arguments: a URL string (example: "https://www.ibm.com/support/knowledgecenter/SSWRCJ_4.1.0/com.ibm.safos.doc_4.1/Planning_and_Installation.pdf") and the path of the destination folder to download the PDF file (or any other file) into. If the destination folder does not exist in the local file system, it is created automatically:
public boolean downloadFile(String urlString, String destinationFolderPath) {
    boolean result = false; // will turn to true if the download is successful
    if (!destinationFolderPath.endsWith("/") && !destinationFolderPath.endsWith("\\")) {
        destinationFolderPath += "/";
    }
    // If the destination path does not exist, create it.
    File foldersToMake = new File(destinationFolderPath);
    if (!foldersToMake.exists()) {
        foldersToMake.mkdirs();
    }
    try {
        // Open connection
        URL url = new URL(urlString);
        // Get just the file name from the URL
        String fileName = new File(url.getPath()).getName();
        // Try with resources: both streams are closed automatically
        try (InputStream in = url.openStream();
             FileOutputStream outStream = new FileOutputStream(new File(destinationFolderPath + fileName))) {
            // Read raw bytes from the resource and write them to the file
            int length = -1;
            byte[] buffer = new byte[1024]; // buffer for a portion of data from the connection
            while ((length = in.read(buffer)) > -1) {
                outStream.write(buffer, 0, length);
            }
        }
        // File successfully downloaded
        result = true;
    } catch (MalformedURLException ex) {
        ex.printStackTrace();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    return result;
}
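On Java 11 or later, the built-in java.net.http.HttpClient can do the same without a manual buffer loop. This is a sketch (the class name HttpDownloader is made up) that writes the response body directly to a file as raw bytes and, unlike the method above, treats a non-200 status as a failure instead of saving an error page under the PDF's name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class HttpDownloader {

    // Plain HTTP/1.1 keeps the example simple; redirects are followed except HTTPS to HTTP
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Streams the body to 'target' as bytes; deletes the file and throws on a non-200 status
    static Path download(URI uri, Path target) throws IOException, InterruptedException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<Path> response = CLIENT.send(request,
                HttpResponse.BodyHandlers.ofFile(target,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));
        if (response.statusCode() != 200) {
            Files.deleteIfExists(target);
            throw new IOException("HTTP " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }
}
```

Usage would look like HttpDownloader.download(URI.create(urlString), Path.of(destinationFolderPath, "file.pdf")), where the file name is whatever you choose.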

My Java code saves a .txt file in a different encoding (ANSI, UTF, ...) depending on the operating system

I am saving a .txt file from Java code. On a Windows 7 machine the file is encoded as ANSI, but when I do the same on Windows Server 2000 the file is saved as UTF.
In further tests I found that on Windows Server 2000 the encoding changes from one run to the next, even though the code is unchanged.
I save the file inside a zip file; the code is below (I also replaced "Cp1252" with "ISO-8859-1", with the same result):
public byte[] getBytesZipFile(String nombreFichero, String input) throws IOException {
    String tempdir = System.getProperty("java.io.tmpdir");
    if (!(tempdir.endsWith("/") || tempdir.endsWith("\\"))) {
        tempdir = tempdir + System.getProperty("file.separator");
    }
    File tempFile = new File(tempdir + nombreFichero + ".txt");
    try {
        BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(tempFile), "Cp1252"));
        bufferedWriter.write(input);
        bufferedWriter.flush();
        bufferedWriter.close();
        ByteArrayOutputStream byteArrayOutputStreambos = new ByteArrayOutputStream();
        ZipOutputStream zipOutputStream = new ZipOutputStream(byteArrayOutputStreambos);
        FileInputStream fileInputStream = new FileInputStream(tempFile);
        zipOutputStream.putNextEntry(new ZipEntry(tempFile.getName()));
        byte[] buf = new byte[1024];
        int len;
        while ((len = fileInputStream.read(buf)) > 0) {
            zipOutputStream.write(buf, 0, len);
        }
        zipOutputStream.closeEntry();
        fileInputStream.close();
        zipOutputStream.flush();
        zipOutputStream.close();
        return byteArrayOutputStreambos.toByteArray();
    } finally {
        tempFile.delete();
    }
}
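Thanks for your help and answers. Regards
It is because of the default encoding of the JVM. Note that the writer in this code already names its charset ("Cp1252"), so the default comes into play wherever the input string was produced, for example when bytes are decoded with new String(bytes) or read through a FileReader.
Check this question for how to change the default encoding: Setting the default Java character encoding?
And see this external article on writing a file in a specific encoding: http://www.mkyong.com/java/how-to-write-utf-8-encoded-data-into-a-file-java/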
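Along the lines of naming the charset explicitly, here is a sketch (names are hypothetical) that encodes the text straight into the zip entry, which also removes the temporary file. When comparing the two machines, printing Charset.defaultCharset() on each shows whether their defaults differ. Also, text containing only ASCII characters produces identical bytes in Cp1252, ISO-8859-1 and UTF-8, so an editor has to guess and may label such a file either way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipTextWriter {

    // Writes 'text' as a single zip entry encoded with an explicit charset, without a temp file
    static byte[] zipText(String entryName, String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(entryName));
            // Flush but do not close this writer: closing it would close the ZipOutputStream early
            Writer writer = new OutputStreamWriter(zos, charset);
            writer.write(text);
            writer.flush();
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads back the raw bytes of the first entry, to check what was actually written
    static byte[] firstEntryBytes(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("Empty zip");
            }
            return zis.readAllBytes();
        }
    }
}
```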

NegativeArraySizeException while unzipping a ZipInputStream

I am trying to send some files from a server to a client using ZipInputStream/ZipOutputStream.
On the server everything goes well, but on the client, when I try to unzip, the size of the file is -1,
so it fails. What should I do, and why does this happen?
socket = new Socket("127.0.0.1", 3000);
String outDir = "C:\\here";
BufferedInputStream bis = new BufferedInputStream(socket.getInputStream());
ZipInputStream zips = new ZipInputStream(bis);
ZipEntry zipEntry = null;
while (null != (zipEntry = zips.getNextEntry())) {
    String fileName = zipEntry.getName();
    File outFile = new File(outDir + "/" + fileName);
    System.out.println(outFile.getName() + " " + zipEntry.getCompressedSize());
    if (zipEntry.isDirectory()) {
        File zipEntryFolder = new File(zipEntry.getName());
        if (zipEntryFolder.exists() == false) {
            outFile.mkdirs();
        }
        continue;
    } else {
        File parentFolder = outFile.getParentFile();
        if (parentFolder.exists() == false) {
            parentFolder.mkdirs();
        }
    }
    System.out.println("ZipEntry::" + zipEntry.getCompressedSize());
    FileWriter fW = new FileWriter(outFile);
    try (BufferedWriter bfW = new BufferedWriter(fW)) {
        // Note: this advances to the next entry and writes its name, not this entry's data
        bfW.write(zips.getNextEntry().toString());
    }
}
socket.close();
}
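The result of zipEntry.getCompressedSize() is -1. But right after writing the entry to the socket on the server, I check the size and it is the actual size, so I am puzzled.
The exception I get is: Error in Client invalid stored block lengths
You can't rely on getSize() (or getCompressedSize()) returning the actual size. When ZipOutputStream writes a compressed entry without the sizes being set in advance, it stores them in a data descriptor after the entry's data, so ZipInputStream still reports -1 when it reads the entry header. You need to call zips.read on the ZipInputStream until it returns -1, meaning there are no more bytes for that entry.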
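A client-side sketch of what the answer describes (names are hypothetical): each entry is copied until the ZipInputStream signals the end of that entry, so the -1 sizes never matter, and the bytes are copied as-is instead of going through a FileWriter. It deliberately does not close the stream, since closing a ZipInputStream wrapped around socket.getInputStream() would also close the socket's input. The check that rejects entry names like ../x is an addition that is not in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class StreamUnzipper {

    // Extracts every entry from 'in' into 'outDir' without relying on getSize();
    // the caller owns 'in' (for example the socket stream) and closes it
    static List<Path> extract(InputStream in, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Path root = outDir.toAbsolutePath().normalize();
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            Path target = root.resolve(entry.getName()).normalize();
            // Refuse entries such as "../x" that would land outside outDir
            if (!target.startsWith(root)) {
                throw new IOException("Entry outside target dir: " + entry.getName());
            }
            if (entry.isDirectory()) {
                Files.createDirectories(target);
                continue;
            }
            Files.createDirectories(target.getParent());
            // Files.copy reads until zis returns -1, which is the end of this entry
            Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
            written.add(target);
        }
        return written;
    }
}
```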

Spring: how to parse uploaded zip file?

I uploaded my zip archive to the server and want to open the .txt and .jpg files in it. I successfully get the archive in my controller and get the name of each file via ZipEntry. Now I want to open the files, but for that I seem to need a full path to each file.
I haven't found how to do that. Could you suggest an approach?
Update
I tried the example suggested below, but I am not able to open the file:
ZipFile zFile = new ZipFile("trainingDefaultApp.zip");
I get a FileNotFoundException.
So, back to my starting point. I have an upload form in a Java Spring application. In the controller I get the zip archive as a byte[]:
@RequestMapping(method = RequestMethod.POST)
public String create(UploadItem uploadItem, BindingResult bindingResult) {
    try {
        byte[] zip = uploadItem.getFileData().getBytes();
        saveFile(zip);
Then I iterate over each ZipEntry:
InputStream is = new ByteArrayInputStream(zip);
ZipInputStream zis = new ZipInputStream(is);
ZipEntry entry = null;
while ((entry = zis.getNextEntry()) != null) {
    String entryName = entry.getName();
    if (entryName.equals("readme.txt")) {
        ZipFile zip = new ZipFile(entry.getName()); // here I got an exception
According to the docs I did everything right, but it seems strange to me to pass only the file name and expect the file to open successfully.
I resolved my issue. The solution is to work directly with the ZipInputStream. Here is the code:
private void saveFile(byte[] zip, String name, String description) throws IOException {
    InputStream is = new ByteArrayInputStream(zip);
    ZipInputStream zis = new ZipInputStream(is);
    Application app = new Application();
    ZipEntry entry = null;
    while ((entry = zis.getNextEntry()) != null) {
        String entryName = entry.getName();
        if (entryName.equals("readme.txt")) {
            new Scanner(zis); //!!!
            //...
            zis.closeEntry();
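zipFile.getInputStream(ZipEntry entry) returns the InputStream for a specific entry. Note that this needs a ZipFile opened on the archive itself: new ZipFile(name) expects the path of a zip file on disk, not the name of an entry inside it, so new ZipFile(entry.getName()) fails when no such file exists in the working directory.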
Check out the javadocs for ZipFile.getInputStream() - http://docs.oracle.com/javase/6/docs/api/java/util/zip/ZipFile.html#getInputStream(java.util.zip.ZipEntry).
Update:
I misread your question. For using the ZipInputStream, there is sample code on Oracle's website (http://java.sun.com/developer/technicalArticles/Programming/compression/) that shows how to read from the stream. See the first code sample, Code Sample 1: UnZip.java.
Copied here, it reads each entry and writes it directly to a file, but you could replace that with whatever logic you need:
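// fis is a FileInputStream opened on the zip file; BUFFER is the buffer size (2048 is an assumed value)
final int BUFFER = 2048;
ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null) {
    System.out.println("Extracting: " + entry);
    int count;
    byte data[] = new byte[BUFFER];
    // write the files to the disk
    FileOutputStream fos = new FileOutputStream(entry.getName());
    BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
    while ((count = zis.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
    }
    // flush and close, otherwise buffered bytes may never reach the file
    dest.flush();
    dest.close();
}
zis.close();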
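Building on the accepted approach of reading straight from the ZipInputStream, here is a sketch (the class name is hypothetical) that collects .txt entries as text and .jpg entries as raw bytes from the byte[] returned by uploadItem.getFileData().getBytes(), so no file path is needed. Reading with readAllBytes() (Java 9+) consumes exactly one entry and, unlike closing a Scanner wrapped around zis, leaves the stream open for the next entry. UTF-8 for the text files is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class UploadedZip {

    final Map<String, String> texts = new LinkedHashMap<>();
    final Map<String, byte[]> images = new LinkedHashMap<>();

    // Reads .txt entries as text and .jpg/.jpeg entries as raw bytes; other entries are skipped
    static UploadedZip parse(byte[] zip) throws IOException {
        UploadedZip result = new UploadedZip();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                String lower = name.toLowerCase(Locale.ROOT);
                if (lower.endsWith(".txt")) {
                    // UTF-8 is an assumed encoding for the text files
                    result.texts.put(name, new String(zis.readAllBytes(), StandardCharsets.UTF_8));
                } else if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) {
                    result.images.put(name, zis.readAllBytes());
                }
            }
        }
        return result;
    }
}
```

The image bytes could then be passed to, for example, javax.imageio.ImageIO.read(new ByteArrayInputStream(bytes)) if a decoded image is needed.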
