Change encoding of existing file with Java?


I need to programmatically change the encoding of a set of *nix scripts to UTF-8 from Java. I won't write anything to them, so I'm trying to find the easiest/fastest way to do this. There are not many files, and they are not very big. I could:
"Write" an empty string using an OutputStream with UTF-8 set as encoding
Since I'm already using FileUtils (from Apache Commons), I could read/write the contents of these files, passing UTF-8 as the encoding
Not a big deal, but has anyone run into this case before? Are there any cons on either approach?

As requested, and since you're using Commons IO, here is some example code (error checking thrown to the wind):
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
public class Main {
public static void main(String[] args) throws IOException {
String filename = args[0];
File file = new File(filename);
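// "ISO8859_1" assumes the files are currently Latin-1; use their actual encoding
String content = FileUtils.readFileToString(file, "ISO8859_1");
// rewrite the same text with UTF-8 as the output encoding
FileUtils.write(file, content, "UTF-8");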
}
}
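A plain text file does not record its encoding anywhere, so "changing the encoding" means decoding the existing bytes with the charset they were written in and encoding the resulting text as UTF-8. Writing an empty string through a UTF-8 stream converts nothing (and a FileOutputStream opened without append mode truncates the file). For comparison, here is a JDK-only sketch of the read/re-write approach using java.nio.file (Java 7+), assuming, as in the answer above, that the scripts are currently ISO-8859-1; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TranscodeToUtf8 {

    // Reads the whole file using the source charset and rewrites it as UTF-8.
    // Fine for small files: the entire content is held in memory as a String.
    static void transcodeToUtf8(Path file, Charset sourceCharset) throws IOException {
        String content = new String(Files.readAllBytes(file), sourceCharset);
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the scripts are currently ISO-8859-1, as in the answer above.
        for (String name : args) {
            transcodeToUtf8(Paths.get(name), StandardCharsets.ISO_8859_1);
        }
    }
}
```

Every byte is a valid ISO-8859-1 character, so decoding cannot fail here. With a stricter source charset, new String(byte[], Charset) silently substitutes malformed input, so a CharsetDecoder configured to report errors would be the safer choice.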

Related

Copy All Type of Files in Java

I'm trying to make a simple program to copy files of any type. I wrote the code below.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.File;
public class CopyExample {
public static void main(String[] args) throws Exception {
File f = new File("image.jpg");
FileInputStream is = new FileInputStream(f);
FileOutputStream os = new FileOutputStream("copy-image.png");
byte[] ar = new byte[(int)f.length()];
is.read(ar);
os.write(ar);
is.close();
os.close();
}
}
I have already tested this code with .txt, .jpg, .png, and .pdf files, and it works fine.
But is it fine, or is there a better way to do this?

Copying a file does not depend on its extension or type, only on its content. However, your code reads the whole file into memory at once, so a very large file may not fit.
Apache's FileUtils may be useful for your question.
This Q&A may help you.
And this article is about your question.
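The question's code also relies on a single read() call filling the whole array, which InputStream does not guarantee. A minimal sketch of a buffered copy that avoids both problems (class and method names are illustrative; the output name is changed to copy-image.jpg):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    // Copies all bytes from in to out using a fixed-size buffer, so memory use
    // does not grow with the file size. Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() may return fewer bytes than requested; -1 signals end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes both streams even if an exception is thrown.
        try (InputStream in = new FileInputStream("image.jpg");
             OutputStream out = new FileOutputStream("copy-image.jpg")) {
            copy(in, out);
        }
    }
}
```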

Java 7 provides the Files class, which you can use to copy a file (src and dest are Path objects):
Files.copy(src, dest);
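A self-contained version of the Files.copy approach, assuming Java 7+ and the file names from the question. Without REPLACE_EXISTING, Files.copy throws FileAlreadyExistsException if the target already exists; the class name is illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FilesCopyExample {

    // Copies src to dest byte for byte; the file type or extension is irrelevant.
    // REPLACE_EXISTING overwrites dest if it is already present.
    static void copy(Path src, Path dest) throws IOException {
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        copy(Paths.get("image.jpg"), Paths.get("copy-image.jpg"));
    }
}
```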

Downloading HTML instead of File

I'm using Java code to download a file from the Internet and save it to some directory.
However, the code downloads the HTML source code of the page instead of the file contents.
The code below illustrates the problem:
import java.awt.Desktop;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
public class JavaFileDownloadTest
{
public static void download(String remoteURL, String targetFilePath)
throws IOException
{
URL downloadableFile = new URL(remoteURL);
// try-with-resources closes both the network channel and the output file
try (ReadableByteChannel readableByteChannel = Channels.newChannel(downloadableFile.openStream());
FileOutputStream fileOutputStream = new FileOutputStream(targetFilePath)) {
fileOutputStream.getChannel().transferFrom(readableByteChannel, 0, Long.MAX_VALUE);
}
}
public static void main(String[] arguments) throws IOException
{
String userHome = System.getProperty("user.home");
String fileName = "Test.txt";
String targetFilePath = userHome + File.separator + "Downloads" + File.separator + fileName;
download("http://bullywiiplaza.cuccfree.com/" + fileName, targetFilePath);
Desktop.getDesktop().open(new File(targetFilePath));
}
}
The file located here contains the text
Hello StackOverflow!
However, when downloaded using the above code, I'm getting the HTML source code as file content instead:
<html><body><script type="text/javascript" src="/aes.js" ></script><script>function toNumbers(d){var e=[];d.replace(/(..)/g,function(d){e.push(parseInt(d,16))});return e}function toHex(){for(var d=[],d=1==arguments.length&&arguments[0].constructor==Array?arguments[0]:arguments,e="",f=0;f<d.length;f++)e+=(16>d[f]?"0":"")+d[f].toString(16);return e.toLowerCase()}var a=toNumbers("f655ba9d09a112d4968c63579db590b4"),b=toNumbers("98344c2eee86c3994890592585b49f80"),c=toNumbers("ae71113e4baf38cee1c1aacf0ae66c00");document.cookie="__test="+toHex(slowAES.decrypt(c,2,a,b))+"; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/"; document.cookie="referrer="+escape(document.referrer); location.href="http://bullywiiplaza.cuccfree.com/Test.txt?ckattempt=1";</script><noscript>This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support</noscript></body></html>
Why is this happening, and how do I fix it? I have already tried various libraries and methods for downloading files, but all of them yielded the same "faulty" result.

I think the target URL runs some JavaScript to provide the file. This script has to be interpreted (and executed) by a JavaScript engine.
So you either need a way to resolve the real file URL (rather than the JavaScript page), or you need to integrate a JavaScript engine that executes the script code and gets the result.
I think this could help you: Executing javascript in java - Opening a URL and getting links
or better:
http://www.java2s.com/Code/Java/JDK-6/ExecuteJavascriptscriptinafile.htm
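The page in the question looks like a JavaScript check that sets a __test cookie and then redirects with ?ckattempt=1, which plain Java HTTP code does not pass. Whatever the eventual fix, it can help to detect the problem instead of silently saving the HTML. A sketch using HttpURLConnection that rejects responses whose Content-Type is text/html; it assumes the server labels its check page as HTML and the real file as something else, that the URL is http or https, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public class CheckedDownload {

    // True if a Content-Type header value (e.g. "text/html; charset=UTF-8") denotes HTML.
    static boolean isHtml(String contentType) {
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("text/html");
    }

    static void download(String remoteUrl, Path target) throws IOException {
        // Assumes an http(s) URL; other schemes would fail the cast.
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(remoteUrl).toURL().openConnection();
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + remoteUrl);
            }
            // A plain file should not come back as text/html; if it does, the
            // server most likely returned a script or challenge page instead.
            if (isHtml(connection.getContentType())) {
                throw new IOException("Server returned HTML instead of the file: " + remoteUrl);
            }
            try (InputStream in = connection.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java CheckedDownload <url> <target-file>
        download(args[0], Paths.get(args[1]));
    }
}
```

This only turns the silent failure into an error; it does not get past the JavaScript check itself.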

I switched the website host to this one, and now the code above works as expected.

http://bullywiiplaza.cuccfree.com/Test.txt doesn't exist. I think the URL should be https://bullywiiplaza.cuccfree.com/Test.txt, which exists.

Open a local PDF document at an arbitrary page using Java

As the title says, I have a PDF document stored locally, and I would like to open it at an arbitrary page using Java. My question is much the same as this question; however, the proposed solution seems rather hacky, so I would prefer a more conventional answer if possible. I understand that the code below will not work because #page=5 should be appended to the URL in the browser and not to the file path, but I'm not sure what to try next. Any help would be much appreciated!
import java.io.File;
import java.io.IOException;
import java.net.URL;
public class OpenPdfTest {
public OpenPdfTest(){
try {
File myFile = new File("test.pdf");
URL url = myFile.toURI().toURL();
Process p = Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + url + "#page=5");
} catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args){
new OpenPdfTest();
}
}

What about using Apache Tika (http://tika.apache.org/) to read the whole file, convert it, and use the part of the PDF file that you want? You can read in any file you want with Tika: it can open almost any kind of file, including PDFs, and process it.
Take my answer just as a first guess.
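The question's rundll32 approach depends on the shell handler passing the #page=5 fragment through to the viewer. If Adobe Acrobat Reader is installed, one alternative is its /A command-line option, which accepts PDF open parameters such as page=5 (described in Adobe's "Parameters for Opening PDF Files"). A sketch under that assumption; the Reader executable path is hypothetical and varies by version and installation, and the class name is illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class OpenPdfAtPage {

    // Builds the command line for Adobe Reader's /A option with an open parameter
    // such as "page=5". Each argument is a separate list element, so no string
    // concatenation of the command is needed.
    static List<String> buildCommand(String readerExe, File pdf, int page) {
        return Arrays.asList(readerExe, "/A", "page=" + page, pdf.getAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical install location; adjust to the installed Reader version.
        String readerExe = "C:\\Program Files (x86)\\Adobe\\Acrobat Reader DC\\Reader\\AcroRd32.exe";
        List<String> command = buildCommand(readerExe, new File("test.pdf"), 5);
        new ProcessBuilder(command).start();
    }
}
```

This ties the program to one specific viewer, which is the trade-off against relying on the default file association.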

java.io.File: accessing files with invalid filename encodings

Because the constructor of java.io.File takes a java.lang.String as its argument, there seems to be no way to tell it which filename encoding to expect when accessing the file system layer. So if you generally use UTF-8 as the filename encoding and some filename contains an umlaut encoded as ISO-8859-1, you are basically stuck. Is this correct?
Update: because no one seems to get it, try it yourself: when creating a new file, the LC_ALL environment variable (on Linux) determines the encoding of the filename. It does not matter what you do inside your source code!
If you want to give a correct answer, demonstrate that you can create a file (using regular Java means) with a proper ISO-8859-1 encoded name while your JVM assumes LC_ALL=en_US.UTF-8. The filename should contain a character like ö, ü, or ä.
BTW: if you put files whose names are not encoded according to LC_ALL into Maven's resource path, Maven will just skip them...
Update II.
Fix this: https://github.com/jjYBdx4IL/filenameenc
i.e., make the f.exists() check return true.
Update III.
The solution is to use java.nio.*; in my case I had to replace File.listFiles() with Files.newDirectoryStream(). I have updated the example on GitHub. BTW: Maven still seems to use the old java.io API... mvn clean fails.

The solution is to use the new API and file.encoding. Demonstration:
fge#alustriel:~/tmp/filenameenc$ echo $LC_ALL
en_US.UTF-8
fge#alustriel:~/tmp/filenameenc$ cat Test.java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class Test
{
public static void main(String[] args)
{
final String testString = "a/üöä";
final Path path = Paths.get(testString);
final File file = new File(testString);
System.out.println("Files.exists(): " + Files.exists(path));
System.out.println("File exists: " + file.exists());
}
}
fge#alustriel:~/tmp/filenameenc$ install -D /dev/null a/üöä
fge#alustriel:~/tmp/filenameenc$ java Test
Files.exists(): true
File exists: true
fge#alustriel:~/tmp/filenameenc$ java -Dfile.encoding=iso-8859-1 Test
Files.exists(): false
File exists: true
fge#alustriel:~/tmp/filenameenc$
One less reason to use File!
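Following Update III in the question, a minimal sketch of listing a directory with Files.newDirectoryStream instead of File.listFiles(). Whether this copes with names that are not valid in the current filename encoding depends on the JDK and platform, so treat it as something to test rather than a guarantee; the class name is illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ListWithNio {

    // Collects the entries of dir using java.nio instead of File.listFiles().
    static List<Path> list(Path dir) throws IOException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        for (Path entry : list(Paths.get(args.length > 0 ? args[0] : "."))) {
            // Each Path was produced by the directory listing itself.
            System.out.println(entry + " exists: " + Files.exists(entry));
        }
    }
}
```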

Currently I am sitting at a Windows machine, but assuming you can fetch the file system encoding:
String encoding = System.getProperty("file.encoding");
// or, on Linux, derive it from the locale (e.g. "en_US.UTF-8" gives "UTF-8"):
String encoding = System.getenv("LC_ALL");
Then you have the means to check whether a filename is valid. Note: Windows can represent Unicode filenames, and my own Linux of course uses UTF-8.
boolean validEncodingForFileName(String name) {
try {
byte[] bytes = name.getBytes(encoding);
String nameAgain = new String(bytes, encoding);
return name.equals(nameAgain); // Nothing lost?
} catch (UnsupportedEncodingException ex) {
return false; // Maybe true, more a JRE limitation.
}
}
You might check whether File is clever enough (I cannot test this):
boolean validEncodingForFileName(String name) throws IOException {
return new File(name).getCanonicalPath().endsWith(name);
}
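Instead of a round trip through getBytes/new String, java.nio.charset offers CharsetEncoder.canEncode, which answers the same question directly. A sketch; note that sun.jnu.encoding is an implementation-specific OpenJDK property, assumed here to hold the filename charset, and that file.encoding is not necessarily the filename encoding. The class and method names are illustrative.

```java
import java.nio.charset.Charset;

public class FileNameEncodingCheck {

    // True if every character of name can be represented in the given charset.
    static boolean canEncodeName(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Assumption: "sun.jnu.encoding" (OpenJDK-specific) names the charset
        // used for file names; fall back to the default charset otherwise.
        String jnu = System.getProperty("sun.jnu.encoding");
        Charset fsCharset = (jnu != null && Charset.isSupported(jnu))
                ? Charset.forName(jnu)
                : Charset.defaultCharset();
        String name = args.length > 0 ? args[0] : "\u00fc\u00f6\u00e4";
        System.out.println(fsCharset + " can encode " + name + ": "
                + canEncodeName(name, fsCharset));
    }
}
```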

How I fixed java.io.File (on Solaris 5.11):
Set the LC_* environment variable(s) in the shell or globally;
e.g., java -DLC_ALL="en_US.ISO8859-1" does not work!
Make sure the chosen locale is installed on the system.
Why does that fix it?
Java internally calls nl_langinfo() to find out the encoding of paths on disk. That call only sees the process's real environment, not values passed to Java via -DVARNAME (those are Java system properties, not environment variables).
Secondly, this falls back to C/ASCII if the locale set by, e.g., LC_ALL is not installed.

A Java String can hold any Unicode character, independent of any byte encoding, so either form works:
new File("the file name with \u00d6")
or
new File("the file name with Ö")

You can set the encoding when reading and writing the file. For example, when writing to a file you can pass the encoding to your OutputStreamWriter: new OutputStreamWriter(new FileOutputStream(fileName), "UTF-8").
When reading a file, you can pass the decoder to the InputStreamReader constructor: InputStreamReader(InputStream in, CharsetDecoder dec).
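Putting both constructors together gives a streaming conversion that never holds the whole file in memory. A sketch assuming the target encoding is UTF-8; the source charset is a parameter, and the class name is illustrative. Passing a CharsetDecoder configured with CodingErrorAction.REPORT makes malformed input raise an exception instead of being replaced silently.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StreamTranscode {

    // Decodes source with sourceCharset and writes the text to target as UTF-8,
    // one buffer at a time. source and target must be different files.
    static void transcode(String source, Charset sourceCharset, String target) throws IOException {
        // REPORT makes read() throw a CharacterCodingException on bad input.
        CharsetDecoder decoder = sourceCharset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader in = new BufferedReader(
                     new InputStreamReader(new FileInputStream(source), decoder));
             Writer out = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8))) {
            char[] buffer = new char[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java StreamTranscode in.sh out.sh ISO-8859-1
        transcode(args[0], Charset.forName(args[2]), args[1]);
    }
}
```

Because FileOutputStream truncates its target when opened, the output must go to a separate file, which can then be moved over the original.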

Java File Not Found

I have the following code for reading an Excel sheet in Java using Apache POI. Although the file exists, why does it give me a FileNotFoundException?
import org.apache.poi.hssf.usermodel.HSSFSheet;
import java.io.FileInputStream;
import java.io.File;
public class ReadFromExcel {
public static void main(String[] args) {
FileInputStream file = new FileInputStream(new File("C:\\Personal\\test.xlsx"));
}
}
I just copied and pasted the file location from Windows Explorer, so I know for sure that the file exists. Then why can't Java find it?
I used the same path with the File class instead of FileInputStream, and it works fine. What is special about paths in the FileInputStream class?

Try this code:
import org.apache.poi.hssf.usermodel.HSSFSheet;
import java.io.FileInputStream;
import java.io.File;
import java.io.FileNotFoundException;
public class ReadFromExcel {
public static void main(String[] args) throws FileNotFoundException {
File f=new File("C:"+File.separator+"Personal"+File.separator+"test.xlsx");
FileInputStream file=null;
if(f.exists()) {
file = new FileInputStream(f);
//rest of code
} else{
System.out.println("The file does not exist!Please enter correct filename!");
}
}
}
I have 3 things to point out:
First, you have not added a try/catch block (or a throws clause). FileNotFoundException is a checked exception, so my IDE simply does not let the code compile!
Using File.separator instead of a hard-coded "\" or "/" is the recommended way to build file paths, since the separator depends on the OS. It makes your code more portable.
Checking with f.exists() lets you know whether the file you are passing to FileInputStream actually exists.
I'm sure that will help!

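In a Java string literal, each '\\' represents a single '\', so "C:\\Personal\\test.xlsx" already refers to C:\Personal\test.xlsx. Be careful not to drop one of the backslashes: in "C:\\Personal\test.xlsx", the "\t" would become a tab character.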

Suggestion: call File.canRead() to see if you have permissions to open the file.
Java new File() says FileNotFoundException but file exists
There are three cases where a FileNotFoundException may be thrown:
The named file does not exist.
The named file is actually a directory.
The named file cannot be opened for reading for some reason.
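A small diagnostic sketch that checks which of the three cases above applies before opening the stream. It only mirrors the checks that java.io.File offers, so it cannot catch every cause, and the class name is illustrative.

```java
import java.io.File;

public class DiagnoseFile {

    // Returns which FileNotFoundException cause applies, or "ok" if the file
    // exists, is a regular file or similar, and is readable.
    static String diagnose(File f) {
        if (!f.exists()) {
            return "does not exist: " + f.getAbsolutePath();
        }
        if (f.isDirectory()) {
            return "is a directory: " + f.getAbsolutePath();
        }
        if (!f.canRead()) {
            return "cannot be read (check permissions): " + f.getAbsolutePath();
        }
        return "ok";
    }

    public static void main(String[] args) {
        // In a Java string literal, "\\" is a single backslash.
        System.out.println(diagnose(new File("C:\\Personal\\test.xlsx")));
    }
}
```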
