following the question I asked before How to have my java project to use some files without using their absolute path? I found the solution but another problem popped up in creating text files that I want to write into.here's my code:
private String pathProvider() throws Exception {
//finding the location where the jar file has been located
String jarPath=URLDecoder.decode(getClass().getProtectionDomain().getCodeSource().getLocation().getPath(), "UTF-8");
//creating the full and final path
String completePath=jarPath.substring(0,jarPath.lastIndexOf("/"))+File.separator+"Records.txt";
return completePath;
}
public void writeRecord() {
try(Formatter writer=new Formatter(new FileWriter(new File(pathProvider()),true))) {
writer.format("%s %s %s %s %s %s %s %s %n", whichIsChecked(),nameInput.getText(),lastNameInput.getText()
,idInput.getText(),fieldOfStudyInput.getText(),date.getSelectedItem().toString()
,month.getSelectedItem().toString(),year.getSelectedItem().toString());
successful();
} catch (Exception e) {
failure();
}
}
this works and creates the text file wherever the jar file is running from but my problem is that when the information is been written to the file, the numbers,symbols, and English characters are remained but other characters which are in Persian are turned into question marks. like: ????? 111 ????? ????.although running the app in eclipse doesn't make this problem,running the jar does.
Note:I found the code ,inside pathProvider method, in some person's question.
Your pasted code and the linked question are complete red herrings - they have nothing whatsoever to do with the error you ran into. Also, that protection domain stuff is a hack and you've been told before not to write data files next to your jar files, it's not how OSes (are supposed to) work. Use user.home for this.
There is nothing in this method that explains the question marks - the string, as returned, has plenty of issues (see above), but NOT that it will result in question marks in the output.
Files are fundamentally bytes. Strings are fundamentally characters. Therefore, when you write code that writes a string to a file, some code somewhere is converting chars to bytes.
Make sure the place where that happens includes a charset encoding.
Use the new API (I think you've also been told to do this, by me, in an earlier question of yours) which defaults to UTF-8. Alternatively, specify UTF-8 when you write. Note that the usage of UTF-8 here is about the file name, not the contents of it (as in, if you put persian symbols in the file name, it's not about persian symbols in the contents of the file / in the contents you want to write).
Because you didn't paste the code, I can't give you specific details as there are hundreds of ways to do this, and I do not know which one you used.
To write to a file given a String representing its path:
Path p = Paths.get(completePath);
Files.write("Hello, World!", p);
is all you need. This will write as UTF_8, which can handle persian symbols (because the Files API defaults to UTF-8 if you specify no encoding, unlike e.g. new File, FileOutputStream, FileWriter, etc).
If you're using outdated APIs: new BufferedWriter(new OutputStreamWriter(new FileOutputStream(thePath), StandardCharsets.UTF-8) - but note that this is a resource leak bug unless you add the appropriate try-with-resources.
If you're using FileWriter: FileWriter is broken, never use this class. Use something else.
If you're converting the string on its own, it's str.getBytes(StandardCharsets.UTF_8), not str.getBytes().
Related
I'm having a problem on Java file encoding.
I have a Java program will save a input stream as a file with a given file name, the code snippet is like:
File out = new File(strFileName);
Files.copy(inStream, out.toPath());
It works fine on Windows unless the file name contains some special characters like Ö, with these characters in the file name, the saved file will display a garbled file name on Windows.
I understand that by applying JVM option -Dfile.encoding=UTF-8 this issue can be fixed, but I would have a solution in my code rather than ask all my users to change their JVM options.
While debugging the program I can see the file name string always shows the correct character, so I guess the problem is not about internal encoding.
Could someone please explain what went wrong behind the scene? and is there a way to avoid this problem programmatically? I tried get the bytes from the string and change the encoding but it doesn't work.
Thanks.
Using the URLEncoder class would work:
String name = URLEncoder.encode("fileName#", "UTF-8");
File output = new File(name);
Im having a strange issue trying to write in text files with strings which contain characters like "ñ", "á".. and so on. Let me first show you my little piece of code:
import java.io.*;
public class test {
public static void main(String[] args) throws Exception {
String content = "whatever";
int c;
c = System.in.read();
content = content + (char)c;
FileWriter fw = new FileWriter("filename.txt");
BufferedWriter bw = new BufferedWriter(fw);
bw.write(content);
bw.close();
}
}
In this example, im just reading a char from the keyboard input and appending it to a given string; then writting the final string into a txt. The problem is that if I type an "ñ" for example (i have a Spanish layout keyboard), when i check the txt, it shows a strange char "¤" where there should be a "ñ", that is, the content of the file is "whatever¤". The same happens with "ç", "ú"..etc. However it writes it fine ("whateverñ") if i just forget about the keyboard input and i write:
...
String content = "whateverñ";
...
or
...
content = content + "ñ";
...
It makes me think that there might be something wrong with the read() method? Or maybe im using it wrongly? or should i use a different method to get the keyboard input? or..? Im a bit lost here.
(Im using the jdk 7u45 # Windows 7 Pro x64)
So ...
It works (i.e. you can read the accented characters on the output file) if you write them as literal strings.
It doesn't work when you read them from System.in and then write them.
This suggests that the problem is on the input side. Specifically, I think your console / keyboard must be using a character encoding for the input stream that does not match the encoding that Java thinks should be used.
You should be able to confirm this tentative diagnosis by outputting the characters you are reading in hexadecimal, and then checking the codes against the unicode tables (which you can find at unicode.org for example).
It strikes me as "odd" that the "platform default encoding" appears to be working on the output side, but not the input side. Maybe someone else can explain ... and offer a concrete suggestion for fixing it. My gut feeling is that the problem is in the way your keyboard is configured, not in Java or your application.
files do not remember their encoding format, when you look at a .txt, the text editor makes a "best guess" to the encoding used.
if you try to read the file into your program again, the text should be back to normal.
also, try printing the "strange" character directly.
I have a Google App Engine App that converts CSV to XML files. It works fine for small XML inputs, but refuses to finalize the file for larger inputed XML. The XML is read from, and the resulting csv files are written to, many times before finalization, over a long-running (multi-day duration) task. My problem is different than this FileServiceFactory getBlobKey throws IllegalArgumentException , since my code works fine both in production and development with small input files. So it's not that I'm neglecting to write to the file before closing/finalizing. However, when I attempt to read from a larger XML file. The input XML file is ~150MB, and the resulting set of 5 CSV files is each much smaller (perhaps 10MB each). I persisted the file urls for the new csv files, and even tried to close them with some static code, but I just reproduce the same error, which is
java.lang.IllegalArgumentException: creation_handle: String properties must be 500 characters or less. Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedSingleValue(DataTypeUtils.java:242)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:207)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:173)
at com.google.appengine.api.datastore.Query$FilterPredicate.<init>(Query.java:900)
at com.google.appengine.api.datastore.Query$FilterOperator.of(Query.java:75)
at com.google.appengine.api.datastore.Query.addFilter(Query.java:351)
at com.google.appengine.api.files.FileServiceImpl.getBlobKey(FileServiceImpl.java:329)
But I know that it's not a String/Text data type issue, since I am already using similar length file service urls for the previous successful attempts with smaller files. It also wasn't an issue for the other stackoverflow post I linked above. I also tried putting one last meaningless write before finalizing, just in case it would help as it did for the other post, but it made no difference. So there's really no way for me to debug this... Here is my file closing code that is not working. It's pretty similar to the Google how-to example at http://developers.google.com/appengine/docs/java/blobstore/overview#Writing_Files_to_the_Blobstore .
log.info("closing out file 1");
try {
//locked set to true
FileWriteChannel fwc1 = fileService.openWriteChannel(csvFile1, true);
fwc1.closeFinally();
} catch (IOException ioe) {ioe.printStackTrace();}
// You can't get the blob key until the file is finalized
BlobKey blobKeyCSV1 = fileService.getBlobKey(csvFile1);
log.info("csv blob storage key is:" + blobKeyCSV1.getKeyString());
csvUrls[i-1] = blobKeyCSV1.getKeyString();
break;
At this point, I just want to finalize my new blob files for which I have the urls, but cannot. How can I get around this issue, and also, what may be the cause? Again, my code works for small files (~60 kB), but the input file of ~150MB fails). Thank you for any advice on what is causing this or how to get around it! Also, how long will my unfinalized files stick around for, before being deleted?
This issue was a bug in the Java MapReduce and Files API, which was recently fixed by Google. Read announcement here: groups.google.com/forum/#!topic/google-appengine/NmjYYLuSizo
Background:
I have 2 machines: one is running German windows 7 and my PC running English(with Hebrew locale) windows 7.
In my Perl code I'm trying to check if the file that I got from the German machine exists on my machine.
The file name is ßßßzllpoöäüljiznppü.txt
Why is it failed when I do the following code:
use Encode;
use Encode::locale;
sub UTF8ToLocale
{
my $str = decode("utf8",$_[0]);
return encode(locale, $str);
}
if(!-e UTF8ToLocale($read_file))
{
print "failed to open the file";
}
else
{
print $read_file;
}
Same thing goes also when I'm trying to open the file:
open (wtFile, ">", UTF8ToLocale($read_file));
binmode wtFile;
shift #_;
print wtFile #_;
close wtFile;
The file name is converted from German to utf8 in my java application and this is passed to the perl script.
The perl script takes this file name and convert it from utf8 to the system locale, see UTF8ToLocale($read_file) function call, and I believe that is the problem.
Questions:
Can you please tell me what is the OS file system charset encoding?
When I create German file name in OS that the locale is Hebrew in which Charset is it saved?
How do I solve this problem?
Update:
Here is another code that I run with hard coded file name on my PC, the script file is utf8 encoded:
use Encode;
use Encode::locale;
my $string = encode("utf-16",decode("utf8","C:\\TestPerl\\ßßßzllpoöäüljiznppü.txt"));
if (-e $string)
{
print "exists\r\n";
}
else
{
print "not exists\r\n"
}
The output is "not exists".
I also tried different charsets: cp1252, cp850, utf-16le, nothing works.
If I'm changing the file name to English or Hebrew(my default locale) it works.
Any ideas?
Windows 7 uses UTF-16 internally [citation needed] (I don't remember the byte order). You don't need to convert file names because of that. However, if you transport the file via a FAT file system (eg an old USB stick) or other non Unicode aware file systems these benefits will get lost.
The locale setting you are talking about only affects the language of the user interface and the apparent folder names (Programme (x86) vs. Program Files (x86) with the latter being the real name in the file system).
The larger problem I can see is the internal encoding of the file contents that you want to transfer as some applications may default to different encodings depending on the locale. There is no solution to that except being explicit when the file is created. Sticking to UTF-8 is generally a good idea.
And why do you convert the file names with another tool? Any Unicode encoding should be sufficient for transfer.
Your script does not work because you reference an undefined global variable called $read_file. Assuming your second code block is not enclosed in any scope, especially not in a sub, then the #_ variable is not available. To get command line arguments you should consider using the #ARGV array. The logic ouf your script isn't clear anyway: You print error messages to STDOUT, not STDERR, you "decode" the file name and then print out the non-decoded string in your else-branch, you are paranoid about encodings (which is generally good) but you don't specify an encoding for your output stream etc.
I have a text.txt file which contains following txt.
Kontagent Announces Partnership with Global Latino Social Network Quepasa
Released By Kontagent
I read this text file into a string documentText.
documentText.subString(0,9) gives Kontagent, which is good.
But, documentText.subString(87,96) gives y Kontage in windows (IntelliJ Idea) and gives Kontagent in Unix environment. I am guessing it is happening because of blank line in the file (after which the offset got screwed). But, I cannot understand, why I get two different results. I need to get one result in the both the environments.
To read file as string I used all the functions talked about here
How do I create a Java string from the contents of a file? . But, I still get same results after using any of the functions.
Currently I am using this function to read the file into documentText String:
public static String readFileAsString(String fileName)
{
File file = new File(fileName);
StringBuilder fileContents = new StringBuilder((int)file.length());
Scanner scanner = null;
try {
scanner = new Scanner(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
String lineSeparator = System.getProperty("line.separator");
try {
while(scanner.hasNextLine()) {
fileContents.append(scanner.nextLine() + lineSeparator);
}
return fileContents.toString();
} finally {
scanner.close();
}
}
EDIT: Is there a way to write a general function which will work for both windows and UNIX environments. Even if file is copied in text mode.
Because, unfortunately, I cannot guarantee that everyone who is working on this project will always copy files in binary mode.
The Unix file probably uses the native Unix EOL char: \n, whereas the Windows file uses the native Windows EOL sequence: \r\n. Since you have two EOLs in your file, there is a difference of 2 chars. Make sure to use a binary file transfer, and all the bytes will be preserved, and everything will run the same way on both OSes.
EDIT: in fact, you are the one which appends an OS-specific EOL (System.getProperty("line.separator")) at the end of each line. Just read the file as a char array using a Reader, and everything will be fine. Or use Guava's method which does it for you:
String s = CharStreams.toString(new FileReader(fileName));
On Windows, a newline character \n is prepended by \r or a carriage return character. This is non-existent in Linux. Transferring the file from one operating system to the other will not strip/append such characters but occasionally, text editors will auto-format them for you.
Because your file does not include \r characters (presumably transferred straight from Linux), System.getProperty("line.separator") will return \r\n and account for non-existent \r characters. This is why your output is 2 characters behind.
Good luck!
Based on input you guys provided, I wrote something like this
documentText = CharStreams.toString(new FileReader("text.txt"));
documentText = this.documentText.replaceAll("\\r","");
to strip off extra \r if a file has \r.
Now,I am getting expect result in windows environment as well as unix. Problem solved!!!
It works fine irrespective of what mode file has been copied.
:) I wish I could chose both of your answer, but stackoverflow doesn't allow.