I have a properties file that maps German characters to their hex values (e.g. 00E4). I had to encode this file with "iso-8859-1", as it was the only way to get the German characters to display. What I'm trying to do is go through German words, check if any of these characters appear anywhere in the string, and if they do, replace that character with the hex format. For instance, replace the German character ä with \u00E4.
The code replaces the character fine, but instead of one backslash I'm getting two, like so: \\u00E4. You can see in the code that I'm using "\\u" to try to print \u, but that's not what happens. Any ideas where I'm going wrong here?
private void createPropertiesMaps(String result) throws FileNotFoundException, IOException
{
Properties importProps = new Properties();
Properties encodeProps = new Properties();
// This props file contains a map of german strings
importProps.load(new InputStreamReader(new FileInputStream(new File(result)), "iso-8859-1"));
// This props file contains the german character mappings.
encodeProps.load(new InputStreamReader(
new FileInputStream(new File("encoding.properties")),
"iso-8859-1"));
// Loop through the german characters
encodeProps.forEach((k, v) ->
{
importProps.forEach((key, val) ->
{
String str = (String) val;
// Find the index of the character if it exists.
int index = str.indexOf((String) k);
if (index != -1)
{
// create new string, replacing the german character
String newStr = str.substring(0, index) + "\\u" + v + str.substring(index + 1);
// set the new property value
importProps.setProperty((String) key, newStr);
if (hasUpdated == false)
{
hasUpdated = true;
}
}
});
});
if (hasUpdated == true)
{
// Write new file
writeNewPropertiesFile(importProps);
}
}
private void writeNewPropertiesFile(Properties importProps) throws IOException
{
File file = new File("import_test.properties");
OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
importProps.store(writer, "Unicode Translations");
writer.close();
}
The point is that you are not writing a simple text file but a Java properties file. In a properties file the backslash character is an escape character, so if your property value contains a backslash, Java is kind enough to escape it for you when storing - which is not what you want in your case.
You might try to circumvent Java's properties-file mechanism by writing a plain text file that can be read back in as a properties file, but that means doing manually all the formatting that the Properties class normally provides for you.
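For illustration, a minimal sketch of that manual approach (not part of the original answer): write the key/value pairs with a plain Writer instead of Properties.store(), so the backslash in "\u00E4" is emitted exactly as built. Note that this also bypasses everything else store() does for you, such as escaping '=', ':' and non-ASCII characters.
// Sketch only: reuses importProps and the "import_test.properties" name from the question.
try (Writer writer = new OutputStreamWriter(
        new FileOutputStream("import_test.properties"), StandardCharsets.UTF_8)) {
    writer.write("# Unicode Translations\n");
    for (String name : importProps.stringPropertyNames()) {
        writer.write(name + "=" + importProps.getProperty(name) + "\n");
    }
}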
Related
I'm writing a program that works with a text file in Java, and what I need to do is modify a specific string in the file.
For example, the file (which contains many lines) has a line like "username,password,e,d,b,c,a"
and I want to modify it to "username,password,f,e,d,b,c".
I have searched a lot but found nothing. How do I deal with that?
In general you can do it in 3 steps:
Read the file and store its contents in a String
Change the String as you need (your "username,password..." modification)
Write the String back to a file
You can search for instructions for each step on Stack Overflow.
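A minimal sketch of those three steps, using the line from the question and a placeholder file name ("users.txt"):
// Sketch of the three steps; "users.txt" is a placeholder.
Path path = Paths.get("users.txt");
// 1. Read the file and store it in a String
String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
// 2. Change the String as you need
content = content.replace("username,password,e,d,b,c,a", "username,password,f,e,d,b,c");
// 3. Write the String back to the file
Files.write(path, content.getBytes(StandardCharsets.UTF_8));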
Here is a possible solution working directly on the Stream:
public static void main(String[] args) throws IOException {
String inputFile = "C:\\Users\\geheim\\Desktop\\lines.txt";
String outputFile = "C:\\Users\\geheim\\Desktop\\lines_new.txt";
try (Stream<String> stream = Files.lines(Paths.get(inputFile));
FileOutputStream fop = new FileOutputStream(new File(outputFile))) {
stream.map(line -> line + " manipulate line as required\n").forEach(line -> {
try {
fop.write(line.getBytes());
} catch (IOException e) {
e.printStackTrace();
}
});
}
}
You can try it like this:
First, read the file line by line; for each line, check whether the string you want to replace exists in it, replace it if so, and write the content to another file. Do this until you reach EOF.
import java.io.*;
public class Files {
void replace(String stringToReplace, String replaceWith) throws IOException {
BufferedReader in = new BufferedReader(new FileReader("/home/asn/Desktop/All.txt"));
BufferedWriter out = new BufferedWriter(new FileWriter("/home/asn/Desktop/All-copy.txt"));
String line;
while((line=in.readLine())!=null) {
if (line.contains(stringToReplace))
line = line.replace(stringToReplace, replaceWith);
out.write(line);
out.newLine();
}
in.close();
out.close();
}
public static void main(String[] args) throws IOException {
Files f = new Files();
f.replace("amount", "####");
}
}
If you want to use the same file, store the content in a buffer (a String array or a List) and then write the content of the buffer back to the same file.
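A minimal sketch of that same-file variant, reusing the path and the replacement values from the code above (this is only an illustration, not part of the original answer):
// Buffer all lines, replace in memory, then overwrite the same file.
Path path = Paths.get("/home/asn/Desktop/All.txt");
List<String> buffer = new ArrayList<>();
for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
    buffer.add(line.replace("amount", "####"));
}
Files.write(path, buffer, StandardCharsets.UTF_8);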
If your file looks similar to this:
username:username123,
password:password123,
After loading the file into a String you can do something like this:
int startPosition = file.indexOf("username") + 9; // +9 is the length of "username" plus the colon
String username = "";
for (int i = startPosition; i < file.length(); i++) {
    if (file.charAt(i) != ',') {
        username += file.charAt(i);
    } else {
        break;
    }
}
System.out.println(username); // should print the username
After editing everything you want to edit, save the edited string back to the file.
There are many ways to solve this issue. Read the String docs to get to know the operations available on String. Without your code we cannot help you more specifically.
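As an illustration only (not part of the original answer), a sketch of one such operation on the "username,password,e,d,b,c,a" line from the question, splitting on commas instead of walking characters:
// Split the line on commas, keep the first two fields, append the new values, rejoin.
String line = "username,password,e,d,b,c,a";
String[] parts = line.split(",");
String newLine = parts[0] + "," + parts[1] + ",f,e,d,b,c";
System.out.println(newLine); // username,password,f,e,d,b,c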
The algorithm is as follows:
Open a temporary file to save edited copy.
Read input file line by line.
Check if the current line needs to be replaced
Various methods of String class may be used to do this:
equals: Compares this string to the specified object. The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object.
equalsIgnoreCase: Compares this String to another String, ignoring case considerations.
contains: Returns true if and only if this string contains the specified sequence of char values.
matches (String regex): Tells whether or not this string matches the given regular expression.
startsWith: Tests if this string starts with the specified prefix (case sensitive).
endsWith: Tests if this string ends with the specified suffix (case sensitive).
There are other predicate functions: contentEquals, regionMatches
If the required condition is true, provide replacement for currentLine:
if (conditionMet) {
currentLine = "Your replacement";
}
Or use String methods replace/replaceFirst/replaceAll to replace the contents at once.
Write the current line to the output file.
Make sure the input and output files are closed when all lines are read from the input file.
Replace the input file with the output file (if needed, for example, if no change occurred, there's no need to replace).
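A minimal sketch of that algorithm, using the "username,password,..." line from the question; the file names, the condition and the replacement are placeholder assumptions:
// Sketch: read input.txt line by line, write a temp file, then swap the files if anything changed.
Path input = Paths.get("input.txt");
Path temp = Paths.get("input.txt.tmp");
boolean changed = false;
try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
     BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
    String currentLine;
    while ((currentLine = reader.readLine()) != null) {
        if (currentLine.startsWith("username,password,")) { // the condition to check
            currentLine = "username,password,f,e,d,b,c";    // the replacement
            changed = true;
        }
        writer.write(currentLine);
        writer.newLine();
    }
}
if (changed) {
    Files.move(temp, input, StandardCopyOption.REPLACE_EXISTING);
} else {
    Files.delete(temp);
}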
I have a large text file I want to format. Say the input file is called inputFile and output file is called outputFile.
Here is my code, using BufferedReader and BufferedWriter:
public static void readAndWrite(String fileNameToRead, String fileNameToWrite) {
try{
BufferedReader fr = new BufferedReader(
new FileReader(String.format("%s.txt", fileNameToRead)));
BufferedWriter out = new BufferedWriter(
new FileWriter(String.format("%s.txt", fileNameToWrite), true));
String currentTmp = "";
String tmp = "";
String test = "work \nwork";
out.append(test);
while((tmp = fr.readLine()) != null) {
tmp = tmp.trim();
if(tmp.isEmpty()) {
currentTmp = currentTmp.trim();
out.append(currentTmp);
out.newLine();
out.newLine();
currentTmp = "";
} else {
currentTmp = currentTmp.concat(" ").concat(tmp);
}
}
if(!currentTmp.equals("")) {
out.write(currentTmp);
}
fr.close();
out.close();
} catch (IOException e) {
System.out.println("exception occoured" + e);
}
}
public static void main(String[] args) {
String readFile = "inPutFile";
String writeFile = "outPutFile";
readAndWrite(readFile, writeFile);
}
The problem is that the test string inside the code, which contains '\n', does get converted to a new line by the BufferedWriter. But if I put the same string in the text file, it does not behave the same way.
Put more simply, I want my input file to contain this
work\n
work
and output as
work
work
I am using a Mac, so the separator should be '\n'.
work\n
If you see the "\n" in your file, it is not a new line character. It is just two characters: a backslash and an 'n'.
The trim() method will not remove those characters.
Instead you might have something like:
if (tmp.endsWith("\\n"))
    tmp = tmp.substring(0, tmp.length() - 2);
I am using mac, so the separator should be '\n'
You should use the newline character for the platform. So when writing to your file the code should be:
} else {
currentTmp = currentTmp.concat(" ").concat(tmp);
out.append( currentTmp );
out.newLine();
}
The newLine() method will use the appropriate new line String for the platform.
Edit:
You need to understand what an escape character is in Java. When you use:
String text = "test\n";
and write the string to a file, only 5 characters are written to the file, not 6. The "\n" in the source code is an escape sequence which causes the ASCII value of the new line character to be added to the file. This character is not displayable, so you can't see it in the file.
After @camickr's answer, I think I realized the problem. Somehow, if I have text in the file like this
work \nwork
the \n won't be treated as a single char ('\n'); rather, it is treated as two chars. I think that's why, when the BufferedWriter writes the input string, it won't treat it as a new line.
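If the goal is for a literal \n in the input file to become a real line break in the output, one option (a sketch only, not taken from the answers above) is to replace the two-character sequence with the platform separator before writing, using the fr/out/tmp variables from the question's loop:
// Simplified loop: turn the two characters \ and n from the file into a real line separator.
while ((tmp = fr.readLine()) != null) {
    tmp = tmp.trim().replace("\\n", System.lineSeparator());
    out.append(tmp);
    out.newLine();
}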
I thought this was only an issue with Python 2, but I have run into a similar issue now with Java (Windows 10, JDK 8).
My searches have led to little resolution so far.
I read this value from the 'stdin' input stream: Viļāni. When I print it to the console I get this: Vi????ni.
Relevant code snippets are as follows:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
ArrayList<String> corpus = new ArrayList<String>();
String inputString = null;
while ((inputString = in.readLine()) != null) {
corpus.add(inputString);
}
String[] allCorpus = new String[corpus.size()];
allCorpus = corpus.toArray(allCorpus);
for (String line : allCorpus) {
System.out.println(line);
}
Further expansion on my problem as follows:
I read a file containing the following 2 lines:
を
Sōten_Kōro
When I read this from disk and output to a second file I get the following output:
ã‚’
S�ten_K�ro
When I read the file from stdin using cat testinput.txt | java UTF8Tester I get the following output:
???
S??ten_K??ro
Both are obviously wrong. I need to be able to print the correct characters to console and file. My sample code is as follows:
public class UTF8Tester {
public static void main(String args[]) throws Exception {
BufferedReader stdinReader = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
String[] stdinData = readLines(stdinReader);
printToFile(stdinData, "stdin_out.txt");
BufferedReader fileReader = new BufferedReader(new FileReader("testinput.txt"));
String[] fileData = readLines(fileReader);
printToFile(fileData, "file_out.txt");
}
private static void printToFile(String[] data, String fileName)
throws FileNotFoundException, UnsupportedEncodingException {
PrintWriter writer = new PrintWriter(fileName, "UTF-8");
for (String line : data) {
writer.println(line);
}
writer.close();
}
private static String[] readLines(BufferedReader reader) throws IOException {
ArrayList<String> corpus = new ArrayList<String>();
String inputString = null;
while ((inputString = reader.readLine()) != null) {
corpus.add(inputString);
}
String[] allCorpus = new String[corpus.size()];
return corpus.toArray(allCorpus);
}
}
Really stuck here and help would really be appreciated! Thanks in advance. Paul
System.in/out will use the default Windows character set.
Java String will use Unicode internally.
FileReader/FileWriter are old utility classes that use the default character set, hence they are for non-portable local files only.
The error you saw was a special character encoded as a two-byte UTF-8 sequence, but each (special UTF-8) byte was interpreted in the default single-byte encoding, where that value is not present, hence the two ? substitutions.
What is required is that the character can be entered on System.in in the default charset.
Then the String is converted from the default charset.
Writing it to a file in UTF-8 requires specifying UTF-8.
Hence:
BufferedReader stdinReader = new BufferedReader(new InputStreamReader(System.in));
String[] stdinData = readLines(stdinReader);
printToFile(stdinData, "stdin_out.txt");
// If the test input file is UTF-8:
Path path = Paths.get("testinput-utf8.txt");
List<String> lines = Files.readAllLines(path); // Here the default is UTF-8!
// Or, if the test input file is Windows Latin-1:
Path path2 = Paths.get("testinput-winlatin1.txt");
List<String> lines2 = Files.readAllLines(path2, Charset.forName("Windows-1252"));
Files.write(Paths.get("file_out.txt"), lines, StandardCharsets.UTF_8);
To check whether your current computer system handles Japanese:
System.out.println("Hiragana letter Wo '\u3092'."); // Either を or ?.
Seeing ? means the conversion to the default system encoding could not deliver the character.
を is U+3092, u-encoded as ASCII with \u3092.
To create a UTF-8 text file under Windows:
Files.write(Paths.get("out-utf8.txt"),
"\uFEFFHiragana letter Wo '\u3092'.".getBytes(StandardCharsets.UTF_8));
Here I use an ugly (generally unneeded) BOM marker char \uFEFF (a zero-width no-break space) that will let Windows Notepad recognize the text as being in UTF-8.
I am trying to make a function which will remove diacritics (I don't want to use Normalizer on purpose). The function looks like this:
private static String normalizeCharacter(Character curr) {
String sdiac = "áäčďéěíĺľňóôőöŕšťúůűüýřžÁÄČĎÉĚÍĹĽŇÓÔŐÖŔŠŤÚŮŰÜÝŘŽ";
String bdiac = "aacdeeillnoooorstuuuuyrzAACDEEILLNOOOORSTUUUUYRZ";
char[] s = sdiac.toCharArray();
char[] b = bdiac.toCharArray();
String ret;
for(int i = 0; i < sdiac.length(); i++){
if(curr == s[i])
curr = b[i];
}
ret = curr.toString().toLowerCase();
ret = ret.replace("\n", "").replace("\r","");
return ret;
}
The function is called like this (every character from the file is sent to this function):
private static String readFile(String fName) {
File f = new File(fName);
StringBuilder sb = new StringBuilder();
try{
FileInputStream fStream = new FileInputStream(f);
Character curr;
while(fStream.available() > 0){
curr = (char) fStream.read();
sb.append(normalizeCharacter(curr));
System.out.print(normalizeCharacter(curr));
}
}catch(IOException e){
e.printStackTrace();
}
return sb.toString();
}
The file text.txt contains this: ľščťžýáíéúäôň, and I expect lcstzyaieuaon in return from the program, but instead of the expected string I get this: ¾è yaieuaoò. I know the problem is somewhere in the encoding but I don't know where. Any ideas?
You are trying to convert bytes into characters.
However, the character ľ is not represented as a single byte. Its Unicode representation is U+013E, and its UTF-8 representation is C4 BE. Thus, it is represented by two bytes. The same is true for the other characters.
Suppose the encoding of your file is UTF-8. Then you read the byte value C4, and then you convert it to a char. This will give you the character U+00C4 (Ä), not U+013E. Then you read the BE, and it is converted to the character U+00BE (¾).
So don't confuse bytes and characters. Instead of using the InputStream directly, you should wrap it with a Reader. A Reader is able to read characters based on the encoding it is created with:
BufferedReader reader = new BufferedReader(
new InputStreamReader(
new FileInputStream(f), StandardCharsets.UTF_8
)
);
Now you'll be able to read characters or even whole lines, and the decoding will be done for you.
int readVal;
while ( ( readVal = reader.read() ) != -1 ) {
curr = (char)readVal;
// ... the rest of your code
}
Remember that you are still reading an int if you are going to use read() without parameters.
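Putting it together, a minimal sketch of how the question's readFile method might look with a Reader, assuming the file is UTF-8:
private static String readFile(String fName) {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(fName), StandardCharsets.UTF_8))) {
        int readVal;
        while ((readVal = reader.read()) != -1) { // read() returns an int, -1 at end of stream
            char curr = (char) readVal;           // now a decoded character, not a raw byte
            sb.append(normalizeCharacter(curr));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return sb.toString();
}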
The problem is that when I read a string and then try to write each part on a separate line in a .txt file, although System.out.println shows the correct characters, the file ends up with some weird characters in place of the ’s. To illustrate, here is an example: suppose we have this line, Second subject’s layout of same 100 pages., and we want to write it into a .txt file using the following code:
public static void write(String Swrite) throws IOException {
if(!file.exists()){
file.createNewFile();
}
FileOutputStream fop=new FileOutputStream(file,true);
if(Swrite!=null)
for(final String s : Swrite.split(" ")){
fop.write(s.toLowerCase().getBytes());
fop.write(System.getProperty("line.separator").getBytes());
}
fop.flush();
fop.close();
}
the written file would look like this for the word subject's: subject’s. I have no idea why this happens.
Try something like the following. It frees you from having to deal with character encoding.
PrintWriter pw = null;
try {
    pw = new PrintWriter(file);
    if (Swrite != null) {
        for (String s : Swrite.split(" ")) {
            pw.println(s);
        }
    }
} finally {
    if (pw != null) {
        pw.close();
    }
}
How about something like this:
// The file to read the input from and write the output to.
// Original content: Second subject’s layout of same 100 pages.
File file = new File("C:\\temp\\file.txt");
// The charset of the file, in our case UTF-8.
Charset utf8Charset = Charset.forName("UTF-8");
// Read all bytes from the file and create a string out of it (with the correct charset).
String inputString = new String(Files.readAllBytes(file.toPath()), utf8Charset);
// Create a list of all output lines
List<String> lines = new ArrayList<>();
// Add the original line and then an empty line for clarity's sake.
lines.add(inputString);
lines.add("");
// Convert the input string to lowercase and iterate over its char array.
// Then for each char create a string which becomes a new line.
for(char c : inputString.toLowerCase().toCharArray()){
lines.add(new String(new char[]{c}));
}
// Write all lines in the correct char encoding to the file
Files.write(file.toPath(), lines, utf8Charset);
It all has to do with the charsets used, as commented in the code above.
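Alternatively (a sketch only, under the assumption that an explicit charset is acceptable), the original write() method could specify the charset itself instead of relying on getBytes() and the platform default, reusing the file field from the question:
// Write through a Writer with an explicit charset so the bytes no longer depend on the platform default.
public static void write(String swrite) throws IOException {
    try (Writer writer = new OutputStreamWriter(
            new FileOutputStream(file, true), StandardCharsets.UTF_8)) {
        if (swrite != null) {
            for (String s : swrite.split(" ")) {
                writer.write(s.toLowerCase());
                writer.write(System.lineSeparator());
            }
        }
    }
}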