I have to admit that I'm not really an expert on encodings. My problem is this: my program has to read a text file that contains not only standard ASCII but also characters from other languages, such as "..офіціалнов назвов Російска..". Let's assume that this is the content of the file: офіціалнов назвов Російска
Now I'd like to split the whole file content into single words and create another file that lists these words one per line, like:
офіціалнов
назвов
Російска
My problem is: if I put these single words into a HashMap and read the values back from it, the encoding is lost. This is my code:
final StringBuffer fileData = new StringBuffer(1000);
final BufferedReader reader = new BufferedReader(
new FileReader("fileIn.txt"));
char[] buf = new char[1024];
int numRead = 0;
while ((numRead = reader.read(buf)) != -1)
{
final String readData = String.valueOf(buf, 0, numRead);
fileData.append(readData);
buf = new char[1024];
}
reader.close();
String mergedContent = fileData.toString();
mergedContent = mergedContent.replaceAll("\\<.*?>", " ");
mergedContent = mergedContent.replaceAll("\\r\\n|\\r|\\n", " ");
final BufferedWriter out = new BufferedWriter(
new OutputStreamWriter(
new FileOutputStream("fileOut.txt")));
final HashMap<String, String> wordsMap = new HashMap<String, String>();
final String test[] = mergedContent.split(" ");
for (final String string : test)
{
wordsMap.put(string, string);
}
for (final String string : wordsMap.values())
{
out.write(string + "\n");
}
out.close();
This snippet destroys the encoding. The funny thing is: if I don't put the values into the HashMap but write them straight to the output file, like:
...
for (final String string : test)
{
out.write(string + "\n");
//wordsMap.put(string, string);
}
//for (final String string : wordsMap.values())
//{
// out.write(string + "\n");
//}
out.close();
...then it works like I expect.
What am I doing wrong?
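Try using new InputStreamReader(new FileInputStream(file), "UTF-8") for reading, and do the same for the output with new OutputStreamWriter(new FileOutputStream(file), "UTF-8"). FileReader and an OutputStreamWriter created without a charset both use the platform default charset, which may not be UTF-8. Also make sure your file really is encoded in UTF-8.
The HashMap can't possibly do anything to the encoding.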
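To make the suggestion concrete, here is a minimal sketch of the whole read, split and write cycle with the charset stated explicitly on both sides. It assumes the input really is UTF-8 and that words are separated by whitespace; the class name Utf8WordSplitter is made up for illustration, and a LinkedHashSet stands in for the HashMap so that duplicates are dropped while the original word order is kept.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the original question's code.
final class Utf8WordSplitter {

    // Reads 'in' as UTF-8, splits it on whitespace and writes each distinct
    // word on its own line of 'out', again as UTF-8.
    static void splitWords(Path in, Path out) throws IOException {
        String content = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);

        // LinkedHashSet drops duplicates but keeps first-seen order.
        Set<String> words = new LinkedHashSet<>();
        for (String word : content.split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }

        // The writer is given the charset explicitly instead of relying on
        // the platform default.
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String word : words) {
                writer.write(word);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        splitWords(Paths.get("fileIn.txt"), Paths.get("fileOut.txt"));
    }
}
```

If the output still looks wrong, it is worth checking that the viewer or editor used to open fileOut.txt is also set to UTF-8.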
I'm stuck on this part. The aim is to take the values from a file.ini with this format:
X = Y
X1 = Y1
X2 = Y2
take the Y values, substitute them for the corresponding X keys in an SCXML file, and save the result as a new file.scxml.
As you can see from the code below, I read the keys and values into a HashMap, and they print correctly. However, although the replacement code looks right, it only works for the first entry of the HashMap.
The code is currently as follows:
public String getPropValues() throws IOException {
try {
Properties prop = new Properties();
String pathconf = this.pathconf;
String pathxml = this.pathxml;
//Read file conf
File inputFile = new File(pathconf);
InputStream is = new FileInputStream(inputFile);
BufferedReader br = new BufferedReader(new InputStreamReader(is));
//load the buffered file
prop.load(br);
String name = prop.getProperty("name");
//Read xml file to get the format
FileReader reader = new FileReader(pathxml);
String newString;
StringBuffer str = new StringBuffer();
String lineSeparator = System.getProperty("line.separator");
BufferedReader rb = new BufferedReader(reader);
//read file.ini to HashMap
Map<String, String> mapFromFile = getHashMapFromFile();
//iterate over HashMap entries
for(Map.Entry<String, String> entry : mapFromFile.entrySet()){
System.out.println( entry.getKey() + " -> " + entry.getValue() );
//replace values
while ((newString = rb.readLine()) != null){
str.append(lineSeparator);
str.append(newString.replaceAll(entry.getKey(), entry.getValue()));
}
}
rb.close();
String pathwriter = pathxml + name + ".scxml";
BufferedWriter bw = new BufferedWriter(new FileWriter(new File(pathwriter)));
bw.write(str.toString());
//flush the stream
bw.flush();
//close the stream
bw.close();
} catch (Exception e) {
System.out.println("Exception: " + e);
}
return result;
}
So my .ini file is, for example:
Apple = red
Lemon = yellow
It prints the keys and values correctly:
Apple -> red
Lemon -> yellow
but in the file it only replaces Apple with red, not the other keys.
The problem lies in the order of your control flow.
During the first iteration of your for loop, which corresponds to the first entry Apple -> red, the inner while loop reads the BufferedReader rb to the end of the stream, so the subsequent iterations have nothing left to read.
You then either have to reinitialize the BufferedReader for each iteration or, better, invert the loops so that the iteration over your Map entries sits inside the BufferedReader read loop:
EDIT (following @David's hints)
Assign the result of each replacement back to the line, then append that line to the output once per line iteration. Note that replace() treats the key as literal text, whereas replaceAll() would interpret it as a regular expression:
public String getPropValues() throws IOException {
try {
// ...
BufferedReader rb = new BufferedReader(reader);
//read file.ini to HashMap
Map<String, String> mapFromFile = getHashMapFromFile();
//replace values
while ((newString = rb.readLine()) != null) {
// iterate over HashMap entries
for (Map.Entry<String, String> entry : mapFromFile.entrySet()) {
newString = newString.replace(entry.getKey(), entry.getValue());
}
str.append(lineSeparator)
.append(newString);
}
rb.close();
// ...
} catch (Exception e) {
System.out.println("Exception: " + e);
}
return result;
}
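The same loop order can be tried in isolation with the following sketch. It is not the poster's method: the class LineReplacer and its method replaceInLines are hypothetical names, and the input comes from any Reader (a StringReader in main) so that it runs without the .ini or .scxml files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating "lines outside, map entries inside".
final class LineReplacer {

    // Reads 'source' line by line and applies every key -> value
    // replacement to each line before moving on to the next one.
    static List<String> replaceInLines(Reader source, Map<String, String> replacements)
            throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (Map.Entry<String, String> entry : replacements.entrySet()) {
                    // replace() is literal; no regex escaping is needed.
                    line = line.replace(entry.getKey(), entry.getValue());
                }
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("Apple", "red");
        map.put("Lemon", "yellow");
        String text = "Apple pie\nLemon tart\nApple and Lemon";
        for (String line : replaceInLines(new StringReader(text), map)) {
            System.out.println(line);
        }
    }
}
```

One thing to keep in mind with this approach: the replacements are applied one after another, so a value that happens to contain a later key would itself be replaced.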
Relatively new to programming. I want to read a URL, modify the text string, then write it to a line-separated csv textfile.
The read & modify parts run. Also, outputting the string to the terminal (using Eclipse) looks fine (csv, line by line), like this:
data_a,data_b,data_c,...
data_a1,data_b1,datac1...
data_a2,data_b2,datac2...
.
.
.
But I'm unable to write the same string to a file; it just becomes a one-liner (see the for loops below, attempts no. 1 and 2):
data_a,data_b,data_c,data_a1,data_b1,datac1,data_a2,data_b2,datac2...
I guess I'm looking for a way to, in the FileWriter or BufferedWriter loops, convert the string finalDataA to array string (i.e. include the string suffix "[0]") but I have not yet found such an approach that would not give errors of the type "Cannot convert String to String[]". Any suggestions?
String data = "";
String dataHelper = "";
try {
URL myURL = new URL(url);
HttpURLConnection myConnection = (HttpURLConnection) myURL.openConnection();
if (myConnection.getResponseCode() == URLStatus.HTTP_OK.getStatusCode()) {
BufferedReader in = new BufferedReader(new InputStreamReader(myConnection.getInputStream()));
while ((data = in.readLine()) != null) {
dataHelper = dataHelper + "\n" + data;
}
in.close();
String trimmedData = dataHelper.trim().replaceAll(" +", ",");
String parts[] = trimmedData.split(Pattern.quote(")"));// ,1.,");
String dataA = parts[1];
String finalDataA[] = dataA.split("</PRE>");
// parts 2&3 removed in this example
// Console output for testing purpose - This prints out many many lines of csv-data
System.out.println(finalDataA[0]);
//This returns the value 1
System.out.println(finalDataA.length);
// Attempt no. 1 to write to file - writes a oneliner
for(int i = 0; i < finalDataA.length; i++) {
try (BufferedWriter bw = new BufferedWriter(new FileWriter(pathA, true))) {
String s;
s = finalDataA[i];
bw.write(s);
bw.newLine();
bw.flush();
}
}
// Attempt no. 2 to write to file - writes a oneliner
FileWriter fw = new FileWriter(pathA);
for (int i = 0; i < finalDataA.length; i++) {
fw.write(finalDataA[i] + "\n");
}
fw.close();
}
} catch (Exception e) {
System.out.println("Exception" +e);
}
Create the BufferedWriter and the FileWriter before the for loop, not on every pass through it.
From your code comments, finalDataA has one element, so the for-loop will be executed only once. Try splitting finalDataA[0] into rows.
Something like this:
String endOfLineToken = "..."; // your variant; note that split() treats it as a regex
String[] lines = finalDataA[0].split(endOfLineToken);
// Open the writer once, before the loop; try-with-resources closes it.
try (BufferedWriter bw = new BufferedWriter(new FileWriter(pathA, true)))
{
    for (String line : lines)
    {
        bw.write(line);
        bw.write(endOfLineToken); // to put back line endings
        bw.newLine();
    }
}
catch (IOException e)
{
    System.out.println("Exception: " + e);
}
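For a version that can be run on its own, here is a small sketch under one assumption the thread does not confirm: that the rows inside finalDataA[0] are separated by ordinary line breaks. It splits on the regex \R (any line break, Java 8 and later) and writes each row followed by the platform line separator. LineWriter and writeLines are illustrative names only.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; assumes rows are separated by \n, \r\n or \r.
final class LineWriter {

    // Splits 'text' at line breaks and writes one row per line to 'target'.
    static void writeLines(String text, Path target) throws IOException {
        String[] lines = text.split("\\R");
        // The writer is opened once, outside the loop.
        try (BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                bw.write(line);
                bw.newLine(); // platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        writeLines("data_a,data_b\ndata_a1,data_b1\ndata_a2,data_b2", Paths.get("out.csv"));
    }
}
```

Writing with newLine() rather than a bare "\n" can matter on Windows, where some editors show text with "\n"-only line endings as a single line.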
I have an object which is serialised and written to a file.
Before deserialising the file back into an object instance, I want to maliciously edit the text in the file.
//FILE TAMPER
//Lexical block: Tamper
{
String output = null;
//Lexical block make output
{
LinkedList<String> lls = new LinkedList<String>();
//Lexical block: Reader
{
BufferedReader br = new BufferedReader(new FileReader(fileString));
while (br.ready()) {
String readLine = br.readLine();
lls.add(readLine);
}
br.close();
}
//Lexical block: manipulate
{
//Henry Crapper
final String[] llsToArray = lls.toArray(new String[lls.size()]);
for (int i = 0; i < llsToArray.length; i++) {
String line = llsToArray[i];
if (line.contains("Henry")) {
line = line.replace("Henry",
"Fsekc");
llsToArray[i] = line;
}
if (line.contains("Crapper")) {
line = line.replace("Crapper",
"Dhdhfie");
llsToArray[i] = line;
}
lls = new LinkedList<String>(Arrays.asList(llsToArray));
}
}
//Lexical block: write output
{
StringBuilder sb = new StringBuilder();
for (String string : lls) {
sb.append(string).append('\n');
}
output = sb.toString();
}
}
//Lexical block: Writer
{
BufferedWriter bw = new BufferedWriter(new FileWriter(fileString));
bw.write(output);
bw.close();
}
}
However the edited file isn't correct and has some unusual characters.
//Before
¨Ìsr&Snippets.Parsed.EmployeeSerialization0I
bankBalanceLnametLjava/lang/String;xp•Åt
Henry Crappe
//After
ÔøΩÔøΩsr&Snippets.Parsed.EmployeeSerialization0I
bankBalanceLnametLjava/lang/String;xpÔøΩÔøΩt
Fsekc Dhdhfie
I'm guessing there is some sort of non readable character issue or something?
A file that contains a serialized object instance is a binary file: you should not edit it with a Reader and a BufferedWriter. Edit it at the byte level, with a RandomAccessFile, for example.
As for why: a Reader decodes bytes into characters using a charset, and a Writer encodes them back. Not every byte sequence in a binary file is valid in that charset, so the round trip does not reproduce the original bytes, and bytes change at positions you never meant to touch.
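As a rough illustration of editing at the byte level, the sketch below loads the file as raw bytes, overwrites each occurrence of one byte sequence with another of the same length, and writes the bytes back. It reads the whole file with Files.readAllBytes instead of using RandomAccessFile, purely to keep the example short. The names BinaryPatcher, replaceBytes and patchFile are invented for this example. The same-length restriction is an assumption made on purpose: serialized strings carry a length prefix, so changing the length would corrupt the stream. It also assumes the names are plain ASCII, for which the bytes in the file match their UTF-8 encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for same-length, byte-level replacement.
final class BinaryPatcher {

    // Overwrites every occurrence of 'from' in 'data' with 'to' and
    // returns how many occurrences were replaced.
    static int replaceBytes(byte[] data, byte[] from, byte[] to) {
        if (from.length == 0 || from.length != to.length) {
            throw new IllegalArgumentException("Patterns must be non-empty and of equal length.");
        }
        int count = 0;
        int i = 0;
        while (i <= data.length - from.length) {
            if (matchesAt(data, from, i)) {
                System.arraycopy(to, 0, data, i, to.length);
                i += from.length;
                count++;
            } else {
                i++;
            }
        }
        return count;
    }

    // True if 'pattern' occurs in 'data' starting at 'offset'.
    private static boolean matchesAt(byte[] data, byte[] pattern, int offset) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    // Patches a file without ever decoding it into characters.
    static void patchFile(Path file, String from, String to) throws IOException {
        byte[] data = Files.readAllBytes(file);
        replaceBytes(data, from.getBytes(StandardCharsets.UTF_8), to.getBytes(StandardCharsets.UTF_8));
        Files.write(file, data);
    }
}
```

With the names from the question, patchFile(path, "Henry", "Fsekc") and patchFile(path, "Crapper", "Dhdhfie") both satisfy the same-length condition, and every byte outside the matched names stays untouched.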
I've got a Spring MVC app with a file upload capability. Files are passed to the controller as a MultipartFile, from which it's easy to get an InputStream. I'm uploading zip files that contain CSVs and I'm struggling to find a way to open the CSVs and read them a line at a time. There are plenty of examples on the net of reading into a fixed-size buffer. I've tried this, but the buffers don't concatenate very well, so it soon gets out of sync and uses a lot of memory:
ZipEntry entry = input.getNextEntry();
while(entry != null)
{
if (entry.getName().matches("Data/CSV/[a-z]{0,1}[a-z]{0,1}.csv"))
{
final String fullPath = entry.getName();
final String filename = fullPath.substring(fullPath.lastIndexOf('/') + 1);
visitor.startFile(filename);
final StringBuilder fileContent = new StringBuilder();
final byte[] buffer = new byte[1024];
while (input.read(buffer) > 0)
fileContent.append(new String(buffer));
final String[] lines = fileContent.toString().split("\n");
for(String line : lines)
{
final String[] columns = line.split(",");
final String postcode = columns[0].replace(" ", "").replace("\"", "");
if (columns.length > 3)
visitor.location(postcode, "", "");
}
visitor.endFile();
}
entry = input.getNextEntry();
}
There must be a better way that actually works.
Not clear if this suits your need, but have you tried opencsv (http://opencsv.sourceforge.net)? Their example is really intuitive:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
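For your case, all you need to do is wrap the compressed stream in a buffered reader and pass that reader to a CSVReader. The snippet below uses GZIPInputStream, which handles .gz files; for a zip archive, wrap the ZipInputStream, positioned at the current entry, instead: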
FileInputStream fis = new FileInputStream(file);
GZIPInputStream gis = new GZIPInputStream(fis);
InputStreamReader isr = new InputStreamReader(gis);
BufferedReader br = new BufferedReader(isr);
CSVReader reader = new CSVReader(br);
You could use a BufferedReader, which has the convenient readLine() method and won't load the entire contents of the file into memory, e.g.
BufferedReader in = new BufferedReader(new InputStreamReader(input), 1024);
String line;
// readLine() returns null at the end of the current zip entry
while ((line = in.readLine()) != null) {
    String[] columns = line.split(",");
    // rest of your code
}
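Putting the pieces together, a self-contained sketch of reading every CSV entry of a zip stream line by line might look like this. ZipCsvReader and firstColumns are made-up names, the entries are assumed to be UTF-8 text, and the column clean-up mirrors the postcode handling in the question. The reader is deliberately not closed inside the loop, because closing it would also close the underlying ZipInputStream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; assumes UTF-8 encoded CSV entries.
final class ZipCsvReader {

    // Returns the cleaned first column of every line in every ".csv" entry.
    static List<String> firstColumns(InputStream zipped) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".csv")) {
                    continue;
                }
                // A fresh reader per entry; reads stop at the entry's end,
                // so it cannot run into the next entry.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] columns = line.split(",");
                    result.add(columns[0].replace(" ", "").replace("\"", ""));
                }
            }
        }
        return result;
    }
}
```

In the Spring controller this could be called as ZipCsvReader.firstColumns(multipartFile.getInputStream()), with the visitor callbacks from the question put in place of the list.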
I have a text file with the following contents:
one
two
three
four
I want to access the string "three" by its position in the text file in Java. I found the substring concept on Google but I'm unable to use it.
So far I am able to read the file contents:
import java.io.*;
class FileRead
{
public static void main(String args[])
{
try{
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
in.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
I want to apply the substring concept to the file: the program asks for a position and displays the string at that position.
String Str = new String("Welcome to Tutorialspoint.com");
System.out.println(Str.substring(10, 15) );
If you know the byte offsets within the file that you are interested in then it's straightforward:
RandomAccessFile raFile = new RandomAccessFile("textfile.txt", "r");
raFile.seek(startOffset);
byte[] bytes = new byte[length];
raFile.readFully(bytes);
raFile.close();
String str = new String(bytes, "Windows-1252"); // or whatever encoding
But for this to work you have to use byte offsets, not character offsets. If the file is encoded in a variable-width encoding such as UTF-8, there is no way to seek directly to the nth character; you have to start at the top of the file and read and discard the first n-1 characters.
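Wrapped into a small method, the byte-offset approach could be sketched as follows. ByteRangeReader and readRange are illustrative names, and the caller is assumed to know both the byte offset and the encoding.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper around RandomAccessFile.
final class ByteRangeReader {

    // Reads 'length' bytes starting at byte 'startOffset' and decodes them.
    static String readRange(Path file, long startOffset, int length, Charset charset)
            throws IOException {
        try (RandomAccessFile raFile = new RandomAccessFile(file.toFile(), "r")) {
            raFile.seek(startOffset);
            byte[] bytes = new byte[length];
            raFile.readFully(bytes); // throws EOFException if the file is too short
            return new String(bytes, charset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("textfile", ".txt");
        Files.write(file, "one\ntwo\nthree\nfour\n".getBytes(StandardCharsets.US_ASCII));
        // "one\n" and "two\n" occupy bytes 0..7, so "three" starts at offset 8.
        System.out.println(readRange(file, 8, 5, StandardCharsets.US_ASCII)); // prints "three"
    }
}
```

With "\r\n" line endings each of the first two lines would be one byte longer, which would move "three" to offset 10.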
Look for \r\n (line breaks) in your text file. That way you can count the lines until you reach the one containing your string.
In reality your file looks like this:
one\r\n
two\r\n
three\r\n
four\r\n
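Counting line breaks by hand is not strictly necessary, since BufferedReader.readLine() already stops at "\r\n", "\n" or "\r". A minimal sketch that returns a line by its number follows; NthLine and lineAt are hypothetical names and line numbers are taken to be 1-based.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: fetch a line by its 1-based number.
final class NthLine {

    // Returns line 'n' of 'source', or null if there are fewer than n lines.
    static String lineAt(Reader source, int n) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int current = 0;
            while ((line = reader.readLine()) != null) {
                current++;
                if (current == n) {
                    return line;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        String content = "one\r\ntwo\r\nthree\r\nfour\r\n";
        System.out.println(lineAt(new StringReader(content), 3)); // prints "three"
    }
}
```

For a real file, a FileReader or an InputStreamReader with an explicit charset can be passed in place of the StringReader.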
You seem to be looking for this. The code I posted there works on the byte level, so it may not work for you. Another option is to use the BufferedReader and just read a single character in a loop like this:
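String getString(String fileName, int start, int end) throws IOException {
    int len = end - start;
    if (len <= 0) {
        throw new IllegalArgumentException("Length of string to output is zero or negative.");
    }
    char[] buffer = new char[len];
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
        for (int i = 0; i < start; i++) {
            reader.read(); // Skip one character; ignore the result
        }
        // read() may return fewer characters than requested, so loop until done
        int total = 0;
        int n;
        while (total < len && (n = reader.read(buffer, total, len - total)) != -1) {
            total += n;
        }
        return new String(buffer, 0, total);
    }
}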