I have a .txt file that will be accessed by many users, possibly at the same time (or close to it), and because of that I need a way to modify that txt file without creating a temporary file. I haven't found an answer or solution to this. So far, I have only found this approach:
Take existing file -> modify something -> write it to a new file (temp file) -> delete the old file.
But this approach is not good for me. I need something like: Take existing file -> modify it -> save it.
Is this possible? I'm really sorry if this question already exists; I tried searching Stack Overflow and read through the Oracle docs, but I haven't found a solution that suits my needs.
EDIT:
After modification, the file would stay the same size as before. For example, imagine a list of students, where each student has a value of 1 or 0 (passed or failed the exam).
In this case I would just need to update one character per row in the file (that is, per student). Example:
Lee Jackson 0 -> Lee Jackson 0
Bob White 0 -> would become -> Bob White 1
Jessica Woo 1 -> Jessica Woo 1
In the example above we have a file with 3 records, one below the other, and I need to update the 2nd record while the 1st and 3rd stay the same, all without creating a new file.
Here's a potential approach using RandomAccessFile. The idea is to use readLine() to read the file line by line while remembering the position in the file, so you can go back there and write a new line. It's still risky: if anything in the text encoding changes the byte length, it could overwrite the line break, for example.
void modifyFile(String file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
        long beforeLine = raf.getFilePointer();
        String line;
        while ((line = raf.readLine()) != null) {
            // edit the line while keeping its length identical
            if (line.endsWith("0")) {
                line = line.substring(0, line.length() - 1) + "1";
            }
            // go back to the beginning of the line
            raf.seek(beforeLine);
            // overwrite the bytes of that line
            raf.write(line.getBytes());
            // advance past the line break
            raf.readLine();
            // and remember that position again
            beforeLine = raf.getFilePointer();
        }
    }
}
Handling correct String encoding is tricky in this case. If the file isn't in the encoding used by readLine() and getBytes(), you could work around that by doing:
// file is in "iso-1234" encoding, which is made up.
// reinterpret the bytes as the correct encoding first
line = new String(line.getBytes("ISO-8859-1"), "iso-1234");
// ... modify line ...
// when writing, use the expected encoding
raf.write(line.getBytes("iso-1234"));
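As a concrete sketch, assume the file were actually in windows-1252 (an illustration only, standing in for the made-up "iso-1234"); the loop body above might then become:

// sketch: assume the file is actually in windows-1252 (an illustration only)
String line;
while ((line = raf.readLine()) != null) {
    // readLine() maps each raw byte straight to a char, which matches
    // ISO-8859-1, so getBytes("ISO-8859-1") recovers the original bytes
    line = new String(line.getBytes("ISO-8859-1"), "windows-1252");
    if (line.endsWith("0")) {
        line = line.substring(0, line.length() - 1) + "1";
    }
    raf.seek(beforeLine);
    // encode back into the file's real charset before overwriting
    raf.write(line.getBytes("windows-1252"));
    raf.readLine(); // skip the line break
    beforeLine = raf.getFilePointer();
}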
See How to read UTF8 encoded file using RandomAccessFile?
Try storing the changes you want to make to the file in RAM (a string, or a linked list of strings). If you read the file into a linked list of strings (one per line of the file), write a function to merge the string you want to insert into that linked list of lines, and then rewrite the file entirely by writing out every line from the linked list, it should give you what you want. Here's what I mean in pseudocode (a Java sketch follows below); the order is important here.
By reading the file in first and writing it back only after the input, we minimize interference with other users.
String lineYouWantToWrite = yourInput
LinkedList<String> list = new LinkedList<String>()
while (file has another line)
    list.add(file's next line)
add your string to whatever index of list you want
write list to file line by line, file's first line = list[1]...
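In actual Java, that pseudocode might look roughly like this (a sketch; the file name and which line to change are assumptions):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedList;

public class InMemoryEdit {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("students.txt"); // hypothetical file name

        // read the whole file into memory, one string per line
        LinkedList<String> lines = new LinkedList<>(Files.readAllLines(file));

        // modify the record you want, e.g. mark the second student as passed
        String record = lines.get(1);
        lines.set(1, record.substring(0, record.length() - 1) + "1");

        // rewrite the file in one go from the in-memory list
        Files.write(file, lines);
    }
}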
EDIT:
I have a semi-working solution at the bottom.
Or, the original text:
I have a local CSV file. The file is encoded in UTF-16LE. I want to read the file into memory in Java, modify it, then write it out. I have been having incredibly strange problems for hours.
The source of the file is Facebook lead generation. It is a CSV, and each line of the file contains the text "2022-08-08". However, when I read in a line with a BufferedReader, all String methods fail: contains("2022-08-08") returns false. I print out the line directly after checking, and it does indeed contain the text "2022-08-08". So the String methods are failing completely.
I think it's possibly due to encoding, but I'm not sure. I tried pasting the code into this website for help, but any part of the code that includes strings copy-pasted from the CSV file refuses to paste into my browser.
int i = s.indexOf("2022");
if (i < 0) {
    System.out.println(s.contains("2022") + ", " + s);
    continue;
}
Prints: false, 2022-08-08T19:57:51+07:00
There are tons of invisible characters in the CSV file, and in my IDE everywhere I have copy-pasted from the file. I know the characters are there because when I backspace over them, the invisible character is deleted instead of the character I would expect.
Please help me.
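One way to make those invisible characters visible is to dump the code points of an offending line (a small diagnostic sketch, for a line s read from the file):

// print each code point so zero-width and control characters
// show up as explicit U+XXXX values
s.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
System.out.println();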
EDIT:
This code appears to fix the problem. I think the problem is partly Facebook's encoding of the file, and partly that the file comes from user-generated input containing a few very strange entries. If anyone has more to add or a better solution, I will award it. I'm not sure exactly why it works; it was combined from different sources that had sparse explanations.
Is there a way to determine the encoding automatically? Windows Notepad is able to do it.
BufferedReader fr = new BufferedReader(new InputStreamReader(
        new FileInputStream(new File("C:\\New folder\\form.csv")), "UTF-16LE"));
BufferedWriter fw = Files.newBufferedWriter(Paths.get("C:\\New folder", "form3.txt"));
String s;
while ((s = fr.readLine()) != null) {
    // replace invisible control/format characters (\p{C}) with "?",
    // remove any non-alphanumeric character directly before a comma (with the comma),
    // then strip anything outside plain ASCII
    s = s.replaceAll("\\p{C}", "?").replaceAll("[^A-Za-z0-9],", "").replaceAll("[^\\x00-\\x7F]", "");
    // do stuff with s normally
}
You can verify what you're getting from the stream by:
byte[] b = s.getBytes(StandardCharsets.UTF_16BE);
System.out.println(Arrays.toString(b));
I think the searching condition for indexOf could be wrong:
int i = s.indexOf("2022");
if (i < 0) {
    System.out.println(s.contains("2022") + ", " + s);
    continue;
}
Maybe the condition should be (i != -1), if I'm not wrong.
It's a little tricky, because (i < 0) is true exactly when the string does not contain "2022".
I am writing a small Java method that needs to read test data from a file on my Win10 laptop.
The test data has not been created yet, but it will be text based.
I need to write a method that reads the data and analyses it character by character.
My questions are:
What is the simplest format to create and read the file? I was looking at JSON, which does not look particularly complex, but is it the best choice for a very simple application?
My second question (and I am a novice): if the file is a text file on my laptop, how do I tell my Java code where to find it? How do I ask Java to navigate the Win10 operating system?
You can also map the text file into Java objects (it depends on your text file).
For example, say we have a text file that contains a person's name and family name, line by line, like:
Foo,bar
John,doe
To parse the text file above and map it into Java objects, we can:
1- Create a Person class
2- Read and parse the file (line by line)
Create the Person class
public class Person {
    private String name;
    private String family;
    // setters and getters
}
Read the file and parse it line by line
public static void main(String[] args) throws IOException {
    // read the file, parse it line by line, and map each line to a Person
    List<Person> personList = Files
            .lines(Paths.get("D:\\Project\\Code\\src\\main\\resources\\person.txt"))
            .map(line -> {
                // split the line on "," into its words, e.g. "John,doe" -> [John, doe]
                // (Splitter comes from Google Guava)
                List<String> nameAndFamily = Splitter.on(",").trimResults().omitEmptyStrings().splitToList(line);
                // create a new Person from those words
                Person person = new Person();
                person.setName(nameAndFamily.get(0));
                person.setFamily(nameAndFamily.get(1));
                return person;
            })
            .collect(Collectors.toList());
    // process the person list
    personList.forEach(person -> {
        // do whatever you want with each person; here, print it
        System.out.println(person.getName());
        System.out.println(person.getFamily());
    });
}
Regarding your first question, I can't say much without knowing anything about the data you'd like to write/read.
For your second question, you would normally do something like this:
String pathToFile = "C:/Users/SomeUser/Documents/testdata.txt";
InputStream in = new FileInputStream(pathToFile);
As your data gains complexity, you should probably think about using a well-defined format if possible, such as JSON, YAML, or similar.
Hope this helps a bit. Good luck with your project.
As for the format the text file needs to take, you should elaborate a bit on the kind of data, so I can't say much there.
But to navigate the file system, you just need to write the path a bit differently:
keep the drive letter and its colon ":" at the beginning of the path
replace each backslash with a forward slash (or escape it as "\\" within a Java string literal)
then you should be set.
So for example...
C:\users\johndoe\documents\projectfiles\mydatafile.txt
becomes
C:/users/johndoe/documents/projectfiles/mydatafile.txt
With this path, you can use all the IO classes for file manipulation.
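Since the goal is to analyse the data character by character, a minimal sketch tying this together (the path is just an example) could be:

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CharByChar {
    public static void main(String[] args) throws IOException {
        // example path; adjust to wherever the test data lives
        try (Reader reader = Files.newBufferedReader(
                Paths.get("C:/Users/SomeUser/Documents/testdata.txt"))) {
            int c;
            while ((c = reader.read()) != -1) {
                // analyse each character here
                System.out.print((char) c);
            }
        }
    }
}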
I recently started Java and have been trying to make a database-like program which reads from a preset text file. The user can either search for a definition by its term, or by keywords/terms within the definition itself. Searching by term works fine, but searching by key term always outputs "Not Found".
FileReader fr = new FileReader("text.txt");
BufferedReader br = new BufferedReader(fr);
boolean found = false;
String line = br.readLine();    // first line, so the term itself
String lineTwo = br.readLine(); // second line, which is the definition
do {
    if (lineTwo.toLowerCase().contains(keyterm.toLowerCase())) {
        found = true;
        System.out.println("Found " + keyterm);
        System.out.println(line);
        System.out.println(lineTwo);
    }
} while ((br.readLine() != null) & (!found));
if (!found) { System.out.println("Not Found"); }
br.close();
fr.close();
This is the method I use to check for the key term, and it only works partially: it seems to be able to find the first two lines, so it outputs the definition of the first term if the key term is there, but it doesn't work for any of the other terms.
Edit:
The text file it reads from looks something like this:
term
definition
term
definition
Each have their own line.
Edit 2
Thanks to @Matthew Kerian it now checks through the whole file, by changing the end of the do-while loop to:
while (((lineTwo = br.readLine()) != null) & (!found));
It now finds the actual definition but outputs the wrong term with it.
Edit 3: The key term is defined by the user's input.
Edit 4: In case it wasn't clear, the output I am looking for is either the definition of the term/key term if it is in the txt file, or just "Not Found" otherwise.
Edit 5: I tried to look at what it was outputting and noticed it printed "array" (the first term in the text file) after every lineTwo; it seems as though line is not updating.
Final Edit: I managed to crudely solve the problem by making another text file with the order flipped: instead of term then definition, it now goes definition then term. That lets me call upon the next line once the definition is found, so it reads properly.
lineTwo is not being refreshed with new data. Something like this would work better:
do {
    if (lineTwo.toLowerCase().contains(keyterm.toLowerCase())) {
        found = true;
        System.out.println("Found " + keyterm);
        System.out.println(line);
        System.out.println(lineTwo);
    }
} while (((lineTwo = br.readLine()) != null) & (!found));
We're still checking for EOF by checking for null, but by assigning the result to lineTwo we're constantly refreshing our buffer.
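To also keep the term in sync with its definition (the issue described in Edit 3 and Edit 5 above), both lines need to be refreshed on every iteration. Here is a sketch, reusing br and keyterm from the question and assuming term/definition pairs alternate line by line:

String line = br.readLine();    // a term
String lineTwo = br.readLine(); // its definition
boolean found = false;
while (line != null && lineTwo != null && !found) {
    if (lineTwo.toLowerCase().contains(keyterm.toLowerCase())) {
        found = true;
        System.out.println("Found " + keyterm);
        System.out.println(line);
        System.out.println(lineTwo);
    } else {
        line = br.readLine();    // advance to the next term ...
        lineTwo = br.readLine(); // ... and its definition
    }
}
if (!found) { System.out.println("Not Found"); }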
I have a .CSV file containing 100 000 records. I need to parse through a set of records, then delete it, and then parse the next set of records, until the end. How can I do this? A code snippet would be very helpful.
I tried, but I am not able to delete the records and reuse the same CSV file with only the remaining records left.
This cannot be done efficiently, since CSV is a sequential file format. Say you have:
"some text", "adsf"
"more text", "adfgagqwe"
"even more text", "adsfasdf"
...
and you want to remove the second line:
"some text", "adsf"
"even more text", "adsfasdf"
...
you need to move up all subsequent lines (which in your case can be 100 000 ...), which involves reading them at their old location and writing them to the new one. That is, deleting the first of 100 000 lines involves reading and writing 99 999 lines of text, which will take a while ...
It is therefore worthwhile to consider alternatives. For instance, if you are trying to process a file and want to keep track of how far you got, it is far more efficient to store the line number (or offset in bytes) you were at, and leave the input file intact. This also prevents corrupting the file if your program crashes while deleting lines. Another approach is to first split the file into many small files (perhaps 1000 lines each), process each file in its entirety, and then delete the file.
However, if you truly must delete lines from a CSV file, the most robust way is to read the entire file, write all records you want to keep to a new file, delete the original file, and finally rename the new file to the original file.
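A sketch of that read/filter/rename approach (the file names and the predicate deciding which records to keep are assumptions):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CsvDelete {
    public static void main(String[] args) throws IOException {
        Path original = Paths.get("records.csv");  // hypothetical file name
        Path temp = Paths.get("records.csv.tmp");  // hypothetical temp name

        // keep only the records that should survive; the predicate is an example
        List<String> kept;
        try (Stream<String> lines = Files.lines(original)) {
            kept = lines.filter(line -> !line.startsWith("DELETE"))
                        .collect(Collectors.toList());
        }

        // write the survivors to a new file, then swap it in for the original
        Files.write(temp, kept);
        Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
    }
}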
You cannot simply edit or delete existing data in the middle of a file; ideally you should generate a new file for your output. In your case, once you reach the point where you would delete the existing data, create a new file, copy the remaining lines to it, and use this new file as input.
code:
File infile = new File("C:\\MyInputFile.txt");
File outfile = new File("C:\\MyOutputFile.txt");
FileInputStream instream = new FileInputStream(infile);
FileOutputStream outstream = new FileOutputStream(outfile);
byte[] buffer = new byte[1024];
int length;
/* copy the contents from the input stream to
 * the output stream using read and write methods
 */
while ((length = instream.read(buffer)) > 0) {
    outstream.write(buffer, 0, length);
}
// close the input/output file streams
instream.close();
outstream.close();
The code below is tested and working fine; you can erase any line in an existing CSV file with it. Please check and let me know. You will have to supply the row number you want to delete.
File f = new File(System.getProperty("user.home") + "/Desktop/c.csv");
RandomAccessFile ra = new RandomAccessFile(f, "rw");
ra.seek(0);
long p = ra.getFilePointer();
byte[] b = ra.readLine().getBytes();
for (int i = 0; i < b.length; i++) {
    if (b[i] != 44) { // replace everything except commas (44 = ',', 32 = ' ')
        b[i] = 32;
    }
}
ra.seek(p);  // go back to the initial pointer of the line
ra.write(b); // write a blank line, keeping commas as column separators
ra.close();
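To blank an arbitrary row rather than just the first, you could skip ahead with readLine() before overwriting; a sketch based on the code above, where rowToDelete is a hypothetical index:

int rowToDelete = 2; // hypothetical 0-based index of the row to blank
RandomAccessFile ra = new RandomAccessFile(f, "rw");
for (int row = 0; row < rowToDelete; row++) {
    ra.readLine(); // skip the rows we want to keep
}
long p = ra.getFilePointer(); // start of the target row
byte[] b = ra.readLine().getBytes();
for (int i = 0; i < b.length; i++) {
    if (b[i] != 44) { // keep commas as column separators
        b[i] = 32;    // blank everything else with spaces
    }
}
ra.seek(p);
ra.write(b);
ra.close();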
I'm having a very difficult time debugging a problem in an application I've been building. I cannot seem to reproduce the problem with a representative test program, which makes it difficult to demonstrate. Unfortunately I cannot share my actual source because of security, but the following test represents fairly well what I am doing: the files and data use Unix-style EOL, I write to a zip file with a PrintWriter, and I use StringBuilders:
import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Tester {
    public static void main(String[] args) {
        // variables
        File target = new File("TESTSAVE.zip");
        PrintWriter printout1;
        ZipOutputStream zipStream;
        ZipEntry ent1;
        StringBuilder testtext1 = new StringBuilder();
        StringBuilder replacetext = new StringBuilder();
        // ensure file replace
        if (target.exists()) {
            target.delete();
        }
        try {
            // open the streams
            zipStream = new ZipOutputStream(new FileOutputStream(target, true));
            printout1 = new PrintWriter(zipStream);
            ent1 = new ZipEntry("testfile.txt");
            zipStream.putNextEntry(ent1);
            // construct the data
            for (int i = 0; i < 30; i++) {
                testtext1.append("Testing 1 2 3 Many! \n");
            }
            replacetext.append("Testing 4 5 6 LOTS! \n");
            replacetext.append("Testing 4 5 6 LOTS! \n");
            // the replace operation
            testtext1.replace(21, 42, replacetext.toString());
            // write it
            printout1.println(testtext1);
            // save it
            printout1.flush();
            zipStream.closeEntry();
            printout1.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The heart of the problem is that the file I see on my side is 16.3k characters. My friend, whether he uses the app on his PC or looks at exactly the same file as me, sees a file of 19.999k characters, the extra characters being a CRLF followed by a massive number of null characters. No matter what application, encoding, or view I use, I cannot see these null characters at all; I only see a single LF on the last line, but I do see a file of 20k. In all cases there is a difference between what is seen with the exact same files on the two machines, even though both are Windows machines and both use the same editing software.
I've not yet been able to reproduce this behaviour with any number of dummy programs. I have, however, been able to trace the final line's stray CRLF to my use of println on the PrintWriter. When I replaced println(s) with print(s + '\n') the problem appeared to go away (the file size was 16.3k). However, when I returned the program to println(s), the problem did not reappear. I'm currently having the files verified by a friend in France to see if the problem really did go away (since I cannot see the nulls but he can), but this behaviour has me thoroughly confused.
I've also noticed that StringBuilder's replace function states "This sequence will be lengthened to accommodate the specified String if necessary". Given that StringBuilder's setLength function pads with null characters, and that its ensureCapacity function sets the capacity to the greater of the input or (currentCapacity*2)+2, I suspected a relation somewhere. However, I have only once been able to get a result that matched what I've seen while testing this idea, and have not been able to reproduce it since.
Does anyone have any idea what could be causing this error or at least have a suggestion on what direction to take the testing?
Edit since the comments section is broken for me:
Just to clarify, the output is required to be in Unix format regardless of the OS, hence the use of '\n' directly rather than through a formatter. The original StringBuilder that is inserted into is not in fact generated by me but is the contents of a file read in by the program. I'm confident the reading process works, as the information in it is used heavily throughout the application. I've done a little probing and found that directly prior to saving, the buffer IS the correct capacity, and that the output of toString() is the correct length (i.e. it contains no null characters and is 16,363 characters long, not 19,999). This would put the cause of the error somewhere between generating the string and saving the zip file.
Finally found the cause. I managed to reproduce the problem a few times and traced the cause not to the output side of the code but to the input side. My file-reading function was essentially this:
char[] buf;
int charcount = 0;
StringBuilder line = new StringBuilder(2048);
InputStreamReader reader = new InputStreamReader(stream);
BufferedReader file = new BufferedReader(reader); // provides a buffered read
do { // capture loop
    try {
        buf = new char[2048];
        charcount = file.read(buf, 0, 2048);
    } catch (IOException e) {
        return null; // unknown IO error
    }
    line.append(buf); // BUG: appends the whole buffer, even when it isn't full
} while (charcount != -1);
// close and output
The problem was appending a buffer that wasn't full, so the later values were still at their initial value of null. The reason I couldn't reproduce it was that some data filled the buffers nicely and some didn't.
Why I couldn't view the problem in my text editors I still have no idea, but I should be able to resolve it now. Any suggestions on the best way to do so are welcome; as this is part of one of my long-term utility libraries, I want to keep it as generic and optimised as possible.
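For reference, a sketch of the corrected capture loop: appending only the characters actually read avoids the stray nulls entirely.

char[] buf = new char[2048];
int charcount;
StringBuilder line = new StringBuilder(2048);
BufferedReader file = new BufferedReader(new InputStreamReader(stream));
while ((charcount = file.read(buf, 0, 2048)) != -1) {
    // append only the portion of the buffer that was actually filled,
    // never the trailing characters still at their initial value of '\0'
    line.append(buf, 0, charcount);
}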