I am writing code in Java to read lines from a file. The problem statement requires that the code read the same file multiple times, but each pass should pick up only the new lines, without using any flag of any sort. Please suggest ways I can approach this. Any ideas are welcome.
There is no way to "only read new lines" directly. To achieve what you're looking for, I would suggest caching the old version of the file and comparing each re-read against the cached copy. That lets you detect the new lines, as well as any other change in the file. Once you are done analyzing, overwrite the cache with the newest read.
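A minimal sketch of that idea (class and method names are my own, and it assumes lines are only ever appended, so the cached content is always a prefix of the current content):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class NewLineDetector {
    private List<String> cached = List.of(); // the previous read

    /** Returns only the lines appended since the last call, then updates the cache. */
    public List<String> readNewLines(Path file) throws IOException {
        List<String> current = Files.readAllLines(file);
        // Everything past the old length is new (append-only assumption).
        List<String> fresh = List.copyOf(current.subList(cached.size(), current.size()));
        cached = current;
        return fresh;
    }
}
```

If lines can also be edited or removed in place, you would need a full comparison against the cache instead of this prefix shortcut.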
In Java 8 I see a new method called lines() has been added to the Files class, which can be used to read a file line by line. Does it work for huge files? I mean, can we load the first 1000 lines, then the second set of 1000 lines? I have a huge file of 1 GB; will it work?
Could someone share a code snippet showing how to use it?
Does it work for huge files? [...] I have huge file with 1GB, Will it work?
As far as I can see it should work well for big files as well (but I haven't tried):
try (Stream<String> lines = Files.lines(path)) {
    lines.filter(...).map(...).forEach(...);
}
I mean can we load first 1000 lines then second set of 1000 lines.
How many lines are read at a time is implementation-specific to Files.lines (which probably uses a BufferedReader under the hood, but I might be wrong).
From the API (emphasis mine):
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
This very strongly suggests that you can use this on any arbitrarily sized file, assuming your code doesn't hold all of the content in memory.
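If you really do want explicit batches of 1000 lines rather than one lazy stream, a plain BufferedReader loop is a straightforward way to get them (class name and batch size here are just illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BatchReader {
    /** Reads up to batchSize lines per call; returns an empty list at end of file. */
    public static List<String> nextBatch(BufferedReader reader, int batchSize) throws IOException {
        List<String> batch = new ArrayList<>(batchSize);
        String line;
        while (batch.size() < batchSize && (line = reader.readLine()) != null) {
            batch.add(line);
        }
        return batch;
    }
}
```

Open a single reader with Files.newBufferedReader(path) and call nextBatch repeatedly; only one batch is held in memory at a time, so a 1 GB file is not a problem.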
I have a large log file with client-id as one of the fields in each log line. I would like to split this large log file into several files, grouped by client-id. So, if the original file has 10 lines with 10 unique client-ids, then at the end there will be 10 files with 1 line each.
I am trying to do this in Scala and don't want to load the entire file into memory, so I load one line at a time using scala.io.Source.getLines(). That is working nicely. But I don't have a good way to write the lines out into separate files one at a time. I can think of two options:
Create a new PrintWriter backed by a BufferedWriter (Files.newBufferedWriter) for every line. This seems inefficient.
Create a new PrintWriter backed by a BufferedWriter for every output file, hold on to these PrintWriters, keep writing to them till we have read all lines in the original log file, and then close them. This doesn't seem a very functional way to do it in Scala.
Being new to Scala, I am not sure if there are other, better ways to accomplish something like this. Any thoughts or ideas are much appreciated.
You can do the second option in a pretty functional, idiomatic way in Scala. You can keep track of all of your PrintWriters and fold over the lines of the file:
import java.io._
import scala.io._
Source.fromFile(new File("/tmp/log")).getLines.foldLeft(Map.empty[String, PrintWriter]) {
  case (printers, line) =>
    val id = line.split(" ").head
    val printer = printers.get(id).getOrElse(new PrintWriter(new File(s"/tmp/log_$id")))
    printer.println(line)
    printers.updated(id, printer)
}.values.foreach(_.close)
Maybe in a production level version, you'd want to wrap the I/O operations in a try (or Try), and keep track of failures that way, while still closing all the PrintWriters at the end.
I have to read a sequential file which has over a million records. I have to read each line/record, delete that record/line from the file, and keep on reading.
I am not finding any example of how to do that without using a temporary file or creating/recreating a new file of the same name.
These are text files. Each file is about 0.5 GB, and we have over a million lines/records in each file.
Currently we copy all the records into memory, as we do not want to re-process any record if anything happens in the middle of processing a file.
Assuming that the file in question is a simple sequential file - you can't. In the Java file model, deleting part of a file implies deleting all of it after the deletion point.
Some alternative approaches are:
In your process copy the file, omitting the parts you want deleted. This is the normal way of doing this.
Overwrite the parts of the file you want deleted with some value that you know never occurs in the file, and then at a later date copy the file, removing the marked parts.
Store the entire file in memory, edit it as required, and write it again. Just because you have a million records doesn't make that impossible. If your files are 0.5GB, as you say, then this approach is almost certainly viable.
Each time you delete some record, copy all of the contents of the file after the deletion to its new position. This will be incredibly inefficient and error-prone.
Unless you can store the file in memory, using a temporary file is the most efficient. That's why everyone does it.
If this is some kind of database, then that's an entirely different question.
EDIT: Since I answered this, comments have indicated that what the user wants is to use deletion to keep track of which records have already been processed. If that is the case, there are much simpler ways of doing it. One good way is to write a side file which just contains a count of how many bytes (or records) of the file have been processed. If the processor crashes, use that count to skip the records that have already been processed and start again.
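A rough sketch of that bookkeeping in Java (the class name, file layout, and per-record handling are placeholders; a real version might also fsync the marker):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ProgressMarker {
    /** Processes lines from 'data', persisting the count of completed lines to 'marker'
     *  so that a restart skips what was already done. Returns lines processed this run. */
    public static int processWithMarker(Path data, Path marker) throws IOException {
        int done = Files.exists(marker)
                ? Integer.parseInt(Files.readString(marker).trim())
                : 0;
        List<String> lines = Files.readAllLines(data);
        for (int i = done; i < lines.size(); i++) {
            // ... handle lines.get(i) here (send to server, etc.) ...
            Files.writeString(marker, String.valueOf(i + 1)); // record progress after each line
        }
        return lines.size() - done;
    }
}
```

The data file is never rewritten; only the tiny marker file changes, which is far cheaper than deleting records in place.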
Files are unstructured streams of bytes; there is no record structure. You can not "delete" a "line" from an unstructured stream of bytes.
The basic algorithm you need to use is this:
1. Create a temporary file.
2. Open the input file.
3. If at the end of the input file, go to step 7.
4. Read a line from the input file.
5. If the line is not to be deleted, write it to the temporary file.
6. Go to step 3.
7. Close the input file.
8. Close the temporary file.
9. Delete (or just rename) the input file.
10. Rename (or move) the temporary file to have the original name of the input file.
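The steps above translate almost directly into Java; this is one possible rendering (the class name and the delete-predicate are whatever your application needs):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;

public class LineDeleter {
    /** Rewrites 'file' without the lines matching 'shouldDelete', via a temporary file. */
    public static void deleteLines(Path file, Predicate<String> shouldDelete) throws IOException {
        Path temp = Files.createTempFile("filter", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file);
             BufferedWriter out = Files.newBufferedWriter(temp)) {
            String line;
            while ((line = in.readLine()) != null) {   // loop of "read until end of file"
                if (!shouldDelete.test(line)) {        // keep the wanted lines
                    out.write(line);
                    out.newLine();
                }
            }
        }                                              // both streams closed here
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING); // replace the original
    }
}
```

Only one line is in memory at a time, so the 0.5 GB file size is irrelevant.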
There is a similar question asked, "Java - Find a line in a file and remove".
Basically they all use a temp file; there is no harm in doing so. So why not just do it? It will not affect your performance much and can avoid some errors.
Why not a simple sed -si '/line I want to delete/d' big_file?
I am trying to download a file from a server in a user-specified number of parts (n). So there is a file of x bytes divided into n parts, with each part downloading a piece of the whole file at the same time. I am using threads to implement this, but I have not worked with HTTP before and do not really understand how downloading a file works. I have read up on it and it seems the "Range" header needs to be used, but I do not know how to download the different parts and combine them without corrupting the data.
(Since it's a homework assignment I will only give you a hint)
Appending to a single file will not help you at all, since this will mess up the data. You have two alternatives:
Download from each thread to a separate temporary file and then merge the temporary files in the right order to create the final file. This is probably easier to conceive, but a rather ugly and inefficient approach.
Do not stick to the usual stream-style semantics - use random access (e.g. RandomAccessFile) to write data from each thread straight to the right location within the output file.
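Since it's homework, here is only the file-writing half of the second idea in isolation; the HTTP half (one "Range: bytes=start-end" request per part) is left to you. The class and method names are made up for illustration:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class PartWriter {
    /** Writes one downloaded chunk at its absolute offset in the output file.
     *  Each thread calls this for its own region; the regions must not overlap. */
    public static void writePart(String outputFile, long offset, byte[] chunk) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(outputFile, "rw")) {
            out.seek(offset);   // jump to this part's position
            out.write(chunk);   // no appending, so other parts are never disturbed
        }
    }
}
```

Because each thread seeks to its own offset before writing, the parts can arrive in any order and the result is still byte-identical to a sequential download.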
I have created a J2ME application that reads and writes a text file.
Now, at read time, I read one line and send it to the server. After that I want to remove that line from the text file.
I am not sure how to do it. In some examples I found this solution: copy the original file content into one object, remove that string from the object, then delete the original file and create a new one from that object.
I don't think that is a good approach. Is there any other way to do it?
Edit:
Actually the problem is like this: one application is writing some data to the text file, and my other application reads one line, sends it to the server, and removes that line.
Now if I go with the approach of copying to a new object, deleting the file, and writing a new file from the new object, I hit a problem: while the file is deleted, the first application cannot find it, so it may create a new file with only one record, while the second application recreates the file from its object.
So my data will be lost.
Edit:
I even tried to do the same thing with RMS, but when both applications access the same RMS store at the same time, all the data in it is cleared. The first application opens the RMS for writing and the second opens it for sync-and-delete, but when both open it at once, all the data is cleared.
Is it possible to set a lock on the RMS store from one application?
No, that's how you do it.
You can't delete a line from the beginning of a file. You would need to re-write the file without that line.
(Note that this is not specific to java)
As records are inserted, I was creating a single file per record in one specific folder.
Now, as each file is read by the background application and sent to the server, it is deleted by that application.
This solves the concurrency problem in the file read/write.
I know it is not a great approach, but I didn't find any better one.
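A sketch of that handoff in plain Java (the class is hypothetical, and a real J2ME version would use its own file APIs): the writer creates one uniquely named file per record, and the reader picks up any file, returns its content, and deletes it, so deleting a record never requires rewriting anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecordQueue {
    private final Path dir;

    public RecordQueue(Path dir) { this.dir = dir; }

    /** Writer side: each record lands in its own uniquely named file. */
    public void put(String record) throws IOException {
        Files.writeString(Files.createTempFile(dir, "rec", ".txt"), record);
    }

    /** Reader side: take one record file, return its content, and delete it. */
    public String take() throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                String record = Files.readString(p);
                Files.delete(p);   // nothing to rewrite: the whole file is the record
                return record;
            }
        }
        return null; // queue is empty
    }
}
```

The two sides never touch the same bytes, which is exactly why this sidesteps the concurrency problem described above.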
Most file systems don't have a mechanism for deleting stuff in the middle of a file (and I'm pretty sure that's the case in J2ME). So the standard practice is: open a new file; copy the old file up to the point where the unwanted line is; skip it; then copy the rest of the file. I know it sounds inelegant, but that's just how it is :)