file.txt has 10 lines.
Integer[] lineWanted ={2,5,1};
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
for (int i = 0; i < lineWanted.length; i++) {
List<String> lineList = br.lines()
.skip(lineWanted[i]-1)
.limit(1)
.collect(Collectors.toList());
System.out.println(lineList);
}
But the code keeps skipping from its current position instead of from the start of the file,
i.e. the output I get is for lines 2, 7 and 8.
If you insist on doing it this way, look carefully at what skip (and limit) is doing. You are skipping to the line index you want, but from the current position in the file, not from the start. I.e., you get to line 2 correctly; the second iteration then skips 4 lines and consumes 1 more (from limit), which prints line 7 and leaves the reader at line 8; the third iteration (skip 0, take 1) then prints line 8.
The "correct" way to implement this would be to pre-sort lineWanted, keep track of the previous index, and increment by the difference between the current and previous indices. However, as #tsolakp points out, multiple calls to lines is effectively an undefined operation: you just shouldn't do it.
The specification of BufferedReader.lines() makes it pretty clear that after processing the Stream, the BufferedReader is in an undefined state and cannot be used afterwards. So unless you have a strong reason to use a BufferedReader, it's recommended to use Files.lines to get the stream of lines from a file, which prevents any attempt to reuse the underlying reader in the first place.
You could achieve the goal by repeatedly creating a new stream, but that bears an unacceptable overhead. Keep in mind that even if you skip lines, the file contents have to be processed anyway, to identify the line boundaries, before they can be skipped. And I/O operations are generally expensive compared to computations.
A compromise is to identify the maximum wanted line number first, to avoid processing more lines than necessary (via limit), and the minimum wanted line number, to avoid unnecessary intermediate storage (via skip), for a single Stream operation collecting into a temporary List. This may temporarily hold some unwanted lines between the minimum and maximum, but it will be more efficient than multiple I/O based Stream operations in most cases:
int[] lineWanted = {2, 5, 1};
IntSummaryStatistics iss = Arrays.stream(lineWanted).summaryStatistics();
List<String> lineList;
try(Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
lineList = lines
.limit(iss.getMax()).skip(iss.getMin()-1)
.collect(Collectors.toList());
}
lineList = Arrays.stream(lineWanted)
.map(i -> i-iss.getMin())
.mapToObj(lineList::get)
.collect(Collectors.toList());
System.out.println(lineList);
If you really cannot sort your lineWanted list, the best way would probably be to buffer the whole document into a String[] of lines, but it all depends on how you want to access the data. Do you want to read only a few lines from one document, or will you be reading the whole document, just in random order?
I just had to move the BufferedReader inside the loop (I could not find a better option, since I wanted to do it with a BufferedReader).
Files.readAllLines(path).get(lineNo)
is another option which gives the line directly, but it was not helpful in my case.
Related
How to join list of millions of values into a single String by appending '\n' at end of each line -
Input data is in a List:
list[0] = And the good south wind still blew behind,
list[1] = But no sweet bird did follow,
list[2] = Nor any day for food or play
list[3] = Came to the mariners' hollo!
The code below joins the list into a string by appending a newline character at the end of each line -
String joinedStr = list.stream().collect(Collectors.joining("\n", "{", "}"));
But the problem is that if the list has millions of entries, the joining fails. My guess is that a single String object cannot handle millions of lines due to its size.
Please give a suggestion.
The problem with trying to compose a gigantic string is that you have to keep the entire thing in memory before you do anything further with it.
If the string is too big to fit in memory, you have only two options:
increase the available memory, or
avoid keeping a huge string in memory in the first place
This string is presumably destined for some further processing - maybe it's being written to a blob in a database, or maybe it is the body of an HTTP response. It's not being constructed just for fun.
It is probably far preferable to write to some kind of stream (maybe an implementation of OutputStream) that can be read one character at a time. The consumer can optionally buffer based on the delimiter if they are aware of the context of what you're sending, or they can wait until they have the entire thing.
Preferably you would use something which supports back pressure so that you can pause writing if the consumer is too slow.
Exactly how this looks will depend on what you're trying to accomplish.
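As a rough illustration of that idea, here is a minimal sketch that streams the list through a Writer instead of materializing one huge String (writing to a file is just an assumption; the destination could equally be a servlet response writer or any other Writer/OutputStream):
// Produces the same "{" + elements joined by '\n' + "}" shape as the joining collector,
// but only ever holds one element in memory at a time.
static void writeJoined(List<String> list, Writer destination) throws IOException {
    try (BufferedWriter writer = new BufferedWriter(destination)) {
        writer.write("{");
        for (int i = 0; i < list.size(); i++) {
            if (i > 0) {
                writer.write("\n");
            }
            writer.write(list.get(i));
        }
        writer.write("}");
    }
}
// usage (hypothetical target file):
// writeJoined(list, new FileWriter("joined.txt"));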
Maybe you can do it with a StringBuilder, which is designed for building large strings efficiently. Here's how I'd do it:
StringBuilder sb = new StringBuilder();
for (String s : list) sb.append(s).append("\n");
return sb.toString();
I haven't tested this code, but it should work.
I need to build an application which scans through a large number of files. These files contain blocks with some data about a session, in which each line has a different value. E.g.: "=ID: 39487".
At that point I have that line, but the problem I now face is that I need the value n lines above that ID. I was thinking about an Iterator, but it only has forward methods. I also thought about saving the results in a List, but that defeats the purpose of using a Stream, and some files are huge, so that would cause memory problems.
I was wondering if something like this is possible using the Stream API (Files)? Or perhaps a better question, is there a better way to approach this?
Stream<String> lines = Files.lines(Paths.get(file.getName()));
Iterator<String> search = lines.iterator();
You can't arbitrarily read backwards and forwards through the file with the same reader (no matter if you're using streams, iterators, or a plain BufferedReader.)
If you need:
m lines before a given line
n lines after the given line
You don't know the value of m and n in advance, until you reach that line
...then you essentially have three options:
Read the whole file once, keep it in memory, and then your task is trivial (but this uses the most memory.)
Read the whole file once, mark the line numbers that you need, then do a second pass where you extract the lines you require.
Read the whole file once, storing some form of metadata about line lengths as you go, then use a RandomAccessFile to extract the specific bits you need without having to read the whole file again.
I'd suggest that, given the files are huge, the second option here is probably the most realistic. The third will probably give you better performance, but will require much more in the way of development effort; a rough sketch of that approach is below.
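If you do go for the third option, the index-then-seek idea could look roughly like this (only an outline; RandomAccessFile.readLine() assumes a single-byte encoding, so a real implementation would need to handle the file's charset, and the index 42 is just an example):
// First pass: remember the byte offset at which every line starts.
List<Long> lineStarts = new ArrayList<>();
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
    long pos = 0;
    while (raf.readLine() != null) {
        lineStarts.add(pos);            // offset where the line we just read began
        pos = raf.getFilePointer();     // offset where the next line starts
    }
    // Later: jump straight to any line without re-reading everything before it.
    raf.seek(lineStarts.get(42));
    String wanted = raf.readLine();
}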
As an alternative if you can guarantee that both n and m are below a certain value, and that value is a reasonable size - you could also just keep a certain number of lines in a buffer as you're processing the file, and read through that buffer when you need to read lines "backwards".
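A minimal sketch of that buffering idea, assuming the look-back never exceeds some small m and using an ArrayDeque as a fixed-size window (the value of m is an assumption; the file handling and the ID pattern come from the question):
int m = 5;                                   // maximum number of lines you may need to look back
Deque<String> window = new ArrayDeque<>(m);
try (BufferedReader reader = Files.newBufferedReader(Paths.get(file.getName()))) {
    String line;
    while ((line = reader.readLine()) != null) {
        if (line.contains("=ID: 39487")) {
            // window now holds up to m lines above the match, oldest first;
            // the line n places above the match is the n-th element from the tail
            System.out.println(window);
        }
        if (window.size() == m) {
            window.removeFirst();            // drop the oldest line once the window is full
        }
        window.addLast(line);
    }
}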
Try my library, abacus-util:
try(Reader reader = new FileReader(yourFile)) {
StreamEx.of(reader)
.sliding(n, n, ArrayList::new)
.filter(l -> l.get(l.size() - 1).contains("=ID: 39487"))
./* then do your work */
}
It does not matter how big your file is, as long as n is a small number (not millions).
I use Java 8 streams a lot to process files but so far always line-by-line.
What I want is a function which gets a BufferedReader br, reads a specific number of words (separated by "\\s+"), and leaves the BufferedReader at the exact position where that number of words was reached.
Right now I have a version, which reads the file linewise:
final int[] wordCount = {20};
br
.lines()
.map(l -> l.split("\\s+"))
.flatMap(Arrays::stream)
.filter(s -> {
//Process s
if(--wordCount[0] == 0) return true;
return false;
}).findFirst();
This obviously leaves the InputStream positioned at the line following the one that contains the 20th word.
Is there a way to get a stream which reads less than a line from the inputstream?
EDIT
I am parsing a file where the first word contains the number of following words. I read this word and then accordingly read in the specific number of words. The file contains multiple such sections, where each section is parsed in the described function.
Having read all the helpful comments, it becomes clear to me that using a Scanner is the right choice for this problem, and that Java 9 will have a Scanner class which provides stream features (Scanner.tokens() and Scanner.findAll()).
Using Streams the way I described gives me no guarantee that the reader will be at a specific position after the terminal operation of the stream (API docs), therefore making streams the wrong choice for parsing a structure where you parse only a section and have to keep track of the position.
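For illustration, a minimal Scanner-based sketch of that section parsing in plain Java 8 (the helper name and the printing in the usage part are just assumptions):
// Reads one section: the first token is the number of words that follow.
static List<String> readSection(Scanner scanner) {
    int count = scanner.nextInt();           // first word of the section = word count
    List<String> words = new ArrayList<>(count);
    for (int i = 0; i < count && scanner.hasNext(); i++) {
        words.add(scanner.next());           // Scanner splits on whitespace by default
    }
    return words;                            // the Scanner is left right after this section
}
// usage:
try (Scanner scanner = new Scanner(Paths.get("file.txt"))) {
    while (scanner.hasNextInt()) {
        System.out.println(readSection(scanner));
    }
}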
Regarding your original problem: I assume your file looks like this:
5 a section of five words 3 three words
section 2 short section 7 this section contains a lot
of words
And you want to get the output like this:
[a, section, of, five, words]
[three, words, section]
[short, section]
[this, section, contains, a, lot, of, words]
In general the Stream API is badly suited to such problems. Writing a plain old loop looks like a better solution here. If you still want to see a Stream API based solution, I can suggest using my StreamEx library which contains a headTail() method allowing you to easily write custom stream-transformation logic. Here's how your problem could be solved using headTail:
/* Transform Stream of words like 2, a, b, 3, c, d, e to
Stream of lists like [a, b], [c, d, e] */
public static StreamEx<List<String>> records(StreamEx<String> input) {
return input.headTail((count, tail) ->
makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));
}
private static StreamEx<List<String>> makeRecord(StreamEx<String> input, int count,
List<String> buf) {
return input.headTail((head, tail) -> {
buf.add(head);
return buf.size() == count
? records(tail).prepend(buf)
: makeRecord(tail, count, buf);
});
}
Usage example:
String s = "5 a section of five words 3 three words\n"
+ "section 2 short section 7 this section contains a lot\n"
+ "of words";
Reader reader = new StringReader(s);
Stream<List<String>> stream = records(StreamEx.ofLines(reader)
.flatMap(Pattern.compile("\\s+")::splitAsStream));
stream.forEach(System.out::println);
The result looks exactly like the desired output above. Replace reader with your BufferedReader or FileReader to read from the input file. The stream of records is lazy: at most one record is preserved by the stream at a time, and if you short-circuit, the rest of the input will not be read (well, of course the current file line will be read to the end). The solution, while it looks recursive, does not eat stack or heap, so it works for huge files as well.
Explanation:
The headTail() method takes a two-argument lambda which is executed at most once during the outer stream's terminal operation, when a stream element is requested. The lambda receives the first stream element (head) and a stream which contains all the other original elements (tail). The lambda should return a new stream which will be used instead of the original one. In records we have:
return input.headTail((count, tail) ->
makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));
The first element of the input is count: convert it to a number, create an empty ArrayList and call makeRecord for the tail. Here's the makeRecord helper method implementation:
return input.headTail((head, tail) -> {
The first stream element is head; add it to the current buffer:
buf.add(head);
Target buffer size is reached?
return buf.size() == count
If yes, call records for the tail again (to process the next record, if any) and prepend the resulting stream with a single element: the current buffer.
? records(tail).prepend(buf)
Otherwise, call myself for the tail (to add more elements to the buffer).
: makeRecord(tail, count, buf);
});
Context
I've stumbled upon a rather annoying problem: I have a program with a lot of data sources that are able to stream the same type of elements, and I want to "map" each available element in the program (element order doesn't matter).
Therefore I've tried to reduce my Stream<Stream<T>> streamOfStreamOfT; to a simple Stream<T> streamOfT; using streamOfT = streamOfStreamOfT.reduce(Stream.empty(), Stream::concat);
Since element order is not important to me, I've tried to parallelize the reduce operation with .parallel(): streamOfT = streamOfStreamOfT.parallel().reduce(Stream.empty(), Stream::concat); But this triggers a java.lang.IllegalStateException: stream has already been operated upon or closed
Example
To experience it yourself, just play with the following main (Java 1.8u20) by commenting/uncommenting the .parallel() call:
public static void main(String[] args) {
// GIVEN
List<Stream<Integer>> listOfStreamOfInts = new ArrayList<>();
for (int j = 0; j < 10; j++) {
IntStream intStreamOf10Ints = IntStream.iterate(0, i -> i + 1)
.limit(10);
Stream<Integer> genericStreamOf10Ints = StreamSupport.stream(
intStreamOf10Ints.spliterator(), true);
listOfStreamOfInts.add(genericStreamOf10Ints);
}
Stream<Stream<Integer>> streamOfStreamOfInts = listOfStreamOfInts
.stream();
// WHEN
Stream<Integer> streamOfInts = streamOfStreamOfInts
// ////////////////
// PROBLEM
// |
// V
.parallel()
.reduce(Stream.empty(), Stream::concat);
// THEN
System.out.println(streamOfInts.map(String::valueOf).collect(
joining(", ")));
}
Question
Can someone explain this limitation and/or suggest a better way of handling parallel reduction of a stream of streams?
Edit 1
Following #Smutje and #LouisWasserman's comments, it seems that .flatMap(Function.identity()) is a better option, and it tolerates .parallel() streams.
The form of reduce you are using takes an identity value and an associative combining function. But Stream.empty() is not an identity value; it has state. Streams are not data structures like arrays or collections; they are carriers for pushing data through possibly-parallel aggregate operations, and they have some state (like whether the stream has been consumed or not). Think about how this works; you're going to build a tree where the same "empty" stream appears in more than one leaf. When you try to use this stateful not-an-identity twice (which won't happen sequentially, but will happen in parallel), the second time you try to traverse through that empty stream, it will quite correctly be seen to have already been used.
So the problem is, you're simply using this reduce method incorrectly. The problem is not with the parallelism; it is simply that the parallelism exposed the underlying problem.
Secondly, even if this "worked" the way you think it should, you would only get parallelism building the tree that represents the flattened stream-of-streams; when you go to do the joining, that's a sequential stream pipeline there. Oops.
Thirdly, even if this "worked" the way you think it should, you're going to add a lot of element-access overhead by building up concatenated streams, and you're not going to get the benefit of parallelism that you are seeking.
The simple answer is to flatten the streams:
String joined = streamOfStreams.parallel()
.flatMap(s -> s)
.collect(joining(", "));
Well, this might be a silly problem.
I just want a faster implementation of the following problem.
I want to take three integers as input on a single line, e.g.:
10 34 54
One way is to make a BufferedReader and then use readLine(),
which will read the whole line as a string;
then we can use a StringTokenizer to separate the three integers. (Slow implementation.)
Another way is to use a Scanner and take input with the nextInt() method. (Slower than the previous method.)
I want a fast implementation for this kind of input, since I have to read more than 2,000,000 lines and these implementations are very slow.
My implementation:
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String str;
StringTokenizer st;
int t1, t2;
long z;
for (int i = 0; i < n; i++) {
    str = br.readLine();
    st = new StringTokenizer(str);
    t1 = Integer.parseInt(st.nextToken());
    t2 = Integer.parseInt(st.nextToken());
    z = Long.parseLong(st.nextToken());
}
This is looped n times (n is the number of entries).
Since I know each line will contain only three integers, there is no need to check hasMoreElements().
I just want a faster implementation of the following problem.
The chances are that you DON'T NEED a faster implementation. Seriously. Not even with a 2 million line input file.
The chances are that:
more time is spent processing the file than reading it, and
most of the "read time" is spent doing things at the operating system level, or simply waiting for the next disk block to be read.
My advice is to not bother optimizing this unless the application as a whole takes too long to run. And when you find that this is the case, profile the application, and use the profile stats to tell you where it could be worthwhile spending effort on optimization.
(My gut feeling is that there is not much to be gained by optimizing this part of your application. But don't rely on that. Profile it!)
Here's a basic example that will be pretty fast:
public static void main(String[] args) throws IOException {
BufferedReader reader = new BufferedReader(new FileReader("myfile.txt"));
String line;
while ((line = reader.readLine()) != null) {
for (String s : line.split(" ")) {
final int i = Integer.parseInt(s);
// do something with i...
}
}
reader.close();
}
However your task is fundamentally going to take time.
If you are doing this on a website and reaching a timeout, you should consider doing it in a background thread, and send a response to the user saying that the data is being processed. You'll probably need to add a way for the user to check on the progress.
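If it helps, here is a minimal sketch of that hand-off, assuming a single-threaded ExecutorService and a hypothetical parseFile() method that does the actual work:
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<?> job = executor.submit(() -> parseFile("input.txt")); // parseFile is your own parsing code
// return a response to the user right away, e.g. "your data is being processed";
// the job's completion (job.isDone()) can back whatever progress check you expose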
Here is what I mean when I say "specialized scanner". Depending upon the parser's (or split's) efficiency, this might be a bit faster (it probably is not):
BufferedReader br = new BufferedReader(...);
for (int i = 0; i < n; i++)
{
    String str = br.readLine();
    long[] resultLongs = {-1, -1, -1};
    int startPos = 0;
    int nextLongIndex = 0;
    for (int p = 0; p < str.length(); p++)
    {
        if (str.charAt(p) == ' ')
        {
            String longAsStr = str.substring(startPos, p);
            resultLongs[nextLongIndex++] = Long.parseLong(longAsStr);
            startPos = p + 1;
        }
    }
    // the last number has no trailing blank, so parse the remainder of the line
    resultLongs[nextLongIndex] = Long.parseLong(str.substring(startPos));
    // t1, t2 and z are in resultLongs[0] through resultLongs[2]
}
HTH.
And of course this fails miserably if the input file contains garbage, i.e. anything other than longs separated by blanks.
And in addition, to minimize the "round trips" to the OS, it is a good idea to supply the buffered reader with a nonstandard (bigger-than-default) buffer.
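For example (the 64 KiB figure is only an illustration; BufferedReader's default buffer is 8192 chars):
BufferedReader br = new BufferedReader(new FileReader("input.txt"), 1 << 16); // 64 Ki chars instead of the default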
The other hint I gave in the comment, refined: if you have to read such a huge text file more than once, i.e. more than once after it has been updated, you could read all the longs into a data structure (maybe a List of elements that hold three longs each), and stream that into a binary "cache" file. The next time, compare the text file's timestamp to the "cache" file's; if the text file is not newer, read the cache file instead. Since binary stream I/O does not serialize the longs into their string representation, you will see much, much better reading times.
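A rough sketch of that cache idea, assuming three longs per line and illustrative file names (DataOutputStream/DataInputStream store the longs in binary, so re-reads skip the string parsing entirely):
Path text = Paths.get("input.txt");
Path cache = Paths.get("input.cache");
// Rebuild the cache only if the text file is newer than the cache file.
if (!Files.exists(cache)
        || Files.getLastModifiedTime(text).compareTo(Files.getLastModifiedTime(cache)) > 0) {
    try (BufferedReader br = Files.newBufferedReader(text);
         DataOutputStream out = new DataOutputStream(
                 new BufferedOutputStream(Files.newOutputStream(cache)))) {
        String line;
        while ((line = br.readLine()) != null) {
            for (String token : line.split("\\s+")) {
                if (!token.isEmpty()) {
                    out.writeLong(Long.parseLong(token)); // 8 bytes per value, no text parsing later
                }
            }
        }
    }
}
// Read the binary cache: three longs per record.
try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(cache)))) {
    while (in.available() > 0) {
        long t1 = in.readLong();
        long t2 = in.readLong();
        long z = in.readLong();
        // process t1, t2, z ...
    }
}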
EDIT: Missed the startPos reassignment.
EDIT2: Added the cache idea explanation.