Under what conditions does Java's Scanner.hasNextLine() block? - java

The javadoc for Scanner.hasNextLine() states:
Returns true if there is another line in the input of this scanner.
This method may block while waiting for input. The scanner does
not advance past any input.
Under what conditions will the method block?

It depends on the source that the Scanner gets the input from.
For example, if it's a file, the entire input is available, so hasNextLine() wouldn't block (since it can know with certainty when the end of the file is reached and there's no more input).
On the other hand, if the source is standard input, there can always be more input - the user can always type more input - so hasNextLine() would block until the user types in a new line of input.

How to decide if it will block?
Deciding whether hasNextLine will block is unfortunately not a supported use case.
This is because the underlying sources don't always provide an API for peeking into the stream. Put differently, the implementation of hasNextLine calls methods that may themselves block, so the problem is more or less inherent.
So, what to do?
If this is indeed a required use case, I would recommend one of the following approaches:
Make sure the conditions are right for the hasNextLine. Only provide the scanner with sources that have a definite end (such as a file or string) and never an "open ended" input such as System.in.
If this is part of an API you could wrap the scanner in your own class that only exposes "safe" constructors.
Roll your own class from scratch that does have a willHasNextLineBlock type of method. This could probably be implemented somewhat robustly using InputStream.available.
Under the category of super ugly workarounds we find:
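Attempt the hasNextLine call in a separate thread and see whether it returns within a reasonable time, as follows:
boolean wouldBlock = false;
Thread t = new Thread(() -> s.hasNextLine());
t.setDaemon(true); // don't keep the JVM alive if the call never returns
t.start();
try {
    t.join(100);
    // still running after 100 ms: the call is (most likely) blocked
    wouldBlock = t.isAlive();
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
}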
Use a custom input stream (something like a peekable stream) that one could tap into before calling hasNextLine. Something like this:
CustomStream wrapped = new CustomStream(originalSource);
Scanner s = new Scanner(wrapped);
...
if (wrapped.hasNextLine())
// s.hasNextLine would not block
else
// s.hasNextLine would block
(Note however that this is somewhat unsafe, since the scanner may have buffered some data from the CustomStream.)
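The "safe constructors" suggestion above could look roughly like the following. This is only a minimal sketch: BoundedScanner is a made-up name, and it assumes that a String or a File counts as a source with a definite end.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

/** Hypothetical wrapper that only accepts sources with a definite end. */
final class BoundedScanner {
    private final Scanner scanner;

    /** A String is fully in memory, so the end of input is always known. */
    BoundedScanner(String text) {
        this.scanner = new Scanner(text);
    }

    /** A regular file has a definite end as well. */
    BoundedScanner(File file) throws FileNotFoundException {
        this.scanner = new Scanner(file);
    }

    // Deliberately no constructor taking an arbitrary InputStream,
    // so System.in cannot be passed in.

    boolean hasNextLine() {
        return scanner.hasNextLine();
    }

    String nextLine() {
        return scanner.nextLine();
    }
}
```

Callers then never get a chance to hand over an open-ended stream, at the cost of losing the rest of the Scanner API unless it is delegated as well.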

Assuming that by "decide if it will block" you mean that you want to know when it will block:
Have a look at where the input is assigned in the hasNextLine method:
String result = findWithinHorizon(linePattern(), 0);
Now, have a look at the findWithinHorizon method
public String findWithinHorizon(Pattern pattern, int horizon) {
ensureOpen();
if (pattern == null)
throw new NullPointerException();
if (horizon < 0)
throw new IllegalArgumentException("horizon < 0");
clearCaches();
// Search for the pattern
while (true) { // it may block here (inside readInput) if the loop never breaks
String token = findPatternInBuffer(pattern, horizon);
if (token != null) {
matchValid = true;
return token;
}
if (needInput)
readInput();
else
break; // up to end of input
}
return null;
}
As you can see, it will loop until the end of the input is reached or until it succeeds in reading a match.
findPatternInBuffer is a private method of the Scanner class that tries to find the pattern in the input buffered so far.
private String findPatternInBuffer(Pattern pattern, int horizon) {
matchValid = false;
matcher.usePattern(pattern);
int bufferLimit = buf.limit();
int horizonLimit = -1;
int searchLimit = bufferLimit;
if (horizon > 0) {
horizonLimit = position + horizon;
if (horizonLimit < bufferLimit)
searchLimit = horizonLimit;
}
matcher.region(position, searchLimit);
if (matcher.find()) {
if (matcher.hitEnd() && (!sourceClosed)) {
// The match may be longer if didn't hit horizon or real end
if (searchLimit != horizonLimit) {
// Hit an artificial end; try to extend the match
needInput = true;
return null;
}
// The match could go away depending on what is next
if ((searchLimit == horizonLimit) && matcher.requireEnd()) {
// Rare case: we hit the end of input and it happens
// that it is at the horizon and the end of input is
// required for the match.
needInput = true;
return null;
}
}
// Did not hit end, or hit real end, or hit horizon
position = matcher.end();
return matcher.group();
}
if (sourceClosed)
return null;
// If there is no specified horizon, or if we have not searched
// to the specified horizon yet, get more input
if ((horizon == 0) || (searchLimit != horizonLimit))
needInput = true;
return null;
}
I posted the whole method to give you a better idea of what I meant by a successful read.
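To watch that loop terminate, a scanner can be given a source with a definite end. A minimal sketch (LineCounter and countLines are made-up names for illustration): once the byte array is exhausted, the scanner treats the source as closed, findWithinHorizon falls out of its loop, and hasNextLine returns false instead of waiting.

```java
import java.util.Scanner;

final class LineCounter {
    /** Counts the remaining lines; returns only once the source has ended. */
    static int countLines(Scanner scanner) {
        int count = 0;
        // On a finite source this condition eventually becomes false.
        // On System.in it would wait for the user instead.
        while (scanner.hasNextLine()) {
            scanner.nextLine();
            count++;
        }
        return count;
    }
}
```

For example, calling countLines with a Scanner built over a ByteArrayInputStream holding "one\ntwo\nthree" returns 3 without waiting, whereas the same call on new Scanner(System.in) keeps waiting until standard input is closed.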

Related

How to terminate a loop that is reading an input?

I'm wondering how to terminate a loop when the end of input is reached. I searched a lot for this, but the only solutions I encountered involve using Scanner, which I'm not using. Instead, I am using the following function that reads each line of the input. However, I don't quite understand how I can end a loop that is constantly reading random numbers, which means I can't simply put a condition in the while (...) to end the loop.
CODE:
The loop that I'm talking about:
public static void main(String[] args) {
String str = "";
while (true){
str = readLn(200);
}
}
The method used to read lines:
static String readLn (int maxLg){ //utility function to read from stdin
byte lin[] = new byte [maxLg]; int lg = 0, car = -1;
String line = "";
try {
while (lg < maxLg){
car = System.in.read();
if ((car < 0) || (car == '\n')) break; lin [lg++] += car;
} }
catch (IOException e){
return (null);
}
if ((car < 0) && (lg == 0)) return (null); // eof
return (new String (lin, 0, lg));
}
If a stream is closed, the .read() call will return -1, which causes car to be less than 0, which causes the method to return; null if nothing is read yet, otherwise a string with the contents you got so far.
Even if you loop this method, it'll just keep returning null - once a stream starts returning -1 it will continue to do so.
Most likely, the stream is NOT, in fact, 'closed'. System.in doesn't close just because you wish it so, or because you stop typing. Maybe you'll type some more.
One easy way to 'close' System.in is to pipe a file into your process, something like:
echo "hello" | java YourApp
If you insist on keyboard input, you're looking at CTRL+Z or CTRL+D, depending on the OS, in order to get standard in to be considered 'closed', and a little praying.
The while loop in your main should either break if str is null, or be a do/while construct that loops as long as str != null, or just a while on str != null; in the last case, make sure to initialize str to a non-null value or it won't even enter the loop in the first place.
The while loop in your readLn method is already causing that loop to end if standard in is closed.
For your special case (there are more efficient ways to read a string from the stdin), the return value in readLn is either a string or null for eof.
So you can terminate the loop if the returned value is null:
while (true){
str = readLn(200);
if (str == null) {
break;
}
}
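As for the "more efficient ways" mentioned above, one common option is BufferedReader.readLine, which also returns null at end of input, so the same null check ends the loop. A minimal sketch, with StdinLines and readAllLines as made-up names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class StdinLines {
    /** Reads lines until end of input, i.e. until readLine() returns null. */
    static List<String> readAllLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) { // null means end of input
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Calling StdinLines.readAllLines(System.in) returns once standard input is closed, for example when the input is piped in or the user presses CTRL+D or CTRL+Z.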

Get Java Scanner Input Without Advancing Scanner

Get out the value of the scanner without advancing it - Java
I want to get the value of the input in the scanner without advancing it. Currently, I am using System.in as my scanner's input.
final var sc = new Scanner(System.in);
I know of the hasNext methods on scanner, and they are currently my best/only way to check its input without advancing it.
Here is how I ensure a positive integral input from sc for example.
public static int getPositiveIntegerInput(Scanner sc) {
System.out.println("Please input a positive integer");
sc.useDelimiter("\n");
while (!sc.hasNextInt() || sc.hasNext(".*[^\\d].*")) {
System.out.println("Invalid input, please try again");
sc.nextLine();
}
return sc.nextInt();
}
I want to extend this notion of checking sc's input without advancing it to actually getting sc's input without advancing it.
What I have tried up to this point
I have gone through the implementation details of hasNext() on Scanner.
Implementation of hasNext:
public final class Scanner {
public boolean hasNext(Pattern pattern) {
ensureOpen();
if (pattern == null)
throw new NullPointerException();
hasNextPattern = null;
saveState();
modCount++;
while (true) {
if (getCompleteTokenInBuffer(pattern) != null) {
matchValid = true;
cacheResult();
return revertState(true);
}
if (needInput)
readInput();
else
return revertState(false);
}
}
}
It seemed to me at least, that one can get scanner's input from the method getCompleteTokenInBuffer, but truly I don't really understand how it works. I don't know if that method alone gets the value of scanner without advancing it, or if it advances it then something else reverts it back to the state it was in before the input as if it has not advanced at all, or if it gets it in combination with something else, or really how at all.
I have been playing around with invoking the private methods of Scanner through Java's reflection API, to try to return the token holding sc's input value without actually advancing the scanner (but to be honest, I'm just playing around with it and don't know how to actually accomplish what I want to do).
public static void main(String[] args) {
final var sc = new Scanner(System.in);
sc.useDelimiter("\n");
var str = "";
try {
Method method = Scanner.class.getDeclaredMethod("getCompleteTokenInBuffer", Pattern.class);
method.setAccessible(true);
str = (String) method.invoke(sc, Pattern.compile(".*"));
} catch (Exception e) {
System.out.println("Well, that didn't work!");
System.out.println("Exception: " + e);
}
System.out.println("getCompleteTokenInBuffer: " + str);
// Prints: "getCompleteTokenInBuffer: null"
}
Note: The method above does not wait for input before getting the value of sc's input and hence returns null.
Goal:
Just to reiterate, I would like to find a way to capture and return a Scanner object's input value without actually advancing it.
What you're looking for might otherwise be referred to as a peek function.
This answer on another thread indicates that you might be served by creating a wrapper class around Scanner that implements this functionality, since the Scanner class itself does not implement it.
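A minimal sketch of such a wrapper, assuming line-based input is enough. PeekableLineScanner is a made-up name, not an existing class: it reads one line ahead and caches it, so the underlying Scanner does advance, but callers of the wrapper don't see that.

```java
import java.util.Scanner;

/** Hypothetical wrapper that adds a peek operation on top of Scanner. */
final class PeekableLineScanner {
    private final Scanner scanner;
    private String peeked; // cached look-ahead line, or null if none is cached

    PeekableLineScanner(Scanner scanner) {
        this.scanner = scanner;
    }

    boolean hasNextLine() {
        return peeked != null || scanner.hasNextLine();
    }

    /** Returns the next line without consuming it (from the caller's view). */
    String peekLine() {
        if (peeked == null) {
            peeked = scanner.nextLine(); // may block or throw, like Scanner does
        }
        return peeked;
    }

    /** Returns the next line and consumes it. */
    String nextLine() {
        String line = peekLine();
        peeked = null;
        return line;
    }
}
```

With this, input validation can call peekLine() as often as needed and only call nextLine() once the value has been accepted or rejected.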

Reading a line from a Stream and return immediately

Which method (and which class) should I be using for reading a line from a given InputStream, which, in case there is no line to read, or actually in any case, returns immediately?
For clarity, I want to know which class provides a method that reads a line from an InputStream and returns immediately, e.g. does not block if there is no line to read.
For example, BufferedReader.readLine() does block as far as I know.
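public final String pollLine(final BufferedReader reader)
        throws IOException {
    /* pick a reasonable look ahead; reset() fails if a line is longer than this */
    reader.mark(512);
    while (reader.ready()) {
        final int ch = reader.read();
        // '\n' and '\r' are control characters, not Character.LINE_SEPARATOR,
        // so compare against them directly
        if (ch == -1 || ch == '\n' || ch == '\r') {
            reader.reset();
            return reader.readLine();
        }
    }
    // no complete line available yet; rewind so nothing is lost
    reader.reset();
    return null;
}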
Sorry for any errors, I've typed this response on the small touchscreen keyboard of a cell phone.
To answer your question, you can query Reader.ready to determine whether you can safely read without blocking.
Returns:
True if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.

for loop for inputting variables into array not working?

I'm trying to input 3 different variables into an array inside a while loop, as long as I don't enter stop for any of the variables. The while loop is only supposed to let me input a second variable value if the 1st variable isn't stop, and likewise with inputting a third variable value.
Right now, the first iteration goes fine and I can input all 3 variables, but the 2nd and 3rd time, the loop prints the prompt for the first variable but doesn't allow me to input a value before skipping to the 2nd variable.
Example of what I mean:
name:afasdf
extra info:afdsaf
unit cost:123123214
name: extra info: adflskjflk
unit cost:123217
Also, entering Stop isn't ending the loop either.
I know that this loop works when there's only one variable. I've tried using a for loop instead of a while loop, and adding tons and tons of else statements, but it seems to stay the same.
Is there something wrong with the way I set up my breakers?
Is the way I set up the last breaker (the one that stops even when I put stop for a double variable) messing up the rest of the loop?
Thank you so much.
Here is my code:
ArrayItem s = new ArrayItem();
String Name = null, ID = null;
double Money = 0;
boolean breaker = false;
while(breaker ==false)
{
System.out.print("Name:" + "\t");
Name = Input.nextLine();
if(Name.equals("Stop")) //see if the program should stop
breaker = true;
System.out.print("Extra Info:" + "\t");
Details = Input.nextLine();
if(ID.equals("Stop"))
breaker = true;
System.out.print("Unit Cost:" + "\t");
Money = Input.nextDouble();
// suppose to let me stop even if i input stop
// when the variable is suppose to be a double
if(Input.equals("stop") || Input.equals("stop"))
breaker = true;
else
s.SetNames(Name);
s.SetInfo(Details);
s.SetCost(Money);
}
A couple of things about the code: "Name:" + "\t" can be simplified to "Name:\t". This is true for the rest of the code. In Java, it's customary to use camelCase where the first word is lowercase. For example, s.SetMoney would be s.setMoney. Variables follow the same rule: Money would be money, and ID would be id. If your teacher is teaching you otherwise, then follow their style.
The loop should also be a do-while loop:
do
{
// read each value in sequence, and then check to see if you should stop
// you can/should simplify this into a function that returns the object
// that returns null if the value should stop (requiring a capital D
// double for the return type)
if ( /* reason to stop */)
{
break;
}
s.setNames(name);
s.setId(id);
s.setMoney(money);
} while (true);
private String getString(Scanner input)
{
String result = input.nextLine();
// look for STOP
if (result.equalsIgnoreCase("stop"))
{
result = null;
}
return result;
}
private Double getDouble(Scanner input)
{
Double result = null;
// read the line as a string, looking for STOP
String line = getString(input);
// null if it's STOP
if (line != null)
{
try
{
result = Double.parseDouble(line);
}
catch (NumberFormatException e)
{
// not a valid number, but not STOP either!
}
}
return result;
}
There are a lot of concepts in there, but they should help as you progress. I'll let you put the pieces together.
Also, you did need to fix the brackets, but that's not the only issue. Because Money is a double, a non-numeric entry such as "stop" has to be read as a String instead. I suspect that Input is a Scanner object, so you can check Input.hasNextDouble(); if the next token is not a double, you can then check the String value to see if it's "stop" (note: you are checking for "Stop" and "stop", which are not equal). Your last, no-chances check compares the Scanner itself to "stop", which will never be true. Check:
System.out.print("Unit Cost:\t");
if (Input.hasNextDouble())
{
Money = Input.nextDouble();
// you can now set your object
// ...
}
// it's not a double; look for "stop"
else if (Input.nextLine().equalsIgnoreCase("stop"))
{
// exit loop
break;
}
// NOTE: if it's NOT a double or stop, then you have NOT exited
// and you have not set money
breaker = true;
while(breaker){
Name = readInput("Name");
Details = readInput("Details");
Money = Double.parseDouble(readInput("Money"));
if(Name.equals("stop") || Details.equals("stop"))
breaker = false;
else {
// set ArrayItem
}
}
private static String readInput(String title){
System.out.println(title+":");
//... read input
// return value
}
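Putting the pieces from the answers above together, a complete loop might look like the sketch below. It assumes every field is read with nextLine (so no stray line break is left behind, as happens after nextDouble) and that "stop" in any field ends the input. ItemReader, Item and readField are made-up names; Item merely stands in for the ArrayItem class from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ItemReader {
    /** Hypothetical stand-in for the ArrayItem class in the question. */
    static final class Item {
        final String name;
        final String info;
        final double cost;

        Item(String name, String info, double cost) {
            this.name = name;
            this.info = info;
            this.cost = cost;
        }
    }

    /** Prompts for one line; returns null on "stop" (any case) or end of input. */
    static String readField(Scanner in, String prompt) {
        System.out.print(prompt + ":\t");
        if (!in.hasNextLine()) {
            return null;
        }
        String line = in.nextLine().trim();
        return line.equalsIgnoreCase("stop") ? null : line;
    }

    /** Reads name / info / cost triples until "stop" or end of input. */
    static List<Item> readItems(Scanner in) {
        List<Item> items = new ArrayList<>();
        while (true) {
            String name = readField(in, "Name");
            if (name == null) break;
            String info = readField(in, "Extra Info");
            if (info == null) break;
            String cost = readField(in, "Unit Cost");
            if (cost == null) break;
            try {
                items.add(new Item(name, info, Double.parseDouble(cost)));
            } catch (NumberFormatException e) {
                // Not a number and not "stop": skip this record and carry on.
                System.out.println("Not a number, record skipped: " + cost);
            }
        }
        return items;
    }
}
```

Reading the cost as a line and parsing it afterwards is what allows "stop" to be recognised in the numeric field too.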

Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about 50,000 records would take 12 hours.
Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of):
public static String lookup(Pattern tokenBefore,
List<String> tokensAfter)
{
String result = null;
while(_match(tokenBefore)) { // block until all input is read
if(id.hasNext())
{
result = id.next(); // capture the next token that matches
if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result
return result;
} else
return null; // end of file; no match
}
return null; // no matches
}
private static boolean _match(List<String> tokens)
{
return _match(tokens, true);
}
private static boolean _match(Pattern token)
{
if(token != null)
{
return (id.findWithinHorizon(token, 0) != null);
} else {
return false;
}
}
private static boolean _match(List<String> tokens, boolean block)
{
if(tokens != null && !tokens.isEmpty()) {
if(id.findWithinHorizon(tokens.get(0), 0) == null)
return false;
for(int i = 1; i <= tokens.size(); i++)
{
if (i == tokens.size()) { // matches all tokens
return true;
} else if(id.hasNext() && !id.next().matches(tokens.get(i))) {
break; // break to blocking behaviour
}
}
} else {
return true; // empty list always matches
}
if(block)
return _match(tokens); // loop until we find something or nothing
else
return false; // return after just one attempted match
}
private static boolean _matchImmediate(List<String> tokens)
{
if(tokens != null) {
for(int i = 0; i <= tokens.size(); i++)
{
if (i == tokens.size()) { // matches all tokens
return true;
} else if(!id.hasNext() || !id.next().matches(tokens.get(i))) {
return false; // doesn't match, or end of file
}
}
return false; // we have some serious problems if this ever gets called
} else {
return true; // empty list always matches
}
}
Basically, I'm wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.lang.String; I figured buffering it in memory would reduce I/O, since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, and the process still looks to be taking a LONG time.
I've also traced execution, and the slowness of my overall conversion process is definitely between the first and last line of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant.
Some notes (updated 6/6/2010):
I still need the pattern-matching behaviour for tokensBefore.
All ID numbers I need don't necessarily start at a fixed position in a line, but it's guaranteed that after the ID token is the name of the corresponding object.
I would ideally want to return a String, not the start position of the result as an int or something.
Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thank you!
Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well.
Generalised code:
String field;
for (/* each record in file A */)
{
/* construct the rest of this object from file A info */
// now to find the ID, if we can
List<String> objectName = new ArrayList<String>(1);
objectName.add(Pattern.quote(thisObject.fullName));
field = lookup(objectSearchToken, objectName); // search file B
if(field == null) // not found in file B
{
lookupReset(false); // initialise scanner to check file C
objectName.clear(); // not using the full name
String[] tokens = thisObject.fullName.split(id.delimiter().pattern());
for(String s : tokens)
objectName.add(Pattern.quote(s));
field = lookup(objectSearchToken, objectName); // search file C
lookupReset(true); // back to file B
} else {
/* found it, file B specific processing here */
}
if(field != null) // found it in B or C
thisObject.ID = field;
}
The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces (a person's name).
As per aioobe's answer, I have pre-compiled the regex for my constant search tokens, which in this case is just \r\n. The speedup noticed was about 20x in another one of the processes, where I compiled [0-9]{1,3}\\.[0-9]%|\r\n|0|[A-Z'-]+, although it was not noticed in the above code with \r\n. Working along these lines, it has me wondering:
Would it be better for me to match \r\n[^ ] if the only usable matches will be on lines beginning with a non-space character anyway? It may reduce the number of _match executions.
Another possible optimisation is this: concatenate all tokensAfter, and put a (.*) beforehand. It would reduce the number of regexes (all of which are literal anyway) that would be compiled by about 2/3, and also hopefully allow me to pull out the text from that grouping instead of keeping a "potential token" from every line with an ID on it. Is that also worth doing?
The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon.
Something to start with: Every single time you run id.next().matches(tokens.get(i)) the following code is executed:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
return m.matches();
Compiling a regular expression is non-trivial and you should consider compiling the patterns once and for all in your program:
pattern[i] = Pattern.compile(tokens.get(i));
And then simply invoke something like
pattern[i].matcher(str).matches()
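A minimal sketch of that idea, with CompiledTokens as a made-up helper name: every regex is compiled once in the constructor, and each later check only creates a Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that compiles each token regex exactly once. */
final class CompiledTokens {
    private final List<Pattern> patterns = new ArrayList<>();

    CompiledTokens(List<String> regexes) {
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex)); // the expensive step, done once
        }
    }

    int size() {
        return patterns.size();
    }

    /** Equivalent to input.matches(regexes.get(i)), but without recompiling. */
    boolean matches(int i, String input) {
        return patterns.get(i).matcher(input).matches();
    }
}
```

In the question's code, the List<String> tokensAfter parameter would then be replaced by such an object, and id.next().matches(tokens.get(i)) by tokens.matches(i, id.next()).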
