Is this a good way of parsing a string? - java

My program reads lines from a plain text file w/ lines formatted: <integer>;<integer>%n, where ; is the delimiter. It compares the two parsed integers against 2 other known values and increments tallyArray[i] if they match.
I currently use:
try {
scan = new Scanner(new BufferedReader(new FileReader("LogFileToBeRead.txt")));
for (int i = 0; i < tallyArraySize; i++) {
explodedLogLine = scan.nextLine().split(";");
if (IntReferenceVal1 == Integer.parseInt(explodedLogLine[0]) && IntReferenceVal2 == Integer.parseInt(explodedLogLine[1])) {
tallyArray[i]++;
}
}
} finally {
if (scan != null) { scan.close(); }
}
I was wondering if there were any serious faults with this method. It does not need to be production-quality.
Also, is there a standard way of parsing a string like this?
EDIT: We can assume the text file is perfectly formatted. But I see the importance of accounting for possible exceptions.

You are not handling NumberFormatExceptions thrown by the Integer.parseInt() method calls. If there's one bad line, execution exits your for loop.
You aren't vetting the integrity of the file you are reading from. If there isn't a ; character or if the Strings aren't actually numbers, execution simply exits the code block you posted.
If you can assume the file is perfectly formatted, and you're set on using a Scanner, you can add ; as a delimiter to the Scanner:
scan = new Scanner(new BufferedReader(new FileReader("LogFileToBeRead.txt")));
scan.useDelimiter(Pattern.compile("[;\\s]+")); // one or more of ';' or whitespace, so "\r\n" line endings don't produce empty tokens
for (int i = 0; i < tallyArraySize; i++) {
int ref1 = scan.nextInt();
int ref2 = scan.nextInt();
if (IntReferenceVal1 == ref1 &&
IntReferenceVal2 == ref2) {
tallyArray[i]++;
}
}
And simply call Scanner.nextInt() twice for each line.

In my opinion, there are three flaws in the program.
The delimiter ;: what if a delimiter is removed or added by accident?
There should be a check that explodedLogLine is not null and has length 2; otherwise you will get an unexpected runtime error.
You should catch NumberFormatException, since you can never be sure that the input is always a number.
A simple illustration below gives you idea how things will go wrong.
String str = "3;;3";
System.out.println(Arrays.toString(str.split(";")));
This code will print [3, , 3]. In such a case your program will throw a NumberFormatException, because the empty string "" cannot be parsed to an Integer.
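The checks listed in this answer can be combined into one small helper. The following is only a sketch: the class name LogLineParser and the method parsePair are made up for illustration, and returning an empty Optional for a bad line (rather than logging or aborting) is an assumption about what the caller wants.

```java
import java.util.Optional;

public class LogLineParser {
    /**
     * Parses a line of the form "<integer>;<integer>".
     * Returns an empty Optional instead of throwing when the line is malformed.
     * (Hypothetical helper, not part of the original program.)
     */
    public static Optional<int[]> parsePair(String line) {
        if (line == null) {
            return Optional.empty();
        }
        // A limit of -1 keeps trailing empty fields, so "3;" yields ["3", ""] rather than ["3"].
        String[] parts = line.trim().split(";", -1);
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            int first = Integer.parseInt(parts[0].trim());
            int second = Integer.parseInt(parts[1].trim());
            return Optional.of(new int[] {first, second});
        } catch (NumberFormatException e) {
            // Covers empty fields as well as non-numeric text.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePair("3;4").isPresent());  // well-formed line
        System.out.println(parsePair("3;;3").isPresent()); // extra delimiter
        System.out.println(parsePair("3;x").isPresent());  // non-numeric field
    }
}
```

In the original loop, tallyArray[i]++ would then run only when the Optional is present and both values equal the reference values.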

Related

Input multiple lines using hasNextLine() is not working in the way that I expected it to

I'm trying to input multiple lines in java by using hasNextline() in the while loop.
Scanner sc = new Scanner(System.in);
ArrayList<String> lines = new ArrayList<>();
while (sc.hasNextLine()) {
lines.add(sc.nextLine());
System.out.println(lines);
}
The code is inside the main method. But the print statement in the while loop doesn't print the last line of my input. Also, the while loop doesn't seem to break.
What should I do to print whole lines of input and finally break the while loop and end the program?
Since an answer that explains why hasNextLine() might be giving "unexpected" result has been linked / given in a comment, instead of repeating the answer, I'm giving you two examples that might give you "expected" result. Whether any of them suits your needs really depends on what kind of input you need the program to deal with.
Assuming you want the loop to be broken by an empty line:
while (true) {
String curLine = sc.nextLine();
if (curLine.isEmpty())
break;
lines.add(curLine);
System.out.println(curLine);
}
Assuming you want the loop to be broken by two consecutive empty lines:
while (true) {
String curLine = sc.nextLine();
int curSize = lines.size();
String LastLine = curSize > 0 ? lines.get(curSize-1) : "";
if (curLine.isEmpty() && LastLine.isEmpty())
break;
lines.add(curLine);
System.out.println(curLine);
}
// lines.removeIf(e -> e.isEmpty());
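For context, hasNextLine() on System.in waits until either another line or the end of the input arrives, so on an interactive terminal the loop only ends when end-of-input is signalled (typically Ctrl+D on Unix-like terminals, Ctrl+Z then Enter on Windows). A last line that was pasted without a trailing Enter is probably still waiting in the buffer at that point. The sketch below uses a Scanner over a String, purely as an assumption to make it runnable, to show that the same loop does terminate once the source has a definite end. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadAllLines {
    /** Reads every remaining line and returns once the source reports end of input. */
    public static List<String> readAll(Scanner sc) {
        List<String> lines = new ArrayList<>();
        while (sc.hasNextLine()) {
            lines.add(sc.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        // A String source has a definite end, unlike an interactive terminal.
        Scanner sc = new Scanner("first\nsecond\nthird");
        System.out.println(readAll(sc)); // prints [first, second, third]
    }
}
```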

How to remove line breaks and empty lines from String

I am trying to run a mapreduce job on hadoop which reads the fifth entry of a tab delimited file (fifth entry are user reviews) and then do some sentiment analysis and word count on them.
However, as you know with user reviews, they usually include line breaks and empty lines. My code iterates through the words of each review to find keywords and check sentiment if keyword is found.
The problem is that, as the code iterates through the review, it throws an ArrayIndexOutOfBoundsException because of these line breaks and empty lines within a single review.
I have tried using replaceAll("\r", " ") and replaceAll("\n", " ") to no avail.
I have also tried if(tokenizer.countTokens() == 2){
word.set(tokenizer.nextToken());}
else {
}
also to no avail. Below is my code:
public class KWSentiment_Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
ArrayList<String> keywordsList = new ArrayList<String>();
ArrayList<String> posWordsList = new ArrayList<String>();
ArrayList<String> tokensList = new ArrayList<String>();
int e;
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] line = value.toString().split("\t");
String Review = line[4].replaceAll("[\\-\\+\\\\)\\.\\(\"\\{\\$\\^:,]", "").toLowerCase();
StringTokenizer tokenizer = new StringTokenizer(Review);
while (tokenizer.hasMoreTokens()) {
// 1- first read the review line and store the tokens in an arraylist, 2-
// iterate through review to check for KW if found
// 3-check if there's PosWord near (upto +3 and -2)
// 4- setWord & context.write 5- null the review line arraylist
String CompareString = tokenizer.nextToken();
tokensList.add(CompareString);
}
{
for (int i = 0; i < tokensList.size(); i++)
{
for (int j = 0; j < keywordsList.size(); j++) {
boolean flag = false;
if (tokensList.get(i).startsWith(keywordsList.get(j)) == true) {
for (int e = Math.max(0, i - 2); e < Math.min(tokensList.size(), i + 4); e++) {
if (posWordsList.contains(tokensList.get(e))) {
word.set(keywordsList.get(j));
context.write(word, one);
flag = true;
break; // breaks out of e loop }}
}
}
}
if (flag)
break;
}
}
tokensList.clear();
}
}
Expected results are such that:
Take these two cases of reviews where error occurs:
Case 1: "Beautiful and spacious!
I highly recommend this place and great host."
Case 2: "The place in general was really silent but we didn't feel stayed.
Aside from this, the bathroom is big and the shower is really nice but there problem. "
The system should read the whole review as one line and iterate through the words in it. However, it just stops as it finds a line break or an empty line as in case 2.
Case 1 should be read such as: "Beautiful and spacious! I highly recommend this place and great host."
Case 2 should be:"The place in general was really silent but we didn't feel stayed. Aside from this, the bathroom is big and the shower is really nice but there problem. "
I am running out of time and would really appreciate help here.
Thanks!
So, I hope I am understanding what you are trying to do...
If I am reading what you have above correctly, the value of 'value' passed into your map function above contains the delimited value that you would like to parse the user reviews out of. If that is the case, I believe we can make use of the escaping functionality in the opencsv library using tabs as your delimiting character instead of commas to correctly populate the user review field:
http://opencsv.sourceforge.net
In this example we are reading one line from the input that is passed in, parsing it into 'columns' based on the tab character, and placing the results in the 'nextLine' array. This will allow us to use the escaping functionality of the CSVReader without reading an actual file and instead using the value of the text passed into your map function.
StringReader reader = new StringReader(value.toString());
CSVReader csvReader = new CSVReader(reader, '\t', '\"', '\\', 0);
String [] nextLine = csvReader.readNext();
if(nextLine != null && nextLine.length >= 5) {
// Do some stuff
}
In the example that you pasted above, I think even split("\t") will be problematic, because a tab inside a user review splits that review into two fields, in addition to new lines being treated as new records. But both of these characters are legal as long as they are inside a quoted value (as they should be in a properly escaped file, and as they are in your example). CSVReader should handle all of these.
Validate each line at the start of the map method, so that you know line[4] exists and isn't null.
if (value == null || value.toString() == null) {
return;
}
String[] line = value.toString().split("\t");
if (line == null || line.length < 5 || line[4] == null) {
return;
}
As for line breaks, you'll need to show some sample input. By default MapReduce passes each line into the map method independently, so if you do want to read multiple lines as one message, you'll have to write a custom InputFormat (with its own RecordReader), or pre-format your data so that all data for each review is on the same line.
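The pre-formatting option could be sketched roughly as below, run over the raw file before the MapReduce job. This rests on two assumptions that are not confirmed by the question: the review is the last field of a record, and a physical line that starts a record always contains at least four tabs, so any line with fewer tabs is treated as a continuation of the previous review. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ReviewLineJoiner {
    // Assumption: a physical line that starts a record contains at least this many tabs.
    private static final int MIN_TABS_IN_RECORD_START = 4;

    /** Merges continuation lines into the record they belong to, one record per output line. */
    public static List<String> joinRecords(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : physicalLines) {
            if (countTabs(line) >= MIN_TABS_IN_RECORD_START) {
                // A new record starts here, so flush the previous one.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null && !line.trim().isEmpty()) {
                // Continuation of the current review: join with a single space.
                current.append(' ').append(line.trim());
            }
            // Blank lines, and stray lines before the first record, are dropped.
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    private static int countTabs(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\t') {
                n++;
            }
        }
        return n;
    }
}
```

If reviews can themselves contain tabs, this heuristic breaks down and a quote-aware parser is the safer choice.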

Double#parseDouble not throwing a NullPointerException when attempting to parse empty substrings/tokens

I was working on some code using try-catch and I needed empty substrings to throw an exception when doing Double.parseDouble() (in this case, I presume it would be a NullPointerException).
My question is why this code doesn't throw an exception if I enter something like , , , (space-comma-space-comma-space-comma) or similar (which should split the string into three whitespaces, if I understand correctly):
Scanner input = new Scanner(System.in);
String[] inputParts = null;
String inputLine = input.nextLine();
// the matches() here prevents this from happening, but I still don't understand
// why an exception isn't thrown
if ((inputLine.contains(",") || inputLine.contains(" ")) && !inputLine.matches("\\s+")) {
inputParts = inputLine.split("\\s*(,*\\s+)|(,+)");
}
for (int i = 0; i < inputParts.length; ++i) {
// this prints nothing -- not even a new line. Same behavior even if I don't parseDouble
// and just print the string directly
System.out.println(Double.parseDouble(inputParts[i]));
}
If I try to parseDouble from an empty string "" or " " without taking user input like this it does throw an exception.
I'm quite confused as to why this is happening, considering the code I was working on does work except when I enter something like the above (although I fixed it by checking to see if each substring was only whitespace and throwing the appropriate exception manually).
Thanks.
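One way to investigate this is to look at what split actually returns. String.split with no limit discards trailing empty strings, and for an input such as " , , ," every character is consumed by the delimiter pattern, so every piece is empty and the resulting array has length zero: the for loop body never runs. The sketch below (class and method names are illustrative) shows this, and also that an empty or blank string makes Double.parseDouble throw NumberFormatException, while NullPointerException is reserved for a null argument.

```java
import java.util.Arrays;

public class SplitDemo {
    /** Splits with the same pattern as in the question. */
    public static String[] splitInput(String inputLine) {
        return inputLine.split("\\s*(,*\\s+)|(,+)");
    }

    public static void main(String[] args) {
        // Every piece is empty, and trailing empty strings are dropped.
        System.out.println(splitInput(" , , ,").length); // prints 0

        // Ordinary input still splits into its numeric tokens.
        System.out.println(Arrays.toString(splitInput("1 , 2,3")));

        try {
            Double.parseDouble(" ");
        } catch (NumberFormatException e) {
            System.out.println("blank string: NumberFormatException");
        }
        try {
            Double.parseDouble(null);
        } catch (NullPointerException e) {
            System.out.println("null: NullPointerException");
        }
    }
}
```

Passing a negative limit, as in split(regex, -1), keeps the empty pieces if they need to be detected explicitly.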

String not populating properly

I am writing a program that is going to read a string from a file, and then remove anything that isn't 1-9 or A-Z or a-z. The A-Z values need to become lowercase. Everything seems to run fine, I have no errors, however my output is messed up. It seems to skip certain characters for no reason whatsoever. I've looked at it and tweaked it but nothing works. Can't figure out why it is randomly skipping certain characters because I believe my if statements are correct. Here is the code:
String dataIn;
int temp;
String newstring= "";
BufferedReader file = new BufferedReader(new FileReader("palDataIn.txt"));
while((dataIn=file.readLine())!=null)
{
newstring="";
for(int i=0;i<dataIn.length();i++)
{
temp=(int)dataIn.charAt(i);
if(temp>46&&temp<58)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>96&&temp<123)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>64&&temp<91)
{
newstring=newstring+Character.toLowerCase(dataIn.charAt(i));
}
i++;
}
System.out.println(newstring);
}
So to give you an example, the first string I read in is :
A sample line this is.
The output after my program runs through it is this:
asmlietis
So it is reading the A and making it lowercase, skips the space like it is supposed to, reads the s in, but then for some reason skips the "a" and the "m" and goes to the "p".
You're incrementing i inside the loop body as well as in the for loop "header", so i advances twice on every pass through the loop and every other character is skipped.
Just get rid of all the i++; statements other than the one in the for statement declaration. For example:
newstring="";
for(int i=0;i<dataIn.length();i++)
{
temp=(int)dataIn.charAt(i);
if(temp>46&&temp<58)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>96&&temp<123)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>64&&temp<91)
{
newstring=newstring+Character.toLowerCase(dataIn.charAt(i));
}
}
I wouldn't stop editing there though. I'd also:
Use a char instead of an int as the local variable for the current character you're looking at
Use character literals for comparisons, to make it much clearer what's going on
Use a StringBuilder to build up the string
Declare the variable for the output string for the current line within the loop
Use if / else if to make it clear you're only expecting to go into one branch
Combine the two paths that both append the character as-is
Fix the condition for numbers (it's incorrect at the moment)
Use more whitespace for clarity
Rely on Character.toLowerCase, which is locale-independent, rather than String.toLowerCase() with the default locale, to avoid "the Turkey problem" with I
So:
String line;
while((line = file.readLine()) != null)
{
StringBuilder builder = new StringBuilder(line.length());
for (int i = 0; i < line.length(); i++) {
char current = line.charAt(i);
// Are you sure you want to trim 0?
if ((current >= '1' && current <= '9') ||
(current >= 'a' && current <= 'z')) {
builder.append(current);
} else if (current >= 'A' && current <= 'Z') {
builder.append(Character.toLowerCase(current));
}
}
System.out.println(builder);
}

Space Replacement for Float/Int/Double

I am working on a class assignment this morning and I want to try to solve a problem I have noticed in all of my teammates' programs so far: the fact that spaces in an int/float/double cause Java to freak out.
To solve this issue I had a very crazy idea, and it does work under certain circumstances. However, the problem is that it does not always work and I cannot figure out why. Here is my "main" method:
import java.util.Scanner; //needed for scanner class
public class Test2
{
public static void main(String[] args)
{
BugChecking bc = new BugChecking();
String i;
double i2 = 0;
Scanner in = new Scanner(System.in);
System.out.println("Please enter a positive integer");
while (i2 <= 0.0)
{
i = in.nextLine();
i = bc.deleteSpaces(i);
//cast back to float
i2 = Double.parseDouble(i);
if (i2 <= 0.0)
{
System.out.println("Please enter a number greater than 0.");
}
}
in.close();
System.out.println(i2);
}
}
So here is the class, note that I am working with floats but I made it so that it can be used for any type so long as it can be cast to a string:
public class BugChecking
{
BugChecking()
{
}
public String deleteSpaces(String s)
{
//convert string into a char array
char[] cArray = s.toCharArray();
//now use for loop to find and remove spaces
for (i3 = 0; i3 < cArray.length; i3++)
{
if ((Character.isWhitespace(cArray[i3])) && (i3 != cArray.length)) //If current element contains a space remove it via overwrite
{
for (i4 = i3; i4 < cArray.length-1;i4++)
{
//move array elements over by one element
storage1 = cArray[i4+1];
cArray[i4] = storage1;
}
}
}
s = new String(cArray);
return s;
}
int i3; //for iteration
int i4; //for iteration
char storage1; //for storage
}
Now, the goal is to remove spaces from the array in order to fix the problem stated at the beginning of the post and from what I can tell this code should achieve that and it does, but only when the first character of an input is the space.
For example, if I input " 2.0332" the output is "2.0332".
However if I input "2.03 445 " the output is "2.03" and the rest gets lost somewhere.
This second example is what I am trying to figure out how to fix.
EDIT:
David's suggestion below was able to fix the problem. Bypassed sending an int. Send it directly as a string then convert (I always heard this described as casting) to desired variable type. Corrected code put in place above in the Main method.
A little side note, if you plan on using this even though replace is much easier, be sure to add an && check to the if statement in deleteSpaces to make sure that the if statement only executes if you are not on the final array element of cArray. If you pass the last element value via i3 to the next for loop which sets i4 to the value of i3 it will trigger an OutOfBounds error I think since it will only check up to the last element - 1.
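For anyone who does want to keep a hand-written version, a common alternative to shifting the whole tail of the array on every space is to copy the non-whitespace characters forward with a separate write index, then build the result from only the part of the array that was filled. This is a sketch of that technique, not the code from the question; the class name SpaceStripper is made up.

```java
public class SpaceStripper {
    /** Returns s with every whitespace character removed. */
    public static String deleteSpaces(String s) {
        char[] chars = s.toCharArray();
        int write = 0; // next position to fill with a kept character
        for (int read = 0; read < chars.length; read++) {
            if (!Character.isWhitespace(chars[read])) {
                chars[write++] = chars[read];
            }
        }
        // Only the first 'write' characters are valid; the rest is leftover data.
        return new String(chars, 0, write);
    }

    public static void main(String[] args) {
        System.out.println(deleteSpaces("2.03 445 ")); // prints 2.03445
    }
}
```

Because the read index never runs past the array and the write index never overtakes it, no extra bounds check is needed.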
If you'd like to get rid of all whitespace within a String, use replaceAll(String regex, String replacement) or replace(CharSequence target, CharSequence replacement):
String sBefore = "2.03 445 ";
String sAfter = sBefore.replaceAll("\\s+", "");//replace spaces, tabs and other whitespace
//String sAfter = sBefore.replace(" ", "");//replace space characters only
double i = 0;
try {
i = Double.parseDouble(sAfter);//parse to double
} catch (NumberFormatException nfe) {
nfe.printStackTrace();
}
System.out.println(i);//2.03445
UPDATE:
Looking at your code snippet, the problem might be that you read the input directly as a float/int/double, so entering whitespace stops nextFloat(). Instead, read the input as a String using nextLine(), delete the whitespace, then attempt to convert it to the appropriate type.
This seems to work fine for me:
public static void main(String[] args) {
//bugChecking bc = new bugChecking();
float i = 0.0f;
String tmp = "";
Scanner in = new Scanner(System.in);
System.out.println("Please enter a positive integer");
while (true) {
tmp = in.nextLine();//read line
tmp = tmp.replaceAll("\\s+", "");//get rid of spaces
if (tmp.isEmpty()) {//wrong input
System.err.println("Please enter a number greater than 0.");
} else {//non-empty input
try {//attempt to convert string to float
i = Float.parseFloat(tmp);
System.out.println(i);
break;//got correct input, halt loop
} catch (NumberFormatException nfe) {//not a number, ask again
System.err.println(nfe.getMessage());
}
}
}
in.close();
}
EDIT:
As a side note, please start all class names with a capital letter, i.e. the bugChecking class should be BugChecking. The same applies to the test2 class: it should be Test2.
String objects have methods on them that allow you to do this kind of thing. The one you want in particular is String.replace. This pretty much does what you're trying to do for you.
String input = " 2.03 445 ";
input = input.replace(" ", ""); // "2.03445"
You could also use regular expressions to replace more than just spaces. For example, to get rid of everything that isn't a digit or a period:
String input = "123,232 . 03 445 ";
input = input.replaceAll("[^\\d.]", ""); // "123232.03445"
This will replace any non-digit, non-period character so that you're left with only those characters in the input. See the javadocs for Pattern to learn a bit about regular expressions, or search for one of the many tutorials available online.
Edit: One other remark, String.trim will remove all whitespace from the beginning and end of your string to turn " 2.0332" into "2.0332":
String input = " 2.0332 ";
input = input.trim(); // "2.0332"
Edit 2: With your update, I see the problem now. Scanner.nextFloat is what's breaking on the space. If you change your code to use Scanner.nextLine like so:
float i = 0;
while (i <= 0) {
String input = in.nextLine();
input = input.replaceAll("[^\\d.]", "");
i = Float.parseFloat(input);
if (i <= 0.0f) {
System.out.println("Please enter a number greater than 0.");
}
System.out.println(i);
}
That code will properly accept you entering things like "123,232 . 03 445". Use any of the solutions in place of my replaceAll and it will work.
Scanner.nextFloat will split your input automatically based on whitespace. You can change the delimiter of a Scanner with useDelimiter (for example, new Scanner(System.in).useDelimiter("[,./ ]") will delimit on ,, ., / and space). Note that the second argument of new Scanner(System.in, "...") is a charset name, not a delimiter. The default, new Scanner(System.in), delimits on whitespace.
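As a small illustration of a custom delimiter, the sketch below tokenizes a String with useDelimiter, which takes a regular expression. The delimiter set (commas, slashes and whitespace, one or more in a row) and the class name are assumptions chosen for the example; the period is deliberately left out so that decimal numbers stay in one piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    /** Splits text into tokens separated by commas, slashes or whitespace. */
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        // The trailing '+' treats a run of delimiters as one, so no empty tokens appear.
        try (Scanner sc = new Scanner(text).useDelimiter("[,/\\s]+")) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("22.2, 33/44")); // prints [22.2, 33, 44]
    }
}
```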
I guess you're using the first argument from your main method. If your main method looks something like this:
public static void main(String[] args){
System.out.println(deleteSpaces(args[0]));
}
Your problem is that spaces separate the arguments that get handed to your main method. So when you run your class like this:
java MyNumberConverter 22.2 33
The first argument, args[0], is "22.2" and the second, args[1], is "33".
But like others have suggested, String.replace is a better way of doing this anyway.
