I have to write a program that will parse baseball player info and hits, outs, walks, etc. from a txt file. For example, the txt file may look something like this:
Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s
Jill Jenks,o,o,s,h,h,o,o
Will Jones,o,o,w,h,o,o,o,o,w,o,o
I know how to parse the file and can get that code running perfectly. The only problem I am having is that we should only print the name for each player and 3 of their plays. For example:
Sam Slugger hit,hit,out
Jill Jenks out, out, sacrifice fly
Will Jones out, out, walk
I am not sure how to limit this. Every time I try to cut it off at 3, the first player works fine, but the loop breaks and nothing is printed for the other players.
This is what I have so far:
import java.util.Scanner;
import java.io.*;

public class ReadBaseBall {
    public static void main(String args[]) throws IOException {
        int count = 0;
        String playerData;
        Scanner fileScan, urlScan;
        String fileName = "C:\\Users\\Crust\\Documents\\java\\TeamStats.txt";
        fileScan = new Scanner(new File(fileName));
        while (fileScan.hasNext()) {
            playerData = fileScan.nextLine();
            fileScan.useDelimiter(",");
            //System.out.println("Name: " + playerData);
            urlScan = new Scanner(playerData);
            urlScan.useDelimiter(",");
            for (urlScan.hasNext(); count < 4; count++)
                System.out.print(" " + urlScan.next() + ",");
            System.out.println();
        }
    }
}
This prints out:
Sam Slugger, h, h, o,
but then nothing is printed for the other players. I need help getting the others to print as well.
Here, try this one using FileReader.
Assuming your file content format is like this:
Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s
Jill Johns,h,h,o,s,w,w,h,w,o,o,o,h,s
with each player on his/her own line, then this can work for you:
BufferedReader reader;
try {
    reader = new BufferedReader(new FileReader(new File("file.txt")));
    String line = "";
    while ((line = reader.readLine()) != null) {
        String[] values_per_line = line.split(",");
        // name plus the first three plays (assumes each line has at least three plays)
        System.out.println("Name:" + values_per_line[0] + " "
                + values_per_line[1] + " " + values_per_line[2] + " "
                + values_per_line[3]);
    }
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
}
Otherwise, if they are all on one line, which would not make much sense, then modify this sample:
Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s| John Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s
BufferedReader reader;
try {
    reader = new BufferedReader(new FileReader(new File("file.txt")));
    String line = "";
    while ((line = reader.readLine()) != null) {
        // players are separated by a pipe; escape it because split() takes a regex
        String[] data = line.trim().split("\\|");
        for (int i = 0; i < data.length; i++) {
            String[] plays = data[i].split(",");
            System.out.println("Name:" + plays[0] + " "
                    + plays[1] + " "
                    + plays[2] + " "
                    + plays[3]);
        }
    }
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
}
You need to reset your count var in the while loop:
while(fileScan.hasNext()){
count = 0;
...
}
First Problem
Change while(fileScan.hasNext()) to while(fileScan.hasNextLine()). Not a breaking problem, but when using a Scanner you usually put an sc.next*() call right after the matching sc.hasNext*() check.
Second Problem
Remove the line fileScan.useDelimiter(","). In this case it doesn't do anything useful, but it replaces the default delimiter so the scanner no longer splits on whitespace. That doesn't matter when using Scanner.nextLine, but it can have some nasty side effects later on.
Third Problem
Change the line for(urlScan.hasNext(); count<4; count++) to while(urlScan.hasNext()). As written, it ignores the hasNext() check and only reads the first 4 tokens from the scanner, and since count is never reset it reads nothing at all for the later players.
If you want to limit the amount processed for each line you can replace it with
for( int count = 0; count < limit && urlScan.hasNext( ); count++ )
This will limit the amount read to limit while still handling lines that have fewer entries than the limit.
Make sure that each of your data sets is on its own line; otherwise the output might not make much sense.
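Putting those fixes together, here is a minimal sketch of what the corrected loop might look like. The code-to-word mapping (h = hit, o = out, w = walk, s = sacrifice fly) and the hard-coded limit of 3 plays are assumptions taken from the sample output in the question:
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ReadBaseBall {
    public static void main(String[] args) throws IOException {
        Scanner fileScan = new Scanner(new File("TeamStats.txt")); // path is an assumption
        while (fileScan.hasNextLine()) {
            Scanner urlScan = new Scanner(fileScan.nextLine());
            urlScan.useDelimiter(",");
            System.out.print(urlScan.next());                      // the player's name
            // print at most 3 plays, resetting the counter for every player
            for (int count = 0; count < 3 && urlScan.hasNext(); count++) {
                String code = urlScan.next();
                String play;
                switch (code) {                                    // assumed code-to-word mapping
                    case "h": play = "hit"; break;
                    case "o": play = "out"; break;
                    case "w": play = "walk"; break;
                    case "s": play = "sacrifice fly"; break;
                    default:  play = code;
                }
                System.out.print((count == 0 ? " " : ", ") + play);
            }
            System.out.println();
            urlScan.close();
        }
        fileScan.close();
    }
}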
You shouldn't need multiple scanners for this; assuming the format you posted in your question, you can use regular expressions to do it.
This demonstrates a regular expression to match a player and to use as a delimiter for the scanner. I fed the scanner in my example a string, but the technique is the same regardless of source.
int count = 0;
Pattern playerPattern = Pattern.compile("\\w+\\s\\w+(?:,\\w){1,3}");
Scanner fileScan = new Scanner("Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s Jill Jenks,o,o,s,h,h,o,o Will Jones,o,o,w,h,o,o,o,o,w,o,o");
fileScan.useDelimiter("(?<=,\\w)\\s");
while (fileScan.hasNext()) {
    String player = fileScan.next();
    Matcher m = playerPattern.matcher(player);
    if (m.find()) {
        player = m.group(0);
    } else {
        throw new InputMismatchException("Players data not in expected format on string: " + player);
    }
    System.out.println(player);
    count++;
}
System.out.printf("%d players found.", count);
Output:
Sam Slugger,h,h,o
Jill Jenks,o,o,s
Will Jones,o,o,w
The call to Scanner.useDelimiter() sets the delimiter to use for retrieving tokens. The regex (?<=,\w)\s:
(?<=    // positive lookbehind
  ,\w   // literal comma, word character
)       // end of lookbehind
\s      // whitespace character
This delimits the players by the space between their entries without consuming anything but that space, and it deliberately does not match the space between first and last names.
The regular expression used to extract up to 3 plays per player is \w+\s\w+(?:,\w){1,3}:
\w+\s\w+   // first and last name: word characters separated by whitespace
(?:        // begin non-capturing group
  ,\w      // literal comma, word character
){1,3}     // match the non-capturing group 1 - 3 times
Related
I have a scenario in which I have to parse CSV files from different sources; the parsing code is very simple and straightforward.
String csvFile = "/Users/csv/country.csv";
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
// use comma as separator
String[] country = line.split(cvsSplitBy);
System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
}
} catch (IOException e) {
e.printStackTrace();
}
My problem comes from the CSV delimiter character: I have many different formats; sometimes it is a , and sometimes it is a ;.
Is there any way to determine the delimiter character before parsing the file?
univocity-parsers supports automatic detection of the delimiter (also line endings and quotes). Just use it instead of fighting with your code:
CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically();
CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(new File("/path/to/your.csv"));
// if you want to see what it detected
CsvFormat format = parser.getDetectedFormat();
Disclaimer: I'm the author of this library and I made sure all sorts of corner cases are covered. It's open source and free (Apache 2.0 license)
Hope this helps.
Yes, but only if the delimiter characters are not allowed to appear as regular text.
The simplest answer is to have a list of all the available delimiter characters and try to identify which one is being used. Even so, you have to place some limitations on the files or on the person/people that created them. Look at the following two scenarios:
Case 1 - Contents of file.csv
test1,test2,test3
Case 2 - Contents of file.csv
test1|test2,3|test4
If you have prior knowledge of the delimiter characters, then you would split the first string using , and the second one using |, getting the correct columns in each case. But if you try to identify the delimiter by parsing the file, both strings can be split using the , character, and you would end up with this:
Case 1 - Result of split using ,
test1
test2
test3
Case 2 - Result of split using ,
test1|test2
3|test4
Lacking prior knowledge of which delimiter character is being used, you cannot create a "magical" algorithm that will parse every combination of text; even regular expressions or counting the number of appearances of a character will not save you.
Worst case
test1,2|test3,4|test5
By looking at the text, one can tokenize it by using | as the delimiter. But the frequency of appearance of both , and | is the same. So, from an algorithm's perspective, both results are equally plausible:
Correct result
test1,2
test3,4
test5
Wrong result
test1
2|test3
4|test5
If you impose a set of guidelines, or you can somehow control the generation of the CSV files, then you could just try to find the delimiter used with the String.contains() method, employing the aforementioned list of characters. For example:
public class MyClass {

    // must be static so the static determineDelimiter() method can use it
    private static List<String> delimiterList = new ArrayList<>(Arrays.asList(
            ",",
            ";",
            "\t"
            // etc...
    ));

    private static String determineDelimiter(String text) {
        for (String delimiter : delimiterList) {
            if (text.contains(delimiter)) {
                return delimiter;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String csvFile = "/Users/csv/country.csv";
        String line = "";
        String delimiter = "";
        boolean firstLine = true;
        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            while ((line = br.readLine()) != null) {
                if (firstLine) {
                    delimiter = determineDelimiter(line);
                    if (delimiter.equalsIgnoreCase("")) {
                        System.out.println("No supported delimiter found in the first line.");
                        return;
                    }
                    firstLine = false;
                }
                // use the detected delimiter as separator
                String[] country = line.split(delimiter);
                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Update
For a more optimized approach, you can employ regular expressions in the determineDelimiter() method instead of the for-each loop.
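For illustration, a minimal sketch of what that regex-based version might look like; the candidate character class is an assumption (the same candidates as the list above), and if the detected character is a regex metacharacter such as |, it would need Pattern.quote() before being passed to String.split():
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterSniffer {

    // Candidate delimiters collapsed into one character class (assumed set).
    private static final Pattern CANDIDATES = Pattern.compile("[,;\t|]");

    static String determineDelimiter(String text) {
        Matcher m = CANDIDATES.matcher(text);
        // Return the first candidate that occurs in the line, or "" if none is found.
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(determineDelimiter("a;b;c")); // prints ;
        System.out.println(determineDelimiter("a,b,c")); // prints ,
    }
}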
If the delimiter can appear in a data column, then you are asking for the impossible. For example, consider this first line of a CSV file:
one,two:three
This could be either a comma-separated or a colon-separated file. You can't tell which type it is.
If you can guarantee that the first line has all its columns surrounded by quotes, for example if it's always this format:
"one","two","three"
then you may be able to use this logic (although it's not 100% bullet-proof):
if (line.contains("\",\""))
delimiter = ',';
else if (line.contains("\";\""))
delimiter = ';';
If you can't guarantee a restricted format like that, then it would be better to pass the delimiter character as a parameter.
Then you can read the file using a widely-known open-source CSV parser such as Apache Commons CSV.
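For example, a minimal sketch of reading the file with Apache Commons CSV once the delimiter is known; the path and column indices are taken from the question, and the delimiter is assumed to be passed in rather than detected:
import java.io.FileReader;
import java.io.Reader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

public class CommonsCsvCountries {
    public static void main(String[] args) throws Exception {
        char delimiter = ';'; // assumed to be supplied by the caller, not guessed
        try (Reader in = new FileReader("/Users/csv/country.csv")) {
            for (CSVRecord record : CSVFormat.DEFAULT.withDelimiter(delimiter).parse(in)) {
                System.out.println("Country [code= " + record.get(4) + " , name=" + record.get(5) + "]");
            }
        }
    }
}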
While I agree with Lefteris008 that it is not possible to write a function that correctly determines the delimiter in all cases, we can have a function that is both efficient and gives a mostly correct result in practice.
def head(filename: str, n: int):
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

def detect_delimiter(filename: str, n=2):
    sample_lines = head(filename, n)
    n = min(n, len(sample_lines))  # guard against files shorter than n lines
    common_delimiters = [',', ';', '\t', ' ', '|', ':']
    for d in common_delimiters:
        ref = sample_lines[0].count(d)
        if ref > 0:
            if all([ref == sample_lines[i].count(d) for i in range(1, n)]):
                return d
    return ','
My efficient implementation is based on:
Prior knowledge, such as a list of the common delimiters you often work with (',;\t |:'), or even the likelihood of each delimiter being used, which is why I put the regular ',' at the top of the list.
The assumption that the delimiter appears the same number of times in each line of the text file. This resolves the problem of reading a single line and seeing equal frequencies (the false detection Lefteris008 describes), or even the right delimiter appearing less frequently than the wrong one in the first line.
An efficient head function that reads only the first n lines of the file.
As you increase the number of sample lines n, the likelihood of a false answer drops drastically. I have found n=2 to be adequate.
Add a condition like this:
String[] country;
if (line.contains(","))
    country = line.split(",");
else if (line.contains(";"))
    country = line.split(";");
That depends....
If your datasets always have the same length and/or the separator NEVER occurs in your data columns, you could just read the first line of the file, look for the expected separator in it, set it, and then read the rest of the file using that separator.
Something like
String csvFile = "/Users/csv/country.csv";
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        // detect the separator used on this line
        if (line.contains(",")) {
            cvsSplitBy = ",";
        } else if (line.contains(";")) {
            cvsSplitBy = ";";
        } else {
            System.out.println("Wrong separator!");
        }
        String[] country = line.split(cvsSplitBy);
        System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
    }
} catch (IOException e) {
    e.printStackTrace();
}
Greetz Kai
I have a huge text file. I'd like to search for specific words and print three (or more) of the words that come after each match. So far I have done this:
public static void main(String[] args) {
    String fileName = "C:\\Users\\Mishari\\Desktop\\Mesh.txt";
    String line = null;
    try {
        FileReader fileReader = new FileReader(fileName);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        while ((line = bufferedReader.readLine()) != null) {
            System.out.println(line);
        }
        bufferedReader.close();
    } catch (FileNotFoundException ex) {
        System.out.println("Unable to open file '" + fileName + "'");
    } catch (IOException ex) {
        System.out.println("Error reading file '" + fileName + "'");
    }
}
It only prints the file. Can you advise me on the best way of doing this?
You can look for the index of the word in a line using this method:
int index = line.indexOf(word);
If the index is -1, then the word does not exist on that line.
If it exists, take the substring of the line starting from that index to the end of the line:
String nextWords = line.substring(index);
Now use String[] temp = nextWords.split(" ") to get all the words in that substring.
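Putting those steps together, a rough sketch of what the loop might look like; the search word, the count of three following words, and the whitespace split are assumptions based on the question:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FindFollowingWords {
    public static void main(String[] args) throws IOException {
        String fileName = "C:\\Users\\Mishari\\Desktop\\Mesh.txt";
        String word = "specific";   // the word to search for (assumed)
        int wordsAfter = 3;         // how many following words to print (assumed)
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int index = line.indexOf(word);
                if (index == -1) {
                    continue;       // word not on this line
                }
                // everything from the match onward, split into words
                String[] temp = line.substring(index).split("\\s+");
                StringBuilder out = new StringBuilder(word);
                for (int i = 1; i <= wordsAfter && i < temp.length; i++) {
                    out.append(' ').append(temp[i]);
                }
                System.out.println(out);
            }
        }
    }
}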
while ((line = bufferedReader.readLine()) != null) {
    System.out.println(line);
    if (line.contains("YOUR_SPECIFIC_WORDS")) {
        // do what you need here
    }
}
By the sounds of it, what you appear to be looking for is a basic find-and-replace-all mechanism for each file line that is read in from the file. In other words, if the current file line happens to contain the word or phrase you would like to add words after, then replace that found word with the very same word plus the other words you want to add. In a sense it would be something like this:
String line = "This is a file line.";
String find = "file"; // word to find in line
String replaceWith = "file (plus this stuff)"; // the phrase to change the found word to.
line = line.replace(find, replaceWith); // Replace any found words
System.out.println(line);
The console output would be:
This is a file (plus this stuff) line.
The main thing here though is that you only want to deal with actual words and not the same phrase within another word, for example the word "and" and the word "sand". You can clearly see that the characters that make up the word 'and' is also located in the word 'sand' and therefore it too would be changed with the above example code. The String.contains() method also locates strings this way. In most cases this is undesirable if you want to specifically deal with whole words only so a simple solution would be to use a Regular Expression (RegEx) with the String.replaceAll() method. Using your own code it would look something like this:
String fileName = "C:\\Users\\Mishari\\Desktop\\Mesh.txt";
String findPhrase = "and"; //Word or phrase to find and replace
String replaceWith = findPhrase + " (adding this)"; // The text used for the replacement.
boolean ignoreLetterCase = false; // Change to true to ignore letter case
String line = "";
try {
FileReader fileReader = new FileReader(fileName);
BufferedReader bufferedReader = new BufferedReader(fileReader);
while ((line = bufferedReader.readLine()) != null) {
if (ignoreLetterCase) {
line = line.toLowerCase();
findPhrase = findPhrase.toLowerCase();
}
if (line.contains(findPhrase)) {
line = line.replaceAll("\\b(" + findPhrase + ")\\b", replaceWith);
}
System.out.println(line);
}
bufferedReader.close();
} catch (FileNotFoundException ex) {
System.out.println("Unable to open file: '" + fileName + "'");
} catch (IOException ex) {
System.out.println("Error reading file: '" + fileName + "'");
}
You will of course notice the escaped \b word-boundary metacharacters within the regular expression used in the String.replaceAll() method, specifically in the line:
line = line.replaceAll("\\b(" + findPhrase + ")\\b", replaceWith);
This allows us to deal with whole words only.
I am a newbie to programming in Java. I want to split the paragraphs in one file into sentences and write them to a different file. Also, there should be a mechanism to identify which sentence comes from which paragraph. The code I have used so far is below, but it breaks:
Former Secretary of Finance Dr. P.B. Jayasundera is being questioned by the police Financial Crime Investigation Division.
into
Former Secretary of Finance Dr.
P.B.
Jayasundera is being questioned by the police Financial Crime Investigation Division.
How can I correct it? Thanks in advance.
import java.io.*;

class trial4 {
    public static void main(String args[]) throws IOException {
        FileReader fr = new FileReader("input.txt");
        BufferedReader br = new BufferedReader(fr);
        String s;
        OutputStream out = new FileOutputStream("output10.txt");
        String token[];
        while ((s = br.readLine()) != null) {
            token = s.split("(?<=[.!?])\\s* ");
            for (int i = 0; i < token.length; i++) {
                byte buf[] = token[i].getBytes();
                for (int j = 0; j < buf.length; j = j + 1) {
                    out.write(buf[j]);
                    if (j == buf.length - 1)
                        out.write('\n');
                }
            }
        }
        fr.close();
    }
}
I referenced all the similar questions posted on StackOverFlow. But those answers couldn't help me solve this.
How about using a negative lookbehind in conjunction with a replace? Simply put: replace every sentence-ending punctuation mark that doesn't have "something special" before it with that punctuation followed by a newline.
A list of "known abbreviations" will be needed. There's no guarantee as to how long those can be, or how short a word at the end of a sentence might be. (See? 'be' is quite short already!)
class trial4 {
    public static void main(String args[]) throws IOException {
        FileReader fr = new FileReader("input.txt");
        BufferedReader br = new BufferedReader(fr);
        PrintStream out = new PrintStream(new FileOutputStream("output10.txt"));
        String s = br.readLine();
        while (s != null) {
            out.print(                            // print the line with a newline inserted after each detected sentence end
                s.replaceAll("(?i)"               // make the match case insensitive
                    + "(?<!"                      // negative lookbehind
                    + "(\\W\\w)|"                 // single non-word followed by a word character (P.B.)
                    + "(\\W\\d{1,2})|"            // one or two digits (dates!)
                    + "(\\W(dr|mr|mrs|ms))"       // list of known abbreviations
                    + ")"                         // end of lookbehind
                    + "([!?\\.])"                 // match end-of-sentence punctuation
                    , "$5"                        // replace with the end-of-sentence punctuation found
                    + System.lineSeparator()));   // followed by a newline
            s = br.readLine();
        }
        br.close();
        out.close();
    }
}
As mentioned in the comment, "it will be reasonably hard" to break text into paragraphs without formalizing the requirements. Take a look at BreakIterator, especially the sentence instance. You might roll your own BreakIterator, since it breaks the text much like the regexp approach, only more abstractly. Or try a 3rd-party solution like http://deeplearning4j.org/sentenceiterator.html, which can be trained to tokenize your input.
Example with BreakIterator:
String str = "Former Secretary of Finance Dr. P.B. Jayasundera is being questioned by the police Financial Crime Investigation Division.";
BreakIterator bilus = BreakIterator.getSentenceInstance(Locale.US);
bilus.setText(str);
int last = bilus.first();
int count = 0;
while (BreakIterator.DONE != last) {
    int first = last;
    last = bilus.next();
    if (BreakIterator.DONE != last) {
        String sentence = str.substring(first, last);
        System.out.println("Sentence:" + sentence);
        count++;
    }
}
System.out.println("" + count + " sentences found.");
I have a question. I have a text file with some names and numbers arranged like this:
Cheese;10;12
Borat;99;55
I want to read the chars and integers from the file until the ";" symbol, println them, then continue and read the next one, println it, etc. Like this:
Cheese -> println, 10 -> println, 99 -> println, and so on to the next line.
I tried using :
BufferedReader flux_in = new BufferedReader(
        new InputStreamReader(
                new FileInputStream("D:\\test.txt")));
while ((line = flux_in.readLine()) != null &&
        line.contains(terminator) == true) {
    text = line;
    System.out.println(String.valueOf(text));
}
But it reads the entire line and doesn't stop at the ";" symbol. Setting the 'contains' condition to false does not read the line at all.
EDIT: Partially solved, I managed to write this code:
StringBuilder sb = new StringBuilder();
// while ((line = flux_in.readLine()) != null)
int c;
String terminator_char = ";";
while ((c = flux_in.read()) != -1) {
    char character = (char) c;
    if (String.valueOf(character).contains(terminator_char) == false) {
        // System.out.println(String.valueOf(character) + " : Char");
        sb.append(character);
    } else {
        continue;
    }
}
System.out.println(String.valueOf(sb));
This returns a new string formed from the characters that were read, but without the ";". I still need a way to make it stop at the first ";", println the string, and continue.
This simple code does the trick, thanks to Stefan Vasilica for the idea:
Scanner scan = new Scanner(new File("D:\\testfile.txt"));
// Printing the delimiter used
scan.useDelimiter(";");
System.out.println("Delimiter:" + scan.delimiter());
// Printing the tokenized Strings
while (scan.hasNext()) {
System.out.println(scan.next());
}
// closing the scanner stream
scan.close();
Read the characters from the file 1 by 1
Delete the 'contains' condition
Use a StringBuilder to build the strings 1 by 1
Each StringBuilder stops when it hits a ';' (say you use an if clause)
I didn't test it because I'm on my phone. Hope this helps.
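A minimal sketch of those steps, reusing the D:\test.txt path from the question; treating a newline like a ';' is an added assumption so tokens from different lines don't run together:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadUntilSemicolon {
    public static void main(String[] args) throws IOException {
        try (BufferedReader flux_in = new BufferedReader(new FileReader("D:\\test.txt"))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = flux_in.read()) != -1) {
                char character = (char) c;
                if (character == ';' || character == '\n') {
                    // token finished: print it and start building the next one
                    if (sb.length() > 0) {
                        System.out.println(sb);
                        sb.setLength(0);
                    }
                } else if (character != '\r') {
                    sb.append(character);
                }
            }
            if (sb.length() > 0) { // last token if the file doesn't end with ';' or a newline
                System.out.println(sb);
            }
        }
    }
}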
I'm trying to build a program that takes in files and outputs the number of words in the file. It works perfectly when everything is in one whole paragraph. However, when there are multiple paragraphs, it doesn't take into account the first word of each new paragraph. For example, if a file reads "My name is John", the program will output "4 words". However, if the file reads "My Name Is John" with each word being a new paragraph, the program will output "1 word". I know it must be something about my if statement, but I assumed that there are spaces before the new paragraph that would make it count the first word of the new paragraph.
Here is my code in general:
import java.io.*;

public class HelloWorld {
    public static void main(String[] args) {
        try {
            // Open the input file
            FileInputStream fstream = new FileInputStream("health.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
            String strLine;
            int word2 = 0;
            int word3 = 0;
            // Read file line by line
            while ((strLine = br.readLine()) != null) {
                // Print the content on the console
                int wordLength = strLine.length();
                System.out.println(strLine);
                for (int i = 0; i < wordLength - 1; i++) {
                    Character a = strLine.charAt(i);
                    Character b = strLine.charAt(i + 1);
                    if (a == ' ' && b != '.' && b != '?' && b != '!' && b != ' ') {
                        word2++;
                        // doesn't take into account 1st character of new paragraph
                    }
                }
                word3 = word2 + 1;
            }
            System.out.println("There are " + word3 + " "
                    + "words in your file.");
            // Close the input stream
            br.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
I've tried adjusting the if statement multiple times, but it does not seem to make a difference. Does anyone know where I'm messing up?
I'm a pretty new user and asked a similar question a couple of days back, with people accusing me of demanding too much of users, so hopefully this narrows my question a bit. I am just really confused about why it's not taking the first word of a new paragraph into account. Please let me know if you need any more information. Thanks!
Firstly, your counting logic is incorrect. Consider:
word3 = word2 + 1;
Think about what this does. Every time through your loop, when you read a line, you essentially count the words in that line, then reset the total count to word2 + 1. Hint: If you want to count the total number in the file, you'd want to increment word3 each time, rather than replace it with the current line's word count.
Secondly, your word parsing logic is slightly off. Consider the case of a blank line. You would see no words in it, but you treat the word count in the line as word2 + 1, which means you are incorrectly counting a blank line as 1 word. Hint: If the very first character on the line is a letter, then the line starts with a word.
Your approach is reasonable although your implementation is slightly flawed. As an alternate option, you may want to consider String.split() on each line. The number of elements in the resulting array is the number of words on the line.
By the way, you can increase readability of your code, and make debugging easier, if you use meaningful names for your variables (e.g. totalWords instead of word3).
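For example, a minimal sketch of that String.split() alternative (with the totalWords naming), assuming whitespace-separated words and the health.txt file from the question:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCounter {
    public static void main(String[] args) throws IOException {
        int totalWords = 0;
        try (BufferedReader br = new BufferedReader(new FileReader("health.txt"))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                strLine = strLine.trim();
                if (!strLine.isEmpty()) { // a blank line contributes no words
                    totalWords += strLine.split("\\s+").length;
                }
            }
        }
        System.out.println("There are " + totalWords + " words in your file.");
    }
}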
If your paragraph does not start with whitespace, then your if condition won't count the first word.
For "My name is John", the program outputs "4 words" only by coincidence: you miss the first word but add one at the end.
Try this:
// strLine is the line you just read from the file
strLine = strLine.trim(); // remove leading and trailing whitespace
String[] words = strLine.split(" ");
int numOfWords = words.length;
I personally prefer a regular Scanner with token-based scanning for this sort of thing. How about something like this:
int words = 0;
Scanner lineScan = new Scanner(new File("fileName.txt"));
while (lineScan.hasNextLine()) {
    Scanner tokenScan = new Scanner(lineScan.nextLine());
    while (tokenScan.hasNext()) {
        tokenScan.next();
        words++;
    }
}
This iterates through every line in the file. And for every line in the file, it iterates through every token (in this case words) and increments the word count.
I am not sure what you mean by "paragraph"; however, I tried it with capital letters as you suggested and it worked perfectly fine. I used the Apache Commons IO library.
package Project1;

import java.io.*;
import org.apache.commons.io.*;

public class HelloWorld {

    private static String fileStr = "";
    private static String[] tokens;

    public static void main(String[] args) {
        try {
            // Read the whole file into a single string, then split it into words
            File f = new File("c:\\TestFile\\test.txt");
            fileStr = FileUtils.readFileToString(f);
            tokens = fileStr.split(" ");
            System.out.println("Words in file : " + tokens.length);
        } catch (Exception ex) { // Catch exception if any
            System.out.println(ex);
        }
    }
}