Remove integers at the start of multiple lines - java

I have a text file with over a thousand lines of text, each of which has some integers at the start.
Like:
22Ahmedabad, AES Institute of Computer Studies
526Ahmedabad, Indian Institute of Managment
561Ahmedabad, Indus Institute of Technology & Engineering
745Ahmedabad, Lalbhai Dalpatbhai College of Engineering
I want to store all the lines in another file without the integers.
The code I've written is:
while (s.hasNextLine()){
String sentence=s.nextLine();
int l=sentence.length();
c++;
try{//printing P
FileOutputStream ffs = new FileOutputStream ("ps.txt",true);
PrintStream p = new PrintStream ( ffs );
for (int i=0;i<l;i++){
if ((int)sentence.charAt(i)<=48 && (int)sentence.charAt(i)>=57){
p.print(sentence.charAt(i));
}
}
p.close();
}
catch(Exception e){}
}
But it outputs a blank file.

There are a couple of things in your code that should be improved:
Don't re-open the output file with every line. Just keep it open the whole time.
You are removing all numbers, not just numbers at the beginning - is that your intention?
Do you know any number that is both <= 48 and >= 57 at the same time?
Scanner.nextLine() does not include line returns, so you'll need a call to p.println() after every line.
Try this:
// open the file once
FileOutputStream ffs = new FileOutputStream ("ps.txt");
PrintStream p = new PrintStream ( ffs );
while (s.hasNextLine()){
String sentence=s.nextLine();
int l=sentence.length();
c++;
try{//printing P
for (int i=0;i<l;i++){
// check "< 48 || > 57", which is non-numeric range
if ((int)sentence.charAt(i)<48 || (int)sentence.charAt(i)>57){
p.print(sentence.charAt(i));
}
}
// move to next line in output file
p.println();
}
catch(Exception e){}
}
p.close();

You can apply this regular expression to each line that you read from the file:
String str = ... // read the next line from the file
str = str.replaceAll("^[0-9]+", "");
The regular expression ^[0-9]+ matches one or more digits at the beginning of the line. The replaceAll method replaces the match with an empty string.
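As a rough sketch of how the regular expression could be applied to a whole file, the example below wraps it in a helper and copies a file line by line. The class name, the method names, and the file paths are assumptions made for illustration; they are not from the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StripLeadingDigits {
    // Removes one or more digits from the start of the line; digits elsewhere are kept.
    static String stripLeadingDigits(String line) {
        return line.replaceAll("^[0-9]+", "");
    }

    // Copies every line of 'in' to 'out' without its leading digits.
    // Both paths are hypothetical and supplied by the caller.
    static void copyWithoutLeadingDigits(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in);
             BufferedWriter writer = Files.newBufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripLeadingDigits(line));
                writer.newLine(); // readLine() drops the line terminator, so add it back
            }
        }
    }

    public static void main(String[] args) {
        // In-memory demonstration using two lines from the question.
        System.out.println(stripLeadingDigits("22Ahmedabad, AES Institute of Computer Studies"));
        System.out.println(stripLeadingDigits("745Ahmedabad, Lalbhai Dalpatbhai College of Engineering"));
    }
}
```

The output file is opened once and closed automatically by try-with-resources, which also addresses the point about not reopening the file for every line.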

On top of mellamokb's comments, you should avoid "magic numbers". There's no guarantee that the digits will fall within the expected range of ASCII codes.
You can simply detect whether a character is a digit using Character.isDigit:
String value = "22Ahmedabad, AES Institute of Computer Studies";
int index = 0;
// stop at the end of the string so an all-digit line cannot run past the last character
while (index < value.length() && Character.isDigit(value.charAt(index))) {
index++;
}
if (index < value.length()) {
System.out.println(value.substring(index));
} else {
System.out.println("Nothing but numbers here");
}
(N.B. dasblinkenlight has posted an excellent regular expression, which would probably be easier to use, but if you're like me, regexp turns your brain inside out :P)

Related

Looping input files with the file class

I am trying to figure out how to loop over file names. I have 10 files;
they are called Myfile01.txt, Myfile02.txt, Myfile03.txt, all the way to Myfile10.txt.
I did something like this for the first file name, in Java:
String bob = new String("C:\\bob\\Myfile01.txt");
File file = new File(bob);
Scanner myinput = new Scanner(file);
Each file contains around 200 lines of data, which I am storing in arrays, using .hasNext to target what data goes into which array. Each name is separated by a line.
for (int i=0;i<=200;i++)
{
rank[i] = input.next();
firstname[i] =input.next();
lastname[i] = input.next();
dadname[i] = input.next();
momname[i] = input.next();
}
Now, when I finish storing everything from one text file, I am looking for a way to go to the next text file with a loop to avoid clunkiness. I could hardcode it, but that would not be good style.
Thanks for any suggestions!
A loop and String.format should give you what you need:
for (int i = 1; i <= 10; i++) {
String bob = String.format("C:\\bob\\Myfile%02d.txt", Integer.valueOf(i));
// ...
}
The format pattern %02d left-pads an integer with zeros to a minimum width of two digits, so 1 becomes 01 while 10 stays 10, as defined in the syntax for string formatting.
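For illustration, here is a small self-contained sketch that builds all ten names with String.format. The helper name fileNames and its parameters are hypothetical; only the Myfile01.txt to Myfile10.txt naming and the C:\bob\ directory come from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class FileNameLoop {
    // Builds Myfile01.txt ... MyfileNN.txt, each prefixed with the given directory.
    static List<String> fileNames(String dir, int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            // %02d zero-pads the counter to two digits
            names.add(String.format("%sMyfile%02d.txt", dir, i));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : fileNames("C:\\bob\\", 10)) {
            // Each name could be passed to new Scanner(new File(name)) here.
            System.out.println(name);
        }
    }
}
```

Collecting the names in a list first keeps the naming logic separate from the code that reads each file.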
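If you want to walk through subdirectories, you may also try:
// Files.walk keeps directories open, so close the stream with try-with-resources
try (Stream<Path> paths = Files.walk(Paths.get(directory))) {
// the pattern must match the actual file names, e.g. Myfile01.txt
paths.filter(f -> Pattern.matches("Myfile\\d{2}\\.txt", f.toFile().getName())).forEach(f -> {
System.out.println("WHAT YOU WANT TO DO WITH f");
});
} catch (IOException e) {
e.printStackTrace();
}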

Parsing txt file

I have to write a program that will parse baseball player info and hits, outs, walks, etc. from a txt file. For example, the txt file may look something like this:
Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s
Jill Jenks,o,o,s,h,h,o,o
Will Jones,o,o,w,h,o,o,o,o,w,o,o
I know how to parse the file and can get that code running perfectly. The only problem I am having is that we should only be printing the name of each player and 3 of their plays. For example:
Sam Slugger hit,hit,out
Jill Jenks out, out, sacrifice fly
Will Jones out, out, walk
I am not sure how to limit this. Every time I try to cut it off at 3, the first person works fine, but then the loop breaks and nothing is printed for the other players.
This is what I have so far:
import java.util.Scanner;
import java.io.*;
public class ReadBaseBall{
public static void main(String args[]) throws IOException{
int count=0;
String playerData;
Scanner fileScan, urlScan;
String fileName = "C:\\Users\\Crust\\Documents\\java\\TeamStats.txt";
fileScan = new Scanner(new File(fileName));
while(fileScan.hasNext()){
playerData = fileScan.nextLine();
fileScan.useDelimiter(",");
//System.out.println("Name: " + playerData);
urlScan = new Scanner(playerData);
urlScan.useDelimiter(",");
for(urlScan.hasNext(); count<4; count++)
System.out.print(" " + urlScan.next() + ",");
System.out.println();
}
}
}
This prints out:
Sam Slugger, h, h, o,
but then the other players are voided out. I need help to get the other ones printing as well.
Here, try this one using FileReader.
Assuming your file content is formatted like this:
Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s
Jill Johns,h,h,o,s,w,w,h,w,o,o,o,h,s
with each player on his or her own line, then this can work for you:
BufferedReader reader;
try {
reader = new BufferedReader(new FileReader(new File("file.txt")));
String line = "";
while ((line = reader.readLine()) != null) {
String[] values_per_line = line.split(",");
System.out.println("Name:" + values_per_line[0] + " "
+ values_per_line[1] + " " + values_per_line[2] + " "
+ values_per_line[3]);
// the while condition already reads the next line; a second readLine() here would skip every other player
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
Otherwise, if the players are all on one line (which would not make much sense), modify the sample like this:
Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s| John Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s
BufferedReader reader;
try {
reader = new BufferedReader(new FileReader(new File("file.txt")));
String line = "";
while ((line = reader.readLine()) != null) {
// players are separated by a pipe; "|" is a regex metacharacter, so it must be escaped
String[] data = line.trim().split("\\|");
for (int i = 0; i < data.length; i++) {
// split the current player's entry into name and plays
String[] values = data[i].trim().split(",");
System.out.println("Name:" + values[0] + " "
+ values[1] + " " + values[2] + " "
+ values[3]);
}
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
You need to reset your count variable inside the while loop:
while(fileScan.hasNext()){
count = 0;
...
}
First Problem
Change while(fileScan.hasNext()) to while(fileScan.hasNextLine()). Not a breaking problem, but when using a Scanner you usually pair each sc.next* call with the matching sc.hasNext* check.
Second Problem
Remove the line fileScan.useDelimiter(","). This line has no visible effect in this case, but it replaces the default delimiter, so the scanner no longer splits on whitespace. That doesn't matter when using Scanner.nextLine, but it can have some nasty side effects later on.
Third Problem
Change this line for(urlScan.hasNext(); count<4; count++) to while(urlScan.hasNext()). The original line does compile, but the hasNext() call in the initializer runs only once and its result is ignored, and because count is never reset the loop only reads the first 4 tokens of the first line.
If you want to limit the amount processed for each line you can replace it with
for( int count = 0; count < limit && urlScan.hasNext( ); count++ )
This will limit the amount read to limit while still handling lines that have less data than the limit.
Make sure that each of your data sets is separated by a line otherwise the output might not make much sense.
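Putting the three fixes together, a minimal sketch might look like the following. It uses String.split instead of a second Scanner, which is a swapped-in technique rather than the asker's approach; the class name, the summarize helper, and the in-memory sample data are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Scanner;

public class FirstThreePlays {
    // Returns the player's name followed by at most 'limit' plays.
    static String summarize(String line, int limit) {
        String[] fields = line.split(",");
        // fields[0] is the name, so the plays occupy indexes 1 .. end-1
        int end = Math.min(fields.length, limit + 1);
        return fields[0] + " " + String.join(",", Arrays.asList(fields).subList(1, end));
    }

    public static void main(String[] args) {
        // Sample data from the question, held in memory instead of a file.
        String data = "Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s\n"
                + "Jill Jenks,o,o,s,h,h,o,o\n"
                + "Will Jones,o,o,w,h,o,o,o,o,w,o,o\n";
        try (Scanner fileScan = new Scanner(data)) {
            while (fileScan.hasNextLine()) {           // hasNextLine pairs with nextLine
                System.out.println(summarize(fileScan.nextLine(), 3));
            }
        }
    }
}
```

Because the limit is applied per line inside summarize, there is no shared counter to reset, and lines with fewer than three plays are handled without an exception.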
You shouldn't have multiple scanners on this - assuming the format you posted in your question you can use regular expressions to do this.
This demonstrates a regular expression to match a player and to use as a delimiter for the scanner. I fed the scanner in my example a string, but the technique is the same regardless of source.
int count = 0;
Pattern playerPattern = Pattern.compile("\\w+\\s\\w+(?:,\\w){1,3}");
Scanner fileScan = new Scanner("Sam Slugger,h,h,o,s,w,w,h,w,o,o,o,h,s Jill Jenks,o,o,s,h,h,o,o Will Jones,o,o,w,h,o,o,o,o,w,o,o");
fileScan.useDelimiter("(?<=,\\w)\\s");
while (fileScan.hasNext()){
String player = fileScan.next();
Matcher m = playerPattern.matcher(player);
if (m.find()) {
player = m.group(0);
} else {
throw new InputMismatchException("Players data not in expected format on string: " + player);
}
System.out.println(player);
count++;
}
System.out.printf("%d players found.", count);
Output:
Sam Slugger,h,h,o
Jill Jenks,o,o,s
Will Jones,o,o,w
The call to Scanner.useDelimiter() sets the delimiter used for retrieving tokens. The regex (?<=,\\w)\\s:
(?<= // positive lookbehind
,\w // literal comma, word character
)
\s // whitespace character
This delimits the players by the space between their entries without consuming anything but that space, and it does not match the space between the first and last names.
The regular expression used to extract up to 3 plays per player is \\w+\\s\\w+(?:,\\w){1,3}:
\w+ // matches one or more word characters (first name)
\s // whitespace character
\w+ // matches one or more word characters (last name)
(?: // begin non-capturing group
,\w // literal comma, word character
){1,3} // match non-capturing group 1 - 3 times

Eliminating unnecessary space on a String array using Java?

I have a String array in Java (Using the NetBeans IDE) containing a small text. When inserting the lines of text into the array I ended up with too much unnecessary space and characters which I would like to get rid of. Here is an example:
1 experimental investigation of the aerodynamics of a wing in a slipstream . an experimental study of a wing [...] flow theory . an empirical [...] .
2 small viscosity . in the study of high-speed [...] vorticity . the discussion here is restricted to two-dimensional incompressible steady flow .
As you can see, in some cases I end up with 3 spaces between a period and the next word. How do I get rid of the extra space and characters such as periods, commas, etc?
Edit: Here is the process.
-Inserting the text on x position within the String array:
try{
coleccion = new File (File location);
fr = new FileReader (coleccion);
br = new BufferedReader(fr);
String numDoc = " ";
int pos = 0;
while((numDoc=br.readLine())!=null){
if(numDoc.contains(".W")){
while((numDoc=br.readLine())!= null && !numDoc.contains(".I")){
if(Texto[pos] != null) {
Texto[pos] = Texto[pos] + " " + numDoc;
}
else {
Texto[pos] = numDoc;
}
}
pos++;
}
}
}
catch(Exception e){
e.printStackTrace();
}
-Printing the array (1400 positions):
for(int i=0; i<=1399; i++){
//System.out.println(ID[i]);
System.out.println(i+1 + " " + Texto[i]);
}
-Extra info on the initial problem:
.txt file to arrays using Java
So I guess that by unnecessary space you mean consecutive spaces. If that's the case, then this is what you want: https://stackoverflow.com/a/2932439/4088809
replaceAll("^ +| +$|( )+", "$1")
Just apply it to each line you get from readLine; that should do the trick.
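A short sketch of that replacement in use follows. The collapseSpaces helper wraps the expression from the linked answer; stripPunctuation is a hypothetical addition that covers the periods and commas mentioned in the question, and treating only '.' and ',' as punctuation is an assumption.

```java
public class WhitespaceCleanup {
    // Trims leading and trailing spaces and collapses each run of spaces to one.
    static String collapseSpaces(String text) {
        return text.replaceAll("^ +| +$|( )+", "$1");
    }

    // Hypothetical helper: turns periods and commas into spaces, then collapses.
    static String stripPunctuation(String text) {
        return collapseSpaces(text.replaceAll("[.,]", " "));
    }

    public static void main(String[] args) {
        String sample = "  a wing in a slipstream .   an experimental study  ";
        System.out.println("[" + collapseSpaces(sample) + "]");
        System.out.println("[" + stripPunctuation(sample) + "]");
    }
}
```

Replacing punctuation with a space before collapsing avoids gluing two words together when a period sits directly between them.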

Loading a file not working

Why doesn't String m get the string that is contained in the saved file?
It is the only thing in the saved file. I need it because it is a date string which I would then split into three integers: day, month and year.
public void load(){
try{
Scanner fileReader = new Scanner(new File("SimpleDateSave"));
String m = fileReader.next();
fileReader.close();
}
catch(FileNotFoundException error){
System.out.println("File not found");
}
}
Use fileReader.nextLine() instead of fileReader.next(). next() only pulls in one word at a time.
You are putting only the first token from your file into m.
According to your code:
String m = fileReader.next();
If the line in your file contains whitespace, for example 13 02 1988,
it takes only the first number, so m = 13 and the rest is omitted.
You have a few ways to solve this:
use a while() loop with the condition:
while (fileReader.hasNext()) {
// do something with fileReader.next()
}
or use nextLine() instead of next().
It returns everything up to the end of the line (which line terminator is used probably depends on your OS). You can read more about it on the Wiki page.
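To make the nextLine() suggestion concrete, here is a minimal sketch that reads the first line and splits it into day, month and year. The "13 02 1988" layout, the day-month-year order, and the accepted separators are assumptions, since the question does not show the file's contents; the class and method names are hypothetical too.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class DateLoader {
    // Reads the whole first line of the file, or "" if the file is empty.
    static String load(File file) throws FileNotFoundException {
        try (Scanner fileReader = new Scanner(file)) {
            return fileReader.hasNextLine() ? fileReader.nextLine() : "";
        }
    }

    // Splits on whitespace, '/', '.' or '-' and parses three integers.
    // Assumed order: day, month, year.
    static int[] parseDate(String text) {
        String[] parts = text.trim().split("[\\s/.-]+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected three fields but got: " + text);
        }
        return new int[] {
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2])
        };
    }

    public static void main(String[] args) {
        // With a real file this would be parseDate(load(new File("SimpleDateSave"))).
        int[] date = parseDate("13 02 1988");
        System.out.println("day=" + date[0] + " month=" + date[1] + " year=" + date[2]);
    }
}
```

Returning the string from load also avoids the scoping issue in the original method, where m is declared inside the try block and is not visible to any other code.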

How to take first word of new paragraph into consideration?

I'm trying to build a program that takes in files and outputs the number of words in the file. It works perfectly when everything is under one whole paragraph. However, when there are multiple paragraphs, it doesn't take into account the first word of the new paragraph. For example, if a file reads "My name is John", the program will output "4 words". However, if a file reads "My Name Is John" with each word being a new paragraph, the program will output "1 word". I know it must be something about my if statement, but I assumed that there are spaces before the new paragraph that would take the first word in a new paragraph into account.
Here is my code in general:
import java.io.*;
public class HelloWorld
{
public static void main(String[]args)
{
try{
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("health.txt");
// Use DataInputStream to read binary NOT text.
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
int word2 =0;
int word3 =0;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
;
int wordLength = strLine.length();
System.out.println(strLine);
for(int i = 0 ; i < wordLength -1 ; i++)
{
Character a = strLine.charAt(i);
Character b= strLine.charAt(i + 1);
if(a == ' ' && b != '.' &&b != '?' && b != '!' && b != ' ' )
{
word2++;
//doesnt take into account 1st character of new paragraph
}
}
word3 = word2 + 1;
}
System.out.println("There are " + word3 + " "
+ "words in your file.");
//Close the input stream
br.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
I've tried adjusting the if statement multiple times, but it does not seem to make a difference. Does anyone know where I'm messing up?
I'm a pretty new user and asked a similar question a couple of days back, with people accusing me of demanding too much of users, so hopefully this narrows my question a bit. I am just really confused about why it's not taking into account the first word of a new paragraph. Please let me know if you need any more information. Thanks!!
Firstly, your counting logic is incorrect. Consider:
word3 = word2 + 1;
Think about what this does. Every time through your loop, when you read a line, you essentially count the words in that line, then reset the total count to word2 + 1. Hint: If you want to count the total number in the file, you'd want to increment word3 each time, rather than replace it with the current line's word count.
Secondly, your word parsing logic is slightly off. Consider the case of a blank line. You would see no words in it, but you treat the word count in the line as word2 + 1, which means you are incorrectly counting a blank line as 1 word. Hint: If the very first character on the line is a letter, then the line starts with a word.
Your approach is reasonable although your implementation is slightly flawed. As an alternate option, you may want to consider String.split() on each line. The number of elements in the resulting array is the number of words on the line.
By the way, you can increase readability of your code, and make debugging easier, if you use meaningful names for your variables (e.g. totalWords instead of word3).
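The hints above can be sketched as follows: count the words of each line with String.split, treat a blank line as zero words, and add each line's count to a running total. The class and method names are made up for illustration, and the lines are held in memory rather than read from health.txt.

```java
import java.util.Arrays;
import java.util.List;

public class WordCounter {
    // Counts whitespace-separated words on one line; a blank line has none.
    static int countWordsInLine(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Accumulates the per-line counts instead of overwriting the total.
    static int countWordsInLines(List<String> lines) {
        int totalWords = 0;
        for (String line : lines) {
            totalWords += countWordsInLine(line);
        }
        return totalWords;
    }

    public static void main(String[] args) {
        // Each word on its own line, with a blank line in between.
        List<String> lines = Arrays.asList("My", "Name", "", "Is", "John");
        System.out.println("There are " + countWordsInLines(lines) + " words in your file.");
    }
}
```

With a BufferedReader, the same countWordsInLine call would go inside the existing while loop, adding its result to the total for every line read.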
If your paragraph does not start with whitespace, then your if condition won't count its first word.
For "My name is John" the program outputs "4 words", but only by accident: you miss the first word and then add one afterwards.
Try this:
// strLine is the line that was just read from the file
strLine = strLine.trim(); // remove leading and trailing whitespace
String[] words = strLine.split("\\s+"); // split on runs of whitespace
int numOfWords = strLine.isEmpty() ? 0 : words.length; // an empty line has no words
I personally prefer a regular Scanner with token-based scanning for this sort of thing. How about something like this:
int words = 0;
Scanner lineScan = new Scanner(new File("fileName.txt"));
while (lineScan.hasNextLine()) {
Scanner tokenScan = new Scanner(lineScan.nextLine());
while (tokenScan.hasNext()) {
tokenScan.next();
words++;
}
}
This iterates through every line in the file. And for every line in the file, it iterates through every token (in this case words) and increments the word count.
I am not sure what you mean by "paragraph"; however, I tried to use capital letters as you suggested and it worked perfectly fine. I used the Apache Commons IO library:
package Project1;
import java.io.*;
import org.apache.commons.io.*;
public class HelloWorld
{
private static String fileStr = "";
private static String[] tokens;
public static void main(String[]args)
{
try{
// Open the file that is the first
// command line parameter
try {
File f = new File("c:\\TestFile\\test.txt");
fileStr = FileUtils.readFileToString(f);
// split on any run of whitespace so that words separated by line breaks are counted too
tokens = fileStr.trim().split("\\s+");
System.out.println("Words in file : " + tokens.length);
}
catch(Exception ex){
System.out.println(ex);
}
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
