Can I do this - token=str.split(" "||","); - java

import java.io.*;
import java.text.DecimalFormat;
import java.text.NumberFormat;
public class TrimTest{
public static void main(String args[]) throws IOException{
String[] token = new String[0];
String opcode;
String strLine="";
String str="";
try{
// Open and read the file
FileInputStream fstream = new FileInputStream("a.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
//Read file line by line and storing data in the form of tokens
if((strLine = br.readLine()) != null){
token = strLine.split(" ");// split w.r.t spaces
token = strLine.split(" "||",") // split if there is a space or comma encountered
}
br.close();//Close the input stream
}
catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
int i;
int n = token.length;
for(i=0;i<n;i++){
System.out.println(token[i]);
}
}
}
If the input is MOVE R1,R2,R3,
I want to split it on spaces or commas and save the result into an array token[].
I want the output to be:
MOVE
R1
R2
R3
Thanks in Advance.

Try token = strLine.split(" |,").
split takes a regex as its argument, and "or" in a regex is |. You can also use a character class such as [\\s,], which is equivalent to \\s|, and means "any whitespace character (space, tab, newline) or a comma".

You want
token = strLine.split("[ ,]"); // split if there is a space or comma encountered
Square brackets denote a character class. This class contains a space and a comma and the regex will match on any character of the character class.

Change it to strLine.split(" |,"), or maybe even strLine.split("\\s+|,").
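To make the character-class approach concrete, here is a minimal sketch using the input line from the question. The class name SplitDemo and the method name tokens are made up for illustration, and the + quantifier is an addition (not in the answers above) so that a comma followed by a space counts as one delimiter.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Splits on runs of whitespace and/or commas.
    // The trailing + treats ", " as a single delimiter instead of producing an empty token.
    static List<String> tokens(String line) {
        return Arrays.asList(line.trim().split("[\\s,]+"));
    }

    public static void main(String[] args) {
        // Prints MOVE, R1, R2 and R3 on separate lines
        for (String token : tokens("MOVE R1,R2,R3")) {
            System.out.println(token);
        }
    }
}
```

If the line can start with a comma, split will still return an empty first token, so such input would need an extra check.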

Related

Regex for replacing Exact String match [duplicate]

My input:
1. end
2. end of the day or end of the week
3. endline
4. something
5. "something" end
Based on the above discussions, if I try to replace a single string using this snippet, it successfully removes the matching words from each line:
public class DeleteTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
String delete="end";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+delete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
}
My output if I use the above snippet (which is also my expected output):
1.
2. of the day or of the week
3. endline
4. something
5. "something"
But when I want to delete more words and use a Set for that purpose, I use the code snippet below:
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
Set<String> toDelete = new HashSet<>();
toDelete.add("end");
toDelete.add("something");
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+toDelete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
I get this output (it just removes the spaces):
1. end
2. endofthedayorendoftheweek
3. endline
4. something
5. "something" end
Can you help me with this?
You need to create an alternation group out of the set with
String.join("|", toDelete)
and use as
line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");
The pattern will look like
\b(?:end|something)\b
Here, (?:...) is a non-capturing group: it groups several alternatives without storing the matched text, which you do not need since you only remove the matches.
Or, better, compile the regex before entering the loop:
Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
line = pat.matcher(line).replaceAll("");
UPDATE:
To match whole "words" that may contain special characters, apply Pattern.quote to each word to escape those characters, and use unambiguous word boundaries: the negative lookbehind (?<!\w) instead of the initial \b, to make sure there is no word character before the match, and the negative lookahead (?!\w) instead of the final \b, to make sure there is no word character after it.
In Java 8, you may use this code:
Set<String> nToDel = new HashSet<>();
nToDel = toDelete.stream()
.map(Pattern::quote)
.collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";
For words such as +end and something-, the regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Everything between \Q and \E is treated as literal text.
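Putting the pieces of this answer together, a minimal sketch might look like the following. The class name WordRemover and the method name buildPattern are hypothetical, and the sample lines are taken from the question; Collectors.joining is used here in place of the intermediate set, which is a small variation on the code above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordRemover {
    // Builds one pattern that matches any of the given words as a whole "word".
    static Pattern buildPattern(Set<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)               // escape regex metacharacters such as +
                .collect(Collectors.joining("|"));
        // (?<!\w) and (?!\w) require that no word character touches the match
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>();
        toDelete.add("end");
        toDelete.add("something");
        Pattern pat = buildPattern(toDelete);      // compiled once, reused for every line
        String[] lines = {"1. end", "2. end of the day or end of the week", "3. endline"};
        for (String line : lines) {
            System.out.println(pat.matcher(line).replaceAll(""));
        }
    }
}
```

Note that removing a word leaves the spaces around it in place, so a line may end up with doubled or trailing spaces that you might want to collapse afterwards.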
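The problem is that you're not creating the correct regex for replacing the words in the set.
"\\b"+toDelete+"\\b" calls toDelete.toString() and produces the string \b[end, something]\b. The square brackets make this a character class that matches a single character (including the space), which is why only spaces between words are removed.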
To fix that you can do something like this:
for(String del : toDelete){
line = line.replaceAll("\\b"+del+"\\b", "");
}
What this does is to go through the set, produce a regex from each word and remove that word from the line String.
Another approach will be to produce a single regex from all the words in the set.
Eg:
String regex = "";
for(String word : toDelete){
regex+=(regex.isEmpty() ? "" : "|") + "(\\b"+word+"\\b)";
}
....
// use replaceAll here: replace() would treat the regex as literal text
line = line.replaceAll(regex, "");
This should produce a regex that looks something like this: (\bend\b)|(\bsomething\b)

Formatting a text file java

I am trying to format a text file. I want to delete all the newline characters except the ones that start a new paragraph. By that I mean: if a line in the text file is blank (whitespace only) I want to keep it, but all the other newlines need to be deleted.
Here is what I have so far:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;
public class Formatting {
public static void main(String[] args) throws FileNotFoundException {
Scanner in = new Scanner(System.in);
System.out.println("give file name: ");
String filename = in.next();
File inputfile = new File(filename);
Scanner reader = new Scanner(inputfile);
String newline = System.getProperty("line.separator");
PrintWriter out = new PrintWriter("NEW " + filename);
while(reader.hasNextLine()) {
String line = reader.nextLine();
if (line.length() > 2 && line.contains(newline)) {
String replaced = line.substring(0,line.length()) + ' ';
out.print(replaced);
}
else {
out.print(line + ' ');
}
}
in.close();
out.close();
}
}
However, my first if statement never gets executed. Every newline just gets deleted.
Can anybody help me here? It would be very much appreciated.
This may help you; read the comments to see what each line does.
// 1. compress multiple newlines into a single newline
line = line.replaceAll("[\\n]+", "\n");
// 2. compress runs of non-newline whitespace into a single space
line = line.replaceAll("[\\s&&[^\\n]]+", " ");
// 3. remove a space from the beginning or end of each line
line = line.replaceAll("(?m)^\\s|\\s$", "");
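One detail worth noting about the question's loop: Scanner.nextLine() returns each line without its line terminator, so line.contains(newline) can never be true. A minimal sketch of one way to keep paragraph breaks, assuming a blank (whitespace-only) line separates paragraphs, is shown below. The class name ParagraphJoiner and the method joinParagraphs are hypothetical and work on lines that have already been read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParagraphJoiner {
    // Joins consecutive non-blank lines with a single space.
    // A blank line ends the current paragraph.
    static List<String> joinParagraphs(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line.trim());
            }
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("first line", "second line", "", "new paragraph", "continues");
        // Write each paragraph on one line, separated by a blank line
        System.out.println(String.join(System.lineSeparator() + System.lineSeparator(), joinParagraphs(lines)));
    }
}
```

In the question's program, the lines could be collected from reader.nextLine() in the existing while loop and the result written with the existing PrintWriter.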

How to split a file into several tokens

I am trying to tokenize an input file, splitting its sentences into tokens (words).
For example,
"This is a test file." should become five words, "this" "is" "a" "test" "file", omitting punctuation and whitespace, and these should be stored in an ArrayList.
I tried to write some code like this:
public static ArrayList<String> tokenizeFile(File in) throws IOException {
String strLine;
String[] tokens;
//create a new ArrayList to store tokens
ArrayList<String> tokenList = new ArrayList<String>();
if (null == in) {
return tokenList;
} else {
FileInputStream fStream = new FileInputStream(in);
DataInputStream dataIn = new DataInputStream(fStream);
BufferedReader br = new BufferedReader(new InputStreamReader(dataIn));
while (null != (strLine = br.readLine())) {
if (strLine.trim().length() != 0) {
//make sure strings are independent of capitalization and then tokenize them
strLine = strLine.toLowerCase();
//create regular expression pattern to split
//first letter to be alphabetic and the remaining characters to be alphanumeric or '
String pattern = "^[A-Za-z][A-Za-z0-9'-]*$";
tokens = strLine.split(pattern);
int tokenLen = tokens.length;
for (int i = 1; i <= tokenLen; i++) {
tokenList.add(tokens[i - 1]);
}
}
}
br.close();
dataIn.close();
}
return tokenList;
}
This code works fine, except that instead of splitting the whole file into words (tokens), it turns each whole line into a single token. "area area" becomes one token instead of "area" appearing twice. I don't see the error in my code. I suspect something is wrong with my trim().
Any advice is appreciated. Thank you so much.
Maybe I should use Scanner instead? I'm confused.
I think Scanner is more appropriate for this task. As for this code, you should fix the regex; try "\\s+".
Try String pattern = "[^\\w]"; as the pattern in the same code.
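The reason the original pattern fails is that it is anchored with ^ and $, so it describes a whole token rather than a delimiter: split only finds a match when the entire line is one token, and otherwise returns the line unchanged. A minimal sketch of splitting on "anything that cannot be part of a token" follows. The class name Tokenizer and the method tokenizeLine are hypothetical, and the delimiter class is an assumption derived from the characters the question allows in a token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class Tokenizer {
    // Lowercases the line and splits it on runs of characters
    // that are not letters, digits, apostrophes or hyphens.
    static List<String> tokenizeLine(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z0-9'-]+")) {
            if (!token.isEmpty()) {   // split yields a leading "" when the line starts with a delimiter
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenizeLine("This is a test file."));
        System.out.println(tokenizeLine("area area"));
    }
}
```

Unlike the original pattern, this does not enforce that a token starts with a letter; if that rule matters, each token could be checked against the original regex after splitting.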

UVa #494 - regex [^a-zA-z]+ to split words using Java

I was playing with UVa #494 and I managed to solve it with the code below:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
class Main {
public static void main(String[] args) throws IOException{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String line;
while((line = in.readLine()) != null){
String words[] = line.split("[^a-zA-z]+");
int cnt = words.length;
// for some reason it is counting two words for 234234ddfdfd and words[0] is empty
if(cnt != 0 && words[0].isEmpty()) cnt--; // ugly fix, if has words and the first is empty, reduce one word
System.out.println(cnt);
}
System.exit(0);
}
}
I built the regex "[^a-zA-z]+" to split the words, so for example the strings abc..abc or abc432abc should be split as ["abc", "abc"]. However, when I try the string 432abc, I get ["", "abc"] as a result: the first element of words[] is just an empty string, but I was expecting to have just ["abc"]. I can't figure out why this regex gives me "" as the first element in this case.
Check the reference page for split:
Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.
Your input 432abc starts with a delimiter match (432), so the first array element is the empty string. Note that Java's String.split drops trailing empty strings but keeps a leading one produced by a non-empty match.
This prints each word followed by the number of words:
public static void main(String[] args) throws IOException {
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
// Compile once; a word is a run of ASCII letters (A-Z, not A-z, which also matches [ \ ] ^ _ `)
Pattern pattern = Pattern.compile("[a-zA-Z]+");
String line;
while ((line = in.readLine()) != null) {
Matcher matcher = pattern.matcher(line);
int count = 0;
while (matcher.find()) {
count++;
System.out.println(matcher.group());
}
System.out.println(count);
}
}
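To see both behaviours side by side, here is a minimal sketch that contrasts split with matching the words directly. The class name WordCount and the method countWords are hypothetical; the regex uses A-Z rather than the question's A-z.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCount {
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    // Counts runs of letters instead of splitting on the gaps between them,
    // so a leading delimiter cannot produce an empty element.
    static int countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] parts = "432abc".split("[^a-zA-Z]+");
        System.out.println(parts.length);          // 2: a leading "" and "abc"
        System.out.println(countWords("432abc"));  // 1
    }
}
```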

Read multiline text with values separated by whitespaces

I have the following test file:
Jon Smith 1980-01-01
Matt Walker 1990-05-12
What is the best way to parse each line of this file and create an object with (name, surname, birthdate)? Of course this is just a sample; the real file has many records.
import java.io.*;
class Record
{
String first;
String last;
String date;
public Record(String first, String last, String date){
this.first = first;
this.last = last;
this.date = date;
}
public static void main(String args[]){
try{
FileInputStream fstream = new FileInputStream("textfile.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
while ((strLine = br.readLine()) != null) {
String[] tokens = strLine.split(" ");
Record record = new Record(tokens[0],tokens[1],tokens[2]);//process record , etc
}
in.close();
} catch (Exception e){
System.err.println("Error: " + e.getMessage());
}
}
}
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ScannerReadFile {
public static void main(String[] args) {
//
// Create an instance of File for the testfile.txt file.
//
File file = new File("testfile.txt");
try {
//
// Create a new Scanner object which will read the data from the
// file passed in. To check whether there are more lines to read,
// we call the scanner.hasNextLine() method. We then
// read line by line until all lines are read.
//
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
System.out.println(line);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
This:
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
could also be changed to
while (scanner.hasNext()) {
String line = scanner.next();
which reads one whitespace-delimited token at a time instead of a whole line.
You could also set a custom delimiter:
Scanner scanner = new Scanner(file).useDelimiter(",");
At the time of this post you now have three different ways to do this. Here you just need to parse the data you need: you could read a line and then split it, or read tokens one by one, in which case every third token starts a new line, i.e. a new person.
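A minimal sketch of the token-by-token variant, assuming every record has exactly three whitespace-separated fields as in the sample file. The class name ScannerRecords and the method readRecords are hypothetical, and the Scanner reads from a String here only to keep the example self-contained; new Scanner(file) works the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerRecords {
    // Reads whitespace-delimited tokens three at a time:
    // first name, surname, birth date.
    static List<String[]> readRecords(String text) {
        List<String[]> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                String first = scanner.next();
                String last = scanner.next();   // throws NoSuchElementException on a truncated record
                String date = scanner.next();
                records.add(new String[] {first, last, date});
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "Jon Smith 1980-01-01\nMatt Walker 1990-05-12";
        for (String[] record : readRecords(text)) {
            System.out.println(String.join(" | ", record));
        }
    }
}
```

This approach ignores line boundaries entirely, so a single record with a missing or extra field shifts every record after it.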
At first glance I would suggest that StringTokenizer is your friend here. However, having done this for real in business applications, what you probably cannot guarantee is that the surname is a single word (someone with a double-barrelled, unhyphenated surname would cause you problems).
If you can guarantee the integrity of the data, then your code would be
BufferedReader read = new BufferedReader(new FileReader("yourfile.txt"));
String line = null;
while( (line = read.readLine()) != null) {
StringTokenizer tokens = new StringTokenizer(line);
String firstname = tokens.nextToken();
...etc etc
}
If you cannot guarantee the integrity of your data, then you would need to find the first space and take all characters before it as the first name, find the last space and take all characters after it as the DOB, and treat everything in between as the surname.
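The first-space/last-space approach could be sketched as follows. The class name PersonLineParser and the method parse are hypothetical, and the sketch assumes that the first name is always a single word and the date contains no spaces.

```java
class PersonLineParser {
    // Returns {firstName, surname, birthDate}; the surname may contain spaces.
    static String[] parse(String line) {
        String trimmed = line.trim();
        int firstSpace = trimmed.indexOf(' ');
        int lastSpace = trimmed.lastIndexOf(' ');
        if (firstSpace < 0 || firstSpace == lastSpace) {
            throw new IllegalArgumentException("Expected at least three fields: " + line);
        }
        String firstName = trimmed.substring(0, firstSpace);
        String surname = trimmed.substring(firstSpace + 1, lastSpace).trim();
        String birthDate = trimmed.substring(lastSpace + 1);
        return new String[] {firstName, surname, birthDate};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parse("Jon Smith 1980-01-01")));
        System.out.println(String.join(" | ", parse("Anna van der Berg 1990-05-12")));
    }
}
```

A middle name would end up in the surname field with this scheme, so it only helps when the data format rules that out.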
Use a FileReader to read characters from a file, and a BufferedReader to buffer those characters so you can read them as lines. Then you have a choice. Personally I'd use String.split() to split on whitespace, giving you a nice String array; you could also tokenize the string.
Of course you'd have to think about what would happen if someone has a middle name and such.
Look at the BufferedReader class. It has a readLine method. Then you may want to split each line on spaces to get each individual field.
