Java - String splitting - java

I read a text file with data in the following format: Name Address Hobbies
Example: (Bob Smith ABC Street Swimming)
and assigned it to a String z.
Then I used z.split with " " (a space) as the delimiter to separate the fields, but it split Bob Smith into two different strings when it should be one field; the same happened with the address. Is there a method I can use to get the format I want?
P.S. Apologies if I explained it vaguely; English isn't my first language.
String z;
try {
BufferedReader br = new BufferedReader(new FileReader("desc.txt"));
z = br.readLine();
} catch(IOException io) {
io.printStackTrace();
}
String[] temp = z.split(" ");

If the format of name and address parts is fixed to consist of two parts, you could just join them:
String z = ""; // z must be initialized
// use try-with-resources to ensure the reader is closed properly
try (BufferedReader br = new BufferedReader(new FileReader("desc.txt"))) {
z = br.readLine();
} catch(IOException io) {
io.printStackTrace();
}
String[] temp = z.split(" ");
String name = String.join(" ", temp[0], temp[1]);
String address = String.join(" ", temp[2], temp[3]);
String hobby = temp[4];
Another option could be to create a format string as a regular expression and use it to parse the input line using named groups (?<group_name>capturing text):
// use named groups to define parts of the line
Pattern format = Pattern.compile("(?<name>\\w+\\s\\w+)\\s(?<address>\\w+\\s\\w+)\\s(?<hobby>\\w+)");
Matcher match = format.matcher(z);
if (match.matches()) {
String name = match.group("name");
String address = match.group("address");
String hobby = match.group("hobby");
System.out.printf("Input line matched: name=%s address=%s hobby=%s%n", name, address, hobby);
} else {
System.out.println("Input line not matching: " + z);
}

I can think of three solutions.
In order from best to worst:
Different delimiter
Enforce the format to always have two names, two address parts and one hobby
Have a dictionary with names and hobbies, check each word to determine which type it is and then group them together as needed.
(The 3rd option is not meant as a serious alternative.)
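As a rough sketch of the first option, assuming the file can be rewritten so that a semicolon separates the fields (the semicolon and the class name DelimitedLine are assumptions made for illustration, not part of the question):

```java
import java.util.Arrays;

class DelimitedLine {
    // Splits one line into its fields; assumes ';' never occurs inside a field.
    static String[] parse(String line) {
        // Optional whitespace around ';' is swallowed by the delimiter.
        // The -1 limit keeps trailing empty fields instead of dropping them.
        return line.split("\\s*;\\s*", -1);
    }

    public static void main(String[] args) {
        String[] fields = parse("Bob Smith;ABC Street;Swimming");
        System.out.println(Arrays.toString(fields));
    }
}
```

With this input the program prints [Bob Smith, ABC Street, Swimming], so spaces inside a field no longer matter.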

As others have mentioned, using spaces as both field delimiter and inside fields is problematic. You could use a regex pattern to split the line (paste (\w+ \w+) (\w+ \w+) (.+) in Regex101 for an explanation):
Pattern pattern = Pattern.compile("(\\w+ \\w+) (\\w+ \\w+) (.+)");
Matcher matcher = pattern.matcher("Bob Smith ABC Street Bowling Fishing Rollerblading");
System.out.println("matcher.matches() = " + matcher.matches());
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println("matcher.group(" + i + ") = " + matcher.group(i));
}
This would give the following output:
matcher.matches() = true
matcher.group(0) = Bob Smith ABC Street Bowling Fishing Rollerblading
matcher.group(1) = Bob Smith
matcher.group(2) = ABC Street
matcher.group(3) = Bowling Fishing Rollerblading
However this only works for this exact format. If you get a line with three name parts for example:
John B Smith ABC Street Swimming
This will get split into John B as the name, Smith ABC as the address and Street Swimming as hobbies.
So either make 100% sure your input will always match this format or use a different delimiter.

The split() method mainly works with two things:
the delimiter and
the String object.
Optionally it also takes a limit.
If you provide a limit, split() splits according to it.
It has no notion of whether the substring to the left is a name or not, and the same goes for the substring to the right.
Have a look at this code snippet:
String assets = "Gold:Stocks:Fixed Income:Commodity:Interest Rates";
String[] splits = assets.split(":");
System.out.println("splits.size: " + splits.length);
for(String asset: splits){
System.out.println(asset); // print the current element, not the whole string
}
Output
splits.size: 5
Gold
Stocks
Fixed Income // with space
Commodity
Interest Rates // with space
The fields that contain spaces stayed intact because I used : as the delimiter.
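The optional limit mentioned above can be sketched as follows. The sample line and the class name SplitLimit are made up for illustration, and the sketch assumes the first four words are always first name, surname and a two-word address:

```java
import java.util.Arrays;

class SplitLimit {
    // Splits on single spaces but stops after the fourth split,
    // so the fifth element keeps the rest of the line untouched.
    static String[] splitFields(String line) {
        return line.split(" ", 5);
    }

    public static void main(String[] args) {
        String[] parts = splitFields("Bob Smith ABC Street Bowling Fishing");
        System.out.println(Arrays.toString(parts));
    }
}
```

This prints [Bob, Smith, ABC, Street, Bowling Fishing]: the limit only bounds how many pieces are produced, it still does not know which words belong together.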
This probably helped you to get your answer.

It depends on the data you're dealing with. Will the name always consist of a first and last name? Then you can simply combine the first two elements from the resulting array into a new string.
Otherwise, you might have to find a different way to separate out the different pieces within the txt file. Possibly a comma? Some character that you know won't ever be used in your normal data.

Assuming that every line follows the format
Bob Smith ABC Street Swimming
i.e. name, surname, and so on, this code can merge the first two fields manually:
String[] temp = z.split(" ");
String[] temp2 = new String[temp.length - 1];
temp2[0] = temp[0] + " " + temp[1]; // join first name and surname
for (int i = 2; i < temp.length; i++) {
temp2[i - 1] = temp[i]; // shift the remaining fields left by one
}
temp = temp2;

Related

How can I get non-matching groups using a Matcher in Java?

I'm trying to write a Java regex to catch some groups of words from a String using a Matcher.
Say I have this string: "Hello, we are #happy# to see you today".
I would like to get 2 groups of matches, one having
Hello, we are
to see you today
and the other
happy
So far, I was only able to match the word between the #s using this Pattern:
Pattern p = Pattern.compile("#(.+?)#");
I've read about negative lookahead and lookaround, played a bit with it but without success.
I assume I should do some sort of negation of the regex so far, but I couldn't come up with anything.
Any help would be really appreciated, thank you.
From comment:
I may incur in a string where I got more than one instances of words wrapped by #, such as "#Hello# kind #stranger#"
From comment:
I need to apply some different style format to both the text inside and outside.
Since you need to apply different styling, the code needs to process each block of text separately, and needs to know whether the text is inside or outside a #..# section.
Note, in the following code, it will silently skip the last #, if there is an odd number of them.
String input = ...
for (Matcher m = Pattern.compile("([^#]+)|#([^#]+)#").matcher(input); m.find(); ) {
if (m.start(1) != -1) {
String outsideText = m.group(1);
System.out.println("Outside: \"" + outsideText + "\"");
} else {
String insideText = m.group(2);
System.out.println("Inside: \"" + insideText + "\"");
}
}
Output for input = "Hello, we are #happy# to see you today"
Outside: "Hello, we are "
Inside: "happy"
Outside: " to see you today"
Output for input = "#Hello# kind #stranger#"
Inside: "Hello"
Outside: " kind "
Inside: "stranger"
Output for input = "This #text# has unpaired # characters"
Outside: "This "
Inside: "text"
Outside: " has unpaired "
Outside: " characters"
The best I could do is splitting in 3 groups, then merging the group 1 and 4 :
(^.*)(\#(.+?)\#)(.*)
EDIT: Taking remarks from the comments :
(^[^\#]*)(?:\#(.+?)\#)([^\#]*)
Thanks to @Lino we don't capture the useless group with # anymore, and we capture anything except #, instead of any non-whitespace character, in the 1st and 2nd groups.
Is this solution fine?
Pattern pattern =
Pattern.compile("([^#]+)|#([^#]*)#");
Matcher matcher =
pattern.matcher("Hello, we are #happy# to see you today");
List<String> notBetween = new ArrayList<>(); // not surrounded by #
List<String> between = new ArrayList<>(); // surrounded by #
while (matcher.find()) {
if (Objects.nonNull(matcher.group(1))) notBetween.add(matcher.group(1));
if (Objects.nonNull(matcher.group(2))) between.add(matcher.group(2));
}
System.out.println("Printing group 1");
for (String string :
notBetween) {
System.out.println(string);
}
System.out.println("Printing group 2");
for (String string :
between) {
System.out.println(string);
}

How do i parse a string to get specific information using java?

Here are some lines from a file and I'm not sure how to parse it to extract 4 pieces of information.
11::American President, The (1995)::Comedy|Drama|Romance
12::Dracula: Dead and Loving It (1995)::Comedy|Horror
13::Balto (1995)::Animation|Children's
14::Nixon (1995)::Drama
I would like to get the number, title, release date and genre.
Genre has multiple genres so I would like to save each one in a variable as well.
I'm using the .split("::|\\|"); method to parse it but I'm not able to parse out the release date.
Can anyone help me!
The easiest would be matching by regex, something like this
String x = "11::Title (2016)::Category";
Pattern p = Pattern.compile("^([0-9]+)::([a-zA-Z ]+)\\(([0-9]{4})\\)::([a-zA-Z]+)$");
Matcher m = p.matcher(x);
if (m.find()) {
System.out.println("Number: " + m.group(1) + " Title: " + m.group(2) + " Year: " + m.group(3) + " Categories: " + m.group(4));
}
(please don't nail me on the exact syntax, just out of my head)
Then first capture will be the number, the second will be the name, the third is the year and the fourth is the set of categories, which you may then split by '|'.
You may need to adjust the valid characters for title and categories, but you should get the idea.
If you have multiple lines, split them into an ArrayList first and treat each one separately in a loop.
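A minimal sketch of that idea applied to the sample lines from the question. The pattern below is an assumption that loosens the title part to accept any characters (colons, commas), and the class name MovieLineParser is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovieLineParser {
    // number :: title (year) :: genres separated by '|'
    private static final Pattern LINE =
            Pattern.compile("(\\d+)::(.+?)\\s*\\((\\d{4})\\)::(.+)");

    // Returns [number, title, year, genre1, genre2, ...],
    // or an empty list if the line does not match the pattern.
    static List<String> parse(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            out.add(m.group(1));
            out.add(m.group(2));
            out.add(m.group(3));
            // '|' is a regex metacharacter, so it must be escaped for split
            out.addAll(Arrays.asList(m.group(4).split("\\|")));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("12::Dracula: Dead and Loving It (1995)::Comedy|Horror"));
    }
}
```

For the sample line this prints [12, Dracula: Dead and Loving It, 1995, Comedy, Horror]. Returning an empty list for non-matching lines is a design choice of the sketch; throwing an exception would work just as well.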
Try this
String[] s = {
"11::American President, The (1995)::Comedy|Drama|Romance",
"12::Dracula: Dead and Loving It (1995)::Comedy|Horror",
"13::Balto (1995)::Animation|Children's",
"14::Nixon (1995)::Drama",
};
for (String e : s) {
String[] infos = e.split("::|\\s*\\(|\\)::");
String number = infos[0];
String title = infos[1];
String releaseDate = infos[2];
String[] genres = infos[3].split("\\|");
System.out.printf("number=%s title=%s releaseDate=%s genres=%s%n",
number, title, releaseDate, Arrays.toString(genres));
}
output
number=11 title=American President, The releaseDate=1995 genres=[Comedy, Drama, Romance]
number=12 title=Dracula: Dead and Loving It releaseDate=1995 genres=[Comedy, Horror]
number=13 title=Balto releaseDate=1995 genres=[Animation, Children's]
number=14 title=Nixon releaseDate=1995 genres=[Drama]

Java Regex : How to search a text or a phrase in a large text

I have a large text file and I need to search a word or a phrase in the file line by line and output the line with the text found in it.
For example, the sample text is
And the earth was without form,
Where [art] thou?
If the user searches for the word thou, the only line displayed should be
Where [art] thou?
and if the user searches for the earth, the first line should be displayed.
I tried using the contains function, but it also displays the line containing without when searching only for thou.
This is my sample code :
String[] verseList = TextIO.readFile("pentateuch.txt");
Scanner kbd = new Scanner(System.in);
int counter = 0;
for (int i = 0; i < verseList.length; i++) {
String[] data = verseList[i].split("\t");
String[] info3 = data[3].split(" ");
System.out.print("Search for: ");
String txtSearch = kbd.nextLine();
LinkedList<String> searchedList = new LinkedList<String>();
for (String bible : verseList){
if (bible.contains(txtSearch)){
searchedList.add(bible);
counter++;
}
}
if (searchedList.size() > 0){
for (String s : searchedList){
String[] searchedData = s.split("\t");
System.out.printf("%s - %s - %s - %s \n",searchedData[0], searchedData[1], searchedData[2], searchedData[3]);
}
}
System.out.print("Total: " + counter);
So I am thinking of using regex but I don't know how.
Can anyone help? Thank you.
Since sometimes variables have non-word characters at boundary positions, you cannot rely on \b word boundary.
In such cases, it is safer to use look-arounds (?<!\w) and (?!\w), i.e. in Java, something like:
"(?<!\\w)" + searchedData[n] + "(?!\\w)"
To match a String that contains a word, use this code:
String txtSearch; // eg "thou"
if (str.matches(".*?\\b" + txtSearch + "\\b.*"))
// it matches
This code builds a regex that only matches if both ends of txtSearch fall at the start/end of a word in the string, by using \b, which means "word boundary".
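The look-around approach can be sketched like this. Wrapping the search term in Pattern.quote is an addition not found in the answers above; it keeps characters such as [ or ? in the user's input from being read as regex syntax. The class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class WholeWordSearch {
    // True if term occurs in line without a word character directly
    // before or after it, so "thou" is not found inside "without".
    static boolean containsWord(String line, String term) {
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(term) + "(?!\\w)");
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String first = "And the earth was without form,";
        String second = "Where [art] thou?";
        System.out.println(containsWord(first, "thou"));      // false
        System.out.println(containsWord(second, "thou"));     // true
        System.out.println(containsWord(first, "the earth")); // true
        System.out.println(containsWord(second, "[art]"));    // true
    }
}
```

Using find() instead of matches() avoids having to pad the pattern with .* on both sides.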

How to include white spaces in next() without using nextLine() in Java

I am trying to let the user input a string, which may or may not contain spaces. For that, I'm using nextLine().
However, I'm trying to search a text file for that string, so I'm using next() to store each string the scanner goes through. I tried using nextLine(), but it would take the whole line; I just need the words before a comma.
so far here's my code
System.out.print("Cool, now give me your Airport Name: ");
String AirportName = kb.nextLine();
AirportName = AirportName + ",";
while (myFile.hasNextLine()) {
ATA = myFile.next();
city = myFile.next();
country = myFile.next();
myFile.nextLine();
// System.out.println(city);
if (city.equalsIgnoreCase(AirportName)) {
result++;
System.out.println("The IATA code for "+AirportName.substring(0, AirportName.length()-1) + " is: " +ATA.substring(0, ATA.length()-1));
break;
}
}
The code works when the user inputs a word with no spaces, but when they input two words, the condition isn't met.
the text file just includes a number of Airports, their IATA, city, and country. Here's a sample:
ADL, Adelaide, Australia
IXE, Mangalore, India
BOM, Mumbai, India
PPP, Proserpine Queensland, Australia
By default, next() searches for first whitespace as a delimiter. You can change this behaviour like this:
Scanner s = new Scanner(input);
s.useDelimiter("\\s*,\\s*");
With this, s.next() will treat commas as delimiters for your input (preceded or followed by zero or more whitespace characters).
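A minimal sketch of a lookup built on useDelimiter, using the sample data from the question. One assumption beyond the answer above: with commas as the only delimiter, the line break between a country and the next IATA code would stay inside a token, so the delimiter here also accepts a newline. The class and method names are hypothetical:

```java
import java.util.Scanner;

class AirportLookup {
    // Returns the IATA code for the given city, or null if it is not found.
    static String findCode(String data, String city) {
        try (Scanner sc = new Scanner(data)) {
            // A comma or a newline, with optional surrounding whitespace,
            // separates tokens; spaces inside a city name are preserved.
            sc.useDelimiter("\\s*[,\\n]\\s*");
            while (sc.hasNext()) {
                String iata = sc.next();
                if (!sc.hasNext()) break;
                String name = sc.next();
                if (!sc.hasNext()) break;
                sc.next(); // country, not needed for the lookup
                if (name.equalsIgnoreCase(city)) {
                    return iata;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String data = "ADL, Adelaide, Australia\n"
                + "IXE, Mangalore, India\n"
                + "BOM, Mumbai, India\n"
                + "PPP, Proserpine Queensland, Australia\n";
        System.out.println(findCode(data, "Proserpine Queensland"));
    }
}
```

This prints PPP. Because the commas are consumed by the delimiter, there is no need to append "," to the user's input or strip it from the code afterwards.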
Check out the String#split method.
Here's an example:
String test = "ADL, Adelaide, Australia\n"
+ "IXE, Mangalore, India\n"
+ "BOM, Mumbai, India\n"
+ "PPP, Proserpine Queensland, Australia\n";
Scanner scan = new Scanner(test);
String strings[] = null;
while(scan.hasNextLine()) {
// ",\\s" matches one comma followed by one white space ", "
strings = scan.nextLine().split(",\\s");
for(String tmp: strings) {
System.out.println(tmp);
}
}
Output:
ADL
Adelaide
Australia
IXE
Mangalore
India
BOM
Mumbai
India
PPP
Proserpine Queensland
Australia

How to remove spaces in between the String

I have below String
string = "Book Your Domain And Get\n \n\n \n \n \n Online Today."
string = str.replace("\\s","").trim();
which returns
str = "Book Your Domain And Get Online Today."
But what I want is
str = "Book Your Domain And Get Online Today."
I have tried many regular expressions and also googled, but had no luck and didn't find a related question. Please help. Many thanks in advance.
Use \\s+ instead of \\s as there are two or more consecutive whitespaces in your input.
string = str.replaceAll("\\s+"," ");
You can use replaceAll which takes a regex as parameter. And it seems like you want to replace multiple spaces with a single space. You can do it like this:
string = str.replaceAll("\\s{2,}"," ");
It will replace 2 or more consecutive whitespaces with a single whitespace.
First get rid of multiple spaces:
String after = before.trim().replaceAll(" +", " ");
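Combining the suggestions so far, a short sketch that trims the ends and collapses every run of whitespace, including line breaks, into one space. The class name WhitespaceNormalizer is hypothetical:

```java
class WhitespaceNormalizer {
    // Trims the ends, then replaces each run of whitespace
    // (spaces, tabs, newlines) with a single space.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "Book Your Domain And Get\n \n\n \n \n \n Online Today.";
        System.out.println(normalize(input));
    }
}
```

This prints Book Your Domain And Get Online Today. Note that replaceAll takes a regex, whereas replace treats "\\s" as literal text, which is why the call in the question changed nothing.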
If you just want to remove the white space between two words or characters, and not at the ends of the string, here is the regex that I used:
String s = " N OR 15 2 ";
Pattern pattern = Pattern.compile("[a-zA-Z0-9]\\s+[a-zA-Z0-9]", Pattern.CASE_INSENSITIVE);
Matcher m = pattern.matcher(s);
while(m.find()){
String replacestr = "";
int i = m.start();
while(i<m.end()){
replacestr = replacestr + s.charAt(i);
i++;
}
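// remove the whitespace inside the matched span, then rescan the updated string
s = s.replace(replacestr, replacestr.replaceAll("\\s+", ""));
m = pattern.matcher(s);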
}
System.out.println(s);
it will only remove the space between characters or words not spaces at the ends
and the output is
NOR152
Eg. to remove space between words in a string:
String example = "Interactive Resource";
System.out.println("Without space string: "+ example.replaceAll("\\s",""));
Output:
Without space string: InteractiveResource
Note that this one applies to Python, not Java: if you want to print strings without a separating space, add the argument sep='' to the print function, since this argument's default value is " ".
// use this to remove all whitespace from a given string, for example a = " 1 2 3 4"
// result: "1234"
a = a.replaceAll("\\s", "");
String s2=" 1 2 3 4 5 ";
String after=s2.replace(" ", "");
This works for me
String string_a = "AAAA BBB";
String actualTooltip_3 = string_a.replaceAll("\\s{2,}"," ");
System.out.println(actualTooltip_3);
The output will be: AAAA BBB
