Splitting string in Java

I have input as follows:
Date Place total trains
monday,chennai,10
tuesday,kolkata,20
wednesday,banglore,karnataka,30
I want to split this data. So far I have used:
String[] data = input.split(",");
If I do it like above, I get:
index[0] index[1] index[2]
monday chennai 10
tuesday kolkata 20
wednesday banglore karnataka 30
But I want output like this:
index[0] index[1] index[2]
wednesday banglore,karnataka 30
Is there any way to achieve this?

Split your input at the first comma and at the last comma:
String s = "wednesday,banglore,karnataka,30";
String parts[] = s.split("(?<=^[^,]*),|,(?=[^,]*$)");
System.out.println(Arrays.toString(parts));
Output:
[wednesday, banglore,karnataka, 30]
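If you would rather not rely on a look-behind regex, the same "first comma / last comma" idea can be written with indexOf and lastIndexOf. This is a minimal sketch: the class and method names are made up for illustration, and returning the line unchanged when it has fewer than two commas is an assumption.

```java
import java.util.Arrays;

class FirstLastCommaSplit {
    // Hypothetical helper: splits "a,b,...,z" into three parts: the text before the
    // first comma, everything between the first and last comma, and the text after
    // the last comma.
    static String[] splitFirstAndLast(String line) {
        int first = line.indexOf(',');
        int last = line.lastIndexOf(',');
        if (first < 0 || first == last) {
            // Fewer than two commas (assumption: return the line as a single part).
            return new String[] { line };
        }
        return new String[] {
            line.substring(0, first),
            line.substring(first + 1, last),
            line.substring(last + 1)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFirstAndLast("wednesday,banglore,karnataka,30")));
        System.out.println(Arrays.toString(splitFirstAndLast("monday,chennai,10")));
    }
}
```

Because the middle part is simply everything between the outer commas, a place name containing any number of commas stays in one piece.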

If you split a string with a regex, you essentially tell Java where the string should be cut, and the cut removes whatever the regex matches. So if you split at \w, every character is a split point, and the substrings between them (all empty) are returned. Java automatically removes trailing empty strings, as described in the String.split documentation, so you end up with an empty array.
This also explains why the lazy match \w*? gives you every character: it matches the empty string (zero-width) at every position between, before and after the characters, so what's left are the characters of the string themselves.
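A small sketch of the behaviour described above (the class name is illustrative). The values in the comments follow from the String.split rules: the default limit of 0 drops trailing empty strings, a negative limit keeps them, and since Java 8 a zero-width match at the very start does not produce a leading empty string.

```java
import java.util.Arrays;

class SplitBehaviour {
    public static void main(String[] args) {
        // Every character matches \w, so every character is a cut point and all pieces are empty.
        // With the default limit (0) the trailing empty strings are removed, leaving nothing.
        System.out.println("abc".split("\\w").length);            // 0
        // A negative limit keeps them: the empty strings before, between and after the characters.
        System.out.println("abc".split("\\w", -1).length);        // 4
        // \w*? lazily matches the empty string at each position, so only the characters remain.
        System.out.println(Arrays.toString("abc".split("\\w*?"))); // [a, b, c]
    }
}
```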
Try:
String[] data = input.split("(?<=^\\w+),|,(?=\\d+)");
Some good explanations are here.

Sticking to the basics, your data should be in this format:
Date,Place,total trains
"monday","chennai","10"
"tuesday","kolkata","20"
"wednesday","banglore,karnataka","30"
If the delimiter can also appear inside the data, you either have to write complex code to handle it or simply put your data in double quotes. CSV files use this same convention.
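Once the fields are quoted, a simple quote-aware loop can split on the real delimiters only. This is a minimal sketch with a hypothetical splitQuoted helper; it assumes plain double quotes and does not handle escaped quotes ("") or fields spanning several lines, so a dedicated CSV library is a better fit for real files.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSplit {
    // Splits one line on commas, but treats commas inside double quotes as data.
    // Assumption: no escaped quotes ("") inside fields.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle state; the quote itself is dropped
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // a real delimiter ends the field
                current.setLength(0);
            } else {
                current.append(c);               // ordinary character, or a comma inside quotes
            }
        }
        fields.add(current.toString());          // the last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitQuoted("\"wednesday\",\"banglore,karnataka\",\"30\""));
    }
}
```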

Assuming you know the position of the ",", you can get rid of it.
The program below replaces the 2nd occurrence of "," with " " so that String.split() works as needed.
It needs the java.util.regex imports:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceSecondComma {
    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer();
        String s = "wednesday,banglore,karnataka,30";
        Pattern p = Pattern.compile(",");
        Matcher m = p.matcher(s);
        int count = 1;
        while (m.find()) {
            if (count == 2) {
                // copies the text up to this comma into sb, then appends " " in its place
                m.appendReplacement(sb, " ");
            }
            count++;
        }
        // copies the rest of the string after the replaced comma
        m.appendTail(sb);
        System.out.println(sb);
        s = sb.toString();
        String[] data = s.split(",");
        System.out.println(data[0] + "-" + data[1] + "-" + data[2]);
    }
}
Output
wednesday,banglore karnataka,30
wednesday-banglore karnataka-30

This code will work for you:
public static void main(String[] args) {
String s1 = "wednesday,banglore,karnataka,30";
String s2 = "monday,chennai,10";
String[] arr1 = s1.split("(?<=^\\w+),|,(?=\\d+)");
for(String ss : arr1)
System.out.println(ss);
System.out.println();
String[] arr2 = s2.split("(?<=^\\w+),|,(?=\\d+)");
for(String ss : arr2)
System.out.println(ss);
}
Output:
wednesday
banglore,karnataka
30
monday
chennai
10

Related

How can I eliminate duplicate words from String in Java?

I have an ArrayList of Strings and it contains records such as:
this is a first sentence
hello my name is Chris
what's up man what's up man
today is tuesday
I need to clear this list, so that the output does not contain repeated content. In the case above, the output should be:
this is a first sentence
hello my name is Chris
what's up man
today is tuesday
As you can see, the 3rd String has been modified and now contains only one occurrence of "what's up man" instead of two.
In my list, the String is sometimes correct and sometimes doubled, as shown above.
I want to get rid of the duplication, so I thought about iterating through this list:
for (String s: myList) {
but I cannot find a way to eliminate the duplicates, especially since the length of each string varies; for example, there might be a record like:
this is a very long sentence this is a very long sentence
or sometimes short ones:
single word single word
Is there some native Java function for that?
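Assuming the String is repeated just twice, with a space in between as in your examples, the following code removes the repetition:
for (int i = 0; i < myList.size(); i++) {
    String s = myList.get(i);
    int mid = s.length() / 2;
    // a doubled string "X X" has odd length, with the separating space exactly in the middle
    if (s.length() % 2 == 1 && s.charAt(mid) == ' ') {
        String fs = s.substring(0, mid);   // first half, before the space
        String ls = s.substring(mid + 1);  // second half, after the space
        if (fs.equals(ls)) {
            myList.set(i, fs);
        }
    }
}
The code splits each entry of the list into two substrings at the halfway point. If both halves are equal, it replaces the original element with one half, removing the repetition. The length and space check skips strings that cannot be doubled (for example, without it a one-character string such as "a" would compare two empty halves and be replaced by "").
I was testing the code and did not see Brendan Robert's answer; this code follows the same logic as his.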
I would suggest using regular expressions. I was able to remove duplicates using this pattern: \b([\w\s']+) \1\b
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
static String [] phrases = {
"this is a first sentence",
"hello my name is Chris",
"what's up man what's up man",
"today is tuesday",
"this is a very long sentence this is a very long sentence",
"single word single word",
"hey hey"
};
public static void main(String[] args) throws Exception {
String duplicatePattern = "\\b([\\w\\s']+) \\1\\b";
Pattern p = Pattern.compile(duplicatePattern);
for (String phrase : phrases) {
Matcher m = p.matcher(phrase);
if (m.matches()) {
System.out.println(m.group(1));
} else {
System.out.println(phrase);
}
}
}
}
Results:
this is a first sentence
hello my name is Chris
what's up man
today is tuesday
this is a very long sentence
single word
hey
Assumption: uppercase words are equal to their lowercase counterparts.
Note that this removes every repeated word anywhere in the string, not only a doubled phrase.
String fullString = "lol lol";
String[] words = fullString.split("\\W+");
StringBuilder stringBuilder = new StringBuilder();
Set<String> wordsHashSet = new HashSet<>();
for (String word : words) {
// Check for duplicates
if (wordsHashSet.contains(word.toLowerCase())) continue;
wordsHashSet.add(word.toLowerCase());
stringBuilder.append(word).append(" ");
}
String nonDuplicateString = stringBuilder.toString().trim();
Simple logic: split the string on spaces, add each word to a LinkedHashSet (which drops duplicates and keeps insertion order), read it back, and remove the "[", "]" and "," characters from its toString() output.
String s = "I want to walk my dog I want to walk my dog";
Set<String> temp = new LinkedHashSet<>();
String[] arr = s.split(" ");
for ( String ss : arr)
temp.add(ss);
String newl = temp.toString()
.replace("[","")
.replace("]","")
.replace(",","");
System.out.println(newl);
Output: I want to walk my dog
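A variant of the same LinkedHashSet idea joins the set with String.join instead of editing the toString() output, so commas or brackets that are part of the words survive. The helper name distinctWords is illustrative; String.join(CharSequence, Iterable) requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctWords {
    static String distinctWords(String s) {
        // LinkedHashSet keeps the first occurrence of each word, in the original order
        Set<String> words = new LinkedHashSet<>(Arrays.asList(s.split(" ")));
        // join the words back directly instead of post-processing Set.toString()
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(distinctWords("I want to walk my dog I want to walk my dog"));
        System.out.println(distinctWords("red, green, red, blue"));
    }
}
```

With the toString() and replace approach, the second input would lose its commas and print "red green blue"; here the words are kept as they were split.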
It depends on your situation, but assuming the string is repeated at most twice (not three or more times), you could find the length of the entire string, find the halfway point, and compare each index after the halfway point with the matching index from the beginning. If the string can be repeated more than twice, you will need a more complicated algorithm that first determines how many times the string is repeated, then finds the starting index of each repeat and truncates everything from the beginning of the first repeat onward. If you can provide some more context about the scenarios you expect to handle, we can start putting together some ideas.
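For the "repeated more than twice" case described above, one possible sketch (not Brendan Robert's code; the helper name collapseRepeats is hypothetical) is to append a space and test whether the result consists of equal blocks that each end in a space. It assumes copies are separated by exactly one space, and it returns the shortest repeating phrase, so "ha ha ha ha" collapses to "ha" rather than "ha ha".

```java
class CollapseRepeats {
    // If s is some phrase X repeated two or more times, separated by single spaces,
    // return X; otherwise return s unchanged.
    static String collapseRepeats(String s) {
        // Appending one space turns "X X X" into "X X X ", i.e. k equal blocks of "X ".
        String padded = s + " ";
        int n = padded.length();
        // Try block lengths from shortest to longest; a block needs at least one
        // character plus the trailing space, and there must be at least two blocks.
        for (int block = 2; block <= n / 2; block++) {
            if (n % block != 0) {
                continue; // equal blocks must fill the padded string exactly
            }
            String first = padded.substring(0, block);
            if (first.charAt(block - 1) != ' ') {
                continue; // each copy must end at a separating space
            }
            boolean allCopies = true;
            for (int start = block; start < n; start += block) {
                if (!padded.startsWith(first, start)) {
                    allCopies = false;
                    break;
                }
            }
            if (allCopies) {
                return first.substring(0, block - 1); // drop the separator space
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("what's up man what's up man"));
        System.out.println(collapseRepeats("hey hey hey"));
        System.out.println(collapseRepeats("today is tuesday"));
    }
}
```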
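// Doing it in Java 8: drops a word when it repeats the previous word, ignoring case
// needs: import java.util.Arrays; import java.util.stream.Collectors;
static String removeRepeatedWords(String str1) {
    String[] arrStr = str1.split(" ");
    String[] element = new String[1]; // remembers the last word that was kept
    // the lambda parameter must not reuse the name str1, which is already a local variable
    return Arrays.stream(arrStr).filter(word -> {
        if (!word.equalsIgnoreCase(element[0])) {
            element[0] = word;
            return true;
        }
        return false;
    }).collect(Collectors.joining(" "));
}
// removeRepeatedWords("I am am am a good Good coder") returns "I am a good coder"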

How to split a string by every other separator

There's a string
String str = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
How do I split it into strings like these:
"ggg;ggg;"
"nnn;nnn;"
"aaa;aaa;"
"xxx;xxx;"
Using a regex:
String input = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
// three letters, ';', the same three letters again (\1), ';'
Pattern p = Pattern.compile("([a-z]{3});\\1;");
Matcher m = p.matcher(input);
while (m.find()) {
    // m.group(0) is the whole match, i.e. the result
    System.out.println(m.group(0));
}
Output:
ggg;ggg;
nnn;nnn;
aaa;aaa;
xxx;xxx;
I assume that you only want to check whether the last segment is similar, not every segment that has been read.
If that is not the case, you would probably have to use an ArrayList instead of a Stack.
I also assumed that each segment has the format /([a-z])\1\1/ (the same letter three times).
If that is not the case either, you should replace the condition in the if statement with:
(stack.peek().substring(0,index).equals(temp))
public static Stack<String> splitString(String text, char split) {
Stack<String> stack = new Stack<String>();
int index = text.indexOf(split);
while (index != -1) {
String temp = text.substring(0, index);
if (!stack.isEmpty()) {
if (stack.peek().charAt(0) == temp.charAt(0)) {
temp = stack.pop() + split + temp;
}
}
stack.push(temp);
text = text.substring(index + 1);
index = text.indexOf(split);
}
return stack;
}
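Split and join them (this uses Google Guava's Splitter, Joiner and Iterables, so Guava must be on the classpath):
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;
public class GuavaPairs {
    public static void main(String[] args) throws Exception {
        String data = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
        String del = ";";
        int splitSize = 2;
        StringBuilder sb = new StringBuilder();
        // Splitter keeps the empty string after the final ';', so the last partition is [""]
        for (Iterable<String> iterable : Iterables.partition(Splitter.on(del).split(data), splitSize)) {
            sb.append("\"").append(Joiner.on(del).join(iterable)).append(";\"");
        }
        // drop the three characters '";"' appended for that trailing empty partition
        sb.delete(sb.length() - 3, sb.length());
        System.out.println(sb.toString());
    }
}
Ref: Split a String at every 3rd comma in Java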
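Without Guava, the same grouping can be done with a plain loop that cuts after every n-th separator and keeps the separators. This is a sketch; groupEveryN is a hypothetical helper name, and an incomplete group at the end is returned as-is.

```java
import java.util.ArrayList;
import java.util.List;

class GroupEveryN {
    // Cuts `data` after every n-th occurrence of `sep`, keeping the separators,
    // so "a;a;b;b;" with n = 2 gives ["a;a;", "b;b;"].
    static List<String> groupEveryN(String data, char sep, int n) {
        List<String> groups = new ArrayList<>();
        int start = 0; // where the current group begins
        int seen = 0;  // separators counted in the current group
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == sep && ++seen == n) {
                groups.add(data.substring(start, i + 1)); // include the separator
                start = i + 1;
                seen = 0;
            }
        }
        if (start < data.length()) {
            groups.add(data.substring(start)); // leftover text after the last full group
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupEveryN("ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;", ';', 2));
    }
}
```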
Use split with a regex:
String data="ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
String [] array=data.split("(?<=\\G\\S\\S\\S;\\S\\S\\S);");
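\S: a non-whitespace character
\G: the position where the previous match ended (or the start of the input before the first match). Because the look-behind must reach back exactly to that position, a semicolon only matches when it comes right after two 3-character segments counted from the previous split, so every other semicolon is skipped.
(?<=...): a positive look-behind, so the pattern matches a semicolon only when that text precedes it.
The matched semicolons are consumed by the split, so the pieces are "ggg;ggg", "nnn;nnn", and so on, without the trailing ";".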
Another answer, which only works for your specific example input.
In your example, there are two regularities:
All patterns have exactly three characters
All patterns occur exactly twice
In other words: if those two properties really hold for all your input, you could avoid splitting, since you know exactly what to find at each position of your string.
Of course, the other answers, which do "real" splitting, are more flexible; but (theoretically) you could just go ahead and make a series of substring calls to access all elements directly.
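Under exactly those two assumptions (3-character segments, each occurring twice), every wanted piece is 2 * 4 = 8 characters long, so a substring loop is enough. This is a sketch with an illustrative helper name; any leftover shorter than one piece is silently ignored.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthPieces {
    // Assumes every segment is 3 characters plus ';' and occurs exactly twice,
    // so each wanted piece has the same fixed length (8 for the example input).
    static List<String> fixedWidthPieces(String str, int pieceLength) {
        List<String> pieces = new ArrayList<>();
        // step through the string one whole piece at a time
        for (int i = 0; i + pieceLength <= str.length(); i += pieceLength) {
            pieces.add(str.substring(i, i + pieceLength));
        }
        return pieces;
    }

    public static void main(String[] args) {
        for (String piece : fixedWidthPieces("ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;", 8)) {
            System.out.println(piece);
        }
    }
}
```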

Java regular expression for a String surrounded by ""

I have:
String s=" \"son of god\"\"cried out\" a good day and ok ";
This is shown on the screen as:
"son of god""cried out" a good day and ok
Pattern phrasePattern=Pattern.compile("(\".*?\")");
Matcher m=phrasePattern.matcher(s);
I want to get all the phrases surrounded by "" and add them to an ArrayList<String>. There might be more than 2 such phrases. How can I get each phrase and put it into my ArrayList?
With your Matcher you're 90% of the way there. You just need to call find() in a loop:
ArrayList<String> list = new ArrayList<>();
while(m.find()) {
list.add(m.group());
}
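Note that m.group() returns the whole match, so with your pattern each entry keeps its surrounding quotes. If you want only the text inside, a sketch like the following moves the capturing group inside the quotes and uses group(1); the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPhrases {
    // Collects the text between each pair of double quotes, without the quotes.
    static List<String> quotedPhrases(String s) {
        // .*? is lazy, so each match stops at the next closing quote
        Pattern phrasePattern = Pattern.compile("\"(.*?)\"");
        Matcher m = phrasePattern.matcher(s);
        List<String> list = new ArrayList<>();
        while (m.find()) {
            list.add(m.group(1)); // group(1) excludes the quotes; group() would include them
        }
        return list;
    }

    public static void main(String[] args) {
        String s = " \"son of god\"\"cried out\" a good day and ok ";
        System.out.println(quotedPhrases(s)); // [son of god, cried out]
    }
}
```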
An alternative approach (I only suggest it because you did not explicitly say you must use regex matching) is to split on the double-quote character ("). Every odd-indexed piece is a phrase you want.
public static void main(String[] args) {
String[] testCases = new String[] {
" \"son of god\"\"cried out\" a good day and ok ",
"\"starts with a quote\" and then \"forgot the end quote",
};
for (String testCase : testCases) {
System.out.println("Input: " + testCase);
String[] pieces = testCase.split("\"");
System.out.println("Split into : " + pieces.length + " pieces");
for (int i = 0; i < pieces.length; i++) {
if (i%2 == 1) {
System.out.println(pieces[i]);
}
}
System.out.println();
}
}
Results:
Input: "son of god""cried out" a good day and ok
Split into : 5 pieces
son of god
cried out
Input: "starts with a quote" and then "forgot the end quote
Split into : 4 pieces
starts with a quote
forgot the end quote
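If you want to ensure that there is an even number of double quotes, check that the split result has an odd count. Use testCase.split("\"", -1) for that check, though: the one-argument split drops trailing empty strings, so an input that ends with a quote would otherwise give an even count.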

How to extract integers from a complicated string?

I am having a hard time figuring this out. Say I have a String s that could equal
s = "{1,4,204,3}"
at another time it could equal
s = "&5,3,5,20&"
or it could equal at another time
s = "/4,2,41,23/"
Is there any way I could just extract the numbers out of this string and make a char array for example?
You can use a regex for this example:
String s = "&5,3,5,20&";
System.out.println(s.replaceAll("[^0-9,]", ""));
result:
5,3,5,20
It removes every character except digits and commas. If you want to extract all the numbers, you can then split the result, e.g. String[] sArray = s.replaceAll("[^0-9,]", "").split(",");, and iterate over the array to get each number.
You can use a regex to remove everything except the digits and commas:
stringWithOnlyNumbers = str.replaceAll("[^\\d,]+","");
After this, you can use split() with ',' as the delimiter to get the numbers in an array.
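Putting the replaceAll and split steps together and parsing each piece gives an int[] directly, which is probably more useful than a char[] for multi-digit numbers such as 204. A sketch with a hypothetical extractNumbers helper; it assumes non-negative, comma-separated numbers as in the question.

```java
import java.util.Arrays;

class ExtractNumbers {
    // Keeps only digits and commas, splits on the commas and parses each piece.
    static int[] extractNumbers(String s) {
        String digitsAndCommas = s.replaceAll("[^\\d,]+", "");
        return Arrays.stream(digitsAndCommas.split(","))
                .filter(part -> !part.isEmpty()) // guard against empty pieces, e.g. from ",,"
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extractNumbers("{1,4,204,3}")));
        System.out.println(Arrays.toString(extractNumbers("&5,3,5,20&")));
        System.out.println(Arrays.toString(extractNumbers("/4,2,41,23/")));
    }
}
```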
A combination of split() and replace() should help you with that.
Use regular expressions
String a = "asdf4sdf5323ki";
String regex = "([0-9]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(a);
while (matcher.find())
{
String group = matcher.group(1);
if (group.length() > 0)
{
System.out.println(group);
}
}
From your examples, if the string always has the same shape (one extra character at each end), something like the following would work; add handling for any exceptions not covered here:
String[] sArr = s.split(",");
sArr[0] = sArr[0].substring(1); // drop the leading character, e.g. '{'
int last = sArr.length - 1;     // arrays have a length field, not a length() method
sArr[last] = sArr[last].substring(0, sArr[last].length() - 1); // drop the trailing character, e.g. '}'
Then convert the String[] to a char[]; here is an example converter method.
You can use the Scanner class with "," as the delimiter:
String s = "{1,4,204,3}";
Scanner in = new Scanner(s.substring(1, s.length() - 1)); // Will scan the 1,4,204,3 part
in.useDelimiter(",");
while(in.hasNextInt()){
int x = in.nextInt();
System.out.print(x + " ");
// do something with x
}
The above will print:
1 4 204 3

Splitting up input using regular expressions in Java

I am making a program that lets a user input a chemical formula, for example C9H11N02. When they enter it, I want to split it into pieces like C9, H11, N, 02. Once I have it like this, I want to make changes to it so that it becomes C10H12N203, and then put it back together. This is what I have done so far. With the regular expression I used, I can extract the integer value, but how would I go about getting C10, H11, etc.?
System.out.println("Enter Data");
Scanner k = new Scanner( System.in );
String input = k.nextLine();
String reg = "\\s\\s\\s";
String [] data;
data = input.split( reg );
int m = Integer.parseInt( data[0] );
int n = Integer.parseInt( data[1] );
It can be done using lookarounds:
String[] parts = input.split("(?<=.)(?=[A-Z])");
Lookarounds are zero-width, non-consuming assertions.
This regex splits the input where both lookarounds match:
(?<=.) means "there is a preceding character" (i.e. not at the start of the input)
(?=[A-Z]) means "the next character is a capital letter" (all elements start with A-Z)
Here's a test, including a double-character symbol for some edge cases:
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
String[] parts = input.split("(?<=.)(?=[A-Z])");
System.out.println(Arrays.toString(parts));
}
Output:
[C9, Kr, Br2, H11, N, O2]
If you then wanted to split up the individual components, use a nested call to split():
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
for (String component : input.split("(?<=.)(?=[A-Z])")) {
// split on non-digit/digit boundary
String[] symbolAndNumber = component.split("(?<!\\d)(?=\\d)");
String element = symbolAndNumber[0];
// elements without numbers won't be split
String count = symbolAndNumber.length == 1 ? "1" : symbolAndNumber[1];
System.out.println(element + " x " + count);
}
}
Output:
C x 9
Kr x 1
Br x 2
H x 11
N x 1
O x 2
Did you accidentally type zeroes in some of those formulas where the letter "O" (oxygen) was supposed to be? If so:
"C10H12N2O3".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C10, H12, N2, O3]
"CH2BrCl".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C, H2, Br, Cl]
I believe the following code should allow you to extract the various elements and their associated count. Of course, brackets make things more complicated, but you didn't ask about them!
Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String element = matcher.group(1);
    // group(2) is empty when the element has no explicit count, e.g. the "N" in C9H11NO2,
    // so check for that instead of letting Integer.parseInt("") throw
    String digits = matcher.group(2);
    int count = digits.isEmpty() ? 1 : Integer.parseInt(digits);
    // Do stuff with this component
}
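Since the question also wants to change the counts and put the formula back together, the same pattern can drive the rebuild. As a purely illustrative assumption about what "make changes" means, the hypothetical incrementCounts helper below adds 1 to every count, treating a missing count as 1; characters the pattern does not match (such as brackets or a stray leading digit) are dropped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaRewrite {
    // Element symbol (a capital letter plus optional lowercase letters), then an optional count.
    private static final Pattern COMPONENT = Pattern.compile("([A-Z][a-z]*)([0-9]*)");

    // Hypothetical example transformation: add 1 to every element count.
    static String incrementCounts(String formula) {
        Matcher m = COMPONENT.matcher(formula);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String element = m.group(1);
            String digits = m.group(2);
            int count = digits.isEmpty() ? 1 : Integer.parseInt(digits); // no digits means 1
            out.append(element).append(count + 1);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(incrementCounts("C9H11NO2")); // C10H12N2O3
        System.out.println(incrementCounts("CH2BrCl"));  // C2H3Br2Cl2
    }
}
```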
