java regular expression for String surrounded by "" - java

I have:
String s=" \"son of god\"\"cried out\" a good day and ok ";
This is shown on the screen as:
"son of god""cried out" a good day and ok
Pattern phrasePattern=Pattern.compile("(\".*?\")");
Matcher m=phrasePattern.matcher(s);
I want to get all the phrases surrounded by "" and add them to an ArrayList<String>. There might be more than two such phrases. How can I get each phrase and put it into my ArrayList?

With your Matcher you're 90% of the way there. You just need the find() method:
ArrayList<String> list = new ArrayList<>();
while(m.find()) {
list.add(m.group());
}
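Note that m.group() returns the whole match, including the surrounding quotes. If you only want the text between them, you can move the capturing group inside the quotes and read group(1). Here is a minimal sketch; the class and method names (QuotedPhrases, extract) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPhrases {
    // Same lazy pattern as in the question, but only the text between the quotes is captured
    private static final Pattern PHRASE = Pattern.compile("\"(.*?)\"");

    static List<String> extract(String s) {
        List<String> phrases = new ArrayList<>();
        Matcher m = PHRASE.matcher(s);
        while (m.find()) {
            phrases.add(m.group(1)); // group 1 excludes the quotes
        }
        return phrases;
    }

    public static void main(String[] args) {
        String s = " \"son of god\"\"cried out\" a good day and ok ";
        System.out.println(extract(s)); // [son of god, cried out]
    }
}
```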

An alternative approach, and I only suggest it because you did not explicitly say you must use regex matching, is to split on the double-quote character ("). Every other piece (the odd-indexed ones) is what you are interested in.
public static void main(String[] args) {
String[] testCases = new String[] {
" \"son of god\"\"cried out\" a good day and ok ",
"\"starts with a quote\" and then \"forgot the end quote",
};
for (String testCase : testCases) {
System.out.println("Input: " + testCase);
String[] pieces = testCase.split("\"");
System.out.println("Split into : " + pieces.length + " pieces");
for (int i = 0; i < pieces.length; i++) {
if (i%2 == 1) {
System.out.println(pieces[i]);
}
}
System.out.println();
}
}
Results:
Input: "son of god""cried out" a good day and ok
Split into : 5 pieces
son of god
cried out
Input: "starts with a quote" and then "forgot the end quote
Split into : 4 pieces
starts with a quote
forgot the end quote
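If you want to ensure that there is an even number of double quotes, check that the split result has an odd count. Use split("\"", -1) for that check, though: without the negative limit, trailing empty strings are dropped, so a string that ends with a quote comes out short.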
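That check can be wrapped in a small helper. This is a sketch, and the name hasBalancedQuotes is hypothetical. The limit of -1 keeps trailing empty strings, so "\"a\"" yields three pieces instead of two:

```java
class QuoteBalance {
    // An even number of '"' characters splits into an odd number of pieces.
    // The -1 limit keeps trailing empty strings, so a string ending in a quote is counted correctly.
    static boolean hasBalancedQuotes(String s) {
        return s.split("\"", -1).length % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasBalancedQuotes("\"a\""));                   // true
        System.out.println("\"a\"".split("\"").length);                   // 2: trailing "" dropped without the limit
        System.out.println(hasBalancedQuotes("\"forgot the end quote"));  // false
    }
}
```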

Related

Java regex, how to split on dot, whitespace, and keep quoted words together

I want to split my string on a dot, white space, and keep quoted words together. So for example say I have:
This "majestic world" is truly.awesome
it should result in:
This
majestic world
is
truly
awesome
Now, the regex I have so far is just myString.split("\\. | "). I know that
("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'") is supposed to split on white space and keep quoted words together. Now, I am not sure how to incorporate the dot escape there, and quite frankly that regex makes about as much sense as trying to flip a burger with your hands tied behind your back.
Edit:
the following is the reason why I want a better regex, since for me to handle keeping things within quotation marks, I have to do the following:
if(listOfLists.get(i).get(j).equalsIgnoreCase("INSERT")) {
String insertInto = "";
boolean buildingString = false;
int k = 0;
while(k < listOfLists.get(i).size()) {
if(listOfLists.get(i).get(k).endsWith("\"")) {
buildingString = false;
insertInto += listOfLists.get(i).get(k);
break;
}
if(listOfLists.get(i).get(k).startsWith("\"") || buildingString) {
buildingString = true;
insertInto += listOfLists.get(i).get(k) + " ";
}
k++;
}
System.out.println("k is: " + k + " message is: " + insertInto);
dbCommands.appendToTable(listOfLists.get(i).get(k + 2), listOfLists.get(i).get(k + 3), insertInto);
}
The file I am parsing contains commands like this:
INSERT "John Smith" INTO college.students
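This regex will get you most of the way, but you will still need to remove the quotes in a second step. Use [^"]* between the quotes rather than a greedy .*, otherwise a line with two quoted phrases is matched as a single token running from the first quote to the last:
(\"[^\"]*\"|[^\\s\\.]+)
For example:
Matcher m = Pattern.compile("(\"[^\"]*\"|[^\\s\\.]+)")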
.matcher("This \"majestic world\" is truly.awesome");
while (m.find()){
System.out.println(m.group(1).replaceAll("\"", ""));
}
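A variant that avoids the second replaceAll step is to put the text inside the quotes in its own capturing group and take whichever group matched. This is a sketch assuming quotes only ever surround whole tokens (something like ab"c is not handled); the names QuoteAwareTokenizer and tokenize are illustrative only. It also covers the INSERT command from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // Alternative 1: a quoted phrase, with the text inside the quotes in group 1.
    // Alternative 2: a run of characters that are neither whitespace nor '.', in group 2.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|([^\\s.]+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participated in this match
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This \"majestic world\" is truly.awesome"));
        System.out.println(tokenize("INSERT \"John Smith\" INTO college.students"));
    }
}
```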

Get certain data from text - Java

I am creating a Bukkit plugin for Minecraft and I need to know a few things before I move on.
I want to check if a text has this layout: "B:10 S:5", for example.
It stands for Buy:amount and Sell:amount.
What is the easiest way to check whether it follows that syntax?
The amounts can be any number that is 0 or over.
Another problem is getting this data out of the text: how can I read what comes after B: and S: and return it as an integer?
I have not tried anything yet because I have no idea where to start.
Thanks for the help!
In the simple problem you gave, you can get away with a simple answer. Otherwise, see the regex answer below.
boolean test(String str){
try{
//String str = "B:10 S:5";
String[] arr = str.split(" ");//split to left and right of space = [B:10,S:5]
String[] bArr = arr[0].split(":");//split ...first colon = [B,10]
String[] sArr = arr[1].split(":");//... second colon = [S,5]
//need to use try/catch here in case the string is not an int value.
String labelB = bArr[0];
Integer b = Integer.parseInt(bArr[1]);
String labelS = sArr[0];
Integer s = Integer.parseInt(sArr[1]);
}catch( Exception e){return false;}
return true;
}
See my answer here for a related task. More related details below.
How can I parse a string for a set?
Essentially, you need to use a regex and iterate through the groups. Just in case the grammar is not always B and S, I made this more abstract. Also, in case there are extra white spaces in the middle for whatever reason, I made that part broader too. The pattern has 4 groups (indicated by parentheses): label1, number1, label2 and number2. + means 1 or more. [] means a set of characters. a-z is a range of characters (don't put anything between A-Z and a-z). There are also other ways of writing alphabetic and numeric patterns, but these are easier to read.
//compiling a Pattern is expensive, so do it once and reuse it
Pattern p=Pattern.compile("([A-Za-z]+):([0-9]+)[ ]+([A-Za-z]+):([0-9]+)");
boolean test(String txt){
Matcher m=p.matcher(txt);
if(!m.matches())return false;
int groups=m.groupCount();//equals 4 here: groupCount() does not count group 0 (the whole match)
System.out.println("Matched: " + m.group(0));
//Label1 = m.group(1);
//val1 = m.group(2);
//Label2 = m.group(3);
//val2 = m.group(4);
return true;
}
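If the number of LABEL:NUMBER pairs can vary, one possible sketch is to validate the whole line first and then collect every pair with find() into a map. The class and method names are hypothetical, and Integer.parseInt will still throw NumberFormatException for a number too large for an int:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabeledValues {
    // Whole-line check: one or more LABEL:NUMBER pairs separated by whitespace
    private static final Pattern LAYOUT =
            Pattern.compile("\\s*[A-Za-z]+:\\d+(?:\\s+[A-Za-z]+:\\d+)*\\s*");
    // One pair: group 1 = label, group 2 = number
    private static final Pattern PAIR = Pattern.compile("([A-Za-z]+):(\\d+)");

    // Returns label -> value in input order, or an empty map if the layout does not match.
    // Integer.parseInt throws NumberFormatException for numbers that do not fit in an int.
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> values = new LinkedHashMap<>();
        if (!LAYOUT.matcher(text).matches()) {
            return values;
        }
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            values.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("B:10   S:5")); // {B=10, S=5}
    }
}
```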
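Use a regular expression.
In your case, ^B:(\d+) S:(\d+)$ is enough. (Keep the + inside the parentheses: with (\d)+ each group only captures the last digit.)
In Java, to use a regular expression (backslashes must be doubled inside a string literal):
import java.util.regex.Pattern;
public class RegExExample {
public static void main(String[] args) {
// matches() must match the whole string, so ^ and $ are redundant but harmless here
Pattern p = Pattern.compile("^B:(\\d+) S:(\\d+)$");
for (int i = 0; i < args.length; i++) {
if (p.matcher(args[i]).matches())
System.out.println("ARGUMENT #" + i + " IS VALID!");
else
System.out.println("ARGUMENT #" + i + " IS INVALID!");
}
}
}
This sample program takes its inputs from the command line, validates each one against the pattern and prints the result to STDOUT.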
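To also get the two amounts back as integers, the groups can be read after a successful matches(). This is a sketch assuming the layout is exactly B:<number> S:<number> with a single space; TradeSpec and parse are made-up names, and null is used here to signal "does not follow the syntax":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TradeSpec {
    // Exactly "B:<digits> S:<digits>"; digits only, so negative numbers are rejected
    private static final Pattern SPEC = Pattern.compile("B:(\\d+) S:(\\d+)");

    // Returns {buy, sell}, or null when the text does not follow the layout
    static int[] parse(String text) {
        Matcher m = SPEC.matcher(text);
        if (!m.matches()) {
            return null;
        }
        try {
            return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
        } catch (NumberFormatException e) {
            return null; // only digits, but too large for an int
        }
    }

    public static void main(String[] args) {
        int[] amounts = parse("B:10 S:5");
        if (amounts != null) {
            System.out.println("buy=" + amounts[0] + " sell=" + amounts[1]);
        }
    }
}
```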

Splitting string in java

I have input as follows
Date Place total trains
monday,chennai,10
tuesday,kolkata,20
wednesday,banglore,karnataka,30
I want to split this data. So far I have used
String[] data = input.split(",");
If I do it like above, I get
index[0] index[1] index[2]
monday chennai 10
tuesday kolkata 20
wednesday banglore karnataka 30
But I want the output to look like this:
index[0] index[1] index[2]
wednesday banglore,karnataka 30
Is there any way to achieve this?
Split your input on the first comma and on the last comma only.
String s = "wednesday,banglore,karnataka,30";
String parts[] = s.split("(?<=^[^,]*),|,(?=[^,]*$)");
System.out.println(Arrays.toString(parts));
Output:
[wednesday, banglore,karnataka, 30]
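If you would rather not rely on lookbehind, the same first-comma/last-comma split can be done with indexOf and lastIndexOf. A sketch, with hypothetical class and method names, that returns null when a line has fewer than two commas:

```java
import java.util.Arrays;

class FirstLastCommaSplit {
    // Returns {before first comma, between first and last comma, after last comma},
    // or null when the line has fewer than two commas
    static String[] split(String line) {
        int first = line.indexOf(',');
        int last = line.lastIndexOf(',');
        if (first < 0 || first == last) {
            return null;
        }
        return new String[] {
            line.substring(0, first),
            line.substring(first + 1, last),
            line.substring(last + 1)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("wednesday,banglore,karnataka,30")));
        System.out.println(Arrays.toString(split("monday,chennai,10")));
    }
}
```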
If you split a string with a regex, you essentially tell it where the string should be cut. This necessarily cuts away whatever the regex matches. So if you split at \w, every character is a split point and the substrings between them (all empty) are returned; Java then removes trailing empty strings, as described in the documentation, which here means all of them.
This also explains why the lazy match \w*? gives you every character: it matches the empty string (zero-width) at every position between (and before and after) the characters, so nothing is cut away and what's left are the characters of the string themselves.
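The effect described above is easy to check. This sketch prints the number of pieces when splitting on \w with and without a negative limit, and the result of splitting on the lazy \w*? (the rule that a zero-width match at the very start produces no leading empty string applies from Java 8 on):

```java
import java.util.Arrays;

class SplitEmptyPieces {
    public static void main(String[] args) {
        // Every character is a delimiter, so only empty strings are left over,
        // and split(regex) drops trailing empty strings: prints 0
        System.out.println("abc".split("\\w").length);
        // A negative limit keeps them: "" before a, between each pair, and after c -> prints 4
        System.out.println("abc".split("\\w", -1).length);
        // \w*? lazily matches the empty string at every position, so nothing is consumed:
        // prints [a, b, c] on Java 8 and later
        System.out.println(Arrays.toString("abc".split("\\w*?")));
    }
}
```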
Try
String[] data = input.split("(?<=^\\w+),|,(?=\\d+)");
There are some good explanations here.
Sticking to the basics, your data should be in this format:
Date,Place,total trains
"monday","chennai","10"
"tuesday","kolkata","20"
"wednesday","banglore,karnataka","30"
Because the delimiter also appears in the data, you either have to write complex code to handle it or simply put your data in double quotes. CSV files use the same convention.
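With data quoted like this, one common regex approach is to split only on commas followed by an even number of double quotes (that is, commas outside a quoted field) and then strip the quotes. This is a minimal sketch with hypothetical names: it does not handle escaped quotes ("") inside a field, so for real CSV input a dedicated CSV parser is the safer choice:

```java
import java.util.Arrays;

class QuotedCsvLine {
    // Split only on commas followed by an even number of double quotes, i.e. commas outside "..."
    private static final String OUTSIDE_QUOTES_COMMA = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] parse(String line) {
        String[] fields = line.split(OUTSIDE_QUOTES_COMMA, -1);
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i];
            // Drop one pair of surrounding quotes, if present
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                fields[i] = f.substring(1, f.length() - 1);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("\"wednesday\",\"banglore,karnataka\",\"30\"")));
    }
}
```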
Assuming you know the position of the "," you want to drop, you can get rid of it.
The program below replaces the 2nd instance of , with " " so that String.split() works as needed.
You need to import java.util.regex.*;
===========
public static void main(String args[]){
StringBuffer sb = new StringBuffer();
String s = "wednesday,banglore,karnataka,30";
Pattern p = Pattern.compile(",");
Matcher m = p.matcher(s);
int count = 1;
while(m.find()) {
if(count == 2 ){
m.appendReplacement(sb, " ");
}
count++;
}
m.appendTail(sb);
System.out.println(sb);
s= sb.toString();
String[] data = s.split(",");
System.out.println( data[0] + "-" + data[1] + "-" +data[2] );
}//psvm
Output
wednesday,banglore karnataka,30
wednesday-banglore karnataka-30
This code will work for you :
public static void main(String[] args) {
String s1 = "wednesday,banglore,karnataka,30";
String s2 = "monday,chennai,10";
String[] arr1 = s1.split("(?<=^\\w+),|,(?=\\d+)");
for(String ss : arr1)
System.out.println(ss);
System.out.println();
String[] arr2 = s2.split("(?<=^\\w+),|,(?=\\d+)");
for(String ss : arr2)
System.out.println(ss);
}
Output:
wednesday
banglore,karnataka
30
monday
chennai
10

Splitting up input using regular expressions in Java

I am making a program that lets a user input a chemical formula, for example C9H11N02. When they enter it, I want to split it up into pieces so that I have C9, H11, N, 02. Once I have it like this, I want to make changes to it so I can make it C10H12N203, and then put it back together. This is what I have done so far. Using the regular expression I have, I can extract the integer value, but how would I go about getting C10, H11, etc.?
System.out.println("Enter Data");
Scanner k = new Scanner( System.in );
String input = k.nextLine();
String reg = "\\s\\s\\s";
String [] data;
data = input.split( reg );
int m = Integer.parseInt( data[0] );
int n = Integer.parseInt( data[1] );
It can be done using lookarounds:
String[] parts = input.split("(?<=.)(?=[A-Z])");
Lookarounds are zero-width, non-consuming assertions.
This regex splits the input where both lookarounds match:
(?<=.) means "there is a preceding character" (i.e. not at the start of input)
(?=[A-Z]) means "the next character is a capital letter" (all elements start with A-Z)
Here's a test, including a double-character symbol for some edge cases:
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
String[] parts = input.split("(?<=.)(?=[A-Z])");
System.out.println(Arrays.toString(parts));
}
Output:
[C9, Kr, Br2, H11, N, O2]
If you then wanted to split up the individual components, use a nested call to split():
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
for (String component : input.split("(?<=.)(?=[A-Z])")) {
// split on non-digit/digit boundary
String[] symbolAndNumber = component.split("(?<!\\d)(?=\\d)");
String element = symbolAndNumber[0];
// elements without numbers won't be split
String count = symbolAndNumber.length == 1 ? "1" : symbolAndNumber[1];
System.out.println(element + " x " + count);
}
}
Output:
C x 9
Kr x 1
Br x 2
H x 11
N x 1
O x 2
Did you accidentally put zeroes into some of those formulas where the letter "O" (oxygen) was supposed to be? If so:
"C10H12N2O3".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C10, H12, N2, O3]
"CH2BrCl".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C, H2, Br, Cl]
I believe the following code should allow you to extract the various elements and their associated count. Of course, brackets make things more complicated, but you didn't ask about them!
Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String element = matcher.group(1);
// group 2 always exists, but it is empty when no count is written (e.g. the "N" in "C9H11NO2")
int count = 1;
if (!matcher.group(2).isEmpty()) {
count = Integer.parseInt(matcher.group(2));
}
// Do stuff with this component
}
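Building on that loop, the round trip from the question (split, change the counts, put the formula back together) could be sketched by collecting the counts into a LinkedHashMap and rebuilding the string from it. The names Formula, parse and format are hypothetical, repeated symbols are summed, and brackets are still not supported:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Formula {
    private static final Pattern PART = Pattern.compile("([A-Z][a-z]*)([0-9]*)");

    // Element -> count, in order of first appearance; a missing count means 1
    static Map<String, Integer> parse(String formula) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = PART.matcher(formula);
        while (m.find()) {
            int n = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), n, Integer::sum);
        }
        return counts;
    }

    // Rebuilds the formula, omitting a count of 1 (e.g. "N" rather than "N1")
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey());
            if (e.getValue() != 1) {
                sb.append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = parse("C9H11NO2");
        counts.replaceAll((element, n) -> n + 1); // example edit: one more of every element
        System.out.println(format(counts));       // C10H12N2O3
    }
}
```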

How to trim white space from all elements in array?

I was just wondering what the best way to remove the white space from all the elements of a list would be.
For example if I had String [] array = {" String", "Tom Selleck "," Fish "}
How could I get all the elements as {"String","Tom Selleck","Fish"}
Thanks!
Try this:
String[] trimmedArray = new String[array.length];
for (int i = 0; i < array.length; i++)
trimmedArray[i] = array[i].trim();
Now trimmedArray contains the same strings as array, but without leading and trailing whitespace. Alternatively, you could write this for modifying the strings in-place in the same array:
for (int i = 0; i < array.length; i++)
array[i] = array[i].trim();
Another Java 8 lambda option:
String[] array2 = Arrays.stream(array).map(String::trim).toArray(String[]::new);
And the ugly but optimized version, which writes the results back into the original array instead of creating a new one:
Arrays.stream(array).map(String::trim).toArray(unused -> array);
The original array is modified.
Add commons-lang3 (for example commons-lang3-3.1.jar) to your application's build path.
Use the code snippet below to trim the String array (with import org.apache.commons.lang3.StringUtils;):
String[] array = {" String", "Tom Selleck "," Fish "};
array = StringUtils.stripAll(array);
In Java 8, Arrays.parallelSetAll seems ready made for this purpose:
import java.util.Arrays;
Arrays.parallelSetAll(array, (i) -> array[i].trim());
This will modify the original array in place, replacing each element with the result of the lambda expression.
I know this is a really old post, but since Java 1.8 there is a nicer way to trim every String in an array.
Java 8 lambda expression solution:
List<String> temp = new ArrayList<>(Arrays.asList(yourArray));
temp.forEach(e -> temp.set(temp.indexOf(e), e.trim()));
yourArray = temp.toArray(new String[temp.size()]);
With this solution you don't have to write the index loop yourself, although it still copies the elements into a temporary list and then into a new array, as in Óscar López's solution.
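Since Java 8, Arrays.asList(array) gives a fixed-size List view backed by the array, so List.replaceAll can trim every element in place without copying into a temporary list (the same call works directly on any List<String>). A minimal sketch; the trimAll name is hypothetical:

```java
import java.util.Arrays;

class TrimInPlace {
    // Arrays.asList returns a fixed-size List view backed by the array,
    // so replaceAll writes the trimmed strings straight back into the array
    static void trimAll(String[] array) {
        Arrays.asList(array).replaceAll(String::trim);
    }

    public static void main(String[] args) {
        String[] array = {" String", "Tom Selleck ", " Fish "};
        trimAll(array);
        System.out.println(Arrays.toString(array)); // [String, Tom Selleck, Fish]
    }
}
```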
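You can just iterate over the elements in the array and assign array[i] = array[i].trim() for each element; since strings are immutable, calling trim() without assigning the result leaves the array unchanged.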
For those (like me) who were looking for the same solution in Kotlin and were pointed to Java only, here is how to trim in Kotlin:
fun main(args: Array<String>) {
// array definition
val array = arrayListOf<String>(" String", "Tom Selleck "," Fish ")
println(array) // print original -> [ String, Tom Selleck , Fish ]
// remove leading and trailing spaces, result is a List<String>
val sol1 = array.map { it.trim() }
println("sol1 = $sol1") // -> sol1 = [String, Tom Selleck, Fish]
// remove leading and trailing spaces, result is String
val sol2 = array.joinToString { it.trim() }
println("sol2 = $sol2") // -> sol2 = String, Tom Selleck, Fish
}
Not knowing how the OP happened to have {" String", "Tom Selleck "," Fish "} in an array in the first place (6 years ago), I thought I'd share what I ended up with.
My array is the result of using split on a string which might have extra spaces around delimiters. My solution was to address this at the point of the split. My code follows. After testing, I put splitWithTrim() in my Utils class of my project. It handles my use case; you might want to consider what sorts of strings and delimiters you might encounter if you decide to use it.
public class Test {
public static void main(String[] args) {
test(" abc def ghi jkl ", " ");
test(" abc; def ;ghi ; jkl; ", ";");
}
public static void test(String str, String splitOn) {
System.out.println("Splitting \"" + str + "\" on \"" + splitOn + "\"");
String[] parts = splitWithTrim(str, splitOn);
for (String part : parts) {
System.out.println("(" + part + ")");
}
}
public static String[] splitWithTrim(String str, String splitOn) {
if (splitOn.equals(" ")) {
return str.trim().split(" +");
} else {
return str.trim().split(" *" + java.util.regex.Pattern.quote(splitOn) + " *"); // quote so delimiters like "." or "|" are matched literally
}
}
}
Output of running the test application is:
Splitting " abc def ghi jkl " on " "
(abc)
(def)
(ghi)
(jkl)
Splitting " abc; def ;ghi ; jkl; " on ";"
(abc)
(def)
(ghi)
(jkl)
String val = "hi hello prince";
String arr[] = val.split(" ");
for (int i = 0; i < arr.length; i++)
{
System.out.print(arr[i]);
}
