Splitting up input using regular expressions in Java


I am making a program that lets a user input a chemical formula, for example C9H11N02. When they enter it, I want to split it into pieces such as C9, H11, N, 02. Once I have the pieces, I want to change them so that the formula becomes C10H12N203, and then put it back together. This is what I have done so far. With the regular expression below I can extract the integer values, but how would I go about getting C10, H11, etc.?
System.out.println("Enter Data");
Scanner k = new Scanner( System.in );
String input = k.nextLine();
String reg = "\\s\\s\\s";
String [] data;
data = input.split( reg );
int m = Integer.parseInt( data[0] );
int n = Integer.parseInt( data[1] );

It can be done using look arounds:
String[] parts = input.split("(?<=.)(?=[A-Z])");
Look arounds are zero-width, non-consuming assertions.
This regex splits the input where the two look arounds match:
(?<=.) means "there is a preceding character" (ie not at the start of input)
(?=[A-Z]) means "the next character is a capital letter" (All elements start with A-Z)
Here's a test, including a double-character symbol for some edge cases:
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
String[] parts = input.split("(?<=.)(?=[A-Z])");
System.out.println(Arrays.toString(parts));
}
Output:
[C9, Kr, Br2, H11, N, O2]
If you then wanted to split up the individual components, use a nested call to split():
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
for (String component : input.split("(?<=.)(?=[A-Z])")) {
// split on non-digit/digit boundary
String[] symbolAndNumber = component.split("(?<!\\d)(?=\\d)");
String element = symbolAndNumber[0];
// elements without numbers won't be split
String count = symbolAndNumber.length == 1 ? "1" : symbolAndNumber[1];
System.out.println(element + " x " + count);
}
}
Output:
C x 9
Kr x 1
Br x 2
H x 11
N x 1
O x 2

Did you accidentally put zeroes into some of those formula where the letter "O" (oxygen) was supposed to be? If so:
"C10H12N2O3".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C10, H12, N2, O3]
"CH2BrCl".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C, H2, Br, Cl]

I believe the following code should allow you to extract the various elements and their associated count. Of course, brackets make things more complicated, but you didn't ask about them!
Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String element = matcher.group(1);
int count = 1;
if (!matcher.group(2).isEmpty()) { // group 2 is the empty string when no count was given
try {
count = Integer.parseInt(matcher.group(2));
} catch (NumberFormatException e) {
// Regex means we should never get here!
}
}
// Do stuff with this component
}
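The question also asks about changing the counts and putting the formula back together. A minimal sketch of that round trip, assuming formulas without brackets and using the same pattern as above, could look like this. The class name FormulaDemo, the helper names parse and format, and the "add one of each element" change are illustrative assumptions, not part of the original answers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaDemo {
    // One element symbol (capital letter plus optional lowercase letters) and an optional count.
    private static final Pattern COMPONENT = Pattern.compile("([A-Z][a-z]*)([0-9]*)");

    // Parses e.g. "C9H11NO2" into {C=9, H=11, N=1, O=2}, keeping the input order.
    static Map<String, Integer> parse(String formula) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = COMPONENT.matcher(formula);
        while (m.find()) {
            // A missing count means a single atom.
            int count = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            // Repeated symbols are summed (an assumption about the desired behaviour).
            counts.merge(m.group(1), count, Integer::sum);
        }
        return counts;
    }

    // Rebuilds the formula string; a count of 1 is written without a number.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey());
            if (e.getValue() != 1) {
                sb.append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = parse("C9H11NO2");
        counts.replaceAll((element, n) -> n + 1); // hypothetical change: one more of each element
        System.out.println(format(counts)); // C10H12N2O3
    }
}
```

A LinkedHashMap is used so that the elements come back out in the order they were typed.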

Related

finding out if the characters of a string exist in another string with the same order or not using regex in java

I want to write a program in Java, using regex, that reads two strings from the input (the first one is shorter than the second one). If the characters of the first string appear in the second string in the same order, it outputs "true"; otherwise it outputs "false". The characters do not need to be next to each other (this is not a substring check). Here's an example:
example1:
input:
phantom
pphvnbajknzxcvbnatopopoim
output:
true
in the above example it is obvious we can see the word "phantom" in the second string (the characters are in the same order)
example2:
input:
apple
fgayiypvbnltsrgte
output:
false
As you can see, apple does not exist in the second string under the conditions I mentioned earlier, so it outputs false.
import java.util.Scanner;
public class Main {
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
String word1 = input.next();
String word2 = input.next();
String pattern = "";
int n = word1.length();
char[] word1CharArr = word1.toCharArray();
for ( int i = 0 ; i < n ; i++) {
pattern += "[:alnum:]" +word1CharArr[i]+"[:alnum:]";
// pattern += ".*\\b|\\B" +word1CharArr[i]+"\\b|\\B";
}
pattern = "^" + pattern + "$";
// pattern = "(?s)" + pattern + ".*";
// System.out.println(pattern);
System.out.println(word2.matches(pattern));
}
}
Here is what I did. I broke my first string into its characters and want to put a regex before and after each character to build the pattern. I have searched a lot about regex and how to use it, but I still have a problem here. The part I have commented out comes from one of my searches, but it did not work.
I emphasize that I want to solve it with regex, not any other way.

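[:alnum:] isn't a thing in Java regexes; the POSIX class is spelled \p{Alnum}, and a bare [:alnum:] is just a character class containing ':', 'a', 'l', 'n', 'u' and 'm'. Even if it were, it would match exactly one character, not 'any number, from 0 to infinitely many of them'.
You just want phantom with .* around and between the letters: ^.*p.*h.*a.*n.*t.*o.*m.*$ is all you need. After all, phantom 'fits', and so does paahaanaataaoaamaa: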
String pattern = word1.chars()
.mapToObj(c -> ".*" + (char) c)
.collect(Collectors.joining()) + ".*";
should get the job done.
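As a complete program, the idea above might be sketched as follows. The class name SubsequenceDemo and the method name containsInOrder are made up for illustration. Two things go beyond the answer and are assumptions on my part: each character is wrapped in Pattern.quote() so that input such as "." or "*" is treated literally, and Pattern.DOTALL is set so that .* can also cross line breaks.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SubsequenceDemo {
    // True if the characters of 'word' occur in 'text' in the same order,
    // with any number of other characters before, between and after them.
    static boolean containsInOrder(String word, String text) {
        String regex = word.chars()
                // Quote each character so regex metacharacters are matched literally.
                .mapToObj(c -> Pattern.quote(String.valueOf((char) c)))
                // Builds .*c1.*c2.* ... .*cn.*
                .collect(Collectors.joining(".*", ".*", ".*"));
        return Pattern.compile(regex, Pattern.DOTALL).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsInOrder("phantom", "pphvnbajknzxcvbnatopopoim")); // true
        System.out.println(containsInOrder("apple", "fgayiypvbnltsrgte"));           // false
    }
}
```

Because matches() already requires the whole input to match, the ^ and $ anchors can be left out.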

Java check if a scan input is both a specific string and only three digits

I have a project that is asking, "Order is entered by the user. The order either begins with FB or SB and then has three digits after those letters. Must check to be sure the order number is either letter code and only three digits." in java.
ex.
Create order number [FB or SB for type of gift and three integers]: FB343
I'm struggling to find how to validate both in one input.

That looks like a regular expression to me. You can use a Pattern and Matcher to test if the given order matches the Pattern; does it start with F or S then B and then three digits. Like,
String[] arr = { "SB123", "FB124", "CBXXX", "FB1234" };
Pattern p = Pattern.compile("[SF]B\\d{3}");
for (String s : arr) {
Matcher m = p.matcher(s);
System.out.printf("%s %b%n", s, m.matches());
}
Outputs
SB123 true
FB124 true
CBXXX false
FB1234 false
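Since the question is about validating what the user types, here is a small sketch that applies the same pattern to Scanner input and keeps asking until the order number is valid. The class name OrderPrompt, the method name readOrder and the retry message are assumptions for illustration only.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class OrderPrompt {
    // F or S, then B, then exactly three digits.
    private static final Pattern ORDER = Pattern.compile("[SF]B\\d{3}");

    // Reads lines until one is a valid order number; returns null if the input ends first.
    static String readOrder(Scanner in) {
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (ORDER.matcher(line).matches()) {
                return line;
            }
            System.out.println("Invalid order number, try again.");
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.print("Create order number [FB or SB for type of gift and three integers]: ");
        String order = readOrder(new Scanner(System.in));
        System.out.println("Accepted: " + order);
    }
}
```

Taking the Scanner as a parameter makes the method easy to try with a fixed string instead of the keyboard.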

Regex should do the trick.
Just run Pattern.matches() with the regex ^((FB)|(SB))([0-9]{3})$
Something like
static boolean isValidOrder(String order) {
return Pattern.matches("^((FB)|(SB))([0-9]{3})$", order);
}

So to further enhance the Order Number validation:
String orderNumber = "fb323"; // The order number.
int minNumber = 100; // The minimum value an order number can contain.
int maxNumber = 2500; // The maximum value an order number can contain.
// Check the format first so that parseInt() only ever sees 3 or 4 digits.
boolean valid = orderNumber.matches("(?i)[SF]B\\d{3,4}");
if (valid) {
int curNumber = Integer.parseInt(orderNumber.replaceAll("\\D", ""));
valid = curNumber >= minNumber && curNumber <= maxNumber;
}
if (valid) {
System.out.println("VALID!");
}
else {
System.out.println("INVALID!");
}

Remove all the dots but not in numbers - Java

I am trying to remove every . in a string except the ones inside numbers like 1.02.
I have a string:
String rM = "51.3L of water is provided. 23.3L is used.";
If I use rM.replaceAll() then every dot will be replaced. I want my string to be:
51.3L of water is provided 23.3L is used
Is it possible to do this in Java?

I am not a java developer but can you try it with a pattern like below.
rM = rM.replaceAll("(?<=[a-z\\s])\\.", "");

replaceAll() with the right regex can do it for you.
This uses a negative look-behind and a negative look-ahead to match only a '.' that has no digit directly before it and no digit directly after it. Strings are immutable, so assign the result:
rM = rM.replaceAll("(?<!\\d)\\.(?!\\d)", "");

Yes, it's possible. Something like the following should work. The regex just checks whether the element starts with a character 0-9. If yes, the element is left unchanged. If no, every . in it is replaced with the empty string.
String rM = "51.3L of water is provided. 23.3L is used.";
String[] tokens = rM.split(" ");
StringBuffer buffer = new StringBuffer();
for (String element : tokens) {
if (element.matches("[0-9]+.*")) {
buffer.append(element + " ");
} else {
buffer.append(element.replace(".", "") + " ");
}
}
System.out.println(buffer.toString());
Output:
51.3L of water is provided 23.3L is used

Here's a simple approach that assumes you want to get rid of dots that are placed directly after a char which isn't a whitespace.
The following code basically splits the sentence by whitespace(s) and removes trailing dots in every resulting character sequence and joins them afterwards to a single String again.
public static void main(String[] args) {
// example sentence
String rM = "51.3L of water is provided. 23.3L is used.";
// split the sentence by whitespace(s)
String[] parts = rM.split("\\s+");
// go through all the parts
for (int i = 0; i < parts.length; i++) {
// check if one of the parts ends with a dot
if (parts[i].endsWith(".")) {
// if it does, replace that part by itself minus the trailing dot
parts[i] = parts[i].substring(0, parts[i].length() - 1);
}
}
// join the parts to a sentence String again
String removedUndesiredDots = String.join(" ", parts);
// and print that
System.out.println(removedUndesiredDots);
}
The output is
51.3L of water is provided 23.3L is used

Using negative lookahead you can use \.(?![\d](\.[\d])?).
private static final String DOTS_NO_NUM_REGEX = "\\.(?![\\d](\\.[\\d])?)";
private static final Pattern PATTERN = Pattern.compile(DOTS_NO_NUM_REGEX);
public static void main(String[] args)
{
String s = "51.3L of water is provided. 23.3L is used.";
String replaced = PATTERN.matcher(s).replaceAll("");
System.out.println(replaced);
}
Output:
51.3L of water is provided 23.3L is used
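The answers above differ in how they treat a dot that touches a digit on only one side, such as the full stop in "I used 5." A possible variant, assuming the rule "keep a dot only when there is a digit on both sides of it", is sketched below. The class name DotRemovalDemo and the method name removeStrayDots are hypothetical, and the rule itself is my assumption about the intent; it would also strip the dot from a number written as ".5".

```java
import java.util.regex.Pattern;

class DotRemovalDemo {
    // Matches a dot with no digit before it, or a dot with no digit after it.
    // A dot with digits on both sides (as in 51.3) matches neither alternative.
    private static final Pattern STRAY_DOT = Pattern.compile("(?<!\\d)\\.|\\.(?!\\d)");

    static String removeStrayDots(String text) {
        return STRAY_DOT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeStrayDots("51.3L of water is provided. 23.3L is used."));
        // 51.3L of water is provided 23.3L is used
        System.out.println(removeStrayDots("I used 5. You used 3.5."));
        // I used 5 You used 3.5
    }
}
```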

Get certain data from text - Java

I am creating a Bukkit plugin for Minecraft and I need to know a few things before I move on.
I want to check whether a text has this layout: "B:10 S:5", for example.
It stands for Buy:amount and Sell:amount.
What is the easiest way to check that it follows the syntax?
Each amount can be any number that is 0 or over.
Another problem is getting this data out of the text. How can I read what comes after B: and S: and return it as an integer?
I have not tried anything yet because I have no idea where to start.
Thanks for the help!

In the simple problem you gave, you can get away with a simple answer. Otherwise, see the regex answer below.
boolean test(String str){
try{
//String str = "B:10 S:5";
String[] arr = str.split(" ");//split to left and right of space = [B:10,S:5]
String[] bArr = arr[0].split(":");//split ...first colon = [B,10]
String[] sArr = arr[1].split(":");//... second colon = [S,5]
//need to use try/catch here in case the string is not an int value.
String labelB = bArr[0];
Integer b = Integer.parseInt(bArr[1]);
String labelS = sArr[0];
Integer s = Integer.parseInt(sArr[1]);
}catch( Exception e){return false;}
return true;
}
See my answer here for a related task. More related details below.
How can I parse a string for a set?
Essentially, you need to use regex and iterate through the groups. Just in case the grammar is not always B and S, I made this more abstract. Also, in case there are extra white spaces in the middle for whatever reason, I made that broader too. The pattern says there are 4 groups (indicated by parentheses): label1, number1, label2, and number2. + means 1 or more. [] means a set of characters. a-z is a range of characters (don't put anything between A-Z and a-z). There are also other ways of showing alpha and numeric patterns, but these are easier to read.
//compiling a Pattern is expensive, so do it once and reuse it
Pattern p=Pattern.compile("([A-Za-z]+):([0-9]+)[ ]+([A-Za-z]+):([0-9]+)");
boolean test(String txt){
Matcher m=p.matcher(txt);
if(!m.matches())return false;
int groups=m.groupCount();//equals 4 here; group 0 (the whole match) is not included in the count
System.out.println("Matched: " + m.group(0));
//Label1 = m.group(1);
//val1 = m.group(2);
//Label2 = m.group(3);
//val2 = m.group(4);
return true;
}

Use Regular Expression.
In your case, ^B:(\d+) S:(\d+)$ is enough.
In Java, to use a regular expression:
import java.util.regex.Pattern;
public class RegExExample {
public static void main(String[] args) {
// backslashes must be doubled inside a Java string literal
Pattern p = Pattern.compile("^B:(\\d+) S:(\\d+)$");
for (int i = 0; i < args.length; i++)
if (p.matcher(args[i]).matches())
System.out.println( "ARGUMENT #" + i + " IS VALID!");
else
System.out.println( "ARGUMENT #" + i + " IS INVALID!");
}
}
This sample program takes its inputs from the command line, validates each one against the pattern and prints the result to STDOUT.
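The second half of the question, returning the amounts as integers, can be covered with the capture groups of the same pattern. A minimal sketch follows, assuming exactly one space between the two parts. The class name ShopSignDemo, the method name parse and the choice of returning null for invalid text are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShopSignDemo {
    // Group 1 is the buy amount, group 2 is the sell amount.
    private static final Pattern SIGN = Pattern.compile("B:(\\d+) S:(\\d+)");

    // Returns {buy, sell}, or null if the text does not follow the "B:<n> S:<n>" layout.
    static int[] parse(String text) {
        Matcher m = SIGN.matcher(text);
        if (!m.matches()) {
            return null;
        }
        try {
            return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
        } catch (NumberFormatException e) {
            // Only digits got this far, so this means the number is too large for an int.
            return null;
        }
    }

    public static void main(String[] args) {
        int[] amounts = parse("B:10 S:5");
        System.out.println("buy=" + amounts[0] + " sell=" + amounts[1]); // buy=10 sell=5
    }
}
```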

Getting match of Group with Asterisk?

How can I get the content for a group with an asterisk?
For example, I'd like to parse a comma-separated list, e.g. 1,2,3,4,5.
private static final String LIST_REGEX = "^(\\d+)(,\\d+)*$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);
public static void main(String[] args) {
final String list = "1,2,3,4,5";
final Matcher matcher = LIST_PATTERN.matcher(list);
System.out.println(matcher.matches());
for (int i = 0, n = matcher.groupCount(); i < n; i++) {
System.out.println(i + "\t" + matcher.group(i));
}
}
And the output is
true
0 1,2,3,4,5
1 1
How can I get every single entry, i. e. 1, 2, 3, ...?
I am searching for a common solution. This is only a demonstrative example.
Please imagine a more complicated regex like ^\\[(\\d+)(,\\d+)*\\]$ to match a list like [1,2,3,4,5]

You can use String.split().
for (String segment : "1,2,3,4,5".split(","))
System.out.println(segment);
Or you can match each entry repeatedly with find():
Pattern pattern = Pattern.compile("(\\d+),?");
Matcher m = pattern.matcher("1,2,3,4,5");
while (m.find())
System.out.println(m.group(1));
For your second example you added you can do a similar match.
for (String segment : "!!!!![1,2,3,4,5] //"
.replaceFirst("^\\D*(\\d+(?:,\\d+)*)\\D*$", "$1")
.split(","))
System.out.println(segment);
I made an online code demo. I hope this is what you wanted.
How can I get all the matches (zero, one or more) for an arbitrary group with an asterisk (xyz)*? [The group is repeated and I would like to get every repeated capture.]
No, you cannot. Regex Capture Groups and Back-References tells why:
The Returned Value for a Given Group is the Last One Captured
Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
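The behaviour described in the quote is easy to observe in Java, and the usual workaround is to validate the whole input once and then collect the entries with a second, simpler pattern. The sketch below does both; the class and method names are made up for illustration, and requiring the surrounding brackets is an assumption based on the [1,2,3,4,5] example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    private static final Pattern WHOLE_LIST = Pattern.compile("\\[\\d+(?:,\\d+)*\\]");
    private static final Pattern ENTRY = Pattern.compile("\\d+");

    // Shows that a quantified group only keeps its last capture.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("([A-Z]_)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Validates the whole list first, then collects each entry with find().
    static List<String> entries(String list) {
        if (!WHOLE_LIST.matcher(list).matches()) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(list);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("A_B_C_D_")); // D_
        System.out.println(entries("[1,2,3,4,5]"));  // [1, 2, 3, 4, 5]
    }
}
```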

I assume you may be looking for something like the following, this will handle both of your examples.
private static final String LIST_REGEX = "^\\[?(\\d+(?:,\\d+)*)\\]?$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);
public static void main(String[] args) {
final String list = "[1,2,3,4,5]";
final Matcher matcher = LIST_PATTERN.matcher(list);
matcher.find();
int i = 0;
String[] vals = matcher.group(1).split(",");
System.out.println(matcher.matches());
System.out.println(i + "\t" + matcher.group(1));
for (String x : vals) {
i++;
System.out.println(i + "\t" + x);
}
}
Output
true
0 1,2,3,4,5
1 1
2 2
3 3
4 4
5 5
