Get certain data from text - Java

I am creating a Bukkit plugin for Minecraft and need to sort out a few things before I move on.
I want to check whether a text has this layout: "B:10 S:5", for example.
It stands for Buy:amount and Sell:amount.
What is the easiest way to check that it follows this syntax?
The amounts can be any number that is 0 or greater.
The other problem is getting this data out of the text. How can I read what comes after B: and S: and return it as an integer?
I have not tried anything yet because I have no idea where to start.
Thanks for the help!

In the simple problem you gave, you can get away with a simple answer. Otherwise, see the regex answer below.
boolean test(String str){
try{
//String str = "B:10 S:5";
String[] arr = str.split(" ");//split to left and right of space = [B:10,S:5]
String[] bArr = arr[0].split(":");//split ...first colon = [B,10]
String[] sArr = arr[1].split(":");//... second colon = [S,5]
//need to use try/catch here in case the string is not an int value.
String labelB = bArr[0];
Integer b = Integer.parseInt(bArr[1]);
String labelS = sArr[0];
Integer s = Integer.parseInt(sArr[1]);
}catch( Exception e){return false;}
return true;
}
See my answer here for a related task. More related details below.
How can I parse a string for a set?
Essentially, you need to use regex and iterate through the groups. Just in case the grammar is not always B and S, I made this more abstract. Also, in case there are extra white spaces in the middle for whatever reason, I made that broader too. The pattern says there are 4 groups (indicated by parentheses): label1, number1, label2, and number2. + means 1 or more. [] means a set of characters. a-z is a range of characters (don't put anything between A-Z and a-z). There are also other ways of writing alphabetic and numeric patterns, but these are easier to read.
//compiling a pattern is expensive, so do it once and reuse it
Pattern p=Pattern.compile("([A-Za-z]+):([0-9]+)[ ]+([A-Za-z]+):([0-9]+)");
boolean test(String txt){
Matcher m=p.matcher(txt);
if(!m.matches())return false;
int groups=m.groupCount();//should equal 4 here; groupCount() does not count group 0 (the whole match), but you can test this
System.out.println("Matched: " + m.group(0));
//Label1 = m.group(1);
//val1 = m.group(2);
//Label2 = m.group(3);
//val2 = m.group(4);
return true;
}
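The question also asks how to get the two amounts back as integers. Here is a minimal sketch of that step, assuming the fixed B/S layout from the question. The class name ShopLine and the method parse are made up for illustration and are not part of Bukkit; returning null for invalid input is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Bukkit API.
class ShopLine {
    // Compiled once; named groups keep the extraction readable.
    private static final Pattern LINE =
            Pattern.compile("B:(?<buy>\\d+)\\s+S:(?<sell>\\d+)");

    /** Returns {buy, sell}, or null if the text does not follow the layout. */
    static int[] parse(String text) {
        Matcher m = LINE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        try {
            return new int[] {
                Integer.parseInt(m.group("buy")),
                Integer.parseInt(m.group("sell"))
            };
        } catch (NumberFormatException e) {
            // Only digits, but too large to fit in an int.
            return null;
        }
    }

    public static void main(String[] args) {
        int[] values = parse("B:10 S:5");
        System.out.println(values[0] + " / " + values[1]); // 10 / 5
        System.out.println(parse("B:-1 S:5") == null);     // true
    }
}
```

Because \d+ only accepts digits, a negative amount such as B:-1 is rejected by the pattern itself, which matches the "0 or over" requirement.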

Use a regular expression.
In your case, ^B:(\d+) S:(\d+)$ is enough.
In Java, to use a regular expression:
import java.util.regex.Pattern;

public class RegExExample {
    public static void main(String[] args) {
        // Backslashes must be doubled inside a Java string literal.
        Pattern p = Pattern.compile("^B:(\\d+) S:(\\d+)$");
        for (int i = 0; i < args.length; i++)
            if (p.matcher(args[i]).matches())
                System.out.println("ARGUMENT #" + i + " IS VALID!");
            else
                System.out.println("ARGUMENT #" + i + " IS INVALID!");
    }
}
This sample program takes its inputs from the command line, validates each one against the pattern, and prints the result to STDOUT.

Related

finding out if the characters of a string exist in another string with the same order or not using regex in java

I want to write a program in Java using regex that reads two strings from the input (the first one is shorter than the second one). If the characters of the first string appear in the second string in the same order, without needing to be next to each other (so it is not a substring), it outputs "true"; if not, it outputs "false". Here are two examples:
example1:
input:
phantom
pphvnbajknzxcvbnatopopoim
output:
true
In the above example we can see the word "phantom" in the second string (the characters are in the same order).
example2:
input:
apple
fgayiypvbnltsrgte
output:
false
As you can see, "apple" does not exist in the second string under the conditions I mentioned earlier, so it outputs false.
import java.util.Scanner;
public class Main {
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
String word1 = input.next();
String word2 = input.next();
String pattern = "";
int n = word1.length();
char[] word1CharArr = word1.toCharArray();
for ( int i = 0 ; i < n ; i++) {
pattern += "[:alnum:]" +word1CharArr[i]+"[:alnum:]";
// pattern += ".*\\b|\\B" +word1CharArr[i]+"\\b|\\B";
}
pattern = "^" + pattern + "$";
// pattern = "(?s)" + pattern + ".*";
// System.out.println(pattern);
System.out.println(word2.matches(pattern));
}
}
Here is what I did: I broke the first string into its characters and want to put a regex before and after each character to build the pattern. I have searched a lot about regex and how to use it, but I still have a problem here. The commented-out part comes from one of my searches, but it did not work.
I emphasize that I want to solve it with regex, not any other way.

[:alnum:] isn't a thing in Java regex; the POSIX class is written \p{Alnum}. Even if it were, it would match exactly one character, not 'any number of them, from 0 to infinitely many'.
You just want phantom with .* in between: ^.*p.*h.*a.*n.*t.*o.*m.*$ is all you need. After all, phantom 'fits', and so does paahaanaataaoaamaa:
String pattern = word1.chars()
.mapToObj(c -> ".*" + (char) c)
.collect(Collectors.joining()) + ".*";
should get the job done.
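One caveat when the pattern is built from user input: characters such as . or * in the first word would be read as regex syntax rather than as literal characters. A possible variant, assuming the same "characters in order" semantics, wraps each character in Pattern.quote. The class and method names here are illustrative only.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative names; the technique is the same ".*c.*" chain as above.
class SubsequenceRegex {
    // Builds ".*c1.*c2 ... .*cn.*" with every character quoted literally.
    static String buildPattern(String word) {
        return word.chars()
                .mapToObj(c -> ".*" + Pattern.quote(String.valueOf((char) c)))
                .collect(Collectors.joining()) + ".*";
    }

    static boolean isSubsequence(String word, String text) {
        // String.matches already requires the whole input to match,
        // so explicit ^ and $ anchors are not needed.
        return text.matches(buildPattern(word));
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("phantom", "pphvnbajknzxcvbnatopopoim")); // true
        System.out.println(isSubsequence("apple", "fgayiypvbnltsrgte"));           // false
    }
}
```

Note that . does not match line terminators by default, so this sketch assumes single-line input like the examples in the question.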

Replace characters and keep only one of these characters

Can someone help me here? I don't understand where the problem is.
I need to check whether a String contains a char such as 'a' more than once. If so, I need to replace every 'a' with an empty space, but I still want to keep exactly one 'a'.
String text = "aaaasomethingsomethingaaaa";
for (char c: text.toCharArray()) {
if (c == 'a') {
count_A++;//8
if (count_A > 1) {//yes
//app crash at this point
do {
text.replace("a", "");
} while (count_A != 1);
}
}
}
The application stops working when it enters the do/while loop. Any suggestions? Thank you very much!

If you want to replace every a in the string except for the last one then you may try the following regex option:
String text = "aaaasomethingsomethingaaaa";
text = text.replaceAll("a(?=.*a)", " ");
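Every a except the last one becomes a space, so the result is four spaces, then somethingsomething, then three spaces and the final a.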
Edit:
If you really want to remove every a except for the last one, then use this:
String text = "aaaasomethingsomethingaaaa";
text = text.replaceAll("a(?=.*a)", "");

You can also do it like
String str = new String ("asomethingsomethingaaaa");
int firstIndex = str.indexOf("a");
firstIndex++;
String firstPart = str.substring(0, firstIndex);
String secondPart = str.substring(firstIndex);
System.out.println(firstPart + secondPart.replace("a", ""));
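The same idea can be wrapped in a small reusable method. This is only a sketch: the names KeepFirst and keepFirstOnly are made up for illustration, and it keeps the first occurrence of the character (unlike the lookahead regex in the other answer, which keeps the last one).

```java
// Illustrative helper; keeps the first occurrence of a character and drops the rest.
class KeepFirst {
    static String keepFirstOnly(String text, char c) {
        int first = text.indexOf(c);
        if (first < 0) {
            return text; // the character does not occur, nothing to remove
        }
        String head = text.substring(0, first + 1); // up to and including the first c
        String tail = text.substring(first + 1);    // everything after it
        // String.replace returns a new string; the original is never modified.
        return head + tail.replace(String.valueOf(c), "");
    }

    public static void main(String[] args) {
        System.out.println(keepFirstOnly("aaaasomethingsomethingaaaa", 'a')); // asomethingsomething
    }
}
```

The comment about String.replace is also the reason the loop in the question never terminates: text.replace("a", "") returns a new string that is discarded, and count_A is never changed inside the do/while.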

Maybe I'm wrong here, but I have a feeling you're talking about runs of any single character within a string. If this is the case, then you can use a little method like this (note that {2,} means it only collapses runs of three or more identical letters):
public String removeCharacterRuns(String inputString) {
return inputString.replaceAll("([a-zA-Z])\\1{2,}", "$1");
}
To use this method:
String text = "aaaasomethingsomethingaaaa";
System.out.println(removeCharacterRuns(text));
The console output is:
asomethingsomethinga
Or perhaps even:
String text = "FFFFFFFourrrrrrrrrrrty TTTTTwwwwwwooo --> is the answer to: "
+ "The Meeeeeaniiiing of liiiiife, The UUUniveeeerse and "
+ "Evvvvverything.";
System.out.println(removeCharacterRuns(text));
The console output is:
Fourty Two --> is the answer to: The Meaning of life, The Universe and Everything.
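The regular expression used in the removeCharacterRuns() method was borrowed from the answers to another SO post.
Regular expression explanation: ([a-zA-Z]) captures one letter, \1{2,} matches two or more further repetitions of that same letter, and $1 replaces the whole run with the single captured letter.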

How to split a string by every other separator

There's a string
String str = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
How do I split it into strings like this?
"ggg;ggg;"
"nnn;nnn;"
"aaa;aaa;"
"xxx;xxx;"

Using Regex
String input = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
Pattern p = Pattern.compile("([a-z]{3});\\1;");
Matcher m = p.matcher(input);
while (m.find())
// m.group(0) is the result
System.out.println(m.group(0));
Will output
ggg;ggg;
nnn;nnn;
aaa;aaa;
xxx;xxx;
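If the segments are not always three lowercase letters, or the two segments of a pair are not always identical, the same find-loop can be used with a looser pattern. The following sketch assumes only that every segment ends with a semicolon; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper: groups the input two segments at a time.
class PairSplitter {
    // Two consecutive segments of any length, each terminated by ';'.
    private static final Pattern PAIR = Pattern.compile("[^;]*;[^;]*;");

    static List<String> splitPairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group()); // group() with no argument is the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitPairs("ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;"));
        // [ggg;ggg;, nnn;nnn;, aaa;aaa;, xxx;xxx;]
    }
}
```

A trailing segment without a partner (an odd number of segments) is silently dropped by this sketch; whether that is acceptable depends on the input.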

I assume that you only want to check whether the last segment is similar, not every segment that has been read.
If that is not the case, then you would probably have to use an ArrayList instead of a Stack.
I also assumed that each segment has the format /([a-z])\1\1/.
If that is not the case either, then you should replace the condition of the inner if statement with:
(stack.peek().substring(0,index).equals(temp))
public static Stack<String> splitString(String text, char split) {
Stack<String> stack = new Stack<String>();
int index = text.indexOf(split);
while (index != -1) {
String temp = text.substring(0, index);
if (!stack.isEmpty()) {
if (stack.peek().charAt(0) == temp.charAt(0)) {
temp = stack.pop() + split + temp;
}
}
stack.push(temp);
text = text.substring(index + 1);
index = text.indexOf(split);
}
return stack;
}

Split and join them. This version uses Guava (Splitter, Iterables.partition and Joiner).
public static void main(String[] args) throws Exception {
String data = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
String del = ";";
int splitSize = 2;
StringBuilder sb = new StringBuilder();
for (Iterable<String> iterable : Iterables.partition(Splitter.on(del).split(data), splitSize)) {
sb.append("\"").append(Joiner.on(del).join(iterable)).append(";\"");
}
sb.delete(sb.length()-3, sb.length());
System.out.println(sb.toString());
}
Ref : Split a String at every 3rd comma in Java

Use split with a regex:
String data="ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
String [] array=data.split("(?<=\\G\\S\\S\\S;\\S\\S\\S);");
\S: a non-whitespace character.
\G: the end of the previous match (or the start of the string); anchoring the look-behind there means only every second semicolon is treated as a delimiter.
(?<=...): a positive look-behind; the semicolon only matches when the text before it fits the pattern inside.

Another answer, one that only works given your specific example input.
You see, in your example, there are two similarities:
All patterns seem to have exactly three characters
All patterns occur exactly twice
In other words: if those two properties really hold for all your input, you could avoid splitting, as you know exactly what to find at each position of your string.
Of course, following the other answers for "real" splitting is more flexible; but (theoretically) you could just do a bunch of substring calls to access all elements directly.
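The substring idea can be sketched as follows, assuming (as stated above) that every segment is exactly three characters plus a semicolon, so each pair is 8 characters wide. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: cuts the text into fixed-width pieces without any regex.
class FixedWidthChunks {
    static List<String> chunks(String text, int width) {
        List<String> out = new ArrayList<>();
        // Only full-width chunks are taken; a shorter remainder is ignored.
        for (int start = 0; start + width <= text.length(); start += width) {
            out.add(text.substring(start, start + width));
        }
        return out;
    }

    public static void main(String[] args) {
        // 8 = two segments of 3 characters, each followed by ';'
        System.out.println(chunks("ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;", 8));
        // [ggg;ggg;, nnn;nnn;, aaa;aaa;, xxx;xxx;]
    }
}
```

This breaks as soon as a segment has a different length, which is exactly the trade-off described above.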

Splitting string in between two characters in Java

I am currently attempting to interpret some code I wrote for something. The information I would like to split looks something like this:
{hey=yes}TEST
What I am trying to accomplish is splitting the above string between '}' and 'T' (T could be any letter). The result I am after is (in pseudocode):
["{hey=yes}", "TEST"]
How would one go about doing so? I know basic regex, but have never gotten into using it to split strings in between letters before.
Update:
In order to split the string I am using the String.split method. Do tell if there is a better way to go about doing this.

You can use String's split method, as follows:
String str = "{hey=foo}TEST";
String[] split = str.split("(?<=})");
System.out.println(split[0] + ", " + split[1]);
It splits the string and prints this:
{hey=foo}, TEST
(?<=}) is a look-behind that splits right after the } character and keeps that character. By default, if you just split on a character, it is removed by the split.
This other answer provides a complete explanation of all options when using the split method:
how-to-split-string-with-some-separator-but-without-removing-that-separator-in-j

Using a regexp for such a small piece of code can be really slow if it is repeated thousands of times (e.g. when analysing Alfresco metadata for a lot of documents).
Look at this snippet:
String s = "{key=value}SOMETEXT";
String[] e = null;
long now = 0L;
now = new Date().getTime();
for (int i = 0; i < 3000000; i++) {
e = s.split("(?<=})");
}
System.out.println("Regexp: " + (new Date().getTime() - now));
now = new Date().getTime();
for (int i = 0; i < 3000000; i++) {
int idx = s.indexOf('}') + 1;
e = new String[] { s.substring(0, idx), s.substring(idx) };
}
System.out.println("IndexOf:" + (new Date().getTime() - now));
The result is:
Regexp: 2544
IndexOf:113
This means that the regexp is more than 20 times slower than the (simpler) substring approach in this run. Keep it in mind: it can make the difference between efficient code and elegant (!) code.
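Part of the regex cost in a loop like this comes from String.split building a Pattern on each call for a non-trivial regex (at least in the OpenJDK implementation). A middle ground, sketched here without any timing claims, is to compile the pattern once and reuse it. The class name is illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative helper: same look-behind split, but the Pattern is compiled only once.
class PrecompiledSplit {
    private static final Pattern AFTER_BRACE = Pattern.compile("(?<=\\})");

    static String[] split(String s) {
        return AFTER_BRACE.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("{key=value}SOMETEXT")));
        // [{key=value}, SOMETEXT]
    }
}
```

Whether this closes much of the gap to indexOf/substring would have to be measured on the actual workload.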

If you're looking for a regex approach and also want some validation that input follows the expected syntax you probably want something like this:
public List<String> splitWithRegexp(String string)
{
Matcher matcher = Pattern.compile("(\\{.*\\})(.*)").matcher(string);
if (matcher.find())
return Arrays.asList(matcher.group(1), matcher.group(2));
else
throw new IllegalArgumentException("Input didn't match!");
}
The parentheses in the regexp define capture groups, which you can access with matcher.group(n) calls. Group 0 is the whole match.

Splitting up input using regular expressions in Java

I am making a program that lets a user input a chemical formula, for example C9H11N02. When they enter it, I want to split it into pieces such as C9, H11, N, 02. Once it is split, I want to make changes to it, for example turning it into C10H12N203, and then put it back together. This is what I have done so far. With the regular expression I have used I can extract the integer values, but how would I go about getting C10, H11, etc.?
System.out.println("Enter Data");
Scanner k = new Scanner( System.in );
String input = k.nextLine();
String reg = "\\s\\s\\s";
String [] data;
data = input.split( reg );
int m = Integer.parseInt( data[0] );
int n = Integer.parseInt( data[1] );

It can be done using look arounds:
String[] parts = input.split("(?<=.)(?=[A-Z])");
Look arounds are zero-width, non-consuming assertions.
This regex splits the input where the two look arounds match:
(?<=.) means "there is a preceding character" (ie not at the start of input)
(?=[A-Z]) means "the next character is a capital letter" (All elements start with A-Z)
Here's a test, including a double-character symbol for some edge cases:
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
String[] parts = input.split("(?<=.)(?=[A-Z])");
System.out.println(Arrays.toString(parts));
}
Output:
[C9, Kr, Br2, H11, N, O2]
If you then wanted to split up the individual components, use a nested call to split():
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
for (String component : input.split("(?<=.)(?=[A-Z])")) {
// split on non-digit/digit boundary
String[] symbolAndNumber = component.split("(?<!\\d)(?=\\d)");
String element = symbolAndNumber[0];
// elements without numbers won't be split
String count = symbolAndNumber.length == 1 ? "1" : symbolAndNumber[1];
System.out.println(element + " x " + count);
}
}
Output:
C x 9
Kr x 1
Br x 2
H x 11
N x 1
O x 2

Did you accidentally put zeroes into some of those formulas where the letter "O" (oxygen) was supposed to be? If so:
"C10H12N2O3".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C10, H12, N2, O3]
"CH2BrCl".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C, H2, Br, Cl]

I believe the following code should allow you to extract the various elements and their associated count. Of course, brackets make things more complicated, but you didn't ask about them!
Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String element = matcher.group(1);
int count = 1;
// group(2) is the empty string when the element has no explicit count
if (!matcher.group(2).isEmpty()) {
count = Integer.parseInt(matcher.group(2));
}
// Do stuff with this component
}
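The question also wants to change the counts and put the formula back together. One way to do that, sketched under the assumption that the formula has no brackets and that oxygen is written with the letter O, is to collect the counts in an insertion-ordered map and format it again. The class Formula and its methods are hypothetical names for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; assumes a flat formula without brackets.
class Formula {
    private static final Pattern COMPONENT = Pattern.compile("([A-Z][a-z]*)(\\d*)");

    // Parses e.g. "C9H11NO2" into {C=9, H=11, N=1, O=2}, keeping element order.
    static Map<String, Integer> parse(String formula) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = COMPONENT.matcher(formula);
        while (m.find()) {
            // A missing number means a count of 1.
            int count = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), count, Integer::sum);
        }
        return counts;
    }

    // Rebuilds the formula string, omitting counts of 1.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey());
            if (e.getValue() != 1) {
                sb.append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = parse("C9H11NO2");
        for (String element : new String[] {"C", "H", "N", "O"}) {
            counts.merge(element, 1, Integer::sum); // add one atom of each
        }
        System.out.println(format(counts)); // C10H12N2O3
    }
}
```

Characters that do not fit the pattern (for example brackets or a leading digit) are simply skipped by find(), so real input would need extra validation.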
