Java: regular expressions to parse the "edges" of a string

Java novice here.
Say I'm given a string:
===This 銳is a= stri = ng身===
How would I use pattern-matching to efficiently figure out how many "=" signs there are at the edges of "This 銳is a= stri = ng身"?
Also, I'm trying to use Java escape sequences such as \G, but apparently they don't compile.

I personally probably wouldn't use a regex for this, but ... this is what works:
Matcher m = Pattern.compile("^(=+).+[^=](=+)$").matcher("===Som=e=Text====");
m.find();
int count = m.group(1).length() + m.group(2).length();
System.out.println(count);
(Note that this does no error checking and assumes there are = signs on both ends.)
Edit to add: here's one that works whether or not there are = signs on either end:
public static int equalsCount(String source)
{
int count = 0;
Matcher m = Pattern.compile("^(=+)?.+[^=](=+)?$").matcher(source);
if (m.find())
{
count += m.group(1) == null ? 0 : m.group(1).length();
count += m.group(2) == null ? 0 : m.group(2).length();
}
return count;
}
public static void main(String[] args)
{
System.out.println(equalsCount("===Some=tex=t="));
System.out.println(equalsCount("===Some=tex=t"));
System.out.println(equalsCount("Some=tex=t="));
System.out.println(equalsCount("Some=tex=t"));
}
On the other hand ... you could avoid the regex and do:
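String myString = "==blah=";
int count = 0;
int i = 0;
// Count leading '='; the bounds check guards against strings made only of '='
while (i < myString.length() && myString.charAt(i) == '=')
{
count++;
i++;
}
int j = myString.length() - 1;
// Count trailing '=' without re-counting the leading run
while (j >= i && myString.charAt(j) == '=')
{
count++;
j--;
}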

If you want to count the number of occurrences of "=" at the edges, strip the leading and trailing runs and compare lengths:
int count = str.length() - str.replaceAll("^=+|=+$", "").length();

Here is one possible answer:
public static void main(String[] args) {
int count = 0;
String str = "===This is a= stri = ng===";
Pattern edgeEq = Pattern.compile("=");
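// Lookarounds match only runs of "=" that have a non-"=" character on both sides
Pattern wordEq = Pattern.compile("(?<=[^=])=+(?=[^=])");
Matcher edgeMatch = edgeEq.matcher(str);
while (edgeMatch.find()) {
count++;
}
Matcher wordMatch = wordEq.matcher(str);
while (wordMatch.find()) {
// Subtract the whole inner run, not just 1 per match
count -= wordMatch.group().length();
}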
System.out.println(count);
}
This will help you find the number of = on the edges of the string.

Assuming there are always the same number of = at the start as at the end:
import java.util.regex.*;
Matcher m = Pattern.compile("^=*").matcher(s);
int count = m.find()? m.group(0).length(): 0;

Use the following code
String s1 = "===This 銳is a= stri = ng身===";
System.out.println("Length : "+s1.length());
Pattern p = Pattern.compile("^=+");
Matcher m = p.matcher(s1);
int count = 0;
while (m.find())
{
count = m.group().length();
System.out.println("Group : "+m.group());
}
p = Pattern.compile("(=+)$");
m = p.matcher(s1);
while (m.find())
{
count += m.group().length();
System.out.println("End Group : "+m.group());
}
System.out.println("Total : " + count);

If = at the edges are balanced you can use
^(=+).*\1$
Group1's length is the length of = at the edges
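As a rough sketch of how that balanced-edges pattern could be used from Java: the backreference has to be written as "\\1" inside a string literal, because a single backslash followed by an unrecognized character is rejected by the compiler. The class and method names here (BalancedEdges, edgeLength) are made up for illustration, and the approach assumes the edges really are balanced; with unbalanced input the group backtracks to the longest run that fits both ends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BalancedEdges {
    // In a Java string literal the regex backreference \1 is written "\\1".
    private static final Pattern BALANCED = Pattern.compile("^(=+).*\\1$");

    /** Returns the length of the "=" run on each edge, or 0 if the pattern does not match. */
    static int edgeLength(String s) {
        Matcher m = BALANCED.matcher(s);
        return m.matches() ? m.group(1).length() : 0;
    }

    public static void main(String[] args) {
        System.out.println(edgeLength("===This 銳is a= stri = ng身===")); // 3
        System.out.println(edgeLength("no edges"));                      // 0
    }
}
```

The total number of "=" signs on both edges is then twice the returned value.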

Related

Finding Count of Pattern in a String (Overlap Inclusive)

So I'm trying to write an algorithm that counts the number of occurrences of some pattern, say "aa", within a string, say "aaabca." The number of patterns in that string should return an integer, in this case 2, because the first three characters contain two occurrences of the pattern.
What I have counts the occurrences under the assumption that they do NOT overlap:
public class Pattern{
public static void main(String[] args){
Scanner scan = new Scanner(System.in);
System.out.println("Enter the string: ");
String s = scan.nextLine();
String[] splittedInput = s.split(";");
String pattern = splittedInput[0];
String blobs = splittedInput[1];
Pattern p = new Pattern();
p.count(pattern, blobs);
}
public static void count(String pattern, String blobs){
String[] substrings = blobs.split("[|]");
int numOccurences = 0;
int[] instances = new int[substrings.length];
int patternLength = pattern.length();
for (int i = 0; i < instances.length; i++){
int length = substrings[i].length();
String temp = substrings[i];
temp = temp.replaceAll(pattern, "");
int postLength = temp.length();
numOccurences = (length - postLength) / pattern.length();
instances[i] = numOccurences;
numOccurences = 0;
}
int sum = 0;
for (int i = 0; i < instances.length; i++){
System.out.print(instances[i] + "|");
sum += instances[i];
}
System.out.print(sum);
}
}
Any suggestions?

I would personally compare the pattern as a substring in this case. For example a run of a single String from your array would look like this:
//Initial values
String blobs = "aaaabcaaa";
String pattern = "aa";
String[] substrings = blobs.split("[|]");
//The code I added, which should be placed into the loop
int numOccurences = 0;
String str = substrings[0];
for (int k = 0; k <= (str.length() - pattern.length()); k++)
{
if (str.substring(k, k + pattern.length()).equals(pattern))
{
numOccurences++;
}
}
System.out.println(numOccurences);
If you want to run this on each String in your array simply modify String str = substrings[0] to String str = substrings[i] and iterate over the array storing the final numOccurences as you please.
Example Run:
String is aaaabcaaa
Pattern is aa
Output is 5 occurrences

For one String, match is the String you're looking for:
int len = theStr.length ();
int start = 0;
int pos;
int count = 0;
while ((start < len) && ((pos = theStr.indexOf (match, start)) >= 0))
{
++count;
start = pos + 1;
}
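The loop above can be wrapped in a small reusable method. This is only a sketch: the names OverlapCount and countOverlapping are invented here, and treating an empty search string as zero matches is an assumption, not something the original snippet specifies.

```java
class OverlapCount {
    /** Counts occurrences of match in text, allowing occurrences to overlap. */
    static int countOverlapping(String text, String match) {
        if (match.isEmpty()) {
            return 0; // assumption: an empty search string counts as no matches
        }
        int count = 0;
        int pos = text.indexOf(match);
        while (pos >= 0) {
            count++;
            // Resume one character after the last hit so overlapping hits are found too
            pos = text.indexOf(match, pos + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aaabca", "aa"));    // 2
        System.out.println(countOverlapping("aaaabcaaa", "aa")); // 5
    }
}
```

Advancing by 1 instead of by match.length() is what makes the count overlap-inclusive.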

If you use Java 8 you can count this value in the following way.
Example:
String blobs = "aaabcaaa";
String pattern = "aa";
// Test every start index at which the pattern could still fit; works for any pattern length
long count = IntStream.rangeClosed(0, blobs.length() - pattern.length())
.filter(index -> blobs.startsWith(pattern, index))
.count();
System.out.println("Result count: " + count);

Continually taking substrings and using the startsWith method seems to work pretty well.
String pat = "ss";
String str = "kskslsksaaaslsslssskssssllsssss";
int count = 0;
while (str.length() >= pat.length()) {
count += str.startsWith(pat) ? 1 : 0;
str = str.substring(1);
}
System.out.println("count = " + count);
You can also take a similar approach with streams.
long count = IntStream.range(0, str.length()).mapToObj(
n -> str.substring(n)).filter(n -> n.startsWith(pat)).count();
System.out.println("count = " + count);
But in this case I actually prefer the non-stream approach.

parsing/converting task with characters and numbers within

It is necessary to repeat the character, as many times as the number behind it.
They are positive integer numbers.
case #1
input: "abc3leson11"
output: "abccclesonnnnnnnnnnn"
I have already solved it in the following way:
String a = "abbc2kd3ijkl40ggg2H5uu";
String s = a + "*";
String numS = "";
int cnt = 0;
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
if (Character.isDigit(ch)) {
numS = numS + ch;
cnt++;
} else {
cnt++;
try {
for (int j = 0; j < Integer.parseInt(numS); j++) {
System.out.print(s.charAt(i - cnt));
}
if (i != s.length() - 1 && !Character.isDigit(s.charAt(i + 1))) {
System.out.print(s.charAt(i));
}
} catch (Exception e) {
if (i != s.length() - 1 && !Character.isDigit(s.charAt(i + 1))) {
System.out.print(s.charAt(i));
}
}
cnt = 0;
numS = "";
}
}
But I wonder whether there is a better solution with less code that is also cleaner.

Could you take a look at the code below? It uses StringUtils from Apache Commons Lang to repeat the character:
public class MicsTest {
public static void main(String[] args) {
String input = "abc3leson11";
String output = input;
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(input);
while (m.find()) {
int number = Integer.valueOf(m.group());
char repeatedChar = input.charAt(m.start()-1);
// The character itself stays in the output, so only number - 1 extra copies are needed
output = output.replaceFirst(m.group(), StringUtils.repeat(repeatedChar, number - 1));
}
System.out.println(output);
}
}
If you don't want to use StringUtils, you can use the custom method below to achieve the same effect:
public static String repeat(char c, int times) {
char[] chars = new char[times];
Arrays.fill(chars, c);
return new String(chars);
}

Using plain Java string regex should make it more terse, as follows:
public class He1 {
private static final Pattern pattern = Pattern.compile("[a-zA-Z]+(\\d+).*");
// match the first number (in the middle or at the end) using regex;
public static void main(String... args) {
String s = "abc3leson11";
System.out.println(parse(s));
s = "abbc2kd3ijkl40ggg2H5uu";
System.out.println(parse(s));
}
private static String parse(String s) {
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
int num = Integer.valueOf(matcher.group(1));
char prev = s.charAt(s.indexOf(String.valueOf(num)) - 1);
// locate the char before the number;
String repeated = new String(new char[num-1]).replace('\0', prev);
// since the prev is not deleted, we have to decrement the repeating number by 1;
s = s.replaceFirst(String.valueOf(num), repeated);
matcher = pattern.matcher(s);
}
return s;
}
}
And the output should be:
abccclesonnnnnnnnnnn
abbcckdddijkllllllllllllllllllllllllllllllllllllllllggggHHHHHuu

String g(String a){
String result = "";
String[] array = a.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
//System.out.println(java.util.Arrays.toString(array));
for(int i=0; i<array.length; i++){
String part = array[i];
result += part;
if(++i == array.length){
break;
}
char charToRepeat = part.charAt(part.length() - 1);
result += repeat(charToRepeat+"", Integer.parseInt(array[i]) - 1);
}
return result;
}
// In Java 11 this could be removed and replaced with the builtin `str.repeat(amount)`
String repeat(String str, int amount){
return new String(new char[amount]).replace("\0", str);
}
Explanation:
The split will split the letters and numbers:
abbc2kd3ijkl40ggg2H5uu would become ["abbc", "2", "kd", "3", "ijkl", "40", "ggg", "2", "H", "5", "uu"]
We then loop over the parts and add any strings as is to the result.
We then increase i by 1 first and if we're done (after the "uu") in the array above, it will break the loop.
If not the increase of i will put us at a number. So it will repeat the last character of the part x amount of times, where x is the number we found minus 1.
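A different technique from the split shown above, sketched here for comparison, is to let the regex engine pair each character with the number that follows it and build the output with Matcher.appendReplacement. The class and method names (ExpandCounts, expand) are hypothetical, and the sketch assumes every number directly follows the character it applies to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpandCounts {
    // One non-digit character followed by its repeat count.
    private static final Pattern CHAR_AND_COUNT = Pattern.compile("(\\D)(\\d+)");

    static String expand(String input) {
        Matcher m = CHAR_AND_COUNT.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int times = Integer.parseInt(m.group(2));
            StringBuilder run = new StringBuilder();
            for (int i = 0; i < times; i++) {
                run.append(m.group(1));
            }
            // quoteReplacement stops '$' and '\' in the input from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(run.toString()));
        }
        m.appendTail(out); // copy whatever follows the last number
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("abc3leson11"));
        System.out.println(expand("abbc2kd3ijkl40ggg2H5uu"));
    }
}
```

Because each match replaces the character and its count together, there is no need to subtract 1 from the count or to search the string again for the number.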

Here is another solution:
String str = "abbc2kd3ijkl40ggg2H5uu";
String[] part = str.split("(?<=\\d)(?=\\D)|(?=\\d)(?<=\\D)");
String res = "";
for(int i=0; i < part.length; i++){
if(i%2 == 0){
res = res + part[i];
}else {
res = res + StringUtils.repeat(part[i-1].charAt(part[i-1].length()-1),Integer.parseInt(part[i])-1);
}
}
System.out.println(res);

Yet another solution :
public static String getCustomizedString(String input) {
ArrayList<String > letters = new ArrayList<>(Arrays.asList(input.split("(\\d)")));
letters.removeAll(Arrays.asList(""));
ArrayList<String > digits = new ArrayList<>(Arrays.asList(input.split("(\\D)")));
digits.removeAll(Arrays.asList(""));
for(int i=0; i< digits.size(); i++) {
int iteration = Integer.valueOf(digits.get(i));
String letter = letters.get(i);
char c = letter.charAt(letter.length()-1);
for (int j = 0; j<iteration -1 ; j++) {
letters.set(i,letters.get(i).concat(String.valueOf(c)));
}
}
String finalResult = "";
for (String str : letters) {
finalResult += str;
}
return finalResult;
}
The usage:
public static void main(String[] args) {
String testString1 = "abbc2kd3ijkl40ggg2H5uu";
String testString2 = "abc3leson11";
System.out.println(getCustomizedString(testString1));
System.out.println(getCustomizedString(testString2));
}
And the result:
abbcckdddijkllllllllllllllllllllllllllllllllllllllllggggHHHHHuu
abccclesonnnnnnnnnnn

How to split string at every nth occurrence of character in Java

I would like to split a string at every 4th occurrence of a comma ,.
How to do this? Below is an example:
String str = "1,,,,,2,3,,1,,3,,";
Expected output:
array[0]: 1,,,,
array[1]: ,2,3,,
array[2]: 1,,3,,
I tried using Google Guava like this:
Iterable<String> splitdata = Splitter.fixedLength(4).split(str);
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
I also tried this:
String [] splitdata = str.split("(?<=\\G.{" + 4 + "})");
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
Yet this is not the output I want. I just want to split the string at every 4th occurrence of a comma.
Thanks.

Use two int variables. The first counts commas: increment it each time a ',' is seen and reset it to 0 when it reaches 4. The second marks where the current piece starts: it begins at 0, and once a piece has been cut, the position just after its last comma becomes the start of the next piece. Cut each piece with substring(startPoint, i+1) (i+1 because the end index of substring is exclusive and the comma at i belongs to the piece), then add it to the list. Here is some sample code; I hope it helps.
String str = "1,,,,,2,3,,1,,3,,";
int k = 0;
int startPoint = 0;
ArrayList<String> arrayList = new ArrayList<>();
for (int i = 0; i < str.length(); i++)
{
if (str.charAt(i) == ',')
{
k++;
if (k == 4)
{
String ab = str.substring(startPoint, i+1);
System.out.println(ab);
arrayList.add(ab);
startPoint = i+1;
k = 0;
}
}
}

Here's a more flexible function, using an idea from this answer:
static List<String> splitAtNthOccurrence(String input, int n, String delimiter) {
List<String> pieces = new ArrayList<>();
// *? is the reluctant quantifier
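// Pattern.quote keeps delimiters such as "." or "|" from being read as regex syntax
String regex = Strings.repeat(".*?" + Pattern.quote(delimiter), n);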
Matcher matcher = Pattern.compile(regex).matcher(input);
int lastEndOfMatch = -1;
while (matcher.find()) {
pieces.add(matcher.group());
lastEndOfMatch = matcher.end();
}
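// Add the remainder only if something is left after the last full piece
if (lastEndOfMatch != -1 && lastEndOfMatch < input.length()) {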
pieces.add(input.substring(lastEndOfMatch));
}
return pieces;
}
This is how you call it using your example:
String input = "1,,,,,2,3,,1,,3,,";
List<String> pieces = splitAtNthOccurrence(input, 4, ",");
pieces.forEach(System.out::println);
// Output:
// 1,,,,
// ,2,3,,
// 1,,3,,
I use Strings.repeat from Guava.
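If Guava is not on the classpath, roughly the same idea can be sketched with the JDK alone by using a {n} quantifier instead of repeating the pattern text. The names NthSplit and splitAtNth are invented for this example, and it assumes n is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthSplit {
    /** Splits input after every n-th occurrence of delimiter (assumes n >= 1). */
    static List<String> splitAtNth(String input, int n, String delimiter) {
        // (?s) lets '.' match line breaks; {n} repeats "anything up to a delimiter" n times
        Pattern p = Pattern.compile("(?s)(?:.*?" + Pattern.quote(delimiter) + "){" + n + "}");
        Matcher m = p.matcher(input);
        List<String> pieces = new ArrayList<>();
        int end = 0;
        while (m.find()) {
            pieces.add(m.group());
            end = m.end();
        }
        // Keep any leftover text that holds fewer than n delimiters
        if (end < input.length()) {
            pieces.add(input.substring(end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        splitAtNth("1,,,,,2,3,,1,,3,,", 4, ",").forEach(System.out::println);
    }
}
```

Unlike the version above, this sketch returns the whole input as a single piece when it contains fewer than n delimiters.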

You can also try this if you want the result in an array:
String str = "1,,,,,2,3,,1,,3,,";
System.out.println(str);
char c[] = str.toCharArray();
int ptnCnt = 0;
for (char d : c) {
if(d==',')
ptnCnt++;
}
String result[] = new String[ptnCnt/4];
int i=-1;
int beginIndex = 0;
int cnt=0,loopcount=0;
for (char ele : c) {
loopcount++;
if(ele==',')
cnt++;
if(cnt==4){
cnt=0;
result[++i]=str.substring(beginIndex,loopcount);
beginIndex=loopcount;
}
}
for (String string : result) {
System.out.println(string);
}

This works perfectly and was tested in Java 8:
public String[] split(String input,int at){
String[] out = new String[2];
String p = String.format("((?:[^/]*/){%s}[^/]*)/(.*)",at);
Pattern pat = Pattern.compile(p);
Matcher matcher = pat.matcher(input);
if (matcher.matches()) {
out[0] = matcher.group(1);// left
out[1] = matcher.group(2);// right
}
return out;
}
//Ex: D:/folder1/folder2/folder3/file1.txt
//if at = 2, group(1) = D:/folder1/folder2 and group(2) = folder3/file1.txt

The accepted solution above by Saqib Rezwan does not add the leftover string to the list, if it divides the string after every 4th comma and the length of the string is 9 then it will leave the 9th character, and return the wrong list.
A complete solution would be :
private static ArrayList<String> splitStringAtNthOccurrence(String str, int n) {
int k = 0;
int startPoint = 0;
ArrayList<String> list = new ArrayList<>();
for (int i = 0; i < str.length(); i++) {
if (str.charAt(i) == ',') {
k++;
if (k == n) {
String ab = str.substring(startPoint, i + 1);
list.add(ab);
startPoint = i + 1;
k = 0;
}
}
}
// Add whatever remains after the last complete group of n commas
if (startPoint < str.length()) {
list.add(str.substring(startPoint));
}
return list;
}

Counting words from array in a string

I have an array of string say
A=["hello", "you"]
I have a string, say
s="hello, hello you are so wonderful"
I need to count the number of occurrence of strings from A in s.
In this case, the number of occurrences is 3 (2 "hello", 1 "you").
How can I do this efficiently? (A might contain many words, and s might be long in practice.)

Try:
Map<String, Integer> wordCount = new HashMap<>();
for(String a : dictionary) {
wordCount.put(a, 0);
}
// Split on runs of non-word characters so that "hello," yields "hello"
for(String s : text.split("\\W+")) {
Integer count = wordCount.get(s);
if(count != null) {
wordCount.put(s, count + 1);
}
}

public void countMatches() {
String[] A = {"hello", "you"};
String s = "hello, hello you are so wonderful";
String patternString = "(" + StringUtils.join(A, "|") + ")";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(s);
int count = 0;
while (matcher.find()) {
count++;
}
System.out.println(count);
}
Note that StringUtils is from Apache Commons. If you don't want to include an additional jar, you can construct that string with a for loop instead.
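For reference, here is a sketch of the same alternation approach using only the JDK. The names WordMatchCount and countMatches are illustrative. Two things differ from the snippet above and are assumptions about what is wanted: each word is passed through Pattern.quote so regex metacharacters are taken literally, and \b anchors restrict matches to whole words (so "you" inside "your" is not counted). It also assumes the word array is not empty.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordMatchCount {
    /** Counts whole-word occurrences of any entry of words in text (words must be non-empty). */
    static int countMatches(String[] words, String text) {
        // Quote each word, then join the alternatives with '|'
        String alternation = Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile("\\b(?:" + alternation + ")\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] a = {"hello", "you"};
        System.out.println(countMatches(a, "hello, hello you are so wonderful")); // 3
    }
}
```

The text is scanned once regardless of how many words are in the array, which may matter when the array is large.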

HashSet<String> searchWords = new HashSet<String>();
for(String a : dictionary) {
searchWords.add(a);
}
int count = 0;
for(String s : input.split("[ ,]")) {
if(searchWords.contains(s)) {
count++;
}
}

int count =0;
for(int i=0;i<A.length;i++)
{
count = count + s.split(A[i],-1).length - 1;
}
Working Ideone : http://ideone.com/Z9K3JX

This is fully working method with output :)
public static void main(String[] args) {
String[] A={"hello", "you"};
String s= "hello, hello you are so wonderful";
int[] count = new int[A.length];
for (int i = 0; i < A.length; i++) {
count[i] = (s.length() - s.replaceAll(A[i], "").length())/A[i].length();
}
for (int i = 0; i < count.length; i++) {
System.out.println(A[i] + ": " + count[i]);
}
}
What does this line do?
count[i] = (s.length() - s.replaceAll(A[i], "").length())/A[i].length();
The part s.replaceAll(A[i], "") replaces every "hello" in the text with the empty string "".
So I take the length of the whole text, s.length(), subtract the length of the same text without that word, s.replaceAll(A[i], "").length(), and divide the difference by the length of the word, A[i].length().
Sample output for this example :
hello: 2
you: 1

You can use the String Tokenizer
Do something like this:
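String[] A = {"hello", "you"};
String s = "hello, hello you are so wonderful";
int count = 0;
// Treat spaces and commas as delimiters so that "hello," yields "hello"
StringTokenizer st = new StringTokenizer(s, " ,");
while (st.hasMoreTokens()) {
String token = st.nextToken(); // read each token exactly once
for (String word : A) {
if (token.equals(word)) { // compare contents, not references
count++;
}
}
}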

This is what I came up with:
It doesn't create any new objects. It uses String.indexOf(String, int), keeps track of the current index, and increments the occurrence count.
public class SearchWordCount {
public static final void main(String[] ignored) {
String[] searchWords = {"hello", "you"};
String input = "hello, hello you are so wonderful";
for(int i = 0; i < searchWords.length; i++) {
String searchWord = searchWords[i];
System.out.print(searchWord + ": ");
int foundCount = 0;
int currIdx = 0;
while(currIdx != -1) {
currIdx = input.indexOf(searchWord, currIdx);
if(currIdx != -1) {
foundCount++;
currIdx += searchWord.length();
} else {
currIdx = -1;
}
}
System.out.println(foundCount);
}
}
}
Output:
hello: 2
you: 1

Regex pattern matcher

I have a string :
154545K->12345K(524288K)
Suppose I want to extract numbers from this string.
The string contains the group 154545 at position 0, 12345 at position 1 and 524288 at position 2.
Using regex \\d+, I need to extract 12345 which is at position 1.
I am getting the desired result using this :
String lString = "154545K->12345K(524288K)";
Pattern lPattern = Pattern.compile("\\d+");
Matcher lMatcher = lPattern.matcher(lString);
String lOutput = "";
int lPosition = 1;
int lGroupCount = 0;
while(lMatcher.find()) {
if(lGroupCount == lPosition) {
lOutput = lMatcher.group();
break;
}
else {
lGroupCount++;
}
}
System.out.println(lOutput);
But, is there any other simple and direct way to achieve this keeping the regex same \\d+(without using the group counter)?

Try this ($2 refers to the second captured number):
String d1 = "154545K->12345K(524288K)".replaceAll("(\\d+)\\D+(\\d+).*", "$2");

If you expect your number to be at the position 1, then you can use find(int start) method like this
if (lMatcher.find(1) && lMatcher.start() == 1) {
// Found lMatcher.group()
}
You can also convert your loop into a for loop to get rid of some boilerplate code
String lString = "154540K->12341K(524288K)";
Pattern lPattern = Pattern.compile("\\d+");
Matcher lMatcher = lPattern.matcher(lString);
int lPosition = 2;
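boolean found = false;
// Advance to the lPosition-th match (1-based); stop early if the input runs out of matches
for (int i = 0; i < lPosition && (found = lMatcher.find()); i++) {}
if (found) {
System.out.println(lMatcher.group());
}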
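The counting loop can also be hidden in a small helper so that calling code asks directly for the match at a given zero-based position, keeping the regex \\d+ unchanged. This is a sketch only; NthMatch and nthMatch are hypothetical names, not part of java.util.regex.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthMatch {
    /** Returns the match at the given zero-based position, or empty if there are too few matches. */
    static Optional<String> nthMatch(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        for (int i = 0; m.find(); i++) {
            if (i == index) {
                return Optional.of(m.group());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String s = "154545K->12345K(524288K)";
        System.out.println(nthMatch("\\d+", s, 1).orElse("not found")); // 12345
    }
}
```

Returning an Optional avoids calling group() on a matcher whose last find() failed, which would throw an IllegalStateException.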

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
