I would like to split a string at every 4th occurrence of a comma ,.
How to do this? Below is an example:
String str = "1,,,,,2,3,,1,,3,,";
Expected output:
array[0]: 1,,,,
array[1]: ,2,3,,
array[2]: 1,,3,,
I tried using Google Guava like this:
Iterable<String> splitdata = Splitter.fixedLength(4).split(str);
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
I also tried this:
String [] splitdata = str.split("(?<=\\G.{" + 4 + "})");
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
Yet this is not the output I want. I just want to split the string at every 4th occurrence of a comma.
Thanks.
Use two int variables. One counts the commas: each time a ',' is found the count is incremented, and when it reaches 4 it is reset to 0. The other marks where the next piece starts. It begins at 0, and once a piece has been cut, the position just after its last character becomes the start of the next piece. Cut each piece with substring(startPoint, i + 1); the end index is i + 1 because substring excludes its end index and the comma at position i should be part of the piece. Finally, add each piece to the ArrayList. Here is some sample code; I hope it helps.
String str = "1,,,,,2,3,,1,,3,,";
int k = 0;
int startPoint = 0;
ArrayList<String> arrayList = new ArrayList<>();
for (int i = 0; i < str.length(); i++)
{
if (str.charAt(i) == ',')
{
k++;
if (k == 4)
{
String ab = str.substring(startPoint, i+1);
System.out.println(ab);
arrayList.add(ab);
startPoint = i+1;
k = 0;
}
}
}
Here's a more flexible function, using an idea from this answer:
static List<String> splitAtNthOccurrence(String input, int n, String delimiter) {
List<String> pieces = new ArrayList<>();
// *? is the reluctant quantifier
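// Pattern.quote keeps regex metacharacters in the delimiter (".", "|", ...) literal
String regex = Strings.repeat(".*?" + Pattern.quote(delimiter), n);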
Matcher matcher = Pattern.compile(regex).matcher(input);
int lastEndOfMatch = -1;
while (matcher.find()) {
pieces.add(matcher.group());
lastEndOfMatch = matcher.end();
}
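// only add the remainder if something is left after the last complete group
if (lastEndOfMatch != -1 && lastEndOfMatch < input.length()) {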
pieces.add(input.substring(lastEndOfMatch));
}
return pieces;
}
This is how you call it using your example:
String input = "1,,,,,2,3,,1,,3,,";
List<String> pieces = splitAtNthOccurrence(input, 4, ",");
pieces.forEach(System.out::println);
// Output:
// 1,,,,
// ,2,3,,
// 1,,3,,
I use Strings.repeat from Guava.
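If Guava is not on the classpath, the same idea can be sketched with the JDK alone. This is only a minimal sketch: the class name NthOccurrenceSplitter is made up for illustration, n is assumed to be at least 1, and the delimiter is quoted so that it is treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthOccurrenceSplitter {
    // Hypothetical helper: splits after every n-th occurrence of delimiter (n >= 1 assumed).
    static List<String> splitAtNthOccurrence(String input, int n, String delimiter) {
        // Build ".*?<delimiter>" repeated n times; *? is the reluctant quantifier.
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < n; i++) {
            regex.append(".*?").append(Pattern.quote(delimiter));
        }
        Matcher matcher = Pattern.compile(regex.toString()).matcher(input);
        List<String> pieces = new ArrayList<>();
        int lastEnd = 0;
        while (matcher.find()) {
            pieces.add(matcher.group());
            lastEnd = matcher.end();
        }
        // Keep a non-empty remainder after the last complete group.
        if (lastEnd < input.length()) {
            pieces.add(input.substring(lastEnd));
        }
        return pieces;
    }

    public static void main(String[] args) {
        splitAtNthOccurrence("1,,,,,2,3,,1,,3,,", 4, ",").forEach(System.out::println);
    }
}
```

Unlike the version above, this sketch returns the whole input as a single piece when the delimiter occurs fewer than n times.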
Also try this if you want the result in an array:
String str = "1,,,,,2,3,,1,,3,,";
System.out.println(str);
char c[] = str.toCharArray();
int ptnCnt = 0;
for (char d : c) {
if(d==',')
ptnCnt++;
}
String result[] = new String[ptnCnt/4];
int i=-1;
int beginIndex = 0;
int cnt=0,loopcount=0;
for (char ele : c) {
loopcount++;
if(ele==',')
cnt++;
if(cnt==4){
cnt=0;
result[++i]=str.substring(beginIndex,loopcount);
beginIndex=loopcount;
}
}
for (String string : result) {
System.out.println(string);
}
This works perfectly and was tested in Java 8:
public String[] split(String input,int at){
String[] out = new String[2];
String p = String.format("((?:[^/]*/){%s}[^/]*)/(.*)",at);
Pattern pat = Pattern.compile(p);
Matcher matcher = pat.matcher(input);
if (matcher.matches()) {
out[0] = matcher.group(1);// left
out[1] = matcher.group(2);// right
}
return out;
}
//Ex: D:/folder1/folder2/folder3/file1.txt
//if at = 2, group(1) = D:/folder1/folder2 and group(2) = folder3/file1.txt
The accepted solution above by Saqib Rezwan does not add the leftover string to the list. For instance, if it splits after every 4th comma and the string is 9 characters long with the 4th comma as its 8th character, the 9th character is left out and the returned list is wrong.
A complete solution would be:
private static ArrayList<String> splitStringAtNthOccurrence(String str, int n) {
    int k = 0;          // commas seen since the last cut
    int startPoint = 0; // start index of the current piece
    ArrayList<String> list = new ArrayList<>();
    for (int i = 0; i < str.length(); i++) {
        if (str.charAt(i) == ',') {
            k++;
            if (k == n) {
                list.add(str.substring(startPoint, i + 1));
                startPoint = i + 1;
                k = 0;
            }
        }
    }
    // if characters remain after the last complete group, add them to the list
    if (startPoint < str.length()) {
        list.add(str.substring(startPoint));
    }
    return list;
}
Related
So I'm trying to write an algorithm that counts the number of occurrences of some pattern, say "aa", within a string, say "aaabca." The number of patterns in that string should return an integer, in this case 2, because the first three characters contain two occurrences of the pattern.
What I have finds the number of occurrences under the assumption that occurrences of the pattern do NOT overlap:
public class Pattern{
public static void main(String[] args){
Scanner scan = new Scanner(System.in);
System.out.println("Enter the string: ");
String s = scan.nextLine();
String[] splittedInput = s.split(";");
String pattern = splittedInput[0];
String blobs = splittedInput[1];
Pattern p = new Pattern();
p.count(pattern, blobs);
}
public static void count(String pattern, String blobs){
String[] substrings = blobs.split("[|]");
int numOccurences = 0;
int[] instances = new int[substrings.length];
int patternLength = pattern.length();
for (int i = 0; i < instances.length; i++){
int length = substrings[i].length();
String temp = substrings[i];
temp = temp.replaceAll(pattern, "");
int postLength = temp.length();
numOccurences = (length - postLength) / pattern.length();
instances[i] = numOccurences;
numOccurences = 0;
}
int sum = 0;
for (int i = 0; i < instances.length; i++){
System.out.print(instances[i] + "|");
sum += instances[i];
}
System.out.print(sum);
}
}
Any suggestions?
I would personally compare the pattern as a substring in this case. For example a run of a single String from your array would look like this:
//Initial values
String blobs = "aaaabcaaa";
String pattern = "aa";
String[] substrings = blobs.split("[|]");
//The code I added that should be placed into the loop
int numOccurences = 0;
String str = substrings[0];
for (int k = 0; k <= (str.length() - pattern.length()); k++)
{
if (str.substring(k, k + pattern.length()).equals(pattern))
{
numOccurences++;
}
}
System.out.println(numOccurences);
If you want to run this on each String in your array simply modify String str = substrings[0] to String str = substrings[i] and iterate over the array storing the final numOccurences as you please.
Example Run:
String is aaaabcaaa
Pattern is aa
Output is 5 occurrences
For a single String theStr, where match is the String you're looking for:
int len = theStr.length ();
int start = 0;
int pos;
int count = 0;
while ((start < len) && ((pos = theStr.indexOf (match, start)) >= 0))
{
++count;
start = pos + 1;
}
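The loop above can be wrapped in a small method. The following is a minimal sketch; the class and method names are made up for illustration, and treating an empty pattern as zero matches is an assumption, not something the question specifies.

```java
class OverlapCounter {
    // Counts occurrences of match in text, allowing overlaps.
    static int countOverlapping(String text, String match) {
        if (match.isEmpty()) {
            return 0; // assumption: an empty pattern counts as no match
        }
        int count = 0;
        int pos = text.indexOf(match);
        while (pos >= 0) {
            count++;
            // Restart one character after the previous hit so overlaps are found.
            pos = text.indexOf(match, pos + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aaabca", "aa"));    // prints 2
        System.out.println(countOverlapping("aaaabcaaa", "aa")); // prints 5
    }
}
```

Advancing by pos + 1 rather than pos + match.length() is what makes overlapping occurrences count.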
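If you use Java 8, you can count the occurrences in the following way. Note that this pairs each character with its successor, so it only works for two-character patterns.
Example: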
String blobs = "aaabcaaa";
String pattern = "aa";
List<String> strings = Arrays.asList(blobs.split(""));
long count = IntStream.range(0, strings.size())
.mapToObj(index -> index < strings.size() - 1 ? strings.get(index) + strings.get(index + 1) : strings.get(index - 1))
.filter(str -> str.equals(pattern))
.count();
System.out.println("Result count: " + count);
Continually taking substrings and using the startsWith method seems to work pretty well.
String pat = "ss";
String str = "kskslsksaaaslsslssskssssllsssss";
int count = 0;
while (str.length() >= pat.length()) {
count += str.startsWith(pat) ? 1 : 0;
str = str.substring(1);
}
System.out.println("count = " + count);
You can also take a similar approach with streams.
long count = IntStream.range(0, str.length()).mapToObj(
n -> str.substring(n)).filter(n -> n.startsWith(pat)).count();
System.out.println("count = " + count);
But in this case I actually prefer the non-stream approach.
Each character must be repeated as many times as the number that follows it.
The numbers are positive integers.
case #1
input: "abc3leson11"
output: "abccclesonnnnnnnnnnn"
I have already solved it in the following way:
String a = "abbc2kd3ijkl40ggg2H5uu";
String s = a + "*";
String numS = "";
int cnt = 0;
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
if (Character.isDigit(ch)) {
numS = numS + ch;
cnt++;
} else {
cnt++;
try {
for (int j = 0; j < Integer.parseInt(numS); j++) {
System.out.print(s.charAt(i - cnt));
}
if (i != s.length() - 1 && !Character.isDigit(s.charAt(i + 1))) {
System.out.print(s.charAt(i));
}
} catch (Exception e) {
if (i != s.length() - 1 && !Character.isDigit(s.charAt(i + 1))) {
System.out.print(s.charAt(i));
}
}
cnt = 0;
numS = "";
}
}
But I wonder whether there is a better solution with less, cleaner code?
Could you take a look at the code below? I'm using StringUtils from Apache Commons Lang to repeat the character:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils; // assumes Commons Lang 3

public class MicsTest {
    public static void main(String[] args) {
        String input = "abc3leson11";
        String output = input;
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher(input);
        while (m.find()) {
            int number = Integer.parseInt(m.group());
            // assumes the input does not start with a digit
            char repeatedChar = input.charAt(m.start() - 1);
            // the character is already present once, so add number - 1 copies
            output = output.replaceFirst(m.group(), StringUtils.repeat(repeatedChar, number - 1));
        }
        System.out.println(output);
    }
}
In case you don't want to use StringUtils, you can use the custom method below to achieve the same effect:
public static String repeat(char c, int times) {
char[] chars = new char[times];
Arrays.fill(chars, c);
return new String(chars);
}
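For comparison, here is a minimal single-pass sketch that needs neither a regex nor a library. The class and method names are made up for illustration. It assumes a number always follows the character it applies to; a digit at the very start of the input is copied through unchanged, and a count of 0 leaves the character in place once.

```java
class RepeatExpander {
    // Expands "abc3" to "abccc": the number is the total count of the preceding character.
    static String expand(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isDigit(ch) && out.length() > 0) {
                // Read the whole run of digits as one number.
                int j = i;
                while (j < input.length() && Character.isDigit(input.charAt(j))) {
                    j++;
                }
                int count = Integer.parseInt(input.substring(i, j));
                char previous = out.charAt(out.length() - 1);
                // The character is already in the output once, so append count - 1 more.
                for (int k = 1; k < count; k++) {
                    out.append(previous);
                }
                i = j;
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("abc3leson11"));
        System.out.println(expand("abbc2kd3ijkl40ggg2H5uu"));
    }
}
```

Building the result in a StringBuilder avoids repeated replaceFirst calls on a growing string.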
Using plain Java regex should make it more terse, as follows:
public class He1 {
private static final Pattern pattern = Pattern.compile("[a-zA-Z]+(\\d+).*");
// match the number between or the last using regx;
public static void main(String... args) {
String s = "abc3leson11";
System.out.println(parse(s));
s = "abbc2kd3ijkl40ggg2H5uu";
System.out.println(parse(s));
}
private static String parse(String s) {
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
int num = Integer.valueOf(matcher.group(1));
char prev = s.charAt(s.indexOf(String.valueOf(num)) - 1);
// locate the char before the number;
String repeated = new String(new char[num-1]).replace('\0', prev);
// since the prev is not deleted, we have to decrement the repeating number by 1;
s = s.replaceFirst(String.valueOf(num), repeated);
matcher = pattern.matcher(s);
}
return s;
}
}
And the output should be:
abccclesonnnnnnnnnnn
abbcckdddijkllllllllllllllllllllllllllllllllllllllllggggHHHHHuu
String g(String a){
String result = "";
String[] array = a.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
//System.out.println(java.util.Arrays.toString(array));
for(int i=0; i<array.length; i++){
String part = array[i];
result += part;
if(++i == array.length){
break;
}
char charToRepeat = part.charAt(part.length() - 1);
result += repeat(charToRepeat+"", new Integer(array[i]) - 1);
}
return result;
}
// In Java 11 this could be removed and replaced with the builtin `str.repeat(amount)`
String repeat(String str, int amount){
return new String(new char[amount]).replace("\0", str);
}
Explanation:
The split will split the letters and numbers:
abbc2kd3ijkl40ggg2H5uu would become ["abbc", "2", "kd", "3", "ijkl", "40", "ggg", "2", "H", "5", "uu"]
We then loop over the parts and add any strings as is to the result.
We then increment i first; if we are past the end of the array (after the "uu" above), we break out of the loop.
Otherwise the increment puts us at a number, so we repeat the last character of the part x times, where x is the number we found minus 1.
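To see what the split step produces on its own, a small sketch like the following can help. The class and method names are hypothetical; the regex is the one used in the answer.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits at every boundary between a non-digit and a digit (in either order).
    static String[] splitLettersAndNumbers(String s) {
        return s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        // prints [abbc, 2, kd, 3, ijkl, 40, ggg, 2, H, 5, uu]
        System.out.println(Arrays.toString(splitLettersAndNumbers("abbc2kd3ijkl40ggg2H5uu")));
    }
}
```

The lookbehind and lookahead are zero-width, so no characters are consumed by the split and the letters and numbers alternate in the resulting array.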
Here is another solution:
String str = "abbc2kd3ijkl40ggg2H5uu";
String[] part = str.split("(?<=\\d)(?=\\D)|(?=\\d)(?<=\\D)");
String res = "";
for(int i=0; i < part.length; i++){
if(i%2 == 0){
res = res + part[i];
}else {
res = res + StringUtils.repeat(part[i-1].charAt(part[i-1].length()-1),Integer.parseInt(part[i])-1);
}
}
System.out.println(res);
Yet another solution :
public static String getCustomizedString(String input) {
ArrayList<String > letters = new ArrayList<>(Arrays.asList(input.split("(\\d)")));
letters.removeAll(Arrays.asList(""));
ArrayList<String > digits = new ArrayList<>(Arrays.asList(input.split("(\\D)")));
digits.removeAll(Arrays.asList(""));
for(int i=0; i< digits.size(); i++) {
int iteration = Integer.valueOf(digits.get(i));
String letter = letters.get(i);
char c = letter.charAt(letter.length()-1);
for (int j = 0; j<iteration -1 ; j++) {
letters.set(i,letters.get(i).concat(String.valueOf(c)));
}
}
String finalResult = "";
for (String str : letters) {
finalResult += str;
}
return finalResult;
}
The usage:
public static void main(String[] args) {
String testString1 = "abbc2kd3ijkl40ggg2H5uu";
String testString2 = "abc3leson11";
System.out.println(getCustomizedString(testString1));
System.out.println(getCustomizedString(testString2));
}
And the result:
abbcckdddijkllllllllllllllllllllllllllllllllllllllllggggHHHHHuu
abccclesonnnnnnnnnnn
Question:
Given an input String like "1,2,3..6..8,9..11", we have to convert it into "1,2,3,4,5,6,7,8,9,10,11". So basically we have to populate the ranges indicated by the dots. Below is my solution. Is there a better way to solve this? Can we optimize it further?
public class FlattenAString {
public static String flattenAString(String input) {
StringBuilder sbr = new StringBuilder("");
StringBuilder current = new StringBuilder("");
StringBuilder next = new StringBuilder("");
int i = 0;
while (i < input.length()) {
if (input.charAt(i) == '.') {
i = i + 2;
while (i != input.length() && input.charAt(i) != '.' && input.charAt(i) != ',') {
next.append(input.charAt(i));
i++;
}
int currentInt = Integer.parseInt(current.toString());
int nextInt = Integer.parseInt(next.toString());
appendFromCurrentTillPrevToNextInt(currentInt, nextInt, sbr);
current = next;
next = new StringBuilder("");
} else if (input.charAt(i) == ',') {
sbr.append(current);
sbr.append(',');
current = new StringBuilder("");
i++;
} else {
current.append(input.charAt(i));
i++;
}
}
sbr.append(current);
return sbr.toString();
}
private static void appendFromCurrentTillPrevToNextInt(int current, int val, StringBuilder sbr) {
for (int i = current; i < val; i++) {
sbr.append(i);
sbr.append(',');
}
}
}
I would approach this by splitting your input string twice. First, split by comma to get either single numbers or ranges with ellipsis. For single numbers, simply add them to a list. For ranges, do a second split on .. to obtain another list of numbers. Then iterate over the range of each of these pairs to fill in the missing values.
Note one tricky point here is that we need to avoid double counting a number position in a range. This is best explained by example:
3..6..8
For this range, we first add 3, 4, 5, 6. But for the second ellipsis, we begin at 7, and then continue until hitting 8.
String input = "1,2,3..6..8,9..11";
String[] parts = input.split(",");
List<Integer> list = new ArrayList<>();
for (String part : parts) {
if (!part.contains("..")) {
list.add(Integer.parseInt(part));
}
else {
String[] ranges = part.split("\\.\\.");
for (int i=0; i < ranges.length-1; ++i) {
int start = Integer.parseInt(ranges[i]) + (i == 0 ? 0 : 1);
int end = Integer.parseInt(ranges[i+1]);
for (int j=start; j <= end; ++j) list.add(j);
}
}
}
// print list of numbers
for (int i=0; i < list.size(); ++i) {
System.out.print((i > 0 ? ", " : "") + list.get(i));
}
Output:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
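Since the question asks for a String rather than a printed list, the same two-level split can be wrapped in a method. This is a minimal sketch: the class and method names are made up, and it assumes well-formed input with ascending ranges.

```java
import java.util.ArrayList;
import java.util.List;

class RangeFlattener {
    // Expands ranges such as "3..6..8" and joins everything with commas.
    static String flatten(String input) {
        List<String> out = new ArrayList<>();
        for (String part : input.split(",")) {
            String[] bounds = part.split("\\.\\.");
            int previous = Integer.parseInt(bounds[0]);
            out.add(String.valueOf(previous));
            // Each later bound continues from the number after the previous bound,
            // so a shared endpoint such as 6 in "3..6..8" is emitted only once.
            for (int i = 1; i < bounds.length; i++) {
                int end = Integer.parseInt(bounds[i]);
                for (int value = previous + 1; value <= end; value++) {
                    out.add(String.valueOf(value));
                }
                previous = end;
            }
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        System.out.println(flatten("1,2,3..6..8,9..11")); // prints 1,2,3,4,5,6,7,8,9,10,11
    }
}
```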
Try this.
static final Pattern RANGE = Pattern.compile("(\\d+)(\\.\\.(\\d+))+");
static String flattenString(String input) {
StringBuffer sb = new StringBuffer();
StringBuilder temp = new StringBuilder();
Matcher m = RANGE.matcher(input);
while (m.find()) {
int begin = Integer.parseInt(m.group(1));
int end = Integer.parseInt(m.group(3));
temp.setLength(0);
for (int i = begin; i <= end; ++i)
temp.append(",").append(i);
m.appendReplacement(sb, temp.substring(1));
}
m.appendTail(sb);
return sb.toString();
}
I've been really struggling with a programming assignment. Basically, we have to write a program that translates a sentence in English into one in Pig Latin. The first method we need is one to tokenize the string, and we are not allowed to use the Split method usually used in Java. I've been trying to do this for the past 2 days with no luck, here is what I have so far:
public class PigLatin
{
public static void main(String[] args)
{
String s = "Hello there my name is John";
Tokenize(s);
}
public static String[] Tokenize(String english)
{
String[] tokenized = new String[english.length()];
for (int i = 0; i < english.length(); i++)
{
int j= 0;
while (english.charAt(i) != ' ')
{
String m = "";
m = m + english.charAt(i);
if (english.charAt(i) == ' ')
{
j++;
}
else
{
break;
}
}
for (int l = 0; l < tokenized.length; l++) {
System.out.print(tokenized[l] + ", ");
}
}
return tokenized;
}
}
All this does is print an enormously long array of "null"s. If anyone can offer any input at all, I would reallllyyyy appreciate it!
Thank you in advance
Update: We are supposed to assume that there will be no punctuation or extra spaces, so basically whenever there is a space, it's a new word
If I understand your question, and what your Tokenize was intended to do; then I would start by writing a function to split the String
static String[] splitOnWhiteSpace(String str) {
List<String> al = new ArrayList<>();
StringBuilder sb = new StringBuilder();
for (char ch : str.toCharArray()) {
if (Character.isWhitespace(ch)) {
if (sb.length() > 0) {
al.add(sb.toString());
sb.setLength(0);
}
} else {
sb.append(ch);
}
}
if (sb.length() > 0) {
al.add(sb.toString());
}
String[] ret = new String[al.size()];
return al.toArray(ret);
}
and then print using Arrays.toString(Object[]) like
public static void main(String[] args) {
String s = "Hello there my name is John";
String[] words = splitOnWhiteSpace(s);
System.out.println(Arrays.toString(words));
}
If you're allowed to use the StringTokenizer class (which I think is what the assignment is asking for), it would look something like this:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
which will produce the output:
this
is
a
test
The StringTokenizer splits the string into tokens and hands them out one at a time. The while loop iterates over the tokens, which is where you can apply the Pig Latin logic.
Some hints for you to do the "manual splitting" work.
There is a method String#indexOf(int ch, int fromIndex) to help you to find next occurrence of a character
There is a method String#substring(int beginIndex, int endIndex) to extract certain part of a string.
Here is some pseudo-code that shows how to split the string (you will need more safety checks, which I leave to you):
List<String> results = ...;
int startIndex = 0;
int endIndex = 0;
while (startIndex < inputString.length) {
endIndex = get next index of space after startIndex
if no space found {
endIndex = inputString.length
}
String result = get substring of inputString from startIndex to endIndex-1
results.add(result)
startIndex = endIndex + 1 // move startIndex to next position after space
}
// here, results contains all splitted words
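The pseudo-code above could be turned into Java roughly as follows. This is a sketch under the question's stated assumption of single spaces and no punctuation; the class and method names are made up, and the extra check that skips empty words is one of the safety checks the hints leave open.

```java
import java.util.ArrayList;
import java.util.List;

class ManualSplit {
    // Splits on single spaces using only indexOf and substring.
    static List<String> splitOnSpace(String input) {
        List<String> results = new ArrayList<>();
        int startIndex = 0;
        while (startIndex < input.length()) {
            int endIndex = input.indexOf(' ', startIndex);
            if (endIndex < 0) {
                endIndex = input.length(); // no more spaces: take the rest
            }
            // substring excludes endIndex, so the space itself is not copied.
            if (endIndex > startIndex) {
                results.add(input.substring(startIndex, endIndex));
            }
            startIndex = endIndex + 1; // move past the space
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [Hello, there, my, name, is, John]
        System.out.println(splitOnSpace("Hello there my name is John"));
    }
}
```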
String english = "hello my fellow friend";
ArrayList<String> tokenized = new ArrayList<>();
String m = "";
int j = 0; //index for tokenised array list.
for (int i = 0; i < english.length(); i++)
{
//the condition's position do matter here, if you
//change them, english.charAt(i) will give index
//out of bounds exception
while( i < english.length() && english.charAt(i) != ' ')
{
m = m + english.charAt(i);
i++;
}
//add to array list if there is some string
//if its only ' ', array will be empty so we are OK.
if(m.length() > 0 )
{
tokenized.add(m);
j++;
m = "";
}
}
//print the array list
for (int l = 0; l < tokenized.size(); l++) {
System.out.print(tokenized.get(l) + ", ");
}
This prints "hello, my, fellow, friend, ".
I used an ArrayList since the number of words is not known up front.
I have this String:
String string="NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
How can I split it into an array after every 4th comma?
I would like something like this:
String[] a=string.split("d{4}");
a[0]="NNP,PERSON,true,?";
a[1]="IN,O,false,pobj";
a[2]="NNP,ORGANIZATION,true,?";
a[3]="p";
Keep it simple; there is no need for a regex. Simply count the commas, and each time four have been found, use String.substring() to extract the piece.
Finally, store the pieces in an ArrayList<String> instead of printing them.
String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
int count = 0;
int beginIndex = 0; // start of the current piece
int endIndex = 0;   // index of the character being examined
for (char ch : string.toCharArray()) {
    if (ch == ',') {
        count++;
    }
    if (count == 4) {
        // endIndex is the position of the 4th comma, which is not part of the piece
        System.out.println(string.substring(beginIndex, endIndex));
        beginIndex = endIndex + 1;
        count = 0;
    }
    endIndex++;
}
// print whatever is left after the last complete group
if (beginIndex < endIndex) {
    System.out.println(string.substring(beginIndex, endIndex));
}
output:
NNP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p
If you really have to use split you can use something like
String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");
The idea is explained in my previous answer on a similar but simpler topic.
Demo:
String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");
for (String s : array)
System.out.println(s);
output:
NNP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p
But if you don't have to use split and still want to use a regex, I encourage you to use the Pattern and Matcher classes with a simple regex that finds the parts you are interested in, rather than a complicated regex that finds the parts you want to get rid of. I mean something like:
1. any xxx,xxx,xxx,xxx part where x is not ,
2. any xxx or xxx,xxx or xxx,xxx,xxx part placed at the end of the string (to catch the rest of the data left unmatched by point 1).
So
Pattern p = Pattern.compile("[^,]+(,[^,]+){3}|[^,]+(,[^,]+){0,2}$");
should do the trick.
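A minimal sketch of how that pattern might be used with Matcher.find() to collect the pieces is shown below. The class and method names are made up for illustration; the regex is the one given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsOfFour {
    // Four comma-separated fields, or up to three fields at the end of the input.
    static final Pattern GROUP =
            Pattern.compile("[^,]+(,[^,]+){3}|[^,]+(,[^,]+){0,2}$");

    static List<String> split(String input) {
        List<String> pieces = new ArrayList<>();
        Matcher m = GROUP.matcher(input);
        // Each find() resumes after the previous match; the comma between
        // groups cannot start a match, so it is skipped automatically.
        while (m.find()) {
            pieces.add(m.group());
        }
        return pieces;
    }

    public static void main(String[] args) {
        String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
        split(string).forEach(System.out::println);
    }
}
```

Note that [^,]+ requires every field to be non-empty, so input with empty fields (two commas in a row) would need a different pattern.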
Another solution, probably the fastest (and quite easy to write), is to create your own parser. It iterates over the characters of the string and stores them in a buffer while counting the commas seen so far; at every 4th comma it writes the buffer's content to an array (or, better, a dynamic collection such as a list) and clears the buffer. Such a parser can look like this:
public static List<String> parse(String s){
List<String> tokens = new ArrayList<>();
StringBuilder sb = new StringBuilder();
int commaCounter = 0;
for (char ch: s.toCharArray()){
if (ch==',' && ++commaCounter == 4){
tokens.add(sb.toString());
sb.delete(0, sb.length());
commaCounter = 0;
}else{
sb.append(ch);
}
}
if (sb.length()>0)
tokens.add(sb.toString());
return tokens;
}
You can later convert List to array if you need but I would stay with List.
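If an array is needed after all, the conversion is a single call to List.toArray. A minimal sketch, with a made-up class name and a literal list standing in for the parser's result:

```java
import java.util.Arrays;
import java.util.List;

class ListToArray {
    // Copies the list into a new String[] of matching size.
    static String[] toArray(List<String> tokens) {
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("NNP,PERSON,true,?", "IN,O,false,pobj", "p");
        String[] array = toArray(tokens);
        System.out.println(array.length); // prints 3
    }
}
```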
StringTokenizer tizer = new StringTokenizer(string, ",");
int total = tizer.countTokens();
int count = total / 4;         // number of complete groups of four
int overflowCount = total % 4; // tokens left over after the last complete group
String[] a = new String[overflowCount > 0 ? count + 1 : count];
int x = 0;
for (; x < count; x++) {
    a[x] = tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken();
}
if (overflowCount > 0) {
    // join the leftover tokens into the final element
    StringBuilder rest = new StringBuilder(tizer.nextToken());
    while (tizer.hasMoreTokens()) {
        rest.append(',').append(tizer.nextToken());
    }
    a[x] = rest.toString();
}
Try this:
String str = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] arr = str.split(",");
ArrayList<String> result = new ArrayList<String>();
// join the tokens back together in groups of four; the last group may be shorter
for (int i = 0; i < arr.length; i += 4) {
    int end = Math.min(i + 4, arr.length);
    result.add(String.join(",", Arrays.copyOfRange(arr, i, end)));
}
for (String s : result) {
    System.out.println(s);
}
output:
NNP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p