Java String: split String - java

I have this String:
String string="NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
How can I do to split it into an array every 4 commas?
I would like something like this:
String[] a=string.split("d{4}");
a[0]="NNP,PERSON,true,?";
a[1]="IN,O,false,pobj";
a[2]="NNP,ORGANIZATION,true,?";
a[3]="p";

Keep it simple. No need to use regex. Simply count the number of commas. when four commas are found then use String.substring() to find out the value.
Finally store the printed values in ArrayList<String>.
String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
int count = 0;
int beginIndex = 0;
int endIndex = 0;
for (char ch : string.toCharArray()) {
if (ch == ',') {
count++;
}
if (count == 4) {
System.out.println(string.substring(beginIndex + 1, endIndex));
beginIndex = endIndex;
count = 0;
}
endIndex++;
}
if (beginIndex < endIndex) {
System.out.println(string.substring(beginIndex + 1, endIndex));
}
output:
NP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p

If you really have to use split you can use something like
String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");
Explanation if idea in my previous answer on similar but simpler topic
Demo:
String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");
for (String s : array)
System.out.println(s);
output:
NNP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p
But if there is any chance that you don't have to use split but you still want to use regex then I encourage you to use Pattern and Matcher classes to create simple regex which can find parts you are interested in, not complicated regex to find parts you want to get rid of. I mean something like
any xx,xxx,xxx,xxx part where x is not ,
any xx or xx,xx or xxx,xxx,xxx parts if they are placed at the end of string (to catch rest of data unmatched by regex from point 1.)
So
Pattern p = Pattern.compile("[^,]+(,[^,]+){3}|[^,]+(,[^,]+){0,2}$");
should do the trick.
Another solution and probably the fastest (and quite easy to write) would be creating your own parser which will iterate over all characters from your string, store them in some buffer, calculate how many , already occurred and if number is multiplication of 4 clear buffer and write its contend to array (or better dynamic collection like list). Such parser can look like
public static List<String> parse(String s){
List<String> tokens = new ArrayList<>();
StringBuilder sb = new StringBuilder();
int commaCounter = 0;
for (char ch: s.toCharArray()){
if (ch==',' && ++commaCounter == 4){
tokens.add(sb.toString());
sb.delete(0, sb.length());
commaCounter = 0;
}else{
sb.append(ch);
}
}
if (sb.length()>0)
tokens.add(sb.toString());
return tokens;
}
You can later convert List to array if you need but I would stay with List.

StringTokenizer tizer = new StringTokenizer (string,",");
int count = tizer.countTokens ()/4;
int overFlowCount = tizer.countTokens % 4;
String [] a;
if(overflowCount > 0)
a = new String[count +1];
else
a = new String[count];
int x = 0;
for (; x <count; x++){
a[x]= tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken();
}
if(overflowCount > 0)
while(tizer.hasMoreTokens()){
a[x+1] = a[x+1] + tizer.nextToken() + ",";
}

Edited,
Try this:
String str = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] arr = str.split(",");
ArrayList<String> result = new ArrayList<String>();
String s = arr[0] + ",";
int len = arr.length - (arr.length /4) * 4;
int i;
for (i = 1; i <= arr.length-len; i++) {
if (i%4 == 0) {
result.add(s.substring(0, s.length()-1));
s = arr[i] + ",";
}
else
s += arr[i] + ",";
}
s = "";
while (i <= arr.length-1) {
s += arr[i] + ",";
i++;
}
s += arr[arr.length-1];
result.add(s);
output:
NP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p

Related

Splitting a string in an array of strings of limited size

I have a string of a random address like
String s = "H.N.-13/1443 laal street near bharath dental lab near thana qutubsher near modern bakery saharanpur uttar pradesh 247001";
I want to split it into array of string with two conditions:
each element of that array of string is of length less than or equal to 20
No awkward ending of an element of array of string
For example, splitting every 20 characters would produce:
"H.N.-13/1443 laal st"
"reet near bharath de"
"ntal lab near thana"
"qutubsher near moder"
"n bakery saharanpur"
but the correct output would be:
"H.N.-13/1443 laal"
"street near bharath"
"dental lab near"
"thana qutubsher near"
"modern bakery"
"saharanpur"
Notice how each element in string array is less than or equal to 20.
The above is my output for this code:
static String[] split(String s,int max){
int total_lines = s.length () / 24;
if (s.length () % 24 != 0) {
total_lines++;
}
String[] ans = new String[total_lines];
int count = 0;
int j = 0;
for (int i = 0; i < total_lines; i++) {
for (j = 0; j < 20; j++) {
if (ans[count] == null) {
ans[count] = "";
}
if (count > 0) {
if ((20 * count) + j < s.length()) {
ans[count] += s.charAt (20 * count + j);
} else {
break;
}
} else {
ans[count] += s.charAt (j);
}
}
String a = "";
a += ans[count].charAt (0);
if (a.equals (" ")) {
ans[i] = ans[i].substring (0, 0) + "" + ans[i].substring (1);
}
System.out.println (ans[i]);
count++;
}
return ans;
}
public static void main (String[]args) {
String add = "H.N.-13/1663 laal street near bharath dental lab near thana qutubsher near modern bakery";
String city = "saharanpur";
String state = "uttar pradesh";
String zip = "247001";
String s = add + " " + city + " " + state + " " + zip;
String[]ans = split (s);
}
Find all occurrences of up to 20 chars starting with a non-space and ending with a word boundary, and collect them to a List:
List<String> parts = Pattern.compile("\\S.{1,19}\\b").matcher(s)
.results()
.map(MatchResult::group)
.collect(Collectors.toList());
See live demo.
The code is not very clear, but at first glance it seems you are building character by character that is why you are getting the output you see. Instead you go word by word if you want to retain a word and overflow it to next String if necessary. A more promising code would be:
static String[] splitString(String s, int max) {
String[] words = s.split("\s+");
List<String> out = new ArrayList<>();
int numWords = words.length;
int i = 0;
while (i <numWords) {
int len = 0;
StringBuilder sb = new StringBuilder();
while (i < numWords && len < max) {
int wordLength = words[i].length();
len += (i == numWords-1 ? wordLength : wordLength + 1);//1 for space
if (len <= max) {
sb.append(words[i]+ " ");
i++;
}
}
out.add(sb.toString().trim());
}
return out.toArray(new String[] {});
}
Note: It works on your example input, but you may need to tweak it so it works for cases like a long word containing more than 20 characters, etc.

Finding Count of Pattern in a String (Overlap Inclusive)

So I'm trying to write an algorithm that counts the number of occurrences of some pattern, say "aa", within a string, say "aaabca." The number of patterns in that string should return an integer, in this case 2, because the first three characters contain two occurrences of the pattern.
What I have finds the number of patterns under the assumption the existing occurrences of a pattern is NOT overlapping:
public class Pattern{
public static void main(String[] args){
Scanner scan = new Scanner(System.in);
System.out.println("Enter the string: ");
String s = scan.nextLine();
String[] splittedInput = s.split(";");
String pattern = splittedInput[0];
String blobs = splittedInput[1];
Pattern p = new Pattern();
p.count(pattern, blobs);
}
public static void count(String pattern, String blobs){
String[] substrings = blobs.split("[|]");
int numOccurences = 0;
int[] instances = new int[substrings.length];
int patternLength = pattern.length();
for (int i = 0; i < instances.length; i++){
int length = substrings[i].length();
String temp = substrings[i];
temp = temp.replaceAll(pattern, "");
int postLength = temp.length();
numOccurences = (length - postLength) / pattern.length();
instances[i] = numOccurences;
numOccurences = 0;
}
int sum = 0;
for (int i = 0; i < instances.length; i++){
System.out.print(instances[i] + "|");
sum += instances[i];
}
System.out.print(sum);
}
}
Any suggestions?
I would personally compare the pattern as a substring in this case. For example a run of a single String from your array would look like this:
//Initial values
String blobs = "aaaabcaaa";
String pattern = "aab";
String[] substrings = blobs.split("[|]");
//The code I added that should placed into the loop
int numOccurences = 0;
String str = substrings[0];
for (int k = 0; k <= (str.length() - pattern.length()); k++)
{
if (str.substring(k, k + pattern.length()).equals(pattern))
{
numOccurences++;
}
}
System.out.println(numOccurences);
If you want to run this on each String in your array simply modify String str = substrings[0] to String str = substrings[i] and iterate over the array storing the final numOccurences as you please.
Example Run:
String is aaaabcaaa
Pattern is aa
Output is 5 occurences
For one String, match is the String you're looking for:
int len = theStr.length ();
int start = 0;
int pos;
int count = 0;
while ((start < len) && ((pos = theStr.indexOf (match, start)) >= 0))
{
++count;
start = pos + 1;
}
If you use Java 8 you can count this value in the following way.
Example:
String blobs = "aaabcaaa";
String pattern = "aa";
List<String> strings = Arrays.asList(blobs.split(""));
long count = IntStream.range(0, strings.size())
.mapToObj(index -> index < strings.size() - 1 ? strings.get(index) + strings.get(index + 1) : strings.get(index - 1))
.filter(str -> str.equals(pattern))
.count();
System.out.println("Result count: " + count);
Continually taking substrings and using the startsWith method seems to work pretty well.
String pat = "ss";
String str = "kskslsksaaaslsslssskssssllsssss";
int count = 0;
while (str.length() >= pat.length()) {
count += str.startsWith(pat) ? 1 : 0;
str = str.substring(1);
}
System.out.println("count = " + count);
You can also take a similar approach with streams.
long count = IntStream.range(0, str.length()).mapToObj(
n -> str.substring(n)).filter(n -> n.startsWith(pat)).count();
System.out.println("count = " + count);
But in this case I actually prefer the non-stream approach.

Tokenize method: Split string into array

I've been really struggling with a programming assignment. Basically, we have to write a program that translates a sentence in English into one in Pig Latin. The first method we need is one to tokenize the string, and we are not allowed to use the Split method usually used in Java. I've been trying to do this for the past 2 days with no luck, here is what I have so far:
public class PigLatin
{
public static void main(String[] args)
{
String s = "Hello there my name is John";
Tokenize(s);
}
public static String[] Tokenize(String english)
{
String[] tokenized = new String[english.length()];
for (int i = 0; i < english.length(); i++)
{
int j= 0;
while (english.charAt(i) != ' ')
{
String m = "";
m = m + english.charAt(i);
if (english.charAt(i) == ' ')
{
j++;
}
else
{
break;
}
}
for (int l = 0; l < tokenized.length; l++) {
System.out.print(tokenized[l] + ", ");
}
}
return tokenized;
}
}
All this does is print an enormously long array of "null"s. If anyone can offer any input at all, I would reallllyyyy appreciate it!
Thank you in advance
Update: We are supposed to assume that there will be no punctuation or extra spaces, so basically whenever there is a space, it's a new word
If I understand your question, and what your Tokenize was intended to do; then I would start by writing a function to split the String
static String[] splitOnWhiteSpace(String str) {
List<String> al = new ArrayList<>();
StringBuilder sb = new StringBuilder();
for (char ch : str.toCharArray()) {
if (Character.isWhitespace(ch)) {
if (sb.length() > 0) {
al.add(sb.toString());
sb.setLength(0);
}
} else {
sb.append(ch);
}
}
if (sb.length() > 0) {
al.add(sb.toString());
}
String[] ret = new String[al.size()];
return al.toArray(ret);
}
and then print using Arrays.toString(Object[]) like
public static void main(String[] args) {
String s = "Hello there my name is John";
String[] words = splitOnWhiteSpace(s);
System.out.println(Arrays.toString(words));
}
If you're allowed to use the StringTokenizer Object (which I think is what the assignment is asking, it would look something like this:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
which will produce the output:
this
is
a
test
Taken from here.
The string is split into tokens and stored in a stack. The while loop loops through the tokens, which is where you can apply the pig latin logic.
Some hints for you to do the "manual splitting" work.
There is a method String#indexOf(int ch, int fromIndex) to help you to find next occurrence of a character
There is a method String#substring(int beginIndex, int endIndex) to extract certain part of a string.
Here is some pseudo-code that show you how to split it (there are more safety handling that you need, I will leave that to you)
List<String> results = ...;
int startIndex = 0;
int endIndex = 0;
while (startIndex < inputString.length) {
endIndex = get next index of space after startIndex
if no space found {
endIndex = inputString.length
}
String result = get substring of inputString from startIndex to endIndex-1
results.add(result)
startIndex = endIndex + 1 // move startIndex to next position after space
}
// here, results contains all splitted words
String english = "hello my fellow friend"
ArrayList tokenized = new ArrayList<String>();
String m = "";
int j = 0; //index for tokenised array list.
for (int i = 0; i < english.length(); i++)
{
//the condition's position do matter here, if you
//change them, english.charAt(i) will give index
//out of bounds exception
while( i < english.length() && english.charAt(i) != ' ')
{
m = m + english.charAt(i);
i++;
}
//add to array list if there is some string
//if its only ' ', array will be empty so we are OK.
if(m.length() > 0 )
{
tokenized.add(m);
j++;
m = "";
}
}
//print the array list
for (int l = 0; l < tokenized.size(); l++) {
System.out.print(tokenized.get(l) + ", ");
}
This prints, "hello,my,fellow,friend,"
I used an array list since at the first sight the length of the array is not clear.

How to split string at every nth occurrence of character in Java

I would like to split a string at every 4th occurrence of a comma ,.
How to do this? Below is an example:
String str = "1,,,,,2,3,,1,,3,,";
Expected output:
array[0]: 1,,,,
array[1]: ,2,3,,
array[2]: 1,,3,,
I tried using Google Guava like this:
Iterable<String> splitdata = Splitter.fixedLength(4).split(str);
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
I also tried this:
String [] splitdata = str.split("(?<=\\G.{" + 4 + "})");
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
Yet this is is not the output I want. I just want to split the string at every 4th occurrence of a comma.
Thanks.
Take two int variable. One is to count the no of ','. If ',' occurs then the count will move. And if the count is go to 4 then reset it to 0. The other int value will indicate that from where the string will be cut off. it will start from 0 and after the first string will be detected the the end point (char position in string) will be the first point of the next. Use the this start point and current end point (i+1 because after the occurrence happen the i value will be incremented). Finally add the string in the array list. This is a sample code. Hope this will help you. Sorry for my bad English.
String str = "1,,,,,2,3,,1,,3,,";
int k = 0;
int startPoint = 0;
ArrayList<String> arrayList = new ArrayList<>();
for (int i = 0; i < str.length(); i++)
{
if (str.charAt(i) == ',')
{
k++;
if (k == 4)
{
String ab = str.substring(startPoint, i+1);
System.out.println(ab);
arrayList.add(ab);
startPoint = i+1;
k = 0;
}
}
}
Here's a more flexible function, using an idea from this answer:
static List<String> splitAtNthOccurrence(String input, int n, String delimiter) {
List<String> pieces = new ArrayList<>();
// *? is the reluctant quantifier
String regex = Strings.repeat(".*?" + delimiter, n);
Matcher matcher = Pattern.compile(regex).matcher(input);
int lastEndOfMatch = -1;
while (matcher.find()) {
pieces.add(matcher.group());
lastEndOfMatch = matcher.end();
}
if (lastEndOfMatch != -1) {
pieces.add(input.substring(lastEndOfMatch));
}
return pieces;
}
This is how you call it using your example:
String input = "1,,,,,2,3,,1,,3,,";
List<String> pieces = splitAtNthOccurrence(input, 4, ",");
pieces.forEach(System.out::println);
// Output:
// 1,,,,
// ,2,3,,
// 1,,3,,
I use Strings.repeat from Guava.
try this also, if you want result in array
String str = "1,,,,,2,3,,1,,3,,";
System.out.println(str);
char c[] = str.toCharArray();
int ptnCnt = 0;
for (char d : c) {
if(d==',')
ptnCnt++;
}
String result[] = new String[ptnCnt/4];
int i=-1;
int beginIndex = 0;
int cnt=0,loopcount=0;
for (char ele : c) {
loopcount++;
if(ele==',')
cnt++;
if(cnt==4){
cnt=0;
result[++i]=str.substring(beginIndex,loopcount);
beginIndex=loopcount;
}
}
for (String string : result) {
System.out.println(string);
}
This work pefectly and tested in Java 8
public String[] split(String input,int at){
String[] out = new String[2];
String p = String.format("((?:[^/]*/){%s}[^/]*)/(.*)",at);
Pattern pat = Pattern.compile(p);
Matcher matcher = pat.matcher(input);
if (matcher.matches()) {
out[0] = matcher.group(1);// left
out[1] = matcher.group(2);// right
}
return out;
}
//Ex: D:/folder1/folder2/folder3/file1.txt
//if at = 2, group(1) = D:/folder1/folder2 and group(2) = folder3/file1.txt
The accepted solution above by Saqib Rezwan does not add the leftover string to the list, if it divides the string after every 4th comma and the length of the string is 9 then it will leave the 9th character, and return the wrong list.
A complete solution would be :
private static ArrayList<String> splitStringAtNthOccurrence(String str, int n) {
int k = 0;
int startPoint = 0;
ArrayList<String> list = new ArrayList();
for (int i = 0; i < str.length(); i++) {
if (str.charAt(i) == ',') {
k++;
if (k == n) {
String ab = str.substring(startPoint, i + 1);
list.add(ab);
startPoint = i + 1;
k = 0;
}
}
// if there is no comma left and there are still some character in the string
// add them to list
else if (!str.substring(i).contains(",")) {
list.add(str.substring(startPoint));
break;
}
}
return list;
}
}

Java - make new string based on old one and lag

I need to get a new string based on an old one and a lag. Basically, I have a string with the alphabet (s = "abc...xyz") and based on a lag (i.e. 3), the new string should replace the characters in a string I type with the character placed some positions forward (lag). If, let's say, I type "cde" as my string, the output should be "fgh". If any other character is added in the string (apart from space - " "), it should be removed. Here is what I tried, but it doesn't work :
String code = "abcdefghijklmnopqrstuvwxyzabcd"; //my lag is 4 and I added the first 4 characters to
char old; //avoid OutOfRange issues
char nou;
for (int i = 0; i < code.length() - lag; ++i)
{
old = code.charAt(i);
//System.out.print(old + " ");
nou = code.charAt(i + lag);
//System.out.println(nou + " ");
// if (s.indexOf(old) != 0)
// {
s = s.replace(old, nou);
// }
}
I commented the outputs for old and nou (new, but is reserved word) because I have used them only to test if the code from position i to i + lag is working (and it is), but if I uncomment the if statement, it doesn't do anything and I leave it like this, it keeps executing the instructions inside the for statmement for code.length() times, but my string doesn't need to be so long. I have also tried to make the for statement like below, but I got lost.
for (int i = 0; i < s.length(); ++i)
{
....
}
Could you help me with this? Or maybe some advices about how I should think the algorithm?
Thanks!
It doesn't work because, as the javadoc of replace() says:
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
(emphasis mine)
So, the first time you meet an 'a' in the string, you replace all the 'a's by 'd'. But then you go to the next char, and if it's a 'd' that was an 'a' before, you replace it once again, etc. etc.
You shouldn't use replace() at all. Instead, you should simply build a new string, using a StringBuilder, by appending each shifted character of the original string:
String dictionary = "abcdefghijklmnopqrstuvwxyz";
StringBuilder sb = new StringBuilder(input.length());
for (int i = 0; i < input.length(); i++) {
char oldChar = input.charAt(i);
int oldCharPositionInDictionary = dictionary.indexOf(oldChar);
if (oldCharPositionInDictionary >= 0) {
int newCharPositionInDictionary =
(oldCharPositionInDictionary + lag) % dictionary.length();
sb.append(dictionary.charAt(newCharPositionInDictionary));
}
else if (oldChar == ' ') {
sb.append(' ');
}
}
String result = sb.toString();
Try this:
Convert the string to char array.
iterate over each char array and change the char by adding lag
create new String just once (instead of loop) with new String passing char array.
String code = "abcdefghijklmnopqrstuvwxyzabcd";
String s = "abcdef";
char[] ch = s.toCharArray();
char[] codes = code.toCharArray();
for (int i = 0; i < ch.length; ++i)
{
ch[i] = codes[ch[i] - 'a' + 3];
}
String str = new String(ch);
System.out.println(str);
}
My answer is something like this.
It returns one more index to every character.
It reverses every String.
Have a good day!
package org.owls.sof;
import java.util.Scanner;
public class Main {
private static final String CODE = "abcdefghijklmnopqrstuvwxyz"; //my lag is 4 and I added the first 4 characters to
#SuppressWarnings("resource")
public static void main(String[] args) {
System.out.print("insert alphabet >> ");
Scanner scanner = new Scanner(System.in);
String s = scanner.next();
char[] char_arr = s.toCharArray();
for(int i = 0; i < char_arr.length; i++){
int order = CODE.indexOf(char_arr[i]) + 1;
if(order%CODE.length() == 0){
char_arr[i] = CODE.charAt(0);
}else{
char_arr[i] = CODE.charAt(order);
}
}
System.out.println(new String(char_arr));
//reverse
System.out.println(reverse(new String(char_arr)));
}
private static String reverse (String str) {
char[] char_arr = str.toCharArray();
for(int i = 0; i < char_arr.length/2; i++){
char tmp = char_arr[i];
char_arr[i] = char_arr[char_arr.length - i - 1];
char_arr[char_arr.length - i - 1] = tmp;
}
return new String(char_arr);
}
}
String alpha = "abcdefghijklmnopqrstuvwxyzabcd"; // alphabet
int N = alpha.length();
int lag = 3; // shift value
String s = "cde"; // input
StringBuilder sb = new StringBuilder();
for (int i = 0, index; i < s.length(); i++) {
index = s.charAt(i) - 'a';
sb.append(alpha.charAt((index + lag) % N));
}
String op = sb.toString(); // output

Categories

Resources