I have a lengthy string and want to break it up into a number of sub-strings so I can display it in a menu as a paragraph rather than a single long line. But I don't want to break it up in the middle of a word (so a break every n characters won't work).
So I want to break the string up by the first occurrence of any of the characters in a String after a certain point (in my case, the characters would be a space and a semi-colon, but they could be anything).
Something like:
String result[] = breakString(baseString, // String
lineLength, // int
breakChars) // String
Consider splitting by the break chars first and then summing the lengths of the segments that result from that split until you reach your line length.
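A minimal sketch of that idea, assuming each segment keeps its trailing break character and a line is closed as soon as its accumulated length reaches lineLength. The class name LineBreaker and the List return type are illustrative choices, not part of the question:

```java
import java.util.ArrayList;
import java.util.List;

class LineBreaker {
    // Accumulates segments (text up to and including a break character)
    // and closes the current line once it is at least lineLength long.
    static List<String> breakString(String base, int lineLength, String breakChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        int segmentStart = 0;
        for (int i = 0; i < base.length(); i++) {
            if (breakChars.indexOf(base.charAt(i)) >= 0) {
                // add the segment that ends at this break character
                line.append(base, segmentStart, i + 1);
                segmentStart = i + 1;
                if (line.length() >= lineLength) {
                    lines.add(line.toString());
                    line.setLength(0);
                }
            }
        }
        // whatever is left after the last break character forms the final line
        line.append(base, segmentStart, base.length());
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

Under these assumptions, breakString("one two three; four five", 7, " ;") gives the three lines "one two ", "three; " and "four five". Call trim() on each line if the trailing break character should not be displayed.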
Here is one way. I took "by the first occurrence of any of the characters in a String after a certain point" to mean that the next instance of breakChars after a certain lineLength should be the end of a line. So, breakString("aaabc", 2, "b") would return {"aaab", "c"}.
static String[] breakString(String baseString, int lineLength, String breakChars) {
// find `lineLength` or more characters of the String, until the `breakChars` string
Pattern p = Pattern.compile(".{" + lineLength + ",}?" + Pattern.quote(breakChars));
Matcher m = p.matcher(baseString);
List<String> list = new LinkedList<>();
int index = 0;
while (m.find(index)) {
String s = m.group();
list.add(s);
// find another match starting at the end of the last one
index = m.end();
}
// add whatever is left after the last match (even a single trailing character)
if (index < baseString.length()) {
list.add(baseString.substring(index));
}
return list.toArray(new String[list.size()]);
}
There's a string
String str = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
How do I split it into strings like this?
"ggg;ggg;"
"nnn;nnn;"
"aaa;aaa;"
"xxx;xxx;"
Using Regex
String input = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
Pattern p = Pattern.compile("([a-z]{3});\\1;");
Matcher m = p.matcher(input);
while (m.find())
// m.group(0) is the result
System.out.println(m.group(0));
Will output
ggg;ggg;
nnn;nnn;
aaa;aaa;
xxx;xxx;
I assume that you only want to check whether the last segment is similar, not every segment that has been read so far.
If that is not the case, you would probably have to use an ArrayList instead of a Stack.
I also assumed that each segment has the format /([a-z])\1\1/.
If that is not the case either, replace the condition of the inner if statement with:
(stack.peek().substring(0,index).equals(temp))
public static Stack<String> splitString(String text, char split) {
Stack<String> stack = new Stack<String>();
int index = text.indexOf(split);
while (index != -1) {
String temp = text.substring(0, index);
if (!stack.isEmpty()) {
if (stack.peek().charAt(0) == temp.charAt(0)) {
temp = stack.pop() + split + temp;
}
}
stack.push(temp);
text = text.substring(index + 1);
index = text.indexOf(split);
}
return stack;
}
Split and join them (Splitter, Joiner and Iterables below come from Guava).
public static void main(String[] args) throws Exception {
String data = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
String del = ";";
int splitSize = 2;
StringBuilder sb = new StringBuilder();
for (Iterable<String> iterable : Iterables.partition(Splitter.on(del).split(data), splitSize)) {
sb.append("\"").append(Joiner.on(del).join(iterable)).append(";\"");
}
sb.delete(sb.length()-3, sb.length());
System.out.println(sb.toString());
}
Ref : Split a String at every 3rd comma in Java
Use split with a regex:
String data="ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
String [] array=data.split("(?<=\\G\\S\\S\\S;\\S\\S\\S);");
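\S: a non-whitespace character.
\G: the end of the previous match (or the start of the string for the first match). It anchors each look-behind to the point where the previous split ended, so only every second semi-colon is treated as a delimiter.
(?<=...): positive look-behind. The semi-colon is matched only if the text described inside the look-behind comes directly before it.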
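A small sketch to see what that split returns. The class name SplitDemo is made up for illustration. Note that split consumes the matched semi-colon, so the pieces come back without their trailing ';'. Moving the semi-colon inside the look-behind (the second pattern, a variation that is not taken from the answer above) turns it into a zero-width split that keeps it:

```java
import java.util.Arrays;

class SplitDemo {
    // Pattern from the answer: the second ';' of each pair is the delimiter and is dropped.
    static String[] splitDroppingDelimiter(String data) {
        return data.split("(?<=\\G\\S\\S\\S;\\S\\S\\S);");
    }

    // Variation (assumption): zero-width split after every second ';', so nothing is dropped.
    static String[] splitKeepingDelimiter(String data) {
        return data.split("(?<=\\G\\S\\S\\S;\\S\\S\\S;)");
    }

    public static void main(String[] args) {
        String data = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
        System.out.println(Arrays.toString(splitDroppingDelimiter(data)));
        System.out.println(Arrays.toString(splitKeepingDelimiter(data)));
    }
}
```

Both patterns still assume segments of exactly three non-whitespace characters, as in the example input.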
Here is another answer, one that only works given your specific example input.
You see, in your example, there are two similarities:
All patterns seem to have exactly three characters
All patterns occur exactly twice
In other words: if those two properties are really met for all your input, you could avoid splitting - as you know exactly what to find in each position of your string.
Of course, following the other answers for "real" splitting is more flexible; but (theoretically), you could just go ahead and do a bunch of substring calls in order to directly access all elements.
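If those two properties really hold, the substring calls could look roughly like this. It is only a sketch: the class name FixedWidthChunks and the width parameter are hypothetical, and a width of 8 (two three-character patterns plus two semi-colons) is assumed for the example input:

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthChunks {
    // Cuts data into consecutive pieces of `width` characters;
    // the last piece may be shorter if the length is not a multiple of width.
    static List<String> chunk(String data, int width) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < data.length(); i += width) {
            parts.add(data.substring(i, Math.min(i + width, data.length())));
        }
        return parts;
    }
}
```

For the example, chunk("ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;", 8) yields "ggg;ggg;", "nnn;nnn;", "aaa;aaa;" and "xxx;xxx;". No content is inspected at all, so any deviation from the fixed layout silently produces wrong pieces.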
I have code that reads a file containing multiple square brackets [] on one line. I take the value inside each pair of square brackets and replace it with another string. The problem is that I only get the value of the first pair of square brackets on the line; the others are not handled. This is my code:
if (line.contains("[") && line.contains("]")) {
getindex = getIndexContent(line);
}
And the method to get the index value:
String getIndexContent(String str) {
int startIdx = str.indexOf("[");
int endIdx = str.indexOf("]");
String content = str.substring(startIdx + 1, endIdx);
return content;
}
And this is the line containing square brackets that I read from the file:
var[_ii][_ee] = init_value;
I get the _ii value, but how do I get _ee, the value in the second pair of square brackets? I imagine storing the values in an array, but I don't know how.
Thanks.
You can iterate through your String until you have found them all.
You can also make life easier by returning all of them from one method:
List<String> getIndexContent(String str) {
    List<String> list = new ArrayList<String>();
    while (true) {
        int startIdx = str.indexOf("[");
        // look for the closing bracket only after the opening one
        int endIdx = str.indexOf("]", startIdx + 1);
        // stop as soon as either bracket is missing
        if (startIdx == -1 || endIdx == -1) {
            break;
        }
        list.add(str.substring(startIdx + 1, endIdx));
        // continue with the rest of the line
        str = str.substring(endIdx + 1);
    }
    return list;
}
NOTE:
it won't work on nested brackets
You can also do it with regex like this.
Pattern pattern = Pattern.compile("\\[[^\\[.]+?\\]");
String str = "dt = (double[]) obj[i];";
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
You can also get the first and last indices of every match.
matcher.start() and matcher.end() will return the starting index and the ending index of the current match.
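A sketch that ties this back to the original question: a capturing group returns only the text between the brackets, and start(1)/end(1) give the position of that text, which is what a later replacement would need. The pattern and the class name BracketContents are assumptions for illustration, and nested brackets are not handled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketContents {
    // '[' followed by any run of non-bracket characters (captured) and ']'
    private static final Pattern BRACKETS = Pattern.compile("\\[([^\\[\\]]*)\\]");

    static List<String> extract(String line) {
        List<String> contents = new ArrayList<>();
        Matcher m = BRACKETS.matcher(line);
        while (m.find()) {
            contents.add(m.group(1)); // text between the brackets only
        }
        return contents;
    }

    public static void main(String[] args) {
        String line = "var[_ii][_ee] = init_value;";
        Matcher m = BRACKETS.matcher(line);
        while (m.find()) {
            // start(1)/end(1) delimit the captured content inside the line
            System.out.println(m.group(1) + " at " + m.start(1) + ".." + m.end(1));
        }
    }
}
```

For the line from the question, extract returns the two values _ii and _ee in order.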
indexOf takes an optional positional argument for the starting point of your search. If you set that to your end index, endIdx, plus one, it will find the second occurrence of the brackets.
int startIdx2 = str.indexOf("[", endIdx + 1);
int endIdx2 = str.indexOf("]", endIdx + 1);
I want to retrieve the last word before the last semi-colon in a sentence; please see the following.
String met = "This is the string with comma numberone; string having second wedaweda; last word";
String laststringa = met.substring(met.lastIndexOf(";")-1);
if(laststringa != null)
{
System.out.println(laststringa);
}else{
System.out.println("No Words");
}
I am getting a strange result:
a; last word
For the above string, I should get
wedaweda
as the last word before the last semi-colon.
That character is a semi-colon (not a comma), and the call to lastIndexOf() gives you the end of your match; you need a second call to lastIndexOf() to get the start of your match. Something like,
String met = "This is the string with comma numberone; string having second wedaweda; last word";
int lastIndex = met.lastIndexOf(";");
int prevIndex = met.lastIndexOf(";", lastIndex - 1);
String laststringa = met.substring(prevIndex + 1, lastIndex).trim();
if (laststringa != null) {
System.out.println(laststringa);
} else {
System.out.println("No Words");
}
Output is
string having second wedaweda
To then get the last word, you could split on \\s+ (a regular expression matching one, or more, white-space characters) like
if (laststringa != null) {
String[] arr = laststringa.split("\\s+");
System.out.println(arr[arr.length - 1]);
} else {
System.out.println("No Words");
}
Which outputs (the requested)
wedaweda
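The two steps can be folded into one helper. This is a sketch only: the class and method names are invented for illustration, and returning an empty string when there is no semi-colon (or no word in front of it) is an assumption about the desired behaviour:

```java
class LastWordFinder {
    // Returns the last whitespace-separated word in the segment that ends
    // at the last ';', or "" if there is no such word.
    static String lastWordBeforeLastSemicolon(String text) {
        int lastIndex = text.lastIndexOf(';');
        if (lastIndex < 0) {
            return ""; // no semi-colon at all
        }
        // -1 if there is no earlier semi-colon, so the segment starts at 0
        int prevIndex = text.lastIndexOf(';', lastIndex - 1);
        String segment = text.substring(prevIndex + 1, lastIndex).trim();
        if (segment.isEmpty()) {
            return "";
        }
        String[] words = segment.split("\\s+");
        return words[words.length - 1];
    }
}
```

Checking for an empty result replaces the null check above, since substring() and trim() never return null.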
To split a string you'd do the following. This would return an array with the elements split on the "separator"
string_object.split("separator")
In your case you'd do
met.split(";")
And it would return an array with each part as an element. Select the last element to get what you need.
Actually. You said "wedaweda" should be the last result...? So I assume you mean the last word BEFORE the last semi-colon.
So to do this you'd do the split as previously stated, and then get the second-to-last element in the array
String[] array = met.split(";"); // split the entire first sentence on semi-colons
String[] words = array[array.length - 2] .split(" "); // split into specific words by splitting on a blank space
String wordBeforeSemiColon = words[words.length - 1]; // get the word directly before the last semi-colon
I tested this code in my IDE and it works exactly as you want.
It must be:
String laststringa = met.substring(met.lastIndexOf(";")+1);
^
The word after means you have to add one to the position of the last semi-colon.
I need to put a sequence of characters in a String in brackets in such way that it would choose the longest substring as the optimal to put in brackets. To make it clear because it is too complicated to explain with words:
If my input is:
'these are some chars *£&$'
'these are some chars *£&$^%(((£'
the output in both inputs respectively should be:
'these are some chars (*£&$)'
'these are some chars (*£&$^%)(((£'
so I would like to put in brackets the sequence *£&$^% IF it exists otherwise put in brackets just *£&$
I hope it makes sense!
In the general case, this method works. It puts brackets around the first occurrence of the longest substring of the keyword that appears in the given String:
public String bracketize() {
String chars = ...; // you can put whatever input (such as 'these are some chars *£&$')
String keyword = ...; // you can put whatever keyword (such as *£&$^%)
String longest = "";
for(int i=0;i<keyword.length();i++) { // i must reach the last index, otherwise the final character is never tried on its own
for(int j=keyword.length(); j>i; j--) {
String tempString = keyword.substring(i,j);
if(chars.indexOf(tempString) != -1 && tempString.length()>longest.length()) {
longest = tempString;
}
}
}
if(longest.length() == 0)
return chars; // no possible substring of keyword exists in chars, so just return chars
String bracketized = chars.substring(0,chars.indexOf(longest))+"("+longest+")"+chars.substring(chars.indexOf(longest)+longest.length());
return bracketized;
}
The nested for loops check every possible substring of keyword and select the longest one that is contained in the bigger String, chars. For example, if the keyword is Dog, it will check the substrings "Dog", "Do", "D", "og", "o", and "g". It stores this longest possible substring in longest (which is initialized to the empty String). If the length of longest is still 0 after checking every substring, then no such substring of keyword can be found in chars, so the original String, chars, is returned. Otherwise, a new string is returned which is chars with the substring longest surrounded by brackets (parentheses).
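The same search written as a self-contained method, with the input and the keyword passed as parameters instead of the elided assignments. This is a sketch of the approach described above, not a tuned implementation; the class name Bracketizer and the parameter names are illustrative:

```java
class Bracketizer {
    // Wraps the first occurrence of the longest substring of `keyword`
    // found in `chars` in parentheses; returns `chars` unchanged if none is found.
    static String bracketize(String chars, String keyword) {
        String longest = "";
        for (int i = 0; i < keyword.length(); i++) {
            for (int j = keyword.length(); j > i; j--) {
                String candidate = keyword.substring(i, j);
                if (candidate.length() > longest.length() && chars.contains(candidate)) {
                    longest = candidate;
                }
            }
        }
        if (longest.isEmpty()) {
            return chars; // no part of the keyword occurs in chars
        }
        int at = chars.indexOf(longest);
        return chars.substring(0, at) + "(" + longest + ")"
                + chars.substring(at + longest.length());
    }
}
```

For example, bracketize("abc xyDo", "Dog") returns "abc xy(Do)". One caveat of this approach: because every substring of the keyword is tried, a match does not have to start at the beginning of the keyword, which goes further than the question strictly asks for.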
Hope this helps, let me know if it works.
Try something like this (assuming target string only occurs once).
String input = "these are some chars *£&$";
String output = "";
String[] split;
if(input.indexOf("*£&$^%")!=(-1)){
// split() takes a regex, so the literal must be quoted (java.util.regex.Pattern)
split = input.split(Pattern.quote("*£&$^%"));
output = split[0]+"(*£&$^%)";
if(split.length>1){
output = output+split[1];
}
}else if(input.indexOf("*£&$")!=(-1)){
split = input.split(Pattern.quote("*£&$")); // literal match, not a regex
output = split[0]+"(*£&$)";
if(split.length>1){
output = output+split[1];
}
}else{
System.out.println("does not contain either string");
}
I have a unique problem statement where I have to perform regex on an input string using triple characters. e.g. if my input is ABCDEFGHI, a pattern search for BCD should return false since I am treating my input as ABC+DEF+GHI and need to compare my regex pattern with these triple characters.
Similarly, regex pattern DEF will return true since it matches one of the triplets. Using this problem statement, assume that my input is QWEABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZFGH and I am trying to get all output strings that start with triplet ABC and end with XYZ. So, in above input, my outputs should be two strings: ABCPOIUYTREWXYZ and ABCMNBVCXZASXYZ.
Also, I have to store these strings in an ArrayList. Below is my function:
public static void newFindMatches (String text, String startRegex, String endRegex, List<String> output) {
int startPos = 0;
int endPos = 0;
int i = 0;
// Making sure that substrings are always valid
while ( i < text.length()-2) {
// Substring for comparing triplets
String subText = text.substring(i, i+3);
Pattern startP = Pattern.compile(startRegex);
Pattern endP = Pattern.compile(endRegex);
Matcher startM = startP.matcher(subText);
if (startM.find()) {
// If a match is found, set the start position
startPos = i;
for (int j = i; j < text.length()-2; j+=3) {
String subText2 = text.substring(j, j+3);
Matcher endM = endP.matcher(subText2);
if (endM.find()) {
// If match for end pattern is found, set the end position
endPos = j+3;
// Add the string between start and end positions to ArrayList
output.add(text.substring(startPos, endPos));
i = j;
}
}
}
i = i+3;
}
}
Upon running this function in main as follows:
String input = "QWEABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZFGH";
String start = "ABC";
String end = "XYZ";
List<String> results = new ArrayList<String> ();
newFindMatches(input, start, end, results);
for (int x = 0; x < results.size(); x++) {
System.out.println("Output String number "+(x+1)+" is: "+results.get(x));
}
I get the following output:
Output String number 1 is: ABCPOIUYTREWXYZ
Output String number 2 is: ABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZ
Notice that the first string is correct. However, for the second string, the program is again reading from the start of the input string. Instead, I want the program to continue reading after the last end pattern (i.e. skip the first match and unwanted characters such as ASDFGHJKL), so that the 2nd string is printed as: ABCMNBVCXZASXYZ
Thanks for your responses
The problem here is that when you find your end match (the if statement within the for loop), you don't stop the for loop. So it just keeps looking for more end-matches until it hits the for-loop end condition j < text.length()-2. When you find your match and process it, you should end the loop using "break;". Place "break;" after the i=j line.
Note that technically the second answer your current program gave you is correct, that is also a substring that begins with ABC and ends with XYZ. You might want to rethink the correct output for your program. You could accommodate that situation by not setting i=j when you find a match, so that the only incrementing of i is the i=i+3 line, iterating across the triplets (and not adding the break).
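A sketch of the first suggestion (break after the first end match). Compared with the function in the question, it returns the list instead of filling a parameter and compiles the two patterns once; those changes, and the names TripletMatcher and findMatches, are illustrative choices rather than part of the original code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TripletMatcher {
    static List<String> findMatches(String text, String startRegex, String endRegex) {
        Pattern startP = Pattern.compile(startRegex);
        Pattern endP = Pattern.compile(endRegex);
        List<String> output = new ArrayList<>();
        int i = 0;
        while (i + 3 <= text.length()) {
            // compare the start pattern against the triplet at position i
            if (startP.matcher(text.substring(i, i + 3)).find()) {
                for (int j = i; j + 3 <= text.length(); j += 3) {
                    if (endP.matcher(text.substring(j, j + 3)).find()) {
                        output.add(text.substring(i, j + 3));
                        i = j;  // resume after the end triplet (i += 3 below)
                        break;  // stop at the first end match
                    }
                }
            }
            i += 3;
        }
        return output;
    }
}
```

With the input from the question, findMatches(input, "ABC", "XYZ") returns ABCPOIUYTREWXYZ and ABCMNBVCXZASXYZ.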