I'm trying to make a Minecraft Bukkit plugin, and it involves making hashtags and such. I have it so that when you do #hashtag-goes-here it'll highlight it. The only problem is, you must have another word after it (to have a space) for it to work. This is my code so far:
try{
for(int i = index; i < message.length(); i++){
String str = Character.toString(message.charAt(i));
String sbString = sb.toString().trim();
System.out.println(sbString);
if(str.equals(" ")){
str.replace(str, str + ChatColor.RESET);
String hName = sb.toString().replaceFirst("#", "").trim();
String newMessage = message.replaceAll("#", ChatColor.AQUA + "#").replace(str, ChatColor.RESET + str);
event.setMessage(newMessage);
logHashtag(event.getPlayer(), event.getMessage(), hName);
break;
}else{
sb.append(str);
}
}
}catch(Exception e){
throw new HashtagException("Failed to change hashtag colors in message!");
}
Edit: (Already answered like, last year, I know; this is so that people who read this know what I was asking) My question was so that it could work in all situations that the hashtag could be found in. Thanks to halfbit for helping me :)
If you want to find strings in a text and highlight them, then regular expressions can be quite useful. You might use code similar to the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HashTagColorizer {
public static void main(String[] args) {
String AQUA = "<AQUA>", RESET = "<RESET>";
String message = "Aaa #hashtag-goes-here bbb #another-hashtag ccc";
Pattern pattern = Pattern.compile("#([A-Za-z0-9-]+)");
Matcher matcher = pattern.matcher(message);
StringBuilder sb = new StringBuilder(message.length());
int position = 0;
while (matcher.find(position)) {
sb.append(message.substring(position, matcher.start()));
sb.append(AQUA);
System.out.println("event for " + matcher.group(1));
sb.append(matcher.group().substring(1));
sb.append(RESET);
position = matcher.end();
}
sb.append(message.substring(position));
System.out.println(sb);
// Aaa <AQUA>hashtag-goes-here<RESET> bbb <AQUA>another-hashtag<RESET> ccc
}
}
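If the '#' should stay part of the highlighted text, the same pattern can probably be applied in a single replaceAll call. This is only a sketch: AQUA and RESET are the placeholder strings from the example above standing in for the ChatColor values, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashTagHighlighter {
    // Placeholders standing in for ChatColor.AQUA / ChatColor.RESET (assumption).
    static final String AQUA = "<AQUA>", RESET = "<RESET>";
    private static final Pattern HASHTAG = Pattern.compile("#[A-Za-z0-9-]+");

    static String highlight(String message) {
        // $0 refers to the whole match, so the '#' is kept. quoteReplacement
        // guards against '$' or '\' appearing in the color strings.
        String replacement = Matcher.quoteReplacement(AQUA) + "$0"
                + Matcher.quoteReplacement(RESET);
        return HASHTAG.matcher(message).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // A hashtag at the very end of the message is matched as well.
        System.out.println(highlight("Aaa #hashtag-goes-here bbb #last"));
    }
}
```

Because the pattern does not depend on a trailing space, a hashtag as the last word of the message is colored like any other.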
Instead of doing your logic in the if on sb, do it in the else on the str (or in the sb after appending str, but you will repeat a lot of processing in the sections of the sb already processed).
You are welcome.
BTW:
str.replace(str, str + ChatColor.RESET);
does nothing because String instances are immutable
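A small sketch of what that means in practice (the class name and the "<RESET>" placeholder are only for illustration): replace returns a new String, so the result has to be assigned to have any effect.

```java
public class ImmutableStringDemo {
    // Returns a new String; the argument itself is never modified.
    static String withReset(String s) {
        return s.replace(" ", " <RESET>");
    }

    public static void main(String[] args) {
        String str = "a b";
        str.replace(" ", " <RESET>");   // result discarded, str is unchanged
        System.out.println(str);        // a b
        str = withReset(str);           // keep the returned String
        System.out.println(str);        // a <RESET>b
    }
}
```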
I need to split a string in Java (first remove whitespaces between quotes and then split at whitespaces.)
"abc test=\"x y z\" magic=\" hello \" hola"
becomes:
firstly:
"abc test=\"xyz\" magic=\"hello\" hola"
and then:
abc
test="xyz"
magic="hello"
hola
Scenario :
I am getting a string something like above from input and I want to break it into parts as above. One way to approach was first remove the spaces between quotes and then split at spaces. Also string before quotes complicates it. Second one was split at spaces but not if inside quote and then remove spaces from individual split. I tried capturing quotes with "\"([^\"]+)\"" but I'm not able to capture just the spaces inside quotes. I tried some more but no luck.
We can do this using a formal pattern matcher. The secret sauce of the answer below is the not-much-used Matcher#appendReplacement method. We pause at each match and append a custom replacement for anything appearing inside a pair of quotes. The custom method removeSpaces() strips all whitespace from each quoted term.
public static String removeSpaces(String input) {
return input.replaceAll("\\s+", "");
}
String input = "abc test=\"x y z\" magic=\" hello \" hola";
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer("");
while (m.find()) {
m.appendReplacement(sb, "\"" + removeSpaces(m.group(1)) + "\"");
}
m.appendTail(sb);
String[] parts = sb.toString().split("\\s+");
for (String part : parts) {
System.out.println(part);
}
abc
test="xyz"
magic="hello"
hola
The big caveat here, as the above comments hinted at, is that we are really using a regex engine as a rudimentary parser. To see where my solution would fail fast, just remove one of the quotes by accident from a quoted term. But if you are sure your input is well formed, as in the example you have shown us, this answer might work for you.
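One cheap guard against that failure mode is to check that the quotes pair up before running the regex. This is a minimal sketch under the assumption that escaped quotes never occur in the input; the class and method names are hypothetical.

```java
public class QuoteCheck {
    // True when the input contains an even number of double quotes.
    // Assumption: there is no escaping, so every quote opens or closes a term.
    static boolean hasBalancedQuotes(String input) {
        long quotes = input.chars().filter(c -> c == '"').count();
        return quotes % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(hasBalancedQuotes("abc test=\"x y z\" hola")); // true
        System.out.println(hasBalancedQuotes("abc test=\"x y z hola"));   // false
    }
}
```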
I wanted to mention Java 9's Matcher.replaceAll lambda extension:
// Find quoted strings and remove their whitespace:
s = Pattern.compile("\"[^\"]*\"").matcher(s)
.replaceAll(mr -> mr.group().replaceAll("\\s", ""));
// Turn the remaining whitespace into commas and wrap everything in braces.
s = '{' + s.trim().replaceAll("\\s+", ", ") + '}';
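For completeness, here is the same idea as a self-contained program, assuming Java 9 or later and the input string from the question. The class and method names are made up for this sketch.

```java
import java.util.regex.Pattern;

public class QuotedWhitespace {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String normalize(String s) {
        // Remove whitespace inside each quoted term (Java 9+ lambda overload).
        s = QUOTED.matcher(s).replaceAll(mr -> mr.group().replaceAll("\\s", ""));
        // Turn the remaining whitespace runs into ", " and wrap in braces.
        return '{' + s.trim().replaceAll("\\s+", ", ") + '}';
    }

    public static void main(String[] args) {
        System.out.println(normalize("abc test=\"x y z\" magic=\" hello \" hola"));
        // {abc, test="xyz", magic="hello", hola}
    }
}
```

Note that the String returned by the lambda is still treated as a replacement string, so a quoted term containing '$' or '\' would need Matcher.quoteReplacement.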
Probably the other answer is better but still I have written it so I will post it here ;) It takes a different approach
public static void main(String[] args) {
String test="abc test=\"x y z\" magic=\" hello \" hola";
Pattern pattern = Pattern.compile("([^\\\"]+=\\\"[^\\\"]+\\\" )");
Matcher matcher = pattern.matcher(test);
int lastIndex=0;
while(matcher.find()) {
String[] parts=matcher.group(0).trim().split("=");
boolean newLine=false;
for (String string : parts[0].split("\\s+")) {
if(newLine)
System.out.println();
newLine=true;
System.out.print(string);
}
System.out.println("="+parts[1].replaceAll("\\s",""));
lastIndex=matcher.end();
}
System.out.println(test.substring(lastIndex).trim());
}
Result is
abc
test="xyz"
magic="hello"
hola
It sounds like you want to write a basic parser/tokenizer. My bet is that after you make something that can deal with pretty printing in this structure, you will soon want to start validating that there aren't any mismatched quotes.
But in essence, you have a few stages for this particular problem, and Java has a built in tokenizer that can prove useful.
import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;
public class Q50151376{
private static class Whitespace{
Whitespace(){ }
@Override
public String toString() {
return "\n";
}
}
private static class QuotedString {
public final String string;
QuotedString(String string) {
this.string = "\"" + string.trim() + "\"";
}
@Override
public String toString() {
return string;
}
}
public static void main(String[] args) {
String test = "abc test=\"x y z\" magic=\" hello \" hola";
StringTokenizer tokenizer = new StringTokenizer(test, "\"");
boolean inQuotes = false;
List<Object> out = new LinkedList<>();
while (tokenizer.hasMoreTokens()) {
final String token = tokenizer.nextToken();
if (inQuotes) {
out.add(new QuotedString(token));
} else {
out.addAll(TokenizeWhitespace(token));
}
inQuotes = !inQuotes;
}
System.out.println(joinAsStrings(out));
}
private static String joinAsStrings(List<Object> out) {
return out.stream()
.map(Object::toString)
.collect(Collectors.joining());
}
public static List<Object> TokenizeWhitespace(String in){
List<Object> out = new LinkedList<>();
StringTokenizer tokenizer = new StringTokenizer(in, " ", true);
boolean ignoreWhitespace = false;
while (tokenizer.hasMoreTokens()){
String token = tokenizer.nextToken();
boolean whitespace = token.equals(" ");
if(!whitespace){
out.add(token);
ignoreWhitespace = false;
} else if(!ignoreWhitespace) {
out.add(new Whitespace());
ignoreWhitespace = true;
}
}
return out;
}
}
My string is "test"
"test" has 4 characters
I want to replace "test" with "****"
so I get "****"
My code
System.out.println("_test_");
System.out.println("_test_".replaceAll("test", "*"));
But it replace test with 1 *.
If the word test is just an example, you may use Matcher.appendReplacement (see How to appendReplacement on a Matcher group instead of the whole pattern? for more details on this technique):
String fileText = "_test_";
String pattern = "test";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileText);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, repeat("*", m.group(0).length()));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb);
And the repeat function (borrowed from the SO post Simple way to repeat a String in java; see other options there) is:
public static String repeat(String s, int n) {
if(s == null) {
return null;
}
final StringBuilder sb = new StringBuilder(s.length() * n);
for(int i = 0; i < n; i++) {
sb.append(s);
}
return sb.toString();
}
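On newer JDKs the helper and the manual loop are arguably unnecessary. A sketch assuming Java 11+ (String.repeat) and Java 9+ (the lambda overload of Matcher.replaceAll); the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class MaskDemo {
    // Replaces every occurrence of word with asterisks of the same length.
    static String maskWord(String text, String word) {
        return Pattern.compile(Pattern.quote(word)) // treat word literally
                .matcher(text)
                .replaceAll(m -> "*".repeat(m.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(maskWord("_test_", "test")); // _****_
    }
}
```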
If you have an arbitrary text to be replaced, and you want to use replaceAll(), be aware that it takes a regular expression, and various characters have special meaning. To prevent issues, call Pattern.quote().
Also, to replace with a sequence of * of equal length, you need to build a string of such.
Here is a nice short method for doing it:
private static String mask(String input, String codeword) {
char[] buf = new char[codeword.length()];
Arrays.fill(buf, '*');
return input.replaceAll(Pattern.quote(codeword), new String(buf));
}
Test
System.out.println(mask("_test_", "test"));
System.out.println(mask("This is his last chance", "is"));
Output
_****_
Th** ** h** last chance
Yes, because replaceAll(str1, str2) will replace all occurrences of str1 with str2. Since you are using literals, you need to say
System.out.println("_test_".replaceAll("test", "****"));
If you want your own replacement function you can do something like this:
public static String replaceStringWithChar(String src, String seek, char replacement)
{
StringBuilder sb = new StringBuilder();
for(int i = 0; i < seek.length(); i++) sb.append(replacement);
return src.replaceAll(seek, sb.toString());
}
You would then call it like so:
replaceStringWithChar("_test_", "test", '*');
So I got the answer. I was really looking for something with as few lines as possible. Thank you all for the answers, but this is the one I found most useful. I apologize if I was not clear in the question.
String str1 = "_AnyString_";
int start_underscore = str1.indexOf("_");
int end_underscore = str1.indexOf("_", start_underscore + 1);
String str_anything = str1.substring(start_underscore + 1, end_underscore);
String str_replace_asterisk = str_anything.replaceAll(".", "*");
System.out.println(str_replace_asterisk);
str1 = str1.replace(str_anything, str_replace_asterisk);
System.out.println(str1);
Output:
*********
_*********_
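Actually, you are pretty close to what you want. You can use a character class, but note that it replaces every individual t, e or s anywhere in the string, not only the word test: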
System.out.println("_test_".replaceAll("[test]", "*"));
System.out.println("hello".replaceAll("[el]", "*"));
Output:
_****_
h***o
I have a String str, which is comprised of several words separated by single spaces.
If I want to create a set or list of strings I can simply call str.split(" ") and I would get what I want.
Now, assume that str is a little more complicated, for example it is something like:
str = "hello bonjour \"good morning\" buongiorno";
In this case I want to keep whatever is between the quotes together, so that my list of strings is:
hello
bonjour
good morning
buongiorno
Clearly, if I used split(" ") in this case it won't work because I'd get
hello
bonjour
"good
morning"
buongiorno
So, how do I get what I want?
You can create a regex that matches either a single word or a group of words between quotes, like:
\w+|(\"\w+(\s\w+)*\")
and search for them with the Pattern and Matcher classes.
ex.
String searchedStr = "";
Pattern pattern = Pattern.compile("\\w+|(\\\"\\w+(\\s\\w+)*\\\")");
Matcher matcher = pattern.matcher(searchedStr);
while(matcher.find()){
String word = matcher.group();
}
Edit: works for every number of words within "" now. XD forgot that
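A variant of the same idea that also strips the surrounding quotes, as the question's expected output suggests. This is a sketch, not the answer's exact regex: it assumes quotes always start a token and are never escaped, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSplit {
    // Group 1: text between quotes. Group 2: a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups takes part in each match.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("hello bonjour \"good morning\" buongiorno"));
        // [hello, bonjour, good morning, buongiorno]
    }
}
```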
You can do something like the code below. First split the String on "\"" and then split the remaining parts on a space " ". The even-numbered tokens will be the ones between the quotes.
public static void main(String args[]) {
String str = "hello bonjour \"good morning\" buongiorno";
System.out.println(str);
String[] parts = str.split("\"");
List<String> myList = new ArrayList<String>();
int i = 1;
for(String partStr : parts) {
if(i%2 == 0){
myList.add(partStr);
}
else {
myList.addAll(Arrays.asList(partStr.trim().split(" ")));
}
i++;
}
System.out.println("MyList : " + myList);
}
and the output is
hello bonjour "good morning" buongiorno
MyList : [hello, bonjour, good morning, buongiorno]
You may be able to find a solution using regular expressions, but what I'd do is simply manually write a string breaker.
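List<String> splitButKeepQuotes(String s, char splitter) {
List<String> list = new ArrayList<String>();
boolean inQuotes = false;
int startOfWord = 0;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == '"') { // char literal; a char cannot be compared with a String
inQuotes = !inQuotes;
} else if (c == splitter && !inQuotes) {
if (i != startOfWord) {
list.add(s.substring(startOfWord, i));
}
startOfWord = i + 1; // also skips repeated splitters
}
}
if (startOfWord < s.length()) {
list.add(s.substring(startOfWord)); // the last word has no trailing splitter
}
return list;
}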
I want to split a string using white space as the delimiter, but it should handle quoted strings intelligently. E.g. for a string like
"John Smith" Ted Barry
It should return three strings John Smith, Ted and Barry.
After messing around with it, you can use Regex for this. Run the equivalent of "match all" on:
((?<=("))[\w ]*(?=("(\s|$))))|((?<!")\w+(?!"))
A Java Example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Test
{
public static void main(String[] args)
{
String someString = "\"Multiple quote test\" not in quotes \"inside quote\" \"A work in progress\"";
Pattern p = Pattern.compile("((?<=(\"))[\\w ]*(?=(\"(\\s|$))))|((?<!\")\\w+(?!\"))");
Matcher m = p.matcher(someString);
while(m.find()) {
System.out.println("'" + m.group() + "'");
}
}
}
Output:
'Multiple quote test'
'not'
'in'
'quotes'
'inside quote'
'A work in progress'
The regular expression breakdown with the example used above can be viewed here:
http://regex101.com/r/wM6yT9
With all that said, regular expressions should not be the go-to solution for everything - I was just having fun. This example has a lot of unhandled edge cases, such as Unicode characters, symbols, etc. You would be better off using a tried and true library for this sort of task. Take a look at the other answers before using this one.
Try this ugly bit of code.
String str = "hello my dear \"John Smith\" where is Ted Barry";
List<String> list = Arrays.asList(str.split("\\s"));
List<String> resultList = new ArrayList<String>();
StringBuilder builder = new StringBuilder();
for(String s : list){
if(s.startsWith("\"")) {
builder.append(s.substring(1)).append(" ");
} else {
resultList.add((s.endsWith("\"")
? builder.append(s.substring(0, s.length() - 1))
: builder.append(s)).toString());
builder.delete(0, builder.length());
}
}
System.out.println(resultList);
Well, I made a small snippet that does what you want and a few more things. Since you did not specify more conditions, I did not go to the trouble of handling them. I know this is a dirty way and you can probably get better results with something that is already made, but for the fun of programming, here is the example:
String example = "hello\"John Smith\" Ted Barry lol\"Basi German\"hello";
int wordQuoteStartIndex=0;
int wordQuoteEndIndex=0;
int wordSpaceStartIndex = 0;
int wordSpaceEndIndex = 0;
boolean foundQuote = false;
for(int index=0;index<example.length();index++) {
if(example.charAt(index)=='\"') {
if(foundQuote==true) {
wordQuoteEndIndex=index+1;
//Print the quoted word
System.out.println(example.substring(wordQuoteStartIndex, wordQuoteEndIndex));//here you can remove quotes by changing to (wordQuoteStartIndex+1, wordQuoteEndIndex-1)
foundQuote=false;
if(index+1<example.length()) {
wordSpaceStartIndex = index+1;
}
}else {
wordSpaceEndIndex=index;
if(wordSpaceStartIndex!=wordSpaceEndIndex) {
//print the word in spaces
System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
}
wordQuoteStartIndex=index;
foundQuote = true;
}
}
if(foundQuote==false) {
if(example.charAt(index)==' ') {
wordSpaceEndIndex = index;
if(wordSpaceStartIndex!=wordSpaceEndIndex) {
//print the word in spaces
System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
}
wordSpaceStartIndex = index+1;
}
if(index==example.length()-1) {
if(example.charAt(index)!='\"') {
//print the word in spaces
System.out.println(example.substring(wordSpaceStartIndex, example.length()));
}
}
}
}
This also checks for words that were not separated by a space before or after the quotes, such as the words "hello" before "John Smith" and after "Basi German".
When the string is changed to "John Smith" Ted Barry, the output is three strings:
1) "John Smith"
2) Ted
3) Barry
The string in the example is hello"John Smith" Ted Barry lol"Basi German"hello and prints
1)hello
2)"John Smith"
3)Ted
4)Barry
5)lol
6)"Basi German"
7)hello
Hope it helps
This is my own version, cleaned up from http://pastebin.com/aZngu65y (posted in the comment).
It can take care of Unicode. It will clean up all excessive spaces (even inside quotes), which can be good or bad depending on the need. There is no support for escaped quotes.
private static String[] parse(String param) {
String[] output;
param = param.replaceAll("\"", " \" ").trim();
String[] fragments = param.split("\\s+");
int curr = 0;
boolean matched = fragments[curr].matches("[^\"]*");
if (matched) curr++;
for (int i = 1; i < fragments.length; i++) {
if (!matched)
fragments[curr] = fragments[curr] + " " + fragments[i];
if (!fragments[curr].matches("(\"[^\"]*\"|[^\"]*)"))
matched = false;
else {
matched = true;
if (fragments[curr].matches("\"[^\"]*\""))
fragments[curr] = fragments[curr].substring(1, fragments[curr].length() - 1).trim();
if (fragments[curr].length() != 0)
curr++;
if (i + 1 < fragments.length)
fragments[curr] = fragments[i + 1];
}
}
if (matched) {
return Arrays.copyOf(fragments, curr);
}
return null; // Parameter failure (double-quotes do not match up properly).
}
Sample input for comparison:
"sdfskjf" sdfjkhsd "hfrif ehref" "fksdfj sdkfj fkdsjf" sdf sfssd
asjdhj sdf ffhj "fdsf fsdjh"
日本語 中文 "Tiếng Việt" "English"
dsfsd
sdf " s dfs fsd f " sd f fs df fdssf "日本語 中文"
"" "" ""
" sdfsfds " "f fsdf
(2nd line is empty, 3rd line is spaces, last line is malformed).
Please judge against your own expected output, since it may vary, but the baseline is that the 1st case should return [sdfskjf, sdfjkhsd, hfrif ehref, fksdfj sdkfj fkdsjf, sdf, sfssd].
commons-lang has a StrTokenizer class to do this for you, and there is also java-csv library.
Example with StrTokenizer:
String params = "\"John Smith\" Ted Barry";
// Initialize tokenizer with input string, delimiter character, quote character
StrTokenizer tokenizer = new StrTokenizer(params, ' ', '"');
for (String token : tokenizer.getTokenArray()) {
System.out.println(token);
}
Output:
John Smith
Ted
Barry
I have a text file which contains data separated by '|'. I need to get each field (separated by '|') and process it. The text file looks like this:
ABC|DEF||FGHT
I am using StringTokenizer (JDK 1.4) to get each field value. The problem is that I should get an empty string after DEF. However, I am not getting the empty field between DEF and FGHT.
My result should be - ABC,DEF,"",FGHT but I am getting ABC,DEF,FGHT
From StringTokenizer documentation :
StringTokenizer is a legacy class that
is retained for compatibility reasons
although its use is discouraged in new
code. It is recommended that anyone
seeking this functionality use the
split method of String or the
java.util.regex package instead.
The following code should work :
String s = "ABC|DEF||FGHT";
String[] r = s.split("\\|");
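One thing worth knowing when relying on split: by default it drops trailing empty strings, so a line ending in empty fields loses them unless a negative limit is passed. A short sketch (class and method names are made up; Pattern.quote requires Java 5+, so on JDK 1.4 the delimiter would have to be escaped by hand):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Keeps every field, including trailing empty ones (limit -1).
    static String[] fields(String line, String delim) {
        return line.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("ABC|DEF||FGHT".split("\\|"))); // [ABC, DEF, , FGHT]
        System.out.println("ABC|DEF||".split("\\|").length);               // 2
        System.out.println(fields("ABC|DEF||", "|").length);               // 4
    }
}
```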
Use the returnDelims flag and check two subsequent occurrences of the delimiter:
String str = "ABC|DEF||FGHT";
String delim = "|";
StringTokenizer tok = new StringTokenizer(str, delim, true);
boolean expectDelim = false;
while (tok.hasMoreTokens()) {
String token = tok.nextToken();
if (delim.equals(token)) {
if (expectDelim) {
expectDelim = false;
continue;
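} else {
// unexpected delim means empty token; this delimiter also ends that token,
// so report it and keep expecting a token (handles "A|||B" correctly)
System.out.println((String) null);
continue;
}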
}
System.out.println(token);
expectDelim = true;
}
this prints
ABC
DEF
null
FGHT
The API isn't pretty and is therefore considered legacy (i.e. "almost obsolete"). Use it only where pattern matching is too expensive (which should only be the case for extremely long strings) or where an API expects an Enumeration.
In case you switch to String.split(String), make sure to quote the delimiter, either manually ("\\|") or automatically using string.split(Pattern.quote(delim));
StringTokenizer ignores empty elements. Consider using String.split, which is also available in 1.4.
From the javadocs:
StringTokenizer is a legacy class that
is retained for compatibility reasons
although its use is discouraged in new
code. It is recommended that anyone
seeking this functionality use the
split method of String or the
java.util.regex package instead.
You can use the constructor that takes an extra 'returnDelims' boolean and pass true to it.
This way you will receive the delimiters as tokens, which allows you to detect two delimiters in a row.
Alternatively, you can just implement your own string tokenizer that does what you need; it's not that hard.
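As a rough idea of what such a hand-written tokenizer could look like, here is a minimal sketch for a single-character delimiter. The class and method names are hypothetical, and it uses generics, so it targets Java 5+ rather than JDK 1.4.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTokenizer {
    // Splits on delim and keeps empty fields, including leading and trailing ones.
    static List<String> tokenize(String input, char delim) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delim, start)) >= 0) {
            tokens.add(input.substring(start, end)); // "" for two delimiters in a row
            start = end + 1;
        }
        tokens.add(input.substring(start)); // the field after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ABC|DEF||FGHT", '|')); // [ABC, DEF, , FGHT]
    }
}
```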
Here is another way to solve this problem
String str = "ABC|DEF||FGHT";
StringTokenizer s = new StringTokenizer(str,"|",true);
String currentToken="",previousToken="";
while(s.hasMoreTokens())
{
//Get the current token from the tokenize strings
currentToken = s.nextToken();
//Check for the empty token in between ||
if(currentToken.equals("|") && previousToken.equals("|"))
{
//We denote the empty token so we print null on the screen
System.out.println("null");
}
else
{
//We only print the tokens except delimiters
if(!currentToken.equals("|"))
System.out.println(currentToken);
}
previousToken = currentToken;
}
Here is a way to split a string into tokens (a token is one or more letters)
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
String s = scan.nextLine();
s = s.replaceAll("[^A-Za-z]", " ");
StringTokenizer arr = new StringTokenizer(s, " ");
int n = arr.countTokens();
System.out.println(n);
while(arr.hasMoreTokens()){
System.out.println(arr.nextToken());
}
scan.close();
}
package com.java.String;
import java.util.StringTokenizer;
public class StringWordReverse {
public static void main(String[] kam) {
String s;
String sReversed = "";
System.out.println("Enter a string to reverse");
s = "THIS IS ASHIK SKLAB";
StringTokenizer st = new StringTokenizer(s);
while (st.hasMoreTokens()) {
sReversed = st.nextToken() + " " + sReversed;
}
System.out.println("Original string is : " + s);
System.out.println("Reversed string is : " + sReversed);
}
}
Output:
Enter a string to reverse
Original string is : THIS IS ASHIK SKLAB
Reversed string is : SKLAB ASHIK IS THIS