How to replace text using ReplaceFirst() without case sensitivity

I'm trying to create a method that highlights text in a JLabel according to the search text the user enters. It works fine except that it is case sensitive. I used the regex flag (?i) to ignore case, but it is still case sensitive.
private void jTextField1KeyReleased(java.awt.event.KeyEvent evt) {
String SourceText = "this is a sample text";
String SearchText = jTextField1.getText();
if (SourceText.contains(SearchText)) {
String OutPut = "<html>" + SourceText.replaceFirst("(?i)" + SearchText, "<span style=\"background-color: #d5f4e6;\">" + SearchText + "</span>") + "</html>";
jLabel1.setText(OutPut);
} else {
jLabel1.setText(SourceText);
}
}
How can I fix this?
Update
contains is case sensitive.
How to check if a String contains another String in a case insensitive manner in Java

You have not used the matched text in the replacement, you hard-coded the same string you used in the search. Since you wrap the whole match with html tags, you need to use the $0 backreference in the replacement (it refers to the whole match that resides in Group 0).
Besides, you have not escaped ("quoted") the search term, it may cause trouble if the SearchText contains special regex metacharacters.
You can fix the code using
String OutPut = "<html>" + SourceText.replaceFirst("(?i)" + Pattern.quote(SearchText), "<span style=\"background-color: #d5f4e6;\">$0</span>") + "</html>";
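Putting both fixes together, a minimal self-contained sketch might look like the following. The class name HighlightDemo and the helper highlight are hypothetical names, not from the question. The case-insensitive find check stands in for the case-sensitive contains call mentioned in the update.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightDemo {

    // Hypothetical helper: wraps the first case-insensitive occurrence of
    // search in a highlighted <span>, or returns source unchanged.
    static String highlight(String source, String search) {
        if (search.isEmpty()) {
            return source;
        }
        // Pattern.quote makes the search term match as literal text.
        Pattern pattern = Pattern.compile(Pattern.quote(search), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(source);
        if (!matcher.find()) {
            return source;
        }
        // $0 inserts the text that actually matched, preserving its case.
        return "<html>"
                + matcher.replaceFirst("<span style=\"background-color: #d5f4e6;\">$0</span>")
                + "</html>";
    }

    public static void main(String[] args) {
        System.out.println(highlight("This is a Sample text", "sample"));
    }
}
```

Because the replacement uses $0 rather than the typed search text, the label keeps the capitalisation of the source text ("Sample") even when the user types "sample".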

Related

Java won't replace all strings, because there is text next to the tags (post improved)

I'm working on a program that formats HTML code extracted from a PDF file.
I have a String list that holds the text, split into paragraphs.
As the PDF has hyperlinks, I decided to replace them with a footnote number such as "[1]".
This will be used for citing sources. Eventually I plan to put it at the end of a paragraph or sentence, so you can look up the sources as you would in a book.
My Problem
For some reason not all the hyperlinks are replaced.
The reason is most likely, that there is text directly next to the tag.
Hell<a href="http://www.example.com">o old chap!
Specifically, the "o" part and the "Hell" part are blocking Java's .replaceAll function from doing its job.
Expected Result
Hello [1] old chap!
EDIT:
If I just added a space before and after the URL, it might split words such as "help" into "hel p", which is not an option either.
My code would have to replace the URL tag (without the ) and create no new extra spaces.
This is the part of my code where the problem occurs:
for (int i = 0; i < EN.length; i++) {
Pattern pattern_URL = Pattern.compile("<a(.+?)\">", Pattern.DOTALL);
Matcher matcher_URL = pattern_URL.matcher(EN[i]); //Checks the current array element.
if (matcher_URL.find() == true) {
source_number++;
String extractedURL = matcher_URL.group(0);
//System.out.println(extractedURL);
String extractedURL_fully = extractedURL.replaceAll("href=\"", ""); //Quotation marks
//System.out.println(extractedURL_fully);
String nobracketURL = extractedURL.replaceAll("\\)", ""); //Remove round brackets from URL
EN[i] = EN[i].replaceAll("\\)\"", "\""); /*Strip round brackets from URLs in the array. (For some reason some href URLs ended with a bracket; this was already in the PDF. They caused massive problems because they were not escaped, so the entire replaceAll call failed.)*/
EN[i] = EN[i].replaceAll(nobracketURL, "[" + source_number + "]"); //Replace URL tags with the number in square brackets
}
else{
//System.out.println("FALSE: " + "[" + i + "]");
}
}
The idea is that it loops through the array and replaces every URL, from the start of its opening tag <a to the end of that opening tag "> (as can also be seen in the pattern regex).

Correct me if I'm wrong, but what you need is to eliminate all the <a> tags from a given string, right? If that's the case, all you need is code like the following:
final String string = "<a href=\"http://www.example.com\">Sen";
final Pattern pattern = Pattern.compile("<a(.+?)>", Pattern.DOTALL);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll("");
System.out.println(result); // prints "Sen"
Notice I didn't use replaceAll from the String object, but from the Matcher object. This replaces every match with the empty string "".
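If the tags should become numbered footnote markers rather than disappear, the same Matcher can build the result match by match. This is only a sketch: the class FootnoteDemo, the helper numberLinks, and the pattern <a\b[^>]*> (which targets opening anchor tags only) are illustrative choices, not part of the question or the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FootnoteDemo {

    // Matches an opening anchor tag such as <a href="...">, nothing else.
    private static final Pattern OPENING_ANCHOR = Pattern.compile("<a\\b[^>]*>");

    // Hypothetical helper: replaces each opening <a ...> tag with [1], [2], ...
    static String numberLinks(String html) {
        Matcher matcher = OPENING_ANCHOR.matcher(html);
        StringBuffer out = new StringBuffer();
        int sourceNumber = 0;
        while (matcher.find()) {
            sourceNumber++;
            // quoteReplacement keeps the marker literal in the replacement.
            matcher.appendReplacement(out, Matcher.quoteReplacement("[" + sourceNumber + "]"));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(numberLinks("Hell<a href=\"http://www.example.com\">o old chap!"));
    }
}
```

Since the URL is never fed back into a regex, brackets or other metacharacters inside the href cannot break the replacement, and no extra spaces are introduced around the marker.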

Cleaning a file name in Java

I want to write a script that will clean up the names of my .mp3 files.
I was able to write a few lines that change the name, but I want an automatic script that erases all the undesired characters ($%_!?7 and so on) and puts the name into the format "Artist - Song" (artist, space, dash, space, song).
File file = new File("C://Users//nikita//Desktop//$%#Artis8t_-_35&Son5g.mp3");
String Original = file.toString();
String New = "Code to change 'Original' to 'Artist - Song'";
File file2 = new File("C://Users//nikita//Desktop//" + New + ".mp3");
file.renameTo(file2);
I feel like I should make a list with all possible characters and then run the String through this list and erase all of the listed characters but I am not sure how to do it.
String test = "$%$#Arti56st_-_54^So65ng.mp3";
Edit 1:
When I try using the replace method, the name still doesn't change.
String test = "$%$#Arti56st_-_54^So65ng.mp3";
System.out.println("Original: " + test);
test.replace( "[0-9]%#&\\$", "");
System.out.println("New: " + test);
The code above returns the following output
Original: $%$#Arti56st_-_54^So65ng.mp3
New: $%$#Arti56st_-_54^So65ng.mp3

I'd suggest something like this:
public static String sanitizeFilename(String original){
Pattern p = Pattern.compile("(.*)-(.*)\\.mp3");
Matcher m = p.matcher(original);
if (m.matches()){
String artist = m.group(1).replaceAll("[^a-zA-Z ]", "");
String song = m.group(2).replaceAll("[^a-zA-Z ]", "");
return String.format("%s - %s", artist, song);
}
else {
throw new IllegalArgumentException("Failed to match filename : "+original);
}
}
(Edit: changed the whitelist regex to exclude digits and underscores.)
Two points in particular. When sanitizing strings, it's a good idea to whitelist permitted characters rather than blacklisting the ones you want to exclude, so you won't be surprised by edge cases later. (You may want a less restrictive whitelist than I've used here, but it's easy to vary.)
It's also a good idea to handle the case where the filename doesn't match the expected pattern. If your code comes across something other than an MP3, how would you like it to respond? Here I've thrown an exception, so the calling code can catch and handle it appropriately.

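String cleaned = original.replaceAll("[0-9%#&$]", "");
This removes most of the characters you don't want. Note that replaceAll (not replace) is the method that takes a regex, that the result must be assigned because strings are immutable, and that the digit class also strips the 3 from ".mp3" if the extension is included.
Alternatively, you can come up with your own regex: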
https://docs.oracle.com/javase/tutorial/essential/regex/
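The attempt in the question's edit fails for two reasons: String.replace treats its first argument as literal text, and the returned string is thrown away. A small sketch of the difference follows. The class FileNameCleanup and the helper stripUnwanted are hypothetical names, and the character list is just the one from the question plus ^.

```java
public class FileNameCleanup {

    // Hypothetical helper: removes digits and a few symbols from the base
    // name only, so the "3" in ".mp3" survives.
    static String stripUnwanted(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        String extension = dot < 0 ? "" : fileName.substring(dot);
        // One character class: any single digit or listed symbol is removed.
        return base.replaceAll("[0-9%#&$^]", "") + extension;
    }

    public static void main(String[] args) {
        String test = "$%$#Arti56st_-_54^So65ng.mp3";
        test.replace("5", "");        // result discarded, test is unchanged
        System.out.println(test);
        test = stripUnwanted(test);   // reassign to keep the result
        System.out.println(test);
    }
}
```

The underscores and the dash are left in place here; turning "Artist_-_Song" into "Artist - Song" is what the whitelist approach in the first answer handles.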

Regex convert to convert a string to tab delimited field

I want to convert a string to get tab delimited format. In my opinion option 1 should do it. But it looks like option 2 is actually producing the desired result. Can someone explain why?
public class test {
public static void main(String[] args) {
String temp2 = "My name\" is something";
System.out.println(temp2);
System.out.println( "\"" + temp2.replaceAll("\"", "\\\"") +"\""); //option 1
System.out.println( "\"" + temp2.replaceAll("\"", "\\\\\"") +"\""); //option 2
if(temp2.contains("\"")) {
System.out.println("Identified");
}
}
}
and the output is:
My name" is something
"My name" is something"
"My name\" is something"
Identified

If you want an Excel-compatible CSV format, a double quote is escaped as two double quotes, so-called self-escaping.
String twoColumns = "\"a nice text\"\t\"with a \"\"quote\"\".\"";
String s = "Some \"quoted\" text.";
String s2 = "\"" + s.replace("\"", "\"\"") + "\"";
And ... no headache counting the backslashes.

Use String#replace(CharSequence, CharSequence) instead of String#replaceAll(). The former is a simple string replacement, so it works as you'd expect if you haven't read any documentation or don't know about regular expressions. The latter interprets its arguments differently because it's a regex find-and-replace:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.
You'll get this output:
My name" is something
"My name\" is something"
"My name\\" is something"
Identified
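To see the two behaviours side by side, here is a small sketch. The class QuoteEscapeDemo and its method names are made up for illustration; Matcher.quoteReplacement is the standard way to make a replaceAll replacement string literal.

```java
import java.util.regex.Matcher;

public class QuoteEscapeDemo {

    // Literal replacement: every " becomes \" with no regex rules involved.
    static String escapeWithReplace(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex replacement: the backslash in the replacement must be protected,
    // here with Matcher.quoteReplacement, or it is consumed as an escape.
    static String escapeWithReplaceAll(String s) {
        return s.replaceAll("\"", Matcher.quoteReplacement("\\\""));
    }

    public static void main(String[] args) {
        String temp2 = "My name\" is something";
        System.out.println(escapeWithReplace(temp2));
        System.out.println(escapeWithReplaceAll(temp2));
        // Unprotected replacement: the backslash is swallowed by replaceAll.
        System.out.println(temp2.replaceAll("\"", "\\\""));
    }
}
```

The first two calls yield the same escaped text, while the last one returns the input unchanged, which is exactly what option 1 in the question showed.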

More efficient way to make a string in a string of just words

I am making an application where I will be fetching tweets and storing them in a database. I will have a column for the complete text of the tweet and another where only the words of the tweet will remain (I need the words to calculate which words were most used later).
I currently do this with 6 different .replaceAll() calls, some of which may run more than once. For example, I have a for loop that removes every hashtag using replaceAll().
The problem is that I will be editing thousands of tweets that I fetch every few minutes, and I think the way I am doing it is not very efficient.
My requirements, in this order (also written in the comments below):
Delete all usernames mentioned
Delete all RT (retweets flags)
Delete all hashtags mentioned
Replace all break lines with spaces
Replace all double spaces with single spaces
Delete all special characters except spaces
Here is a Short and Compilable Example:
public class StringTest {
public static void main(String args[]) {
String text = "RT @AshStewart09: Vote for Lady Gaga for \"Best Fans\""
+ " at iHeart Awards\n"
+ "\n"
+ "RT!!\n"
+ "\n"
+ "My vote for #FanArmy goes to #LittleMonsters #iHeartAwards"
+ " htt…";
String[] hashtags = {"#FanArmy", "#LittleMonsters", "#iHeartAwards"};
System.out.println("Before: " + text + "\n");
// Delete all usernames mentioned (may run multiple times)
text = text.replaceAll("@AshStewart09", "");
System.out.println("First Phase: " + text + "\n");
// Delete all RT (retweets flags)
text = text.replaceAll("RT", "");
System.out.println("Second Phase: " + text + "\n");
// Delete all hashtags mentioned
for (String hashtag : hashtags) {
text = text.replaceAll(hashtag, "");
}
System.out.println("Third Phase: " + text + "\n");
// Replace all break lines with spaces
text = text.replaceAll("\n", " ");
System.out.println("Fourth Phase: " + text + "\n");
// Replace all double spaces with single spaces
text = text.replaceAll(" +", " ");
System.out.println("Fifth Phase: " + text + "\n");
// Delete all special characters except spaces
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
System.out.println("Finally: " + text);
}
}

Relying on replaceAll is probably the biggest performance killer as it compiles the regex again and again. The use of regexes for everything is probably the second most significant problem.
Assuming all usernames start with @, I'd replace
// Delete all usernames mentioned (may run multiple times)
text = text.replaceAll("@AshStewart09", "");
by a loop that copies everything until it finds an @, then checks whether the following chars match any of the listed usernames and, if so, skips them. For this lookup you could use a trie. A simpler method would be a replaceAll-like loop for the regex @\w+ together with a HashMap lookup.
// Delete all RT (retweets flags)
text = text.replaceAll("RT", "");
Here,
private static final Pattern RT_PATTERN = Pattern.compile("RT");
is a sure win. All the following parts could be handled similarly. Instead of
// Delete all special characters except spaces
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
you could use Guava's CharMatcher. The method removeFrom does exactly what you did, but collapseFrom or trimAndCollapseFrom might be better.
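The simpler variant of the username step (a replaceAll-like loop plus a hash lookup) could be sketched as follows, assuming usernames are written as @name mentions. The class MentionStripper and the helper stripKnownMentions are hypothetical, and a HashSet-style Set is used for the lookup since only membership is needed.

```java
import java.util.Collections;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MentionStripper {

    // Compiled once and reused for every tweet.
    private static final Pattern MENTION = Pattern.compile("@\\w+");

    // Hypothetical helper: removes only the mentions found in usernames.
    static String stripKnownMentions(String text, Set<String> usernames) {
        Matcher matcher = MENTION.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String mention = matcher.group();
            // Known usernames are dropped, unknown mentions are kept as-is.
            String replacement = usernames.contains(mention) ? "" : mention;
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> usernames = Collections.singleton("@AshStewart09");
        System.out.println(stripKnownMentions("RT @AshStewart09: hi @other", usernames));
    }
}
```

This makes one pass over the text regardless of how many usernames are listed, instead of one replaceAll call (and one regex compilation) per username.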

According to the now closed question, it all boils down to
tweet = tweet.replaceAll("@\\w+|#\\w+|\\bRT\\b", "")
.replaceAll("\n", " ")
.replaceAll("[^\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ")
.trim();
The second line seems to be redundant, as the third one removes \n too. Changing the first line's replacement to " " doesn't change the outcome and allows the replacements to be aggregated.
tweet = tweet.replaceAll("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ")
.trim();
I've changed the usernames and hashtags part to also consume a lone @ or #, so that it doesn't need to be consumed by the special chars part. This is necessary for correct processing of strings like !@AshStewart09.
For maximum performance, you need a precompiled pattern. I'd also suggest again using Guava's CharMatcher for the second part. Guava is huge (2 MB I guess), but you will surely find more useful things there. So in the end you get
private static final Pattern PATTERN =
Pattern.compile("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+");
private static final CharMatcher CHAR_MATCHER = CharMatcher.is(' ');
tweet = PATTERN.matcher(tweet).replaceAll(" ");
tweet = CHAR_MATCHER.trimAndCollapseFrom(tweet, ' ');
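If pulling in Guava is not an option, roughly the same result can be had with two precompiled JDK patterns. This is a sketch under the assumption that usernames start with @ and hashtags with #; the class TweetCleaner and the method clean are illustrative names only.

```java
import java.util.regex.Pattern;

public class TweetCleaner {

    // Mentions, hashtags, the RT flag, and any run of characters that are
    // neither letters, digits, nor spaces.
    private static final Pattern NOISE =
            Pattern.compile("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+");

    // Runs of spaces left behind by the first pass.
    private static final Pattern SPACES = Pattern.compile(" +");

    // Hypothetical helper: reduces a tweet to its plain words.
    static String clean(String tweet) {
        String withoutNoise = NOISE.matcher(tweet).replaceAll(" ");
        return SPACES.matcher(withoutNoise).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("RT @user: Hello, #tag world!!\nbye"));
    }
}
```

Both patterns are compiled once per class load, so processing thousands of tweets only pays for matching, not for repeated compilation.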

You can fold everything that is replaced with nothing into one replaceAll call, and everything that is replaced with a space into another, like so (also using a regex to find the hashtags and usernames, as this seems easier):
text = text.replaceAll("@\\w+|#\\w+|RT", "");
text = text.replaceAll("\n| +", " ");
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();

JAVA - Ignore part of strings containing "#"

I'm having some difficulty excluding the part of a string that follows the "#" symbol.
Let me explain:
This is a sample input text a user could insert in a textbox:
Some Text
Some Text again #A comment
#A comment line
Another Text
Another Text again#Comment
I need to read this text and ignore all text after "#" symbol.
This should be the expected output:
Some Text;Some Text again;Another Text;Another Text again
As for now here's the code:
This replaces all newlines with ";"
readText = userInputTextArea.getText();
readTextAllInALine = readText.replaceAll("\\n", ";");
so the output after this is:
Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment
This code is to ignore all characters after the first "#" but works fine just for the first line if we read it all sequentially.
int startIndex = inputCommandText.indexOf("#");
int endIndex = inputCommandText.indexOf(";");
String toBeReplaced = inputCommandText.substring(startIndex, endIndex);
readTextAllInALine.replace(toBeReplaced, "");
I'm stuck in finding a way for having the expected output. I was thinking of using a StringTokenizer, processing every line, removing text after "#" or ignoring the whole line if it starts with "#", and then printing all tokens (i.e. all lines) separating them with ";" but I cannot make it work.
Any help will be appreciated.
Thank you very much in advance.
Regards.

Just call this replace command on your pure string, retrieved from the text input. The regex #[^;]* grabs everything, starting at the hash until it reads a semicolon. Afterwards it replaces it with an empty string.
public static void main(String[] args) {
String text = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment";
System.out.println(text);
text = text.replaceAll("#[^;]*", "");
System.out.println(text);
}

A regex is useful here, but it's tricky because your pattern is moderately complex. The comments are end-of-line comments, so they can appear in more than one arrangement.
I came up with the following which is a two-pass:
replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";");
The two-pass circumvents the fact that sometimes you get a duplicate line break. The first expression replaces comments but not new line characters and the second expression replaces multiple new line characters with a single semicolon.
The individual parts of the expression in the first pass are the following:
" *"
This includes zero or more leading spaces in the comment match, i.e. in "...again #A...", we want to remove that space between n and #.
"(#.* )"
The start of the comment match: matches a # followed by zero or more characters. (Typically the . matches any character except a new line.)
"(?= )"
This is a positive lookahead and where the regex starts to get tricky. It looks for whatever is inside this expression but doesn't include it in the text that's matched. It asserts that the #.* is followed by a certain string but doesn't replace that certain string.
"\\n|$"
The lookahead finds a new line or the end anchor. This will find a comment ended with a new line character or a comment that is at the end of the String. But again, since it's inside the lookahead, the new line doesn't get replaced.
So given the input:
String text = (
"Some Text" + '\n' +
"Some Text again #A comment" + '\n' +
"#A comment line" + '\n' +
"Another Text" + '\n' +
"Another Text again#Comment"
);
System.out.println(
text.replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";")
);
The output is:
Some Text;Some Text again;Another Text;Another Text again

readText = userInputTextArea.getText();
readText = readText.replaceAll("\\s*#[^\n]*", "");
readText = readText.replaceAll("\n+", ";");
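As a runnable sketch of this two-step approach, the snippet below wraps the same two replaceAll calls in a helper and feeds it the sample input from the question. The class CommentStripper and the method stripComments are hypothetical names, and a plain String stands in for userInputTextArea.getText().

```java
public class CommentStripper {

    // Hypothetical helper wrapping the two replaceAll calls shown above.
    static String stripComments(String text) {
        return text
                // Remove each comment together with the whitespace before it
                // (including the newline in front of a comment-only line).
                .replaceAll("\\s*#[^\n]*", "")
                // Join the remaining lines with semicolons.
                .replaceAll("\n+", ";");
    }

    public static void main(String[] args) {
        String readText = "Some Text\n"
                + "Some Text again #A comment\n"
                + "#A comment line\n"
                + "Another Text\n"
                + "Another Text again#Comment";
        System.out.println(stripComments(readText));
    }
}
```

One caveat with this pattern: a comment on the very first line has no preceding newline to swallow, so the result would start with a semicolon.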

Just to make it clear, Coxer's reply is the way to go. Far more precise and clean. But in any case, if you fancy experimenting here is a recursive solution that will work:
public class IgnoreHash {
@Test
public void test() {
String readTextAllInALine = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment;";
String actualResult = removeHashComments(readTextAllInALine);
Assert.assertEquals(actualResult, "Some Text;Some Text again ;Another Text;Another Text again");
}
private String removeHashComments(String input) {
StringBuffer result = new StringBuffer();
int hashIndex = input.indexOf("#");
int endIndex = input.indexOf(";");
if(hashIndex != -1){
result.append(input.substring(0, hashIndex));
//first line
if(hashIndex < endIndex ) {
result.append(removeHashComments(input.substring(endIndex)));
} // the case of ;#
else if (endIndex == hashIndex-1) {
int endIndex2 = input.indexOf(";", hashIndex+1);
result.append(removeHashComments(input.substring(endIndex2+1)));
}
else {
result.append(removeHashComments(input.substring(hashIndex)));
}
}
return result.toString();
}
}
