Java : Omitting a Word in a String - java

I have a String from which I need to omit a particular word.
As shown below, the String may contain the word "Baci" or "BACI".
I have written the sample program below, which works fine, but I want to know if there is a better way to do it.
public class Test {
public static void main(String args[]) {
String str = "Mar 14 Baci WIC";
if(str!=null&&!str.isEmpty())
{
if(str.contains("Baci") || str.contains("BACI"))
{
str = str.replaceAll("(?i) Baci", "");
}
}
System.out.println(str);
}
}

I think a better way here is to drop the additional check for the existence of "Baci", i.e. to omit the following if check:
if(str.contains("Baci") || str.contains("BACI"))

You could improve it a little by using the \b regexp (which matches a "word boundary") :
str = str.replaceAll("(?i) Baci\\b", "");
That way, your code will not replace "my bacil is..." with "myl is..."
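The suggestions in this thread (no contains() check, a case-insensitive flag, a word boundary) can be combined into one small helper. This is only a sketch: the class name WordRemover and the method removeWord are made up for illustration, and it assumes that swallowing the whitespace before the word and trimming the result is acceptable.

```java
import java.util.regex.Pattern;

class WordRemover {
    // Hypothetical helper: removes every whole-word, case-insensitive
    // occurrence of `word` together with the whitespace in front of it.
    static String removeWord(String text, String word) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // (?i) ignores case, \s* swallows preceding whitespace,
        // \b...\b keeps longer words such as "bacil" intact,
        // Pattern.quote makes the word a literal rather than a regex.
        String regex = "(?i)\\s*\\b" + Pattern.quote(word) + "\\b";
        // Assumption: trimming leftover leading/trailing whitespace is wanted.
        return text.replaceAll(regex, "").trim();
    }

    public static void main(String[] args) {
        System.out.println(removeWord("Mar 14 Baci WIC", "baci"));
        System.out.println(removeWord("my bacil is fine", "Baci"));
    }
}
```

No separate contains() check is needed: when nothing matches, replaceAll returns the text unchanged.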

Your second if condition is unnecessary, since replaceAll() will replace zero or more occurrences of the String without error.

You can call .toUpperCase() on your String and then only check contains("BACI"). Inside the if block, just call replace twice, once with Baci and once with BACI.
On second thought, I think it's better to just call replace twice without checking whether your String contains the word. If there is nothing to replace, the String is returned unchanged.
Hope this is useful!

Related

Trim unwanted characters in a Java String

I have a few Java Strings like the ones below:
ab-android-regression-4.4-git
ab-ios-regression-4.4-git
ab-tablet-regression-4.4-git
However, I do not want such lengthy and unwanted names and so I want to get rid of starting ab- and ending -git part. The pattern for all the Strings is the same (starts with ab and ends with git)
Is there a function/class in Java that will help me in trimming such things? For example, something like:
String test = "ab-android-regression-4.4-git";
test.trim(ab, git)
Also, can StringUtils class help me with this? Thoughts on regular expressions?
EDITED PART: I also want to know how to eliminate the - characters in the Strings and change everything to uppercase letters
Here's a more general-purpose method to remove a prefix and suffix from a string:
public static String trim(String str, String prefix, String suffix)
{
    int indexOfLast = str.lastIndexOf(suffix);
    // lastIndexOf returns -1 when the suffix does not occur
    // in the passed-in String, so only cut when it was found
    if (indexOfLast >= 0) {
        str = str.substring(0, indexOfLast);
    }
    // replaceFirst takes a regex: quote the prefix and anchor it at the start
    return str.replaceFirst("^" + java.util.regex.Pattern.quote(prefix), "");
}
Usage:
String test = "ab-android-regression-4.4-git";
String trim = trim(test, "ab-", "-git");
To remove the "-" and make uppercase, then just do:
trim = trim.replaceAll("-", " ").toUpperCase();
You can use test = test.replace("ab-", "") and similar for the "-git" or you can use test = StringUtils.removeStart(test, "ab-") and similarly, removeEnd.
I prefer the latter if you can use StringUtils because it won't ever accidentally remove the middle of the filename if those expressions are matched.
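If adding Apache Commons is not an option, the same start-only and end-only removal can be approximated in plain Java. The sketch below is not the StringUtils implementation; the class AffixTrimmer and the method stripAffixes are hypothetical names. It also covers the edited part of the question, under the assumption that the "-" characters should be dropped entirely rather than replaced by spaces.

```java
import java.util.Locale;

class AffixTrimmer {
    // Hypothetical helper: removes prefix only at the start and suffix only
    // at the end, so a matching fragment in the middle is never touched.
    static String stripAffixes(String s, String prefix, String suffix) {
        if (s.startsWith(prefix)) {
            s = s.substring(prefix.length());
        }
        if (s.endsWith(suffix)) {
            s = s.substring(0, s.length() - suffix.length());
        }
        return s;
    }

    public static void main(String[] args) {
        String trimmed = stripAffixes("ab-android-regression-4.4-git", "ab-", "-git");
        System.out.println(trimmed); // android-regression-4.4
        // Assumption: dashes are removed, not replaced by spaces.
        System.out.println(trimmed.replace("-", "").toUpperCase(Locale.ROOT)); // ANDROIDREGRESSION4.4
    }
}
```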
Since the parts to trim are constant in size, you should simply use substring:
yourString.substring(3, yourString.length() - 4)
If your string always contains ab- at the beginning and -git at the end, then here is the code:
String test = "ab-android-regression-4.4-git";
test = test.substring(3, test.length() - 4);
System.out.println("test is " + test); // output is: test is android-regression-4.4
To learn more about substrings, see https://docs.oracle.com/javase/tutorial/java/data/manipstrings.html

How can i find whether the string starts with 's','r','p' in java

Example: This is my string,
String sample = "s5656";
If the first character of the string is 's', 'p' or 'r', I should remove that character; otherwise I have to return the original string.
Is there an optimized way to do that, such as a regex or StringUtils from Apache Commons?
Why do you want to add a third-party jar for such a simple requirement? You can try the following:
String sample = "s5656";
if(sample.startsWith("s")||sample.startsWith("r")||sample.startsWith("p")){
// do necessary
}else{
// do necessary
}
String#startsWith()
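Filling in the two "do necessary" branches above could look roughly like this. It is a sketch rather than the answerer's code: the class LeadingCharStripper and the method stripLeading are invented names, and it checks the first character against the set "spr" instead of calling startsWith three times.

```java
class LeadingCharStripper {
    // Hypothetical helper: drops the first character when it is
    // 's', 'p' or 'r', otherwise returns the original string.
    static String stripLeading(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // indexOf returns -1 when the character is not in "spr"
        return "spr".indexOf(s.charAt(0)) >= 0 ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        System.out.println(stripLeading("s5656")); // 5656
        System.out.println(stripLeading("x5656")); // x5656
    }
}
```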
A simple regex could solve your problem :
public static void main(String[] args) {
String s = "s5656s";
System.out.println(s.replaceFirst("^[spr]", "")); // a String which begins with s,p or r
}
O/P:
5656s
PS: regex here leads to smaller/simpler but inefficient code. Use Ruchira's answer for a rather long but efficient code. :)
^(s|p|r)
Try this. Use yourString.replaceAll() / replaceFirst() with an empty string. Use the m (multiline) flag.
See demo.
http://regex101.com/r/dZ1vT6/49
I would go for the replaceAll function with the multiline modifier (?m).
String s = "s5656s\n" +
"r878dsjhj\n" +
"fshghg";
System.out.println(s.replaceAll("(?m)^[spr]", ""));
Output:
5656s
878dsjhj
fshghg

How to properly use java Pattern object to match string patterns

I wrote code that performs several string operations, including checking whether a given string matches a certain regular expression. It ran just fine with 70,000 inputs, but it started giving me an out-of-memory error when I ran it iteratively for five-fold cross validation. It may simply be that I have to assign more memory, but I have a feeling that I might have written inefficient code, so I wanted to double-check that I didn't make any obvious mistake.
static Pattern numberPattern = Pattern.compile("^[a-zA-Z]*([0-9]+).*");
public static boolean someMethod(String line) {
String[] tokens = line.split(" ");
for(int i=0; i<tokens.length; i++) {
tokens[i] = tokens[i].replace(",", "");
tokens[i] = tokens[i].replace(";", "");
if(numberPattern.matcher(tokens[i]).find()) return true;
}
return false;
}
and I have also many lines like below:
token.matches("[a-z]+[A-Z][a-z]+");
Which way is more memory efficient? Do they look efficient enough? Any advice is appreciated!
Edited:
Sorry, I posted the wrong code; I intended to modify it before posting this question but forgot at the last minute. The underlying problem is that I have many similar-looking operations all over. Aside from the fact that the example code did not make sense, I wanted to know whether the regexp comparison part was efficient.
Thanks for all of your comments, I'll look through and modify the code following the advice!
Well, first of all, take a second look at your code... it will always return a "true" value! You are not reading the 'match' variable, just assigning values to it...
Second, String is immutable, so each time you split, you create more instances... why don't you try to create a pattern that makes the matches you want while ignoring the commas and semicolons? I'm not sure, but I think it will take less memory...
Yes, this code is indeed inefficient, because you can return immediately once you've found that match = true (there is no point in continuing to loop).
Further, are you sure you need to break the line into tokens? Why not check the regex only once?
And last, if all comparison checks failed, you should return false (last line).
Instead of altering the text and splitting it you can put it all in the regex.
// the \\b means it must be the start of the String or a word
static Pattern numberPattern = Pattern.compile("\\b[a-zA-Z,;]*[0-9,;]*[0-9]");
// return true if the string contains
// a number which might have letters in front
public static boolean someMethod(String line) {
return numberPattern.matcher(line).find();
}
Aside from what @alfasin has mentioned in his answer, you should avoid duplicating code. Rewrite the following:
{
tokens[i] = tokens[i].replace(",", "");
tokens[i] = tokens[i].replace(";", "");
}
Into:
tokens[i] = tokens[i].replaceAll(",|;", "");
And please compute this before calling .split(), so that the operation doesn't have to be repeated within the loop:
String[] tokens = line.replaceAll(",|;", "").split(" ");
^^^^^^^^^^^^^^^^^^^^^^
Edit: After staring at your code for a bit I think I have a better solution, using regex ;)
public static boolean someMethod(String line) {
return Pattern.compile("\\b[a-zA-Z]*\\d")
.matcher(line.replaceAll(",|;", "")).find();
}
Online Regex Demo | Online Code Demo
\b is a Word Boundary.
It asserts position at the Boundary of a word (Start of line + after spacing)
Code Demo STDOUT:
foo does not match
bar does not match
bar1 does match
foo baz bar bar1 lolz does match
password_01 does not match
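On the question of which form is lighter: String.matches(regex) compiles the regular expression on every call, whereas a Pattern stored in a static final field is compiled once and reused (Pattern instances are immutable and safe to share). The sketch below applies that to the two expressions from this thread; the class TokenChecks and its method names are hypothetical, and whether this alone removes the out-of-memory error is not something the thread establishes.

```java
import java.util.regex.Pattern;

class TokenChecks {
    // Compiled once at class-load time and reused for every call.
    private static final Pattern MIXED_CASE = Pattern.compile("[a-z]+[A-Z][a-z]+");
    private static final Pattern NUMBER = Pattern.compile("\\b[a-zA-Z]*\\d");

    // Same result as token.matches("[a-z]+[A-Z][a-z]+"), without recompiling.
    static boolean isMixedCaseToken(String token) {
        return MIXED_CASE.matcher(token).matches();
    }

    // True when the line contains a number that may have letters in front.
    static boolean containsNumber(String line) {
        // replace(CharSequence, CharSequence) works on literal text, not a regex
        String cleaned = line.replace(",", "").replace(";", "");
        return NUMBER.matcher(cleaned).find();
    }

    public static void main(String[] args) {
        System.out.println(isMixedCaseToken("fooBar"));               // true
        System.out.println(containsNumber("foo baz bar bar1 lolz"));  // true
        System.out.println(containsNumber("password_01"));            // false
    }
}
```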

How to replace || (two pipes) from a string with | (one) pipe

I am getting response for some images in json format within this tag:
"xmlImageIds":"57948916||57948917||57948918||57948919||57948920||57948921||57948922||57948923||57948924||57948925||57948926||5794892"
What I want to do is separate each image id using .split("||") of the String class, then append each image id to a URL and display it.
I have tried .replace("\"|\"|","\"|"); but it's not working for me. Please help.
EDIT: Shabbir, I tried to update your question according to your comments below. Please edit it again, if I didn't get it right.
Use
.replace("||", "|");
| is not a special character for replace(), which treats its arguments as literal text.
However, if you are using split() or replaceAll instead of replace(), beware that you need to escape the pipe symbol as \\|, because these methods take a regex as parameter.
For example:
public static void main(String[] args) {
String in = "\"xmlImageIds\":\"57948916||57948917||57948918||57948919||57948920||57948921||57948922||57948923||57948924||57948925||57948926||5794892\"".replace("||", "|");
String[] q = in.split("\"");
String[] ids = q[3].split("\\|");
for (String id : ids) {
System.out.println("http://test/" + id);
}
}
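Splitting directly on the double pipe is also possible, as long as the delimiter is escaped. Unescaped, the regex || is an alternation of empty patterns and matches the empty string, so it does not split on the delimiter. The sketch below uses Pattern.quote for the escaping; the class ImageIdSplitter, the method splitIds, and the shortened id list are illustrative assumptions, and http://test/ is the placeholder URL from the example above.

```java
import java.util.regex.Pattern;

class ImageIdSplitter {
    // Hypothetical helper: splits an id list on the literal "||" delimiter.
    static String[] splitIds(String ids) {
        // Pattern.quote escapes the pipes so split() sees a literal "||"
        return ids.split(Pattern.quote("||"));
    }

    public static void main(String[] args) {
        for (String id : splitIds("57948916||57948917||57948918")) {
            System.out.println("http://test/" + id);
        }
    }
}
```

This skips the intermediate replace("||", "|") step entirely.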
I think I know what your problem is. You need to assign the result of replace(), not just call it.
String s = "foo||bar||baz";
s = s.replace("||", "|");
System.out.println(s);
I tested it, and just calling s.replace("||", "|"); doesn't seem to modify the string; you have to assign that result back to s.
Edit: The Java 6 spec says "Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar." (the emphasis is mine).
According to http://docs.oracle.com/javase/6/docs/api/java/lang/String.html, replace() takes chars instead of Strings. Perhaps you should try replaceAll(String, String) instead? Either that, or try changing your String ("") quotation marks into char ('') quotation marks.
Edit: I just noticed the overload for replace() that takes a CharSequence. I'd still give replaceAll() a try though.
String pipe="pipes||";
System.out.println("Old Pipe:::"+pipe);
System.out.println("Updated Pipe:::"+pipe.replace("||", "|"));
I don't remember how that method works... but you can write your own:
String withTwoPipes = "helloTwo||pipes";
// Strings are immutable, so copy the characters into a StringBuilder
StringBuilder result = new StringBuilder();
for (int i = 0; i < withTwoPipes.length(); i++) {
    char a = withTwoPipes.charAt(i);
    if (a == '|' && i + 1 < withTwoPipes.length() && withTwoPipes.charAt(i + 1) == '|') {
        i++; // skip the second pipe of the pair
    }
    result.append(a);
}
withTwoPipes = result.toString(); // "helloTwo|pipes"
I think code like this should work... it's not a perfect answer, but it can help...

Word By Word Comparison

I want to ask if anyone knows whether Java has a built-in library for doing something like the following.
For instance,
I have 2 Strings which are:
String a = "Yeahh, I love Java programming.";
String b = "love";
I want to check whether String b, which contains "love", is one of the tokens in String a. Is there a Java API to do so?
I want something like,
a.contains (b) <--------- Return true result
Is there one?
If there's no Java API for that, I will write my own algorithm.
Thanks in advance for any help.
The suggestions so far (indexOf, contains) are all fine if you just want to find substrings. However, given the title of your question, I assume you actually want to find words. For instance, if asked whether "She wore black gloves" contained "love" my guess is you'd want the answer to be no.
Regular expressions are probably the best way forward here, using a word boundary around the word in question:
import java.util.regex.*;
public class Test
{
public static void main(String[] args)
{
System.out.println(containsWord("I love Java", "love"));
System.out.println(containsWord("She wore gloves", "love"));
System.out.println(containsWord("start match", "start"));
System.out.println(containsWord("match at end", "end"));
}
public static boolean containsWord(String input, String word)
{
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
return pattern.matcher(input).find();
}
}
Output:
true
false
true
true
String strToCheck = "check me";
int firstOccurrence = strToCheck.indexOf("me");
// -1 if there is no occurrence
The method contains(CharSequence s) exists in the class String,
so your a.contains(b) will work.
You can use the indexOf() function in the String library to check for it:
if(a.indexOf(b)>=0)
return true;
Yes, there is: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#contains(java.lang.CharSequence).
If you want to find whole words, it is quite easy: you just have to use split.
//Split to an array using space as delimiter
String[] arrayOfWords = a.split(" ");
And then you'll compare like:
"love".equalsIgnoreCase(arrayOfWords[i]);
something like that, you get the idea
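Spelled out, the split-and-compare idea might look like the sketch below. One caveat with splitting on a single space is that punctuation stays attached ("Yeahh," and "programming."), so this version splits on runs of non-word characters instead. The class WordMatcher and the method containsToken are hypothetical names, and \W only treats ASCII letters, digits and underscore as word characters by default.

```java
class WordMatcher {
    // Hypothetical helper: true when `word` equals one of the tokens of
    // `text`, ignoring case.
    static boolean containsToken(String text, String word) {
        // \W+ splits on spaces and punctuation alike
        for (String token : text.split("\\W+")) {
            if (token.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsToken("Yeahh, I love Java programming.", "love")); // true
        System.out.println(containsToken("She wore black gloves", "love"));           // false
    }
}
```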
