Extract substring in Java

I have a string like this:
String string = "item=somevalue&user=user1";
I need to extract, in Java, the substring "somevalue" (i.e. the substring after item= and before &).

Just use a positive lookbehind and a positive lookahead assertion, like below:
(?<=item=).*?(?=&)
OR
(?<=item=)[^&]*(?=&)
Explanation:
(?<=item=) The text that precedes the match must be item=
[^&]* Match any character except the & symbol, zero or more times.
(?=&) The character that follows the match must be the & symbol.
Code:
String s = "item=somevalue&user=user1";
Pattern regex = Pattern.compile("(?<=item=).*?(?=&)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(0));
}
Output:
somevalue
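The lookaround patterns above need an & after the value, so they find nothing when item= is the last parameter. A minimal sketch of a capturing-group variant that also covers that case; the names ItemExtractor and extractItem are made up for illustration, and the input is assumed to be plain key=value pairs joined by &:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ItemExtractor {
    // (?:^|&) requires the key to start the string or follow an '&',
    // so "subitem=" is not mistaken for "item=".
    // ([^&]*) captures everything up to the next '&' or the end of the input.
    private static final Pattern ITEM = Pattern.compile("(?:^|&)item=([^&]*)");

    // Returns the value of "item", or null when the key is absent.
    static String extractItem(String query) {
        Matcher m = ITEM.matcher(query);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractItem("item=somevalue&user=user1"));
        System.out.println(extractItem("user=user1&item=somevalue"));
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call.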

Try the following code:
String test= "item=somevalue&user=user1";
String tok[]=test.split("&");
String finalTok[]=tok[0].split("=");
System.out.println(finalTok[1]);
Output :
somevalue

public static void main(String[] args) {
    String str = "item=somevalue&user=user1";
    // Take everything between the first '=' and the first '&'.
    String result = str.substring(str.indexOf("=") + 1, str.indexOf("&"));
    System.out.println(result);
}
Output:
somevalue

One-line solution:
System.out.println(string.substring(string.indexOf("=") + 1, string.indexOf("&")));
Or, if "item=somevalue" is the last parameter instead of the first, take everything after the last "=":
string = "user=user1&item=somevalue";
System.out.println(string.substring(string.lastIndexOf("=") + 1));
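Both one-liners depend on where item= sits in the string. A sketch that looks the value up by key using only indexOf and substring; KeyLookup and valueOf are hypothetical names, and the input is assumed to be plain key=value pairs joined by & with no URL decoding:

```java
public class KeyLookup {
    // Returns the value stored under key, or null when the key is absent.
    static String valueOf(String query, String key) {
        String prefix = key + "=";
        int valueStart;
        if (query.startsWith(prefix)) {
            // The key is the first parameter.
            valueStart = prefix.length();
        } else {
            // Otherwise the key must directly follow an '&'.
            int at = query.indexOf("&" + prefix);
            if (at < 0) {
                return null;
            }
            valueStart = at + 1 + prefix.length();
        }
        // The value ends at the next '&' or at the end of the string.
        int end = query.indexOf('&', valueStart);
        return end < 0 ? query.substring(valueStart) : query.substring(valueStart, end);
    }

    public static void main(String[] args) {
        System.out.println(valueOf("item=somevalue&user=user1", "item"));
        System.out.println(valueOf("user=user1&item=somevalue", "item"));
    }
}
```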

Splitting strings on special characters is a common task, and in my opinion regex is overkill and performs poorly for something this simple. Everybody should have some fast string utilities for frequently used tasks.
For example, you can do this:
for (String s : splitToIterable(str, '&')) {
if (s.startsWith("item=")) {
String itemValue = s.substring(5);
}
}
if you have a helper method like this:
public static Iterable<String> splitToIterable(final String str, final char delim) {
if (str == null) {
return null;
}
return new Iterable<String>() {
public Iterator<String> iterator() {
return new Iterator<String>() {
int lastIndex = 0;
String next = fetchNext();
public boolean hasNext() {
return next != null;
}
public String next() {
if (next == null) {
throw new NoSuchElementException();
}
String result = next;
next = fetchNext();
return result;
}
public String fetchNext() {
if (lastIndex == -1) {
return null;
}
String next;
int i = str.indexOf(delim, lastIndex);
if (i > -1) {
next = str.substring(lastIndex, i);
lastIndex = i + 1;
}
else {
next = str.substring(lastIndex);
lastIndex = -1;
}
return next;
}
public void remove() {
throw new UnsupportedOperationException();
}
};
}
};
}

Related

Method that corrects the text, inserting spaces after the comma in Java

I want to create a refactorSeparators method that takes a String as an argument. The method returns the input text corrected so that wherever a comma or a period is not followed by a space, a space is inserted. I'm stuck and don't know what to do next; my code is below. How can I finish it? I wonder how to write this part: if (s.equals(".") && i.next().equals())
public class Separator {
public static void main(String[] args) {
String text = "Periods,hyphens, the last two characters cannot be a period. The rest of them don't. And there you go.";
ArrayList<String> stringArr = new ArrayList<>();
String[] arrOfStr = text.split("");
Iterator i = stringArr.iterator();
for (String s : arrOfStr) {
stringArr.add(s);
System.out.println("{" +s + "}");
}
for (String s : arrOfStr) {
if (s.equals(".") && i.next().equals()) {
String space = " ";
stringArr.add(i.next(, " ");
} else {
System.out.println("error");
}
}
}}

You're over-thinking it:
String refactorSeparators(String str) {
return str.replaceAll("([,.]) ?", "$1 ").trim();
}
The regex ([,.]) ? matches a comma or dot optionally followed by a space, which is replaced with the dot/comma and a space. The trim() removes the space that would be added if there's a dot at the end of the input.
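A small runnable check of the one-liner, next to a hypothetical lookahead variant that only touches punctuation directly followed by a non-space character. The class name SeparatorFixer and the method refactorSeparatorsLookahead are assumptions added for illustration:

```java
public class SeparatorFixer {
    // The one-liner from above: replace ", " or "," (and ". " or ".")
    // with the punctuation mark plus exactly one space, then trim.
    static String refactorSeparators(String str) {
        return str.replaceAll("([,.]) ?", "$1 ").trim();
    }

    // Variant: insert a space only where a non-whitespace character
    // follows the punctuation, so no trim() is needed.
    static String refactorSeparatorsLookahead(String str) {
        return str.replaceAll("([,.])(?=\\S)", "$1 ");
    }

    public static void main(String[] args) {
        String text = "Periods,hyphens.The end.";
        System.out.println(refactorSeparators(text));
        System.out.println(refactorSeparatorsLookahead(text));
    }
}
```

Both versions also split decimal numbers ("3.14" becomes "3. 14"), so neither is suitable for text that contains them.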

Your main problem is here:
Iterator i = stringArr.iterator();
// ...
for (String s : arrOfStr) {
if (s.equals(".") && i.next().equals()) {
You are iterating with both an iterator and a for-loop, which makes life complicated. I'd suggest using a for-loop over the index.
Also, your second equals expression is missing an argument.
Since Bohemian already posted the regex one-liner, I might as well post my solution too:
public class PunctuationFixer {
private String text;
private int index;
public static String addMissingWhitespaceAfterPunctuation(String input) {
return new PunctuationFixer(input).fix();
}
private PunctuationFixer(String input) {
this.text = input;
this.index = 0;
}
private String fix() {
while (index < this.text.length() - 1) {
fix(this.text.charAt(index));
index++;
}
return this.text;
}
private void fix(char current) {
if (current == '.' || current == ',') {
addSpaceIfNeeded();
}
}
private void addSpaceIfNeeded() {
if (' ' != (text.charAt(index + 1))) {
text = text.substring(0, index + 1) + " " + text.substring(index + 1);
index++;
}
}
public static void main(String[] args) {
String text = "Periods,hyphens, the last two characters cannot be a period. The rest of them don't. And there you go.";
System.out.println(addMissingWhitespaceAfterPunctuation(text));
}
}

That's a typical parsing problem. You have to remember the last character you read; depending on it and on the current character, you know what to do.
public static void main(String[] args) {
String text = "Periods,hyphens, the last two characters cannot be a period.The rest of them don't. And there you go.";
StringBuilder fixedString = new StringBuilder();
String[] arrOfStr = text.split("");
String lastChar = null;
for(String currentChar : arrOfStr) {
if(lastChar != null && (lastChar.equals(",") || lastChar.equals(".")) && !currentChar.equals(" "))
fixedString.append(" ");
fixedString.append(currentChar);
lastChar = currentChar;
}
System.out.println(fixedString);
}
Side note: The shortest and most elegant solution is, of course, @Bohemian's. But I suspect it is not in the spirit of the homework! ;-)

Java regex negate with boundaries (square brackets)

I would appreciate it if anybody could help me with a Java regex requirement.
I have a String like "/ABC/KLM[XYZ/ABC/KLM]/ABC"
I want to replace all occurrences of ABC that are not surrounded by square brackets.
In this case only the first and last ABC should be found,
but not the ABC in the middle, because it is surrounded by square brackets.

If the brackets can be nested, you cannot do this without a recursive regular expression or an equivalent feature. Java's standard library does not support that, but the regex flavours of Perl (recursion) and .NET (balancing groups) do. This is in essence the same problem as trying to match content within HTML tags; by far the easiest way to do it is with a stack-based parser.
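Because only one kind of bracket is involved, the stack mentioned above can be reduced to a depth counter. A minimal sketch under that assumption; BracketAwareReplace and replaceOutsideBrackets are illustrative names, and unbalanced closing brackets are simply ignored:

```java
public class BracketAwareReplace {
    // Replaces every occurrence of target that is not inside [...],
    // including nested brackets, in a single left-to-right pass.
    static String replaceOutsideBrackets(String input, String target, String replacement) {
        if (target.isEmpty()) {
            return input; // nothing sensible to replace
        }
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of currently open '[' brackets
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '[') {
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
            }
            if (depth == 0 && input.startsWith(target, i)) {
                out.append(replacement);
                i += target.length();
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceOutsideBrackets("/ABC/KLM[XYZ/ABC/KLM]/ABC", "ABC", "XXX"));
    }
}
```

The counter keeps the scan linear in the input length and avoids rescanning the string for every match.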

Solution is here:
public class MergeParentAndChildXPATH {
public static void main(String[] args) {
String substringToBeFound = "ABC";
String toReplayceWith = "XXX";
String xPathFromExcel = "/UTILMD/ABC/[XYZ/ABC/KLM]KLM[XYZ/ABC/[XYZ/ABC/KLM]KLM]/ABC";
System.out.println("original String\t"+xPathFromExcel);
String manupulatedString = mergeParentAndChildXPATH(substringToBeFound, toReplayceWith,xPathFromExcel);
System.out.println("manipulated String\t"+manupulatedString);
}
public static String mergeParentAndChildXPATH(String substringToBeFound, String toReplayceWith, String xPathFromExcel ) {
StringBuffer sbManipulatedString = new StringBuffer();
int lengthABC = substringToBeFound.length();
CharStack charStack = new CharStack();
String substringAfterMatch = "";
while (xPathFromExcel.indexOf(substringToBeFound)>-1) {
int matchStartsAt = xPathFromExcel.indexOf(substringToBeFound);
int matchEndssAt = xPathFromExcel.indexOf(substringToBeFound)+lengthABC;
String substringBeforeMatch = xPathFromExcel.substring(0, matchStartsAt);
substringAfterMatch = xPathFromExcel.substring(matchStartsAt+lengthABC);
String substringMatch = xPathFromExcel.substring(matchStartsAt, matchEndssAt);
// System.out.println("Loop Count\t"+loopCount);
// System.out.println("substringBeforeMatch\t"+substringBeforeMatch);
// System.out.println("substringAfterMatch\t"+substringAfterMatch);
// System.out.println("starts "+matchStartsAt+ " ends "+matchEndssAt);
// System.out.println("Output of match: "+substringMatch);
// now tokenize the string till match is reached and memorize brackets via Stack
String sTokenize = xPathFromExcel;
for (int i = 0; i < matchStartsAt; i++) {
char ch = sTokenize.charAt(0);
// System.out.println(ch);
// System.out.println(sTokenize.substring(0,1));
if (ch == '[') {
charStack.push(ch);
}
if (ch == ']') {
charStack.pop();
}
sTokenize = sTokenize.substring(1);
}//for
if (charStack.empty()) {
substringMatch = substringMatch.replaceAll(substringMatch, toReplayceWith);
}
//
sbManipulatedString.append(substringBeforeMatch + substringMatch);
// System.out.println("manipulatedString\t"+sbManipulatedString.toString());
xPathFromExcel = substringAfterMatch;
// System.out.println("remaining String\t"+substringAfterMatch);
}
return (sbManipulatedString.toString()+substringAfterMatch);
}
}
import java.util.Stack;
public class CharStack {
    // Typed stack: no casts needed, and chars are boxed automatically.
    private final Stack<Character> theStack = new Stack<>();
    public char peek() {
        return theStack.peek();
    }
    public void push(char c) {
        theStack.push(c);
    }
    public char pop() {
        return theStack.pop();
    }
    public boolean empty() {
        return theStack.empty();
    }
}

regex email validation

This is a continuation of my earlier post. My code:
public class Main {
static String theFile = "C:\\Users\\Pc\\Desktop\\textfile.txt";
public static boolean validate(String input) {
boolean status = false;
String REGEX = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
status = true;
} else {
status = false;
}
return status;
}
public static void main(String[] args) {
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(theFile));
String line;
int count = 0;
while ((line = br.readLine()) != null) {
String[] arr = line.split("#");
for (int x = 0; x < arr.length; x++) {
if (arr[x].equals(validate(theFile))) {
count++;
}
System.out.println("no of matches " + count);
}
}
} catch (IOException e) {
e.printStackTrace();
}
Main.validate(theFile);
}
}
It shows this result:
no of matches 0
no of matches 0
no of matches 0
no of matches 0
and this is the text in my input file:
sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com
My output should be three emails, because the last string is not a standard email format.
I know I'm not supposed to pass (arr[x].validate(theFile)))

I have always used this:
public static bool Validate(string email)
{
string expressions = @"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";
return Regex.IsMatch(email, expressions);
}
Note: I also have a function that "cleans" the string in case it contains multiple @ symbols.
Edit: Here is how I clean out extra @ symbols. Note that this keeps the FIRST @ it finds and removes the rest. This function should be used BEFORE you run the validation function.
public static string CleanEmail(string input)
{
string output = "";
try
{
if (input.Length > 0)
{
string first_pass = Regex.Replace(input, @"[^\w\.@-]", "");
List<string> second_pass = new List<string>();
string third_pass = first_pass;
string final_pass = "";
if (first_pass.Contains("@"))
{
second_pass = first_pass.Split('@').ToList();
if (second_pass.Count >= 2)
{
string second_pass_0 = second_pass[0];
string second_pass_1 = second_pass[1];
third_pass = second_pass_0 + "@" + second_pass_1;
second_pass.Remove(second_pass_0);
second_pass.Remove(second_pass_1);
}
}
if (second_pass.Count > 0)
{
final_pass = third_pass + string.Join("", second_pass.ToArray());
}
else
{
final_pass = third_pass;
}
output = final_pass;
}
}
catch (Exception Ex)
{
}
return output;
}

There are several errors in your code:
if (arr[x].equals(validate(theFile))) checks whether the mail address string equals the boolean value returned by validate(). This will never be the case.
In validate(), if you only want to check whether the string matches a regex, you can simply call string.matches(pattern), so you only need one line of code (not really an error, but more elegant).
After splitting the input line, whitespace is left around the addresses, because you only split at the # symbol. You can either trim() each string afterwards (see the code below) or split() at \\s*#\\s* instead of just #.
Here is an example with all the fixes (I left out the part where you read the file and used a string with your mail addresses instead):
public class Main {
private static final String PATTERN_MAIL
= "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@" + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
public static boolean validate(String input) {
return input.matches(PATTERN_MAIL);
}
public static void main(String[] args) {
String line = "sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com";
String[] arr = line.split("#");
int count = 0;
for (int x = 0; x < arr.length; x++) {
if (validate(arr[x].trim())) {
count++;
}
System.out.println("no of matches " + count);
}
}
}
It prints:
no of matches 1
no of matches 2
no of matches 3
no of matches 4
EDIT: If the pattern is not supposed to match the last mail address, you'll have to change the pattern. Right now it matches all of them.
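The other option mentioned above, splitting at \\s*#\\s* instead of trimming each piece, could look like the sketch below. MailCounter and countValid are illustrative names; the pattern is the one from this answer with @ between the local part and the domain, compiled once and reused:

```java
import java.util.regex.Pattern;

public class MailCounter {
    // Compiled once; String.matches() would recompile it on every call.
    private static final Pattern MAIL = Pattern.compile(
            "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    // Counts the '#'-separated entries of a line that look like mail addresses.
    static int countValid(String line) {
        int count = 0;
        // The split pattern swallows the whitespace around each '#'.
        for (String candidate : line.trim().split("\\s*#\\s*")) {
            if (MAIL.matcher(candidate).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com";
        System.out.println("no of matches " + countValid(line));
    }
}
```

Printing the total once after the loop avoids the repeated "no of matches" lines of the original program.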

Does text contain substring

How can I check whether a String contains a specific pattern such as "ABC72961"? That is, we search for a substring that starts with "ABC" followed by 5 digits. I've implemented an algorithm, but I would also like a version using "matches" or something similar, and then compare the speed of the two solutions.

You may want to use a regex for this:
^ABC[0-9]{5}$
^ : Beginning of the string
ABC : Matches ABC literally (case-sensitive)
[0-9]{5} : Matches exactly five digits from 0 to 9
$ : End of the string
Then use String#matches to test it.
Example:
String regex = "^ABC[0-9]{5}$";
String one = "ABC72961";
String two = "ABC2345";
String three = "AB12345";
String four = "ABABABAB";
System.out.println(one.matches(regex)); // true
System.out.println(two.matches(regex)); // false
System.out.println(three.matches(regex)); // false
System.out.println(four.matches(regex)); // false
EDIT
Seeing your comment, you also want it to work for String one = "textABC72961text". For that, remove the ^ and $ anchors and allow any text on either side, because String#matches always tests the whole string:
.*ABC[0-9]{5}.*
EDIT 2
Here is how to extract it:
Matcher m = Pattern.compile("ABC[0-9]{5}").matcher(s);
if (m.find()) {
result = m.group();
}
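Matcher.find() searches anywhere in the input, so the .* wrappers are only needed with matches(). A self-contained sketch of the extraction; the names CodeFinder and firstCode are made up for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeFinder {
    // "ABC" followed by exactly five digits; compiled once and reused.
    private static final Pattern CODE = Pattern.compile("ABC[0-9]{5}");

    // Returns the first occurrence of the pattern anywhere in text, if any.
    static Optional<String> firstCode(String text) {
        Matcher m = CODE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstCode("textABC72961text").orElse("not found"));
        System.out.println(firstCode("ABC2345").orElse("not found"));
    }
}
```

Note that an input such as "ABC123456" still yields "ABC12345"; if a sixth digit should disqualify the match, appending a negative lookahead `(?![0-9])` to the pattern is one option.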

str.contains("ABC72961");
Returns true if str contains the string. False if not.

public String getString() {
String str = extractString();
return str;
}
public boolean exists() {
// True when a matching string was found (extractString() returns "" otherwise).
return !getString().trim().isEmpty();
}
private List<Integer> getPositionsOfABC() {
List<Integer> positions = new ArrayList<>();
int index = text.indexOf("ABC");
while (index >= 0) { // indexOf returns -1 when nothing is found; 0 is a valid position
positions.add(index);
index = text.indexOf("ABC", index + 1);
}
return positions;
}
private static boolean isInteger(String str) {
try {
Integer.parseInt(str); // throws NumberFormatException for non-numeric input
return true;
} catch (NumberFormatException ex) {
return false;
}
}
private String extractString() {
List<Integer> positions = getPositionsOfABC();
for (Integer position : positions) {
int index = position.intValue();
String substring = text.substring(index, index + LENGTH_OF_DIGITS);
String lastDigits = substring.substring(3, substring.length());
if (isInteger(lastDigits)) {
return substring;
}
}
return "";
}
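For the speed comparison, a compact regex-free version of the same idea might look like this. It is only a sketch: ManualCodeScan, firstCode and allDigits are hypothetical names, and the prefix "ABC" and the digit count of 5 are taken from the question:

```java
public class ManualCodeScan {
    private static final String PREFIX = "ABC";
    private static final int DIGITS = 5;

    // Returns the first "ABC" + five digits found in text, or "" if there is none.
    static String firstCode(String text) {
        int index = text.indexOf(PREFIX);
        while (index >= 0) {
            int digitsStart = index + PREFIX.length();
            int end = digitsStart + DIGITS;
            // The bounds check prevents reading past the end of the text.
            if (end <= text.length() && allDigits(text, digitsStart, end)) {
                return text.substring(index, end);
            }
            index = text.indexOf(PREFIX, index + 1);
        }
        return "";
    }

    // True when every character in [from, to) is an ASCII digit.
    private static boolean allDigits(String text, int from, int to) {
        for (int i = from; i < to; i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(firstCode("xxABC12ABC72961yy"));
    }
}
```

Checking characters directly avoids both the intermediate substring and the exception-based digit test.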

Here's some simple code that checks whether a substring exists in a string without using search functions such as contains() or indexOf(), regex, or complex data structures.
class SSC {
public static void main(String[] args) {
String main_str = "textABC72961text"; // MAIN STRING (example value)
String sub_str = "ABC"; // SUBSTRING (example value)
String w; int flag=0;
for(int i=0;i<=main_str.length()-sub_str.length();i++){
w="";
for(int j=0;j<sub_str.length();j++){
w+=main_str.charAt(i+j);
}
if(w.equals(sub_str))
flag++;
}
if(flag>0)
System.out.print("exists "+flag+" times");
else
System.out.print("doesn't exist");
}
}
Hope this helps.

I think what you want to use is java.util.regex.Pattern.
Pattern p = Pattern.compile("ABC(\\d*)");
Matcher m = p.matcher("ABC72961");
boolean b = m.matches();
Or, if there must be exactly 5 digits after "ABC", you can use the regex ABC(\d{5}) (written as "ABC(\\d{5})" in a Java string literal).
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#compile(java.lang.String)
Another solution would be:
String stringToTest = "ABC72961";
boolean b = stringToTest.contains("ABC");
http://www.tutorialspoint.com/java/lang/string_contains.htm

You can use the String indexOf method like this:
int result = someString.indexOf("ABC72961");
result will be -1 if there are no matches.
If there is a match, the result will be the index where the match starts.

Inserting phrases before/after certain letters in strings

Hey, I'm having another problem with my coding assignment tonight. I'm supposed to write a method that adds "bool" in front of every "a" in the passed string s, but my code only adds it before the first "a". How would I go about fixing this with a while loop? Thanks!
Let's say s = "banana".
public static String insertBool(String s){
int pos=s.indexOf("a");
if(pos>-1){
String firstS=(s.substring(0,pos));
String secondS=(s.substring(pos, s.length()));
return(firstS+"bool"+secondS);
}
else
return s;
}

You could just replace all the a's in the string with "boola".
public static String insertBool(String s) {
return s.replaceAll("a", "boola");
}

You could use String.replace()
public static String insertBool(String s) {
if (s == null) {
return null;
}
return s.replace("a", "boola");
}
Or you could use a more complicated while loop, something like:
public static String insertBool(String s) {
if (s == null) {
return null;
}
StringBuilder sb = new StringBuilder();
int i = 0;
char[] arr = s.toCharArray();
while (i < arr.length) {
if (arr[i] == 'a') {
sb.append("bool");
}
sb.append(arr[i]);
i++;
}
return sb.toString();
}
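Since the question asks for a while loop built on indexOf, here is one more sketch that stays close to the original attempt. The class name BoolInserter is an assumption; the loop copies the text before each "a", inserts "bool", and then continues searching after that "a":

```java
public class BoolInserter {
    static String insertBool(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        int start = 0;            // first character not yet copied
        int pos = s.indexOf('a'); // position of the next 'a', or -1
        while (pos > -1) {
            sb.append(s, start, pos); // text before this 'a'
            sb.append("boola");       // "bool" followed by the 'a' itself
            start = pos + 1;
            pos = s.indexOf('a', start);
        }
        sb.append(s.substring(start)); // whatever follows the last 'a'
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertBool("banana"));
    }
}
```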
