I am trying to extract all leading digits from a string using a Java regex, without writing additional code, and I could not find anything that works:
"12345XYZ6789ABC" should give me "12345".
"X12345XYZ6789ABC" should give me nothing
public final class NumberExtractor {
private static final Pattern DIGITS = Pattern.compile("what should be my regex here?");
public static Optional<Long> headNumber(String token) {
var matcher = DIGITS.matcher(token);
return matcher.find() ? Optional.of(Long.valueOf(matcher.group())) : Optional.empty();
}
}
Use a word boundary \b:
\b\d+
See live demo.
If you strictly want to match only digits at the start of the input, and not from each word (same thing when the input contains only one word), use ^:
^\d+
Pattern DIGITS = Pattern.compile("\\b\\d+"); // leading digits of all words
Pattern DIGITS = Pattern.compile("^\\d+"); // leading digits of input
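For reference, here is a minimal sketch of the class from the question with the ^\d+ variant plugged in. The class name HeadNumberSketch and the constant name LEADING_DIGITS are made up for illustration; the rest follows the question's code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeadNumberSketch {
    // ^ anchors the match to the start of the input; \d+ requires at least one digit.
    private static final Pattern LEADING_DIGITS = Pattern.compile("^\\d+");

    static Optional<Long> headNumber(String token) {
        Matcher matcher = LEADING_DIGITS.matcher(token);
        // Long.valueOf throws NumberFormatException if the digit run does not fit in a long.
        return matcher.find() ? Optional.of(Long.valueOf(matcher.group())) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(headNumber("12345XYZ6789ABC"));  // prints Optional[12345]
        System.out.println(headNumber("X12345XYZ6789ABC")); // prints Optional.empty
    }
}
```

Because the quantifier is + rather than *, find() fails outright when the input does not start with a digit, so the Optional.empty() branch is taken instead of parsing an empty match.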
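I'd think something like "^[0-9]+" would work (with * instead of +, find() would also succeed on an empty match, and Long.valueOf("") throws a NumberFormatException). In Java, \d matches only [0-9] by default; compile the pattern with Pattern.UNICODE_CHARACTER_CLASS if you want it to match other Unicode digits as well.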
Edit: removed errant . from the string.
I've got a string in my Java project which looks something like this
9201,92710,94500,920,1002
How can I insert a dot 2 places before each comma, so that it looks like this:
920.1,9271.0,9450.0,92.0,100.2
I had an attempt at it but I can't get the last number to get a dot.
numbers = numbers.replaceAll("([0-9],)", "\\.$1");
The result I got is
920.1,9271.0,9450.0,92.0,1002
Note: The length of the string is not always the same. It can be longer / shorter.
Check if string ends with ",". If not, append a "," to the string, run the same replaceAll, remove "," from end of String.
Split string by the "," delimiter, process each piece adding the "." where needed.
Just add a "." at numbers.length-1 to solve the issue with the last number
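The split-based suggestion above could look roughly like the following. This is only a sketch: the names DotInserterSketch, insertDots and insertDot are hypothetical, and it assumes every piece is a plain run of digits.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class DotInserterSketch {
    // Splits on commas, inserts a dot before the last character of each piece,
    // and joins the pieces back together with commas.
    static String insertDots(String numbers) {
        return Arrays.stream(numbers.split(","))
                .map(DotInserterSketch::insertDot)
                .collect(Collectors.joining(","));
    }

    // Assumption: an empty piece is returned unchanged.
    private static String insertDot(String piece) {
        if (piece.isEmpty()) {
            return piece;
        }
        int cut = piece.length() - 1;
        return piece.substring(0, cut) + "." + piece.substring(cut);
    }

    public static void main(String[] args) {
        System.out.println(insertDots("9201,92710,94500,920,1002"));
        // prints 920.1,9271.0,9450.0,92.0,100.2
    }
}
```

Note that String.split(",") drops trailing empty pieces, so an input ending in a comma would lose that comma in the result.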
As your problem is not only inserting the dot before every comma, but also before end of string, you just must add this additional condition to your capturing group:
numbers = numbers.replaceAll("([0-9](,|$))", "\\.$1");
As suggested by Siguza, you could as well use a non-capturing group which is even more what a "human" would expect to be captured in the capturing group:
numbers = numbers.replaceAll("([0-9](?:,|$))", "\\.$1");
But since non-capturing groups (although a really nice feature) are not supported by every regex flavor (java.util.regex does support them) and the overhead of the extra capturing group is not significant here, I would recommend using the first option.
You could use word boundary:
numbers = numbers.replaceAll("(\\d)\\b", ".$1");
Your solution is fine, as long as you put a comma at the end like dan said.
So instead of:
numbers = numbers.replaceAll("([0-9],)", "\\.$1");
write:
numbers = (numbers+",").replaceAll("([0-9],)", "\\.$1");
numbers = numbers.substring(0, numbers.length()-1);
You may use a positive lookahead to check for the , or end of string right after a digit and a zeroth backreference to the whole match:
String s = "9201,92710,94500,920,1002";
System.out.println(s.replaceAll("\\d(?=,|$)", ".$0"));
// => 920.1,9271.0,9450.0,92.0,100.2
See the Java demo and a regex demo.
Details:
\\d - exactly 1 digit...
(?=,|$) - that must be before a , or end of string ($).
A capturing variation (Java demo):
String s = "9201,92710,94500,920,1002";
System.out.println(s.replaceAll("(\\d)(,|$)", ".$1$2"));
You were right to go for the replaceAll method. But your regex was not matching the end of the string, i.e. the last set of numbers.
Here is my take on your problem:
public static void main(String[] args) {
String numbers = "9201,92710,94500,920,1002";
System.out.println(numbers.replaceAll("(\\d,|\\d$)", ".$1"));
}
The regex (\\d,|\\d$) matches a digit followed by a comma \d,, OR | a digit followed by the end of the string \d$.
I have tested it and found it to work.
As others have suggested, you could add a comma at the end, run the replaceAll and then remove it. But that seems like extra effort.
Example:
public static void main(String[] args) {
String numbers = "9201,92710,94500,920,1002";
//add on the comma
numbers += ",";
numbers = numbers.replaceAll("(\\d,)", "\\.$1");
//remove the comma
numbers = numbers.substring(0, numbers.length()-1);
System.out.println(numbers);
}
I have a fairly simple question. I have a string such as 0-1000,
say str = "0-1000".
I can successfully extract 0 and 1000 by using str.split("-").
Now I need to handle the case where either of those two numbers can be negative.
If I keep using str.split("-"), the negative signs get dropped as well.
Could anyone suggest a method for this?
Since String.split() uses regular expressions to split, you could do something like this:
String[] nos = "-1000--1".split("(?<=\\d)-");
This means you split at minus characters that follow a digit, i.e. must be an operator.
Note that the positive look-behind (?<=\d) needs to be used since you only want to match the minus character. String.split() removes all matching separators and thus something like \d- would remove digits as well.
To parse the numbers you'd then iterate over the array elements and call Integer.valueOf(element) or Integer.parseInt(element).
Note that this assumes the input string to be valid. Depending on what you want to achieve, you might first have to check the input for a match, e.g. by using -?\d+--?\d+ to check whether the string is in the format x-y, where x and y can be positive or negative integers.
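Putting the validation, the split and the parsing together might look like this. It is a sketch only: RangeParserSketch and parseRange are made-up names, and throwing IllegalArgumentException on invalid input is an assumption about what the caller wants.

```java
import java.util.regex.Pattern;

final class RangeParserSketch {
    // Optional minus, digits, the separating minus, optional minus, digits.
    private static final Pattern RANGE = Pattern.compile("-?\\d+--?\\d+");
    // A minus counts as the separator only when a digit precedes it.
    private static final Pattern SEPARATOR = Pattern.compile("(?<=\\d)-");

    static int[] parseRange(String input) {
        if (!RANGE.matcher(input).matches()) {
            throw new IllegalArgumentException("Not in x-y format: " + input);
        }
        String[] parts = SEPARATOR.split(input);
        // Integer.parseInt still throws NumberFormatException for values outside the int range.
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    public static void main(String[] args) {
        int[] range = parseRange("-1000--1");
        System.out.println(range[0] + " to " + range[1]); // prints -1000 to -1
    }
}
```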
You can use a regex like this. It works whether or not either number is negative:
public static void main(String[] args) {
String s = "-500--578";
String[] arr = s.split("(?<=\\d)-"); // split on "-" only if it is preceded by a digit
for (String str : arr)
System.out.println(str);
}
O/P:
-500
-578
Why is non-greedy match not working for me? Take following example:
public String nonGreedy(){
String str2 = "abc|s:0:\"gef\";s:2:\"ced\"";
return str2.split(":.*?ced")[0];
}
In my eyes the result should be: abc|s:0:\"gef\";s:2 but it is: abc|s
The .*? in your regex matches any character except \n (0 or more times, matching the least amount possible).
You can try the regular expression:
:[^:]*?ced
On another note, you should use a constant Pattern to avoid recompiling the expression every time, something like:
private static final Pattern REGEX_PATTERN =
Pattern.compile(":[^:]*?ced");
public static void main(String[] args) {
String input = "abc|s:0:\"gef\";s:2:\"ced\"";
System.out.println(java.util.Arrays.toString(
REGEX_PATTERN.split(input)
)); // prints "[abc|s:0:"gef";s:2, "]"
}
It is behaving as expected. The non-greedy match will match as little as it has to, and with your input, the minimum characters to match is the first colon to the next ced.
You could try limiting the number of characters consumed. For example, to limit the term to "up to 3 characters":
:.{0,3}ced
To make it split as close to ced as possible, use a negative look-ahead, with this regex:
:(?!.*:.*ced).*ced
This makes sure there isn't a closer colon to ced.
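To see the difference side by side, here is a small sketch that runs the question's input through the original regex and two of the alternatives from this thread. The class name LazySplitSketch and the helper firstPart are made up for the example.

```java
import java.util.regex.Pattern;

final class LazySplitSketch {
    // Returns the part of the input before the first match of the regex.
    static String firstPart(String regex, String input) {
        return Pattern.compile(regex).split(input)[0];
    }

    public static void main(String[] args) {
        String input = "abc|s:0:\"gef\";s:2:\"ced\"";
        // Lazy quantifier: the match still starts at the leftmost colon.
        System.out.println(firstPart(":.*?ced", input));            // prints abc|s
        // Excluding colons forces the match to start at the last colon before ced.
        System.out.println(firstPart(":[^:]*?ced", input));         // prints abc|s:0:"gef";s:2
        // Negative lookahead: reject a colon if a later colon still precedes ced.
        System.out.println(firstPart(":(?!.*:.*ced).*ced", input)); // prints abc|s:0:"gef";s:2
    }
}
```

The regex engine always prefers the leftmost starting position; laziness only affects how far the match extends once a starting position has been chosen.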
I am trying to use a stringtokenizer on a list of words as below
String sentence=""Name":"jon" "location":"3333 abc street" "country":"usa"" etc
When I use StringTokenizer with a space as the delimiter, as below
StringTokenizer tokens=new StringTokenizer(sentence," ");
I was expecting my output as different tokens as below
Name:jon
location:3333 abc street
country:usa
But StringTokenizer also splits inside the value of location, so the output looks like
Name:jon
location:3333
abc
street
country:usa
Please let me know how I can fix this, and if I need a regex, what kind of expression I should specify.
This can be easily handled using a CSV Reader.
String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
// prepare String for CSV parsing
CsvReader reader = CsvReader.parse(str.replaceAll("\" *: *\"", ":"));
reader.setDelimiter(' '); // use space as delimiter
reader.readRecord(); // read CSV record
for (int i=0; i<reader.getColumnCount(); i++) // loop thru columns
System.out.printf("Scol[%d]: [%s]%n", i, reader.get(i));
Update: And here is pure Java SDK solution:
Pattern p = Pattern.compile("(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)");
Matcher m = p.matcher(str);
for (int i=0; m.find(); i++)
System.out.printf("Scol[%d]: [%s]%n", i, m.group(1).replace("\"", ""));
OUTPUT:
Scol[0]: [Name:jon]
Scol[1]: [location:3333 abc street]
Scol[2]: [country:usa]
Live Demo: http://ideone.com/WO0NK6
Explanation: As per OP's comments:
I am using this regex:
(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)
Breaking it down now into smaller chunks.
PS: DQ represents Double quote
(?:[^\"]*\") 0 or more non-DQ characters followed by one DQ (RE1)
(?:[^\"]*\"){2} Exactly a pair of above RE1
(?:(?:[^\"]*\"){2})* 0 or more occurrences of pair of RE1
(?:(?:[^\"]*\"){2})*[^\"]*$ 0 or more occurrences of pair of RE1 followed by 0 or more non-DQ characters followed by end of string (RE2)
(?=(?:(?:[^\"]*\"){2})*[^\"]*$) Positive lookahead of above RE2
.+? Match 1 or more characters (? is for non-greedy matching)
\\s+ Should be followed by one or more spaces
(\\s+(?=RE2)|$) Should be followed by space or end of string
In short: it matches one or more characters of any kind followed by "whitespace OR end of string". The whitespace must be followed by an EVEN number of DQs. Hence whitespace outside double quotes is matched and whitespace inside double quotes is not (since that is followed by an odd number of DQs).
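The same even-number-of-quotes lookahead can also be used directly with String.split to collect the pairs into a map. This is a sketch under assumptions: the names QuotedPairsSketch and parse are hypothetical, every token is assumed to look like "key":"value", and keys are assumed to contain no colon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QuotedPairsSketch {
    // Whitespace followed by an even number of double quotes up to the end of input,
    // i.e. whitespace that sits outside any quoted value.
    private static final String SPACE_OUTSIDE_QUOTES = "\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)";

    static Map<String, String> parse(String sentence) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps insertion order
        for (String token : sentence.split(SPACE_OUTSIDE_QUOTES)) {
            // Split at the first colon only, so colons inside the value survive.
            String[] keyValue = token.split(":", 2);
            pairs.put(keyValue[0].replace("\"", ""), keyValue[1].replace("\"", ""));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        System.out.println(parse(sentence));
        // prints {Name=jon, location=3333 abc street, country=usa}
    }
}
```

A token without a colon would make keyValue[1] throw an ArrayIndexOutOfBoundsException, so real input may need a check there.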
StringTokenizer is too simple-minded for this job. If you don't need to deal with quote marks inside the values, you can try this regex:
String s = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
Output:
Name
jon
location
3333 abc street
country
usa
This won't handle internal quote marks within values—where the output should be, e.g.,
Name:Fred ("Freddy") Jones
You can use JSON; it looks like you are using a JSON-like schema.
Search for a JSON library and try parsing the input with it.
String sentence=""Name":"jon" "location":"3333 abc street" "country":"usa"" etc
These would be key/value pairs in JSON: Name is a key and jon is its value, location is a key and 3333 abc street is its value, and so on.
Give it a try.
Here is one link
http://www.mkyong.com/java/json-simple-example-read-and-write-json/
Edit:
It's a bit of a silly answer, but you can try something like this:
sentence = sentence.replaceAll("\" ", "");
StringTokenizer tokens=new StringTokenizer(sentence,"");