Fast way to extract data from string

I have a response from my OkHttpClient like:
{"CUSTOMER_ID":"928941293291"}
{"CUSTOMER_ID":"291389218398"}
{"CUSTOMER_ID":"1C4DC4FC-02Q9-4130-S12B-762D97FS43C"}
{"CUSTOMER_ID":"219382198"}
{"CUSTOMER_ID":"282828"}
{"CUSTOMER_ID":"21268239813"}
{"CUSTOMER_ID":"1114445184"}
{"CUSTOMER_ID":"2222222222"}
{"CUSTOMER_ID":"99218492183921"}
I want to extract every customer ID that parses as a Long (so skipping values like 1C4DC4FC-02Q9-4130-S12B-762D97FS43C) and lies between a minId and a maxId.
This is my implementation:
final List<String> customerIds = Arrays.asList(response.body().string()
.replace("CUSTOMER_ID", "")
.replace("\"", "")
.replace("{", "").replace(":", "")
.replace("}", ",").split("\\s*,\\s*"));
for (final String id : customerIds) {
try {
final Long idParsed = Long.valueOf(id);
if (idParsed > minId && idParsed < maxId) {
ids.add(idParsed);
}
} catch (final NumberFormatException e) {
logger.debug("NumberFormatException", e);
}
}
The list of customer IDs is long (around 1M entries), so performance really matters. Is this the best way to implement it?

I would use a BufferedReader to read the string line by line:
https://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/
Then, for each line, I would reduce the number of replace calls:
String id = line.replace("{\"CUSTOMER_ID\":\"", "");
id = id.substring(0, id.length() - 2); // drops the trailing "} and avoids one more replace
and then apply the parse-to-long logic, adding the successful attempts to a list.
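
A minimal sketch of the line-by-line approach described above, assuming the whole response body is already available as a string with one JSON object per line. The class name LineByLineIds and the method extract are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineByLineIds {
    private static final String PREFIX = "{\"CUSTOMER_ID\":\"";
    private static final String SUFFIX = "\"}";

    // Reads the body one line at a time and keeps ids strictly between minId and maxId.
    static List<Long> extract(String body, long minId, long maxId) {
        List<Long> ids = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(body))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Skip lines that do not have the expected shape.
                if (line.length() < PREFIX.length() + SUFFIX.length()
                        || !line.startsWith(PREFIX) || !line.endsWith(SUFFIX)) {
                    continue;
                }
                String id = line.substring(PREFIX.length(), line.length() - SUFFIX.length());
                try {
                    long parsed = Long.parseLong(id);
                    if (parsed > minId && parsed < maxId) {
                        ids.add(parsed);
                    }
                } catch (NumberFormatException e) {
                    // Non-numeric id (for example the UUID-like value): skip it.
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory StringReader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String body = "{\"CUSTOMER_ID\":\"928941293291\"}\n"
                + "{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}\n"
                + "{\"CUSTOMER_ID\":\"282828\"}";
        System.out.println(extract(body, 1000L, 1_000_000_000_000L));
    }
}
```

The same loop would work on a reader obtained from the HTTP response instead of a StringReader, which avoids holding the full body in memory.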

Since you have a big file, reading the content line by line is a reasonable way to go. Rather than chaining replace calls, define a better regex pattern.
Following your approach, remove CUSTOMER_ID and then use a regex:
String x = "{\"CUSTOMER_ID\":\"928941293291\"}{\"CUSTOMER_ID\":\"291389218398\"}{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}"
+ "{\"CUSTOMER_ID\":\"99218492183921\"}";
x = x.replaceAll("\"CUSTOMER_ID\"", "");
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(x);
while (m.find()) {
System.out.println(m.group(1));
}
or implement a regex that matches all between :" and "}
String x = "{\"CUSTOMER_ID\":\"928941293291\"}{\"CUSTOMER_ID\":\"291389218398\"}{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}"
+ "{\"CUSTOMER_ID\":\"99218492183921\"}";
Pattern p = Pattern.compile(":\"([^\"]*)\"}");
Matcher m = p.matcher(x);
while (m.find()) {
System.out.println(m.group(1));
}
so no need to replace CUSTOMER_ID

Try to avoid exceptions! When 10%-20% of your number parsing attempts fail, the run takes roughly 10x longer (you can write a little test for it).
If your input looks exactly as you showed it, use cheap operations:
Read the file with a BufferedReader line by line (as mentioned before) or, if you have the whole data as a string, use a StringTokenizer to handle each line separately.
Every line starts with {"CUSTOMER_ID":" and ends with "}. Don't use replace or regex (which is even worse) to remove this! Just use one simple substring:
String input = line.substring(16, line.length() - 2);
To avoid exceptions you need a criterion that distinguishes a numeric id from a UUID(?) so that parsing never throws. For example, your ids are positive but the UUID contains minus signs, or a long has at most 19 digits but your UUID has 35 characters. So it's a simple if-else instead of try-catch.
For those who think it's bad not to catch NumberFormatException when parsing numbers: if an id that passes the check still cannot be parsed, the whole file is corrupt, which means you shouldn't try to continue but fail hard.
This is a little test to see the performance difference between catching exceptions and testing the input:
long REPEATS = 1_000_000, startTime;
final String[] inputs = new String[]{"0", "1", "42", "84", "168", "336", "672", "a-b", "1-2"};
for (int r = 0; r < 1000; r++) {
startTime = System.currentTimeMillis();
for (int i = 0; i < REPEATS; i++) {
try {
Integer.parseInt(inputs[i % inputs.length]);
} catch (NumberFormatException e) { /* ignore */ }
}
System.out.println("Try: " + (System.currentTimeMillis() - startTime) + " ms");
startTime = System.currentTimeMillis();
for (int i = 0; i < REPEATS; i++) {
final String input = inputs[i % inputs.length];
if (input.indexOf('-') == -1)
Integer.parseInt(inputs[i % inputs.length]);
}
System.out.println("If: " + (System.currentTimeMillis() - startTime) + " ms");
}
My results are:
~20ms (testing) and ~200ms (catching) with 20% invalid input.
~22ms (testing) and ~130ms (catching) with 10% invalid input.
These kinds of performance tests are easy to get wrong because of JIT or other optimizations, but I think you can see the trend.
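
The if-else check suggested in this answer could be sketched as follows. The class NumericIdCheck and the 18-digit cutoff are assumptions for illustration: Long.MAX_VALUE has 19 digits, so any string of up to 18 digits is guaranteed to fit, while 19-digit ids would need an extra range check that this sketch leaves out.

```java
class NumericIdCheck {
    // Hypothetical cutoff: up to 18 digits always fits in a long.
    private static final int MAX_SAFE_DIGITS = 18;

    // Returns true only for non-empty strings made of 1 to 18 ASCII digits.
    static boolean isPlainNumber(String s) {
        if (s.isEmpty() || s.length() > MAX_SAFE_DIGITS) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = {"928941293291", "1C4DC4FC-02Q9-4130-S12B-762D97FS43C", "282828"};
        for (String input : inputs) {
            if (isPlainNumber(input)) {
                // Safe: the check above rules out everything parseLong would reject.
                System.out.println(Long.parseLong(input));
            }
        }
    }
}
```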

You can use Files.lines() to stream the data from your file. Here I demonstrate using a stream from a List.
List<String> sample = Arrays.asList(
"{\"CUSTOMER_ID\":\"928941293291\"}",
"{\"CUSTOMER_ID\":\"291389218398\"}",
"{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}",
"{\"CUSTOMER_ID\":\"219382198\"}",
"{\"CUSTOMER_ID\":\"282828\"}",
"{\"CUSTOMER_ID\":\"21268239813\"}",
"{\"CUSTOMER_ID\":\"1114445184\"}",
"{\"CUSTOMER_ID\":\"2222222222\"}",
"{\"CUSTOMER_ID\":\"99218492183921\"}"
);
static final long MIN_ID = 1000000L;
static final long MAX_ID = 1000000000000000000L;
public void test() {
sample.stream()
// Extract CustomerID
.map(s -> s.substring("{\"CUSTOMER_ID\":\"".length(), s.length() - 2))
// Remove any bad ones - such as UUID.
.filter(s -> s.matches("[0-9]+"))
// Convert to long - assumes no number too big, add a further filter for that.
.map(s -> Long.valueOf(s))
// Apply limits.
.filter(l -> MIN_ID <= l && l <= MAX_ID)
// For now - just print them.
.forEach(s -> System.out.println(s));
}
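
The comment in the pipeline above mentions adding a further filter for numbers that are too big. One way to sketch that, assuming ids longer than 18 digits can simply be dropped, is to bound the digit count in the regex. The class StreamedIds and the method idsInRange are hypothetical names; the method takes any Stream<String>, so it should work equally with sample.stream() or with the stream returned by Files.lines(path).

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamedIds {
    private static final String PREFIX = "{\"CUSTOMER_ID\":\"";
    private static final String SUFFIX = "\"}";

    static List<Long> idsInRange(Stream<String> lines, long minId, long maxId) {
        return lines
                // Keep only lines with the expected wrapper.
                .filter(s -> s.length() >= PREFIX.length() + SUFFIX.length()
                        && s.startsWith(PREFIX) && s.endsWith(SUFFIX))
                // Extract the id between the wrapper parts.
                .map(s -> s.substring(PREFIX.length(), s.length() - SUFFIX.length()))
                // 1 to 18 digits always fits in a long, so Long.valueOf cannot throw here.
                .filter(s -> s.matches("[0-9]{1,18}"))
                .map(Long::valueOf)
                // Apply the (inclusive) limits, as in the answer above.
                .filter(id -> minId <= id && id <= maxId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of(
                "{\"CUSTOMER_ID\":\"928941293291\"}",
                "{\"CUSTOMER_ID\":\"1C4DC4FC-02Q9-4130-S12B-762D97FS43C\"}",
                "{\"CUSTOMER_ID\":\"282828\"}");
        System.out.println(idsInRange(lines, 1000000L, 1000000000000000000L));
    }
}
```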

First, read the file line by line. Then, from each line, extract the id if it matches the pattern and collect it into a list. Here's a similar solution implemented in Python. Note that this pattern collects the hyphenated (UUID-like) ids, so invert the check if you want the numeric ones.
import re

customer_ids = []
# Match ids made of five hyphen-separated groups of uppercase letters/digits
pattern = re.compile(r'^{[\w\W]+:"([A-Z\d]+-[A-Z\d]+-[A-Z\d]+-[A-Z\d]+-[A-Z\d]+)"}')
# Open the file
with open('cids.json') as f:
    # Read line by line
    for line in f:
        # Try to extract a matching id with the regex pattern
        match = pattern.search(line)
        if match:
            customer_ids.append(match.group(1))
        else:
            print('No match')

You can ignore all non numeric fields
long[] ids =
Stream.of(response.body().string().split("\""))
.mapToLong(s -> parseLong(s))
.filter(l -> l > minId && l < maxId)
.toArray();
static long parseLong(String s) {
try {
if (!s.isEmpty() && Character.isDigit(s.charAt(0)))
return Long.parseLong(s);
} catch (NumberFormatException expected) {
}
return Long.MIN_VALUE; // sentinel for "not a number"; assumes minId is greater than this so the filter drops it
}
Or if you are using Java 7
List<Long> ids = new ArrayList<>();
for (String s : response.body().string().split("\"")) {
long id = parseLong(s);
if (id > minId && id < maxId)
ids.add(id);
}

Related

Can not count how many number of unique date are available in every part of string

I split my string into three parts using newline ('\n'). What I want to achieve: count how many unique dates there are in each part of the string.
In the code below, the first part contains two unique dates, the second part contains two and the third part contains three, so the output should be: 2,2,3,
But when I run the code below I get: Output: 5,5,5,5,1,3,1,
How do I get Output: 2,2,3,
Thanks in advance.
String strH;
String strT = null;
StringBuilder sbE = new StringBuilder();
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11," + '\n' +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15," + '\n' +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09,";
String[] strG = strA.split("\n");
for(int h=0; h<strG.length; h++){
strH = strG[h];
String[] words=strH.split(",");
int wrc=1;
for(int i=0;i<words.length;i++) {
for(int j=i+1;j<words.length;j++) {
if(words[i].equals(words[j])) {
wrc=wrc+1;
words[j]="0";
}
}
if(words[i]!="0"){
sbE.append(wrc).append(",");
strT = String.valueOf(sbE);
}
wrc=1;
}
}
Log.d("TAG", "Output: "+strT);

I would use a set here to count the distinct values in each line:
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11" + "\n" +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15" + "\n" +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09";
String[] lines = strA.split("\n");
List<Integer> counts = new ArrayList<>();
for (String line : lines) {
counts.add(new HashSet<String>(Arrays.asList(line.split(","))).size());
}
System.out.println(counts); // [2, 2, 3]
Note that I have done a minor cleanup of the strA input by removing the trailing comma from each line.

With Java 8 Streams, this can be done in a single statement:
String strA = "2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-02,2021-03-11,2021-03-11,2021-03-11,2021-03-11,2021-03-11," + '\n' +
"2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-07,2021-03-15,2021-03-15,2021-03-15,2021-03-15,2021-03-15," + '\n' +
"2021-03-02,2021-03-09,2021-03-07,2021-03-09,2021-03-09,";
String strT = Pattern.compile("\n").splitAsStream(strA)
.map(strG -> String.valueOf(Pattern.compile(",").splitAsStream(strG).distinct().count()))
.collect(Collectors.joining(","));
System.out.println(strT); // 2,2,3
Note that Pattern.compile("\n").splitAsStream(strA) can also be written as Arrays.stream(strA.split("\n")), which is shorter to write but creates an unnecessary intermediate array. Which one is better is a matter of personal preference.
String strT = Arrays.stream(strA.split("\n"))
.map(strG -> String.valueOf(Arrays.stream(strG.split(",")).distinct().count()))
.collect(Collectors.joining(","));
The first version can be further micro-optimized by only compiling the regex once:
Pattern patternComma = Pattern.compile(",");
String strT = Pattern.compile("\n").splitAsStream(strA)
.map(strG -> String.valueOf(patternComma.splitAsStream(strG).distinct().count()))
.collect(Collectors.joining(","));

Java replace word in curly braces by name

I have a string like:
String message = "This is a message for {ID_PW}. Your result is {exam_result}. Please quote {ID_PW} if replying";
I am importing data from CSV that I would like to use to replace the items between curly braces.
// Using OpenCSV to read in CSV...code omitted for brevity
values = (Map<String, String>) reader.readMap();
// values has 'ID_PW', 'exam_result', etc keys
How can I replace the items in curly braces in the message with the equivalent value of the key in values?

Probably you are looking for:
String s = "I bought {0,number,integer} mangos. From {1}, the fruit seller. Out of them {2,number,percent} were bad.";
MessageFormat formatter = new MessageFormat(s);
Object[] argz = {22, "John", 0.3};
System.out.println(formatter.format(argz));
This outputs:
I bought 22 mangos. From John, the fruit seller. Out of them 30% were bad.
Refer to https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html for more details.

String message = "This is a message for {ID_PW}. Your result is {exam_result}. Please quote {ID_PW} if replying";
LinkedHashSet<String> fields = new LinkedHashSet<>(); // 'Automatically' handle duplicates
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(message);
// Find 'fields' in the message that are wrapped in curly braces and add to hash set
while (m.find()) {
fields.add((m.group(1)));
}
// Go through CSV and parse the message with the associated fields
while (((values = (Map<String, String>) reader.readMap())) != null)
{
Iterator itr = fields.iterator();
String newMsg = message;
while (itr.hasNext()) {
String field = (String) itr.next();
String value = values.get(field);
if(value != null) {
newMsg = newMsg.replaceAll("\\{" + field + "\\}", value);
}
}
}
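
As an alternative to calling replaceAll once per field, the same regex can drive a single pass over the message with Matcher.appendReplacement. This is a sketch of a different technique than the answer above uses, with the hypothetical names PlaceholderFill and fill. Matcher.quoteReplacement is used so that values containing $ or \ are inserted literally, and unknown keys are left as they are.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFill {
    private static final Pattern FIELD = Pattern.compile("\\{([^}]*)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = FIELD.matcher(template);
        // StringBuffer keeps this compatible with Java 8's appendReplacement.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown keys keep their original {placeholder} text.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("ID_PW", "SimpleOne");
        values.put("exam_result", "84");
        String message = "This is a message for {ID_PW}. Your result is {exam_result}. "
                + "Please quote {ID_PW} if replying";
        System.out.println(fill(message, values));
    }
}
```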

Use StringBuilder. StringBuilder is explicitly designed to be a mutable type of String. Next, don't use regular expressions in a loop. Regular expressions can be powerful, but since you will be using a loop to search for multiple patterns there is nothing regular involved (multiple patterns means multiple expressions).
I would just search left to right for { and then }, extract the key between them and look it up in the values map. Something like:
Map<String, String> values = new HashMap<>();
values.put("ID_PW", "SimpleOne");
values.put("exam_result", "84");
String message = "This is a message for {ID_PW}. Your result "
+ "is {exam_result}. Please quote {ID_PW} if replying";
StringBuilder sb = new StringBuilder(message);
int p = -1;
while ((p = sb.indexOf("{", p + 1)) > -1) {
int e = sb.indexOf("}", p + 1);
if (e > -1) {
String key = sb.substring(p + 1, e);
if (values.containsKey(key)) {
sb.replace(p, p + key.length() + 2, values.get(key));
}
}
}
System.out.println(sb);
Outputs
This is a message for SimpleOne. Your result is 84. Please quote SimpleOne if replying

How to pattern match and transform string to generate certain output?

The code below takes input that has lots of whitespace between, before and after the important strings. So far I have been able to filter the whitespace out. After preparing the string, I want to process it.
Here is an example of the input I may get and the output I want:
Input
+--------------+
EDIT example.mv Starter web-onyx-01.example.net.mv
Notice the whitespace before and after the domain; the amount of whitespace can be arbitrary.
Output
+--------------+
example.mv. in ns web-onyx-01.example.net.mv.
In the output, the important bit is the whitespace between the domain (example.), the keywords (in, ns) and the host (web-onyx-01.example.net.mv.).
Also notice the period (".") after the domain and the host. Another point is that if it's a .mv ccTLD, that part has to be removed from the string.
What I would like to achieve is this transformation over multiple lines of text: I want to batch process an unordered, chaotic list of strings and produce clean-looking output.
The code is by no means good design, but it is what I have come up with so far. NOTE: I am a beginner who is still learning to program. I would like suggestions to improve the code as well as to solve the problem at hand, i.e. transform the input into the desired output.
P.S. The output is for DNS zone files, so errors can be very problematic.
So far my code accepts text from a textarea and writes the result into another textarea.
My code works as long as the array length is 2 or 3 but fails for anything larger. How can I process the input dynamically, however large the list/array may become in the future?
String s = jTextArea1.getText();
Pattern p = Pattern.compile("ADD|EDIT|DELETE|Domain|Starter|Silver|Gold|ADSL Business|Pro|Lite|Standard|ADSL Multi|Pro Plus", Pattern.MULTILINE);
Matcher m = p.matcher(s);
s = m.replaceAll("");
String ms = s.replaceAll("(?m)(^\\s+|[\\t\\f ](?=[\\t\\f ])|[\\t\\f ]$|\\s+\\z)", "");
String[] last = ms.split(" ");
for (String test : last){
System.out.println(test);
}
System.out.println("The length of array is: " +last.length);
if (str.isContain(last[0], ".mv")) {
if (last.length == 2) {
for(int i = 0; i < last.length; i++) {
last[0] = last[0].replaceFirst(".mv", "");
System.out.println(last[0]);
last[i] += ".";
if (last[i] == null ? last[0] == null : last[i].equals(last[0])) {
last[i]+= " in ns ";
}
String str1 = String.join("", last);
jTextArea2.setText(str1);
System.out.println(str1);
}
}
else if (last.length == 3) {
for(int i = 0; i < last.length; i++) {
last[0] = last[0].replaceFirst(".mv", "");
System.out.println(last[0]);
last[i] += ".";
if (last[i] == null ? last[0] == null : last[i].equals(last[0])) {
last[i]+= " in ns ";
}
if (last[i] == null ? last[1] == null : last[i].equals(last[1])){
last[i] += "\n";
}
if (last[i] == null ? last[2] == null : last[i].equals(last[2])){
last[i] = last[0] + last[2];
}
String str1 = String.join("", last);
jTextArea2.setText(str1);
System.out.println(str1);
}
}
}

As I understand your question you have multiple lines of input in the following form:
whitespace[command]whitespace[domain]whitespace[label]whitespace[target-domain]whitespace
You want to convert that to the following form such that multiple lines are aligned nicely:
[domain]. in ns [target-domain].
To do that I'd suggest the following:
1. Split your input into multiple lines.
2. Use a regular expression to check the line format (e.g. for a valid command) and extract the domains.
3. Store the maximum length of both domains separately.
4. Build a format string using the maximum lengths.
5. Iterate over the extracted domains and build a string for each line using the format defined in step 4.
Example:
String input = " EDIT domain1.mv Starter example.domain1.net.mv \n" +
" DELETE long-domain1.mv Silver long-example.long-domain1.net.mv \n" +
" ADD short-domain1.mv ADSL Business ex.sdomain1.net.mv \n";
//step 1: split the input into lines
String[] lines = input.split( "\n" );
//step 2: build a regular expression to check the line format and extract the domains - which are the (\S+) parts
Pattern pattern = Pattern.compile( "^\\s*(?:ADD|EDIT|DELETE)\\s+(\\S+)\\s+(?:Domain|Starter|Silver|Gold|ADSL Business|Pro|Lite|Standard|ADSL Multi|Pro Plus)\\s+(\\S+)\\s*$" );
List<String[]> lineList = new LinkedList<>();
int maxLengthDomain = 0;
int maxLengthTargetDomain = 0;
for( String line : lines )
{
//step 2: check the line
Matcher matcher = pattern.matcher( line );
if( matcher.matches() ) {
//step 2: extract the domains
String domain = matcher.group( 1 );
String targetDomain = matcher.group( 2 );
//step 3: get the maximum length of the domains
maxLengthDomain = Math.max( maxLengthDomain, domain.length() );
maxLengthTargetDomain = Math.max( maxLengthTargetDomain, targetDomain.length() );
lineList.add( new String[] { domain, targetDomain } );
}
}
//step 4: build the format string with variable lengths
String formatString = String.format( "%%-%ds in ns %%-%ds", maxLengthDomain + 5, maxLengthTargetDomain + 2 );
//step 5: build the output
for( String[] line : lineList ) {
System.out.println( String.format( formatString, line[0] + ".", line[1] + "." ) );
}
Result:
domain1.mv.           in ns example.domain1.net.mv.
long-domain1.mv.      in ns long-example.long-domain1.net.mv.
short-domain1.mv.     in ns ex.sdomain1.net.mv.

Regular Expression WildCard matching split with java split method

I know similar questions have been asked before, but I want to do a custom operation and I don't know how to go about it.
I want to split a string with a regular expression where I know the starting and ending characters, like:
String myString="Google is a great search engine<as:...s>";
<as: and s> are the opening and closing markers;
the ... is dynamic, so I can't predict its value.
I want to be able to split the string from the opening <as: to the closing s>,
with the dynamic string in it.
Like:
myString.split("<as:/*s>");
Something like that. I also want to get all occurrences of <as:..s> in the string.
I know this can be done with regex, but I've never done it before. I need a simple and neat way to do this.
Thanks in advance

Rather than using a .split(), I would just extract using Pattern and Matcher. This approach finds everything between <as: and s> and extracts it to a capture group. Group 1 then has the text you would like.
public static void main(String[] args)
{
final String myString="Google is a great search engine<as:Some stuff heres>";
Pattern pat = Pattern.compile("^[^<]+<as:(.*)s>$");
Matcher m = pat.matcher(myString);
if (m.matches()) {
System.out.println(m.group(1));
}
}
Output:
Some stuff here
If you need the text at the beginning, you can put it in a capture group as well.
Edit: If there is more than one <as:...s> in the input, the following will gather all of them.
Edit 2: extended the logic and added checks for emptiness.
public static List<String> multiEntry(final String myString)
{
String[] parts = myString.split("<as:");
List<String> col = new ArrayList<>();
if (! parts[0].trim().isEmpty()) {
col.add(parts[0]);
}
Pattern pat = Pattern.compile("^(.*?)s>(.*)?");
for (int i = 1; i < parts.length; ++i) {
Matcher m = pat.matcher(parts[i]);
if (m.matches()) {
for (int j = 1; j <= m.groupCount(); ++j) {
String s = m.group(j).trim();
if (! s.isEmpty()) {
col.add(s);
}
}
}
}
return col;
}
Output:
[Google is a great search engine, Some stuff heress, Here is Facebook, More Stuff, Something else at the end]
Edit 3: This approach uses find and looping to do the parsing. It uses optional capture groups as well.
public static void looping()
{
final String myString="Google is a great search engine"
+ "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
+ "Something else at the end" +
"<as:Stuffs>" +
"<as:Yet More Stuffs>";
Pattern pat = Pattern.compile("([^<]+)?(<as:(.*?)s>)?");
Matcher m = pat.matcher(myString);
List<String> col = new ArrayList<>();
while (m.find()) {
String prefix = m.group(1);
String contents = m.group(3);
if (prefix != null) { col.add(prefix); }
if (contents != null) { col.add(contents); }
}
System.out.println(col);
}
Output:
[Google is a great search engine, Some stuff heress, Here is Facebook, More Stuff, Something else at the end, Stuff, Yet More Stuff]
Additional Edit: wrote some quick test cases (with super hacked helper class) to help validate. These all pass (updated) multiEntry:
public static void main(String[] args)
{
Input[] inputs = {
new Input("Google is a great search engine<as:Some stuff heres>", 2),
new Input("Google is a great search engine"
+ "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
+ "Something else at the end" +
"<as:Stuffs>" +
"<as:Yet More Stuffs>" +
"ending", 8),
new Input("Google is a great search engine"
+ "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
+ "Something else at the end" +
"<as:Stuffs>" +
"<as:Yet More Stuffs>", 7),
new Input("No as here", 1),
new Input("Here is angle < input", 1),
new Input("Angle < plus <as:Stuff in as:s><as:Other stuff in as:s>", 3),
new Input("Angle < plus <as:Stuff in as:s><as:Other stuff in as:s>blah", 4),
new Input("<as:To start with anglass>Some ending", 2),
};
List<String> res;
for (Input inp : inputs) {
res = multiEntry(inp.inp);
if (res.size() != inp.cnt) {
System.err.println("FAIL: " + res.size()
+ " did not match exp of " + inp.cnt
+ " on " + inp.inp);
System.err.println(res);
continue;
}
System.out.println(res);
}
}
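
If only the contents of each <as:...s> marker are needed, and not the text around them, a shorter sketch is a single find loop with a reluctant quantifier. The names AsMarkers and contents are hypothetical. Note that the first s> after the opening marker closes it, so this assumes the dynamic text itself does not contain s>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AsMarkers {
    // (.*?) is reluctant: it stops at the first "s>" after "<as:".
    private static final Pattern MARKER = Pattern.compile("<as:(.*?)s>");

    static List<String> contents(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MARKER.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String myString = "Google is a great search engine<as:Some stuff heres>"
                + "Here is Facebook<as:More Stuffs>";
        System.out.println(contents(myString));
    }
}
```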

How to convert formatted strings to float?

I have a list of strings and I'd like to convert them to float if a pattern is matched.
Here are some values and the expected result:
1000 -> 1000.0
1.000 -> 1000.0
1.000,000 -> 1000.0
-1.000,000 -> -1000.0
9,132 -> 9.132
1,000.00 -> invalid
30.10.2010 -> invalid
1,000.000,00 -> invalid
I tried this code for checking if a number is valid, but the pattern is never matched:
Pattern pattern = Pattern.compile("#.###,###");
for(String s : list){
Matcher m = pattern.matcher(s);
if(m.matches()){
//convert
}
}
Besides that, I've tried this code:
DecimalFormat df = (DecimalFormat) NumberFormat.getCurrencyInstance();
for(String s : list){
try {
Number num = df.parse(s);
//..
} catch (ParseException e) {
}
}
The problem with this code is that no pattern-based validation is performed. For example, a date like 2012/05/30 is converted to 2012.
So how can I either define a valid pattern or configure DecimalFormat for my needs?

The Pattern class works with regular expressions. You probably want this:
Pattern pattern = Pattern.compile("-?\\d\\.\\d{1,3}(,\\d{1,3})?"); // backslashes must be doubled inside a Java string literal
You probably want to tune this regex depending on exactly what formats you want or don't want to match.

I think this is what you want. The comments should explain it.
@Test
public void testAllValues() {
testValue("1000", "1000");
testValue("1.000,000", "1000");
testValue("-1.000,000", "-1000");
testValue("9,132", "9.132");
testValue("1,000.00", null);
testValue("30.10.2010", null);
testValue("1,000.000,00", null);
}
private void testValue(String germanString, String usString) {
BigDecimal germanDecimal = (BigDecimal) parse(germanString);
if (usString != null) {
BigDecimal usDecimal = new BigDecimal(usString);
assertEquals("German " + germanString + " did not equal US " + usString, 0, germanDecimal.compareTo(usDecimal));
} else {
assertEquals("German " + germanString + " should not have been parseable", null, germanDecimal);
}
}
public BigDecimal parse(String s) {
// Patch because parse doesn't enforce the number of digits between the
// grouping character (dot).
if (!Pattern.matches("[^.]*(\\.\\d{3})*[^.]*", s)) {
return null;
}
DecimalFormat df = (DecimalFormat) DecimalFormat.getInstance(Locale.GERMANY);
df.setParseBigDecimal(true);
// Have to use the ParsePosition API or else it will silently stop
// parsing even though some of the characters weren't part of the parsed
// number.
ParsePosition position = new ParsePosition(0);
BigDecimal parsed = (BigDecimal) df.parse(s, position);
// getErrorIndex() doesn't seem to accurately reflect errors, but
// getIndex() does reflect how far we successfully parsed.
if (position.getIndex() == s.length()) {
return parsed;
} else {
return null;
}
}

Try
System.out.println("1,000.000,00".matches("^[+-]?\\d+(\\.\\d{3})*(,\\d+)?"));
I am not sure if your number can start with +, so I added it just in case. I also don't know whether 0100000.000.000,1234 should be valid. If not, tell me why and the regex can be corrected.

If the comma is the decimal separator, try:
String[] splitted = string.split(",");
If splitted.length > 2 --> invalid.
If splitted.length == 2 and splitted[1] contains a point --> invalid as well.
If the format is fine --> remove all points, replace the comma with a point and parse the result.
A very simple approach, but it works...
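
The validation regex from the earlier answer and the conversion steps described here could be combined roughly like this. It is a sketch under the assumption that a point is only ever a thousands separator and a comma only ever the decimal separator. The names GermanNumbers and parse are hypothetical, and the result is a double rather than a float.

```java
import java.util.OptionalDouble;
import java.util.regex.Pattern;

class GermanNumbers {
    // Optional sign, digits, optional groups of ".ddd", optional ",digits".
    private static final Pattern VALID =
            Pattern.compile("[+-]?\\d+(\\.\\d{3})*(,\\d+)?");

    // Returns the parsed value, or an empty OptionalDouble for invalid input.
    static OptionalDouble parse(String s) {
        if (!VALID.matcher(s).matches()) {
            return OptionalDouble.empty();
        }
        // Remove the thousands separators, then turn the decimal comma into a point.
        String normalized = s.replace(".", "").replace(',', '.');
        return OptionalDouble.of(Double.parseDouble(normalized));
    }

    public static void main(String[] args) {
        String[] inputs = {"1000", "1.000", "1.000,000", "-1.000,000", "9,132",
                "1,000.00", "30.10.2010", "1,000.000,00"};
        for (String input : inputs) {
            OptionalDouble value = parse(input);
            System.out.println(input + " -> "
                    + (value.isPresent() ? String.valueOf(value.getAsDouble()) : "invalid"));
        }
    }
}
```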
