How to convert formatted strings to float?

I have a list of strings and I'd like to convert them to float if a pattern is matched.
Here are some values and the expected result:
1000 -> 1000.0
1.000 -> 1000.0
1.000,000 -> 1000.0
-1.000,000 -> -1000.0
9,132 -> 9.132
1,000.00 -> invalid
30.10.2010 -> invalid
1,000.000,00 -> invalid
I tried this code for checking if a number is valid, but the pattern is never matched:
Pattern pattern = Pattern.compile("#.###,###");
for(String s : list){
Matcher m = pattern.matcher(s);
if(m.matches()){
//convert
}
}
Besides that, I've tried this code:
DecimalFormat df = (DecimalFormat) NumberFormat.getCurrencyInstance();
for(String s : list){
try {
Number num = df.parse(s);
//..
} catch (ParseException e) {
}
}
The problem with this code is that no pattern-based validation is performed. For example, a date like 2012/05/30 is converted to 2012.
So how can I either define a valid pattern or configure DecimalFormat for my needs?

The Pattern class works with regular expressions. You probably want this:
Pattern pattern = Pattern.compile("-?\\d\\.\\d{1,3}(,\\d{1,3})?");
You probably want to tune this regex depending on exactly what formats you want or don't want to match.
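As one possible starting point for that tuning, here is a minimal sketch of a stricter pattern, checked against the sample values from the question. The class name GermanNumberPattern and the method toFloat are hypothetical. The sketch assumes that a dot is only ever a thousands separator (groups of exactly three digits) and that a comma is the decimal separator.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class GermanNumberPattern {
    // Optional sign, then either dot-grouped digits (1.000.000) or plain
    // digits, then an optional comma followed by decimals.
    private static final Pattern VALID =
            Pattern.compile("-?(\\d{1,3}(\\.\\d{3})+|\\d+)(,\\d+)?");

    /** Returns the parsed value, or null if the input does not match. */
    static Float toFloat(String s) {
        if (!VALID.matcher(s).matches()) {
            return null;
        }
        // Drop the grouping dots, then turn the decimal comma into a dot.
        return Float.valueOf(s.replace(".", "").replace(',', '.'));
    }
}
```

With this pattern, 1.000,000 and 9,132 are accepted, while 1,000.00, 30.10.2010 and 1,000.000,00 are rejected.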

I think this is what you want. The comments should explain it.
@Test
public void testAllValues() {
testValue("1000", "1000");
testValue("1.000,000", "1000");
testValue("-1.000,000", "-1000");
testValue("9,132", "9.132");
testValue("1,000.00", null);
testValue("30.10.2010", null);
testValue("1,000.000,00", null);
}
private void testValue(String germanString, String usString) {
BigDecimal germanDecimal = (BigDecimal) parse(germanString);
if (usString != null) {
BigDecimal usDecimal = new BigDecimal(usString);
assertEquals("German " + germanString + " did not equal US " + usString, 0, germanDecimal.compareTo(usDecimal));
} else {
assertEquals("German " + germanString + " should not have been parseable", null, germanDecimal);
}
}
public BigDecimal parse(String s) {
// Patch because parse doesn't enforce the number of digits between the
// grouping character (dot).
if (!Pattern.matches("[^.]*(\\.\\d{3})*[^.]*", s)) {
return null;
}
DecimalFormat df = (DecimalFormat) DecimalFormat.getInstance(Locale.GERMANY);
df.setParseBigDecimal(true);
// Have to use the ParsePosition API or else it will silently stop
// parsing even though some of the characters weren't part of the parsed
// number.
ParsePosition position = new ParsePosition(0);
BigDecimal parsed = (BigDecimal) df.parse(s, position);
// getErrorIndex() doesn't seem to accurately reflect errors, but
// getIndex() does reflect how far we successfully parsed.
if (position.getIndex() == s.length()) {
return parsed;
} else {
return null;
}
}

Try
System.out.println("1,000.000,00".matches("^[+-]?\\d+(\\.\\d{3})*(,\\d+)?"));
I am not sure if your number can start with +, so I added it just in case. I also don't know whether 0100000.000.000,1234 should be valid. If not, tell me why and I will correct the regex.

If the decimal separator is the comma, try:
String[] splitted = string.split(",");
If splitted.length > 2 --> invalid.
If splitted.length == 2 and splitted[1] contains a point --> also invalid.
If the format is fine --> remove all points and replace the comma with a point (or parse the part after the comma separately and connect the pieces).
A very simple approach, but it works...
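The steps above can be sketched roughly as follows. The class name CommaSplitParser is hypothetical, and returning null for invalid input is an assumption made for this illustration. Note that this approach does not check that the dots sit at three-digit boundaries, so an input like 30.10.2010 is still accepted.

```java
// Hypothetical helper for illustration; not part of any library.
final class CommaSplitParser {
    /** Returns the parsed value, or null if the input is considered invalid. */
    static Double parse(String s) {
        // Limit -1 keeps trailing empty parts, so "1,2," yields three parts.
        String[] parts = s.split(",", -1);
        if (parts.length > 2) {
            return null; // more than one decimal comma
        }
        if (parts.length == 2 && parts[1].contains(".")) {
            return null; // grouping dot after the decimal comma
        }
        // Remove the grouping dots and rejoin with a decimal point.
        String normalized = parts[0].replace(".", "");
        if (parts.length == 2) {
            normalized += "." + parts[1];
        }
        try {
            return Double.valueOf(normalized);
        } catch (NumberFormatException e) {
            return null; // not numeric at all
        }
    }
}
```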

Related

Remove elements from Date Format String using a Regular Expression

I want to remove elements from a supplied date format string, for example convert the format "dd/MM/yyyy" to "MM/yyyy" by removing any non-M/y element.
What I'm trying to do is create a localised month/year format based on the existing day/month/year format provided for the Locale.
I've done this using regular expressions, but the solution seems longer than I'd expect.
An example is below:
public static void main(final String[] args) {
System.out.println(filterDateFormat("dd/MM/yyyy HH:mm:ss", 'M', 'y'));
System.out.println(filterDateFormat("MM/yyyy/dd", 'M', 'y'));
System.out.println(filterDateFormat("yyyy-MMM-dd", 'M', 'y'));
}
/**
* Retains only {@code charsToRetain} in {@code format}, removing all other
* pattern letters and any redundant separators.
*/
private static String filterDateFormat(final String format, final char...charsToRetain) {
// Match e.g. "ddd-"
final Pattern pattern = Pattern.compile("[" + new String(charsToRetain) + "]+\\p{Punct}?");
final Matcher matcher = pattern.matcher(format);
final StringBuilder builder = new StringBuilder();
while (matcher.find()) {
// Append each match
builder.append(matcher.group());
}
// If the last match is "MMM-", remove the trailing punctuation symbol
return builder.toString().replaceFirst("\\p{Punct}$", "");
}

Let's try a solution for the following date format strings:
String[] formatStrings = { "dd/MM/yyyy HH:mm:ss",
"MM/yyyy/dd",
"yyyy-MMM-dd",
"MM/yy - yy/dd",
"yyabbadabbadooMM" };
The following will analyze strings for a match, then print the first group of the match.
Pattern p = Pattern.compile(REGEX);
for(String formatStr : formatStrings) {
Matcher m = p.matcher(formatStr);
if(m.matches()) {
System.out.println(m.group(1));
}
else {
System.out.println("Didn't match!");
}
}
Now, there are two separate regular expressions I've tried. First:
final String REGEX = "(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
Didn't match!
Didn't match!
Second:
final String REGEX = "(?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
MM/yy - yy
Didn't match!
Now, let's see what the first regex actually matches to:
(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*) First regex =
(?:[^My]*) Any amount of non-Ms and non-ys (non-capturing)
([My]+ followed by one or more Ms and ys
[^\\w]* optionally separated by non-word characters
(implying they are also not Ms or ys)
[My]+) followed by one or more Ms and ys
(?:[^My]*) finished by any number of non-Ms and non-ys
(non-capturing)
What this means is that at least 2 M/ys are required to match the regex, although you should be careful that something like MM-dd or yy-DD will match as well, because they have two M-or-y regions 1 character long. You can avoid getting into trouble here by just keeping a sanity check on your date format string, such as:
if(formatStr.contains("y") && formatStr.contains("M") && m.matches())
{
String yMString = m.group(1);
... // other logic
}
As for the second regex, here's what it means:
(?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*) Second regex =
(?:[^My]*) Any amount of non-Ms and non-ys
(non-capturing)
( ) followed by
(?:[My]+ )+[My]+ at least two text segments consisting of
one or more Ms or ys, where each segment is
[^\\w]* optionally separated by non-word characters
(?:[^My]*) finished by any number of non-Ms and non-ys
(non-capturing)
This regex will match a slightly broader series of strings, but it still requires that any separations between Ms and ys be non-words ([^a-zA-Z_0-9]). Additionally, keep in mind that this regex will still match "yy", "MM", or similar strings like "yyy", "yyyy"..., so it would be useful to have a sanity check as described for the previous regular expression.
Additionally, here's a quick example of how one might use the above to manipulate a single date format string:
LocalDateTime date = LocalDateTime.now();
String dateFormatString = "dd/MM/yyyy H:m:s";
System.out.println("Old Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
Pattern p = Pattern.compile("(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)");
Matcher m = p.matcher(dateFormatString);
if(dateFormatString.contains("y") && dateFormatString.contains("M") && m.matches())
{
dateFormatString = m.group(1);
System.out.println("New Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
}
else
{
throw new IllegalArgumentException("Couldn't shorten date format string!");
}
Output:
Old Format: "dd/MM/yyyy H:m:s" = 14/08/2019 16:55:45
New Format: "MM/yyyy" = 08/2019

I'll try to answer based on my understanding of the question: how do I remove, from a list/table/array of Strings, the elements that do not exactly follow the pattern 'dd/MM'?
So I'm looking for a function that looks like
public List<String> removeUnWantedDateFormat(List<String> input)
As far as I know date formats, there are only 4 possibilities that you would want (hoping I don't miss any): "MM/yyyy", "MMM/yyyy", "MM/yy" and "MMM/yy". Since we know what we are looking for, we can write a simple function.
public List<String> removeUnWantedDateFormat(List<String> input) {
String s1 = "MM/yyyy";
String s2 = "MMM/yyyy";
String s3 = "MM/yy";
String s4 = "MMM/yy";
// removeIf avoids the ConcurrentModificationException that calling
// input.remove() inside a for-each loop over the same list would throw.
input.removeIf(format -> !s1.equals(format) && !s2.equals(format)
&& !s3.equals(format) && !s4.equals(format));
return input;
}
Better not to use regex if you can: for a handful of fixed strings, a plain comparison is simpler and cheaper. A great improvement would be to use an enum of the date formats you accept; that gives you better control over them, and even lets you replace them.
Hope this will help, cheers
Edit: after I saw the comment, I think it would be better to use contains instead of equals, and, instead of removing the element, to replace it with the expected string.
So it would look more like:
public List<String> removeUnWantedDateFormat(List<String> input) {
List<String> comparaisons = new ArrayList<>();
comparaisons.add("MMM/yyyy");
comparaisons.add("MMM/yy");
comparaisons.add("MM/yyyy");
comparaisons.add("MM/yy");
// Assigning to the loop variable would not change the list,
// so the replacements are collected in a new list instead.
List<String> result = new ArrayList<>();
for (String format : input) {
for (String comparaison : comparaisons) {
if (format.contains(comparaison)) {
result.add(comparaison);
break;
}
}
}
return result;
}
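The enum idea mentioned in this answer could be sketched roughly as follows. The enum name MonthYearFormat and its methods are hypothetical. The constants are declared with the longer patterns first, so that "MMM/yyyy" is not mistaken for the shorter "MM/yy" it contains.

```java
import java.util.Optional;

// Hypothetical enum for illustration; the accepted patterns are the four
// listed in the answer above.
enum MonthYearFormat {
    MMM_YYYY("MMM/yyyy"),
    MMM_YY("MMM/yy"),
    MM_YYYY("MM/yyyy"),
    MM_YY("MM/yy");

    private final String pattern;

    MonthYearFormat(String pattern) {
        this.pattern = pattern;
    }

    String pattern() {
        return pattern;
    }

    /** Returns the first accepted month/year pattern contained in the format. */
    static Optional<MonthYearFormat> findIn(String format) {
        for (MonthYearFormat candidate : values()) {
            if (format.contains(candidate.pattern)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Like the contains-based version above, this only finds patterns that use "/" as the separator, so "yyyy-MMM-dd" yields an empty result.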

How can I format int numbers in java?

I am trying to apply the format #´###,### to a given int number. I tried the DecimalFormat class, but it only allows one grouping separator symbol, while I need two: the acute accent for millions and the comma for thousands.
In the end I want to format thousands like 1,000 and millions like 1´000,000.

I always prefer to use String.format, but I am not sure if there is a Locale that would format numbers like that either. Here is some code that will do the job though.
// Not sure if you wanted to start with a number or a string. Adjust accordingly
String stringValue = "1000000";
float floatValue = Float.valueOf(stringValue);
// Format the string to a known format
String formattedValue = String.format(Locale.US, "%,.2f", floatValue);
// Split the string on the separator
String[] parts = formattedValue.split(",");
// Put the parts back together with the special separators
String specialFormattedString = "";
int partsRemaining = parts.length;
for(int i=0;i<parts.length;i++)
{
specialFormattedString += parts[i];
partsRemaining--;
if(partsRemaining > 1)
specialFormattedString += "´"; // acute accent, as requested in the question
else if(partsRemaining == 1)
specialFormattedString += ",";
}

I found the link @Roshan provided in the comments useful. This solution uses a regular expression and the replaceFirst method:
public static String audienceFormat(int number) {
String value = String.valueOf(number);
if (value.length() > 6) {
value = value.replaceFirst("(\\d{1,3})(\\d{3})(\\d{3})", "$1\u00B4$2,$3");
} else if (value.length() >=5 && value.length() <= 6) {
value = value.replaceFirst("(\\d{2,3})(\\d{3})", "$1,$2");
} else {
value = value.replaceFirst("(\\d{1})(\\d+)", "$1,$2");
}
return value;
}
I don't know if this solution has a performance impact. I am also a rookie with regex, so this code could probably be shortened.
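The length-based branches in audienceFormat above do not cover every int: a three-digit value such as 100 comes out as 1,00, and a ten-digit value is grouped incorrectly. A shorter sketch that avoids the branches is shown below. The class name AccentGrouping is hypothetical, and it assumes that every separator above the thousands (including billions) should be the acute accent, which the question does not specify.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of any library.
final class AccentGrouping {
    static String format(int number) {
        // Standard US grouping first, e.g. "1,000,000".
        String grouped = String.format(Locale.US, "%,d", number);
        int last = grouped.lastIndexOf(',');
        if (last < 0) {
            return grouped; // fewer than four digits, nothing to group
        }
        // Keep the last comma (thousands) and turn every earlier
        // separator into an acute accent (U+00B4).
        return grouped.substring(0, last).replace(',', '\u00B4')
                + grouped.substring(last);
    }
}
```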

Try this. The Swiss locales below group digits with an apostrophe, so only the last separator has to be replaced with a comma to get your required format.
List<Locale> locales = Arrays.asList(new Locale("it", "CH"), new Locale("fr", "CH"), new Locale("de", "CH"));
for (Locale locale : locales) {
DecimalFormat df = (DecimalFormat) NumberFormat.getCurrencyInstance(locale);
DecimalFormatSymbols dfs = df.getDecimalFormatSymbols();
dfs.setCurrencySymbol("");
df.setDecimalFormatSymbols(dfs);
System.out.println(String.format("%5s %15s %15s", locale, format(df.format(1000)), format(df.format(1_000_000))));
}
util method
private static String format(String str) {
int index = str.lastIndexOf('\'');
if (index > 0) {
return new StringBuilder(str).replace(index, index + 1, ",").toString();
}
return str;
}
output
it_CH 1,000.00 1'000,000.00
fr_CH 1,000.00 1'000,000.00
de_CH 1,000.00 1'000,000.00
set df.setMaximumFractionDigits(0); to remove the fractions
output
it_CH 1,000 1'000,000
fr_CH 1,000 1'000,000
de_CH 1,000 1'000,000

Maybe try this, placing a "#" for each digit you want between the separators:
String num = "1000500000.574";
String newnew = new DecimalFormat("#,###.##").format(Double.parseDouble(num));

REGEX: Get double (positive or negative) from string [duplicate]

Let's say I have a string like this:
eXamPLestring>1.67>>ReSTOfString
My task is to extract only 1.67 from the string above.
I assume a regex will be useful, but I can't figure out how to write a proper expression.

If you want to extract all ints and floats from a String, you can follow my solution:
private ArrayList<String> parseIntsAndFloats(String raw) {
ArrayList<String> listBuffer = new ArrayList<String>();
Pattern p = Pattern.compile("[0-9]*\\.?[0-9]+");
Matcher m = p.matcher(raw);
while (m.find()) {
listBuffer.add(m.group());
}
return listBuffer;
}
If you want to parse also negative values you can add [-]? to the pattern like this:
Pattern p = Pattern.compile("[-]?[0-9]*\\.?[0-9]+");
And if you also want to set , as a separator you can add ,? to the pattern like this:
Pattern p = Pattern.compile("[-]?[0-9]*\\.?,?[0-9]+");
To test the patterns you can use this online tool: http://gskinner.com/RegExr/
Note: For this tool remember to unescape if you are trying my examples (you just need to take off one of the \)

You could try matching the digits using a regular expression
\\d+\\.\\d+
This could look something like
Pattern p = Pattern.compile("\\d+\\.\\d+");
Matcher m = p.matcher("eXamPLestring>1.67>>ReSTOfString");
while (m.find()) {
Float.parseFloat(m.group());
}

Here's how to do it in one line:
String f = input.replaceAll("^.*?(-?[\\d.]+).*$|^.*$", "$1");
This returns a blank String if no float is found (the second alternative matches the whole input and leaves group 1 empty).
If you actually want a float, you can do it in one line:
float f = Float.parseFloat(input.replaceAll(".*?(-?[\\d.]+).*", "$1"));
but since a blank cannot be parsed as a float, you would have to do it in two steps - testing if the string is blank before parsing - if it's possible for there to be no float.
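A two-step variant of this idea, which makes the "no number found" case explicit, might look like the sketch below. The class name FirstNumber and the choice of OptionalDouble as the return type are assumptions for illustration; the pattern accepts an optional minus sign and forms such as 1.67, 123 and .9.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class FirstNumber {
    // Optional minus, then digits with an optional fraction, or a bare fraction.
    private static final Pattern NUMBER =
            Pattern.compile("-?(?:\\d+\\.?\\d*|\\.\\d+)");

    /** Returns the first number in the input, or empty if there is none. */
    static OptionalDouble find(String input) {
        Matcher m = NUMBER.matcher(input);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group()))
                : OptionalDouble.empty();
    }
}
```

For the string from the question, FirstNumber.find("eXamPLestring>1.67>>ReSTOfString") contains 1.67, and an input without digits gives an empty result instead of an exception.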

String s = "eXamPLestring>1.67>>ReSTOfString>>0.99>>ahgf>>.9>>>123>>>2323.12";
Pattern p = Pattern.compile("\\d*\\.\\d+");
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(">> "+ m.group());
}
Gives only floats
>> 1.67
>> 0.99
>> .9
>> 2323.12

You can use the regex \d*\.?,?\d* This will work for floats like 1.0 and 1,0

Have a look at this link, they also explain a few things that you need to keep in mind when building such a regex.
[-+]?[0-9]*\.?[0-9]+
example code:
String[] strings = new String[3];
strings[0] = "eXamPLestring>1.67>>ReSTOfString";
strings[1] = "eXamPLestring>0.57>>ReSTOfString";
strings[2] = "eXamPLestring>2547.758>>ReSTOfString";
Pattern pattern = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+");
for (String string : strings)
{
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println("# float value: " + matcher.group());
}
}
output:
# float value: 1.67
# float value: 0.57
# float value: 2547.758

/**
* Extracts the first number out of a text.
* Works for 1.000,1 and also for 1,000.1 returning 1000.1 (1000 plus 1 decimal).
* When only a , or a . is used it is assumed as the float separator.
*
* @param sample The sample text.
*
* @return A float representation of the number.
*/
static public Float extractFloat(String sample) {
Pattern pattern = Pattern.compile("[\\d.,]+");
Matcher matcher = pattern.matcher(sample);
if (!matcher.find()) {
return null;
}
String floatStr = matcher.group();
if (floatStr.matches("\\d+,+\\d+")) {
floatStr = floatStr.replaceAll(",+", ".");
} else if (floatStr.matches("\\d+\\.+\\d+")) {
floatStr = floatStr.replaceAll("\\.\\.+", ".");
} else if (floatStr.matches("(\\d+\\.+)+\\d+(,+\\d+)?")) {
floatStr = floatStr.replaceAll("\\.+", "").replaceAll(",+", ".");
} else if (floatStr.matches("(\\d+,+)+\\d+(\\.+\\d+)?")) {
floatStr = floatStr.replaceAll(",", "").replaceAll("\\.\\.+", ".");
}
try {
return Float.valueOf(floatStr);
} catch (NumberFormatException ex) {
throw new AssertionError("Unexpected non float text: " + floatStr);
}
}

Checking if a given String is a valid currency using number format

I have below strings:
String str1 = "$123.00";
String str2 = "$(123.05)";
String str3 = "incorrectString";
I want to check and output like:
if(str1 is a valid currency){
System.out.println("Str1 Valid");
}else{
System.out.println("Str1 InValid");
}
if(str2 is a valid currency){
System.out.println("Str2 Valid");
}else{
System.out.println("Str2 InValid");
}
if(str3 is a valid currency){
System.out.println("Str3 Valid");
}else{
System.out.println("Str3 InValid");
}
UseCase: I am parsing a pdf using pdfbox. Given a searchterm say "abc", I want to read next token after the search term. For this purpose, I am searching for the search term in parsed pdf text and then reading the next token to that search term.
The token should be a valid currency. But there could be a case where in "abc" is present at two different places in a page with one having valid currency token next to it while the other not.
So I want to put in a check that if the token I am reading is not a valid currency token, break the loop and continue the search on the page.
I did it as below:
if (tokenRead.length() > 0) {
String temp = tokenRead.replace("$", "").replaceAll("\\(", "");
char checkFirstChar = temp.trim().charAt(0);
if (!(checkFirstChar >= '0' && checkFirstChar <= '9')) {
System.out.println("breaking");
break;
}
}
This works, but I believe there should be an elegant solution using NumberFormat.
Hence the question!
Thanks for reading!

NumberFormat has nothing out of the box for your use case.
A possible solution I could come up with is this:
Currency currency = Currency.getInstance(currentLocale);
String symbol = currency.getSymbol();
if(string.startsWith(symbol) || string.endsWith(symbol)){
System.out.println("valid");
}else{
System.out.println("invalid");
}
But then you still need to check if the rest of the string can be parsed to a number.
Therefore I recommend to have a look at Apache Commons Currency Validator, it may fit your needs:
@Test
public void test() {
BigDecimalValidator validator = CurrencyValidator.getInstance();
BigDecimal amount = validator.validate("$123.00", Locale.US);
assertNotNull(amount);
//remove the brackets since this is something unusual
String in = "$(123.00)".replaceAll("\\(", "").replace(')', ' ').trim();
amount = validator.validate(in, Locale.US);
assertNotNull(amount);
amount = validator.validate("invalid", Locale.US);
assertNull(amount);
}

You could try DecimalFormat. It allows you to handle positive and negative value patterns separately using ;:
List<String> list = new ArrayList<>();
list.add("$123.00");
list.add("$(123.05)");
list.add("incorrectString");
NumberFormat nf = new DecimalFormat("¤#.00;¤(#.00)", new DecimalFormatSymbols(Locale.US));
try {
for(String str : list){
nf.parse(str);
}
} catch (ParseException e) {
System.out.println(e.getMessage());
}
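To turn this into the valid/invalid check from the question, one option is to parse each token separately with a ParsePosition, since parse(String) alone does not have to consume the whole string and would accept a token with trailing garbage. The sketch below reuses the pattern from this answer. The class name CurrencyCheck and the method isValidCurrency are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper for illustration; not part of any library.
final class CurrencyCheck {
    static boolean isValidCurrency(String token) {
        // \u00A4 is the currency sign placeholder; with US symbols it is "$".
        // The part after ';' describes negative amounts written in parentheses.
        DecimalFormat format = new DecimalFormat(
                "\u00A4#.00;\u00A4(#.00)", new DecimalFormatSymbols(Locale.US));
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(token, position);
        // Valid only if something was parsed and the whole token was consumed.
        return parsed != null && position.getIndex() == token.length();
    }
}
```

With this check, "$123.00" and "$(123.05)" are reported as valid, while "incorrectString" and "$123.00abc" are not.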

Regex positive lookbehind woes

My goal is to match the first 0 and everything after that zero in a decimal value. If the first decimal place is a zero then I want to match the decimal too. If there is no decimal then capture nothing.
Here are some examples of what I want:
180.570123 // should capture the "0123" on the end
180.570 // should capture the "0" on the end
180.0123 // should capture the ".0123" on the end
180.0 // should capture the ".0" on the end
180123 // should capture nothing
180 // should capture nothing
If the first decimal place is a 0 then making the match is easy:
(\.0.*)
My problem is matching when the first decimal place is not a 0. I believe positive lookbehind will fix this issue, but I am not able to get it to work correctly. Here is one regex I have tried:
(?<=^.*\..*)0.*
This regex will eventually be used in Java.
UPDATE:
I am going to use this regex to get rid of numbers and possibly a decimal point on the end of a string using Java's replaceAll method. I will do this by replacing the capture group with an empty string. Here is a better example of what I want.
String case1 = "180.570123";
String case2 = "180.570";
String case3 = "180.0123";
String case4 = "180.0";
String case5 = "180123";
String case6 = "180";
String result = null;
result = case1.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180.57
result = case2.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180.57
result = case3.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180
result = case4.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180
result = case5.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180123
result = case6.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180
Also, I am testing these regexs at http://gskinner.com/RegExr/

You can use this expression:
\.[1-9]*(0\d*)
And what you want will be in the first capturing group. (Except the decimal point.)
If you want to capture the decimal point too, you can use:
(?:\.[1-9]+|(?=\.))(\.?0\d*)
Example:
Pattern p = Pattern.compile("(?:\\.[1-9]+|(?=\\.))(\\.?0\\d*)");
String[] strs = {"180.570123", "180.570", "180.0123", "180.0", "180123", "180", "180.2030405"};
for (String s : strs) {
Matcher m = p.matcher(s);
System.out.printf("%-12s: Match: %s%n", s,
m.find() ? m.group(1) : "n/a");
}
Output:
180.570123 : Match: 0123
180.570 : Match: 0
180.0123 : Match: .0123
180.0 : Match: .0
180123 : Match: n/a
180 : Match: n/a
180.2030405 : Match: 030405
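For the replaceAll use case described in the update to the question, a variant of the same idea can keep the non-zero decimals in a group and put them back. This is a sketch under the assumption that the input is a plain decimal string with at most one point; the class name ZeroTail is hypothetical.

```java
// Hypothetical helper for illustration; not part of any library.
final class ZeroTail {
    static String cut(String s) {
        // Either the point is directly followed by the first zero (drop the
        // point too), or group 1 holds the point plus the non-zero digits
        // before the first zero (keep them). An unmatched group 1 is
        // replaced with an empty string.
        return s.replaceAll("(?:\\.|(\\.[1-9]+))0\\d*$", "$1");
    }
}
```

Applied to the six cases from the question, this yields 180.57, 180.57, 180, 180, 180123 and 180.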

I would write a small function to do the extracting instead of regex.
private String getZeroPart(final String s) {
final String[] strs = s.split("\\.");
if (strs.length != 2 || strs[1].indexOf("0") < 0) {
return null;
} else {
return strs[1].startsWith("0") ? "." + strs[1] : strs[1].substring(strs[1].indexOf("0"));
}
}
to test it:
final String[] ss = { "180.570123", "180.570", "180.0123",
"180.0", "180123", "180", "180.2030405","180.5555" };
for (final String s : ss) {
System.out.println(getZeroPart(s));
}
output:
0123
0
.0123
.0
null
null
030405
null
Update:
Based on the edit to the question, I changed the method so that it returns the number with the zero part cut off:
private String cutZeroPart(final String s) {
final String[] strs = s.split("\\.");
if (strs.length != 2 || strs[1].indexOf("0") < 0) {
return s;
} else {
return strs[1].startsWith("0") ? strs[0] : s.substring(0, strs[0].length() + strs[1].indexOf("0") + 1);
}
}
output:
180.570123 -> 180.57
180.570 -> 180.57
180.0123 -> 180
180.0 -> 180
180123 -> 180123
180 -> 180
180.2030405 -> 180.2
180.5555 -> 180.5555

Lookbehinds might be overkill. This one worked well (the lazy \d*? makes the capture start at the first zero rather than the last):
/\.\d*?(0\d*)/

You can try this :
Matcher m = Pattern.compile("(?<=(\\.))(\\d*?)(0.*)").matcher("180.2030405");
String res="";
if(m.find()){
if(m.group(2).equals("")){
res = "."+m.group(3);
}
else{
res = m.group(3);
}
}
