Instances went wrong from csv to weka

.csv
100387C,254,73,93
100388D,2047,60,98
100388D,2736,62,9
100389E,951,82,90
100390F,2048,91,98
100411C,254,50,96
100412D,047,75,9
.arff
#relation test
#attribute Admno {100387C,100388.0,100389E,100390.0,100411C,100412.0}
#attribute Code {254,2047,2736,951,2048,254,047}
#attribute ore numeric
#attribute tend numeric
100387C,254,73,93
100388.0,2047,60,98
100388.0,2736,62,9
100389E,951,82,90
100390.0,2048,91,98
100411C,254,50,96
100412.0,047,75,9
If you compare the two files, the difference after conversion is that the trailing D (and F) in some values of #attribute Admno has become .0. The conversion code I used is below. What went wrong in the conversion? Thanks
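import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
// Load the CSV file into Weka instances
CSVLoader loader = new CSVLoader();
loader.setSource(new File("C:\\test.csv"));
Instances data = loader.getDataSet();
// Write the same instances out as ARFF
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("C:\\test.arff"));
saver.writeBatch();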

You are getting 100388D as 100388.0 and 100390F as 100390.0 because those values end with D and F respectively. In Java, a trailing D marks a number as a double and a trailing F marks it as a float, and Double.parseDouble accepts the same suffixes in strings. So when Weka reads these values it treats them as numbers, and when they are written out as nominal values they appear with .0 instead of D and F. Values such as 100387C and 100389E do not parse as numbers, so they are kept as text.
You can find a discussion here and the related documentation here.
To the best of my knowledge, there is no straightforward way to overcome this in Weka. But if this is an ID that takes no part in classification or clustering, you can ignore the attribute when you build a model on this data and apply it to your test data.
Another way to overcome this is to change this attribute's values so that none of them ends with D or F.
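The suffix behaviour described above can be checked in plain Java, without Weka. The following is a minimal sketch: the class name SuffixParseDemo and the helper roundTrip are made up for illustration, and roundTrip only imitates the effect seen in the ARFF file; it is not Weka's actual loading code.

```java
public class SuffixParseDemo {

    /** Returns true if Java's own parser accepts the text as a double. */
    static boolean parsesAsDouble(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /**
     * Hypothetical helper: numeric-looking text is re-printed as a double,
     * anything else is returned unchanged.
     */
    static String roundTrip(String text) {
        return parsesAsDouble(text)
                ? String.valueOf(Double.parseDouble(text))
                : text;
    }

    public static void main(String[] args) {
        // Sample IDs taken from the CSV in the question.
        String[] ids = {"100387C", "100388D", "100389E", "100390F"};
        for (String id : ids) {
            System.out.println(id + " -> " + roundTrip(id));
        }
    }
}
```

Only the IDs ending in D or F are accepted by the parser, which matches the pattern in the converted file.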

Related

Remove Scientific representation of double value without converting the variable datatype to String in Java

I need to write some double values to a JSON file:
double number = 24910098.356;
It is written to the JSON file as
2.4910098356E7
but I want it to be displayed as 24910098.356.
I tried DecimalFormat, but it converts the value to a String, and I need it to remain a double after formatting since that is part of the requirement.
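A double carries no display format of its own; the E7 form comes from how the value is turned into text. One possible direction, sketched below, is to hand the writer a BigDecimal, whose toPlainString() never uses an exponent. Whether a given JSON library writes a BigDecimal as a plain number is an assumption to verify, and the class name PlainDoubleSketch is made up for illustration.

```java
import java.math.BigDecimal;

public class PlainDoubleSketch {

    /** Text form of a double without scientific notation. */
    static String plainText(double value) {
        // BigDecimal.valueOf goes through Double.toString, so the
        // digits are the same ones Java would normally print.
        return BigDecimal.valueOf(value).toPlainString();
    }

    public static void main(String[] args) {
        double number = 24910098.356;
        System.out.println(String.valueOf(number)); // default text form
        System.out.println(plainText(number));      // plain decimal form
    }
}
```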

Format Issue while reading excel data into Java

I am reading Excel data in Java using Apache. I get a format issue when reading a double value: 869.87929 (in Excel) becomes 869.8792899999999 (in Java).
I'm using the following files to read the Excel data.
1. Schema.csv: SheetName,2-int-Double
2. File.xls:
col1 | col2
123 | 869.87929
Sample code:
if (type.equals("Double")) {
    Double fval = Double.parseDouble(content[i-1]);
    String sval = fval.toString();
    listObjects.add(new Double(Double.parseDouble(sval)));
}
Note: type comes from schema.csv and the content[] values come from file.xls.

There is no point converting a number back and forth to a String, as this does nothing useful.
Try
listObjects.add(new BigDecimal(content[i-1]));
With rounding you can do
listObjects.add(new BigDecimal(content[i-1]).setScale(9, RoundingMode.HALF_UP));
though I suspect the rounding error occurred before this point, as this should do basically the same thing as
listObjects.add(new Double(content[i-1]));
With rounding you can do
double d = Double.parseDouble(content[i-1]);
double round9 = Math.round(d * 1e9) / 1e9;
listObjects.add((Double) round9);
These are much the same, because the number is within the precision of double and there should be no additional error here (i.e. the error is likely to have occurred before this point).
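The two rounding approaches above can be tried in isolation. This is a minimal sketch assuming the cell text arrives as the string 869.8792899999999 from the question; the class and method names are made up for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundingSketch {

    /** Rounds the raw text to 9 decimal places using BigDecimal. */
    static BigDecimal roundDecimal(String raw) {
        return new BigDecimal(raw).setScale(9, RoundingMode.HALF_UP);
    }

    /** Rounds a double to 9 decimal places using Math.round. */
    static double roundDouble(double d) {
        return Math.round(d * 1e9) / 1e9;
    }

    public static void main(String[] args) {
        String raw = "869.8792899999999"; // value as read in the question
        System.out.println(roundDecimal(raw));
        System.out.println(roundDouble(Double.parseDouble(raw)));
    }
}
```

Both calls recover 869.87929, which supports the point that the stray digits are introduced before this code runs.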

Double is not good for preserving precision; BigDecimal is preferred. I believe this is your problem.
https://blogs.oracle.com/CoreJavaTechTips/entry/the_need_for_bigdecimal

If you use Apache POI, you can check getCellType() == Cell.CELL_TYPE_NUMERIC and then read the value with getNumericCellValue() from the Cell interface.

How to divide the value of a field in a tmap in TALEND

I'm just getting started with Talend and I would like to know how to divide a value from a CSV file and round it if possible?
Here's my job layout:
And here's how my tMap is configured:

I assume the "/r" is to add a new line? That won't actually work and will instead add a string literal "/r" to whatever other string you're adding it to. You also don't need to do that because Talend will automatically start a new line at the end of the row of data for your tFileOutputDelimited.
But more importantly, you're attempting to call the divide method on a string which obviously doesn't exist (how would it be defined?).
You need to first parse the string as a numeric type (such as float, double, or BigDecimal) and then divide by another numeric type (your Var1 is defined as a string in your example, so it will fail there too, because a string literal must be enclosed in quotes).
So typically you would either define the schema column that you are dividing as a numeric type (as mentioned) or you'd attempt to parse the string into a float in the tMap/tJavaRow component.
If you have your prices defined as something like a double before your tMap/tJavaRow operation that divides then you can use:
row1.prix2 / Var.var1
Or to round it to two decimal places:
(double)Math.round((row1.prix2 / Var.var1) * 100) / 100
You can also use a tConvertType component to explicitly convert between types where available. Alternatively you could parse the string as a double using:
Double.parseDouble(row1.prix2)
And then proceed to use that as previously described.
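Outside Talend, the parse, divide, and round steps described above look like the sketch below. The price text and the divisor 2200 are example values only, and the class and method names are made up for illustration; in a tMap the same expression would use row1.prix2 and Var.var1.

```java
public class DivideAndRound {

    /** Parses the price text, divides it, and rounds to two decimals. */
    static double divideRounded(String rawPrice, double divisor) {
        double price = Double.parseDouble(rawPrice);
        // Same rounding expression as in the tMap example:
        // scale up, round to a whole number, scale back down.
        return (double) Math.round((price / divisor) * 100) / 100;
    }

    public static void main(String[] args) {
        // Example values only; they do not come from the original job.
        System.out.println(divideRounded("1234.56", 2200.0));
    }
}
```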
In your case though (according to your comment on Gabriele's answer), there is a further issue in that Java (and most programming languages) expect numbers to be formatted with a . for the decimal point. You need to add a pre-processing step to be able to parse your string as a double.
As this question's answers show, there are a couple of options. You can use a regex processing step to change all of your commas in that field to periods or you can use a tJavaRow to set your locale to French as you parse the double like so:
NumberFormat format = NumberFormat.getInstance(Locale.FRENCH);
Number number = format.parse(input_row.prix2);
double d = number.doubleValue();
output_row.nom = input_row.nom;
output_row.code = input_row.code;
output_row.date = input_row.date;
output_row.ref = input_row.ref;
output_row.ean = input_row.ean;
output_row.quantitie = input_row.quantitie;
output_row.prix1 = input_row.prix1;
output_row.prix2 = d;
And make sure to import the relevant libraries in the Advanced Settings tab of the tJavaRow component:
import java.text.NumberFormat;
import java.util.Locale;
Your output schema for the tJavaRow should be the same as the input but with prix2 being a double rather than a string.
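To see the locale-aware parsing on its own, here is a minimal standalone sketch. The sample strings are made up, and the ParseException is wrapped in an unchecked exception purely to keep the example short; a real job may want to handle bad input differently.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class FrenchNumberParse {

    /** Parses text that uses a comma as the decimal separator. */
    static double parseFrench(String text) {
        NumberFormat format = NumberFormat.getInstance(Locale.FRENCH);
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            // Wrapped only to keep this sketch compact.
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFrench("12,5"));
        System.out.println(parseFrench("1234,56"));
    }
}
```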

Since var1 is defined as a String, you cannot apply the divide method. Try something like this for your output prix2 calculation:
(Float.parseFloat(row1.prix2)/2200f) + "Vr"
or something like that (I cannot read the text in the screenshot very well, actually)

Storing big numbers as they are

I'm using Java and Apache Derby to create a project that deals with big numbers. Everything is going fine except when I store big numbers.
For example, when I save 1000000000 through my Java class to a Derby table, it automatically becomes 1.0E9. When this value is retrieved in another form, it is displayed as 1.0E9. How can I stop this? I'm using the float data type to do this.
In other words, how can I save 1000000000 as 1000000000 and not 1.0E9?

As said above, you could use a BigInteger, or you could just convert 1.0E9 to what the number actually is: 1.0 x 10^9.

1.0e9 is the same as 1000000000; it's just a representation issue. You just have to apply the proper formatters when transforming it to a string.
Two things that would make this easier are to use the NUMERIC column type in Derby, and also use either BigDecimal or BigInteger data types in your Java code, or possibly a long if you're confident that the long can hold the values in your problem domain.
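As a rough illustration of the "representation issue" point, the sketch below prints the same value with and without the exponent. It does not touch Derby at all, and the class and method names are made up for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class PlainNumberSketch {

    /** Formats a floating-point value without an exponent. */
    static String plain(double value) {
        return BigDecimal.valueOf(value).toPlainString();
    }

    public static void main(String[] args) {
        double stored = 1.0E9;
        System.out.println(stored);        // default text form of a double
        System.out.println(plain(stored)); // plain digits

        // Integer types never print with an exponent.
        long asLong = 1000000000L;
        BigInteger asBig = new BigInteger("1000000000");
        System.out.println(asLong);
        System.out.println(asBig);
    }
}
```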

import java.math.BigInteger;
//...
//...
//...
BigInteger store = new BigInteger("1000000000");

Convert GeoDB Geocoordinates to Lat/Lon and do an area search

I have a Database with GeoCoordinates of every Zipcode in a Decimal form (e.g. 5099755, 928690)
I want to do an area search based on these values, but in the formula that I've found, I should pass the Lat and Lon values as Double.
How can I convert these "decimal" values to "double" values?

If GeoPoints is the GeoCoordinates variable:
double lat = GeoPoints.getLattitude(); or double lat = GeoPoints.getLattitudeE6();
double lon = GeoPoints.getLongitude(); or double lon = GeoPoints.getLongitudeE6();

I think we need to know what sort of coordinate system your numbers are in - they look like they might be UTM (universal transverse mercator), but if so, there should also be a 'Zone' parameter (e.g. 55H). This document describes how to convert from UTM to DMS (and provides parameters for the various geographic datums) and also provides Javascript which you should be able to convert into Java pretty easily.
Also, have a look at this stackoverflow post, which talks about java packages which can do coordinate system conversion.
Then again, maybe your code wants data in exactly the format they're already in, in which case all you need to do is cast your decimals to doubles.
