Matching filenames with id - java

I have one text file that contains the numbers from 1 to 11644. Beside each number is the name of an XML file that I have in another folder. I have a total of 8466 XML files. I need to match the filename of each XML file with the id in the text file and extract the value of the id. The ids are in random order. For example, the id of my first XML file is 7025. I'm new to Java, so I really hope someone can enlighten me. Thanks.

The data structure for this is a map.
Read in the input file and add each line to a java.util.HashMap<String, Integer>. The key should be the filename and the value should be the id. Thus, for each line, call myMap.put(filename, id). When you want to check the id of a file, call myMap.get(filename). It returns the Integer id of the file, or null if the filename is not in the map.
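As a rough sketch of the map approach described above: the question does not show the exact layout of the text file, so this assumes each line looks like "<id> <filename>" separated by whitespace. The class name FileIdLookup, the method parseIndex, and the paths index.txt and xml are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FileIdLookup {

    // Builds a filename -> id map from lines of the ASSUMED form "<id> <filename>".
    static Map<String, Integer> parseIndex(List<String> lines) {
        Map<String, Integer> idsByFilename = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Split into at most two parts: the id and the rest of the line.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length < 2) {
                continue; // no filename on this line
            }
            idsByFilename.put(parts[1], Integer.valueOf(parts[0]));
        }
        return idsByFilename;
    }

    public static void main(String[] args) throws IOException {
        // "index.txt" and "xml" are placeholder paths, not taken from the question.
        Map<String, Integer> idsByFilename = parseIndex(Files.readAllLines(Path.of("index.txt")));
        try (var files = Files.list(Path.of("xml"))) {
            files.map(p -> p.getFileName().toString())
                 .forEach(name -> System.out.println(name + " -> " + idsByFilename.get(name)));
        }
    }
}
```

If the text file lists names without the .xml extension, the filename would need the extension stripped before the lookup.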

Related

how to specify nominal attribute value's order when converting csv file into arff file?

I'm trying to convert a csv file into an arff file using the following code.
var csvFile = new File("/path/to/input/file.csv");
var arffOutputFile = new File("/path/to/output/file.arff");
var loader = new CSVLoader();
loader.setSource(csvFile);
var instances = loader.getDataSet();
var saver = new ArffSaver();
saver.setInstances(instances);
saver.setFile(arffOutputFile);
saver.writeBatch();
This code works, but there is a problem. My attribute list has a nominal attribute with values {yes, no}, and I need the ARFF header to list yes as the first value. To be clearer, I need @attribute nominal_attr {yes,no} and not @attribute nominal_attr {no,yes} in the ARFF output header. The order is determined by the value of the first Instance in instances: if the first row in the CSV input file has the value no, the header will contain @attribute nominal_attr {no,yes}.
Is there a way to force the ArffSaver to use a certain order in the header without changing the order of the Instances?
Instead of fixing the output (i.e. ArffSaver), it would be easier to fix the input (i.e. CSVLoader). The -L command-line option (the nominalLabelSpecs property in the GUI) lets you specify the labels for nominal attributes. That way you can force both the order and the set of available labels, even if one of the CSV files doesn't contain all the labels.
The following filters can be used as well to change the order of your labels:
weka.filters.unsupervised.attribute.SortLabels
weka.filters.unsupervised.attribute.SwapValues

Java : Read text file. Filter selected words. Input into an ArrayList

Even a pointer to a similar question would be appreciated.
I have a text file that describes a graph (nodes and connections), written in the form
nodes[ id]
and
edge[source destination]
The input data in the text file follows a fixed pattern: everything is inside [square brackets].
So the hierarchy is entire_text_inside_a[
node[ id 1] node[ id 2] edge[ source 1 destination 2] ]
I want to read the id values into one list and the edges into a separate list.
Being new to Java I/O, I have a general idea of how to read and write a text file, but not of how to do the filtering.
Thank you in advance.
A small piece of the text file:
node [
id 152
label "Milan"
Country "Italy"
Longitude 9.18951
Internal 1
Latitude 45.46427
type "Data Centre and MAN"
]
edge [
source 0
target 1
LinkLabel "Operational network managed end-to-end"
]
If I get the answer I will post it.
You could use a BufferedReader and check each line for the "[" character. After finding "[", check whether the text before it is "node" or "edge". Pass the corresponding ArrayList into a method that keeps reading lines until it finds "id" (for a node) or "source" (for an edge). When it finds such a line, parse it to extract the value you want and add it to your ArrayList. If the line contains "]", return from the method to the calling loop. Keep looping through the lines of the file until readLine() returns null.
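The steps above could be sketched roughly as follows. This assumes the layout of the sample excerpt (one key per line, "node [" and "edge [" each on their own line, and "target" as the second edge key, as in the sample rather than "destination"). The class name GraphFileReader and its members are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class GraphFileReader {

    final List<Integer> nodeIds = new ArrayList<>();
    final List<int[]> edges = new ArrayList<>(); // each entry is {source, target}

    // Convenience wrapper for text that is already in memory.
    static GraphFileReader parse(String text) {
        GraphFileReader result = new GraphFileReader();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            result.read(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Main loop: look for the start of a node or edge block until readLine() returns null.
    void read(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith("node") && trimmed.endsWith("[")) {
                readNode(reader);
            } else if (trimmed.startsWith("edge") && trimmed.endsWith("[")) {
                readEdge(reader);
            }
        }
    }

    // Reads lines up to the closing "]" and records the value of the "id" key.
    private void readNode(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts[0].equals("]")) {
                return;
            }
            if (parts[0].equals("id") && parts.length == 2) {
                nodeIds.add(Integer.parseInt(parts[1]));
            }
        }
    }

    // Reads lines up to the closing "]" and records one {source, target} pair (-1 if a key is missing).
    private void readEdge(BufferedReader reader) throws IOException {
        int source = -1;
        int target = -1;
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts[0].equals("]")) {
                break;
            }
            if (parts.length < 2) {
                continue;
            }
            if (parts[0].equals("source")) {
                source = Integer.parseInt(parts[1]);
            } else if (parts[0].equals("target")) {
                target = Integer.parseInt(parts[1]);
            }
        }
        edges.add(new int[] {source, target});
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "node [", "id 152", "label \"Milan\"", "]",
                "edge [", "source 0", "target 1", "]");
        GraphFileReader graph = parse(sample);
        System.out.println("node ids: " + graph.nodeIds);
        for (int[] edge : graph.edges) {
            System.out.println("edge: " + edge[0] + " -> " + edge[1]);
        }
    }
}
```

For a real file, a reader from java.nio.file.Files.newBufferedReader could be passed to read instead of going through parse.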

Naming variable in R from a text File

I am very new to R and am looking for a possible solution to this problem.
Suppose I have a variables.txt file (or any other file, for that matter) that contains a list of variable names, e.g. Product,
Ingredient,
Label,
Manufacturer,
Marketing,
This text file is generated in Java. It has to be read in R, and the variables are to be named according to the names in the file.
My example code is:
list(Product=0,Ingredient=0,Label=0,Manufacturer=0,Marketing=0)
which is currently hard-coded by hand.
I need a way to read these variable names from variables.txt and assign them dynamically in R. How can this be done? Is there a config-file concept in R that could also serve as a way out?
Maybe you can use:
data = read.table("file.txt", header=TRUE, sep=",")
The sep argument depends on the separator used in the file. It could be a comma, tab, space, dot or anything else.
header=TRUE means the variable names are taken from the first line of the file.
If you need the list structure described above, you can use any read.table or read.csv command to get the names into R, as mthbnd showed above.
Say your file.txt looks like: Product,Ingredient,Label,Manufacturer,Marketing
Read in the file and create a list from it. The elements will then be filled with logical(0). You can then set all elements to 0 by assigning through [ ], which keeps the list structure:
vars <- as.list(read.csv(file = "file.txt", header = TRUE))
vars[] <- 0

Comparing values in multiple text files and database

I have multiple text files and a database table. The database contains a fixed number of entries, and the text files have more entries.
For example:
------------text1.txt-----------
44-CAT-IV-CORE 626518 T19P45
44-CAT-IV-OUTER 626522 LB0N08
44-CAT-IV-EXTER 626956 AG8N15
44-CAT-IV-DOUT 626965 PQ7715
------------text2.txt-----------
44-CAT-IV-CORE 626518 T19P50
44-CAT-IV-OUTER 626522 LB0N08
44-CAT-IV-EXTER 626956 AG8N15
44-CAT-IV-DOUT 626965 PQ2718
Many files like this....
The database looks like:
| unit   | value  | name-part    | version |
| CAT-IV | 626518 | CAT IV CORE  | T19P43  |
| CAT-IV | 626522 | CAT IV OUTER | LB0N08  |
| CAT-IV | 626956 | CAT IV EXTER | AG8N15  |
I want to get the part names and values from the text files whose value or version (or both) do not match the database. This applies only to parts whose name exists in the database; here, for example, CAT-IV-DOUT must be ignored because it is not in the database.
I tried dumping the database values to a text file and comparing that against the text files, but this seems inefficient. Is there a better way to do this?
Read all lines of the files into an array (for example with file()).
Then go through the database table and compare each array element with the current row from the database.
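One way to make the comparison cheaper, sketched here in Java under several assumptions, is to load the database rows once into a map keyed by part name and then look up each file line in it. The database access itself is left out. The sketch assumes that a file name such as 44-CAT-IV-CORE and a table name-part such as CAT IV CORE refer to the same part once the numeric prefix is dropped and spaces are replaced by hyphens. PartComparer, DbPart, normalizeName and findMismatches are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartComparer {

    // One database row, reduced to the two columns that are compared.
    static final class DbPart {
        final String value;
        final String version;

        DbPart(String value, String version) {
            this.value = value;
            this.version = version;
        }
    }

    // ASSUMPTION: "44-CAT-IV-CORE" (file) and "CAT IV CORE" (table) name the same part.
    static String normalizeName(String name) {
        return name.trim().replaceFirst("^\\d+-", "").replace(' ', '-');
    }

    // Returns the file lines whose value or version differs from the database row of the same name.
    static List<String> findMismatches(Map<String, DbPart> dbByName, List<String> fileLines) {
        List<String> mismatches = new ArrayList<>();
        for (String line : fileLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length != 3) {
                continue; // not a "name value version" line, e.g. a separator line
            }
            DbPart db = dbByName.get(normalizeName(cols[0]));
            if (db == null) {
                continue; // part is not in the database: ignore it
            }
            if (!db.value.equals(cols[1]) || !db.version.equals(cols[2])) {
                mismatches.add(cols[0] + " " + cols[1] + " " + cols[2]);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // In a real program these rows would come from a single query against the table.
        Map<String, DbPart> db = new HashMap<>();
        db.put(normalizeName("CAT IV CORE"), new DbPart("626518", "T19P43"));
        db.put(normalizeName("CAT IV OUTER"), new DbPart("626522", "LB0N08"));
        db.put(normalizeName("CAT IV EXTER"), new DbPart("626956", "AG8N15"));

        List<String> text1 = List.of(
                "44-CAT-IV-CORE 626518 T19P45",
                "44-CAT-IV-OUTER 626522 LB0N08",
                "44-CAT-IV-EXTER 626956 AG8N15",
                "44-CAT-IV-DOUT 626965 PQ7715");
        findMismatches(db, text1).forEach(System.out::println);
    }
}
```

With this layout each file line costs one hash lookup, so the table is read once no matter how many text files are checked.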

java:how to check if csv file has header

I have a CSV file into which I can insert the header on the first run, but when I write to the file again the program creates the header again. Is there a way to check whether the CSV file already has a header and, if so, skip writing it?
You would have to read the first line and test if the first column matches the column header you expect. Since your code inserts the header, I'm assuming it knows what the header should look like. You can use this same variable in your header check. Something like:
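String HEADER = "column1,column2,column3";
// First column name, derived from HEADER so the two cannot drift apart (DRY).
// The end index is the position of the first comma, so the comma itself is excluded.
String COLUMN1 = HEADER.substring(0, HEADER.indexOf(','));
// ... read line 1 of the file and store its first column in line1Column1
// Calling equals on COLUMN1 also covers an empty file, where line1Column1 is null.
if (!COLUMN1.equals(line1Column1))
{
    out.write(HEADER);
}
// Print rows of data...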
Are you using a framework for this, or are you doing it yourself? A code snippet would help. Otherwise you can keep a Boolean flag, or compare the first line against the standard header to check whether it is already there.
If you inserted the header yourself, couldn't you make it start with, for instance, a hash (#) and, if that is present, not write it again?
Regards,
Stéphane
Are you simply appending records to the existing file, so that the program appends the header again after the data from the prior write?
Can you simply check whether the file exists and, if it does and its size is not zero, assume the header is already present?
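A minimal sketch of that last suggestion, assuming the file is only ever written by this code so that a non-empty file always starts with the header. CsvAppender, appendRows and the header text are placeholders.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class CsvAppender {

    static final String HEADER = "column1,column2,column3"; // placeholder header

    // Appends rows to the CSV file, writing the header only if the file is missing or empty.
    static void appendRows(Path csv, List<String> rows) throws IOException {
        boolean needsHeader = Files.notExists(csv) || Files.size(csv) == 0;
        try (BufferedWriter out = Files.newBufferedWriter(csv, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            if (needsHeader) {
                out.write(HEADER);
                out.newLine();
            }
            for (String row : rows) {
                out.write(row);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Path.of("output.csv"); // placeholder path
        appendRows(csv, List.of("1,2,3"));
        appendRows(csv, List.of("4,5,6")); // second run: no second header
    }
}
```

If other programs may write to the same file, comparing the first line with the expected header, as in the first answer, is the safer check.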
