I'm trying to read lines from a file back into an ArrayList. Here is my writer:
private Map<Integer, ArrayList<Integer>> motPage =
new HashMap<Integer, ArrayList<Integer>>();
private void writer() throws UnsupportedEncodingException, FileNotFoundException, IOException{
try (Writer writer = new BufferedWriter(
new OutputStreamWriter(
new FileOutputStream("/home/kdiri/workJuno/motorRecherche/src/kemal.txt"), "utf-8"))) {
for(Map.Entry<Integer, ArrayList<Integer>> entry : motPage.entrySet()){
writer.write(entry.getKey() + " : " + entry.getValue() + "\n");
}
}
}
And this is an example of a resulting line in the file kemal.txt:
0 : [38, 38, 38, 38, 199, 199, 199, 199, 3004, 3004, 3004, 3004, 23, 23]
My question is: how can I read these lines efficiently back into a HashMap? The file is about 500 MB in size. Thank you in advance.
As JonSkeet said, you should start with something that works. Below is one possible way; the snippet is kept quite verbose to show the principle.
String line = "0 : [38, 38, 38, 38, 199, 199, 199, 199, 3004, 3004, 3004, 3004,
23, 23]";
int firstSpace = line.indexOf(" ");
int leftSquareBracket = line.indexOf("[");
int rightSquareBracket = line.indexOf("]");
String keyString = line.substring(0, firstSpace);
String[] valuesString = line.substring(leftSquareBracket + 1, rightSquareBracket)
.split(", ");
int key = Integer.parseInt(keyString);
List<Integer> values = new ArrayList<>(valuesString.length);
for (String value : valuesString) {
values.add(Integer.parseInt(value));
}
Map<Integer, List<Integer>> motPage = new HashMap<>();
motPage.put(key, values);
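To load the whole 500 MB file, you can wrap the same parsing in a line-by-line loop. This is only a minimal sketch, assuming every line has exactly the key : [v1, v2, ...] format your writer produces (with at least one value per key) and the same UTF-8 encoding:

Map<Integer, List<Integer>> motPage = new HashMap<>();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(
            new FileInputStream("/home/kdiri/workJuno/motorRecherche/src/kemal.txt"), "utf-8"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // mirror of the writer: one "key : [v1, v2, ...]" entry per line
        int firstSpace = line.indexOf(' ');
        int leftSquareBracket = line.indexOf('[');
        int rightSquareBracket = line.indexOf(']');
        int key = Integer.parseInt(line.substring(0, firstSpace));
        String[] valuesString = line
            .substring(leftSquareBracket + 1, rightSquareBracket)
            .split(", ");
        List<Integer> values = new ArrayList<>(valuesString.length);
        for (String value : valuesString) {
            values.add(Integer.parseInt(value));
        }
        motPage.put(key, values);
    }
}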
By the way, "read these lines efficiently into a HashMap" depends on your requirements. Efficiency could mean, for example:
read speed of the huge file
conversion speed from String to Integer
small size of the bytecode
less object creation
... and there could be others as well
If the snippet does not fulfil your efficiency criteria, start tuning the part that impacts them.
I have a problem that I don't understand about adding elements to an ArrayList.
The result shows me that it hasn't added the two computers.
Can someone help me?
Computer computer;
GenerateXML xml;
Parser dom = new Parser();
List computers = new ArrayList();
computer = new Computer("2", "fisso", "Corsair", "Venom 2X", 1029971511, 4.5f, 12, 32, 600, 24.0f, 1900, "Linux", "21-10-2021");
computers.add(computer);
computer = new Computer("3", "laptop", "Microsoft", "Surface", 1000091801, 4.5f, 12, 32, 600, 24.0f, 1900, "Linux", "21-10-2021");
computers.add(computer);
try {
xml = new GenerateXML(computers);
xml.printToFile("computer.xml");
} catch (ParserConfigurationException | TransformerException exception) {
System.out.println("Errore generate!");
}
try{
computers = dom.parseDocument("computer.xml");
} catch (ParserConfigurationException | SAXException | IOException exception){
System.out.println("Errore parsing!");
}
System.out.println("Numero computers: " + computers.size());
for (int i = 0; i < computers.size(); i++)
System.out.println(computers.get(i));
The result is:
Numero computers: 0
You initialize computers to be an empty list.
List computers = new ArrayList();
Then you add two computers to your list.
computer = new Computer("2", "fisso", "Corsair", "Venom 2X", 1029971511, 4.5f, 12, 32, 600, 24.0f, 1900, "Linux", "21-10-2021");
computers.add(computer);
computer = new Computer("3", "laptop", "Microsoft", "Surface", 1000091801, 4.5f, 12, 32, 600, 24.0f, 1900, "Linux", "21-10-2021");
computers.add(computer);
At this point, your computers list will contain two computers. The size of the list will be two.
Then you assign a new value to computers in your second try/catch block. By doing this, you lose the reference to the list you had created and populated before; after this, computers refers to whatever list parseDocument returned, which in your case is apparently empty.
computers = dom.parseDocument("computer.xml");
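If you want to keep the two computers you built and still see what the parser returns, one option is to parse into a separate variable first. This is just a sketch, assuming parseDocument returns a list of Computer as in your code:

List<Computer> parsedComputers = new ArrayList<>();
try {
    // parse into a separate list so the two computers you added are not lost
    parsedComputers = dom.parseDocument("computer.xml");
} catch (ParserConfigurationException | SAXException | IOException exception) {
    System.out.println("Errore parsing!");
}
System.out.println("Numero computers (built here): " + computers.size());      // 2
System.out.println("Numero computers (from file):  " + parsedComputers.size());

This keeps the hand-built list intact with its two entries and lets you check whether the parsed list is really empty, which is what your Numero computers: 0 output suggests.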
I have this object:
QuoteProductDTO, with three columns (name, value1, value2)
List<QuoteProductDTO> lstQuoteProductDTO = new ArrayList<>();
lstQuoteProductDTO.add( new QuoteProductDTO("product", 10, 15.5) );
lstQuoteProductDTO.add( new QuoteProductDTO("product", 05, 2.5) );
lstQuoteProductDTO.add( new QuoteProductDTO("product", 13, 1.0) );
lstQuoteProductDTO.add( new QuoteProductDTO("product", 02, 2.0) );
I need to get a consolidated result (a new QuoteProductDTO object):
for the first column, name, I have to take the first value, "product";
for the second one (value1), I have to take the biggest value, 13;
and for the third column (value2), I have to take the sum of all values, 21.0.
This takes the data provided and generates a new object with the required values. It uses the Collectors.teeing() method of Java 12+.
Given the following data:
ArrayList<QuoteProductDTO> lstQuoteProductDTO = new ArrayList<>();
ArrayList<QuoteProductDTO> nextQuoteProductDTO = new ArrayList<>();
// empty Quote for Optional handling below.
QuoteProductDTO emptyQuote = new QuoteProductDTO("EMPTY", -1, -1);
lstQuoteProductDTO.add(
new QuoteProductDTO("Product", 10, 15.5));
lstQuoteProductDTO.add(
new QuoteProductDTO("Product", 05, 2.5));
lstQuoteProductDTO.add(
new QuoteProductDTO("Product", 13, 1.0));
lstQuoteProductDTO.add(
new QuoteProductDTO("Product", 02, 2.0));
You can consolidate them the way you want into a new instance of QuoteProductDTO.
QuoteProductDTO prod = lstQuoteProductDTO.stream()
.collect(Collectors.teeing(
Collectors.maxBy(Comparator
.comparing(p -> p.value1)),
Collectors.summingDouble(
p -> p.value2),
(a, b) -> new QuoteProductDTO(
a.orElse(emptyQuote).name,
a.orElse(emptyQuote).value1,
b.doubleValue())));
System.out.println(prod);
Prints
Product, 13, 21.0
You can also take a list of lists of different products and turn it into a list of consolidated products. Add the following to a second list, then put both lists into a main list.
nextQuoteProductDTO.add(
new QuoteProductDTO("Product2", 10, 15.5));
nextQuoteProductDTO.add(
new QuoteProductDTO("Product2", 25, 20.5));
nextQuoteProductDTO.add(
new QuoteProductDTO("Product2", 13, 1.0));
nextQuoteProductDTO.add(
new QuoteProductDTO("Product2", 02, 2.0));
List<List<QuoteProductDTO>> list = List.of(
lstQuoteProductDTO, nextQuoteProductDTO);
Now consolidate those into a list of objects.
List<QuoteProductDTO> prods = list.stream().map(lst -> lst.stream()
.collect(Collectors.teeing(
Collectors.maxBy(Comparator
.comparing(p -> p.value1)),
Collectors.summingDouble(
p -> p.value2),
(a, b) -> new QuoteProductDTO(
a.orElse(emptyQuote).name,
a.orElse(emptyQuote).value1,
b.doubleValue()))))
.collect(Collectors.toList());
prods.forEach(System.out::println);
Prints
Product, 13, 21.0
Product2, 25, 39.0
I created a class to help demonstrate this.
class QuoteProductDTO {
public String name;
public int value1;
public double value2;
public QuoteProductDTO(String name, int value1,
double value2) {
this.name = name;
this.value1 = value1;
this.value2 = value2;
}
public String toString() {
return name + ", " + value1 + ", " + value2;
}
}
I have the following piece of code
OrderCriteria o1 = new OrderCriteria(1, 1, 101, 201);
OrderCriteria o2 = new OrderCriteria(1, 1, 102, 202);
OrderCriteria o4 = new OrderCriteria(1, 1, 102, 201);
OrderCriteria o5 = new OrderCriteria(2, 2, 501, 601);
OrderCriteria o6 = new OrderCriteria(2, 2, 501, 602);
OrderCriteria o7 = new OrderCriteria(2, 2, 502, 601);
OrderCriteria o8 = new OrderCriteria(2, 2, 502, 602);
OrderCriteria o9 = new OrderCriteria(2, 2, 503, 603);
Where OrderCriteria looks like below:
public class OrderCriteria {
private final long orderId;
private final long orderCatalogId;
private final long procedureId;
private final long diagnosisId;
public OrderCriteria(long orderId, long orderCatalogId, long procedureId, long diagnosisId) {
this.orderId = orderId;
this.orderCatalogId = orderCatalogId;
this.procedureId = procedureId;
this.diagnosisId = diagnosisId;
}
// Getters
}
What I want is to get a list of procedures and a list of diagnoses grouped by order id. So it should return:
{1, {101, 102}, {201, 202}}
{2, {501, 502, 503}, {601, 602, 603}}
which means the order with id 1 has procedure ids 101, 102 and diagnosis ids 201, 202, and so on. I tried using a Google Guava Table but could not come up with a valid solution.
First you'll need a new structure to hold the grouped data:
class OrderCriteriaGroup {
final Set<Long> procedures = new HashSet<>();
final Set<Long> diagnoses = new HashSet<>();
void add(OrderCriteria o) {
procedures.add(o.getProcedureId());
diagnoses.add(o.getDiagnosisId());
}
OrderCriteriaGroup merge(OrderCriteriaGroup g) {
procedures.addAll(g.procedures);
diagnoses.addAll(g.diagnoses);
return this;
}
}
add() and merge() are convenience methods that will help us stream and collect the data, like so:
Map<Long, OrderCriteriaGroup> grouped = criteriaList.stream()
.collect(Collectors.groupingBy(OrderCriteria::getOrderId,
Collector.of(
OrderCriteriaGroup::new,
OrderCriteriaGroup::add,
OrderCriteriaGroup::merge)));
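For the sample data above, a quick way to inspect the result looks roughly like this (just a sketch; set iteration order may vary):

grouped.forEach((orderId, group) ->
    System.out.println(orderId + " -> procedures=" + group.procedures
        + ", diagnoses=" + group.diagnoses));
// 1 -> procedures=[101, 102], diagnoses=[201, 202]
// 2 -> procedures=[501, 502, 503], diagnoses=[601, 602, 603]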
I highly recommend changing the output structure. The current one, according to your example, is probably Map<Long, List<Set<Long>>>. I suggest you distinguish between the "procedure" and "diagnosis" sets of data using the following structure:
Map<Long, Map<String, Set<Long>>> map = new HashMap<>();
Now filling the data is quite easy:
for (OrderCriteria oc : list) {
    if (!map.containsKey(oc.getOrderId())) {
        Map<String, Set<Long>> innerMap = new HashMap<>();
        innerMap.put("procedure", new HashSet<>());
        innerMap.put("diagnosis", new HashSet<>());
        map.put(oc.getOrderId(), innerMap);
    }
    map.get(oc.getOrderId()).get("procedure").add(oc.getProcedureId());
    map.get(oc.getOrderId()).get("diagnosis").add(oc.getDiagnosisId());
}
Output: {1={diagnosis=[201, 202], procedure=[101, 102]}, 2={diagnosis=[601, 602, 603], procedure=[501, 502, 503]}}
If you insist on the structure you have drafted, you would have to remember that the first Set contains the procedures and the second one the diagnoses, which makes maintenance impractical.
Map<Long, List<Set<Long>>> map = new HashMap<>();
for (OrderCriteria oc : list) {
    if (!map.containsKey(oc.getOrderId())) {
        List<Set<Long>> listOfSet = new ArrayList<>();
        listOfSet.add(new HashSet<>());
        listOfSet.add(new HashSet<>());
        map.put(oc.getOrderId(), listOfSet);
    }
    map.get(oc.getOrderId()).get(0).add(oc.getProcedureId());
    map.get(oc.getOrderId()).get(1).add(oc.getDiagnosisId());
}
Output: {1=[[101, 102], [201, 202]], 2=[[501, 502, 503], [601, 602, 603]]}
Alternatively you might want to create a new object with 2 Set<Long> to store the data instead (another answer shows the way).
Is there any way to programmatically extract the final values of the aggregators after a Dataflow batch execution?
Based on the DirectPipelineRunner class, I wrote the following method. It seems to work, but for dynamically created counters it gives values different from those shown in the console output.
PS. If it helps, I'm assuming that aggregators are based on Long values, with a sum combining function.
public static Map<String, Object> extractAllCounters(Pipeline p, PipelineResult pr)
{
AggregatorPipelineExtractor aggregatorExtractor = new AggregatorPipelineExtractor(p);
Map<String, Object> results = new HashMap<>();
for (Map.Entry<Aggregator<?, ?>, Collection<PTransform<?, ?>>> e :
aggregatorExtractor.getAggregatorSteps().entrySet()) {
Aggregator agg = e.getKey();
try {
results.put(agg.getName(), pr.getAggregatorValues(agg).getTotalValue(agg.getCombineFn()));
} catch(AggregatorRetrievalException|IllegalArgumentException aggEx) {
//System.err.println("Can't extract " + agg.getName() + ": " + aggEx.getMessage());
}
}
return results;
}
The values of aggregators should be available in the PipelineResult. For example:
CountOddsFn countOdds = new CountOddsFn();
pipeline
.apply(Create.of(1, 3, 5, 7, 2, 4, 6, 8, 10, 12, 14, 20, 42, 68, 100))
.apply(ParDo.of(countOdds));
PipelineResult result = pipeline.run();
// Here you may need to use the BlockingDataflowPipelineRunner
AggregatorValues<Integer> values =
result.getAggregatorValues(countOdds.aggregator);
Map<String, Integer> valuesAtSteps = values.getValuesAtSteps();
// Now read the values from the step...
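// Just a sketch, assuming the aggregator uses a sum combine function:
// add up the per-step values yourself, or ask for the combined total directly.
int totalOdds = 0;
for (Integer stepValue : valuesAtSteps.values()) {
    totalOdds += stepValue;
}
System.out.println("odds = " + totalOdds);
// Equivalent, letting the combine function do the work:
// Integer totalViaCombine = values.getTotalValue(new SumIntegerFn());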
Example DoFn that reports the aggregator:
private static class CountOddsFn extends DoFn<Integer, Void> {
Aggregator<Integer, Integer> aggregator =
createAggregator("odds", new SumIntegerFn());
@Override
public void processElement(ProcessContext c) throws Exception {
if (c.element() % 2 == 1) {
aggregator.addValue(1);
}
}
}
I am using this command line
java -cp weka.jar weka.classifiers.trees.RandomForest -T tdata.arff -l rndforrest.model -p 0 > data.out
But I want to do it in Java without using files; everything should happen on the fly. The model can be loaded once at the beginning, and tdata.arff should be replaced by a single data row for which I need the prediction (classification?).
Like this:
weka.classifiers.Classifier rndForrest = (weka.classifiers.Classifier)weka.core.SerializationHelper.read("rndforrest.model");
var dataInst = new weka.core.Instance(1, new double[] { 0, 9, -96, 62, 1, 200, 35, 1 });
double pred = rndForrest.classifyInstance(dataInst);
I get an error
Instance doesn't have access to a dataset!
Thank you for your help.
Edit: my code
Stopwatch sw = new Stopwatch();
sw.Start();
var values = new double[] { 0, 9, -96, 62, 1, 200, 35, 0 };
weka.classifiers.Classifier rndForrest = (weka.classifiers.Classifier)weka.core.SerializationHelper.read("rndforrest.model");
var dataInst = new weka.core.Instance(1, values);
FastVector atts = new FastVector();
for(int i=0; i < values.Length; i++) {
atts.addElement(new weka.core.Attribute("att" + i));
}
weka.core.Instances data = new Instances("MyRelation", atts, 0);
data.add(dataInst);
data.setClassIndex(data.numAttributes() - 1);
double pred = rndForrest.classifyInstance(data.firstInstance());
Console.WriteLine("prediction is " + pred);
Console.WriteLine(sw.ElapsedMilliseconds);
Well, the error says it, doesn't it?
Instance doesn't have access to a dataset!
The Javadoc for the constructor you use says:
public Instance(double weight, double[] attValues)
Constructor that initializes the instance variables with the given values. The reference to the dataset is set to null (i.e. the instance doesn't have access to information about the attribute types).
Every Instance has to belong to a data set (Instances), because in Weka each value of an instance is stored as a double value. Additional information is needed to determine how to interpret that double value (e.g. as double, string, nominal, ...) and this information is provided through the data set.
You need to do something like:
FastVector atts = new FastVector();
// assuming all your eight attributes are numeric
for( int i = 1; i <= 8; i++ ) {
atts.addElement(new Attribute("att" + i)); // - numeric
}
Instances data = new Instances("MyRelation", atts, 0);
data.add(dataInst);
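After that, as in your own edit, you tell Weka which attribute is the class and classify the row. A short sketch, assuming the last attribute is the class:

data.setClassIndex(data.numAttributes() - 1);   // last attribute is the class
double pred = rndForrest.classifyInstance(data.firstInstance());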
(Also see Creating an ARFF file for additional examples on how to create attributes of a certain type)