In a text file the data is laid out as shown below; I am presenting it as a table for ease of reading.
Column1 Column2 Column3 Column4
A B 1 2
A B 1 5
A C 1 3
B C 2 3
C A 3 4
A B 4 5
I need to group the rows whose Column1 and Column2 values are the same; for example, A->B is repeated 3 times, so those rows should be combined like this:
A B 1 2
A B 1 5
A B 4 5
Here's how I would do it.
Define a class Record containing the four fields.
Define a class RecordKey containing the identification of a row, i.e. the first two column values. Make sure equals and hashCode are properly defined.
Create a Map<RecordKey, List<Record>>.
Read the records line by line. If there is already a list in the map for the current record key, then add the current record to this list. Otherwise, create a new list, add the record to it, and put this list in the map.
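A minimal sketch of these steps (the field and accessor names are assumptions, since the question only shows the data informally):

class Record {
    final String column1, column2;
    final int column3, column4;
    Record(String column1, String column2, int column3, int column4) {
        this.column1 = column1; this.column2 = column2;
        this.column3 = column3; this.column4 = column4;
    }
}

class RecordKey {
    final String column1, column2;
    RecordKey(String column1, String column2) {
        this.column1 = column1; this.column2 = column2;
    }
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RecordKey)) return false;
        RecordKey other = (RecordKey) o;
        return column1.equals(other.column1) && column2.equals(other.column2);
    }
    @Override
    public int hashCode() {
        return java.util.Objects.hash(column1, column2);
    }
}

Map<RecordKey, List<Record>> groups = new HashMap<>();
for (Record record : records) { // records = one Record per line of the file
    RecordKey key = new RecordKey(record.column1, record.column2);
    List<Record> group = groups.get(key);
    if (group == null) {
        group = new ArrayList<>();
        groups.put(key, group);
    }
    group.add(record);
}

Each list in groups is then one cluster; for the sample data, the three A B rows end up in the same list.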
Provided memory is not an issue, simply loading the records into a List and sorting them with those two columns as a compound key will cause them to cluster. I would suggest creating a simple class to store each record, then using list.sort(new Comparator<MyRecord>(){...});
The compare method will be fairly straightforward if you can be sure you have no nulls in your keys:
@Override
public int compare(MyRecord a, MyRecord b) {
    int n = a.getFirst().compareTo(b.getFirst());
    if (n == 0)
        return a.getSecond().compareTo(b.getSecond());
    return n;
}
If you can have nulls then you'll need to be a bit more careful and check for them.
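One way to handle that (a sketch, assuming getFirst() and getSecond() return String) is to lean on the null-handling comparators added in Java 8:

Comparator<MyRecord> byCompoundKey = Comparator
        .comparing(MyRecord::getFirst, Comparator.nullsFirst(Comparator.<String>naturalOrder()))
        .thenComparing(MyRecord::getSecond, Comparator.nullsFirst(Comparator.<String>naturalOrder()));
list.sort(byCompoundKey); // records with null keys sort before the rest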
You can use this type of Map structure.
Map<String, Map<String, List<Record>>> parentMap
Record is a POJO in which you can store the entire record.
public class Record {
    private String column1;
    private String column2;
    private Integer column3;
    private Integer column4;
    // getters and setters
}
And you can populate the map like this:
Map<String, Map<String, List<Record>>> parentMap = new HashMap<String, Map<String, List<Record>>>();
Map<String, List<Record>> innerMap;
List<Record> innerList;

for (Record r : records) { // records: all Record objects read from the file
    innerMap = parentMap.get(r.getColumn1());
    if (innerMap == null) {
        innerMap = new HashMap<String, List<Record>>();
        parentMap.put(r.getColumn1(), innerMap);
    }
    innerList = innerMap.get(r.getColumn2());
    if (innerList == null) {
        innerList = new ArrayList<Record>();
        innerMap.put(r.getColumn2(), innerList);
    }
    innerList.add(r);
}
final Multimap<String, Map.Entry<Integer, Integer>>[] actionMap = new Multimap[]{null};
final boolean[] loaded = {false};
db.execute(connection -> {
PreparedStatement statement = null;
ResultSet resultSet = null;
actionMap[0] = ArrayListMultimap.create();
try {
statement = connection.prepareStatement("Blah Blah...
while (resultSet.next()) {
final String name = ...
actionMap[0].put(name, new AbstractMap.SimpleEntry<>(int1, int2));
I have a map where I use SimpleEntry to insert two integer values (int1, int2). On a duplicate key I want to merge the new values with what is already mapped. My idea is computeIfPresent, but I have no idea how to write the BiFunction since I'm using AbstractMap.SimpleEntry to hold the two values. Any help would be much appreciated.
Before you put and overwrite, retrieve the existing value.
If you get null, there is no such value and you put as initially intended.
If you get something, merge your new data into it before putting it back.
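A sketch of that pattern with a plain Map rather than the Multimap from your snippet, assuming the key is the name and, purely as an example, that "merging" means summing both integers (the column names here are placeholders):

Map<String, Map.Entry<Integer, Integer>> actionMap = new HashMap<>();
while (resultSet.next()) {
    String name = resultSet.getString("name");
    int int1 = resultSet.getInt("key");
    int int2 = resultSet.getInt("value");
    actionMap.merge(name,
            new AbstractMap.SimpleEntry<>(int1, int2),
            (oldEntry, newEntry) -> new AbstractMap.SimpleEntry<>(
                    oldEntry.getKey() + newEntry.getKey(),       // replace "+" with your
                    oldEntry.getValue() + newEntry.getValue())); // actual merge rule
}

Map.merge inserts the new entry when the key is absent and applies the merge function when it is already present, which is exactly the get-then-merge pattern described above in a single call.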
Based on the input you gave (which could be more complete), it seems that you're trying to use the wrong structure for your data. Assuming you want to merge the value of the entry when both name and int1 already exist, you should use Guava's Table instead. Consider this piece of code as an illustration of the idea:
@Test
public void shouldMergeValues() throws SQLException {
given(resultSet.getString("name"))
.willReturn("name")
.willReturn("name")
.willReturn("name")
.willReturn("name2");
given(resultSet.getInt("key"))
.willReturn(1)
.willReturn(1)
.willReturn(2)
.willReturn(100);
given(resultSet.getInt("value"))
.willReturn(2)
.willReturn(40)
.willReturn(3)
.willReturn(200);
given(resultSet.next())
.willReturn(true)
.willReturn(true)
.willReturn(true)
.willReturn(true)
.willReturn(false);
Table<String, Integer, Integer> actionTable = HashBasedTable.create();
while (resultSet.next()) {
String name = resultSet.getString("name");
int int1 = resultSet.getInt("key");
int int2 = resultSet.getInt("value");
if (actionTable.contains(name, int1)) {
Integer oldInt2 = actionTable.get(name, int1);
actionTable.put(name, int1, oldInt2 + int2); // in this example "+" is the "merge" function
} else {
actionTable.put(name, int1, int2);
}
}
assertThat(actionTable) // {name={1=42, 2=3}, name2={100=200}}
.containsCell("name", 1, 42)
.containsCell("name", 2, 3)
.containsCell("name2", 100, 200)
.hasSize(3);
}
Given 4 rows: ("name", 1, 2), ("name", 1, 40), ("name", 2, 3) and ("name2", 100, 200) and example "merge" operation being addition, you'd get Table {name={1=42, 2=3}, name2={100=200}}. Please check if it fits your use case.
Issue 1: There will be 3 records in testcaseInputs. All three records are iterated, but at the end the "rows" map contains only the record that was iterated last. I want the rows map to contain all three records.
Issue 2: The iteration takes record 1, then record 2, then record 3, and then it takes record 1 or 3 again. I don't know why.
public void addinputtosc() {
    try {
        Map<String, List<JsonNode>> testRecords = null;
        Map<String, String> rows = new HashMap<String, String>();
        // this function takes the input sheet, sheet name and returns data in Map<String, List<JsonNode>> format
        testRecords = fetchScenariosData("C:\\testData.xlsx", "input", "inputParam");
        Iterator<Map.Entry<String, List<JsonNode>>> entries = testRecords.entrySet().iterator();
        while (entries.hasNext()) {
            Map.Entry<String, List<JsonNode>> entry = entries.next();
            String scenarioName = entry.getKey();
            List<JsonNode> testCaseInputs = entry.getValue();
            if (scenarioName.equalsIgnoreCase("TestCase1")) {
                ListIterator<JsonNode> listIterator = testCaseInputs.listIterator();
                while (listIterator.hasNext()) {
                    for (JsonNode tcinputs : testCaseInputs) {
                        String keyValue = tcinputs.toString();
                        String newKeyValue = keyValue.replaceAll("[{}]", "");
                        String[] keyValue1 = newKeyValue.split(",");
                        for (String j : keyValue1) {
                            String[] keyValueorg = j.split(":");
                            rows.put(keyValueorg[0].substring(1, keyValueorg[0].length() - 1),
                                    keyValueorg[1].substring(1, keyValueorg[1].length() - 1));
                        }
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Issue 1: There will be 3 records in testcaseInputs. All three records are iterated, but at the end the "rows" map contains only the record that was iterated last. I want the rows map to contain all three records.
This is happening because of this line:
rows.put(keyValueorg[0].substring(1, keyValueorg[0].length() - 1), keyValueorg[1].substring(1, keyValueorg[1].length() - 1));
When you are processing the first JsonNode, suppose it is this one, as per your example:
{"File Source Env.":"Unix","TC_ID":"tc1","File Path":"/tmp/test.dat","Date":"20190101"}
the HashMap rows will contain:
{Date=20190101, File Path=/tmp/test.dat, TC_ID=tc1, File Source Env.=Unix}
Now, when this code is executed again for the second JsonNode, suppose this one:
{"File Source Env.":"Unix-qa","TC_ID":"tc2","File Path":"/tmp/test1.dat","Date":"20190201"}
as per your code, the keys calculated for this new record (keyValueorg[0].substring(1, keyValueorg[0].length() - 1)) are the same keys already stored in the HashMap by the first record, i.e. Date, File Source Env., TC_ID, File Path.
Since these keys are already present in the HashMap, their values get overwritten by the new values, which is the behaviour of HashMap's put operation (if the key exists, it simply overwrites the old value; otherwise it inserts a new key).
This process continues, and hence only the last record's values are seen in the HashMap.
In order to keep the key-value pairs of all the records, you need to create a different key for each record, or else use a nested HashMap with one inner map per record, as sketched below.
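A minimal sketch of the nested-map idea (the outer key here is the record's TC_ID, which is an assumption; any value that is unique per record would do):

Map<String, Map<String, String>> allRows = new HashMap<>();
for (JsonNode tcinputs : testCaseInputs) {
    Map<String, String> row = new HashMap<>();
    String newKeyValue = tcinputs.toString().replaceAll("[{}]", "");
    for (String pair : newKeyValue.split(",")) {
        String[] kv = pair.split(":");
        // strip the surrounding quotes, as in your original code
        row.put(kv[0].substring(1, kv[0].length() - 1),
                kv[1].substring(1, kv[1].length() - 1));
    }
    allRows.put(row.get("TC_ID"), row); // one inner map per record
}

Note that iterating the JsonNode's own fields (for example with its fields() iterator) would be more robust than splitting the string on commas and colons.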
I read an Excel table containing four columns and create a List. Now, I'd like to use the first three columns as key and use the last column as value. I've seen similar questions asked, but in all those questions, either String or Integer is used as a key.
public class initial {
private int from;
private int to;
private int via;
private int cost;
//constructor
//set and get methods
//hashCode and equals methods
}
public class myTuple {
private int from;
private int to;
private int via;
}
//main function
//get the Excel Table as a list
ArrayList<initial> myList = new ArrayList<initial>();
for(int i= mySheet.getFirstRowNum()+1 ; i<= mySheet.getLastRowNum(); i++) {
initial e = new initial();
Row ro = mySheet.getRow(i);
for(int j = ro.getFirstCellNum(); j <= ro.getLastCellNum(); j++) {
Cell ce = ro.getCell(j);
switch(j) {
case 0:
e.setFrom((int) ce.getNumericCellValue());
break;
.....
case 3:
e.setCost((int) ce.getNumericCellValue());
break;
}
}
myList.add(e);
}
//Create map
Map<myTuple, Integer> myMap = new HashMap<>();
I do not know how to proceed after this point. I believe I should use something like:
Map<myTuple, Integer> myMap = myList.stream().collect(Collectors.toMap(myList:: , myList::));
If someone could assist me, I'd really appreciate it.
Also, if you believe there is a more efficient way to perform this (e.g., the way I read my data and parse it into a list, or the way I convert the list into a map), please let me know. Even though it is not the focus of this question, if there is a better way to read a multi-dimensional table and parse it into a List than what I do, I'd love to hear that too. In the future, I will have bigger tables with more columns, so I'm not sure that going through every column with a switch statement is the way to go.
You can just create the map while looping. The snippet below is a sketch: getNum(i) and getCount() stand in for however you read cell i as an int and count the cells in the row.
Tuple key = new Tuple(row.getNum(0), row.getNum(1), row.getNum(2));
List<Integer> value = new ArrayList<>();
for (int cell = 3; cell < row.getCount(); cell++) {
    value.add(row.getNum(cell));
}
map.put(key, value);
The toMap collector needs 2 functions (1 to create a key & 1 to create a value). You can use lambdas (to extract the relevant fields from your source type):
Map<myTuple, Integer> myMap = myList
        .stream()
        .collect(Collectors.toMap(
                i -> new myTuple(i.getFrom(), i.getTo(), i.getVia()),
                i -> i.getCost()
        ));
Your key type myTuple needs a suitable constructor, plus equals and hashCode.
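For example, a minimal version of the key class (a sketch based on the fields shown in the question):

class myTuple {
    private final int from;
    private final int to;
    private final int via;

    myTuple(int from, int to, int via) {
        this.from = from;
        this.to = to;
        this.via = via;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof myTuple)) return false;
        myTuple other = (myTuple) o;
        return from == other.from && to == other.to && via == other.via;
    }

    @Override
    public int hashCode() {
        return java.util.Objects.hash(from, to, via);
    }
}

Note that Collectors.toMap throws an IllegalStateException if two rows produce the same (from, to, via) key; the three-argument overload that takes a merge function handles that case.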
Here is an example:
class Tuple implements Comparable<Tuple> {
    Object one;
    Object two;
    Object three;

    public Tuple(final Object one, final Object two, final Object three) {
        this.one = one;
        this.two = two;
        this.three = three;
    }

    @Override
    public int compareTo(final Tuple that) {
        // TODO: Do your comparison here for the fields one, two and three
        return 0;
    }

    // Note: to use Tuple as a HashMap key, equals and hashCode must also be overridden;
    // compareTo is only consulted by sorted structures such as TreeMap.
}
Map<Tuple, Object> mapKeyedByCompositeTuple = new HashMap<>();
// TODO: Inside your loop
for (int i = 10; i > 0; i--) {
Tuple key = new Tuple("cell-one-value-" + i, "cell-two-value-" + i, "cell-three-value-" + i);
mapKeyedByCompositeTuple.put(key, "cell-four-value-" + i);
}
System.out.println(mapKeyedByCompositeTuple);
Example: one query returns the following result set:
Name | Age | Grand Total
John Smith | 45 | 1000
John Smith | 56 | 800
John Smithers | 34 | 500
John Smyth | 56 | 500
John Smyth | 56 | 1100
I want to separate this ArrayList into three lists and store them in a HashMap where the key is the client's name.
I was thinking something like
ArrayList<Row> rows = dao.getClientActivity();
Map map = new HashMap<Clients Name, Clients Row>();
ArrayList<Row> list = null;
for (Row row : rows) {
    if (map.get(row.clientName) == null) list = new ArrayList<Row>();
    list.add(row);
    if (map.get(row.clientName) == null) map.put(row.clientName, list);
}
The list will always be sorted by name.
Take the snippet above as pseudocode; I don't have a development environment at home and just wrote it off the top of my head. I think I tested something like it this Friday, but it only printed one row.
I don't know if there's a better way to do this, but this is the first thing I came up with.
Your map declaration should be as follows (assuming Row.clientName is String):
Map<String, List<Row>> map = new HashMap<String, List<Row>>();
And the for loop should look like as follows:
for (Row row : rows) {
    /* Get the list of rows for the current client name. */
    List<Row> currRows = map.get(row.clientName);
    if (currRows == null) {                /* if no list exists for this client yet, */
        currRows = new ArrayList<Row>();   /* create a new one */
        map.put(row.clientName, currRows); /* and put it in the map */
    }
    currRows.add(row);                     /* add the current row to the list */
}
I'm assuming that there is no way that you can change the input format.
I would suggest that you create a model to represent a client:
public class Client {
    private final String name;
    private final byte age; // byte is plenty here: its maximum value is 127
    private final int total;
    /* Construct model */
    /* Getters/Functions */
}
I would also suggest that you create a factory method inside Client to create the class from your string input.
public static Client parseClient(String clientRep) {
    String[] clientData = clientRep.split(",");
    // assumes a Client(String, byte, int) constructor, since the fields are final
    return new Client(clientData[0].trim(),
            Byte.parseByte(clientData[1].trim()),
            Integer.parseInt(clientData[2].trim()));
}
Now, you can add these to a map (Map<String, Client>).
String clientFromWherever = getWhateverDataFromWherever();
Map<String, Client> clientel = new HashMap<>();
Client addingToMap = Client.parseClient(clientFromWherever);
clientel.put(addingToMap.getName() /* or however the name should be got */, addingToMap);
That should do well enough.
=====
However - if you do not want to use the Client object, I would suggest creating a Map<String, int[]> and storing the age and charge in the array. If your charges do not exceed Short.MAX_VALUE, use a short[]. Storing a large number of ArrayLists (or any complex collections) just to hold that little data is unnecessary.
ArrayList<Row> rows = dao.getClientActivity();
Map<String, int[]> clientelData = new HashMap<>();
for (Row clientRow : rows) {
    if (!clientelData.containsKey(clientRow.clientName)) {
        int[] clientNumericalData = new int[2];
        // fill clientNumericalData[0] and [1] with the row's age and grand total here
        clientelData.put(clientRow.clientName, clientNumericalData);
    }
}
I want to store all values of a certain variable in a dataset and the frequency of each of these values. To do so, I use an ArrayList<String> to store the values and an ArrayList<Integer> to store the frequencies (since I can't use int). The number of different values is unknown, which is why I use an ArrayList and not an array.
Example (simplified) dataset:
a,b,c,d,b,d,a,c,b
The ArrayList<String> with values looks like: {a,b,c,d} and the ArrayList<Integer> with frequencies looks like: {2,3,2,2}.
To fill these ArrayLists I iterate over each record in the dataset, using the following code.
public void addObservation(String obs){
if(values.size() == 0){// first value
values.add(obs);
frequencies.add(new Integer(1));
return;//added
}else{
for(int i = 0; i<values.size();i++){
if(values.get(i).equals(obs)){
frequencies.set(i, new Integer((int)frequencies.get(i)+1));
return;//added
}
}
// only gets here if value of obs is not found
values.add(obs);
frequencies.add(new Integer(1));
}
}
However, since the datasets I will use this for can be very big, I want to optimize my code, and using frequencies.set(i, new Integer((int)frequencies.get(i)+1)); does not seem very efficient.
That brings me to my question; how can I optimize the updating of the Integer values in the ArrayList?
Use a HashMap<String,Integer>
Create the HashMap like so
HashMap<String,Integer> hm = new HashMap<String,Integer>();
Then your addObservation method will look like:
public void addObservation(String obs) {
    if (hm.containsKey(obs))
        hm.put(obs, hm.get(obs) + 1);
    else
        hm.put(obs, 1);
}
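On Java 8 and later, the same update can be written in a single call; this is a small variation on the answer above, not something from the original post:

public void addObservation(String obs) {
    hm.merge(obs, 1, Integer::sum); // inserts 1 if absent, otherwise adds 1 to the existing count
}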
I would use a HashMap or a Hashtable as tskzzy suggested. Depending on your needs, I would also create an object that holds the name and count, as well as any other metadata you might need.
So the code would be something like:
Hashtable<String, FrequencyStatistics> statHash = new Hashtable<String, FrequencyStatistics>();
for (String value : values) {
    if (statHash.get(value) == null) {
        FrequencyStatistics newStat = new FrequencyStatistics(value);
        statHash.put(value, newStat);
    } else {
        statHash.get(value).incrementCount();
    }
}
Now, your FrequencyStatistics object's constructor would automatically set its initial count to 1, while the incrementCount() method would increment the count and perform any other statistical calculations that you might require. This should also be more extensible in the future than storing a hash of the String with only its corresponding Integer.
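A minimal sketch of what such a FrequencyStatistics class could look like (the class is not shown in the answer, so its exact shape here is an assumption):

public class FrequencyStatistics {
    private final String value;
    private int count;

    public FrequencyStatistics(String value) {
        this.value = value;
        this.count = 1; // a freshly created entry represents the first observation
    }

    public void incrementCount() {
        count++; // extend here with any additional statistics you need
    }

    public String getValue() { return value; }
    public int getCount() { return count; }
}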