Hadoop MR hold array reference in reduce method - java

I would like to have an arrayList that holds reference to object inside the reduce function.
@Override
public void reduce( final Text pKey,
final Iterable<BSONWritable> pValues,
final Context pContext )
throws IOException, InterruptedException{
final ArrayList<BSONWritable> bsonObjects = new ArrayList<BSONWritable>();
for ( final BSONWritable value : pValues ){
bsonObjects.add(value);
//do some calculations.
}
for ( final BSONWritable value : bsonObjects ){
//do something else.
}
}
The problem is that the bsonObjects.size() returns the correct number of elements but all the elements of the list are equal to the last inserted element.
e.g. if the
{id:1}
{id:2}
{id:3}
elements are to be inserted the bsonObjects will hold 3 items but all of them will be {id:3}.
Is there a problem with this approach? Any idea why this happens?
I have tried changing the List to a Map, but then only one element was added to the map.
I have also tried making bsonObjects a field instead of a local variable, but the same behavior happens.

This is documented behavior. The pValues iterator reuses a single BSONWritable instance, so when its value changes on each iteration, every reference stored in the bsonObjects ArrayList reflects the change as well. Calling add() on bsonObjects stores a reference, not a copy. This reuse allows Hadoop to save memory.
You should create a new BSONWritable in that first loop that is a deep copy of value, then add the copy to bsonObjects. A plain assignment such as BSONWritable v = value; only copies the reference, so it does not help.
Try this (a sketch that uses Hadoop's WritableUtils.clone, which copies the value through serialization):
// requires: import org.apache.hadoop.io.WritableUtils;
for ( final BSONWritable value : pValues ){
final BSONWritable copy = WritableUtils.clone(value, pContext.getConfiguration());
bsonObjects.add(copy);
//do some calculations.
}
for ( final BSONWritable value : bsonObjects ){
//do something else.
}
Then you will be able to iterate through bsonObjects in the second loop and retrieve each distinct value.
However, you should also be careful: if you make deep copies, all the values for the key in this reducer will need to fit in memory.
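The aliasing effect can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop code: Box and reusingValues are hypothetical stand-ins for BSONWritable and the reducer's value iterable, written only to show why storing the reused instance gives three identical entries while copying each value does not.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ReusedValueDemo {
    /** Hypothetical mutable holder standing in for a Writable such as BSONWritable. */
    static final class Box {
        int id;
        Box(int id) { this.id = id; }
        Box copy() { return new Box(id); }
    }

    /** Iterable that hands out the same Box instance every time and only changes its contents. */
    static Iterable<Box> reusingValues(int... ids) {
        return () -> new Iterator<Box>() {
            private final Box shared = new Box(0);
            private int next = 0;
            @Override public boolean hasNext() { return next < ids.length; }
            @Override public Box next() { shared.id = ids[next++]; return shared; }
        };
    }

    /** Collects the values, either storing the reused reference or a copy of each value. */
    static List<Integer> collectIds(boolean copyEach) {
        List<Box> kept = new ArrayList<>();
        for (Box value : reusingValues(1, 2, 3)) {
            kept.add(copyEach ? value.copy() : value);
        }
        List<Integer> ids = new ArrayList<>();
        for (Box b : kept) {
            ids.add(b.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(collectIds(false)); // [3, 3, 3]
        System.out.println(collectIds(true));  // [1, 2, 3]
    }
}
```

Without the copy, the list holds the same object three times, which matches the symptom in the question.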

Related

Why can Java final array variable be modified?

I know that writing a lambda in Java 8 to use a variable requires final type, but why can a variable of final array type be modified?
public static void main(String[] args) {
final String[] prefix = {"prefix_"};
String suffix = "_suffix";
List<Integer> list = Arrays.asList(1005, 1006, 1007, 1009);
List<String> flagList = list.stream().map(param -> {
prefix[0] = "NoPrefix_";
String flag = prefix[0] + param + suffix;
return flag;
}).collect(Collectors.toList());
System.out.println(flagList);
System.out.println(prefix[0]);
}
result:
[NoPrefix_1005_suffix, NoPrefix_1006_suffix, NoPrefix_1007_suffix, NoPrefix_1009_suffix]
NoPrefix_
A final array means that the array variable, which is actually a reference to an object, cannot be changed to refer to anything else, but the members of the array can still be modified.
See the link below for more information.
https://www.geeksforgeeks.org/final-arrays-in-java/
For example:
final String[] arr = new String[10];
list.stream().map(ele -> {
arr[0] = ele.getName(); // this will work as you are updating member of array
arr = new String[5]; // this will not work as you are changing whole array object instead of changing member of array object.
}).collect(Collectors.toList());
The same thing happens when you use any final collection there.
why can a variable of final array type be modified?
It can't, even though it looks like it can.
The declaration is below, which defines "prefix" as an array:
final String[] prefix
First, edit the code to include two println() calls right after the declaration, like this:
final String[] prefix = {"prefix_"};
System.out.println("prefix: " + prefix);
System.out.println(prefix[0]);
And at the end, add a second println() next to the one you already had, like this:
System.out.println("prefix: " + prefix);
System.out.println(prefix[0]);
If you run that code, you'll see that printing prefix shows the same identity hash code each time, so it is the same object. What changes is not what "prefix" references; that remains the same array as before. Instead, you're changing something inside the array, which is different from changing the array reference itself.
Here are the relevant lines from a local run showing that the object reference remains the same, but the value of "prefix[0]" changes:
prefix: [Ljava.lang.String;@5ca881b5
prefix_
prefix: [Ljava.lang.String;@5ca881b5
NoPrefix_
If we try to assign an entirely new array to "prefix", then the compiler will show an error – it knows "prefix" was defined as final.
prefix = new String[]{"new"};
cannot assign a value to final variable prefix
If you're looking for a way to prevent changes to the data, and if it's possible to use a List (instead of an array), you could use
Collections.unmodifiableList():
Returns an unmodifiable view of the specified list. Query operations on the returned list "read through" to the specified list, and attempts to modify the returned list, whether direct or via its iterator, result in an UnsupportedOperationException.
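As a small sketch of that behavior (the class and method names here are made up for illustration), the following shows that writes through the unmodifiable view are rejected, while changes to the backing list still show through the view:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadOnlyPrefixes {
    /** Returns true if writing through the given list is rejected. */
    static boolean rejectsWrites(List<String> view) {
        try {
            view.set(0, "NoPrefix_");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<>();
        backing.add("prefix_");
        List<String> readOnly = Collections.unmodifiableList(backing);

        System.out.println(rejectsWrites(readOnly)); // true

        // The view "reads through": a change to the backing list is visible in the view.
        backing.set(0, "changed_");
        System.out.println(readOnly.get(0)); // changed_
    }
}
```

Because the wrapper is only a view, keep the backing list private if nothing should be able to change the data.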

Why when you set an ArrayList, there is no need to return a value?

I am new to java and I was writing some code to practice, but there is something that I am confused about. I have the following code:
public class test {
public static void main(String[]args) {
int n = 0;
ArrayList<String> fruits = new ArrayList();
setList(fruits);
n =setInt(9);
// get the values from fruits
for (String value: fruits) {
print(value);
}
}
public static void setList( ArrayList list) {
list.add("pear");
list.add("apple");
list.add("pear");
}
public static int setInt(int number) {
number = 3;
return number;
}
}
My question is why there is no need to return any value in order to set my ArrayList, but I need to return something in order to set my int. If I run this code it prints all the values in my list, but I expected it not to print anything because my setList method does not return any value. If I did not return a value from setInt, the value of n would not change, and that makes sense to me.
Thank you.
There are different ways that parameters can be passed to functions. The usual one that most beginners start with is pass by value, where the function receives a copy. The other is pass by reference, where the variable itself is passed in rather than a copy, so any change to the parameter remains visible to the caller after the call returns. My first reading was that Java passes objects by reference and only primitives by value, and that this is why you don't have to return anything when using an ArrayList.
Edit: Actually, I've made an error. What really happens is that a copy of the reference is passed by value. The method and the caller therefore share the same ArrayList object, so add() is visible to the caller, but assigning a new list to the parameter inside the method would not affect the caller's variable.
Everything in Java is pass by value.
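A short sketch of the difference, using hypothetical method names (addFruit, replaceList, setInt): mutating the object a parameter refers to is visible to the caller, while reassigning the parameter itself is not.

```java
import java.util.ArrayList;
import java.util.List;

class PassByValueDemo {
    /** Mutates the object that the copied reference points to; the caller sees this. */
    static void addFruit(List<String> list) {
        list.add("pear");
    }

    /** Reassigns only the local copy of the reference; the caller's list is untouched. */
    static void replaceList(List<String> list) {
        list = new ArrayList<>();
        list.add("apple");
    }

    /** Reassigns only the local copy of the int; the caller's variable is untouched. */
    static void setInt(int number) {
        number = 3;
    }

    public static void main(String[] args) {
        List<String> fruits = new ArrayList<>();
        int n = 0;

        addFruit(fruits);
        replaceList(fruits);
        setInt(n);

        System.out.println(fruits); // [pear]
        System.out.println(n);      // 0
    }
}
```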

java change variable argument value

I am not sure whether I am making a silly mistake. I have a JSON list and I want to convert its entries into multiple objects, depending on the variable arguments passed to a function.
Unit u1= new Unit();
User us = new User();
//calling function
StaticUtil.MagicJsonMapper(list, u1,us);
System.out.println(u1.getUnitName()); //place -1 unitName is null after function call
Inside a static utility class I have created this function:
@SuppressWarnings("rawtypes")
public static void MagicJsonMapper(List list,Object... objects){
if(list.size()!= objects.length){
//TODO
System.out.println("parameter mismatch");
return;
}
int i=0;
ObjectMapper mapper = new ObjectMapper();
for(Object object : objects){
if(list.get(i) instanceof List){
MagicJsonMapper((List)list.get(i),object);
}
else{
objects[i] = mapper.convertValue(list.get(i), object.getClass());
}
i++;
}
//place -2 "objects" contains proper value of unitname
}
The issue is that I still don't get the proper values in the arguments after the method has finished running. The argument values are not retained, in contrast to normal Java behaviour. Does this have something to do with variable arguments?
Just for clarity I debugged the code and values are proper at the end of the function.
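The objects array is created during the function call and discarded afterwards. Assigning to objects[i] only replaces a slot in that temporary array; it does not change what the caller's variables u1 and us refer to. If you need to access the values from the array after the call, you need to create the array explicitly and pass it in.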
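A minimal sketch of that suggestion, with a hypothetical fill method standing in for MagicJsonMapper: when the caller builds the array itself and passes it as the varargs argument, the replaced slots can be read back after the call.

```java
class VarargsDemo {
    /** Replaces every slot of the varargs array with a new object. */
    static void fill(Object... slots) {
        for (int i = 0; i < slots.length; i++) {
            slots[i] = "converted-" + i;
        }
    }

    public static void main(String[] args) {
        Object a = "a";
        Object b = "b";

        // The compiler wraps a and b in a temporary array that is discarded after the call.
        fill(a, b);
        System.out.println(a); // a

        // Passing an explicit Object[] hands the method the caller's own array.
        Object[] holder = {a, b};
        fill(holder);
        System.out.println(holder[0]); // converted-0
        System.out.println(holder[1]); // converted-1
    }
}
```

An alternative design is to have the method return the converted objects (for example as a List) instead of writing them back into its arguments.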

use ints for prop.setProperty()

I'm trying to write a configuration file from an array of objects, where the properties are taken from a range of getters.
eg
prop.setProperty("Name", bugs[0].getName());
prop.setProperty("Species", bugs[0].getSpecies());
When i try, for example
prop.setProperty("Energy", bugs[0].getEnergy());
it says
The method setProperty(String, String) in the type Properties is not
applicable for the arguments (String, int)
How would I call setProperty with a (String, int) pair?
EDIT*
Also, how do I write this for an array of the objects? Looping over bugs[i] doesn't seem to work either.
for (int i = 0; i < bugs.length; i++) {
prop.setProperty("Name", bugs[i].getName());
prop.setProperty("Species", bugs[i].getSpecies());
prop.setProperty("X", String.valueOf(bugs[i].getX()));
prop.setProperty("Y", String.valueOf(bugs[i].getY()));
prop.setProperty("Energy", String.valueOf(bugs[i].getEnergy()));
prop.setProperty("Symbol", String.valueOf(bugs[i].getId()));
}
// save properties to project root folder
prop.store(output, null);
How would I make it show the values for all the bugs? It's only showing the last one.
bugs[0].getEnergy() returns an int, and you are passing that int where setProperty expects a String. That is why you get the compile error.
Try
prop.setProperty("Energy", String.valueOf(bugs[0].getEnergy()));
For the second part of the question:
How do i write for an array of the objects, looping bugs[i] doesn't
seem to work either.
Your loop shows only the last value because the property key is the same on every iteration. Every value is assigned to the same key, so each one overwrites the previous one and only the last survives.
Try something like this, which creates a new key for each bug:
for (int i = 0; i < bugs.length; i++) {
prop.setProperty("Name"+i, bugs[i].getName());
prop.setProperty("Species"+i, bugs[i].getSpecies());
}
Mainly you can do this in two ways:
First way:
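Overloading the setProperty() method. This applies only if setProperty() belongs to a class you wrote yourself; you cannot add an overload to java.util.Properties directly. Assuming you declared the method somewhere like this: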
public void setProperty(String s1, String s2) {
// Doing some operations
}
Now, you can overload another method with the same name, but different signature to handle another kind of arguments (String - int), like this:
public void setProperty(String s1, int value) {
// Doing some operations suitable for this kind of method
}
Second way:
You can also convert the given integer to a String with the String.valueOf() method, like this:
prop.setProperty("Energy", String.valueOf(bugs[0].getEnergy()));
EDITED:
In the loop you posted, you're overwriting the values of the prop object's properties on every iteration.
The first time through, the name in prop will be bugs[0].getName(). On the next iteration that value is replaced by bugs[1].getName().
So if you want to store the properties of every element of the bugs array, you need an array of prop-like objects (I don't know what the type of prop is, but I assume it's Prop). So you need to write something like this:
Prop[] props = new Prop[bugs.length];
And then set the properties of its elements.
Alternatively, if you want to store all of the properties in one object, you have to make the key passed to setProperty() different for each bug. So you can do something like this:
for (int i = 0; i < bugs.length; i++) {
prop.setProperty("Name " + i, bugs[i].getName());
prop.setProperty("Species " + i, bugs[i].getSpecies());
// An so
}
prop.store(output, null);
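Putting both points together, here is a self-contained sketch. The Bug class below is a hypothetical stand-in for the asker's class (only a name and an energy value), and the output goes to a StringWriter rather than a file so the example can run anywhere.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

class BugProperties {
    /** Hypothetical stand-in for the asker's bug class. */
    static final class Bug {
        final String name;
        final int energy;
        Bug(String name, int energy) { this.name = name; this.energy = energy; }
    }

    /** Stores every bug under index-suffixed keys so no entry overwrites another. */
    static Properties toProperties(Bug[] bugs) {
        Properties prop = new Properties();
        for (int i = 0; i < bugs.length; i++) {
            prop.setProperty("Name" + i, bugs[i].name);
            // Properties only stores Strings, so convert the int explicitly.
            prop.setProperty("Energy" + i, String.valueOf(bugs[i].energy));
        }
        return prop;
    }

    public static void main(String[] args) throws IOException {
        Bug[] bugs = { new Bug("ant", 5), new Bug("bee", 7) };
        Properties prop = toProperties(bugs);

        StringWriter output = new StringWriter();
        prop.store(output, null);
        System.out.println(output);
    }
}
```

When reading the file back, Integer.parseInt(prop.getProperty("Energy0")) converts the stored text to an int again.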

How to retrieve objects values stored in a Java ArrayList

ArrayList<yellowPage> ob1 = new ArrayList<yellowPage>();
yellowPage thing = new yellowPage(100,100);
thing.calc(i,y,s3);
ob1.add(thing);
I stored some data in thing. How can I retrieve the value stored in ob1.thing?
If you know the index, you can do:
yellowPage yp = ob1.get(index);
Otherwise you need to iterate over the list.
Iterator<yellowPage> iter = ob1.iterator();
while(iter.hasNext())
{
yellowPage yp = iter.next();
yp.whateverYouwantGet();
}
Note: I just typed code here, there may be syntax errors.
int x=5;
int info=ob1.get(x).getInfo();
The above example gets whatever information you want from your yellow pages class (by using a getter method) at index 5 of your array list ob1, which is the 6th element because indexing starts at 0. This example assumes you want an integer from the yellow page. You will have to create a getter method and change x to the index of the yellow page you want to retrieve information from.
An example getter method (which you should put in your yellow pages class) could look like this:
public int getInfo() { return z; }
In the above case z may be an instance variable in your yellow pages class, containing the information you're looking for. You will most probably have to change this to suit your own situation.
If you wanted to get information from all yellow pages stored in the array list then you will need to iterate through it as Chrandra Sekhar suggested
Use an Iterator object to do this.
ArrayList<yellowPage> ob1 = new ArrayList<yellowPage>();
yellowPage thing = new yellowPage(100,100);
thing.calc(i,y,s3);
ob1.add(thing);
yellowPage retrievedThing = null;
Iterator<yellowPage> i = ob1.iterator();
if(i.hasNext()){
retrievedThing = i.next();
}
You could have the data stored in thing (horribly named variable) simply returned from the calc method. That way you don't need to maintain state for prior calculations in subsequent calls. Otherwise you just need a getter type method on the YellowPage class.
public class YellowPage {
private int result;
public void calc(...) {
result = ...
}
public int getResult() {
return result;
}
}
Print the list and override toString method.
public String toString()
{
return (""+ a+b); //Here a and b are int fields declared in class
}
System.out.print(ob1);
Class ArrayList<E>
Syntax
ArrayList<Integer> list = new ArrayList<Integer>();
You replace "Integer" with the class that the list is of.
An application can increase the capacity of an ArrayList instance before adding a large number of elements using the ensureCapacity operation. This may reduce the amount of incremental reallocation.
E represents an Element, which could be any class.
ensureCapacity makes sure the list has enough capacity to take in new elements. ArrayList performs an equivalent capacity check internally every time you add an item. As the name suggests, ArrayList uses an array to store the items. When that array is initialized it's given some default length, say 10. Once you've added 10 items, an 11th item would not fit in the backing array, so the list first allocates a larger array and copies the existing items into it. So if you were adding the 11th element, the array size might be, say, doubled to 20.
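A small sketch of calling ensureCapacity yourself before a bulk insert (the fill method is a made-up helper). The call only affects how often the backing array is reallocated; the contents and size of the list are the same either way.

```java
import java.util.ArrayList;

class EnsureCapacityDemo {
    /** Reserves room for count elements up front, then adds 0..count-1. */
    static ArrayList<Integer> fill(int count) {
        ArrayList<Integer> list = new ArrayList<>();
        // Optional hint: grow the backing array once instead of several times.
        list.ensureCapacity(count);
        for (int i = 0; i < count; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = fill(1000);
        System.out.println(list.size());  // 1000
        System.out.println(list.get(10)); // 10
    }
}
```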
