Spark - Find rows with same ID but values with opposite sign


I have a spark dataset with 2 columns - id & value.
It may have some values with same id but values with opposite sign (same absolute value). For example,
id  value
a     5
b    10
a    -5
b   -10
a     5
b    10
a    -5
b     5
a     5
b     1
My use-case is to flag all such pairs of rows where ID is same but one value is positive and the other is negative (but absolute value is same). For example:
id  value  flag
a     5    true
b    10    true
a    -5    true
b   -10    true
a     5    true
b    10    false
a    -5    true
b     5    false
a     5    false
b     1    false
Please note that one positive value must be paired with at most one other negative value and vice versa.
I came across a solution in SQL (might need some modifications but the idea is similar): Need to display records which has positive and negative value
But since I’m new to spark, I’m not able to convert it into an equivalent spark code. Any help would be highly appreciated.
Thanks!

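This would work (Scala API) as long as each value has at most one opposite-signed match per id; with several matches, the left join repeats the row once per match and flags every copy:
import org.apache.spark.sql.functions.{col, when}

df.alias("t1")
  .join(
    df.alias("t2"),
    (col("t1.id") === col("t2.id")) && (col("t1.value") === col("t2.value") * -1),
    "left")
  .withColumn("Flag", when(col("t2.id").isNull, "false").otherwise("true"))
  .select("t1.*", "Flag")
  .show()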
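The join above matches every opposite-signed row, so on its own it does not enforce the "at most one partner" rule from the question. As a rough illustration of that rule only, here is a plain-Java sketch without Spark. The class and method names (OppositeSignPairs, flagPairs) are hypothetical, and it assumes rows are paired in input order within each id and absolute value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Spark API): flags rows that can be paired
// one-to-one with a row of the same id and the opposite value.
class OppositeSignPairs {

    static boolean[] flagPairs(String[] ids, int[] values) {
        // Count positives [0] and negatives [1] per (id, |value|) key.
        Map<String, int[]> counts = new HashMap<>();
        for (int i = 0; i < ids.length; i++) {
            if (values[i] == 0) continue; // zero has no opposite sign
            counts.computeIfAbsent(key(ids[i], values[i]), k -> new int[2])[sign(values[i])]++;
        }

        // Each key can form min(positives, negatives) pairs; hand the
        // available slots out in input order, separately per sign.
        Map<String, int[]> slots = new HashMap<>();
        boolean[] flags = new boolean[ids.length];
        for (int i = 0; i < ids.length; i++) {
            if (values[i] == 0) continue;
            String k = key(ids[i], values[i]);
            int[] c = counts.get(k);
            int pairs = Math.min(c[0], c[1]);
            int[] left = slots.computeIfAbsent(k, unused -> new int[] {pairs, pairs});
            if (left[sign(values[i])] > 0) {
                left[sign(values[i])]--;
                flags[i] = true;
            }
        }
        return flags;
    }

    private static int sign(int value) {
        return value > 0 ? 0 : 1;
    }

    private static String key(String id, int value) {
        return id + "|" + Math.abs((long) value);
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "a", "b", "a", "b", "a", "b", "a", "b"};
        int[] values = {5, 10, -5, -10, 5, 10, -5, 5, 5, 1};
        System.out.println(java.util.Arrays.toString(flagPairs(ids, values)));
    }
}
```

On the example data from the question this reproduces the expected flag column. In Spark the same counting idea would presumably be expressed with window functions over id, absolute value, and sign, but that translation is not shown here.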

Related

How do I represent potency in Java with all zeros? [duplicate]

What function does the ^ (caret) operator serve in Java?
When I try this:
int a = 5^n;
...it gives me:
for n = 5, returns 0
for n = 4, returns 1
for n = 6, returns 3
...so I guess it doesn't perform exponentiation. But what is it then?

The ^ operator in Java
^ in Java is the exclusive-or ("xor") operator.
Let's take 5^6 as example:
(decimal)    (binary)
     5     =  101
     6     =  110
------------------ xor
     3     =  011
This is the truth table for bitwise (JLS 15.22.1) and logical (JLS 15.22.2) xor:
^ | 0 1      ^ | F T
--+-----     --+-----
0 | 0 1      F | F T
1 | 1 0      T | T F
More simply, you can also think of xor as "this or that, but not both!".
See also
Wikipedia: exclusive-or
Exponentiation in Java
As for integer exponentiation, unfortunately Java does not have such an operator. You can use double Math.pow(double, double) (casting the result to int if necessary).
You can also use the traditional bit-shifting trick to compute some powers of two. That is, (1L << k) is two to the k-th power for k=0..63.
See also
Wikipedia: Arithmetic shift
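A short sketch of the options just described, side by side with the ^ operator. The intPow method is a hypothetical helper written for this illustration (it is not a JDK method) and it ignores overflow.

```java
class PowerDemo {

    // Hypothetical helper: integer power by repeated multiplication.
    // Overflow is not checked.
    static long intPow(long base, int exponent) {
        if (exponent < 0) {
            throw new IllegalArgumentException("exponent must not be negative");
        }
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(5 ^ 4);                // xor, not a power: prints 1
        System.out.println((int) Math.pow(5, 4)); // prints 625
        System.out.println(1L << 10);             // two to the 10th: prints 1024
        System.out.println(intPow(5, 4));         // prints 625
    }
}
```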
Merge note: this answer was merged from another question where the intention was to use exponentiation to convert a string "8675309" to int without using Integer.parseInt as a programming exercise (^ denotes exponentiation from now on). The OP's intention was to compute 8*10^6 + 6*10^5 + 7*10^4 + 5*10^3 + 3*10^2 + 0*10^1 + 9*10^0 = 8675309; the next part of this answer addresses that exponentiation is not necessary for this task.
Horner's scheme
Addressing your specific need, you actually don't need to compute various powers of 10. You can use what is called the Horner's scheme, which is not only simple but also efficient.
Since you're doing this as a personal exercise, I won't give the Java code, but here's the main idea:
8675309 = 8*10^6 + 6*10^5 + 7*10^4 + 5*10^3 + 3*10^2 + 0*10^1 + 9*10^0
= (((((8*10 + 6)*10 + 7)*10 + 5)*10 + 3)*10 + 0)*10 + 9
It may look complicated at first, but it really isn't. You basically read the digits left to right, and you multiply your result so far by 10 before adding the next digit.
In table form:
step   result   digit   result*10+digit
  1    init=0     8     8
  2    8          6     86
  3    86         7     867
  4    867        5     8675
  5    8675       3     86753
  6    86753      0     867530
  7    867530     9     8675309=final

As many people have already pointed out, it's the XOR operator. Many people have also already pointed out that if you want exponentiation then you need to use Math.pow.
But I think it's also useful to note that ^ is just one of a family of operators that are collectively known as bitwise operators:
Operator   Name          Example     Result   Description
a & b      and           3 & 5       1        1 if both bits are 1.
a | b      or            3 | 5       7        1 if either bit is 1.
a ^ b      xor           3 ^ 5       6        1 if both bits are different.
~a         not           ~3          -4       Inverts the bits.
n << p     left shift    3 << 2      12       Shifts the bits of n left p positions. Zero bits are shifted into the low-order positions.
n >> p     right shift   5 >> 2      1        Shifts the bits of n right p positions. If n is a 2's complement signed number, the sign bit is shifted into the high-order positions.
n >>> p    right shift   -4 >>> 28   15       Shifts the bits of n right p positions. Zeros are shifted into the high-order positions.
From here.
These operators can come in handy when you need to read and write to integers where the individual bits should be interpreted as flags, or when a specific range of bits in an integer have a special meaning and you want to extract only those. You can do a lot of every day programming without ever needing to use these operators, but if you ever have to work with data at the bit level, a good knowledge of these operators is invaluable.
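To make the flag and bit-range use cases a little more concrete, here is a minimal sketch. The flag names and the packed layout are invented for this example and do not come from any particular API.

```java
class BitFlagsDemo {

    // Hypothetical permission flags, one bit each.
    static final int READ = 1;        // 001
    static final int WRITE = 1 << 1;  // 010
    static final int EXEC = 1 << 2;   // 100

    static boolean has(int flags, int flag) { return (flags & flag) != 0; }
    static int set(int flags, int flag)     { return flags | flag; }
    static int clear(int flags, int flag)   { return flags & ~flag; }
    static int toggle(int flags, int flag)  { return flags ^ flag; }

    // Extracts bits 8..15 of a packed int (an assumed layout).
    static int secondByte(int packed) {
        return (packed >>> 8) & 0xFF;
    }

    public static void main(String[] args) {
        int perms = set(READ, EXEC);
        System.out.println(has(perms, WRITE));                             // false
        System.out.println(Integer.toBinaryString(toggle(perms, WRITE))); // 111
        System.out.println(Integer.toHexString(secondByte(0x12345678)));  // 56
    }
}
```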

It's bitwise XOR, Java does not have an exponentiation operator, you would have to use Math.pow() instead.

XOR operator rule =>
0 ^ 0 = 0
1 ^ 1 = 0
0 ^ 1 = 1
1 ^ 0 = 1
Binary representation of 4, 5 and 6 :
4 = 1 0 0
5 = 1 0 1
6 = 1 1 0
now, perform XOR operation on 5 and 4:
5 ^ 4 =>  1 0 1   (5)
          1 0 0   (4)
          -----
          0 0 1   => 1
Similarly,
5 ^ 5 =>  1 0 1   (5)
          1 0 1   (5)
          -----
          0 0 0   => 0
5 ^ 6 =>  1 0 1   (5)
          1 1 0   (6)
          -----
          0 1 1   => 3

It is the XOR bitwise operator.

Many people have already explained what it is and how it can be used, but apart from the obvious you can use this operator for a lot of programming tricks, such as:
XORing all the elements of a boolean array tells you whether the array has an odd number of true elements.
If every number in an array repeats an even number of times except one, which repeats an odd number of times, you can find that one by XORing all the elements.
Swapping two values without using a temporary variable.
Finding the missing number in the range 1 to n.
Basic validation of data sent over the network.
Many more such tricks can be done with bitwise operators; it is an interesting topic to explore.
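A few of the tricks listed above, sketched in Java. The class and method names are made up for this illustration, and missingNumber assumes the array holds every value from 1 to n exactly once except for the missing one.

```java
class XorTricks {

    // True if the array contains an odd number of true elements.
    static boolean oddNumberOfTrue(boolean[] bits) {
        boolean acc = false;
        for (boolean b : bits) {
            acc ^= b;
        }
        return acc;
    }

    // Values that occur an even number of times cancel out,
    // leaving the one value that occurs an odd number of times.
    static int oddOneOut(int[] xs) {
        int acc = 0;
        for (int x : xs) {
            acc ^= x;
        }
        return acc;
    }

    // XOR of 1..n combined with XOR of the array leaves the missing value.
    static int missingNumber(int[] xs, int n) {
        int acc = 0;
        for (int i = 1; i <= n; i++) {
            acc ^= i;
        }
        for (int x : xs) {
            acc ^= x;
        }
        return acc;
    }

    public static void main(String[] args) {
        int a = 3, b = 8;
        a ^= b; b ^= a; a ^= b;   // swap without a temporary variable
        System.out.println(a + " " + b);                                   // 8 3
        System.out.println(oddOneOut(new int[] {4, 7, 4, 9, 9}));          // 7
        System.out.println(missingNumber(new int[] {1, 2, 4, 5}, 5));      // 3
    }
}
```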

XOR operator rule
0 ^ 0 = 0
1 ^ 1 = 0
0 ^ 1 = 1
1 ^ 0 = 1
Bitwise operators work on bits and perform bit-by-bit operations. Assume a = 60 and b = 13; in binary format they are as follows:
a = 0011 1100
b = 0000 1101
a^b ==> 0011 1100 (a)
        0000 1101 (b)
        --------- XOR
        0011 0001 => 49
(a ^ b) gives 49, which is 0011 0001.

As others have said, it's bitwise XOR. If you want to raise a number to a given power, use Math.pow(a , b), where a is a number and b is the power.

AraK's link points to the definition of exclusive-or, which explains how this function works for two boolean values.
The missing piece of information is how this applies to two integers (or integer-type values). Bitwise exclusive-or is applied to pairs of corresponding binary digits in two numbers, and the results are re-assembled into an integer result.
To use your example:
The binary representation of 5 is 0101.
The binary representation of 4 is 0100.
A simple way to define bitwise XOR is to say the result has a 1 in every place where the two input numbers differ.
With 4 and 5, the only difference is in the last place; so
0101 ^ 0100 = 0001 (5 ^ 4 = 1) .

It is the bitwise xor operator in Java, which yields 1 where the bits differ (i.e. 1 ^ 0 = 1) and 0 where they are the same (i.e. 0 ^ 0 = 0) when the numbers are written in binary form.
For example, to use your numbers:
The binary representation of 5 is 0101.
The binary representation of 4 is 0100.
A simple way to define bitwise XOR is to say the result has a 1 in every place where the two input numbers differ.
0101 ^ 0100 = 0001 (5 ^ 4 = 1).

To perform exponentiation, you can use Math.pow instead:
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Math.html#pow%28double,%20double%29

As already stated by the other answer(s), it's the "exclusive or" (XOR) operator. For more information on bit-operators in Java, see: http://java.sun.com/docs/books/tutorial/java/nutsandbolts/op3.html

That is because you are using the xor operator.
In java, or just about any other language, ^ is bitwise xor,
so of course,
10 ^ 1 = 11.
more info about bitwise operators
It's interesting how Java and C# don't have a power operator.

It is the bitwise xor operator in Java, which yields 1 for differing bits (i.e. 1 ^ 0 = 1) and 0 for equal bits (i.e. 0 ^ 0 = 0).

^ is binary (as in base-2) xor, not exponentiation (which is not available as a Java operator). For exponentiation, see java.lang.Math.pow().

It is the XOR operator. It is used to do bit operations on numbers. When you XOR two identical bits, say 0 XOR 0 or 1 XOR 1, the result is 0; if the bits differ, the result is 1.
So when you did 5^6, you can look at the numbers 5 and 6 in their binary forms; the expression becomes (101) XOR (110), which gives (011), whose decimal representation is 3.

As an addition to the other answers, it's worth mentioning that the caret operator can also be used with boolean operands, and it returns true (if and only if) the operands are different:
System.out.println(true ^ true); // false
System.out.println(true ^ false); // true
System.out.println(false ^ false); // false
System.out.println(false ^ true); // true

^ = (bitwise XOR)
Description
Binary XOR Operator copies the bit if it is set in one operand but not both.
example
(A ^ B) will give 49 which is 0011 0001

In other languages like Python you can do 10**2=100, try it.

How nextClearBit() of BitSet in java actually works?

This method in BitSet class is used to return the index of the first bit that is set to false
import java.util.BitSet;

public class BitSetDemo {
    public static void main(String[] args) {
        BitSet b = new BitSet();
        b.set(5);
        b.set(9);
        b.set(6);
        System.out.println("" + b);
        System.out.println(b.nextClearBit(5));
        System.out.println(b.nextClearBit(9));
    }
}
Output :
{5, 6, 9}
7
10
In this code, 6 is set after 9, but the output suggests the values are stored consecutively (b.nextClearBit(5) returns the next value, which is 7). So how does BitSet store these values?

The javadoc of nextClearBit says:
Returns the index of the first bit that is set to false that occurs on or after the specified starting index.
You have set 5, 6 and 9 to true. That means that starting from 5, the first index set to false is 7. And starting from 9, the first index set false is 10. Which according to your own output is also what is returned.
If you want to know how BitSet works and what it does, read its Javadoc and look at the source. It is included with the JDK.

BitSet uses bits to store the information, like this:
          ╔═══╦═══╦═══╦═══╦═══╦═══╦═══╦═══╦═══╦═══╦═══╗
Bits: ... ║ 0 ║ 1 ║ 0 ║ 0 ║ 1 ║ 1 ║ 0 ║ 0 ║ 0 ║ 0 ║ 0 ║
          ╚═══╩═══╩═══╩═══╩═══╩═══╩═══╩═══╩═══╩═══╩═══╝
Position:  10   9   8   7   6   5   4   3   2   1   0
Whenever you use set(n) - it sets the bit in the corresponding position. The underlying implementation is with a series of longs - but for understanding the API, it's enough to imagine it as a long array of bits - zeros and ones - like in the drawing. It extends itself if it needs to.
When it needs to look for the next clear bit after 5, it goes to bit number 5 and keeps searching until it reaches a zero. The actual implementation is a lot faster, relying on bit-manipulation tricks, but to understand the API, that's how you can imagine it.

Your question indicates you might have thought that the result of b.nextClearBit(i) was somehow affected by the order in which the different bits were set to true or false.
This is not the case, because BitSet does not remember the order in which indices were given values.
next means "next in the order of the indices" and not "next in the order in which values were assigned".
b.nextClearBit(i) returns the smallest index j greater than or equal to i for which b.get(j) == false.
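The point about ordering can be checked with a small sketch. The helper names (of, naiveNextClearBit) are hypothetical; the naive scan only mirrors the definition above and is not how java.util.BitSet is implemented internally.

```java
import java.util.BitSet;

class NextClearBitDemo {

    // Builds a BitSet by setting the given indices in the given order.
    static BitSet of(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) {
            b.set(i);
        }
        return b;
    }

    // Definition-level equivalent: smallest j >= from with get(j) == false.
    static int naiveNextClearBit(BitSet b, int from) {
        int j = from;
        while (b.get(j)) {
            j++;
        }
        return j;
    }

    public static void main(String[] args) {
        BitSet first = of(5, 9, 6);
        BitSet second = of(6, 5, 9);
        System.out.println(first.equals(second));        // true: order is not stored
        System.out.println(first.nextClearBit(5));       // 7
        System.out.println(naiveNextClearBit(first, 9)); // 10
    }
}
```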

BitSet is a set. The order (of insertion) is irrelevant. The method just gives the index of the next higher clear bit.
The internal implementation has been explained in a previous question. For each method, you can check the source. The code may contain obscure "bit bashing", similar to the methods in java.lang.Integer and java.lang.Long, which may be implemented as intrinsics.

How to create a random boolean with chances?

I saw the libgdx MathUtils.randomBoolean(chances), I guess this would help me but I'm not sure.
MathUtils.randomBoolean(10); // I'm not sure if this will give 10% chance?

MathUtils.randomBoolean(float chance) returns true with the probability given by the parameter. The chance parameter takes a value between 0 and 1, so for example 0.1 gives a 10% probability of returning true.
Your example - 10 - would always result in true as it's bigger than 1.

Take a look at the LibGDX Javadocs regarding MathUtils:
randomBoolean
public static boolean randomBoolean(float chance)
Returns true if a random value between 0 and 1 is less than the specified value.
That means if you specify a number (passed as an argument), the method will return true if the randomly generated number (between 0 and 1) is less than the passed chance. In this case it would be:
MathUtils.randomBoolean(0.1f);
This is because 0.1 is 10%, or 10/100. Thus, a random number between 0 and 1, if less than 0.1, will cause the method to return true.
Your code previously would always return true because a number between 0 and 1 is always less than 10.
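The behaviour described in the quoted Javadoc can be imitated with java.util.Random. This is only a sketch built from that one-sentence description, not libGDX's actual source; the class name ChanceDemo and the explicit Random parameter are assumptions made for this example.

```java
import java.util.Random;

class ChanceDemo {

    // Returns true if a random value in [0, 1) is less than chance.
    static boolean randomBoolean(Random rng, float chance) {
        return rng.nextFloat() < chance;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int hits = 0;
        int trials = 10_000;
        for (int i = 0; i < trials; i++) {
            if (randomBoolean(rng, 0.1f)) {
                hits++;
            }
        }
        // Roughly 10% of the trials should be true.
        System.out.println(hits + " of " + trials);
        // Any chance of 1 or more is always true, which is why 10 never fails.
        System.out.println(randomBoolean(rng, 10f));
    }
}
```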

excel VBA and statement with numbers

I'm rewriting an old Excel VBA app in Java, and I found this line:
a = b and (c-1)
b and c are numbers, so I tried doing and with numbers on the watch tab and I'm confused about the results
"5 and 3" equals 1
"3 and 4" equals 5
"123 And 55" equals 51
There's also a line with or:
2147455232 or &H80000000 equals -28416
These numbers are used to read a binary file that I need to load in Java, but I can't make sense of those lines.
thanks in advance
Jose Suero

These are using And and Or as bitwise operators:
1 And 1 = 1
1 And 0, 0 And 1, and 0 And 0 are all 0
For something like 123 And 55, where 123 = 1111011 and 55 = 0110111, this is done bit by bit:
    1111011
And 0110111
    -------
    0110011 <= 51
Or is also being used bitwise: 0 Or 0 = 0, while 0 Or 1, 1 Or 0, and 1 Or 1 are all 1.
2147455232 or &H80000000 'a hex literal
in binary is:
   10000000000000000000000000000000
Or 01111111111111111001000100000000
   --------------------------------
 = 11111111111111111001000100000000
This last number is 4294938880, which wraps around to 4294938880 - 2^32 = -28416 when stored in a 32-bit signed VBA Long.
Java also comes equipped with bitwise operators (& for And and | for Or), although what happens with signed numbers can be a bit tricky, as your example with Or shows. Make sure that you use compatible data types (if the VBA code uses 32-bit signed integers, then the Java code should as well).
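As a quick check of the values discussed above, here is how the same expressions might look in Java, assuming the VBA variables are 32-bit Longs so that Java's int is the matching type. The method names vbaAnd and vbaOr are hypothetical wrappers added for readability.

```java
class VbaBitwiseDemo {

    // VBA's And / Or on 32-bit Long values correspond to & and | on Java ints.
    static int vbaAnd(int a, int b) {
        return a & b;
    }

    static int vbaOr(int a, int b) {
        return a | b;
    }

    public static void main(String[] args) {
        System.out.println(vbaAnd(5, 3));     // 1
        System.out.println(vbaAnd(123, 55));  // 51

        // 0x80000000 as an int literal is the sign bit, like &H80000000 in VBA.
        int combined = vbaOr(2147455232, 0x80000000);
        System.out.println(combined);                          // -28416
        System.out.println(Integer.toUnsignedLong(combined));  // 4294938880
    }
}
```

If the file format really treats the value as unsigned, Integer.toUnsignedLong (or doing the arithmetic in long) is one way to recover the non-negative number.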

What does this boolean "(number & 1) == 0" mean?

On CodeReview I posted a working piece of code and asked for tips to improve it. One I got was to use a boolean method to check if an ArrayList had an even number of indices (which was required). This was the code that was suggested:
private static boolean isEven(int number)
{
    return (number & 1) == 0;
}
As I've already pestered that particular user for a lot of help, I've decided it's time I pestered the SO community! I don't really understand how this works. The method is called and takes the size of the ArrayList as a parameter (i.e. ArrayList has ten elements, number = 10).
I know a single & runs the comparison of both number and 1, but I got lost after that.
The way I read it, it is saying return true if number == 0 and 1 == 0. I know the first isn't true and the latter obviously doesn't make sense. Could anybody help me out?
Edit: I should probably add that the code does work, in case anyone is wondering.

Keep in mind that "&" is a bitwise operation. You are probably aware of this, but it's not totally clear to me based on the way you posed the question.
That being said, the theoretical idea is that you have some int, which can be expressed in bits by some series of 1s and 0s. For example:
...10110110
In binary, because it is base 2, whenever the binary representation of a number ends in 0 it is even, and when it ends in 1 it is odd.
Therefore, doing a bitwise & with 1 for the above is:
...10110110 & ...00000001
Of course, this is 0, so you can say that the original input was even.
Alternatively, consider an odd number. For example, add 1 to what we had above. Then
...10110111 & ...00000001
is equal to 1 and is therefore not equal to zero. Voilà.

You can determine whether the number is even or odd from the last bit of its binary representation:
1 -> 00000000000000000000000000000001 (odd)
2 -> 00000000000000000000000000000010 (even)
3 -> 00000000000000000000000000000011 (odd)
4 -> 00000000000000000000000000000100 (even)
5 -> 00000000000000000000000000000101 (odd)
6 -> 00000000000000000000000000000110 (even)
7 -> 00000000000000000000000000000111 (odd)
8 -> 00000000000000000000000000001000 (even)
& between two integers is bitwise AND operator:
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1
So, if (number & 1) == 0 is true, this means number is even.
Let's assume that number == 6, then:
6 -> 00000000000000000000000000000110 (even)
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1 -> 00000000000000000000000000000001
-------------------------------------
0 -> 00000000000000000000000000000000
and when number == 7:
7 -> 00000000000000000000000000000111 (odd)
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1 -> 00000000000000000000000000000001
-------------------------------------
1 -> 00000000000000000000000000000001

& is the bitwise AND operator. && is the logical AND operator.
In binary, if the lowest (ones) bit is set (i.e. one), the number is odd.
In binary, if the lowest (ones) bit is zero, the number is even.
(number & 1) is a bitwise AND test of that bit.
Another way to do this (possibly less efficient but more understandable) is using the modulus operator %:
private static boolean isEven(int number)
{
    if (number < 0)
        throw new IllegalArgumentException("number must not be negative");
    return (number % 2) == 0;
}

This expression means "the integer represents an even number".
Here is the reason why: the binary representation of decimal 1 is 00000000001. All odd numbers end in a 1 in binary (this is easy to verify: suppose the number's binary representation does not end in 1; then it's composed of non-zero powers of two, so it is always an even number). When you do a binary AND of 1 with an odd number, the result is 1; when you do a binary AND of 1 with an even number, the result is 0.
This used to be the preferred method of deciding odd/even back at the time when optimizers were poor to nonexistent, and % operators required twenty times the number of cycles taken by an & operator. These days, if you do number % 2 == 0, the compiler is likely to generate code that executes as quickly as (number & 1) == 0 does.
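A small sketch comparing the two checks, including negative inputs. The class and method names are made up for this example; it only shows that both forms agree, not which one a given JIT compiles faster.

```java
class ParityDemo {

    // Tests the least significant bit; works for negatives in two's complement.
    static boolean isEvenBitwise(int number) {
        return (number & 1) == 0;
    }

    // Remainder form; comparing with 0 is safe for negatives as well,
    // whereas (number % 2 == 1) would miss negative odd numbers.
    static boolean isEvenModulo(int number) {
        return number % 2 == 0;
    }

    public static void main(String[] args) {
        for (int n = -3; n <= 3; n++) {
            System.out.println(n + ": " + isEvenBitwise(n) + " " + isEvenModulo(n));
        }
    }
}
```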

A single & is the bitwise AND operator, not a comparison.
So this code checks whether the first bit (the least significant, rightmost one) is set, which tells you whether the number is odd, because all odd numbers end with 1 in the least significant bit, e.g. xxxxxxx1.

& is a bitwise AND operation.
For number = 8:
  1000
& 0001
  ----
  0000
The result is that (8 & 1) == 0. This is the case for all even numbers, since they are multiples of 2 and the first binary digit from the right is always 0. 1 has a binary value of 1 with leading 0s, so when we AND it with an even number we're left with 0.

The & operator in Java is the bitwise-and operator. Basically, (number & 1) performs a bitwise-and between number and 1. The result is either 0 or 1, depending on whether it's even or odd. Then the result is compared with 0 to determine if it's even.
Here's a page describing bitwise operations.

It is performing a binary and against 1, which returns 0 if the least significant bit is not set
for your example
00001010 (10)
00000001 (1)
===========
00000000 (0)

This is the bitwise & (AND) operator from logic design.
return (2 & 1); means: take the binary representation of both values, AND them bit by bit, and return the result.
See this link: http://www.roseindia.net/java/master-java/java-bitwise-and.shtml

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
