a very simple implementation of an onion router

I want to write a very simple implementation of an onion router in Java (but including Chaum mixes). A lot of the public/private key encryption seems pretty straightforward, but I'm struggling to understand how the last router would know that the final onionskin has been 'peeled'.
I was thinking of having some sort of checksum also encoded, so that each router tries a decryption with their private key, and if the checksum works - forwards the newly peeled onion to the next router.
Only this way (assuming that some bits of the checksum are stripped every time a successful decryption occurs), there will be a way, by looking at the checksum, to estimate how close the onion is to full decryption. Is this a major vulnerability? Is the checksum methodology an appropriate simplification?

Irrespective of the problem you mention, it's generally good practice to include some integrity check whenever you encrypt/decrypt data. However, checksums aren't really suitable for this. Have a look at Secure Hash algorithms such as SHA-256 (there are implementations built into the standard Java cryptography framework).
Now, coming back to your original question... To each node of the onion, you're going to pass an encrypted "packet", but that packet won't just include the actual data to pass on-- it'll include details of the next node, your hash code, and whatever else... including whatever flag/indication to say whether the next "node" is an onion router or the actual end host. Indeed the data for the last node will have to have some special information, namely the details of the actual end host to communicate with. In other words, the last node knows the onion has been peeled because you encode this fact in the data it ends up receiving.
Or at least, I think that's how I'd do it... ;-)
N.B. The encryption per se isn't that complicated I don't think, but there may be one or two subtleties to be careful of. For example, in a normal single client-server conversation, one subtlety you have to be careful of is to never encrypt the same block of data twice with the same key (or at least, that's what it boils down to-- research "block modes" and "initialisation vectors" if you're not familiar with this concept). In a single client-server conversation the client and server can dictate parts of the initialisation vector. In an onion router, some other solution will have to be found (at worst, using strongly-generated random numbers generated by the client alone, I suppose).
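The layering described above can be sketched as follows. This is a minimal illustration under loud assumptions, not a real onion router: it uses one symmetric AES key per router (a real design would negotiate or wrap these with each router's public key), it swaps the hash-based integrity check for AES-GCM's built-in authentication tag, and the names OnionSketch, wrap, peel, build, and the RELAY/EXIT flag values are all hypothetical. Each layer carries a flag, the next hop, and the inner payload, and each layer gets a fresh random IV generated by the client.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.List;

final class OnionSketch {
    static final byte RELAY = 0; // hypothetical flag: next hop is another router
    static final byte EXIT = 1;  // hypothetical flag: next hop is the end host
    private static final SecureRandom RNG = new SecureRandom();

    /** What a router learns after peeling its own layer. */
    static final class Peeled {
        final boolean exit;
        final String nextHop;
        final byte[] payload;
        Peeled(boolean exit, String nextHop, byte[] payload) {
            this.exit = exit;
            this.nextHop = nextHop;
            this.payload = payload;
        }
    }

    /** Encrypts [flag][hopLength][hop][payload] under one router's key with a fresh 12-byte IV. */
    static byte[] wrap(SecretKey key, byte flag, String nextHop, byte[] payload)
            throws GeneralSecurityException {
        byte[] hop = nextHop.getBytes(StandardCharsets.UTF_8);
        ByteBuffer plain = ByteBuffer.allocate(1 + 4 + hop.length + payload.length);
        plain.put(flag).putInt(hop.length).put(hop).put(payload);
        byte[] iv = new byte[12];
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plain.array());
        return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
    }

    /** Decrypts one layer; throws if the key is wrong or the data was tampered with. */
    static Peeled peel(SecretKey key, byte[] layer) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, layer, 0, 12));
        ByteBuffer plain = ByteBuffer.wrap(c.doFinal(layer, 12, layer.length - 12));
        byte flag = plain.get();
        byte[] hop = new byte[plain.getInt()];
        plain.get(hop);
        byte[] rest = new byte[plain.remaining()];
        plain.get(rest);
        return new Peeled(flag == EXIT, new String(hop, StandardCharsets.UTF_8), rest);
    }

    /** Builds the onion inside out: keys.get(i) is router i's key, hops.get(i) is where router i forwards. */
    static byte[] build(List<SecretKey> keys, List<String> hops, byte[] message)
            throws GeneralSecurityException {
        byte[] onion = message;
        for (int i = keys.size() - 1; i >= 0; i--) {
            byte flag = (i == keys.size() - 1) ? EXIT : RELAY;
            onion = wrap(keys.get(i), flag, hops.get(i), onion);
        }
        return onion;
    }
}
```

With this layout, the last router knows the onion is fully peeled simply because the flag in its own layer says so; no router has to guess from a checksum how many layers remain.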

You could hide the number of checksums by storing them in a cyclic array, whose initial offset is chosen at random when the onion is constructed. Equivalently, you could cyclically shift that array after every decryption.

Related

Obfuscation to hide some algorithm

I am working on a Java client/server application. Every user must be able to create and modify files (containing some sensitive data) through the client application (marking them with a digital signature) or manually (marking them with a signature that has a 99.99999% chance of being wrong). The signature does not use client identity, only the content of the file, which means two distant clients creating the exact same file would end up with two files with the exact same signature.
After weighing the pros and cons, I ended up thinking about using obfuscation to protect against malicious users that would use reverse engineering to find the algorithm that produces the digital signature for a given file.
But if I've understood correctly, obfuscation makes code harder for a human to read and understand, whereas my goal is more about hiding the algorithm behind the digital signature. Any idea on how to make it:
Hard to read?
Hard to find?
At the moment my ideas are:
Using very random names and some useless treatments
Putting it in a random class at a random place and using stuff from random places
Remove comments
Randomize
Also, I'm not sure I understand how compiling and reverse engineering work.
When code is compiled, I always thought variables were renamed in the "method area", and that reverse engineering would give back code with variables named a, b, c, etc. But that appears not to be the case, and it makes sense now that I think about it, since reflection is possible in Java. Am I right on that last part?
To conclude, I'm not sure I understand how this would prevent a user from reverse-engineering my code (except for the variable names part).

I ended up thinking about using obfuscation to protect from malicious users than would use reverse-engineering to find the algorithm delivering digital signature for a given file.
I think this is misguided for the following reasons.
There are a few well-known cryptographic hashing functions that are understood to be sufficiently secure against reverse engineering, given the current "state of the art" in cryptography. You can read about some of the common ones here:
https://en.wikipedia.org/wiki/Cryptographic_hash_function
You can combine a cryptographic hash function with public key encryption to provide digital signatures that are (should be) secure enough for your use-case. For example:
https://en.wikipedia.org/wiki/Digital_Signature_Algorithm
There are solid implementations of these technologies available for Java. There is no need to implement your own.
Designing and implementing your own digital signature algorithm is unwise. Unless you are an expert in the mathematics of cryptography, your algorithm is likely to have weaknesses that you are unaware of. And if you are an expert, you will fully understand the difficulty in creating a strong system.
Obfuscation is not an adequate protection against reverse engineering to extract secrets (such as an algorithm) from code. Indeed, in the case of Java it is little more than a "speed bump" for a skilled hacker.
OK, I'm just struggling to understand how my app will be able to determine that the signature of "a" is equal to some word while a user can't find the same algorithm on the internet to do exactly the same and find the same signature.
You have a point. If the "text" that you are creating a hash for is known to be very short and/or easy to "guess", then it will be feasible to brute-force its hash, assuming that the algorithm is known. (For example, Gravatar's approach of using hashes of email addresses for privacy is flawed, because it is not hard to assemble a list of known email addresses, generate their hashes and store them in a database that can be queried.)
However, once you have gotten beyond a few tens of random bytes of data, or a few tens of words of text, brute-force (and rainbow table) attacks become impractical. So, you can start with your document, add an "envelope" with a timestamp, other identifying information, and (if necessary) some random junk to pad out the source text. Then hash the lot. The other end merely needs to repeat the process and see if they get the same hash.
(There is more stuff you need to do to create a full digital signature ... but read the link above.)
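As a rough illustration of the envelope-then-hash idea and of a standard signature on top of it, here is a minimal sketch using only java.security. The class name SignatureSketch, the envelope layout (timestamp, author, document separated by newlines), and the choice of SHA256withRSA are assumptions made for the example, not something the answer prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class SignatureSketch {
    /** Hypothetical envelope: identifying data is prepended so short documents are harder to guess. */
    static byte[] envelope(String document, long timestamp, String author) {
        return (timestamp + "\n" + author + "\n" + document).getBytes(StandardCharsets.UTF_8);
    }

    /** Plain SHA-256 hash: anyone who can rebuild the envelope can recompute and compare it. */
    static byte[] sha256(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    /** Full signature: hashing plus a private-key operation, done by the JDK. */
    static byte[] sign(PrivateKey key, byte[] data) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(key);
        s.update(data);
        return s.sign();
    }

    /** Verification needs only the public key, so the algorithm itself can be public. */
    static boolean verify(PublicKey key, byte[] data, byte[] signature)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(key);
        s.update(data);
        return s.verify(signature);
    }
}
```

The point of the design is that nothing here needs hiding except the private key: a key pair from KeyPairGenerator.getInstance("RSA") is enough to try it out.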

Let's clarify your misconceptions about obfuscation:
You don't do it on your source code. In the Java world, if you obfuscate at all, you obfuscate the binary delivery, in other words your class files. Or to be precise: it is mostly about class file obfuscation, although there are commercial tools for source code obfuscation.
Obfuscation is still used within the Android realm, but in "pure" Java shops it is rarely used these days.
And most importantly: "security by obscurity" rarely works.
If you are really serious about running your code at the client side (where you have no control over it), it might be better to do that part in native code, and to deliver machine compiled binaries for that part.

Compare 2 large string arrays between client/server

I have a big array of strings, each of which has 24-32 random characters (which include 0123456789abcdefghijklmnopqrstuvwxyz!@#$%^&*()_+=-[]';/.,<>?}{). Sometimes the array is empty, but other times the array has more than 1000 elements inside it.
I send them to my client, which is a browser, via AJAX every time he requests them and I want to reload a part of my application only if that array is different. That means if there was a modification, adding/removing in said array. So I want to send the entire array, and some kind of hash of all the elements inside it. I can't use md5 or anything like that because the elements inside the array might move around.
What do you suggest I do? The server uses Java to serve pages.

Are you sure transmitting 1000 characters is actually a problem in your use case? For instance, this stackoverflow page is currently 17000 bytes large, and stackoverflow makes no effort to only transmit it if it has changed. Put differently, transmitting 1000 characters will take about 1000 bytes, or roughly 8 ms on a 1 Mbit/s connection (which is slow by modern standards ;-).
That said, transmitting data only if it has changed is such a basic optimization strategy that it has been incorporated into the HTTP standard itself. The HTTP standard describes both time-based and ETag-based invalidation, and is implemented by virtually any software or hardware interacting using HTTP, including browsers and CDNs. To learn more, read a tutorial by Google or the normative specification.
You could be using time based invalidation, either by specifying a fixed lifetime or interpreting the If-Modified-Since header. You could also use an ETag that is not sensitive to ordering, by putting your elements into a particular order (e.g. through sorting) before hashing.
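An order-insensitive ETag of the kind mentioned above might be computed as in the following sketch. The class and method names are hypothetical, and SHA-256 is just one reasonable choice of hash; the length prefix per element is an added precaution so that different splits of the same characters do not collide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class OrderInsensitiveETag {
    /** Returns a quoted ETag value that depends on the set of elements but not on their order. */
    static String etag(String[] elements) throws NoSuchAlgorithmException {
        String[] sorted = elements.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);                // canonical order before hashing
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (String s : sorted) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            // Length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
            md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
            md.update(bytes);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return "\"" + hex + "\"";
    }
}
```

The server would send this value in the ETag response header and compare it against the If-None-Match header of later requests, answering 304 Not Modified when they match.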

I would suggest a system that allows you to skip sending the strings altogether if the client has the latest version. The client keeps the version number (or hash code) of the latest version it received. If it hasn't received any strings yet, it can default to 0.
So, when the client needs to get the strings, it can say, "Give me the strings if the current version isn't X," where X is the version that the client currently has.
The server maintains a version number or hash code which it updates whenever the strings change. If it receives a request, and the client's version is the same as the current version, then the server returns a result that says, "You already have the current version."
The point here is twofold: prevent transmitting information that you don't need to transmit, and prevent the client from having to compute a hash code.
If the server needs to compute a hash at every request rather than just keeping a current hash code value, have the server sort the array of strings first, and then do an MD5 or CRC or whatever.
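The version-number variant can be sketched like this. VersionedStrings and its methods are hypothetical names; the server-side version starts at 1 so that a client defaulting to 0 always receives data on its first request.

```java
import java.util.Optional;

final class VersionedStrings {
    private long version = 1;               // clients that have nothing yet send 0
    private String[] strings = new String[0];

    /** Called whenever the strings change on the server. */
    synchronized void update(String[] newStrings) {
        strings = newStrings.clone();
        version++;
    }

    synchronized long currentVersion() {
        return version;
    }

    /**
     * "Give me the strings if the current version isn't X."
     * An empty result means "you already have the current version".
     */
    synchronized Optional<String[]> fetchIfChanged(long clientVersion) {
        return clientVersion == version
                ? Optional.empty()
                : Optional.of(strings.clone());
    }
}
```

In a real handler the response would also carry currentVersion() so the client knows what to send next time; neither side has to hash anything.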

Java/Groovy: Non Deterministic Crypto Algorithm

I'm working on a Groovy application that has to provide users with a link containing encrypted query parameters. Currently we use the AES encryption algorithm with the same IV used for all links. We know this is bad (hence the reason we want to switch), but the reason this was done was to limit the size of the query parameters (including a base64 encoded 16 byte initialization vector with each query parameter makes the links very long). We'd like to switch to a non deterministic algorithm so that we have the required randomness in the query data but don't have to store the IV in the query parameters.
Since we're working with Groovy we can use anything from Java. Having not done much encryption work, though, I'm not sure which algorithm to start looking into. Ideally we'd want one that's available in Java SE or as a freely usable Java library. Also, any links with details on how to implement these algorithms are highly appreciated.

The aim is to not use the same initialization vector twice. If you have something unique in the request that the server knows when decrypting, you might use that as the initialization vector.
Otherwise, you could simply use a counter as the initialization vector. Keep a central counter that you increase each time you encrypt something, and put that counter in front of the encrypted data. Then expand this counter to a 16 byte initialization vector.
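A minimal sketch of the counter approach follows. It assumes AES in CTR mode, where a predictable but never-repeated IV is acceptable; with CBC a predictable IV is a known weakness, so that mode is deliberately not used here. The class name, the 8-byte counter prefix, and the way the counter is expanded (counter in the high 8 bytes, zeros in the low 8 bytes for CTR's own block counter) are choices made for this example. CTR provides no integrity protection, so a real deployment would add a MAC or use an authenticated mode.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.concurrent.atomic.AtomicLong;

final class CounterIvSketch {
    private final SecretKey key;
    private final AtomicLong counter; // must be persisted and never reused with the same key

    CounterIvSketch(SecretKey key, long firstCounterValue) {
        this.key = key;
        this.counter = new AtomicLong(firstCounterValue);
    }

    /** Expands the counter to 16 bytes: high 8 bytes = counter, low 8 bytes = 0. */
    private static IvParameterSpec expand(long n) {
        return new IvParameterSpec(ByteBuffer.allocate(16).putLong(n).array());
    }

    /** Output layout: 8-byte counter followed by the ciphertext (same length as the plaintext). */
    byte[] encrypt(byte[] plain) throws GeneralSecurityException {
        long n = counter.getAndIncrement();
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, expand(n));
        byte[] ct = c.doFinal(plain);
        return ByteBuffer.allocate(8 + ct.length).putLong(n).put(ct).array();
    }

    /** Reads the counter from the front of the data and rebuilds the same IV. */
    byte[] decrypt(byte[] data) throws GeneralSecurityException {
        long n = ByteBuffer.wrap(data).getLong();
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, expand(n));
        return c.doFinal(data, 8, data.length - 8);
    }
}
```

Compared with a 16-byte random IV, the per-link overhead drops to 8 bytes before encoding, and it could be reduced further with a shorter or variable-length counter encoding.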

GAMEPROGRAMMING Client/Server communication through .... ? (JAVA)

I have a general question about programming the client/server communication on a network game.
I use TCP as the protocol, and the communication ... works, but I'm not sure if it is an efficient way.
In general, actions that happen on the client side go through all these steps:
Some action (e.g. a Fireball is cast)
[*] For this action I define a string (e.g. #F#270#130#, where 'F' says it's a fireball, 270 is (for example) the angle in degrees, and 130 is the speed of the fireball that is shot.)
String goes into the output buffer of the client and a waiting queue
String is sent
String is received by server
[*] Server needs a line interpreter that can detect the meaning of the string (here: what does F mean? It is a fireball!) and adds a unique identity, based on which client the command was received from.
[*] The server needs to calculate logic based on the action that happened (does the fireball do damage to someone, does it hit someone immediately, or does it just fly first?)
Server sends an (updated) string of the action(s) that occur to all clients (e.g. maybe the fireball is slowed down for some reason, so there will be an updated string: #F#12345#270#90#, where 12345 is the unique player identity)
Clients receive string
[*] Clients resolve the string to a command and handle it (fire an animation sequence...)
Client that originally sent the command compares the received string with the string in the waiting queue; when they are equal, it does nothing (to smooth out the action; otherwise, through connection problems or delay, some action would occur twice or jump from location to location, depending on ping)
Is it really necessary to go through all these steps? At all steps marked with [*] I need to define new line interpreters/actions for each command, so I'm coding each action twice, client-side and server-side.
I read something about sending serializable objects, but in general the idea seems to be the same to me: I send an object that has to be interpreted and handled, and I send an object back...
Any hints on how to solve the whole communication more elegantly, with less coding? Or in a more organized way? All these #F# #M# #H# tags for different actions are making it more and more complicated :)
(In fact i actually have the following handlers/actions:
-move
-look/rotate
-hpchange
-firearrow
-spawn/disconnect
...)
Hope you understand what I mean. I know I could just continue coding like that, and it would work somehow, but it just seems more complicated than it needs to be.
Thanks!

You could do it in a more OO way if you:
Define an object called Action or something like that, which has all of the above parameters: type of action, direction of action (or target), damage dealt, etc.
Create those Action objects as your game normally executes
Use ObjectOutputStream chained to your TCP Socket to output the whole Action object to the server / pass it back to the client.
On the server, interpret what happens by examining the received object from ObjectInputStream.
I think this way would be cleaner and more flexible in case you add more logic, than just analyzing strings, but not as fast (since objects going into ObjectOutputStream need to be serialized).
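A minimal sketch of that approach is shown below. The class GameAction, its fields, and the Type values are hypothetical, loosely modelled on the handlers listed in the question. The helper methods accept any stream, so the same code works with socket.getOutputStream() or, for testing, with an in-memory byte array.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

final class GameAction implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Type { MOVE, ROTATE, HP_CHANGE, FIRE_ARROW, FIREBALL }

    final Type type;
    final int playerId; // filled in by the server before broadcasting
    final int angle;    // degrees
    final int speed;

    GameAction(Type type, int playerId, int angle, int speed) {
        this.type = type;
        this.playerId = playerId;
        this.angle = angle;
        this.speed = speed;
    }

    /** Serializes one action onto the stream (e.g. a socket's output stream). */
    static void send(OutputStream out, GameAction action) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(action);
        oos.flush();
    }

    /** Reads one action back; the receiver switches on action.type instead of parsing strings. */
    static GameAction receive(InputStream in) throws IOException, ClassNotFoundException {
        return (GameAction) new ObjectInputStream(in).readObject();
    }
}
```

For brevity this creates a new object stream per message, which repeats the stream header each time; on a long-lived connection you would normally create one ObjectOutputStream and one ObjectInputStream and reuse them. Note also that deserializing objects from untrusted clients is a known risk in Java, so a public game server may prefer an explicit format.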

You need to look at several factors before you decide if your game requires any changes.
Firstly, is TCP the best communication channel? Have you compared it to UDP? Would it be better to implement the networking using UDP? Would it matter if a few packets went missing? What happens if the network is slow?
Secondly, look at how often you are polling/pushing to the server. Can this be reduced? Does the game have to be in real time? Do all parts of the game have to be in real time? Perhaps certain things can be non-real-time. A fireball will continue in a straight path, so you don't have to keep updating the server about its position; you can just tell it the direction and speed. What other aspects of the game can be non-real-time? The only things that need sending are players' locations and actions. Most other things, like collision detection, can be offloaded to the client.
Thirdly, does every keypress need to be sent to the server? If the user is against the wall and wants to move further, the client knows that they cannot and thus will not send those keypresses to the server. If the user has moved to a new valid location, then update the server. Can certain things be buffered and sent to the server in one go, instead of sending several queries to the server? I.e. if I move forward, jump and throw a fireball, that's 3 keypresses on the client side; can they be buffered and sent together at the next 500 ms interval?
If you are worried about the networking overhead, you should be working at the bit level. Instead of sending a long string "#F#270#130#", which is 11 bytes long, would it make sense to send 3 consecutive bytes (24 bits)?
7 bits represent the action (128 different actions, 0-127).
9 bits represent the angle (0-511), although you only need 0-360 degrees.
8 bits represent the force (0-255).
This or any other byte format is shorter and easier to use over the network, and produces tighter code. Also binary comparison is faster, so writing your action parser on the server is now made easier. I.e. instead of a large switch case looking for #F#, you just look at the first 7 bits and compare it to an int.
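The 7/9/8-bit layout above could be packed and unpacked as in this sketch. ActionCodec and the field order (action in the top bits, force in the bottom bits) are assumptions for the example; any fixed order works as long as both sides agree on it.

```java
final class ActionCodec {
    /** Packs action (7 bits), angle (9 bits) and force (8 bits) into 3 bytes, big-endian. */
    static byte[] pack(int action, int angle, int force) {
        if (action < 0 || action > 127 || angle < 0 || angle > 511 || force < 0 || force > 255) {
            throw new IllegalArgumentException("field out of range");
        }
        int packed = (action << 17) | (angle << 8) | force; // 24 significant bits
        return new byte[] { (byte) (packed >>> 16), (byte) (packed >>> 8), (byte) packed };
    }

    /** Returns { action, angle, force }. */
    static int[] unpack(byte[] b) {
        // Mask each byte with 0xFF because Java bytes are signed.
        int packed = ((b[0] & 0xFF) << 16) | ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
        return new int[] { packed >>> 17, (packed >>> 8) & 0x1FF, packed & 0xFF };
    }
}
```

For example, pack(70, 270, 130) yields 3 bytes that unpack back to { 70, 270, 130 }, and the server can dispatch on the first element with a plain switch on an int.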
Can you reduce other networking overheads? Instead of the force being decided by the client, can the server decide it, e.g. a standard force, or 2 levels of force (much better, as this can be represented by 1 bit)? This also stops the client and malicious users from sending rubbish data to the server (like a force of 999): now the force can only be 0 or 1, i.e. a speed of 10 or 20, nothing silly.

Efficient way to extract protocol data units from tcp byte stream

I've had two Java projects (simple multiplayer games) that relied on a byte-based, connection-oriented protocol for communication.
In both cases I was unhappy with the implementation of the communication, since I couldn't come up with an intelligent, non-verbose and object-oriented way of writing and especially parsing the bytes.
For writing I had something like
ProtocolDataUnitX pdux = new ProtocolDataUnitX("MyName", 2013);
byte[] bytes = pdux.getBytes();
out.write(bytes); // surrounded with try/catch etc.
That was acceptable to some extent, since I had an AbstractPDU class with some byte conversion convenience methods. But I had to define the getBytes() method for every protocol data unit (pdu).
My approach for parsing the incoming byte stream lacked even more innovation.
private InputStream in;
...
@Override
public void run() {
    try {
        int c;
        while ((c = in.read()) != -1) {
            if (c == 0x01) {
                // 0x01 means we have pdu #1 and can continue reading
                // since we know what is coming.
                // after we have all bytes and know the pdu
                // we can determine the parameters. I.e., every pdu has a
                // reverse constructor: bytes -> pdu
            }
        }
    } catch (IOException e) {
        // the connection was closed or failed; stop reading
    }
}
QUESTION
How do you handle these situations? What are the best practices here? Some protocols have the total length field encoded, some not. Some protocol data units have variable length.
Is there a reasonable approach here? Maybe some kind of schema definition? I don't want to produce ugly and confusing code any longer for this matter.

Summary: best practice is to use an existing, mature protocol compiler. Google protobufs is a popular choice.
Over the years, many protocol definition systems have been developed. Most of these include compilers which take a protocol description and produce client and server code, often in multiple languages. The existence of such a compiler is very helpful in projects which are not restricted to a single client (or server) implementation, since it allows other teams to easily create their own clients or servers using the standard PDU definitions. Also, as you've observed, making a clean object-oriented interface is non-trivial, even in a language like Java which has most of the features you would need.
The question of whether PDUs should have explicit length or be self-delimiting (say, with an end-indicator) is interesting. There are a lot of advantages to explicit length: for one thing, it is not necessary to have a complete parser in order to accept the PDU, which can make for much better isolation of deserialization from transmission. If a transmission consists of a stream of PDUs, the explicit length field makes error recovery simpler, and allows early dispatch of PDUs to handlers. Explicit length fields also make it easier to embed a PDU inside another PDU, which is often useful, particularly when parts of the PDU must be encrypted.
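To make the explicit-length option concrete, here is a minimal framing sketch. It assumes a fixed 4-byte big-endian length followed by a 1-byte type tag and the body; the names and the 1 MiB cap are arbitrary choices for the example, not part of any standard.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

final class LengthPrefixedFraming {
    static final int MAX_PDU = 1 << 20; // arbitrary cap so a bad length cannot exhaust memory

    /** Writes [length:4][type:1][body], where length counts the type byte plus the body. */
    static void writePdu(DataOutputStream out, byte type, byte[] body) throws IOException {
        out.writeInt(1 + body.length);
        out.writeByte(type);
        out.write(body);
        out.flush();
    }

    /**
     * Reads one complete PDU without understanding its contents.
     * Returns the type byte followed by the body, or null once the stream has ended.
     */
    static byte[] readPdu(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        if (length < 1 || length > MAX_PDU) {
            throw new IOException("bad PDU length " + length);
        }
        byte[] pdu = new byte[length];
        in.readFully(pdu); // blocks until the whole PDU has arrived
        return pdu;
    }
}
```

The reader loop only needs the length to cut the stream into PDUs; decoding pdu[0] and the body can then happen in a separate handler, which is the isolation of transmission from deserialization described above.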
On the other hand, explicit length fields require that the entire PDU be assembled in memory before transmission, which is awkward for large PDUs and impossible for streaming with a single PDU. If the length field itself is of variable length, which is almost always necessary, then it becomes awkward to create PDU components unless the final length is known at the start. (One solution to this problem is to create the serialized string backwards, but that is also awkward, and doesn't work for streaming.)
By and large, the balance has been in favour of explicit length fields, although some systems allow "chunking". A simple form of chunking is to define a maximum chunk size, and concatenate successive chunks with the maximum size along with the first following chunk with a size less than the maximum. (It's important to be able to specify 0-length chunks, in case the PDU is an exact multiple of the maximum size.) This is a reasonable compromise; it allows streaming (with some work); but it's a lot more engineering effort and it creates a lot of corner cases which need to be tested and debugged.
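The simple chunking scheme just described might look like the following sketch. It assumes a 2-byte unsigned chunk length (so the maximum chunk size must be between 1 and 65535) and uses InputStream.readNBytes, which requires Java 9 or later; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class Chunking {
    /** Streams data as full-size chunks followed by one shorter (possibly empty) final chunk. */
    static void writeChunked(OutputStream out, InputStream data, int maxChunk) throws IOException {
        DataOutputStream dout = new DataOutputStream(out);
        byte[] buf = new byte[maxChunk];
        int n;
        do {
            n = data.readNBytes(buf, 0, maxChunk); // short only at end of input
            dout.writeShort(n);                    // 2-byte chunk length
            dout.write(buf, 0, n);
        } while (n == maxChunk);                   // a short chunk ends the PDU
        dout.flush();
    }

    /** Reassembles a PDU by reading chunks until one is shorter than maxChunk. */
    static byte[] readChunked(InputStream in, int maxChunk) throws IOException {
        DataInputStream din = new DataInputStream(in);
        ByteArrayOutputStream pdu = new ByteArrayOutputStream();
        int n;
        do {
            n = din.readUnsignedShort();
            if (n > maxChunk) {
                throw new IOException("chunk larger than the agreed maximum");
            }
            byte[] chunk = new byte[n];
            din.readFully(chunk);
            pdu.write(chunk);
        } while (n == maxChunk);
        return pdu.toByteArray();
    }
}
```

The 0-length corner case shows up directly: 8 bytes of data with a maximum chunk size of 4 are sent as two full chunks plus an empty terminating chunk, 14 bytes on the wire in total.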
One important maxim in designing PDU formats is that every option is a potential information leak. To the extent possible, try to make any given internal object have only a single possible serialization. Also, remember that redundancy has a cost: anywhere there is duplication, it implies a test for validity. Keeping tests to a minimum is the key to efficiency, particularly on deserialization. Skipping a validity test is an invitation to security attacks.
In my opinion, making an ad hoc protocol parser is not usually a good idea. For one thing, it's a lot of work. For another thing, there are lots of subtle issues and it's better to use a system which has dealt with them.
While I'm personally a fan of ASN.1, which is widely used particularly in the telecommunications industry, it is not an easy technology to fit into a small project. The learning curve is pretty steep and there are not as many open-source tools as one might like.
Currently, probably the most popular option is Google protobufs, which is available for C++, Java and Python (and a number of other languages through contributed plugins). It's simple, reasonably easy to use, and open source.
