The title actually states the issue. And before you get me wrong: I DO NOT want to know how this can be done, but how I can prevent it.
I want to write a file uploader (in Java with JPA and MySQL database). Since I'm not yet 100% sure about the internal management, there is the possibility that at some point the file could be executed/opened internally.
So I'd like to know what an attacker can do to harm, infect or manipulate my system by uploading a file of any type, be it a media file, a binary or anything else.
For instance:
What about special characters in the file name?
What about manipulating meta data like EXIF?
What about "embedded viruses" like in an MP3 file?
I hope this is not too vague and I'd be glad to read your tips and hints.
Best regards,
Stacky
It's really very application specific. If you're using a particular web app like phpBB, there are completely different security needs than if you're running a news group. If you want tailored security recommendations, you'll need to search for them based on the context of what you're doing. It could range from sanitizing input to limiting upload size and format.
For example, an MP3 file virus probably only works on a few specific MP3 players. Not on all of them.
At any rate, if you want broad coverage from viruses, then scan the files with a virus scanner, but that probably won't protect you from things like script injection.
If your server doesn't do something inherently stupid, there should be no problem. But...
Since I'm not yet 100% sure about the internal management, there is the possibility that at some point the file could be executed/opened internally.
... this qualifies as inherently stupid. You have to make sure you don't accidentally execute uploaded files (restrictive permissions on the upload directory are a starting point, as is limiting uploads to specific directories, etc.).
Aside from executing, if the server attempts any file type specific processing (e.g. make thumbnails of images) there is always the possibility that the processing can be attacked through buffer overflow exploits (these are specific for each type of software/library though).
A pure file server (e.g. FTP) that just stores/serves files is safe (as long as there are no other holes).
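To make the advice about special characters in file names and about confining uploads to one directory more concrete, here is a minimal sketch. The class name `UploadStore`, the character whitelist and the size limit are my own assumptions, not part of the question. The idea is that the client-supplied name is only ever kept as a cleaned-up display name (for example in the database), while the name on disk is generated by the server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical helper: illustrates the idea, not a complete upload handler.
final class UploadStore {
    private UploadStore() {}

    /** Reduces a client-supplied file name to a conservative display name. */
    static String sanitizeDisplayName(String clientName) {
        // Drop any directory part, whichever separator the client used.
        String base = clientName.replace('\\', '/');
        base = base.substring(base.lastIndexOf('/') + 1);
        // Keep only a small whitelist of characters (an assumed policy).
        String cleaned = base.replaceAll("[^A-Za-z0-9._-]", "_");
        // Reject empty names and names made only of dots ("." and "..").
        if (cleaned.isEmpty() || cleaned.matches("\\.+")) {
            return "unnamed";
        }
        return cleaned;
    }

    /** Stores the upload under a server-generated name and returns its path. */
    static Path store(Path uploadDir, InputStream data, long maxBytes) throws IOException {
        Path root = uploadDir.toAbsolutePath().normalize();
        // The name on disk never comes from the client.
        Path target = root.resolve(UUID.randomUUID().toString()).normalize();
        if (!target.startsWith(root)) {
            throw new IOException("Refusing to write outside the upload directory");
        }
        OutputStream raw = Files.newOutputStream(target, StandardOpenOption.CREATE_NEW);
        boolean complete = false;
        try (OutputStream out = raw) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = data.read(buffer)) != -1) {
                total += n;
                if (total > maxBytes) {
                    throw new IOException("Upload exceeds the size limit");
                }
                out.write(buffer, 0, n);
            }
            complete = true;
        } finally {
            if (!complete) {
                Files.deleteIfExists(target); // do not leave partial uploads behind
            }
        }
        return target;
    }
}
```

This does nothing about malicious content (EXIF tricks, crafted media files); it only keeps the file name and the storage location under the server's control. The upload directory should additionally sit outside anything the web server serves or executes.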
I'm required to create part of a Java program that "secretly" logs everything the user does with the program, purely to catch people trying to "cheat" the system. The thing is, multiple people will be using the same program on multiple computers.
All the information needs to be written using PrintWriter to a single text file that will, later on, be used by an administrative part of the program.
PrintWriter printer = new PrintWriter(new FileWriter(serverFolderLocation + "\\LogUserInfo.txt", true));
About 50 computers are expected to use this program. Every few seconds, whenever a specific button is pressed on one of them, an array of 7 or more lines of text will be written to that file.
I know that writing text to a text file is extremely quick and that this is unlikely to happen, but if 2 or more computers happen to write to the text file at the same time while append is set to true, will data go missing, or will it be appended normally?
Is this even possible? 2+ devices writing data to a text file at different times?
Do note that it is important that all the data from all 50+ computers arrives at the destination file.
If problems are likely to occur, what other methods can be used of doing something like this, other than setting up a dedicated database?
Setting append to true in the FileWriter:
append - boolean if true, then data will be written to the end of the file rather than the beginning.
If you've got 50 machines doing this all at the same time, there's going to be conflicts. Your computer complains when you try to modify a file that's being used by another process, don't go throwing in 50 more contenders.
You could try using Sockets.
Whichever machine holds that file, designate that as the 'server' and make the other machines 'clients'. Your clients send messages to the server, your server appends those messages to the file in a synchronised manner.
You could then prevent your ~50 clients from directly changing the logs file with some network & server security.
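A rough sketch of that client/server split, under my own assumptions: the class name `LogSink`, the "one connection delivers one batch of lines" protocol and the thread-per-connection model are all made up for illustration. The important part is that only this one process ever opens the log file, and `append` is synchronized so batches from different clients cannot interleave.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side class: the single writer of the log file.
final class LogSink {
    private final Path logFile;

    LogSink(Path logFile) {
        this.logFile = logFile;
    }

    /** Appends one batch; synchronized so that batches are written one at a time. */
    synchronized void append(List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    /** Accepts clients forever; each connection is handled on its own thread. */
    void serve(ServerSocket server) throws IOException {
        while (true) {
            Socket client = server.accept();
            new Thread(() -> handle(client)).start();
        }
    }

    /** Reads lines until the client closes the connection, then appends them as one batch. */
    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8))) {
            List<String> batch = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
            }
            append(batch);
        } catch (IOException e) {
            e.printStackTrace(); // a real server would report this somewhere useful
        }
    }
}
```

A client would open a `Socket` to the server, write its 7 or so lines, and close the connection. Authentication, retries when the server is down, and encryption are deliberately left out here.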
You are simply re-inventing the wheel here, but in a very wrong way. The other answer is correct: using a simple file, and having multiple distributed users write to the same file just screams for failure.
But I disagree with the idea to use sockets instead. That is really low level, and requires you to implement a lot of (complicated) things yourself. You will have to think about network issues, multi threading, locking, buffering, ...
Sure, if this is for education, then building something like that is a challenge. But if the goal here is to come to a robust solution that just works, you should rather think about using some 3rd party off-the-shelf solution.
You could start reading here. And then pick a framework such as logback. Alternatively, you could look into messaging services, such as ActiveMQ, or RabbitMQ.
Again: creating and collecting logs in distributed environments is A) hard to get right but B) a solved problem.
Which of these ways is better (faster, less storage)?
Thousands of xyz.properties files, each with about 30 keys/values
One .properties file with all the data in it, about 30,000 keys/values
I think there are two aspects here:
As Guenther has correctly pointed out, dealing with files comes with overhead. You need "file handles", and possibly other data structures that deal with files, so there might be many different levels where having one huge file is better than having many small files.
But there is also "maintainability". Meaning: from a developer's point of view, dealing with a property file that contains 30K key/values is something you really don't want to get into. If everything is in one file, you have to constantly update (and deploy) that one huge file. One change, and the whole file needs to go out. Will you have mechanisms in place that allow for "run-time" reloading of properties, or would that mean that your application has to shut down? And how often will it happen that you have duplicates in that large file? Or worse: you put a value for property A on line 5082, and then somebody doesn't pay attention and overrides property A on line 29732. There are many things that can go wrong, just because of having all that stuff in one file that no human being can digest anymore. And rest assured: debugging something like that will be hard.
I just gave you some questions to think about; so you might want to step back to give more requirements from your end.
In any case, you might want to look into a solution where developers deal with many small property files (you know, like one file per functionality), and then you use tooling to build the one large file used in the production environment.
Finally: if your application really needs 30K properties, then you should worry much more about the quality of your product. In my eyes, this isn't a design "smell"; it sounds like a design fetidness. Meaning: no reasonable application should require 30K properties to function.
Opening and closing 1000s of files is a major overhead for the operating system, so you'd probably be best off with one big file.
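The "many small files for developers, one big file for production" tooling could look roughly like this. It is only a sketch: the class name `PropertiesMerger` and the policy of failing the build on a duplicate key are my assumptions. Note that `Properties.load` silently keeps the last value for a key repeated within a single file, so this only catches duplicates across files.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical build-time helper that merges small property files into one set.
final class PropertiesMerger {
    private PropertiesMerger() {}

    /** Merges all *.properties files in sourceDir; fails if two files define the same key. */
    static Properties merge(Path sourceDir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            files = entries
                    .filter(p -> p.getFileName().toString().endsWith(".properties"))
                    .sorted() // deterministic order, so error messages are reproducible
                    .collect(Collectors.toList());
        }
        Properties merged = new Properties();
        Map<String, Path> origin = new HashMap<>(); // remembers which file defined each key
        for (Path file : files) {
            Properties part = new Properties();
            try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                part.load(reader);
            }
            for (String key : part.stringPropertyNames()) {
                Path first = origin.putIfAbsent(key, file);
                if (first != null) {
                    throw new IllegalStateException(
                            "Duplicate key '" + key + "' in " + first + " and " + file);
                }
                merged.setProperty(key, part.getProperty(key));
            }
        }
        return merged;
    }
}
```

The merged `Properties` object can then be written out with `store(...)` as the single production file, which addresses the "property A overridden on line 29732" problem at build time rather than during debugging.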
In my folder assets/data, there are a lot of XML files containing static data for my app.
It's really easy for someone to retrieve an APK, modify a part of it and install on a device.
I would like to prevent users from altering my static data by checking the integrity of my assets/data folder.
Initially I was considering using an MD5 checksum, but it will probably be too slow for the number of files I'm going to have (50-100).
Do you have any suggestion?
Edit:
This app is a game with an XML file describing each level.
I'll describe how you can effectively protect against modification and repackaging, not how you can protect the assets on their own, although you could ultimately apply the same technique to encrypting them. It's imperfect, but you can make modification significantly more difficult.
You sign the application with a certificate. Although they can remove yours, no one else can produce the same certificate when putting it back together. You can therefore check the signature of the application at runtime, to make sure it's what you expect.
Here's some cheap and nasty code to do this:
PackageManager pm = context.getPackageManager();
PackageInfo info = pm.getPackageInfo( context.getPackageName(), PackageManager.GET_SIGNATURES );
if ( info.signatures[ 0 ].toCharsString().equals( YOUR_SIGNATURE ) )
{
    // signature is OK
}
where YOUR_SIGNATURE is a constant, obtained from running this code on the signed app.
Now, there are two remaining problems that you have already hinted at:
1) How can you stop someone just modifying the constant in the source code to match their certificate, then repackaging and re-signing the app?
2) How can you stop someone finding the check method and removing it?
Answer to both: you can't, not absolutely, but you can do a pretty good job through obfuscation. The free Proguard, but more usefully the commercial Dexguard, are tools for doing this. You may baulk at the current €350 cost of the latter; on the other hand, I have tried to reverse engineer apps that are protected like this, and unless the stakes were very high, it isn't worth the trouble.
To an extent, you could also do the obfuscation for (1) yourself; have the signature 'constant' assembled at runtime through some complicated programmatic method that makes it difficult to find and replace.
(2) is really a software design issue; making it sufficiently complicated or annoying to remove the check. Obfuscation just makes it more difficult to find in the first place.
As a further note, you might want to look at whether stuff like Google Licensing gives you any protection in this area. I don't have any experience of it though, so you're on your own there.
Sort of an answer although it is in the negative.
If the person has your apk and has decoded it, then even if you used a checksum, they can just update the code portion with the new checksum. I don't think you can win this one. You can put a great deal of effort into protecting it but if you assume somebody can obtain and modify the apk, then they can also undo the protection. On my commercial stuff, I just try to make the decoding non-obvious but not bullet proof. I know anything more is not worth the effort or even possible.
Perhaps you could zip up the xml files and put it in the assets/data folder; and then do a checksum on that .zip. On the first run, you could unzip the files to get the .xml layouts. See Unzip file from zip archive of multiple files using ZipFile class for unzipping an archive.
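For the "checksum on that .zip" part, something along these lines should do. This is a sketch with assumed names (`AssetChecksum`, `sha256Hex`, `matches`), and I've used SHA-256 rather than MD5. On Android the stream would presumably come from the app's asset manager; here it is just any `InputStream`. As said above, whoever repackages the APK can also patch the expected value, so this only raises the bar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes a stream without loading it into memory at once.
final class AssetChecksum {
    private AssetChecksum() {}

    /** Returns the SHA-256 digest of the stream as lowercase hex. */
    static String sha256Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares the stream's digest with an expected hex value computed at build time. */
    static boolean matches(InputStream in, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(in).equalsIgnoreCase(expectedHex);
    }
}
```

Hashing one archive of 50-100 small XML files this way is a single pass over the data, so speed is unlikely to be the limiting factor.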
Probably the most reliable way would be for the level XML data to be downloaded from a server when the app is started with a check of the time stamp and sizes of the level files. That also lets you provide updates to level data over time. Of course this means you have the added expense of a server to host which may be another problem.
Does anyone know of any open-source Java libraries that provide features for handling a large number of files (write/read) on disk? I am talking about 2-4 million files (most of them PDFs and MS docs). It is not a good idea to store all files in a single directory. Instead of re-inventing the wheel, I am hoping that this has already been done by many people.
Features I am looking for
1) Able to write/read files from disk
2) Able to create random directories/sub-directories for new files
3) Provide version/audit (optional)
I was looking at JCR API and it looks promising but it starts with a workspace and not sure what will be the performance when there are many nodes.
Edit: JCR does look pretty good. I'd suggest trying it out to see how it actually performs for your use case.
If you're running your system on Windows and noticed a horrible n^2 performance hit at some point, you're probably running up against the performance hit incurred by automatic 8.3 filename generation. Of course, you can disable 8.3 filename generation, but as you pointed out, it would still not be a good idea to store large numbers of files in a single directory.
One common strategy I've seen for handling large numbers of files is to create directories for the first n letters of the filename. For example, document.pdf would be stored in d/o/c/u/m/document.pdf. I don't recall ever seeing a library to do this in Java, but it seems pretty straightforward. If necessary, you can create a database to store the lookup table (mapping keys to the uniformly-distributed random filenames), so you won't have to rebuild your index every time you start up. If you want to get the benefit of automatic deduplication, you could hash each file's content and use that checksum as the filename (but you would also want to add a check so you don't accidentally discard a file whose checksum matches an existing file even though the contents are actually different).
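The "first n letters" scheme really is only a few lines. Here is a sketch; the class name `ShardedStore` and the rule of mapping anything other than a-z and 0-9 to `_` are my own choices, and it assumes `fileName` is already a plain, trusted name without path separators.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: spreads files over nested single-character directories.
final class ShardedStore {
    private ShardedStore() {}

    /** Maps e.g. "document.pdf" with depth 5 to root/d/o/c/u/m/document.pdf. */
    static Path shardedPath(Path root, String fileName, int depth) {
        Path dir = root;
        for (int i = 0; i < depth; i++) {
            // Names shorter than depth are padded with '_'.
            char c = i < fileName.length() ? Character.toLowerCase(fileName.charAt(i)) : '_';
            // Anything that is not a-z or 0-9 becomes '_' so directory names stay harmless.
            if (!((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))) {
                c = '_';
            }
            dir = dir.resolve(String.valueOf(c));
        }
        return dir.resolve(fileName);
    }

    /** Creates the directories as needed and writes the data; fails if the file already exists. */
    static Path store(Path root, String fileName, InputStream data, int depth) throws IOException {
        Path target = shardedPath(root, fileName, depth);
        Files.createDirectories(target.getParent());
        Files.copy(data, target);
        return target;
    }
}
```

Real file names tend to cluster on a few prefixes, so for an even spread the same function would be fed the random or hash-based names mentioned above instead of the original names.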
Depending on the sizes of the files, you might also consider storing the files themselves in a database--if you do this, it would be trivial to add versioning, and you wouldn't necessarily have to create random filenames because you could reference them using an auto-generated primary key.
Combine the functionality in the java.io package with your own custom solution.
The java.io package can write and read files from disk and create arbitrary directories or sub-directories for new files. There is no external API required.
The versioning or auditing would have to be provided with your own custom solution. There are many ways to handle this, and you probably have a specific need that needs to be filled. Especially if you're concerned about the performance of an open-source API, it's likely that you will get the best result by simply coding a solution that specifically fits your needs.
It sounds like your module should scan all the files on startup and form an index of everything that's available. Based on the method used for sharing and indexing these files, it can rescan the files every so often or you can code it to receive a message from some central server when a new file or version is available. When someone requests a file or provides a new file, your module will know exactly how it is organized and exactly where to get or put the file within the directory tree.
It seems that it would be far easier to just engineer a solution specific to your needs.
Is there some library for using some sort of cursor over a file? I have to read big files, but can't afford to read them all at once into memory. I'm aware of java.nio, but I want to use a higher level API.
A little background: I have a tool written in GWT that analyzes submitted XML documents and then pretty-prints the XML, among other things. Currently I'm writing the pretty-printed XML to a temp file (my lib would throw an OutOfMemoryError if I used plain Strings), but the temp file's size is approaching 18 MB, and I can't afford to answer a GWT RPC with 18 MB :)
So I can have a widget to show only a portion of the xml (check this example), but I need to read the corresponding portion of the file.
Have you taken a look at using FileChannels (i.e., memory mapped files)? Memory mapped files allow you to manipulate large files without bringing the entire file into memory.
Here's a link to a good introduction:
http://www.developer.com/java/other/article.php/1548681
Maybe java.io.RandomAccessFile can be of use to you.
I don't understand when you ask for a "higher level API" when positioning the file pointer. It is the higher levels that may need to control the "cursor". If you want control, go lower, not higher.
I am certain that the lower-level Java I/O classes allow you to position yourself anywhere within a file of any size without reading anything into memory until you want to. I know I have done it before. Try RandomAccessFile as one example.
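For what it's worth, reading just one window of a file with RandomAccessFile looks roughly like this. The class name `FileWindow` and its `read` method are made up for the example; only the requested bytes are ever held in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper: returns one slice of a file, however large the file is.
final class FileWindow {
    private FileWindow() {}

    /** Reads up to length bytes starting at offset; the result is shorter near the end of the file. */
    static byte[] read(Path file, long offset, int length) throws IOException {
        if (length < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            if (offset < 0 || offset > size) {
                throw new IllegalArgumentException("offset outside the file: " + offset);
            }
            int toRead = (int) Math.min(length, size - offset);
            byte[] window = new byte[toRead];
            raf.seek(offset);       // position the "cursor"
            raf.readFully(window);  // read exactly the window, nothing else
            return window;
        }
    }
}
```

The widget would request something like `FileWindow.read(tempFile, page * pageSize, pageSize)` per RPC call. One caveat: offsets are in bytes, so with UTF-8 a window boundary can fall in the middle of a multi-byte character; aligning windows to line breaks would avoid that.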