Detect if user changed file extension to upload? - java

Using a Java servlet, is it possible to detect the true file type of a file, regardless of its extension?
Scenario: you only allow plain-text file uploads (.txt and .csv). The user takes the file mypicture.jpg, renames it to mypicture.txt, and proceeds to upload it. Your servlet expects only text files and blows up trying to read the JPEG.
Obviously this is user error, but is there a way to detect that it's not plain text and not proceed?

You can do this using the built-in URLConnection#guessContentTypeFromStream() API. It is, however, pretty limited in the content types it can detect; for broader coverage you're better off with a third-party library like jMimeMagic.
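A minimal sketch of that API call (the JPEG header bytes here are fed in from memory just for illustration; in the servlet you would pass the upload's InputStream, which must support mark/reset, hence the BufferedInputStream):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

public class ContentTypeSniffer {
    public static void main(String[] args) throws IOException {
        // JPEG files start with the bytes FF D8 FF
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        // guessContentTypeFromStream requires a stream that supports mark/reset
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(jpegHeader));
        String type = URLConnection.guessContentTypeFromStream(in);
        System.out.println(type); // "image/jpeg"
    }
}
```

If the method returns null, the content didn't match any of the handful of signatures it knows, which is exactly the limitation mentioned above.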
See also:
Best way to determine file type in Java
When do browsers send application/octet-stream as Content-Type?

No. There is no way to know up front what type of file is being uploaded. You must do all verification on the server before taking any action with the file.

I think you should consider why your program might blow up when given a JPEG (say) and make it defensive against this. For example, a JPEG file is likely to contain apparently very long lines (any LF or CR LF bytes will be somewhat randomly spread through it). But a so-called text file could equally have long lines that might kill your program.

What exactly do you mean by "plain text file"? Would a file consisting of Chinese text be a plain text file? If you assume English text in ASCII or ANSI encoding, you would have to read the full file as a binary file and check that, e.g., all byte values are between, say, 32 and 127, plus 13, 10, and maybe 9.
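That byte-range check can be sketched as follows (the accepted ranges mirror the ones suggested above and are only a heuristic for ASCII text; the method name is made up for the example):

```java
public class PlainTextCheck {
    // Accepts printable ASCII (32-126) plus tab (9), LF (10) and CR (13)
    static boolean looksLikeAsciiText(byte[] data) {
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean printable = v >= 32 && v <= 126;
            boolean whitespace = v == 9 || v == 10 || v == 13;
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAsciiText("hello,\tworld\r\n".getBytes())); // true
        // FF D8 is the start of a JPEG header: clearly not ASCII text
        System.out.println(looksLikeAsciiText(new byte[]{(byte) 0xFF, (byte) 0xD8})); // false
    }
}
```

Note this rejects legitimate non-English text (the Chinese example above), so it only fits if "plain text" really means ASCII.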

Related

How to create a binary file?

I want to create my own binary file.
When I opened a .3ds file, at the beginning I found:
MM����
������==���>=
�����������������Standard_1������ ������ ����� ������0����� ������#����0����
�A����0������P����0������R����0������S����0�����������������0�������������
������?��3���0����d������LUMBERJA.PNG�Q������S�
These letters are unreadable. I would like to make my own file which no one can read.
I want to make this in Java on Android. Note that I don't want to use Cipher.
Those letters are not unreadable.
Let's say you don't speak French. At all.
That doesn't make French text 'unreadable'; it is merely unreadable to you. It is perfectly readable to just about everybody who lives in France, for starters.
Your example is no different. It may not be readable to you, but anybody who knows the file format can write software that can read this data.
Making software that can read something, in a way that no other software can be written to read it, is also impossible: your software runs on a device you do not control. The owner of that device can sandbox your application and figure out the format. Even if you try to encrypt the data, that won't work: the owner of the device can use tools to extract the key from your software.
If you want truly 'unreadable' data, either [A] have that data in a location where you can control who gets to read it (example: Host files on a server, not in the application), or [B] encrypt the data and have the user provide the key information for it (for example, a password).
If your aim is merely to be able to produce a binary file, Java's OutputStream can do that. Write whatever bytes you like.
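For example (the file name and payload bytes are arbitrary; on Android you would typically get the stream from Context.openFileOutput() instead of opening a path directly):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BinaryWriter {
    public static void main(String[] args) throws IOException {
        byte[] payload = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x2A};
        // try-with-resources closes the stream even if the write fails
        try (FileOutputStream out = new FileOutputStream("mydata.bin")) {
            out.write(payload);
        }
        // read it back to show the bytes survive the round trip unchanged
        byte[] back = Files.readAllBytes(Path.of("mydata.bin"));
        System.out.println(back.length); // 6
    }
}
```

Anyone who figures out the layout of those bytes can read them back just as easily, which is the point made above.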

Zip folder to Array of byte using Scala

I am working on an application where I have to convert a .zip file to an array of bytes, and I am using Scala and the Play framework.
As of now I'm using:
val byteOfArray = Source.fromFile("resultZip.zip", "UTF-8").map(_.toByte).toArray
But when I perform operations with byteOfArray I get an error.
I have printed byteOfArray and found the result as below
empty parser
Can you please let me know if this is the correct way to convert a .zip file to an array of bytes?
Also let me know if there is a better way to do the conversion.
Your solution is incorrect. UTF-8 is a text encoding, and zip files are binary files. It might happen by accident that a zip file is a valid UTF-8 file, but even in this case UTF-8 can use multiple bytes for a single character, which you'll then collapse into a single byte. Source is only intended for text files (as you can see from the presence of the encoding parameter, the use of the Char type, etc.). There is nothing in the standard Scala library for binary IO.
If you really hate the idea of using the Java standard library (you shouldn't; that's what any Scala solution is going to be based on, and it doesn't get less verbose than a single method call), use better-files (not tested, just based on the README examples):
import better.files._
val file = File("resultZip.zip")
file.bytes.toArray // if you really need an Array and can't work with Iterator
but for this specific case it isn't a real win, you just need to add an extra dependency.
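For reference, the single Java standard-library call the answer alludes to, callable unchanged from Scala, is Files.readAllBytes. A sketch (the file here is created in place just so the example is self-contained; the four bytes are the zip magic number "PK\3\4"):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ZipBytes {
    public static void main(String[] args) throws IOException {
        // stand-in for resultZip.zip: write the zip magic bytes to disk
        Path zip = Files.write(Path.of("resultZip.zip"),
                new byte[]{0x50, 0x4B, 0x03, 0x04});
        // the whole file, byte for byte, with no charset involved
        byte[] byteOfArray = Files.readAllBytes(zip);
        System.out.println(byteOfArray.length); // 4
    }
}
```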
I mean a folder that contains files and other folders which have files in them
If you have a folder which contains .zip files and possibly some others in nested folders, you can get all of them with
val zipFiles = File(directoryName).glob("**/*.zip")
and then
zipFiles.map(_.bytes.toArray)
will give you a Seq[Array[Byte]] containing all zip files as byte arrays. Modify to taste if you need to use file names and/or paths, etc. in further processing.

How can I output data with special characters visible?

I have a text file that was provided to me and no one knows the encoding on it. Looking at it in a text editor, everything looks fine, aligned properly into neat columns.
However, I'm seeing some anomalies when I read the data. Even though, visually, the field "Foo" appears in the same columns in the text file (for instance, in columns 15-20), when I try to pull it out using substring(15,20) my data varies wildly. Sometimes I'll pull bytes 11-16, sometimes 18-23, sometimes 15-20...there's no consistency between records.
I suspect that there are some special characters, invisible in my text editor, but readable by (and counted in the indices of) the String methods. Is there any way in Java to dump the contents of the file with all special characters made visible, so I can see which strings I need to replace with a regex?
If not in Java, can anyone recommend a tool that may be able to help me out?
I would start by having a look at the file directly. Any code adds a layer of doubt. Take Total Commander (or an equivalent on your platform), view the file (F3), and switch to hex mode. You suggest that the special-character behavior is not even consistent between lines, so you should get some visual clue about the format before you even attempt to fix it algorithmically.
Have you tried printing the contents of the file as individual integers or bytes? That way you can see if there are any hidden characters.
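A sketch of that idea: dump each byte as two hex digits so invisible characters stand out (the sample data here is made up, with a no-break space 0xA0 planted as the "invisible" culprit):

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    public static void main(String[] args) {
        // "\u00A0" is a no-break space: invisible in most editors
        byte[] data = "col1\tcol2\u00A0end\n".getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b)); // two hex digits per byte
        }
        System.out.println(sb.toString().trim());
        // the tab shows up as 09, the no-break space as A0, the newline as 0A
    }
}
```

In a real run you would read the bytes with Files.readAllBytes and dump them the same way, rather than using an in-memory string.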

Get file's encoding in Java [duplicate]

Possible Duplicate:
Java : How to determine the correct charset encoding of a stream
A user will upload a CSV file to the server, and the server needs to check whether the CSV file is encoded as UTF-8; if the check fails, it needs to inform the user that (s)he uploaded a file with the wrong encoding. The problem is how to detect whether the uploaded file is UTF-8 encoded. The back end is written in Java. Does anyone have a suggestion?
At least in the general case, there's no way to be certain what encoding is used for a file -- the best you can do is a reasonable guess based on heuristics. You can eliminate some possibilities, but at best you're narrowing down the possibilities without confirming any one. For example, most of the ISO 8859 variants allow any byte value (or pattern of byte values), so almost any content could be encoded with almost any ISO 8859 variant (and I'm only using "almost" out of caution, not any certainty that you could eliminate any of the possibilities).
You can, however, make some reasonable guesses. For example, if a file starts out with the three bytes of a UTF-8 encoded BOM (EF BB BF), it's probably safe to assume it's really UTF-8. Likewise, if you see sequences like 110xxxxx 10xxxxxx, it's a pretty fair guess that what you're seeing is encoded with UTF-8. You can eliminate the possibility that something is (correctly) UTF-8 encoded if you ever see a sequence like 110xxxxx 110xxxxx (110xxxxx is a lead byte of a sequence, which must be followed by a non-lead byte, not another lead byte, in properly encoded UTF-8).
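One way to apply those bit-pattern rules without hand-rolling them is a strict CharsetDecoder, which rejects any byte sequence that is not well-formed UTF-8. A sketch (remember the caveat above: a file that decodes cleanly is still only probably UTF-8, since pure ASCII is also valid UTF-8):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    static boolean isValidUtf8(byte[] data) {
        try {
            // REPORT makes the decoder throw instead of silently substituting
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "héllo".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 0xC3, (byte) 0xC3}; // two 110xxxxx lead bytes in a row
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```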
You can try and guess the encoding using a 3rd party library, for example: http://glaforge.free.fr/wiki/index.php?wiki=GuessEncoding
Well, you can't. You could show a kind of "preview" (or should I say review?) with some sample data from the file so the user can check whether it looks okay, perhaps with the possibility of selecting different encoding options to help determine the correct one.

How to retrieve forbidden characters for filenames, in Java?

There are some restricted characters (and even full filenames, in Windows), for file and directory names. This other question covers them already.
Is there a way, in Java, to retrieve this list of forbidden characters, depending on the system (a bit like retrieving the line-break characters)? Or do I have to hard-code the list myself, per system?
Edit: More background on my particular situation, aside from the general question.
I use a default name, coming from some data (I have no real control over its content), and this name is given to a JFileChooser as a default file name to use (with setSelectedFile()). However, the chooser truncates everything prior to the last invalid character.
These default names occasionally end with dates in "mm/dd/yy" format, which leaves only the "yy" in the default name, because "/" is forbidden. As such, checking for exceptions is not really an option here, because the file itself is not even created yet.
Edit bis: Hmm, that makes me think: if JFileChooser is truncating the name, it probably has access to a list of such characters; it could be interesting to check that further.
Edit ter: OK, checking the sources of JFileChooser shows something completely simple. For the text field, it uses file.getName(). It doesn't actually check for invalid characters; it simply takes the "/" as a path separator and keeps only the end, the "actual filename". Other forbidden characters actually go through.
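That truncation is easy to reproduce directly: getName() just returns everything after the last path separator, and File treats "/" as a separator on every platform (the file name below is an invented example in the "mm/dd/yy" shape described above):

```java
import java.io.File;

public class GetNameDemo {
    public static void main(String[] args) {
        // a default name that happens to end with a date containing slashes
        File f = new File("Report 12/31/99");
        // everything before the last separator is treated as the directory part
        System.out.println(f.getName()); // "99"
    }
}
```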
When it comes to dealing with "forbidden" characters I'd rather be overcautious and ban/replace all "special" characters that may cause a problem on any filesystem.
Even if technically allowed, sometimes those characters can cause weirdness.
For example, we had an issue where PDF files were being written (successfully) to a SAN, but when served up via a web server from that location, some of the characters would cause issues when we embedded the PDF in an HTML page rendered in Firefox. It was fine if the PDF was accessed directly, and it was fine in other browsers. Some weird error in how Firefox and Adobe Reader interact.
Summary: "Special" characters in file names -> weird errors waiting to happen
Ultimately, the only way to be sure is to use a white-list.
Having certain "forbidden characters" is just one of many things that can go wrong when creating a file (others are access rights and file and path name lengths).
It doesn't really make sense to try to catch some of these early when there are others you can't catch until you actually try to create the file. Just handle the exceptions properly.
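A sketch of that exception-driven approach (the method name is invented for the example; whether the "ok" case succeeds also depends on write permission in the working directory, which is exactly the point: the filesystem is the final authority):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

public class CreateWithHandling {
    static boolean tryCreate(String name) {
        try {
            Path p = Path.of(name);       // may throw InvalidPathException
            Files.createFile(p);          // may throw IOException (perms, length, ...)
            Files.delete(p);              // clean up the probe file
            return true;
        } catch (InvalidPathException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("ok-name.txt"));
        // a NUL byte is invalid in file names on every mainstream filesystem
        System.out.println(tryCreate("bad\u0000name.txt")); // false
    }
}
```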
Have you tried using File.getCanonicalPath and comparing it to the original file name (or whatever is retrieved from getAbsolutePath)?
This will not give you the actual characters, but it may help you in determining whether this is a valid filename in the OS you're running on.
Have a look at this link for some info on how to get the OS the application is running on. Basically you need to use System.getProperty("os.name") and do an equals() or contains() to find out the operating system.
Something to be wary of, though, is that knowing the OS does not necessarily tell you the underlying file system being used; for example, a Mac can read and write a FAT32 file system.
source: http://www.mkyong.com/java/how-to-detect-os-in-java-systemgetpropertyosname/
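A sketch of the os.name check described above (the substring matches are the usual convention, not an official API; as the caveat notes, this identifies the OS, not the filesystem):

```java
public class OsDetect {
    public static void main(String[] args) {
        String os = System.getProperty("os.name").toLowerCase();
        if (os.contains("win")) {
            System.out.println("Windows");
        } else if (os.contains("mac")) {
            System.out.println("macOS");
        } else if (os.contains("nix") || os.contains("nux")) {
            System.out.println("Linux/Unix");
        } else {
            System.out.println("Unknown: " + os);
        }
    }
}
```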
