I am working on an application where I have to convert a .zip file to an array of bytes, and I am using Scala and the Play framework.
As of now I'm using:
val byteOfArray = Source.fromFile("resultZip.zip", "UTF-8").map(_.toByte).toArray
But when I perform operations with byteOfArray I get an error.
I printed byteOfArray and found the result below:
empty parser
Can you please let me know whether this is the correct way to convert a .zip file to an array of bytes?
Also, let me know if there is another good way to convert to an array of bytes.
Your solution is incorrect. UTF-8 is a text encoding, and zip files are binary files. It might happen by accident that a zip file is valid UTF-8, but even in that case UTF-8 can use multiple bytes for a single character, which you'll then convert to a single byte. Source is only intended for working with text files (as you can see from the presence of an encoding parameter, its use of the Char type, etc.). There is nothing in the standard Scala library for working with binary IO.
If you really hate the idea of using the Java standard library (you shouldn't; that's what any Scala solution is going to be based on, and it doesn't get less verbose than a single method call), use better-files (not tested, just based on its README examples):
import better.files._
val file = File("resultZip.zip")
file.bytes.toArray // if you really need an Array and can't work with Iterator
but for this specific case it isn't a real win; you just need to add an extra dependency.
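For reference, the single-method-call Java approach mentioned above is Files.readAllBytes, which you can call directly from Scala since it is just java.nio. A minimal sketch in plain Java (file name taken from the question):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ZipToBytes {
    public static void main(String[] args) throws IOException {
        // Read the whole zip file into memory as raw bytes; no character
        // decoding is involved, so the binary content stays intact.
        byte[] byteOfArray = Files.readAllBytes(Paths.get("resultZip.zip"));
        System.out.println(byteOfArray.length + " bytes read");
    }
}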
I mean a folder that contains files, and other folders with files in them
If you have a folder which contains .zip files and possibly some others in nested folders, you can get all of them with
val zipFiles = File(directoryName).glob("**/*.zip")
and then
zipFiles.map(_.bytes.toArray)
will give you a Seq[Array[Byte]] containing all zip files as byte arrays. Modify to taste if you need to use file names and/or paths, etc. in further processing.
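If you'd rather stay on the plain JDK instead of better-files, a roughly equivalent sketch using Files.walk (class and method names are mine, not tested):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ZipCollector {
    // Collect every *.zip file under directoryName (recursively) as a byte array.
    static List<byte[]> readAllZips(String directoryName) throws IOException {
        try (Stream<Path> paths = Files.walk(Paths.get(directoryName))) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().toLowerCase().endsWith(".zip"))
                .map(p -> {
                    try {
                        return Files.readAllBytes(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
        }
    }
}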
Related
I have an ArrayList of gzipped XML files. Is it possible to view and manipulate the contents of these XML files without unzipping them and taking up disk space? If so, what would be the correct class(es) to use for this task?
I know I can create a GZIPInputStream from a FileInputStream of the zipped file, but from there I'm not sure what to do. I have only this written:
GZIPInputStream in = new GZIPInputStream(new FileInputStream(zippedFiles.get(i)));
I need some way to parse text within the XML files and modify the XML itself, but again, extracting all of them would take up too much disk space.
What exactly are you trying to achieve? You can extract the file into memory using a ByteArrayOutputStream and convert it into a byte array that you forward to your XML parser library (converting it to a String and passing that is not recommended, since the encoding is specified inside the XML file itself and the conversion to String must therefore be done by the XML parser internally). Most XML parsers also support reading directly from any InputStream, so you could pass yours to the parser directly, which will probably further reduce your memory consumption. Disk space is only occupied when writing data back, by simply reversing the described procedure. And since you directly replace the source file by overwriting it, no disk space is wasted.
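A minimal sketch of that round trip (the file name is hypothetical and error handling is omitted): parse straight from the decompressing stream, modify the DOM in memory, then write back through a compressing stream, overwriting the original file:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class GzipXmlRoundTrip {
    public static void main(String[] args) throws Exception {
        File zipped = new File("data.xml.gz"); // hypothetical file name

        // Parse directly from the decompressing stream: nothing is written to
        // disk, and the parser itself handles the encoding declared in the XML.
        Document doc;
        try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(zipped))) {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }

        // ... modify the DOM in memory here ...

        // Reverse the procedure: serialize the DOM back through a compressing
        // stream, overwriting the original file in place.
        try (GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream(zipped))) {
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}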
The fact that they're in a list doesn't change much, but no.
Ignoring compression, files are stored linearly on disk. You can append to them cheaply, and you can replace bytes cheaply, but you can't replace sequences of different lengths (like replace("Testing Procedure Specification", "TPS")) without rewriting the file after the modified substring.
Gzipping the file complicates things, but the same rule applies: in general, making arbitrary modifications to a file requires rewriting the file.
Your code for reading the files is on the right track, though. You can easily read through gzipped files as streams without having to decompress the entire file.
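For example, assuming the gzipped files contain text/XML, you can scan them line by line like this (file name hypothetical); only a small buffer is ever decompressed in memory:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipScan {
    public static void main(String[] args) throws IOException {
        // Decompress on the fly while reading; the file on disk stays gzipped.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream("data.xml.gz")),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // inspect or search each line here
            }
        }
    }
}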
I have a server written in Java that, in a single request, gets a whole file from the client. The file is passed to the server as a list of bytes and is finally represented in the Java server as a byte array.
Is there some standard way / standard library that could tell whether a file represented by a byte array is a valid zip file?
Files are typically identified using magic numbers at the beginning of the file.
To make an educated guess about a given file, Java has a built-in method for detecting some file types: Files.probeContentType. Plus, there are various third-party libraries: simplemagic or Apache Tika (which supports more than just magic numbers).
But content detection alone won't tell you whether the file is valid. For that, you'd need something that actually knows how to read Zip files, such as Java's ZipFile.
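Since the data is already a byte array, a sketch of such a check with the JDK's ZipInputStream might look like this (the method name is mine; note that ZipFile is stricter about the central directory, but it needs an actual file on disk):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipCheck {
    // Rough validity check for an in-memory zip: the stream must yield at
    // least one entry and be readable to the end without throwing.
    static boolean looksLikeValidZip(byte[] data) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                return false; // no entries at all
            }
            while (entry != null) {
                entry = zis.getNextEntry();
            }
            return true;
        } catch (IOException e) {
            return false; // truncated, corrupted, or not a zip
        }
    }
}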
If you want a standard way to implement this, you can use the serialization API. See the following articles, which I found while searching on this topic.
Article 1 - javaworld
Article 2 - developer.com
Check out the Zip4j library. It is really easy to use, and its ZipFile class has an isValidZipFile() method.
The easiest way is to check the "PK" magic at the beginning of the byte array.
Something like this:
"PK".equals(new String(array, 0,2))
I AM NOT LOOKING FOR CODE but an idea of how to approach the problem.
I have multiple text files with the following format
NAME_EMAIL_CONTROL_DATE.txt
NAME_EMAIL_CONTROL2_DATE.txt
I want to zip both of the files for a given DATE.
I am not sure how I can approach the problem.
If the date is stored at a specific, constant spot in all the files (beginning of the file, end of the file), you can use a FileInputStream to read those specific bytes into a buffer and check whether the two files contain the same data. If they do, you can then continue to use that FileInputStream to read the contents of both files into buffers and use a FileOutputStream to create your new combined file.
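A sketch of that idea, assuming the date sits in the first few bytes of each file (the field length and file names are placeholders from the question):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class DateFieldCompare {
    // Read the leading bytes of a file, which are assumed to hold the date.
    static byte[] readHeader(String path, int length) throws IOException {
        byte[] buffer = new byte[length];
        try (FileInputStream in = new FileInputStream(path)) {
            in.read(buffer); // simplified: assumes the file has at least `length` bytes
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        int dateFieldLength = 10; // hypothetical length of the date field
        byte[] first = readHeader("NAME_EMAIL_CONTROL_DATE.txt", dateFieldLength);
        byte[] second = readHeader("NAME_EMAIL_CONTROL2_DATE.txt", dateFieldLength);
        if (Arrays.equals(first, second)) {
            // Same date: write both files' contents into one combined output file.
            try (FileOutputStream out = new FileOutputStream("combined.txt")) {
                out.write(Files.readAllBytes(Paths.get("NAME_EMAIL_CONTROL_DATE.txt")));
                out.write(Files.readAllBytes(Paths.get("NAME_EMAIL_CONTROL2_DATE.txt")));
            }
        }
    }
}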
Assuming that what you mean is that the file NAMES all have dates in them, at the end of their filename 'stems'...
Write a function to make a list of all your files -- given a directory containing the files, use listFiles() to get all of them and compare the date portion to whatever you want, ending up with the list you need.
Then, for each such file, use the zip file creation facility in Java to add it to the archive.
If all of these are in one directory, the command-line zip command to do this would be fairly trivial; the hardest part will be the regular expression for the filenames.
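For the Java route described above, a sketch using listFiles() with a filename filter and ZipOutputStream (the _DATE.txt naming pattern is an assumption based on the question; not tested):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipByDate {
    // Zip every file in `dir` whose name ends with _<date>.txt into targetZip.
    static void zipFilesForDate(File dir, String date, File targetZip) throws IOException {
        File[] matches = dir.listFiles((d, name) -> name.endsWith("_" + date + ".txt"));
        if (matches == null || matches.length == 0) {
            return; // nothing to zip for this date
        }
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(targetZip))) {
            for (File file : matches) {
                zos.putNextEntry(new ZipEntry(file.getName()));
                zos.write(Files.readAllBytes(file.toPath()));
                zos.closeEntry();
            }
        }
    }
}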
There is a built-in method in the Java JDK that detects file types:
Files.probeContentType(Paths.get("/temp/word.doc"));
The javadoc says that a FileTypeDetector may examine the filename, or it may examine a few bytes in the file, which means that it would have to actually try to pull the file from a URL.
This is unacceptable in our app; the content of the file is available only through an InputStream.
I tried to step through the code to see what the JDK is actually doing, but it seems that it goes to FileTypeDetectors.defaultFileTypeDetector.probeContentType(path) which goes to sun.nio.fs.AbstractFileTypeDetector, and I couldn't step into that code because there's no source attachment.
How do I use JDK file type detection and force it to use file content that I supply, rather than having it go out and perform I/O on its own?
The docs for Files.probeContentType() explain how to plug in your own FileTypeDetector implementation, but if you follow the docs you'll find that there is no reliable way to ensure that your implementation is the one that is selected (the idea is that different implementations serve as fallbacks for each other, not alternatives). There is certainly no documented way to prevent the built-in implementation from ever reading the target file.
You can surely find a map of common filename extensions to content types in various places around the web and probably on your own system; mime.types is a common name for such files. If you want to rely only on such a mapping file then you probably need to use your own custom facility, not the Java standard library's.
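Such a custom, name-only facility can be as small as a hand-rolled map (the class name and entries below are illustrative; a real one would load them from a mime.types file):

import java.util.HashMap;
import java.util.Map;

public class ExtensionTypeMap {
    // Tiny extension-to-MIME-type table; never touches the file's bytes.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("txt", "text/plain");
        TYPES.put("csv", "text/csv");
        TYPES.put("xml", "application/xml");
        TYPES.put("zip", "application/zip");
        TYPES.put("jpg", "image/jpeg");
    }

    // Look up a content type purely from the file name; returns null if unknown.
    static String guessFromName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension, no guess
        }
        return TYPES.get(fileName.substring(dot + 1).toLowerCase());
    }
}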
The JDK's Files.probeContentType() simply loads a FileTypeDetector available in your JDK installation and asks it to detect the MIME type. If none is available, it does nothing.
Apache has a library called Tika which does exactly what you want: it determines the MIME type of the given content. It can also be plugged into your JDK so that Files.probeContentType() itself uses Tika. Check this tutorial for quick code - http://wilddiary.com/detect-file-type-from-content/
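With the tika-core dependency on the classpath, content-based detection from a stream is a one-liner; a minimal sketch (not tested):

import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.Tika;

public class TikaDetect {
    // Detect the MIME type from the content itself; Tika only inspects the
    // first bytes of the stream, it never opens a file or URL on its own.
    static String detectType(InputStream in) throws IOException {
        return new Tika().detect(in);
    }
}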
If you are worried about reading the contents of an InputStream, you can wrap it in a PushbackInputStream to "unread" those bytes so the next detector implementation can read them.
Usually a binary file's magic number is 4 bytes, so new PushbackInputStream(in, 4) should be sufficient.
// wrap the original stream so the magic-number bytes can be pushed back
PushbackInputStream pushbackStream = new PushbackInputStream(in, 4);
byte[] magicNumber = new byte[4];
// for this example we assume the whole array is read in one call;
// for production you will need to check that all 4 bytes were read, etc.
pushbackStream.read(magicNumber);
// now figure out the content type based on the magic number
ContentType type = ...
// now push those 4 bytes back so you can read the whole stream
pushbackStream.unread(magicNumber);
// now your downstream process can read pushbackStream as a
// normal InputStream and gets those magic-number bytes back
...
Using a Java servlet, is it possible to detect the true file type of a file, regardless of its extension?
Scenario: you only allow plain text file uploads (.txt and .csv). The user takes the file mypicture.jpg, renames it to mypicture.txt, and proceeds to upload it. Your servlet expects only text files and blows up trying to read the jpg.
Obviously this is user error, but is there a way to detect that it's not plain text and not proceed?
You can do this using the built-in URLConnection#guessContentTypeFromStream() API. It is, however, pretty limited in the content types it can detect; you may be better off using a 3rd-party library like jMimeMagic.
See also:
Best way to determine file type in Java
When do browsers send application/octet-stream as Content-Type?
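A small sketch of the built-in API (note that it needs a stream supporting mark/reset, so wrap the upload's raw InputStream in a BufferedInputStream and keep using that same wrapper afterwards):

import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URLConnection;

public class UploadTypeGuess {
    // Returns a MIME type such as "text/plain" or "image/jpeg",
    // or null if the content is not recognized.
    static String guessType(BufferedInputStream in) throws IOException {
        return URLConnection.guessContentTypeFromStream(in);
    }
}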
No. There is no way to know what type of file is being uploaded. You must make all verifications on the server before taking any action with the file.
I think you should consider why your program might blow up when given a JPEG (say) and make it defensive against this. For example, a JPEG file is likely to have apparently very long lines (any LF or CR LF will be somewhat randomly spread). But a so-called text file could equally have long lines that might kill your program.
What exactly do you mean by "plain text file"? Would a file consisting of Chinese text be a plain text file? If you assume English text in ASCII or ANSI encoding, you would have to read the full file as a binary file and check that, e.g., all byte values are between, say, 32 and 127, plus 13, 10, and 9, maybe.
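A sketch of that byte-level check (the accepted range is an assumption along the lines described above):

public class AsciiTextCheck {
    // Crude "is this ASCII text?" test: accept printable ASCII plus
    // tab (9), line feed (10) and carriage return (13), reject anything else.
    static boolean looksLikeAsciiText(byte[] data) {
        for (byte b : data) {
            int value = b & 0xFF;
            boolean printable = value >= 32 && value <= 126;
            boolean whitespace = value == 9 || value == 10 || value == 13;
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }
}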