I have 100 short XML files in a folder and I'd like to know which of them contain the text aaabbbccc (for later, more thorough parsing). Is it a good idea to read them as Strings one after another and use the contains method to determine which files do not contain this text?
As far as I know, the contains method is very fast.
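For 100 short files this brute-force approach is perfectly reasonable; a minimal sketch (file extension and search string from the question, everything else assumed):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class XmlScan {
    // Returns the .xml files in `dir` whose full contents contain `needle`.
    // Reading each short file entirely and calling contains is fine at this scale.
    static List<Path> filesContaining(Path dir, String needle) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path p : files) {
                if (Files.readString(p).contains(needle)) {
                    hits.add(p);
                }
            }
        }
        return hits;
    }
}
```

Note that Files.readString decodes as UTF-8 by default, which matches most XML files; for other encodings you would pass an explicit Charset.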
I am not looking for code, but an idea on how to approach the problem.
I have multiple text files with the following format
NAME_EMAIL_CONTROL_DATE.txt
NAME_EMAIL_CONTROL2_DATE.txt
I want to zip both files together for a given DATE.
I am not sure how I can approach the problem.
If the date is stored at a specific, constant spot in all the files (the beginning or the end of the file), you can use a FileInputStream to read those specific bytes into a buffer and check whether the two files contain the same data. You could then use the same FileInputStream to read the contents of both files into buffers, and a FileOutputStream to create your new combined file.
Assuming that what you mean is that the file NAMES all have dates in them, at the end of their filename 'stems'...
Write a function to make a list of all your files -- given a directory containing the files, use listFiles() to get a list of all of them and compare the date portion to whatever you want, ending up with a list.
Then, for each such file, use the zip-file creation facility in Java (java.util.zip) to add it to the archive.
If all of these are in one directory, the command line zip command to do this will be fairly trivial, the hardest part will be the regular expression for the filename.
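The two steps above (list the matching files, then zip them) can be sketched as follows; this assumes the DATE portion is the last underscore-separated token before .txt, as in the filenames shown:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipByDate {
    // Zips every *.txt file in `dir` whose name ends with "_" + date + ".txt".
    static void zipForDate(Path dir, String date, Path zipFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out);
             // The glob does the filename matching, so no regex is needed here.
             DirectoryStream<Path> files =
                     Files.newDirectoryStream(dir, "*_" + date + ".txt")) {
            for (Path p : files) {
                zip.putNextEntry(new ZipEntry(p.getFileName().toString()));
                Files.copy(p, zip); // copy file contents into the current entry
                zip.closeEntry();
            }
        }
    }
}
```

Using a glob pattern on the DirectoryStream sidesteps the regular-expression part entirely for this simple layout.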
I am working on an application where I have to convert a .zip file to an array of bytes, and I am using Scala and the Play framework.
As of now I'm using,
val byteOfArray = Source.fromFile("resultZip.zip", "UTF-8").map(_.toByte).toArray
But when I perform operations on byteOfArray I get an error.
I printed byteOfArray and got the result below:
empty parser
Can you please let me know whether this is the correct way to convert a .zip file to an array of bytes? Also, let me know if there is a better way to do the conversion.
Your solution is incorrect. UTF-8 is a text encoding, and zip files are binary files. It might happen by accident that a zip file is a valid UTF-8 file, but even in this case UTF-8 can use multiple bytes for a single character which you'll then convert to a single byte. Source is only intended to work with text files (as you can see from the presence of encoding parameter, Char type use, etc.). There is nothing in the standard Scala library to work with binary IO.
If you really hate the idea of using Java standard library (you shouldn't; that's what any Scala solution is going to be based on, and it doesn't get less verbose than a single method call), use better-files (not tested, just based on README examples):
import better.files._
val file = File("resultZip.zip")
file.bytes.toArray // if you really need an Array and can't work with Iterator
but for this specific case it isn't a real win, you just need to add an extra dependency.
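For reference, the single-method-call version mentioned above uses only the Java standard library (and is callable directly from Scala, since java.nio.file is on the classpath anyway):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ZipBytes {
    // Reads the whole file as raw bytes -- no text decoding is involved,
    // so the binary zip content comes through unchanged.
    static byte[] read(Path zip) throws IOException {
        return Files.readAllBytes(zip);
    }
}
```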
I mean a folder that contains files, and other folders inside it that also contain files.
If you have a folder which contains .zip files and possibly some others in nested folders, you can get all of them with
val zipFiles = File(directoryName).glob("**/*.zip")
and then
zipFiles.map(_.bytes.toArray)
will give you a Seq[Array[Byte]] containing all zip files as byte arrays. Modify to taste if you need to use file names and/or paths, etc. in further processing.
Here is the thing: I have to store simple data which I will define once (manually). I need to be able to search it by keywords afterwards. It looks like this:
My first item
title:my_title
description:my_description (long, few hundreds words)
keyword1:my_keyword1
keywordx:my_keywordx
And I want a lot of items like this. For example 100 or 1000.
Then, in code, I want a search function that looks for specific items (the result may be several items, not just one) based on keywords, and shows the result as text, in a TextView field for example.
Do you have any idea how I should store this data? I would prefer an .xml file (the person who will create the data is not a programmer, so it will be much easier for them).
Put your data in JSON format in a text file, create a res/raw folder, and put that text file in there. From your activity, read the file from res/raw and parse the JSON.
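Once the JSON is parsed into a list of items, the keyword search itself is straightforward. A minimal plain-Java sketch of that search step (the Item structure and field names are assumptions, not part of the original answer):

```java
import java.util.ArrayList;
import java.util.List;

public class ItemSearch {
    // One data item: title, description, and its list of keywords.
    record Item(String title, String description, List<String> keywords) {}

    // Returns every item whose keyword list contains the query, case-insensitively.
    // Several items may match, which mirrors the multi-result requirement.
    static List<Item> search(List<Item> items, String query) {
        List<Item> result = new ArrayList<>();
        for (Item item : items) {
            for (String kw : item.keywords()) {
                if (kw.equalsIgnoreCase(query)) {
                    result.add(item);
                    break;
                }
            }
        }
        return result;
    }
}
```

In the activity, the matching items' titles and descriptions would then be concatenated and set on the TextView.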
So I am working on a GAE project. I need to look up cities, country names, and country codes for sign-ups, LBS, etc.
Now I figured that putting all this information in the Datastore is rather wasteful, as it will be queried frequently and will eat into my Datastore quotas for no reason, especially since these lists aren't going to change, so it's pointless to put them in the Datastore.
Now that leaves me with a few options:
API - No budget for paid services, and the free ones are not exactly reliable.
Upload a parse-able file - the favorable option, as I like the certainty that the data will always be there.
So I got the files needed from GeoNames (link has source files for all countries in case someone needs it). The file for each country is a regular UTF-8 tab delimited file which is great.
However, now that I have the option to choose how to format and access the data, the question is:
What is the best way to format and systematically retrieve data from a static file in a Java servlet container?
The best way being the fastest, and least resource hungry method.
Valid options:
TXT file, tab delimited
XML file Static
Java Class with Tons of enums
I know that importing the country files as Java enums and iterating over their values will be very fast, but do you think this will consume an unreasonable amount of memory? On the other hand, if I read a file line by line, then every time I need a record the loop will go through a few thousand lines until it finds it: no memory issues, but incredibly slow. I have had some experience parsing an Excel file in a Java servlet, and it took something like 20 seconds just to parse 250 records; at a larger scale the response WILL time out (no doubt about it). So, is parsing XML anything like parsing Excel?
Thank you very much, guys! Please share your opinions; any and all input is appreciated!
The easiest and fastest way would be to ship the file as a static web resource under the WEB-INF folder and, on application startup, use a context listener to load it into memory.
In memory it should be a Map, keyed by whatever you want to search by. This gives you roughly constant access time.
Memory consumption only matters if the data set is really big. A hundred thousand records, for example, is not worth optimizing away if you access them many times.
The static file should be plain text or CSV; those are read and parsed most efficiently. There is no need for XML formatting, as parsing it would be slow.
If the list is really big, you can break it up into multiple smaller files, and parse only those that are required, only when they are required. A reasonable, easy partitioning would be by country, but any other partitioning would work (for example by the first few characters of the name).
You could also consider building this Map in the memory once, and then serialize this map to a binary file, and include that binary file as a static resource file, and that way you would only have to deserialize this Map and would be no need to parse/process it as a text file and build objects yourself.
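The load-into-a-Map step might look like the sketch below; the two-column layout (code, then name) is an assumption, since the real GeoNames files are tab-delimited with more columns:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

public class CountryIndex {
    // Parses lines of the form "code,name" into a map keyed by country code.
    // In a servlet container this would run once, from a ServletContextListener,
    // with a reader over the WEB-INF resource stream.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> byCode = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma > 0) {
                    byCode.put(line.substring(0, comma), line.substring(comma + 1));
                }
            }
        }
        return byCode;
    }
}
```

After this one-time parse, every lookup is a single map access instead of a scan through the file.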
Improvements on the data file
An alternative to having the static resource file as a text/CSV file or a serialized Map data file would be to have it as a binary data file with your own custom file format.
Using DataOutputStream you can write data to a binary file in a very compact and efficient way. Then you could use DataInputStream to load data from this custom file.
This solution has the advantage that the file can be much smaller (compared to plain text / CSV / a serialized Map), and loading it can be much faster (because DataInputStream doesn't parse numbers from text, for example; it reads the bytes of a number directly).
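A round trip through such a custom binary format can be sketched as follows (the record layout, a count followed by code/name pairs, is an assumption):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class BinaryCountryFile {
    // Writes the map as: record count, then (code, name) UTF pairs.
    static byte[] write(Map<String, String> byCode) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(byCode.size());
            for (Map.Entry<String, String> e : byCode.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
        return buf.toByteArray();
    }

    // Reads the same format back into a map -- no text parsing involved.
    static Map<String, String> read(byte[] data) throws IOException {
        Map<String, String> byCode = new HashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byCode.put(in.readUTF(), in.readUTF());
            }
        }
        return byCode;
    }
}
```

In practice you would write the bytes to a file in WEB-INF and read them back at startup; byte arrays are used here only to keep the sketch self-contained.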
Hold the data in source form as XML. At start of day, or when it changes, read it into memory: that's the only time you incur the parsing cost. There are then two main options:
(a) your in-memory form is still an XML tree, and you use XPath/XQuery to query it.
(b) your in-memory form is something like a Java HashMap
If the data is very simple then (b) is probably best, but it only allows you to do one kind of query, which is hard-coded. If the data is more complex or you have a variety of possible queries, then (a) is more flexible.
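Option (a) can be done with only the JDK's built-in DOM and XPath support; a minimal sketch (the element names in the example query are made up):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlLookup {
    // Parses the XML once; this is the only time the parsing cost is paid.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Runs an arbitrary XPath query against the in-memory tree.
    static String query(Document doc, String xpath) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate(xpath, doc);
    }
}
```

Because the query is just a string, option (a) supports ad hoc lookups that a hard-coded HashMap key cannot.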
I have 2 XML files (one around 50 MB, the other around 120 KB) that can be linked using a common field. I need to read the files, join them on the common field, and produce a combined output in CSV.
Can someone please advise what the most efficient way to do that in Java would be?
Here is a good example to look at ( http://www.developerfusion.com/code/2064/a-simple-way-to-read-an-xml-file-in-java/ ). You will have to tweak it a little, but it shows how to access an XML file (in your case, two files); then, while processing each parent node, you can get the child nodes, do your matching, and write out the CSV. Let me know if you need more information or get stuck.
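The join itself can be sketched like this: index the small file into a Map first, then scan the large file once. The element name `record` and the `id` attribute are placeholders for whatever the common field actually is; for a 50 MB input a streaming parser (StAX/SAX) would be more memory-friendly than DOM, but DOM keeps the sketch short:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlJoin {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Joins <record id="..."> elements from both documents on the id attribute
    // and emits one CSV line per match: id,valueFromBig,valueFromSmall.
    static String join(Document big, Document small) {
        // Index the small (120 KB) file so the big file is scanned only once.
        Map<String, String> smallById = new HashMap<>();
        NodeList smallRecords = small.getElementsByTagName("record");
        for (int i = 0; i < smallRecords.getLength(); i++) {
            Element e = (Element) smallRecords.item(i);
            smallById.put(e.getAttribute("id"), e.getTextContent());
        }
        StringBuilder csv = new StringBuilder();
        NodeList bigRecords = big.getElementsByTagName("record");
        for (int i = 0; i < bigRecords.getLength(); i++) {
            Element e = (Element) bigRecords.item(i);
            String other = smallById.get(e.getAttribute("id"));
            if (other != null) {
                csv.append(e.getAttribute("id")).append(',')
                   .append(e.getTextContent()).append(',')
                   .append(other).append('\n');
            }
        }
        return csv.toString();
    }
}
```

Building the hash index from the smaller file makes the whole join roughly linear in the size of the large file, instead of a nested scan.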