Which of these ways is better (faster, less storage)?
Save thousands of xyz.properties files, each with about 30 keys/values
Save one .properties file with all the data in it, about 30,000 keys/values
I think there are two aspects here:
As Guenther has correctly pointed out, dealing with files comes with overhead: you need file handles, and possibly other data structures that deal with files, so there may be many different levels at which having one huge file is better than having many small files.
But there is also "maintainability". Meaning: from a developer's point of view, dealing with a property file that contains 30K keys/values is something you really don't want to get into. If everything is in one file, you have to constantly update (and deploy) that one huge file. One change, and the whole file needs to go out. Will you have mechanisms in place that allow for "run-time" reloading of properties, or would that mean your application has to shut down? And how often will it happen that you have duplicates in that large file, or worse: you put a value for property A on line 5082, and then somebody doesn't pay attention and overrides property A on line 29732? There are many things that can go wrong, just because all that stuff is in one file that no human being can digest anymore. And rest assured: debugging something like that will be hard.
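To make the duplicate problem concrete: Properties.load() lets a later entry silently override an earlier one. Here is a minimal sketch that flags duplicates while loading; it relies on load() dispatching through put(), which the stock JDK implementation does, and the file name is made up:

import java.io.FileReader;
import java.io.Reader;
import java.util.Properties;

// Sketch: Properties.load() calls put() for every entry, so overriding put()
// makes silently-overridden duplicate keys visible.
public class DuplicateAwareProperties extends Properties {
    @Override
    public synchronized Object put(Object key, Object value) {
        if (containsKey(key)) {
            System.err.println("Duplicate key: " + key);
        }
        return super.put(key, value);
    }

    public static void main(String[] args) throws Exception {
        Properties props = new DuplicateAwareProperties();
        try (Reader reader = new FileReader("huge.properties")) { // hypothetical file
            props.load(reader);
        }
    }
}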
I just gave you some questions to think about; you might want to step back and gather more requirements on your end.
In any case, you might want to look into a solution where developers deal with many small property files (you know, like one file per functionality), and then use tooling to build the one large file used in the production environment.
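As a sketch of that build step, assuming the small files live under a hypothetical directory src/main/properties, something like this could merge them at build time:

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.Properties;
import java.util.stream.Stream;

// Sketch: merge many small .properties files into the single big file
// deployed to production. Later files silently override earlier keys,
// so combine this with a duplicate check like the one shown above.
public class PropertiesMerger {
    public static void main(String[] args) throws IOException {
        Properties merged = new Properties();
        try (Stream<Path> paths = Files.walk(Paths.get("src/main/properties"))) { // hypothetical dir
            paths.filter(p -> p.toString().endsWith(".properties"))
                 .forEach(p -> {
                     // .properties files are traditionally ISO-8859-1 encoded
                     try (Reader reader = Files.newBufferedReader(p, StandardCharsets.ISO_8859_1)) {
                         merged.load(reader);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
        try (Writer writer = Files.newBufferedWriter(Paths.get("all.properties"),
                                                     StandardCharsets.ISO_8859_1)) {
            merged.store(writer, "generated - do not edit by hand");
        }
    }
}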
Finally: if your application really needs 30K properties, then you should worry much more about the quality of your product. In my eyes, this isn't just a design "smell"; it sounds like a design stench. Meaning: no reasonable application should require 30K properties in order to function.
Opening and closing thousands of files is a major source of operating-system overhead, so you'd probably be best off with one big file.
I got three files for internationalization: messages_es.properties, messages_en.properties and messages_pt.properties. Those files follow the rule:
message1=value
message2=value2
and the values change according to the file. Example:
messages_en.properties:
hello=welcome
messages_pt.properties:
hello=bem vindo
The problem is that over the course of the project those files became inconsistent: lines that exist in one file don't exist in the others, and the lines are not ordered the same way. I want to know if there is an easy way to rearrange and format these i18n files, so that lines existing in one file but missing from another get copied over, and all files end up in the same order.
Interesting question. You are dealing with text files, so there are a lot of possible options to manage this situation, but it depends on your scenario (source control, IDE, etc.).
If you are using Eclipse, check: http://marketplace.eclipse.org/content/eclipse-resourcebundle-editor
And for IntelliJ: https://www.jetbrains.com/idea/features/i18n_support.html
Yes, the messages should usually appear in each file, unless there's a default message for some key that doesn't need translating (perhaps technical terms). Different IDEs have different support for managing message files.
As far as ordering the messages, there's no technical need to do so, but it can help the human maintainers. Any text-editor's sort routine will work just fine.
The NetBeans IDE has a properties editor that works across languages, displaying them side by side in a matrix. Similarly, there are stand-alone editors that let you do this. One would assume that such an editor keeps the source text synchronized and in one consistent layout.
First go looking for a translator's editor that can maintain a fixed layout. A format like gettext (.po/.pot), which is similar to .properties, might be a better choice, depending on the tool.
For more than three languages it would make sense to use a source format more directed at translators, like the XML format XLIFF (though .properties files are well known), and to generate the several .properties files (via XSLT, perhaps), or even ListResourceBundles, from that source.
The effort for i18n should not stop at providing a list of phrases to translate; add some info where needed (a disambiguating note), and maybe even a glossary for consistent use of the same terms. The text presented to the user is a very significant part of the product's quality and appeal. Using different synonyms may make the user interface fuzzy, needlessly unclear, tangled.
The problem you are facing is a broken localization process. It has nothing to do with properties files, and you probably shouldn't even compare these files now (that is, not until you fix the process).
To compare properties files, you can use a very simple trick: sort each one of them and use a standard diff tool to show the differences. Sure, you'll miss the comments and the logical arrangement of the English file, but at least you can see what's going on. It can be done, but it is a lot of manual work.
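A sketch of that sort trick in code, writing each file's entries in key order so a plain diff lines up (values are written raw, without escaping, which is fine for comparison purposes):

import java.io.*;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Sketch: normalize a .properties file into sorted key=value lines
// so two normalized files can be compared with any standard diff tool.
public class PropertiesNormalizer {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (Reader in = new FileReader(args[0])) {
            props.load(in);
        }
        Map<String, String> sorted = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            sorted.put(key, props.getProperty(key));
        }
        try (PrintWriter out = new PrintWriter(new FileWriter(args[0] + ".sorted"))) {
            sorted.forEach((key, value) -> out.println(key + "=" + value));
        }
    }
}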
Instead of manually fixing the files, you should fix the broken process. A successful localization process looks basically like this:
Once the English file is modified, send it out for translation. By that I mean all the translations should be based on the English file, and the localization files should be recreated (stay tuned).
Use Translation Memory to fill in the translations you already have. This can be done by your translation service provider, or by yourself if you really know how to do it (guess what? it is difficult).
Have the translators translate the strings that are missing.
Put the localized files back.
Before releasing the software to the public, have somebody walk the linguistic reviewer through the UI and correct any mistranslations.
I intentionally skipped a few steps (localization testing, using pseudo-translations, searching for i18n defects, etc.), but if you use this kind of process, your properties files should always be in sync.
And now your question could be reduced to the one that was already asked (and answered):
Managing the localization of Java properties files.
Look at java.util.PropertyResourceBundle. It is a convenience class for reading a property file, and you can obtain a Set<String> of the keys. This should help with comparing the contents of several resource files.
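For instance, a minimal sketch of such a comparison, using the two file names from the question:

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.PropertyResourceBundle;
import java.util.Set;

// Sketch: report keys present in the English file but missing from the Portuguese one.
public class BundleKeyDiff {
    private static Set<String> keysOf(String file) throws IOException {
        try (Reader reader = new FileReader(file)) {
            return new HashSet<>(new PropertyResourceBundle(reader).keySet());
        }
    }

    public static void main(String[] args) throws IOException {
        Set<String> missing = keysOf("messages_en.properties");
        missing.removeAll(keysOf("messages_pt.properties"));
        missing.forEach(key -> System.out.println("missing in pt: " + key));
    }
}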
But I think that a better approach is to maintain the n languages in a single file, e.g. using XML, and to generate the resource files from that single source:
<entry>
  <key>somekey</key>
  <value lang="en">good bye</value>
  <value lang="es">hasta luego</value>
</entry>
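One possible shape of that generation step, a sketch assuming the entries sit in a hypothetical messages.xml with a root element wrapping them:

import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: read the single XML source and emit one messages_<lang>.properties per language.
public class BundleGenerator {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("messages.xml")); // hypothetical source
        Map<String, Properties> perLanguage = new HashMap<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String key = entry.getElementsByTagName("key").item(0).getTextContent();
            NodeList values = entry.getElementsByTagName("value");
            for (int j = 0; j < values.getLength(); j++) {
                Element value = (Element) values.item(j);
                perLanguage.computeIfAbsent(value.getAttribute("lang"), lang -> new Properties())
                           .setProperty(key, value.getTextContent());
            }
        }
        for (Map.Entry<String, Properties> e : perLanguage.entrySet()) {
            try (Writer out = new FileWriter("messages_" + e.getKey() + ".properties")) {
                e.getValue().store(out, "generated from messages.xml");
            }
        }
    }
}

Because every language comes from the same <entry>, a key can never exist in one generated file but not another, which removes the inconsistency problem by construction.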
I recently started working on a POORLY designed and developed web application. I am finding that it uses about 300 properties files, and all the properties files are being read somewhat like this:
Properties prop= new Properties();
FileInputStream fisSubsSysten = new FileInputStream("whatever.properties");
prop.load(fisSubsSysten);
That is, it is reading the properties files from the current working directory. Another problem is that the developers have chosen to use the above lines multiple times within the same Java file. For example, if there are 10 methods, each method will contain the above code instead of there being one method that is called wherever necessary.
This means we can NEVER change the location of the properties files; currently they sit directly under the WebSphere profiles directory. Isn't this ugly? If I move them somewhere else and put that location on the classpath, it does not work.
I tried changing the above lines like this, using Spring's resource-loading utilities:
Resource resource = new ClassPathResource("whatever.properties");
Properties prop = PropertiesLoaderUtils.loadProperties(resource);
But this application has over 1000 files, and I am finding it impossible to change each one. How would you go about refactoring this mess? Is there any easy way around it?
Thanks!
In these cases of "refactoring" I use a simple find-and-replace approach. Notepad++ has a "Find in Files" feature, and there are plenty of similar programs.
Create a class which does the properties loading, with a method that takes a name parameter for the property file (a sketch follows after these steps).
This can be a Java singleton or a Spring bean.
Search and replace all "new Properties()" lines with an empty line.
Replace all "load..." lines with a reference to your new class/method. Notepad++ supports regex replacement, so you can use the file name as a parameter.
Once this is done, go to Eclipse and launch a "cleanup" or "organize imports", and fix the remaining compile errors manually if needed.
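A minimal sketch of that loader class, here as a plain singleton that loads from the classpath and caches; the class and method names are made up:

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: one central place replacing the scattered "new Properties(); load(...)" blocks.
public final class PropertyLoader {
    private static final ConcurrentHashMap<String, Properties> CACHE = new ConcurrentHashMap<>();

    private PropertyLoader() {}

    public static Properties load(String name) {
        return CACHE.computeIfAbsent(name, n -> {
            Properties props = new Properties();
            try (InputStream in = PropertyLoader.class.getClassLoader().getResourceAsStream(n)) {
                if (in == null) {
                    throw new IllegalArgumentException(n + " not found on classpath");
                }
                props.load(in);
                return props;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}

Each repeated block then collapses to a one-liner like Properties prop = PropertyLoader.load("whatever.properties");, and the physical location of the files is decided in exactly one place.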
This approach is quite straightforward and takes no more than 10 minutes if you are lucky, or an hour if you are unlucky, e.g. when the code formatting is way off and each file looks different.
You can make the replace simpler if you format the project once beforehand with a line length of 300 or more, so that each Java statement is on one line. This makes find-and-replace a bit easier, as you don't have newlines to consider.
I can only agree that your project sounds a bit daunting, from your description.
However, the choice of how to maintain or improve it is a risk that merely needs to be assessed and prioritised.
Consider building a high-rise and subsequently realising that the bolts holding the infrastructure have a design flaw. The prospect of replacing them all is daunting as well, so it warrants consideration of how to change them and whether they really, really need to be replaced: few, many or all.
I assume it must be a core system for the company, which somebody built and then probably left the project (?), and you are now weighing improvement against maintenance. But again, you must assess whether it really is important to move your property files, or whether you could, for instance, just use symbolic links in your file system. Alternatively, do you really need to move them all, or are there just a few that would really benefit from being moved? Could you just mark all the affected places in the code with a to-be-fixed-later marker? I sometimes mark bad classes as deprecated and promise to fix the affected classes, but postpone until I have other changes in those classes, until finally the deprecated class can be safely removed.
In any case, you should assess your options (leave the files, replace all or some of the usages), provide some estimate of cost and consequences, and ask your manager which course to take.
Just note: always overestimate the solution you don't want to do, as you would be twice as likely to stop for coffee breaks, and a billboard of told-you-so's is great leverage for decision making :)
On the technology side of your question, regex search-and-replace is probably the only option. I would normally put configuration files in a place accessible via the classpath.
You can try using Eclipse's search features. For example, if you right-click the load() method of the Properties class and select References -> Project, it will list every location in your project where that method is used.
From there you can also attempt a global regex search-and-replace.
The title actually tells the issue. And before you get me wrong: I do NOT want to know how this can be done, but how I can prevent it.
I want to write a file uploader (in Java, with JPA and a MySQL database). Since I'm not yet 100% sure about the internal handling, there is the possibility that at some point an uploaded file could be executed/opened internally.
So I'd be glad to know what an attacker can do to harm, infect or manipulate my system by uploading any type of file, be it a media file, a binary or whatever.
For instance:
What about special characters in the file name?
What about manipulating meta data like EXIF?
What about "embedded viruses" like in an MP3 file?
I hope this is not too vague and I'd be glad to read your tips and hints.
Best regards,
Stacky
It's really very application-specific. If you're using a particular web app like phpBB, the security needs are completely different from those of, say, a news group. If you want tailored security recommendations, you'll need to search for them based on the context of what you're doing. They could range from sanitizing input to limiting upload size and format.
For example, an MP3 file virus probably only works on a few specific MP3 players. Not on all of them.
At any rate, if you want broad coverage against viruses, scan the files with a virus scanner; but that probably won't protect you from things like script injection.
If your server doesn't do something inherently stupid, there should be no problem. But...
Since I'm not yet 100% sure about the internal management, there is the possibility that at some point the file could be executed/opened internally.
... this qualifies as inherently stupid. You have to make sure you don't accidentally execute uploaded files (permissions on the upload directory are a starting point; limiting uploads to specific directories is another, etc.).
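A small sketch of the storage side of that advice: the client-supplied file name (with all its special characters) is ignored entirely, and the file is stored under a server-generated name in a dedicated directory that the server never executes from. The directory path is made up:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Sketch: store uploads under generated names so a hostile file name
// or extension never influences where or how the file is written.
public class UploadStore {
    private static final Path UPLOAD_DIR = Paths.get("/var/uploads"); // hypothetical, outside the web root

    public static Path store(InputStream upload) throws IOException {
        Files.createDirectories(UPLOAD_DIR);
        Path target = UPLOAD_DIR.resolve(UUID.randomUUID().toString());
        Files.copy(upload, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}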
Aside from execution, if the server attempts any file-type-specific processing (e.g. making thumbnails of images), there is always the possibility that the processing can be attacked through buffer-overflow exploits (these are specific to each piece of software/library, though).
A pure file server (e.g. FTP) that just stores and serves files is safe (when there are no other holes).
I am writing an application which will search a computer for files with a particular filename extension (JPG, for example). Input: "D:", ".JPG". Output: a txt file with the results (file paths). I know one simple recursive algorithm, but maybe there is something better. So, can you tell me an efficient algorithm to traverse the directory tree? I also want to use multithreading to get better performance, but how many threads should I use? Using one thread per directory would be stupid.
The recursive option you name is the only way to go, unless you want to get your hands dirty with the file system. I suspect you don't.
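For reference, a sketch of that recursive traversal using the JDK's own java.nio.file walk (Java 7+), with the drive and extension from the question; note that Files.walk can throw an unchecked exception mid-stream for unreadable directories, which a robust version would handle:

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch: recursively find files by extension and write their paths to a text file.
public class ExtensionFinder {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get("D:\\");
        String extension = ".jpg";
        try (Stream<Path> paths = Files.walk(root);
             PrintWriter out = new PrintWriter("results.txt")) {
            paths.filter(Files::isRegularFile)
                 .filter(path -> path.toString().toLowerCase().endsWith(extension))
                 .forEach(out::println);
        }
    }
}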
Regarding thread performance, your best choice is to make the number of threads configurable, create some sample directories, and measure performance for each setting.
By the way, most file finders create an index of files. They scan the disk on a schedule and update a file which contains the relevant information about the files and directories on disk, in a format designed to facilitate searching. That index file is then used to perform the actual searches. If you're planning on repeatedly running this search against the same directory, you should do the same.
Does anyone know of any open-source Java libraries that provide features for handling a large number of files (write/read) on disk? I am talking about 2-4 million files (most of them PDFs and MS Office docs). It is not a good idea to store all the files in a single directory. Instead of re-inventing the wheel, I am hoping this has been done by many people already.
Features I am looking for:
1) Able to write/read files from disk
2) Able to create random directories/sub-directories for new files
3) Provide versioning/auditing (optional)
I was looking at the JCR API and it looks promising, but it starts with a workspace, and I am not sure what the performance will be when there are many nodes.
Edit: JCR does look pretty good. I'd suggest trying it out to see how it actually performs for your use case.
If you're running your system on Windows and have noticed a horrible n^2 performance hit at some point, you're probably running up against the cost of automatic 8.3 filename generation. Of course, you can disable 8.3 filename generation, but as you pointed out, it would still not be a good idea to store large numbers of files in a single directory.
One common strategy I've seen for handling large numbers of files is to create directories for the first n letters of the filename. For example, document.pdf would be stored in d/o/c/u/m/document.pdf. I don't recall ever seeing a library to do this in Java, but it seems pretty straightforward. If necessary, you can create a database to store the lookup table (mapping keys to the uniformly-distributed random filenames), so you won't have to rebuild your index every time you start up. If you want to get the benefit of automatic deduplication, you could hash each file's content and use that checksum as the filename (but you would also want to add a check so you don't accidentally discard a file whose checksum matches an existing file even though the contents are actually different).
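A sketch of that idea, combining the directory sharding with the checksum-as-filename deduplication; the names and the two-level layout are arbitrary choices:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;

// Sketch: store each file under its content hash, split across two directory
// levels (e.g. store/ab/cd/abcd...) so no single directory grows too large.
public class ShardedStore {
    private static final Path ROOT = Paths.get("store"); // hypothetical root directory

    public static Path store(Path source) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(source), sha)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // the DigestInputStream updates the hash as we read
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest()) {
            hex.append(String.format("%02x", b));
        }
        String name = hex.toString();
        Path dir = ROOT.resolve(name.substring(0, 2)).resolve(name.substring(2, 4));
        Files.createDirectories(dir);
        // As noted above, a careful version would compare contents on a name
        // collision instead of assuming an equal hash means an equal file.
        return Files.copy(source, dir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
    }
}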
Depending on the sizes of the files, you might also consider storing the files themselves in a database. If you do this, it would be trivial to add versioning, and you wouldn't necessarily have to create random filenames, because you could reference them by an auto-generated primary key.
Combine the functionality in the java.io package with your own custom solution.
The java.io and java.nio packages can write and read files from disk and create arbitrary directories or sub-directories for new files. No external API is required.
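For instance, a minimal sketch of that built-in functionality, using java.nio.file:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: the JDK alone creates nested directories and reads/writes files.
public class PlainFileStore {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("docs", "2024", "invoices"); // arbitrary sub-directories
        Files.createDirectories(dir);
        Path file = dir.resolve("note.txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}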
The versioning or auditing would have to be provided by your own custom solution. There are many ways to handle this, and you probably have a specific need to fill. Especially if you're concerned about the performance of an open-source API, you will likely get the best result by coding a solution that specifically fits your needs.
It sounds like your module should scan all the files on startup and build an index of everything that's available. Depending on how these files are shared and indexed, it can rescan them every so often, or you can code it to receive a message from some central server when a new file or version is available. When someone requests a file or provides a new one, your module will know exactly how things are organized and exactly where to get or put the file within the directory tree.
It seems that it would be far easier to just engineer a solution specific to your needs.