How to format i18n files?

I have three files for internationalization: messages_es.properties, messages_en.properties and messages_pt.properties. They all follow this pattern:
message1=value
message2=value2
and the values change from file to file. For example:
messages_en.properties:
hello=welcome
messages_pt.properties:
hello=bem vindo
The problem is that, over the course of the project, these files have become inconsistent: lines that exist in one file are missing from the others, and the lines are not in the same order. Is there an easy way to rearrange and format these i18n files so that lines missing from one file are copied over from the others and all files use the same line order?

Interesting question. You are dealing with text files, so there are many possible ways to manage this; which one fits depends on your setup (source control, IDE, etc.).
If you are using Eclipse, check: http://marketplace.eclipse.org/content/eclipse-resourcebundle-editor
And for IntelliJ: https://www.jetbrains.com/idea/features/i18n_support.html

Yes, the messages should usually appear in each file, unless there's a default message for some key that doesn't need translating (perhaps technical terms). Different IDEs have different support for managing message files.
As for ordering the messages, there's no technical need to do so, but it can help the human maintainers. Any text editor's sort routine will work just fine, as long as no value is continued over several lines.
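For the mechanical part of the question (same keys in every file, same order), here is a minimal sketch using only java.util.Properties and a TreeMap. The class and method names (BundleAligner, align) are made up for illustration, and copying the reference value as a stand-in for a missing translation is an assumption about what you want, not a rule.

```java
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: aligns one translation file with a reference file. */
final class BundleAligner {

    private BundleAligner() {}

    /**
     * Returns the entries of the translation sorted by key. Every key that
     * exists only in the reference is copied over with the reference value,
     * so it shows up in the file until somebody translates it.
     */
    static TreeMap<String, String> align(Properties reference, Properties translation) {
        TreeMap<String, String> aligned = new TreeMap<>();
        // Start with the reference values as placeholders.
        for (String key : reference.stringPropertyNames()) {
            aligned.put(key, reference.getProperty(key));
        }
        // Existing translations win over the placeholders.
        for (String key : translation.stringPropertyNames()) {
            aligned.put(key, translation.getProperty(key));
        }
        return aligned;
    }
}
```

Writing the map back to disk is left out on purpose: Properties.store does not promise any key order, so you would have to write the sorted lines yourself and take care of escaping.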

The NetBeans IDE has a properties editor that works across languages and displays them side by side in a matrix. There are also stand-alone editors that do the same. One would expect such an editor to keep the source text synchronized and in one consistent layout.
First, look for a translator's editor that can maintain a fixed layout. A format like gettext (.po/.pot), which is similar to .properties, might be a better choice, depending on the tool.
For more than three languages it would make sense to use a source format aimed more at translators, such as the XML format XLIFF (though .properties files are well known), and to generate the individual .properties files, or even ListResourceBundles, from this source (via XSLT perhaps).
The i18n effort should not stop at providing a list of phrases to translate. It should also include extra information where needed (a disambiguating note), and maybe even a glossary so that the same term is used consistently. The text presented to the user is a very significant part of the product's quality and appeal. Using different synonyms for the same thing can make the user interface fuzzy, needlessly unclear, tangled.

The problem you are facing is a broken localization process. It has nothing to do with properties files, and you probably shouldn't even be comparing these files yet (that is, until you fix the process).
To compare properties files, you can use a very simple trick: sort each of them and use a standard diff tool to show the differences. Sure, you'll lose the comments and the logical arrangement of the English file, but at least you can see what's going on. That can be done, but it is a lot of manual work.
Instead of fixing the files manually, you should fix the broken process. A successful localization process looks roughly like this:
1) Once the English file is modified, send it out for translation. By that I mean all translations should be based on the English file, and the localized files should be recreated from it (stay tuned).
2) Use Translation Memory to fill in the translations you already have. This can be done by your translation service provider, or by yourself if you really know how to do it (guess what? it is difficult).
3) Have the translators translate the strings that are missing.
4) Put the localized files back.
5) Before releasing the software to the public, have somebody walk a linguistic reviewer through the UI to correct mistranslations.
I intentionally skipped a few steps (localization testing, using pseudo-translations, searching for i18n defects, etc.), but if you use this kind of process, your properties files should always be in sync.
And now your question could be reduced to the one that was already asked (and answered):
Managing the localization of Java properties files.

Look at java.util.PropertyResourceBundle. It is a convenience class for reading a property file and you can obtain a Set<String> of the keys. This should help to compare the contents of several resource files.
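A minimal sketch of that comparison, assuming the files are handed in as Readers. BundleKeyDiff and missingKeys are names invented for this example; PropertyResourceBundle(Reader) and keySet() are the standard JDK calls.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: reports keys that one bundle has and another lacks. */
final class BundleKeyDiff {

    private BundleKeyDiff() {}

    /** Keys present in the reference file but absent from the other file, sorted. */
    static Set<String> missingKeys(Reader reference, Reader other) {
        try {
            Set<String> missing = new TreeSet<>(new PropertyResourceBundle(reference).keySet());
            missing.removeAll(new PropertyResourceBundle(other).keySet());
            return missing;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling it once per direction (en against pt, then pt against en) shows which lines each file is missing.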
But I think that a better approach is to maintain the n languages in a single file, e.g., using XML and to generate the resource files from a single source.
<entry>
    <key>somekey</key>
    <value lang="en">good bye</value>
    <value lang="es">hasta luego</value>
</entry>
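Generating the per-language files from such a source could look roughly like the sketch below. It assumes the entry elements are wrapped in some root element (the fragment above does not show one) and uses only the JDK's DOM parser. XmlBundleSplitter and split are hypothetical names.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical generator: one Properties object per language found in the XML. */
final class XmlBundleSplitter {

    private XmlBundleSplitter() {}

    /** Maps each lang attribute value to the key/value pairs defined for it. */
    static Map<String, Properties> split(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, Properties> byLang = new TreeMap<>();
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                // Assumption: every entry has exactly one key element.
                String key = entry.getElementsByTagName("key").item(0).getTextContent().trim();
                NodeList values = entry.getElementsByTagName("value");
                for (int j = 0; j < values.getLength(); j++) {
                    Element value = (Element) values.item(j);
                    byLang.computeIfAbsent(value.getAttribute("lang"), lang -> new Properties())
                          .setProperty(key, value.getTextContent().trim());
                }
            }
            return byLang;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse bundle XML", e);
        }
    }
}
```

Each resulting Properties object can then be stored as messages_<lang>.properties. A key that has no value element for some language is simply absent from that language's output, which makes missing translations easy to spot.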

Related

Save Properties File MOST EFFICIENT

Which of these ways is better (faster, less storage)?
1) Thousands of xyz.properties files, each with about 30 keys/values
2) One .properties file with all the data in it, about 30,000 keys/values

I think there are two aspects here:
As Guenther has correctly pointed out, dealing with files comes with overhead. You need file handles and possibly other data structures that deal with files, so there may be several levels at which one huge file is better than many small files.
But there is also maintainability. From a developer's point of view, a property file that contains 30K keys/values is something you really don't want to get into. If everything is in one file, you have to constantly update (and deploy) that one huge file: one change, and the whole file needs to go out. Will you have mechanisms in place that allow run-time reloading of properties, or does your application have to shut down? And how often will you end up with duplicates in that large file? Or worse: you put a value for property A on line 5082, and then somebody doesn't pay attention and overrides property A on line 29732. Many things can go wrong just because all that stuff is in one file that no human being can digest anymore. And rest assured: debugging something like that will be hard.
I just gave you some questions to think about, so you might want to step back and provide more requirements from your end.
In any case, you might want to look into a solution where developers deal with many small property files (you know, one file per functionality), and tooling then builds the one large file used in the production environment.
Finally: if your application really needs 30K properties, then you should worry much more about the quality of your product. In my eyes, this isn't a design "smell"; it sounds like a design fetidness. Meaning: no reasonable application should require 30K properties to function.
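The tooling idea (many small files for developers, one merged file for production) can be sketched in a few lines. PropertiesMerger is a made-up name, and failing the build on a clash is just one possible policy.

```java
import java.util.List;
import java.util.Properties;

/** Hypothetical build-time helper: merges small property sets into one. */
final class PropertiesMerger {

    private PropertiesMerger() {}

    /** Merges the parts in order; a key defined in two parts is treated as an error. */
    static Properties merge(List<Properties> parts) {
        Properties merged = new Properties();
        for (Properties part : parts) {
            for (String key : part.stringPropertyNames()) {
                if (merged.containsKey(key)) {
                    throw new IllegalStateException("Duplicate property key: " + key);
                }
                merged.setProperty(key, part.getProperty(key));
            }
        }
        return merged;
    }
}
```

Note that this only catches clashes between files. A key repeated inside a single file is already collapsed by Properties.load (the last value wins) before the merge ever sees it.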

Opening and closing thousands of files is a major overhead for the operating system, so you'd probably be best off with one big file.

Drools: Adding and Removing Rules In Run Time

Is there any way to dynamically edit the rules loaded into Drools without reloading a new DRL file?
We are trying to use Drools as a rules engine, but in our use-case, rules are added and removed quite frequently, and we'd like to avoid having to reload the whole .drl file each time this happens.

The recommendation in the documentation is to spread your rules across multiple files:
https://docs.jboss.org/drools/release/5.2.0.Final/drools-expert-docs/html/ch05.html#d0e2785
...you are also able to spread your rules across multiple rule files (in that case, the extension .rule is suggested, but not required) - spreading rules across files can help with managing large numbers of rules.
I suggest you split your rules up into logical groups that change together, or use one rule per file if that is more appropriate.

Validating Java MessageFormat strings

I'm working on a Play 2 app which is being translated. Play uses Java's MessageFormat behind the scenes, so I have a fair number of property values such as:
my.interface.key={0,choice,0#{0} families|1#1 family|1<{0,number,integer} families}
I just received back a translation of this in the form:
my.interface.key={0,choix,0#{0} familles|1#1 famille|1<{0,nombre,entier} familles}
If it's not obvious, some bits of that should not have been translated, but mistakes will happen from time to time. That's fair enough, but I'm sure there must be a way of validating these strings before my app crashes at runtime with an IllegalArgumentException: unknown format type at ... exception. Preferably with a Git commit hook, or even an SBT build task.
If I was to hack this up myself I would probably make a tool to read these property files and check that, for each value, running MessageFormat.format(value) doesn't blow up.
Ideally I could do this via a Perl (or Python) script. Sadly, the only non-Java library I can find - Text::MessageFormat on CPAN - doesn't seem to support the most error-prone formats, such as pluralisation.
Can anyone suggest a more sensible approach based on existing tooling before I dive in?

We had a similar problem. Our solution was to create classes that model the structure of the message format, then use XML to define the messages in our message bundle.
If the translator uses an XML editor then there is some hope they won't "break" the structure of the message.
See this answer for details.
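For reference, the do-it-yourself check described in the question is small. The sketch below relies on the fact that the MessageFormat constructor parses the pattern and throws IllegalArgumentException when it cannot. MessageFormatValidator and findInvalid are invented names, and wiring this into a commit hook or an SBT task is not shown.

```java
import java.text.MessageFormat;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical checker: finds values that MessageFormat refuses to parse. */
final class MessageFormatValidator {

    private MessageFormatValidator() {}

    /** Returns key -> parser error message for every value that fails to parse. */
    static Map<String, String> findInvalid(Properties bundle) {
        Map<String, String> errors = new TreeMap<>();
        for (String key : bundle.stringPropertyNames()) {
            try {
                // The constructor applies the pattern and rejects unknown format types.
                new MessageFormat(bundle.getProperty(key));
            } catch (IllegalArgumentException e) {
                errors.put(key, String.valueOf(e.getMessage()));
            }
        }
        return errors;
    }
}
```

One limitation to be aware of: the text inside a choice branch is only parsed as a nested pattern when that branch is actually formatted, so a translated {0,nombre,entier} inside an otherwise intact choice would slip past this check. Formatting each message with a few sample arguments would be needed to catch that case.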

Java webapplication - properties file nightmare

I recently started working on a POORLY designed and developed web application. It uses about 300 properties files, and all of them are read more or less like this:
Properties prop= new Properties();
FileInputStream fisSubsSysten = new FileInputStream("whatever.properties");
prop.load(fisSubsSysten);
That is, it reads the properties files from the current working directory. Another problem is that the developers used the lines above multiple times within the same Java file. For example, if there are 10 methods, each method contains the code above instead of there being one method that is called wherever necessary.
This means we can NEVER change the location of the properties files. Currently they sit directly under the WebSphere profiles directory. Isn't this ugly? If I move them somewhere else and add that location to the classpath, it does not work.
I tried changing the lines above like this, using Spring's I/O utility classes:
Resource resource = new ClassPathResource("whatever.properties");
Properties prop = PropertiesLoaderUtils.loadProperties(resource);
But this application has over 1000 files, and I am finding it impossible to change each one. How would you go about refactoring this mess? Is there an easy way around it?
Thanks!

In these cases of "refactoring" I use a simple find-and-replace approach. Notepad++ has a "Find in Files" feature, but there are plenty of similar programs.
1) Create a class that does the properties loading, probably with a method that takes the name of the property file as a parameter. This can be a Java singleton or a Spring bean.
2) Search for all "new Properties()" lines and replace them with an empty line.
3) Replace all "load..." lines with a reference to your new class/method. Notepad++ supports regex replacement, so you can use the filename as a parameter.
4) Once this is done, go to Eclipse, run "Clean Up" or "Organize Imports", and fix any remaining compile errors manually.
This approach is quite straightforward and takes no more than 10 minutes if you are lucky, or 1 hour if you are unlucky, e.g. if the code formatting is way off and each file looks different.
You can make the replacement simpler by formatting the project once beforehand with a line length of 300 or more, so that each Java statement is on one line. Then you don't have newlines to consider in find and replace.
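The loader class from the first step might look something like this. It is only a sketch: PropertiesLoader is a made-up name, it assumes the files have been put on the classpath, and it caches each file after the first read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single entry point for loading properties files. */
final class PropertiesLoader {

    private static final Map<String, Properties> CACHE = new ConcurrentHashMap<>();

    private PropertiesLoader() {}

    /** Loads the named file from the classpath once and reuses it afterwards. */
    static Properties load(String name) {
        return CACHE.computeIfAbsent(name, PropertiesLoader::read);
    }

    private static Properties read(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = PropertiesLoader.class.getClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Not found on classpath: " + name);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The replaced call sites then shrink to Properties prop = PropertiesLoader.load("whatever.properties"); and the file location is decided in exactly one place. Keep in mind that callers share the cached object, so nobody should modify it.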

From your description, I can only agree that your project sounds a bit daunting.
However, how to maintain or improve it is a question of risk that simply needs to be assessed and prioritised.
Consider building a high-rise and subsequently realising that the bolts holding the structure together have a design flaw. The prospect of replacing them all is daunting as well, so you have to consider how to change them and whether they really, really need to be replaced: a few, many or all of them.
I assume it is a core system for the company, built by somebody who has probably left the project (?), and you are now weighing improvement against mere maintenance. But again, you must assess whether it really is important to move your property files, or whether you can, for instance, just use symbolic links in your file system. Alternatively, do you really need to move them all, or are there just a few that would really benefit from being moved? Can you just mark all the places in the code with a to-be-fixed-later marker? I sometimes mark bad classes as deprecated and promise to fix the affected classes, but postpone the fix until I have other changes to make in those classes, until finally the deprecated class can be safely removed.
Anyway, you should assess your options (leave the files, replace all of them or only some), provide an estimate of cost and consequences, and ask your manager which course to take.
Just remember to always overestimate the solution you don't want to do, as you would be twice as likely to stop for coffee breaks, and a billboard of told-you-so's is great leverage for decision making :)
On the technology side of your question, regex search and replace is probably the only option. I would normally put configuration files in a place accessible by classpath.

You can try using the Eclipse search feature. For example, if you right-click the load() method of the Properties class and select References -> Project, it will show you all the locations in your project where that method is used.
Also from there maybe you can attempt a global regex search and replace.

Java content APIs for a large number of files

Does anyone know of any open-source Java libraries that provide features for handling a large number of files (write/read) on disk? I am talking about 2-4 million files (most of them PDF and MS docs). It is not a good idea to store all the files in a single directory. Instead of reinventing the wheel, I am hoping this has already been done by many people.
Features I am looking for:
1) Able to write/read files from disk
2) Able to create random directories/sub-directories for new files
3) Provide versioning/auditing (optional)
I was looking at the JCR API and it looks promising, but it starts with a workspace and I am not sure what the performance will be when there are many nodes.

Edit: JCR does look pretty good. I'd suggest trying it out to see how it actually performs for your use case.
If you're running your system on Windows and noticed a horrible n^2 performance hit at some point, you're probably running up against the performance hit incurred by automatic 8.3 filename generation. Of course, you can disable 8.3 filename generation, but as you pointed out, it would still not be a good idea to store large numbers of files in a single directory.
One common strategy I've seen for handling large numbers of files is to create directories for the first n letters of the filename. For example, document.pdf would be stored in d/o/c/u/m/document.pdf. I don't recall ever seeing a library to do this in Java, but it seems pretty straightforward. If necessary, you can create a database to store the lookup table (mapping keys to the uniformly-distributed random filenames), so you won't have to rebuild your index every time you start up. If you want to get the benefit of automatic deduplication, you could hash each file's content and use that checksum as the filename (but you would also want to add a check so you don't accidentally discard a file whose checksum matches an existing file even though the contents are actually different).
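The first-n-letters layout is indeed only a few lines with java.nio.file. In the sketch below, ShardedPaths and shardedPath are hypothetical names, and lower-casing the letters and mapping anything that is not a letter or digit to "_" is my own assumption to keep directory names safe.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: spreads files over nested one-character directories. */
final class ShardedPaths {

    private ShardedPaths() {}

    /**
     * Builds root/<c1>/<c2>/.../fileName from the first {@code depth} characters
     * of the file name, e.g. "document.pdf" with depth 5 -> root/d/o/c/u/m/document.pdf.
     */
    static Path shardedPath(Path root, String fileName, int depth) {
        Path dir = root;
        int levels = Math.min(depth, fileName.length());
        for (int i = 0; i < levels; i++) {
            char c = fileName.charAt(i);
            // Avoid directory names such as "." by replacing unusual characters.
            String segment = Character.isLetterOrDigit(c)
                    ? String.valueOf(Character.toLowerCase(c))
                    : "_";
            dir = dir.resolve(segment);
        }
        return dir.resolve(fileName);
    }

    public static void main(String[] args) {
        System.out.println(shardedPath(Paths.get("store"), "document.pdf", 5));
    }
}
```

Before writing, create the parent directories with Files.createDirectories(path.getParent()). If the names are not evenly distributed, feeding a hash of the name (or of the content) into the same method instead of the raw name spreads the files more uniformly.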
Depending on the sizes of the files, you might also consider storing the files themselves in a database--if you do this, it would be trivial to add versioning, and you wouldn't necessarily have to create random filenames because you could reference them using an auto-generated primary key.

Combine the functionality in the java.io package with your own custom solution.
The java.io package can write and read files from disk and create arbitrary directories or sub-directories for new files. There is no external API required.
The versioning or auditing would have to be provided with your own custom solution. There are many ways to handle this, and you probably have a specific need that needs to be filled. Especially if you're concerned about the performance of an open-source API, it's likely that you will get the best result by simply coding a solution that specifically fits your needs.
It sounds like your module should scan all the files on startup and form an index of everything that's available. Based on the method used for sharing and indexing these files, it can rescan the files every so often or you can code it to receive a message from some central server when a new file or version is available. When someone requests a file or provides a new file, your module will know exactly how it is organized and exactly where to get or put the file within the directory tree.
It seems that it would be far easier to just engineer a solution specific to your needs.
