Java 8 dealing unicode characters in file names unexpectedly

I have been struggling to find a solution to the issue described in the title. Here are the details.
I have some files with an accented character in their names, stored in a directory. Running ls -lht does not display the names correctly: the accented character is decoded incorrectly. However, if I press Tab to autocomplete in the terminal, the file name is shown as expected. See the snippet below.
tc_pst03@login-01: 6] ls -lht
total 704K
-rw-r----- 1 tc_pst03 pst_pub 86K Oct 9 00:27 Li??2012_Modelling osteomyelitis_Control Model.cps
-rw-r----- 1 tc_pst03 pst_pub 46K Oct 9 00:27 Li??2012_Modelling osteomyelitis_Control Model.xml
-rw-r----- 1 tc_pst03 pst_pub 11K Oct 9 00:27 Li??2012_Modelling osteomyelitis_Control Model.sedml
tc_pst03@login-01: 6] mv Liò2012_Modelling\ osteomyelitis_Control\ Model.
When I list those files from Java, I get names like Li??2012... instead of Liò2012.... I have looked for solutions posted elsewhere, but none of them works for my problem. Below is the snippet I use to list the files.
List<File> get(String modelId, int revisionNumber) throws ModelException {
    File modelDirectory = new File(modelCacheDir, modelId)
    File revisionDirectory
    List returnedFiles = new LinkedList<File>()
    try {
        revisionDirectory = new File(modelDirectory, revisionNumber.toString())
        if (!revisionDirectory.exists()) {
            throw new FileNotFoundException()
        } else {
            returnedFiles = Files.list(revisionDirectory.toPath())*.toFile() //revisionDirectory.listFiles().toList()
        }
        if (returnedFiles?.isEmpty()) {
            String message = """The cache directory of this model ${modelId} revision ${revisionNumber} is empty. \
The model cache builder will be launched again."""
            throwModelException(modelId, message)
        }
    } catch (FileNotFoundException me) {
        String message = """The files associated with this model ${modelId}, \
revision ${revisionNumber} hasn't been cached yet"""
        throwModelException(modelId, message)
    }
    return returnedFiles
}
I suspected that the JVM was not using UTF-8 as its default charset, so I set it explicitly through JAVA_TOOLS_OPTIONS: export JAVA_TOOLS_OPTIONS=-Dfile.encoding="UTF-8".
Some results are printed below:
[
._Li?2012_Modelling osteomyelitis_Control Model.xml4483388255187556135.tmp,
._Li?2012_Modelling osteomyelitis_Control Model.xml8578169841449575225.tmp,
Li��2012_Modelling osteomyelitis_Control Model.sedml,
._Li?2012_Modelling osteomyelitis_Control Model.xml1056906750418910165.tmp,
Li��2012_Modelling osteomyelitis_Control Model.xml,
Li��2012_Modelling osteomyelitis_Control Model.cps]
I need these file names to compare them with the names of the same files persisted in the database. However, the names read from the file system are decoded incorrectly, so they never match the ones saved in the database.
Does anyone know why this is happening? Any ideas? Thanks!
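One possible reading of the output above, offered as a guess rather than a diagnosis: the name is stored on disk as UTF-8, where ò takes two bytes, and the JVM decodes file names with a non-UTF-8 charset taken from the process locale. The sketch below only reproduces that decoding effect in isolation. The class name FileNameDecodingDemo is made up for illustration, and sun.jnu.encoding is an internal, unsupported property that is printed purely as a diagnostic. Also note that the environment variable the JVM actually reads is spelled JAVA_TOOL_OPTIONS, without the extra S.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FileNameDecodingDemo {
    // Decodes raw file-name bytes with the given charset (hypothetical helper).
    static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        // "Liò2012" encoded as UTF-8: the 'ò' occupies two bytes (0xC3 0xB2).
        byte[] rawName = "Li\u00f22012".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset restores the accent.
        System.out.println(decode(rawName, StandardCharsets.UTF_8));

        // Decoding as US-ASCII turns each non-ASCII byte into U+FFFD,
        // which gives two replacement characters for one 'ò'.
        System.out.println(decode(rawName, StandardCharsets.US_ASCII));

        // Diagnostics: what the running JVM believes its encodings are.
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());
    }
}
```

If the diagnostics show an ASCII-like charset, checking the locale variables (LANG, LC_ALL) of the shell that starts the JVM may be a reasonable next step.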

Related

R code in Java working in Linux but not in Windows

What am I doing?
I am writing a data analysis program in Java which relies on R's arulesViz library to mine association rules.
What do I want?
My purpose is to store the rules in a String variable in Java so that I can process them later.
How does it work?
The code combines Java's String.format with rJava's eval calls. Its behavior can be summarized as follows:
1. Given properly formatted Java data structures, it creates a data frame in R.
2. It converts the newly created data frame into a transaction list using the arules library.
3. It runs the apriori algorithm with the transaction list and some necessary values passed as parameters.
4. It reorders the generated association rules.
5. Since the association rules cannot be printed directly, they are written to standard output with R's write method; the output is captured and stored in a variable. The association rules are now a string variable.
6. It returns the string.
The code is the following:
// Step 1
Rutils.rengine.eval("dataFrame <- data.frame(as.factor(c(\"Red\", \"Blue\", \"Yellow\", \"Blue\", \"Yellow\")), as.factor(c(\"Big\", \"Small\", \"Small\", \"Big\", \"Tiny\")), as.factor(c(\"Heavy\", \"Light\", \"Light\", \"Heavy\", \"Heavy\")))");
//Step 2
Rutils.rengine.eval("transList <- as(dataFrame, 'transactions')");
//Step 3
Rutils.rengine.eval(String.format("info <- apriori(transList, parameter = list(supp = %f, conf = %f, maxlen = 2))", supportThreshold, confidenceThreshold));
// Step 4
Rutils.rengine.eval("orderedRules <- sort(info, by = c('count', 'lift'), order = FALSE)");
// Step 5
REXP res = Rutils.rengine.eval("rulesAsString <- paste(capture.output(write(orderedRules, file = stdout(), sep = ',', quote = TRUE, row.names = FALSE, col.names = FALSE)), collapse='\n')");
// Step 6
return res.asString().replaceAll("'", "");
What's wrong?
Running the code in Linux works perfectly, but when I run it in Windows, I get the following error pointing at the return line:
Exception in thread "main" java.lang.NullPointerException
This is a common error I get whenever the R code produces a null result and passes it to Java. There's no way to syntax-check the R code inside Java, so whenever it's wrong, this error message appears.
However, when I run the R code from inside the parentheses in the R command line in Windows, it works flawlessly, so both the syntax and the data flow are OK.
Technical information
In Linux, I am using R with OpenJDK 10.
In Windows, I am currently using Oracle's latest JDK release, but running the program with OpenJDK 12 for Windows does not solve anything.
Everything is 64-bit.
The IDE used in both operating systems is IntelliJ IDEA 2019.
Screenshots (not reproduced here): Linux run configuration and Windows run configuration.

Apache Beam get source File Name

EDIT: RESOLVED!
I have multiple text files in multiple languages. I want to add a language tag to each line using Apache Beam.
Example:
File text_en:
This is a sentence.
File text_de:
Dies ist ein Satz.
What I want is this:
en: This is a sentence.
de: Dies ist ein Satz.
What I've tried:
I initially tried to use a single TextIO.Read.from(datasetDirectory+"/*") and looked for an option along the lines of .getSource(). However, this doesn't seem to exist.
Next I tried to read the files one by one, like this:
File[] files = new File(datasetDirectory).listFiles();
PCollectionList<String> dataSet=null;
for (File f: files) {
    String language = f.getName();
    logger.debug(language);
    PCollection<String> newPCollection = p.apply(
            TextIO.Read.from(f.getAbsolutePath()))
        .apply(ParDo.of(new LanguageTagAdder(language)));
    if (dataSet==null) {
        dataSet=PCollectionList.of(newPCollection);
    } else {
        dataSet.and(newPCollection);
    }
}
PCollection<String> completeDataset= dataSet.apply(Flatten.<String>pCollections());
Reading the files this way works fine; however, my DoFn LanguageTagAdder seems to be initialized only with the first language, so all files get the same language tag.
LanguageTagAdder looks like this:
public class LanguageTagAdder
        extends DoFn<String,String> {
    private String language;
    public LanguageTagAdder(String language) {
        this.language=language;
    }
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(language+c.element());
    }
}
I realize this behavior is intended and needed so that the data can be parallelized, but how would I go about solving my problem? Is there a Beam way to solve it?
PS: I get the following warning when creating a new LanguageTagAdder for the second time (with a second language):
DEBUG 2016-12-05 17:09:55,070 [main] de.kdld16.hpi.FusionDataset - en
DEBUG 2016-12-05 17:09:56,216 [main] de.kdld16.hpi.FusionDataset - de
WARN 2016-12-05 17:09:56,219 [main] org.apache.beam.sdk.Pipeline - Transform TextIO.Read2 does not have a stable unique name. This will prevent updating of pipelines.
EDIT:
The problem was the line
dataSet.and(newPCollection);
It needed to be rewritten as:
dataSet=dataSet.and(newPCollection);
As written, dataSet only ever contained the first file. No wonder they all had the same language tag!
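The pitfall behind this fix is that and() returns a new list instead of modifying the one it is called on, so the result has to be assigned back. The sketch below shows the same pattern with plain Java only. Parts is a made-up stand-in class for illustration, not Beam's PCollectionList, and collect is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImmutableAndDemo {
    // Hypothetical immutable list type: and() leaves "this" untouched
    // and returns a NEW instance containing the extra element.
    static final class Parts {
        private final List<String> items;

        private Parts(List<String> items) {
            this.items = Collections.unmodifiableList(items);
        }

        static Parts of(String first) {
            List<String> list = new ArrayList<>();
            list.add(first);
            return new Parts(list);
        }

        Parts and(String next) {
            List<String> copy = new ArrayList<>(items);
            copy.add(next);
            return new Parts(copy);
        }

        int size() {
            return items.size();
        }
    }

    // Mirrors the loop from the question; "reassign" switches between
    // the broken version and the corrected one.
    static Parts collect(String[] names, boolean reassign) {
        Parts parts = null;
        for (String name : names) {
            if (parts == null) {
                parts = Parts.of(name);
            } else if (reassign) {
                parts = parts.and(name); // keeps the new list
            } else {
                parts.and(name);         // result discarded: parts is unchanged
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] languages = {"en", "de"};
        System.out.println("without reassignment: " + collect(languages, false).size());
        System.out.println("with reassignment:    " + collect(languages, true).size());
    }
}
```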

Java class loader bug: Caused by: java.io.IOException: Stream closed

I'm getting a strange bug, which I believe is a class loader issue, when I deploy my webapp to Tomcat. The bug doesn't appear when I run the webapp locally using Jetty. It seems that the input streams for my .yml resource files are being closed when they shouldn't be. The bug first appeared when I converted my single-module project into a multi-module project. Before that, the exact same code worked fine on Tomcat:
Caused by: org.yaml.snakeyaml.error.YAMLException: java.io.IOException: Stream closed
at org.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:200)
at org.yaml.snakeyaml.reader.StreamReader.<init>(StreamReader.java:60)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:412)
at com.config.ConfigProvider.<init>(ConfigProvider.java:20)
... 49 more
Caused by: java.io.IOException: Stream closed
at java.io.PushbackInputStream.ensureOpen(PushbackInputStream.java:57)
at java.io.PushbackInputStream.read(PushbackInputStream.java:149)
at org.yaml.snakeyaml.reader.UnicodeReader.init(UnicodeReader.java:90)
at org.yaml.snakeyaml.reader.UnicodeReader.read(UnicodeReader.java:122)
at java.io.Reader.read(Reader.java:123)
at org.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:184)
... 55 more
Here's the line that causes the bug:
String s = ConfigProvider.getConfig().getString("test");
Here's the ConfigProvider class. It scans for all resource files matching the regex ^.*\\.config\\.yml$, converts each one into a Map<String, Object>, and merges all of the resulting maps into a single Map<String, Object>:
1 public class ConfigProvider {
2 protected static final String CONFIG_PACKAGE = ConfigProvider.class.getPackage().getName();
3 protected static final Pattern CONFIG_PATH_REGEX = Pattern.compile("^.*\\.config\\.yml$");
4
5 private static final ConfigProvider INSTANCE = new ConfigProvider();
6 private Map<String, Object> configMap;
7
8 protected ConfigProvider() {
9 configMap = new HashMap<String, Object>();
10
11 Set<String> configPaths = new Reflections(CONFIG_PACKAGE,
12 new ResourcesScanner()).getResources(CONFIG_PATH_REGEX);
13
14 if (configPaths.isEmpty()) {
15 throw new RuntimeException("no config paths found");
16 }
17
18 for (String path : configPaths) {
19 InputStream inputStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(path);
20 Map<String, Object> fullConfig = new HashMap<String, Object>((Map) new Yaml().load(inputStream));
21
22 try {
23 inputStream.close();
24 } catch (IOException e) {
25 throw new RuntimeException("error closing stream");
26 }
27
28 MapUtils.merge(configMap, fullConfig);
29 }
30 }
31
32 public static ConfigMap getConfig() {
33 return INSTANCE.configMap;
34 }
35 }
Here's my project structure, titled Foo:
- Foo (this is a module)
  - .idea
  - application (this is a module)
    - src
      - main
        - java
        - resources
          - application.config.yml
        - webapp
      - test
    - pom.xml
  - client (this is a module)
    - src
      - main
        - java
        - resources
          - client.config.yml
        - webapp
      - test
    - pom.xml
  - pom.xml
ConfigProvider is a class I get from my parent pom file (Foo/pom.xml). I package a WAR file from the application module (Foo/application/target/application.war) and deploy it with Tomcat. Previously, my project was a single-module project with just a Foo module identical to the application module. Then I added a client module and converted the project into a multi-module project, and the problem appeared. I think my class loader is getting confused by the multiple modules. I've spent a lot of time trying to debug this and still haven't gotten anywhere. Does anyone know what the cause could be, or have ideas for things to try?
Please let me know if you need more info.

According to this post, that exception could mean that the .yml file is simply not found. Since you changed your project structure, it is possible that the logic used to build the configPaths needs to be modified for the new structure. Did you try to log the content of configPaths to see if the paths are correct for the new structure?
Also make sure that the .yml files are included in the .war file. Some build systems handle resources differently than java class files.
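To make a missing resource show up as a clear error rather than a confusing stream failure later on, the lookup can be checked for null before it is handed to the parser. This is a minimal sketch using only the JDK. ResourceCheck, openRequired and countBytes are made-up names, and the YAML parsing step is replaced by a simple byte count so that the example stays self-contained.

```java
import java.io.IOException;
import java.io.InputStream;

public class ResourceCheck {
    // getResourceAsStream returns null when the resource is not visible to
    // the given class loader, so fail early and name the missing path.
    static InputStream openRequired(ClassLoader loader, String path) {
        InputStream in = loader.getResourceAsStream(path);
        if (in == null) {
            throw new IllegalStateException("Resource not found on classpath: " + path);
        }
        return in;
    }

    // Stand-in for the parsing step: reads the whole stream and closes it
    // via try-with-resources, even if reading fails.
    static int countBytes(ClassLoader loader, String path) throws IOException {
        try (InputStream in = openRequired(loader, path)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            System.out.println(countBytes(loader, "application.config.yml") + " bytes read");
        } catch (IllegalStateException e) {
            // Printed when the file is not packaged where the loader can see it.
            System.out.println(e.getMessage());
        }
    }
}
```

Logging each entry of configPaths next to the result of this check should show quickly whether the scanner reports paths that the context class loader cannot open.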

The member variables of the bean class should match the keys in the .yml file. Another possibility is that the file path you are passing is wrong.

Get history of a Deleted File from SVN using SVNKit

I am trying to access the revision history of a deleted file using SVNKit.
Here is what I am doing to achieve that:
SVNClientManager manager = SVNClientManager.newInstance();
SVNLogClient logClient = manager.getLogClient();
logClient.doLog(svnURL, new String[] { fileName }, SVNRevision.create(deletedRevision),
SVNRevision.UNDEFINED, SVNRevision.UNDEFINED, false, false, true, -1, null,
new ISVNLogEntryHandler() {
public void handleLogEntry(SVNLogEntry logEntry) throws SVNException {
log.debug(" ==== " + logEntry.getChangedPaths() + " === "
+ logEntry.getRevision());
}
});
Here, deletedRevision is the SVN revision in which the file was deleted.
When this code runs, I keep getting the following exception:
org.tmatesoft.svn.core.SVNException: svn: '<FilePath>' path not found: 404 Not Found (https://<RepositoryURL>
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:64)
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:51)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.logImpl(DAVRepository.java:976)
at org.tmatesoft.svn.core.io.SVNRepository.log(SVNRepository.java:1034)
at org.tmatesoft.svn.core.wc.SVNLogClient.doLog(SVNLogClient.java:1024)
at org.tmatesoft.svn.core.wc.SVNLogClient.doLog(SVNLogClient.java:891)
at com.blueoptima.connectors.scr.SVN.getWorkingFileList(SVN.java:711)
... 4 more
Am I doing something wrong here? Is there another way to get the history of a deleted file using SVNKit?

Although this question was asked more than a year ago, I thought I'd answer in case it helps others.
I didn't try retrieving the history of a deleted file, but I could retrieve the history of a deleted branch using:
SVNLogClient.doLog(SVNURL.parseURIEncoded(path), new String[] { "" }, pegRevision, SVNRevision.create(0),pegRevision, stopOnCopy, discoverChangedPaths, logsLimit, logHandler);
This is similar to the call you are making, but you need to supply proper values for pegRevision, startRevision and endRevision. Using UNDEFINED may not give the correct result; instead, use the revision at which the file was deleted as pegRevision and 0 as startRevision, and it should work.

You should specify a revision where the file existed as a peg revision. Obviously it is deletedRevision-1. And maybe (I'm not sure here, just try) the file should exist in both start and end revisions.

How to apply a patch

I have this patch, which I downloaded from a web article (Calling Matlab from Java).
http://www.cs.virginia.edu/~whitehouse/matlab/JavaMatlab.html
But I do not know how to apply it on my computer running Windows XP.
What I'm trying to do is call a Matlab script file from Java. I have found the necessary source code and everything else, but this matter is holding me back.
Any help is highly appreciated. Thank you.
Here's the patch.
Index: MatlabControl.java
===================================================================
RCS file: /cvsroot/tinyos/tinyos-1.x/tools/java/net/tinyos/matlab/MatlabControl.java,v
retrieving revision 1.3
diff -u -r1.3 MatlabControl.java
--- MatlabControl.java 31 Mar 2004 18:43:50 -0000 1.3
+++ MatlabControl.java 16 Aug 2004 20:36:51 -0000
@@ -214,7 +214,8 @@
matlab.evalConsoleOutput(command);
}else{
- matlab.fevalConsoleOutput(command, args, 0, null);
+ // matlab.fevalConsoleOutput(command, args, 0, null);
+ matlab.fevalConsoleOutput(command, args);
}
} catch (Exception e) {
System.out.println(e.toString());

I'd download the standard UNIX patch tool and use:
patch -p0 <my_patch.diff

You need to apply that patch to the file MatlabControl.java. On Unix, you have the standard patch program to do that, but of course it isn't normally present on Windows.
But the patch file is very small, and you could easily make the change by hand. Look at the patch file: the lines with a - in the left column must be removed, and the lines with a + must be added.
So you must look in MatlabControl.java and remove this line:
matlab.fevalConsoleOutput(command, args, 0, null);
And add these lines:
// matlab.fevalConsoleOutput(command, args, 0, null);
matlab.fevalConsoleOutput(command, args);
In other words, it's a very small and simple change, you just have to remove the last two arguments to the method call to fevalConsoleOutput().
If you want the patch command (and lots of other Unix utilities) on Windows, you could download and install Cygwin.
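The by-hand procedure described above (drop the - lines, add the + lines, keep everything else) can also be written down as a small program. This is only a strict, minimal sketch for a single hunk: HunkApplier is a made-up name, and unlike the real patch tool it does no fuzzy matching or offset search and simply fails if the context does not match exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HunkApplier {
    // Applies the body of one unified-diff hunk to the lines of a file.
    // startLine is the 1-based start from the hunk header ("-214,7" means 214).
    static List<String> apply(List<String> original, int startLine, List<String> hunk) {
        List<String> result = new ArrayList<>(original.subList(0, startLine - 1));
        int pos = startLine - 1; // index of the next unread original line
        for (String line : hunk) {
            char tag = line.isEmpty() ? ' ' : line.charAt(0);
            String text = line.isEmpty() ? "" : line.substring(1);
            switch (tag) {
                case ' ': // context line: must match, then is kept
                    requireMatch(original, pos, text);
                    result.add(original.get(pos++));
                    break;
                case '-': // removed line: must match, then is skipped
                    requireMatch(original, pos, text);
                    pos++;
                    break;
                case '+': // added line: emitted without consuming input
                    result.add(text);
                    break;
                default:
                    throw new IllegalArgumentException("Unexpected hunk line: " + line);
            }
        }
        result.addAll(original.subList(pos, original.size()));
        return result;
    }

    private static void requireMatch(List<String> original, int pos, String expected) {
        if (pos >= original.size() || !original.get(pos).equals(expected)) {
            throw new IllegalStateException("Hunk does not match at line " + (pos + 1));
        }
    }

    public static void main(String[] args) {
        // Simplified excerpt of the patch from the question, indentation dropped.
        List<String> original = Arrays.asList(
                "}else{",
                "matlab.fevalConsoleOutput(command, args, 0, null);",
                "}");
        List<String> hunk = Arrays.asList(
                " }else{",
                "-matlab.fevalConsoleOutput(command, args, 0, null);",
                "+// matlab.fevalConsoleOutput(command, args, 0, null);",
                "+matlab.fevalConsoleOutput(command, args);",
                " }");
        apply(original, 1, hunk).forEach(System.out::println);
    }
}
```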

If you use a development tool like Eclipse, you can apply the patch easily from the context menu (right-click): go to Team -> Apply Patch. It should work.

This patch is so small, you can easily apply it by hand.
So simply open the file MatlabControl.java, find the line prefixed with - in the patch (the hunk starts at line 214), and replace it with the lines prefixed with +.
After that your code should look like:
else{
// matlab.fevalConsoleOutput(command, args, 0, null);
matlab.fevalConsoleOutput(command, args);
}

JMI (Java-to-Matlab Interface)'s Matlab class and its fevalConsoleOutput method are explained here: http://UndocumentedMatlab.com/blog/jmi-java-to-matlab-interface/

With TortoiseSVN, you can apply the patch as follows: click Apply patch and browse to the patch file.
