Java - Read specific files in specific order from a folder [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
In my Java program I process a certain number of files.
The files are named in this way:
thu 21 mar 2013_01.55.22_128.txt
thu 21 mar 2013_01.55.22_129.txt
thu 21 mar 2013_01.55.22_130.txt
....
sat 23 mar 2013_01.45.55_128.txt
sat 23 mar 2013_01.45.55_129.txt
sat 23 mar 2013_01.45.55_130.txt
where the last three digits are the cell number.
Consider that I already read the files coming from the same cell in order of date.
Consider that all the files are in the same folder.
Consider also that this problem, but for a single cell, was correctly solved in This Post.
My question now is: how can I read first all the txt files coming from a specific cell (e.g. 128), then all the files coming from cell 129, and so on? (Below is an example.)
thu 21 mar 2013_01.55.22_128.txt
sat 23 mar 2013_01.45.55_128.txt
...
thu 21 mar 2013_01.55.22_129.txt
sat 23 mar 2013_01.45.55_129.txt
...
thu 21 mar 2013_01.55.22_130.txt
sat 23 mar 2013_01.45.55_130.txt
I hope I was clear.

You can get all the files in the directory into an array using listFiles() and then sort it with a custom comparator.
File[] files = dir.listFiles();
Arrays.sort(files, new Comparator<File>() {
    @Override
    public int compare(File lhs, File rhs) {
        // return a negative number if lhs should come before rhs,
        // 0 if the order doesn't matter,
        // a positive number if lhs should come after rhs
        return 0; // placeholder: compare by cell number and/or date here
    }
});
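A possible implementation of that comparator, grouping by cell number first and then by timestamp, might look like the sketch below. It assumes every file name follows the pattern from the question (e.g. thu 21 mar 2013_01.55.22_128.txt); the folder path is a placeholder:
import java.io.File;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.Locale;

public class CellFileSorter {

    // matches the date/time part of names like "thu 21 mar 2013_01.55.22_128.txt"
    private static final SimpleDateFormat TIMESTAMP =
            new SimpleDateFormat("EEE dd MMM yyyy_HH.mm.ss", Locale.ENGLISH);

    public static void main(String[] args) {
        File dir = new File("path/to/folder"); // placeholder
        File[] files = dir.listFiles();

        Arrays.sort(files, new Comparator<File>() {
            @Override
            public int compare(File lhs, File rhs) {
                int byCell = cell(lhs) - cell(rhs);
                if (byCell != 0) {
                    return byCell;                     // group by cell number first
                }
                return date(lhs).compareTo(date(rhs)); // then order by timestamp
            }
        });

        for (File f : files) {
            System.out.println(f.getName());           // all 128s first, then 129s, ...
        }
    }

    // "thu 21 mar 2013_01.55.22_128.txt" -> 128
    private static int cell(File f) {
        String name = f.getName();
        return Integer.parseInt(name.substring(name.lastIndexOf('_') + 1, name.lastIndexOf('.')));
    }

    // "thu 21 mar 2013_01.55.22_128.txt" -> the Date for "thu 21 mar 2013_01.55.22"
    private static Date date(File f) {
        String name = f.getName();
        try {
            return TIMESTAMP.parse(name.substring(0, name.lastIndexOf('_')));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected file name: " + name, e);
        }
    }
}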

Well, you could read the folder in order to get the File objects (or maybe just the file names).
Then parse the file names, extract the cell number, and put the files into a map whose key is the cell.
Some pseudo code:
Map<String, List<File>> filesPerCell = new LinkedHashMap<String, List<File>>();
File[] files = folder.listFiles();
for (File file : files) {
    String filename = file.getName();
    String cell = ...; // extract the cell number from the filename
    List<File> l = filesPerCell.get(cell);
    if (l == null) {                // create a new list for the first file of a cell
        l = new ArrayList<File>();
        filesPerCell.put(cell, l);
    }
    l.add(file);
}
for (List<File> cellList : filesPerCell.values()) {
    // do whatever you want with the files for that cell
}
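If the cells should also come out in numeric order rather than in the order the files were encountered, the LinkedHashMap can be swapped for a TreeMap with a numeric key comparator. A sketch, assuming the file names always end in _<cell>.txt (class and method names are illustrative):
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CellGrouping {

    public static Map<String, List<File>> groupByCell(File folder) {
        // TreeMap keeps the keys sorted; this comparator sorts them as numbers ("128" < "1000")
        Map<String, List<File>> filesPerCell = new TreeMap<String, List<File>>(
                new Comparator<String>() {
                    @Override
                    public int compare(String a, String b) {
                        return Integer.valueOf(a).compareTo(Integer.valueOf(b));
                    }
                });
        for (File file : folder.listFiles()) {
            String name = file.getName();
            // "thu 21 mar 2013_01.55.22_128.txt" -> "128"
            String cell = name.substring(name.lastIndexOf('_') + 1, name.lastIndexOf('.'));
            List<File> list = filesPerCell.get(cell);
            if (list == null) {
                list = new ArrayList<File>();
                filesPerCell.put(cell, list);
            }
            list.add(file);
        }
        return filesPerCell;
    }
}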

You will have your file names sorted by cell number and, within each cell, by date/time. You could do this most easily if your file names looked like this:
cellnumber_yyyymmdd_hhmmss
where cellnumber has the same number of digits in all cases.
Otherwise you must write a custom comparator (as @RiaD writes), but it is not trivial because the dates must be parsed before you can decide which one is earlier or later.

Related

Java 8 dealing unicode characters in file names unexpectedly

I am struggling to find a solution to the issue explained in the question title. To help you understand my problem clearly, I am presenting a little detail below.
I have some files with an accent in their names, stored in a directory. Running ls -lht does not show the file names correctly; the accents are decoded incorrectly. But if I press the tab key to apply autocompletion in the terminal, the file name is shown as expected. See the snippet below.
tc_pst03#login-01: 6] ls -lht
total 704K
-rw-r----- 1 tc_pst03 pst_pub 86K Oct 9 00:27 Li??2012_Modelling osteomyelitis_Control Model.cps
-rw-r----- 1 tc_pst03 pst_pub 46K Oct 9 00:27 Li??2012_Modelling osteomyelitis_Control Model.xml
-rw-r----- 1 tc_pst03 pst_pub 11K Oct 9 00:27 Li??2012_Modelling osteomyelitis_Control Model.sedml
tc_pst03#login-01: 6] mv Liò2012_Modelling\ osteomyelitis_Control\ Model.
When using a Java snippet to get all those files, I get results such as Li??2012... instead of Liò. I have looked for solutions shared in our communities, but none works for my problem. Below is the Java snippet I have tried for getting the list of those files.
List<File> get(String modelId, int revisionNumber) throws ModelException {
    File modelDirectory = new File(modelCacheDir, modelId)
    File revisionDirectory
    List returnedFiles = new LinkedList<File>()
    try {
        revisionDirectory = new File(modelDirectory, revisionNumber.toString())
        if (!revisionDirectory.exists()) {
            throw new FileNotFoundException()
        } else {
            returnedFiles = Files.list(revisionDirectory.toPath())*.toFile() //revisionDirectory.listFiles().toList()
        }
        if (returnedFiles?.isEmpty()) {
            String message = """The cache directory of this model ${modelId} revision ${revisionNumber} is empty. \
The model cache builder will be launched again."""
            throwModelException(modelId, message)
        }
    } catch (FileNotFoundException me) {
        String message = """The files associated with this model ${modelId}, \
revision ${revisionNumber} hasn't been cached yet"""
        throwModelException(modelId, message)
    }
    return returnedFiles
}
I suspected the JVM was using the default charset, so I manually enabled UTF-8 by defining it in JAVA_TOOLS_OPTIONS: export JAVA_TOOLS_OPTIONS=-Dfile.encoding="UTF-8".
Some results are printed below:
[
._Li?2012_Modelling osteomyelitis_Control Model.xml4483388255187556135.tmp,
._Li?2012_Modelling osteomyelitis_Control Model.xml8578169841449575225.tmp,
Li��2012_Modelling osteomyelitis_Control Model.sedml,
._Li?2012_Modelling osteomyelitis_Control Model.xml1056906750418910165.tmp,
Li��2012_Modelling osteomyelitis_Control Model.xml,
Li��2012_Modelling osteomyelitis_Control Model.cps]
I need to get those files' names to compare them with the same file names persisted in the database. However, the file names coming from the file system are decoded improperly, so they are never equal to the ones already saved in the database.
Does anyone know why this issue is happening? Any ideas? Thanks!
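As an aside, a quick diagnostic sketch like the one below can show which charsets the running JVM actually picked up: file.encoding and the JVM-specific sun.jnu.encoding property (the latter is typically what drives file-name decoding on OpenJDK/Oracle JVMs). The class name is illustrative:
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // print the charsets the JVM is using; compare between terminal and application runs
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());
    }
}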

Getting unknown integer value in Map-reduce output file

I am working on a Hadoop map-reduce program where I am not setting the mapper and reducer, and not setting any other parameters on the Job configuration in my program. I did so assuming that the Job would send the same output as the input to the output file.
But what I found is that it prints some dummy integer value in the output file, with every line separated by a tab (I guess).
Here is my code:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MinimalMapReduce extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(getClass());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) {
        String argg[] = {"/Users/***/Documents/hadoop/input/input.txt",
                         "/Users/***/Documents/hadoop/output_MinimalMapReduce"};
        try {
            int exitCode = ToolRunner.run(new MinimalMapReduce(), argg);
            System.exit(exitCode);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
And here is the input:
2011 22
2011 25
2012 40
2013 35
2013 38
2014 44
2015 43
And here is the output:
0 2011 22
8 2011 25
16 2012 40
24 2013 35
32 2013 38
40 2014 44
48 2015 43
How can I get the same output as the input?
I did so assuming that the Job would send the same output as the input to the output file
You were correct in assuming that. Technically, you are getting whatever you have in the file as the output. Remember that mappers and reducers take key-value pairs as input.
The input to a mapper is an input split of the file, and the input to a reducer is the output of the mapper(s).
But what I found is that it prints some dummy integer value in the output file, with every line separated by a tab
These dummy integers are nothing but the offset of each line from the start of the file. Since each row you have consists of [4 DIGITS]<space>[2 DIGITS]<new-line>, your offsets are multiples of eight.
Why are you getting these offsets when you haven't defined any mapper or reducer, you might ask? This is because a mapper always runs; by default it maps each line to its offset, and this default is referred to as the IdentityMapper.
How can I get the same output as the input?
Well, you can define a mapper that just maps the input lines to the output and strips the offsets.
public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // Some cool logic here
}
In the above code, key contains the dummy integer value, i.e. the offset, and value contains each line, one at a time.
You can write the value yourself using the context.write function, use no reducer, and set job.setNumReduceTasks(0) to get the desired output.
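A minimal sketch of such a mapper (the class name PassThroughMapper is illustrative). Writing the line as the key and NullWritable as the value makes TextOutputFormat emit the line without a trailing tab:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PassThroughMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of the line; drop it and emit only the line itself
        context.write(value, NullWritable.get());
    }
}
In run() you would then register it with job.setMapperClass(PassThroughMapper.class), job.setOutputKeyClass(Text.class), job.setOutputValueClass(NullWritable.class) and job.setNumReduceTasks(0), as described above.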
I agree with @philantrovert's answer, but here are some more details I found.
According to Hadoop: The Definitive Guide, it is TextInputFormat which adds the offset as the key of each line. Here is the documentation about TextInputFormat:
TextInputFormat is the default InputFormat. Each record is a line of input. The key, a LongWritable, is the byte offset within the file of the beginning of the line. The value is the contents of the line, excluding any line terminators (e.g., newline or carriage return), and is packaged as a Text object. So, a file containing the following text:
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
is divided into one split of four records. The records are interpreted as the following key-value pairs:
(0, On the top of the Crumpetty Tree)
(33, The Quangle Wangle sat,)
(57, But his face you could not see,)
(89, On account of his Beaver Hat.)
Clearly, the keys are not line numbers. This would be impossible to implement in general, in that a file is broken into splits at byte, not line, boundaries. Splits are processed independently. Line numbers are really a sequential notion. You have to keep a count of lines as you consume them, so knowing the line number within a split would be possible, but not within the file.
However, the offset within the file of each line is known by each split independently of the other splits, since each split knows the size of the preceding splits and just adds this onto the offsets within the split to produce a global file offset. The offset is usually sufficient for applications that need a unique identifier for each line. Combined with the file’s name, it is unique within the filesystem. Of course, if all the lines are a fixed width, calculating the line number is simply a matter of dividing the offset by the width.

Java FileOutputStream Windows/Linux

Hi, I have this code:
OutputStream output = new FileOutputStream("res/" + new java.util.Date().toString() + ".properties");
I'm working on my code at university (laptop, Ubuntu) and at home (desktop, Windows).
I sync the Eclipse project with GitHub and EGit.
Now I am not able to execute this line of code on my Windows machine, but it still works on my laptop, the exact same code.
I am getting the error:
java.io.FileNotFoundException: res\Thu Jan 08 15:54:39 CET 2015.properties (Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch, i.e. "The filename, directory name, or volume label syntax is incorrect")
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(Unknown Source)
    at java.io.FileOutputStream.<init>(Unknown Source)
Windows does not allow colons ':' in filenames.
I think the problem is that : is not a valid character for a path name in Windows.
From the MSDN article "Naming Files, Paths, and Namespaces":
[...]
Use any character in the current code page for a name, including
Unicode characters and characters in the extended character set
(128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
[...]
new java.util.Date().toString() will give the date in the following format:
Thu Jan 08 20:51:01 IST 2015
It contains :
That character is not allowed in file names on Windows, but you can use it on Linux.
Therefore, if you want to use the same code on both platforms, you have to change the format of the date, for example with SimpleDateFormat or something else,
e.g.
SimpleDateFormat sd = new SimpleDateFormat("yyyy-MM-dd");
System.out.println(sd.format(new Date()));
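Applied to the line from the question, a minimal sketch (path and date pattern are just examples) that builds a colon-free file name so it works on both Windows and Linux:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PropertiesWriter {
    public static void main(String[] args) throws IOException {
        // "2015-01-08_15-54-39" contains no characters that Windows forbids
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss");
        // assumes the res/ directory already exists
        try (OutputStream output = new FileOutputStream("res/" + sdf.format(new Date()) + ".properties")) {
            // write the properties here
        }
    }
}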

write logs to oldest file in directory

I want to make some tweaks to the logging of my application...
I would like some help to enhance what I have below in my main method:
public static void main(String[] args) {
    try {
        Date date = new Date();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        Handler h = new FileHandler("../logs/MyLogFile_"
                + sdf.format(date) + ".log", true);
        h.setFormatter(new SingleLineFormatter());
        h.setLevel(Level.ALL);
        logger.setUseParentHandlers(false);
        logger.addHandler(h);
    }
    //...
}
It creates a log file with a date stamp every time I run the application. But I want to achieve something like this in my Unix directory:
-rw-r--r-- 1 r787848 dev 45271 Feb 4 11:31 MyLogFile.log.06
-rw-r--r-- 1 r787848 dev 45308 Feb 5 11:36 MyLogFile.log.05
-rw-r--r-- 1 r787848 dev 44336 Feb 6 06:50 MyLogFile.log.04
-rw-r--r-- 1 r787848 dev 44379 Feb 7 08:41 MyLogFile.log.03
-rw-r--r-- 1 r787848 dev 44409 Feb 10 08:45 MyLogFile.log.02
-rw-r--r-- 1 r787848 dev 44446 Feb 11 12:36 MyLogFile.log.01
I want to define a set of, let's say, 6 log files to capture the logging of daily runs of the application. When it comes to logging, I want the application to write to the log file that is oldest, so in the above instance, running the application on Feb 12 08:45 should clear MyLogFile.log.06 and write fresh for Feb 12.
How can this be achieved with java.util.logging on top of what I have? Unfortunately, I am not able to configure log4j properties and want to use java.util.logging only.
The only close approximation is to do the following:
Handler h = new FileHandler("../logs/MyLogFile_"
+ sdf.format(date) + ".log", Integer.MAX_VALUE, 6, false);
See: JDK-6350749 - Enhance FileHandler to have Daily Log Rotation capabilities.
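Put into the original main method, that call might look like the sketch below (the SingleLineFormatter from the question is omitted, the class name LogSetup is illustrative, and the limit/count/append arguments are those of the standard FileHandler(String, int, int, boolean) constructor):
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogSetup {
    private static final Logger logger = Logger.getLogger(LogSetup.class.getName());

    public static void main(String[] args) throws Exception {
        Date date = new Date();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        // pattern, byte limit per file, number of files to cycle through, append
        // (the ../logs directory must already exist)
        Handler h = new FileHandler("../logs/MyLogFile_" + sdf.format(date) + ".log",
                Integer.MAX_VALUE, 6, false);
        h.setLevel(Level.ALL);
        logger.setUseParentHandlers(false);
        logger.addHandler(h);
        logger.info("application started");
    }
}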

adding methods to existing java class and recompile with my project

I'm trying to do something that should be pretty easy, but somehow I keep failing...
The idea is to take an existing Java class from the Java class library (in our case Sun's Java), modify it a bit, recompile the class, and use the modified class in our project.
The steps (I took String.java from java.lang as an example):
modify String.java by adding:
public int zzz() {
    return 123;
}
just under the class constructors.
recompile String.java
javac -d String String.java
package the compiled files with jar -cf
this is the result of jar -vtf String.jar:
0 Wed May 22 10:31:06 IDT 2013 META-INF/
68 Wed May 22 10:31:06 IDT 2013 META-INF/MANIFEST.MF
9763 Wed May 22 10:30:44 IDT 2013 java/lang/String$1.class
1232 Wed May 22 10:26:04 IDT 2013 java/lang/String$CaseInsensitiveComparator.class
17269 Wed May 22 10:26:04 IDT 2013 java/lang/String.class
write short main class:
public class main {
    /**
     * @param args
     */
    public static void main(java.lang.String[] args) {
        // TODO Auto-generated method stub
        java.lang.String s = new java.lang.String(" xxx ");
        s = s.concat("bla bla");
        System.out.println(s);
        System.out.println(s.zzz());
    }
}
(I get the same behavior when trying java.lang.String and just String.)
5. compile my main.java with the modified String class:
javac -Xbootclasspath/p:String.jar main.java
6. run main:
java -Xbootclasspath/p:String.jar main
that gives us the following output:
myusername#machinename:~/work/sand_box$ java -Xbootclasspath/p:String.jar main
xxx bla bla
Exception in thread "main" java.lang.NoSuchMethodError: java.lang.String.<init>([CZ)V
at java.lang.Integer.toString(Integer.java:333)
at java.lang.Integer.toString(Integer.java:136)
at java.lang.String.valueOf(String.java:2948)
at java.io.PrintStream.print(PrintStream.java:597)
at java.io.PrintStream.println(PrintStream.java:736)
at main.main(main.java:12)
I can't figure out what I am doing wrong.
Can someone please shed some light on this?
Thanks to all the contributors out there.
From my point of view it would be better to just write your own String class instead of recompiling a modified version of a JDK class, e.g. a class in your own package that wraps a java.lang.String (note that java.lang.String is final, so it cannot literally be extended; you have to delegate to it instead).
That way you'll create a new String-like class in your package.
Of course, depending on your needs this may not be the best option.
But in general I think it's not a good idea to modify JDK classes directly, at least if you don't plan to include (and recompile) all of the JDK from source yourself.
KR
Florian
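A minimal sketch of that wrapper idea (the class name MyString is illustrative; it delegates to a wrapped java.lang.String):
public final class MyString {

    private final String value;

    public MyString(String value) {
        this.value = value;
    }

    // the extra behaviour that would otherwise require patching java.lang.String
    public int zzz() {
        return 123;
    }

    public MyString concat(String other) {
        return new MyString(value.concat(other));
    }

    @Override
    public String toString() {
        return value;
    }
}
Usage would then be something like System.out.println(new MyString(" xxx ").concat("bla bla").zzz()); with no need to touch the boot class path.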
