Regarding stitching of multiple files into a single file - java

I work on query latencies and have several files that contain data, which I want to aggregate into a single file. I currently use a naive technique: I open each file and collect all of its data in one global file. I do this for every file, but it is time-consuming. Is there a way to stitch the end of one file to the beginning of another and create one big file containing all the data? I think many people have faced this problem before. Can anyone kindly help?

I suppose you are currently doing the opening and appending by hand; otherwise I do not know why aggregating the data would take a long time, especially since you describe the number of files as "multiple" and "several", which suggests it is not enormous.
Thus, I think you are just looking for a way to do the opening and appending automatically. In that case, you can use an approach similar to the one below. Note that this creates the output file, or overwrites it if it already exists, and then appends the contents of all specified files. If you want to call the method multiple times and append to the same file instead of overwriting it, an alternative is to use a FileWriter with true as the second argument to its constructor, so that it appends to an existing file.
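import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

void aggregateFiles(List<String> fileNames, String outputFile) {
    PrintWriter writer = null;
    try {
        // Creates the output file, or truncates it if it already exists
        writer = new PrintWriter(outputFile);
        for (String fileName : fileNames) {
            Path path = Paths.get(fileName);
            // Reads the whole file into memory, decoded with the platform default charset
            String fileContents = new String(Files.readAllBytes(path));
            // println adds a line separator after each file's contents
            writer.println(fileContents);
        }
    } catch (IOException e) {
        // Handle IOException
    } finally {
        if (writer != null) writer.close();
    }
}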
List<String> files = new ArrayList<>();
files.add("f1.txt");
files.add("someDir/f2.txt");
files.add("f3.txt");
aggregateFiles(files, "output.txt");
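If the files are large, reading each one completely into a String may be the slow part. A minimal sketch of a streaming alternative follows, assuming Java 7 or later; the class name FileStitcher and the method stitch are made up for illustration. It copies the raw bytes of each input into the output in order, without decoding them and without adding separators between files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class FileStitcher {
    // Hypothetical helper: copies each input, byte for byte and in order, into outputFile.
    static void stitch(List<String> fileNames, String outputFile) throws IOException {
        Path target = Paths.get(outputFile);
        // By default Files.newOutputStream creates the file or truncates an existing one.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (String fileName : fileNames) {
                // Files.copy streams the source into 'out' instead of loading it into memory.
                Files.copy(Paths.get(fileName), out);
            }
        }
    }
}
```

Because nothing is inserted between the inputs, a file that does not end with a line break will run straight into the next one; whether that matters depends on the data.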

Related

Java File is disappearing from the path /tmp/hsperfdata_*username*/

This is a very confusing problem.
We have a Java application (Java 8, running on JBoss 6.4) that loops over a certain number of objects and writes some rows to a file on each round.
On each round we check whether we received the File object as a parameter; if we did not, we create a new object and create the physical file:
if (file == null) {
    file = new File(filename);
    try {
        file.createNewFile();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
So the idea is that the file gets created only once; after that the step is skipped and we proceed straight to writing. The variable filename is not a path, just a file name with no directory, so the file gets created in the path jboss_root/tmp/hsperfdata_username/
Edit 1: I'll also add the calls used for writing, in case they are relevant:
fw = new FileWriter(indeksiFile, true); // append = true
bw = new BufferedWriter(fw);
out = new PrintWriter(bw);
.
.
out.println(..)
.
.
out.flush();
out.close(); // this flushes as well -> line above is useless
So now the problem is that occasionally, though quite rarely, the physical file disappears from the path in the middle of the process. The Java object reference is never lost, but it seems that the object itself disappears, because the code automatically creates the file again at the same path and keeps on writing to it. This would not happen unless the condition file == null evaluated to true. The effect is obviously that we lose the rows which were written to the previous file. The Java application does not notice any errors and keeps on working.
So, I have three strongly related questions for which I was not able to find an answer on Google.
If we call the method File.createNewFile(), is the resulting file a permanent file in the filesystem or some JVM proxy file?
If it's a permanent file, do you have any idea why it's disappearing? The default behavior in our case is that at some point the file is always deleted from the path. My guess is that the same mechanism is deleting the file too early. I just don't know how to control that mechanism.
My best guess is that this is related to this path jboss_root/tmp/hsperfdata_username/ which is some temp-data folder created by the JVM and probably there is some default behavior that cleans the path. Am I even close?
Help appreciated! Thanks!
I have never used File.createNewFile in my code; it is not needed.
When you actually write to the file afterwards, the writer probably creates it anew or appends to it.
In every case there is a race on the file system, and since these are not atomic actions,
you might end up with something unstable.
So you want to write to a file, either appending to an existing file or creating it.
For UTF-8 text:
Path path = Paths.get(filename);
try (PrintWriter out = new PrintWriter(
        Files.newBufferedWriter(path, StandardOpenOption.CREATE, StandardOpenOption.APPEND),
        false)) {
    out.println("Killroy was here");
}
After comment
Honestly, since you are interested in the cause, it is hard to say. An application restart or I/O exceptions should show up in the logs. Add logging, to a dedicated log, for every append to the files, plus a periodic (logged) check that those files still exist.
Safe-guard
Here we are doing repeated physical access to the file system.
To prevent appending to a file twice at the same time (of which I would expect an exception), one can make a critical section in some form.
import java.util.concurrent.Semaphore;

// For 16 semaphores:
final int semaphoreCount = 16;
final int semaphoreMask = 0xF;
Semaphore[] semaphores = new Semaphore[semaphoreCount];
for (int i = 0; i < semaphores.length; ++i) {
    semaphores[i] = new Semaphore(1, true); // FIFO
}
// Pick a semaphore from the low 4 bits of the hash (use toLowerCase on Windows)
int hash = filename.hashCode() & semaphoreMask;
Semaphore semaphore = semaphores[hash];
semaphore.acquire(); // outside the try: release only what was actually acquired
try {
    // ... append
} finally {
    semaphore.release();
}
File locks would have been a more technical solution, which I would not like to propose.
The best solution, you perhaps already have, would be to queue messages per file.
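The last suggestion, queueing messages per file, could look roughly like the sketch below. It assumes Java 8 and UTF-8 text; PerFileAppender, append and shutdown are hypothetical names that do not come from the question. Each target file gets its own single-threaded executor, whose internal queue ensures that appends to the same file never overlap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerFileAppender {
    // One single-threaded executor per target file; its work queue serializes the appends.
    private final ConcurrentMap<Path, ExecutorService> queues = new ConcurrentHashMap<>();

    // Queues one line for the given file and returns a Future that completes once it is written.
    Future<?> append(Path file, String line) {
        Path key = file.toAbsolutePath().normalize();
        ExecutorService queue =
                queues.computeIfAbsent(key, k -> Executors.newSingleThreadExecutor());
        return queue.submit(() -> {
            try {
                // CREATE + APPEND: the file is created if missing, otherwise extended.
                Files.write(key, Collections.singletonList(line), StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Lets already queued lines finish, then stops the worker threads.
    void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }
}
```

Resolving the name to an absolute path up front also makes it explicit where the file ends up, instead of depending on the working directory of the server process.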

Java - How can I more effectively "scan" a portion of my file system with this program?

I am working on part of a proof of concept program in Java for an antivirus idea I had. Right now I'm still just kicking the idea around and the details aren't important, but I want the program I'm writing to get the file paths of every file within a certain range of each other(say 5 levels apart) in the directory and write them to a text file.
What I have right now(I will include my code below) can do this to a limited extent by checking if there are files in a given folder in the directory and writing their file paths to a text file, and then going down another level and doing it again. I have it set up to do 2 levels in the directory currently and it sort of works. But it only works if there is only one item in the given level of the directory. If there is one text file it will write that filepath to another text file and then terminate. But if there's a text file and folder, it ignores the text file and goes down to the next level of directory and records the file path of whatever text file it finds there. If there are two or more folders it will always choose one in particular over the other or others.
I realize now that it's doing that because I used the wrong conditional. I used if else and should have done something else, but I'm not sure which one I should have used. However I have to do it, I want to fix it so that it branches out with each level. For example, I start the program and give it starting directory C:/Users/"Name"/Desktop/test/. Test has 2 folders and a text file in it. Working the way I want it to, it would then record the file path of the .txt, go down a level into both folders, record any .txts or other files it found there, and then go down another level into each folder it found in those two folders, record what it found there, and so on until it finished the pre-determined number of levels to go through.
EDIT: To clarify confusion over what the problem is, I'll sum it up. I want the program to write the file paths of any files it finds in each level of the directory it goes through in another text file. It will do this, but only if there is one file in a given level of directory. If there is just one .txt for example, it will write the file path of that .txt to the other text file. But if there are multiple files in that level of directory(for example, two .txts) it will only write the file path of one of them and ignore the other. If there's a .txt and a folder, it ignores the .txt and enters the folder to go to the next level of directory. I want it to record all files in a given location and then branch into all the folders in that same location.
EDIT 2: I got the part of my code that gets the file path from this question (Read all files in a folder) and the section that writes to my other text file from this one (How do I create a file and write to it in Java?).
EDIT 3: How can I edit my code to use recursion, which @horatius pointed out that I need?
EDIT 4: How can I edit my code so that it doesn't need a hard coded starting file path to work, and can instead detect the location of the executable .jar and use that as its starting directory?
Here is my code:
public class ScanFolder {
    private static final int LEVELS = 5;
    private static final String START_DIR = "C:/Users/Joe/Desktop/Test-Level1/";
    private static final String REPORT_FILE = "C:/Users/Joe/Desktop/reports.txt";

    public static void main(String[] args) throws IOException {
        try (PrintWriter writer = new PrintWriter(REPORT_FILE, "UTF-8");
             Stream<Path> pathStream = Files.walk(Paths.get(START_DIR), LEVELS)) {
            pathStream.filter(Files::isRegularFile).forEach(writer::println);
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
Thanks in advance
If you are using Files.walk(...) it does all the recursion for you.
Constructing a PrintWriter on a file name truncates that file, so if the writer is opened once per file found, the report ends up holding only the last file name written. Open it once, outside the walk.
I think something like the below is what you are after. As you progress, rather than writing to a file, you may want to put the found Path objects into an ArrayList<Path> or similar for easier later processing, but not clear from your question what requirements you have here.
public class Walk
{
    public static void main(String[] args) throws IOException {
        try (PrintWriter writer = new PrintWriter("C:/Users/Joe/Desktop/reports.txt", "UTF-8")) {
            Files.walk(Paths.get("C:/Users/Joe/Desktop/test")).forEach(filePath -> {
                if (Files.isRegularFile(filePath)) {
                    writer.println(filePath);
                }
            });
        }
    }
}
Here is an improved example that you can use to limit depth. It also deals with properly closing the Stream returned by Files.walk(...) that the previous example did not, and is a little more streams/lambda idiomatic:
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Walk
{
    // Can use Integer.MAX_VALUE for all
    private static final int LEVELS = 2;
    private static final String START_DIR = "C:/Users/Joe/Desktop/test";
    private static final String REPORT_FILE = "C:/Users/Joe/Desktop/reports.txt";

    public static void main(String[] args) {
        // Both resources are closed automatically, including the Stream from Files.walk
        try (PrintWriter writer = new PrintWriter(REPORT_FILE, "UTF-8");
             Stream<Path> pathStream = Files.walk(Paths.get(START_DIR), LEVELS)) {
            pathStream.filter(Files::isRegularFile).forEach(writer::println);
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
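For the suggestion of collecting the found Path objects into a list for later processing instead of writing them out immediately, a minimal sketch could look like this. WalkToList and listFiles are hypothetical names, and sorting the result is an extra choice made here only to get a stable order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkToList {
    // Hypothetical helper: returns every regular file at most maxDepth levels below start.
    static List<Path> listFiles(Path start, int maxDepth) throws IOException {
        // try-with-resources closes the directory handles held by the stream.
        try (Stream<Path> paths = Files.walk(start, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

With a depth of 1 only the files directly inside the start directory are returned; each extra level adds one more layer of subfolders.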

Checking for duplicate string in file java

I have a file that I am writing data to. I've tried googling, but all the examples I have tried have just confused me more.
I am writing data to the file, and this works correctly: the selected items are appended to the file. My issue is that I want to check whether the string being written already exists in the file and, if it does, skip it.
The code I am using to write the data to the file is below, but I am not sure how to change it to check for duplicates.
for (EventsObj p : boxAdapter.getBox()) {
    if (p.box) {
        String result = p.name + " " + p.price;
        try {
            // open file for writing
            OutputStreamWriter out = new OutputStreamWriter(openFileOutput("UserEvents.txt", MODE_APPEND));
            // write the contents to the file
            out.write(result);
            out.write('\n');
            // close the file
            out.close();
        }
        catch (java.io.IOException e) {
            // do something if an IOException occurs.
            Toast.makeText(this, "Sorry, text couldn't be added", Toast.LENGTH_LONG).show();
        }
    }
}
It is getting the checkboxes ticked, then getting the name and price related to it and appending it to file. But I want to carry out a check that this does not already exist. Any help would be appreciated and I've exhausted google and tried many things.
So, if I understood your question correctly the file contains a number of strings delimited by newline.
What you want to do is to read the file contents line by line, and store the lines in a HashSet<String>. Then, you open the file for appending and append the additional string, but only if the file did not contain the string already. As the other answer suggested, you use the contains method. However, unlike the other answer I'm not suggesting to use a list of strings; instead, I'm suggesting the use of a HashSet as it's more efficient.
While reading the file contents line by line, you can perform some basic checks: does the file already contain duplicate rows? You may want to handle those by giving the user a warning that the file format is invalid. Or you may want to proceed nevertheless.
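The HashSet approach described above can be sketched as follows. This is plain Java working on a Path, with UTF-8 assumed; UniqueLineAppender and appendIfAbsent are hypothetical names. On Android the file written through openFileOutput would have to be read back first (for example with openFileInput), so treat this as an outline of the logic rather than drop-in code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class UniqueLineAppender {
    // Appends 'line' unless an identical line is already present; returns true if it was written.
    static boolean appendIfAbsent(Path file, String line) throws IOException {
        Set<String> existing = new HashSet<>();
        if (Files.exists(file)) {
            // Load every existing line once; HashSet lookups are constant time on average.
            existing.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        if (existing.contains(line)) {
            return false; // duplicate, skip it
        }
        Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

When many items are added in one go, it would be cheaper to build the set once, check and add each candidate against it, and write all new lines at the end instead of rereading the file for every item.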
You should firstly read from the file and create a list of strings with all your inputs.
Then before adding to the file you can check if the list of strings contains the string you want to add (just make sure that the strings share the same format such that a match will be found). If it returns false add to the file, if yes don't add to the file.
Shouldn't be such a tremendous task. You can make use of the contains method.
You might need to keep the contents of the file in a String in your program. A little inefficient, but at the moment I do not see any other way but to keep track of things in your program instead of on the file.
So before you run the program which appends text to the file, the very first thing you should probably do is parse the file for all text:
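import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

File yourFile = new File("file-path-goes-here");
String textFromFile = "";
// try-with-resources closes the Scanner; a missing file simply leaves the text empty
try (Scanner input = new Scanner(new FileInputStream(yourFile))) {
    while (input.hasNextLine())
        textFromFile += input.nextLine() + "\n";
} catch (FileNotFoundException e) {
    // nothing has been written yet, so there is nothing to compare against
}
// Now, before adding to the file, simply run something like this.
// Matching "\n" + line + "\n" compares whole lines rather than substrings.
String candidate = "string-to-write-to-file";
if (("\n" + textFromFile).contains("\n" + candidate + "\n")) {
    // do not write to file
} else {
    // write to file and add to textFromFile
    textFromFile += candidate + "\n";
}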
Hope this answers your question. Let me know if something is not clear.

Read a CSV file in Java

I wrote a program that reads a CSV file and puts it into a TableModel. My problem is that I want to extend the program so that, if the CSV file is changed from outside, my TableModel is updated with the new values.
My plan is to write a scheduler whose thread sleeps for about a minute and then checks whether the timestamp of the file has changed; if so, it reads the file again. But I don't know what happens to the whole program if I use a scheduler, because this little piece of software will be part of a much bigger application which runs on JDK 6. So I am looking for a solution that performs well and is independent of the bigger application to get the changes into the TableModel.
Can someone help out?
The java.nio.file package now contains the Watch Service API. In short:
This API enables you to register a directory (or directories) with the
watch service. When registering, you tell the service which types of
events you are interested in: file creation, file deletion, or file
modification. When the service detects an event of interest, it is
forwarded to the registered process. The registered process has a
thread (or a pool of threads) dedicated to watching for any events it
has registered for. When an event comes in, it is handled as needed.
See reference here.
Oh! This API is only available from JDK 7 (onwards).
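A minimal sketch of how the Watch Service could be used for a single CSV file is shown below, assuming JDK 7 or later as noted above. CsvWatcher, watch and awaitChange are hypothetical names. The service watches the directory that contains the file, and the event context is the file name relative to that directory.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

class CsvWatcher {
    // Registers the directory that contains the CSV file; events queue up from this point on.
    static WatchService watch(Path directory) throws IOException {
        WatchService service = FileSystems.getDefault().newWatchService();
        directory.register(service, StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY);
        return service;
    }

    // Waits up to timeoutSeconds for a create/modify event on fileName; true if one arrived.
    static boolean awaitChange(WatchService service, Path fileName, long timeoutSeconds)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) return false;
            WatchKey key = service.poll(remaining, TimeUnit.NANOSECONDS);
            if (key == null) return false; // timed out
            boolean matched = false;
            for (WatchEvent<?> event : key.pollEvents()) {
                // For ENTRY_* kinds the context is the changed file's name inside the directory.
                if (fileName.equals(event.context())) matched = true;
            }
            key.reset(); // re-arm the key so further events are delivered
            if (matched) return true;
        }
    }
}
```

A background thread could call awaitChange in a loop and reload the CSV into the TableModel whenever it returns true, handing the update over to the Swing event thread.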
OpenCSV is a good way to read a CSV file in Java.
If you are using Maven, you can add it as a dependency; otherwise download its jar from the web.
@SuppressWarnings({"rawtypes", "unchecked"})
public void readCsvFile() {
    CSVReader csvReader;
    CsvToBean csv;
    File fileEntry;
    try {
        fileEntry = new File("path of your file");
        csv = new CsvToBean();
        // separator ',', quote character '"', skip the first line
        csvReader = new CSVReader(new FileReader(fileEntry), ',', '"', 1);
        List list = csv.parse(setColumMapping(), csvReader);
        // list now holds LabReportSampleData objects
    } catch (IOException e) {
        e.printStackTrace();
    }
}
// Maps the columns of the CSV file to the fields of the mapping class.
// The columns array lists the field names in column order: column 0 maps to the
// degree field of the mapping class, column 1 to radian, and so on.
@SuppressWarnings({"rawtypes", "unchecked"})
private static ColumnPositionMappingStrategy setColumMapping() {
    ColumnPositionMappingStrategy strategy = new ColumnPositionMappingStrategy();
    strategy.setType(LabReportSampleData.class);
    String[] columns =
        new String[] {"degree", "radian", "shearStress", "shearingStrain", "sourceUnit"};
    strategy.setColumnMapping(columns);
    return strategy;
}

How to move/rename uploaded file?

I followed this tutorial for uploading a file in my JSF2 application.
The application works fine but I am unhappy with one aspect.
While rebuilding the request, the File sent via request is saved somewhere on the disk.
Even though the file is saved I need to rename the file with a name which is available after entering the Managed Bean containing the action method.
Therefore I decided to create a new file with the desired name, copy the already saved file into it, and then delete the unneeded one.
private File uploadFile;
//...
try {
    BufferedWriter bw = new BufferedWriter(new FileWriter(newFile));
    BufferedReader br = new BufferedReader(new FileReader(uploadFile));
    String line = "";
    while ((line = br.readLine()) != null){
        bw.write(line);
    }
} catch (Exception e){}
The new file appears in the desired location but this error is thrown when I'm trying to open the file: "Invalid or unsupported PNG file"
These are my questions:
Is there a better way to solve this problem?
Is this solution the best way to upload a picture? Is there a reason to save the file before the business logic runs, when the picture may need to be resized or the desired name is not available yet?
LE:
I know about this tutorial as well, but I'm trying to do this with Mojarra only.
There is a rename method built into java.io.File object already, I'd be surprised if it didn't work for your situation.
public boolean renameTo(File dest)
Renames the file denoted by this abstract pathname.
Many aspects of the behavior of this method are inherently platform-dependent:
The rename operation might not be able to move a file from one filesystem to
another, it might not be atomic, and it might not succeed if a file with the
destination abstract pathname already exists. The return value should always
be checked to make sure that the rename operation was successful.
You can also check if a file exists before saving it, and you can use the ImageIO class to do validations on the uploaded file before performing the initial save.
Don't use Reader and Writer when you deal with binary files like images. Use streams: FileInputStream and FileOutputStream. The best variant is to use @Perception's solution with the renameTo method.
Readers read a file as if it consists of characters (e.g. txt, properties, or YAML files). Image files are not character data; they are binary, and you must use streams for them.
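Putting both answers together, a minimal sketch might try renameTo first and fall back to a byte-for-byte stream copy when the rename fails, for example across filesystems. UploadMover and moveTo are hypothetical names, and try-with-resources assumes Java 7 or later.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class UploadMover {
    // Tries a plain rename first; if that fails, copies the raw bytes and deletes the source.
    static void moveTo(File source, File target) throws IOException {
        if (source.renameTo(target)) {
            return; // the rename succeeded, nothing else to do
        }
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Bytes are written unchanged, so binary data such as PNG images stays valid.
                out.write(buffer, 0, read);
            }
        }
        if (!source.delete()) {
            throw new IOException("Could not delete " + source);
        }
    }
}
```

The Reader/Writer version in the question breaks images because readLine decodes bytes as characters and drops the line terminators, so the copied file no longer matches the original.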
