Splitting a text file

Splitting a text file - java

I have this text file of the format:
Token:A1
sometext
Token:A2
sometext
Token:A3
I want to split this file into multiple files, such that
File 1 contains
A1
sometext
File 2 contains
A2
sometext
I don't know much about any programming or scripting language. What would be the best way to go about this? I was thinking of using Java to solve the problem.

If you want to use Java, I would look into using Scanner together with File and PrintWriter. With a for loop and some exception handling, you will be good to go.
Import the proper libraries:
import java.io.*;
import java.util.*;
Declare the class, of course:
public class someClass{
public static void main(String [] args){
Now here's where things get interesting. We use the File class to create an object that represents the file to be read, with its name passed as a parameter. You can put either a path to the file there or just the file name if it is in the directory you run the program from.
File currentFile = new File("new.txt");
if (currentFile.exists() && currentFile.canRead()){
try{
Next we create a Scanner to scan through that File object. The for loop continues as long as the file has more tokens to scan: hasNext() returns true only if the scanner's input has another token. By default a token is any run of non-whitespace characters, so next() returns one word at a time. PrintWriter creates and writes the output files. I have it set to name the files after the loop iteration (0.txt, 1.txt, 2.txt and so on), but that can easily be changed (see new PrintWriter(i + ".txt", "UTF-8")).
Scanner textContents = new Scanner(currentFile);
for(int i = 0; textContents.hasNext(); i++){
PrintWriter writer = new PrintWriter(i + ".txt", "UTF-8");
writer.println(textContents.next());
writer.close();
}
These catch statements are important: your code won't even compile without them, because FileNotFoundException and UnsupportedEncodingException are checked exceptions. If one of those errors occurs, the catch block keeps the program from crashing. I left them empty so you can handle the errors as you see fit.
} catch (FileNotFoundException e) {
// do something
}
catch (UnsupportedEncodingException i){
//do something
}
}
}
}
And that's pretty much it! If you have any questions, be sure to comment.
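Because Scanner.next() returns one whitespace-separated word at a time, the loop above writes one word per output file. A sketch that is closer to the layout in the question reads the input line by line and starts a new output file at every line beginning with "Token:". The class name TokenSplitter, the output directory parameter, and the choice to name each file after its token (A1.txt, A2.txt, ...) are assumptions made for illustration, not something the question specifies.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: one output file per "Token:" section.
final class TokenSplitter {
    private static final String PREFIX = "Token:";

    // Splits `input` into files inside `outputDir` and returns how many
    // files were written. Lines before the first token line are skipped.
    static int split(Path input, Path outputDir) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        BufferedWriter writer = null;
        int count = 0;
        try {
            for (String line : lines) {
                if (line.startsWith(PREFIX)) {
                    // A new section begins: close the previous file first.
                    if (writer != null) {
                        writer.close();
                    }
                    String name = line.substring(PREFIX.length()).trim();
                    writer = Files.newBufferedWriter(
                            outputDir.resolve(name + ".txt"), StandardCharsets.UTF_8);
                    writer.write(name); // the file starts with the token name
                    writer.newLine();
                    count++;
                } else if (writer != null) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "new.txt" is the input name used in the answer above.
        int written = split(Paths.get("new.txt"), Paths.get("."));
        System.out.println("Wrote " + written + " files");
    }
}
```

Reading all lines at once keeps the sketch short; for a very large input, a BufferedReader loop would avoid holding the whole file in memory.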

There is no single best way; it depends on your environment and your needs. In any language, work out the basic algorithm first and then pick the most suitable data structure(s) available. If you are using Java, consider Guava's Splitter, and do look into its implementation.

Related

Is this bad practice for passing an input file in Java?

I have a main class that expects an input file name as a command-line argument; if none is provided, the program exits with an error message.
Assume a class called SudokuReducer exists. After making sure there is an input file, the main function passes the input file (not just its name) to an instance of SudokuReducer.
What I want to know is: is this bad form or bad practice? Is it wrong to put the entire check inside a try/catch like this? If I wanted to declare the SudokuReducer instance in main outside of the try/catch, I couldn't, because inputFile is only in scope inside the try block.
Is there a better way of doing this? Here's what I have:
import java.io.File;
public class MainMV {
File inputFile;
public static void main(String args[]) {
// check if file is entered and handle exception
try {
if (args.length > 0) {
File inputFile = new File(args[0]);
System.out.println("Processing file");
SudokuReducer reducer = new SudokuReducer(inputFile);
} else {
System.out.println("No file entered.");
System.exit(1);
}
} catch (Exception e) {
System.out.println("File failed to open.");
System.exit(1);
}
}
}

To answer the question in the title: no, it's not bad practice, if that method needs a File to do its work.
Another option would be passing a String; and that's a poor choice, because it doesn't convey that the parameter is supposed to represent a File of some sort.
Perhaps a better option would be to pass in an InputStream to the method, since a) that clearly conveys that it's going to be used for input (as opposed to being a File that you will write to); b) it's more flexible, because it doesn't have to refer to a file on disk.
To answer the question in the question: no, it's not really good practice to wrap everything in one try/catch like this. It makes it hard to distinguish the modes of failure: many different things could go wrong in your code, and it's better to handle those things separately, e.g. to provide specific failure messages.
A better way to structure the code is something like this:
if (args.length == 0) {
System.out.println("No file entered.");
System.exit(1);
}
File inputFile = new File(args[0]);
System.out.println("Processing file");
try {
SudokuReducer reducer = new SudokuReducer(inputFile);
// Do something with reducer.
} catch (IOException e) {
e.printStackTrace();
System.out.println("File failed to open.");
System.exit(1);
}
Note that this has small blocks, handling specific errors, rather than a great big block where the error handling is separated from the thing causing the error.
Also, note that it's not catching Exception: you really don't want to do that unless you have to, because it would also catch exceptions that need to be handled in special ways (e.g. InterruptedException) without handling them correctly. Catch the most specific exception type you can.
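To make the InputStream suggestion concrete, here is a minimal sketch that combines it with separate, specific catch blocks. PuzzleInput is a hypothetical stand-in for SudokuReducer (whose real constructor is not shown in the question); it simply stores the lines it reads.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SudokuReducer: it only stores the input rows.
final class PuzzleInput {
    final List<String> rows = new ArrayList<>();

    // Accepting an InputStream means the data need not come from a file on disk.
    PuzzleInput(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(line);
        }
    }
}

final class MainSketch {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("No file entered.");
            System.exit(1);
        }
        // try-with-resources closes the stream when the block exits.
        try (InputStream in = new FileInputStream(args[0])) {
            PuzzleInput input = new PuzzleInput(in);
            System.out.println("Read " + input.rows.size() + " rows");
        } catch (FileNotFoundException e) {
            // The more specific exception must be caught first.
            System.err.println("File not found: " + args[0]);
            System.exit(1);
        } catch (IOException e) {
            System.err.println("Failed to read file: " + e.getMessage());
            System.exit(1);
        }
    }
}
```

One side benefit of this shape is that the constructor can be exercised in a test with a ByteArrayInputStream, without touching the file system.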

As I understand it, Java passes object references by value, which I found terribly confusing.
Link to explanation of Pass by Reference vs Pass by Value.
Link to explanation of the Java implementation
Depending on how much information from the file is required to generate an instance of your SudokuReducer class, the overhead of this could be significant. If this is the case, you'll want to parse your input line by line, doing something like this in your main method:
try {
SudokuReducer reducer = SudokuReducer.makeSudokuReducer(args[0]);
}
catch(Exception e) {
System.out.println("No file entered.");
System.exit(1);
}
Here's an example of reading a file line by line
There are many ways to do this, but the most efficient way I can think of is by using Java 8's Stream and Files classes.
The method signature will look something like this:
public static SudokuReducer makeSudokuReducer(String filename) {
//Open file
//Parse input line by line
//Use information to create a new instance of your class
//Return the instance of this class
}
You'll be able to call this static method anywhere to produce a new instance of your class from a filename.
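The skeleton above could be filled in roughly as follows. This is only a sketch: Grid is a hypothetical stand-in for SudokuReducer, and the input format (whitespace-separated integers, one row per line, blank lines ignored) is an assumption, since the question does not describe the file layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for SudokuReducer, holding one int[] per input line.
final class Grid {
    final List<int[]> rows;

    private Grid(List<int[]> rows) {
        this.rows = rows;
    }

    // Static factory: opens the file, parses it line by line, returns an instance.
    static Grid fromFile(String filename) throws IOException {
        // Files.lines reads lazily; try-with-resources closes the file afterwards.
        try (Stream<String> lines = Files.lines(Paths.get(filename))) {
            List<int[]> parsed = lines
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .map(line -> Arrays.stream(line.split("\\s+"))
                            .mapToInt(Integer::parseInt)
                            .toArray())
                    .collect(Collectors.toList());
            return new Grid(parsed);
        }
    }
}
```

A caller would write something like Grid grid = Grid.fromFile(args[0]); and handle IOException (unreadable file) separately from NumberFormatException (malformed content).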

I want to read the methods called in a .java file, as below

As an example, I have a .java file like this:
public class A {
private void callData() {
//There can be custom methods like this
checkImage("A",true);
checkObject("B","C",true);
}
}
I want to read these method names and their parameters. I don't need to go inside those methods and take the values; I only want the names and parameters. A.java is a file located on my machine. Now I want to write code that reads these method names and parameters. I think this is clear :)
Thank you

Where possible, questions should contain a minimal example of what you have already tried, as it helps us answer the problem you are having with your code more efficiently. Anyway, for picking out method headers and parameters you should probably look into using a regex if you don't want a more complete code-parsing solution:
First read your file in. You can see examples for different versions of Java here, from @Grimy:
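import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public String readFile(String filename)
{
String content = null;
File file = new File(filename); // for example, foo.txt
// try-with-resources closes the reader even if reading fails
try (FileReader reader = new FileReader(file)) {
StringBuilder builder = new StringBuilder();
char[] buffer = new char[4096];
int n;
// read() returns -1 at end of file
while ((n = reader.read(buffer)) != -1) {
builder.append(buffer, 0, n);
}
content = builder.toString();
} catch (IOException e) {
e.printStackTrace();
}
return content;
}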
Then there are a number of possible regexes you could use, here and here:
(?:(?:public|private|static|protected)\s+)*
Once you have designed a regex, you can use any of the normal ways to group the matches and output what you need from them.
It's worth mentioning that for this type of program, solutions such as ANTLR can be useful (but overkill, unless you are looking to extend beyond just method headers), as they can generate an entire parser for you to use.
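The question's example is actually about method calls (checkImage("A",true);) rather than method declarations, so a regex for that case might look like the sketch below. The class name CallFinder and the pattern itself are illustrative assumptions: the pattern only handles simple call statements whose argument list contains no parentheses, it does not skip comments or string literals, and a keyword followed by parentheses (such as return (x);) would also match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that lists simple "name(args);" call statements.
final class CallFinder {
    // group(1) = method name, group(2) = raw argument list (no nested parentheses)
    private static final Pattern CALL =
            Pattern.compile("\\b([A-Za-z_]\\w*)\\s*\\(([^()]*)\\)\\s*;");

    // Returns one "name -> args" string per matching call in `source`.
    static List<String> findCalls(String source) {
        List<String> calls = new ArrayList<>();
        Matcher m = CALL.matcher(source);
        while (m.find()) {
            calls.add(m.group(1) + " -> " + m.group(2).trim());
        }
        return calls;
    }

    public static void main(String[] args) {
        String source = "public class A {\n"
                + "private void callData() {\n"
                + "checkImage(\"A\",true);\n"
                + "checkObject(\"B\",\"C\",true);\n"
                + "}\n"
                + "}\n";
        // Prints:
        // checkImage -> "A",true
        // checkObject -> "B","C",true
        findCalls(source).forEach(System.out::println);
    }
}
```

The declaration callData() is not reported because it is followed by an opening brace rather than a semicolon. For anything beyond simple cases like this, a real parser is the more reliable route.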

New "File" in java

1. What does new File mean in the line new File("scores.dat")? Does it create a new file?
2. When I run this piece of code, I get this as output:
"java.io.FileNotFoundException: scores.dat (The system cannot find the file specified)"
Does anybody know what the problem is?
3. There is no "finally" section in this code; is "finally" optional in exception handling?
import java.util.Scanner;
import java.io.File;
import java.io.IOException;
public class ReadAndPrintScores
{
public static void main(String[] args)
{
try
{
Scanner s = new Scanner( new File("scores.dat") );
while( s.hasNextInt() )
{
System.out.println( s.nextInt() );
}
}
catch(IOException e)
{
System.out.println( e );
}
}
}

1. new File("scores.dat") does not create the file. It just creates something like a handle to this file (whether it exists or not). You can use this File object to ask whether the file already exists, to create a new file if it does not exist yet, and so on. The full documentation of the File class is in the official Javadoc.
2. Since you do not create the file by simply creating a File object for it, the file does not exist yet and so there is nothing to read from.
3. The finally block is optional as long as the try has at least one catch block. It is good practice to use it to make sure you close resources you no longer need, because a finally block is executed whenever its corresponding try block was entered. Read more about the finally keyword here.
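The three points can be seen together in a short sketch. ScoreReader and readScores are hypothetical names; the sketch checks exists() before reading and uses try-with-resources, which closes the Scanner automatically in the way a hand-written finally block otherwise would.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper built around the code in the question.
final class ScoreReader {
    // Returns the integers in `file`, or an empty list if it does not exist.
    static List<Integer> readScores(File file) throws FileNotFoundException {
        List<Integer> scores = new ArrayList<>();
        // new File(...) never touches the disk; exists() does.
        if (!file.exists()) {
            return scores;
        }
        // try-with-resources closes the Scanner when the block exits.
        try (Scanner s = new Scanner(file)) {
            while (s.hasNextInt()) {
                scores.add(s.nextInt());
            }
        }
        return scores;
    }

    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("scores.dat");
        // Shows where the relative name is actually being looked up.
        System.out.println("Looking for " + file.getAbsolutePath());
        System.out.println(readScores(file));
    }
}
```

Printing getAbsolutePath() is a quick way to diagnose a FileNotFoundException, since it shows which directory the relative name was resolved against.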

new File() is the constructor of File class. So a new instance of File will be created.
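scores.dat must exist in the working directory the program is run from, because a relative path is resolved against that directory, which is not necessarily the directory containing your code.
Yes, finally is optional.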
Check the Java Doc for more information about the File class.

Binary editing in java

I have a file that I am trying to binary edit to cut off a header.
I have identified the start address of the actual data I want to keep in the file, however I am trying to find a way in Java where I can specify a range of bytes to delete from a file.
At the moment I am reading the file in a (Buffered)FileInputStream, and the only way I can see to cut off the header of this file is to save from my start address to the end of the file in memory, then write that out overwriting the original file.
Is there any functionality to remove bits in files without having to go through the process of creating a whole new file?

There is a method to truncate the file (setLength), but there is no API to remove an arbitrary sequence from inside it.
If the file is so large that rewriting it is a performance issue, I suggest splitting it into several files. Some performance may also be gained by using RandomAccessFile to seek to the point of deletion, rewrite from there, and then truncate.
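The seek, rewrite, and truncate idea could be sketched as follows. InPlaceCut and removeRange are hypothetical names, and the sketch assumes nothing else is using the file; it is also not crash-safe, since an interruption midway leaves the file partially shifted.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: removes a byte range without creating a second file.
final class InPlaceCut {
    // Removes `count` bytes starting at offset `start` by shifting the tail
    // of the file forward and then truncating the file with setLength.
    static void removeRange(String path, long start, long count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            long length = raf.length();
            if (start < 0 || count < 0 || start + count > length) {
                throw new IllegalArgumentException("range outside file");
            }
            byte[] buffer = new byte[8192];
            long readPos = start + count; // first byte to keep after the cut
            long writePos = start;        // where that byte should end up
            while (readPos < length) {
                int chunk = (int) Math.min(buffer.length, length - readPos);
                raf.seek(readPos);
                raf.readFully(buffer, 0, chunk);
                raf.seek(writePos);
                raf.write(buffer, 0, chunk);
                readPos += chunk;
                writePos += chunk;
            }
            raf.setLength(length - count);
        }
    }
}
```

Cutting a header of headerLength bytes would then be removeRange(path, 0, headerLength). Every byte after the cut still has to be rewritten, so this saves disk space for a temporary copy rather than I/O.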

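Try this. It uses a RandomAccessFile to wipe out the unneeded part of the file by first seeking to the start index and then overwriting the unneeded characters from there onwards. Note that this overwrites the bytes with NUL in place; it does not make the file shorter.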
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
public class Main {
public static void main(String[] args) {
int startIndex = 21;
int numberOfCharsToRemove = 20;
// Using a RandomAccessFile, overwrite the part you want to wipe
// out using the NUL character
try (RandomAccessFile raf = new RandomAccessFile(new File("/Users/waleedmadanat/Desktop/sample.txt"), "rw")) {
raf.seek(startIndex);
for (int i = 1; i <= numberOfCharsToRemove; i++) {
raf.write('\u0000');
}
} catch (IOException e) {
e.printStackTrace();
}
}
}

I couldn't find any API method to do what I wanted (in line with the answers above).
I solved it by writing the data out to a new file, then replacing the old file with the new one.
I used the following code for the rewrite:
FileOutputStream fout = new FileOutputStream(inFile.getAbsolutePath() + ".tmp");
FileChannel chanOut = fout.getChannel();
FileChannel chanIn = fin.getChannel();
chanIn.transferTo(pos, chanIn.size() - pos, chanOut);
Here pos is the start address from which the file transfer begins, which is directly after the header that I am cutting out of this file.
I have also noticed no slowdowns using this method.
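A self-contained version of this approach might look like the sketch below. HeaderStripper and stripHeader are hypothetical names, and the final Files.move step is an assumption about how the replacement is done, since the fragment above only shows the copy. The loop is there because transferTo is allowed to transfer fewer bytes than requested.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: drops everything before byte offset `pos`.
final class HeaderStripper {
    // Copies the bytes from `pos` to the end into "<name>.tmp", then
    // replaces the original file with that temporary file.
    static void stripHeader(Path file, long pos) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(tmp, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = pos;
            // transferTo may move fewer bytes than requested, so loop.
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    throw new IOException("transfer made no progress at " + position);
                }
                position += moved;
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Writing to a sibling file first means the original stays intact until the copy has finished, at the cost of temporarily needing disk space for both.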

How to write JUnit for a function which uses java.util.regex

This is the function that I've written:
private String getPatternFromLogFile() {
final File dir = new File(".");
try {
String fileContents = readFile(dir.getCanonicalPath()
+ "\\sshxcute.log");
Pattern patternDefinition = Pattern.compile(
".*Start to run command(.*)Connection channel closed",
Pattern.DOTALL);
Matcher inputPattern = patternDefinition.matcher(fileContents);
if (inputPattern.find()) {
return inputPattern.group(1);
}
} catch (IOException e) {
LOGGER.log(Level.DEBUG, e.getMessage());
}
return "";
}
I'm trying to get the contents of the file "sshxcute.log".
readFile() is a function in this class itself.
I want to write a test case which goes inside the if block and returns whatever I want so that I can assert it.
Is this the right approach?
Could someone please help me to write JUnit for this.
I'm new to JUnit and any help would be great.
Thank you.

Your method is doing a couple of things, and that makes it difficult to reliably unit test. It's reading a file, and then applying the regexp(s).
I would perhaps split this up. Create a class that reads the file and creates an array of lines, a stream etc. Then create a component that reads this array, stream etc. In that class perform your regexp work.
That way you can test your regexp component easily against canned data and not rely on the contents of files (I note your path above is hard-coded, which makes life more difficult). Because you've split the regexp component from the file-reading component, you can easily capture its output (for many different scenarios, not just the one provided by your one example file).
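As a rough sketch of that split, the regex step can become a method that takes the log text as a parameter. The pattern is the one from the question; the class name LogPatternExtractor and the canned log text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical component holding only the regex logic from the question.
final class LogPatternExtractor {
    private static final Pattern COMMAND_OUTPUT = Pattern.compile(
            ".*Start to run command(.*)Connection channel closed",
            Pattern.DOTALL);

    // Takes log text and returns the captured section, or "" if no match.
    static String extract(String logContents) {
        Matcher matcher = COMMAND_OUTPUT.matcher(logContents);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        // Canned data stands in for the contents of sshxcute.log.
        String canned = "INFO Start to run command ls -l\n"
                + "file.txt\n"
                + "Connection channel closed\n";
        System.out.println(extract(canned));
    }
}
```

A JUnit test can then call LogPatternExtractor.extract with a canned string and compare the result using assertEquals, with one test for input that matches and another for input that does not (where the expected value is the empty string). No file on disk is needed for either.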
