Search text file for a specific line - java

I want to search for specific lines of text in a text file. If the piece of text I am looking for is on a specific line, I would like to read further on that line for more input.
So far I have 3 tags I am looking for.
#public
#private
#virtual
If I find any of these on a line, I would like to read what comes next so for example I could have a line like this:
#public double getHeight();
If I determine that the tag I found is #public, then I have to take the part after the whitespace up to the semicolon. The problem is that I can't think of an efficient way to do this without excessive use of charAt(..), which doesn't look pretty and probably isn't good in the long run for a large file, or for multiple files in a row.
I would like help to solve this efficiently as I currently can't comprehend how I would do it. The code itself is used to parse comments in a C++ file, to later generate a Header file. The Pseudo Code part is where I am stuck. Some people suggest BufferedReader, others say Scanner. I went with Scanner as that seems to be the replacement for BufferedReader.
public void run() {
Scanner scanner = null;
String filename, path;
StringBuilder puBuilder, prBuilder, viBuilder;
puBuilder = new StringBuilder();
prBuilder = new StringBuilder();
viBuilder = new StringBuilder();
for(File f : files) {
try {
filename = f.getName();
path = f.getCanonicalPath();
scanner = new Scanner(new FileReader(f));
} catch (FileNotFoundException ex) {
System.out.println("FileNotFoundException: " + ex.getMessage());
} catch (IOException ex) {
System.out.println("IOException: " + ex.getMessage());
}
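String line;
// Scanner.nextLine() never returns null; it throws at end of input, so test hasNextLine() first
while (scanner.hasNextLine()) {
line = scanner.nextLine();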
/**
* Pseudo Code
* if #public then
* puBuilder.append(line.substring(after white space)
* + line.substring(until and including the semicolon);
*/
}
}
}

I may be misunderstanding you.. but are you just looking for String.contains()?
if(line.contains("#public")){}

String tag = "";
if(line.startsWith("#public")){
tag = "#public";
}else if{....other tags....}
line = line.substring(tag.length(), line.indexOf(";")).trim();
This gives you a string that runs from the end of the tag (in this case #public) to the character preceding the semicolon, with the whitespace trimmed off both ends.
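As a fuller illustration of the startsWith/substring idea, here is a minimal, self-contained sketch. The class and method names (TagLineParser, findTag, extractDeclaration) are made up for this example, and it assumes the tag sits at the start of the line and that the semicolon itself is not wanted in the result.

```java
import java.util.Arrays;
import java.util.List;

class TagLineParser {
    // The three tags named in the question.
    private static final List<String> TAGS = Arrays.asList("#public", "#private", "#virtual");

    /** Returns the tag the line starts with, or null if it starts with none of them. */
    static String findTag(String line) {
        String trimmed = line.trim();
        for (String tag : TAGS) {
            if (trimmed.startsWith(tag)) {
                return tag;
            }
        }
        return null;
    }

    /** Returns the text between the tag and the first semicolon, or null if either is missing. */
    static String extractDeclaration(String line) {
        String tag = findTag(line);
        if (tag == null) {
            return null;
        }
        String trimmed = line.trim();
        int end = trimmed.indexOf(';', tag.length());
        if (end < 0) {
            return null;
        }
        return trimmed.substring(tag.length(), end).trim();
    }

    public static void main(String[] args) {
        System.out.println(extractDeclaration("#public double getHeight();")); // double getHeight()
        System.out.println(extractDeclaration("int x = 0;"));                  // null
    }
}
```

The tag returned by findTag can then decide which StringBuilder (puBuilder, prBuilder or viBuilder) receives the extracted text.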

if (line.startsWith("#public")) {
...
}

If you are allowed to use open-source libraries, I suggest the Apache Commons IO and Commons Lang libraries. These are widely used Java libraries that will make your life a lot simpler.
String text = null;
InputStream in = null;
List<String> lines = null;
for(File f : files) {
try{
in = new FileInputStream(f);
lines = IOUtils.readLines(in);
for (String line: lines){
if (line.contains("#public")) {
text = StringUtils.substringBetween(line, "#public", ";");
...
}
}
}
catch (Exception e){
...
}
finally{
// always remember to close the resource
IOUtils.closeQuietly(in);
}
}

Related

finding character count between two special symbols

I am trying to find the character count between = and the \n newline character using the Java code below, but \n is not being matched in my case.
I am using the org.apache.commons.lang3.StringUtils class.
Please find my Java code below.
public class CharCountInLine {
public static void main(String[] args)
{
BufferedReader reader = null;
try
{
reader = new BufferedReader(new FileReader("C:\\wordcount\\sample.txt"));
String currentLine = reader.readLine();
String[] line = currentLine.split("=");
while (currentLine != null ){
String res = StringUtils.substringBetween(currentLine, "=", "\n"); // \n is not working.
if(res != null) {
System.out.println("line -->"+res.length());
}
currentLine = reader.readLine();
}
}
catch (IOException e)
{
e.printStackTrace();
}
finally
{
try
{
reader.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
}
Please find my sample text file.
sample.txt
Karthikeyan=123456
sathis= 23546
Arun = 23564
Well, you're reading the string using readLine(), which according to the Javadoc (emphasis mine):
Returns:
A String containing the contents of the line, not including
any line-termination characters, or null if the end of the stream has
been reached
So your code doesn't work because the string does not contain a newline character.
You can address this in a number of ways:
Use StringUtils.substringAfter() instead of StringUtils.substringBetween().
If it meets the requirements, treat your file as a Java properties file so you don't need to parse it yourself.
Use String.split().
Use String.lastIndexOf().
Some simple regex matching and grouping.
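One of the options above, taking everything after the first =, can be sketched with plain String methods. The class and method names here are invented for the example, and trimming the value before counting is an assumption about what should be counted.

```java
class ValueLength {
    /**
     * Returns the number of characters after the first '=',
     * ignoring surrounding whitespace, or -1 if the line has no '='.
     */
    static int valueLength(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            return -1;
        }
        return line.substring(eq + 1).trim().length();
    }

    public static void main(String[] args) {
        // The three lines from the sample file in the question.
        String[] sample = {"Karthikeyan=123456", "sathis= 23546", "Arun = 23564"};
        for (String line : sample) {
            System.out.println(line + " --> " + valueLength(line));
        }
    }
}
```

Because readLine() has already stripped the line terminator, "everything after =" is the same as "everything between = and the newline".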
You don't need to change how you read the lines, simply change your logic to extract the text after =.
Pattern p = Pattern.compile("(?:.+)=(.+)$");
Matcher m = p.matcher("Karthikeyan=123456");
if (m.find()) {
System.out.println(m.group(1).length());
}
No need for Apache StringUtils either, simple Java regex will do. If you don't want to count whitespace, trim the string before calling length().
Alternatively, you can also split the line around =.
10x simpler code:
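Path p = Paths.get("C:\\wordcount\\sample.txt");
// try-with-resources closes the file once the stream is consumed
try (Stream<String> lines = Files.lines(p)) {
lines.forEach(line -> {
// Put the above code here
});
}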

split method to output values under each other when reading from a file

My code works fine however it prints the values side by side instead of under each other line by line. Like this:
iatadult,DDD,
iatfirst,AAA,BBB,CCC
I have done a diligent search on Stack Overflow and none of my solutions seem to work. I know that I have to make the change inside the loop, but none of the examples I have seen have worked. Any further understanding or techniques to achieve my goal would be helpful. Whatever I am missing is probably very small. Please help.
String folderPath1 = "C:\\PayrollSync\\client\\client_orginal.txt";
File file = new File (folderPath1);
ArrayList<String> fileContents = new ArrayList<>(); // holds all matching client names in array
try {
BufferedReader reader = new BufferedReader(new FileReader(file));// reads entire file
String line;
while (( line = reader.readLine()) != null) {
if(line.contains("fooa")||line.contains("foob")){
fileContents.add(line);
}
//---------------------------------------
}
reader.close();// close reader
} catch (Exception e) {
System.out.println(e.getMessage());
}
System.out.println(fileContents);
Add a Line Feed before you add to fileContents.
fileContents.add(line+"\n");
By printing the list directly, as you are doing, you invoke the list's overridden toString() method, which prints the contents like this:
[obj1.toString(), obj2.toString(), ..., objN.toString()]
In your case the elements are of type String, whose toString() returns the string itself. That's why you are seeing all the strings separated by commas.
To do something different, e.g. printing each object on a separate line, you have to implement it yourself; you can simply append the newline character ('\n') after each string.
Possible solution in Java 8:
String result = fileContents.stream().collect(Collectors.joining("\n"));
System.out.println(result);
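To see the difference between printing the list itself and printing its elements, here is a small runnable sketch. The class and method names are invented, and the two sample strings are the ones shown in the question.

```java
import java.util.Arrays;
import java.util.List;

class PrintOnePerLine {
    /** Joins the items with the platform line separator, one item per line. */
    static String onePerLine(List<String> items) {
        return String.join(System.lineSeparator(), items);
    }

    public static void main(String[] args) {
        List<String> fileContents = Arrays.asList("iatadult,DDD,", "iatfirst,AAA,BBB,CCC");

        // List.toString(): all elements on one line, in brackets, separated by ", "
        System.out.println(fileContents);

        // One element per line, built as a single string
        System.out.println(onePerLine(fileContents));

        // One element per line, printed directly
        fileContents.forEach(System.out::println);
    }
}
```

With this approach the stored strings stay unchanged, so nothing has to be appended to each line when it is added to the list.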
A platform-independent way to add a new line:
fileContents.add(line + System.lineSeparator());
Below is my full answer. Thanks for your help stackoverflow. It took me all day but I have a full solution.
File file = new File (folderPath1);
ArrayList<String> fileContents = new ArrayList<>(); // holds all matching client names in array
try {
BufferedReader reader = new BufferedReader(new FileReader(file));// reads entire file
String line;
while (( line = reader.readLine()) != null) {
String [] names ={"iatdaily","iatrapala","iatfirst","wpolkrate","iatjohnson","iatvaleant"};
if (Stream.of(names).anyMatch(line.trim()::contains)) {
System.out.println(line);
fileContents.add(line + "\n");
}
}
System.out.println("---------------");
reader.close();// close reader
} catch (Exception e) {
System.out.println(e.getMessage());
}

Java extract text from text file from a certain point on the text

I have created a method with BufferedReader that opens a text file created previously by the program and extracts some characters. My problem is that it extracts the whole line and I want to extract only after a specified character, the :.
Here is my try/catch block:
try {
InputStream ips = new FileInputStream("file.txt");
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br1 = new BufferedReader(ipsr);
String ligne;
while((ligne = br1.readLine()) != null) {
if(ligne.startsWith("Identifiant: ")) {
System.out.println(ligne);
id = ligne;
}
if(ligne.startsWith("Pass: ")) {
System.out.println(ligne);
pass = ligne;
}
}
System.out.println(ligne);
System.out.println(id);
System.out.println(pass);
br1.close();
} catch (Exception ex) {
System.err.println("Error. "+ex.getMessage());
}
At the moment, my String id receives the entire ligne, and the same goes for pass. By the way, all the println calls are tests and serve no purpose there.
If anybody knows how to assign to id only the part of the line after the : and not the entire line, please tell me. I probably searched badly, but Google wasn't my friend.
Assuming there's only one : symbol in the string you can go with
id = ligne.substring(ligne.lastIndexOf(':') + 1);
Use StringUtils (from Apache Commons Lang):
id = StringUtils.substringAfter(ligne, ":");
Why don't you try a split() on ligne?
If you use String[] splittedLigne = ligne.split(":", 2);, you will have the following in splittedLigne:
splittedLigne[0] -> What is before the first :
splittedLigne[1] -> What is after the first :
This will give you what you need for every line. The limit of 2 matters if a line has more than one :, because a plain split(":") would cut the value off at the second :.
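The substring and split approaches can be compared side by side in a small sketch. The class name, method names, and sample values ("alice", "a:b:c") are hypothetical, and trimming the result is an assumption based on the space after the colon in the question's prefixes.

```java
class AfterColon {
    /** Text after the first ':' with surrounding whitespace removed, or null if there is no ':'. */
    static String valueAfterColon(String line) {
        int colon = line.indexOf(':');
        return colon < 0 ? null : line.substring(colon + 1).trim();
    }

    /** Same result via split; the limit of 2 keeps any later colons inside the value. */
    static String valueViaSplit(String line) {
        String[] parts = line.split(":", 2);
        return parts.length < 2 ? null : parts[1].trim();
    }

    public static void main(String[] args) {
        System.out.println(valueAfterColon("Identifiant: alice")); // alice
        System.out.println(valueViaSplit("Pass: a:b:c"));          // a:b:c
    }
}
```

Using indexOf rather than lastIndexOf keeps the whole value when the value itself contains a colon.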

Parse a text file into multiple text file

I want to generate multiple files by parsing an input file in Java.
The input file contains thousands of protein sequences in FASTA format, and I want to generate the raw format (i.e., without any comma, semicolon, or extra symbol like ">", "[", "]", etc.) of each protein sequence.
A FASTA sequence starts with the ">" symbol, followed by a description of the protein and then the sequence of the protein.
For example ► >lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771]
[protein=hypothetical protein LOC100652771] [protein_id=XP_003403591.1] [location=join(12190..12227,12595..12721,13403..13639)]
MSESINFSHNLGQLLSPPRCVVMPGMPFPSIRSPELQKTTADLDHTLVSVPSVAESLHHPEITFLTAFCL
PSFTRSRPLPDRQLHHCLALCPSFALPAGDGVCHGPGLQGSCYKGETQESVESRVLPGPRHRH
Like the format above, the input file contains thousands of protein sequences. I have to generate thousands of raw files, each containing only an individual protein sequence without any special symbols or gaps.
I have developed the code for it in Java, but the output is "cannot open a file" followed by "cannot find file".
Please help me to solve my problem.
Regards
Vijay Kumar Garg
Varanasi
Bharat (India)
The code is
/*Java code to convert FASTA format to a raw format*/
import java.io.*;
import java.util.*;
import java.util.regex.*;
import java.io.FileInputStream;
// java package for using regular expression
public class Arrayren
{
public static void main(String args[]) throws IOException
{
String a[]=new String[1000];
String b[][] =new String[1000][1000];
/*open the id file*/
try
{
File f = new File ("input.txt");
//opening the text document containing genbank ids
FileInputStream fis = new FileInputStream("input.txt");
//Reading the file contents through inputstream
BufferedInputStream bis = new BufferedInputStream(fis);
// Writing the contents to a buffered stream
DataInputStream dis = new DataInputStream(bis);
//Method for reading Java Standard data types
String inputline;
String line;
String separator = System.getProperty("line.separator");
// reads a line till next line operator is found
int i=0;
while ((inputline=dis.readLine()) != null)
{
i++;
a[i]=inputline;
a[i]=a[i].replaceAll(separator,"");
//replaces unwanted patterns like /n with space
a[i]=a[i].trim();
// trims out if any space is available
a[i]=a[i]+".txt";
//takes the file name into an array
try
// to handle run time error
/*take the sequence in to an array*/
{
BufferedReader in = new BufferedReader (new FileReader(a[i]));
String inline = null;
int j=0;
while((inline=in.readLine()) != null)
{
j++;
b[i][j]=inline;
Pattern q=Pattern.compile(">");
//Compiling the regular expression
Matcher n=q.matcher(inline);
//creates the matcher for the above pattern
if(n.find())
{
/*appending the comment line*/
b[i][j]=b[i][j].replaceAll(">gi","");
//identify the pattern and replace it with a space
b[i][j]=b[i][j].replaceAll("[a-zA-Z]","");
b[i][j]=b[i][j].replaceAll("\\|","");
b[i][j]=b[i][j].replaceAll("\\d{1,15}","");
b[i][j]=b[i][j].replaceAll("\\.","");
b[i][j]=b[i][j].replaceAll("_","");
b[i][j]=b[i][j].replaceAll("\\(","");
b[i][j]=b[i][j].replaceAll("\\)","");
}
/*printing the sequence in to a text file*/
b[i][j]=b[i][j].replaceAll(separator,"");
b[i][j]=b[i][j].trim();
// trims out if any space is available
File create = new File(inputline+"R.txt");
try
{
if(!create.exists())
{
create.createNewFile();
// creates a new file
}
else
{
System.out.println("file already exists");
}
}
catch(IOException e)
// to catch the exception and print the error if cannot open a file
{
System.err.println("cannot create a file");
}
BufferedWriter outt = new BufferedWriter(new FileWriter(inputline+"R.txt", true));
outt.write(b[i][j]);
// printing the contents to a text file
outt.close();
// closing the text file
System.out.println(b[i][j]);
}
}
catch(Exception e)
{
System.out.println("cannot open a file");
}
}
}
catch(Exception ex)
// catch the exception and prints the error if cannot find file
{
System.out.println("cannot find file ");
}
}
}
If you provide me with the corrected code, it will be much easier to understand.
This code will not win prizes, due to missing Java expertise. For instance, I would expect an OutOfMemoryError even if it is correct.
Best would be a rewrite. Nevertheless we all began small.
Give the full path to the file. On the output side, the directory is probably missing from the file name as well.
Better use BufferedReader etc. instead of DataInputStream.
Initialize i with -1. Better use for (int i = 0; i < a.length; ++i).
Best compile the Pattern outside the loop. But remove the Matcher. You can do if (s.contains(">")) as well.
One does not need to create a new file; FileWriter creates it if it does not exist.
Code:
final String encoding = "Windows-1252"; // Or "UTF-8" or leave away.
File f = new File("C:/input.txt");
BufferedReader dis = new BufferedReader(new InputStreamReader(
new FileInputStream(f), encoding));
...
int i= -1; // So i++ starts with 0.
while ((inputline=dis.readLine()) != null)
{
i++;
a[i]=inputline.trim();
//replaces unwanted patterns like /n with space
// Not needed a[i]=a[i].replaceAll(separator,"");
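A rewrite along the lines suggested above might look like the following sketch. It is not the asker's program: the class and method names are invented, it reads from any BufferedReader (a StringReader in the demo, so it runs without input.txt), and it assumes a record starts at a line beginning with ">" and that sequence lines consist only of letters, "*" or "-". Writing each sequence to its own file is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class FastaSplitter {
    /** Reads FASTA text and returns one raw sequence string per record. */
    static List<String> readSequences(BufferedReader reader) throws IOException {
        List<String> sequences = new ArrayList<>();
        StringBuilder current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.startsWith(">")) {
                // A header line closes the previous record and opens a new one.
                if (current != null) {
                    sequences.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null && line.matches("[A-Za-z*-]+")) {
                // Only pure sequence lines are kept; anything else
                // (blank lines, wrapped header text with brackets) is skipped.
                current.append(line);
            }
        }
        if (current != null) {
            sequences.add(current.toString());
        }
        return sequences;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">seq1 [gene=demo]\nMSESINF\nSHNLG\n>seq2\nPSFTRS\n";
        List<String> seqs = readSequences(new BufferedReader(new StringReader(fasta)));
        for (int i = 0; i < seqs.size(); i++) {
            System.out.println("record " + i + ": " + seqs.get(i));
        }
    }
}
```

Collecting into a List avoids the fixed 1000 x 1000 array, and nothing needs to be stripped from the header with replaceAll because header lines are never appended.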
Your code contains the following two catch blocks:
catch(Exception e)
{
System.out.println("cannot open a file");
}
catch(Exception ex)
// catch the exception and prints the error if cannot find file
{
System.out.println("cannot find file ");
}
Both of these swallow the exception and print a generic "it didn't work" message, which tells you that the catch block was entered, but nothing more than that.
Exceptions often contain useful information that would help you track down where the real problem is. By ignoring them, you're making it much harder to diagnose your problem. Worse still, you're catching Exception, which is the superclass of a lot of exceptions, so these catch blocks are catching lots of different types of exceptions and ignoring them all.
The simplest way to get information out of an exception is to call its printStackTrace() method, which prints the exception type, exception message and stack trace. Add a call to this within both of these catch blocks, and that will help you see more clearly what exception is being thrown and from where.
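As an illustration of that advice, here is a minimal sketch that catches the specific exception types and prints the stack trace instead of a generic message. The class name, method name, and the file name in main are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class ReportExceptions {
    /** Returns the first line of the file, or null if it could not be read. */
    static String firstLine(String path) {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            return in.readLine();
        } catch (FileNotFoundException e) {
            // Say which file was involved, then print type, message and stack trace.
            System.err.println("cannot find file: " + path);
            e.printStackTrace();
            return null;
        } catch (IOException e) {
            System.err.println("cannot read file: " + path);
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("no-such-file.txt"));
    }
}
```

The stack trace names the failing line, which is usually enough to tell whether the wrong file name or the wrong directory is the problem.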

Modify a .txt file in Java

I have a text file that I want to edit using Java. It has many thousands of lines. I basically want to iterate through the lines and change/edit/delete some text. This will need to happen quite often.
From the solutions I saw on other sites, the general approach seems to be:
Open the existing file using a BufferedReader
Read each line, make modifications to each line, and add it to a StringBuilder
Once all the text has been read and modified, write the contents of the StringBuilder to a new file
Replace the old file with the new file
This solution seems slightly "hacky" to me, especially if I have thousands of lines in my text file.
Anybody know of a better solution?
I haven't done this in Java recently, but reading an entire file into memory seems like a bad idea.
The best idea that I can come up with is open a temporary file in writing mode at the same time, and for each line, read it, modify if necessary, then write into the temporary file. At the end, delete the original and rename the temporary file.
If you have modify permissions on the file system, you probably also have deleting and renaming permissions.
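The temporary-file approach could be sketched as follows. The class and method names are invented, UTF-8 is an assumed encoding, and the edit function is a stand-in for whatever change is needed: it returns the new line, or null to delete the line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.function.UnaryOperator;

class LineRewriter {
    /** Streams the file line by line into a temporary file, then replaces the original. */
    static void rewrite(Path file, UnaryOperator<String> edit) throws IOException {
        // Create the temporary file next to the original so the final move stays on one file system.
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "rewrite", ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String edited = edit.apply(line);
                if (edited != null) { // null means: drop this line
                    writer.write(edited);
                    writer.newLine();
                }
            }
        }
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("demo", ".txt");
        Files.write(demo, Arrays.asList("one", "two", "three"), StandardCharsets.UTF_8);
        rewrite(demo, line -> line.equals("two") ? null : line.toUpperCase());
        System.out.println(Files.readAllLines(demo, StandardCharsets.UTF_8)); // [ONE, THREE]
        Files.delete(demo);
    }
}
```

Only one line is held in memory at a time. This sketch leaves the temporary file behind if an exception interrupts the copy, and the replaced file takes on the temporary file's permissions, so a production version would need to handle both.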
If the file is just a few thousand lines, you should be able to read the entire file in one read and convert that to a String.
You can use Apache Commons IO's IOUtils, or write helper methods like the following.
public static String readFile(String filename) throws IOException {
File file = new File(filename);
byte[] bytes = new byte[(int) file.length()];
DataInputStream in = null;
try {
in = new DataInputStream(new FileInputStream(file));
// readFully loops until the whole array is filled; a single read() may return early,
// and a read() placed inside an assert is skipped entirely when assertions are disabled
in.readFully(bytes);
} finally {
// close on success as well as on failure
close(in);
}
return new String(bytes, "UTF-8");
}
public static void writeFile(String filename, String text) throws IOException {
FileOutputStream fos = null;
try {
fos = new FileOutputStream(filename);
fos.write(text.getBytes("UTF-8"));
} finally {
close(fos);
}
}
public static void close(Closeable closeable) {
// the stream is still null if opening it failed
if (closeable == null) return;
try {
closeable.close();
} catch(IOException ignored) {
}
}
You can use RandomAccessFile in Java to modify the file, on one condition:
The size of each line has to be fixed; otherwise, when a new string is written back, it might overwrite part of the next line.
Therefore, in my example, I set the line length to 100 and pad with spaces when creating the file and when writing back to it.
So, to allow updates, you need to set the line length a little larger than the longest line in the file.
public class RandomAccessFileUtil {
public static final long RECORD_LENGTH = 100;
public static final String EMPTY_STRING = " ";
public static final String CRLF = "\n";
public static final String PATHNAME = "/home/mjiang/JM/mahtew.txt";
/**
* one two three
* Text to be appended with
* five six seven
* eight nine ten
*
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException
{
String starPrefix = "Text to be appended with";
String replacedString = "new text has been appended";
RandomAccessFile file = new RandomAccessFile(new File(PATHNAME), "rw");
String line = "";
while((line = file.readLine()) != null)
{
if(line.startsWith(starPrefix))
{
file.seek(file.getFilePointer() - RECORD_LENGTH - 1);
file.writeBytes(replacedString);
}
}
}
public static void createFile() throws IOException
{
RandomAccessFile file = new RandomAccessFile(new File(PATHNAME), "rw");
String line1 = "one two three";
String line2 = "Text to be appended with";
String line3 = "five six seven";
String line4 = "eight nine ten";
file.writeBytes(paddingRight(line1));
file.writeBytes(CRLF);
file.writeBytes(paddingRight(line2));
file.writeBytes(CRLF);
file.writeBytes(paddingRight(line3));
file.writeBytes(CRLF);
file.writeBytes(paddingRight(line4));
file.writeBytes(CRLF);
file.close();
System.out.println(String.format("File is created in [%s]", PATHNAME));
}
public static String paddingRight(String source)
{
StringBuilder result = new StringBuilder(100);
if(source != null)
{
result.append(source);
for (int i = 0; i < RECORD_LENGTH - source.length(); i++)
{
result.append(EMPTY_STRING);
}
}
return result.toString();
}
}
If the file is large, you might want to use a FileStream for output, but that seems pretty much like it is the simplest process to do what you're asking (and without more specificity i.e. on what types of changes / edits / deletions you're trying to do, it's impossible to determine what more complicated way might work).
No reason to buffer the entire file.
Simply write each line as you read it, insert lines when necessary, delete lines when necessary, replace lines when necessary.
Fundamentally, you will not get around having to recreate the file wholesale, especially if it's just a text file.
What kind of data is it? Do you control the format of the file?
If the file contains name/value pairs (or similar), you could have some luck with Properties, or perhaps cobbling together something using a flat file JDBC driver.
Alternatively, have you considered not writing the data so often? Operating on an in-memory copy of your file should be relatively trivial. If there are no external resources which need real time updates of the file, then there is no need to go to disk every time you want to make a modification. You can run a scheduled task to write periodic updates to disk if you are worried about data backup.
In general you cannot edit the file in place; it's simply a very long sequence of characters, which happens to include newline characters. You could edit in place if your changes don't change the number of characters in each line.
Can't you use regular expressions, if you know what you want to change? Jakarta Regexp should probably do the trick.
Although this question was posted some time ago, I think it is good to put my answer here.
I think that the best approach in this scenario is to use FileChannel from the java.nio.channels package, but only if you need good performance! You would need to get a FileChannel via a RandomAccessFile, like this:
java.nio.channels.FileChannel channel = new java.io.RandomAccessFile("/my/fyle/path", "rw").getChannel();
After this, you need to create a ByteBuffer into which you will read from the FileChannel.
This looks something like this:
java.nio.ByteBuffer inBuffer = java.nio.ByteBuffer.allocate(100);
long pos = 0;
int aux;
StringBuilder sb = new StringBuilder();
// read() returns the number of bytes read, or -1 at end of file
while ((aux = channel.read(inBuffer, pos)) != -1) {
pos += aux;
byte[] b = inBuffer.array();
sb.delete(0, sb.length());
// only the first aux bytes of the array were filled by this read
for (int i = 0; i < aux; ++i) {
sb.append((char)b[i]);
}
//here you can do your stuff on sb
inBuffer.clear(); // reuse the buffer for the next read
}
Hope that my answer will help you!
I think FileOutputStream.getChannel() will help a lot; see the FileChannel API:
http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html
private static void modifyFile(String filePath, String oldString, String newString) {
File fileToBeModified = new File(filePath);
StringBuilder oldContent = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new FileReader(fileToBeModified))) {
String line = reader.readLine();
while (line != null) {
oldContent.append(line).append(System.lineSeparator());
line = reader.readLine();
}
String content = oldContent.toString();
String newContent = content.replaceAll(oldString, newString);
try (FileWriter writer = new FileWriter(fileToBeModified)) {
writer.write(newContent);
}
} catch (IOException e) {
e.printStackTrace();
}
}
You can change the .txt file to a .java file by choosing "Save As" and saving it with the .java extension.
