Search for appearances of string inside text


I have a .txt file with some text in it, for example: Hello, world.
I'd like to search the whole file and find out how many times a string appears, as well as the position of each occurrence. For example, "wo" appears once in the text above. That number should be placed in an EditText. However, I only know how to search for a single char, not a whole string. Can you please help me? Thanks a lot.
BufferedReader reader = new BufferedReader(new FileReader("somefile.txt"));
int ch;
char charToSearch='a';
int counter=0;
while((ch=reader.read()) != -1) {
if(charToSearch == (char)ch) {
counter++;
}
}
reader.close();
System.out.println(counter);

public static int countWord(String word, FileInputStream fis) {
BufferedReader in = new BufferedReader(new InputStreamReader(fis));
String readLine = "";
int count = 0;
try {
while ((readLine = in.readLine()) != null) {
String[] words = readLine.split(" ");
for (String s : words) {
if (s.contains(word))
count++;
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return count;
}

You can use something like:
int nFound = 0;
String target = ".............Your long text..................";
String search = "find this";
int startIndex = 0;
do
{
int index = target.indexOf(search, startIndex);
if(index !=-1)
{
// Found
nFound++;
// index now holds the position of the match that was found
/* Do your work here */
/* Advance the start index so the search continues on the rest of the string until no matches are left */
startIndex = index+1;
}
else
break;
}while(true);
Before doing this, concatenate the whole text into the "target" string, or execute the previous code for each line if you are sure the search string will never span the end of one line and the beginning of the next.
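A minimal, self-contained sketch of the indexOf loop described above. The class name OccurrenceFinder and the method findAll are hypothetical, and the sample text is the one from the question. It assumes overlapping matches should be counted, so the search resumes one character after each hit.

```java
import java.util.ArrayList;
import java.util.List;

class OccurrenceFinder {

    // Returns the start index of every occurrence of search in target.
    static List<Integer> findAll(String target, String search) {
        List<Integer> positions = new ArrayList<>();
        if (search.isEmpty()) {
            return positions; // an empty search string would match everywhere
        }
        int index = target.indexOf(search);
        while (index != -1) {
            positions.add(index);
            // Resume one character later so overlapping matches are found too.
            index = target.indexOf(search, index + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Integer> positions = findAll("Hello, world.", "wo");
        System.out.println("Count: " + positions.size());
        System.out.println("Positions: " + positions);
    }
}
```

The size of the returned list is the number of appearances, and the list itself holds the positions, so both values the question asks for come out of one pass.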

If you are using Java 7 or later, you can read the whole file into a String:
String text = new String(Files.readAllBytes(Paths.get("file")), StandardCharsets.UTF_8);
Then, you can do this:
public void print(String text, String word)
{
int count = 0;
// search the full text so that each reported position is relative to its start
int index = text.indexOf(word);
while (index != -1)
{
System.out.printf("Position: %d, Count: %d\r\n", index, ++count);
index = text.indexOf(word, index + word.length());
}
}

For simplicity, I would read a line and use "string.split(String regex)".
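String line;
while ((line = reader.readLine()) != null) { // reader is a BufferedReader over the file
String[] parts = line.split(regex, -1); // limit -1 keeps trailing empty strings
// parts.length - 1 is the number of matches on this line, and the
// lengths of the parts before a match tell you its position
}
You can also use java.util.Scanner or java.util.regex.Pattern.
But if you are looking for performance, I think String.indexOf is the best approach.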
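As a rough illustration of the Pattern route mentioned above, the following sketch counts literal occurrences and prints their positions. The class name RegexCounter and the sample text are made up for the example. Note that Matcher.find() reports non-overlapping matches only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCounter {

    // Counts non-overlapping occurrences of the literal string search in text.
    static int count(String text, String search) {
        // Pattern.quote makes regex metacharacters in search match literally.
        Matcher m = Pattern.compile(Pattern.quote(search)).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello, world. Hello, wombat.";
        Matcher m = Pattern.compile(Pattern.quote("wo")).matcher(text);
        while (m.find()) {
            // m.start() is the index of the first character of this match
            System.out.println("Match at position " + m.start());
        }
        System.out.println("Total: " + count(text, "wo"));
    }
}
```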

Related

Need help reading file line by line

The code I have right now reads only the last line of the file. Can someone help me find a way to make it read the file line by line?
import java.util.*;
import java.io.*;
public class Searcher extends File {
Scanner scn;
public Searcher(String filename) {
super(filename);
}
public void search(String input)
{
try {
scn = new Scanner(this);
String data = "";
while (scn.hasNext()) {
data = scn.nextLine();
}
int count = 0, fromIndex = 0;
while ((fromIndex = data.indexOf(input, fromIndex)) != -1) {
count++;
fromIndex++;
}
System.out.println("Total occurrences: " + count);
scn.close();
} catch (Exception e) {
System.out.println("Cant find file ");
}
}
public static void main(String[] args) {
Searcher search = new Searcher("src/ihaveadream.txt");
search.search("we");
}
}
I appreciate any help!

while (scn.hasNext()) {
data = scn.nextLine();
}
You are replacing the value on every iteration, so data ends up holding only the last line. Perhaps you wanted to append?
while (scn.hasNext()) {
data = data + scn.nextLine();
}
Good luck.

Your problem:
while (scn.hasNext()) {
data = scn.nextLine(); // right here
}
Each new line replaces the previous one.
Depending on what you need, you can either
concatenate all lines into one String:
data = data + scn.nextLine();
// another syntax to do the same:
data += scn.nextLine();
or use a List to keep each line as a separate element:
List<String> dataList = new ArrayList<>();
while (scn.hasNext()) {
dataList.add(scn.nextLine());
}
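Putting the first option together, here is a small sketch that joins the lines with a StringBuilder and then counts the matches. The names LineJoiner, readAll and countOccurrences are hypothetical, and a Scanner over a String stands in for the file so the example runs without any input file. A newline is appended after each line so that the end of one line and the start of the next cannot fuse into a false match.

```java
import java.util.Scanner;

class LineJoiner {

    // Reads every line from the scanner and joins them with '\n'.
    static String readAll(Scanner scn) {
        StringBuilder data = new StringBuilder();
        while (scn.hasNextLine()) {
            data.append(scn.nextLine()).append('\n');
        }
        return data.toString();
    }

    // Counts occurrences of input in data, allowing overlaps.
    static int countOccurrences(String data, String input) {
        if (input.isEmpty()) {
            return 0; // avoids looping forever on an empty search string
        }
        int count = 0;
        int fromIndex = 0;
        while ((fromIndex = data.indexOf(input, fromIndex)) != -1) {
            count++;
            fromIndex++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample text standing in for the contents of the file.
        Scanner scn = new Scanner("we hold\nthese truths\nthat we\n");
        String data = readAll(scn);
        scn.close();
        System.out.println("Total occurrences: " + countOccurrences(data, "we"));
    }
}
```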

As everyone has already pointed out, you are replacing the contents of the data variable on each pass through the while loop. Because the loop runs until the end of the file, only the last line is left in data, and any further processing sees only that line. You can do what the other answers suggest, or, as an alternative, close the while loop after the index search so that each line is searched as it is read:
public void search(String input)
{
int count = 0; // declared outside the loop so the total accumulates across lines
try {
scn = new Scanner(this);
String data = "";
while (scn.hasNext()) {
data = scn.nextLine();
// the closing brace of this loop moves below, after the inner search loop
int fromIndex = 0; // restart the search at the beginning of each new line
while ((fromIndex = data.indexOf(input, fromIndex)) != -1) {
count++;
fromIndex++;
}
} //close it here
System.out.println("Total occurrences: " + count);
scn.close();
} catch (Exception e) {
System.out.println("Cant find file ");
}
}

How to ignore duplicate strings when using RegEx to match string?

EDIT: edited for clarity about what I'm having trouble with. I'm not getting the right results because it's counting duplicates. I HAVE to use RegEx; I could use a tokenizer, but I did not.
There are 5 input files, and I need to calculate how many "USER DEFINED VARIABLES" each contains. Please ignore the messy code, I'm just learning Java.
I replaced the following: everything between ( and ), all non-word characters, keywords such as int, main etc., any digit with a space in front of it, and any blank space with a newline, and then trimmed the result.
This leaves me with a list of strings which I match against my RegEx. However, at this point, how do I make my count include only unique identifiers?
EXAMPLE:
For the input file I have attached beneath the code, I am receiving
"distinct/unique identifiers: 10" in my output file, when it should be "distinct/unique identifiers: 3".
And for the 5th input file I have attached, I should have "distinct/unique identifiers: 3", but I currently have "distinct/unique identifiers: 6".
I cannot use Set, Map etc.
Any help is great! Thanks.
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class A1_123456789 {
public static void main(String[] args) throws IOException {
if (args.length < 1) {
System.out.println("Wrong number of arguments");
System.exit(1);
}
for (int i = 0; i < args.length; i++) {
FileReader jk = new FileReader(args[i]);
BufferedReader ij = new BufferedReader(jk);
FileWriter fw = null;
BufferedWriter bw = null;
String regex = "\\b(\\w+)(\\s+\\1\\b)+";
Pattern p = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]{0,30}");
String line;
int count = 0;
while ((line = ij.readLine()) != null) {
line = line.replaceAll("\\(([^\\)]+)\\)", " " );
line = line.replaceAll("[^\\w]", " ");
line = line.replaceAll("\\bint\\b|\\breturn\\b|\\bmain\\b|\\bprintf\\b|\\bif\\b|\\belse\\b|\\bwhile\\b", " ");
line = line.replaceAll(" \\d", "");
line = line.replaceAll(" ", "\n");
line = line.trim();
Matcher m = p.matcher(line);
while (m.find()) {
count++;
}
}
try {
String s1 = args[i];
String s2 = s1.replaceAll("input","output");
fw = new FileWriter(s2);
bw = new BufferedWriter(fw);
bw.write("distinct/unique identifiers: " + count);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (bw != null) {
bw.close();
}
if (fw != null) {
fw.close();
}
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
}
}
//This is the 3rd input file below.
int celTofah(int cel)
{
int fah;
fah = 1.8*cel+32;
return fah;
}
int main()
{
int cel, fah;
cel = 25;
fah = celTofah(cel);
printf("Fah: %d", fah);
return 0;
}
//This is the 5th input file below.
int func2(int i)
{
while(i<10)
{
printf("%d\t%d\n", i, i*i);
i++;
}
}
int func1()
{
int i = 0;
func2(i);
}
int main()
{
func1();
return 0;
}

Try this:
LinkedList<String> dtaa = new LinkedList<>();
String[] parts = line.split(" ");
for(int ii =0;ii<parts.length;ii++){
if(ii == 0)
dtaa.add(parts[ii]);
else{
if(dtaa.contains(parts[ii]))
continue;
else
dtaa.add(parts[ii]);
}
}
count = dtaa.size();
instead of:
Matcher m = p.matcher(line);
while (m.find()) {
count++;
}

Amal Dev has suggested a correct implementation, but given the OP wants to keep Matcher, we have:
// Previous code to here
// Linked list of unique entries
LinkedList<String> uniqueMatches = new LinkedList<>();
// Existing code
while ((line = ij.readLine()) != null) {
line = line.replaceAll("\\(([^\\)]+)\\)", " " );
line = line.replaceAll("[^\\w]", " ");
line = line.replaceAll("\\bint\\b|\\breturn\\b|\\bmain\\b|\\bprintf\\b|\\bif\\b|\\belse\\b|\\bwhile\\b", " ");
line = line.replaceAll(" \\d", "");
line = line.replaceAll(" ", "\n");
line = line.trim();
Matcher m = p.matcher(line);
while (m.find()) {
// New code - get this match
String thisMatch = m.group();
// If we haven't seen this string before, add it to the list
if(!uniqueMatches.contains(thisMatch))
uniqueMatches.add(thisMatch);
}
}
// Now see how many unique strings we have collected
count = uniqueMatches.size();
Note that I haven't compiled this, but hopefully it works as is.
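For reference, a self-contained sketch of the same idea that can be compiled on its own. The class name UniqueIdentifiers and the sample input string are hypothetical; the pattern is the one from the question. It uses a plain list with contains(), on the assumption that only Set and Map are off limits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UniqueIdentifiers {

    // Identifier pattern taken from the question.
    static final Pattern IDENTIFIER = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]{0,30}");

    // Collects each identifier once, in order of first appearance.
    static List<String> collect(String text) {
        List<String> unique = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(text);
        while (m.find()) {
            String match = m.group();
            if (!unique.contains(match)) {
                unique.add(match);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Stand-in for a line that has already had keywords and punctuation stripped.
        List<String> unique = collect("cel fah fah cel celTofah cel");
        System.out.println("distinct/unique identifiers: " + unique.size());
    }
}
```

Because contains() scans the whole list, this is quadratic in the number of identifiers, which should be fine for small source files like the ones in the question.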

Java replace characters in a TextFile - Alice In Wonderland

I'm trying to make a compressor for text files and I'm stuck at replacing characters.
This is my code:
compress.setOnAction(event ->
{
String line;
try(BufferedReader reader = new BufferedReader(new FileReader(newFile)))
{
while ((line = reader.readLine()) != null)
{
int length = line.length();
String newLine = "";
for (int i = 1; i < length; i++)
{
int c = line.charAt(i);
if (c == line.charAt(i - 1))
{
}
}
}
}
catch (IOException ex)
{
ex.printStackTrace();
}
});
What I want to do is find all the words in which two adjacent characters are equal (like 'Took'). When the if statement is true, I want to replace the first of the two equal characters with a 2, so it would look like 'T2ok'.
I've tried a lot of things, and I keep getting ArrayIndexOutOfBoundsException, StringIndexOutOfBoundsException, and so on.
I hope someone has a great answer :-)
Regards

Create a method that compresses one String as follows:
Loop through every character using a while loop. Count duplicates in a nested while loop that advances the current index while duplicates are found, which skips them in the output and also counts their occurrences.
public String compress(String input){
int length = input.length(); // length of input
int ix = 0; // actual index in input
char c; // actual read character
int ccounter; // occurrence counter of actual character
StringBuilder output = // the output
new StringBuilder(length);
// loop over every character in input
while(ix < length){
// read character at actual index then inc index
c = input.charAt(ix++);
// we count one occurrence of this character here
ccounter = 1;
// while not reached end of line and next character
// is the same as previously read
while(ix < length && input.charAt(ix) == c){
// inc index means skip this character
ix++;
// and inc character occurrence counter
ccounter++;
}
// if more than one character occurrence is counted
if(ccounter > 1){
// print the character count
output.append(ccounter);
}
// print the actual character
output.append(c);
}
// return the full compressed output
return output.toString();
}
Now you can use this method to stream a file from input to output using Java 8 techniques.
// create input stream that reads line by line, create output writer
try (Stream<String> input = Files.lines(Paths.get("input.txt"));
PrintWriter output = new PrintWriter("output.txt", "UTF-8")){
// compress each input stream line, and print to output
input.map(s -> compress(s)).forEachOrdered(output::println);
} catch (IOException e) {
e.printStackTrace();
}
If you really want to, you can remove the input file and rename the output file afterwards with
Files.move(Paths.get("output.txt"), Paths.get("input.txt"),StandardCopyOption.REPLACE_EXISTING);
I think this is the most efficient way to do what you want.

Try this:
StringBuilder sb = new StringBuilder();
String line;
try(BufferedReader reader = new BufferedReader(new FileReader(newFile)))
{
while ((line = reader.readLine()) != null)
{
if (!line.isEmpty()) {
//clear states
boolean matchedPreviously = false;
char last = line.charAt(0);
sb.setLength(0);
sb.append(last);
for (int i = 1; i < line.length(); i++) {
char c = line.charAt(i);
if (!matchedPreviously && c == last) {
sb.setLength(sb.length()-1);
sb.append(2);
matchedPreviously = true;
} else matchedPreviously = false;
sb.append(last = c);
}
System.out.println(sb.toString());
}
}
}
catch (IOException ex)
{
ex.printStackTrace();
}
This solution uses only a single loop, but it can only find runs of length 2.
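For comparison, the same pair replacement can be sketched with a regular expression back-reference instead of a manual loop. This is a different technique from the answer above, and the class name PairCompressor is hypothetical. It assumes the input is a single line, since "." does not match line terminators by default.

```java
class PairCompressor {

    // Replaces each pair of identical adjacent characters with '2' plus the character.
    static String compressPairs(String line) {
        // (.) captures one character and \1 requires the same character again;
        // in the replacement, $1 inserts the captured character after the literal 2.
        return line.replaceAll("(.)\\1", "2$1");
    }

    public static void main(String[] args) {
        System.out.println(compressPairs("Took"));  // word from the question
        System.out.println(compressPairs("Hello"));
    }
}
```

Like the loop above, this only handles runs of two; a third identical character is left as it is after the pair.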

ArrayIndexOutOfBounds looping through text file

I have loaded from text files many times before without this issue. I have read and re-read my code and I (personally) can't see why I would get this issue; I am completely lost.
static public ArrayList<Media> importMedia(String fileName) throws IOException
{
try {
ArrayList<Media> mList = new ArrayList<>();
BufferedReader reader = new BufferedReader(new FileReader(fileName));
String line = reader.readLine();
int numberOfItems = Integer.valueOf(line);
while((line = reader.readLine()) != null)
{
String[] split = line.split(",");
if(split[0].contains("mp3"))
{
Mp3 mp3 = new Mp3(split[1]/*title*/,split[0]/*filename*/,Integer.parseInt(split[4])/*releaseyear*/,split[2]/*artist*/,split[3]/*album*/,split[5]/*label*/,Double.parseDouble(split[6])/*runtime*/);
mList.add(mp3);
}else if (split[0].contains("gif"))
{
Gif gif = new Gif(split[1]/*title*/,split[0]/*filename*/,Integer.parseInt(split[6])/*releaseyear*/,Double.parseDouble(split[2])/*width*/,Double.parseDouble(split[3])/*height*/,split[4]/*equipName*/,split[5]/*equipModel*/);
mList.add(gif);
}else if(split[0].contains("avi"))
{
String castNames = "";
boolean first = true;
for(int i = 7; i < 15; i++)
{
if(!(split[i].isEmpty()))
{
if(first)
{
castNames += split[i];
first = false;
}else{
castNames += "," + split[i];
}
}
}
Avi avi = new Avi(split[1]/*title*/,split[0]/*filename*/,Integer.parseInt(split[3])/*releaseyear*/,split[2]/*studio*/,split[5]/*director*/,castNames/*castnames*/,Double.parseDouble(split[4])/*runtime*/,Integer.parseInt(split[6])/*cast*/);
mList.add(avi);
}else{
}
}
return mList;
} catch (Exception ex) { System.out.println(ex.toString()); }
return null;
}
Now it only gets the first 3 files (console shown in picture).
I am simply trying to loop through the lines, and I am not sure why it would be out of bounds. I cannot see anything wrong with the loop, or why it gives me some items but not all.

This code uses the String array split at indexes 0 through 14.
It would be good to program defensively by checking the length of the array
before using it in your program,
for example split.length > 14.
With this habit you can avoid most ArrayIndexOutOfBoundsExceptions.
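One way a comma-separated line can end up shorter than expected is that String.split(",") drops trailing empty strings, so a line ending in empty fields yields fewer elements than it has columns. Whether that is what happens in the question cannot be confirmed without the data file; the sketch below only demonstrates the behaviour, using a made-up line and a hypothetical helper fieldOrEmpty for the length check.

```java
import java.util.Arrays;

class SplitCheck {

    // Returns the field at index, or an empty string when the line has too few fields.
    static String fieldOrEmpty(String[] fields, int index) {
        return (index >= 0 && index < fields.length) ? fields[index] : "";
    }

    public static void main(String[] args) {
        // Hypothetical line whose last three fields are empty.
        String line = "movie.avi,Title,Studio,,,";

        String[] dropped = line.split(",");   // trailing empty strings are removed
        String[] kept = line.split(",", -1);  // a negative limit keeps them

        System.out.println(dropped.length + " " + Arrays.toString(dropped));
        System.out.println(kept.length + " " + Arrays.toString(kept));

        // Guarded access instead of indexing past the end of the array.
        System.out.println("[" + fieldOrEmpty(dropped, 5) + "]");
    }
}
```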

Trying to extract a substring from a buffered reader that reads between certain tags

I'm extracting 5 web pages using a BufferedReader, each separated by a space. I want to use substring to extract each page's URL, HTML, source, and date, but I need guidance on how to use substring properly to achieve this. Cheers.
public static List<WebPage> readRawTextFile(Context ctx, int resId) {
InputStream inputStream = ctx.getResources().openRawResource(
R.raw.pages);
InputStreamReader inputreader = new InputStreamReader(inputStream);
BufferedReader buffreader = new BufferedReader(inputreader);
String line;
StringBuilder text = new StringBuilder();
try {
while ((line = buffreader.readLine()) != null) {
if (line.length() == 0) {
// ignore for now
//Will be used when blank line is encountered
}
if (line.length() != 0) {
//here I want the substring to pull out the correctStrings
int sURL = line.indexOf("<!--");
int eURL = line.indexOf("-->");
line.substring(sURL,eURL);
// Problem is here
}
}
} catch (IOException e) {
return null;
}
return null;
}

I think what you want is something like this:
public class Test {
public static void main(String args[]) {
String text = "<!--Address:google.co.uk.html-->";
String converted1 = text.replaceAll("\\<!--", "");
String converted2 = converted1.replaceAll("\\-->", "");
System.out.println(converted2);
}
}
Result: Address:google.co.uk.html

In the catch block, don't return null; call e.printStackTrace() instead. It will help you find out whether something went wrong.
String str1 = "<!--Address:google.co.uk.html-->";
// Approach 1
int st = str1.indexOf("<!--"); // gives index which starts from <
int en = str1.indexOf("-->"); // gives index which starts from -
str1 = str1.substring(st + 4, en);
System.out.println(str1);
// Approach 2
String str2 = "<!--Address:google.co.uk.html-->";
str2 = str2.replaceAll("[<>!-]", "");
System.out.println( str2);
Note: be aware that replaceAll treats its first argument as a regex, so the character class [<>!-] removes every occurrence of those characters anywhere in the string, not just the comment delimiters.
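A slightly more defensive version of Approach 1 is sketched below. It checks both indexOf results before calling substring, because a missing tag gives -1 and substring would then throw. The class name CommentExtractor and the choice to return null for lines without a complete comment are assumptions made for the example.

```java
class CommentExtractor {

    // Returns the text between "<!--" and "-->", or null when the line has no complete comment.
    static String extract(String line) {
        String open = "<!--";
        String close = "-->";
        int start = line.indexOf(open);
        if (start == -1) {
            return null;
        }
        // Look for the closing tag only after the opening tag.
        int end = line.indexOf(close, start + open.length());
        if (end == -1) {
            return null;
        }
        return line.substring(start + open.length(), end);
    }

    public static void main(String[] args) {
        System.out.println(extract("<!--Address:google.co.uk.html-->"));
        System.out.println(extract("a line without tags"));
    }
}
```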
