How to ignore duplicate strings when using RegEx to match string? - java

EDIT: edited for clarity about what I'm having trouble with. I'm not getting the right results because it's counting duplicates. I HAVE to use RegEx; I could also use a tokenizer, but I did not.
What I am trying to do: there are 5 input files, and I need to count how many "USER DEFINED VARIABLES" each one contains. Please ignore the messy code; I'm just learning Java.
I replaced everything within ( and ), all non-word characters, and keywords such as int and main with spaces; removed any digit that has a space in front of it; and replaced each space with a newline, then trimmed the line.
This leaves me with a list of strings that I match with my RegEx. At this point, how do I make my count include only unique identifiers?
EXAMPLE:
For example, for the 3rd input file, which I have attached below the code, I get "distinct/unique identifiers: 10" in my output file when it should be "distinct/unique identifiers: 3".
Likewise, for the 5th input file attached, I should get "distinct/unique identifiers: 3", but I currently get "distinct/unique identifiers: 6".
I cannot use Set, Map etc.
Any help is great! Thanks.
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class A1_123456789 {
public static void main(String[] args) throws IOException {
if (args.length < 1) {
System.out.println("Wrong number of arguments");
System.exit(1);
}
for (int i = 0; i < args.length; i++) {
FileReader jk = new FileReader(args[i]);
BufferedReader ij = new BufferedReader(jk);
FileWriter fw = null;
BufferedWriter bw = null;
String regex = "\\b(\\w+)(\\s+\\1\\b)+";
Pattern p = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]{0,30}");
String line;
int count = 0;
while ((line = ij.readLine()) != null) {
line = line.replaceAll("\\(([^\\)]+)\\)", " " );
line = line.replaceAll("[^\\w]", " ");
line = line.replaceAll("\\bint\\b|\\breturn\\b|\\bmain\\b|\\bprintf\\b|\\bif\\b|\\belse\\b|\\bwhile\\b", " ");
line = line.replaceAll(" \\d", "");
line = line.replaceAll(" ", "\n");
line = line.trim();
Matcher m = p.matcher(line);
while (m.find()) {
count++;
}
}
try {
String s1 = args[i];
String s2 = s1.replaceAll("input","output");
fw = new FileWriter(s2);
bw = new BufferedWriter(fw);
bw.write("distinct/unique identifiers: " + count);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (bw != null) {
bw.close();
}
if (fw != null) {
fw.close();
}
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
}
//This is the 3rd input file below.
int celTofah(int cel)
{
int fah;
fah = 1.8*cel+32;
return fah;
}
int main()
{
int cel, fah;
cel = 25;
fah = celTofah(cel);
printf("Fah: %d", fah);
return 0;
}
//This is the 5th input file below.
int func2(int i)
{
while(i<10)
{
printf("%d\t%d\n", i, i*i);
i++;
}
}
int func1()
{
int i = 0;
func2(i);
}
int main()
{
func1();
return 0;
}

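Try this (declare the list once, before the while loop that reads the file, so that duplicates are detected across all lines):
LinkedList<String> dtaa = new LinkedList<>(); // before the read loop
// inside the read loop, after the replaceAll() calls:
String[] parts = line.split("\\s+"); // the replacements leave newlines, so split on any whitespace
for (String part : parts) {
    if (!part.isEmpty() && !dtaa.contains(part)) {
        dtaa.add(part); // keep only the first occurrence of each string
    }
}
count = dtaa.size();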
instead of
Matcher m = p.matcher(line);
while (m.find()) {
count++;
}

Amal Dev has suggested a correct implementation, but since the OP wants to keep the Matcher, here is a version that does:
// Previous code to here
// Linked list of unique entries
LinkedList<String> uniqueMatches = new LinkedList<>();
// Existing code
while ((line = ij.readLine()) != null) {
line = line.replaceAll("\\(([^\\)]+)\\)", " " );
line = line.replaceAll("[^\\w]", " ");
line = line.replaceAll("\\bint\\b|\\breturn\\b|\\bmain\\b|\\bprintf\\b|\\bif\\b|\\belse\\b|\\bwhile\\b", " ");
line = line.replaceAll(" \\d", "");
line = line.replaceAll(" ", "\n");
line = line.trim();
Matcher m = p.matcher(line);
while (m.find()) {
// New code - get this match
String thisMatch = m.group();
// If we haven't seen this string before, add it to the list
if(!uniqueMatches.contains(thisMatch))
uniqueMatches.add(thisMatch);
}
}
// Now see how many unique strings we have collected
count = uniqueMatches.size();
Note: I haven't compiled this, but hopefully it works as is.
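Combining the ideas above, here is a self-contained sketch that counts distinct identifiers using only a List (no Set or Map). It is a simplification under stated assumptions, not the OP's pipeline: instead of the chain of replaceAll() calls, it strips string literals first, runs the question's identifier regex over each raw line, and skips a hypothetical KEYWORDS list that mirrors the words the question removes. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DistinctIdentifiers {
    // Identifier pattern taken from the question
    private static final Pattern IDENT = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]{0,30}");

    // Hypothetical keyword list, mirroring the words the question strips out
    private static final List<String> KEYWORDS =
            Arrays.asList("int", "return", "main", "printf", "if", "else", "while");

    // Counts distinct identifiers across all lines, using only a List for de-duplication
    public static int countDistinct(List<String> lines) {
        List<String> seen = new ArrayList<>();
        for (String line : lines) {
            // Drop string literals such as "Fah: %d" so the words inside them are not counted
            String code = line.replaceAll("\"[^\"]*\"", " ");
            Matcher m = IDENT.matcher(code);
            while (m.find()) {
                String word = m.group();
                if (!KEYWORDS.contains(word) && !seen.contains(word)) {
                    seen.add(word); // first time this identifier appears
                }
            }
        }
        return seen.size();
    }
}
```

For the 3rd and 5th input files shown in the question, this should give 3 for each (celTofah, cel, fah and func2, i, func1). A file can be turned into the List<String> argument with Files.readAllLines(Paths.get(args[i])).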

Related

Split lines with "," and do a trim on every element [duplicate]

This is some code that I found to help with reading in a 2D array, but the problem I am having is that it only works when reading a list of numbers structured like:
73
56
30
75
80
etc.
What I want is to be able to read multiple lines that are structured like this:
1,0,1,1,0,1,0,1,0,1
1,0,0,1,0,0,0,1,0,1
1,1,0,1,0,1,0,1,1,1
I essentially want to import each line as an array, keeping the same row structure as in the text file.
Everything I have read says to use scan.useDelimiter(","); but wherever I try to use it, the program jumps straight to the catch block that prints "Error converting number". If anyone can help, I would greatly appreciate it. I also saw some information about using split with the BufferedReader, but I don't know which approach is better, or why and how to use it.
String filename = "res/test.txt"; // Finds the file you want to test.
try{
FileReader ConnectionToFile = new FileReader(filename);
BufferedReader read = new BufferedReader(ConnectionToFile);
Scanner scan = new Scanner(read);
int[][] Spaces = new int[10][10];
int counter = 0;
try{
while(scan.hasNext() && counter < 10)
{
for(int i = 0; i < 10; i++)
{
counter = counter + 1;
for(int m = 0; m < 10; m++)
{
Spaces[i][m] = scan.nextInt();
}
}
}
for(int i = 0; i < 10; i++)
{
//Prints out Arrays to the Console, (not needed in final)
System.out.println("Array" + (i + 1) + " is: " + Spaces[i][0] + ", " + Spaces[i][1] + ", " + Spaces[i][2] + ", " + Spaces[i][3] + ", " + Spaces[i][4] + ", " + Spaces[i][5] + ", " + Spaces[i][6]+ ", " + Spaces[i][7]+ ", " + Spaces[i][8]+ ", " + Spaces[i][9]);
}
}
catch(InputMismatchException e)
{
System.out.println("Error converting number");
}
scan.close();
read.close();
}
catch (IOException e)
{
System.out.println("IO-Error open/close of file" + filename);
}
}
Here is my code.
public static int[][] readArray(String path) throws IOException {
//1,0,1,1,0,1,0,1,0,1
int[][] result = new int[3][10];
BufferedReader reader = new BufferedReader(new FileReader(path));
String line = null;
Scanner scanner = null;
line = reader.readLine();
if(line == null) {
return result;
}
String pattern = createPattern(line);
int lineNumber = 0;
MatchResult temp = null;
while(line != null) {
scanner = new Scanner(line);
scanner.findInLine(pattern);
temp = scanner.match();
int count = temp.groupCount();
for(int i=1;i<=count;i++) {
result[lineNumber][i-1] = Integer.parseInt(temp.group(i));
}
lineNumber++;
scanner.close();
line = reader.readLine();
}
return result;
}
public static String createPattern(String line) {
char[] chars = line.toCharArray();
StringBuilder pattern = new StringBuilder();
for(char c : chars) {
if(',' == c) {
pattern.append(',');
} else {
pattern.append("(\\d+)");
}
}
return pattern.toString();
}
The following code snippet might be helpful. The basic idea is to read each line and parse it as CSV. Be aware that CSV parsing is generally hard and usually calls for a specialized library (such as opencsv's CSVReader). However, the problem at hand is relatively straightforward.
try {
    String line = "";
    int rowNumber = 0;
    while (scan.hasNextLine()) {
        line = scan.nextLine();
        String[] elements = line.split(","); // split() takes a String regex, not a char
        int elementCount = 0;
        for (String element : elements) {
            int elementValue = Integer.parseInt(element.trim()); // trim each element before parsing
            Spaces[rowNumber][elementCount] = elementValue;
            elementCount++;
        }
        rowNumber++;
    }
} // ...followed by the catch block(s) from your original code
Since the file is read line by line, parse each line using "," as the delimiter.
Here you create a new Scanner object for each line, with "," as its delimiter.
In the first for loop, the code looks like this:
for(int i = 0; i < 10; i++)
{
Scanner newScan=new Scanner(scan.nextLine()).useDelimiter(",");
counter = counter + 1;
for(int m = 0; m < 10; m++)
{
Spaces[i][m] = newScan.nextInt();
}
}
Use the useDelimiter method in Scanner to set the delimiter to "," instead of the default whitespace delimiter.
Given the sample input, each row of the 2D array starts on a new line rather than after a ",", so multiple delimiters have to be specified.
Example:
scan.useDelimiter(",|\\r?\\n");
This sets the delimiter to either "," or a line break (a newline, optionally preceded by a carriage return), so it works with both Windows and Unix line endings.
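Instead of enumerating line terminators, a sketch can treat any run of commas and whitespace as a single delimiter with the regex [,\s]+, which covers both \n and \r\n line endings. The class and method names are hypothetical, and the text is passed as a String only to keep the example testable; for a file you would construct the Scanner from new File(filename) instead.

```java
import java.util.Scanner;

public class DelimiterDemo {
    // Fills a rows x cols grid from text whose values are separated by commas and/or line breaks
    public static int[][] readGrid(String text, int rows, int cols) {
        int[][] grid = new int[rows][cols];
        // Any run of commas and whitespace (including \r\n) counts as one delimiter
        try (Scanner scan = new Scanner(text).useDelimiter("[,\\s]+")) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    grid[i][j] = scan.nextInt(); // throws InputMismatchException on a non-integer token
                }
            }
        }
        return grid;
    }
}
```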
Why use a scanner for a file? You already have a BufferedReader:
FileReader fileReader = new FileReader(filename);
BufferedReader reader = new BufferedReader(fileReader);
Now you can read the file line by line. The tricky bit is that you want an array of int, so each field has to be parsed:
int[][] spaces = new int[10][10];
String line = null;
int row = 0;
while ((line = reader.readLine()) != null)
{
String[] array = line.split(",");
for (int i = 0; i < array.length; i++)
{
spaces[row][i] = Integer.parseInt(array[i].trim()); // trim each element before parsing
}
row++;
}
The other approach is using a Scanner for the individual lines:
while ((line = reader.readLine()) != null)
{
Scanner s = new Scanner(line).useDelimiter(","); // useDelimiter() takes a String (or Pattern), not a char
int col = 0;
while (s.hasNextInt())
{
spaces[row][col] = s.nextInt();
col++;
}
row++;
}
The other thing worth noting is that you're using an int[10][10]; this requires you to know the number of rows (and columns) in advance. A List<int[]> would remove this requirement.
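A minimal sketch of the List<int[]> variant, assuming the same comma-separated format as the question. The class name CsvGrid is hypothetical, and the method takes a Reader (rather than a file name) so it can be tested with a StringReader; in the question's setting you would pass new FileReader(filename).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class CsvGrid {
    // Reads rows of comma-separated integers; the number of rows need not be known in advance
    public static List<int[]> readRows(Reader source) throws IOException {
        List<int[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines, e.g. a trailing empty line
                }
                String[] fields = line.split(",");
                int[] row = new int[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    row[i] = Integer.parseInt(fields[i].trim()); // trim each element, then parse
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Each row keeps its own length, and rows.get(i)[m] takes the place of Spaces[i][m].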

Java: Read file as long as a new date is found

I want to read the file and add each entry to an ArrayList, where a new entry starts at each date. The date should also be included in the entry.
File Example:
15.09.2002 Hello, this is the first entry.
\t this line, I also need in the first entry.
\t this line, I also need in the first entry.
\t this line, I also need in the first entry.
17.10.2020 And this ist the next entry
I tried this, but the reader only reads the first line:
public class versuch1 {
public static void main(String[] args) {
ArrayList<String> liste = new ArrayList<String>();
String lastLine = "";
String str_all = "";
String currLine = "";
try {
FileReader fstream = new FileReader("test.txt");
BufferedReader br = new BufferedReader(fstream);
while ((currLine = br.readLine()) != null) {
Pattern p = Pattern
.compile("[0-3]?[0-9].[0-3]?[0-9].(?:[0-9]{2})?[0-9]{2} [0-2]?[0-9]:[0-6]?[0-9]:[0-5]");
Matcher m = p.matcher(currLine);
if (m.find() == true) {
lastLine = currLine;
liste.add(lastLine);
} else if (m.find() == false) {
str_all = currLine + " " + lastLine;
liste.set((liste.indexOf(currLine)), str_all);
}
}
br.close();
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
System.out.print(liste.get(0) + " " + liste.get(1));
}
}
I have solved my problem :)
public class versuch1 {
public static void main(String[] args) {
ArrayList<String> liste = new ArrayList<String>();
String lastLine = "";
String currLine = "";
String str_all = "";
try {
FileReader fstream = new FileReader("test.txt");
BufferedReader br = new BufferedReader(fstream);
currLine = br.readLine();
while (currLine != null) {
Pattern p = Pattern
.compile("[0-3]?[0-9].[0-3]?[0-9].(?:[0-9]{2})?[0-9]{2} [0-2]?[0-9]:[0-6]?[0-9]:[0-5]");
Matcher m = p.matcher(currLine);
if (m.find() == true) {
liste.add(currLine);
lastLine = currLine;
} else if (m.find() == false) {
liste.set((liste.size() - 1), (str_all));
lastLine = str_all;
}
currLine = br.readLine();
str_all = lastLine + currLine;
}
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
System.out.print(liste.get(1) + " ");
}
}
While reading the lines, keep a "current entry".
If the line read begins with a date, then it belongs to a new entry. In this case add the current entry to the list of entries and create a new current entry consisting of the read line.
If the line did not begin with a date, just add it to the current entry.
For this to work, you need to read the first line into the current entry before the loop, and after the loop you need to add the current entry to the list of entries. This in turn only works if there is at least one line and the first line begins with a date, so handle the case of no lines separately (with an if-else), and report an error if the first line does not begin with a date.
Happy coding.
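The steps above can be sketched as follows. The sketch assumes dates in the dd.mm.yyyy form shown in the example file (with no time part, unlike the question's regex), takes the lines as a List<String> (for example from Files.readAllLines), and joins continuation lines with a single space. The class name EntryGrouper and the DATE_AT_START pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class EntryGrouper {
    // Hypothetical date pattern: dd.mm.yyyy at the very start of a line
    private static final Pattern DATE_AT_START = Pattern.compile("^\\d{2}\\.\\d{2}\\.\\d{4}\\b");

    // Groups lines into entries; every line that begins with a date starts a new entry
    public static List<String> group(List<String> lines) {
        List<String> entries = new ArrayList<>();
        if (lines.isEmpty()) {
            return entries; // special case: no lines at all
        }
        if (!DATE_AT_START.matcher(lines.get(0)).find()) {
            throw new IllegalArgumentException("First line does not begin with a date");
        }
        StringBuilder current = new StringBuilder(lines.get(0)); // first line starts the first entry
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (DATE_AT_START.matcher(line).find()) {
                entries.add(current.toString()); // finish the previous entry
                current = new StringBuilder(line);
            } else {
                current.append(' ').append(line.trim()); // continuation line of the current entry
            }
        }
        entries.add(current.toString()); // add the last entry after the loop
        return entries;
    }
}
```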

How can I use the ^ splitter in Java

I have a problem with my java program. I have to read lines from a file, the form of these lines is:
1#the^cat#the^dog#the^bird#^fish#bear
2#the^cat#the^dog#the^bird#^fish#bear
and print everything except the "#" and "^" characters in text fields in my GUI. The "^" must remain when there is no article: for example, ^fish must be printed as ^fish, but the^dog must be printed as the dog.
So far I can read and print the lines in the text fields, but I can't find a way to remove the "^" between the words.
Here is my code:
try {
FileReader file = new FileReader("C:\\Guide.txt");
BufferedReader BR = new BufferedReader(file);
boolean eof = false;
int i=0;
while (!eof) {
String line = BR.readLine();
if (line == null)
eof = true;
else {
i++;
System.out.println("Parsing line "+i+" <"+line+">");
String[] words = line.split("#");
if (words.length != 7) continue;
number=words[0];
onomastiki=words[1];
geniki=words[2];
aitiatiki=words[3];
klitiki=words[4];
genos=words[5];
Region=words[6];
E = new CityEntry(number,onomastiki,geniki,
aitiatiki,klitiki,
genos,Region);
Cities.add(E);
}
You can try something like this.
FileReader file = new FileReader("C:\\Users\\aq104e\\Desktop\\text");
BufferedReader BR = new BufferedReader(file);
boolean eof = false;
int i = 0;
while (!eof) {
String line = BR.readLine();
if (line == null)
eof = true;
else {
i++;
System.out.println("Parsing line " + i + " <" + line + ">");
String[] words = line.split("#");
for (int j = 0; j < words.length; j++) {
if(words[j].contains("^")) {
if(words[j].indexOf("^") == 0) {
// write your code here
//This is case for ^fish
}else {
// split using ^ and do further manipulations
}
}
}
}
}
Let me know if this works for you.
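That will work, but it is not the best way. A simpler loop (note that Strings are immutable, so the result of replace() must be assigned back):
for (int j = 0; j < words.length; j++) {
    if (words[j].contains("the^")) {
        words[j] = words[j].replace("^", " "); // "the^dog" -> "the dog"; "^fish" is left unchanged
    }
}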
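A self-contained sketch of the whole per-line step, assuming (as the question states) that "^" becomes a space only when an article precedes it, i.e. when it is not the first character of the field. The class name CaretCleaner is hypothetical. If you ever do want to split on "^" itself, note that split() takes a regular expression in which "^" is the start-of-input anchor, so it must be escaped as split("\\^").

```java
import java.util.ArrayList;
import java.util.List;

public class CaretCleaner {
    // Splits one "#"-separated line and replaces "^" with a space only when an article precedes it
    public static List<String> cleanFields(String line) {
        List<String> result = new ArrayList<>();
        for (String field : line.split("#")) {
            if (field.indexOf('^') > 0) {
                result.add(field.replace('^', ' ')); // "the^dog" -> "the dog"
            } else {
                result.add(field); // "^fish" (no article) and "bear" stay as they are
            }
        }
        return result;
    }
}
```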

Search for multiline String in a text file

I have a text file in which I am trying to search for a string that spans multiple lines. I can search for a single-line string, but I need to search for a multi-line string.
I have tried to search for single line which is working fine.
public static void main(String[] args) throws IOException
{
File f1=new File("D:\\Test\\test.txt");
String[] words=null;
FileReader fr = new FileReader(f1);
BufferedReader br = new BufferedReader(fr);
String s;
String input="line one";
// here i want to search for multilines as single string like
// String input ="line one"+
// "line two";
int count=0;
while((s=br.readLine())!=null)
{
words=s.split("\n");
for (String word : words)
{
if (word.equals(input))
{
count++;
}
}
}
if(count!=0)
{
System.out.println("The given String "+input+ " is present for "+count+ " times ");
}
else
{
System.out.println("The given word is not present in the file");
}
fr.close();
}
And below are the file contents.
line one
line two
line three
line four
Use a StringBuilder for that: read every line from the file and append it to the StringBuilder, followed by a line separator.
StringBuilder lineInFile = new StringBuilder();
while((s=br.readLine()) != null){
lineInFile.append(s).append(System.lineSeparator());
}
Then check whether lineInFile contains the search string using contains():
StringBuilder searchString = new StringBuilder();
searchString.append("line one");
searchString.append(System.lineSeparator());
searchString.append("line two");
System.out.println(lineInFile.toString().contains(searchString));
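A more complicated solution, ported from C (the code is based on code from the book «The C Programming Language»):
final String searchFor = "Ich reiß der Puppe den Kopf ab\n" +
"Ja, ich reiß' ich der Puppe den Kopf ab";
int found = 0;
try {
    // Decode explicitly, assuming the file is UTF-8, so that characters such as ß compare correctly
    String fileContent = new String(Files.readAllBytes(
            new File("puppe-text").toPath()
    ), StandardCharsets.UTF_8);
    int i, j, k;
    // Stop early enough that k never runs past the end of fileContent
    for (i = 0; i + searchFor.length() <= fileContent.length(); i++) {
        // Advance while characters match and searchFor is not exhausted
        for (k = i, j = 0; j < searchFor.length() && fileContent.charAt(k) == searchFor.charAt(j); k++, j++) {
            // nothing: the loop header does all the work
        }
        if (j == searchFor.length()) {
            ++found;
        }
    }
} catch (IOException ignore) {}
System.out.println(found);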
Why not just join all the lines of the file into one string variable and then count the occurrences of the input in it? I used a regex to count the occurrences, but any other approach you find suitable works too.
public static void main(String[] args) throws IOException
{
File f1=new File("test.txt");
String[] words=null;
FileReader fr = new FileReader(f1);
BufferedReader br = new BufferedReader(fr);
String s;
String input="line one line two";
// here i want to search for multilines as single string like
// String input ="line one"+
// "line two";
int count=0;
String fileStr = "";
while((s=br.readLine())!=null)
{
// Normalizing the whole file to be stored in one single variable
fileStr += s + " ";
}
// Now count the occurences
Pattern p = Pattern.compile(Pattern.quote(input)); // quote() treats the input as literal text, not as a regex
Matcher m = p.matcher(fileStr);
while (m.find()) {
count++;
}
System.out.println(count);
fr.close();
}
In real code, use the StringBuilder class instead of += for efficient string concatenation.
Try Scanner.findWithinHorizon():
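Following that advice, here is a sketch that builds the text with a StringBuilder and counts occurrences with StringBuilder.indexOf() instead of a regex, so no escaping is needed. It joins lines with "\n" (readLine() strips the original terminators), so the target must use "\n" between its lines too. The class name is hypothetical, and overlapping matches are counted.

```java
import java.util.List;

public class MultilineSearch {
    // Joins the lines with "\n" and counts occurrences of a (possibly multi-line) target
    public static int countOccurrences(List<String> lines, String target) {
        if (target.isEmpty()) {
            return 0; // an empty target would otherwise match everywhere
        }
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            text.append(line).append('\n');
        }
        int count = 0;
        int from = text.indexOf(target);
        while (from >= 0) {
            count++;
            from = text.indexOf(target, from + 1); // +1 also finds overlapping matches
        }
        return count;
    }
}
```

The lines can come from Files.readAllLines(f1.toPath()) or from the question's readLine() loop.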
String pathToFile = "/home/user/lines.txt";
String s1 = "line two";
String s2 = "line three";
String pattern = String.join(System.lineSeparator(), s1, s2);
int count = 0;
try (Scanner scanner = new Scanner(new FileInputStream(pathToFile))) {
while (scanner.hasNext()) {
String withinHorizon = scanner.findWithinHorizon(pattern, pattern.length());
if (withinHorizon != null) {
count++;
} else {
scanner.nextLine();
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
System.out.println(count);
Try this:
public static void main(String[] args) throws IOException {
File f1 = new File("./src/test/test.txt");
FileReader fr = new FileReader(f1);
BufferedReader br = new BufferedReader(fr);
String input = "line one";
int count = 0;
String line;
while ((line = br.readLine()) != null) {
if (line.contains(input)) {
count++;
}
}
if (count != 0) {
System.out.println("The given String " + input + " is present for " + count + " times ");
} else {
System.out.println("The given word is not present in the file");
}
fr.close();
}

how to fix error: cannot find symbol

I feel as if I am missing something really simple but I can't find it.
The goal of this code is to read a Shakespeare file and use a hash map to count how many times each word appears in the text, and to find words that are "n" characters long. However, I can't even get to debugging because I get this error:
Bard.java:13: error: cannot find symbol
Pattern getout = Pattern.compile("[\\w']+"); //this will take only the words
^ symbol: class Pattern location: class Bard
Bard.java:13: error: cannot find symbol
Pattern getout = Pattern.compile("[\\w']+"); //this will take only the words
plus a few more like it. Help would be greatly appreciated.
import java.io.*;
import java.util.*;
public class Bard {
public static void main(String[] args) {
HashMap < String, Integer > m1 = new HashMap < String, Integer > (); // sets the hashmap
//create file reader for the shakespere text
try (BufferedReader br = new BufferedReader(new FileReader("shakespeare.txt"))) {
String line = br.readLine();
Pattern getout = Pattern.compile("[\\w']+"); //this will take only the words
//create the hashmap
while (line != null) {
Matcher m = getout.matcher(line); //find the relevent information
while (m.find()) {
if (m1.get(m.group()) == null && !m.group().toUpperCase().equals(m.group())) { //find new word that is not in all caps.
m1.put(m.gourp(), 1);
} else { //increments the onld word
int newValue = m1.get(m.group());
newValue++;
m1.put(m.group, newValue);
}
}
line = br.readLine();
}
} catch (Exception e) {
e.printStackTrace();
}
try (BufferedReader br2 = new BufferedReader(new FileReader("input.txt"))) {
String line2 = br2.readLine();
FileWriter output = new FileWriter("analysis.txt");
while (line2 != null) {
if (line2.matches("[\\d\\s]+")) { // if i am dealing with the two integers
String[] args = line.split(" "); // split them up
wordSize = Integer.parseInt(args[0]); // set the first on the the word size
numberOfWords = Integer.parseInt(args[1]); // set the other one to the number of words wanted
String[] wordsToReturn = new String[numberOfWords]; //create array to place the words
int i = 0;
int j;
for (String word: m1.keySet()) { //
if (word.length() == wordSize) {
wordToReturn[i] = word;
i++;
}
for (j = 0; numberOfWords > j; j++) {
output.write(wordToReturn[j]);
}
}
} else {
output.write(m1.get(line2));
}
}
line2 = br2.readLine();
} catch (Exception e) {
e.printStackTrace();
}
}
}
You have not imported the Pattern class (or Matcher). Import the package with:
import java.util.regex.*;
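Once the import is added, the compiler will move on to other errors in the question's code, for example m.gourp() (a typo), m.group without parentheses, args being redeclared inside main, and wordToReturn versus wordsToReturn. Below is a sketch of just the word-counting part with those issues avoided. It uses the question's [\w']+ pattern, reads from a BufferedReader so it can be tested without shakespeare.txt, and skips all-caps words, which is my reading of the question's intent (an assumption). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {
    private static final Pattern WORD = Pattern.compile("[\\w']+"); // same pattern as the question

    // Counts each word, skipping words written entirely in capitals (assumed to be speaker names)
    public static Map<String, Integer> countWords(BufferedReader br) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        String line;
        while ((line = br.readLine()) != null) {
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                String word = m.group();
                if (word.toUpperCase().equals(word)) {
                    continue; // all caps: skip it
                }
                counts.merge(word, 1, Integer::sum); // store 1 for a new word, otherwise add 1
            }
        }
        return counts;
    }
}
```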
