BufferedReader br2 = new BufferedReader(
        new InputStreamReader(new FileInputStream(id_zastavky), "windows-1250")
);
for (int i = 0; i < id_linky_list.size(); i++)
{
    while ((sCurrentLine2 = br2.readLine()) != null)
    {
        String pom = id_linky_list.get(i);
        String[] result = sCurrentLine2.split("\\|");
        if ((result[1].toString()).equals(pom.toString()))
        {
            System.out.println(result[1].toString() + " " + pom.toString() + " " + result[3]);
        }
    }
}
br2.close();
Hey guys. Can anyone give me advice on why my for loop uses only the first item in my id_linky_list and then quits? I think the problem is this line:
while ((sCurrentLine2 = br2.readLine()) != null)
I have over 5,000 items in my list and I need to check whether each of them exists in my txt file. When I run my app, the for loop only processes the first item. How should I modify my code to make it work properly? Thank you for any help.
During the first iteration of the for loop, the whole file is read, so br2.readLine() will always return null in later iterations.
Instead of that, if the file is small you could build a map and use it to check the content:
File file = new File("filename");
List<String> lines = Files.readAllLines(file.toPath(), Charset.defaultCharset());
Map<String, List<String>> map = lines.stream().collect(Collectors.groupingBy(line -> line.split("\\|")[1]));
List<String> id_linky_list = null; // your list from the question
for (int i = 0; i < id_linky_list.size(); i++) {
    if (map.get(id_linky_list.get(i)) != null) {
        // print the match here
    }
}
Update
Map<String, List<String>> map = Files.lines(file.toPath(), Charset.forName("windows-1250")).collect(Collectors.groupingBy(line -> line.split("\\|")[1]));
Can anyone give me advice on why my for loop uses only the first item in my id_linky_list and then quits?
Simply because you read your entire file in the loop while ((sCurrentLine2 = br2.readLine()) != null) the first time you enter it, which is when i = 0. Subsequent iterations do nothing, as the file content has already been consumed, so br2.readLine() returns null.
How should I modify my code to make it work properly?
You need to invert the for and while loops, like this:
while ((sCurrentLine2 = br2.readLine()) != null)
{
    for (int i = 0; i < id_linky_list.size(); i++)
    {
        // compare the current line against id_linky_list.get(i) here
    }
}
To get better performance, consider using a Set instead of a List to store your ids, and simply check whether a given id exists with the method contains(Object) instead of iterating over your List.
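For example, a minimal sketch of that single-pass Set approach, reusing the names and the windows-1250 charset from the question; the class name, the column-count guard, and the printed columns are assumptions:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopMatcher {
    // One pass over the file, O(1) membership checks instead of
    // re-reading the file once per list element.
    static void printMatches(String id_zastavky, List<String> id_linky_list) throws IOException {
        Set<String> wanted = new HashSet<>(id_linky_list);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(id_zastavky), "windows-1250"))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] result = line.split("\\|");
                if (result.length > 3 && wanted.contains(result[1])) {
                    System.out.println(result[1] + " " + result[3]);
                }
            }
        }
    }
}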
Related
I have a text file (a collection of all valid English words) from a GitHub project that looks like this: words.txt
My text file is under the resources folder in my project.
I also have a list of rows obtained from a table in MySQL.
What I'm trying to do is check whether all the words in every row are valid English words; that's why I compare each row with the words contained in my file.
This is what I've tried so far:
public static void englishCheck(List<String> rows) throws IOException {
    ClassLoader classLoader = ClassLoader.getSystemClassLoader();
    int lenght, occurancy = 0;
    for (String row : rows) {
        File file = new File(classLoader.getResource("words.txt").getFile());
        lenght = 0;
        if (!row.isEmpty()) {
            System.out.println("the row : " + row);
            String[] tokens = row.split("\\W+");
            lenght = tokens.length;
            for (String token : tokens) {
                occurancy = 0;
                BufferedReader br = new BufferedReader(new FileReader(file));
                String line;
                while ((line = br.readLine()) != null) {
                    if ((line.trim().toLowerCase()).equals(token.trim().toLowerCase())) {
                        occurancy++;
                    }
                    if (occurancy == lenght) { System.out.println(" this is english " + row); break; }
                }
            }
        }
    }
}
This works only for the very first rows; after that my method just loops over the rows, displaying them and skipping the comparison. I would like to know why this isn't working for my set of rows. It also works if I predefine my list like this: List<String> raws = Arrays.asList(raw1, raw2, raw3) and so on.
You can use the method List#containsAll(Collection)
Returns true if this list contains all of the elements of the
specified collection.
Let's assume you have both lists filled, myListFromRessources and myListFromSQL; then you can do:
List<String> myListFromRessources = Arrays.asList("A", "B", "C", "D");
List<String> myListFromSQL = Arrays.asList("D", "B");
boolean myInter = myListFromRessources.containsAll(myListFromSQL);
System.out.println(myInter); // true

myListFromSQL = Arrays.asList("D", "B", "Y");
myInter = myListFromRessources.containsAll(myListFromSQL);
System.out.println(myInter); // false
You can read the words.txt file, convert the words to lower case, then put the words into a HashSet.
Use the boolean contains(Object o) or boolean containsAll(Collection<?> c) methods to compare each word.
The time complexity is O(n).
TIP: Do not read the file inside every loop iteration. Reading a file is very, very slow.
ClassLoader classLoader = ClassLoader.getSystemClassLoader();
InputStream inputStream = classLoader.getResourceAsStream("words.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
List<String> wordList = new LinkedList<String>(); // word count unknown up front; LinkedList avoids resizing
String line = null;
while ((line = reader.readLine()) != null) {
    String[] words = line.toLowerCase().split("\\W+");
    wordList.addAll(Arrays.asList(words));
}
Set<String> wordSet = new HashSet<String>(wordList.size());
wordSet.addAll(wordList);
// then you can use the wordSet to check.
// You should convert the tokens to lower case.
String[] tokens = row.toLowerCase().split("\\W+");
wordSet.containsAll(Arrays.asList(tokens));
The reason your code doesn't work is that occurancy can never be anything other than 0 or 1: it is reset for every token, and each word appears at most once in words.txt, so occurancy == lenght can only hold for single-word rows. You can see that by following the logic or stepping through a debugger.
If your words.txt file is not too large, and you have enough RAM available, you can speed up processing by reading the words.txt file into memory at the start. Also, you only ever need to call toLowerCase() once, instead of every time you compare. However, be careful with locales. The following code should work as long as you haven't got any non-English characters such as a German eszett or a Greek sigma.
public static void englishCheck(List<String> rows) throws IOException {
    final URI wordsUri;
    try {
        wordsUri = ClassLoader.getSystemResource("words.txt").toURI();
    } catch (URISyntaxException e) {
        throw new AssertionError(e); // can never happen
    }
    final Set<String> words = Files.lines(Paths.get(wordsUri))
            .map(String::toLowerCase)
            .collect(Collectors.toSet());
    for (String row : rows)
        if (!row.isEmpty()) {
            System.out.println("the row : " + row);
            String[] tokens = row.toLowerCase().split("\\W+");
            if (words.containsAll(Arrays.asList(tokens)))
                System.out.println(" this is english " + row);
        }
}
I have a concept but I'm not sure how to go at it. I would like to parse a website and use a regex to find certain parts, then store those parts in strings. Afterwards I would like to do the same again, but find the differences between before and after.
The plan:
1. Parse/regex; add the lines found to the before array.
2. Refresh the website, parse/regex; add the lines found to the after array.
3. Compare all before strings with all after strings; println any new ones.
4. Move all after strings to the before strings.
5. Repeat from 2 forever.
Basically it's just checking a website for updated code and telling me what's updated.
Firstly, is this doable?
Here's my code for part 1.
String before[] = {};
int i = 0;
while ((line = br.readLine()) != null) {
    Matcher m = p.matcher(line);
    if (m.find()) {
        before[i] = line;
        System.out.println(before[i]);
        i++;
    }
}
It doesn't work and I am not sure why.
You could do something like this, assuming you're reading from a file:
Scanner s = new Scanner(new File("oldLinesFilePath"));
List<String> oldLines = new ArrayList<String>();
List<String> newLines = new ArrayList<String>();
while (s.hasNext()) {
    oldLines.add(s.nextLine());
}
s.close();
s = new Scanner(new File("newLinesFilePath"));
while (s.hasNext()) {
    newLines.add(s.nextLine());
}
s.close();
for (int i = 0; i < newLines.size(); i++) {
    if (!oldLines.contains(newLines.get(i))) {
        System.out.println(newLines.get(i));
    }
}
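To cover the rest of the plan (repeat forever, rolling after into before), here is a minimal sketch; fetchMatchingLines() is a hypothetical stand-in for the fetch/parse/regex step, not code from the question:

import java.util.ArrayList;
import java.util.List;

class SiteWatcher {
    // Hypothetical helper: fetch the page and return the regex-matched lines.
    static List<String> fetchMatchingLines() {
        return new ArrayList<>(); // placeholder body
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> before = fetchMatchingLines();        // step 1
        while (true) {
            List<String> after = fetchMatchingLines();     // step 2: refresh + parse + regex
            for (String line : after) {
                if (!before.contains(line)) {              // step 3: print anything new
                    System.out.println(line);
                }
            }
            before = after;                                // step 4: after -> before
            Thread.sleep(60_000);                          // step 5: repeat (poll once a minute)
        }
    }
}

If the pages get large, swapping the Lists for HashSets makes the contains check O(1).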
I must build an application that compares some very big CSV files, each one having 40,000 records. I have written an application that works properly, but it spends a lot of time doing that comparison, because the two files could be out of order or have different records; because of that, I must iterate (40,000^2)*2 times.
Here is my code:
if (nomFich.equals("CAR"))
{
    while ((linea = br3.readLine()) != null)
    {
        array = linea.split(",");
        spliteado = array[0] + array[1] + array[2] + array[8];
        FileReader fh3 = new FileReader(cadena + lista2[0]);
        BufferedReader bh3 = new BufferedReader(fh3);
        find = 0;
        while ((linea2 = bh3.readLine()) != null)
        {
            array2 = linea2.split(",");
            spliteado2 = array2[0] + array2[1] + array2[2] + array2[8];
            if (spliteado.equals(spliteado2))
            {
                find = 1;
            }
        }
        if (find == 0)
        {
            bw3.write("+++++++++++++++++++++++++++++++++++++++++++");
            bw3.newLine();
            bw3.write("Se han incorporado los siguientes CGI en la nueva lista");
            bw3.newLine();
            bw3.write(linea);
            bw3.newLine();
            aparece = 1;
        }
        bh3.close();
    }
}
I think that using a Set in Java is a good option, like the following post suggests:
Comparing two csv files in Java
But before I try it this way, I would like to know if there are any better options.
Thanks for any help.
As far as I can interpret your code, you need to find out which lines in the first CSV file do not have an equal line in the second CSV file. Correct?
If so, you only need to put all lines of the second CSV file into a HashSet. Like so (Java 7 code):
Set<String> linesToCompare = new HashSet<>();
try (BufferedReader reader = new BufferedReader(new FileReader(cadena + lista2[0]))) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] splitted = line.split(",");
        linesToCompare.add(splitted[0] + splitted[1] + splitted[2] + splitted[8]);
    }
}
Afterwards you can simply iterate over the lines in the first CSV file and compare:
try (BufferedReader reader = new BufferedReader(new FileReader(...))) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] splitted = line.split(",");
        String joined = splitted[0] + splitted[1] + splitted[2] + splitted[8];
        if (!linesToCompare.contains(joined)) {
            // handle missing line here
        }
    }
}
Does that fit your needs?
HashMap<String, String> file1Map = new HashMap<String, String>();
String line;
String[] array;
String key;
while ((line = file1.readLine()) != null) {
    array = line.split(",");
    key = array[0] + array[1] + array[2] + array[8];
    file1Map.put(key, key);
}
while ((line = file2.readLine()) != null) {
    array = line.split(",");
    key = array[0] + array[1] + array[2] + array[8];
    if (file1Map.containsKey(key)) {
        // file1 has the same line as file2
    }
    else {
        // file1 doesn't have this line from file2
    }
}
Assuming this all won't fit in memory, I would first convert the files to their stripped-down versions (el0, el1, el2, el8, orig-file-line-nr-for-reference-afterwards) and then sort said files. After that you can stream through both files simultaneously and compare the records as you go. With the sorting out of the equation, you only need to compare them "about once".
But I'm guessing you could do the same using some List/Array object that allows for sorting and storing in memory; 40k records really doesn't sound like all that much to me, unless the elements are very big, of course. And it's going to be orders of magnitude faster.
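A minimal sketch of the in-memory variant, assuming the same composite key (columns 0, 1, 2 and 8) as the question; the class, method, and file-name parameters are placeholders:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedCompare {
    // Build the same composite key as the question (columns 0, 1, 2 and 8).
    static List<String> readKeys(String path) throws IOException {
        List<String> keys = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] f = line.split(",");
                keys.add(f[0] + f[1] + f[2] + f[8]);
            }
        }
        return keys;
    }

    // Sort both key lists, then walk them together in a single merge pass.
    static void compare(String fileA, String fileB) throws IOException {
        List<String> a = readKeys(fileA);
        List<String> b = readKeys(fileB);
        Collections.sort(a);
        Collections.sort(b);
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) { i++; j++; }                              // record in both files
            else if (cmp < 0) System.out.println("only in A: " + a.get(i++));
            else System.out.println("only in B: " + b.get(j++));
        }
        while (i < a.size()) System.out.println("only in A: " + a.get(i++));
        while (j < b.size()) System.out.println("only in B: " + b.get(j++));
    }
}

Sorting costs O(n log n) per file, so the whole comparison is O(n log n) instead of the O(n²) nested-loop approach.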
I have an OpenCSV reader in a Java project and it reads data from CSV files for me, but right now I'm hardcoding the number of columns in the loop that reads them.
I recall there was a method that could get the number of columns, but I don't remember what it was called or how to use it.
Here is the code:
public ArrayList<ArrayList<String>> getCSVContent(String filepath) throws Exception {
    CSVReader reader = new CSVReader(new FileReader(filepath));
    ArrayList<ArrayList<String>> array = new ArrayList<ArrayList<String>>();
    String[] nextLine;
    while ((nextLine = reader.readNext()) != null) {
        ArrayList<String> list = new ArrayList<String>();
        for (int i = 0; i < 5; i++) { // 5 is the number of columns in the file
            list.add(nextLine[i]);
        }
        array.add(list);
    }
    reader.close();
    return array;
}
Just count the items in the array with nextLine.length.
for (int i = 0; i < nextLine.length; i++) {
    list.add(nextLine[i]);
}
Or use a for-each loop:
for (String col : nextLine) {
    list.add(col);
}
Well, an easy way is to read the first line with a Scanner or BufferedReader and count the ';' characters, or whatever you use to split the columns.
You can count them if you use
string.toCharArray();
and increment an integer whenever the character is ';'.
A second way is to look at the methods of CSVReader. I don't know them, but you can type "reader." anywhere in Eclipse and press Ctrl+Space (I don't know how it works in NetBeans). If you're lucky, there is one.
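A minimal sketch of that first suggestion, assuming ';' as the separator and a placeholder file name; note that header.split(";", -1).length would give the same count in one line:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class ColumnCount {
    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader("data.csv"))) {
            String header = br.readLine();
            if (header != null) {
                int columns = 1;                 // n separators => n + 1 columns
                for (char c : header.toCharArray()) {
                    if (c == ';') {
                        columns++;
                    }
                }
                System.out.println(columns);
            }
        }
    }
}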
If your version of CSVReader happens to expose a column-count method (something like reader.getColumnCount()), you can try that; check the API to be sure it exists.
I have a text file with x number of lines; each line holds an integer. When the user clicks a button, an action is performed via an ActionListener that should list all the values as displayed in the text file. However, right now I have linenum set to 10, implying I have already told the code to work with just 10 lines of the text file. So if my text file has only 3 rows/lines of data, it will list those lines and, for the remaining 7 lines, it will spit out "null".
I recall there is a way to use an ellipsis to let the program know that you don't know the exact value, so that it calculates it at the end based on the given information, where my given information will be the number of lines holding numbers (data).
Below is part of the code.
private class thehandler implements ActionListener {
    public void actionPerformed(ActionEvent event) {
        BufferedReader inputFile = null;
        try {
            FileReader freader = new FileReader("Data.txt");
            inputFile = new BufferedReader(freader);
            String MAP = "";
            int linenum = 10;
            while (linenum > 0) {
                linenum = linenum - 1;
                MAP = inputFile.readLine(); // read the next line until the specific line is found
                System.out.println(MAP);
            }
        } catch (Exception y) {
            y.printStackTrace();
        }
    }
}
Just put (MAP = inputFile.readLine()) != null in place of linenum > 0,
and delete the lines linenum = linenum - 1; and MAP = inputFile.readLine();.
And next time, a bit of googling might help +)
The null value would not be printed, because the loop assigns the line it just read and compares it with null, so when the read returns null the loop stops before printing it. As for the 10-line limit: you can handle it easily by adding an index to the loop and checking with && that the index is lower than 10.
Test the value that comes back from BufferedReader.readLine(); if it is null, stop looping, like so:
BufferedReader reader = new BufferedReader(new FileReader("Data.txt"));
try {
    for (String line; (line = reader.readLine()) != null;) {
        System.out.println(line);
    }
} finally {
    reader.close();
}
EDIT: I forgot the requirement to take the first 10 lines. You can change the above code to put the output in a list and return the list; then you can filter it through a function like this:
public List<String> takeFirst(int howMany, List<String> lines) {
    return lines.size() <= howMany ? lines : lines.subList(0, howMany);
}
If the file is huge then this will be inefficient, of course, and if that matters you will end up doing something like:
BufferedReader reader = new BufferedReader(new FileReader("Data.txt"));
try {
    int linesRead = 0;
    for (String line; (line = reader.readLine()) != null && linesRead < 10;) {
        System.out.println(line);
        linesRead += 1;
    }
} finally {
    reader.close();
}
which is uglier but reads only the lines you need.
How about you don't print MAP if its value is null?
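A minimal self-contained sketch of that guard applied to the question's loop; the try-with-resources wrapper and class name are additions:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class GuardedPrint {
    public static void main(String[] args) throws IOException {
        try (BufferedReader inputFile = new BufferedReader(new FileReader("Data.txt"))) {
            int linenum = 10;
            while (linenum > 0) {
                linenum--;
                String map = inputFile.readLine(); // null once the file runs out
                if (map != null) {                 // skip the trailing nulls
                    System.out.println(map);
                }
            }
        }
    }
}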