How to find the index i of a token in an array[i] (Java)

I find a word in a document and print the line in which the word occurs, as shown below.
Say the example file contains: "The quick brown fox jumps over the lazy dog.Jackdaws love my big sphinx of quartz."
FileInputStream fstream = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
// Read the file line by line
while ((strLine = br.readLine()) != null) {
    // check whether testWord occurs at least once in the line of text
    check = strLine.toLowerCase().contains(testWord.toLowerCase());
    if (check) {
        // get the line, and parse its words into a String array
        String[] lineWords = strLine.split("\\s+");
        for (int i = 0; i < lineWords.length; i++) {
            System.out.print(lineWords[i] + ' ');
        }
    }
}
If I search for 'fox', then lineWords[] will contain the tokens of the line that contains it, and lineWords[3] is "fox". To print the color of the fox, I need lineWords[2].
How can I get the index i of a token in lineWords[i]? I want the output to be lineWords[i-1].

You could use a HashMap that maps each word to a list of the indices at which it occurs.
HashMap<String, List<Integer>> indices = new HashMap<>();
So in the for loop you fill the HashMap:
for (int i = 0; i < lineWords.length; i++) {
    String word = lineWords[i];
    // create the index list the first time a word is seen
    if (!indices.containsKey(word)) {
        indices.put(word, new ArrayList<>());
    }
    indices.get(word).add(i);
}
To get all the indices of a specific word call:
List<Integer> indicesForWord = indices.get("fox");
And to get the i - 1 word call:
for (int i = 0; i < indicesForWord.size(); i++) {
    int index = indicesForWord.get(i) - 1;
    // index is -1 when the word is the first token, so there is no preceding word
    if (index >= 0) {
        System.out.println(lineWords[index]);
    }
}
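A more compact variant of the same idea, as a sketch only: computeIfAbsent replaces the explicit containsKey check. The class name WordIndex and the helper names build and wordsBefore are made up for this illustration and do not come from the answer above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
final class WordIndex {
    // Maps each word to every position at which it occurs in the array.
    static Map<String, List<Integer>> build(String[] lineWords) {
        Map<String, List<Integer>> indices = new HashMap<>();
        for (int i = 0; i < lineWords.length; i++) {
            indices.computeIfAbsent(lineWords[i], k -> new ArrayList<>()).add(i);
        }
        return indices;
    }

    // Returns the word before each occurrence of target.
    // An occurrence at index 0 has no preceding word and is skipped.
    static List<String> wordsBefore(String[] lineWords, String target) {
        List<String> result = new ArrayList<>();
        List<Integer> hits = build(lineWords).getOrDefault(target, Collections.emptyList());
        for (int index : hits) {
            if (index > 0) {
                result.add(lineWords[index - 1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lineWords = "The quick brown fox jumps over the lazy dog.".split("\\s+");
        System.out.println(wordsBefore(lineWords, "fox")); // prints [brown]
    }
}
```

Note that the lookup is case-sensitive, so "The" and "the" are indexed as different words.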

If you are using Java 8, it is straightforward:
List<String> words = Files.lines(Paths.get("files/input.txt"))
        .flatMap(line -> Arrays.stream(line.split("\\s+")))
        .collect(Collectors.toList());
int index = words.indexOf("fox");
System.out.println(index);
if (index > 0)
    System.out.println(words.get(index - 1));
This solution also works when the word you are searching for is the first word of a line: all lines are flattened into one list, so it prints the last word of the previous line. I hope it helps!
If you need to find all occurrences, you can use the indexOfAll method from this post.
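The linked post is not reproduced here, so the following is only a sketch of what such an indexOfAll helper might look like. The class name AllOccurrences and the method signature are assumptions, not the code from that post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, named for illustration only.
final class AllOccurrences {
    // Returns every index at which target occurs in words, in ascending order.
    static List<Integer> indexOfAll(List<String> words, String target) {
        return IntStream.range(0, words.size())
                .filter(i -> words.get(i).equals(target))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "red", "fox", "met", "a", "brown", "fox");
        for (int index : indexOfAll(words, "fox")) {
            if (index > 0) {
                System.out.println(words.get(index - 1)); // prints red, then brown
            }
        }
    }
}
```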

That can be done by traversing the array and, when you find your word, printing the one before it. Here's how:
if (lineWords[0].equals(testWord))
    return; // no preceding word
for (int i = 1; i < lineWords.length; i++) {
    if (lineWords[i].equals(testWord)) {
        System.out.println(lineWords[i - 1]);
        break;
    }
}

Related

How do I split multiple lines with multiple delimiters?

I have a .txt file with multiple lines with many emails such as Mike#sport.member.com, Laura#music.member.com, Quinn#music.member.com. How do I split them so I can add them to separate ArrayLists like music or sport?
Mike: sport
Laura: music
Quinn: music
Thanks so much.
You can use a regex. Note that the dots must be escaped, otherwise they match any character:
Pattern pattern = Pattern.compile("(\\w+)#(\\w+)\\.(\\w+)\\.(\\w+)");
Matcher matcher = pattern.matcher("Laura#music.member.com");
if (matcher.find()) {
    matcher.group(1); // returns the name
    matcher.group(2); // returns the category (music or sport)
    matcher.group(3); // returns the company name
    matcher.group(4); // returns the domain, "com" in this case
}
Use this to fill the ArrayLists as you like,
and have a nice day :)
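As a sketch of how the groups could be used to fill the lists: the example below collects the names per category in a map instead of separate ArrayList variables. The class name EmailGroups and the method byCategory are hypothetical, and the input format name#category.company.domain is assumed from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
final class EmailGroups {
    // name#category.company.domain, with the dots escaped so they match literally
    private static final Pattern EMAIL =
            Pattern.compile("(\\w+)#(\\w+)\\.(\\w+)\\.(\\w+)");

    // Groups the names by the category that follows '#'.
    // Entries that do not match the assumed format are ignored.
    static Map<String, List<String>> byCategory(List<String> emails) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String email : emails) {
            Matcher matcher = EMAIL.matcher(email.trim());
            if (matcher.matches()) {
                groups.computeIfAbsent(matcher.group(2), k -> new ArrayList<>())
                      .add(matcher.group(1));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> emails = Arrays.asList(
                "Mike#sport.member.com", "Laura#music.member.com", "Quinn#music.member.com");
        System.out.println(byCategory(emails)); // prints {sport=[Mike], music=[Laura, Quinn]}
    }
}
```

A map keyed by category avoids hard-coding one list per category, which helps if the file later contains categories other than music and sport.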
If you don't mind using ArrayUtils (from Apache Commons Lang) you can use this code:
String[] emails = new String[] {"Mike#sport.member.com", "Laura#music.member.com", "Quinn#music.member.com"};
int startLength;
String[][] split = new String[emails.length][];
for (int i = 0; i < emails.length; i++) {
    split[i] = emails[i].split("#"); // split by #
    startLength = split[i].length;
    for (int j = 0; j < startLength; j++) {
        split[i] = ArrayUtils.addAll(split[i], split[i][0].split("\\.")); // split by . and add the items to the end of the array
        split[i] = ArrayUtils.remove(split[i], 0); // remove the first item of the array
    }
}
System.out.println(Arrays.deepToString(split));
Output:
[[Mike, sport, member, com], [Laura, music, member, com], [Quinn, music, member, com]]
What this does is use .split("#") to split each email into an array of the parts separated by "#". Then it goes through that array and splits each item with .split("\\.").
As it splits the elements the second time, it adds them to the end of the split array and removes the first item (which was the item that it just split up).
Note that I had to introduce the startLength variable to keep track of the original size of the split array, because in the second loop its length is constantly changing.
This gives you a 2D array with all the items, but I won't blame you for not using this because it's a bit of a mess.
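For comparison, a simpler sketch that needs no extra library: String.split accepts a regex, so a character class can split on both delimiters in one call. The class name SplitOnce and the method parts are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper class, named for illustration only.
final class SplitOnce {
    // Splits on either delimiter in one pass: '#' or a literal '.'
    // (inside a character class the dot needs no escaping).
    static String[] parts(String email) {
        return email.split("[#.]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("Mike#sport.member.com")));
        // prints [Mike, sport, member, com]
    }
}
```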
There are two steps to solve this problem.
1. Read the lines from the file (a.txt).
public List<String> readFile() throws IOException {
    String fileName = "/Users/folder/a.txt";
    List<String> list = new ArrayList<>();
    // try-with-resources closes the reader (and the underlying stream) automatically
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(new FileInputStream(fileName)))) {
        String str;
        while ((str = bufferedReader.readLine()) != null) {
            list.add(str);
        }
    }
    return list;
}
2. Split each line and take out the data.
List<String> lines = readFile();
Pattern pattern = Pattern.compile("([a-zA-Z]+)#([a-zA-Z]+)");
for (String line : lines) {
    Matcher matcher = pattern.matcher(line);
    // group(1) is the name, group(2) is the category after '#'
    if (matcher.find()) {
        System.out.println(matcher.group(1) + ":" + matcher.group(2));
    }
}
The result is
Mike:sport
Laura:music
Quinn:music

Check if a string is contained in a text file of words in Java

I have a text file (a collection of all valid English words) from a GitHub project, words.txt.
My text file is under the resources folder in my project.
I also have a list of rows obtained from a table in MySQL.
What I'm trying to do is check whether all the words in every row are valid English words, which is why I compare each row with the words contained in my file.
This is what I've tried so far:
public static void englishCheck(List<String> rows) throws IOException {
    ClassLoader classLoader = ClassLoader.getSystemClassLoader();
    int lenght, occurancy = 0;
    for (String row : rows) {
        File file = new File(classLoader.getResource("words.txt").getFile());
        lenght = 0;
        if (!row.isEmpty()) {
            System.out.println("the row : " + row);
            String[] tokens = row.split("\\W+");
            lenght = tokens.length;
            for (String token : tokens) {
                occurancy = 0;
                BufferedReader br = new BufferedReader(new FileReader(file));
                String line;
                while ((line = br.readLine()) != null) {
                    if ((line.trim().toLowerCase()).equals(token.trim().toLowerCase())) {
                        occurancy++;
                    }
                    if (occurancy == lenght) { System.out.println(" this is english " + row); break; }
                }
            }
        }
    }
}
This works only for the very first rows; after that my method loops over the rows, only displaying them, and ignores the comparison. I would like to know why this isn't working for my set of rows. It also works if I predefine my list like this: List<String> raws = Arrays.asList(raw1, raw2, raw3) and so on.
You can use the method List#containsAll(Collection)
Returns true if this list contains all of the elements of the
specified collection.
Let's assume you have both lists filled, myListFromRessources and myListFromSQL; then you can do:
List<String> myListFromRessources = Arrays.asList("A", "B", "C", "D");
List<String> myListFromSQL = Arrays.asList("D", "B");
boolean myInter = myListFromRessources.containsAll(myListFromSQL);
System.out.println(myInter); // true
myListFromSQL = Arrays.asList("D", "B", "Y");
myInter = myListFromRessources.containsAll(myListFromSQL);
System.out.println(myInter); // false, "Y" is missing
You can read the words.txt file, convert the words to lower case, then put them into a HashSet.
Use the boolean contains(Object o) or boolean containsAll(Collection<?> c) methods to check each word.
A HashSet lookup takes constant time on average, so checking a row of n tokens is O(n).
TIP: Do not read the file inside the loop. Reading a file is very slow.
ClassLoader classLoader = ClassLoader.getSystemClassLoader();
InputStream inputStream = classLoader.getResourceAsStream("words.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
List<String> wordList = new LinkedList<String>(); // You do not know the word count, so a LinkedList is used here.
String line = null;
while ((line = reader.readLine()) != null) {
    String[] words = line.toLowerCase().split("\\W+");
    wordList.addAll(Arrays.asList(words));
}
Set<String> wordSet = new HashSet<String>(wordList.size());
wordSet.addAll(wordList);
// Then you can use the wordSet to check.
// You should convert the tokens to lower case.
String[] tokens = row.toLowerCase().split("\\W+");
wordSet.containsAll(Arrays.asList(tokens));
The reason your code doesn't work is that occurancy can never be anything other than 0 or 1. You can see that by following the logic or going through a debugger.
If your words.txt file is not too large, and you have enough RAM available, you can speed up processing by reading the words.txt file into memory at the start. Also, you only ever need to call toLowerCase() once, instead of every time you compare. However, be careful with locales. The following code should work as long as you haven't got any non-English characters such as a German eszett or a Greek sigma.
public static void englishCheck(List<String> rows) throws IOException {
    final URI wordsUri;
    try {
        wordsUri = ClassLoader.getSystemResource("words.txt").toURI();
    } catch (URISyntaxException e) {
        throw new AssertionError(e); // can never happen
    }
    final Set<String> words = Files.lines(Paths.get(wordsUri))
            .map(String::toLowerCase)
            .collect(Collectors.toSet());
    for (String row : rows)
        if (!row.isEmpty()) {
            System.out.println("the row : " + row);
            String[] tokens = row.toLowerCase().split("\\W+");
            if (words.containsAll(Arrays.asList(tokens)))
                System.out.println(" this is english " + row);
        }
}

Complete indexing of a text file in Java

I'm trying to read a text file, sort the words in it alphabetically, and display the line numbers on which those words appear.
I'm new to Java, so I'm not sure what the most efficient approach is.
My plan so far is to:
- use a Scanner to parse the file into one string
- string.split
- lineCount++
- (somehow sort those split strings alphabetically)
- print the sorted words with the line numbers next to them
Is that the best way of going about this? I'm not sure whether Java has some sort of ordered dictionary I could use.
A Scanner is fine, as you could scan per word, not even needing a split.
A BufferedReader would be for line-wise reading, and there exists a LineNumberReader for your goal: counting lines.
I would indicate the encoding of the file.
SortedMap<String, SortedSet<Integer>> linenosPerWord = new TreeMap<>();
// A BufferedReader with a line number counter:
try (LineNumberReader in = new LineNumberReader(new InputStreamReader(
        new FileInputStream(file), StandardCharsets.UTF_8))) {
    for (;;) {
        String line = in.readLine();
        if (line == null) {
            break;
        }
        int lineno = in.getLineNumber();
        String[] words = line.split("[^\\p{L}\\p{M}]+"); // Split on runs of non-letters and non-accents
        for (String word : words) {
            word = word.toLowerCase(); // Possible with Locale
            SortedSet<Integer> linenos = linenosPerWord.get(word);
            if (linenos == null) {
                linenos = new TreeSet<>();
                linenosPerWord.put(word, linenos);
            }
            linenos.add(lineno);
        }
    }
}
linenosPerWord.remove(""); // Remove a possibly found empty word, like in "-Hello"
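To show the last step of the plan, printing the sorted words with their line numbers, here is a self-contained sketch that works on lines already held in memory rather than on a file. The class name LineIndex, the method build, and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class, named for illustration only.
final class LineIndex {
    // Builds word -> sorted 1-based line numbers. A TreeMap keeps the words
    // in alphabetical order, a TreeSet keeps each word's line numbers sorted.
    static SortedMap<String, SortedSet<Integer>> build(List<String> lines) {
        SortedMap<String, SortedSet<Integer>> index = new TreeMap<>();
        int lineno = 0;
        for (String line : lines) {
            lineno++;
            String[] words = line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{M}]+");
            for (String word : words) {
                if (!word.isEmpty()) { // skip the empty token a leading separator produces
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(lineno);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick fox", "The lazy dog", "fox and dog");
        for (Map.Entry<String, SortedSet<Integer>> entry : build(lines).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

For the three sample lines the first line printed is "and [3]" and the last is "the [1, 2]".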

Sending all read lines to a string array

I have a concept but I'm not sure how to go about it. I would like to parse a website and use a regex to find certain parts, then store these parts in a string array. Afterwards I would like to do the same again and find the differences between before and after.
The plan:
1. Parse/regex; add the lines found to the array before.
2. Refresh the website; parse/regex; add the lines found to the array after.
3. Compare all strings in before with all strings in after; println any new ones.
4. Copy all after strings to the before strings.
5. Then repeat from 2. forever.
Basically it's just checking a website for updated code and telling me what's updated.
Firstly, is this doable?
Here's my code for part 1.
String before[] = {};
int i = 0;
while ((line = br.readLine()) != null) {
    Matcher m = p.matcher(line);
    if (m.find()) {
        before[i] = line;
        System.out.println(before[i]);
        i++;
    }
}
It doesn't work and I am not sure why.
You could do something like this, assuming you're reading from a file:
Scanner s = new Scanner(new File("oldLinesFilePath"));
List<String> oldLines = new ArrayList<String>();
List<String> newLines = new ArrayList<String>();
while (s.hasNextLine()) {
    oldLines.add(s.nextLine());
}
s.close(); // close the first scanner before reusing the variable
s = new Scanner(new File("newLinesFilePath"));
while (s.hasNextLine()) {
    newLines.add(s.nextLine());
}
s.close();
// print every line that is new
for (int i = 0; i < newLines.size(); i++) {
    if (!oldLines.contains(newLines.get(i))) {
        System.out.println(newLines.get(i));
    }
}
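The code in the question fails because String before[] = {} creates an array of length zero, so before[i] = line throws an ArrayIndexOutOfBoundsException on the first match; a List grows as needed. The comparison step can be sketched as below, with a HashSet for the lookups. The class name LineDiff and the method added are hypothetical, and the sample lines are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named for illustration only.
final class LineDiff {
    // Returns the lines of 'after' that do not occur in 'before',
    // in the order in which they appear in 'after'.
    static List<String> added(List<String> before, List<String> after) {
        Set<String> seen = new HashSet<>(before); // constant-time lookups on average
        List<String> result = new ArrayList<>();
        for (String line : after) {
            if (!seen.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("<p>old</p>", "<p>same</p>");
        List<String> after = Arrays.asList("<p>same</p>", "<p>new</p>");
        System.out.println(added(before, after)); // prints [<p>new</p>]
    }
}
```

To repeat the check forever, the after list would become the before list for the next round, as in step 4 of the plan.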

Read file and store values into two arrays, one for each line

I got this code here:
try {
    FileReader file = new FileReader("/Users/Tda/desktop/ReadFiles/tentares.txt");
    BufferedReader br = new BufferedReader(file);
    String line = null;
    while ((line = br.readLine()) != null) {
        String[] values = line.split(",");
        grp1 = new int[values.length];
        for (int i = 0; i < grp1.length; i++) {
            try {
                grp1[i] = Integer.parseInt(values[i]);
            } catch (NumberFormatException e) {
                continue;
            }
        }
        System.out.println(Arrays.toString(grp1));
    }
    System.out.println("");
    br.close();
} catch (IOException e) {
    System.out.println(e);
}
This is what the file I'm reading contains:
grp1:80,82,91,100,76,65,85,88,97,55,69,88,75,97,81
grp2:72,89,86,85,99,47,79,88,100,76,83,94,84,82,93
Right now I'm storing the values in one int array.
But what if I wanted to store each line of values in its own array, two arrays in total?
I thought about using Arrays.copyOfRange somehow, to copy the values from the int array
into two new arrays.
This answer doesn't fully answer your question, but it expands on the hint from my comment under your question.
Try this at the beginning of your while loop:
Use String.indexOf() to find the first occurrence of the char ':' in each line. The position right after it is the beginning index of the second part.
Call String.substring() from that beginning index to the end of the line. This gives you the line without the "grp1:" prefix, so the first number is no longer lost.
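The two calls above can be sketched as follows. The class name GroupLine and the method parse are hypothetical; the sample line is a shortened version of the one in the question.

```java
import java.util.Arrays;

// Hypothetical helper class, named for illustration only.
final class GroupLine {
    // Parses "grp1:80,82,91" into {80, 82, 91}.
    // Everything up to and including the first ':' is dropped; if there is
    // no ':' at all, indexOf returns -1 and the whole line is parsed.
    static int[] parse(String line) {
        String numbers = line.substring(line.indexOf(':') + 1);
        String[] values = numbers.split(",");
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = Integer.parseInt(values[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("grp1:80,82,91"))); // prints [80, 82, 91]
    }
}
```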
Before the while loop:
List<int[]> groups = new ArrayList<>();
Before the end of the loop:
groups.add(grp1);
Afterwards:
for (int[] grp : groups) {
    ...
}
A List is useful as a growing "array". The List operations correspond to the array operations as follows:
groups.size()      grp1.length
groups.get(3)      grp1[3]
groups.set(3, x)   grp1[3] = x
