So I have a programming exercise that involves concordance. I am attempting to take a .txt file, use regex to parse it into strings containing all words, then create a Hashtable that has the key (the word) and the value (the number of times the word appears in the document). We are supposed to be able to account for both case sensitive and non-case sensitive scenarios by passing in a boolean.
Here is what I have:
public Concordance( String pathName, boolean is_case_sensitive ) {
Scanner file = new Scanner(pathName);
try {
file = new Scanner(new File(pathName));
} catch (FileNotFoundException e) {
System.out.println("No File Found");
}
String[] words;
Pattern pattern = Pattern.compile("\\W+");
words = pattern.split(file.nextLine());
String[] wordsCopy = new String[words.length];
for (int i = 0; i < words.length; i++){
wordsCopy[i] = new String(words[i]);
}
int counter = 0;
while (file.hasNext()){
for (String w : words){
counter = 0;
for (String w2 : wordsCopy){
if (is_case_sensitive == false){
if (w.equalsIgnoreCase(w2)){
counter++;
//w2 = null;
tableOfWords.put(w, counter);
file.next();
}
}
if (is_case_sensitive == true){
if (w.equals(w2)){
counter++;
//w2 = null;
tableOfWords.put(w, counter);
file.next();
}
}
}
}
}
}
To walk you through where I am and where I believe my error is:
I use the Scanner to "take in" the file, then use the regex \W+ to get all of the words. I create a String array and split the input into it using the Pattern. Then I create a deep copy of the array to use during comparison (so I now have two String arrays, words and wordsCopy). I use an int counter variable to keep track of how many times each word appears, and address case sensitivity with an if statement and the equals/equalsIgnoreCase methods. I have been going back and forth on assigning w2 to null (it's currently commented out): I intuitively feel that if it is not set to null it will be counted twice, but I can't seem to think it through properly. I think I am counting items in duplicate, but can't figure out a solution. Any insight? Thanks!
You don't need an extra String[] to handle case sensitivity.
Pattern pattern = Pattern.compile("\\W+");
HashMap<String, AtomicInteger> tableOfWords = new HashMap<String, AtomicInteger>();
while (file.hasNextLine()){
    String[] words = pattern.split(file.nextLine());
    for (String w : words){
        // Normalize the key when the comparison should ignore case
        String tmp = w;
        if (!is_case_sensitive){
            tmp = w.toLowerCase();
        }
        AtomicInteger count = tableOfWords.get(tmp);
        if (count == null){
            // First occurrence: register a new counter for this word
            count = new AtomicInteger(0);
            tableOfWords.put(tmp, count);
        }
        count.incrementAndGet();
    }
}
If case sensitivity is not required, convert each word to lower (or upper) case before using it as the key; then everything works.
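For reference, here is the same idea as a self-contained sketch. The class name WordCountSketch, the method countWords, and passing the lines in as a List<String> instead of reading them from a Scanner are assumptions made only so the example runs on its own; Map.merge stands in for the AtomicInteger bookkeeping.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class WordCountSketch {
    // Hypothetical helper: counts how often each word occurs in the given lines.
    static Map<String, Integer> countWords(List<String> lines, boolean isCaseSensitive) {
        Pattern nonWord = Pattern.compile("\\W+");
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : nonWord.split(line)) {
                if (word.isEmpty()) {
                    continue; // split yields a leading "" when a line starts with a non-word character
                }
                // Fold the key to lower case only when case should be ignored
                String key = isCaseSensitive ? word : word.toLowerCase();
                counts.merge(key, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The cat and the hat", "the end");
        System.out.println(countWords(lines, false).get("the")); // prints 3
        System.out.println(countWords(lines, true).get("the"));  // prints 2
    }
}
```

Each word is visited exactly once, so there is no second array to compare against and nothing to set to null.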
As far as I can see, you are actually counting words multiple times (and more than twice, as well).
Here is a simple for-each loop to illustrate what you're doing:
int[] ints = {1, 2, 3, 4, 5};
int[] intcopy = ints;
for (int i : ints) {
    for (int j : intcopy) {
        System.out.println(j);
    }
}
What you will end up printing is the sequence 1 to 5, five times over (25 lines in total):
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
So instead of counting 5 things you are counting 25 things. Hope this helps.
I'm trying to count the occurrences per line from a text file containing a large amount of codes (numbers).
Example of text file content:
9045,9107,2376,9017
2387,4405,4499,7120
9107,2376,3559,3488
9045,4405,3559,4499
I want to compare a similar set of numbers that I get from a text field, for example:
9107,4405,2387,4499
The only result I'm looking for, is if it contains more than 2 numbers (per line) from the text file. So in this case it will be true, because:
9045,9107,2376,9017 - false (1)
2387,4405,4499,7120 - true (3)
9107,2387,3559,3488 - false (2)
9045,4425,3559,4490 - false (0)
From what I understand, the best way to do this, is by using a 2d-array, and I've managed to get the file imported successfully:
Scanner in = null;
try {
in = new Scanner(new File("areas.txt"));
} catch (FileNotFoundException ex) {
Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
}
List<String[]> lines = new ArrayList<>();
while (in.hasNextLine()) {
String line = in.nextLine().trim();
String[] splitted = line.split(", ");
lines.add(splitted);
}
String[][] result = new String[lines.size()][];
for (int i = 0; i < result.length; i++) {
result[i] = lines.get(i);
}
System.out.println(Arrays.deepToString(result));
The result I get:
[[9045,9107,2376,9017], [2387,4405,4499,7120], [9107,2376,3559,3488], [9045,4405,3559,4499], [], []]
From here I'm a bit stuck on checking the codes individually per line. Any suggestions or advice? Is the 2d-array the best way of doing this, or is there maybe an easier or better way of doing it?
The expected number of inputs defines the type of searching algorithm you should use.
If you aren't searching through thousands of lines then a simple algorithm will do just fine. When in doubt favour simplicity over complex and hard to understand algorithms.
While it is not an efficient algorithm, in most cases a simple nested for-loop will do the trick.
A simple implementation would look like this:
final int FOUND_THRESHOLD = 2;
String[] comparedCodes = {"9107", "4405", "2387", "4499"};
String[][] allInputs = {
{"9045", "9107", "2376", "9017"}, // This should not match
{"2387", "4405", "4499", "7120"}, // This should match
{"9107", "2376", "3559", "3488"}, // This should not match
{"9045", "4405", "3559", "4499"}, // This should match
};
List<String[] > results = new ArrayList<>();
for (String[] input: allInputs) {
int numFound = 0;
// Compare the codes
for (String code: input) {
for (String c: comparedCodes) {
if (code.equals(c)) {
numFound++;
break; // Breaking out here prevents unnecessary work
}
}
if (numFound >= FOUND_THRESHOLD) {
results.add(input);
break; // Breaking out here prevents unnecessary work
}
}
}
for (String[] result: results) {
System.out.println(Arrays.toString(result));
}
which provides us with the output:
[2387, 4405, 4499, 7120]
[9045, 4405, 3559, 4499]
To expand on my comment, here's a rough outline of what you could do:
String textFieldContents = ... //get it
//build a set of the user input by splitting at commas
//a stream is used to be able to trim the elements before collecting them into a set
Set<String> userInput = Arrays.stream(textFieldContents .split(","))
.map(String::trim).collect(Collectors.toSet());
//stream the lines in the file
List<Boolean> matchResults = Files.lines(Path.of("areas.txt"))
//map each line to true/false
.map(line -> {
//split the line and stream the parts
return Arrays.stream(line.split(","))
//trim each part
.map(String::trim)
//select only those contained in the user input set
.filter(part -> userInput.contains(part))
//count matching elements and return whether there are more than 2 or not
.count() > 2L;
})
//collect the results into a list, each element position should correspond to the zero-based line number
.collect(Collectors.toList());
If you need to collect the matching lines instead of a flag per line you could replace map() with filter() (same content) and change the result type to List<String>.
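A rough sketch of that filter() variant follows. To keep it runnable without a file, the lines are passed in as a List<String> rather than read with Files.lines; the class name MatchingLinesSketch and the method matchingLines are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class MatchingLinesSketch {
    // Hypothetical helper: returns the lines sharing more than 2 codes with the user input.
    static List<String> matchingLines(List<String> lines, String textFieldContents) {
        // Build a set of the user input so that each lookup is a cheap contains() call
        Set<String> userInput = Arrays.stream(textFieldContents.split(","))
                .map(String::trim)
                .collect(Collectors.toSet());
        return lines.stream()
                // keep a line only if more than 2 of its codes appear in the user input
                .filter(line -> Arrays.stream(line.split(","))
                        .map(String::trim)
                        .filter(userInput::contains)
                        .count() > 2L)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "9045,9107,2376,9017",
                "2387,4405,4499,7120",
                "9107,2376,3559,3488",
                "9045,4405,3559,4499");
        // Only the second line shares three codes with the input
        System.out.println(matchingLines(lines, "9107,4405,2387,4499")); // prints [2387,4405,4499,7120]
    }
}
```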
How do I delete the characters at position x and keep the rest? The output should be "12345678", i.e. every '9' sitting at a position x is deleted. Here x = i*(i+1)/2, so each position is the previous one plus the next natural number: 0, 1, 3, 6, 10, 15, 21, 28, etc.
public class removeMysteryI {
public static String removeMysteryI(String str) {
String newString = "";
int x=0;
for(int i=0;i<str.length();i++){
int y = (i*(i+1)/2)+1;
if(y<=str.length()){
x=i*(i+1)/2;
newString=str.substring(0, x) + str.substring(x + 1);
}
}
return newString;
}
public static void main(String[] args) {
String str = "9919239456978";
System.out.println(removeMysteryI(str));
}
}
OK, so there are a couple of mistakes in your code. One is easy to fix. The others not so easy.
The easy one first:
newString=str.substring(0, x) + str.substring(x + 1);
OK so that is creating a string with the character at position x removed. The problem is what it is operating on. The str variable is the input parameter. So at the end of the day newString will still only be str with one character removed.
The above actually needs to be operating on the string from the previous loop iterations ... if you are going to remove more than one character.
The next problem arises when you try to solve the first one. When you remove a character from a string, all characters after the removal point are renumbered; e.g. after removing the character at 5, the character at 6 becomes the character at 5, the character at 7 becomes the character at 6, and so on.
So if you are going to remove characters by "snipping" the string, you need to make sure that the indexes for the positions for the "snips" are adjusted for the number of characters you have already removed.
That can be done ... but you need to think about it.
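As a minimal sketch of that adjustment (one possible way to do it, not necessarily the intended one): keep a running count of the characters already removed and subtract it from each original index before snipping. The class and method names below are hypothetical.

```java
class SnipSketch {
    // Removes the characters at original indexes 0, 1, 3, 6, 10, ... by repeated snipping.
    static String removeTriangularIndexes(String str) {
        String result = str;
        int removed = 0; // how many characters have been snipped so far
        for (int i = 0; ; i++) {
            int originalIndex = i * (i + 1) / 2; // position in the untouched input
            if (originalIndex >= str.length()) {
                break;
            }
            // Every earlier removal shifted the remaining characters one place to the left
            int adjustedIndex = originalIndex - removed;
            result = result.substring(0, adjustedIndex) + result.substring(adjustedIndex + 1);
            removed++;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeTriangularIndexes("9919239456978")); // prints 12345678
    }
}
```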
The final problem is efficiency. Each time your current code removes a single character (as above), it is actually copying all remaining characters to a new string. For small strings, that's OK. For really large strings, the repeated copying could have a serious performance impact [1].
The solution to this is to use a different approach to removing the characters. Instead of snipping out the characters you want to discard, copy the characters that you want to keep. The StringBuilder class is one way of doing this [2]. If you are not permitted to use that, then you could do it with an array of char, and an index variable to keep track of your "append" position in the array. Finally, there is a String constructor that can create a String from the relevant part of the char[].
I'll leave it to you to work out the details.
[1] - Efficiency could be viewed as beyond the scope of this exercise.
[2] - @Horse's answer uses a StringBuilder but in a different way to what I am suggesting. This will also suffer from the repeated copying problem because each deleteCharAt call will copy all characters after the deletion point.
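For readers who want to see the shape of the char[] variant, here is one possible sketch. It is only an illustration of the "copy what you keep" idea; the class name, method name, and variable names are assumptions, and the String(char[], int, int) constructor is the one referred to above.

```java
class KeepSketch {
    // Copies every character except those at indexes 0, 1, 3, 6, 10, ... into a char[].
    static String removeTriangularIndexes(String str) {
        char[] kept = new char[str.length()];
        int appendPos = 0;  // next free slot in kept
        int nextToSkip = 0; // next index to drop
        int step = 1;       // distance from nextToSkip to the index to drop after it
        for (int i = 0; i < str.length(); i++) {
            if (i == nextToSkip) {
                nextToSkip += step; // 0 -> 1 -> 3 -> 6 -> 10 -> ...
                step++;
            } else {
                kept[appendPos++] = str.charAt(i); // each kept character is copied exactly once
            }
        }
        // Build the result from the filled part of the array only
        return new String(kept, 0, appendPos);
    }

    public static void main(String[] args) {
        System.out.println(removeTriangularIndexes("9919239456978")); // prints 12345678
    }
}
```

Because no character is moved more than once, the work grows linearly with the length of the input.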
Follow the steps below:
Initialize builderIndexToDelete = 0
Initialize counter = 1
Repeat the following while the index is valid:
  delete the character at builderIndexToDelete
  advance builderIndexToDelete by counter - 1 (the -1 accounts for the character deleted in this iteration)
  increment the counter
public static String deleteNaturalSumIndexes(String str) {
StringBuilder builder = new StringBuilder(str);
int counter = 1;
int builderIndexToDelete = 0;
while (builderIndexToDelete < builder.length()) {
builder.deleteCharAt(builderIndexToDelete);
builderIndexToDelete += (counter - 1);
counter++;
}
return builder.toString();
}
public static void main(String[] args) {
String str = "9919239456978";
System.out.println(deleteNaturalSumIndexes(str));
}
Thank you @dreamcrash and @StephenC.
Using @StephenC's suggestion to improve performance:
public static String deleteNaturalSumIndexes(String str) {
StringBuilder builder = new StringBuilder();
int nextNum = 1;
int indexToDelete = 0;
while (indexToDelete < str.length()) {
// check whether this is a valid range to continue
// handles 0,1 specifically
if (indexToDelete + 1 < indexToDelete + nextNum) {
// min is used to limit the index of last iteration
builder.append(str, indexToDelete + 1, Math.min(indexToDelete + nextNum, str.length()));
}
indexToDelete += nextNum;
nextNum++;
}
return builder.toString();
}
public static void main(String[] args) {
System.out.println(deleteNaturalSumIndexes(""));
System.out.println(deleteNaturalSumIndexes("a"));
System.out.println(deleteNaturalSumIndexes("ab"));
System.out.println(deleteNaturalSumIndexes("abc"));
System.out.println(deleteNaturalSumIndexes("99192394569"));
System.out.println(deleteNaturalSumIndexes("9919239456978"));
}
Question explanation: as some of the comments suggested, I will try my best to make this question clearer. The input comes from a file, and the sample below is just one example; the code should work for any input in this format. I understand that I need to use Scanner to read the file. My question is what code I should use to produce the output.
Input Specification:
The first line of input contains the number N, which is the number of lines that follow. The next
N lines will contain at least one and at most 80 characters, none of which are spaces.
Output Specification:
Output will be N lines. Line i of the output will be the encoding of the line i + 1 of the input.
The encoding of a line will be a sequence of pairs, separated by a space, where each pair is an
integer (representing the number of times the character appears consecutively) followed by a space,
followed by the character.
Sample Input
4
+++===!!!!
777777......TTTTTTTTTTTT
(AABBC)
3.1415555
Output for Sample Input
3 + 3 = 4 !
6 7 6 . 12 T
1 ( 2 A 2 B 1 C 1 )
1 3 1 . 1 1 1 4 1 1 4 5
I have only posted two questions so far, and I don't quite understand the standard of a "good" question and a "bad" question? Can someone explain why this is a bad question? Appreciate it!
Here is complete working code; try it.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class CharTask {
public static void main(String[] args) {
List<String> lines = null;
try {
File file = new File("inp.txt");
FileInputStream ins =new FileInputStream(file);
Scanner scanner = new Scanner(ins);
lines = new ArrayList<String>();
while(scanner.hasNextLine()) {
lines.add(scanner.nextLine());
}
List<String> output = processInput(lines);
for (int i=1;i<output.size(); i++) {
System.out.println(output.get(i));
}
scanner.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
private static List<String> processInput(List<String> lines){
List<String> output = new ArrayList<String>();
for (String line: lines) {
output.add(getProcessLine(line));
}
return output;
}
private static String getProcessLine(String line) {
if(line.length() == 0) {
return null;
}
String output = "";
char prev = line.charAt(0);
int count = 1;
for(int i=1;i<line.length();i++) {
char c = line.charAt(i);
if (c == prev) {
count = count +1;
}
else {
output = output + " "+count + " "+prev;
prev = c;
count = 1;
}
}
output = output + " "+count+" "+prev;
return output.trim(); // drop the leading separator space so the line matches the expected output
}
}
Input
(inp.txt)
4
+++===!!!!
777777......TTTTTTTTTTTT
(AABBC)
3.1415555
Output
3 + 3 = 4 !
6 7 6 . 12 T
1 ( 2 A 2 B 1 C 1 )
1 3 1 . 1 1 1 4 1 1 4 5
There are two different problems you need to address, and I think it is going to help you to address them separately. The first is to read in the input. It's not clear to me whether you are going to prompt for it and whether it is coming from the console or a file or what exactly. For that you will want to initialize a scanner, use nextInt to get the number of lines, call nextLine() to clear the rest of that line and then run a for loop from 0 up to the number of lines, reading the next line (using nextLine()) into a String variable. To make sure that is working well, I would suggest printing out the unaltered string and see if what is coming out is what is going in.
The other task is to convert a given input String into the desired output String. You can work on that independently, then pull things back together later. You will want a method that takes in a String and returns a String. You can test it by passing in the sample Strings and checking that it gives back the desired output Strings. Start with result = "". Loop over the characters of the String using charAt, keeping variables for the currentCharacter and currentCount; when the character changes or the end of the String is reached, concatenate the count and character onto the result and reset the count and current character as needed. After the loop, return the result.
Once the two tasks are solved, pull them together by printing out what the method returns for the input string as opposed to the input string itself.
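To make the two pieces concrete, here is one possible sketch of how they could fit together. The names RunLengthSketch, encode and encodeAll are hypothetical, and main feeds the sample input from a String so the example runs without a file; in the real exercise the Scanner would wrap the input file instead.

```java
import java.util.Scanner;

class RunLengthSketch {
    // Task 2: turn one input line into "count char" pairs separated by spaces.
    static String encode(String line) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            char current = line.charAt(i);
            int count = 0;
            // Consume the whole run of identical characters
            while (i < line.length() && line.charAt(i) == current) {
                count++;
                i++;
            }
            if (result.length() > 0) {
                result.append(' '); // separator between pairs, never at the start
            }
            result.append(count).append(' ').append(current);
        }
        return result.toString();
    }

    // Task 1: read N, then encode and print each of the N following lines.
    static void encodeAll(Scanner in) {
        int n = in.nextInt();
        in.nextLine(); // clear the rest of the first line
        for (int k = 0; k < n; k++) {
            System.out.println(encode(in.nextLine()));
        }
    }

    public static void main(String[] args) {
        String sample = "4\n+++===!!!!\n777777......TTTTTTTTTTTT\n(AABBC)\n3.1415555\n";
        encodeAll(new Scanner(sample)); // prints the four lines of the sample output
    }
}
```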
I think that gives you direction on the method to use. It's not a full-blown solution, but that's not what you requested or needed.
I have an array of lines that looks somewhat like the example below:
A-NUMBER  ROUTINF  ACO  AO  L     MISCELL
0-0                0        1-20
0-00
0-01      FDS               3-20
0-02               6    7   3-20
0-03                    4   3-20
1-0                               F=PRE
                                  ANT=3
                                  NAPI=1
1-1                               F=PRE
                                  ANT=3
I need to parse each line by column, skipping the columns that have blank values, and create new lines like the ones below:
ANUM = 0-0, ACO=0, L=1-20;
ANUM = 0-00;
ANUM = 0-01, ROUTINF=FDS, L=3-20;
ANUM = 0-02, ACO=6, AO=7, L=3-20;
ANUM = 0-03, AO=4,L=3-20;
ANUM = 1-0, F=PRE, ANT=3, NAPI=1;
ANUM = 1-1, F=PRE, ANT=3;
I can split the line but my code can't remember which column the value belongs to and when to skip the values.
String[] splitted = null;
for (Integer i = 0; i < lines.size(); i++) {
splitted = lines.get(i).split("\\s+");
for(String str : splitted)
if(!(splitted.length == 1)){
anum = splitted[0];
routinf = splitted[1];
aco = splitted[2];
ao = splitted[3];
l = splitted[4];
}else {
miscell = splitted[0];
}
}
The columns in your file seem to be of fixed length (I don't see any other way to distinguish each column). If that is the case, then I would recommend using substring(start, end) instead of split.
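A small sketch of the substring idea is shown here. The column start offsets in STARTS are pure assumptions chosen for illustration (the real offsets have to be measured from the actual file), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixedWidthSketch {
    static final String[] NAMES = {"ANUM", "ROUTINF", "ACO", "AO", "L", "MISCELL"};
    // ASSUMED start offset of each column; adjust these to the real file layout
    static final int[] STARTS = {0, 10, 19, 24, 28, 34};

    // Slices one line into its non-blank columns, keyed by column name.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int c = 0; c < NAMES.length; c++) {
            int start = STARTS[c];
            if (start >= line.length()) {
                break; // the line ends before this column begins
            }
            int end = (c + 1 < STARTS.length)
                    ? Math.min(STARTS[c + 1], line.length())
                    : line.length();
            String value = line.substring(start, end).trim();
            if (!value.isEmpty()) {
                fields.put(NAMES[c], value); // blank columns are skipped
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample lines built with the same assumed widths (10, 9, 5, 4, 6)
        String row = String.format("%-10s%-9s%-5s%-4s%-6s%s", "0-02", "", "6", "7", "3-20", "");
        String continuation = String.format("%-34s%s", "", "ANT=3");
        System.out.println(parse(row));          // prints {ANUM=0-02, ACO=6, AO=7, L=3-20}
        System.out.println(parse(continuation)); // prints {MISCELL=ANT=3}
    }
}
```

A line whose ANUM entry is missing, like the second sample, is a continuation of the previous record.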
Create a class to hold one single record.
class Record {
    String aNumber;
    List<String> routingf, aco, ao, l, miscell;
    public Record(String aNumber) {
        this.aNumber = aNumber;
        this.routingf = new ArrayList<>();
        // init other lists like above ...
    }
    public void addRoutingf(String routingf) {
        // add only if not null and not empty after trimming
        if (routingf != null && routingf.trim().length() > 0) {
            this.routingf.add(routingf);
        }
    }
    // implement add-methods for other lists like above ...
}
While parsing each line, remember the last created record. If A-NUMBER is empty in the current line, use the last created record to store the values; otherwise create a new record and remember it as the last/current one, so you can use it for the upcoming lines if necessary.
Save all records in a list:
List<Record> records = new ArrayList<>();
What is the common separator? Just split on that. Your \s+ currently consumes any amount of whitespace, whereas \s{1,4} will limit a separator to between 1 and 4 characters. Find the right numbers for your data.
If your input file uses a single whitespace character (for instance a tab) between columns, your code is almost OK:
String[] splitted = null;
for (Integer i = 0; i < lines.size(); i++) {
splitted = lines.get(i).split("\\s");
if(!(splitted.length == 1)){
anum = splitted[0];
routinf = splitted[1];
aco = splitted[2];
ao = splitted[3];
l = splitted[4];
}else {
miscell = splitted[0];
}
}
//print only not empty fields
Please note the removal of the unnecessary for loop and the change of the split pattern from \s+ to \s.
Just a thought, but you could also experiment with keeping the empty entries in the split result, to determine which column each value belongs to.
lines.get(i).split(yourDelimiter, -1);
It's hard to tell whether this helps without knowing exactly what your original files look like, but you could give it a try.
E.g. if the values are always at a certain position in the split array, you could easily tell which column each one belongs to and extract them.
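To illustrate what the -1 limit changes, here is a tiny sketch. It assumes, purely for the sake of the example, that the columns are separated by single tab characters; the class and method names are made up.

```java
import java.util.Arrays;

class SplitLimitSketch {
    // Splits on single tabs and keeps every column, including trailing empty ones.
    static String[] columns(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String line = "0-01\tFDS\t\t\t3-20\t"; // ACO, AO and MISCELL are empty
        // Without a limit, trailing empty strings are dropped
        System.out.println(Arrays.toString(line.split("\t"))); // prints [0-01, FDS, , , 3-20]
        // With -1 every column keeps its position
        System.out.println(Arrays.toString(columns(line)));    // prints [0-01, FDS, , , 3-20, ]
    }
}
```

With the limit in place the array always has one entry per column, so the index alone tells you which column a value came from.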
Hey guys, I'm new to Java (well, 3/4 of a year spent on it).
So I don't know much about it. I can do basic things, but the advanced concepts have not been explained to me, and there is so much to learn! So please go a little bit easy on me...
Ok, so I have this project where I need to read lines of text from a file into an array but only those which meet specific conditions. Now, I read the lines into the array, and then skip out on all of those which don't meet the criteria. I use a for loop for this. This is fine, but then when I print out my array (required) null values crop up all over the place where I skipped out on the words.
How would I remove the null elements specifically? I have tried looking everywhere, but the explanations have gone way over my head!
Here is the code that I have to deal with the arrays specifically: (scanf is the scanner, created a few lines ago):
//create string array and re-open file
scanf = new Scanner(new File ("3letterWords.txt"));//re-open file
String words [] = new String [countLines];//word array
String read = "";//to read file
int consonant=0;//count consonants
int vowel=0;//count vowels
//scan words into array
for (int i=0; i<countLines; i++)
{
read=scanf.nextLine();
if (read.length()!=0)//skip blank lines
{
//add vowels
if (read.charAt(0)=='a'||read.charAt(0)=='e'||read.charAt(0)=='i'||read.charAt(0)=='o'||read.charAt(0)=='u')
{
if (read.charAt(2)=='a'||read.charAt(2)=='e'||read.charAt(2)=='i'||read.charAt(2)=='o'||read.charAt(2)=='u')
{
words[i]=read;
vowel++;
}
}
//add consonants
if (read.charAt(0)!='a'&&read.charAt(0)!='e'&&read.charAt(0)!='i'&&read.charAt(0)!='o'&&read.charAt(0)!='u')
{
if (read.charAt(2)!='a'&&read.charAt(2)!='e'&&read.charAt(2)!='i'&&read.charAt(2)!='o'&&read.charAt(2)!='u')
{
words[i]=read;
consonant++;
}
}
}//end if
//break out of loop when reached EOF
if (scanf.hasNext()==false)
break;
}//end for
//print data
System.out.println("There are "+vowel+" vowel words\nThere are "+consonant+" consonant words\nList of words: ");
for (int i=0; i<words.length; i++)
System.out.println(words[i]);
Thanks so much for any help received!
Just have a different counter for the words array and increment it only when you add a word:
int count = 0;
for (int i=0; i<countLines; i++) {
...
// in place of: words[i] = read;
words[count++] = read;
...
}
When printing the words, just loop from 0 to count.
Also, here's a simpler way of checking for a vowel/consonant. Instead of:
if (read.charAt(0)=='a'||read.charAt(0)=='e'||read.charAt(0)=='i'||read.charAt(0)=='o'||read.charAt(0)=='u')
you can do:
if ("aeiou".indexOf(read.charAt(0)) > -1)
Update: Say read.charAt(0) is some character x. The above line says look for that character in the string "aeiou". indexOf returns the position of the character if found or -1 otherwise. So anything > -1 means that x was one of the characters in "aeiou", in other words, x is a vowel.
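Wrapped into a small helper, the check could look like the sketch below. The names VowelSketch, isVowel and firstAndThirdAreVowels are hypothetical, and the second method assumes the word has at least three characters, as the words in the question's file do.

```java
class VowelSketch {
    // True when c is one of the lower-case vowels a, e, i, o, u.
    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) > -1;
    }

    // Mirrors the question's test on the first and third letters (assumes length >= 3).
    static boolean firstAndThirdAreVowels(String word) {
        return isVowel(word.charAt(0)) && isVowel(word.charAt(2));
    }

    public static void main(String[] args) {
        System.out.println(isVowel('e'));                  // prints true
        System.out.println(isVowel('x'));                  // prints false
        System.out.println(firstAndThirdAreVowels("ate")); // prints true
        System.out.println(firstAndThirdAreVowels("cat")); // prints false
    }
}
```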
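public static String[] removeElements(String[] allElements) {
    String[] localAllElements = new String[allElements.length];
    int count = 0; // number of non-null elements copied so far
    for (int i = 0; i < allElements.length; i++)
        if (allElements[i] != null)
            localAllElements[count++] = allElements[i]; // pack the values at the front
    // Trim the array so that no trailing nulls remain
    return java.util.Arrays.copyOf(localAllElements, count);
}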