Using HashMaps to find string frequencies

Using HashMaps to find string frequencies - java

My code is meant to find the most frequent codon from a file that contains a lonf string of DNA (i.e. CTAAATCGATGGCGATGATAAATG...). starting at the initial position pos, every three characters makes up one codon. The problem I have is that whenever I run the code, it tell me that the string index is out of range. I know that the issue is in the line
str = line.substring(idx, idx + 2);
but don't know how to fix it. Also, I am not sure whether I am counting frequencies correctly. I needed to increment the value of every key that is seen more than once.
public static void findgene(String line){
int idx, pos;
int[] freq = new int[100];
String str;
//pos is the position to start at
pos = 0;
idx = pos;
for(int i = 0; i < line.length(); i++){
if(idx >= 0){
//makes every three characters into a codon
str = line.substring(idx, idx + 2);
//checks if the codon was previously seen
if(genes.containsKey(str)){
genes.put(str, freq[i]++);
}
idx = idx + 2;
}
}
}

You are incrementing idx by 2 with each iteration of the loop. However, you did not impose any restriction on the upper limit of idx.
Thus the the parameter of the substring() function soon goes out of range in the line:
str = line.substring(idx, idx + 2);
What you need to do is change the condition to:
if(idx+2<=line.length()){
//code here
}

Related

Is there a way to initialize a 2d array with other arrays?

I have been given a homework assignment as follows:
Read three sentences from the console application. Each sentence should not exceed 80 characters. Then, copy each character in each input sentence in a [3 x 80] character array.
The first sentence should be loaded into the first row in the reverse order of characters – for example, “mary had a little lamb” should be loaded into the array as “bmal elttil a dah yram”.
The second sentence should be loaded into the second row in the reverse order of words – for example, “mary had a little lamb” should be loaded into the array as “lamb little a had mary”.
The third sentence should be loaded into the third row where if the index of the array is divisible by 5, then the corresponding character is replaced by the letter ‘z’ – for example, “mary had a little lamb” should be loaded into the array as “mary zad azlittze lazb” – that is, characters in index
positions 5, 10, 15, and 20 were replaced by ‘z’. Note that an empty space is also a character, and that the index starts from position 0.
Now print the contents of the character array on the console.
The methods, return types, and parameters in the code below are all specified as required so I cannot change any of that information. I am having trouble initializing the 2d array. The instructions say that the sentences must be loaded into the array already reversed etc. but the parameters of the methods for doing this calls for strings. I assume that means I should read the lines as strings and then call the methods to modify them, then use toCharyArray to convert them before loading them into the 2d array. I don't understand how to initialize the 2D array with the values of the char arrays. Is there some kind of for loop I can use? Another issue is that no processing can be done inside of the main method but in the instructions there is no method that I can call to fill the array.
import java.util.regex.Pattern;
public class ReversedSentence {
public static String change5thPosition(String s){
char[] chars = s.toCharArray();
for (int i = 5; i < s.length(); i = i + 5) {
chars[i] = 'z';
}
String newString = new String(chars);
return newString;
}
public static String printChar2DArray(char[][] arr){
for (int x = 0; x < 3; x++) {
for (int y = 0; y < 80; y++) {
// just a print so it does not make new lines for every char
System.out.print(arr[x][y]);
}
}
return null;
}
public static String reverseByCharacter(String s){
String reverse = new StringBuffer(s).reverse().toString();
return reverse;
}
public static String reverseByWord(String s){
Pattern pattern = Pattern.compile("\\s"); //splitting the string whenever there
String[] temp = pattern.split(s); // is whitespace and store in temp array.
String result = "";
for (int i = 0; i < temp.length; i++) {
if (i == temp.length - 1)
result = temp[i] + result;
else
result = " " + temp[i] + result;
}
return result;
}
public static String truncateSentence(String s){
if (s==null || s.length() <= 80)
return s;
int space = s.lastIndexOf(' ', 80);
if (space < 0)
return s.substring(0, 80);
return s;
}
public static void main(String[] args) {
String sentence1 = ("No one was available, so I went to the movies alone.");
String sentence2 = "Ever since I started working out, I am so tired.";
String sentence3 = "I have two dogs and they are both equally cute.";
char[][] arr = new char[3][80];
arr[0] = reverseByCharacter(sentence1).toCharArray();
arr[1] = reverseByWord(sentence2).toCharArray();
arr[2] = change5thPosition(sentence3).toCharArray();
printChar2DArray(arr);
}
}
The error I am getting is:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 52 out of bounds for length 52
at ReversedSentence.printChar2DArray(ReversedSentence.java:20)
at ReversedSentence.main(ReversedSentence.java:71)
.enola seivom eht ot tnew I os ,elbaliava saw eno oN

The problem is that in your printChar2DArray method you assume that every array has the length of 80, but that's actually not the case here. In Java, a 2D array is just an array of arrays. So, when you have this: char[][] arr = new char[3][80], you are creating an array of 3 arrays, and each of these arrays has the length of 80 characters. That may seem ok, but in the next lines you reinitialize the 3 arrays with something different entirely.
arr[0] = reverseByCharacter(sentence1).toCharArray();
arr[1] = reverseByWord(sentence2).toCharArray();
arr[2] = change5thPosition(sentence3).toCharArray();
Now none of these arrays has the length of 80. Each of them has the length of the respective string.
You can solve this in 2 ways (depending on how constrained your task actually is).
First, you can copy the string into arrays, instead of assigning the arrays to the results of the toCharArray method. You can achieve this with a simple loop, but I wouldn't recommend this approach, because you will end up with arrays of 80 characters, even if the strings contain less.
String firstSentence = reverseByCharacter(sentence1);
for (int i = 0; i < firstSentence.length(); i++) {
arr[0][i] = firstSentence.charAt(i);
}
Or:
char[] firstSentence = reverseByCharacter(sentence1).toCharArray();
for (int i = 0; i < firstSentence.length; i++) {
arr[0][i] = firstSentence[i];
}
Second, you can drop the assumption of the arrays' lengths in the printChar2DArray method. I recommend this approach, because it makes you code much more flexible. Your printChar2DArray method will then look like this:
public static String printChar2DArray(char[][] arr){
for (int x = 0; x < arr.length; x++) {
for (int y = 0; y < arr[x].length; y++) {
// just a print so it does not make new lines for every char
System.out.print(arr[x][y]);
}
}
return null;
}
You can see that I've substituted numbers with the length field, which can be accessed for any array.
Also, this way you don't need to initialize the inner arrays, because you reinitialize them in the next lines anyway.
char[][] arr = new char[3][];
arr[0] = reverseByCharacter(sentence1).toCharArray();
arr[1] = reverseByWord(sentence2).toCharArray();
arr[2] = change5thPosition(sentence3).toCharArray();
But this approach may not be suitable for your task, because then the sentences could be of any length, and they wouldn't be constained to 80 characters max.
UPDATE - To answer the question in the comment
To print a newline character you can use System.out.println() without parameters. It's better than putting the newline character into the arrays because it's not a logical part of the sentences.
So your for-loop in the printChar2DArray would look like this:
for (int x = 0; x < 3; x++) {
for (int y = 0; y < 80; y++) {
// just a print so it does not make new lines for every char
System.out.print(arr[x][y]);
}
System.out.println();
}

Storing Reverse String in 2D Array

I am trying to get the contents of a string, and store it in the last row of my 2D array here is what I have so far:
char[][] square = new char[5][5];
String number = new String("three");
for(int k = number.length() - 1; k >= 0; k--)
{
square[4][k] = number.charAt(k);
}
The output the code is giving me is the string in non reversed order.
Isn't this the logic for reversing a string? All I am doing here is setting the fourth column, and all rows starting at the end of the string to it's value. What am I missing?
Thanks

just walk through the loop by hand.
The first time through, k is 4.
So, square[4][4] is set to the character returned by .charAt(4), which is an 'e'.
then square[4][3] becomes 'e', ... and square[4][0] becomes 't'.
square[4] now reads t,h,r,e,e.
You've basically reversed both ends. Try this:
for (int k = 0; k < number.length(); k++) {
square[4][k] = number.charAt(number.length() - k - 1);
}

Yes, because you're not reversing it. You're setting the same character back again at the same position. If you need it reversed then the logic should be
square[4][number.length() - (k + 1)] = number.charAt(k);

You need a different index beyond square[4][k] when you are also copying the value from number.charAt(k) - that is, you are copying the characters backwards (but into the array also backward). There is no need to call new String(String). You could do
char[][] square = new char[5][5];
String number = "three";
for (int i = 0; i < number.length(); i++) {
int k = number.length() - i - 1;
square[4][k] = number.charAt(i);
}
But I would prefer a StringBuilder and StringBuilder.reverse() myself. Like,
char[][] square = new char[5][5];
String number = "three";
square[4] = new StringBuilder(number).reverse().toString().toCharArray();

How I can count duplicate with Strings?

I am new to programming , I am developing with strings , I am not yet with Hash Maps my only problem is the last letter. The last letter for example s The value contains 2 instead one. How can I do that?
public static void main(String[] args) {
String word = "Chris",
curr_char,
next_char;
int length_string = word.length(),
count = 0;
char end_letter = word.charAt(word.length()-1);
String end = Character.toString(end_letter);
for(int index = 0; index < word.length(); index++)
{
curr_char = word.substring(index, index+1);
for(int next_index = 0;next_index<word.length(); next_index++)
{
next_char = word.substring(next_index, next_index+1);
if (curr_char.equalsIgnoreCase(next_char))
{
count = 1;
}
if(curr_char.contains(end))
{
count = count + 1;
}
}
System.out.println(word.charAt(index) + " " + count);
}
}

You have some issues in your algorithm logic. The algorithm will not work with strings such as "Chriss" or "Chcriss". Your output with input string "Chriss" would be
C 1
h 1
r 1
i 1
s 2
s 1
Additionally, you have 2 iterations, which makes the algorithm not so efficient. An algorithm to be efficient should take less time (high speed) & less space (less memory).
Your above problem is usually solved by having an integer array, say charArrayCount , of size 26, as there are 26 letters in the English alphabet. Each element of this integer array represents a character in the alphabet & is used to count how many times it appears in a string. You would iterate through each character in your string & use the formula,
charArrayCount[25 - ('z' - ch)] += 1;
where 'ch' would be a character in your string. You could then iterate through your array 'charArrayCount' & get those values > 1. You would have to take care of upper case & lower case characters.
In this case, you have only 1 iteration through the string & no matter how long your string is, say a thousand characters, you create space for an integer array of 26 elements only.
Try this & see if it helps.

This code runs perfect now :
public static void main(String args[]) {
String word = "Chris" , curr_char , next_char;
int length_string = word.length();
char end_letter = word.charAt(word.length()-1);
String end = Character.toString(end_letter);
for(int index = 0; index <word.length(); index++)
{
int count = 0; //resetting the value of count every time
curr_char = word.substring(index, index+1);
for(int next_index = 0;next_index<word.length(); next_index++)
{
next_char = word.substring(next_index, next_index+1);
if (curr_char.equalsIgnoreCase(next_char))
{
count = count + 1;
//if any character repeats it increase the value of count
}
}
System.out.println(word.charAt(index) + " " + count);
}
}
Test this once...

Matching subsequence of length 2 (at same index) in two strings

Given 2 strings, a and b, return the number of the positions where they contain the same length 2 substring. For instance a and b is respectively "xxcaazz" and "xxbaaz" yields 3, since the "xx", "aa", and "az" substrings appear in the same place in both strings.
What is wrong with my solution?
int count=0;
for(int i=0;i<a.length();i++)
{
for(int u=i; u<b.length(); u++)
{
String aSub=a.substring(i,i+1);
String bSub=b.substring(u,u+1);
if(aSub.equals(bSub))
count++;
}
}
return count;
}

In order to fix your solution, you really don't need the inner loop. Since the index should be same for the substrings in both string, only one loop is needed.
Also, you should iterate till 2nd last character of the smaller string, to avoid IndexOutOfBounds. And for substring, give i+2 as second argument instead.
Overall, you would have to change your code to something like this:
int count=0;
for(int i=0; i < small(a, b).length()-1; i++)
{
String aSub=a.substring(i,i+2);
String bSub=b.substring(i,i+2);
if(aSub.equals(bSub))
count++;
}
}
return count;
Why I asked about the length of string is, it might become expensive to create substrings of length 2 in loop. For length n of smaller string, you would be creating 2 * n substrings.
I would rather not create substring, and just match character by character, while keeping track of whether previous character matched or not. This will work perfectly fine in your case, as length of substring to match is 2. Code would be like:
String a = "iaxxai";
String b = "aaxxaaxx";
boolean lastCharacterMatch = false;
int count = 0;
for (int i = 0; i < Math.min(a.length(), b.length()); i++) {
if (a.charAt(i) == b.charAt(i)) {
if (lastCharacterMatch) {
count++;
} else {
lastCharacterMatch = true;
}
} else {
lastCharacterMatch = false;
}
}
System.out.println(count);

The heart of the problem lies with your usage of the substring method. The important thing to note is that the beginning index is inclusive, and the end index is exclusive.
As an example, dissecting your usage, String aSub=a.substring(i,i+1); in the first iteration of the loop i = 0 so this line is then String aSub=a.substring(0,1); From the javadocs, and my explanation above, this would result in a substring from the first character to the first character or String aSub="x"; Changing this to i+2 and u+2 will get you the desired behavior but beware of index out of bounds errors with the way your loops are currently written.

String a = "xxcaazz";
String b = "xxbaaz";
int count = 0;
for (int i = 0; i < (a.length() > b.length() ? b : a).length() - 1; i++) {
String aSub = a.substring(i, i + 2);
String bSub = b.substring(i, i + 2);
if (aSub.equals(bSub)) {
count++;
}
}
System.out.println(count);

Counting occurrences of characters in a word list

I'm trying to make an AI for a hangman game, part of which requires counting all occurrences of each possible character in the word list. I'm planning on culling the word list before this counting to make things run faster (by first culling out all words that are not the same length as the guessable phrase, and then by culling out words that do not match the guessed characters).
The problem I am having is in the code below. Somehow, it always returns a list of e's that are the correct length (matching the number of possible characters). I'm not sure exactly what I'm doing wrong here, but the problem is definitely somewhere in countCharacters.
MethodicComputer(){
guessable = parseGuessable();
wordList = parseText();
priorities = countCharacters(guessable);
}
public char guessCharacter(String hint){
char guess = 0;
System.out.println(guessable);
System.out.println(priorities);
guess = priorities.charAt(0);
priorities = priorities.replaceAll("" + guess, "");
return guess;
}
private String countCharacters(String possibleChars){
charCount = new Hashtable();
String orderedPriorities = "";
char temp = 0;
char adding = 0;
int count = 0;
int max = 0;
int length = possibleChars.length();
for (int i = 0; i<length; i++){
temp = possibleChars.charAt(i);
count = wordList.length() - wordList.replaceAll("" + temp, "").length();
charCount.put(temp, count);
}
while (orderedPriorities.length() < length){
for (int i = 0; i < possibleChars.length(); i++){
temp = possibleChars.charAt(i);
if (max < (int) charCount.get(temp)){
max = (int) charCount.get(temp);
adding = temp;
}
}
orderedPriorities += adding;
possibleChars = possibleChars.replaceAll("" + adding, "");
}
return orderedPriorities;
}

The problem is that I did not update the max variable, so it never entered the if statement and updated the adding variable. A simple addition of
max = 0;
to the end of the while loop fixed it.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Using HashMaps to find string frequencies - java

Related

Is there a way to initialize a 2d array with other arrays?

Storing Reverse String in 2D Array

How I can count duplicate with Strings?

Matching subsequence of length 2 (at same index) in two strings

Counting occurrences of characters in a word list

Categories

Resources