File tokenization in Java

The code below tokenizes a file in Java. It has a small bug that I am just not able to fix. If a line in the file consists of four capitalized words, it is not supposed to be tokenized and should be kept together on the same line. The rest of the words have to be tokenized, one per line.
For example, given this input:
United States Of America
Hi I am Walt.
The output is supposed to look like this:
United States Of America
Hi
I
am
Walt.
After I wrote my code, I ran into a small bug.
The output actually looks like this:
United States Of America
States
Of
America
Hi
I
am
Walt.
Basically, I need to get rid of the extra "States", "Of" and "America" lines. The problem is in the piece of code where I check for uppercase. Could you please help me solve this problem, as I am just not able to get my head around it? Anything that makes this possible will be helpful.
Please feel free to alter my code to get the expected output.
import java.io.*;
import java.util.*;
public class Tokenize {
public static void main (String[] args) {
try {
BufferedReader inputReader=new BufferedReader(new FileReader("C:/Users/Advait/Desktop/nlp_wikipedia.txt"));
String currentLine;
while ((currentLine = inputReader.readLine())!=null) {
// START STUDENT CODE
char atUpper;
char atUpper1;
int keeper = 1;
int keeper1 = 0;
String temp = "";
int j;
int i;
int counter = 0;
int m=0;
int n=0;
String temp1 = "";
boolean boolKeeper,boolKeeper1;
String Delimeter = "[\\s,:;'!?()\"]+";
for(j=0;j<(currentLine.length()-1);j++) {
if(currentLine.contains("://")) {
currentLine=currentLine.replace("://","#");
}
}
String token1[] = currentLine.split(Delimeter);
for(j=0;j<(token1.length)-1;j++) {
if(j>0) {
if(keeper==0) {
atUpper = token1[j+1].charAt(0);
atUpper1 = token1[keeper].charAt(0);
boolKeeper = Character.isUpperCase(atUpper);
boolKeeper1 = Character.isUpperCase(atUpper1);
if(boolKeeper==true && boolKeeper1==true) {
m++;
temp1 = token1[keeper].concat(" ").concat(token1[j+1]);
token1[keeper] = temp1;
}
} else {
i=j+1;
atUpper = token1[j].charAt(0);
atUpper1 = token1[i].charAt(0);
boolKeeper = Character.isUpperCase(atUpper);
boolKeeper1 = Character.isUpperCase(atUpper1);
if(boolKeeper==true && boolKeeper1==true) {
counter=counter+1;
if(counter == 1) {
keeper1 = j;
}
n++;
temp = token1[keeper1].concat(" ").concat(token1[i]);
token1[keeper1] = temp;
}
}
} else {
i=j+1;
atUpper = token1[j].charAt(0);
atUpper1 = token1[i].charAt(0);
boolKeeper = Character.isUpperCase(atUpper);
boolKeeper1 = Character.isUpperCase(atUpper1);
if(boolKeeper==true && boolKeeper1==true) {
keeper = 0;
m++;
temp = token1[j].concat(" ").concat(token1[i]);
token1[j] = temp;
}
}
ArrayList<String> LineList = new ArrayList<String>();
for (String token : token1) {
if (!token.equals("%")) {
LineList.add(token);
}
}
token1 = LineList.toArray(new String[LineList.size()]);
String token2 = token1[j];
for (int l=0;l<(token2.length()-1);l++) {
if(token2.charAt(l) == '-' && token2.charAt(l+1) == '\n') {
String token3[] = token2.split("-");
token1[j] = token3[0] + token3[1];
}
}
}
for(int k=0;k<(token1.length);k++) {
if(token1[k].contains(".") && token1[k].contains("#")) {
token1[k] = token1[k].replace(".", "*");
}
if(token1[k].contains("#") && token1[k].contains(".")) {
token1[k] = token1[k].replace("#","://");
token1[k] = token1[k].replace(".","*");
}
}
for(int k=0;k<(token1.length);k++) {
StringTokenizer st = new StringTokenizer(token1[k],".");
while (st.hasMoreTokens()) {
token1[k] = st.nextToken();
}
}
for(int k=0;k<(token1.length);k++) {
String token4 = token1[k];
for (int l=0;l<(token4.length()-1);l++) {
if(token4.contains("#") && token4.contains("*")) {
token1[k] = token4.replace("*",".");
}
if(token1[k].contains("://") && token1[k].contains("*")) {
token1[k] = token4.replace("*",".");
}
}
}
for(int k=0;k<(token1.length);k++) {
System.out.println(token1[k]);
}
// END STUDENT CODE
}
}
catch (IOException e) {
System.err.println("Caught IOException: "+e.getMessage());
}
}
}

Your first problem is that you are cramming everything into a single huge function. You need to split the code into meaningful units that each perform a well-defined, easily-understood operation. For the specific issue of capitalized words, I recommend a function int capitalizedWordStreakLength(String[] tokens, int i). You can use that function in a loop that assembles a List<String> of resulting tokens by iterating over the String[] of your "raw" tokens and, if that function returns four or more, concats those words into a single token.
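A minimal sketch of that suggestion follows. The signature of capitalizedWordStreakLength comes from the answer above; the class name CapitalizedStreaks, the helper mergeStreaks and the minStreak parameter are hypothetical names chosen for illustration, and splitting on whitespace only is an assumption that ignores the other delimiters in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CapitalizedStreaks {

    // Number of consecutive tokens, starting at index i, whose first character is uppercase.
    static int capitalizedWordStreakLength(String[] tokens, int i) {
        int length = 0;
        while (i + length < tokens.length
                && !tokens[i + length].isEmpty()
                && Character.isUpperCase(tokens[i + length].charAt(0))) {
            length++;
        }
        return length;
    }

    // Hypothetical helper: joins every streak of at least minStreak capitalized
    // tokens into a single token and copies all other tokens unchanged.
    static List<String> mergeStreaks(String[] tokens, int minStreak) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int streak = capitalizedWordStreakLength(tokens, i);
            if (streak >= minStreak) {
                result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + streak)));
                i += streak; // skip the words that were just merged
            } else {
                result.add(tokens[i]);
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {"United States Of America", "Hi I am Walt."};
        for (String line : lines) {
            for (String token : mergeStreaks(line.split("\\s+"), 4)) {
                System.out.println(token);
            }
        }
    }
}
```

Because the loop index jumps past the whole streak after a merge, the merged words are never emitted a second time, which is the symptom described in the question.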

Related

Java--I get an infinite loop when attempting to run my alphabetizer program. I can't seem to find the error

The code runs correctly until the for-loops. That's really all there is to it; I just can't seem to figure out why I'm getting an infinite loop. I've just been trying to sort the words "May" and "Code", so that it outputs [Code, May].
import java.util.*;
public class Alphabetical
{
public static void main(String[]args)
{
List<String>words = new ArrayList();
String word = "";
while(!word.equals("stop"))
{
Scanner s = new Scanner(System.in);
System.out.println("Enter a word. Enter 'stop' when finished.");
word = s.nextLine();
words.add(word);
}
words.remove(words.size()-1);
System.out.print(words);
String temp = "";
boolean breakthis = false ;
for(int i = words.size()-1; i >= 1;i--)
{
if(breakthis)
break;
temp = words.get(i);
for(int j = 0;j < words.size();i++)
{
if(temp.charAt(0) == words.get(j).charAt(0))
{
breakthis = true;
break;
}
if(temp.charAt(0) > words.get(j).charAt(0))
{
words.add(j, temp);
words.remove(words.size()-1);
i = words.size()-1;
break;
}
}
}
System.out.print(words);
}
}

How to split a String into an Array if a .contains() condition is met?

I'm doing a HackerRank medium challenge for a password cracker. I want to be able to check if a given string, attempt, contains all the words in pass. pass is an array of passwords and attempt is a concatenation of random entries in pass. If attempt contains ONLY words that are found as entries in pass, then it is deemed a good password and the words that make up attempt, delimited by spaces, are printed.
Sample Input
3 //3 attempts
6 //6 words for attempt 1
because can do must we what //pass[]
wedowhatwemustbecausewecan //attempt
2 //...
hello planet
helloworld
3
ab abcd cd
abcd
Expected Output
we do what we must because we can
WRONG PASSWORD //Because planet is not in pass[]
ab cd
Code
public class Solution {
static String passwordCracker(String[] pass, String attempt) {
int arrayLength=pass.length;
int accuracy=0;
String trips_array[] = new String[pass.length];
String [] newWord = new String[20];
for (int i=0; i<pass.length;i++)
{
// int j=0;
String[] arr = pass[i].split(" ");
//-------------------------------
if (attempt.contains(pass[i]))
{
accuracy++;
newWord[i] = pass[i];
trips_array[i] = attempt.split(" ");
}
//------------------------------
}
StringBuilder sb = new StringBuilder();
for (String words : trips_array) {
sb.append(words);
}
for (int i=0; i<pass.length;i++)
{
if (accuracy==pass.length)
return sb.toString() + " ";
else
return "WRONG PASSWORD";
}
return "test";
}
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
int t = in.nextInt();
for(int a0 = 0; a0 < t; a0++){
int n = in.nextInt();
String[] pass = new String[n];
for(int pass_i = 0; pass_i < n; pass_i++){
pass[pass_i] = in.next();
}
String attempt = in.next();
String result = passwordCracker(pass, attempt);
System.out.println(result);
}
in.close();
}
}
The part in focus is the part in the //----------------- comment section. Basically, my goal is to see if the attempt contains the correct entries in pass, and if so, save that substring of the attempt (or similarly, the entry in pass) to a new array which can be printed in the correct order. If you check the expected output above, you'll see that the output is the same as attempt except with spaces.
Essentially, I would need to find the breaks in the words of attempt and print that if it fulfills the above requirements (first paragraph).
See this for more details
https://www.hackerrank.com/challenges/password-cracker/problem
If it helps you, here is one approach:
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
int testNumb = Integer.parseInt(reader.readLine());
List<String> passList = new ArrayList<>();
List<String> attList = new ArrayList<>();
for (int i = 0; i < testNumb; i++) {
reader.readLine();
passList.add(reader.readLine());
attList.add(reader.readLine());
}
reader.close();
for (int i = 0; i < testNumb; i++) {
String s1 = passList.get(i);
String s2 = attList.get(i);
StringBuilder sb = new StringBuilder();
String[] s1Arr = s1.split(" ");
while (s2.length() > 0) {
int s2Lenght = s2.length();
for (String s : s1Arr) {
if (s2.startsWith(s)) {
sb.append(s + " ");
s2 = s2.substring(s.length());
}
}
if (s2.length() == s2Lenght) {
sb = new StringBuilder("wrong pass");
break;
}
}
System.out.println(sb.toString());
}
Your for loop looks too complicated; here is how I would approach that part.
boolean isAllWords = true;
int checksum = 0;
for (int j = 0; j < pass.length; j++) {
    if (!attempt.contains(pass[j])) {
        // A word from pass is missing, so stop checking
        isAllWords = false;
        break;
    }
    checksum += pass[j].length();
}
if (isAllWords && checksum == attempt.length()) {
    //This means attempt contains all words in pass array and nothing more
    //... handle successful attempt
} else {
    //... handle bad attempt
}
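Note that a contains/length check cannot cope with words that repeat (like "we" in the first sample), and a greedy startsWith loop can commit to a prefix that later dead-ends. As a hedged alternative, here is a sketch that records which positions of attempt are reachable and then walks back to recover the words. The class name PasswordSegmenter and the method segment are hypothetical, and when several splits exist it simply returns one of them.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class PasswordSegmenter {

    // Returns attempt split into words from pass, separated by spaces,
    // or "WRONG PASSWORD" if attempt cannot be built from those words.
    static String segment(String[] pass, String attempt) {
        int n = attempt.length();
        // prev[end] is the start index of a word that ends at position end
        // in some valid split of attempt.substring(0, end); -1 means unreachable.
        int[] prev = new int[n + 1];
        Arrays.fill(prev, -1);
        prev[0] = 0; // the empty prefix is always reachable
        for (int start = 0; start < n; start++) {
            if (prev[start] == -1) {
                continue; // no valid split ends here, so no word can start here
            }
            for (String word : pass) {
                int end = start + word.length();
                if (end <= n && prev[end] == -1 && attempt.startsWith(word, start)) {
                    prev[end] = start;
                }
            }
        }
        if (prev[n] == -1) {
            return "WRONG PASSWORD";
        }
        // Walk backwards from the end to rebuild the words in order.
        Deque<String> words = new ArrayDeque<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.addFirst(attempt.substring(prev[end], end));
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String[] pass = {"because", "can", "do", "must", "we", "what"};
        System.out.println(segment(pass, "wedowhatwemustbecausewecan"));
        System.out.println(segment(new String[] {"hello", "planet"}, "helloworld"));
    }
}
```

Each position is visited once and each word is tried at most once per position, so the work grows with the length of attempt times the number of words rather than with the number of possible splits.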

Making a big array out of other arrays and IF FALSE making it smaller

I've been working on a program that creates a password. It has 4 char arrays holding all the characters I can use for the password, and a method that evaluates whether the user wants to use each of them or not. The problem is that it only works when all the arrays are "true" (they are used), because the final array is made up of 73 slots that are filled from the other arrays.
If the user doesn't want to use one of them, then when the for loop picks random positions in the array it will often land on an index that is empty, breaking the code. I can't think of a way to get around that.
package javaapplication19;
import java.util.Scanner;
import java.util.concurrent.ThreadLocalRandom;
public class JavaApplication19 {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
char [] leterSmall = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','ñ','o','p','q','r','s','t','u','v','w','x','y','z'};
char[] leterBig = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','Ñ','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
char[] numbers = {'0','1','2','3','4','5','6','7','8','9'};
char [] charsRandom = {'?','¿','.',';','+','-','*','/','|'};
char [] arrayFinal = new char[73];
boolean smallOption = false;
boolean bigOption= false;
boolean numbersOption= false;
boolean charactersOption= false;
System.out.println("Do you want to use small leters?");
String answer1 = sc.next();
if( opcionmenu(answer1)==true){
smallOption = true;
for(int i=0;i<27;i++){
arrayFinal[i]=leterSmall[i];
}
}
System.out.println("Do you want to use big leters?");
String answer2 = sc.next();
if( opcionmenu(answer2)==true){
bigOption = true;
for(int i=27;i<54;i++){
arrayFinal[i]=leterBig[i-27];
}
}
System.out.println("Do you want to use numbers?");
String answer3 = sc.next();
if( opcionmenu(answer3)==true){
numbersOption = true;
for(int i=54;i<64;i++){
arrayFinal[i]=numbers[i-54];
}
}
System.out.println("Do you want to use symbols?");
String answer4 = sc.next();
if( opcionmenu(answer4)==true){
charactersOption = true;
for(int i=64;i<73;i++){
arrayFinal[i]=charsRandom[i-64];
}
}
for(int i=0;i<16;i++){
int y = ThreadLocalRandom.current().nextInt(0,73 + 0 );
System.out.print(arrayFinal[y]);
}
}
static boolean opcionmenu(String stra){
if(stra.equals("Yes")) {
return true;
}
else{
return false;
}
}
}
When building arrayFinal, keep track of how many values you've added, and only add a value right after a previous value.
Like this, where len is the number of characters added to arrayFinal:
int len = 0;
if (smallOption) {
for (char c : leterSmall) {
arrayFinal[len++] = c;
}
}
if (bigOption) {
for (char c : leterBig) {
arrayFinal[len++] = c;
}
}
if (numbersOption) {
for (char c : numbers) {
arrayFinal[len++] = c;
}
}
if (charactersOption) {
for (char c : charsRandom) {
arrayFinal[len++] = c;
}
}
ThreadLocalRandom rnd = ThreadLocalRandom.current();
for (int i = 0; i < 16; i++) {
System.out.print(arrayFinal[rnd.nextInt(len)]);
}
You can use the System.arraycopy() method to avoid this problem, for example:
int length = 0 ;
if( opcionmenu(answer2)==true){
System.arraycopy(leterBig ,0 ,arrayFinal,length,leterBig.length);
length = length + leterBig.length ;
}
// and so on
for(int i=0;i<16;i++){
int y = ThreadLocalRandom.current().nextInt(0,length );
System.out.print(arrayFinal[y]);
}
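Putting the two answers together, a sketch of the whole idea might look like the following. The class name PasswordPool and the methods buildPool and generate are hypothetical names for illustration. The check for an empty pool is an added assumption about what should happen when the user declines every group, since nextInt rejects a bound of 0.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

class PasswordPool {

    // Copies every selected group into one array, back to back,
    // and returns only the part that was actually filled.
    static char[] buildPool(char[][] groups, boolean[] selected) {
        int capacity = 0;
        for (char[] group : groups) {
            capacity += group.length;
        }
        char[] pool = new char[capacity];
        int length = 0; // number of characters copied so far
        for (int g = 0; g < groups.length; g++) {
            if (selected[g]) {
                System.arraycopy(groups[g], 0, pool, length, groups[g].length);
                length += groups[g].length;
            }
        }
        return Arrays.copyOf(pool, length);
    }

    // Picks size random characters from the pool.
    static String generate(char[] pool, int size) {
        if (pool.length == 0) {
            throw new IllegalArgumentException("select at least one character group");
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        StringBuilder password = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            password.append(pool[random.nextInt(pool.length)]);
        }
        return password.toString();
    }

    public static void main(String[] args) {
        char[][] groups = {
            {'a', 'b', 'c'}, {'A', 'B', 'C'}, {'0', '1', '2'}, {'?', '+', '-'}
        };
        // Pretend the user answered Yes, No, Yes, No.
        char[] pool = buildPool(groups, new boolean[] {true, false, true, false});
        System.out.println(generate(pool, 16));
    }
}
```

Trimming the pool to its filled length means the random index can never land on an unused slot, whichever groups were chosen.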

Null Cipher -- Java

I'm trying to code the null cipher for a school assignment, and I have no idea what I'm doing wrong.
The cipher is supposed to pick a char based on each number in the pattern. If the number is -1, end the program and return the output. If it is 0, skip the word and move on to the next pattern value. For any other integer, the program should get the char at that position in the word.
So in the example below, the pattern is {1, 0, 0, 1, 5, -1}
And the text is: "Hello, is it me you're looking for".
The output should be: "Hmr"
But I'm getting an out of bounds error, and when I tweak it, it's not printing the correct chars.
The code is below, please help me.
EDIT: I changed it so that the runtime error would disappear, but now I'm getting the incorrect output: "e'[space]"
ArrayList<Character> text;
ArrayList<Character> output;
int outputLen;
ArrayList<Integer> pattern;
public Preform()
{
text = new ArrayList<Character>();
output = new ArrayList<Character>();
pattern = new ArrayList<Integer>();
{
pattern.add(1);
pattern.add(0);
pattern.add(0);
pattern.add(1);
pattern.add(5);
pattern.add(-1);
}
}
public void updateLength()
{
outputLen = output.size();
}
public void stringToChar(String input)
{
for (int i = 0;i < input.length();i++)
{
String value = input.substring(i,i+1);
text.add(value.charAt(0));
}
}
public void printString ()
{
for (int i = 0; i < output.size();i++)
{
System.out.println(output.get(i) + ", ");
}
}
public ArrayList<Character> run()
{
int nullValue = 0;
int textVal = 0;
for (int i = 0; i < pattern.size(); i++)
{
nullValue = pattern.get(i);
if (nullValue == -1)
{
return output;
}
else if (nullValue == 0)
{
textVal = nextWord(textVal);
}
else
{
textVal += nullValue;
char temp = text.get(textVal);
output.add(temp);
textVal = nextWord(textVal);
}
}
return output;
}
public int nextWord (int starting)
{
// go to the next word
int addVal = 0;
do{
starting++;
} while(text.get(starting).equals(' '));
addVal += starting;
return addVal;
}
public static void main (String[] args)
{
Preform event = new Preform();
event.stringToChar("Hello, is it me you're "
+ "looking for");
event.run();
event.printString();
}
Thank you!
For your exception: in run you have to check against pattern.size(), not text.size(). Always look at the line where you get the exception.
A similar issue is in run where you compare against the text size, but actually mean the pattern size.
Apart from that, your code is way too complicated. For example, instead of nextWord you can simply use "my string".split(" "); and you get an array of strings containing each word. There are a few other issues, but those are for you to figure out (it's an assignment after all).
Edit: your main logic issue is with the way you use nextWord in run.
First you need to adapt nextWord to actually do what you want (skip until the next space and then to the start of the next word):
public int nextWord (int starting)
{
// go to the next word
do{
starting++;
} while(!text.get(starting).equals(' '));
// skip the space
starting++;
return starting;
}
and your logic in run where you get the correct char needs to be adapted too:
else
{
char temp = text.get(textVal + nullValue - 1);
output.add(temp);
textVal = nextWord(textVal);
}
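For comparison, here is a minimal sketch of the split-based approach mentioned earlier in this answer. The class name NullCipher and the method decode are hypothetical. It assumes words are separated by single spaces and that pattern values are 1-based positions, and it silently skips positions that fall outside the word, which is a guess at the intended behaviour.

```java
class NullCipher {

    // Takes one character per word according to pattern:
    // -1 stops, 0 skips the word, any other value is a 1-based position in the word.
    static String decode(String text, int[] pattern) {
        String[] words = text.split(" ");
        StringBuilder output = new StringBuilder();
        for (int i = 0; i < pattern.length && i < words.length; i++) {
            int position = pattern[i];
            if (position == -1) {
                break; // end marker
            }
            if (position == 0) {
                continue; // skip this word
            }
            if (position > 0 && position <= words[i].length()) {
                output.append(words[i].charAt(position - 1));
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        int[] pattern = {1, 0, 0, 1, 5, -1};
        // prints "Hmr"
        System.out.println(decode("Hello, is it me you're looking for", pattern));
    }
}
```

Since the word index and the pattern index advance together, there is no need to track a character offset into the full text.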

How do I count each word in a file using Java

I'm trying to write a program with three instance methods, but I can't seem to get it right. My method wordCount returns the number of lines in the file, not the number of words as it's supposed to.
I'm also completely lost with the method mostFrequentWords.
I hope someone can help me out.
package opgaver;
import java.util.*;
import java.io.*;
public class TextAnalysis14 {
Scanner file;
int CountWords = 0;
boolean Contains = true;
String[] words;
String[] MFwords;
public TextAnalysis14(String sourceFileName, int maxNoOfWords) {
String wordline;
words = new String[maxNoOfWords];
String[] line;
try {
file = new Scanner(new File(sourceFileName));
} catch (FileNotFoundException e) {
file = new Scanner("");
}
while (file.hasNext()) {
wordline = file.next();
line = wordline.split("[^a-zA -Z]+");
for (int i = 0; i < line.length; i++) {
if (!line[i].equals(" ")) {
words[CountWords] = line[i];
CountWords++;
}
}
}
if (words[CountWords] == (null)) {
for (int i = CountWords; i < maxNoOfWords; i++) {
words[i] = ("empty");
}
}
}
public int wordCount() {
return CountWords;
}
public boolean contains(String word) {
for (int i = 0; i < words.length; i++) {
if (words[i].contains(word)) {
return Contains;
}
}
return false;
}
public String[] mostFrequentWords() {
Arrays.sort(words);
return MFwords;
}
}
Because of my noob status I cannot make a comment but it looks like you have a space in your regex between A and -Z.
try with this.
public static void main(String[] args) {
String str = "this is a space String"; // read all lines in a file
String[] splited = str.split(" ");
List<String> list = new ArrayList<String>();
for(int i = 0;i < splited.length; i++){
if(splited[i].length() > 0){
list.add(splited[i]);
}
}
System.out.println(list.size());
}
By calling wordline = file.next(); you are not reading lines.
In TextAnalysis14, change your condition to file.hasNextLine() and read lines with file.nextLine():
while (file.hasNextLine()) {
wordline = file.nextLine();
....
}
You can try something like this using Java 8:
Stream<String> lines = Files.lines(Paths.get("c:/", "file.txt"));
int wordCount = lines.mapToInt(s -> s.split(" ").length).sum();
This just counts the total number of words in the file.
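None of the answers covers mostFrequentWords, so here is a rough sketch of how the counting could be done with a map. The class name WordFrequency and the method countWords are hypothetical, and this mostFrequentWords returns a List instead of the String[] in the question. Treating words case-insensitively and working on lines already read into a list are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFrequency {

    // Counts how often each word occurs; anything that is not a letter separates words.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split("[^a-zA-Z]+")) {
                if (!word.isEmpty()) {
                    counts.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns all words that share the highest count, in alphabetical order.
    static List<String> mostFrequentWords(Map<String, Integer> counts) {
        int max = 0;
        for (int count : counts.values()) {
            max = Math.max(max, count);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == max) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat and the hat", "The end.");
        Map<String, Integer> counts = countWords(lines);
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        System.out.println(total);                     // total number of words
        System.out.println(mostFrequentWords(counts)); // most frequent word(s)
    }
}
```

The total word count falls out of the same map as the sum of all counts, so one pass over the text serves both wordCount and mostFrequentWords.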
