So far I have this code. In summary, it takes two text files and a block size as command-line arguments, standardises the text (letters only, lower case), and then splits each file into overlapping blocks of the specified size.
import java.io.*;
import java.util.*;
public class Plagiarism {
public static void main(String[] args) throws Exception {
//you are not using 'myPlag' anywhere, you can safely remove it
// Plagiarism myPlag = new Plagiarism();
if (args.length == 0) {
System.out.println("Error: No files input");
System.exit(0);
}
String foo = null;
for (int i = 0; i < 2; i++) {
BufferedReader reader = new BufferedReader(new FileReader(args[i]));
foo = simplify(reader);
// System.out.print(foo);
int blockSize = Integer.valueOf(args[2]);
List<String> list = new ArrayList<String>();
for (int k = 0; k < foo.length() - blockSize + 1; k++) {
list.add(foo.substring(k, k + blockSize));
}
// System.out.print(list);
}
}
public static String simplify(BufferedReader input)
throws IOException {
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = input.readLine()) != null) {
sb.append(line.replaceAll("[^a-zA-Z]", "").toLowerCase());
}
return sb.toString();
}
}
The next thing I would like to do is use Horner's polynomial accumulation method (with set value x = 33) to convert each of these blocks into a hash code. I am completely stumped on this and would appreciate some help from you guys!
Thanks for reading, and thanks in advance for any advice given!
Horner's method for hash generation is as simple as
int hash=0;
for(int i=0;i<str.length();i++)
hash = x*hash + str.charAt(i);
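Applied to the blocks from the question, this might look like the sketch below. The class name `HornerHash` and the helper names `hornerHash` and `hashBlocks` are made up for illustration; only x = 33 and the block-building loop come from the question. The important detail is that `hash` starts at 0 again for every block.

```java
import java.util.ArrayList;
import java.util.List;

class HornerHash {
    // Horner's rule: ((c0*x + c1)*x + c2)*x + ... evaluated left to right.
    static int hornerHash(String block, int x) {
        int hash = 0; // reset for every block
        for (int i = 0; i < block.length(); i++) {
            hash = x * hash + block.charAt(i);
        }
        return hash;
    }

    // Hashes every overlapping block of length blockSize in text.
    static List<Integer> hashBlocks(String text, int blockSize, int x) {
        List<Integer> hashes = new ArrayList<>();
        for (int k = 0; k + blockSize <= text.length(); k++) {
            hashes.add(hornerHash(text.substring(k, k + blockSize), x));
        }
        return hashes;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, so "ab" hashes to 33*97 + 98 = 3299.
        System.out.println(hashBlocks("abcd", 2, 33)); // prints [3299, 3333, 3367]
    }
}
```

In the question's program, `hashBlocks(foo, blockSize, 33)` would take the place of the loop that fills `list`.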
I am very new to programming and I am attempting to modify a heap algorithm that I found online. From a previous question, I was able to get the code to work with a PrintWriter, but when attempting to use this function as a method in another class, I get an error because of the constructor. How can this code be modified to work the same, simply without a constructor?
I am not very familiar with programming, so I have tried looking at previous questions. Somehow, I thought of using a nested class (not sure how they work), but to no avail. The method worked when it was in its own class.
// Should be within a class
private PrintWriter _pw;
// This is the part that needs to go.
public HeapAlgo(PrintWriter pw) {
this._pw = pw;
}
public void heapPermutation(String a[], int size, int n) throws IOException {
// if size becomes 1 then prints the obtained
// permutation
if (size == 1)
for (int i=0; i<n; i++) {
System.out.println(a[i] + "");
this._pw.println(a[i] + "");
}
for (int i=0; i<size; i++) {
heapPermutation(a, size-1, n);
// if size is odd, swap first and last
// element
if (size % 2 == 1) {
String temp = a[0];
a[0] = a[size-1];
a[size-1] = temp;
}
// If size is even, swap ith and last
// element
else {
String temp = a[i];
a[i] = a[size-1];
a[size-1] = temp;
}
}
}
public void heap() throws IOException
{
FileWriter fw = new FileWriter("note.txt");
PrintWriter pw = new PrintWriter(fw);
File temp = new File("code.txt");
Scanner file = new Scanner(temp);
String substring = "";
String a[] = new String[4];
a[0] = "" + file.nextLine();
a[1] = "" + file.nextLine();
a[2] = "" + file.nextLine();
a[3] = "" + file.nextLine();
HeapAlgo obj = new HeapAlgo(pw); // Pass in a writer
obj.heapPermutation(a, a.length, a.length);
pw.close();
}
When I run the methods inside a larger class, I get the error "error: invalid method declaration; return type required".
Any Help would be greatly appreciated. Thanks.
Edit: I am trying to code this constructor:
public CodeRunner()
{
random();
HeapAlgo.heap(//not sure if anything should go here);
algorithm();
}
where random() creates random strings, and the algorithm function performs an algorithm on all possible iterations of the random string. I am trying to make objects for each set of random strings.
It seems like the following elements should be within a class called HeapAlgo:
The private variable declaration
private PrintWriter _pw;
The constructor itself
public HeapAlgo(PrintWriter pw)
The heapPermutation function
public void heapPermutation(String a[], int size, int n) throws IOException
The last remaining method, heap(), should be placed in some other class (possibly where your main() function is) and called from there.
Alternatively, you could indeed use a nested class. Wrap all the code you provided in a class (called HeapUtil below), then wrap the aforementioned three elements in a nested class called HeapAlgo. Something like this (I very quickly typed this up, so there may be errors you need to fix):
import java.io.*;
import java.util.Scanner;
public class HeapUtil {
// static, so that the static heap() method below can create it without a HeapUtil instance
public static class HeapAlgo {
private PrintWriter _pw;
// This is the part that needs to go.
public HeapAlgo(PrintWriter pw) {
this._pw = pw;
}
public PrintWriter getPrintWriter(){
return _pw;
}
public void heapPermutation(String a[], int size, int n) throws IOException {
// if size becomes 1 then prints the obtained
// permutation
if (size == 1)
for (int i=0; i<n; i++) {
System.out.println(a[i] + "");
this._pw.println(a[i] + "");
}
for (int i=0; i<size; i++) {
heapPermutation(a, size-1, n);
// if size is odd, swap first and last
// element
if (size % 2 == 1) {
String temp = a[0];
a[0] = a[size-1];
a[size-1] = temp;
}
// If size is even, swap ith and last
// element
else {
String temp = a[i];
a[i] = a[size-1];
a[size-1] = temp;
}
}
}
}
public static HeapAlgo heap() throws IOException
{
FileWriter fw = new FileWriter("note.txt");
PrintWriter pw = new PrintWriter(fw);
File temp = new File("code.txt");
Scanner file = new Scanner(temp);
String substring = "";
String a[] = new String[4];
a[0] = "" + file.nextLine();
a[1] = "" + file.nextLine();
a[2] = "" + file.nextLine();
a[3] = "" + file.nextLine();
HeapAlgo obj = new HeapAlgo(pw); // Pass in a writer
obj.heapPermutation(a, a.length, a.length);
return obj;
}
}
Note that in this case, if you would like to use HeapAlgo outside of this class file, you will need to refer to it as HeapUtil.HeapAlgo.
Edit: Try out the code above (I edited it). There may be a few errors since I didn't actually run it.
Usage is as follows:
public CodeRunner() throws IOException { // heap() can throw IOException
random();
// heapAlgo is the heap object
HeapUtil.HeapAlgo heapAlgo = HeapUtil.heap();
// this gives you access to the PrintWriter inside the HeapAlgo
PrintWriter printWriter = heapAlgo.getPrintWriter();
// do your other stuff
algorithm();
}
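If the goal is simply to drop the constructor, another option is to pass the PrintWriter as a method parameter, so the method can be static and needs no field. The following is only a sketch under that assumption: the class name `Permutations` and the helper `permutationsOf` are hypothetical, it writes each permutation on one line rather than one element per line, and it returns right after printing, which the original loop does not.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class Permutations {
    // The writer is a parameter, so no constructor or field is needed.
    static void heapPermutation(String[] a, int size, PrintWriter out) {
        if (size == 1) {
            out.println(String.join(" ", a));
            return;
        }
        for (int i = 0; i < size; i++) {
            heapPermutation(a, size - 1, out);
            // Odd size: swap first and last; even size: swap i-th and last.
            int j = (size % 2 == 1) ? 0 : i;
            String temp = a[j];
            a[j] = a[size - 1];
            a[size - 1] = temp;
        }
    }

    // Collects the output in memory; a FileWriter-backed PrintWriter works the same way.
    static String permutationsOf(String... items) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            heapPermutation(items.clone(), items.length, out);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(permutationsOf("a", "b", "c")); // six lines, one per permutation
    }
}
```

Any other class can then call `Permutations.heapPermutation(a, a.length, pw)` with its own writer and close that writer when it is done.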
I am trying to create a dictionary out of a .txt file. I think the problem is in my addToDict method. I am trying to resize the array when it's full, because I am reading from a text file of unknown size but I can only use arrays. I get an out-of-bounds exception when I print the array. I have no idea what's wrong, and I have been working on the project for days now. I am also having trouble with the else branch in my addToDict method: it also throws an out-of-bounds exception.
import java.io.*;
import java.util.Scanner;
import java.util.regex.*;
public class BuildDict {
static String dict[] = new String[20];
static int index = 0;
public static void main(String args[]) {
readIn();
}
public static void readIn() {
File inFile = new File("alice.txt");
try {
Scanner scan = new Scanner(inFile);
while (scan.hasNext()) {
String word = scan.next();
if (!Character.isUpperCase(word.charAt(0))) {
checkRegex(word);
}
}
scan.close();
} catch (IOException e) {
System.out.println("Error");
}
}
public static void addToDict(String word) {
if (index == dict.length) {
String newAr[] = new String[index * 2];
for (int i = 0; i < index; i++) {
newAr[i] = dict[i];
}
newAr[index] = word;
index++;
dict = newAr;
for (int j = 0; j < index; j++) {
System.out.println(newAr[j]);
}
} else {
dict[index] = word;
index++;
}
}
public static void checkRegex(String word) {
String regex = ("[^A-Za-z]");
Pattern check = Pattern.compile(regex);
Matcher regexMatcher = check.matcher(word);
if (!regexMatcher.find()) {
addToDict(word);
}
}
}
You haven't assigned the new array to dict.
if (index == dict.length) {
String newAr[] = new String[index * 2];
for (int i = 0; i < index; i++) {
newAr[i] = dict[i];
}
newAr[index] = word;
index++;
for (int j = 0; j < index; j++) {
System.out.println(newAr[j]);
}
// Assign dict to the new array.
dict = newAr;
} else {
dict[index] = word;
index++;
}
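For an array-only solution, the grow-and-copy step can also be written with `Arrays.copyOf`. The sketch below is one possible shape, not the asker's code: the class name `GrowableDict`, the tiny starting capacity of 2, and the accessor names are all assumptions chosen to make the resizing easy to observe.

```java
import java.util.Arrays;

class GrowableDict {
    private String[] words = new String[2]; // deliberately small to force resizing
    private int size = 0;

    void add(String word) {
        if (size == words.length) {
            // Double the capacity; Math.max guards against a zero-length start.
            words = Arrays.copyOf(words, Math.max(1, words.length * 2));
        }
        words[size++] = word;
    }

    int size() { return size; }

    int capacity() { return words.length; }

    String get(int i) {
        // Only the first `size` slots hold real entries; the rest are null.
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return words[i];
    }

    public static void main(String[] args) {
        GrowableDict dict = new GrowableDict();
        for (String w : "the quick brown fox jumps".split(" ")) dict.add(w);
        // Loop up to size(), not capacity(), when printing.
        for (int i = 0; i < dict.size(); i++) System.out.println(dict.get(i));
    }
}
```

Keeping the element count separate from the array length is what prevents printing or reading past the last real entry.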
The value of index is 0 when the following statement is executed.
String newAr[] = new String[index*2];
Try revisiting your logic. index should have a positive value before this line runs: if it is 0, new String[0 * 2] creates an empty array, and storing anything in it throws an ArrayIndexOutOfBoundsException.
EDIT: Did you mean to write index+2?
You have
static int index = 0;
You need to change the value of this variable, based on your file, otherwise you will always have an error in this line
String newAr[] = new String[index*2];
Instead of an array, use an ArrayList when you don't know the size in advance. It will save you a lot of trouble; I find them much easier to work with in general than plain arrays.
ArrayList<String> dict = new ArrayList<>();
dict.add(word);
//displaying values
for( int i = 0; i < dict.size(); i++ ){
System.out.println(dict.get(i));
}
I'm writing a primitive reader in Java for a custom programming language that I made, and I want to find the easiest way to print the content located between two double-quote elements in an ArrayList. Here is the source code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.ArrayList;
public class PrimitiveCompiler {
public static ArrayList<String> toks = new ArrayList<String>();
public static void main(String[] args) throws FileNotFoundException {
String content = readFile("C:\\program.txt");
tokenize(content);
}
public static String readFile(String filePath) throws FileNotFoundException {
File f = new File(filePath);
Scanner input = new Scanner(f);
StringBuilder b = new StringBuilder();
while (input.hasNextLine()) {
b.append(input.nextLine());
}
input.close();
return b.toString();
}
public static ArrayList<String> tokenize(String fContent) {
int i = 0;
String tok = "";
String contents = fContent.replaceAll(" ", "").replaceAll("\n", "").replaceAll("\t", "");
for(int a = 0; a <= contents.length() - 1; a++) {
tok += contents.charAt(a);
i = a;
if(tokenFinderEquals(tok, "WRITE")) {
toks.add("WRITE");
tok = "";
}
}
System.out.println(toks);
return null;
}
public static boolean tokenFinderEquals(String s1, String s2) {
if(s1.equalsIgnoreCase(s2)) {
return true;
}
return false;
}
}
The content of the text file right now is just WRITE; the program successfully finds it and adds it to the ArrayList. What I want to do is count double quotes and, when two double quotes are found in the ArrayList, print out every element between them. Is that possible, or is there another, easier way to do this? Thanks in advance!
You'll need some kind of state to keep track of whether or not you're inside of a quote. For example:
boolean inQuote = false;
for (int a = 0; a <= contents.length() - 1; a++) {
char c = contents.charAt(a);
if (c == '"') {
// Found a quote character. Are we at the beginning or the end?
if (!inQuote) {
// Start of a quoted string.
inQuote = true;
} else {
// End of a quoted string.
inQuote = false;
toks.add(tok);
tok = "";
}
// Either way, we don't add the quote char to `tok`.
} else {
tok += c;
if (!inQuote && tokenFinderEquals(tok, "WRITE")) {
// Only look for "WRITE" when outside of a quoted string.
toks.add(tok);
tok = "";
}
}
}
Using a simple loop like this can start to get tough as you add more cases, though. You may want to look into writing a recursive descent parser.
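For reference, here is a self-contained sketch of the same quote-tracking idea. The class name `QuoteTokenizer` is made up, and it departs from the question in one respect that should be treated as an assumption: whitespace is not stripped up front, so spaces inside a quoted string survive, while whitespace outside quotes is skipped.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteTokenizer {
    static List<String> tokenize(String contents) {
        List<String> toks = new ArrayList<>();
        StringBuilder tok = new StringBuilder();
        boolean inQuote = false;
        for (int a = 0; a < contents.length(); a++) {
            char c = contents.charAt(a);
            if (c == '"') {
                if (inQuote) {
                    // Closing quote: the accumulated text is one string token.
                    toks.add(tok.toString());
                }
                // Opening quote: any unrecognised text collected so far is discarded.
                tok.setLength(0);
                inQuote = !inQuote;
            } else if (inQuote) {
                tok.append(c); // keep everything inside quotes, including spaces
            } else if (!Character.isWhitespace(c)) {
                tok.append(c);
                // Keywords are only recognised outside of a quoted string.
                if (tok.toString().equalsIgnoreCase("WRITE")) {
                    toks.add("WRITE");
                    tok.setLength(0);
                }
            }
        }
        return toks;
    }

    public static void main(String[] args) {
        // Two tokens: the keyword WRITE, then the quoted text without its quotes.
        System.out.println(tokenize("WRITE \"Hello, world\""));
    }
}
```

An unterminated quote is silently dropped here; a real reader would probably want to report it as an error.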
So here is all of my code for reference.
import java.io.*;
import java.util.*;
public class Plagiarism {
public static void main(String[] args) throws Exception {
//you are not using 'myPlag' anywhere, you can safely remove it
// Plagiarism myPlag = new Plagiarism();
if (args.length == 0) {
System.out.println("Error: No files input");
System.exit(0);
}
String foo = null;
for (int i = 0; i < 2; i++) {
BufferedReader reader = new BufferedReader(new FileReader(args[i]));
foo = simplify(reader);
// System.out.print(foo);
int blockSize = Integer.valueOf(args[2]);
List<String> list = new ArrayList<String>();
for (int k = 0; k < foo.length() - blockSize + 1; k++) {
list.add(foo.substring(k, k + blockSize));
int x = 33;
int hash = 0;
for (String str: list) {
for (int o = 0; o < str.length(); o++) {
hash = 33*hash + str.charAt(o);
}
}
System.out.println(hash);
/* List<Integer> newList = new ArrayList<Integer>(list.size());
for (String myInt : list) {
newList.add(Integer.parseInt(myInt));
int x = 33;
int hash = 0;
for (int o = 0; o < newList.size(); o++) {
hash = x*hash + newList.get(o);
}
} */
}
// System.out.print(list);
}
}
public static String simplify(BufferedReader input)
throws IOException {
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = input.readLine()) != null) {
sb.append(line.replaceAll("[^a-zA-Z]", "").toLowerCase());
}
return sb.toString();
}
}
Although I want to in particular focus on this part:
int x = 33;
int hash = 0;
for (String str: list) {
for (int o = 0; o < str.length(); o++) {
hash = 33*hash + str.charAt(o);
}
}
System.out.println(hash);
Some of the hash values returned are negative. Why is this? It happens even when the block size is small (e.g. 2). I suspect it has something to do with "modulo p"? I am using Horner's polynomial method here.
I'm wondering if I could get some help on this?
Thanks guys in advance.
Negative values are caused by integer overflow. Any integer number with the most significant bit set to 1 is interpreted as a negative number.
Hash codes do not signify anything in particular: all that is required of them is to be the same for equal values, and try to be as different as possible for non-equal values. That is why integer overflow can be safely ignored when dealing with hash codes.
A hash is an int type which can take negative values. A negative value should not concern you.
When a Java int gets too big (above 2,147,483,647, i.e. just over 2 billion), it wraps round to a negative value. That is what is happening here: repeatedly multiplying by 33 eventually causes this wraparound.
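One likely reason it happens even with a block size of 2: in the posted code, `hash` is declared outside the `for (String str : list)` loop, so it keeps accumulating across every block in the list instead of starting from 0 for each one. The sketch below hashes a single string from scratch and, if a non-negative value is wanted, maps the result into [0, p) with `Math.floorMod`. The class name `HashOverflow` and the modulus 1_000_003 are arbitrary choices for illustration. Note that reducing only at the end is not the same as working modulo p at every step.

```java
class HashOverflow {
    static int hornerHash(String s, int x) {
        int hash = 0; // starts from 0 for each string
        for (int i = 0; i < s.length(); i++) {
            hash = x * hash + s.charAt(i); // silently wraps on overflow
        }
        return hash;
    }

    // Maps any int, negative or not, into the range [0, p).
    static int nonNegative(int hash, int p) {
        return Math.floorMod(hash, p);
    }

    public static void main(String[] args) {
        // One past the largest int wraps round to the smallest int.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // prints true
        int h = hornerHash("plagiarism", 33);
        System.out.println(h + " -> " + nonNegative(h, 1_000_003));
    }
}
```

Unlike the `%` operator, `Math.floorMod` never returns a negative result for a positive modulus, which is why it is used here.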
I am reading bunch of integers separated by space or newlines from the standard in using Scanner(System.in).
Is there any faster way of doing this in Java?
Yes. Scanner is fairly slow (at least in my experience).
If you don't need to validate the input, I suggest you just wrap the stream in a BufferedReader and use something like String.split / Integer.parseInt.
A small comparison:
Reading 17 megabytes (4233600 numbers) using this code
Scanner scanner = new Scanner(System.in);
while (scanner.hasNext())
sum += scanner.nextInt();
took 3.3 seconds on my machine, while this snippet
BufferedReader bi = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = bi.readLine()) != null)
for (String numStr: line.split("\\s"))
sum += Integer.parseInt(numStr);
took 0.7 seconds.
By messing up the code further (iterating over line with String.indexOf / String.substring) you can get it down to about 0.1 seconds quite easily, but I think I've answered your question and I don't want to turn this into some code golf.
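For the curious, the indexOf / substring variant could look roughly like the sketch below. It is not the code that was timed above, so no speed figure is claimed for it. The class name `FastSum` is made up, and it assumes the numbers on a line are separated by plain spaces only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class FastSum {
    // Sums all space-separated integers read from source.
    static long sum(Reader source) {
        long sum = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int start = 0;
                while (start < line.length()) {
                    int end = line.indexOf(' ', start);
                    if (end == -1) end = line.length();
                    if (end > start) { // skip empty pieces from repeated spaces
                        sum += Integer.parseInt(line.substring(start, end));
                    }
                    start = end + 1;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // 1 + 2 + 3 + 40 + 50
        System.out.println(sum(new StringReader("1 2 3\n40  50\n"))); // prints 96
    }
}
```

To read from standard input instead, pass `new InputStreamReader(System.in)` as the source.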
I created a small InputReader class that works just like Java's Scanner but, in my measurements, is faster by orders of magnitude; in fact, it outperforms BufferedReader as well.
[Bar graph: performance of the InputReader class reading different types of data from standard input]
Here are two different ways of finding the sum of all the numbers coming from System.in using the InputReader class:
int sum = 0;
InputReader in = new InputReader(System.in);
// Approach #1
try {
// Read all strings and then parse them to integers (this is much slower than the next method).
String strNum = null;
while( (strNum = in.nextString()) != null )
sum += Integer.parseInt(strNum);
} catch (IOException e) { }
// Approach #2
try {
// Read all the integers in the stream and stop once an IOException is thrown
while( true ) sum += in.nextInt();
} catch (IOException e) { }
If you are asking from a competitive-programming point of view, where a submission that is not fast enough gets a TLE (time limit exceeded) verdict, you can use the following method to read a String from System.in.
I took it from one of the best Java coders on competitive programming sites. It relies on the helper methods skip(), isSpaceChar() and readByte(), which are not shown here.
private String ns()
{
int b = skip();
StringBuilder sb = new StringBuilder();
while(!(isSpaceChar(b))){ // when nextLine, (isSpaceChar(b) && b != ' ')
sb.appendCodePoint(b);
b = readByte();
}
return sb.toString();
}
You can read from System.in in a digit by digit way. Look at this answer: https://stackoverflow.com/a/2698772/3307066.
I copy the code here (barely modified). Basically, it reads integers, separated by anything that is not a digit. (Credits to the original author.)
private static int readInt() throws IOException {
int ret = 0;
boolean dig = false;
for (int c = 0; (c = System.in.read()) != -1; ) {
if (c >= '0' && c <= '9') {
dig = true;
ret = ret * 10 + c - '0';
} else if (dig) break;
}
return ret;
}
In my problem, this code was approx. 2 times faster than using StringTokenizer, which was already faster than String.split(" ").
(The problem involved reading 1 million integers of up to 1 million each.)
StringTokenizer is a faster way of splitting string input into tokens than String.split.
The example below reads a line of space-separated integers and stores them in an ArrayList:
String str = input.readLine(); //read string of integers using BufferedReader e.g. "1 2 3 4"
List<Integer> list = new ArrayList<>();
StringTokenizer st = new StringTokenizer(str, " ");
while (st.hasMoreTokens()) {
list.add(Integer.parseInt(st.nextToken()));
}
From a performance perspective, these custom Scan and Print classes can be much faster than Java's built-in Scanner and BufferedReader classes.
import java.io.InputStream;
import java.util.InputMismatchException;
import java.io.IOException;
public class Scan
{
private byte[] buf = new byte[1024];
private int total;
private int index;
private InputStream in;
public Scan()
{
in = System.in;
}
public int scan() throws IOException
{
if(total < 0)
throw new InputMismatchException();
if(index >= total)
{
index = 0;
total = in.read(buf);
if(total <= 0)
return -1;
}
return buf[index++];
}
public int scanInt() throws IOException
{
int integer = 0;
int n = scan();
while(isWhiteSpace(n)) /* remove starting white spaces */
n = scan();
int neg = 1;
if(n == '-')
{
neg = -1;
n = scan();
}
while(!isWhiteSpace(n))
{
if(n >= '0' && n <= '9')
{
integer *= 10;
integer += n-'0';
n = scan();
}
else
throw new InputMismatchException();
}
return neg*integer;
}
public String scanString()throws IOException
{
StringBuilder sb = new StringBuilder();
int n = scan();
while(isWhiteSpace(n))
n = scan();
while(!isWhiteSpace(n))
{
sb.append((char)n);
n = scan();
}
return sb.toString();
}
public double scanDouble()throws IOException
{
double doub=0;
int n=scan();
while(isWhiteSpace(n))
n=scan();
int neg=1;
if(n=='-')
{
neg=-1;
n=scan();
}
while(!isWhiteSpace(n)&& n != '.')
{
if(n>='0'&&n<='9')
{
doub*=10;
doub+=n-'0';
n=scan();
}
else throw new InputMismatchException();
}
if(n=='.')
{
n=scan();
double temp=1;
while(!isWhiteSpace(n))
{
if(n>='0'&&n<='9')
{
temp/=10;
doub+=(n-'0')*temp;
n=scan();
}
else throw new InputMismatchException();
}
}
return doub*neg;
}
public boolean isWhiteSpace(int n)
{
if(n == ' ' || n == '\n' || n == '\r' || n == '\t' || n == -1)
return true;
return false;
}
public void close()throws IOException
{
in.close();
}
}
And the customized Print class can be as follows
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
public class Print
{
private BufferedWriter bw;
public Print()
{
this.bw = new BufferedWriter(new OutputStreamWriter(System.out));
}
public void print(Object object)throws IOException
{
bw.append("" + object);
}
public void println(Object object)throws IOException
{
print(object);
bw.append("\n");
}
public void close()throws IOException
{
bw.close();
}
}
You can use BufferedReader for reading data
BufferedReader inp = new BufferedReader(new InputStreamReader(System.in));
int t = Integer.parseInt(inp.readLine());
while(t-->0){
int n = Integer.parseInt(inp.readLine());
int[] arr = new int[n];
String line = inp.readLine();
String[] str = line.trim().split("\\s+");
for(int i=0;i<n;i++){
arr[i] = Integer.parseInt(str[i]);
}
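And for printing, use a StringBuffer (a StringBuilder works the same way and skips the synchronization that single-threaded code does not need):
StringBuffer sb = new StringBuffer();
for(int i=0;i<n;i++){
sb.append(arr[i]+" ");
}
System.out.println(sb);
} // closes the while(t-->0) loop opened above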
Here is a full version of a fast reader and writer. I also used buffering.
import java.io.*;
import java.util.*;
public class FastReader {
private static StringTokenizer st;
private static BufferedReader in;
private static PrintWriter pw;
public static void main(String[] args) throws IOException {
in = new BufferedReader(new InputStreamReader(System.in));
pw = new PrintWriter(new BufferedWriter(new OutputStreamWriter(System.out)));
st = new StringTokenizer("");
pw.close();
}
private static int nextInt() throws IOException {
return Integer.parseInt(next());
}
private static long nextLong() throws IOException {
return Long.parseLong(next());
}
private static double nextDouble() throws IOException {
return Double.parseDouble(next());
}
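private static String next() throws IOException {
// Check for null before dereferencing st, and stop cleanly at end of input.
while (st == null || !st.hasMoreTokens()) {
String line = in.readLine();
if (line == null) {
throw new EOFException("No more input");
}
st = new StringTokenizer(line);
}
return st.nextToken();
}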
}
Reading from disk again and again makes Scanner slow. I like to combine BufferedReader and Scanner to get the best of both worlds, i.e. the speed of BufferedReader and the rich, easy API of Scanner.
Scanner scanner = new Scanner(new BufferedReader(new InputStreamReader(System.in)));