I need to decrypt a file efficiently [closed] - java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 4 years ago.
I am trying to decrypt an encrypted file with an unknown key. The only thing I know about it is that the key is an integer x, 0 <= x < 10^10 (i.e. a maximum of 10 decimal digits).
public static String enc(String msg, long key) {
String ans = "";
Random rand = new Random(key);
for (int i = 0; i < msg.length(); i = i + 1) {
char c = msg.charAt(i);
int s = c;
int rd = rand.nextInt() % (256 * 256);
int s2 = s ^ rd;
char c2 = (char) (s2);
ans += c2;
}
return ans;
}
private static String tryToDecode(String string) {
String returnedString = "";
long key;
String msg = reader(string);
for (long i = 0; i <= 999999999; i++) {
System.out.println("decoding message with key + " + i);
key = i;
System.out.println("decoding with key: " + i + "\n" + enc(msg, key));
}
return returnedString;
}
I expect to find the plain text
The program works very slowly, is there any way to make it more efficient?

You can use the parallel array operations added in Java 8 to achieve this.
The best fit for you would be a Spliterator:
public void spliterate() {
System.out.println("\nSpliterate:");
int[] src = getData();
Spliterator<Integer> spliterator = Arrays.spliterator(src);
spliterator.forEachRemaining( n -> action(n) );
}
public void action(int value) {
System.out.println("value:"+value);
// Perform some real work on this data here...
}
I am still not clear about your situation. Here are some good tutorials and articles to help you figure out which of Java 8's parallel array operations will help you:
http://www.drdobbs.com/jvm/parallel-array-operations-in-java-8/240166287
https://blog.rapid7.com/2015/10/28/java-8-introduction-to-parallelism-and-spliterator/
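As a rough illustration of how parallelism could apply to the key search itself, here is a minimal sketch that spreads the candidate keys over the available cores with a parallel LongStream. It assumes the plaintext is 7-bit ASCII, so a key is accepted when every decoded character is below 128. The class and method names (ParallelKeySearch, decodesToAscii, findKey) are made up for this example.

```java
import java.util.OptionalLong;
import java.util.Random;
import java.util.stream.LongStream;

final class ParallelKeySearch {

    // Same XOR scheme as enc() in the question; applying it twice with the same key restores the input.
    static String enc(String msg, long key) {
        char[] out = new char[msg.length()];
        Random rand = new Random(key);
        for (int i = 0; i < out.length; i++) {
            int rd = rand.nextInt() % (256 * 256);
            out[i] = (char) (msg.charAt(i) ^ rd);
        }
        return new String(out);
    }

    // Assumption: the plaintext is 7-bit ASCII. Stops at the first decoded character outside that range.
    static boolean decodesToAscii(String cipher, long key) {
        Random rand = new Random(key);
        for (int i = 0; i < cipher.length(); i++) {
            int rd = rand.nextInt() % (256 * 256);
            if ((char) (cipher.charAt(i) ^ rd) > 127) {
                return false;
            }
        }
        return true;
    }

    // Tests the keys 0 .. endExclusive-1 in parallel and returns any key that passes the ASCII check.
    static OptionalLong findKey(String cipher, long endExclusive) {
        return LongStream.range(0, endExclusive)
                .parallel()
                .filter(k -> decodesToAscii(cipher, k))
                .findAny();
    }

    public static void main(String[] args) {
        String cipher = enc("Hello World", 12345L);
        OptionalLong key = findKey(cipher, 1_000_000L);
        System.out.println(key.isPresent() ? enc(cipher, key.getAsLong()) : "no key found");
    }
}
```

Each candidate key gets its own Random instance, so the worker threads share no mutable state. For very short messages more than one key may pass the ASCII check, in which case findAny returns whichever one is found first.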

First things first: You can't println billions of lines. This will take forever, and it's pointless - you won't be able to see the text as it scrolls by, and your buffer won't save billions of lines, so you couldn't scroll back up later even if you wanted to. If you prefer (and don't mind it being 2-3% slower than it otherwise would be), you can output once every hundred million keys, just so you can verify your program is making progress.
You can optimize things by not concatenating Strings inside the loop. Strings are immutable, so the old code was creating a rather large number of Strings, especially in the enc method. Normally I'd use a StringBuilder, but in this case a simple character array will meet our needs.
And there's one more thing we need to do that your current code doesn't do: Detect when we have the answer. If we assume that the message will only contain characters from 0-127 with no Unicode or extended ASCII, then we know we have a possible answer when the entire message contains only characters in this range. And we can also use this to further optimize, as we can then immediately discard any message that has a character outside of this range. We don't even have to finish decoding it and can move on to the next key. (If the message is of any length, the odds are that only one key will produce a decoded message with characters in that range - but it's not guaranteed, which is why I do not stop when I get to a valid message. You could probably do that, though.)
A note on the key range: java.util.Random keeps the low 48 bits of its seed, so only seed bits above bit 47 are ignored, and every key below 10^10 (which fits in 34 bits) puts the generator into a different starting state. The loop below stops at 4294967295, which covers only the keys that fit in 32 bits; if no readable message turns up, extend the bound to 9999999999L to cover the rest of the 10-digit range.
private static String tryToDecode4(String msg) {
String returnedString = "";
for (long i=0; i<=4294967295l; i++)
{
if (i % 100000000 == 0) // This part is just to see that it's making progress. Remove if desired for a small speed gain.
System.out.println("Trying " + i);
char[] decoded = enc4(msg, i);
if (decoded == null)
continue;
returnedString = String.valueOf(decoded);
System.out.println("decoding with key: " + i + " " + returnedString);
}
return returnedString;
}
private static char[] enc4(String msg, long key) {
char[] ansC = new char[msg.length()];
Random rand = new Random(key);
for(int i=0;i<msg.length();i=i+1)
{
char c = msg.charAt(i);
int s = c;
int rd = rand.nextInt()%(256*256);
int s2 = s^rd;
char c2 = (char)(s2);
if (c2 > 127)
return null;
ansC[i] = c2;
}
return ansC;
}
This code finished running in a little over 3 minutes on my machine, with a message of "Hello World".
This code will not work well for very short messages (3-4 characters or less.) It will not work if the message contains Unicode or extended ASCII, although it could easily be modified to do so if you know the range of characters that might be in the message.

Related

How to generate 1000 unique email-ids using java

My requirement is to generate 1000 unique email-ids in Java. I have already generated random text, and I'm using a for loop to limit the number of email-ids generated. The problem is that when I execute it, 10 email-ids are generated but they are all the same.
Below is the code and output:
public static void main() {
first fr = new first();
String n = fr.genText()+"@mail.com";
for (int i = 0; i<=9; i++) {
System.out.println(n);
}
}
public String genText() {
String randomText = "abcdefghijklmnopqrstuvwxyz";
int length = 4;
String temp = RandomStringUtils.random(length, randomText);
return temp;
}
and output is:
myqo@mail.com
myqo@mail.com
...
myqo@mail.com
When I execute the same program again I get another set of mail-ids, for example 'bfta' instead of 'myqo'. But my requirement is to generate different unique ids.
For example:
myqo@mail.com
bfta@mail.com
kjuy@mail.com
Put your String initialization in the for statement:
for (int i = 0; i<=9; i++) {
String n = fr.genText()+"@mail.com";
System.out.println(n);
}
I would like to rewrite your method a little bit:
public String generateEmail(String domain, int length) {
return RandomStringUtils.random(length, "abcdefghijklmnopqrstuvwxyz") + "@" + domain;
}
And it would be possible to call like:
generateEmail("gmail.com", 4);
As I understand it, you want to generate 1000 unique emails. You can do this conveniently with the Stream API:
Stream.generate(() -> generateEmail("gmail.com", 4))
.limit(1000)
.collect(Collectors.toSet())
But the problem still exists. I purposely collected the Stream<String> into a Set<String> (which removes duplicates) to check its size(). As you can see, the size is not always equal to 1000:
999
1000
997
That means your algorithm returns duplicate values even for such a small range.
Therefore, you'd better look into existing email generators for Java or improve your own (for example, by adding digits or some special characters, which, in turn, may cause plenty of exceptions).
If you are planning to use MockNeat, generating email strings is already supported.
Example 1:
String corpEmail = mock.emails().domain("startup.io").val();
// Possible Output: tiptoplunge@startup.io
Example 2:
String domsEmail = mock.emails().domains("abc.com", "corp.org").val();
// Possible Output: funjulius@corp.org
Note: mock is the default "mocking" object.
To guarantee uniqueness you could use a counter as part of the email address:
myqo0000@mail.com
bfta0001@mail.com
kjuy0002@mail.com
If you want to stick to letters only then convert the counter to base 26 representation using 'a' to 'z' as the digits.
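The base 26 idea could look roughly like the sketch below. It is only an illustration: the class and method names (CounterEmails, toLetters, generate) and the fixed width of four letters are assumptions made for the example, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

final class CounterEmails {

    // Writes n in base 26 using 'a'..'z' as digits, left-padded with 'a' to the given width.
    // Values of n that need more than `width` digits wrap around, so keep n < 26^width.
    static String toLetters(int n, int width) {
        char[] out = new char[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (char) ('a' + n % 26);
            n /= 26;
        }
        return new String(out);
    }

    // Each address uses a different counter value, so no two addresses can collide.
    static List<String> generate(int count, String domain) {
        List<String> emails = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            emails.add(toLetters(i, 4) + "@" + domain);
        }
        return emails;
    }

    public static void main(String[] args) {
        generate(1000, "mail.com").forEach(System.out::println);
    }
}
```

Four letters give 26^4 = 456976 distinct local parts, which is plenty for 1000 addresses. If the addresses should still look random, the counter letters can be appended to a random prefix instead of being used alone.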

Convert python's struct.unpack code to java

I am trying to integrate a Beurer BF480 device with a Java program. I found Python code which converts the data received through the serial USB interface into the required format. Below is the code snippet which does the job in Python:
frmt = "!" + "H"*64
x = struct.unpack(frmt, byte_array)
Could someone help me in understanding these 2 lines of code? If anyone knows java equivalent of this, it would be great to know.
I hate to answer my own question.
But, as I got it working for now, I would like to share the solution to this problem. Basically, the Python code unpacks the byte array into 64 unsigned 16-bit integers. In the expression "!" + "H"*64, the "!" selects network byte order (big-endian, so the first byte of each pair is the most significant one), and each "H" reads one unsigned short (2 bytes), so 128 bytes yield 64 values.
I did not find the equivalent code in java which will do this job, but after struggling to decode the byte array I am able to get the intended results using below code.
public static int[] unpack(final byte[] byte_array) {
final int[] integerReadings = new int[byte_array.length / 2];
for(int counter = 0, integerCounter = 0; counter < byte_array.length;) {
integerReadings[integerCounter] = convertTwoBytesToInteger(byte_array[counter], byte_array[counter + 1]);
counter += 2;
integerCounter++;
}
return integerReadings;
}
private static int convertTwoBytesToInteger(final byte byte1, final byte byte2) {
final int unsignedInteger1 = getUnsignedInteger(byte1);
final int unsignedInteger2 = getUnsignedInteger(byte2);
return unsignedInteger1 * 256 + unsignedInteger2;
}
private static int getUnsignedInteger(final byte b) {
int unsignedInteger = b;
if(b < 0) {
unsignedInteger = b + 256;
}
return unsignedInteger;
}
This code is about byte level manipulation of the data stream. However, it solves my problem as of now and I am getting the expected results. If there is a better solution to this, I would definitely like to implement the same.
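One possibly simpler alternative is java.nio.ByteBuffer, which can read big-endian 16-bit values directly. The sketch below is a minimal version under that assumption; the class and method names (StructUnpack, unpackUnsignedShorts) are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class StructUnpack {

    // Rough counterpart of struct.unpack("!" + "H" * n, data): reads consecutive
    // big-endian unsigned 16-bit values. A trailing odd byte, if any, is ignored.
    static int[] unpackUnsignedShorts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        int[] values = new int[data.length / 2];
        for (int i = 0; i < values.length; i++) {
            // getShort() is signed; masking with 0xFFFF yields the unsigned value 0..65535.
            values[i] = buf.getShort() & 0xFFFF;
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] sample = {0x01, 0x02, (byte) 0xFF, (byte) 0xFE};
        System.out.println(Arrays.toString(unpackUnsignedShorts(sample)));
    }
}
```

Big-endian is already ByteBuffer's default order; setting it explicitly just makes the correspondence with "!" visible.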

Calculate Dice Roll from Text Field

QUESTION:
How can I read the string "d6+2-d4" so that each d# will randomly generate a number within the parameter of the dice roll?
CLARIFIER:
I want to read a string and have it so when a d# appears, it will randomly generate a number such as to simulate a dice roll. Then, add up all the rolls and numbers to get a total. Much like how Roll20 does with their /roll command for an example. If !clarifying {lstThen.add("look at the Roll20 and play with the /roll command to understand it")} else if !understandStill {lstThen.add("I do not know what to say, someone else could try explaining it better...")}
Info:
I was making a Java program for Dungeons and Dragons, only to find that I have come across a problem in figuring out how to calculate the user input: I do not know how to evaluate a string such as this.
I theorize that I may need Java's eval at the end. I do know what I want to happen/have a theory on how to execute (this is more so PseudoCode than Java):
Random rand = new Random();
int i = 0;
String toEval;
String char;
String roll = txtField.getText();
while (i<roll.length) {
check if character at i position is a d, then highlight the numbers
after d until it comes to a special character/!aNumber
// so if d was found before 100, it will then highlight 100 and stop
// if the character is a symbol or the end of the string
if d appears {
char = rand.nextInt(#);
i + #'s of places;
// so when i++ occurs, it will move past whatever d# was in case
// d# was something like d100, d12, or d5291
} else {
char = roll.length[i];
}
toEval = toEval + char;
i++;
}
perform evaluation method on toEval to get a resulting number
list.add(roll + " = " + evaluated toEval);
EDIT:
With weston's help, I have homed in on what is likely needed: by using a splitter with an array, it can detect certain symbols and add them to a list. However, it is my fault for not clarifying what else was needed. The pseudocode above doesn't help fully, so this is what else I need to figure out.
roll.split("(+-/*^)");
As this part is what is also tripping me up. Should I make splits where there are numbers too? So an equation like:
String[] numbers = roll.split("(+-/*^)");
String[] symbols = roll.split("1234567890d")
// Rough idea for long way
loop statement {
loop to check for parentheses {
set operation to be done first
}
if symbol {
loop for symbol check {
perform operations
}}} // ending this since it looks like a bad way to do it...
// Better idea, originally thought up today (5/11/15)
int val[];
int re = 1;
loop {
if (list[i].containsIgnoreCase(d)) {
val[]=list[i].splitIgnoreCase("d");
list[i] = 0;
while (re <= val[0]) {
list[i] = list[i] + (rand.nextInt(val[1]) + 1);
re++;
}
}
}
// then create a string out of list[]/numbers[] and put together with
// symbols[] and use Java's evaluator for the String
weston had it; it just seemed like it wasn't doing it for me (until I realised I wasn't specific about what I wanted). So, to update, the string I want evaluated is this (I know it's a little unorthodox, but it's to make a point; I also hope this clarifies even further what is needed to make it work):
(3d12^d2-2)+d4(2*d4/d2)
From reading this, you may see the spots that I do not know how to perform very well... But that is why I am asking all you lovely, smart programmers out there! I hope I asked this clearly enough and thank you for your time :3
The trick with any programming problem is to break it up and write a method for each part, so below I have a method for rolling a single die, which is called by the one for rolling many.
private Random rand = new Random();
/**
* @param roll can be a multipart roll which is run and added up. e.g. d6+2-d4
*/
public int multiPartRoll(String roll) {
String[] parts = roll.split("(?=[+-])"); //split by +-, keeping them
int total = 0;
for (String partOfRoll : parts) { //roll each dice specified
total += singleRoll(partOfRoll);
}
return total;
}
/**
* @param roll can be fixed value, examples -1, +2, 15 or a dice to roll
* d6, +d20 -d100
*/
public int singleRoll(String roll) {
int di = roll.indexOf('d');
if (di == -1) //case where has no 'd'
return Integer.parseInt(roll);
int diceSize = Integer.parseInt(roll.substring(di + 1)); //value of string after 'd'
int result = rand.nextInt(diceSize) + 1; //roll the dice
if (roll.startsWith("-")) //negate if necessary
result = -result;
return result;
}
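The two methods above treat every term as a single die, so a count such as the 3 in 3d12 would not be honoured. One way to support counts is a regular expression that matches one signed term at a time. The following is only a sketch: the DiceExpression class is hypothetical, and it handles + and - only, not the parentheses, *, / or ^ from the longer example in the question.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DiceExpression {

    // One term: an optional sign, then either "NdM" (N optional) or a plain integer.
    private static final Pattern TERM =
            Pattern.compile("([+-]?)(?:(\\d*)d(\\d+)|(\\d+))");

    static int roll(String expr, Random rand) {
        String s = expr.replace(" ", "");
        Matcher m = TERM.matcher(s);
        int total = 0;
        int end = 0;
        while (m.find()) {
            if (m.start() != end) {
                // Something between two terms did not match the grammar.
                throw new IllegalArgumentException("Unexpected input at position " + end);
            }
            end = m.end();
            int sign = "-".equals(m.group(1)) ? -1 : 1;
            int value = 0;
            if (m.group(3) != null) {
                // Dice term: roll `count` dice with `sides` faces each and sum them.
                int count = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
                int sides = Integer.parseInt(m.group(3));
                for (int i = 0; i < count; i++) {
                    value += rand.nextInt(sides) + 1;
                }
            } else {
                value = Integer.parseInt(m.group(4));
            }
            total += sign * value;
        }
        if (end != s.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + end);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("3d6+2-d4 = " + roll("3d6+2-d4", new Random()));
    }
}
```

Passing the Random in as a parameter makes the method easy to test with a fixed seed. Full expressions with operator precedence and parentheses would need a proper expression parser on top of this.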

efficient way to add two binary string including padding if necessary in Java? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
I am writing code that returns the binary string obtained when two binary strings are added together.
The first step involves checking whether the two binary strings have the same length and, if not, padding the shorter one with '0' on the left side; e.g. 00101 + 011001 should become 000101 + 011001 after padding.
I have written the code for this part. It works fine, but I am wondering if there is a more efficient way to achieve the same.
public class Common{
public static void main(String[] args){
//calling the addBinary will only print the padded binaries correctly
String result=addBinary("00101", "001010");
}
public static String addBinary(String binary1, String binary2){
String result = "";
char[] bin1= binary1.toCharArray();
char[] bin2=binary2.toCharArray();
int[] bin1int=null;
int[] bin2int=null;
System.out.println("---------------before-----------------");
System.out.println(Arrays.toString(bin1));
System.out.println(Arrays.toString(bin2));
System.out.println("-----------------After---------------------");
int bin1_len=bin1.length;
int bin2_len=bin2.length;
if (bin1_len<bin2_len){
bin1int=new int[bin2_len];
bin2int=new int[bin2_len];
int j=bin2_len-bin1_len;
for (char c: binary1.toCharArray()){
bin1int[j]=c-'0';
j++;
}
int k=0;
for (char c: bin2 ){
bin2int[k]=c-'0';
k++;
}
}
System.out.println(Arrays.toString(bin1int));
System.out.println(Arrays.toString(bin2int));
//if (bin2_len<bin1_len)
//reverse implementation of above;
//yet to implement the addition calculation.
return result;
}
}
I am interested in efficient padding. Any tips/suggestions?
Just do:
// Read the binary string to integer and add the two together
int total = Integer.parseInt(binary1, 2) + Integer.parseInt(binary2, 2);
// Convert the resulting integer back to a binary string
String str = Integer.toBinaryString(total);
For values that do not fit in an int you will need to use Long (or BigInteger for arbitrary lengths).
If you really do want to pad the Strings just look at the two sizes and then left pad the shorter one with zeros to the length of the longer one.
String shorter, longer;
if (binary1.length() < binary2.length()) {
    shorter = binary1;
    longer = binary2;
} else {
    shorter = binary2;
    longer = binary1;
}
// Or any of the other multitude of ways to left pad a String.
shorter = org.apache.commons.lang.StringUtils.leftPad(shorter, longer.length(), '0');
If you are just looking for an efficient way to pad, I think the following would be a simpler and more efficient way.
if (bin1_len<bin2_len){
int j=bin2_len-bin1_len;
StringBuilder s = new StringBuilder();
for(int i = 0;i<j;i++){
s.append('0');
}
s.append(binary1); //s now contains the 0 padded binary string
}
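For completeness, padding and the addition the question has not implemented yet can be combined without converting to a number at all, which also removes any limit on the length. This is a sketch with hypothetical names (BinaryStrings, leftPad, add), and it assumes both inputs contain only '0' and '1'.

```java
final class BinaryStrings {

    // Left-pads s with '0' up to the given width; returns s unchanged if it is already wide enough.
    static String leftPad(String s, int width) {
        StringBuilder sb = new StringBuilder(Math.max(width, s.length()));
        for (int i = s.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    // Adds two binary strings digit by digit from the right, carrying as in manual addition.
    static String add(String a, String b) {
        int width = Math.max(a.length(), b.length());
        a = leftPad(a, width);
        b = leftPad(b, width);
        char[] out = new char[width];
        int carry = 0;
        for (int i = width - 1; i >= 0; i--) {
            int sum = (a.charAt(i) - '0') + (b.charAt(i) - '0') + carry;
            out[i] = (char) ('0' + (sum & 1)); // low bit is the digit
            carry = sum >> 1;                  // high bit is the carry
        }
        String result = new String(out);
        return carry == 1 ? "1" + result : result;
    }

    public static void main(String[] args) {
        System.out.println(add("00101", "011001"));
    }
}
```

The result keeps the width of the longer input and grows by one digit only when the final carry is 1.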

Efficient way to search a stream for a string

Let's suppose that I have a stream of text (or a Reader in Java) that I'd like to check for a particular string. The stream of text might be very large, so as soon as the search string is found I'd like to return true, and I'd also like to avoid storing the entire input in memory.
Naively, I might try to do something like this (in Java):
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
char[] buffer = new char[1024];
int numCharsRead;
while((numCharsRead = reader.read(buffer)) > 0) {
if ((new String(buffer, 0, numCharsRead)).indexOf(searchString) >= 0)
return true;
}
return false;
}
Of course this fails to detect the given search string if it occurs on the boundary of the 1k buffer:
Search text: "stackoverflow"
Stream buffer 1: "abc.........stack"
Stream buffer 2: "overflow.......xyz"
How can I modify this code so that it correctly finds the given search string across the boundary of the buffer but without loading the entire stream into memory?
Edit: Note when searching a stream for a string, we're trying to minimise the number of reads from the stream (to avoid latency in a network/disk) and to keep memory usage constant regardless of the amount of data in the stream. Actual efficiency of the string matching algorithm is secondary but obviously, it would be nice to find a solution that used one of the more efficient of those algorithms.
There are three good solutions here:
If you want something that is easy and reasonably fast, go with no buffer, and instead implement a simple nondeterministic finite-state machine. Your state will be a list of indices into the string you are searching, and your logic looks something like this (pseudocode):
String needle;
n = needle.length();
for every input character c do
add index 0 to the list
for every index i in the list do
if c == needle[i] then
if i + 1 == n then
return true
else
replace i in the list with i + 1
end
else
remove i from the list
end
end
end
This will find the string if it exists and you will never need a buffer.
Slightly more work but also faster: do an NFA-to-DFA conversion that figures out in advance what lists of indices are possible, and assign each one to a small integer. (If you read about string search on Wikipedia, this is called the powerset construction.) Then you have a single state and you make a state-to-state transition on each incoming character. The NFA you want is just the DFA for the string preceded with a state that nondeterministically either drops a character or tries to consume the current character. You'll want an explicit error state as well.
If you want something faster, create a buffer whose size is at least twice n, and use Boyer-Moore to compile a state machine from needle. You'll have a lot of extra hassle because Boyer-Moore is not trivial to implement (although you'll find code online) and because you'll have to arrange to slide the string through the buffer. You'll have to build or find a circular buffer that can 'slide' without copying; otherwise you're likely to give back any performance gains you might get from Boyer-Moore.
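The first option, the list-of-indices state machine, might be sketched in Java as follows. Instead of a list, this version keeps a boolean array in which active[i] means "the first i characters of the needle have just been matched". The class name NfaStreamSearch is made up, and IOException is wrapped in UncheckedIOException purely to keep the example short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class NfaStreamSearch {

    static boolean contains(Reader reader, String needle) {
        int n = needle.length();
        if (n == 0) {
            return true;
        }
        // active[i] (1 <= i < n): a partial match of length i ends at the previous character.
        // State 0 is implicitly always active, which corresponds to "add index 0 to the list".
        boolean[] active = new boolean[n];
        try {
            int c;
            while ((c = reader.read()) != -1) {
                // Go from high to low so that active[i] is read before this step overwrites it.
                for (int i = n - 1; i >= 0; i--) {
                    boolean advance = (i == 0 || active[i]) && needle.charAt(i) == (char) c;
                    if (advance && i == n - 1) {
                        return true;
                    }
                    if (i + 1 < n) {
                        active[i + 1] = advance;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }

    public static void main(String[] args) {
        Reader in = new StringReader("abc.........stackoverflow.......xyz");
        System.out.println(contains(in, "stackoverflow"));
    }
}
```

Memory use depends only on the needle length. Since read() is called once per character, wrapping the source in a BufferedReader keeps the number of underlying reads low.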
I made a few changes to the Knuth-Morris-Pratt algorithm for partial searches. Since the actual comparison position is always less than or equal to the next one, there is no need for extra memory. The code with a Makefile is also available on github and it is written in Haxe to target multiple programming languages at once, including Java.
I also wrote a related article: searching for substrings in streams: a slight modification of the Knuth-Morris-Pratt algorithm in Haxe. The article mentions the Jakarta RegExp, now retired and resting in the Apache Attic. The Jakarta Regexp library “match” method in the RE class uses a CharacterIterator as a parameter.
class StreamOrientedKnuthMorrisPratt {
var m: Int;
var i: Int;
var ss: String;
var table: Array<Int>;
public function new(ss: String) {
this.ss = ss;
this.buildTable(this.ss);
}
public function begin() : Void {
this.m = 0;
this.i = 0;
}
public function partialSearch(s: String) : Int {
var offset = this.m + this.i;
while(this.m + this.i - offset < s.length) {
if(this.ss.substr(this.i, 1) == s.substr(this.m + this.i - offset,1)) {
if(this.i == this.ss.length - 1) {
return this.m;
}
this.i += 1;
} else {
this.m += this.i - this.table[this.i];
if(this.table[this.i] > -1)
this.i = this.table[this.i];
else
this.i = 0;
}
}
return -1;
}
private function buildTable(ss: String) : Void {
var pos = 2;
var cnd = 0;
this.table = new Array<Int>();
if(ss.length > 2)
this.table.insert(ss.length, 0);
else
this.table.insert(2, 0);
this.table[0] = -1;
this.table[1] = 0;
while(pos < ss.length) {
if(ss.substr(pos-1,1) == ss.substr(cnd, 1))
{
cnd += 1;
this.table[pos] = cnd;
pos += 1;
} else if(cnd > 0) {
cnd = this.table[cnd];
} else {
this.table[pos] = 0;
pos += 1;
}
}
}
public static function main() {
var KMP = new StreamOrientedKnuthMorrisPratt("aa");
KMP.begin();
trace(KMP.partialSearch("ccaabb"));
KMP.begin();
trace(KMP.partialSearch("ccarbb"));
trace(KMP.partialSearch("fgaabb"));
}
}
The Knuth-Morris-Pratt search algorithm never backs up; this is just the property you want for your stream search. I've used it before for this problem, though there may be easier ways using available Java libraries. (When this came up for me I was working in C in the 90s.)
KMP in essence is a fast way to build a string-matching DFA, like Norman Ramsey's suggestion #2.
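For readers who want this in Java, a Knuth-Morris-Pratt matcher that consumes a Reader chunk by chunk could look like the sketch below. The class and method names (KmpStreamSearch, failureTable, contains) are invented for the example, and IOException is wrapped in UncheckedIOException only to keep it compact.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class KmpStreamSearch {

    // fail[i] = length of the longest proper prefix of p[0..i] that is also a suffix of it.
    static int[] failureTable(String p) {
        int[] fail = new int[p.length()];
        int k = 0;
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    static boolean contains(Reader reader, String needle) {
        if (needle.isEmpty()) {
            return true;
        }
        int[] fail = failureTable(needle);
        char[] buffer = new char[1024];
        int matched = 0; // survives across buffer boundaries
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    char c = buffer[i];
                    // On a mismatch fall back to the longest shorter prefix that still fits.
                    while (matched > 0 && c != needle.charAt(matched)) {
                        matched = fail[matched - 1];
                    }
                    if (c == needle.charAt(matched)) {
                        matched++;
                    }
                    if (matched == needle.length()) {
                        return true;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains(new StringReader("aaab"), "aab"));
    }
}
```

The only state carried between chunks is the integer matched, so the buffer size affects the number of reads but not correctness.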
This answer applied to the initial version of the question where the key was to read the stream only as far as necessary to match on a String, if that String was present. This solution would not meet the requirement to guarantee fixed memory utilisation, but may be worth considering if you have found this question and are not bound by that constraint.
If you are bound by the constant memory usage constraint, Java stores arrays of any type on the heap, and as such nulling the reference does not deallocate memory in any way; I think any solution involving arrays in a loop will consume memory on the heap and require GC.
For a simple implementation, Java 5's Scanner, which can accept an InputStream and search the input with a java.util.regex.Pattern, might save you from worrying about the implementation details.
Here's an example of a potential implementation:
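public boolean streamContainsString(Reader reader, String searchString)
        throws IOException {
    Scanner streamScanner = new Scanner(reader);
    // findWithinHorizon treats its first argument as a regex, so quote the literal search string.
    // A horizon of 0 means the search is not bounded.
    return streamScanner.findWithinHorizon(Pattern.quote(searchString), 0) != null;
}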
I'm thinking regex because it sounds like a job for a Finite State Automaton, something that starts in an initial state, changing state character by character until it either rejects the string (no match) or gets to an accept state.
I think this is probably the most efficient matching logic you could use, and how you organize the reading of the information can be divorced from the matching logic for performance tuning.
It's also how regexes work.
Instead of having your buffer be an array, use an abstraction that implements a circular buffer. Your index calculation will be buf[(next+i) % sizeof(buf)], and you'll have to be careful to fill the buffer one half at a time. But as long as the search string fits in half the buffer, you'll find it.
I believe the best solution to this problem is to try to keep it simple. Remember, because I'm reading from a stream, I want to keep the number of reads from the stream to a minimum (as network or disk latency may be an issue) while keeping the amount of memory used constant (as the stream may be very large in size). Actual efficiency of the string matching is not the number one goal (as that has been studied to death already).
Based on AlbertoPL's suggestion, here's a simple solution that compares the buffer against the search string character by character. The key is that because the search is only done one character at a time, no circular buffers or buffers of a particular size are needed. One caveat: resetting the counter to zero on a mismatch can miss a match when the search string begins with a repeated prefix (for example, "aab" in "aaab").
Now, if someone can come up with a similar implementation based on Knuth-Morris-Pratt search algorithm then we'd have a nice efficient solution ;)
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
char[] buffer = new char[1024];
int numCharsRead;
int count = 0;
while((numCharsRead = reader.read(buffer)) > 0) {
for (int c = 0; c < numCharsRead; c++) {
if (buffer[c] == searchString.charAt(count))
count++;
else
count = 0;
if (count == searchString.length()) return true;
}
}
return false;
}
If you're not tied to using a Reader, then you can use Java's NIO API to efficiently load the file. For example (untested, but should be close to working):
public boolean streamContainsString(File input, String searchString) throws IOException {
    Pattern pattern = Pattern.compile(Pattern.quote(searchString));
    try (FileInputStream fis = new FileInputStream(input);
         FileChannel fc = fis.getChannel()) {
        // A single mapping is limited to Integer.MAX_VALUE bytes.
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        CharBuffer cb = decoder.decode(bb);
        // find() looks for the pattern anywhere; matches() would require the whole file to equal it.
        return pattern.matcher(cb).find();
    }
}
This basically mmap()s the file to search and relies on the operating system to do the right thing regarding cache and memory usage. Note, however, that map() is more expensive than just reading the file into a large buffer for files smaller than around 10 KiB.
A very fast searching of a stream is implemented in the RingBuffer class from the Ujorm framework. See the sample:
Reader reader = RingBuffer.createReader("xxx ${abc} ${def} zzz");
String word1 = RingBuffer.findWord(reader, "${", "}");
assertEquals("abc", word1);
String word2 = RingBuffer.findWord(reader, "${", "}");
assertEquals("def", word2);
String word3 = RingBuffer.findWord(reader, "${", "}");
assertEquals("", word3);
The single class implementation is available on the SourceForge:
For more information see the link.
Implement a sliding window. Have your buffer around, move all elements in the buffer one forward and enter a single new character in the buffer at the end. If the buffer is equal to your searched word, it is contained.
Of course, if you want to make this more efficient, you can look at a way to prevent moving all elements in the buffer around, for example by having a cyclic buffer and a representation of the strings which 'cycles' the same way the buffer does, so you only need to check for content-equality. This saves moving all elements in the buffer.
I think you need to buffer a small amount at the boundary between buffers.
For example if your buffer size is 1024 and the length of the SearchString is 10, then as well as searching each 1024-byte buffer you also need to search each 18-byte transition between two buffers (9 bytes from the end of the previous buffer concatenated with 9 bytes from the start of the next buffer).
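A variant of this idea that avoids a separate pass over the transition is to carry the last searchString.length() - 1 characters of each chunk over into the next search. The following is a sketch only, with a hypothetical class name (OverlapSearch) and with IOException wrapped in UncheckedIOException for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class OverlapSearch {

    static boolean contains(Reader reader, String needle) {
        if (needle.isEmpty()) {
            return true;
        }
        int overlap = needle.length() - 1; // longest tail that can still begin a match
        char[] buffer = new char[1024];
        StringBuilder window = new StringBuilder();
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                window.append(buffer, 0, n);
                if (window.indexOf(needle) >= 0) {
                    return true;
                }
                // Keep only the tail that the next chunk could complete into a match.
                if (window.length() > overlap) {
                    window.delete(0, window.length() - overlap);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains(new StringReader("abc...stackoverflow...xyz"), "stackoverflow"));
    }
}
```

The window never holds more than one chunk plus the overlap, so memory use stays constant regardless of the stream length.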
I'd say switch to a character by character solution, in which case you'd scan for the first character in your target text, then when you find that character increment a counter and look for the next character. Every time you don't find the next consecutive character restart the counter. It would work like this:
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
    char[] buffer = new char[1024];
    int numCharsRead;
    int count = 0; // number of leading characters of searchString matched so far
    while ((numCharsRead = reader.read(buffer)) > 0) {
        // Check every character that was read, not just the last one in the buffer.
        for (int i = 0; i < numCharsRead; i++) {
            if (buffer[i] == searchString.charAt(count))
                count++;
            else
                count = 0;
            if (count == searchString.length())
                return true;
        }
    }
    return false;
}
The only problem is when you're in the middle of looking through characters... in which case there needs to be a way of remembering your count variable. I don't see an easy way of doing so except as a private variable for the whole class. In which case you would not instantiate count inside this method.
You might be able to implement a very fast solution using Fast Fourier Transforms, which, if implemented properly, allow you to do string matching in O(n log m) time, where n is the length of the longer string to be matched, and m is the length of the shorter string. You could, for example, perform an FFT as soon as you receive a stream input of length m, and if it matches, you can return, and if it doesn't match, you can throw away the first character in the stream input, wait for a new character to appear through the stream, and then perform the FFT again.
You can increase the speed of search for very large strings by using a string search algorithm.
If you're looking for a constant substring rather than a regex, I'd recommend Boyer-Moore. There's plenty of source code on the internet.
Also, use a circular buffer, to avoid thinking too hard about buffer boundaries.
Mike.
I also had a similar problem: skip bytes from the InputStream until specified string (or byte array). This is the simple code based on circular buffer. It is not very efficient but works for my needs:
private static boolean matches(int[] buffer, int offset, byte[] search) {
    final int len = buffer.length;
    for (int i = 0; i < len; ++i) {
        // read() yields 0..255, so compare against the unsigned value of the byte.
        if ((search[i] & 0xFF) != buffer[(offset + i) % len]) {
            return false;
        }
    }
    return true;
}
public static void skipBytes(InputStream stream, byte[] search) throws IOException {
    final int[] buffer = new int[search.length];
    for (int i = 0; i < search.length; ++i) {
        buffer[i] = stream.read();
    }
    int offset = 0;
    while (!matches(buffer, offset, search)) {
        final int next = stream.read();
        if (next < 0) {
            // Without this check the loop would never end once the stream is exhausted.
            throw new EOFException("search bytes not found before end of stream");
        }
        buffer[offset] = next;
        offset = (offset + 1) % buffer.length;
    }
}
Here is my implementation:
static boolean containsKeywordInStream( Reader ir, String keyword, int bufferSize ) throws IOException{
SlidingContainsBuffer sb = new SlidingContainsBuffer( keyword );
char[] buffer = new char[ bufferSize ];
int read;
while( ( read = ir.read( buffer ) ) != -1 ){
if( sb.checkIfContains( buffer, read ) ){
return true;
}
}
return false;
}
SlidingContainsBuffer class:
class SlidingContainsBuffer{
private final char[] keyword;
private int keywordIndexToCheck = 0;
private boolean keywordFound = false;
SlidingContainsBuffer( String keyword ){
this.keyword = keyword.toCharArray();
}
boolean checkIfContains( char[] buffer, int read ){
for( int i = 0; i < read; i++ ){
if( keywordFound == false ){
if( keyword[ keywordIndexToCheck ] == buffer[ i ] ){
keywordIndexToCheck++;
if( keywordIndexToCheck == keyword.length ){
keywordFound = true;
}
} else {
keywordIndexToCheck = 0;
}
} else {
break;
}
}
return keywordFound;
}
}
This answer meets the requirements of the task:
The implementation is able to find the searched keyword even if it was split between buffers
Minimum memory usage defined by the buffer size
Number of reads will be minimized by using bigger buffer
