String concatenation without allocation in java - java

Is there a way to concatenate two Strings (not final) without allocating memory?
For example, I have these two Strings:
final String SCORE_TEXT = "SCORE: ";
String score = "1000"; //or int score = 1000;
When I concatenate these two strings, a new String object is created.
font.drawMultiLine(batch, SCORE_TEXT + score, 50f, 670f);//this creates new string each time
Since this is done in the main game loop (executed ~60 times in one second), there are a lot of allocations.
Can I somehow do this without allocation?

The obvious solution is to not recreate the output String on every frame, but only when it changes.
One way to do this is to store it somewhere outside your main loop and update it when a certain event happens, i.e. the "score" actually changes. In your main loop you then just use that pre-created String.
If you can't/or don't want to have this event based approach, you can always store the "previous" score and only concatenate a new String when the previous score is different from the current score.
Depending on how often your score actually changes, this should cut out most reallocations. Unless, of course, the score changes at 60 fps, in which case the whole point is moot, because nobody would be able to read the text you're printing.
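A minimal sketch of the "only rebuild when it changes" idea, assuming an int score. The class name ScoreLabel and the method textFor are hypothetical names used for illustration only:

```java
final class ScoreLabel {
    private static final String SCORE_TEXT = "SCORE: ";
    private int lastScore;
    private String text;

    ScoreLabel(int initialScore) {
        lastScore = initialScore;
        text = SCORE_TEXT + initialScore;
    }

    /** Returns the cached text, rebuilding it only when the score differs from the previous call. */
    String textFor(int score) {
        if (score != lastScore) {
            lastScore = score;
            text = SCORE_TEXT + score; // the only place a new String is allocated
        }
        return text;
    }
}
```

In the game loop you would then pass label.textFor(score) to font.drawMultiLine, so a new String is created only on frames where the score actually changed.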

It seems that drawMultiLine accepts not a String but a CharSequence. You could therefore implement your own CharSequence that does not actually concatenate the two strings. Here's a draft implementation:
public class ConcatenatedString implements CharSequence {
    final String left, right;
    final int leftLength;

    public ConcatenatedString(String left, String right) {
        this.left = left;
        this.right = right;
        this.leftLength = left.length();
    }

    @Override
    public int length() {
        return leftLength + right.length();
    }

    @Override
    public char charAt(int index) {
        // indices below leftLength belong to the left part, the rest to the right part
        return index < leftLength ? left.charAt(index) : right.charAt(index - leftLength);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (end <= leftLength)
            return left.substring(start, end);
        if (start >= leftLength)
            return right.substring(start - leftLength, end - leftLength);
        // the range spans both parts, so fall back to a real concatenation
        return toString().substring(start, end);
    }

    @Override
    public String toString() {
        return left.concat(right);
    }
}
Use it like this:
font.drawMultiLine(batch, new ConcatenatedString(SCORE_TEXT, score), 50f, 670f);
Internally, in your case, drawMultiLine just needs the length and charAt methods. Using ConcatenatedString you create only one new object. In contrast, when you use SCORE_TEXT + score, a temporary StringBuilder is created, which internally allocates a char[] array, copies the input characters, and resizes the array if necessary; the final String object then allocates another char[] array and copies the characters again. It is therefore likely that ConcatenatedString will be faster.

Didn't understand the question the first time around. Have you tried using the following?
SCORE_TEXT.concat(score);

I don't think you can populate a value without allocating memory for it. The best you can do is create a global String variable and assign the value of SCORE_TEXT + score to it. Use that global variable in the font.drawMultiLine() method.
This way you can minimize the amount of memory allocated, because the concatenation is not repeated on every call.

String is designed to be immutable in Java. Use StringBuilder.
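A rough sketch of what reusing a single StringBuilder could look like. It assumes, as another answer states, that drawMultiLine accepts a CharSequence; the class name ScoreText and the method update are hypothetical:

```java
final class ScoreText {
    private static final String SCORE_TEXT = "SCORE: ";
    // One builder kept for the lifetime of the object; its internal array is reused.
    private final StringBuilder builder = new StringBuilder(32);

    /** Rewrites the builder in place and returns it as a CharSequence (no String is created). */
    CharSequence update(int score) {
        builder.setLength(0);                      // reset contents, keep capacity
        builder.append(SCORE_TEXT).append(score);  // append(int) writes the digits into the builder
        return builder;
    }
}
```

The returned CharSequence is the builder itself, so its contents change on the next update call; that is fine for drawing it immediately each frame but not for storing it.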

Related

Java - Why void when I'm storing value in variable

So, the question is. If I'm calling method guess from class - Player and it is a void-type method without return statement in it, how come I'm able to store result of number = (int)(Math.random() * 10) in number variable for 3 different objects (p1, p2, p3)?
I'm little confused about when should I use return statement or void-type methods, because if number = (int)(Math.random() * 10) is giving some results which I want to use, why then I don't need to return this results from a method to pass them to the number variable which I declared in int number = 0;
public class Player {
int number = 0;
public void guess() {
number = (int)(Math.random() * 10);
System.out.println("I'm guessing " + number);
}
}
A void method does not return anything, but it still allows you to do things. (Print to the console, modify variables etc) The void keyword just means that it doesn't return a value. (In void methods you can still use a blank return; to end the method) And because you are modifying your number variable in the GuessGame object the changes you make will stay even though you don't return a variable. Try this simple test to see what I mean:
//In your GuessGame class
int number = 0;
public void foo() {
number++;
}
public static void main(String[] args) {
GuessGame games = new GuessGame();
games.foo();
System.out.println(games.number);
//Outputs 1
}
docs for the return statement
The point is: where is the result of Math.random() * 10 physically stored on your computer when your program is run? You list two options.
Option 1: Instance field
In this case the compiler instructs your operating system to reserve space for an int variable for the whole life of the Player object. The player object may live for microseconds, seconds, minutes, hours, days, months, ... it depends! This storage space is usually found in the RAM of the computer, and from Java you can access it with the syntax myPlayer.number as long as you have a Player reference somewhere.
Option 2: Return value
In this case the compiler finds the space to store the result of the computation in a register of the Java virtual machine, which you can mentally map to a register of the physical processor. This value will at best survive for a couple of processor cycles (there are gazillions per second in a GHz CPU, so it's really a tiny fraction of a second) if you don't store it somewhere else, and if you don't, it's lost forever. See the following example:
private int someRandom;
private int gimmeARandom() {
    return (int) (Math.random() * 10); // cast needed: Math.random() returns a double
}
private void test() {
    int someRandom = gimmeARandom(); // --> store the value until the end of the method
    this.someRandom = someRandom; // --> keep it further so we can read it later
    gimmeARandom(); // --> what did it return? We'll never know
}
Void is different from static: void just means the function does not return anything, but it can still be an instance method, i.e. one that is associated with each new instance of a class. I think you're confusing this with the functionality of static, which allows methods to be called without an instance of the class.

Java Array : Unable to store new values into an array

I am getting battery values from a drone. I am able to display each new battery value on a JLabel. However, when I try to store these battery values in an int array, only the very first battery value is stored. The subsequent array elements are all filled with that first value.
Here is the output so you can see what is happening. The first value comes from the drone, and the second value is the array index. The output clearly shows that the array does not accept new data, for some unknown reason.
P.S.: I have no idea what the best array size is, since I am getting values from the drone every second. So I have declared an int array of size 9999999. Any idea how I can size the array to cater for continuous battery values from the drone? These values will be used for drawing a graph later.
My code:
public class arDroneFrame extends javax.swing.JFrame implements Runnable, DroneStatusChangeListener, NavDataListener {
private String text; // string for speech
private static final long CONNECT_TIMEOUT = 10000;
public ARDrone drone;
public NavData data;
public Timer timer = new Timer();
public int batteryGraphic=0;
public int [] arrayBatt = new int[9999999];
public arDroneFrame(String text) {
this.text=text;
}
public arDroneFrame() {
initComponents();
initDrone();
}
private void initDrone() {
try {
drone = new ARDrone();
data = new NavData();
} catch (UnknownHostException ex) {
return;
}
videoDrone.setDrone(drone);
drone.addNavDataListener(this);
}
public void navDataReceived(NavData nd) {
getNavData(nd);
int battery = nd.getBattery();
cmdListOK.jlblBatteryLevelValue.setText(battery + " %");
//JLabel can get updated & always display new battery values
}
public void getNavData(NavData nd){
for(int i=0;i<arrayBatt.length;i++){
batteryGraphic= nd.getBattery();
arrayBatt[i] = batteryGraphic;
System.err.println("This is stored battery values : " + arrayBatt[i] + " " + i + "\n");
}
}
}
public static void main(String args[]) {
java.awt.EventQueue.invokeLater(new Runnable() {
public void run() {
String text = "Welcome!";
arDroneFrame freeTTS = new arDroneFrame(text);
freeTTS.speak();
new arDroneFrame().setVisible(true);
}
});
}
Result:
This is stored battery values : 39 0
This is stored battery values : 39 1
This is stored battery values : 39 2
This is stored battery values : 39 3
This is stored battery values : 39 4
This is stored battery values : 39 5
The problem lies in this method:
public void getNavData(NavData nd){
    for(int i=0;i<arrayBatt.length;i++){
        arrayBatt[i] = nd.getBattery();
        System.err.println("This is stored battery values : " + arrayBatt[i] + " " + i + "\n");
    }
}
You call this method by passing it a NavData instance. This means that whatever value nd.getBattery() returns for that instance is assigned to every index in your array as the loop iterates over it.
What you should do is move the loop outside of the getNavData(NavData nd) method and pass it a new instance of NavData for each call. When you couple this with the ArrayList suggestion below, you should have a dynamic array of distinct battery values.
Side solution
The way that you have declared this array is REALLY SCARY.
You should only use the space you need and NOTHING more.
I know that you are unsure of what size is actually required, but don't go over-board on it.
You should initialize your array with something smaller;
public int [] arrayBatt = new int[10000];
As a side note: having your class members as public is generally not recommended. You should make them private and create getter/setter methods to retrieve and modify the data, respectively.
Then, have a method that checks to see if your array is full. If it is full, then increase the array size by n/2, where n is the initial size of your array.
The down-side to this approach is that as your array becomes larger, you are going to spend a lot of time copying the old array to the new array, which is pretty undesirable.
A better solution
Would be to use the built-in ArrayList class, then just append items to your list and let Java do the heavy lifting.
ArrayList<Integer> batteryArray = new ArrayList <Integer>();
you can add items to your list by simply calling:
batteryArray.add(item);
The upside to this solution is that:
The batteryArray size is handled behind-the-scenes
The size of the array is easily retrievable, as well as the elements
ArrayList is a very fast storage structure.
In your loop to print out battery values, you could make it a lot cleaner by implementing a for-each loop.
Why are you using System.err to print out dialogs for the battery?? This isn't what System.err is meant to be used for and violates the Principle of Least Astonishment
public void getNavData(NavData nd){
    batteryArray.add(nd.getBattery()); // store one reading per call
    for (int batteryValue : batteryArray){
        System.out.println("Stored battery value: " + batteryValue);
    }
}
I assume that there is some event that is triggered by the drone's hardware.
Your loop runs too fast, probably thousands of times per second so there was not time for any battery change and nd.getBattery() returns the same value.
It seems that this is the reason why the values are repeated.
On the other hand, I suspect that navDataReceived is called only when the hardware detects a change and this is why it displays the new value. When getNavData is called you are running a tight loop that locks the execution and prevents your application from receiving this event while the loop is executing.
You should only store a value when you are notified of some change.
I see your implementation of getNavData as fundamentally wrong.
Your 10 million int array is useless in this situation.
I don't know how your application interacts with the drone's hardware but the interface names DroneStatusChangeListener and NavDataListener suggest that you receive some notification when a change occurs.
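As a rough illustration of "store one value per notification", here is a minimal sketch. BatteryLog and its methods are hypothetical names, and it assumes the callback (for example navDataReceived) passes in the value of nd.getBattery() once per event:

```java
import java.util.ArrayList;
import java.util.List;

final class BatteryLog {
    // Grows as needed, so no huge fixed-size array is required.
    private final List<Integer> readings = new ArrayList<>();

    /** Call once per notification, e.g. with nd.getBattery() inside the listener callback. */
    void record(int batteryPercent) {
        readings.add(batteryPercent);
    }

    int size() {
        return readings.size();
    }

    int get(int index) {
        return readings.get(index);
    }
}
```

Each callback appends exactly one reading, so the list holds the sequence of values over time rather than one value copied into every slot.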

What is better/faster in Java: 2 method calls or 1 object call

I'm afraid this is a terribly stupid question. However, I can't find an answer to it and therefore require some help :)
Let's start with a simplification of my real problem:
Assume I have a couple of boxes each filled with a mix of different gems.
I'm now creating an object gem which has the attribute colour and a method getColour to get the colour of the gem.
Further I'm creating an object box which has a list of gems as attribute and a method getGem to get a gem from that list.
What I want to do now is to count all gems in all boxes by colour. Now I could either do something like
int sapphire = 0;
int ruby = 0;
int emerald = 0;
for(each box = i)
for(each gem = j)
if(i.getGem(j).getColour().equals("blue")) sapphire++;
else if(i.getGem(j).getColour().equals("red")) ruby++;
else if(i.getGem(j).getColour().equals("green")) emerald++;
or I could do
int sapphire = 0;
int ruby = 0;
int emerald = 0;
String colour;
for(each box = i)
for(each gem = j)
colour = i.getGem(j).getColour();
if(colour.equals("blue")) sapphire++;
else if(colour.equals("red")) ruby++;
else if(colour.equals("green")) emerald++;
My question is whether both are essentially the same, or whether one should be preferred over the other. I understand that a lot of unnecessary new string objects are produced in the second case, but do I get a speed advantage in return because colour is more "directly" available?
I would dare to make a third improvement:
int sapphire = 0;
int ruby = 0;
int emerald = 0;
for(each box = i) {
for(each gem = j) {
String colour = i.getGem(j).getColour();
if("blue".equals(colour)) sapphire++;
else if("red".equals(colour)) ruby++;
else if("green".equals(colour)) emerald++;
}
}
I use a local variable inside the for-loop. Why? Because you probably need it only there.
It is generally better to put STATIC_STRING.equals(POSSIBLE_NULL_VALUE).
This has the advantage: easier to read and should have no performance problem. If you have a performance problem, then you should consider looking somewhere else in your code. Related to this: this answer.
Conceptually, both versions have equal complexity, i.e. O(i*j). But if calling a method and getting its return value are counted as two operations, then the cost of your first version is 4*O(i*j) (treating O(i*j) as a function) and that of your second version is O(i*(j+2)). This difference is not large, but if you are comparing them, then yes, your first version is more complex and is not good programming style.
The cost of your string comparisons is going to wipe out all other considerations in this sort of approach.
You would be better off using something else (for example an enum). That would also expand automatically.
(Although your for each loop isn't proper Java syntax anyway so that's a bit odd).
enum GemColour {
blue,
red,
green
}
Then in your count function:
Map<GemColour, Integer> counts = new EnumMap<GemColour, Integer>(GemColour.class);
for (Box b : boxes) {
for (Gem g : b.getGems()) {
Integer count = counts.get(g.getColour());
if (count == null) {
count=1;
} else {
count+=1;
}
counts.put(g.getColour(), count);
}
}
Now it will automatically extend to any new colors you add without you needing to make any code changes. It will also be much faster as it does a single integer comparison rather than a string comparison and uses that to put the correct value into the correct place in the map (which behind the scenes is just an array).
To get the counts just do, for example:
counts.get(GemColour.blue);
As has been pointed out in the comments the java Stream API would allow you to do all of this in one line:
boxes.stream().map(Box::getGems).flatMap(Collection::stream).collect(groupingBy(Gem::getColour, counting()))
It's less easy to understand what it is doing that way though.
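For reference, a self-contained sketch of the enum-based counting. To keep it runnable on its own it makes some assumptions: each box is modelled simply as a list of colours instead of the Box and Gem classes from the question, the enum constants are written in upper case, and Map.merge (Java 8+) replaces the explicit null check:

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class GemCount {
    enum GemColour { BLUE, RED, GREEN }

    /** Counts colours across all boxes; colours that never occur are simply absent from the map. */
    static Map<GemColour, Integer> count(List<List<GemColour>> boxes) {
        Map<GemColour, Integer> counts = new EnumMap<>(GemColour.class);
        for (List<GemColour> box : boxes) {
            for (GemColour colour : box) {
                counts.merge(colour, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }
}
```

Adding a new constant to GemColour requires no change to count.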

Java: store to memory + numbers

I have a function:
private void fixTurn(int turn)
And then I have:
memory1 = memory1 + count;
Now, if turn is 2, I would like it to do:
memory2 = memory2 + count;
I tried this:
memory + turn = memory+turn + count;
But it does not work. Should I just go with an if statement?
No, you should use a collection of some form instead of having several separate variables. For example, you could use an array:
memory[turn] += count;
Numerical indexes in variable names are generally something to be avoided.
Wanting to access such variables via the index is usually the sign of a novice programmer who hasn't gotten the point of arrays - because an array is exactly that, a bunch of variables that can be accessed via an index:
memory[turn] = memory[turn] + count;
or, shorter (using a compound assignment operator):
memory[turn] += count;
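Put together, a small sketch of the array approach might look like the following. The class name TurnMemory is hypothetical, and it assumes turns are numbered from 1, so index 0 is left unused:

```java
final class TurnMemory {
    private final int[] memory;

    /** Allocates one slot per turn; slot 0 is unused so that turn 1 maps to memory[1]. */
    TurnMemory(int numberOfTurns) {
        memory = new int[numberOfTurns + 1];
    }

    void fixTurn(int turn, int count) {
        memory[turn] += count; // replaces memory1 += count, memory2 += count, ...
    }

    int get(int turn) {
        return memory[turn];
    }
}
```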
You have to write it as
memory += turn * count
You should rephrase your question, but I think you want to do something like this:
private void fixTurn(int turn){
    if(turn == 1){ // note: can be replaced by a switch
        memory1 += count;
    }else if(turn == 2){
        memory2 += count;
    }
}
Edit: the solution proposed by Jon Skeet is better in terms of readability and adaptability, and I would recommend it instead.
My polished crystal ball tells me that you have some sort of game that is organized in "turns", and that you want to change something for a given turn ("fixTurn").
You may want to store the turns in a list. That's preferable to an array, because a list can grow (or shrink) and allows adding more and more "turns".
Assuming, you have some class that models a turn and it's named Turn, declare the list like:
List<Turn> turns = new ArrayList<Turn>();
Then you can add turns to it:
turns.add(new Turn());
And now, if you have to change some parameter for a turn, do it like this:
private void fixTurn(int number) {
Turn memory = turns.get(number);
memory.setCount(memory.getCount()+count);
}
I am not very clear about your question but I think this is what you are looking for:
memory += turn * count
This syntax is not allowed in java
memory + turn = memory+turn + count;

Efficient way to search a stream for a string

Let's suppose that I have a stream of text (or a Reader in Java) that I'd like to check for a particular string. The stream of text might be very large, so as soon as the search string is found I'd like to return true, and I'd also like to avoid storing the entire input in memory.
Naively, I might try to do something like this (in Java):
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
char[] buffer = new char[1024];
int numCharsRead;
while((numCharsRead = reader.read(buffer)) > 0) {
if ((new String(buffer, 0, numCharsRead)).indexOf(searchString) >= 0)
return true;
}
return false;
}
Of course this fails to detect the given search string if it occurs on the boundary of the 1k buffer:
Search text: "stackoverflow"
Stream buffer 1: "abc.........stack"
Stream buffer 2: "overflow.......xyz"
How can I modify this code so that it correctly finds the given search string across the boundary of the buffer but without loading the entire stream into memory?
Edit: Note when searching a stream for a string, we're trying to minimise the number of reads from the stream (to avoid latency in a network/disk) and to keep memory usage constant regardless of the amount of data in the stream. Actual efficiency of the string matching algorithm is secondary but obviously, it would be nice to find a solution that used one of the more efficient of those algorithms.
There are three good solutions here:
If you want something that is easy and reasonably fast, go with no buffer, and instead implement a simple nondeterministic finite-state machine. Your state will be a list of indices into the string you are searching, and your logic looks something like this (pseudocode):
String needle;
n = needle.length();
for every input character c do
add index 0 to the list
for every index i in the list do
if c == needle[i] then
if i + 1 == n then
return true
else
replace i in the list with i + 1
end
else
remove i from the list
end
end
end
This will find the string if it exists and you will never need a buffer.
Slightly more work but also faster: do an NFA-to-DFA conversion that figures out in advance what lists of indices are possible, and assign each one to a small integer. (If you read about string search on Wikipedia, this is called the powerset construction.) Then you have a single state and you make a state-to-state transition on each incoming character. The NFA you want is just the DFA for the string preceded with a state that nondeterministically either drops a character or tries to consume the current character. You'll want an explicit error state as well.
If you want something faster, create a buffer whose size is at least twice n, and use Boyer-Moore to compile a state machine from needle. You'll have a lot of extra hassle because Boyer-Moore is not trivial to implement (although you'll find code online) and because you'll have to arrange to slide the string through the buffer. You'll have to build or find a circular buffer that can 'slide' without copying; otherwise you're likely to give back any performance gains you might get from Boyer-Moore.
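The first option above (the list-of-indices state machine) can be sketched in Java roughly as follows. This is one possible rendering, not the answerer's own code: the list of indices is represented as a boolean array, where active[i] means that the last i characters read match the first i characters of needle. The class name and the containsInString helper are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class NfaStreamSearch {
    /** Reads the stream one character at a time and stops as soon as needle has been seen. */
    static boolean contains(Reader reader, String needle) throws IOException {
        int n = needle.length();
        if (n == 0) return true;
        boolean[] active = new boolean[n]; // active[0] is unused: state 0 is always live
        int c;
        while ((c = reader.read()) != -1) {
            // Walk from high to low so each state is advanced using its old value.
            for (int i = n - 1; i >= 0; i--) {
                boolean matched = (i == 0 || active[i]) && needle.charAt(i) == (char) c;
                if (matched && i + 1 == n) return true;
                if (i + 1 < n) active[i + 1] = matched;
            }
        }
        return false;
    }

    /** Convenience wrapper for trying the search on an in-memory string. */
    static boolean containsInString(String haystack, String needle) {
        try {
            return contains(new StringReader(haystack), needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Memory use is proportional to the needle length only. Because it calls read() per character, wrapping the source in a BufferedReader keeps the number of underlying reads low.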
I did a few changes to the Knuth Morris Pratt algorithm for partial searches. Since the actual comparison position is always less or equal than the next one there is no need for extra memory. The code with a Makefile is also available on github and it is written in Haxe to target multiple programming languages at once, including Java.
I also wrote a related article: searching for substrings in streams: a slight modification of the Knuth-Morris-Pratt algorithm in Haxe. The article mentions the Jakarta RegExp, now retired and resting in the Apache Attic. The Jakarta Regexp library “match” method in the RE class uses a CharacterIterator as a parameter.
class StreamOrientedKnuthMorrisPratt {
var m: Int;
var i: Int;
var ss: String;
var table: Array<Int>;
public function new(ss: String) {
this.ss = ss;
this.buildTable(this.ss);
}
public function begin() : Void {
this.m = 0;
this.i = 0;
}
public function partialSearch(s: String) : Int {
var offset = this.m + this.i;
while(this.m + this.i - offset < s.length) {
if(this.ss.substr(this.i, 1) == s.substr(this.m + this.i - offset,1)) {
if(this.i == this.ss.length - 1) {
return this.m;
}
this.i += 1;
} else {
this.m += this.i - this.table[this.i];
if(this.table[this.i] > -1)
this.i = this.table[this.i];
else
this.i = 0;
}
}
return -1;
}
private function buildTable(ss: String) : Void {
var pos = 2;
var cnd = 0;
this.table = new Array<Int>();
if(ss.length > 2)
this.table.insert(ss.length, 0);
else
this.table.insert(2, 0);
this.table[0] = -1;
this.table[1] = 0;
while(pos < ss.length) {
if(ss.substr(pos-1,1) == ss.substr(cnd, 1))
{
cnd += 1;
this.table[pos] = cnd;
pos += 1;
} else if(cnd > 0) {
cnd = this.table[cnd];
} else {
this.table[pos] = 0;
pos += 1;
}
}
}
public static function main() {
var KMP = new StreamOrientedKnuthMorrisPratt("aa");
KMP.begin();
trace(KMP.partialSearch("ccaabb"));
KMP.begin();
trace(KMP.partialSearch("ccarbb"));
trace(KMP.partialSearch("fgaabb"));
}
}
The Knuth-Morris-Pratt search algorithm never backs up; this is just the property you want for your stream search. I've used it before for this problem, though there may be easier ways using available Java libraries. (When this came up for me I was working in C in the 90s.)
KMP in essence is a fast way to build a string-matching DFA, like Norman Ramsey's suggestion #2.
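A Java sketch of a streaming Knuth-Morris-Pratt search, written for illustration and not taken from any of the answers. It keeps only the failure table and the current match length between buffer reads, so a match that spans two buffers is still found. Unlike simply resetting a counter to zero on a mismatch, it also handles inputs such as "aaab" when searching for "aab". The class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class KmpStreamSearch {
    /** failure[i] = length of the longest proper prefix of needle[0..i] that is also its suffix. */
    static int[] buildFailureTable(String needle) {
        int[] failure = new int[needle.length()];
        int k = 0;
        for (int i = 1; i < needle.length(); i++) {
            while (k > 0 && needle.charAt(i) != needle.charAt(k)) k = failure[k - 1];
            if (needle.charAt(i) == needle.charAt(k)) k++;
            failure[i] = k;
        }
        return failure;
    }

    static boolean contains(Reader reader, String needle) throws IOException {
        if (needle.isEmpty()) return true;
        int[] failure = buildFailureTable(needle);
        char[] buffer = new char[1024];
        int matched = 0; // characters of needle matched so far; survives across reads
        int n;
        while ((n = reader.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buffer[i];
                // On a mismatch, fall back to the longest prefix that can still match.
                while (matched > 0 && c != needle.charAt(matched)) matched = failure[matched - 1];
                if (c == needle.charAt(matched)) matched++;
                if (matched == needle.length()) return true;
            }
        }
        return false;
    }

    /** Convenience wrapper for trying the search on an in-memory string. */
    static boolean containsInString(String haystack, String needle) {
        try {
            return contains(new StringReader(haystack), needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The input is never re-read, and memory use is the fixed buffer plus one int per needle character.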
This answer applied to the initial version of the question where the key was to read the stream only as far as necessary to match on a String, if that String was present. This solution would not meet the requirement to guarantee fixed memory utilisation, but may be worth considering if you have found this question and are not bound by that constraint.
If you are bound by the constant memory usage constraint, Java stores arrays of any type on the heap, and as such nulling the reference does not deallocate memory in any way; I think any solution involving arrays in a loop will consume memory on the heap and require GC.
For a simple implementation, Java 5's Scanner, which can accept an InputStream and use a java.util.regex.Pattern to search the input, might save you from worrying about the implementation details.
Here's an example of a potential implementation:
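public boolean streamContainsString(Reader reader, String searchString)
throws IOException {
    Scanner streamScanner = new Scanner(reader);
    // findWithinHorizon treats its argument as a regex, so quote the literal
    // (Pattern is java.util.regex.Pattern); a horizon of 0 means "no limit"
    return streamScanner.findWithinHorizon(Pattern.quote(searchString), 0) != null;
}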
I'm thinking regex because it sounds like a job for a Finite State Automaton, something that starts in an initial state, changing state character by character until it either rejects the string (no match) or gets to an accept state.
I think this is probably the most efficient matching logic you could use, and how you organize the reading of the information can be divorced from the matching logic for performance tuning.
It's also how regexes work.
Instead of having your buffer be an array, use an abstraction that implements a circular buffer. Your index calculation will be buf[(next+i) % sizeof(buf)], and you'll have to be careful to fill the buffer one half at a time. But as long as the search string fits in half the buffer, you'll find it.
I believe the best solution to this problem is to try to keep it simple. Remember, because I'm reading from a stream, I want to keep the number of reads from the stream to a minimum (as network or disk latency may be an issue) while keeping the amount of memory used constant (as the stream may be very large in size). Actual efficiency of the string matching is not the number one goal (as that has been studied to death already).
Based on AlbertoPL's suggestion, here's a simple solution that compares the buffer against the search string character by character. The key is that because the search is only done one character at a time, no back tracking is needed and therefore no circular buffers, or buffers of a particular size are needed.
Now, if someone can come up with a similar implementation based on Knuth-Morris-Pratt search algorithm then we'd have a nice efficient solution ;)
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
char[] buffer = new char[1024];
int numCharsRead;
int count = 0;
while((numCharsRead = reader.read(buffer)) > 0) {
for (int c = 0; c < numCharsRead; c++) {
if (buffer[c] == searchString.charAt(count))
count++;
else
count = 0;
if (count == searchString.length()) return true;
}
}
return false;
}
If you're not tied to using a Reader, then you can use Java's NIO API to efficiently load the file. For example (untested, but should be close to working):
public boolean streamContainsString(File input, String searchString) throws IOException {
Pattern pattern = Pattern.compile(Pattern.quote(searchString));
FileInputStream fis = new FileInputStream(input);
FileChannel fc = fis.getChannel();
int sz = (int) fc.size();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, sz);
CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
CharBuffer cb = decoder.decode(bb);
Matcher matcher = pattern.matcher(cb);
return matcher.find(); // find() searches anywhere in the input; matches() would require the whole file to match
}
This basically mmap()'s the file to search and relies on the operating system to do the right thing regarding cache and memory usage. Note however that map() is more expensive than just reading the file into a large buffer for files smaller than around 10 KiB.
A very fast searching of a stream is implemented in the RingBuffer class from the Ujorm framework. See the sample:
Reader reader = RingBuffer.createReader("xxx ${abc} ${def} zzz");
String word1 = RingBuffer.findWord(reader, "${", "}");
assertEquals("abc", word1);
String word2 = RingBuffer.findWord(reader, "${", "}");
assertEquals("def", word2);
String word3 = RingBuffer.findWord(reader, "${", "}");
assertEquals("", word3);
The single class implementation is available on the SourceForge:
For more information see the link.
Implement a sliding window. Have your buffer around, move all elements in the buffer one forward and enter a single new character in the buffer at the end. If the buffer is equal to your searched word, it is contained.
Of course, if you want to make this more efficient, you can look at a way to prevent moving all elements in the buffer around, for example by having a cyclic buffer and a representation of the strings which 'cycles' the same way the buffer does, so you only need to check for content-equality. This saves moving all elements in the buffer.
I think you need to buffer a small amount at the boundary between buffers.
For example if your buffer size is 1024 and the length of the SearchString is 10, then as well as searching each 1024-byte buffer you also need to search each 18-byte transition between two buffers (9 bytes from the end of the previous buffer concatenated with 9 bytes from the start of the next buffer).
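One way to sketch this overlap idea in Java is to carry the last (needle length - 1) characters of each chunk over to the next search. The code below is an illustration under that assumption; the class name, the chunkSize parameter, and the containsInString helper are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class OverlapStreamSearch {
    static boolean contains(Reader reader, String needle, int chunkSize) throws IOException {
        if (needle.isEmpty()) return true;
        int keep = needle.length() - 1; // longest tail that could begin a match spanning two chunks
        char[] chunk = new char[chunkSize];
        StringBuilder window = new StringBuilder(keep + chunkSize);
        int n;
        while ((n = reader.read(chunk)) != -1) {
            window.append(chunk, 0, n);
            if (window.indexOf(needle) >= 0) return true;
            // Drop everything except the tail carried over to the next chunk.
            if (window.length() > keep) window.delete(0, window.length() - keep);
        }
        return false;
    }

    /** Convenience wrapper for trying the search on an in-memory string. */
    static boolean containsInString(String haystack, String needle, int chunkSize) {
        try {
            return contains(new StringReader(haystack), needle, chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The window never holds more than one chunk plus the carried-over tail, so memory use stays bounded regardless of the stream length.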
I'd say switch to a character by character solution, in which case you'd scan for the first character in your target text, then when you find that character increment a counter and look for the next character. Every time you don't find the next consecutive character restart the counter. It would work like this:
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
char[] buffer = new char[1024];
int numCharsRead;
int count = 0;
while((numCharsRead = reader.read(buffer)) > 0) {
if (buffer[numCharsRead -1] == searchString.charAt(count))
count++;
else
count = 0;
if (count == searchString.length())
return true;
}
return false;
}
The only problem is when you're in the middle of looking through characters... in which case there needs to be a way of remembering your count variable. I don't see an easy way of doing so except as a private variable for the whole class. In which case you would not instantiate count inside this method.
You might be able to implement a very fast solution using Fast Fourier Transforms, which, if implemented properly, allow you to do string matching in O(n log(m)) time, where n is the length of the longer string to be matched and m is the length of the shorter string. You could, for example, perform an FFT as soon as you receive a stream input of length m; if it matches, you can return, and if it doesn't match, you can throw away the first character of the stream input, wait for a new character to arrive from the stream, and then perform the FFT again.
You can increase the speed of search for very large strings by using some string search algorithm
If you're looking for a constant substring rather than a regex, I'd recommend Boyer-Moore. There's plenty of source code on the internet.
Also, use a circular buffer, to avoid thinking too hard about buffer boundaries.
Mike.
I also had a similar problem: skip bytes from the InputStream until specified string (or byte array). This is the simple code based on circular buffer. It is not very efficient but works for my needs:
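// Compares the circular buffer, starting at offset, against the search bytes.
private static boolean matches(int[] buffer, int offset, byte[] search) {
    final int len = buffer.length;
    for (int i = 0; i < len; ++i) {
        // stream.read() yields 0..255, so mask the signed byte before comparing
        if ((search[i] & 0xFF) != buffer[(offset + i) % len]) {
            return false;
        }
    }
    return true;
}
public static void skipBytes(InputStream stream, byte[] search) throws IOException {
    final int[] buffer = new int[search.length];
    for (int i = 0; i < search.length; ++i) {
        buffer[i] = stream.read();
    }
    int offset = 0;
    while (!matches(buffer, offset, search)) {
        final int next = stream.read();
        if (next == -1) {
            // end of stream without a match; stop instead of looping forever
            // (EOFException is java.io.EOFException)
            throw new EOFException("search bytes not found");
        }
        buffer[offset] = next;
        offset = (offset + 1) % buffer.length;
    }
}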
Here is my implementation:
static boolean containsKeywordInStream( Reader ir, String keyword, int bufferSize ) throws IOException{
SlidingContainsBuffer sb = new SlidingContainsBuffer( keyword );
char[] buffer = new char[ bufferSize ];
int read;
while( ( read = ir.read( buffer ) ) != -1 ){
if( sb.checkIfContains( buffer, read ) ){
return true;
}
}
return false;
}
SlidingContainsBuffer class:
class SlidingContainsBuffer{
private final char[] keyword;
private int keywordIndexToCheck = 0;
private boolean keywordFound = false;
SlidingContainsBuffer( String keyword ){
this.keyword = keyword.toCharArray();
}
boolean checkIfContains( char[] buffer, int read ){
for( int i = 0; i < read; i++ ){
if( keywordFound == false ){
if( keyword[ keywordIndexToCheck ] == buffer[ i ] ){
keywordIndexToCheck++;
if( keywordIndexToCheck == keyword.length ){
keywordFound = true;
}
} else {
keywordIndexToCheck = 0;
}
} else {
break;
}
}
return keywordFound;
}
}
This answer aims to satisfy the requirements of the task:
The implementation is able to find the searched keyword even if it was split between buffers
Minimum memory usage defined by the buffer size
Number of reads will be minimized by using bigger buffer
