Standard API way to check if one array is contained in another - java

I have two byte[] arrays in a method like this:
private static boolean containsBytes(byte[] body, byte[] checker){
//Code you do not want to ever see here.
}
I want to, using the standard API as much as possible, determine if the series contained in the checker array exists anywhere in the body array.
Right now I'm looking at some nasty code that did a hand-crafted algorithm. The performance of the algorithm is OK, which is about all you can say for it. I'm wondering if there is a more standard api way to accomplish it. Otherwise, I know how to write a readable hand-crafted one.
To get a sense of scale here, the checker array would not be larger than 48 (probably less) and the body might be a few kb large at most.

Not in the standard library (like Jon Skeet said, probably nothing there that does this) but Guava could help you here with its method Bytes.indexOf(byte[] array, byte[] target).
boolean contained = Bytes.indexOf(body, checker) != -1;
Plus, the same method exists in the classes for the other primitive types as well.

I don't know of anything in the standard API to help you here. There may be something in a third party library, although it would potentially need to be implemented repeatedly, once for each primitive type :(
EDIT: I was going to look for Boyer-Moore, but this answer was added on my phone, and I ran out of time :)
Depending on the data and your requirements, you may find that a brute force approach is absolutely fine - and a lot simpler to implement than any of the fancier algorithms available. The simple brute force approach is generally my first port of call - it often turns out to be perfectly adequate :)
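For the sizes mentioned in the question (a pattern of at most 48 bytes, a body of a few kB), the brute-force approach can be sketched roughly as follows. The class name ByteSearch is only an illustration, and treating an empty checker as always contained is an assumption, not something the question specifies.

```java
final class ByteSearch {
    private ByteSearch() {}

    /** Returns true if checker occurs as a contiguous run anywhere in body. */
    static boolean containsBytes(byte[] body, byte[] checker) {
        if (checker.length == 0) {
            return true; // assumption: an empty pattern matches trivially
        }
        // Last start offset at which checker can still fit inside body.
        int lastStart = body.length - checker.length;
        for (int start = 0; start <= lastStart; start++) {
            int i = 0;
            // Advance while the bytes at this offset keep matching.
            while (i < checker.length && body[start + i] == checker[i]) {
                i++;
            }
            if (i == checker.length) {
                return true; // every byte of checker matched at this offset
            }
        }
        return false;
    }
}
```

The worst case is proportional to body.length * checker.length, which is small for inputs of this scale.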

You probably already know this, but what you're trying to (re-)implement is basically a string search:
http://en.wikipedia.org/wiki/String_searching_algorithm
The old code might in fact be an implementation of one of the string search algorithms; for better performance, it might be good to implement one of the other algorithms. You didn't mention how often this method is going to be called, which would help to decide whether it's worth doing that.

The collections framework can both cheaply wrap an array in the List interface and search for a sublist. I think this would work reasonably well:
import java.util.Arrays;
import java.util.Collections;
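// Caveat: this assumes body and checker are Byte[]; Arrays.asList on a primitive byte[] yields a one-element List<byte[]>.
boolean found = Collections.indexOfSubList(Arrays.asList(body), Arrays.asList(checker)) >= 0;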
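For primitive byte[] arrays, one way to keep the sublist idea is a thin read-only List<Byte> view over the array. The following is only a sketch: ByteLists and ByteArrayView are hypothetical names, and elements are boxed on each access, so it trades some speed for reuse of Collections.indexOfSubList.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.List;
import java.util.RandomAccess;

final class ByteLists {
    private ByteLists() {}

    /** Read-only view over a byte[]; nothing is copied. */
    private static final class ByteArrayView extends AbstractList<Byte> implements RandomAccess {
        private final byte[] array;

        ByteArrayView(byte[] array) {
            this.array = array;
        }

        @Override
        public Byte get(int index) {
            return array[index]; // boxed on access
        }

        @Override
        public int size() {
            return array.length;
        }
    }

    static List<Byte> asList(byte[] array) {
        return new ByteArrayView(array);
    }

    /** True if checker occurs as a contiguous sublist of body. */
    static boolean containsBytes(byte[] body, byte[] checker) {
        return Collections.indexOfSubList(asList(body), asList(checker)) >= 0;
    }
}
```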


Is there anything inherently wrong with long variable/method names in Java? [duplicate]

Closed 10 years ago.
Possible Duplicate:
When is a Java method name too long?
I know this is probably a question of personal opinion, but I want to know what's standard practice and what would be frowned upon.
One of my profs in university always seems to make his variable and method names as short as possible (getAmt() instead of getAmount(), for instance).
I have no objection to this, but personally, I prefer to have mine a little longer if it adds descriptiveness so the person reading it won't have to check or refer to documentation.
For instance, we made a method that, given a list of players, returns the player who scored the most goals. I named the method getPlayerWithMostGoals(). Is this wrong? I toiled over choosing a way to make it shorter for a while, but then I thought "why?". It gets the point across clearly and Eclipse makes it easy to autocomplete it when I type.
I'm just wondering if the short variable names are a piece of the past due to needing everything to be as small as possible to be efficient. Is this still a requirement?
Nothing inherently wrong; it's better to make it descriptive than cryptic. However, a long name is often a code smell for a method that is trying to do too much or could be refactored.
Bad: getActInfPstWeek
OK: getAccountInformationForPastWeek()
Better: getAccountInformation(DateRange range)
I prefer to have long variable/method names that describe what's going on. In your case, I think getPlayerWithMostGoals() is appropriate. It bothers me when I see a short variable name like "amt" and I have to transpose that in my head (into "amount").
Something like getAmt() looks like C++ code style. In Java, more descriptive names are usually used.
Your professor's method name is understandable, but only because "amount" is such a common word; that is not the general case. Stick with your "longWordStyle"; it is more idiomatic Java.
As per common standards, longer descriptive names are advised because they make code more readable and maintainable in the long term. If you use very short names, e.g. a variable called a, you will forget what that variable is meant for after some time. This becomes more problematic in bigger programs. I don't see an issue in using getAmt() in place of getAmount(), but getPlayerWithMostGoals() is definitely preferable to something like getPlayer().
Long names, short names, it all depends. There are a lot of approaches and discussions but in fact a method's name should reflect its intention. This helps you to further understand the code. Take this example.
public void print(String s)
Nifty name, short, concise... isn't it? Well, actually no, if there's no documentation to tell you what "printing" means. I'd say System.out.println is a way of printing a string, but you could also define printing as saving the string in a file or showing it in a dialog.
public void printInConsole(String s)
Now there are no misunderstandings. Most people will tell you that you can read the method's JavaDoc to understand it, but are you going to read a full paragraph to decide if the method you're going to use does what you need?
IMO, methods should describe at least an action and an entity (if they're related to one). "Long" is also a perception... but really long names make the code hard to structure. It's a matter of getting the proper balance.
As a rule of thumb, I'd avoid abbreviations and use JavaDoc to further describe a method's intention. Descriptive names can be long, but the reward is both readability and self-explanatory code.

API Design for Idiot-Proof Iteration Without Generics

When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot proof.
You might also want to make it compatible with older systems that can't handle generics, like .Net 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics so Java's Iterable<T> is out, as is .Net's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for Item i : items), and the foreach / For Each loop in .Net, and you don't want them to have to do any casting. Basically you want your API to be now-friendly as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a typesafe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, to maintain immutability you have to provide a defensive copy each and every time they access it.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .Net is super-fast too. If you have like:
class Schedule {
private Appointment[] internalArray;
public Appointment[] appointments() {
return (Appointment[]) internalArray.clone();
}
}
people can do like:
for (Appointment a : schedule.appointments()) {
a.doSomething();
}
and it will be simple, clear, type-safe, and fast.
But they could do something like:
for (int i = 0; i < schedule.appointments().length; i++) {
Appointment a = schedule.appointments()[i];
}
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
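To make the cost concrete, here is a small sketch that counts how many defensive copies each loop style triggers. The copiesMade counter, the IterationDemo class, and the simplified Appointment are additions for illustration only and are not part of the API described above.

```java
final class Appointment {
    private final String title;

    Appointment(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }
}

final class Schedule {
    private final Appointment[] internalArray;
    private int copiesMade = 0; // instrumentation for this sketch only

    Schedule(Appointment[] appointments) {
        this.internalArray = (Appointment[]) appointments.clone();
    }

    /** Returns a defensive copy, so callers cannot modify the internal array. */
    Appointment[] appointments() {
        copiesMade++;
        return (Appointment[]) internalArray.clone();
    }

    int copiesMade() {
        return copiesMade;
    }
}

final class IterationDemo {
    private IterationDemo() {}

    /** Calls the accessor on every length check and every element access. */
    static int countRepeated(Schedule schedule) {
        int n = 0;
        for (int i = 0; i < schedule.appointments().length; i++) {
            if (schedule.appointments()[i] != null) {
                n++;
            }
        }
        return n;
    }

    /** Calls the accessor once and reuses the copy. */
    static int countHoisted(Schedule schedule) {
        Appointment[] copy = schedule.appointments();
        int n = 0;
        for (int i = 0; i < copy.length; i++) {
            if (copy[i] != null) {
                n++;
            }
        }
        return n;
    }
}
```

For an array of N appointments, the repeated style makes 2N + 1 copies (N + 1 length checks plus N element reads), while the hoisted style makes exactly one.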
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot proof. Have I failed if a minority of people are unwittingly cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB I happen to be designing this library for both Java and .Net, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
cloning the array once, and iterating over it using the enhanced for loop
iterating over an ArrayList using the enhanced for loop
iterating over an unmodifiable ArrayList (from Collections.unmodifiableList) using the enhanced for loop
iterating over the array the bad way (cloning it repeatedly in the length check and when getting each indexed item).
For 10 objects, the relative speeds (doing multiple repeats and taking the median) were like:
1,000
1,300
1,300
5,000
For 100 objects:
1,300
4,900
6,300
85,500
For 1000 objects:
6,400
51,700
56,200
7,000,300
For 10000 objects:
68,000
445,000
651,000
655,180,000
Rough figures for sure, but enough to convince me of two things:
Cloning, then iterating is definitely not a performance issue. In fact it's consistently faster than using a List. (This is why Java's enum.values() method returns a defensive copy of an array instead of an immutable list.)
If you repeatedly call the method, repeatedly cloning the array unnecessarily, performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.
clone() is fast, but not what I would describe as super-fast.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
interface AppointmentHandler {
public void onAppointment(Appointment appointment);
}
class Schedule {
public void forEachAppointment(AppointmentHandler ah) {
for(Appointment a: internalArray)
ah.onAppointment(a);
}
}
Since you can't really have it both ways, I would suggest that you create a pre-generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone using Java 1.5 or later, they will expect generics, Iterable, and all the newer language features.
I think the usage of arrays should be non-existent. It does not make for an easy to use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.
As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.

What is "string bashing" and why is it bad?

My boss keeps using the term "string bashing" (we're a Java shop) and usually makes an example out of me whenever I ask him anything (as if, I'm supposed to know it already). I Googled the term only to find results pertaining to theoretical physics and string theory.
I am guessing it has something to do with using String/StringBuilders incorrectly or not in keeping with best practices, but for the life of me, I can't figure out what it is.
"String bashing" is a slang term for cutting up strings and manipulating them: splitting, joining, inserting, tokenizing, parsing, etc.
It's not inherently bad (despite the connotation of "bashing"), but as you point out, in Java, one needs to be careful not to use String when StringBuilder would be more efficient.
Why don't you ask your boss for an example of string bashing?
Don't forget to ask him for the correct way of refactoring the examples he gives you.
Out of context, "string bashing" doesn't really have any meaning in itself. It's not a buzz word for any good or bad behaviour. It would just mean "bashing strings", as in using string operations.
Whether that is good or bad depends on what you are doing, and the role of the strings would not really be important. There are good and bad ways of handling any kind of data.
Sometimes "bashing strings" is actually the best solution. Consider for example that you want to pick out the first three characters of a string. You could create a regular expression that isolates the characters, but that would certainly be overkill as there is a simple string operation that can do the same, which is a lot faster and easier to maintain.
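As a rough illustration of that point, the plain string operation is a one-liner. The helper name firstThree and the choice to return short strings unchanged are assumptions made for this sketch.

```java
final class Prefixes {
    private Prefixes() {}

    /** Returns the first three characters, or the whole string if it is shorter. */
    static String firstThree(String s) {
        return s.length() <= 3 ? s : s.substring(0, 3);
    }
}
```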
Effective Java has an item about using strings: "Item 50: Avoid strings where other types are more appropriate". Also on stackoverflow: "Stringly typed".
A guess: It might imply something related to creation of unnecessary temporary objects, and in this particular case Strings. For example, if you're constructing a String token by token then it's usually a good idea to use a StringBuilder. If the String is not built using a builder, each concatenation will cause another temporary object to be created (and later garbage collected).
In modern VMs (I'm thinking HotSpot 1.5 or 1.6) this is rarely a problem unless you're in performance critical code or you're building long strings, e.g. in for loops.
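A minimal sketch of the two styles being contrasted here; TokenJoiner and its method names are made up for illustration. Both methods return the same text, but the first creates a fresh String on every pass through the loop while the second appends into one buffer.

```java
final class TokenJoiner {
    private TokenJoiner() {}

    /** Builds the result with +=, creating a new String on each iteration. */
    static String joinWithConcatenation(String[] tokens) {
        String result = "";
        for (int i = 0; i < tokens.length; i++) {
            result += tokens[i];
        }
        return result;
    }

    /** Builds the result in a single StringBuilder and converts once at the end. */
    static String joinWithBuilder(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            sb.append(tokens[i]);
        }
        return sb.toString();
    }
}
```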
Only a guess; might be better to ask what he or she means? I've never heard the term before.
There are a few results on Google which refer to string bashing in this context. They don't appear to refer to the concern about inefficient temporaries and using StringBuilder.
Instead, it appears to refer to simplistic string parsing. I.e. doing stuff like checking for substrings, slicing the string, etc. In particular, it appears to have the implication of it being a hacky solution to the problem.
It might be seen badly because you should either use real parsing or obtain the data in a non-string format.

When is a Java method name too long? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
In the last few weeks I've seen some people using really long names for a method or class (50 characters), usually on the premise that it improves readability. My opinion is that a name this long is an indicator that we are trying to do too much in that method or class. However, I wanted to know what you think about it.
An Example is:
getNumberOfSkinCareEligibleItemsWithinTransaction
A name in Java, or any other language, is too long when a shorter name exists that equally conveys the behavior of the method.
Some techniques for reducing the length of method names:
If your whole program, or class, or module is about 'skin care items' you can drop skin care. For example, if your class is called SkinCareUtils, that brings you to getNumberOfEligibleItemsWithinTransaction.
You can change within to in: getNumberOfEligibleItemsInTransaction.
You can change Transaction to Tx, which gets you to getNumberOfEligibleItemsInTx.
Or if the method accepts a param of type Transaction you can drop the InTx altogether: getNumberOfEligibleItems.
You can replace numberOf with count: getEligibleItemsCount.
Now that is very reasonable. And it is 60% shorter.
Just for a change, a non-subjective answer: 65536 characters.
A.java:1: UTF8 representation for string "xxxxxxxxxxxxxxxxxxxx..." is too long
for the constant pool
;-)
I agree with everyone: method names should not be too long. I do want to add one exception though:
The names of JUnit test methods, however, can be long and should resemble sentences.
Why?
Because they are not called in other code.
Because they are used as test names.
Because they then can be written as sentences describing requirements. (For example, using AgileDox)
Example:
@Test
public void testDialogClosesDownWhenTheRedButtonIsPressedTwice() {
...
}
See "Behavior Driven Design" for more info on this idea.
Context "...WithinTransaction" should be obvious. That's what object-orientation is all about.
The method is part of a class. If the class doesn't mean "Transaction" -- and if it doesn't save you from having to say "WithinTransaction" all the time, then you've got problems.
Java has a culture of encouraging long names, perhaps because the IDEs come with good autocompletion.
This site says that the longest class name in the JRE is InternalFrameInternalFrameTitlePaneInternalFrameTitlePaneMaximizeButtonWindowNotFocusedState which is 92 chars long.
As for longest method name I have found this one supportsDataDefinitionAndDataManipulationTransactions, which is 52 characters.
Never use a long word when a diminutive one will do.
I don't think your thesis of "length of method name is proportional to length of method" really holds water.
Take the example you give: "getNumberOfSkinCareEligibleItemsWithinTransaction". That sounds to me like it does just one thing: it counts the number of items in a transaction that fall into a certain category. Of course I can't judge without seeing the actual code for the method, but that sounds like a good method to me.
On the other hand, I've seen lots of methods with very short and concise names that do way too much work, like "processSale" or the ever popular "doStuff".
I think it would be tough to give a hard-and-fast rule about method name length, but the goal should be: long enough to convey what the function does, short enough to be readable. In this example, I'd think "getSkinCareCount" would probably have been sufficient. The question is what you need to distinguish. If you have one function that counts skin-care-eligible items in transactions and another that counts skin-care-eligible items in something else, then "withinTransactions" adds value. But if it doesn't mean anything to talk about such items outside of a transaction, then there's no point cluttering up the name with such superfluous information.
Two, I think it's wildly unrealistic to suppose that a name of any manageable length will tell you exactly what the function does in all but the most trivial cases. A realistic goal is to make a name that gives a reader a clue, and that can be remembered later. Like, if I'm trying to find the code that calculates how much antimatter we need to consume to reach warp speed, if I look at function names and see "calibrateTransporter", "firePhasers", and "calcAntimatterBurn", it's pretty clear that the first two aren't it but the third one might be. If I check and find that that is indeed the one I'm looking for, it will be easy to remember that when I come back tomorrow to work on this problem some more. That's good enough.
Three, long names that are similar are more confusing than short names. If I have two functions called "calcSalesmanPay" and "calcGeekPay", I can make a good guess which is which at a quick glance. But if they are called "calculateMonthlyCheckAmountForSalesmanForExportToAccountingSystemAndReconciliation" and "calculateMonthlyCheckAmountForProgrammersForExportToAccountingSystemAndReconciliation", I have to study the names to see which is which. The extra information in the name is probably counter-productive in such cases. It turns a half-second think into a 30-second think.
I tend to use the haiku rule for names:
Seven syllable class names
five for variables
seven for method and other names
These are rules of thumb for max names. I violate this only when it improves readability. Something like recalculateMortgageInterest(currentRate, quoteSet...) is better than recalculateMortgageInterestRate or recalculateMortgageInterestRateFromSet since the fact that it involves rates and a set of quotes should be pretty clear from the embedded docs like javadoc or the .NET equivalent.
NOTE: Not a real haiku, as it is 7-5-7 rather than 5-7-5. But I still prefer calling it haiku.
Design your interface the way you want it to be, and make the implementation match.
For example, maybe I'd write that as
getTransaction().getItems(SKIN_CARE).getEligible().size()
or with Java 8 streams:
getTransaction().getItems().stream()
.filter(item -> item.getType() == SKIN_CARE)
.filter(item -> item.isEligible())
.count();
My rule is as follows: if a name is so long that it has to appear on a line of its own, then it is too long. (In practice, this means I'm rarely above 20 characters.)
This is based upon research showing that the number of visible vertical lines of code positively correlates with coding speed/effectiveness. If class/method names start significantly hurting that, they're too long.
Add a comment where the method/class is declared and let the IDE take you there if you want a long description of what it's for.
The length of the method itself is probably a better indicator of whether it's doing too much, and even that only gives you a rough idea. You should strive for conciseness, but descriptiveness is more important. If you can't convey the same meaning in a shorter name, then the name itself is probably okay.
The next time you are about to write a method name, just think of the quote below:
"The man who is going to maintain your code is a psycho who knows where you stay"
That method name is definitely too long. My mind tends to wander when I am reading such sized method names. It's like reading a sentence without spaces.
Personally, I prefer as few words in methods as possible. You are helped if the package and class name can convey meaning. If the responsibility of the class is very concise, there is no need for a giant method name. I'm curious why "WithinTransaction" is on there.
"getNumberOfSkinCareEligibleItemsWithinTransaction" could become:
com.mycompany.app.product.SkinCareQuery.getNumEligibleItems();
Then when in use, the method could look like "query.getNumEligibleItems()"
A variable name is too long when a shorter name will allow for better code readability over the entire program, or the important parts of the program.
A longer name allows you to convey more information about a value. However, if a name is too long, it will clutter the code and reduce the ability to comprehend the rest of the code. This typically happens by causing line wraps and pushing other lines of code off the page.
The trick is determining which will offer better readability. If the variable is used often or several times in a short amount of space, it may be better to give it a short name and use a comment to clarify. The reader can refer back to the comment easily. If the variable is used often throughout the program, often as a parameter or in other complicated operations, it may be best to trim down the name, or use acronyms as a reminder to the reader. They can always reference a comment by the variable declaration if they forget the meaning.
This is not an easy trade off to make, since you have to consider what the code reader is likely to be trying to comprehend, and also take into account how the code will change and grow over time. That's why naming things is hard.
Readability is why it's acceptable to use i as a loop counter instead of DescriptiveLoopCounterName. Because this is the most common use for a variable, you can spend the least amount of screen space explaining why it exists. The longer name is just going to waste time by making it harder to understand how you are testing the loop condition or indexing into an array.
On the other end of the spectrum, if a function or variable is used rarely as in a complex operation, such as being passed to a multi-parameter function call, you can afford to give it an overly descriptive name.
As with any other language: when it no longer describes the single action the function performs.
I'd say use a combination of the good answers and be reasonable.
Completely, clearly and readably describe what the method does.
If the method name seems too long--refactor the method to do less.
It's too long when the name of the method wraps onto another line and the call to the method is the only thing on the line and starts pretty close to the margin. You have to take into account the average size of the screen of the people who will be using it.
But! If the name seems too long then it probably is too long. The way to get around it is to write your code in such a way that you are within a context and the name is short but duplicated in other contexts. This is like when you can say "she" or "he" in English instead of someone's full name.
It's too long when it explains too verbosely what the thing is about.
For example, these names are functionally equivalent.
in Java: java.sql.SQLIntegrityConstraintViolationException
in Python/Django: django.db.IntegrityError
Ask yourself, in a SQL/db package, how many more types of integrity errors can you come up with? ;)
Hence db.IntegrityError is sufficient.
An identifier name is too long when it exceeds the length your Java compiler can handle.
There are two points of view here. One is that it really doesn't matter how long the method name is, as long as it describes what the method is doing as fully as possible (a basic rule of Java best practices). On the other hand, I agree with the flybywire post. We should use our intelligence to shorten the method name as much as possible, but without reducing its descriptiveness. Descriptiveness is more important :)
A name is too long if it:
Takes more than 1 second to read
Takes up more RAM than you allocate for your JVM
Is something absurdly named
If a shorter name makes perfect sense
If it wraps around in your IDE
Honestly, the name only needs to convey its purpose to the developers who will use it as a public API method or have to maintain the code when you leave. Just remember KISS (keep it simple, stupid).

What is the right way to change an Integer in a Vector in Java (j2me)

As a follow up to my question about j2me dynamic arrays,
I'm now trying to figure out a way to change the values of the Integers in my Vector.
Say I have a Vector v, and Array arr, and ints x, y and i;
In c++ I could do:
v[arr[x][y]] += i;
In j2me the best way I found so far to do the same is:
v.setElementAt(new Integer(((Integer)(v.elementAt(arr[x][y]))).intValue()+i), arr[x][y]);
Is this really the best way to do it j2me?
If it is, what went wrong here? Java is supposed to make me "do less work" and "do things for me" yet I find myself again and again doing extra work for it. Is something wrong with me, or is it some problem with Java?
Edit: I'm using the J2me SDK 3.0 which looks like it is Java 1.3 so no fancy generics and auto boxing and all that stuff.
I'm afraid that's how it is in ME, although I'd split it to avoid that hairy oneliner:
Integer val = (Integer)v.elementAt(arr[x][y]);
int newVal = val.intValue() + i;
v.setElementAt(new Integer(newVal), arr[x][y]);
Stuff got a lot better with autoboxing and generics, but they came in Java 5 and J2ME is basically a stripped version of Java 1.3 unless I've been misinformed. Here's how it looks in Java 5+:
v.setElementAt(v.get(arr[x][y]) + i, arr[x][y]);
Still more verbose than C++, but at least without the casting. I understand there was reluctance to add generics and such to Java as it might be "too hard" for the average programmer to understand [Citation needed]. And so we ended up with unreadable code until .Net got generics and Sun jumped on the bandwagon.
Anywho, I agree the collections framework was a pain to use before generics/boxing, but I hope at least you'll enjoy not having to debug broken pointers and corrupted memory.
Java SE has had some changes to the language (Generics) that would make this code a bit simpler, for ME I'd guess you are out of luck.
I would go for the suggested solution of creating your own class that wraps a plain array (and allocates a bigger one when needed) that was given as an answer to your previous question.
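A rough sketch of such a wrapper, written without generics or autoboxing so that it stays within what a Java 1.3-level compiler accepts. The name IntVector, the doubling growth policy, and the addTo helper are assumptions for illustration and are not taken from the earlier answer.

```java
final class IntVector {
    private int[] data;
    private int size;

    IntVector(int initialCapacity) {
        // Guard against a zero or negative capacity so doubling always grows.
        data = new int[Math.max(1, initialCapacity)];
    }

    /** Appends a value, allocating a bigger backing array when full. */
    void add(int value) {
        if (size == data.length) {
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    int get(int index) {
        checkIndex(index);
        return data[index];
    }

    /** In-place update, the counterpart of v[index] += delta in C++. */
    void addTo(int index, int delta) {
        checkIndex(index);
        data[index] += delta;
    }

    int size() {
        return size;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

With this in place, the update from the question would read v.addTo(arr[x][y], i), with no Integer objects created.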
You have two things here that are conspiring to bloat the code: Lack of a typesafe collection, and an immutable int wrapper.
One solution would be to use a typesafe collection. GNU Trove has TIntArrayList for this:
v.set(arr[x][y], v.get(arr[x][y]) + i);
Alternatively, you can use a mutable class like org.jboss.util.MuInteger:
((MuInteger)v.elementAt(arr[x][y])).add(i);
Or, as a dirty hack, arrays of length 1:
((int[])v.elementAt(arr[x][y]))[0] += i;
If you can combine both (would definitely require you to write a custom collection class, in the absence of Generics):
v.get(arr[x][y]).add(i);
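The mutable-wrapper variant can be sketched with a plain, pre-generics Vector. MutableInt below is a hypothetical stand-in written for this example (it is not the JBoss class), and VectorUpdateDemo.addAt is likewise made up.

```java
import java.util.Vector;

/** Hypothetical mutable int holder, so updates need no new wrapper object. */
final class MutableInt {
    private int value;

    MutableInt(int value) {
        this.value = value;
    }

    int get() {
        return value;
    }

    void add(int delta) {
        value += delta;
    }
}

final class VectorUpdateDemo {
    private VectorUpdateDemo() {}

    /** Adds delta to the element at index; the raw Vector mirrors pre-generics code. */
    static void addAt(Vector v, int index, int delta) {
        ((MutableInt) v.elementAt(index)).add(delta);
    }
}
```

The cast is still needed, but the element is updated in place instead of being replaced with setElementAt.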
I don't see the reason why you wouldn't do it the same way as you do in C++. OK, you have to implement the dynamically scaling array container yourself but if you do, you get rid of the Integer issue where Java actually creates a new Object of type Integer instead of int primitive type.
Your original question has a nice example of a dynamic int primitive type array as an answer, go check it out again.
Vector<Integer> v = new Vector<Integer>();
v.setElementAt( arr[x][y], arr[x][y] );
EDIT: While answering this question, I did not know that J2ME does not support Generics. Thanks to SO for teaching me that :-)
EDIT 2: My solution is wrong since J2ME does not support Generics.
If you have used Generics (JDK 1.5 onwards), you could do it simpler!
If I assume the declaration to be thus,
Vector<Integer> v = new Vector<Integer>();
then
v.setElementAt(new Integer(((Integer)(v.elementAt(arr[x][y]))).intValue()+i), arr[x][y]);
becomes
v.setElementAt(new Integer(v.elementAt(arr[x][y]).intValue()+i), arr[x][y]);
