Groovy overhead of def keyword - java

Not sure if this is the right place for this question, but I was wondering about Groovy's def keyword (and its equivalents in other dynamically or optionally typed languages).
One useful (or at least nice) use of something like this is that you can assign one type of value to a variable and later change it to another type.
For instance, let's say you get a map of two timestamps from your front end that represent a date range:
def filters = [
from: from,
to : to
]
Then, after some validation, you want to pass the date range to your DAO as date objects for SQL queries, so you do something like the following:
if(filters.from && filters.to) {
def normalizedDateRange = DateUtil.buildDateRange(filters.from, filters.to, maxRangeDays)
filters.from = normalizedDateRange.from
filters.to = normalizedDateRange.to
}
This is acceptable, and you get away without needing to create a second map with a very similar name or such. My question is whether this causes too much overhead in languages like this, and whether this is one of the reasons they are slower than, say, Java.
Some people say that you can consider def as using Object in Java, so either it allocates enough space to store anything, or maybe it stores a reference and, when you store something different, it frees the space the old value was taking, allocates new space, and just updates the reference?
Would I gain anything by creating a new object/map here and storing the new values there? Or is the gain so small that it's better to take advantage of Groovy's syntactic sugar and "cheating"?
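For comparison, here is roughly what the question's code does, written in Java, where variables and map entries only ever hold references. This is a minimal sketch, assuming the timestamps arrive as epoch-millisecond Long values. FilterNormalizer and normalize are hypothetical names, and Instant stands in for whatever DateUtil.buildDateRange actually returns:

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

class FilterNormalizer {
    public static void main(String[] args) {
        // Groovy map literals are LinkedHashMaps; with def, the values are effectively Object.
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("from", 0L);
        filters.put("to", 86_400_000L);
        normalize(filters);
        System.out.println(filters);
    }

    // Hypothetical stand-in for DateUtil.buildDateRange: it only turns epoch-millisecond
    // Long timestamps into Instant objects (no max-range check).
    static void normalize(Map<String, Object> filters) {
        Object from = filters.get("from");
        Object to = filters.get("to");
        if (from instanceof Long && to instanceof Long) {
            // put() on an existing key just swaps the reference stored in that entry;
            // the map does not grow, and the old Long objects become garbage once
            // nothing else refers to them.
            filters.put("from", Instant.ofEpochMilli((Long) from));
            filters.put("to", Instant.ofEpochMilli((Long) to));
        }
    }
}
```

Nothing is "reallocated in place": the entry simply points at a different object afterwards.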

def will be lighter, since it is simply a reference, and whatever it used to point at can easily be garbage collected later on.
By storing variables in a map, you are storing each value in a specific structure that calculates hash codes and whatnot. It will be heavier.
Of course a map has wonderful features and one shouldn't overlook this simply based on performance without checking if it is a true bottleneck. You could try writing a microbenchmark.

Remember, Groovy is optionally typed, not dynamically typed. So if you are writing a constant that holds a filter, you can do this to give the compiler and JVM hints on what to do:
static final Map filters = [ to: 'X', from: 'Y' ]

Related

Do something when a variable is (re)assigned Java

This is a far-fetched question and I am not sure how to approach this problem, so I am open to other workarounds or proposals. As far as I am aware, what I am trying to do is impossible, but I'd like a second input.
Assume we have the following Java code:
int val = 4;
I am curious whether some sort of function is called when this statement is executed: an overridable function that assigns a given memory location to this value, or something of that nature.
My objective would be to override that function and store this data here and in a file elsewhere as well.
This would need to work for all data types and for reassignments such as that shown below.
val = getNumber(); // Returns 6;
I would have some sort of direction if I was working with Python, but unfortunately, that is not the case.
My best idea for a solution is to call a function that simply returns the provided argument. Because of how this will be used, I'd like to avoid that and keep the usage of this framework as conventional as possible.
Thanks!
I don't think any kind of function is invoked when we assign values. When we assign a value to a local variable of a primitive type (int, double, ...), the value is stored directly in the method's stack frame. If the data is of a reference type (String, ...), the object itself is stored in heap memory and only the reference (address) is stored in the variable. Fields, by contrast, live inside their object on the heap. Either way, reassigning a variable simply overwrites the previous value (or reference) in place; no method is called, so there is nothing for you to override.
If you want to deny access to a variable outside the class, but still change the state of that variable, then you can use encapsulation concept of OOP in java.
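The encapsulation route is the closest Java gets to the hook the question asks for: keep the value private and route every assignment through a method you control. A minimal sketch; TrackedValue is a hypothetical name, and the in-memory history list stands in for the file the question mentions. The trade-off is that callers write val.set(...) instead of val = ..., because a plain assignment to a local variable cannot be intercepted:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TrackedValueDemo {
    static int getNumber() {
        return 6;
    }

    public static void main(String[] args) {
        TrackedValue<Integer> val = new TrackedValue<>(4); // instead of: int val = 4;
        val.set(getNumber());                              // instead of: val = getNumber();
        System.out.println(val.get() + " " + val.history()); // prints "6 [4, 6]"
    }
}

// Hypothetical wrapper: every assignment goes through set(), so it can be recorded.
final class TrackedValue<T> {
    private T value;
    private final List<T> history = new ArrayList<>();

    TrackedValue(T initial) {
        set(initial);
    }

    T get() {
        return value;
    }

    void set(T newValue) {
        value = newValue;
        history.add(newValue); // a real version might append to a file here instead
    }

    List<T> history() {
        return Collections.unmodifiableList(history);
    }
}
```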
For further clarification, refer to this article about stack vs. heap.

Transform list of Either into list of left and list of right

Vavr's Either seems to solve one of my problems, where some method does a lot of checks and returns either a CalculationError or a CalculationResult.
Either<CalculationError, CalculationResult> calculate (CalculationData calculationData) {
// either returns Either.left(new CalculationError()) or Either.right(new CalculationResult())
}
I have a wrapper which stores both errors and results
class Calculation {
List<CalculationResult> calculationResults;
List<CalculationError> calculationErrors;
}
Is there any neat solution to transform stream from Collection<CalculationData> data to Calculation?
This can easily be done using a custom collector, using Vavr's isLeft(), getLeft() and get() (get() returns the Right value):
import io.vavr.control.Either;
import java.util.stream.Collector;
Collector<Either<CalculationError, CalculationResult>, ?, Calculation> collector = Collector.of(
    Calculation::new,                     // supplier: start with an empty Calculation
    (calc, either) -> {                   // accumulator: route each Either to one list
        if (either.isLeft()) {
            calc.calculationErrors.add(either.getLeft());
        } else {
            calc.calculationResults.add(either.get());
        }
    },
    (calc1, calc2) -> {                   // combiner: merge partial results (parallel streams)
        calc1.calculationErrors.addAll(calc2.calculationErrors);
        calc1.calculationResults.addAll(calc2.calculationResults);
        return calc1;
    }
);
Calculation calc = data.stream()
    .map(this::calculate)
    .collect(collector);
Note that Calculation should initialize its two lists (in the declaration or a new constructor).
Well, you're using vavr, so 'neat' is right out. Tends to happen when you use tools that are hostile to the idiomatic form of the language. But, then again, 'neat' is a nebulous term with no clear defined meaning, so, I guess, whatever you think is 'neat', is therefore 'neat'. Neat, huh?
Either itself has the sequence methods, but both of them work the way Either is supposed to work: they are left-biased in the sense that any Left present is treated as an erroneous condition, which means all the Right values are discarded if even one of your Eithers is a Left. Thus, you cannot use either of the sequence methods to let Either itself bake you a list of the Right values. Even sequenceRight won't do this for you (it stops at the first Left in the list and returns that instead). The filter methods similarly don't work like that. Either very much isn't an Either in the sense of what that word means if you open a dictionary: it does not mean "a homogenous mix of 2 types". It's solely a non-Java-like take on exception management: Right contains the 'answer' and Left contains the 'error' (you're using it correctly), but as a consequence there's nothing in the Either API to help with this task, which in effect involves 'please filter out the errors and then do something'. ("Silently ignore errors" is rarely the right move. It is what is needed here, but it makes sense that the Either API isn't going to hand you a footgun, even if you need it here.)
Thus, we just write it plain jane java:
var calculation = new Calculation();
for (var e : mix) {
    if (e.isLeft()) calculation.calculationErrors.add(e.getLeft());
    if (e.isRight()) calculation.calculationResults.add(e.get());
}
(This presumes your Calculation constructor at least initializes those lists to empty mutables).
NB: Rob Spoor's answer also assumes this and is much, much longer. Sometimes the functional way is the silly, slow, unwieldy, hard to read, way.
NB2: Either.sequence(mix).orElseRun(s -> calculation.calculationErrors = s.asJava()); is a rather 'neat' way (perhaps; it's in the eye of the beholder) of setting up the errors field of your Calculation class (note that asJava() returns an immutable view). No joy for such a 'neat' trick to fill the 'results' part of it all, however. That's what the bulk of my answer is trying to explain: there is no nice API for that in Either, and it's probably by design, as that involves intentionally ignoring the errors in the list of Eithers.
Since you are using VAVr, you may consider using Traversable instead of Collection. This will give you the method partition, which can be used to classify your list of Eithers into two groups like so:
Traversable<Either<CalculationError, CalculationResult>> calculations = ...;
var partitionedCalcs = calculations.partition(Either::isRight);  // _1: the Rights, _2: the Lefts
var results = partitionedCalcs._1.map(Either::get).toJavaList();
var errors = partitionedCalcs._2.map(Either::getLeft).toJavaList();
Calculation calcs = new Calculation(results, errors);
If you don't want to change your existing use of Collection to Traversable, you can easily convert between them using, for example, List.ofAll(Iterable) and Value.toJavaCollection(Function).
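If bringing in Traversable is not an option, the JDK's Collectors.partitioningBy performs the same two-way split on a plain java.util stream. The sketch below uses a hypothetical minimal SimpleEither class (not Vavr's Either) only so that it compiles with the JDK alone; with Vavr you would partition on Either::isRight and map with Either::get / Either::getLeft in the same way:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;

final class EitherSplitter {
    static <L, R> Split<L, R> split(List<SimpleEither<L, R>> eithers) {
        // partitioningBy always returns both keys: true -> Rights, false -> Lefts
        Map<Boolean, List<SimpleEither<L, R>>> parts =
                eithers.stream().collect(Collectors.partitioningBy(SimpleEither::isRight));
        List<R> rights = parts.get(true).stream().map(SimpleEither::get).collect(Collectors.toList());
        List<L> lefts = parts.get(false).stream().map(SimpleEither::getLeft).collect(Collectors.toList());
        return new Split<>(lefts, rights);
    }

    public static void main(String[] args) {
        List<SimpleEither<String, Integer>> mix = Arrays.asList(
                SimpleEither.<String, Integer>right(1),
                SimpleEither.<String, Integer>left("bad input"),
                SimpleEither.<String, Integer>right(3));
        Split<String, Integer> result = split(mix);
        System.out.println(result.rights + " / " + result.lefts); // [1, 3] / [bad input]
    }
}

// Holder for the two halves (plays the role of the question's Calculation class).
final class Split<L, R> {
    final List<L> lefts;
    final List<R> rights;

    Split(List<L> lefts, List<R> rights) {
        this.lefts = lefts;
        this.rights = rights;
    }
}

// Hypothetical minimal Either stand-in, NOT Vavr's class.
final class SimpleEither<L, R> {
    private final L left;
    private final R right;
    private final boolean isRight;

    private SimpleEither(L left, R right, boolean isRight) {
        this.left = left;
        this.right = right;
        this.isRight = isRight;
    }

    static <L, R> SimpleEither<L, R> left(L value) {
        return new SimpleEither<>(value, null, false);
    }

    static <L, R> SimpleEither<L, R> right(R value) {
        return new SimpleEither<>(null, value, true);
    }

    boolean isRight() {
        return isRight;
    }

    L getLeft() {
        if (isRight) throw new NoSuchElementException("getLeft() on a Right");
        return left;
    }

    R get() {
        if (!isRight) throw new NoSuchElementException("get() on a Left");
        return right;
    }
}
```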

Does it lead to data corruption when we access an outside object in a Java 8 map function?

I have the object customerSummary at line #2 and access it at lines #11 and #12. Does it lead to data corruption in production?
private CustomerSummary enrichCustomerIdentifiers(CustomerSummaryDTO customerSummaryDTO) {
CustomerSummary customerSummary = customerSummaryDTO.getCustomerSummary();
List<CustomerIdentifier> customerIdentifiers = customerSummary
.getCustomerIdentifiers().stream()
.peek(customerIdentifier -> {
if (getCustomerReferenceTypes().contains(customerIdentifier.getIdentifierType())) {
customerIdentifier.setRefType(RefType.REF.toString());
} else {
customerIdentifier.setRefType(RefType.TAX.toString());
Country country = new Country();
country.setIsoCountryCode(customerSummary.getCustomerAddresses().get(0).getIsoCountryCode());
country.setCountryName(customerSummary.getCustomerAddresses().get(0).getCountryName());
customerIdentifier.setCountry(country);
}
}).collect(Collectors.toList());
customerSummary.setCustomerIdentifiers(customerIdentifiers);
return customerSummary;
}
The literal answer to your question is No ... assuming that the access is thread-safe.
But your code probably doesn't do what you think it does.
The peek() method returns the precise stream of objects that it is called on. So your code is effectively doing this:
summary.setCustomerIdentifiers(
new SomeListClass<>(summary.getCustomerIdentifiers()));
... while doing some operations on the identifier objects.
You are (AFAIK unnecessarily) copying the list and reassigning it to the field of the summary object.
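A quick way to convince yourself of what the peek() + collect() combination does. This sketch uses a hypothetical mutable Item class in place of CustomerIdentifier:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PeekCopyDemo {
    // Same shape as the question: mutate each element in peek(), then collect.
    static List<Item> peekAndCollect(List<Item> items) {
        return items.stream()
                .peek(item -> item.refType = "TAX")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> original = new ArrayList<>(Arrays.asList(new Item(), new Item()));
        List<Item> copy = peekAndCollect(original);
        System.out.println(copy.equals(original));   // true: same elements, same order
        System.out.println(copy == original);        // false: collect() built a new list
        System.out.println(original.get(0).refType); // TAX: the elements were mutated in place
    }
}

// Hypothetical mutable element standing in for CustomerIdentifier.
class Item {
    String refType;
}
```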
It would be simpler AND more efficient to write it as:
for (CustomerIdentifier id: summary.getCustomerIdentifiers()) {
if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
id.setRefType(RefType.REF.toString());
} else {
id.setRefType(RefType.TAX.toString());
Country country = new Country();
Address address = summary.getCustomerAddresses().get(0);
country.setIsoCountryCode(address.getIsoCountryCode());
country.setCountryName(address.getCountryName());
id.setCountry(country);
}
}
You could do the above using list.stream().forEach() or list.forEach(), but the code is (IMO) neither simpler nor substantially more concise than a plain loop.
summary.getCustomerIdentifiers().forEach(
id -> {
if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
id.setRefType(RefType.REF.toString());
} else {
id.setRefType(RefType.TAX.toString());
Country country = new Country();
Address address = summary.getCustomerAddresses().get(0);
country.setIsoCountryCode(address.getIsoCountryCode());
country.setCountryName(address.getCountryName());
id.setCountry(country);
}
}
);
(A final micro-optimization would be to declare and initialize address outside of the loop.)
Java 8 streams are not the solution to all problems.
The direct answer to your question is a resounding 'no', but you're misusing streams, which presumably is part of why you are even asking this question. You're operating on mutables in stream code, which you shouldn't be doing. That's why I'm saying 'misusing': this code compiles and works, but it leads to code that is hard to read and hard to maintain, and that will fail in weird ways as you use more and more of the stream API. The solution is not to go against the grain so much.
You're also engaging in 'stringly typed' code, which is another style mistake.
Finally, your collect call is misleading.
So, to answer the question:
Does it lead to data corruption in production?
No. How would you imagine it would?
Style mistake #1: mutables
Streams don't work nearly as well when you're working with mutables. The general idea is that you have immutable classes: classes without any setters, whose instances cannot change after construction. String is immutable, as are Integer and BigDecimal. There is no .setValue() on an Integer instance, and there is no setChar() on a string, or even a clear() or an append(). All operations on immutables that appear to modify things actually return a new instance that contains the result of the operation: someBigDecimal.add() doesn't change what someBigDecimal is pointing at; it constructs a new BigDecimal instance and returns that.
With immutables, if you want to change things, Stream's map method is the right one to use: For example, if you have a stream of BigDecimal objects and you want to, say, print them all, but with 2.5 added to them, you'd be calling map: You want to map each input BigDecimal into an output BD by asking the BD instance to make a new BD instance by adding 2.5 to itself.
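The BigDecimal example as code: map() turns each input into a new instance and leaves the inputs alone. A minimal sketch; the class and method names are illustrative:

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ImmutableMapDemo {
    private static final BigDecimal TWO_AND_A_HALF = new BigDecimal("2.5");

    // map() asks each immutable BigDecimal for a *new* instance; the inputs are untouched.
    static List<BigDecimal> addTwoAndAHalf(List<BigDecimal> values) {
        return values.stream()
                .map(bd -> bd.add(TWO_AND_A_HALF))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<BigDecimal> input = Arrays.asList(new BigDecimal("1"), new BigDecimal("10.25"));
        System.out.println(addTwoAndAHalf(input)); // [3.5, 12.75]
        System.out.println(input);                 // [1, 10.25] (unchanged)
    }
}
```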
With mutables, both map and peek are more relevant. Style debates are rife on what to do. peek just lets you witness what's going through a stream pipeline. It can be misleading because stream pipelines don't process anything until you stick a terminal operation on the end (something like collect(), max() or whatnot). When talking about mutables, peek in theory works just as well as map does, and some people (evidently including the authors of IntelliJ's auto-suggestions) are of the belief that a map operation that really just mutates the underlying object in the stream and returns the same reference is a style violation and should be replaced with a peek operation instead.
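The laziness point is easy to check: a peek() action does not run until a terminal operation pulls elements through. A small sketch (the class name is illustrative). It uses collect() as the terminal operation on purpose, because the Stream.count() documentation allows an implementation to skip executing the pipeline when it can compute the count directly from the source, in which case peek() would never run at all:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPeekDemo {
    // Returns {peeks seen before the terminal operation, peeks seen after it}.
    static int[] countPeeks(List<String> words) {
        AtomicInteger seen = new AtomicInteger();
        Stream<String> pipeline = words.stream().peek(w -> seen.incrementAndGet());
        int before = seen.get();               // nothing has flowed through the pipeline yet
        pipeline.collect(Collectors.toList()); // the terminal operation drives the pipeline
        int after = seen.get();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = countPeeks(Arrays.asList("a", "b", "c"));
        System.out.println(counts[0] + " then " + counts[1]); // 0 then 3
    }
}
```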
But the far more relevant observation is that stream operations should not be mutating anything at all. Do not call setters.
You have 2 options:
Massively refactor this code: make CustomerIdentifier immutable (get rid of the setters, make all fields final, consider adding withers and builders and the like), and change your peek code to something like:
.map(identifier -> {
if (....) return identifier.with(RefType.REF);
return identifier.withCountry(new Country(summary.get..., summary.get...));
})
Note that Country also needs this treatment.
Do not use streams.
This is much simpler. This code is vastly less confusing and better style if you just write a foreach loop. I have no idea why you thought streams were appropriate here. Streams are not 'better'. A problem is that adherents of functional style are so incredibly convinced they are correct they spread copious FUD (Fear, Uncertainty, Doubt) about non-functional approaches and strongly insinuate that functional style is 'just better'. This is not true - it's merely a different style that is more suitable to some domains and less to others. This style goes a lot further than just 'turn for loops into streams', and unawareness of what 'functional style' really means just leads to hard to maintain, hard to read, weird code like what you pasted.
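For the first option (immutables plus map), a rough sketch could look like the following. It is only an illustration: ImmutableIdentifier, its nested RefType enum, withRefType and the identifier type names are hypothetical stand-ins for the question's classes, and Country is left out for brevity:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class WitherDemo {
    // map() builds new identifiers; neither the input list nor its elements change.
    static List<ImmutableIdentifier> classify(List<ImmutableIdentifier> ids, List<String> referenceTypes) {
        return ids.stream()
                .map(id -> id.withRefType(referenceTypes.contains(id.identifierType())
                        ? ImmutableIdentifier.RefType.REF
                        : ImmutableIdentifier.RefType.TAX))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ImmutableIdentifier> ids = Arrays.asList(
                new ImmutableIdentifier("ACCOUNT", null),
                new ImmutableIdentifier("VAT", null));
        List<ImmutableIdentifier> classified = classify(ids, Arrays.asList("ACCOUNT"));
        System.out.println(classified.get(0).refType() + " " + classified.get(1).refType()); // REF TAX
        System.out.println(ids.get(0).refType()); // null: the originals are untouched
    }
}

// Hypothetical immutable version of CustomerIdentifier: final fields, no setters,
// and a "wither" that returns a modified copy.
final class ImmutableIdentifier {
    enum RefType { REF, TAX }

    private final String identifierType;
    private final RefType refType;

    ImmutableIdentifier(String identifierType, RefType refType) {
        this.identifierType = identifierType;
        this.refType = refType;
    }

    String identifierType() {
        return identifierType;
    }

    RefType refType() {
        return refType;
    }

    ImmutableIdentifier withRefType(RefType newRefType) {
        return new ImmutableIdentifier(identifierType, newRefType);
    }
}
```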
I really, really want to use streams here
This is just a bad idea here (unless you do the full rewrite to immutables), but if you MUST, the actual right answer is not what IntelliJ said; it's to use forEach, which is peek and the terminal operation in one package. It gets rid of the pointless collect call (which just recreates a list that is 100% identical to what customerSummary.getCustomerIdentifiers() returns) and properly represents what is actually happening: you're not writing code that witnesses what is flowing through the stream pipe, you're writing code that you intend to execute on each element in the stream.
But that's still much worse than this:
CustomerSummary summary = customerSummaryDTO.getCustomerSummary();
for (CustomerIdentifier identifier : summary.getCustomerIdentifiers()) {
    if (getCustomerReferenceTypes().contains(identifier.getIdentifierType())) {
        identifier.setRefType(RefType.REF.toString());
    } else {
        identifier.setRefType(RefType.TAX.toString());
        Country country = new Country();
        country.setIsoCountryCode(summary.getCustomerAddresses().get(0).getIsoCountryCode());
        country.setCountryName(summary.getCustomerAddresses().get(0).getCountryName());
        identifier.setCountry(country);
    }
}
return summary;
Style mistake #2: stringly typing
Why isn't the refType field in CustomerIdentifier just RefType? Why are you converting RefType instances to strings and back?
Many DB engines support enums, and if they don't, the in-between layer (your DTO) should support marshalling enums into strings and back.
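One way to keep the field enum-typed all the way through is to convert to and from strings only at the boundary, using the built-in name() and valueOf(). A sketch; TypedIdentifier and RefTypeColumnMapper are hypothetical names. If you happen to use JPA, @Enumerated(EnumType.STRING) performs this kind of mapping for you:

```java
final class RefTypeColumnMapper {
    // enum -> column value
    static String toColumn(RefType type) {
        return type.name();
    }

    // column value -> enum; valueOf throws IllegalArgumentException for unknown names
    static RefType fromColumn(String column) {
        return RefType.valueOf(column);
    }

    public static void main(String[] args) {
        TypedIdentifier id = new TypedIdentifier();
        id.setRefType(RefType.TAX);                 // no RefType.TAX.toString() needed
        String stored = toColumn(id.getRefType());  // "TAX", only at the storage boundary
        System.out.println(fromColumn(stored) == RefType.TAX); // true
    }
}

// Hypothetical: the field is typed as the enum itself instead of a String.
final class TypedIdentifier {
    private RefType refType;

    RefType getRefType() {
        return refType;
    }

    void setRefType(RefType refType) {
        this.refType = refType;
    }
}

enum RefType { REF, TAX }
```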

What's the limit to the number of members you can have in a java enum?

Assuming you have a hypothetical enum in Java like this (purely for demonstration purposes; this isn't code I'm seriously expecting to use):
enum Example{
FIRST,
SECOND,
THIRD,
...
LAST;
}
What's the maximum number of members you could have inside that enum before the compiler stops you?
Secondly, is there any performance difference at runtime when your code is referencing an enum with say, 10 members as opposed to 100 or 1,000 (other than just the obvious memory overhead required to store the large class)?
The language specification itself doesn't impose a limit. However, the class file format has several limitations that bound the number of enum constants, with the upper bound being around 65,536 (2^16):
Number of Fields
JVMS §4.1 defines fields_count as an unsigned 16-bit value, so a ClassFile may have at most 65,535 (2^16 - 1) fields. Enum constants are stored in the class file as static fields, so the enum constants and the enum's other member fields together cannot exceed 65,535.
Constant Pool
The JVMS also uses a 16-bit count for the constant pool, so a class can have fewer than 65,536 constant pool entries. The constant pool stores all String literals, type literals, the supertype, superinterface types, method signatures, method names, AND enum constant names. So there must be fewer than 2^16 enum values, since their name strings need to share that constant pool limit.
Static Method Initialization
The maximum size of a method is 65,535 bytes of bytecode, so the enum's static initializer has to be smaller than 64 KB. While the compiler could split it into several methods (see Bug ID 4262078) to distribute the initialization into smaller blocks, it doesn't currently do that.
Long story short, there is no easy answer, and the answer depends not only on the number of enum values there are, but also the number of methods, interfaces, and fields the enums have!
The best way to find out the answer to this type of question is to try it. Start with a little Python script to generate the Java files:
n = int(input())
print("class A{public static void main(String[] a){}enum B{")
print(','.join("C%d" % x for x in range(n)))
print('}}')
Now try with 1,10,100,1000... works fine, then BAM:
A.java:2: code too large
C0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19,C20,C21,C22,...
Seems like I hit some sort of internal limit. I'm not sure if it's a documented limit, if it's dependent on the specific version of my compiler, or if it's some system-dependent limit. But for me the limit was around 3000 and appears to be related to the size of the generated code (the "code too large" error refers to the 64 KB bytecode limit on a single method, here the enum's static initializer). Maybe you could write your own compiler to bypass this limit.
The maximum number of enum values will I think be just under the 65536 maximum number of fields/constant pool entries in the class. (As I mentioned in a comment above, the actual values shouldn't take up constant pool entries: they can be "inlined" into the bytecode, but the names will.)
As far as the second question is concerned, there's no direct performance difference, but it's conceivable that there'll be small indirect performance differences, partly because of the class file size, as you say. Another thing to bear in mind is that when you use enum collections, there are optimised versions of some of the classes for when all of the enum values fit within a certain range (in OpenJDK, EnumSet uses a single long as its bit vector when the enum has at most 64 constants). So yes, there could be a small difference. I wouldn't get paranoid, though.
This is an extension of the comments to the original question.
There are multiple problems with having a LOT of enums.
The main reason is that when you have a lot of data, it tends to change, or if not, you often want to add new items. There are exceptions to this, like unit conversions that would never change, but for the most part you want to read data like this from a file into a collection of classes rather than an enum.
Adding new items is problematic because, since it's an enum, you need to physically modify your code, unless you are ALWAYS using the enums as a collection, and if you are ALWAYS using them as a collection, why make them enums at all?
Then there's the case where your data doesn't change, like "conversion units" for converting feet, inches, etc. You COULD do this as enums and there WOULD be a lot of them, but by coding them as enums you lose the ability to have data drive your program. For instance, a user could select from a pull-down list populated by your "Units", but again, this is not an "ENUM" usage; it's using it as a collection.
The other problem will be repetition around the references to your enum. You will almost certainly have something very repetitive like:
if(userSelectedCard() == cards.HEARTS)
graphic=loadFile("Heart.jpg");
if(userSelectedCard() == cards.SPADES)
graphic=loadFile("Spade.jpg");
Which is just wrong (If you can squint to where you can't read the letters and see this kind of pattern in your code, you KNOW you are doing it wrong).
If the cards were stored in a card collection, it would be easier to just use:
graphic=cards.getGraphicFor(userSelectedCard());
I'm not saying that this can't be done with an enum as well, but I am saying that I can't see how you would use these as enums without having some nasty code-block like the one I posted above.
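For the record, the if-chain can also be avoided with an enum by letting each constant carry its own data. A sketch, assuming a hypothetical enum Suit in place of the answer's cards and keeping only the two file names from the example; loadFile itself is left out:

```java
class SuitDemo {
    public static void main(String[] args) {
        Suit selected = Suit.SPADES; // stands in for userSelectedCard()
        // One lookup replaces the whole if-chain: graphic = loadFile(selected.graphicFile());
        System.out.println(selected.graphicFile()); // Spade.jpg
    }
}

// Hypothetical enum: each constant carries its own graphic file name.
enum Suit {
    HEARTS("Heart.jpg"),
    SPADES("Spade.jpg");

    private final String graphicFile;

    Suit(String graphicFile) {
        this.graphicFile = graphicFile;
    }

    String graphicFile() {
        return graphicFile;
    }
}
```

An EnumMap<Suit, String> would work just as well if you'd rather keep the file names out of the enum itself.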
I'm also not saying that there aren't cases for enums--there are lots of them, but when you get more than a few (7 was a good number), you're probably better off with some other structure.
I guess the exception is when you are modeling real-world stuff that has that many types and each must be addressed with different code, but even then you are probably better off using a data file to bind a name to some code to run and storing them in a hash so you can invoke them with code like: hash.get(nameString).executeCode(). This way, again, your "nameString" is data and not hard-coded, allowing refactoring elsewhere.
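The hash.get(nameString).executeCode() idea can be sketched with a Map from names to functional interfaces. ConversionRegistry and the unit names are hypothetical; in practice the names (and which conversion each one selects) would come from a data file rather than from main:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical registry: the keys are plain data, and each maps to the code to run.
final class ConversionRegistry {
    private final Map<String, DoubleUnaryOperator> conversions = new HashMap<>();

    void register(String name, DoubleUnaryOperator conversion) {
        conversions.put(name, conversion);
    }

    // Plays the role of hash.get(nameString).executeCode()
    double convert(String name, double value) {
        DoubleUnaryOperator conversion = conversions.get(name);
        if (conversion == null) {
            throw new IllegalArgumentException("Unknown conversion: " + name);
        }
        return conversion.applyAsDouble(value);
    }

    public static void main(String[] args) {
        ConversionRegistry registry = new ConversionRegistry();
        registry.register("feet->inches", feet -> feet * 12);
        registry.register("inches->feet", inches -> inches / 12);
        System.out.println(registry.convert("feet->inches", 2)); // 24.0
    }
}
```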
If you get in the habit of brutally factoring your code like this, you can reduce many programs by 50% or more in size.
If you have to ask, you're probably doing something wrong. The actual limit is probably fairly high, but an enum with more than 10 or so values would be highly suspect, I think. Break that up into related collections, or a type hierarchy, or something.

Why can't strings be mutable in Java and .NET?

Why is it that they decided to make String immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?
According to Effective Java, chapter 4, page 73, 2nd edition:
"There are many good reasons for this: Immutable classes are easier to design, implement, and use than mutable classes. They are less prone to error and are more secure.
[...]
Immutable objects are simple. An immutable object can be in exactly one state, the state in which it was created. If you make sure that all constructors establish class invariants, then it is guaranteed that these invariants will remain true for all time, with no effort on your part.
[...]
Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads accessing them concurrently. This is far and away the easiest approach to achieving thread safety. In fact, no thread can ever observe any effect of another thread on an immutable object. Therefore, immutable objects can be shared freely."
[...]
Other small points from the same chapter:
Not only can you share immutable objects, but you can share their internals.
[...]
Immutable objects make great building blocks for other objects, whether mutable or immutable.
[...]
The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
There are at least two reasons.
First - security http://www.javafaq.nu/java-article1060.html
The main reason why String made immutable was security. Look at this example: We have a file open method with login check. We pass a String to this method to process authentication which is necessary before the call will be passed to OS. If String was mutable it was possible somehow to modify its content after the authentication check before OS gets request from program then it is possible to request any file. So if you have a right to open text file in user directory but then on the fly when somehow you manage to change the file name you can request to open "passwd" file or any other. Then a file can be modified and it will be possible to login directly to OS.
Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html
JVM internally maintains the "String Pool". To achive the memory efficiency, JVM will refer the String object from pool. It will not create the new String objects. So, whenever you create a new string literal, JVM will check in the pool whether it already exists or not. If already present in the pool, just give the reference to the same object or create the new object in the pool. There will be many references point to the same String objects, if someone changes the value, it will affect all the references. So, sun decided to make it immutable.
Actually, the reasons strings are immutable in Java don't have much to do with security. The two main reasons are the following:
Thread Safety:
Strings are an extremely widely used type of object, so they are more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share them among threads. Having immutable strings ensures that when thread A passes a string to thread B, thread B cannot unexpectedly modify thread A's string.
Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.
Performance:
While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals (and other compile-time constant strings) are interned automatically. This means that only the strings which are the same in your source code will share the same String object. If your program dynamically creates strings that are the same, they will be represented by different objects.
More importantly, immutability allows strings to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the first five characters of a String. In Java, you would call myString.substring(0,5). In this case, what the substring() method does is simply create a new String object that shares myString's underlying char[] but knows that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:
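To make the interning point concrete: literals are pooled, while an equal string built at run time is a separate object unless you call intern(). A small sketch with illustrative names; == is used deliberately here to compare identity, not content:

```java
class InternDemo {
    static boolean literalsShared() {
        String a = "dog";
        String b = "dog";
        return a == b; // identical literals refer to the same pooled instance
    }

    static boolean runtimeStringShared() {
        String prefix = "do";        // not declared final, so prefix + "g" is not a compile-time constant
        String built = prefix + "g"; // created at run time: equal content, but a new object
        return built == "dog";
    }

    static boolean internedShared() {
        String prefix = "do";
        return (prefix + "g").intern() == "dog"; // intern() returns the pooled instance
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());      // true
        System.out.println(runtimeStringShared()); // false
        System.out.println(internedShared());      // true
    }
}
```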
 | myString                                |
 v                                         v
"The quick brown fox jumps over the lazy dog"   <-- shared char[]
 ^    ^
 |    | myString.substring(0,5)
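This makes these kinds of operations extremely cheap, and O(1), since the operation depends neither on the length of the original string nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[]. (Note that this describes older JDKs: since Java 7 update 6, substring() copies the selected characters into a new array instead of sharing the original one.)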
Thread safety and performance. If a string cannot be modified it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See wikipedia on immutability.
One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.
Unfortunately most of the current programming languages default to mutability, but hopefully in the future the default is more on immutablity (see A Wish List for the Next Mainstream Programming Language).
Wow! I can't believe the misinformation here. Strings being immutable has nothing to do with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), there would certainly be plenty of other opportunities available for hacking.
It's quite a novel idea that the immutability of String is addressing threading issues. Hmmm... I have an object that is being changed by two different threads. How do I resolve this? Synchronize access to the object? Naawww... let's not let anyone change the object at all; that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can remove the synchronized construct from the Java language.
The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a String literal. The drawback of this optimization is that runtime code that modifies a String literal introduces a problem, because it is modifying the instance for all other code that shares it. For example, it would not be good for a function somewhere in an application to change the String literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change String literals (i.e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing String literals into a special read-only memory segment that would cause a memory fault if a write attempt was made.
In Java this is known as interning. The Java compiler here is just following a standard memory optimization done by compilers for decades. And to address the same issue of these String literals being modified at runtime, Java simply makes the String class immutable (i.e., gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of String literals did not occur.
String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.
A value is something you can trust won't change behind your back.
If you write: String str = someExpr();
You don't want it to change unless YOU do something with str.
String as an Object has naturally pointer semantics, to get value semantics as well it needs to be immutable.
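The value-semantics argument can be seen by comparing aliases of a String with aliases of a mutable StringBuilder. A minimal sketch with illustrative names:

```java
class ValueSemanticsDemo {
    // A String alias never observes a "change" made through another variable...
    static String stringAlias() {
        String original = "dog";
        String alias = original;
        original = original.replace("dog", "cat"); // replace() returns a new String
        return alias;                              // still "dog"
    }

    // ...whereas an alias of a mutable StringBuilder sees every modification.
    static String builderAlias() {
        StringBuilder original = new StringBuilder("dog");
        StringBuilder alias = original;
        original.replace(0, 3, "cat");             // mutates the one shared object
        return alias.toString();                   // "cat"
    }

    public static void main(String[] args) {
        System.out.println(stringAlias());  // dog
        System.out.println(builderAlias()); // cat
    }
}
```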
One factor is that, if Strings were mutable, objects storing Strings would have to be careful to store copies, lest their internal data change without notice. Given that Strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).
I know this is a bump, but...
Are they really immutable?
Consider the following.
public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
    fixed (char* ptr = s)
    {
        *((char*)(ptr + i)) = c;
    }
}
...
string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 123
// Caution: "abc" is an interned literal, so other uses of the "abc" literal see the change too.
You could even make it an extension method.
public static class Extensions
{
    public static unsafe void MutableReplaceIndex(this string s, char c, int i)
    {
        fixed (char* ptr = s)
        {
            *((char*)(ptr + i)) = c;
        }
    }
}
Which makes the following work
s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);
Conclusion: they're in an immutable state that the compiler knows about. Of course, the above only applies to .NET strings, as Java doesn't have pointers. However, a string can be made entirely mutable using pointers in C#. This is not how pointers are intended to be used, it has little practical use, and it is not safe; it is, however, possible, which bends the whole "immutable" rule. Normally you cannot modify a character of a string directly by index, and pointers are the way around that. This could be prevented by disallowing pointers to strings or by making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.
For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.
Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.
You should know why. Just think about it.
I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to use a single word, "string", to describe a complex, contextually situated concept or class of object.
We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.
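One concrete place where code relies on a string's value staying put is hashing. The sketch below uses a `List<Character>` as a stand-in for a hypothetical mutable string and stores it as a HashMap key. Once the key is mutated after insertion, the entry can no longer be found by either its old or its new value. An immutable String key does not have this problem. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // List<Character> stands in for a hypothetical mutable string.
    static boolean mutableKeyStillFound() {
        List<Character> key = new ArrayList<>(List.of('d', 'o', 'g'));
        Map<List<Character>, String> map = new HashMap<>();
        map.put(key, "animal");
        key.set(0, 'c'); // mutate the key after its hash code was used for storage
        // The entry was filed under the hash of "dog". A lookup by the old value
        // fails because the stored key now equals "cog", and a lookup by the new
        // value fails because its hash differs from the stored one.
        return map.containsKey(List.of('d', 'o', 'g')) || map.containsKey(key);
    }

    // With an immutable String key, nothing can change the stored key.
    static boolean stringKeyStillFound() {
        String key = "dog";
        Map<String, String> map = new HashMap<>();
        map.put(key, "animal");
        key = "cog"; // rebinding the variable cannot touch the stored key
        return map.containsKey("dog");
    }

    public static void main(String[] args) {
        System.out.println(mutableKeyStillFound()); // false
        System.out.println(stringKeyStillFound());  // true
    }
}
```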
When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it can generally be described as a meaningful, atomic unit of coded data, like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.
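A sketch of the "manipulate characters, then produce a string" pattern in Java. The characters are edited freely in a StringBuilder, and toString() takes an independent immutable snapshot once the value is complete. The helper names are made up for illustration.

```java
class BuildThenFreeze {
    // Manipulate characters in a mutable StringBuilder, then produce an
    // immutable String once the result is complete.
    static String joinCapitalized(String[] words, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) sb.append(separator);
            sb.append(words[i]);
        }
        if (sb.length() > 0) {
            sb.setCharAt(0, Character.toUpperCase(sb.charAt(0))); // in-place edit
        }
        return sb.toString();
    }

    // toString() returns a snapshot: later edits to the builder don't affect it.
    static boolean snapshotIsIndependent() {
        StringBuilder sb = new StringBuilder("cat");
        String frozen = sb.toString();
        sb.setCharAt(0, 'b'); // mutate the builder after the snapshot
        return frozen.equals("cat") && sb.toString().equals("bat");
    }

    public static void main(String[] args) {
        System.out.println(joinCapitalized(new String[] {"hello", "world"}, ' ')); // Hello world
        System.out.println(snapshotIsIndependent()); // true
    }
}
```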
Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:
string GetPersonalInfo( string username, string password )
{
    string stored_password = DBQuery.GetPasswordFor( username );
    if (password == stored_password)
    {
        // suppose another thread modifies the mutable 'username' string here
        return DBQuery.GetPersonalInfoFor( username );
    }
    throw new UnauthorizedAccessException(); // every code path must return or throw
}
Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.
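The same check-then-use hazard can be reproduced deterministically in Java. This is a sketch under stated assumptions: the `PASSWORDS` and `INFO` maps are hypothetical in-memory stand-ins for the `DBQuery` calls above, a StringBuilder plays the role of a mutable string, and the `between` callback simulates "another thread" running between the check and the use, so no real threads are involved.

```java
import java.util.Map;

class CheckThenUseDemo {
    // Hypothetical stand-ins for DBQuery.GetPasswordFor / GetPersonalInfoFor.
    private static final Map<String, String> PASSWORDS = Map.of("alice", "pw-a", "bob", "pw-b");
    private static final Map<String, String> INFO = Map.of("alice", "alice's info", "bob", "bob's info");

    // Mutable username: the 'between' hook can change it after the password
    // check but before the lookup, redirecting the lookup to another user.
    static String getInfoMutable(StringBuilder username, String password, Runnable between) {
        if (!password.equals(PASSWORDS.get(username.toString()))) return null;
        between.run(); // simulates a concurrent modification
        return INFO.get(username.toString());
    }

    // Immutable username: the value that was checked is the value that is used.
    static String getInfoImmutable(String username, String password, Runnable between) {
        if (!password.equals(PASSWORDS.get(username))) return null;
        between.run(); // nothing this hook does can alter 'username'
        return INFO.get(username);
    }

    public static void main(String[] args) {
        StringBuilder name = new StringBuilder("alice");
        // Alice authenticates, but the name is swapped to "bob" mid-call.
        System.out.println(getInfoMutable(name, "pw-a", () -> name.replace(0, name.length(), "bob")));
        System.out.println(getInfoImmutable("alice", "pw-a", () -> { }));
    }
}
```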
Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.
Later edit: In Java you will find GuardedString (in the Identity Connectors framework, not the JDK), a similar implementation.
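Without pulling in a library, the plain-JDK idiom behind classes like SecureString is to keep a secret in a `char[]` (for example, `java.io.Console.readPassword()` returns one) so it can be overwritten once it is no longer needed. An immutable String cannot be wiped and stays in memory until it is garbage collected. The sketch below shows only this idiom, not GuardedString's API, and the method name is illustrative.

```java
import java.util.Arrays;

class WipeSecretDemo {
    // Compare a secret, then overwrite it whether or not the check succeeded.
    // This is only possible because char[] is mutable; a String could not be wiped.
    static boolean checkAndWipe(char[] secret, char[] expected) {
        try {
            return Arrays.equals(secret, expected);
        } finally {
            Arrays.fill(secret, '\0'); // overwrite every character
        }
    }

    public static void main(String[] args) {
        // In real code the secret would come from e.g. Console.readPassword(),
        // not from a String literal (which would itself linger in memory).
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        boolean ok = checkAndWipe(password, new char[] {'s', '3', 'c', 'r', 'e', 't'});
        System.out.println(ok); // true
        System.out.println(Arrays.toString(password).contains("s")); // false: wiped
    }
}
```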
The decision to make strings mutable in C++ causes a lot of problems; see this excellent article by Kevlin Henney about Mad COW Disease.
COW = Copy On Write.
It's a trade-off. String literals go into the String pool, so when the same literal appears in multiple places, all of them share the same memory. The designers figured this memory-saving technique would work well for the common case, since programs tend to grind over the same strings a lot.
The downside is that concatenations create a lot of extra Strings that are only transitional and just become garbage, which actually hurts memory performance. In Java you have StringBuffer and StringBuilder (StringBuilder also exists in .NET) to save memory in these cases.
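A minimal Java sketch of the two approaches. Both produce the same text, but repeated `+=` on a String allocates a new String on every iteration (the previous one becomes garbage), while a single StringBuilder grows one buffer and creates one String at the end. Method names are illustrative, and no timing results are claimed here.

```java
class ConcatDemo {
    // Each += allocates a new String; the previous one becomes garbage,
    // so n iterations create about n intermediate Strings.
    static String concatInLoop(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i % 10;
        }
        return s;
    }

    // One StringBuilder grows its internal buffer in place and produces
    // a single String at the end.
    static String builderInLoop(int n) {
        StringBuilder sb = new StringBuilder(n); // pre-size to avoid regrowth
        for (int i = 0; i < n; i++) {
            sb.append(i % 10);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatInLoop(12));  // 012345678901
        System.out.println(builderInLoop(12)); // 012345678901
    }
}
```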
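Strings in Java are not truly immutable: you can change their values using reflection and/or class loading (although on JDK 16 and later, strong encapsulation blocks reflective access to String's internals unless java.base is explicitly opened). You should not depend on that property for security.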
For examples see: Magic Trick In Java
Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.
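A sketch of how Integer and String get int-like behavior from immutability. Incrementing a second reference to an Integer rebinds only that reference, and String methods that appear to modify a string return a new one, leaving the receiver unchanged. Names are illustrative.

```java
import java.util.Locale;

class ValueLikeDemo {
    // Integer is immutable: b++ unboxes, adds, and boxes a new Integer,
    // rebinding only b, so a keeps its value just like an int would.
    static int integerAlias() {
        Integer a = 1000;
        Integer b = a;
        b++;
        return a; // 1000
    }

    // String "modifying" methods return new Strings and leave the receiver untouched.
    static String stringMethods() {
        String s = "dog";
        String upper = s.toUpperCase(Locale.ROOT);
        String longer = s.concat("s");
        return s + "," + upper + "," + longer; // "dog,DOG,dogs"
    }

    public static void main(String[] args) {
        System.out.println(integerAlias());  // 1000
        System.out.println(stringMethods()); // dog,DOG,dogs
    }
}
```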
There is an exception to nearly every rule:
using System;
using System.Runtime.InteropServices;

namespace Guess
{
    class Program
    {
        static void Main(string[] args)
        {
            const string str = "ABC";
            Console.WriteLine(str);
            Console.WriteLine(str.GetHashCode());
            // Pin the string so the GC cannot move it while its memory is written.
            var handle = GCHandle.Alloc(str, GCHandleType.Pinned);
            try
            {
                // Byte offset 4 is the third UTF-16 char, so "ABC" becomes "ABZ".
                Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');
                Console.WriteLine(str);
                Console.WriteLine(str.GetHashCode());
            }
            finally
            {
                handle.Free();
            }
        }
    }
}
It's largely for security reasons. It's much harder to secure a system if you can't trust that your Strings are tamperproof.
