Is
char buf[] = "test";
in C equivalent to
String buf = new String("test");
in Java?
And is
char *buf;
buf = "test";
equivalent to
String buf = "test";
?
It's difficult to say they're equivalent, although I understand what you're driving at.
First, your C version is a sequence of 8-bit chars, whereas the Java variant is Unicode-aware: a String is a sequence of 16-bit UTF-16 code units.
Secondly, in Java you're creating an object with behaviour, rather than just a sequence of chars.
Finally, the Java variant is immutable. You can change the reference, but not the underlying set of characters (this is a function of being wrapped by the String object)
For something largely equivalent you could use an array of bytes in Java. Note that this wouldn't be null-terminated, however. Java arrays are aware of their length rather than using a convention of null-termination. Alternatively, a closer C++ equivalent would probably be std::string.
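A minimal sketch of the byte-array comparison described above. The class name ByteArrayVsString and its helper method are made up for illustration, and the sample text is assumed to be plain ASCII.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class ByteArrayVsString {
    // Closest analogue of `char buf[] = "test";`: a mutable array of 8-bit values.
    static byte[] asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] buf = asciiBytes("test");
        // The array knows its own length; there is no trailing 0 byte.
        System.out.println(buf.length);                                 // 4, not 5
        buf[0] = (byte) 'b';                                            // arrays are mutable
        System.out.println(new String(buf, StandardCharsets.US_ASCII)); // best

        String s = "test";
        String t = s.replace('t', 'b');   // returns a new String, s is untouched
        System.out.println(s + " " + t);  // test besb
    }
}
```

The array can be modified in place, while every "modifying" String method hands back a new object.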
This question can't really be answered - you're comparing apples to oranges.
In C, a "string" is really just a char array that is null-terminated (that is, it has a '\0' byte at the end, placed by the compiler and expected by the str__() library functions).
In Java, String is an object, that holds (possibly among other things), an array of characters, and an integer count.
They are different things, and they are used differently. Is there something specific you are trying to accomplish and having trouble with? If so, ask that, and we will try to answer it. Otherwise, this isn't really a valid question, IMO.
The first two are not equivalent. In Java, the String object, besides storing a char array, also contains other things (e.g. the length field). The Java version is, of course, more OO.
The second ones are equivalent, with the same observations as above. They are both pointers to containers of characters. The C container is a simple char array, while the String is a full-fledged object.
No. These are different data types. char buf[] is an array and String buf is an object. The String can hold text of any length and comes with plenty of helpful methods. char buf[] is a statically sized chunk of memory holding five 8-bit characters (the four letters plus the terminating '\0').
The line below creates an array of characters:
char buf[] = "test";
whereas String buf = new String("test"); creates a String object. Internally that is still a char[], but the String wrapper makes it immutable. So the difference between the two languages is one of representation.
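To make the two Java forms from the question concrete, here is a small sketch. StringIdentityDemo and fromChars are hypothetical names used only for this example.

```java
// Hypothetical demo class; names are illustrative only.
class StringIdentityDemo {
    // Builds a String from a char[]; the constructor copies the array.
    static String fromChars(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        String literal1 = "test";
        String literal2 = "test";
        String fresh = new String("test");

        System.out.println(literal1 == literal2);    // true: literals are interned
        System.out.println(literal1 == fresh);       // false: new always allocates
        System.out.println(literal1.equals(fresh));  // true: same characters

        char[] chars = { 't', 'e', 's', 't' };
        String s = fromChars(chars);
        chars[0] = 'b';                              // changes the array only
        System.out.println(s);                       // test
    }
}
```

So String buf = "test"; refers to a shared, interned object, while new String("test") always creates a separate one with the same contents.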
Related
Some analysis of a Java application showed that it's spending a lot of time decoding UTF-8 byte arrays into String objects. The stream of UTF-8 bytes is coming from an LMDB database, and the values in the database are Protobuf messages, which is why it's decoding UTF-8 so much. Another problem this causes is that Strings take up a large chunk of memory, because each value is decoded from the memory map into a String object in the JVM.
I want to refactor this application so it does not allocate a new String every time it reads a message from the database. I want the underlying char array in the String object to simply point to the memory location.
package testreflect;

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class App {
    public static void main(String[] args) throws Exception {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe UNSAFE = (Unsafe) field.get(null);

        char[] sourceChars = new char[] { 'b', 'a', 'r', 0x2018 };

        // Encoding to a byte array; asBytes would be an LMDB entry
        byte[] asBytes = new byte[sourceChars.length * 2];
        UNSAFE.copyMemory(sourceChars,
                UNSAFE.arrayBaseOffset(sourceChars.getClass()),
                asBytes,
                UNSAFE.arrayBaseOffset(asBytes.getClass()),
                sourceChars.length * (long) UNSAFE.arrayIndexScale(sourceChars.getClass()));

        // Copying the byte array to the char array works, but is there a way to
        // have the char array simply point to the byte array without copying?
        char[] test = new char[sourceChars.length];
        UNSAFE.copyMemory(asBytes,
                UNSAFE.arrayBaseOffset(asBytes.getClass()),
                test,
                UNSAFE.arrayBaseOffset(test.getClass()),
                asBytes.length * (long) UNSAFE.arrayIndexScale(asBytes.getClass()));

        // Allocate a String object, but set its underlying
        // byte array manually to avoid the extra memory copy
        long stringOffset = UNSAFE.objectFieldOffset(String.class.getDeclaredField("value"));
        String stringTest = (String) UNSAFE.allocateInstance(String.class);
        UNSAFE.putObject(stringTest, stringOffset, test);
        System.out.println(stringTest);
    }
}
So far, I've figured out how to copy a byte array to a char array and set the underlying array in a String object using the Unsafe package. This should reduce the amount of CPU time the application is wasting decoding UTF-8 bytes.
However, this does not solve the memory problem. Is there a way to have a char array point to a memory location and avoid a memory allocation altogether? Avoiding the copy altogether will reduce the number of unnecessary allocations the JVM is making for these strings, leaving more room for the OS to cache entries from the LMDB database.
I think you are taking the wrong approach here.
So far, I've figured out how to copy a byte array to a char array and set the underlying array in a String object using the Unsafe package. This should reduce the amount of CPU time the application is wasting decoding UTF-8 bytes.
Erm ... no.
Using memory copy to copy from a byte[] to char[] is not going to work. Each char in the destination char[] will actually contain 2 bytes from the original. If you then try to wrap the char[] as a String, you will get a weird kind of mojibake.
What a real UTF-8 to String conversion does is convert between 1 and 4 bytes (code units) representing a UTF-8 code point into 1 or 2 16-bit code units representing the same code point in UTF-16. That cannot be done using a plain memory copy.
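The size mismatch is easy to see with a few sample characters. This is only an illustrative sketch; Utf8Lengths and utf8Length are made-up names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class Utf8Lengths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "a",             // U+0061
            "\u00e9",        // U+00E9
            "\u2018",        // U+2018, the character used in the question
            "\uD83D\uDE00"   // U+1F600, a surrogate pair in UTF-16
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " UTF-8 byte(s), "
                    + s.length() + " UTF-16 code unit(s)");
        }
    }
}
```

The four samples take 1, 2, 3 and 4 bytes in UTF-8 but 1, 1, 1 and 2 chars in a String, so there is no fixed ratio that a raw copy could rely on.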
If you aren't familiar with it, it would be worth reading the Wikipedia article on UTF-8 so that you understand how the text is encoded.
The solution depends on what you intend to do with the text data.
If the data must really be in the form of String (or StringBuilder or char[]) objects, then you really have no choice but to do the full conversion. Try anything else and you are liable to mess up; e.g. garbled text and potential JVM crashes.
If you want something that is "string like", you could conceivably implement a custom CharSequence class that wraps the bytes in the messages and decodes the UTF-8 on the fly. But doing that efficiently may be a problem, especially implementing the charAt method as an O(1) method.
If you simply want to hold and/or compare the (entire) texts, this could possibly be done by representing them as byte[] objects, or inside them. These operations can be performed on the UTF-8 encoded data directly.
If the input text could actually be sent in character encoding with a fixed 8-bit character size (e.g. ASCII, Latin-1, etc) or as UTF-16, that simplifies things.
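For the simplest of these cases, a fixed 8-bit encoding, a "string like" wrapper could look roughly like the sketch below. Latin1View is a hypothetical class, not an existing library type, and it assumes every byte is exactly one Latin-1 character; it would produce wrong text for real UTF-8 data containing multi-byte sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical read-only view; assumes every byte is one Latin-1 character.
final class Latin1View implements CharSequence {
    private final ByteBuffer bytes;  // shared with the caller, never copied
    private final int start;
    private final int length;

    Latin1View(ByteBuffer bytes, int start, int length) {
        if (start < 0 || length < 0 || start + length > bytes.limit()) {
            throw new IndexOutOfBoundsException();
        }
        this.bytes = bytes;
        this.start = start;
        this.length = length;
    }

    static Latin1View of(byte[] data) {
        return new Latin1View(ByteBuffer.wrap(data), 0, data.length);
    }

    @Override public int length() {
        return length;
    }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException();
        }
        // Absolute get: O(1), and it does not move the buffer position.
        return (char) (bytes.get(start + index) & 0xFF);
    }

    @Override public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length || from > to) {
            throw new IndexOutOfBoundsException();
        }
        return new Latin1View(bytes, start + from, to - from);
    }

    // Only this method allocates a real String.
    @Override public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Latin1View view = Latin1View.of("hello world".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(view.charAt(4));          // o
        System.out.println(view.subSequence(6, 11)); // world
    }
}
```

Because the view wraps a ByteBuffer, the same idea would in principle also work over a memory-mapped buffer, but the caller then has to guarantee that the underlying memory stays valid and unchanged while the view is in use.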
In my application I am storing strings using RandomAccessFile, and when reading a string back I need to convert a byte array to a String, which is causing an OOM. Is there a better way to convert than this?
str = new String(b, "UTF-8");
where b is byte array
Is there a better way to convert other than new String(bytes, "UTF-8") ?
This is actually a rather complicated question.
This constructor cannot simply incorporate the byte[] into the string:
Prior to Java 9, it is always necessary to decode the byte array to a UTF-16 coded array of char. So the constructor is liable to allocate roughly double the memory used by the source byte[].
Since Java 9, String has a compact representation, which is enabled by default. With it, if the UTF-8 encoded byte array contains only code points in the Latin-1 range (\u0000 to \u00ff), then the String value is a byte[]. However, even in this case the constructor must copy the bytes to a new byte[].
In both cases, there is no more space-efficient way to create a String from a byte[]. Furthermore, I don't think there is a more space-efficient way to do the conversion starting with a stream of bytes and a character count. (I am excluding things like modifying the java.lang.* implementation, or breaking abstraction using reflection.)
Bottom line: when converting a byte[] to a String you should allow at least twice as much contiguous free memory as the original byte[] if you want your code to work on older JVMs.
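This does not make the constructor itself cheaper, but if the text only needs to be scanned rather than held as one String, it can be decoded in small chunks instead. A rough sketch under that assumption; ChunkedDecode, countChars and the 1024-char buffer size are arbitrary choices for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class ChunkedDecode {
    // Counts UTF-16 code units in UTF-8 data using a small, reusable buffer.
    static long countChars(byte[] utf8) {
        char[] chunk = new char[1024];
        long total = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                // Process chunk[0..n) here instead of building one big String.
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length + " bytes, " + countChars(data) + " chars");
        // prints "6 bytes, 5 chars"
    }
}
```

The same loop works with any InputStream, so the bytes would not have to be read into a single array first either.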
Is it possible to point to the first element of a char[] in java?
I have this code in C++ as an example of what I am trying to do:
char word[100] = " ";
char *p;
cout << "Enter a word: ";
cin.getline(word, 100);
p = word + strlen(word) - 1;
Is it possible to do something like this in Java?
Thank you for your time!
Java lacks the concept of pointers, so the answer is "no". All operations on an array must be performed using indexes.
The consequence of this is that you cannot pass a single argument to methods expecting to read or write arrays starting at a certain position; you always need an array and an index.
Note: In general, C++ approaches to reading strings do not translate to Java very well, because Java I/O libraries will manage memory for you, freeing you from having to worry about buffer overruns on reading a string.
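As a rough Java counterpart of p = word + strlen(word) - 1, an int index can play the role of the pointer. ArrayPlusIndex and reversed are hypothetical names for this sketch.

```java
import java.nio.CharBuffer;

// Hypothetical demo class; names are illustrative only.
class ArrayPlusIndex {
    // Walks the array backwards with an index, as a C++ loop would with a pointer.
    static String reversed(char[] word) {
        StringBuilder out = new StringBuilder(word.length);
        for (int p = word.length - 1; p >= 0; p--) {
            out.append(word[p]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] word = "pointer".toCharArray();
        int p = word.length - 1;            // "points" at the last character
        System.out.println(word[p]);        // r
        System.out.println(reversed(word)); // retniop

        // Library methods take (array, offset, count) for the same reason.
        System.out.println(String.valueOf(word, 2, 3));  // int

        // A CharBuffer bundles array and position into one object.
        CharBuffer tail = CharBuffer.wrap(word, 5, 2);
        System.out.println(tail);                        // er
    }
}
```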
Java has no concept of pointers and Java String(s) have a fixed length (they're also immutable so they have a fixed everything).
That being said, your posted code is reading one word which you could do with a Scanner like
Scanner scan = new Scanner(System.in);
String word = scan.next(); // <-- nextLine() if you want a line.
int length = word.length();
Java does not use the concept of pointers, the reason being that memory allocation and release are handled automatically by the Java virtual machine. The only way to access an array element is through its index.
Short Answer: No
Long Answer: No... Java does not have any pointers. It only has Object references. A variable of any non-primitive type holds a reference, and that reference is what gets copied on assignment or when passed to a method, meaning that if you have the following:
Object a = new Object();
Object b = a;
Object c = new Object();
a == b is true, while b == c and a == c are false.
If you want to have pointer-like behavior in Java, you need to wrap it up in an Object and use references. For example, it is possible that an AtomicInteger[] could be used, and you pass a reference to one of the AtomicIntegers, but it would probably be best to hold the int index as a 'pointer' to the location in the array. Furthermore, your snippet of code could just have a char variable, set to word[0]. It really depends on what you need a pointer for...
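A short sketch of how references behave when copied or passed to a method. ReferenceDemo, reassign and mutate are made-up names for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class ReferenceDemo {
    // Reassigning the parameter only changes the method's local copy of the reference.
    static void reassign(int[] data) {
        data = new int[] { 99 };
    }

    // Writing through the reference changes the array the caller also sees.
    static void mutate(int[] data) {
        data[0] = 99;
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = a;
        Object c = new Object();
        System.out.println(a == b);  // true: same object
        System.out.println(b == c);  // false: different objects

        int[] values = { 1 };
        reassign(values);
        System.out.println(values[0]);  // 1
        mutate(values);
        System.out.println(values[0]);  // 99
    }
}
```

A method can change the object a reference points to, but it cannot make the caller's variable point somewhere else, which is one way references differ from C++ pointers-to-pointers or reference parameters.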
Or is it that it just gets a reference to it?
I have a byte array that gets re-written by an external library - is it safe to pass it into a String constructor, or should I create a clone first?
byte[] b = MagicLib.getData();
String s = new String(b);
// actually a pointer to previous memory, just with different data
b = MagicLib.getMoreData();
A String contains an array of chars, not bytes. Therefore, the String cannot share the byte array's storage.
Additionally, note that the byte[] will be decoded into characters according to the platform default charset (per the documentation on String(byte[])), which implies further that a decoded version of the byte[] array has to be separately constructed.
In Oracle Java it returns a new char[] depending on the decoding charset used
Java Strings are immutable, so the entire array has to be copied. Otherwise, you could change the contents of the String by modifying the byte array.
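This is easy to check. In the sketch below, overwriting the array stands in for the external library reusing its buffer; StringCopiesBytes and decode are hypothetical names, and UTF-8 is passed explicitly rather than relying on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class StringCopiesBytes {
    // Decodes with an explicit charset instead of the platform default.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] b = "first".getBytes(StandardCharsets.UTF_8);
        String s = decode(b);

        Arrays.fill(b, (byte) 'x');     // simulates the library overwriting its buffer
        System.out.println(s);          // first
        System.out.println(decode(b));  // xxxxx
    }
}
```

So no clone of the byte array is needed before calling the constructor.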
I need to parse a binary file created by C++ and overwrite a 4 char long char array in that file, for example change the original char array of ABCD to WXYZ.
I know the exact position, in bytes, of that char array. I tried RandomAccessFile, which lets me go to the position easily, but I cannot make the rest work right now.
Is RandomAccessFile the right way to go?
I know I have to do some conversion from 2-byte chars to one-byte chars.
Does anybody have a good way to do this?
That's fine; the Javadoc for RandomAccessFile is always worth checking.
long position = ...;
byte[] bytes = new byte[] { (byte)'W', ... };
raf.seek(position);
raf.write(bytes);
RandomAccessFile is fine. As you have already figured out, in C++ a char is a single byte, whereas a Java char is a 16-bit UTF-16 code unit.
The easiest option might be to use byte[4] in your code to represent the 4-character ASCII string.
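Putting the pieces together, a self-contained sketch might look like this. The temp file, the offset 2 and the names OverwriteBytes, overwrite and demo are all invented for the example; the real file and position would come from your own format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names are illustrative only.
class OverwriteBytes {
    // Overwrites the bytes at `position` with the ASCII encoding of `replacement`.
    static void overwrite(Path file, long position, String replacement) throws IOException {
        byte[] bytes = replacement.getBytes(StandardCharsets.US_ASCII); // one byte per char
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(position);
            raf.write(bytes);  // overwrites in place, the file length is unchanged
        }
    }

    // Creates a small sample file, patches it, and returns the resulting content.
    static String demo() {
        try {
            Path file = Files.createTempFile("overwrite-demo", ".bin");
            try {
                Files.write(file, "12ABCD34".getBytes(StandardCharsets.US_ASCII));
                overwrite(file, 2, "WXYZ");
                return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());  // 12WXYZ34
    }
}
```

Encoding with US_ASCII does the 2-byte to 1-byte conversion, so the four Java chars become exactly four bytes in the file.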