Sizeof in C: porting to Java

I have code in C like this:
skip=(unsigned long) (st_row-1)*tot_numcols;
fseek(infile,sizeof(cnum)*skip,0);
Now I have to port it to Java. How can I do that? "cnum" is a struct in C, so I created a class in Java. But what about that fseek: how can I point to the exact position in a file in Java?

Your C design is broken, and you can't do what you apparently want in Java.
It appears that you're storing C structs by blindly dumping their raw in-memory bytes to disk. In addition to being difficult to debug, this is prone to break completely with any change that makes the compiler pack the struct differently, in particular compiling identical code for 32-bit vs. 64-bit or little- vs. big-endian targets. Instead, you should always explicitly serialize structured data. Human-readable formats are best unless there's a very large amount of data.
Java simply doesn't permit this approach. The Java memory model deliberately hides information about runtime memory layout, and the JVM has wide latitude to organize memory management as it sees fit.
Instead, define a clear format for saving your data, including endianness, and use that from both languages.
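For the asker's concrete problem, here is a minimal sketch of the equivalent positioning in Java, assuming the on-disk record size of cnum is known and fixed (RECORD_SIZE below is a made-up placeholder; it must match what the C program actually wrote, padding included):

import java.io.IOException;
import java.io.RandomAccessFile;

public class RecordSeek {
    // Hypothetical fixed on-disk size of one cnum record, in bytes.
    static final int RECORD_SIZE = 16;

    public static void main(String[] args) throws IOException {
        long stRow = 10;          // example values
        long totNumcols = 100;
        long skip = (stRow - 1) * totNumcols;
        try (RandomAccessFile infile = new RandomAccessFile("data.bin", "r")) {
            // Equivalent of fseek(infile, sizeof(cnum) * skip, 0 /* SEEK_SET */)
            infile.seek(RECORD_SIZE * skip);
            // Now read each field of the record explicitly, matching the C layout.
            // Note RandomAccessFile reads big-endian; little-endian data needs a
            // ByteBuffer with ByteOrder.LITTLE_ENDIAN instead.
        }
    }
}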

Related

How to avoid copying data between Java and Native C++ Code

I am writing a C++ library that will be used by different Android applications to process data organized as a two-dimensional store in which neither dimension has a predefined size limit (like an array of arrays of float, where the arrays can be quite large).
The current solution uses SWIG to copy data from memory allocated by Java code into C++ structures. It turns out that each array of float values (in Java) becomes a vector of float (in C++).
The problem is that duplicating a large amount of data increases the risk of running out of the memory available to the application. I understand that, in any case, the memory consumption issue should be resolved by limiting the input volume, but the library does not know how much memory is available, and it needs the whole data set (each data element is accessed repeatedly) to perform correct processing.
So now I am considering the possibility of using a single data store for Java and C++, so that the C++ code has direct access to data stored by the Java code in memory allocated on the Java side (making memory allocated by C++ code the single store is not an option).
I want to know how to organize such memory sharing in a safe manner (preferably using SWIG).
I anticipate some difficulties with such an implementation, e.g. with the Java garbage collector (the C++ code could address storage that has already been deallocated) and with memory access being slowed down by the wrapper (as mentioned earlier, the library requires repeated access to each data item)… but perhaps someone can advise me of a reliable solution.
An explanation of why my idea is wrong would also be accepted, if supported by sufficiently compelling arguments.
You can get access to the raw array data using a critical native implementation. This technique allows direct access to JVM memory, without the overhead of transferring data between Java and native code.
But this has the following restrictions: the method
must be static and not synchronized;
argument types must be primitive or primitive arrays;
implementation must not call JNI functions, i.e. it cannot allocate Java objects or throw exceptions;
should not run for a long time, since it will block GC while running.
The declaration of a critical native looks like a regular JNI method, except that:
it starts with JavaCritical_ instead of Java_;
it does not have extra JNIEnv* and jclass arguments;
Java arrays are passed as two arguments: the first is the array length, and the second is a pointer to the raw array data. That is, there is no need to call GetArrayElements and friends: you can use the direct array pointer immediately.
Look at the original answer and source article for details.
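As a hedged illustration (the class, method, and library names here are invented): on the Java side a critical native is declared like any static native method, and the native library then provides both the regular JNI entry point and the JavaCritical_ variant:

public class CriticalSum {
    // Must be static (and not synchronized) to qualify as a critical native.
    public static native float sum(float[] data);

    /* Native side (sketch). HotSpot expects both entry points:
     *
     * // Regular JNI version, used as a fallback:
     * JNIEXPORT jfloat JNICALL
     * Java_CriticalSum_sum(JNIEnv* env, jclass cls, jfloatArray data);
     *
     * // Critical version: no JNIEnv* and no jclass; the array arrives
     * // as a (length, raw pointer) pair:
     * JNIEXPORT jfloat JNICALL
     * JavaCritical_CriticalSum_sum(jint length, jfloat* data);
     */

    static { System.loadLibrary("criticalsum"); } // hypothetical library name
}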

Offset of file pointer robust/reliable for several programming languages?

I have a question about reading files, for instance in Java or in C/C++. You can usually get an offset value for the current position in a file.
How robust is this offset? Assuming the file is not changed, of course, will I read the same line via Java as I would via C/C++ if I position the stream at this offset?
I would guess yes, but I was wondering whether I am missing something. What I want to do is build some kind of index that returns this offset value into a specific file. Can that work, or is this offset bound to a certain API or even to a particular x-bit architecture?
Regards,
The offset of a given byte in a given file is going to be 100% reliable on (at least) any system with a POSIX / POSIX-like model of files. It follows that the same offset will give you the same byte in Java and C++. However, this does depend on you using the respective languages' I/O APIs correctly; i.e. understanding them.
One thing that can get a bit tricky is when you use some "binary I/O" scheme in C++ that involves treating objects (or structs) as arrays of bytes and reading / writing those bytes. If you do that, you have the problem that the byte-level representations of C / C++ objects are platform dependent. For instance, you can run into the big-endian vs little-endian problem. This doesn't alter offsets ... but it can mean that "stuff" gets mangled due to representation mismatches.
The best thing to do is to use a file representation that is not dependent on the platform where the file is read or written; i.e. don't do it that way.
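To illustrate the portable case, a minimal Java sketch that records byte offsets into a plain index and seeks back to them; the same offsets would work from C/C++ via fseek, provided both sides treat the file as binary bytes:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class LineIndex {
    public static void main(String[] args) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile("data.txt", "r")) {
            // Record the byte offset of the start of each line.
            long pos = f.getFilePointer();
            while (f.readLine() != null) {
                offsets.add(pos);
                pos = f.getFilePointer();
            }
            // Jump straight back to the third line (assuming the file has one).
            f.seek(offsets.get(2));
            System.out.println(f.readLine());
        }
    }
}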

Serialization vs. Byte Code Translation

I'm a beginner at programming, and I was just wondering whether there is a difference between the process of serialization and the process of converting to and from byte code (intermediate language).
I found this on javacodegeeks.com:
Serialization is usually used when the need arises to send your data over network or stored in files. By data I mean objects and not text. Now the problem is your Network infrastructure and your Hard disk are hardware components that understand bits and bytes but not Java objects. Serialization is the translation of your Java object’s values/states to bytes to send it over network or save it. --> On other hand, Deserialization is conversion of byte code to corresponding java objects. <--
From my understanding of this paragraph, serialization may be the process by which Java converts its programs to byte code so that they can be transported to different computer environments and still function correctly.
Am I correct in thinking this?
From my understanding of this paragraph, serialization may be the process by which Java converts its programs to byte code so that they can be transported to different computer environments and still function correctly. Am I correct in thinking this?
No. Compiling with javac creates the byte code that runs on the JVM. VMs (such as the JVM) interpret the bytecode and use some clever and complicated just-in-time compilation (which is machine/platform dependent) to produce the final product. Bytecode is just a bunch of instructions that the JVM interprets; each bytecode opcode is one byte in length, hence the name bytecode.
Serialization, on the other hand, converts the state of a Java object into a stream of bytes. These bytes are not instructions like bytecode. The primary purpose of Java serialization is to write an object into a stream so that it can be transported over a network and rebuilt on the other side. When there are two different parties involved, you need a protocol to rebuild the exact same object again, and the Java serialization API provides exactly that. Another way to leverage serialization is to use it to perform a deep copy.
Now the problem is your Network infrastructure and your Hard disk are hardware components that understand bits and bytes but not Java objects. Serialization is the translation of your Java object’s values/states to bytes to send it over network or save it. --> On other hand, Deserialization is conversion of byte code to corresponding java objects.
You can't just pass a Java object to the link layer of the network and expect it to be sent: networks transmit bits and bytes across the physical medium. So serialization lets you encode an object to binary in a standard way, pass it across the network, and decode it at the receiving end back into an object in the exact state it was in on the sending side.
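A minimal round trip makes the distinction concrete; the bytes produced here encode object state, not JVM instructions:

import java.io.*;

public class SerializationDemo {
    // A class must implement Serializable for this to work.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Serialize: object state -> bytes (could equally go to a file or socket).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(3, 4));
        }
        // Deserialize: bytes -> an equivalent object, rebuilt in the same state.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Point copy = (Point) in.readObject();
            System.out.println(copy.x + "," + copy.y); // prints 3,4
        }
    }
}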

Performance and memory usage in Java arrays vs C++ arrays

I work at a small company where I am building some banking software. Now, I have to build a data structure like:
Array [Int-Max] [2] // Large 2D array
I save that to disk and load it the next day for further work.
Now, as I only know Java (and a little bit of C), they keep insisting that I use C++ or C. Their reasoning:
They have seen that Array [Int-Max] [2] in Java takes nearly 1.5 times more memory than in C, and that C and C++ have a more reasonable memory footprint than Java.
C and C++ can handle arbitrarily large files, whereas Java can't.
Their view is that as the database/data structure becomes large, Java simply becomes infeasible, and since we have to work with such large databases/data structures, C/C++ is always preferable.
Now my questions are:
Why is C or C++ always preferable to Java for large databases/data structures? C perhaps, but C++ is also object-oriented, so how does it gain an advantage over Java?
Should I stay with Java, or will their suggestion (switching to C++) be helpful in the future in a large database/data-structure environment? Any suggestions?
Sorry, I have very little knowledge of all this and have just started to work on a real project, so I'm really confused. Until now I have only built school projects and have no idea about relatively large projects.
Why is C/C++ always preferable to Java for large databases/data structures? C perhaps, but C++ is also object-oriented, so how does it gain an advantage over Java?
Remember that a Java array (of objects) [1] is actually an array of references. For simplicity, let's look at a 1D array:
java:
[ref1,ref2,ref3,...,refN]
ref1 -> object1
ref2 -> object2
...
refN -> objectN
c++:
[object1,object2,...,objectN]
With the C++ version, the per-element reference overhead is not needed: the array holds the objects themselves, not just references to them. If the objects are small, this overhead can indeed be significant.
Also, as I already stated in the comments, there is another issue when allocating arrays of small objects in C++ vs. Java. In C++ you allocate an array of objects that are contiguous in memory, while in Java the objects themselves are not. In some cases this gives C++ much better performance, because it is much more cache-efficient than the Java program. I once addressed this issue in this thread.
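A common way to approximate the contiguous C++ layout in Java, sketched here with a made-up Point type, is to flatten the objects into parallel primitive arrays, which the JVM stores contiguously much like a C++ array of structs:

public class Layouts {
    static class Point { double x, y; }

    public static void main(String[] args) {
        // Array of objects: a million references, each to a separately
        // allocated Point (object header + fields), scattered on the heap.
        Point[] pts = new Point[1_000_000];
        for (int i = 0; i < pts.length; i++) pts[i] = new Point();

        // "Structure of arrays": the doubles themselves are contiguous,
        // with no per-element header or reference.
        double[] xs = new double[1_000_000];
        double[] ys = new double[1_000_000];
        xs[42] = 1.0; ys[42] = 2.0; // element i of pts corresponds to (xs[i], ys[i])
    }
}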
2) Should I stay with Java, or will their suggestion (switching to C++) be helpful in the future in a large database/data-structure environment? Any suggestions?
I don't believe we can answer that for you. You should weigh all the pros and cons (memory efficiency, libraries you can use, development time, ...) for your purpose and make a decision. Don't be afraid to ask for advice from the senior developers in your company, who have more information about the system than we do.
If there were a simple, easy, generic answer to questions like this, we engineers wouldn't be needed, would we?
You can also build a stub of the core algorithm with the expected array size before implementing the real thing, and profile it to see what the actual difference is likely to be. (Assuming the array is indeed the expected main space consumer.)
[1]: The overhead described here is not relevant for arrays of primitives. In those cases the arrays hold values, not references, just as in C++, with only minor overhead for the array object itself (a length field, for example).
It sounds like you are an inexperienced programmer in a new job. The chances are that "they" have been in the business a long time, and know (or at least think they know) more about the domain and its programming requirements than you do.
My advice is to just do what they insist you do. If they want the code in C or C++, write it in C or C++. If you think you are going to have difficulties because you don't know much C/C++ ... warn them up front. If they still insist, they can wear the responsibility for any problems and delays their insistence causes. Just make sure that you do your best ... and try not to be a "squeaky wheel".
1) They have seen that Array [Int-Max] [Int-Max] in Java takes nearly 1.5 times more memory than in C, and that C and C++ have a more reasonable memory footprint than Java.
That is feasible, though it depends on what is in the arrays.
Java can represent large arrays of most primitive types using close to optimal amounts of memory.
On the other hand, arrays of objects in Java can take considerably more space than in C / C++. In C++ for example, you would typically allocate a large array using new Foo[largeNumber] so that all of the Foo instances are part of the array instance. In Java, new Foo[largeNumber] is actually equivalent to new Foo*[largeNumber]; i.e. an array of pointers, where each pointer typically refers to a different object / heap node. It is easy to see how this can take a lot more space.
2) C/C++ can handle arbitrarily large files, whereas Java can't.
There is a hard limit on the number of elements in a single 1-D Java array: 2^31. (You can work around this limit, but it will make your code more complicated; a sketch follows below.)
On the other hand, if you are talking about simply reading and writing files, Java can handle individual files up to 2^63 bytes ... which is more than you could possibly ever want.
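As a sketch of the workaround mentioned above (the chunk size is an arbitrary choice for illustration), a long-indexed array can be simulated with an array of arrays:

public class BigLongArray {
    private static final int CHUNK_BITS = 27;             // 2^27 longs = 1 GB per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long[][] chunks;

    BigLongArray(long size) {
        int n = (int) ((size + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new long[n][];
        long remaining = size;
        for (int i = 0; i < n; i++) {
            chunks[i] = new long[(int) Math.min(remaining, CHUNK_SIZE)];
            remaining -= chunks[i].length;
        }
    }

    long get(long index) {
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, long value) {
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }
}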
1) Why is C/C++ always preferable to Java for large databases/data structures? C perhaps, but C++ is also object-oriented, so how does it gain an advantage over Java?
Because of the hard limit. The limit is part of the JLS and the JVM specification. It has nothing to do with OOP per se.
2) Should I stay with Java, or will their suggestion (switching to C++) be helpful in the future in a large database/data-structure environment? Any suggestions?
Go with their suggestion. If you are dealing with in-memory datasets that are that large, then their concerns are valid. And even if their concerns are (hypothetically) a bit overblown, it is not a good thing to be battling your superiors / seniors ...
1) They have seen that Array [Int-Max] [Int-Max] in Java takes nearly 1.5 times more memory than in C, and that C and C++ have a more reasonable memory footprint than Java.
That depends on the situation. If you create a new int[1] or a new int[1000], there is almost no difference between Java and C++. Data allocated on the stack shows a much larger relative difference, as Java doesn't use the stack for such data.
I would first make sure this is not micro-tuning the application. It's worth remembering that one day of your time is worth about 2.5 GB of memory (even assuming you are paid minimum wage). So unless you are saving 2.5 GB per day by doing this, I suspect it's not worth chasing.
2) C/C++ can handle arbitrarily large files, whereas Java can't.
I have memory-mapped an 8 TB file in a pure Java program, so I have no idea what this claim is about.
There is a limit: you cannot map more than 2 GB at a time, or have more than 2 billion elements in a single array. You can work around this by having more than one mapping or array (e.g. up to 2 billion of those), as sketched below.
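To make the mapping workaround concrete, a hedged sketch (file name invented) that maps a file larger than 2 GB as a series of 1 GB windows using standard NIO calls:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class BigMap {
    public static void main(String[] args) throws IOException {
        final long CHUNK = 1L << 30; // 1 GB per mapping, safely under the 2 GB limit
        try (FileChannel ch = FileChannel.open(Paths.get("huge.bin"),
                                               StandardOpenOption.READ)) {
            long size = ch.size();
            int n = (int) ((size + CHUNK - 1) / CHUNK);
            MappedByteBuffer[] maps = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long pos = i * CHUNK;
                maps[i] = ch.map(FileChannel.MapMode.READ_ONLY, pos,
                                 Math.min(CHUNK, size - pos));
            }
            // The byte at absolute offset off is then
            // maps[(int) (off / CHUNK)].get((int) (off % CHUNK)).
        }
    }
}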
As we have to work with such large databases/data structures, C/C++ is always preferable.
I regularly load 200 - 800 GB of data with over 5 billion entries into a single Java process (sometimes more than one at a time on the same machine).
1) Why is C/C++ always preferable to Java for large databases/data structures?
There is more experience on how to do this in C/C++ than there is in Java, and their experience of how to do this is only in C/C++.
C perhaps, but C++ is also object-oriented, so how does it gain an advantage over Java?
When using large datasets in the Java world, it is more common to use a separate database (embedded databases are relatively rare).
Java just calls the same system calls you can in C, so there is no real difference in terms of what you can do.
2) Should I stay with Java, or will their suggestion (switching to C++) be helpful in the future in a large database/data-structure environment? Any suggestions?
At the end of the day, they pay you and sometimes technical arguments are not really what matters. ;)

Why is there no sizeof in Java?

For what design reason is there no sizeof operator in Java? Given that it is very useful in C++ and C#, how can you get the size of a certain type if needed?
Because the size of primitive types is explicitly mandated by the Java language. There is no variance between JVM implementations.
Moreover, since allocation is done by the new operator based on its argument, there is no need to specify the amount of memory needed.
It would sure be convenient sometimes to know how much memory an object will take, so you could estimate things like maximum heap size requirements, but I suppose the Java language/platform designers did not consider it a critical aspect.
In C it is useful mainly because you have to manually allocate and free memory. However, since Java has automatic garbage collection, this is not necessary.
In Java you don't work directly with memory, so sizeof is usually not needed. If you still want to determine the size of an object, check out this question.
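For primitives there is in fact nothing to compute: the mandated sizes have been exposed as constants since Java 8 (deeper per-object sizing is what the linked question covers, typically via java.lang.instrument.Instrumentation.getObjectSize). A minimal sketch:

public class Sizes {
    public static void main(String[] args) {
        // Primitive sizes are fixed by the language specification.
        System.out.println("int:    " + Integer.BYTES + " bytes");   // 4
        System.out.println("long:   " + Long.BYTES + " bytes");      // 8
        System.out.println("double: " + Double.BYTES + " bytes");    // 8
        System.out.println("char:   " + Character.BYTES + " bytes"); // 2
    }
}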
Memory management is done by the VM in Java, perhaps this might help you: http://www.javamex.com/java_equivalents/memory_management.shtml
C needed sizeof because the sizes of ints and longs varied depending on the OS and compiler. Or at least they used to. :-) In Java, all sizes and bit-level representations (e.g. IEEE 754 for floating point) are precisely defined.
EDIT - I see that @Maerics provided a link to the Java specs in his answer.
The sizeof operator exists in C/C++ because C/C++ are machine-dependent languages: different data types might have different sizes on different machines, so programs need to know how big those data types are when performing operations that are sensitive to size.
E.g. one machine might have a 32-bit integer while another machine might have a 16-bit integer.
But Java is a machine-independent language in which all the data types have the same size on all machines, so there is no need to find the size of a data type: it is predefined in Java.
