Reading VC++ CArchive Binary Format (or Java reading (CObArray))


Is there any clear documentation on the binary formats used to serialize the various MFC data structures? I've been able to view some of my own classes in a hex editor and use Java's ByteBuffer class to read them in (with automatic endianness conversions, etc.).
However, I am currently running into issues while trying to bring over the CObArray data, as there seems to be a rather large header that is opaque to me, and it is unclear how it is persisting object type information.
Is there a set of online documentation that would be helpful for this? Or some sample Java code from someone that has dealt with this in the past?

Since MFC ships with source code I would create a test MFC application that serializes a CObArray and step through the serialization code. This should give you all the information you need.

I agree with jmatthias: use the MFC source code.
There's also this page on MSDN that may be useful.
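Stepping through the MFC source (CArchive::ReadObject/WriteObject in arcobj.cpp) shows roughly the layout that the Java sketch below assumes; please verify it against your own MFC version. It assumes the array was written with `m_array.Serialize(ar)`, so the data starts directly with the element count (a WORD, escaping to a DWORD when it equals 0xFFFF). Each element then starts with a WORD tag: 0 means NULL, 0xFFFF introduces a new class (schema WORD, name-length WORD, ANSI class name), 0x8000|n refers to a class seen earlier, and a plain index refers back to an object already read. Classes and objects share one index counter starting at 1, which is where the "opaque header" comes from. The `BodyReader` callback and `ClassInfo` type are hypothetical names; the per-object payload is whatever your own `Serialize()` wrote.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a reader for a CObArray written by CArchive (tag values as in MFC's arcobj.cpp). */
public class CObArrayReader {
    static final int NEW_CLASS_TAG = 0xFFFF;        // wNewClassTag: a class header follows
    static final int CLASS_TAG = 0x8000;            // wClassTag: OR'd into a class index
    static final int BIG_OBJECT_TAG = 0x7FFF;       // wBigObjectTag: a DWORD tag follows
    static final long BIG_CLASS_TAG = 0x80000000L;  // dwBigClassTag

    /** Class header as written by CRuntimeClass::Store: schema, name length, ANSI name. */
    public static final class ClassInfo {
        public final String name;
        public final int schema;
        ClassInfo(String name, int schema) { this.name = name; this.schema = schema; }
    }

    /** Hypothetical callback that reads one object's own Serialize() payload. */
    public interface BodyReader {
        Object read(ClassInfo cls, ByteBuffer buf);
    }

    private final ByteBuffer buf;
    private final BodyReader bodyReader;
    // Load map: index 0 means NULL; classes and objects share the following indices.
    private final List<Object> loadMap = new ArrayList<>();

    public CObArrayReader(ByteBuffer buf, BodyReader bodyReader) {
        this.buf = buf.order(ByteOrder.LITTLE_ENDIAN);
        this.bodyReader = bodyReader;
        loadMap.add(null);
    }

    /** Like CArchive::ReadCount: a WORD, escaping to a DWORD (and a QWORD on 64-bit builds). */
    public long readCount() {
        int w = Short.toUnsignedInt(buf.getShort());
        if (w != 0xFFFF) return w;
        long dw = Integer.toUnsignedLong(buf.getInt());
        if (dw != 0xFFFFFFFFL) return dw;
        return buf.getLong();
    }

    /** The element count, then each element as written by CArchive's operator<<. */
    public List<Object> readArray() {
        long n = readCount();
        List<Object> out = new ArrayList<>();
        for (long i = 0; i < n; i++) out.add(readObject());
        return out;
    }

    public Object readObject() {
        int wTag = Short.toUnsignedInt(buf.getShort());
        long obTag = (wTag == BIG_OBJECT_TAG)
                ? Integer.toUnsignedLong(buf.getInt())
                : ((long) (wTag & CLASS_TAG) << 16) | (wTag & ~CLASS_TAG);

        if ((obTag & BIG_CLASS_TAG) == 0) {
            // Not a class tag: a back-reference to an already-loaded object (0 = NULL).
            return loadMap.get((int) obTag);
        }
        ClassInfo cls;
        if (wTag == NEW_CLASS_TAG) {
            int schema = Short.toUnsignedInt(buf.getShort());
            int len = Short.toUnsignedInt(buf.getShort());
            byte[] name = new byte[len];
            buf.get(name);
            cls = new ClassInfo(new String(name, StandardCharsets.US_ASCII), schema);
            loadMap.add(cls);
        } else {
            cls = (ClassInfo) loadMap.get((int) (obTag & ~BIG_CLASS_TAG));
        }
        // Reserve the object's slot before reading its body, so later tags can refer to it.
        int slot = loadMap.size();
        loadMap.add(null);
        Object obj = bodyReader.read(cls, buf);
        loadMap.set(slot, obj);
        return obj;
    }
}
```

The callback must consume exactly the bytes each class's `Serialize()` wrote; if your classes themselves serialize object pointers, the callback would also need to call back into `readObject()`, which this sketch does not wire up.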

Related

PDF document types explanation (such as PDF/A-1)

I am working on software to store legal documents, and I was thinking that PDF might be an ideal format. However, I am a little confused about which variant of the PDF format would best suit my needs.
I have the following requirements for the documents:
will be stored for a minimum of 7 years if not longer
not editable
contain both images and text (images will be in .jpg format ideally)
I was originally looking at using PDF/A-1 however I have discovered that this format does not seem to like using JPEG images, or at least it doesn't when using JODConverter.
Any suggestions/explanations as to which format would best meet these needs would be much appreciated!

For the requirements you described, PDF/A-1b (yes, b at the end!) is the ideal format. The b is for basic: it has less strict requirements to meet than PDF/A-1a (a at the end), which is for accessible (or advanced, which is how I remember it).
If you have no difficulty implementing PDF/A-1a, you may as well go for it. However, depending on your source documents, PDF/A-1a may be extremely difficult and nearly impossible to generate (as it requires the additional tagging of the file's content for the accessibility features).
As for JPEG: of course PDF/A-1b supports JPEGs. It does not allow JPEG2000 compression to be used, because that algorithm was patent encumbered at the time the PDF/A-1b standard was defined. PDF/A-1b generating software therefore must re-compress objects that use this type of compression with one of the other methods (which does not pose a big practical problem, though).
You may also want to look at the PDF/A Competence Center (PDFA) website. (Disclosure: I'm a member of the PDFA.)
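If your pipeline might receive JPEG 2000 images as well as ordinary JPEGs, one way to spot the ones that would need re-compression before building a PDF/A-1 file is to check their file signatures. The helper below is a hypothetical sketch: it only inspects the leading magic bytes (JPEG SOI marker, JP2 signature box, raw J2K codestream) and does not validate the image or the resulting PDF.

```java
import java.util.Arrays;

/** Hypothetical helper: classifies image bytes by their magic numbers. */
public class ImageKind {
    public enum Kind { JPEG, JPEG2000, OTHER }

    // JPEG: SOI marker (FF D8) followed by the start of the next marker (FF).
    private static final byte[] JPEG_SOI = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    // JPEG 2000 in a JP2 container: the 12-byte signature box.
    private static final byte[] JP2_SIGNATURE = {
        0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, (byte) 0x87, 0x0A};
    // Raw JPEG 2000 codestream: SOC marker followed by SIZ marker.
    private static final byte[] J2K_CODESTREAM = {(byte) 0xFF, 0x4F, (byte) 0xFF, 0x51};

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static Kind detect(byte[] data) {
        if (startsWith(data, JPEG_SOI)) return Kind.JPEG;
        if (startsWith(data, JP2_SIGNATURE) || startsWith(data, J2K_CODESTREAM)) return Kind.JPEG2000;
        return Kind.OTHER;
    }

    /** PDF/A-1 accepts JPEG (DCT) data but not JPEG 2000 (JPX), so only the latter needs re-encoding. */
    public static boolean needsRecompressionForPdfA1(byte[] data) {
        return detect(data) == Kind.JPEG2000;
    }
}
```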

PDF/A-1 is a good format for long-term storage (as that's its intention), so it tries to remove external dependencies. This includes things like embedding fonts and DISABLING external hyperlinks (which also makes sense, but can be a gotcha). Some useful info is on the Adobe site (look at the key-specifications tab). PDF sounds like the right answer to your requirements.
Embedding the images should not be a problem. JODConverter is perhaps doing something wrong (or the version of OpenOffice/LibreOffice you are using underneath is). You could try switching parts of that underlying infrastructure (OO/LO), experiment directly from the OpenOffice/LibreOffice GUI (export to PDF/A-1 and see what the results are), or try other tools in the chain (e.g. Docmosis, though that is based on similar technology).

Java Messenger: save message archives on the computer

I am building a Java messenger for people to chat, and I am looking for a way to store the message archives on the user's computer.
I have two possibilities in mind:
Save the conversations in XML files stored in the My Documents folder.
Use SQLite, but I don't know how to integrate it into my setup package, and I don't know whether it is very useful.
What would be the best solution, in your opinion?
Thank you

Another option is Java DB (Apache Derby), which was bundled with the JDK from Java 6 through Java 8 (it was removed from the JDK in Java 9).
Before you make a choice, you should think about questions such as:
presumably you want this transparent to the user (i.e. no admin involved)
is performance an issue?
what happens if the storage schema needs migration
do you need transactionality (unlikely, I suspect)
etc. It's quite possible that even a simple text file would suffice. Perhaps your best bet is to choose a simple solution (e.g. a text file), implement it, and see how far it takes you. However, provide a suitable persistence-layer abstraction so that you can slot in a different solution in the future with minimal disruption.
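A minimal sketch of that kind of abstraction, using hypothetical names (`MessageArchive`, `TextFileArchive`) and a one-line-per-message, tab-separated text format chosen purely for illustration. The rest of the messenger only talks to the interface, so an XML, SQLite or Java DB backend could later be written as another implementation without touching the callers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical persistence abstraction: callers never see how messages are stored. */
interface MessageArchive {
    void append(String conversationId, String sender, String text) throws IOException;
    List<String> load(String conversationId) throws IOException;
}

/** Simplest possible backend: one UTF-8 text file per conversation, one line per message. */
class TextFileArchive implements MessageArchive {
    private final Path dir;

    TextFileArchive(Path dir) { this.dir = dir; }

    private Path fileFor(String conversationId) {
        return dir.resolve(conversationId + ".txt");
    }

    @Override
    public void append(String conversationId, String sender, String text) throws IOException {
        Files.createDirectories(dir);
        // Flatten line breaks so that each message stays on exactly one line.
        String line = sender + "\t" + text.replaceAll("[\\r\\n]+", " ") + System.lineSeparator();
        Files.write(fileFor(conversationId), line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    @Override
    public List<String> load(String conversationId) throws IOException {
        Path f = fileFor(conversationId);
        if (!Files.exists(f)) return new ArrayList<>();
        return Files.readAllLines(f, StandardCharsets.UTF_8);
    }
}
```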

I would go for XML files, as they are more generic and can be opened outside your messenger in a more or less human-readable format. I use Pidgin for instant messaging, and it saves chat history in XML. Also, to read the history in your application, you can easily transform it into HTML to display it nicely.
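The XML-to-HTML step can be done with the XSLT processor that ships with the JDK (`javax.xml.transform`). This is a sketch only: the `<conversation>`/`<message from="...">` layout is a hypothetical schema, not Pidgin's actual log format, and the class name `HistoryToHtml` is made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HistoryToHtml {
    // Hypothetical stylesheet: renders each <message from="..."> as an HTML list item.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/conversation'>"
        + "<ul><xsl:for-each select='message'>"
        + "<li><b><xsl:value-of select='@from'/></b>: <xsl:value-of select='.'/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to a chat-history XML string and returns the HTML fragment. */
    public static String toHtml(String historyXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(historyXml)), new StreamResult(out));
        return out.toString();
    }
}
```

The serializer escapes message text for you, so a message containing `<` cannot inject markup into the rendered history.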

If you use JAXB, converting Java objects to/from XML is very easy. You just put a few annotations on your classes, and run them through a JAXB marshaller/unmarshaller. See http://docs.oracle.com/javaee/5/tutorial/doc/bnbay.html

Use Google's Protocol Buffers or 10gen's BSON. They are much smaller and faster than XML.
http://code.google.com/apis/protocolbuffers/docs/javatutorial.html
http://bsonspec.org/
One issue is that these are binary formats, and you might want the archive to be transparent/readable to users.

Combining multiple Java classes with ASM at runtime

I'd like to merge several java classes into one. I've read ASM documentation and this http://www.jroller.com/eu/entry/merging_class_methods_with_asm but I can't understand how I can achieve my goal.
Are there more detailed examples about this?
Thanks

Since Java 1.5 there is a feature called instrumentation (the java.lang.instrument package) that enables you to manipulate the bytecode of a program at runtime. In addition, you can manipulate bytecode while the class loader loads a specific class into JVM memory. The ASM framework provides tools to manipulate bytecode easily by turning raw bytecode into something readable and adding some utilities to simplify your work. Note that manipulating bytecode is a very advanced technique, and you really need to understand the JVM and bytecode before doing it.
I personally suggest following the ideas that appear above, but if you still insist on doing it, I suggest you read about instrumentation here: http://www.javalobby.org/java/forums/t19309.html
and then dive into the ASM or Javassist framework: http://sleeplessinslc.blogspot.co.il/2008/07/java-instrumentation.html
I think Javassist is easier, so I suggest working with that.
I hope this helps.
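The instrumentation hook described above can be sketched with the standard `java.lang.instrument` API. This sketch does not merge anything: it only shows where class bytes pass through your code at load time. The actual merging (parsing `classfileBuffer` with ASM or Javassist, copying in members from other classes, and returning the new bytes) is left as a comment. `LoggingAgent` and the `seen` list are hypothetical names for illustration.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LoggingAgent {
    /** Internal names of classes seen by the transformer (e.g. "java/lang/String"). */
    static final List<String> seen = new CopyOnWriteArrayList<>();

    /** Called when the JVM starts with -javaagent:agent.jar (the jar manifest needs Premain-Class). */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer());
    }

    static class LoggingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            seen.add(className);
            // A real merger would parse classfileBuffer with ASM or Javassist here,
            // copy fields/methods in from the other classes, and return the new bytes.
            return null; // null tells the JVM to keep the class unchanged
        }
    }
}
```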

Access SPSS data from a Python, Java (Groovy/Grails) or C++ app without a license for SPSS?

I am finding mixed results googling. I have a need to parse a SPSS .sav file to discover the data layout and extract the survey results. Step one is to read the "schema" of the data. For example, I need to know the question and its type of allowed responses. I plan to model this data in my own SQL table so I can slice and dice it per my app's requirements. Step two is to populate my data model with the respondents' answers. Looking at the SPSS sav file, I believe it has both types of data I am looking for.
I don't need or want the expensive SPSS software if I don't strictly require it. We will not be doing statistics on this data, just selecting subsets of respondents based on answer filters. The SPSS file will be provided by a partner company that licenses SPSS. I do not need to put any data back into SPSS; my use case is read-only.
I can use Python, Java with or without Groovy, or C/C++ for my parser program. This program will be run once at the end of data collection, so performance is not particularly important. Ideally I'd like my code to be cross-platform so I can develop on my Mac and deploy to Linux, but I can use Windows if I must.
A lot of what I am finding is either Java classes from 2004 or modern Python code that requires a DLL from IBM and is Windows-specific. Based on my quick explanation of the requirements, I would appreciate recommendations from the SO community. I think my needs are simple, but I haven't found exactly what I had hoped for. An open source lib would be ideal, but I'd even pay for a simple commercial solution at a reasonable price.

You can get the SPSS i/o modules with detailed documentation for free in order to build your own app to read (or write) sav files. The modules are available for all platforms supported by SPSS Statistics.
Go to the SPSS Community site at http://www.ibm.com/developerworks/spssdevcentral and follow the links for SPSS Downloads. You have to register, but that is free.
The SAV file is a binary format with a number of complex structures, so it is better to use the i/o modules. And if new features are added to the SAV file, which has often happened, the i/o modules are updated at the same time, so your code won't go out of date.
HTH,
Jon Peck

GNU PSPP can apparently read SPSS data files. I also found a link to a description of the format in the PSPP source, although it comes with a warning "don't try to read/write this format directly."
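Respecting that warning, the one piece that is simple enough to read by hand is the fixed-size file header record. The sketch below assumes the layout given in PSPP's system-file format description: a 4-byte "$FL2" (or "$FL3") tag, a 60-byte product name, then 32-bit layout_code, nominal_case_size, compression, weight_index and ncases fields, a 64-bit bias, date and time strings, a 64-byte file label and 3 padding bytes (176 bytes in total). layout_code is normally 2 (occasionally 3), which is used here to detect byte order. Decoding strings as ISO-8859-1 is a simplifying assumption; the variable and value-label records that follow the header are exactly where the i/o modules or PSPP are the better choice.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Sketch: parses only the fixed 176-byte file header record of an SPSS .sav file. */
public class SavHeader {
    public final String productName;
    public final ByteOrder byteOrder;
    public final int compression;  // 0 = none, 1 = bytecode, 2 = ZLIB (per PSPP's notes)
    public final int caseCount;    // -1 if the writer did not record it
    public final String fileLabel;

    private SavHeader(String productName, ByteOrder byteOrder, int compression,
                      int caseCount, String fileLabel) {
        this.productName = productName;
        this.byteOrder = byteOrder;
        this.compression = compression;
        this.caseCount = caseCount;
        this.fileLabel = fileLabel;
    }

    public static SavHeader parse(byte[] data) {
        if (data.length < 176) throw new IllegalArgumentException("shorter than a .sav header");
        String magic = new String(data, 0, 4, StandardCharsets.US_ASCII);
        if (!magic.equals("$FL2") && !magic.equals("$FL3")) {
            throw new IllegalArgumentException("not an SPSS system file: " + magic);
        }
        // layout_code at offset 64 should read as 2 or 3; otherwise assume the other byte order.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int layout = buf.getInt(64);
        if (layout != 2 && layout != 3) buf.order(ByteOrder.BIG_ENDIAN);
        return new SavHeader(
                text(data, 4, 60),     // prod_name
                buf.order(),
                buf.getInt(72),        // compression
                buf.getInt(80),        // ncases
                text(data, 109, 64));  // file_label
    }

    // ISO-8859-1 is an assumption; real files may use another code page.
    private static String text(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1).trim();
    }
}
```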

There is a Java library here:
http://sourceforge.net/projects/spss-writer/

File format conversion library

Are there any well known solutions that meet/exceed below requirements?
conversion from multiple non-graphical document formats to and from HTML (e.g. doc<->HTML, pdf<->html, odt<->html, etc.)
command line or API (Java API is preferable)
cross-platform
commercial or open source

OpenOffice has a rich API that supports conversion between the various supported formats. Check out this question. It recommends using JODConverter.

With DocBook you can export to various output formats, but converting back is always hard. For PDF you can try iText.

I (having written an all-in-one TeX/LaTeX to HTML, ASCII text and RTF converter) would say this would be quite an undertaking.
The problem is that these various 'document' formats are intended for rather different purposes. While there are indeed conversion tools between some of these formats, there is often a conceptual disparity in the structure, meaning and implementation of a 'document', and it is very often necessary to trade off features supported by one format to hack together an acceptable output in another.
For example, PDF is very strong in presentation, precise positioning and font support, whereas HTML is more concerned with structure, with practically no consideration for these things (without CSS).
I am curious how you envision such an API being used, when usually someone simply wants a conversion program?
