Reading the spss file java - java

SPSSReader reader = new SPSSReader(args[0], null);
Iterator it = reader.getVariables().iterator();
while (it.hasNext())
{
System.out.println(it.next());
}
I am using this SPSSReader to read the spss file. Here,every string is printed with some junk characters appended with it.
Obtained Result :
StringVariable: nameogr(nulltpc{)(10)
NumericVariable: weightppuo(nullf{nd)
DateVariable: datexsgzj(nulllanck)
DateVariable: timeppzb(null|wt{l)
DateVariable: datetimegulj{(null|ns)
NumericVariable: commissionyrqh(nullohzx)
NumericVariable: priceeub{av(nullvlpl)
Expected Result :
StringVariable: name (10)
NumericVariable: weight
DateVariable: date
DateVariable: time
DateVariable: datetime
NumericVariable: commission
NumericVariable: price
Thanks in advance :)

I tried recreating the issue and found the same thing.
Considering that there is a licensing for that library (see here), I would assume that this might be a way of the developers to ensure that a license is bought as the regular download only contains a demo version as evaluation (see licensing before the download).
As that library is rather old (copyright of the website is 2003-2008, requirement for the library is Java 1.2, no generics, Vectors are used, etc), I would recommend a different library as long as you are not limited to the one used in your question.
After a quick search, it turned out that there is an open source spss reader here which is also available through Maven here.
Using the example on the github page, I put this together:
import com.bedatadriven.spss.SpssDataFileReader;
import com.bedatadriven.spss.SpssVariable;
public class SPSSDemo {
public static void main(String[] args) {
try {
SpssDataFileReader reader = new SpssDataFileReader(args[0]);
for (SpssVariable var : reader.getVariables()) {
System.out.println(var.getVariableName());
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
I wasn't able to find stuff that would print NumericVariable or similar things but as those were the classnames of the library you were using in the question, I will assume that those are not SPSS standardized. If they are, you will either find something like that in the library or you can open an issue on the github page.
Using the employees.sav file from here I got this output from the code above using the open source library:
resp_id
gender
first_name
last_name
date_of_birth
education_type
education_years
job_type
experience_years
monthly_income
job_satisfaction
No additional characters no more!
Edit regarding the comment:
That is correct. I read through some SPSS stuff though and from my understanding there are only string and numeric variables which are then formatted in different ways. The version published in maven only gives you access to the typecode of a variable (to be honest, no idea what that is) but the github version (that does not appear to be published on maven as 1.3-SNAPSHOT unfortunately) does after write- and printformat have been introduced.
You can clone or download the library and run mvn clean package (assuming you have maven installed) and use the generated library (found under target\spss-reader-1.3-SNAPSHOT.jar) in your project to have the methods SpssVariable#getPrintFormat and SpssVariable#getWriteFormat available.
Those return an SpssVariableFormat which you can get more information from. As I have no clue what all that is about, the best I can do is to link you to the source here where references to the stuff that was implemented there should help you further (I assume that this link referenced to in the documentation of SpssVariableFormat#getType is probably the most helpful to determine what kind of format you have there.
If absolutely NOTHING works with that, I guess you could use the demo version of the library in the question to determine the stuff through it.next().getClass().getSimpleName() as well but I would resort to that only if there is no other way to determining the format.

I am not sure, but looking at your code, it.next() is returning a Variable object.
There has to be some method to be chained to the Variable object, something like it.next().getLabel() or it.next().getVariableName(). toString() on an Object is not always meaningful. Check toString() method of Variable class in SPSSReader library.

Related

recognize parameter change from git repository

I want to extract signature changes (method parameter changes to be exact) from commits to git repository by a java program. I have used the following code:
for (Ref branch : branches) {
String branchName = branch.getName();
for (RevCommit commit : commits) {
boolean foundInThisBranch = false;
RevCommit targetCommit = walk.parseCommit(repo.resolve(
commit.getName()));
for (Map.Entry<String, Ref> e : repo.getAllRefs().entrySet()) {
if (e.getKey().startsWith(Constants.R_HEADS)) {
if (walk.isMergedInto(targetCommit, walk.parseCommit(
e.getValue().getObjectId()))) {
String foundInBranch = e.getValue().getName();
if (branchName.equals(foundInBranch)) {
foundInThisBranch = true;
break;
}
}
}
}
I can extract commit message, commit data and Author name from that, however, I am not able to extract parameter changes from them. I mean it is unable for me to identify parameter changes. I want to know if there is any way to recognize that. I mean it is impossible to recognize them from commit notes that are generated by programmers; I am looking for something like any specific annotation or something else.
This is my code to extract differences:
CanonicalTreeParser oldTreeIter = new CanonicalTreeParser();
oldTreeIter.reset(reader, oldId);
CanonicalTreeParser newTreeIter = new CanonicalTreeParser();
newTreeIter.reset(reader, headId);
List<DiffEntry> diffs= git.diff()
.setNewTree(newTreeIter)
.setOldTree(oldTreeIter)
.call();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DiffFormatter df = new DiffFormatter(out);
df.setRepository(git.getRepository());
The export is really huge and impossible to extract method changes.
You show a way you've found to examine the diffs, but say that the output is too large and you can't extract the method signature changes. If by that you mean that you're asking about specific git support for telling you that a method signature changes, then no - no such support exists. This is because git does not "know" anything about the languages you may or may not have used in the files under source control. Everything is just content that is, or is not, different from other content.
Since a method signature could be split across lines in any number of ways, it's not even guaranteed that just because a method's signature changed its name would appear anywhere in the diff. What you would really have to do is perform a sort of "structural diff". That is, you would have to
check out the "old" version, and pass it to a java parser
check out the "new" version, and pass it to a java parser
compare the resulting parse trees, looking for methods that belong to the same object, but have changed
Even that won't be terribly easy, because methods could be renamed, and because method overloading could make it unclear which signature change goes with which version of a method.
From there what you have is a non-trivial coding problem, which is beyond the scope of SO to answer. If you decide to tackle this problem and run into specific programming questions along the way, of course you could post those questions and perhaps someone will be able to help.

How to know the Java interfaces an OpenOffice Calc UNO object supports (through queryInterface)

I'm developing a "macro" for OpenOffice Calc. As the language, I chose Java, in order to get code assistance in Eclipse. I even wrote a small ant build script that compiles and embeds the "macro" in an *.ods file. In general, this works fine and surprisingly fast; I'm already using some simple stuff quite successfully.
BUT
So often I get stuck because with UNO, I need to "query" an interface for any given non-trivial object, to be able to access data / call methods of that object. I.e., I literally need to guess which interfaces a given object may provide. This is not at all obvious and not even visible during Java development (through some sort of meta-information, reflection or the like), and also sparsely documented (I downloaded tons of stuff, but I don't find the source or maybe JavaDoc for the interfaces I'm using, like XButton, XPropertySet, etc. - XButton has setLabel, but not getLabel - what??).
There is online documentation (for the most fundamental concepts, which is not bad at all!), but it lacks many details that I'm faced with. It always magically stops exactly at the point I need to solve.
I'm willing to look at the C++ code to get a clue what interfaces an object (e.g. the button / event I'm currently stuck with) may provide. Confusingly, the C++ class and file names don't exactly match the Java interfaces. It's almost what I'm looking for, but then in Java I don't really find the equivalent, or calling queryInterface on a given object returns null.. It's becoming a bit frustrating.
How are the UNO Java interfaces generated? Is there some kind of documentation in the code that serves as the origin for the generated (Java) code?
I think I really need to know what interfaces are available at which point, in order to become a bit more fluent during Java-UNO-macro development.
For any serious UNO project, use an introspection tool.
As an example, I created a button in Calc, then used the Java Object Inspector to browse to the button.
Right-clicking and choosing "Add to Source Code" generated the following.
import com.sun.star.awt.XControlModel;
import com.sun.star.beans.XPropertySet;
import com.sun.star.container.XIndexAccess;
import com.sun.star.container.XNameAccess;
import com.sun.star.drawing.XControlShape;
import com.sun.star.drawing.XDrawPage;
import com.sun.star.drawing.XDrawPageSupplier;
import com.sun.star.sheet.XSpreadsheetDocument;
import com.sun.star.sheet.XSpreadsheets;
import com.sun.star.uno.AnyConverter;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XInterface;
//...
public void codesnippet(XInterface _oUnoEntryObject){
try{
XSpreadsheetDocument xSpreadsheetDocument = (XSpreadsheetDocument) UnoRuntime.queryInterface(XSpreadsheetDocument.class, _oUnoEntryObject);
XSpreadsheets xSpreadsheets = xSpreadsheetDocument.getSheets();
XNameAccess xNameAccess = (XNameAccess) UnoRuntime.queryInterface(XNameAccess.class, xSpreadsheets);
Object oName = xNameAccess.getByName("Sheet1");
XDrawPageSupplier xDrawPageSupplier = (XDrawPageSupplier) UnoRuntime.queryInterface(XDrawPageSupplier.class, oName);
XDrawPage xDrawPage = xDrawPageSupplier.getDrawPage();
XIndexAccess xIndexAccess = (XIndexAccess) UnoRuntime.queryInterface(XIndexAccess.class, xDrawPage);
Object oIndex = xIndexAccess.getByIndex(0);
XControlShape xControlShape = (XControlShape) UnoRuntime.queryInterface(XControlShape.class, oIndex);
XControlModel xControlModel = xControlShape.getControl();
XPropertySet xPropertySet = (XPropertySet) UnoRuntime.queryInterface(XPropertySet.class, xControlModel);
String sLabel = AnyConverter.toString(xPropertySet.getPropertyValue("Label"));
}catch (com.sun.star.beans.UnknownPropertyException e){
e.printStackTrace(System.out);
//Enter your Code here...
}catch (com.sun.star.lang.WrappedTargetException e2){
e2.printStackTrace(System.out);
//Enter your Code here...
}catch (com.sun.star.lang.IllegalArgumentException e3){
e3.printStackTrace(System.out);
//Enter your Code here...
}
}
//...
Python-UNO may be better than Java because it does not require querying specific interfaces. Also XrayTool and MRI are easier to use than the Java Object Inspector.

How is obfuscation done in Java?

Today I came across an obfuscated class (well a lot of obfuscated classes in a jar) and I do not have a clue on how this kind of obfuscation is done.
An example:
protected void a(ChannelHandlerContext ☃, ByteBuf ☃, ByteBuf ☃)
throws Exception
{
int ☃ = ☃.readableBytes();
if (☃ < this.c)
{
☃.b(0);
☃.writeBytes(☃);
}
else
{
byte[] ☃ = new byte[☃];
☃.readBytes(☃);
☃.b(☃.length);
this.b.setInput(☃, 0, ☃);
this.b.finish();
while (!this.b.finished())
{
int ☃ = this.b.deflate(this.a);
☃.writeBytes(this.a, 0, ☃);
}
this.b.reset();
}
}
}
As you see above, all the parameter variables are a snow-man. How can this be undone? Also how is it done in the first place; how is the JVM able to "process" those and execute the code without any problem?
To clarify, I am not going to use this code, it is just for educational purposes. I am taking the Computer Science course at school so since we are learning Java and talking of limitations such as decompilations. I am interested in learning more, so I decided to have a look into bigger projects especially servers. This piece of code is pulled out of the Spigot server for Minecraft (A game) that is a fork of Bukkit server for Minecraft that was supposed to be open source.
First of all, you should note that it is the parameters which have this unicode and not the methods. Why is this important?
Parameters do not need to have names specified, as they are mostly indexed by a number reference. However it can be specified and I assume that most java runtimes do in fact not check this name as it is not needed for execution.
In the opposite, class names, method names, and field names are however needed.
About you mentioning Spigot, Spigot is indeed open source. However you most likely decompiled a class which is originally from the original Mojang Minecraft server, which is not open source and is indeed obfuscated.
Edit: In the case you want to investigate these classes, I recently found a tool called Bytecode Viewer, which is available at https://github.com/Konloch/bytecode-viewer
This tool has multiple decompilers as well as some options to view a more bytecode like version of the class file.
An example of a function I found contains the following bytecode data:
<localVar:index=1 , name=☃ , desc=D, sig=null, start=L1, end=L2>
<localVar:index=3 , name=☃ , desc=D, sig=null, start=L1, end=L2>
<localVar:index=5 , name=☃ , desc=D, sig=null, start=L1, end=L2>
Indeed as is visible, the unicode name has been set the same, but it does not matter as in the end the indexes (1,3,5) are used to reference these variables.
protected void a(ChannelHandlerContext ☃, ByteBuf ☃, ByteBuf ☃)
This isn't valid. You cannot have multiple parameters with the same name. It could be that you are not reading the unicode text with the right text format.
Your Text editor is showing the value of the unicode character.
I just tested on eclipse and names with unicode characters are acceptable.
public String publicationXmlUrl(int \u9090currentPage) {
But writing with values are not:
public String publicationXmlUrl(int ♥currentPage) {

Is there a decent, customisable, HTML to Markdown Java API?

I want to save text I scrape from various sources without the HTML tags that are on it, but also keeping as much of the structure as I reasonably can.
Markdown seems to be the solution to this (or possibly MultiMarkdown).
There is a question which offers a suggestion on converting from HTML to markdown, but I want to specify some specific things:
ALL links (including images) are referenced at the END only (i.e. no inline urls)
NO embeded HTML (I'm not even 100% sure yet how I'd like to deal with difficult HTML... but it won't be embeded!)
So my question is as stated in the title: Is there a decent, customisable, HTML to Markdown Java API?
You could try adapting HtmlCleaner which provides a workable interface onto the DOM:
TagNode root = htmlCleaner.clean( stream );
Object[] found = root.evaluateXPath( "//div[id='something']" );
if( found.length > 0 && found instanceof TagNode ) {
((TagNode)found[0]).removeFromTree();
}
This would allow you to structure your output stream in any format that you want using a fairly simple API.
There is a great library for JS called Turndown, you can try it online here. It can be partially customized. For example, links can be referenced at the end. And as far as I know there is no embedded html, everything is transformed.
I needed it for Java (as the linked question), so I ported it. The library for Java is called CopyDown, it has the same test suite as Turndown.
To install with gradle:
dependencies {
compile 'io.github.furstenheim:copy_down:1.0'
}
Then to use it:
CopyDown converter = new CopyDown();
String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
String markdown = converter.convert(myHtml);
System.out.println(markdown);
> Some title\n==========\n\nSome html\n\nAnother paragraph\n

How to designate resources as do-not-translate?

I work on the localization of Java software, and my projects have both .properties files and XML resources. We currently use comments to instruct translators to not translate certain strings, but the problem with comments is that they are not machine-readable.
The only solution I can think of is to prefix each do-not-translate key with something like _DNT_ and train our translation tools to ignore these entries. Does anyone out there have a better idea?
Could you break the files up into ones to be translated or ones to be not translated and then only send them the one that are to be translated? (Don't know the structure so har dto know when answering if that is practical...)
The Eclipse JDT also uses comments to prevent the translation of certain Strings:
How to write Eclipse plug-ins for the international market
I think your translation tool should work in a similar way?
The simplest solution is to not put do-not-translate strings (DNTs) in your resource files.
.properties files don't offer much in the way of metadata handling, and since you don't need the data at runtime, its presence in .properties files would be a side-effect rather than something that is desirable. Consider too, partial DNTs where you have something that cannot be translated contained in a translatable string (e.g. a brand name or URI).
"IDENTIFIER english en en en" -> "french fr IDENTIFIER fr fr"
As far as I am aware, even standards like XLIFF do not take DNTs into consideration and you'll have to manage them through custom metadata files, terminology files and/or comments (such as the note element in XLIFF).
Like axelclk posted in his link... eclipse provide a
//$NON-NLS-1$
Statement to notify the project that the first string in this line should not translated. All other string you can find by calling
Source->Externalize Strings
External Strings include all languages you want to support.
File which include the translations looking like:
PluginPage.Error1 = text1
PluginPage.Error2 = text2
Class which read the translation
private static final String BUNDLE_NAME = "com.plugin.name"; //$NON-NLS-1$
private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle.getBundle(BUNDLE_NAME);
private PluginMessages() {
}
public static String getString(String key) {
// TODO Auto-generated method stub
try {
return RESOURCE_BUNDLE.getString(key);
} catch (MissingResourceException e) {
return '!' + key + '!';
}
}
And you can call it like:
String msg = PluginMessages.getString("PluginPage.Error2"); //$NON-NLS-1$
EDIT:
When a string is externalized and you want to use the original string, you can delete the externalize string from all properties files, without the default one. When the Bundle can not find a message file which is matching to the local language, the default is used.
But this is not working at runtime.
If you do decide to use do-not-translate comments in your properties files, I would recommend you follow the Eclipse convention. It's nothing special, but life will be easier if we all use the same magic string!
(Eclipse doesn't actually support DO-NOT-TRANSLATE comments yet, as far as I know, but Tennera Ant-Gettext has an implementation of the above scheme which is used when converting from resource bundles to Gettext PO files.)

Categories

Resources