I use a shared repository containing both Java and Python code. The code base is mainly Python, but some libraries are written in Java.
Is there a way to parse or preprocess the Java documentation so that it can be used
later in Python-Sphinx, or is there even a plugin for this?
javasphinx (Github) (Documentation)
It took me way too long to find all the important details to set this up, so here's a brief summary for all my trouble.
Installation
# Recommend working in virtual environments with latest pip:
mkdir docs; cd docs
python3 -m venv env
source ./env/bin/activate
pip install --upgrade pip
# Recommend installing from source:
pip install git+https://github.com/bronto/javasphinx.git
The PyPI version seemed to have broken imports; these issues did not seem to exist in the latest checkout.
Setup & Configuration
Assuming you've got a working sphinx setup already:
Important: add the Java "domain" to Sphinx. It is embedded in the javasphinx package and does not follow the common .ext. extension-namespace format. (This is the detail I missed for hours):
# docs/sources/conf.py
extensions = ['javasphinx']
Optional: If you want external javadoc linking:
# docs/sources/conf.py
javadoc_url_map = {
'<namespace_here>' : ('<base_url_here>', 'javadoc'),
}
Generating Documentation
The javasphinx package adds the shell tool javasphinx-apidoc. If your virtual environment is active you can call it as just javasphinx-apidoc; otherwise use its full path, ./env/bin/javasphinx-apidoc:
$ javasphinx-apidoc -o docs/source/ --title='<name_here>' ../path/to/java_dirtoscan
This tool takes arguments nearly identical to sphinx-apidoc:
$ javasphinx-apidoc --help
Usage: javasphinx-apidoc [options] -o <output_path> <input_path> [exclude_paths, ...]
Options:
-h, --help show this help message and exit
-o DESTDIR, --output-dir=DESTDIR
Directory to place all output
-f, --force Overwrite all files
-c CACHE_DIR, --cache-dir=CACHE_DIR
Directory to stored cachable output
-u, --update Overwrite new and changed files
-T, --no-toc Don't create a table of contents file
-t TOC_TITLE, --title=TOC_TITLE
Title to use on table of contents
--no-member-headers Don't generate headers for class members
-s SUFFIX, --suffix=SUFFIX
file suffix (default: rst)
-I INCLUDES, --include=INCLUDES
Additional input paths to scan
-p PARSER_LIB, --parser=PARSER_LIB
Beautiful Soup---html parser library option.
-v, --verbose verbose output
Include Generated Docs in Index
The output directory of the javasphinx-apidoc command will contain a generated packages.rst table-of-contents file. You will likely want to include it in your index.rst table of contents, like:
#docs/sources/index.rst
Contents:

.. toctree::
   :maxdepth: 2

   packages
Compile Documentation (html)
With either your python environment active or your path modified:
$ cd docs
$ make html
or
$ PATH=$PATH:./env/bin/ make html
The javadoc command allows you to write and use your own doclet classes to generate documentation in whatever form you choose. The output doesn't need to be directly human-readable ... so there's nothing stopping you outputting in a Sphinx compatible format.
However, I couldn't find any existing doclet that does this specific job.
References:
Oracle's Doclet Overview
UPDATE
The javasphinx extension may be a better alternative. It allows you to generate Sphinx documentation from javadoc comments embedded in Java source code.
Sphinx does not provide a built-in way to parse JavaDoc, and I do not know of any 3rd party extension for this task.
You'll likely have to write your own documenter for the Sphinx autodoc extension. There are different approaches you may follow:
Parse JavaDoc manually. I do not think that there is a JavaDoc parser for Python, though.
Use Doxygen to parse JavaDoc into XML, and parse that XML. The Sphinx extension breathe does this, though for C++.
Write a Doclet for Java to turn JavaDoc into whatever output format you can handle, and parse this output.
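To give an idea of what the first approach (manual parsing) involves, here is a minimal sketch that pulls /** ... */ comments and the declaration line that follows them out of Java source text with a regular expression. The function name extract_javadoc and the sample source are made up for illustration. A regex like this is naive: it ignores annotations, nested classes and comment markers inside string literals, so treat it as a starting point rather than a real parser.

```python
import re

# Matches a /** ... */ block followed by the declaration line it documents.
JAVADOC_RE = re.compile(r"/\*\*(.*?)\*/\s*([^\n{;]+)", re.DOTALL)


def extract_javadoc(java_source):
    """Return (declaration, comment_text) pairs found in Java source text."""
    results = []
    for match in JAVADOC_RE.finditer(java_source):
        raw_comment, declaration = match.groups()
        # Strip the leading " * " decoration from every comment line.
        lines = [re.sub(r"^\s*\*\s?", "", line) for line in raw_comment.splitlines()]
        text = "\n".join(line for line in lines if line.strip()).strip()
        results.append((declaration.strip(), text))
    return results


# Hypothetical sample input, not taken from any real project.
SAMPLE = '''
/**
 * Adds two numbers.
 * @param a first operand
 */
public int add(int a, int b) {
    return a + b;
}
'''

for declaration, text in extract_javadoc(SAMPLE):
    print(declaration)
    print(text)
```

The resulting pairs would still have to be turned into reStructuredText (or fed to a custom autodoc documenter) before Sphinx can render them.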
This is a good guide for people who are new to Java,
but I have a few questions about the command-line part.
https://lift.cs.princeton.edu/java/windows/
The terminal session in the guide looks like this:
~/Desktop/hello> ls
Barnsley.java COS 126.iml WELCOME.txt logo.png
~/Desktop/hello> javac-introcs Barnsley.java
~/Desktop/hello> java-introcs Barnsley 10000
As I understand it, on Windows this runs in cmd.exe, so ls should be replaced with dir.
But when I type javac-introcs Barnsley.java,
it tells me:
'java-introcs' is not recognized as an internal or external command,
operable program or batch file.
The output you observed means that you haven't completed the installation of some class-specific programs. The installer for this is mentioned in Section 0 of the link you provided. The most likely cause is that some environment variables were not modified correctly to add the java-introcs executable or alias to %PATH%. However, you can probably replicate the intent of java-introcs, as described below.
To make the given Barnsley file compile and run, you'll need to add the dependency StdDraw.class to the classpath, both when compiling and when running the file. To do this, you can use the -classpath (or -cp) option of javac and java. You can read more detailed documentation on how to do this here.
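As a rough sketch of what those two invocations look like, the following builds the javac and java command lines with a -cp argument. The helper name classpath_commands and the jar name stdlib.jar are assumptions for illustration; substitute whatever directory or jar actually contains StdDraw.class on your machine. os.pathsep is used because the classpath separator is ; on Windows and : elsewhere.

```python
import os


def classpath_commands(main_class, classpath_entries, program_args=()):
    """Build javac/java command lines that share one -cp value."""
    classpath = os.pathsep.join(classpath_entries)
    compile_cmd = ["javac", "-cp", classpath, main_class + ".java"]
    run_cmd = ["java", "-cp", classpath, main_class] + [str(a) for a in program_args]
    return compile_cmd, run_cmd


# "stdlib.jar" is a placeholder for the location of StdDraw.class.
compile_cmd, run_cmd = classpath_commands("Barnsley", [".", "stdlib.jar"], [10000])
print(" ".join(compile_cmd))
print(" ".join(run_cmd))
```

The current directory (.) is kept on the classpath so that the freshly compiled Barnsley.class is found when running.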
While generating the server stub (Spring, in my case), I can specify the packages for api, model and invoker, and that works without problems.
I also need to change the package of the generated supporting files, but I can't find a way to do that. The default package is io.swagger.configuration, which needs to be changed in my case.
Swagger Codegen supports configuration options when generating code. For details, see the customizing-the-generator section of the official wiki.
Here is an example that generates a jersey2 client:
java -jar ${your_swagger_codegen_jar} generate \
-i ${your_api_url} \
-l java --library=jersey2 \
-o ${your_output_folder} \
-c ./config.json
The config.json
{
"hideGenerationTimestamp":true,
"groupId":"science.mengxin",
"artifactId":"test-api-client",
"artifactDescription":"test-api-client",
"developerName":"xin.meng",
"developerEmail":"test#test.com",
"developerOrganization":"mengxin",
"developerOrganizationUrl":"mengxin.science",
"licenseName":"mengxin.science",
"invokerPackage":"science.mengxin.test.cloud.client",
"apiPackage":"science.mengxin.test.cloud.client.api",
"modelPackage":"science.mengxin.test.cloud.client.model"
}
Then check the pom.xml and the source of the generated code; you will find that all of these parameters have been customised.
More options for customising the generated code can be found in the swagger-codegen book: English, Chinese
My bash script has the following code-generation invocation for "Protocol Buffers".
protoc --proto_path=src --java_out=src \
src/com/domain/project/persistentThing.proto
That was from 2012. Today the online documentation for Java Protocol Buffers says:
protoc -I=$SRC_DIR --java_out=$DST_DIR \
$SRC_DIR/addressbook.proto
This makes it seem like --proto_path has been superseded. What was the purpose of --proto_path?
-I and --proto_path are the same flag: -I is the shorthand version.
The purpose of the flag is to specify a directory in which to look for imported files, much like the C compiler's -I flag, Java's CLASSPATH environment variable, Python's PYTHONPATH, etc. You may specify multiple directories by passing multiple -I flags; they will be searched in order.
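To illustrate the "searched in order" rule, here is a small sketch of that kind of lookup. It is not protoc's actual implementation, and the helper name resolve_import and the file name common.proto are made up for the example.

```python
import os
import tempfile


def resolve_import(import_name, proto_paths):
    """Return the first existing file for import_name, trying proto_paths in order."""
    for directory in proto_paths:
        candidate = os.path.join(directory, import_name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Two throwaway directories that both contain common.proto.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for directory in (first, second):
        with open(os.path.join(directory, "common.proto"), "w") as handle:
            handle.write('syntax = "proto3";\n')
    found = resolve_import("common.proto", [first, second])
    # The copy in the directory listed first wins.
    print(found == os.path.join(first, "common.proto"))
```

This is why the order of multiple -I flags matters when the same relative path exists in more than one directory.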
I recently came across Tika, which is very useful for extracting text from Word documents:
curl www.vit.org/downloads/doc/tariff.doc \
| java -jar tika-app-1.3.jar --text
But is there a way to use it to convert the MS Word file into XML/HTML?
Yes, it involves changing a whopping 4 characters in your command!
If you run java -jar tika-app-1.3.jar --help you'll get something that starts with:
usage: java -jar tika-app.jar [option...] [file|port...]
Options:
-? or --help Print this usage message
-v or --verbose Print debug level messages
-V or --version Print the Apache Tika version number
-g or --gui Start the Apache Tika GUI
-s or --server Start the Apache Tika server
-f or --fork Use Fork Mode for out-of-process extraction
-x or --xml Output XHTML content (default)
-h or --html Output HTML content
-t or --text Output plain text content
-T or --text-main Output plain text content (main content only)
-m or --metadata Output only metadata
.....
From that, you'll see that if you change your --text option to --html or --xml, you'll get nicely formatted HTML or XHTML instead of just the plain text.
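If you want to script this, a minimal sketch could look like the following. The flags are the ones from the help output above; the function names tika_command and convert are made up for illustration, and the document is fed on stdin just like in the curl pipeline from the question.

```python
import subprocess

# Output flags taken from the tika-app help text shown above.
OUTPUT_FLAGS = {
    "xml": "--xml",
    "html": "--html",
    "text": "--text",
    "metadata": "--metadata",
}


def tika_command(jar_path, output_format):
    """Build the tika-app command line for the requested output format."""
    if output_format not in OUTPUT_FLAGS:
        raise ValueError("unsupported output format: " + output_format)
    return ["java", "-jar", jar_path, OUTPUT_FLAGS[output_format]]


def convert(jar_path, document_path, output_format="html"):
    """Run tika-app with the document on stdin and return its output as bytes."""
    with open(document_path, "rb") as document:
        completed = subprocess.run(
            tika_command(jar_path, output_format),
            stdin=document,
            stdout=subprocess.PIPE,
            check=True,
        )
    return completed.stdout


print(" ".join(tika_command("tika-app-1.3.jar", "html")))
```

Calling convert requires Java and the Tika jar to be available locally; only the command construction is exercised here.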
Despite the fact that this has been answered, since the op tagged the question with the java tag, for completeness I'll add reference to easily see how to do this in java.
The TikaTest.java superclass from Tika's unit tests is the easiest reference for converting Word to HTML; see its getXML method. It's a pity that they saw the usefulness of such an API when writing their unit tests but chose not to expose it as a handy tool, forcing everyone to deal with handlers etc., which is unfortunate boilerplate for the common use case.
What is the recommended way to locate a class in a large group of jar files spread across different folders (Windows)?
Currently I use the EditPlus search, which works well, but maybe I am missing a better way?
Thanks.
EDIT
I don't need to find it programmatically.
Try cf.jar
Usage:
java -jar cf.jar SEARCH [DIRECTORY] [OPTIONS]...
Searches all JAR files in the current directory and its sub-directories for
entries matching the SEARCH argument. If DIRECTORY is provided, searching will
be done in that location instead of the current directory.
SEARCH:
A search string containing the class name (entry name). Wild card (*) is
supported. Package separator can be either of `/', `\' or `.'.
Examples:
java.lang.String
java/util/ArrayList
java/lang/Str*B*er
If non ".class" entries also need to be searched, option -a (--all-types)
should be specified. Please see the OPTIONS section for a more detailed
explanation.
DIRECTORY:
If this is not provided, current directory and all its sub-directories will
be used for performing the search. However, if this argument is provided,
the same and its sub-directories will be used as the location to fetch JAR
files from.
If a recursive scan is not needed, option -s (--shallow) can be specified.
OPTIONS:
-h --help
Shows this help
-o [path] --redirect-output
Redirect output to a file path supplied.
-x [x1,x2] --archive-extensions
Extensions in addition to the default ".jar". Comma or space
separated list accepted.
-i --insensitive-case
Case insensitive search.
-q --quiet
Silent search without the progress bar animation.
-a --all-types
Removes the filtering on ".class" types so that other types
such as ".properties", ".xml", etc can also be searched for.
-s --shallow
Performs a shallow search. Doesn't recurse.
Examples:
java -jar cf.jar org/apache/log4j/Level D:\Frameworks
java -jar cf.jar *OracleDriver C:\oracle -x jar,zip
java -jar cf.jar messages.properties D:\IBM\WebSphere -a -x jar,war,ear
java -jar cf.jar util.* D:\Java -i -q
jars=/opt/java/jre/lib/ext/mail.jar:/opt/java/jre/lib/ext/postgresql.jar
for jar in $(echo $jars | sed 's/:/ /g'); do
jar -tf $jar | grep Driver && echo $jar
done
For Windows, replace : with ; in the jars variable and in the sed command.
sed is part of the GNU tools. There is a Win32 binary port of them, which also contains grep, sh.exe and many other useful tools.
Replace Driver with $1 (or %1 in a batch file) to make it a parameterizable script.
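If installing the GNU tools on Windows is not an option, the same search as the loop above can be sketched in Python, since a jar is just a zip archive. The function name find_in_jars is made up for illustration, and the demo builds a throwaway jar instead of touching real files.

```python
import os
import tempfile
import zipfile


def find_in_jars(root_dir, needle, extensions=(".jar",)):
    """Yield (jar_path, entry_name) for archive entries whose name contains needle."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.lower().endswith(extensions):
                continue
            jar_path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    for entry in jar.namelist():
                        if needle in entry:
                            yield jar_path, entry
            except zipfile.BadZipFile:
                # Skip files that carry a jar extension but are not valid archives.
                continue


# Demo on a temporary directory containing one tiny fake jar.
with tempfile.TemporaryDirectory() as root:
    with zipfile.ZipFile(os.path.join(root, "postgresql.jar"), "w") as jar:
        jar.writestr("org/postgresql/Driver.class", b"")
    for jar_path, entry in find_in_jars(root, "Driver"):
        print(os.path.basename(jar_path), entry)
```

Entry names inside a jar use / as the package separator, so search for org/postgresql/Driver rather than org.postgresql.Driver. Pass extra extensions such as (".jar", ".war") to scan other archive types.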
I would use online search tools like
http://jarsearch.com
http://jarvana.com
Since you're using IntelliJ you can navigate to the class in question (Go To -> Class...) and then show it in the project view (Alt-F1 -> Project View).
If you already have the jars, I suggest using JD GUI.
Simply open the jars (drag and drop all of them into the UI), then do CTRL+SHIFT+T. Type in the classname, and you get a listing of all jars that the class is located in.
If you know the class name, but not the jar, use the search maven tools. http://search.maven.org/
For commercial clients, I usually use Maven. I set up an in-house Nexus repository, and then use its search.
If the project is set up in Eclipse, use CTRL+SHIFT+T and type the class name.
If the project is in IntelliJ, use CTRL+N and type the class name.