How to preserve original document's systemId in XSL chain? - java

I have a bunch of XSLs. One of them happens to use base-uri().
When run directly against a file, it shows the document's systemId.
When run after another XSL, it shows that XSL's systemId.
Things I don't have control over:
- XSL contents
- Order of XSLs
- Has to work with XSLT 2.0 (Saxon)
Also, I would prefer a streaming solution. This could be fixed by writing every intermediate result to disk and faking the systemId to that of the original document, but that is highly inefficient.
Here's what I've tried thus far.
public class BadSystemIdDemo {
private static final SAXTransformerFactory XSLT2 =
new net.sf.saxon.TransformerFactoryImpl();
public static void main(String[] args) throws Exception {
Result to = new StreamResult(System.out);
// outputs: "file:///one.xsl"
usingXMLFilter(to);
System.out.println();
// also outputs: "file:///one.xsl"
usingTransformerHandler(to);
System.out.println();
// wanted: "file:///in.xml"
}
private static void usingTransformerHandler(Result to) throws Exception {
TransformerHandler first = XSLT2.newTransformerHandler(Inputs.xsl1());
TransformerHandler second = XSLT2.newTransformerHandler(Inputs.xsl2());
first.setResult(new SAXResult(second));
second.setResult(to);
XSLT2.newTransformer().transform(Inputs.in(), new SAXResult(first));
}
private static void usingXMLFilter(Result to) throws Exception {
XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
XMLFilter first = XSLT2.newXMLFilter(Inputs.xsl1());
XMLFilter second = XSLT2.newXMLFilter(Inputs.xsl2());
first.setParent(r);
second.setParent(first);
XSLT2.newTransformer().transform(Inputs.in(second), to);
}
}
Just examples, the real things are obviously more complicated.
public class Inputs {
private static final String IN_SYSTEM_ID = "file:///in.xml";
private static final String XSL1_SYSTEM_ID = "file:///one.xsl";
private static final String XSL2_SYSTEM_ID = "file:///two.xsl";
static Source in() {
return new StreamSource(new StringReader("<root/>"), IN_SYSTEM_ID);
}
static Source in(XMLReader using) {
return new SAXSource(using, SAXSource.sourceToInputSource(in()));
}
static Source xsl1() {
String contents = ""
+ "<xsl:stylesheet version=\"2.0\""
+ " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
+ " <xsl:template match=\"@*|node()\">"
+ " <xsl:copy>"
+ " <xsl:apply-templates select=\"@*|node()\"/>"
+ " </xsl:copy>"
+ " </xsl:template>"
+ "</xsl:stylesheet>";
return new StreamSource(new StringReader(contents), XSL1_SYSTEM_ID);
}
static Source xsl2() {
String contents = ""
+ "<xsl:stylesheet version=\"2.0\""
+ " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
+ " <xsl:template match=\"*\">"
+ " <xsl:value-of select=\"base-uri(.)\"/>"
+ " </xsl:template>"
+ "</xsl:stylesheet>";
return new StreamSource(new StringReader(contents), XSL2_SYSTEM_ID);
}
}

My first idea would be to add an xml:base attribute to the tree; that will determine the result of the base-uri() function. But given the constraints you describe, perhaps that's too disruptive.
To be honest, I don't really believe the constraints. If you've got control over the Java code, then you can create a stylesheet which imports xsl2 and overrides the template that calls base-uri(), replacing it with a reference to a stylesheet parameter.
However, if you're prepared to move away from the JAXP interface to Saxon's s9api API, then it can probably be done. To set up a transformation pipeline in s9api you use one XsltTransformer as the Destination for another XsltTransformer, and by calling setBaseUri() on the second XsltTransformer you should affect the result of base-uri() called within that stylesheet.

I managed to get this working by overriding setDocumentLocator() (a ContentHandler callback) in an XMLFilterImpl wrapper. It is rather hackish, though, and will probably break if the input document uses XInclude.
private static void usingTransformerHandler(Result to) throws Exception {
TransformerHandler first = XSLT2.newTransformerHandler(Inputs.xsl1());
TransformerHandler second = XSLT2.newTransformerHandler(Inputs.xsl2());
LocatorFixer fixer = new LocatorFixer();
first.setResult(new SAXResult(fixer.wrap(second)));
second.setResult(to);
XSLT2.newTransformer().transform(Inputs.in(), new SAXResult(fixer.wrap(first)));
}
private static void usingXMLFilter(Result to) throws Exception {
XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
XMLFilter first = XSLT2.newXMLFilter(Inputs.xsl1());
XMLFilter second = XSLT2.newXMLFilter(Inputs.xsl2());
LocatorFixer fixer = new LocatorFixer();
first.setParent(fixer.wrap(r));
second.setParent(fixer.wrap(first));
XSLT2.newTransformer().transform(Inputs.in(second), to);
}
Helper
class LocatorFixer {
private Locator copied;
XMLFilterImpl wrap(XMLReader delegate) {
return new XMLFilterImpl(delegate) {
@Override
public void setDocumentLocator(Locator real) {
if (copied != null) {
super.setDocumentLocator(copied);
} else {
copied = new LocatorImpl(real);
super.setDocumentLocator(real);
}
}
};
}
ContentHandler wrap(ContentHandler delegate) {
XMLFilterImpl fixed = wrap((XMLReader) null);
fixed.setContentHandler(delegate);
return fixed;
}
}
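To see the mechanism in isolation, here is a minimal JDK-only sketch (no Saxon involved). It assumes, as the workaround above does, that a downstream stage takes its systemId from the Locator handed to setDocumentLocator(). The class and method names (LocatorDemo, systemIdSeenDownstream) are hypothetical, and unlike LocatorFixer it forces a fixed systemId rather than replaying the first locator it saw.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.LocatorImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class LocatorDemo {

    /**
     * Parses a one-element document whose InputSource carries realSystemId and
     * returns the systemId that the downstream handler observes. When
     * forcedSystemId is non-null, the intermediate filter substitutes it.
     */
    static String systemIdSeenDownstream(String realSystemId, String forcedSystemId)
            throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        final String[] seen = new String[1];

        // Downstream stage: only records the systemId reported by its locator.
        XMLFilterImpl downstream = new XMLFilterImpl() {
            @Override
            public void setDocumentLocator(Locator locator) {
                seen[0] = locator.getSystemId();
            }
        };

        // Intermediate stage: copies the real locator and overrides its systemId.
        XMLFilterImpl fixer = new XMLFilterImpl(parser) {
            @Override
            public void setDocumentLocator(Locator real) {
                LocatorImpl copy = new LocatorImpl(real);
                if (forcedSystemId != null) {
                    copy.setSystemId(forcedSystemId);
                }
                super.setDocumentLocator(copy);
            }
        };
        fixer.setContentHandler(downstream);

        InputSource in = new InputSource(new StringReader("<root/>"));
        in.setSystemId(realSystemId);
        fixer.parse(in);
        return seen[0];
    }
}
```

Calling systemIdSeenDownstream("file:///intermediate.xml", "file:///in.xml") lets the downstream handler see the forced value instead of the one attached to the actual input.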

Related

Find and replace text in MS Access table rows not working

Given a directory, my application traverses and loads .mdb MS Access dbs using the Jackcess API. Inside of each database, there is a table named GCMT_CMT_PROPERTIES with a column named cmt_data containing some text. I also have a Mapper object (which essentially resembles a Map<String,String> but allows duplicate keys) which I use as a dictionary when replacing a certain word from a string.
So for example if mapper contains fox -> dog then the sentence: "The fox jumps" becomes "The dog jumps".
The design I'm going with for this program is as follows:
1. Given a directory, traverse all subdirectories and load all .mdb files into a File[].
2. For each db file in File[], create a Task<Void> called "TaskMdbUpdater" and pass it the db file.
3. Dispatch and run each task as it is created (see 2. above).
TaskMdbUpdater is responsible for locating the appropriate table and column in the db file it was given and iteratively running a "find & replace" routine on each row of the table to detect words from the dictionary and replace them (as shown in example above) and finally updating that row before closing the db. Each instance of TaskMdbUpdater is a background thread with a Jackcess API DatabaseBuilder assigned to it, so it is able to manipulate the db.
In its current state, the code runs without throwing any exceptions. However, when I manually open the db in Access and inspect a given row, it appears not to have changed. I've tried to pin down the source of the issue without any luck and would appreciate any support. If you need to see more code, let me know and I'll update my question accordingly.
public class TaskDatabaseTaskDispatcher extends Task<Void> {
private String parentDir;
private String dbFileFormat;
private Mapper mapper;
public TaskDatabaseTaskDispatcher(String parent, String dbFileFormat, Mapper mapper) {
this.parentDir = parent;
this.dbFileFormat = dbFileFormat;
this.mapper = mapper;
}
@Override
protected Void call() throws Exception {
File[] childDirs = getOnlyDirectories(getDirectoryChildFiles(new File(this.parentDir)));
DatabaseBuilder[] dbs = loadDatabasesInParent(childDirs);
Controller.dprint("TaskDatabaseTaskDispatcher", dbs.length + " databases were found in parent directory");
TaskMdbUpdater[] tasks = new TaskMdbUpdater[dbs.length];
Thread[] workers = new Thread[dbs.length];
for(int i=0; i<dbs.length; i++) {
// for each db, dispatch Task so a worker can update that db.
tasks[i] = new TaskMdbUpdater(dbs[i], mapper);
workers[i] = new Thread(tasks[i]);
workers[i].setDaemon(true);
workers[i].start();
}
return null;
}
private DatabaseBuilder[] loadDatabasesInParent(File[] childDirs) throws IOException {
DatabaseBuilder[] dbs = new DatabaseBuilder[childDirs.length];
// Traverse children and load dbs[]
for(int i=0; i<childDirs.length; i++) {
File dbFile = FileUtils.getFileInDirectory(
childDirs[i].getCanonicalFile(),
childDirs[i].getName() + this.dbFileFormat);
dbs[i] = new DatabaseBuilder(dbFile);
}
return dbs;
}
}
// StringUtils class, utility methods
public class StringUtils {
public static String findAndReplace(String str, Mapper mapper) {
String updatedStr = str;
for(int i=0; i<mapper.getMappings().size(); i++) {
updatedStr = updatedStr.replaceAll(mapper.getMappings().get(i).getKey(), mapper.getMappings().get(i).getValue());
}
return updatedStr;
}
}
// FileUtils class, utility methods:
public class FileUtils {
/**
* Returns only directories in given File[].
* @param list
* @return
*/
public static File[] getOnlyDirectories(File[] list) throws IOException, NullPointerException {
List<File> filteredList = new ArrayList<>();
for(int i=0; i<list.length; i++) {
if(list[i].isDirectory()) {
filteredList.add(list[i]);
}
}
File[] correctSizeFilteredList = new File[filteredList.size()];
for(int i=0; i<filteredList.size(); i++) {
correctSizeFilteredList[i] = filteredList.get(i);
}
return correctSizeFilteredList;
}
/**
* Returns a File[] containing all children under specified parent file.
* @param parent
* @return
*/
public static File[] getDirectoryChildFiles(File parent) {
return parent.listFiles();
}
}
public class Mapper {
private List<aMap> mappings;
public Mapper(List<aMap> mappings) {
this.mappings = mappings;
}
/**
* Returns mapping dictionary, typically used for extracting individual mappings.
* @return List of type aMap
*/
public List<aMap> getMappings() {
return mappings;
}
public void setMappings(List<aMap> mappings) {
this.mappings = mappings;
}
}
/**
* Represents a single String based K -> V mapping.
*/
public class aMap {
private String[] mapping; // [0] - key, [1] - value
public aMap(String[] mapping) {
this.mapping = mapping;
}
public String getKey() {
return mapping[0];
}
public String getValue() {
return mapping[1];
}
public String[] getMapping() {
return mapping;
}
public void setMapping(String[] mapping) {
this.mapping = mapping;
}
}
Update 1:
To verify my custom StringUtils.findAndReplace logic, I've performed the following unit test (in JUnit) which is passing:
@Test
public void simpleReplacementTest() {
// Construct a test mapper/dictionary
List<aMap> aMaps = new ArrayList<aMap>();
aMaps.add(new aMap(new String[] {"fox", "dog"})); // {K, V} = K -> V
Mapper mapper = new Mapper(aMaps);
// Perform replacement
String corpus = "The fox jumps";
String updatedCorpus = StringUtils.findAndReplace(corpus, mapper);
assertEquals("The dog jumps", updatedCorpus);
}
I'm including my TaskMdbUpdater class here separately, with some logging code included, as I suspect the point of failure lies somewhere in call():
/**
* Updates a given .mdb database according to specifications defined internally.
* @since 2.2
*/
public class TaskMdbUpdater extends Task<Void> {
private final String TABLE_NAME = "GCMT_CMT_PROPERTIES";
private final String COLUMN_NAME = "cmt_data";
private DatabaseBuilder dbPackage;
private Mapper mapper;
public TaskMdbUpdater(DatabaseBuilder dbPack, Mapper mapper) {
super();
this.dbPackage = dbPack;
this.mapper = mapper;
}
@Override
protected Void call() {
try {
// Controller.dprint("TaskMdbUpdater", "Worker: " + Thread.currentThread().getName() + " running");
// Open db and extract Table
Database db = this.dbPackage
.open();
Logger.debug("Opened database: {}", db.getFile().getName());
Table table = db.getTable(TABLE_NAME);
Logger.debug("Opening table: {}", table.getName());
Iterator<Row> tableRows = table.iterator();
// Controller.dprint("TaskMdbUpdater", "Updating database: " + db.getFile().getName());
int i=0;
try {
while( tableRows.hasNext() ) {
// Row is basically a Map<Column_Name, Value>
Row cRow = tableRows.next();
Logger.trace("Current row: {}", cRow);
// Controller.dprint(Thread.currentThread().getName(), "Database name: " + db.getFile().getName());
// Controller.dprint("TaskMdbUpdater", "existing row: " + cRow.toString());
String str = cRow.getString(COLUMN_NAME);
Logger.trace("Row {} column field contents (before find/replace): {}", i, str);
String newStr = performFindAndReplaceOnString(str);
Logger.trace("Row {} column field contents (after find/replace): {}", i, newStr);
cRow.put(COLUMN_NAME, newStr);
Logger.debug("Updating field in row {}", i);
Row newRow = table.updateRow(cRow); // updateRow returns the new, updated row. Ignoring this.
Logger.debug("Calling updateRow on table with modified row");
// Controller.dprint("TaskMdbUpdater", "new row: " + newRow.toString());
i++;
Logger.trace("i = {}", i);
}
} catch(NoSuchElementException e) {
// e.printStackTrace();
Logger.error("Thread has iterated past number of rows in table", e);
}
Logger.info("Iterated through {} rows in table {}", i, table.getName());
db.close();
Logger.debug("Closing database: {}", db.getFile().getName());
} catch (Exception e) {
// e.printStackTrace();
Logger.error("An error occurred while attempting to update row value", e);
}
return null;
}
/**
* @see javafx.concurrent.Task#failed()
*/
@Override
protected void failed() {
super.failed();
Logger.error("Task failed");
}
@Override
protected void succeeded() {
Logger.debug("Task succeeded");
}
private String performFindAndReplaceOnString(String str) {
// Logger.trace("OLD: [" + str + "]");
String updatedStr = null;
for(int i=0; i<mapper.getMappings().size(); i++) {
// loop through all parameter names in mapper to search for in str.
updatedStr = findAndReplace(str, this.mapper);
}
// Logger.trace("NEW: [" + updatedStr + "]");
return updatedStr;
}
}
Here's a small excerpt from my log. As you can see, it doesn't seem to do anything after opening the table, which has left me a bit perplexed:
INFO (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Located the following directories under specified MOIS parent which contains an .mdb file:
[01_Parent_All_Safe_Test[ RV_DMS_0041RV_DMS_0001RV_DMS_0003RV_DMS_0005RV_DMS_0007RV_DMS_0012RV_DMS_0013RV_DMS_0014RV_DMS_0016RV_DMS_0017RV_DMS_0018RV_DMS_0020RV_DMS_0023RV_DMS_0025RV_DMS_0028RV_DMS_0029RV_DMS_0031RV_DMS_0033RV_DMS_0034RV_DMS_0035RV_DMS_0036RV_DMS_0038RV_DMS_0039RV_DMS_0040 ]]
...
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Created new task: NAMEMAP.logic.TaskMdbUpdater@4cfe46fe
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Created new worker: Thread[Thread-22,5,main]
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Set worker Thread[Thread-22,5,main] as daemon
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Dispatching worker: Thread[Thread-22,5,main]
...
DEBUG (16-02-2017 17:28:00) [Thread-22] NAMEMAP.logic.TaskMdbUpdater.call(): Opened database: RV_DMS_0023.mdb
DEBUG (16-02-2017 17:28:00) [Thread-22] NAMEMAP.logic.TaskMdbUpdater.call(): Opening table: GCMT_CMT_PROPERTIES
After this point, there aren't any more entries in the log and the processor spikes to 100% load, remaining that way until I force kill the application. This could mean the program gets stuck in an infinite while loop. However, if that were the case, shouldn't there be log entries in the file?
Update 2
Okay, I've further narrowed down the problem by printing TRACE-level log output to stdout. It seems that my performFindAndReplaceOnString is very inefficient: it never gets past the first row of these dbs because it's just grinding away at the long string. Any suggestions on how I can efficiently perform a string replacement for this use case?
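One possible direction, sketched under assumptions rather than offered as a definitive fix: the outer loop in performFindAndReplaceOnString calls findAndReplace with identical arguments once per mapping, and every String.replaceAll call recompiles its key as a regular expression and rescans the whole string. The sketch below compiles a single alternation of all keys once and rewrites the text in one pass. The class name SinglePassReplacer and the use of a plain Map are hypothetical (the original Mapper allows duplicate keys, which a Map does not), and keys are treated as literal text rather than regular expressions. A one-pass replacement also never re-replaces text produced by an earlier substitution, so results can differ from the sequential version when mappings chain.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SinglePassReplacer {
    private final Map<String, String> dictionary;
    private final Pattern pattern;

    public SinglePassReplacer(Map<String, String> dictionary) {
        this.dictionary = new LinkedHashMap<>(dictionary);
        // Longest keys first, so "foxes" wins over "fox" when both are keys.
        String alternation = this.dictionary.keySet().stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote) // treat every key as literal text
                .collect(Collectors.joining("|"));
        // Compiled once and reused for every row.
        this.pattern = Pattern.compile(alternation);
    }

    public String replace(String input) {
        if (input == null || dictionary.isEmpty()) {
            return input;
        }
        Matcher matcher = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = dictionary.get(matcher.group());
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With the mapping fox -> dog from the question, new SinglePassReplacer(Map.of("fox", "dog")).replace("The fox jumps") gives "The dog jumps" (Map.of needs Java 9 or later). Building the replacer once per task and reusing it for every row avoids recompiling the pattern per row.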

Transferring one object between classes using crawler4j

I have a simple web crawler that is built using the building blocks of crawler4j. I am trying to build a dictionary as my crawler crawls and then pass it to my main (controller) as it builds and parses text. How can I do this, since my MyCrawler object isn't created in my main class (it uses MyCrawler.class as the first parameter)? Also, I am unable to change the controller.start method. I want to be able to use the dictionary created in the crawler after the crawler has finished.
The best way I can think to do it is have controller.start take a predefined and created MyCrawler object, but there is no way to do this, that I can see.
Below is my code. Thank you very much for your help!
Crawler:
public class MyCrawler extends WebCrawler
{
private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|gif|jpg|png|mp3|zip|gz))$");
public ArrayList<String> dictionary = new ArrayList<String>();
@Override public boolean shouldVisit(Page referringPage, WebURL url)
{
String href = url.getURL().toLowerCase();
return !FILTERS.matcher(href).matches()
&& href.startsWith("http://lyle.smu.edu/~fmoore");
}
@Override public void visit(Page page)
{
String url = page.getWebURL().getURL();
System.out.println("URL: " + url);
if(page.getParseData() instanceof HtmlParseData)
{
HtmlParseData h = (HtmlParseData)page.getParseData();
String text = h.getText();
String[] words = text.split(" ");
for(int i = 0;i < words.length;i++)
{
if(!words[i].isEmpty() && !words[i].equals("\n")) // skip empty tokens and bare newlines (the ||-chain was always true)
dictionary.add(words[i]);
}
String html = h.getHtml();
Set<WebURL> links = h.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
System.out.println(text);
}
}
}
Controller:
public class Controller
{
public ArrayList<String> dictionary = new ArrayList<String>();
public static void main(String[] args) throws Exception
{
int numberOfCrawlers = 1;
String crawlStorageFolder = "/data/crawl/root";
CrawlConfig c = new CrawlConfig();
c.setCrawlStorageFolder(crawlStorageFolder);
c.setMaxDepthOfCrawling(-1); //Unlimited Depth
c.setMaxPagesToFetch(-1); //Unlimited Pages
c.setPolitenessDelay(200); //Politeness Delay
PageFetcher pf = new PageFetcher(c);
RobotstxtConfig robots = new RobotstxtConfig();
RobotstxtServer rs = new RobotstxtServer(robots, pf);
CrawlController controller = new CrawlController(c, pf, rs);
controller.addSeed("http://lyle.smu.edu/~fmoore");
controller.start(MyCrawler.class, numberOfCrawlers);
controller.shutdown();
controller.waitUntilFinish();
}
}
Let a WebCrawlerFactory create your MyCrawler objects. This should do the trick (at least since version 4.2); it assumes MyCrawler gets a constructor that accepts the dictionary. However, your dictionary must support concurrent access (a plain ArrayList does not!)
// use a factory, instead of supplying the crawler type to pass the dictionary
controller.start(new WebCrawlerFactory<MyCrawler>() {
@Override
public MyCrawler newInstance() throws Exception {
return new MyCrawler(dictionary);
}
}, numberOfCrawlers);
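The concurrency point can be illustrated without crawler4j. The sketch below is a JDK-only stand-in, not crawler4j code: Worker plays the role of MyCrawler, a Supplier plays the role of the factory, and all names (SharedDictionaryDemo, Worker, crawl) are hypothetical. It shows a factory handing the same thread-safe collection to every worker, so the caller can read the collected words once all workers have finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

public class SharedDictionaryDemo {

    /** Hypothetical stand-in for MyCrawler: receives the shared dictionary. */
    static class Worker implements Runnable {
        private final Queue<String> dictionary;
        private final String text;

        Worker(Queue<String> dictionary, String text) {
            this.dictionary = dictionary;
            this.text = text;
        }

        @Override
        public void run() {
            // Split on any whitespace and skip empty tokens.
            for (String word : text.split("\\s+")) {
                if (!word.isEmpty()) {
                    dictionary.add(word);
                }
            }
        }
    }

    /** Runs the given number of workers and returns the shared dictionary. */
    static Queue<String> crawl(int workers, String text) throws InterruptedException {
        // Safe for concurrent add() calls, unlike a plain ArrayList.
        Queue<String> dictionary = new ConcurrentLinkedQueue<>();
        // The factory closes over the dictionary, like the anonymous factory above.
        Supplier<Worker> factory = () -> new Worker(dictionary, text);

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread thread = new Thread(factory.get());
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join(); // wait until every worker is done before reading
        }
        return dictionary;
    }
}
```

Four workers over a four-word text leave sixteen entries in the queue, with no lost updates. If list semantics are needed, Collections.synchronizedList(new ArrayList<>()) is another JDK option.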

fop render to pdf one page out of many

I have code that runs Apache FOP against XML content and XSL markup and gives me Apache FOP Intermediate Format output:
StreamSource contentSource = new StreamSource(xmlContentStream);
StreamSource transformSource = new StreamSource(xslMarkupStream);
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Transformer xslfoTransformer = getTransformer(transformSource);
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
IFDocumentHandler targetHandler = foUserAgent.getRendererFactory().createDocumentHandler(
foUserAgent, MimeConstants.MIME_PDF);
FPSIFSerializer fpsSerializer = new FPSIFSerializer();
fpsSerializer.setContext(new IFContext(foUserAgent));
fpsSerializer.mimicDocumentHandler(targetHandler);
foUserAgent.setDocumentHandlerOverride(fpsSerializer);
Fop fop = fopFactory.newFop("application/X-fop-intermediate-format", foUserAgent, outStream);
DefaultHandler defaultHandler = fop.getDefaultHandler();
Result res = new SAXResult(defaultHandler);
xslfoTransformer.transform(contentSource, res);
Then I use that Intermediate Format file to render pdf and png files out of it.
I'm able to set up my own serializer here (FPSIFSerializer()).
My reports have several pages, but I don't need to process all of them. Is there any way to skip some pages, or extract them from the Intermediate Format, so that I can, for example, render only the 1st page as PNG and then the 2nd as PDF?
There is an example of how to concatenate files via IFConcatenator at
http://svn.apache.org/viewvc/xmlgraphics/fop/branches/archive/fop-1_1/examples/embedding/java/embedding/intermediate/ExampleConcat.java?view=markup
so I wonder about the best way to split a multipage file.
Thank you!
The way I've done it is by using a custom document handler.
/**
* Custom Apache FOP Intermediate Format document handler which allows page skipping.
* Not thread safe.
*/
public class IFPageFilter extends IFDocumentHandlerProxy {
private static final Logger LOGGER = LoggerFactory.getLogger(IFPageFilter.class);
private int currentPage;
private final int desiredPage;
/**
* @param delegate The real document handler
* @param desiredPage the page you want to render (1-based). Other pages will be skipped.
*/
public IFPageFilter(final IFDocumentHandler delegate, final int desiredPage) {
super(delegate);
this.desiredPage = desiredPage;
}
@Override
public void startPage(final int index, final String name, final String pageMasterName, final Dimension size) throws IFException {
currentPage = index + 1;
if (currentPage == desiredPage) {
super.startPage(index, name, pageMasterName, size);
} else {
// do nothing
LOGGER.debug("Page skipped");
}
}
@Override
public IFPainter startPageContent() throws IFException {
if (currentPage == desiredPage) {
return super.startPageContent();
} else {
return EmptyPainter.getInstance();
}
}
@Override
public void endPageContent() throws IFException {
if (currentPage == desiredPage) {
super.endPageContent();
}
}
}
Then you can attach your handler like this:
final IFDocumentHandler targetHandler = FOP_FACTORY.getRendererFactory().createDocumentHandler(userAgent, mime);
final IFPageFilter documentHandler = new IFPageFilter(targetHandler, page);
final ByteArrayOutputStream mimeOut = new ByteArrayOutputStream(XSL_STREAM_BUFFER_SIZE);
IFUtil.setupFonts(documentHandler);
// Tell the target handler where to write the PDF to
targetHandler.setResult(new StreamResult(mimeOut));
try (final InputStream is = ifStream.toInputStream()) {
final Source src = new StreamSource(is);
new IFParser().parse(src, documentHandler, userAgent);
}
return mimeOut;
and you will get only the page you need in the output stream.
EmptyPainter is a dirty hack: it is an empty implementation of FOP's IFPainter, used here to skip page content and avoid an NPE. I'm not happy about it, but it was the only way I could make it work.
Please note that I use FOP 1.1. If you run into problems like these, it is worth looking at trunk, where some of them are already solved. I guess the EmptyPainter hack will not be necessary there.
Please share tips if something could be done better here.
Thanks

Is it possible to create an URL pointing to an in-memory object?

I'm trying to extend my library for integrating Swing and JPA by making JPA config as automatic (and portable) as can be done, which means programmatically adding <class> elements. (I know it can be done via Hibernate's AnnotationConfiguration or EclipseLink's ServerSession, but that would cost portability.) I'd also like to avoid using Spring just for this single purpose.
I can create a persistence.xml on the fly and fill it with <class> elements from specified packages (via the Reflections library). The problem starts when I try to feed this persistence.xml to a JPA provider. The only way I can think of is setting up a URLClassLoader, but I can't think of a way that wouldn't make me write the file to disk somewhere first, solely to obtain a valid URL. Setting up a socket to serve the file via a URL (localhost:xxxx) seems... I don't know, evil?
Does anyone have an idea how I could solve this problem? I know it sounds like a lot of work to avoid using one library, but I'd just like to know if it can be done.
EDIT (a try at being more clear):
Dynamically generated XML is kept in a String object. I don't know how to make it available to a persistence provider. Also, I want to avoid writing the file to disk.
For purpose of my problem, a persistence provider is just a class which scans the classpath for META-INF/persistence.xml. Some implementations can be made to accept dynamic creation of XML, but there is no common interface (especially for a crucial part of the file, the <class> tags).
My idea is to set up a custom ClassLoader - if you have any other I'd be grateful, I'm not set on this one.
The only easily extendable/configurable one I could find was a URLClassLoader. It works on URL objects, and I don't know if I can create one without actually writing XML to disk first.
That's how I'm setting things up, but it's working by writing the persistenceXmlFile = new File("META-INF/persistence.xml") to disk:
Thread.currentThread().setContextClassLoader(
new URLResourceClassLoader(
new URL[] { persistenceXmlFile.toURI().toURL() },
Thread.currentThread().getContextClassLoader()
)
);
URLResourceClassLoader is a subclass of URLClassLoader which allows for looking up resources as well as classes, by overriding public Enumeration<URL> findResources(String name).
Maybe a bit late (after 4 years), but for others that are looking for a similar solution, you may be able to use the URL factory I created:
public class InMemoryURLFactory {
public static void main(String... args) throws Exception {
URL url = InMemoryURLFactory.getInstance().build("/this/is/a/test.txt", "This is a test!");
byte[] data = IOUtils.toByteArray(url.openConnection().getInputStream());
// Prints out: This is a test!
System.out.println(new String(data));
}
private final Map<URL, byte[]> contents = new WeakHashMap<>();
private final URLStreamHandler handler = new InMemoryStreamHandler();
private static InMemoryURLFactory instance = null;
public static synchronized InMemoryURLFactory getInstance() {
if(instance == null)
instance = new InMemoryURLFactory();
return instance;
}
private InMemoryURLFactory() {
}
public URL build(String path, String data) {
try {
return build(path, data.getBytes("UTF-8"));
} catch (UnsupportedEncodingException ex) {
throw new RuntimeException(ex);
}
}
public URL build(String path, byte[] data) {
try {
URL url = new URL("memory", "", -1, path, handler);
contents.put(url, data);
return url;
} catch (MalformedURLException ex) {
throw new RuntimeException(ex);
}
}
private class InMemoryStreamHandler extends URLStreamHandler {
@Override
protected URLConnection openConnection(URL u) throws IOException {
if(!u.getProtocol().equals("memory")) {
throw new IOException("Cannot handle protocol: " + u.getProtocol());
}
return new URLConnection(u) {
private byte[] data = null;
@Override
public void connect() throws IOException {
initDataIfNeeded();
checkDataAvailability();
// Protected field from superclass
connected = true;
}
@Override
public long getContentLengthLong() {
initDataIfNeeded();
if(data == null)
return 0;
return data.length;
}
@Override
public InputStream getInputStream() throws IOException {
initDataIfNeeded();
checkDataAvailability();
return new ByteArrayInputStream(data);
}
private void initDataIfNeeded() {
if(data == null)
data = contents.get(u);
}
private void checkDataAvailability() throws IOException {
if(data == null)
throw new IOException("In-memory data cannot be found for: " + u.getPath());
}
};
}
}
}
We can use Google's Jimfs library for that.
First, we need to add the Maven dependency to our project:
<dependency>
<groupId>com.google.jimfs</groupId>
<artifactId>jimfs</artifactId>
<version>1.2</version>
</dependency>
After that, we need to configure our filesystem behavior, and write our String content to the in-memory file, like this:
public static final String INPUT =
"\n"
+ "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
+ "<note>\n"
+ " <to>Tove</to>\n"
+ " <from>Jani</from>\n"
+ " <heading>Reminder</heading>\n"
+ " <body>Don't forget me this weekend!</body>\n"
+ "</note>";
@Test
void usingJIMFS() throws IOException {
try (var fs = Jimfs.newFileSystem(Configuration.unix())) {
var path = fs.getPath(UUID.randomUUID().toString());
Files.writeString(path, INPUT);
var url = path.toUri().toURL();
assertThat(url.getProtocol()).isEqualTo("jimfs");
assertThat(Resources.asCharSource(url, UTF_8).read()).isEqualTo(INPUT);
}
}
We can find more examples in the official repository.
If we look inside the Jimfs source code, we will find the implementation is similar to @NSV's answer.
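To connect an in-memory URL back to the original goal (serving a generated persistence.xml through a class loader), a minimal JDK-only sketch follows. It assumes the consumer looks the file up through the class loader's getResource/getResources methods. The class name InMemoryResourceClassLoader is hypothetical, and a JPA provider may additionally expect a usable persistence-unit root URL, which this sketch does not address.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Collections;
import java.util.Enumeration;

public class InMemoryResourceClassLoader extends ClassLoader {
    private final String resourceName;
    private final byte[] content;

    public InMemoryResourceClassLoader(ClassLoader parent, String resourceName, byte[] content) {
        super(parent);
        this.resourceName = resourceName;
        this.content = content.clone();
    }

    @Override
    protected URL findResource(String name) {
        if (!resourceName.equals(name)) {
            return null; // only the one generated resource is served from memory
        }
        try {
            // Supplying the handler directly avoids registering a global protocol.
            return new URL(null, "memory:/" + name, new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) {
                    return new URLConnection(u) {
                        @Override
                        public void connect() {
                            // nothing to connect to; the bytes are already in memory
                        }

                        @Override
                        public InputStream getInputStream() {
                            return new ByteArrayInputStream(content);
                        }
                    };
                }
            });
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    protected Enumeration<URL> findResources(String name) {
        URL url = findResource(name);
        return url == null
                ? Collections.<URL>emptyEnumeration()
                : Collections.enumeration(Collections.singletonList(url));
    }
}
```

An instance could be installed with Thread.currentThread().setContextClassLoader(...) in the same way as the URLResourceClassLoader in the question, with the parent set to the previous context class loader. Note that the URL constructors are deprecated in recent JDK releases, although they still work.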

How can I get Axis 1.4 to not generate several prefixes for the same XML namespace?

I am receiving SOAP requests from a client that uses the Axis 1.4 libraries. The requests have the following form:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<PlaceOrderRequest xmlns="http://example.com/schema/order/request">
<order>
<ns1:requestParameter xmlns:ns1="http://example.com/schema/common/request">
<ns1:orderingSystemWithDomain>
<ns1:orderingSystem>Internet</ns1:orderingSystem>
<ns1:domainSign>2</ns1:domainSign>
</ns1:orderingSystemWithDomain>
</ns1:requestParameter>
<ns2:directDeliveryAddress ns2:addressType="0" ns2:index="1"
xmlns:ns2="http://example.com/schema/order/request">
<ns3:address xmlns:ns3="http://example.com/schema/common/request">
<ns4:zipcode xmlns:ns4="http://example.com/schema/common">12345</ns4:zipcode>
<ns5:city xmlns:ns5="http://example.com/schema/common">City</ns5:city>
<ns6:street xmlns:ns6="http://example.com/schema/common">Street</ns6:street>
<ns7:houseNum xmlns:ns7="http://example.com/schema/common">1</ns7:houseNum>
<ns8:country xmlns:ns8="http://example.com/schema/common">XX</ns8:country>
</ns3:address>
[...]
As you can see, several prefixes are defined for the same namespace, e.g. the namespace http://example.com/schema/common has the prefixes ns4, ns5, ns6, ns7 and ns8. Some long requests define several hundred prefixes for the same namespace.
This causes a problem with the Saxon XSLT processor that I use to transform the requests. Saxon limits the number of different prefixes for the same namespace to 255 and throws an exception when more are defined.
Can Axis 1.4 be configured to define smarter prefixes, so that there is only one prefix for each namespace?
I have the same issue. For the moment, I've worked around it by writing a BasicHandler extension, and then walking the SOAPPart myself and moving the namespace reference up to a parent node. I don't like this solution, but it does seem to work.
I really hope somebody comes along and tells us what we have to do.
EDIT
This is way too complicated, and like I said, I don't like it at all, but here we go. I actually broke the functionality into a few classes (This wasn't the only manipulation that we needed to do in that project, so there were other implementations) I really hope that somebody can fix this soon. This uses dom4j to process the XML passing through the SOAP process, so you'll need dom4j to make it work.
public class XMLManipulationHandler extends BasicHandler {
private static Log log = LogFactory.getLog(XMLManipulationHandler.class);
private static List processingHandlers;
public static void setProcessingHandlers(List handlers) {
processingHandlers = handlers;
}
protected Document process(Document doc) {
if (processingHandlers == null) {
processingHandlers = new ArrayList();
processingHandlers.add(new EmptyProcessingHandler());
}
log.trace(processingHandlers);
treeWalk(doc.getRootElement());
return doc;
}
protected void treeWalk(Element element) {
for (int i = 0, size = element.nodeCount(); i < size; i++) {
Node node = element.node(i);
for (int handlerIndex = 0; handlerIndex < processingHandlers.size(); handlerIndex++) {
ProcessingHandler handler = (ProcessingHandler) processingHandlers.get(handlerIndex);
handler.process(node);
}
if (node instanceof Element) {
treeWalk((Element) node);
}
}
}
public void invoke(MessageContext context) throws AxisFault {
if (!context.getPastPivot()) {
SOAPMessage message = context.getMessage();
SOAPPart soapPart = message.getSOAPPart();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
message.writeTo(baos);
baos.flush();
baos.close();
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
SAXReader saxReader = new SAXReader();
Document doc = saxReader.read(bais);
doc = process(doc);
DocumentSource ds = new DocumentSource(doc);
soapPart.setContent(ds);
message.saveChanges();
} catch (Exception e) {
throw new AxisFault("Error Caught processing document in XMLManipulationHandler", e);
}
}
}
}
public interface ProcessingHandler {
public Node process(Node node);
}
public class NamespaceRemovalHandler implements ProcessingHandler {
private static Log log = LogFactory.getLog(NamespaceRemovalHandler.class);
private Namespace namespace;
private String targetElement;
private Set ignoreElements;
public NamespaceRemovalHandler() {
ignoreElements = new HashSet();
}
public Node process(Node node) {
if (node instanceof Element) {
Element element = (Element) node;
if (element.isRootElement()) {
// Evidently, we never actually see the root node when we're called from
// SOAP...
} else {
if (element.getName().equals(targetElement)) {
log.trace("Found the target Element. Adding requested namespace");
Namespace already = element.getNamespaceForURI(namespace.getURI());
if (already == null) {
element.add(namespace);
}
} else if (!ignoreElements.contains(element.getName())) {
Namespace target = element.getNamespaceForURI(namespace.getURI());
if (target != null) {
element.remove(target);
element.setQName(new QName(element.getName(), namespace));
}
}
Attribute type = element.attribute("type");
if (type != null) {
log.trace("Replacing type information: " + type.getText());
String typeText = type.getText();
typeText = typeText.replaceAll("ns[0-9]+", namespace.getPrefix());
type.setText(typeText);
}
}
}
return node;
}
public Namespace getNamespace() {
return namespace;
}
public void setNamespace(Namespace namespace) {
this.namespace = namespace;
}
/**
* @return the targetElement
*/
public String getTargetElement() {
return targetElement;
}
/**
* @param targetElement the targetElement to set
*/
public void setTargetElement(String targetElement) {
this.targetElement = targetElement;
}
/**
* @return the ignoreElements
*/
public Set getIgnoreElements() {
return ignoreElements;
}
/**
* @param ignoreElements the ignoreElements to set
*/
public void setIgnoreElements(Set ignoreElements) {
this.ignoreElements = ignoreElements;
}
public void addIgnoreElement(String element) {
this.ignoreElements.add(element);
}
}
No warranty, etc, etc.
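For anyone who wants to try the idea without Axis or dom4j, here is a rough JDK-only sketch of the same technique: drop the per-element nsN declarations for one namespace, rename the affected elements to a single prefix, fix up xsi:type values, and declare the prefix once on the root. PrefixNormalizer, normalize and the prefix "c" are hypothetical names, not part of the handler above. The sketch assumes the chosen prefix is not already bound to a different namespace, and it does not touch attributes qualified with the namespace or QName values other than xsi:type.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the "one prefix per namespace" clean-up with plain DOM.
class PrefixNormalizer {
    private static final String XSI = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;
    private static final String XMLNS = XMLConstants.XMLNS_ATTRIBUTE_NS_URI;

    /** Rebinds every element in namespace {@code uri} to {@code prefix}, declared once on the root. */
    static String normalize(String xml, String uri, String prefix) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            rewrite(root, uri, prefix);
            // Single declaration on the root element.
            root.setAttributeNS(XMLNS, "xmlns:" + prefix, uri);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not normalize prefixes", e);
        }
    }

    private static void rewrite(Element e, String uri, String prefix) {
        // Children first, so that prefixes declared on ancestors still resolve
        // while the xsi:type values further down are being rewritten.
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                rewrite((Element) c, uri, prefix);
            }
        }
        // xsi:type="nsN:Foo" becomes xsi:type="prefix:Foo" when nsN maps to the target namespace.
        Attr type = e.getAttributeNodeNS(XSI, "type");
        if (type != null) {
            String value = type.getValue().trim();
            int colon = value.indexOf(':');
            if (colon > 0 && uri.equals(e.lookupNamespaceURI(value.substring(0, colon)))) {
                type.setValue(prefix + value.substring(colon));
            }
        }
        // Remove prefixed declarations (xmlns:nsN) of the target namespace; iterate backwards
        // because the attribute map shrinks as nodes are removed.
        NamedNodeMap attrs = e.getAttributes();
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            Attr a = (Attr) attrs.item(i);
            if (XMLNS.equals(a.getNamespaceURI()) && a.getPrefix() != null
                    && uri.equals(a.getValue())) {
                e.removeAttributeNode(a);
            }
        }
        if (uri.equals(e.getNamespaceURI())) {
            e.setPrefix(prefix);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input; "c" is an arbitrary prefix choice.
        String xml = "<root><ns4:a xmlns:ns4=\"http://example.com/schema/common\">"
                + "<ns5:b xmlns:ns5=\"http://example.com/schema/common\"/></ns4:a></root>";
        System.out.println(normalize(xml, "http://example.com/schema/common", "c"));
    }
}
```

Unlike the regex replacement on the type attribute above, this only rewrites an xsi:type value when its prefix really resolves to the target namespace, so documents with more than one nsN namespace are not collapsed by accident. It builds a full DOM, though, so it carries the same memory cost as the dom4j version.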
For the request, I use the following to remove the namespace type information:
String endpoint = "http://localhost:5555/yourService";
// Parameter to be sent
Integer secuencial = Integer.valueOf(11); // 0011
// Make the call
Service service = new Service();
Call call = (Call) service.createCall();
// Disable sending multirefs
call.setOption(org.apache.axis.AxisEngine.PROP_DOMULTIREFS, Boolean.FALSE);
// Disable sending xsi:type
call.setOption(org.apache.axis.AxisEngine.PROP_SEND_XSI, Boolean.FALSE);
// Keep pretty-printed XML (with line breaks) enabled
call.setOption(org.apache.axis.AxisEngine.PROP_DISABLE_PRETTY_XML, Boolean.FALSE);
// Other options; you will probably not need them
call.setOption(org.apache.axis.AxisEngine.PROP_ENABLE_NAMESPACE_PREFIX_OPTIMIZATION, Boolean.TRUE);
call.setOption(org.apache.axis.AxisEngine.PROP_DOTNET_SOAPENC_FIX, Boolean.TRUE);
call.setTargetEndpointAddress(new java.net.URL(endpoint));
call.setSOAPActionURI("http://YourActionUrl");//Optional
// Operation name
//call.setOperationName( "YourMethod" );
call.setOperationName(new javax.xml.namespace.QName("http://yourUrl", "YourMethod"));
// Do not send encoding style
call.setEncodingStyle(null);
// Do not send xsi:type attributes on the XML nodes
call.setProperty(org.apache.axis.client.Call.SEND_TYPE_ATTR, Boolean.FALSE);
/////// Configuration of namespaces
org.apache.axis.description.OperationDesc oper;
org.apache.axis.description.ParameterDesc param;
oper = new org.apache.axis.description.OperationDesc();
oper.setName("InsertaTran");
param = new org.apache.axis.description.ParameterDesc(new javax.xml.namespace.QName("http://yourUrl", "secuencial"), org.apache.axis.description.ParameterDesc.IN, new javax.xml.namespace.QName("http://www.w3.org/2001/XMLSchema", "int"), int.class, false, false);
oper.addParameter(param);
oper.setReturnType(new javax.xml.namespace.QName("http://www.w3.org/2001/XMLSchema", "int"));
oper.setReturnClass(int.class);
oper.setReturnQName(new javax.xml.namespace.QName("http://yourUrl", "yourReturnMethod"));
oper.setStyle(org.apache.axis.constants.Style.WRAPPED);
oper.setUse(org.apache.axis.constants.Use.LITERAL);
call.setOperation(oper);
Integer ret = (Integer) call.invoke( new java.lang.Object []
{ secuencial });
Alter your client's WSDD to set enableNamespacePrefixOptimization to true:
<globalConfiguration>
<parameter name="enableNamespacePrefixOptimization" value="true"/>
</globalConfiguration>
