I have a simple web crawler built from the building blocks of crawler4j. I am trying to build a dictionary as my crawler crawls and parses text, and then pass it to my main (controller) class. How can I do this when my MyCrawler object isn't created in my main class (controller.start takes MyCrawler.class as its first parameter)? I am also unable to change the controller.start method. I want to be able to use the dictionary created in the crawler after the crawler has finished.
The best way I can think of is to have controller.start take an already-created MyCrawler object, but I can't see any way to do that.
Below is my code. Thank you very much for your help!
Crawler:
public class MyCrawler extends WebCrawler
{
private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|gif|jpg|png|mp3|mp3|zip|gz))$");
public ArrayList<String> dictionary = new ArrayList<String>();
@Override public boolean shouldVisit(Page referringPage, WebURL url)
{
String href = url.getURL().toLowerCase();
return !FILTERS.matcher(href).matches()
&& href.startsWith("http://lyle.smu.edu/~fmoore");
}
@Override public void visit(Page page)
{
String url = page.getWebURL().getURL();
System.out.println("URL: " + url);
if(page.getParseData() instanceof HtmlParseData)
{
HtmlParseData h = (HtmlParseData)page.getParseData();
String text = h.getText();
String[] words = text.split(" ");
for(int i = 0;i < words.length;i++)
{
if(!words[i].isEmpty() && !words[i].equals("\n"))
dictionary.add(words[i]);
}
String html = h.getHtml();
Set<WebURL> links = h.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
System.out.println(text);
}
}
}
Controller:
public class Controller
{
public ArrayList<String> dictionary = new ArrayList<String>();
public static void main(String[] args) throws Exception
{
int numberOfCrawlers = 1;
String crawlStorageFolder = "/data/crawl/root";
CrawlConfig c = new CrawlConfig();
c.setCrawlStorageFolder(crawlStorageFolder);
c.setMaxDepthOfCrawling(-1); //Unlimited Depth
c.setMaxPagesToFetch(-1); //Unlimited Pages
c.setPolitenessDelay(200); //Politeness Delay
PageFetcher pf = new PageFetcher(c);
RobotstxtConfig robots = new RobotstxtConfig();
RobotstxtServer rs = new RobotstxtServer(robots, pf);
CrawlController controller = new CrawlController(c, pf, rs);
controller.addSeed("http://lyle.smu.edu/~fmoore");
controller.start(MyCrawler.class, numberOfCrawlers);
controller.shutdown();
controller.waitUntilFinish();
}
}
Let a WebCrawlerFactory create your MyCrawler objects. This should do the trick (at least since version 4.2). However, your dictionary should support concurrent access (a plain ArrayList does not!), and MyCrawler needs a constructor that accepts the dictionary.
// use a factory, instead of supplying the crawler type to pass the dictionary
controller.start(new WebCrawlerFactory<MyCrawler>() {
@Override
public MyCrawler newInstance() throws Exception {
return new MyCrawler(dictionary);
}
}, numberOfCrawlers);
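A minimal, library-free sketch of the same idea, assuming hypothetical stand-ins (`CrawlerFactory`, `DemoCrawler`) in place of crawler4j's `WebCrawlerFactory` and `MyCrawler`: the controller owns one synchronized list, the factory hands that list to every crawler it creates, and the controller reads the list once all workers have finished.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for crawler4j's WebCrawlerFactory<T>.
interface CrawlerFactory<T> {
    T newInstance() throws Exception;
}

// Hypothetical stand-in for MyCrawler: the shared dictionary arrives via the constructor.
class DemoCrawler {
    private final List<String> dictionary;

    DemoCrawler(List<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Plays the role of visit(Page): splits the page text and stores each word.
    void visit(String pageText) {
        for (String word : pageText.split("\\s+")) {
            if (!word.isEmpty()) {
                dictionary.add(word);
            }
        }
    }
}

class SharedDictionaryDemo {
    // Runs one crawler thread per page and returns the dictionary they all filled.
    static List<String> crawl(String[] pages) {
        // A synchronized wrapper makes concurrent add() calls safe.
        List<String> dictionary = Collections.synchronizedList(new ArrayList<>());
        CrawlerFactory<DemoCrawler> factory = () -> new DemoCrawler(dictionary);
        List<Thread> workers = new ArrayList<>();
        try {
            for (String page : pages) {
                DemoCrawler crawler = factory.newInstance();
                Thread worker = new Thread(() -> crawler.visit(page));
                workers.add(worker);
                worker.start();
            }
            for (Thread worker : workers) {
                worker.join(); // wait until every crawler has finished
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<String> words = crawl(new String[] {"the quick fox", "jumps over", "the lazy dog"});
        System.out.println(words.size() + " words collected");
    }
}
```

The order of the collected words depends on thread scheduling, so only the contents, not the sequence, should be relied on.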
Before anything, the title doesn't convey what I really want to ask.
What I want to know is how I can make a map that collects the data for several users and groups it per user. I'm currently using two lists, one for the users' names and another for their works. I tried using map.put, but it kept overwriting the previous entry. What I'd like to obtain is as follows:
Desired output:
{user1 = work1, work2, work3 , user2 = work1, work2 , userN = workN}
Current output:
{[user1, user2, user3, user4]=[work1, work2, work3, work4, work5 (user1) , work1 (user2), work1, work2, work3 ( user3 )]}
This is the code that I'm currently using to achieve the above.
private static Map<List<String>, List<String>> repositoriesUserData = new HashMap<>();
private static Set<String> collaboratorNames = new HashSet<>();
public static void main(String[] args) throws Exception {
login();
getCollabs(GITHUB_REPO_NAME);
repositoriesUnderUser();
}
public GitManager(String AUTH, String USERNAME, String REPO_NAME) throws IOException {
this.GITHUB_LOGIN = USERNAME;
this.GITHUB_OAUTH = AUTH;
this.GITHUB_REPO_NAME = REPO_NAME;
this.githubLogin = new GitHubBuilder().withOAuthToken(this.GITHUB_OAUTH, this.GITHUB_LOGIN).build();
this.userOfLogin = this.githubLogin.getUser(GITHUB_LOGIN);
}
public static void login() throws IOException {
new GitManager(GIT_TOKEN, GIT_LOGIN, GITHUB_REPO_NAME);
connect();
}
public static void connect() throws IOException {
if (githubLogin.isCredentialValid()) {
valid = true;
githubLogin.connect(GITHUB_LOGIN, GITHUB_OAUTH);
userOfLogin = githubLogin.getUser(GITHUB_LOGIN);
}
}
public static String getCollabs(String repositoryName) throws IOException {
GHRepository collaboratorsRepository = userOfLogin.getRepository(repositoryName);
collaboratorNames = collaboratorsRepository.getCollaboratorNames();
String collaborators = collaboratorNames.toString();
System.out.println("Collaborators for the following Repository: " + repositoryName + "\nAre: " + collaborators);
String out = "Collaborators for the following Repository: " + repositoryName + "\nAre: " + collaborators;
return out;
}
public static List<String> fillList() {
List<String> collaborators = new ArrayList<>();
collaboratorNames.forEach(s -> {
collaborators.add(s);
});
return collaborators;
}
public static String repositoriesUnderUser() throws IOException {
GHUser user;
List<String> names = new ArrayList<>();
List<String> repoNames = new ArrayList<>();
for (int i = 0; i < fillList().size(); i++) {
user = githubLogin.getUser(fillList().get(i));
Map<String, GHRepository> temp = user.getRepositories();
names.add(user.getLogin());
temp.forEach((c, b) -> {
repoNames.add(b.getName());
});
}
repositoriesUserData.put(names,repoNames);
System.out.println(repositoriesUserData);
return "temporaryReturn";
}
All help is appreciated!
I'll give it a try (code in question still not working for me):
If I understood correctly, you want a Map, that contains the repositories for each user.
So I think repositoriesUserData should be a Map<String, List<String>>.
With that in mind, let's fill the map on each loop iteration, with the user as the key and the list of that user's repository names as the value.
The method would look like this (I removed the temporary return and made the method void):
public static void repositoriesUnderUser() throws IOException {
for (int i = 0; i < fillList().size(); i++) {
GHUser user = githubLogin.getUser(fillList().get(i));
Map<String, GHRepository> temp = user.getRepositories();
// one map entry per user: login -> names of that user's repositories
repositoriesUserData.put(user.getLogin(), temp.values().stream().map(GHRepository::getName).collect(Collectors.toList()));
}
}
Edit: (Short explanation what is happening in your code)
You are collecting all usernames to the local List names and also adding all repository-names to the local List 'repoNames'.
At the end of the method you put a new entry to your map repositoriesUserData.
That means at the end of the method you just added one single entry to the map where
key = all of the users
value = all of the repositories from all of the users (because it's a list, a repository name shared by two users is added to it twice)
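The difference between the two shapes can be sketched without the GitHub API. The example below assumes invented (user, repository) pairs in place of `user.getRepositories()` and a hypothetical `groupByUser` helper; it appends to a per-user list instead of calling `put` once with two big lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByUserDemo {
    // Each pair is {user, repository}; the data is made up for illustration.
    static Map<String, List<String>> groupByUser(String[][] pairs) {
        Map<String, List<String>> byUser = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            // Create the user's list on first sight, then append to it.
            byUser.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return byUser;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"user1", "work1"}, {"user1", "work2"}, {"user2", "work1"}
        };
        System.out.println(groupByUser(pairs)); // {user1=[work1, work2], user2=[work1]}
    }
}
```

`computeIfAbsent` is what avoids the overwriting: a plain `put` with the same key replaces the earlier value, while this adds to the list already stored under that key.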
Could someone please help me? I keep trying, but I can't work out why I am not getting the expected result.
I have created a Java Spring Boot web service. When I run the Java application, a web browser page opens, and when I type in a URL such as localhost:8080/runbatchfileparam/test.bat the program first checks whether the test.bat file exists. If it does, the web page shows the JSON result {"Result": true} and the commands in the batch file are executed. If it does not exist, the web page shows {"Result": false}.
I want to create an ASP.NET Web Service that uses the function provided by the Java web service. When I run the ASP.NET Web Application, a web browser page opens and the user types in a URL like this: localhost:12345/api/callbatchfile/test.bat. With the Java web service running, I should get either {"Result": true} or {"Result": false} from the C# ASP.NET Web Application too.
However, I only get an empty {} with nothing inside the braces. Why is that?
Here is my ASP.NET code:
TestController.cs
private TestClient testClient = new TestClient();
public async Task<IHttpActionResult> GET(string fileName)
{
try
{
var result = await testClient.runbatchfile(fileName);
var resultDTO = JsonConvert.DeserializeObject<TestVariable>(result);
return Json(resultDTO);
}
catch (Exception e)
{
var result = "Server is not running";
return Ok(new { ErrorMessage = result });
}
}
TestVariable.cs
public class TestVariable
{
public static int fileName { get; set; }
}
TestClient.cs
private static HttpClient client;
private static string BASE_URL = "http://localhost:8080/";
static TestClient()
{
client = new HttpClient();
client.BaseAddress = new Uri(BASE_URL);
client.DefaultRequestHeaders.Accept.Add(
new MediaTypeWithQualityHeaderValue("application/json"));
}
public async Task<string> runbatchfile(string fileName)
{
var endpoint = string.Format("runbatchfile/{0}", fileName);
var response = await client.GetAsync(endpoint);
return await response.Content.ReadAsStringAsync();
}
WebApiConfig.cs
config.Routes.MapHttpRoute(
name: "TestBatchClient",
routeTemplate: "api/runbatchfile/{fileName}",
defaults: new { action = "GET", controller = "Test" }
);
Someone please do help me. Thank you so much.
EDIT
Java web service
Application.java
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
BatchFileController.java
private static final String template = "Sum, %s!";
@RequestMapping("/runbatchfile/{param:.+}")
public ResultFormat runbatchFile(@PathVariable("param") String fileName) {
RunBatchFile rbf = new RunBatchFile();
return rbf.runBatch(fileName);
}
ResultFormat
private boolean result;
public ResultFormat(boolean result) {
this.result = result;
}
public boolean getResult() {
return result;
}
RunBatchFile.java
public ResultFormat runBatch(String fileName) {
String var = fileName;
String filePath = ("C:/Users/attsuap1/Desktop/" + var);
try {
Process p = Runtime.getRuntime().exec(filePath);
int exitVal = p.waitFor();
return new ResultFormat(exitVal == 0);
} catch (Exception e) {
e.printStackTrace();
return new ResultFormat(false);
}
}
I am not sure if this helps.. but I suspect that the AsyncTask is not really executing...
var result = await testClient.testCallBatchProject(fileName);
I would try something like below:
await testClient.testCallBatchProject(fileName).Delay(1000);
Can you check whether the same happens with a synchronous call? If it does, we can zero in on the above.
Given a directory, my application traverses and loads .mdb MS Access dbs using the Jackcess API. Inside of each database, there is a table named GCMT_CMT_PROPERTIES with a column named cmt_data containing some text. I also have a Mapper object (which essentially resembles a Map<String,String> but allows duplicate keys) which I use as a dictionary when replacing a certain word from a string.
So for example if mapper contains fox -> dog then the sentence: "The fox jumps" becomes "The dog jumps".
The design I'm going with for this program is as follows:
1. Given a directory, traverse all subdirectories and load all .mdb files into a File[].
2. For each db file in File[], create a Task<Void> called "TaskMdbUpdater" and pass it the db file.
3. Dispatch and run each task as it is created (see 2. above).
TaskMdbUpdater is responsible for locating the appropriate table and column in the db file it was given and iteratively running a "find & replace" routine on each row of the table to detect words from the dictionary and replace them (as shown in example above) and finally updating that row before closing the db. Each instance of TaskMdbUpdater is a background thread with a Jackcess API DatabaseBuilder assigned to it, so it is able to manipulate the db.
In the current state, the code is running without throwing any exceptions whatsoever, however when I "manually" open the db through Access and inspect a given row, it appears to not have changed. I've tried to pin the source of the issue without any luck and would appreciate any support. If you need to see more code, let me know and I'll update my question accordingly.
public class TaskDatabaseTaskDispatcher extends Task<Void> {
private String parentDir;
private String dbFileFormat;
private Mapper mapper;
public TaskDatabaseTaskDispatcher(String parent, String dbFileFormat, Mapper mapper) {
this.parentDir = parent;
this.dbFileFormat = dbFileFormat;
this.mapper = mapper;
}
@Override
protected Void call() throws Exception {
File[] childDirs = getOnlyDirectories(getDirectoryChildFiles(new File(this.parentDir)));
DatabaseBuilder[] dbs = loadDatabasesInParent(childDirs);
Controller.dprint("TaskDatabaseTaskDispatcher", dbs.length + " databases were found in parent directory");
TaskMdbUpdater[] tasks = new TaskMdbUpdater[dbs.length];
Thread[] workers = new Thread[dbs.length];
for(int i=0; i<dbs.length; i++) {
// for each db, dispatch Task so a worker can update that db.
tasks[i] = new TaskMdbUpdater(dbs[i], mapper);
workers[i] = new Thread(tasks[i]);
workers[i].setDaemon(true);
workers[i].start();
}
return null;
}
private DatabaseBuilder[] loadDatabasesInParent(File[] childDirs) throws IOException {
DatabaseBuilder[] dbs = new DatabaseBuilder[childDirs.length];
// Traverse children and load dbs[]
for(int i=0; i<childDirs.length; i++) {
File dbFile = FileUtils.getFileInDirectory(
childDirs[i].getCanonicalFile(),
childDirs[i].getName() + this.dbFileFormat);
dbs[i] = new DatabaseBuilder(dbFile);
}
return dbs;
}
}
// StringUtils class, utility methods
public class StringUtils {
public static String findAndReplace(String str, Mapper mapper) {
String updatedStr = str;
for(int i=0; i<mapper.getMappings().size(); i++) {
updatedStr = updatedStr.replaceAll(mapper.getMappings().get(i).getKey(), mapper.getMappings().get(i).getValue());
}
return updatedStr;
}
}
// FileUtils class, utility methods:
public class FileUtils {
/**
* Returns only directories in given File[].
* @param list
* @return
*/
public static File[] getOnlyDirectories(File[] list) throws IOException, NullPointerException {
List<File> filteredList = new ArrayList<>();
for(int i=0; i<list.length; i++) {
if(list[i].isDirectory()) {
filteredList.add(list[i]);
}
}
File[] correctSizeFilteredList = new File[filteredList.size()];
for(int i=0; i<filteredList.size(); i++) {
correctSizeFilteredList[i] = filteredList.get(i);
}
return correctSizeFilteredList;
}
/**
* Returns a File[] containing all children under specified parent file.
* @param parent
* @return
*/
public static File[] getDirectoryChildFiles(File parent) {
return parent.listFiles();
}
}
public class Mapper {
private List<aMap> mappings;
public Mapper(List<aMap> mappings) {
this.mappings = mappings;
}
/**
* Returns mapping dictionary, typically used for extracting individual mappings.
* @return List of type aMap
*/
public List<aMap> getMappings() {
return mappings;
}
public void setMappings(List<aMap> mappings) {
this.mappings = mappings;
}
}
/**
* Represents a single String based K -> V mapping.
*/
public class aMap {
private String[] mapping; // [0] - key, [1] - value
public aMap(String[] mapping) {
this.mapping = mapping;
}
public String getKey() {
return mapping[0];
}
public String getValue() {
return mapping[1];
}
public String[] getMapping() {
return mapping;
}
public void setMapping(String[] mapping) {
this.mapping = mapping;
}
}
Update 1:
To verify my custom StringUtils.findAndReplace logic, I've performed the following unit test (in JUnit) which is passing:
@Test
public void simpleReplacementTest() {
// Construct a test mapper/dictionary
List<aMap> aMaps = new ArrayList<aMap>();
aMaps.add(new aMap(new String[] {"fox", "dog"})); // {K, V} = K -> V
Mapper mapper = new Mapper(aMaps);
// Perform replacement
String corpus = "The fox jumps";
String updatedCorpus = StringUtils.findAndReplace(corpus, mapper);
assertEquals("The dog jumps", updatedCorpus);
}
I'm including my TaskMdbUpdater class here separately with some logging code included, as I suspect point of failure lies somewhere in call:
/**
* Updates a given .mdb database according to specifications defined internally.
* @since 2.2
*/
public class TaskMdbUpdater extends Task<Void> {
private final String TABLE_NAME = "GCMT_CMT_PROPERTIES";
private final String COLUMN_NAME = "cmt_data";
private DatabaseBuilder dbPackage;
private Mapper mapper;
public TaskMdbUpdater(DatabaseBuilder dbPack, Mapper mapper) {
super();
this.dbPackage = dbPack;
this.mapper = mapper;
}
@Override
protected Void call() {
try {
// Controller.dprint("TaskMdbUpdater", "Worker: " + Thread.currentThread().getName() + " running");
// Open db and extract Table
Database db = this.dbPackage
.open();
Logger.debug("Opened database: {}", db.getFile().getName());
Table table = db.getTable(TABLE_NAME);
Logger.debug("Opening table: {}", table.getName());
Iterator<Row> tableRows = table.iterator();
// Controller.dprint("TaskMdbUpdater", "Updating database: " + db.getFile().getName());
int i=0;
try {
while( tableRows.hasNext() ) {
// Row is basically a Map<Column_Name, Value>
Row cRow = tableRows.next();
Logger.trace("Current row: {}", cRow);
// Controller.dprint(Thread.currentThread().getName(), "Database name: " + db.getFile().getName());
// Controller.dprint("TaskMdbUpdater", "existing row: " + cRow.toString());
String str = cRow.getString(COLUMN_NAME);
Logger.trace("Row {} column field contents (before find/replace): {}", i, str);
String newStr = performFindAndReplaceOnString(str);
Logger.trace("Row {} column field contents (after find/replace): {}", i, newStr);
cRow.put(COLUMN_NAME, newStr);
Logger.debug("Updating field in row {}", i);
Row newRow = table.updateRow(cRow); // updateRow returns the new, updated row. Ignoring this.
Logger.debug("Calling updateRow on table with modified row");
// Controller.dprint("TaskMdbUpdater", "new row: " + newRow.toString());
i++;
Logger.trace("i = {}", i);
}
} catch(NoSuchElementException e) {
// e.printStackTrace();
Logger.error("Thread has iterated past number of rows in table", e);
}
Logger.info("Iterated through {} rows in table {}", i, table.getName());
db.close();
Logger.debug("Closing database: {}", db.getFile().getName());
} catch (Exception e) {
// e.printStackTrace();
Logger.error("An error occurred while attempting to update row value", e);
}
return null;
}
/**
* @see javafx.concurrent.Task#failed()
*/
@Override
protected void failed() {
super.failed();
Logger.error("Task failed");
}
@Override
protected void succeeded() {
Logger.debug("Task succeeded");
}
private String performFindAndReplaceOnString(String str) {
// Logger.trace("OLD: [" + str + "]");
String updatedStr = null;
for(int i=0; i<mapper.getMappings().size(); i++) {
// loop through all parameter names in mapper to search for in str.
updatedStr = findAndReplace(str, this.mapper);
}
// Logger.trace("NEW: [" + updatedStr + "]");
return updatedStr;
}
}
Here's a small excerpt from my log. As you can see, it doesn't seem to do anything after opening the table, which has left me a bit perplexed:
INFO (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Located the following directories under specified MOIS parent which contains an .mdb file:
[01_Parent_All_Safe_Test[ RV_DMS_0041RV_DMS_0001RV_DMS_0003RV_DMS_0005RV_DMS_0007RV_DMS_0012RV_DMS_0013RV_DMS_0014RV_DMS_0016RV_DMS_0017RV_DMS_0018RV_DMS_0020RV_DMS_0023RV_DMS_0025RV_DMS_0028RV_DMS_0029RV_DMS_0031RV_DMS_0033RV_DMS_0034RV_DMS_0035RV_DMS_0036RV_DMS_0038RV_DMS_0039RV_DMS_0040 ]]
...
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Created new task: NAMEMAP.logic.TaskMdbUpdater@4cfe46fe
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Created new worker: Thread[Thread-22,5,main]
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Set worker Thread[Thread-22,5,main] as daemon
DEBUG (16-02-2017 17:27:59) [Thread-9] NAMEMAP.logic.TaskDatabaseTaskDispatcher.call(): Dispatching worker: Thread[Thread-22,5,main]
...
DEBUG (16-02-2017 17:28:00) [Thread-22] NAMEMAP.logic.TaskMdbUpdater.call(): Opened database: RV_DMS_0023.mdb
DEBUG (16-02-2017 17:28:00) [Thread-22] NAMEMAP.logic.TaskMdbUpdater.call(): Opening table: GCMT_CMT_PROPERTIES
After this point, there aren't any more entries in the log and the processor spikes to 100% load, remaining that way until I force-kill the application. This could mean the program is stuck in an infinite while loop; however, if that were the case, shouldn't there be log entries in the file?
Update 2:
Okay, I've narrowed the problem further by printing TRACE-level logs to stdout. It seems that my performFindAndReplaceOnString is very inefficient: it never gets past the first row of these dbs because it's just grinding away at the long string. Any suggestions on how I can efficiently perform a string replacement for this use case?
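One possible direction, sketched under assumptions. In the code above, performFindAndReplaceOnString calls findAndReplace once per mapping even though findAndReplace already loops over every mapping, and each replaceAll compiles its key as a regular expression again. The sketch below instead compiles all keys into a single alternation and makes one pass over the text. It uses a plain `Map<String, String>` in place of the `Mapper` class, treats keys as literal text rather than regular expressions, and the class and method names are hypothetical. Unlike sequential replacement, text produced by one mapping is not re-scanned by the others.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SinglePassReplacer {
    private final Map<String, String> mappings;
    private final Pattern pattern;

    SinglePassReplacer(Map<String, String> mappings) {
        this.mappings = mappings;
        // Quote each key so it matches literally, then join them as key1|key2|...
        String alternation = mappings.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.pattern = Pattern.compile(alternation); // compiled once, reused for every row
    }

    String replace(String text) {
        if (text == null || mappings.isEmpty()) {
            return text;
        }
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // The matched text is exactly one of the keys, so a map lookup finds its value.
            String value = mappings.get(matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("fox", "dog");
        System.out.println(new SinglePassReplacer(dictionary).replace("The fox jumps")); // The dog jumps
    }
}
```

When one key is a prefix of another, the alternation tries keys in insertion order, so longer keys would need to be inserted first to take precedence.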
Hi, I'm creating a simple Java tool to create, update, and delete issues (tickets) in Jira. I am using the REST API, and the following code is what I use to authenticate with Jira and create issues.
public class JiraConnection {
public static URI jiraServerUri = URI.create("http://localhost:8090/jira/rest/api/2/issue/HSP-1/");
public static void main(String args[]) throws IOException {
final AsynchronousJiraRestClientFactory factory = new AsynchronousJiraRestClientFactory();
final JiraRestClient restClient = factory.createWithBasicHttpAuthentication(jiraServerUri,"vinuvish92@gmail.com","vinu1994");
System.out.println("Sending issue creation requests...");
try {
final List<Promise<BasicIssue>> promises = Lists.newArrayList();
final IssueRestClient issueClient = restClient.getIssueClient();
System.out.println("Sending issue creation requests...");
for (int i = 0; i < 100; i++) {
final String summary = "NewIssue#" + i;
final IssueInput newIssue = new IssueInputBuilder("TST", 1L, summary).build();
System.out.println("\tCreating: " + summary);
promises.add(issueClient.createIssue(newIssue));
}
System.out.println("Collecting responses...");
final Iterable<BasicIssue> createdIssues = transform(promises, new Function<Promise<BasicIssue>, BasicIssue>() {
@Override
public BasicIssue apply(Promise<BasicIssue> promise) {
return promise.claim();
}
});
System.out.println("Created issues:\n" + Joiner.on("\n").join(createdIssues));
} finally {
restClient.close();
}
}
}
With this code I couldn't connect to Jira.
I am getting the following exception:
Please suggest the best way to accomplish this.
It seems to me that your error is clearly related to the URL parameter. The offending line and the fact that the error message is about not finding the resource are good indications of it.
You don't need to input the whole endpoint since you are using the JiraRestClient. Depending on the method that you call it will resolve the endpoint. Here is an example that works: as you can see I only input the base url
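A small sketch of why the full endpoint goes wrong, using only `java.net.URI`. It assumes the client appends its own REST path (written here as `rest/api/latest/`, an assumption for illustration) to whatever server URI it is given, and it assumes the Jira base URL is `http://localhost:8090/jira`; the class and method names are hypothetical.

```java
import java.net.URI;

class JiraBaseUriDemo {
    // Assumption: the client appends its own REST path to the server URI it receives.
    static final String ASSUMED_REST_PATH = "rest/api/latest/";

    // Builds the issue endpoint the way a client might derive it from the configured URI.
    static URI issueEndpoint(URI serverUri) {
        String base = serverUri.toString();
        if (!base.endsWith("/")) {
            base += "/";
        }
        return URI.create(base + ASSUMED_REST_PATH).resolve("issue");
    }

    public static void main(String[] args) {
        URI fullEndpoint = URI.create("http://localhost:8090/jira/rest/api/2/issue/HSP-1/");
        URI baseOnly = URI.create("http://localhost:8090/jira");
        // The REST path is appended after the already-complete endpoint, so the resource does not exist.
        System.out.println(issueEndpoint(fullEndpoint));
        // With only the base URL, the derived endpoint is the one the server actually serves.
        System.out.println(issueEndpoint(baseOnly));
    }
}
```

Under these assumptions, the value passed to createWithBasicHttpAuthentication would be the base URL only, and the issue key would go to the method that fetches or creates the issue.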
I have a bunch of XSLs. One of them happens to use base-uri().
When run directly against a file, it shows the document's systemId.
When run after another XSL, it shows that XSL's systemId.
Things I don't have control over
XSL contents
Order of XSLs
Has to work with XSLT2 (saxon)
Also, I would prefer a streaming solution. This could be fixed by writing every intermediate result to disk and faking the systemId to that of the original document, but that is highly inefficient.
Here's what I've tried thus far.
public class BadSystemIdDemo {
private static final SAXTransformerFactory XSLT2 =
new net.sf.saxon.TransformerFactoryImpl();
public static void main(String[] args) throws Exception {
Result to = new StreamResult(System.out);
// outputs: "file:///one.xsl"
usingXMLFilter(to);
System.out.println();
// also outputs: "file:///one.xsl"
usingTransformerHandler(to);
System.out.println();
// wanted: "file:///in.xml"
}
private static void usingTransformerHandler(Result to) throws Exception {
TransformerHandler first = XSLT2.newTransformerHandler(Inputs.xsl1());
TransformerHandler second = XSLT2.newTransformerHandler(Inputs.xsl2());
first.setResult(new SAXResult(second));
second.setResult(to);
XSLT2.newTransformer().transform(Inputs.in(), new SAXResult(first));
}
private static void usingXMLFilter(Result to) throws Exception {
XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
XMLFilter first = XSLT2.newXMLFilter(Inputs.xsl1());
XMLFilter second = XSLT2.newXMLFilter(Inputs.xsl2());
first.setParent(r);
second.setParent(first);
XSLT2.newTransformer().transform(Inputs.in(second), to);
}
}
Just examples, the real things are obviously more complicated.
public class Inputs {
private static final String IN_SYSTEM_ID = "file:///in.xml";
private static final String XSL1_SYSTEM_ID = "file:///one.xsl";
private static final String XSL2_SYSTEM_ID = "file:///two.xsl";
static Source in() {
return new StreamSource(new StringReader("<root/>"), IN_SYSTEM_ID);
}
static Source in(XMLReader using) {
return new SAXSource(using, SAXSource.sourceToInputSource(in()));
}
static Source xsl1() {
String contents = ""
+ "<xsl:stylesheet version=\"2.0\""
+ " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
+ " <xsl:template match=\"@*|node()\">"
+ " <xsl:copy>"
+ " <xsl:apply-templates select=\"@*|node()\"/>"
+ " </xsl:copy>"
+ " </xsl:template>"
+ "</xsl:stylesheet>";
return new StreamSource(new StringReader(contents), XSL1_SYSTEM_ID);
}
static Source xsl2() {
String contents = ""
+ "<xsl:stylesheet version=\"2.0\""
+ " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
+ " <xsl:template match=\"*\">"
+ " <xsl:value-of select=\"base-uri(.)\"/>"
+ " </xsl:template>"
+ "</xsl:stylesheet>";
return new StreamSource(new StringReader(contents), XSL2_SYSTEM_ID);
}
}
My first idea would be to add an xml:base attribute to the tree; that will determine the result of the base-uri() function. But given the constraints you describe, perhaps that's too disruptive.
To be honest, I don't really believe the constraints. If you've got control over the Java code, then you can create a stylesheet which imports xsl2 and overrides the template that calls base-uri(), replacing it with a reference to a stylesheet parameter.
However, if you're prepared to move away from the JAXP interface to Saxon's s9api API, then it can probably be done. To set up a transformation pipeline in s9api you use one XsltTransformer as the Destination for another XsltTransformer, and by calling setBaseUri() on the second XsltTransformer you should affect the result of base-uri() called within that stylesheet.
Managed to get this working by overriding the XMLReader#setDocumentLocator(). This is rather hackish though and will probably break if the input document is using XInclude.
private static void usingTransformerHandler(Result to) throws Exception {
TransformerHandler first = XSLT2.newTransformerHandler(Inputs.xsl1());
TransformerHandler second = XSLT2.newTransformerHandler(Inputs.xsl2());
LocatorFixer fixer = new LocatorFixer();
first.setResult(new SAXResult(fixer.wrap(second)));
second.setResult(to);
XSLT2.newTransformer().transform(Inputs.in(), new SAXResult(fixer.wrap(first)));
}
private static void usingXMLFilter(Result to) throws Exception {
XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
XMLFilter first = XSLT2.newXMLFilter(Inputs.xsl1());
XMLFilter second = XSLT2.newXMLFilter(Inputs.xsl2());
LocatorFixer fixer = new LocatorFixer();
first.setParent(fixer.wrap(r));
second.setParent(fixer.wrap(first));
XSLT2.newTransformer().transform(Inputs.in(second), to);
}
Helper
class LocatorFixer {
private Locator copied;
XMLFilterImpl wrap(XMLReader delegate) {
return new XMLFilterImpl(delegate) {
@Override
public void setDocumentLocator(Locator real) {
if (copied != null) {
super.setDocumentLocator(copied);
} else {
copied = new LocatorImpl(real);
super.setDocumentLocator(real);
}
}
};
}
ContentHandler wrap(ContentHandler delegate) {
XMLFilterImpl fixed = wrap((XMLReader) null);
fixed.setContentHandler(delegate);
return fixed;
}
}