Convert a list of URLs to a Tree - java

Just to be sure I'm not reinventing the wheel, I want to see if there is some known algorithm, class, or something that can help me solve my problem. I have a huge list of URLs from an application. I'd like to feed those URLs into a tree to create a sitemap-like data structure.
It seems that something like this may have been done before. However, everything I find in my searches goes from XML to a tree. Ideally I'd like an answer in Java, but I'm sure I could translate it to Java myself if necessary. If I need to do it myself, I'd probably take each URL and break it into indexes.
[root] [0] [1] [1] -file
www.site.com/dir1/dir2/file.html
[root] [0] [1] [1]
www.site.com/dirabc/dir2/file.html
So, I'd parse each URL into offsets [0], [1], [2], … etc., and those would be the depths in the tree at which to add them. That was at least my initial plan. I'm open to any and all suggestions!

You could define your UrlTree as nested HashMaps
import java.util.HashMap;
import java.util.Map;

public class UrlTree {
    // Each path segment maps to the subtree below it.
    private final Map<String, UrlTree> branches = new HashMap<String, UrlTree>();

    // Adds tokens[i..] below this node, creating branches as needed.
    public void add(String[] tokens, int i) {
        if (i >= tokens.length) {
            return;
        }
        final String token = tokens[i];
        UrlTree branch = branches.get(token);
        if (branch == null) {
            branch = new UrlTree();
            branches.put(token, branch);
        }
        branch.add(tokens, i + 1);
    }
    ...
}
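To show how the URLs from the question could be fed into such a nested-map tree, here is a minimal sketch. It is an iterative variant of the same idea, not the answer's exact class: the names UrlTreeDemo, Node, add and render are hypothetical, and it assumes a URL is just an optional scheme followed by segments separated by "/".

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; names are illustrative, not taken from the answer above.
class UrlTreeDemo {
    // Each node maps a path segment to its child node; TreeMap keeps siblings sorted.
    static final class Node {
        final Map<String, Node> children = new TreeMap<>();
    }

    // Splits "host/dir/file" on "/" and walks or creates one node per segment.
    static void add(Node root, String url) {
        // Assumption: drop an optional scheme such as "http://" before splitting.
        String path = url.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://", "");
        Node current = root;
        for (String token : path.split("/")) {
            if (token.isEmpty()) {
                continue; // skip empty segments from doubled slashes
            }
            current = current.children.computeIfAbsent(token, k -> new Node());
        }
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(e.getKey()).append('\n');
            sb.append(render(e.getValue(), depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node();
        add(root, "www.site.com/dir1/dir2/file.html");
        add(root, "www.site.com/dirabc/dir2/file.html");
        System.out.print(render(root, 0));
    }
}
```

Both URLs share the www.site.com node, so the host appears once with dir1 and dirabc as its children.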

You'll need to implement TreeModel in a way that reflects the hierarchy of your observed directory structure. FileTreeModel is an example, and ac.Name is a simple class that parses paths for a vintage file system. See also How to Use Trees. An instance of NetBeans Outline, illustrated here, would make a nice alternative view.

Related

XMLUnit - compare xml and ignore few tags based on a condition

I have a couple of XML files that need to be compared with a different set of similar XML files. While comparing, I need to ignore tags based on a condition, for example:
personal.xml - ignore fullname
address.xml - ignore zipcode
contact.xml - ignore homephone
Here is the code:
Diff documentDiff = DiffBuilder
        .compare(actualxmlfile)
        .withTest(expectedxmlfile)
        .withNodeFilter(node -> !node.getNodeName().equals("FullName"))
        .ignoreWhitespace()
        .build();
How can I add conditions at ".withNodeFilter(node -> !node.getNodeName().equals("FullName"))", or is there a smarter way to do this?
You can join multiple conditions together using "and" (&&):
private static void doDemo1(File actual, File expected) {
    Diff docDiff = DiffBuilder
            .compare(actual)
            .withTest(expected)
            .withNodeFilter(
                    node -> !node.getNodeName().equals("FullName")
                            && !node.getNodeName().equals("ZipCode")
                            && !node.getNodeName().equals("HomePhone")
            )
            .ignoreWhitespace()
            .build();
    System.out.println(docDiff.toString());
}
If you want to keep your builder tidy, you can move the node filter to a separate method:
private static void doDemo2(File actual, File expected) {
    Diff docDiff = DiffBuilder
            .compare(actual)
            .withTest(expected)
            .withNodeFilter(node -> testNode(node))
            .ignoreWhitespace()
            .build();
    System.out.println(docDiff.toString());
}

private static boolean testNode(Node node) {
    return !node.getNodeName().equals("FullName")
            && !node.getNodeName().equals("ZipCode")
            && !node.getNodeName().equals("HomePhone");
}
The risk with this is you may have element names which appear in more than one type of file - where that node needs to be filtered from one type of file, but not any others.
In this case, you would also need to take into account the type of file you are handling. For example, you can use the file names (if they follow a suitable naming convention) or use the root elements (assuming they are different) - such as <Personal>, <Address>, <Contact> - or whatever they are, in your case.
However, if you need to distinguish between XML file types, for this reason, you may be better off using that information to have separate DiffBuilder objects, with different filters. That may result in clearer code.
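One way to keep a separate filter per file type is a lookup from root element name to the element names to skip. The sketch below uses only JDK classes so that it runs on its own; the class NodeFilters, the map IGNORED_BY_ROOT, and the root element names Personal, Address and Contact are assumptions for illustration. Handing the result to withNodeFilter as a method reference (filter::test) should then work, but check that against the predicate type your XMLUnit version declares.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of XMLUnit.
class NodeFilters {
    // Assumed mapping from root element name to the element names to ignore.
    static final Map<String, Set<String>> IGNORED_BY_ROOT = Map.of(
            "Personal", Set.of("FullName"),
            "Address", Set.of("ZipCode"),
            "Contact", Set.of("HomePhone"));

    // Returns a filter that keeps every node except those ignored for this file type.
    static Predicate<Node> filterFor(String rootElementName) {
        Set<String> ignored = IGNORED_BY_ROOT.getOrDefault(rootElementName, Set.of());
        return node -> !ignored.contains(node.getNodeName());
    }

    // Parses a small XML string; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Address><Street>Main</Street><ZipCode>12345</ZipCode></Address>");
        Predicate<Node> filter = filterFor(doc.getDocumentElement().getNodeName());
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            System.out.println(child.getNodeName() + " kept: " + filter.test(child));
        }
    }
}
```

With this layout a ZipCode element is only skipped in files whose root is Address, which avoids the name-clash risk described above.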
I provided a separate method for the !node.getNodeName().equals("FullName") check (which you use in your code) in the answer linked below. With that method you can pass an array of the node names you want to ignore and inspect the results. If you need further conditions for your requirements, that method is the place to add them.
https://stackoverflow.com/a/68099435/13451711

Parse a single POJO from multiple YAML documents representing different classes

I want to use a single YAML file which contains several different objects - for different applications. I need to fetch one object to get an instance of MyClass1, ignoring the rest of docs for MyClass2, MyClass3, etc. Some sort of selective de-serializing: now this class, then that one... The structure of MyClass2, MyClass3 is totally unknown to the application working with MyClass1. The file is always a valid YAML, of course.
The YAML may be of any structure we need to implement such a multi-class container. The preferred parsing tool is snakeyaml.
Is it sensible? How can I ignore all but one object?
UPD: replaced all "document" with "object". I think we have to speak about a single YAML document containing several objects of different structure. Moreover, the parser knows exactly one structure and wants to ignore the rest.
UPD2: I think it is impossible with snakeyaml. We have to read all objects anyway and select the needed one later. But maybe I'm wrong.
UPD2: sample config file
---
-
  exportConfiguration781:
    attachmentFieldName: "name"
    baseSftpInboxPath: /home/user/somedir/
    somebool: false
    days: 9999
    expected:
      - ABC w/o quotes
      - "Cat ABC"
      - "Some string"
    dateFormat: yyyy-MMdd-HHmm
    user: someuser
-
  anotherConfiguration:
    k1: v1
    k2:
      - v21
      - v22
This is definitely possible with SnakeYAML, albeit not trivial. Here's a general rundown of what you need to do:
First, let's look at what loading with SnakeYAML does. Here's the important part of the Yaml class:
private Object loadFromReader(StreamReader sreader, Class<?> type) {
    Composer composer = new Composer(new ParserImpl(sreader), resolver, loadingConfig);
    constructor.setComposer(composer);
    return constructor.getSingleData(type);
}
The composer parses YAML input into Nodes. To do that, it doesn't need any knowledge about the structure of your classes, since every node is either a ScalarNode, a SequenceNode or a MappingNode and they just represent the YAML structure.
The constructor takes a root node generated by the composer and generates native POJOs from it. So what you want to do is to throw away parts of the node graph before they reach the constructor.
The easiest way to do that is probably to derive from Composer and override two methods like this:
public class MyComposer extends Composer {
    private final int objIndex;

    public MyComposer(Parser parser, Resolver resolver, int objIndex) {
        super(parser, resolver);
        this.objIndex = objIndex;
    }

    public MyComposer(Parser parser, Resolver resolver, LoaderOptions loadingConfig, int objIndex) {
        super(parser, resolver, loadingConfig);
        this.objIndex = objIndex;
    }

    @Override
    public Node getNode() {
        return strip(super.getNode());
    }

    private Node strip(Node input) {
        return ((SequenceNode)input).getValue().get(objIndex);
    }
}
The strip implementation is just an example. In this case, I assumed your YAML looks like this (object content is arbitrary):
- {first: obj}
- {second: obj}
- {third: obj}
And you simply select the object you actually want to deserialize by its index in the sequence. But you can also have something more complex like a searching algorithm.
Now that you have your own composer, you can do
Constructor constructor = new Constructor();
// assuming we want to get the object at index 1 (i.e. second object)
Composer composer = new MyComposer(new ParserImpl(sreader), new Resolver(), 1);
constructor.setComposer(composer);
MyObject result = (MyObject)constructor.getSingleData(MyObject.class);
The answer from @flyx was very helpful to me: it showed how to work around the library's limitations (in our case, snakeyaml's) by overriding some methods. Thanks a lot! It may well contain a complete solution, but I don't have one yet. Besides, the simple solution below is robust and worth considering even if we find a complete solution that hooks into the library.
I've decided to solve the task by double distilling, sorry, processing the configuration file. Imagine the file consisting of several parts, each marked by a unique delimiter token. To keep it YAML-like, it may look like this:
---
#this is a unique key for the configuration A
<some YAML document>
---
#this is another key for the configuration B
<some YAML document>
The first pass is pre-processing. For a given String fileString and String key (and DELIMITER = "\n---\n", for example) we select the substring holding the configuration identified by the key:
int begIndex;
do {
    begIndex = fileString.indexOf(DELIMITER);
    if (begIndex == -1) {
        break;
    }
    if (fileString.startsWith(DELIMITER + key, begIndex)) {
        fileString = fileString.substring(begIndex + DELIMITER.length() + key.length());
        break;
    }
    // spoil alien delimiter and repeat search
    fileString = fileString.replaceFirst(DELIMITER, " ");
} while (true);
int endIndex = fileString.indexOf(DELIMITER);
if (endIndex != -1) {
    fileString = fileString.substring(0, endIndex);
}
Now we feed the fileString to the simple YAML parsing
ExportConfiguration configuration = new Yaml(new Constructor(ExportConfiguration.class))
.loadAs(fileString, ExportConfiguration.class);
This time we have a single document that must correspond to the ExportConfiguration class.
Note 1: The structure, and even the content, of the rest of the configuration file plays absolutely no role. This was the main idea: to keep independent configurations in a single file.
Note 2: The other configurations may be JSON or XML or whatever. We have a pre-processing method that returns a String configuration, and the next processor parses it properly.
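The pre-processing pass could also be wrapped in a small method that splits on the delimiter instead of searching repeatedly. This is only a sketch of the same idea under stated assumptions: the class name ConfigSections and method name extract are hypothetical, the key must occupy the whole first line of its section, and the delimiter is the literal "\n---\n" used above.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the pre-processing pass; names are illustrative.
class ConfigSections {
    static final String DELIMITER = "\n---\n";

    // Returns the text that follows the key line of the matching section,
    // or null when no section starts with that key.
    static String extract(String fileString, String key) {
        // Prepend a newline so that a "---" on the very first line is also recognized.
        String text = "\n" + fileString;
        for (String part : text.split(Pattern.quote(DELIMITER))) {
            if (part.equals(key)) {
                return ""; // the section consists of the key line only
            }
            if (part.startsWith(key + "\n")) {
                return part.substring(key.length() + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String file = "---\n#configA\nname: a\n---\n#configB\nname: b\n";
        System.out.println(extract(file, "#configB"));
    }
}
```

The returned string can then be handed to whichever parser suits that section, as described in Note 2.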

Removing redundant folder paths from a list

I have the following list which contains a series of folder paths. Some of these are redundant so I need to remove them and the final list should only contain the bottom level folders:
Initial list:
var paths = new List<string>
{
    "Pavements/",
    "Pavements/2019_05/",
    "Pavements/2019_06/",
    "Pavements/2019_06/A/",
    "Roads/",
    "Roads/2019_06/"
};
The final List should look like:
paths =
{
    "Pavements/2019_05/",
    "Pavements/2019_06/A/",
    "Roads/2019_06/"
};
i.e. all the upper level folder paths have been removed.
Does anyone know how I can achieve this? I have a feeling I need a recursive method but am unsure how to go about it. I am using C#, but an answer in Java or something similar is OK.
Thanks.
One way to do this is with a LINQ query that compares each item to all the other items and keeps an item only if none of the others begins with it:
paths = paths.Where(path => !paths.Any(p => p != path && p.StartsWith(path))).ToList();
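Since the question also accepts Java, the same filter can be sketched with streams. The class name LeafFolders and method name leafPaths are hypothetical. Like the LINQ query, this compares every path with every other path, so it is quadratic in the list size.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Java counterpart of the LINQ query above.
class LeafFolders {
    // Keeps a path only if no other path in the list starts with it.
    static List<String> leafPaths(List<String> paths) {
        return paths.stream()
                .filter(path -> paths.stream()
                        .noneMatch(p -> !p.equals(path) && p.startsWith(path)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "Pavements/",
                "Pavements/2019_05/",
                "Pavements/2019_06/",
                "Pavements/2019_06/A/",
                "Roads/",
                "Roads/2019_06/");
        System.out.println(leafPaths(paths));
    }
}
```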

How do I track variable dependencies in Nashorn?

I would like to use the Nashorn engine as a general computation engine. It is powerful and fast, has plenty of built-in functions, and new functions are very easy to add using @FunctionalInterface or static methods. Even better, it also provides value-adds like cyclic dependency checking, syntax checking, etc.
However I need to automatically update "output" variables when a dependency changes.
The general idea is that in Java, I'll have something like:
class CalculationEngine {
    Data addData(String name, Number value){
        ...
    }
    Data addData(String name, String formula){
        ...
    }
    String getScript(){
        ...
    }
}

CalculationEngine engine = new CalculationEngine();
Data datum1 = engine.addData("datum1", 1); // Constant integer 1
Data datum2 = engine.addData("datum2", 2); // Constant integer 2
Data datum3 = engine.addData("datum3", "datum1*10");
Data datum4 = engine.addData("datum4", "datum3+datum2");
The CalculationEngine service class knows how to use Nashorn to create a script string out of the Data objects that looks like this:
final String script = engine.getScript(); // "var datum1=1; var datum2=2; var datum3=datum1*10; var datum4=datum3+datum2;"
I know I can parse the script with the Nashorn Parser:
final CompilationUnitTree tree = parser.parse("test", script, null);
But how do I extract the dependencies:
List<Data> whatDependsOn(Data input){
// Process the parsed tree
return list;
}
such that whatDependsOn(datum2) returns [datum4] and whatDependsOn(datum1) returns [datum3, datum4] ?
Or the inverse function getReferencedVariables such that getReferencedVariables(datum3) returns [datum1] and getReferencedVariables(datum4) returns [datum2, datum3] (and I can recursively query getReferencedVariables until all referenced variables have been found).
Basically, when the "value" of one of my Data objects changes (due to an external event), how do I determine which of my script formulae are affected and need to be recomputed?
I know that the Nashorn script can be parsed but I can not figure out how to use the SimpleTreeVisitorES6 to build up a variable dependency graph:
final CompilationUnitTree tree = parser.parse("test", script, null);
if (tree != null) {
    tree.accept(new SimpleTreeVisitorES6<Void, Void>() {
        @Override
        public Void visitVariable(VariableTree tree, Void v) {
            final Kind kind = tree.getKind();
            System.out.println("Found a variable: " + kind);
            System.out.println(" name: " + kind.toString());
            IdentifierTree binding = (IdentifierTree) tree.getBinding();
            System.out.println(" kind: " + binding.getKind().name());
            System.out.println(" name: " + binding.getName());
            System.out.println(" val: " + kind.name());
            return null;
        }
    }, null);
}
One of the Nashorn devs here. What you are trying to do is compute the so-called def-use relations on source code (well, more likely their transitive closure, but I digress). That's a well-understood compiler theory concept. The good news is that CompilationUnitTree and friends should give you enough information to implement an algorithm for computing this information. The bad news is that you'll have to roll up your sleeves and write your own implementation, I'm afraid. You'll basically have to gather this information and produce merges at control flow join points (back edges and exits of loops, ends of if statements), but you'll also have to handle more exotic stuff like switch/case with its fallthrough semantics and try/catch/finally, which is the least fun of these because control can transfer from anywhere in the try block to a catch block. Your algorithm will also have to repeatedly evaluate loop bodies until the static information you're gathering reaches a fixpoint.
FWIW, while writing Nashorn I had to implement these kinds of things a few times using Nashorn's internal parser API (which is different from, but similar to, the public one). If you want some inspiration, you can look into the source code of the Nashorn static type analyzer for inferring types of local variables in a JavaScript function, which is something I wrote some years ago. If nothing else, it'll give you an idea of how to walk an AST and keep track of control flow edges and partially computed static analysis data at the edges.
I wish there were an easier way to do this… FWIW, a generalized static analyzer that helps you with the bookkeeping of control flow could be possible. Good luck.
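For the straight-line script in the question (one var per Data object, no loops or branches), a much cruder approximation than full def-use analysis may be enough. The sketch below swaps the Nashorn AST for plain regular-expression tokenization, so it is not the analysis described above. The class DependencyGraph and its methods are hypothetical, and it assumes each name is registered once, before any formula that uses it, and that formulas contain no string literals or property accesses that happen to spell a registered name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: tracks which registered names each formula mentions.
class DependencyGraph {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    // name -> names its formula references directly (empty for constants)
    private final Map<String, Set<String>> references = new LinkedHashMap<>();

    void addConstant(String name) {
        references.put(name, new LinkedHashSet<>());
    }

    void addFormula(String name, String formula) {
        Set<String> refs = new LinkedHashSet<>();
        Matcher m = IDENTIFIER.matcher(formula);
        while (m.find()) {
            // Only names registered earlier count, so function names such as Math are ignored.
            if (references.containsKey(m.group())) {
                refs.add(m.group());
            }
        }
        references.put(name, refs);
    }

    Set<String> getReferencedVariables(String name) {
        return references.getOrDefault(name, new LinkedHashSet<>());
    }

    // Returns every name that directly or transitively references input.
    // One pass suffices because registration order is a topological order
    // under the assumption that names are defined before they are used.
    List<String> whatDependsOn(String input) {
        Set<String> affected = new LinkedHashSet<>();
        affected.add(input);
        for (Map.Entry<String, Set<String>> e : references.entrySet()) {
            for (String ref : e.getValue()) {
                if (affected.contains(ref)) {
                    affected.add(e.getKey());
                    break;
                }
            }
        }
        affected.remove(input);
        return new ArrayList<>(affected);
    }

    public static void main(String[] args) {
        DependencyGraph graph = new DependencyGraph();
        graph.addConstant("datum1");
        graph.addConstant("datum2");
        graph.addFormula("datum3", "datum1*10");
        graph.addFormula("datum4", "datum3+datum2");
        System.out.println(graph.whatDependsOn("datum1"));
    }
}
```

As soon as formulas contain control flow or reassignment, this shortcut stops being reliable and the def-use analysis described above is needed.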

How to use an array value as field in Java? a1.section[2] = 1;

New to Java, and can't figure out what I hope to be a simple thing.
I keep "sections" in an array:
//Section.java
public static final String[] TOP = {
    "Top News",
    "http://www.mysite.com/RSS/myfeed.csp",
    "top"
};
I'd like to do something like this:
Article a1 = new Article();
a1.["s_" + section[2]] = 1; //should resolve to a1.s_top = 1;
But it won't let me, as it doesn't know what "section" is. (I'm sure seasoned Java people will cringe at this attempt... but my searches have come up empty on how to do this)
Clarification:
My article mysqlite table has fields for the "section" of the article:
s_top
s_sports
...etc
When doing my import from an XML file, I'd like to set that field to a 1 if it's in that category. I could have switch statement:
//whatever the Java version of this is
switch(section[2]) {
    case "top": a1.s_top = 1; break;
    case "sports": a1.s_sports = 1; break;
    //...
}
But I thought it'd be a lot easier to just write it as a single line:
a1["s_"+section[2]] = 1;
In Java, it's a pain to do what you want to do in the way that you're trying to do it.
If you don't want to use the switch/case statement, you could use reflection to pull up the member attribute you're trying to set:
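// getField and set throw checked exceptions (NoSuchFieldException, IllegalAccessException)
Class<?> articleClass = a1.getClass();
Field field = articleClass.getField("s_" + Section.TOP[2]); // resolves to "s_top"; the field must be public
field.set(a1, 1);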
It'll work, but it may be slow and it's an atypical approach to this problem.
Alternatively, you could store either a Set<String> or a Map<String, Boolean> inside your Article class and give Article a public method called putSection(String section). As you iterate, you would put the various section strings (or string/value mappings) into that collection for each Article. So, instead of statically defining which sections may exist and giving each Article a yes or no, you'd allow the list of possible sections to be dynamic and based on your XML import.
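A minimal sketch of that collection-based idea, using a set of section names. Only putSection comes from the description above; the isInSection method and the field name sections are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of an Article that records its sections dynamically instead of
// declaring one field per section.
class Article {
    private final Set<String> sections = new LinkedHashSet<>();

    // Marks this article as belonging to the given section, e.g. "top".
    public void putSection(String section) {
        sections.add(section);
    }

    // Hypothetical query method: replaces reading a field such as s_top.
    public boolean isInSection(String section) {
        return sections.contains(section);
    }

    public static void main(String[] args) {
        String[] top = {"Top News", "http://www.mysite.com/RSS/myfeed.csp", "top"};
        Article a1 = new Article();
        a1.putSection(top[2]);
        System.out.println(a1.isInSection("top"));
    }
}
```

When the article is saved, the stored names can be mapped back to columns such as s_top by prefixing each one with "s_".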
Java variables are not "dynamic", unlike ActionScript, for example. You cannot read or assign a variable without knowing its name at compile time (well, with reflection you could, but it's far too complex).
So yes, the solution is to use a switch/case (possible on strings only from Java 1.7 onwards), or a HashMap or equivalent.
Or, if it's about importing XML, maybe you should take a look at JAXB.
If you are trying to get an attribute from an object, you need to make sure that you have "getters" and "setters" in your object. You also have to make sure you define Section in your article class.
Something like:
class Article {
    private String section;

    // constructor
    public Article() {
    }

    // set section (the parameter type matches the String field)
    public void setSection(String section) {
        this.section = section;
    }

    // get section
    public String getSection() {
        return this.section;
    }
}
