Obtaining all name-value pairs in a form using Jsoup


I want to automate posting of a number of HTML forms using Jsoup and HttpClient. Most of those forms have hidden fields (with session ids, etc.) or have default values that I'd rather leave alone.
Coding each of the form submissions individually -- extracting each of said hidden or default values from the page -- is extremely tedious, so I thought about writing a generic method to obtain the list of HTTP parameters for a given form.
It is not a trivial piece of code, though, because of the variety of input tags and field types, each of which may need specific handling (e.g. textareas, checkboxes, radio buttons, selects, ...) so I thought I'd first search/ask in case it already exists.
Note: Jsoup and HttpClient are a given; I can't change that -- so please no need to provide answers suggesting other solutions: I have a Jsoup Document object and I need to build an HttpClient HttpRequest.

So I've ended up writing it. I would still prefer to swap for something field-tested (and hopefully maintained elsewhere), but in case it helps anyone landing here...
Not thoroughly tested and without support for multipart/form-data, but it works in the few examples I've tried:
public void submit(String formSelector, List<String> params) {
    if (params.size() % 2 != 0) {
        throw new IllegalArgumentException("There must be an even number of params.");
    }
    Element form= $(formSelector).first();
    // Names passed in by the caller override whatever the form contains.
    Set<String> newParams= Sets.newHashSet();
    for (int i=0; i < params.size(); i+= 2) {
        newParams.add(params.get(i));
    }
    List<String> allParams= Lists.newArrayList(params);
    for (Element field: form.select("input, select, textarea")) {
        String name= field.attr("name");
        // attr() returns "" (never null) when the attribute is missing
        if (name.isEmpty() || newParams.contains(name)) continue;
        String type= field.attr("type").toLowerCase();
        if ("checkbox".equals(type) || "radio".equals(type)) {
            // only checked checkboxes and radio buttons are submitted
            if (field.hasAttr("checked")) {
                allParams.add(name);
                allParams.add(field.attr("value"));
            }
        }
        else if (! fieldTypesToIgnore.contains(type)) {
            allParams.add(name);
            allParams.add(field.val());
        }
    }
    String action= form.attr("abs:action");
    String method= form.attr("method").toLowerCase();
    // String encType= form.attr("enctype"); -- TODO
    if ("post".equals(method)) {
        post(action, allParams);
    }
    else {
        get(action, allParams);
    }
}
($, get, and post are methods I already had lying around... you can easily guess what they do).

Jsoup has a formData method in the FormElement class; it works in simple cases, but it doesn't always do what I need, so I ended up writing some custom code too.

Related

modify underlying result/value of async object

I am using Kotlin in a webserver app and I have a line of code as follows:
.onComplete { jsonResult: AsyncResult<JsonObject>? ->
Now what I want to do is change the underlying JsonObject wrapped in the AsyncResult, so that it is going to be reflected further downstream.
var res: JsonObject? = jsonResult?.result()
if (res != null) {
if (res.getInteger("files_uploaded") > 0) {
res.put("URL", "Some URL")
}
}
I then want to write the modified JSON object back into the result, but I'm not sure how to do that.

Please note that single quotes are missing and ` may appear as \` because of the code formatting. I tried to leave what seemed least confusing...
You should be able to make changes in the conditional statement
if (res !=null) {
res being the JsonObject:
console.log(res);
would show you what's in there. You may need to use
let resXmodifiedX = JSON.parse(res);
One approach is to write a function and pass res to that function which you can do if it is in the console.log(res).
Some notes on what's below:
place the function somewhere consistent maybe at the bottom of the file...
objects often have multiple levels res.person.name, res.contact.email, or whatever...
use multiple for loops:
let level = res[key]; for(child in level) {
you don't need to do this if you know exactly what object attributes you need to update.
you can set the value directly but you always want to test for it before trying to set it to avoid errors that stop execution.
let toBe = toBe =>`${toBe}`;
let update = (res)?toBe(update(res)):toBe('not Found');
This option is really only if you know for sure that data will be there and you can't proceed without it. Which is not uncommon but also not how JSON is designed to be used.
The code below is a concise way to make some simple changes but may not be an ideal solution. To use it xModify(res) replaces console.log(res) above.
function xModify(x) {
  let resXmodifiedX = JSON.parse(x);
  let res = resXmodifiedX;
  for (const key in res) {
    res[key] = key=='name'? `My change ${res[key]}`: key=='other'? `My Change ${res[key]}`:res[key];
  }
  resXmodifiedX = JSON.stringify(res);
  return resXmodifiedX;
}
That will update res.name and res.other; otherwise res[key] is unchanged. If you do not need to parse res, change let res = resXmodifiedX; to let res = x;, remove the first line, and replace the last two lines with return res;
function xModify(x) {
  let res = x;
  for (const key in res) {
    res[key] = key=='name'? `My change ${res[key]}`: key=='other'? `My Change ${res[key]}`:res[key];
  }
  return res;
}
If your data is numeric, which is not generally the case in a web server response scenario, this is a terrible approach. Because it is probably a string, I used the template literal as a way to easily add a complex pattern in place of a string. My change ${res[key]} is not a real-world example. Any valid JS expression can go in the ${ } placeholder. I've been defaulting to the first pattern more and more.
let me = (bestCase)?`${'the best version'} of myself`:`${'someone'} I'm ok with`;

Can I get the Field value in String into custom TokenFilter in Apache Solr?

I need to write a custom LemmaTokenFilter, which replaces words with their lemmatized (base) form and indexes them. The problem is that I get the base forms from an external API, meaning I need to call the API, send my text, parse the response and pass it as a Map<String, String> to my LemmaTokenFilter. The map contains pairs of <originalWord, baseFormOfWord>. However, I cannot figure out how to access the full value of the text field that is being processed by the TokenFilters.
One idea is to go through the tokenStream one by one when the LemmaTokenFilter is being created by the LemmaTokenFilterFactory. However, I would need to take care not to edit anything in the tokenStream and somehow reset the current token (since I would need to call the incrementToken() method on it to get all the tokens). Most importantly, this seems unnecessary, since the field value is already there somewhere and I don't want to spend time trying to put it together again from the tokens. This implementation would probably be too slow.
Another idea would be to just process every token separately, however calling an external API with only one word and then parsing the response is definitely too inefficient.
I have found something on using the ResourceLoaderAware interface, however I don't really understand how could I use this to my advantage. I could probably save the map in a text file before every indexing, but writing to a file, opening it and reading from it before every document indexing seems too slow as well.
So the best way would be to just pass the value of the field as a String to the constructor of LemmaTokenFilter, however I don't know how to access it from the create() method of the LemmaTokenFilterFactory.
I could not find any help googling it, so any ideas are welcome.
Here's what I have so far:
public final class LemmaTokenFilter extends TokenFilter {
private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
private Map<String, String> lemmaMap;
protected LemmaTokenFilter(TokenStream input, Map<String, String> lemmaMap) {
super(input);
this.lemmaMap = lemmaMap;
}
@Override
public boolean incrementToken() throws IOException {
if (input.incrementToken()) {
String term = termAtt.toString();
String lemma;
if ((lemma = lemmaMap.get(term)) != null) {
termAtt.setEmpty();
termAtt.copyBuffer(lemma.toCharArray(), 0, lemma.length());
}
return true;
} else {
return false;
}
}
}
public class LemmaTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware {
public LemmaTokenFilterFactory(Map<String, String> args) {
super(args);
if (!args.isEmpty()) {
throw new IllegalArgumentException("Unknown parameters: " + args);
}
}
@Override
public TokenStream create(TokenStream input) {
return new LemmaTokenFilter(input, getLemmaMap(getFieldValue(input)));
}
private String getFieldValue(TokenStream input) {
//TODO: how?
return "Šach je desková hra pro dva hráče, v dnešní soutěžní podobě zároveň považovaná i za odvětví sportu.";
}
private Map<String, String> getLemmaMap(String data) {
return UdPipeService.getLemma(data);
}
@Override
public void inform(ResourceLoader loader) throws IOException {
}
}

1. API-based approach:
Build an analysis chain with the custom lemmatizer at the front. To design this lemmatizer, I guess you can look at the implementation of the Keyword Tokenizer,
so that you can read everything in the input at once and then call your API;
Replace the words in the input text with the base forms from the API response;
After that, use the standard or whitespace tokenizer in the analysis chain to tokenize your data.
2. File-based approach:
It follows the same steps, except that instead of calling the API it uses a hash map loaded from the files named when the TokenStream is defined.
Now, coming to ResourceLoaderAware:
It is used when a component needs to load resources (such as a dictionary file); its inform method receives the ResourceLoader and is where that loading happens. For reference, you can look into StemmerOverrideFilter.
Keyword Tokenizer: emits the entire input as a single token.

So I think I found the answer, or actually two answers.
One would be to write my client application so that incoming requests are processed first: the field value is sent to the external API and the response is stored in some global variable, which can then be accessed from the custom TokenFilters.
Another would be to use custom UpdateRequestProcessors, which allow us to modify the content of the incoming document, call the external API and again save the response so it is somehow globally accessible from custom TokenFilters. Here Erik Hatcher talks about the use of the ScriptUpdateProcessor, which I believe can be used in my case too.
Hope this helps anyone stumbling upon a similar problem, because I had a hard time looking for a solution to this (I could not find any similar threads on SO).
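One way to picture the "globally accessible" storage described above is a per-thread holder. This is only a sketch in plain Java: the class name LemmaContext and its methods are hypothetical (they are not part of Solr or Lucene), and it assumes that the code calling the external API and the TokenFilter run on the same thread, which would need to be verified for a given setup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder; not part of Solr or Lucene.
final class LemmaContext {
    // One map per thread, empty until set() is called.
    private static final ThreadLocal<Map<String, String>> CURRENT =
            ThreadLocal.withInitial(HashMap::new);

    private LemmaContext() {}

    // Intended to be called once per document, after the external API has answered.
    static void set(Map<String, String> lemmas) {
        CURRENT.set(Collections.unmodifiableMap(new HashMap<>(lemmas)));
    }

    // Intended to be called per token; falls back to the original word.
    static String lemmaOf(String word) {
        return CURRENT.get().getOrDefault(word, word);
    }

    // Intended to be called when the document is done, so stale data is not reused.
    static void clear() {
        CURRENT.remove();
    }

    public static void main(String[] args) {
        set(Map.of("games", "game"));
        System.out.println(lemmaOf("games"));
        System.out.println(lemmaOf("chess"));
        clear();
        System.out.println(lemmaOf("games"));
    }
}
```

In this sketch the filter's incrementToken() would call LemmaContext.lemmaOf(term) instead of receiving the map through its constructor.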

How to check if List<obj> is null?

I have a list that is filled from my server. It holds whatever the server finds in the database, e.g.
List<OBJ> lstObj = new ArrayList<OBJ>();
Service.getOBJ(new AsyncCallback<List<OBJ>>(){
    @Override
    public void onFailure(Throwable caught) {
        caught.printStackTrace();
    }
    @Override
    public void onSuccess(List<OBJ> result) {
        //line to check if result is null
    }
});
I have tried
if(result==null){
}
and also tried
if(result.isEmpty()){
}
but it didn't work. The list will be null if the server doesn't find any record in the database. All I need to do is check whether the list is empty.

Checking if the list is empty and checking if result is null are very different things.
if (result == null)
will see if the value of result is a null reference, i.e. it doesn't refer to any list.
if (result.isEmpty())
will see if the value of result is a reference to an empty list... the list exists, it just doesn't have any elements.
And of course, in cases where you don't know if result could be null or empty, just use:
if (result == null || result.isEmpty())
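As a small sketch of that combined check, the helper below wraps it in one call that a callback such as onSuccess could branch on. The class name ListChecks and the method name isNullOrEmpty are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class ListChecks {
    // Hypothetical helper: true when the reference is null or the collection has no elements.
    // The null test comes first, so isEmpty() is never called on a null reference.
    static boolean isNullOrEmpty(Collection<?> c) {
        return c == null || c.isEmpty();
    }

    public static void main(String[] args) {
        List<String> none = null;
        List<String> empty = new ArrayList<>();
        List<String> one = List.of("record");
        System.out.println(isNullOrEmpty(none));
        System.out.println(isNullOrEmpty(empty));
        System.out.println(isNullOrEmpty(one));
    }
}
```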

Check number of elements in resulting List:
if (0==result.size()) {
// Your code
}

You can do it like this:
if (test != null && !test.isEmpty()) { }
This will check for both null and empty, meaning if it is not null and not empty do your processing.

You're obviously new at this programming thing if you didn't already validate your server, so I'm trying to aim a guess at what might be going on with your server. Depending on what your "OBJ" objects are, you could have valid objects that represent data that is meaningless in different ways. For example, you may have String objects with various kinds of white space.
This happens a lot on servers that provide answers using PHP and JSP, where pages are assembled using various include mechanisms and there is white space between them.

The below should do for your code. If you want a negation logic just modify accordingly.
As someone also suggested, CollectionUtils provides utility methods that remove the need for such null-check lines of code.
result == null || result.isEmpty()
Hope this helps!

Ignore a Token output in Lucene's IncrementToken() method

I am trying to make a custom filter in Lucene which simply recognizes whether two consecutive words in a text start with a capital letter and have the rest in lower case, in which case the two words are to be joined as one token.
The overridden incrementToken method has the following code
@Override
public boolean incrementToken() throws IOException {
if(!input.incrementToken()){
return false;}
//Case where the previous token WAS NOT starting with a capital letter followed by lower case
if(previousTokenCanditateMainName==false)
{
if(CheckIfMainName(termAtt.term()))
{
previousTokenCanditateMainName=true;
tempString=this.termAtt.term() ; /*This is the*/
// myToken.offsetAtt=this.offsetAtt; /*Token i need to "delete"*/
tempStartOffset=this.offsetAtt.startOffset();
tempEndOffset=this.offsetAtt.endOffset();
return true;
}
else
{
return true;
}
}
//Case where the previous token WAS a proper name (starting with a capital and continuing with lower-case letters)
else
{
if(CheckIfMainName(termAtt.term()))
{
previousTokenCanditateMainName=false;
posIncrAtt.setPositionIncrement(0);
termAtt.setTermBuffer(tempString+TOKEN_SEPARATOR+this.termAtt.term());
offsetAtt.setOffset(tempStartOffset, this.offsetAtt.endOffset());
return true;
}
else
{
previousTokenCanditateMainName=false;
return true;
}
}
}
My question is: once I find the first token that meets my requirements, how can I "ignore" it?
Currently the code works perfectly for joining the two tokens, but I also get an extra token for the first of the two that I identified.
I tried using the same method, setEnablePositionIncrements(true), as the built-in StopFilter does, but in that case my filter needs to be a TokenFilter type which does not allow me to override the incrementToken method.
I hope I phrased my problem properly.

You might have a custom method:
private void tokenize()
where you do the splitting and the custom joins. The resulting List<String> tokens need to be held as an attribute of the tokenizer.
In the incrementToken method you simply check if this attribute is null and initialize it if necessary.
You also need to add the tokens in the incrementToken() method to the termAttribute
termAttribute.append(tokens.get(tokenIndex));
This means your Tokenizer needs to have an attribute like this:
private CharTermAttribute termAttribute = addAttribute(CharTermAttribute.class);
You will probably also need some fine-tuning, but that's only a draft of how this can be achieved in a pretty simple way.
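The buffering idea can be illustrated outside Lucene. The sketch below works on a plain list of strings rather than a TokenStream; ProperNameJoiner, joinProperNames and isProperName are hypothetical names chosen here. The point is that the first capitalized word is held back until the next word has been inspected, so it is never emitted on its own when a pair is found.

```java
import java.util.ArrayList;
import java.util.List;

class ProperNameJoiner {
    // True for words like "New": an uppercase letter followed by one or more lowercase letters.
    static boolean isProperName(String w) {
        if (w.length() < 2 || !Character.isUpperCase(w.charAt(0))) return false;
        for (int i = 1; i < w.length(); i++) {
            if (!Character.isLowerCase(w.charAt(i))) return false;
        }
        return true;
    }

    static List<String> joinProperNames(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        String held = null; // candidate first word, not yet emitted
        for (String t : tokens) {
            if (held != null) {
                if (isProperName(t)) {
                    out.add(held + separator + t); // emit the pair as a single token
                    held = null;
                    continue;
                }
                out.add(held); // no pair: release the held word unchanged
                held = null;
            }
            if (isProperName(t)) {
                held = t;      // hold it back until the next word is known
            } else {
                out.add(t);
            }
        }
        if (held != null) out.add(held); // input ended while a word was held
        return out;
    }

    public static void main(String[] args) {
        System.out.println(joinProperNames(
                List.of("we", "visited", "New", "York", "in", "May"), "_"));
    }
}
```

In a real filter the held word would also need its offsets and other attributes saved and restored, which this sketch leaves out.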

What is the best way to parse this configuration file?

I am working on a personal project that uses a custom config file. The basic format of the file looks like this:
[users]
name: bob
attributes:
    hat: brown
    shirt: black
another_section:
    key: value
    key2: value2
name: sally
sex: female
attributes:
    pants: yellow
    shirt: red
There can be an arbitrary number of users and each can have different key/value pairs and there can be nested keys/values under a section using tab-stops. I know that I can use json, yaml, or even xml for this config file, however, I'd like to keep it custom for now.
Parsing shouldn't be difficult at all, as I have already written code to parse it. My question is: what is the best way to parse this using clean, structured code, written in a way that won't make future changes difficult (there might be multiple levels of nesting in the future)? Right now, my code looks utterly disgusting. For example,
private void parseDocument() {
String current;
while((current = reader.readLine()) != null) {
if(current.equals("") || current.startsWith("#")) {
continue; //comment
}
else if(current.startsWith("[users]")) {
parseUsers();
}
else if(current.startsWith("[backgrounds]")) {
parseBackgrounds();
}
}
}
private void parseUsers() {
String current;
while((current = reader.readLine()) != null) {
if(current.startsWith("attributes:")) {
while((current = reader.readLine()) != null) {
if(current.startsWith("\t")) {
//add user key/values to User object
}
else if(current.startsWith("another_section:")) {
while((current = reader.readLine()) != null) {
if(current.startsWith("\t")) {
//add user key/values to new User object
}
else if (current.equals("")) {
//newline means that a new user is up to parse next
}
}
}
}
}
else if(!current.isEmpty()) {
//
}
}
}
As you can see, the code is pretty messy, and I have cut it short for the presentation here. I feel there are better ways to do this, maybe without using BufferedReader. Can someone please suggest a better way or approach that is not as convoluted as mine?

I would suggest not creating custom code for config files. What you're proposing isn't too far removed from YAML (getting started). Use that instead.
See Which java YAML library should I use?

Everyone will recommend using XML because it's simply better.
However, in case you're on a quest to prove your programmer's worth to yourself...
...there is nothing really fundamentally wrong with the code you posted in the sense that it's clear and it's obvious to potential readers what's going on, and unless I'm totally out of the loop on file operations, it should perform pretty much as well as it could.
The one criticism I could offer is that it's not recursive. Every level requires a new level of code to support. I would probably make a recursive function (a function that calls itself with sub-content as parameter and then again if there's sub-sub-content etc.), that could be called, reading all of this stuff into a hashtable with hashtables or something, and then I'd use that hashtable as a configuration object.
Then again, at that point I would probably stop seeing the point and use XML. ;)
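A recursive reader along the lines described above might look like the following sketch. It assumes one tab per nesting level and key: value lines, handles a single user block only (a repeated key would overwrite the earlier one), and ignores the [users] header; the class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndentParser {
    private final List<String> lines;
    private int pos = 0; // index of the next unread line

    IndentParser(List<String> lines) {
        this.lines = lines;
    }

    // Number of leading tab characters = nesting depth of the line.
    private static int depth(String line) {
        int d = 0;
        while (d < line.length() && line.charAt(d) == '\t') d++;
        return d;
    }

    // Reads lines at the given depth. A key with an empty value opens a nested
    // block, which is read by calling this same method one level deeper.
    Map<String, Object> parseBlock(int level) {
        Map<String, Object> block = new LinkedHashMap<>();
        while (pos < lines.size()) {
            String line = lines.get(pos);
            if (line.trim().isEmpty()) { pos++; continue; }
            if (depth(line) < level) break; // block ended, the caller continues
            pos++;
            String[] kv = line.trim().split(":", 2);
            String key = kv[0].trim();
            String value = kv.length > 1 ? kv[1].trim() : "";
            if (value.isEmpty()) {
                block.put(key, parseBlock(level + 1));
            } else {
                block.put(key, value);
            }
        }
        return block;
    }

    static Map<String, Object> parse(List<String> lines) {
        return new IndentParser(lines).parseBlock(0);
    }

    public static void main(String[] args) {
        List<String> cfg = List.of(
                "name: bob",
                "attributes:",
                "\that: brown",
                "\tshirt: black",
                "another_section:",
                "\tkey: value");
        System.out.println(parse(cfg));
    }
}
```

Because each level is handled by the same method, deeper nesting needs no extra code.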

I'd recommend changing the configuration file's format to JSON and using an existing library to parse the JSON objects such as FlexJSON.
{
"users": [
{
"name": "bob",
"hat": "brown",
"shirt": "black",
"another_section": {
"key": "value",
"key2": "value2"
}
},
{
"name": "sally",
"sex": "female",
"another_section": {
"pants": "yellow",
"shirt": "red"
}
}
]
}

It looks simple enough for a state machine.
while((current = reader.readLine()) != null) {
if(current.startsWith("[users]"))
state = PARSE_USER;
else if(current.startsWith("[backgrounds]"))
state = PARSE_BACKGROUND;
else if (current.equals("")) {
// Store the user or background that you've been building up if you have one.
switch(state) {
case PARSE_USER:
case USER_ATTRIBUTES:
case USER_OTHER_ATTRIBUTES:
state = PARSE_USER;
break;
case PARSE_BACKGROUND:
case BACKGROUND_ATTRIBUTES:
case BACKGROUND_OTHER_ATTRIBUTES:
state = PARSE_BACKGROUND;
break;
}
} else switch(state) {
case PARSE_USER:
case USER_ATTRIBUTES:
case USER_OTHER_ATTRIBUTES:
if(current.startsWith("attributes:"))
state = USER_ATTRIBUTES;
else if(current.startsWith("another_section:"))
state = USER_OTHER_ATTRIBUTES;
else {
// Split the line into key/value and store into user
// object being built up as appropriate based on state.
}
break;
case PARSE_BACKGROUND:
case BACKGROUND_ATTRIBUTES:
case BACKGROUND_OTHER_ATTRIBUTES:
if(current.startsWith("attributes:"))
state = BACKGROUND_ATTRIBUTES;
else if(current.startsWith("another_section:"))
state = BACKGROUND_OTHER_ATTRIBUTES;
else {
// Split the line into key/value and store into background
// object being built up as appropriate based on state.
}
break;
}
}
// If you have an unstored object, store it.

If you could utilise XML or JSON or other well-known data encoding as the data format, it will be a lot easier to parse/deserialize the text content and extract the values.
For example.
name: bob
attributes:
    hat: brown
    shirt: black
another_section:
    key: value
    key2: value2
can be expressed as the following XML (there are other ways to express it in XML as well):
<config>
    <User name="bob" hat="brown" shirt="black">
        <another_section>
            <key>value</key>
            <key2>value2</key2>
        </another_section>
    </User>
</config>
Custom ( Extremely simple )
As I mentioned in the comment below, you can just make them all name and value pairs.
e.g.
name :bob
attributes_hat :brown
attributes_shirt :black
another_section_key :value
another_section_key2 :value2
and then do string split on '\n' (newline) and ':' to extract the key and value or build a dictionary/map object.
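A minimal sketch of that split-based reading, assuming the flattened key :value layout shown above; FlatConfig and parseFlat are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlatConfig {
    // Splits the text on newlines, then each line on its first ':' into key and value.
    static Map<String, String> parseFlat(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue; // skip blank or malformed lines
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "name :bob\nattributes_hat :brown\nanother_section_key2 :value2";
        System.out.println(parseFlat(text));
    }
}
```

Splitting on the first colon only means a value may itself contain a colon.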

A nice way to clean it up would be to use a table, i.e. replace your conditionals with a Map. You can then invoke your parsing methods through reflection (simple) or create a few more classes implementing a common interface (more work but more robust).
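A sketch of the table-driven variant with a common interface follows. SectionDispatch and SectionParser are hypothetical names, and the handlers here are placeholders that only record the lines they receive; real handlers would build user or background objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SectionDispatch {
    // Common interface for all section handlers.
    interface SectionParser {
        void parse(String line);
    }

    // Routes each line to the handler registered for the most recent "[section]" header.
    static Map<String, List<String>> dispatch(List<String> lines) {
        Map<String, List<String>> collected = new LinkedHashMap<>();
        Map<String, SectionParser> table = new LinkedHashMap<>();
        for (String section : List.of("[users]", "[backgrounds]")) {
            List<String> bucket = new ArrayList<>();
            collected.put(section, bucket);
            table.put(section, bucket::add); // placeholder handler: just records the line
        }
        SectionParser current = null;
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // blank line or comment
            SectionParser next = table.get(line.trim());
            if (next != null) {
                current = next;      // header line: switch handler
            } else if (current != null) {
                current.parse(line); // body line: delegate to the active handler
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(dispatch(List.of(
                "# comment", "[users]", "name: bob", "[backgrounds]", "color: blue")));
    }
}
```

Supporting a new section then means adding one entry to the table rather than another else-if branch.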
