I have a problem with jsoup caused by lazy loading (scrollLoader.js).
I reach the site with Java code and have listed only 50 image names with jsoup. But when I scroll down on the site, many more images load continuously. My question: is it possible to pass the number of images in the URL used with Jsoup.connect() to get all the images from the site?
Here is the site: http://www.logowik.com
And this is how the script is used on the site:
<script type="text/javascript">
$(document).ready(function(e) {
    CalculateColumns();
    recordCount = 50;
    groupID = "0";
    catID = "0";
    query = "";
    userEntry = "";
    groupInterval = "0";
    AddEvent(window, "resize", CalculateColumns);
    document["scrollLoader"] = new scrollLoader({evn : getGrids, seize : 1});
    document["scrollLoader"].DoScroll();
    addLogoClickEvent();
});
</script>
I pass these parameters in the URL like this:
http://www.logowik.com/index.php?g=1&groupID=1&catID=0
With this URL I get 50 images, because recordCount = 50 in the script, but I cannot pass that parameter in the URL.
To get 100 images, I tried this URL: http://www.logowik.com/index.php?recordCount=100&g=1&groupID=1&catID=0
but it has no effect.
Thanks
Use Firebug or the Chrome DevTools Network panel to see all the requests generated when the images load, then recreate those requests in jsoup.
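A minimal sketch of what "recreating the requests" could look like, assuming the Network panel shows that each scroll triggers a GET request with some kind of offset parameter. The endpoint name and the `offset` parameter below are hypothetical placeholders; the real names have to be read from the Network panel. The sketch only builds the URLs; each one would then be fetched with Jsoup.connect(url).get().

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LazyLoadUrls {

    // Builds "base?k1=v1&k2=v2" with URL-encoded keys and values.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return sb.toString();
    }

    // One URL per batch of images. "offset" is a HYPOTHETICAL parameter name;
    // groupID and catID are taken from the URL in the question.
    static List<String> batchUrls(String endpoint, int batchSize, int batches) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            Map<String, String> params = new LinkedHashMap<>();
            params.put("groupID", "1");
            params.put("catID", "0");
            params.put("offset", String.valueOf(i * batchSize));
            urls.add(buildUrl(endpoint, params));
        }
        return urls;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL endpoint: replace it with the request URL seen in the Network panel.
        for (String url : batchUrls("http://www.logowik.com/HYPOTHETICAL-ENDPOINT.php", 50, 3)) {
            System.out.println(url);
            // With jsoup on the classpath:
            // Document doc = Jsoup.connect(url).get();
            // Elements images = doc.select("img");
        }
    }
}
```

If the site sends the parameters as a POST instead, the same values would go into jsoup's data(...) calls rather than into the query string.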
I have an issue: I'm trying to scrape a cinema webpage,
---> https://cinemaxx.dk/koebenhavn
I need to get data on how many seats are reserved/sold, and I need to extract the last snapshot.
The reserved/sold seats are shown in the picture as red squares:
Basically, my logic is this:
I scrape the content using HtmlUnit.
I set HtmlUnit to execute all JS.
I extract the reservedSeats Base64 string.
I convert the Base64 string to an image.
Then my program analyses the image and counts how many seats are reserved/sold.
My issue is:
Since I need the last snapshot of the picture (that is the picture with the correct data on how many seats are reserved/sold), I start scraping the website 3 minutes before the movie starts, and continue until input == null.
I do this by looping my scrape method. But the cinema server automatically reserves 2 seats on each request (and holds them for 10 minutes), so I end up reserving all the seats in the whole cinema (you can see an example of the 2 reserved seats, the blue squares, in the picture above).
I found the JS method in the HTML that reserves the 2 seats on each request. Now I would like HtmlUnit to execute all JS except this one method that reserves these 2 seats via an HTTP request.
I hope all of the above makes sense.
Is there anyone out there who can point me in the right direction, or who has had a similar issue?
public void scraper(String url) {
    final String URL = url;
    //Initialize Ghost Browser (FireFox_60):
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
        //Configure Ghost Browser:
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setCssEnabled(false);
        //Load Url & Configure Ghost Browser:
        final HtmlPage page = webClient.getPage(URL);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.waitForBackgroundJavaScript(3000);
        //Spider JS PATH to BASE64 data:
        final HtmlElement seatPictureRaw = page.querySelector
                ("body > div.page.page--booking.ng-scope > div.relative > div.inner__container.inner__container--content " +
                        "> div.seatselect > div > div > div > div:nth-child(2) > div.seatselect__image > img");
        //Terminate Current web session:
        webClient.getCurrentWindow().getJobManager().removeAllJobs();
        webClient.close();
        //Process the raw BASE64 Data - Extract clean BASE64 String:
        String rawBASE64Data = String.valueOf(seatPictureRaw);
        String[] arrOfStr = rawBASE64Data.split("(?<=> 0\") ");
        String cleanedUpBASE64Data = arrOfStr[1];
        String cleanedUpBASE64Data1 = cleanedUpBASE64Data.replace("src=\"data:image/gif;base64,", "");
        String cleanedUpBASE64Data2 = cleanedUpBASE64Data1.replace("\">]", "");
        //System.out.println(cleanedUpBASE64Data2);
        //Decode BASE64 Rawdata to Image:
        final byte[] decodedBytes = Base64.getDecoder().decode(cleanedUpBASE64Data2);
        System.out.println("Number of bytes in decoded BASE64 data: " + decodedBytes.length);
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(decodedBytes));
        //Forward image for PictureAnalyzer Class...
        final PictureAnalyzer pictureAnalyzer = new PictureAnalyzer();
        pictureAnalyzer.analyzePixels(image);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
One option you have is to intercept and modify the server responses and replace the function call with something else:
replace only the function name (this is ugly because it will generate JS exceptions at runtime) or
remove the function call from the source or
replace the function body with {} or
....
See http://htmlunit.sourceforge.net/faq.html#HowToModifyRequestOrResponse for more details.
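As a rough sketch of the "remove the function call from the source" option: the helper below rewrites a script so that bare calls to one named function become calls to an empty function, while the function declaration itself is left alone. The name reserveSeats is a hypothetical stand-in for the real method found in the page, and the regex is deliberately simple (it handles plain calls like reserveSeats(2), not obj.reserveSeats(2)). In HtmlUnit, a transformation like this would be applied to the script text inside the response-modifying hook described in the FAQ linked above (a WebConnectionWrapper, as far as I know).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptPatcher {

    // Replaces every bare call "name(" with "(function(){})(" so the call
    // still parses and evaluates its arguments, but does nothing.
    // The negative lookbehind skips the declaration "function name(".
    static String neutralizeCalls(String js, String functionName) {
        Pattern call = Pattern.compile(
                "(?<!function\\s)\\b" + Pattern.quote(functionName) + "\\s*\\(");
        return call.matcher(js)
                   .replaceAll(Matcher.quoteReplacement("(function(){})("));
    }

    public static void main(String[] args) {
        // "reserveSeats" is a HYPOTHETICAL function name used for illustration.
        String original =
                "function reserveSeats(n) { send(n); }\n" +
                "reserveSeats(2);";
        System.out.println(neutralizeCalls(original, "reserveSeats"));
    }
}
```

Compared with renaming the function, this avoids a runtime exception for an undefined name, at the cost of a regex that has to be checked against the actual page source.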
I'm using bliki-core (version 3.1.0) to access the Wikipedia page with the title "Web service" for my test case. My code is below:
String[] listOfTitleStrings = { "Web service" };
User user = new User("", "", "https://en.wikipedia.org/w/api.php");
user.login();
List<Page> listOfPages = user.queryContent(listOfTitleStrings);
for (Page page : listOfPages) {
    WikiModel wikiModel = new WikiModel("${image}", "${title}");
    String html = wikiModel.render(page.toString());
    System.out.println(html);
}
When I access the URL
http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=Web%20service&prop=revisions&rvprop=content
I can see the XML output.
But when I run my Java code I get the following output:
<p>Page{ns=0, title=Web service, id=93483, links=[], categories=[],
editToken='null', imageUrl='null', imageThumbUrl='null',
missing=false, invalid=false, revision=info.bliki.api.Revision#74e46064}</p>
What am I missing here?
Thanks
Changing bliki-core to the earlier version 3.0.19 fixed the issue.
Background:
I use Java + BIRT to generate reports.
The report is generated in the viewer, and the user can choose to export it to different formats (PDF, XLS, Word...).
All report content is in "Layout"; there is none in "Master Page".
There is 1 "Data Set". The fields in "Layout" refer to this data set.
There is a group in "Layout", grouped by one field.
In the "Group Header", I created one cell to use as the page number: "Page : MyPageNumber".
"MyPageNumber" is a field I defined that is incremented by 1 in the Group Header.
Problem:
When I use the first method to generate the report, "MyPageNumber" is not shown correctly. Because the group header is loaded only once for each group, it always shows 1.
Question:
As far as I know, Crystal Reports has a "restart page number in group" option. How can I restart the page number in BIRT?
I want to show the data of different groups in one report file, with the page number starting from 1 for each group.
You can do it with BIRT reports using page variables. For example:
Add 2 page variables: GROUP_PAGE and GROUP_NAME (named as the scripts below reference them).
Add 1 report variable: GROUP_TOTAL_PAGE.
In the report beforeFactory add the script:
prevGroupKey = "";
groupPageNumber = 1;
reportContext.setGlobalVariable("gGROUP_NAME", "");
reportContext.setGlobalVariable("gGROUP_PAGE", 1);
In the report onPageEnd add the script:
var groupKey = currGroup;
var prevGroupKey = reportContext.getGlobalVariable("gGROUP_NAME");
var groupPageNumber = reportContext.getGlobalVariable("gGROUP_PAGE");
if (prevGroupKey == null) {
    prevGroupKey = "";
}
if (prevGroupKey == groupKey) {
    if (groupPageNumber != null) {
        groupPageNumber = parseInt(groupPageNumber) + 1;
    } else {
        groupPageNumber = 1;
    }
} else {
    groupPageNumber = 1;
    prevGroupKey = groupKey;
}
reportContext.setPageVariable("GROUP_NAME", groupKey);
reportContext.setPageVariable("GROUP_PAGE", groupPageNumber);
reportContext.setGlobalVariable("gGROUP_NAME", groupKey);
reportContext.setGlobalVariable("gGROUP_PAGE", groupPageNumber);
var groupTotalPage = reportContext.getPageVariable("GROUP_TOTAL_PAGE");
if (groupTotalPage == null) {
    groupTotalPage = new java.util.HashMap();
    reportContext.setPageVariable("GROUP_TOTAL_PAGE", groupTotalPage);
}
groupTotalPage.put(groupKey, groupPageNumber);
In a master page onRender script add the following script:
var totalPage = reportContext.getPageVariable("GROUP_TOTAL_PAGE");
var groupName = reportContext.getPageVariable("GROUP_NAME");
if (totalPage != null) {
    this.text = java.lang.Integer.toString(totalPage.get(groupName));
}
In the table group header onCreate event, add the following script, replacing 'COUNTRY' with the name of the column that you are grouping on:
currGroup = this.getRowData().getColumnValue("COUNTRY");
In the master page, add a grid to the header or footer and add an autotext variable for GROUP_PAGE and GROUP_TOTAL_PAGE. Optionally add the page variable for GROUP_NAME as well.
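To make the counting logic of the onPageEnd script easier to follow, here is a small plain-Java simulation of it (this is not BIRT API code, just the same bookkeeping on a list). It assumes one group key per rendered page, in page order; the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupPageSimulation {

    // For each page, returns its page number within the current group.
    // The counter restarts at 1 whenever the group key changes,
    // mirroring the prevGroupKey / groupPageNumber logic in onPageEnd.
    static List<Integer> pageNumbersWithinGroup(List<String> groupKeyPerPage) {
        List<Integer> numbers = new ArrayList<>();
        String prevGroupKey = "";
        int groupPageNumber = 1;
        for (String groupKey : groupKeyPerPage) {
            if (groupKey.equals(prevGroupKey)) {
                groupPageNumber++;
            } else {
                groupPageNumber = 1;
                prevGroupKey = groupKey;
            }
            numbers.add(groupPageNumber);
        }
        return numbers;
    }

    // The last page number seen for a group is its total page count,
    // like the map stored under GROUP_TOTAL_PAGE in the script.
    static Map<String, Integer> totalPagesPerGroup(List<String> groupKeyPerPage) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        List<Integer> numbers = pageNumbersWithinGroup(groupKeyPerPage);
        for (int i = 0; i < groupKeyPerPage.size(); i++) {
            totals.put(groupKeyPerPage.get(i), numbers.get(i));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("DE", "DE", "FR", "FR", "FR", "US");
        System.out.println(pageNumbersWithinGroup(pages)); // [1, 2, 1, 2, 3, 1]
        System.out.println(totalPagesPerGroup(pages));     // {DE=2, FR=3, US=1}
    }
}
```

The totals are only complete once all pages have been processed, which is why the script above reads them in the master page onRender event rather than while the pages are still being created.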
Check out these links for more information about BIRT page variables:
https://books.google.ch/books?id=aIjZ4FYJOQkC&pg=PA85&lpg=PA85&dq=birt+change+autotext&source=bl&ots=K0nCmF2hrD&sig=CBOr_otRW0B72sZoFS7LC_1Mrz4&hl=en&sa=X&ei=ZKNAVcnuLYLHsAXRmIHoCw&ved=0CEoQ6AEwBQ#v=onepage&q=birt%20change%20autotext&f=false
https://www.youtube.com/watch?v=lw_k1qHY_gU
http://www.eclipse.org/birt/phoenix/project/notable2.5.php#jump_4
https://bugs.eclipse.org/bugs/show_bug.cgi?id=316173
http://www.eclipse.org/forums/index.php/t/575172/
Alas, this is not supported with BIRT.
That's probably not the answer you've hoped for, but it's the truth.
This is one of the very few aspects where BIRT is way behind other report generator tools.
However, depending on how you have BIRT integrated into your environment, a workaround approach is possible for PDF export that we use in our solution with great success.
The idea is to let BIRT generate a PDF outline based on the grouping.
The BIRT report also stores information in the ReportContext about where and how it wants the page numbers to be displayed.
After BIRT has generated the PDF, a custom PDFPostProcessor uses the PDF outline and the information from the ReportContext to add the page numbers with iText.
If this work-around is viable for you, feel free to contact me.
I want to make a Tampermonkey script that changes the URL of the page. It should check whether the URL contains "youtube.com" and, if it doesn't, add /youtube.com to the URL.
An example of this is:
The starting website: www.website.com/watch8dzjad8
The changed website: www.website.com/youtube.com/watch8dzjad8
If it helps: the script is meant to run in Tampermonkey on one specific website, where it should check the link and add /youtube.com if it is missing, since the page won't work otherwise. It would really save me from copying and pasting /youtube.com 10 times a day, and help me learn how to work with URLs in JavaScript. Thanks in advance.
if( !location.href.match(/youtube\.com/) )
    location= "/youtube.com"+ location.pathname
But instead of that you should restrict this behaviour to a specific site, not just all domains that are not youtube, for example:
if( location.href.match(/website.com\/watch/) )
location= "/youtube.com"+ location.pathname
Explanations
location.href.match(/website.com\/watch/)
location.host is the domain of the page (www.website.com)
location.href is the complete URL of the page (http://www.website.com/watch8dzjad8)
match tests whether the string matches the given pattern
location= "/youtube.com"+ location.pathname
Setting location navigates to the given URL
location.pathname gives the path of the URL (/watch8dzjad8)
So if the URL of the visited page (http://www.website.com/watch8dzjad8) contains the string "website.com/watch", the browser opens "/youtube.com" + "/watch8dzjad8".
As the domain is the same, a relative URL is enough: the browser resolves it against the domain of the current page.
https://developer.mozilla.org/en-US/docs/Web/API/Window/location
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
Try this:
function getQueryValue( name ){
    // escape square brackets so they are matched literally in the regex
    name = name.replace(/[\[]/,"\\[").replace(/[\]]/,"\\]");
    var regexS = "[\\?&]" + name + "=([^&#]*)";
    var regex = new RegExp( regexS );
    var results = regex.exec( location.href );
    if( results == null )
        return "";
    else
        return results[1]; // the captured parameter value
}
//new url: value of the "curUrl" query parameter in the current url
var newUrl = getQueryValue( "curUrl" );
//redirect to new page, only if a value was found
if( newUrl )
    location.href = newUrl;
I'm making an app with Google Maps inside the DJ Native webBrowser component. I load the page as a string using webBrowser.setHTMLContent(String). The HTML file contains JavaScript which adds markers to the map.
I made a simple HTML file with google-maps-api functions.
It works perfectly in Chrome as well as Firefox, but not in webBrowser (DJ Native).
I discovered that the script works OK without the new marker statement (google.maps.Marker).
Does anyone have any idea what's wrong?
Is there any way to show the console log from webBrowser (like Ctrl+Shift+J in Chrome)?
This is the script code:
<script type="text/javascript" src="https://maps.googleapis.com/maps/api/js?key=[MY_KEY]&sensor=false">
</script>
<script type="text/javascript">
var map;
function initialize() {
    var mapOptions = {
        center: new google.maps.LatLng(52.236302, 21.007636),
        zoom: 10
    };
    map = new google.maps.Map(document.getElementById("map-canvas"),
        mapOptions);
    var t = [];
    var x = [];
    var y = [];
    var h = [];
    t.push('Location Name 1');
    x.push(52.232097);
    y.push(20.927985);
    h.push('<p><strong>Location Name 1</strong><br/>Address 1</p>');
    t.push('Location Name 2');
    x.push(52.245097);
    y.push(20.945985);
    h.push('<p><strong>Location Name 2</strong><br/>Address 2</p>');
    /*this is error making code*/
    var i = 0;
    for ( item in t ) {
        var marker = new google.maps.Marker({
            position: new google.maps.LatLng(x[i], y[i]),
            map: map,
            title: t[i],
        });
        i++;
    } /*this is end of error making code*/
}
google.maps.event.addDomListener(window, 'load', initialize);
</script>
1. DJ Native uses IE by default. Did you try opening the HTML in IE?
2. In DJ Native, you cannot always set the content directly and expect it to run. For example, the TinyMCE editor does not run if you set editor.html (the HTML containing TinyMCE) directly. That is why the author of DJ Native built an internal web server for the editors. You have to load the page through an address (for the CKEditor and TinyMCE editors, DJ Native calls localhost, http://127.0.0.1/tinymce/.., but the structure is too complex to detail here). For testing purposes, you could put your HTML on a simple web server (Tomcat) and load it through loadURL (instead of setContent).
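For a quick test without installing Tomcat, a sketch along these lines serves the HTML string from a local address using the HTTP server that ships with the JDK (com.sun.net.httpserver). The class name and the sample HTML are made up for the example; the DJ Native call in the comment (navigate) is what I would expect to use for loading a URL, so treat it as an assumption and check it against your version.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class LocalPageServer {

    // Starts an HTTP server on a free local port that answers every
    // request with the given HTML. The caller must stop() it when done.
    static HttpServer serve(String html) throws IOException {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = serve("<html><body><div id=\"map-canvas\"></div></body></html>");
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
        System.out.println("Serving page at " + url);
        // In the Swing application, load this URL in the browser component
        // instead of calling setHTMLContent(...), e.g. webBrowser.navigate(url)
        // (assumed DJ Native call), and keep the server running while the
        // browser is open.
        server.stop(0);
    }
}
```

Loading the page from an http:// address also gives the Google Maps script a real origin to work with, which a page set from a string does not have.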