I'm trying to retrieve an article price from a website. The problem is that the price differs depending on whether you choose the online price or the store price. After selecting a store, the website creates a cookie called CP_GEODATA with a specific value. I have tried sending the cookie in different ways, but I keep getting the online price.
public class Parser {

    public static void main(String[] args) throws Exception {
        Map<String, String> cookies = new HashMap<String, String>();
        cookies.put("CP_COUNTRY", "%7B%22country%22%3A%22DE%22%7D");
        cookies.put("CP_GEODATA", "%7B%22location%22%3A-1%2C%22firstlocation%22%3A11%2C%22name%22%3A%22Hamburg%22%7D");

        String url = "https://www.cyberport.de/?token=7a2d9b195e32082fec015dca45ba3aa4&sSearchId=565eee12d987b&EVENT=itemsearch&view=liste&query=&filterkategorie=";
        Connection.Response res = Jsoup.connect(url).cookies(cookies).data("query", "4B05-525").execute();
        Document doc = res.parse();

        String tester = doc.select("span[id=articlePrice] > span[class=basis fl]").text();
        String tester2 = doc.select("span[id=articlePrice] > span[class=decimal fl]").text();
        System.out.println(tester + tester2 + " €");
    }
}
The value I'm getting back right now is 2,90 €, but it should be 4,90 €. I have already tried everything and searched the internet a lot, but I did not find any working solution.
This is the article I'm receiving the price from:
https://www.cyberport.de/micro-usb-2-0-kabel-usb-a-stecker-micro-b-stecker-0-5m--4B05-525_9374.html
I'm trying to receive the price for the store in Hamburg, Germany.
You can see the cookies I'm setting at the top.
Thank you for any help!
It seems that the zone info is stored in the session, and the zone code is sent to the server in a POST when you select it. So you need to do the following steps:

1. Do a POST with the desired zone
2. Get the session cookies
3. Using these cookies, do your original request
4. Hopefully get the correct results

Here is the code:
public static void main(String[] args) throws Exception {
    Connection.Response res;

    // 11 is for Hamburg
    String zoneId = "11";

    // Set the zone and get the session cookies
    res = Jsoup.connect("https://www.cyberport.de/newajaxpass/catalog/itemlist/0/costinfo/" + zoneId)
            .ignoreContentType(true)
            .method(Method.POST).execute();
    final Map<String, String> cookies = res.cookies();

    // Print the cookies; we'll see the session cookies here
    System.out.println(cookies);

    // If we use these cookies, your code runs OK
    String url = "https://www.cyberport.de/?token=7a2d9b195e32082fec015dca45ba3aa4&sSearchId=565eee12d987b&EVENT=itemsearch&view=liste&query=&filterkategorie=";
    res = Jsoup.connect(url).cookies(cookies).data("query", "4B05-525").execute();
    Document doc = res.parse();

    String tester = doc.select("span[id=articlePrice] > span[class=basis fl]").text();
    String tester2 = doc.select("span[id=articlePrice] > span[class=decimal fl]").text();
    System.out.println(tester + tester2 + " €");

    // Extra check
    System.out.println(doc.select("div.townName").text());
}
You'll see:
{SERVERID=realmN03, SCS=76fe7473007c80ea2cfa059f180c603d, SID=pphdh7otcefvc5apdh2r9g0go2}
4,90 €
Hamburg
Which, I hope, is the desired result.
I am writing code to automate the calculation of certain page performance metrics. The page-size results I get differ depending on the method used. What I want to achieve is to read the values shown in my screenshot (from the Chrome DevTools Network panel).

Methods I am using:

This first method gives a different page load time and different transferred sizes: totalBytes and netData return very different numbers, both very far from what the screenshot shows.
public void testing() throws HarReaderException {
    JavascriptExecutor js1 = (JavascriptExecutor) driver;
    try {
        Thread.sleep(5000);
    } catch (Exception e) {
        e.printStackTrace();
    }
    String url = driver.getCurrentUrl();
    System.out.println("Current URL :" + url);

    long pageLoadTime = (Long) js1.executeScript("return (window.performance.timing.loadEventEnd-window.performance.timing.responseStart)");
    long TTFB = (Long) js1.executeScript("return (window.performance.timing.responseStart-window.performance.timing.navigationStart)");
    long endtoendRespTime = (Long) js1.executeScript("return (window.performance.timing.loadEventEnd-window.performance.timing.navigationStart)");
    Date date = new Date();
    //Timestamp ts = new Timestamp(date.getTime());
    System.out.println("PageLoadTime Time :" + pageLoadTime);
    System.out.println("TTFB :" + TTFB);
    System.out.println("Customer perceived Time :" + endtoendRespTime);
    System.out.println("timeStamp");

    String scriptToExecute = "var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return network;";
    String netData = ((JavascriptExecutor) driver).executeScript(scriptToExecute).toString();
    System.out.println("Net data: " + netData);

    String anotherScript = "return performance\n" +
            "  .getEntriesByType(\"resource\")\n" +
            "  .map((x) => x.transferSize)\n" +
            "  .reduce((a, b) => (a + b), 0);"; // I have tried encodedSize here as well; it still gives different results
    System.out.println("THIS IS HOPEFULLY THE TOTAL TRANSFER SIZE " + js1.executeScript(anotherScript).toString());

    int totalBytes = 0;
    for (LogEntry entry : driver.manage().logs().get(LogType.PERFORMANCE)) {
        if (entry.getMessage().contains("Network.dataReceived")) {
            // I tried encodedLength and other fields but always get results different from the actual page
            Matcher dataLengthMatcher = Pattern.compile("dataLength\":(.*?),").matcher(entry.getMessage());
            dataLengthMatcher.find();
            totalBytes = totalBytes + Integer.parseInt(dataLengthMatcher.group(1));
        }
    }
    System.out.println(totalBytes);
}
Setting up the Selenium Chrome driver, enabling performance logging and the BrowserMob proxy:
@BeforeTest
public void setUp() {
    // Start the proxy
    proxy = new BrowserMobProxyServer();
    proxy.start(0);

    // Get the Selenium proxy object - org.openqa.selenium.Proxy
    Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);

    // Configure it as a desired capability
    DesiredCapabilities capabilities = DesiredCapabilities.chrome();

    LoggingPreferences logPrefs = new LoggingPreferences();
    logPrefs.enable(LogType.PERFORMANCE, Level.ALL);
    capabilities.setCapability(CapabilityType.LOGGING_PREFS, logPrefs);
    capabilities.setCapability(CapabilityType.PROXY, seleniumProxy);

    ChromeOptions options = new ChromeOptions();
    options.addArguments("--incognito");
    capabilities.setCapability(ChromeOptions.CAPABILITY, options);

    // Set the chromedriver system property
    System.setProperty("webdriver.chrome.driver", driverPath);
    driver = new ChromeDriver(capabilities);

    // Enable more detailed HAR capture, if desired (see CaptureType for the complete list)
    proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT);
}
Methods I am using to analyze the page:
This method was supposed to reproduce the Load time shown in the Chrome inspector, but it always shows a smaller number (I think it shows the time of the last response received instead of DOMContentLoaded or the Load event):
public double calculatePageLoadTime(String filename) throws HarReaderException {
    HarReader harReader = new HarReader();
    de.sstoehr.harreader.model.Har har = harReader.readFromFile(new File(filename));
    HarLog log = har.getLog();

    // Start time of the first page
    long startTime = log.getPages().get(0).getStartedDateTime().getTime();

    // Find the entry that finishes last
    List<HarEntry> hentry = log.getEntries();
    long loadTime = 0;
    for (HarEntry entry : hentry) {
        long entryLoadTime = entry.getStartedDateTime().getTime() + entry.getTime();
        if (entryLoadTime > loadTime) {
            loadTime = entryLoadTime;
        }
    }

    long loadTimeSpan = loadTime - startTime;
    double webLoadTime = ((double) loadTimeSpan) / 1000;
    return Math.round(webLoadTime * 100.0) / 100.0;
}
I am getting the total number of requests by reading the HAR file for the page, but for some reason it is always about 10% less than the actual count:
public int getNumberRequests(String filename) throws HarReaderException {
    HarReader harReader = new HarReader();
    de.sstoehr.harreader.model.Har har = harReader.readFromFile(new File(filename));
    HarLog log = har.getLog();
    return log.getEntries().size();
}
Testing this on Google gives very different results for each method, usually 10-200% off from the correct numbers. Why does this happen? Is there a simple way to get these metrics properly from Chrome, or a library that makes this easier? My task is to automate performance analysis on thousands of pages.
I analyzed this on my system over and over again and came up with this: the resource size currently reported is the amount of resource data fetched up to the moment the page load event is triggered. To overcome this, you need to keep capturing the resource-size value after the page load event as well, until it stabilizes. Then it will match the actual console values.
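The "keep capturing until it stabilizes" idea can be sketched as a small generic helper that repeatedly samples a value (for instance, the summed transferSize from the resource timing script in the question, via executeScript) until two consecutive samples agree. This is a plain-Java sketch; the class name and the fake sample data are illustrative, not part of the original code.

```java
import java.util.function.LongSupplier;

public class StabilizedSampler {

    /**
     * Polls the supplier until two consecutive samples are equal,
     * or until maxPolls samples have been taken. Returns the last sample.
     */
    public static long pollUntilStable(LongSupplier sampler, long intervalMillis, int maxPolls)
            throws InterruptedException {
        long previous = sampler.getAsLong();
        for (int i = 1; i < maxPolls; i++) {
            Thread.sleep(intervalMillis);
            long current = sampler.getAsLong();
            if (current == previous) {
                return current;   // value has stabilized
            }
            previous = current;
        }
        return previous;          // give up after maxPolls samples
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake sampler that grows for a while and then stabilizes at 4096,
        // standing in for the summed transferSize of the resource entries.
        long[] samples = {1024, 2048, 3072, 4096, 4096, 4096};
        int[] idx = {0};
        LongSupplier fake = () -> samples[Math.min(idx[0]++, samples.length - 1)];
        System.out.println(pollUntilStable(fake, 1, 20)); // prints 4096
    }
}
```

In the real test, the LongSupplier would run the transferSize script from the question through the JavascriptExecutor, with a more realistic interval (e.g. 500 ms).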
I am new to web development in general, but so far I am doing basic stuff, so I don't know why this doesn't work. My servlet receives a request to add a new user to my database, but before that I first want to check the values using regular expressions.

My idea was to keep all the parameter names and regex patterns in a hashmap, iterate over that map, get each parameter from the request object, and return an array (for now) containing only the invalid fields. However, it seems that I might be getting stuck in an infinite loop, because I can't find a different explanation for why this doesn't work.

I am not sure if this has anything to do with threads, as I only read the map and never modify it at runtime, but I switched from HashMap to ConcurrentHashMap anyway. It seemed too simple to go wrong. So here it is:
public class FormValidator {

    public ArrayList<String> Validate(HttpServletRequest request) {
        ArrayList<String> invalidFields = new ArrayList<String>();
        ConcurrentHashMap<String, String> fieldRegexMap = new ConcurrentHashMap<String, String>();
        fieldRegexMap.put("username", ".{8,}");
        fieldRegexMap.put("email", "(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)");
        fieldRegexMap.put("password", "^(?=.*[A-Za-z])(?=.*\\d)(?=.*[$#$!%*#?&])[A-Za-z\\d$#$!%*#?&]{8,10}$");
        //fieldRegexMap.put("confirmPassword", "^(?=.*[A-Za-z])(?=.*\\d)(?=.*[$#$!%*#?&])[A-Za-z\\d$#$!%*#?&]{8,10}$");
        fieldRegexMap.put("firstname", ".{1,20}");
        fieldRegexMap.put("lastname", ".{4,20}");
        fieldRegexMap.put("DOB", "^(?:(?:31(\\/|-|\\.)(?:0?[13578]|1[02]))\\1|(?:(?:29|30)(\\/|-|\\.)(?:0?[1,3-9]|1[0-2])\\2))(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$|^(?:29(\\/|-|\\.)0?2\\3(?:(?:(?:1[6-9]|[2-9]\\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\\d|2[0-8])(\\/|-|\\.)(?:(?:0?[1-9])|(?:1[0-2]))\\4(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$\\");
        fieldRegexMap.put("city", ".{2,20}");
        fieldRegexMap.put("address", ".{2,20}");
        fieldRegexMap.put("profession", ".{2,20}");
        fieldRegexMap.put("interests", ".{,100}");
        fieldRegexMap.put("moreinfo", ".{,500}");

        // Validate all parameters from the request; key: parameter name, value: regex
        for (Map.Entry<String, String> entry : fieldRegexMap.entrySet()) {
            if (!(request.getParameter(entry.getKey()).matches(entry.getValue()))) {
                invalidFields.add(entry.getKey());
            }
        }
        return invalidFields;
    }
}
Then my servlet's doPost calls processRequest:
protected void processRequest(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    try (PrintWriter out = response.getWriter()) {
        ArrayList<String> results = new ArrayList<String>();
        out.println("<h1>Servlet lqRegisterServlet at " + request.getContextPath() + "</h1>");
        out.println("<h2>Method: " + request.getMethod() + "</h2>");
        out.println("This Shows up");
        FormValidator validator = new FormValidator();
        out.println("Everything shows up to this point!");
        results = validator.Validate(request);
        out.println("This does not show");
        for (String param : results) {
            out.println("<p>Field:" + param + "</p>");
        }
    }
    response.getOutputStream().println("This is servlet response");
}
I don't know if this is the best way to check the fields; this is a project for me to learn Java web development, but it's the only way I could think of that made sense to me and seemed reusable. I plan to create and populate the hashmap outside the Validate function.

Thank you for your time.
Your servlet call isn't stuck; it fails with an error because at least the regular expression for DOB is invalid. I found that out by writing a short main method:
public static void main(String[] args) {
    ConcurrentHashMap<String, String> fieldRegexMap = new ConcurrentHashMap<String, String>();
    fieldRegexMap.put("username", ".{8,}");
    fieldRegexMap.put("email", "(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)");
    fieldRegexMap.put("password", "^(?=.*[A-Za-z])(?=.*\\d)(?=.*[$#$!%*#?&])[A-Za-z\\d$#$!%*#?&]{8,10}$");
    //fieldRegexMap.put("confirmPassword", "^(?=.*[A-Za-z])(?=.*\\d)(?=.*[$#$!%*#?&])[A-Za-z\\d$#$!%*#?&]{8,10}$");
    fieldRegexMap.put("firstname", ".{1,20}");
    fieldRegexMap.put("lastname", ".{4,20}");
    fieldRegexMap.put("DOB", "^(?:(?:31(\\/|-|\\.)(?:0?[13578]|1[02]))\\1|(?:(?:29|30)(\\/|-|\\.)(?:0?[1,3-9]|1[0-2])\\2))(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$|^(?:29(\\/|-|\\.)0?2\\3(?:(?:(?:1[6-9]|[2-9]\\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\\d|2[0-8])(\\/|-|\\.)(?:(?:0?[1-9])|(?:1[0-2]))\\4(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$\\");
    fieldRegexMap.put("city", ".{2,20}");
    fieldRegexMap.put("address", ".{2,20}");
    fieldRegexMap.put("profession", ".{2,20}");
    fieldRegexMap.put("interests", ".{,100}");
    fieldRegexMap.put("moreinfo", ".{,500}");

    fieldRegexMap.entrySet().stream()
            .forEach(elem -> {
                System.out.println(elem.getKey());
                System.out.println(Pattern.compile(elem.getValue()));
            });
}
This results in the following output:
profession
.{2,20}
password
^(?=.*[A-Za-z])(?=.*\d)(?=.*[$#$!%*#?&])[A-Za-z\d$#$!%*#?&]{8,10}$
firstname
.{1,20}
address
.{2,20}
city
.{2,20}
DOB
Exception in thread "main" java.util.regex.PatternSyntaxException: Unexpected internal error near index 325
^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$\
^
at java.util.regex.Pattern.error(Pattern.java:1955)
at java.util.regex.Pattern.compile(Pattern.java:1702)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
at IntArraySplitter.lambda$0(IntArraySplitter.java:30)
at IntArraySplitter$$Lambda$1/1995265320.accept(Unknown Source)
at java.util.concurrent.ConcurrentHashMap$EntrySpliterator.forEachRemaining(ConcurrentHashMap.java:3606)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at IntArraySplitter.main(IntArraySplitter.java:28)
The reason you don't see this is that you have already sent part of the response to the client. Because the server has already returned an HTTP 200 response code, it can't change that to HTTP 500, and therefore it just closes the connection to the client when this point is reached. You should see an error message in the server's log, though.
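One way to catch a broken pattern like this early, rather than deep inside a request, is to compile every regex once when the map is built and collect the keys that fail. A minimal sketch (the class name and the short DOB pattern here are just for illustration; the real map would hold the patterns from the question):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexSanityCheck {

    /** Returns the keys whose regex fails to compile. */
    public static List<String> findInvalidPatterns(Map<String, String> fieldRegexMap) {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, String> e : fieldRegexMap.entrySet()) {
            try {
                Pattern.compile(e.getValue());
            } catch (PatternSyntaxException ex) {
                invalid.add(e.getKey());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("username", ".{8,}");
        map.put("DOB", "^\\d{2}\\.\\d{2}\\.\\d{4}$\\");  // trailing backslash -> invalid
        System.out.println(findInvalidPatterns(map));     // prints [DOB]
    }
}
```

Running such a check once at startup (or in a unit test) turns the half-written response problem into an obvious failure with a clear PatternSyntaxException message.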
I'm working with Jsoup. The URL works well in the browser, but it fetches the wrong result on the server. I set maxBodySize to 0 as well, but it still only gets the first few tags. Moreover, the data is even different from what the browser shows. Can you give me a hand?
String queryUrl = "http://www.juso.go.kr/addrlink/addrLinkApi.do?confmKey=U01TX0FVVEgyMDE3MDYyODE0MTYyMzIyMTcw&currentPage=1&countPerPage=20&keyword=연남동";
Document document = Jsoup.connect(queryUrl).maxBodySize(0).get();
Are you aware that this endpoint returns paginated data? Your URL asks for 20 entries from the first page. I assume the order of these entries is not specified, so you can get different data each time you call this endpoint - check whether there is a URL parameter that determines a specific sort order.

Anyway, to read all 2037 entries you have to fetch them sequentially. Examine the following code:
final String baseUrl = "http://www.juso.go.kr/addrlink/addrLinkApi.do";
final String key = "U01TX0FVVEgyMDE3MDYyODE0MTYyMzIyMTcw";
final String keyword = "연남동";
final int perPage = 100;
int currentPage = 1;
while (true) {
    System.out.println("Downloading data from page " + currentPage);
    final String url = String.format("%s?confmKey=%s&currentPage=%d&countPerPage=%d&keyword=%s", baseUrl, key, currentPage, perPage, keyword);
    final Document document = Jsoup.connect(url).maxBodySize(0).get();
    final Elements jusos = document.getElementsByTag("juso");
    System.out.println("Found " + jusos.size() + " juso entries");
    if (jusos.size() == 0) {
        break;
    }
    currentPage += 1;
}
In this case we ask for 100 entries per page (the maximum this endpoint supports) and keep calling as long as a page returns at least one <juso> element - 21 pages for the 2037 entries. I hope this helps solve your problem.
I need to find products in different categories on eBay. But when I use the tutorial code
ebay.apis.eblbasecomponents.FindProductsRequestType request = new ebay.apis.eblbasecomponents.FindProductsRequestType();
request.setCategoryID("Art");
request.setQueryKeywords("furniture");
I get the following error: QueryKeywords, CategoryID and ProductID cannot be used together.
So how is this done?
EDIT: the tutorial code is here.
EDIT2: the link to the tutorial code has died, apparently. I've continued searching: the category cannot be combined with the keyword search, but there is a Domain that you could presumably add to the request. Sadly it's not in the API, so I'm not sure whether it can be done at all.

The less-than-great eBay API doc is here.
This is my full request:
Shopping service = new ebay.apis.eblbasecomponents.Shopping();
ShoppingInterface port = service.getShopping();
bp = (BindingProvider) port;
bp.getRequestContext().put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, endpointURL);
// Add the logging handler
List<Handler> handlerList = bp.getBinding().getHandlerChain();
if (handlerList == null) {
handlerList = new ArrayList<Handler>();
}
LoggingHandler loggingHandler = new LoggingHandler();
handlerList.add(loggingHandler);
bp.getBinding().setHandlerChain(handlerList);
Map<String,Object> requestProperties = bp.getRequestContext();
Map<String, List<String>> httpHeaders = new HashMap<String, List<String>>();
requestProperties.put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, endpointURL);
httpHeaders.put("X-EBAY-API-CALL-NAME", Collections.singletonList(CALLNAME));
httpHeaders.put("X-EBAY-API-APP-ID", Collections.singletonList(APPID));
httpHeaders.put("X-EBAY-API-VERSION", Collections.singletonList(VERSION));
requestProperties.put(MessageContext.HTTP_REQUEST_HEADERS, httpHeaders);
// initialize WS operation arguments here
FindProductsRequestType request = new FindProductsRequestType();
request.setAvailableItemsOnly(true);
request.setHideDuplicateItems(true);
request.setMaxEntries(2);
request.setPageNumber(1);
request.setQueryKeywords("Postcard");
request.setDomain("");
The last line, which should set the domain like I need to, does not compile. Any idea how to solve this?
EDIT 3: I gave up on the Java API and am now calling the REST interface directly. The categories on eBay are actually domains now, and the URL looks like this:
String findProducts = "http://open.api.ebay.com/shopping?callname=FindProducts&responseencoding=XML&appid=" + APPID
        + "&siteid=0&version=525"
        + "&AvailableItemsOnly=true"
        + "&QueryKeywords=" + keywords
        + "&MaxEntries=10"
        + "&DomainName=" + domainName;
This works, but you want to hear a joke? It seems that not all the domains are listed there, so it doesn't really solve the problem. Pretty disappointing work by eBay.

The solution for finding items by keyword within a category is to use findItemsAdvanced. It would have saved me a lot of time if the docs for FindProducts had said this, instead of just saying that you can use either keyword search or category search.
This is the API URL:

String url = "http://open.api.ebay.com/shopping?callname=findItemsAdvanced&responseencoding=XML&appid=" + APPID
        + "&siteid=0&version=525"
        + "&AvailableItemsOnly=true"
        + "&QueryKeywords=" + keywords
        + "&categoryId=" + categoryId
        + "&MaxEntries=50";
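One detail worth adding when building such URLs by hand: keywords containing spaces or non-ASCII characters should be URL-encoded, or the request will be malformed. A sketch under that assumption (the helper and class names are mine; the parameter names follow the URL above):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EbayUrlBuilder {

    // Builds the findItemsAdvanced URL from the answer, URL-encoding the keywords.
    public static String buildFindItemsUrl(String appId, String keywords, String categoryId) {
        return "http://open.api.ebay.com/shopping?callname=findItemsAdvanced"
                + "&responseencoding=XML"
                + "&appid=" + appId
                + "&siteid=0&version=525"
                + "&AvailableItemsOnly=true"
                + "&QueryKeywords=" + URLEncoder.encode(keywords, StandardCharsets.UTF_8)
                + "&categoryId=" + categoryId
                + "&MaxEntries=50";
    }

    public static void main(String[] args) {
        // The space in "old postcard" becomes '+' after encoding
        System.out.println(buildFindItemsUrl("MY-APP-ID", "old postcard", "914"));
    }
}
```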
For completeness, if you want to get a list of all the top categories you can use this:

String url = "http://open.api.ebay.com/Shopping?callname=GetCategoryInfo&appid=" + APPID + "&siteid=0&CategoryID=-1&version=729&IncludeSelector=ChildCategories";
Is there any way to get tweets containing a keyword in Java? I want to download as many as possible. I have seen the Java library Twitter4J, but it only gives a small number of tweets.
Read the documentation of the Twitter API:

https://dev.twitter.com/docs/api/1/get/search

It's rate-limited, though; I don't think there is a way around that. The rate limiting varies between the open search APIs and the ones that require authentication.

http://search.twitter.com/search.json?q=blue%20angels&rpp=5&include_entities=true&result_type=mixed

(Note: this link is copied from the Twitter API webpage.)
You can set the page size and page number with Twitter4J to request more tweets:
public static void main(String[] args) throws TwitterException {
    Twitter twitter = new TwitterFactory().getInstance();
    for (int page = 1; page <= 10; page++) {
        System.out.println("\nPage: " + page);
        Query query = new Query("#MyWorstFear"); // trending right now
        query.setRpp(100);  // results per page
        query.setPage(page);
        QueryResult qr = twitter.search(query);
        List<Tweet> qrTweets = qr.getTweets();
        if (qrTweets.size() == 0) break;
        for (Tweet t : qrTweets) {
            System.out.println(t.getId() + " - " + t.getCreatedAt() + ": " + t.getText());
        }
    }
}