Remove .jsp ending with Tomcat and nginx - Java

I would like www.example.com/test.jsp to appear as www.example.com/test.
URL rewriting and the like seems like it would be too slow. Are there any alternatives? For example, maybe something with the JSP files themselves, or using only servlets, since they work with Java?
I'm looking for a good solution in terms of performance and Google ranking. The website has 200 pages and is growing, so I can't do it manually for every page.
I googled but I didn't find a good answer.

I can't believe you wrote a website with 200+ separate JSP pages. Consider changing the site architecture: for example, if you have an online store with many pages of the same type, you could write a single JSP content page as a template and use a RESTful architecture to build the real page content.
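As a rough sketch of that single-template idea (the /product/* mapping, the ProductController name and product.jsp are illustrative assumptions, not something from the question), one servlet can serve every product page from one JSP:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative front controller: one servlet plus one JSP template for every product page.
@WebServlet("/product/*")
public class ProductController extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // e.g. /product/42 arrives here with pathInfo "/42"
        String pathInfo = req.getPathInfo();
        String productId = (pathInfo == null || pathInfo.length() < 2) ? "" : pathInfo.substring(1);

        // Load the product however your application stores it, then expose it to the template.
        req.setAttribute("productId", productId);

        // A single JSP template renders every product page.
        req.getRequestDispatcher("/WEB-INF/product.jsp").forward(req, resp);
    }
}

A nice side effect is that the extension-less, friendly URLs (/product/42) come for free, because the URL never mentions a .jsp file at all.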

In the nginx configuration, you can rewrite the URL when you do the proxy_pass.
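If you would rather keep the rewrite inside Tomcat instead of (or in addition to) nginx, a minimal servlet Filter can forward extension-less requests to the matching JSP. This is only a sketch; the class name and mapping are made up, and a real version would also need to skip static resources such as images and CSS:

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;

// Forwards extension-less requests such as /test to the matching /test.jsp inside the container.
@WebFilter("/*")
public class JspExtensionFilter implements Filter {

    public void init(FilterConfig config) {
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String path = ((HttpServletRequest) req).getServletPath();

        // Only rewrite "clean" paths; leave the root and anything with an extension alone.
        if (!path.equals("/") && !path.contains(".")) {
            req.getRequestDispatcher(path + ".jsp").forward(req, res);
            return;
        }
        chain.doFilter(req, res);
    }

    public void destroy() {
    }
}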

Related

Read full content of a web page in Java

I want to crawl the whole content of the following link with a Java program. The first page is no problem, but when I want to crawl the data of the next pages, I get the same source code as for page one. Therefore a simple HTTP GET does not help at all.
This is the link for the page I need to crawl.
The website has active content that needs to be interpreted and executed by an HTML/CSS/JavaScript rendering engine. I do have a solution with PhantomJS, but it is cumbersome to run PhantomJS code from Java.
Is there any easier way to read the whole content of the page with Java code? I already searched for a solution, but could not find anything suitable.
Appreciate your help,
kind regards.
Using the Chrome network log (or a similar tool in any other browser) you can identify the XHR request that loads the actual data displayed on the page. I have removed some of the query parameters, but essentially the request looks like this:
GET https://www.blablacar.de/search_xhr?fn=frankfurt&fcc=DE&tn=muenchen&tcc=DE&sort=trip_date&order=asc&limit=10&page=1&user_bridge=0&_=1461181945520
Helpfully, the query parameters look quite easy to understand. The order=asc&limit=10&page=1 part looks like it would be easy to adjust to return your desired results. You could adjust the page parameter to crawl successive pages of data.
The response is JSON, for which there are a ton of libraries available.
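A minimal sketch of that approach with the JDK's built-in HTTP client (Java 11+); the endpoint and query parameters are taken from the request above and may well have changed since, so treat them as placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SearchXhrCrawler {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Walk the paginated XHR endpoint by incrementing the "page" parameter.
        for (int page = 1; page <= 3; page++) {
            String url = "https://www.blablacar.de/search_xhr?fn=frankfurt&fcc=DE"
                    + "&tn=muenchen&tcc=DE&sort=trip_date&order=asc&limit=10&page=" + page;

            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            // The body is JSON; feed it to whichever JSON library you prefer.
            System.out.println("page " + page + ": " + response.body().length() + " bytes of JSON");
        }
    }
}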

Is it advisable to use 'include' tag of jsp for setting the general structure of the website?

I am making a website with around 20 pages in it. Almost all the pages have the same general layout: the menu bar, header, footer, etc. I've made a JSP page which contains this common content, and with the help of the 'include' tag I'm using it in the other pages. So is it advisable to follow this technique? Kindly inform me about the pros and cons of using it.
Thanks in advance.
Remember that with each include tag, the whole JSP is translated into a servlet, and the included content becomes part of the HTML that is finally sent to the browser. So there is no doubt that for a large application it can create unnecessary performance issues.
Instead of doing this you may use the iframe tag, which is now widely used in web development. You can change the iframe's source as you want.
So it totally depends on which way you want to proceed and on your application context; there is no fixed rule that you must use this or that technique.
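For reference, the two flavours of include look like this (assuming a header.jsp fragment in the same directory):

<%-- Static include: header.jsp is merged at translation time, before the page is compiled into a servlet --%>
<%@ include file="header.jsp" %>

<%-- Dynamic include: header.jsp is executed on every request and its output is inserted into the response --%>
<jsp:include page="header.jsp" />

The static include is resolved once and adds essentially no per-request cost, while the dynamic include dispatches a separate request to the fragment each time.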

How do I get the contents of an ASPX file through Java?

In Java, is there any way to get the content of a webpage which is an .aspx file?
I know how to read/write anything from a normal HTML page, but ASP pages seem to have one URL for multiple pages, so it's not really possible to reach the desired page by URL.
I understand you can't/won't give me complete instructions right here, but could you maybe point me in the right direction?
Thanks in advance.
There is nothing special about ASPX pages compared to any other type of page; "plain" html pages could have been dynamically generated as well.
Just don't forget that the query string is also part of the URL. Many ASPX, PHP, etc. pages might not even be 'correct' to request without some query string value at all. And other sites don't have file extensions at all... like this site itself. You just have to be sure to get the entire URL for each unique 'page'.
I'm not an expert on .asp, so I might be wrong. However, my impression is that a .asp page should ultimately return HTML (similarly to what a .jsp page does), so you can fetch the content in the same way as you would do for an HTML page.
However, you write that
"asp pages seem to have one URL for multiple pages"
which makes me think that perhaps your .asp page is using AJAX, so the page content may change while the URL doesn't. Is that the case for you?
I understand that you are trying to read the .aspx from a client PC, not from the server.
If that's right, accessing an HTTP resource is independent of the technology used by the server; all you need to do is open an HTTP request and retrieve the results.
If you see multiple pages from one URL, then one of the following is happening:
1) POST data is sent to the .aspx, and it renders different HTML because of these parameters.
2) You are not really looking at the inner page but at a page that provides the frames for the HTML being rendered.
3) The page makes heavy use of Ajax in order to be rendered. The "contents" of the page are not downloaded in the initial request but later by JavaScript.
Generally, it is probably the first reason.
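If it is indeed the first case, a plain HttpURLConnection POST is enough. The URL and the form fields below are placeholders for whatever you capture in the browser's network log; ASP.NET WebForms pages usually also expect hidden fields such as __VIEWSTATE copied from the rendered form:

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class AspxPostExample {

    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint and form data; replace with what you see in the network log.
        URL url = new URL("https://www.example.com/page.aspx");
        String form = "someField=someValue&otherField=otherValue";

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        // Send the form body, then read back the HTML the server renders for these parameters.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(form.getBytes(StandardCharsets.UTF_8));
        }
        try (Scanner in = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
            while (in.hasNextLine()) {
                System.out.println(in.nextLine());
            }
        }
    }
}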

How can I rewrite SEO-friendly URLs in Struts?

We have a website coded in Java with the Struts framework. The website's URLs are not SEO friendly. All of them look like this:
../buyerApplication.do&companyId=2323
Now we want to make these URLs SEO friendly. I searched and found these solutions:
1) tuckey.org/urlrewrite, but I don't fully trust this system.
2) Adding a title to the end of the link after '&', such as "../newsId=33233&does-art-in-the-city-equal-art-for-the-city", but I am not sure this works well.
I am waiting for your suggestions on how best to solve this problem.
I actually used URLRewriter (http://tuckey.org/urlrewrite/), which you referenced in your original question. It was very easy to set up and filled my needs perfectly.
To the point, you need a Filter for this.
If you want to keep your existing application's architecture, you'll need to define a set of rules mapping friendly URLs to the existing unfriendly ones, and let the filter translate each incoming friendly URL and forward the request to the unfriendly one.
If there is no way to modify the existing application but you want to build a new one based on this idea, you could consider having a single front controller which translates HttpServletRequest#getPathInfo()/getRequestURI() into the appropriate action class (command pattern) and finally forwards the request to the appropriate JSP page. I'm not sure how that would fit into Struts, as I haven't worked with Struts before.
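A rough sketch of that filter idea; the /buyer/... pattern and the target action are assumptions based on the URL in the question, and tuckey's URLRewriteFilter does essentially the same thing, driven by an XML rule file:

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

// Hypothetical rule: /buyer/2323 is forwarded internally to /buyerApplication.do?companyId=2323
public class SeoUrlFilter implements Filter {

    public void init(FilterConfig config) {
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String path = ((HttpServletRequest) req).getServletPath();

        if (path.startsWith("/buyer/")) {
            String companyId = path.substring("/buyer/".length());
            req.getRequestDispatcher("/buyerApplication.do?companyId=" + companyId)
               .forward(req, res);
            return;
        }
        chain.doFilter(req, res);
    }

    public void destroy() {
    }
}

You would map the filter to /* in web.xml (or with @WebFilter on Servlet 3.0+) and generate only the friendly form of the links in your JSPs, so that Google never sees the .do URLs.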
For what it's worth, you can also look at the REST plugin http://struts.apache.org/2.x/docs/rest-plugin.html, which amongst other things will make your URLs more friendly.

Autogenerate HTTP screen-scraping Java code

I need to screen scrape some data from a website, because it isn't available via their web service. When I've needed to do this previously, I've written the Java code myself using Apache's HTTP client library to make the relevant HTTP calls to download the data. I figured out the relevant calls I needed to make by clicking through the relevant screens in a browser while using the Charles web proxy to log the corresponding HTTP calls.
As you can imagine this is a fairly tedious process, and I'm wondering if there's a tool that can actually generate the Java code that corresponds to a browser session. I expect the generated code wouldn't be as pretty as code written manually, but I could always tidy it up afterwards. Does anyone know if such a tool exists? Selenium is one possibility I'm aware of, though I'm not sure if it supports this exact use case.
Thanks,
Don
I would also add +1 for HtmlUnit since its functionality is very powerful: if you need behaviour 'as though a real browser was scraping and using the page', that's definitely the best option available. HtmlUnit executes (if you want it to) the JavaScript in the page.
It currently has full-featured support for all the main JavaScript libraries and will execute JS code using them. Corresponding with that, you can get handles to the JavaScript objects in the page programmatically within your test.
If however the scope of what you are trying to do is smaller, more along the lines of reading some of the HTML elements and where you don't much care about JavaScript, then using NekoHTML should suffice. It's similar to JDOM, giving programmatic (rather than XPath) access to the tree. You would probably need to use Apache's HttpClient to retrieve pages.
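A minimal HtmlUnit sketch (API names from recent 2.x releases, where WebClient is AutoCloseable; older versions use closeAllWindows() instead, and example.com is just a placeholder):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitExample {

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Let HtmlUnit run the page's JavaScript before we read the DOM.
            webClient.getOptions().setJavaScriptEnabled(true);

            HtmlPage page = webClient.getPage("https://www.example.com/");
            System.out.println(page.getTitleText());
            System.out.println(page.asXml());
        }
    }
}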
The manageability.org blog has an entry which lists a whole bunch of web page scraping tools for Java. However, I do not seem to be able to reach it right now, but I did find a text only representation in Google's cache here.
You should take a look at HtmlUnit - it was designed for testing websites but works great for screen scraping and navigating through multiple pages. It takes care of cookies and other session-related stuff.
I would say I personally like to use HtmlUnit and Selenium as my two favorite tools for screen scraping.
A tool called The Grinder allows you to script a session to a site by going through its proxy. The output is Python (runnable in Jython).
