I use Wicket's AjaxFallbackLink in a number of places. This works fine for users, but it's giving us some SEO headaches.
When Google crawls one of our pages, it might be hours or days before the crawler returns and tries to follow the AjaxFallbackLinks on that page. Of course, since the links look like this:
http://example.com/?wicket:interface=:1869:mediaPanel:permissionsLink::IBehaviorListener:0:2
... the session is no longer valid by the time the crawler returns. This results in a ton of 404 errors on our site, which presumably harms our SEO.
My question: how can I make the Ajax links "stable" (like a BookmarkablePageLink) for search engines, but still retain the Ajax behavior for interactive users?
You can tell Google to ignore certain URL parameters by using the URL Parameter options in Google Webmaster Tools. As of July 2011, you can even tell Google what to do in the case where changing the URL parameters has an effect on the page content (e.g. paging or sorting).
To access the feature, log into your Google Webmaster Tools account, click on the site you want to configure, and then choose Site configuration > URL parameters. You’ll see a list of parameters Google has found on the site, along with the number of URLs Google is “monitoring” that contain this parameter.
The default behavior is “Let Googlebot decide”. This results in Google figuring out duplicates and clustering them.
http://searchengineland.com/google-adds-url-parameter-options-to-google-webmaster-tools-86769
The question for you is whether the content of the page actually changes when the wicket:interface parameters are ignored. If it does, you may need to explore moving to a stateless Ajax fallback, such as the one described here:
https://github.com/jolira/wicket-stateless
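For reference, here is roughly what such an AjaxFallbackLink looks like: a minimal sketch assuming a Wicket 6.x-style API, with hypothetical component IDs and a placeholder MediaPage class. It shows the Ajax branch and the non-JavaScript fallback branch that a stateless replacement would also have to cover:

    import org.apache.wicket.ajax.AjaxRequestTarget;
    import org.apache.wicket.ajax.markup.html.AjaxFallbackLink;
    import org.apache.wicket.markup.html.panel.Panel;
    import org.apache.wicket.request.mapper.parameter.PageParameters;

    // Hypothetical panel; "MediaPage" and the component IDs are placeholders.
    public class MediaPanel extends Panel {

        public MediaPanel(String id) {
            super(id);
            setOutputMarkupId(true); // required so the panel can be repainted via Ajax

            add(new AjaxFallbackLink<Void>("permissionsLink") {
                @Override
                public void onClick(AjaxRequestTarget target) {
                    if (target != null) {
                        // Ajax request from a scripted browser: repaint just this panel.
                        target.add(MediaPanel.this);
                    } else {
                        // No-JavaScript fallback: send the client to a stable,
                        // bookmarkable page instead of a session-bound URL.
                        setResponsePage(MediaPage.class, new PageParameters());
                    }
                }
            });
        }
    }

Note that even with the fallback branch, the link's href is still the stateful wicket:interface URL that crawlers pick up, which is why the stateless alternative above is worth exploring.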
Related
I have developed a dynamic website using technologies like Ajax and Java that constantly generates new pages with IDs (like http://www.enggheads.com/#!question/1419242644475), similar to stackoverflow.com, but my site's pages are not searchable by Google or any other search engine.
I want my pages to show up in search results. How can I achieve this? I have not submitted any sitemap to Google Webmaster Tools. Is a sitemap really the right solution? That would mean regenerating the sitemap each time (or at a set interval) whenever someone adds a question on my website.
I'm really confused about how search engines find dynamically created pages like Stack Overflow questions and Facebook profiles.
Look up how meta tags work. Every dynamic page should have its own set of tags and description.
Also it takes time for Google to index your pages.
Another reason why your website isn't shown in the results is that your keywords are too common. Google indexes websites based on the keywords mentioned in the meta tags. If they are very common, there will be other popular sites ranked above yours, so your site does not appear in the top results.
Google also takes the popularity of your website into consideration; it calls this "juice". Your website's juice increases and decreases based on how old your site is and how many relevant redirections happen to and from your website.
All the points I mentioned are just a few things that come under the heading search engine optimization.
SEO is a massive concept and you will only learn it eventually as your website grows.
On the other hand, if you want Google to push your results up to the top, you can pay Google to do so; Google runs the biggest advertising business.
This is because search engines cannot find URLs containing /#?=, so you can rewrite your URLs. This page can help you do that: http://weblogs.asp.net/scottgu/tip-trick-url-rewriting-with-asp-net
First of all, to be indexed by Google, Google must first FIND the URL. The best way to be found is to have many backlinks (popularity); otherwise you have to submit a sitemap or the URLs to the search engines.
Unfortunately, the query "inurl:#!" gives zero results in Google, so Luiggi Mendoza is right about that.
You can try rewriting URLs using .htaccess to make them SEO-friendly.
I have a security issue. I have a menu which allows or disallows a user to go to a certain page. It is simple: if the user is not allowed to reach a page, the link to it is hidden in the menu.
But the user can still enter the direct URL in the browser and reach the page.
For example, I have CardPage; via a link I can reach it using
setResponsePage(new CardPage());
or
setResponsePage(CardPage.class);
but I can still reach this page by entering a URL like this:
http://127.0.0.1:8080/my-application-war/?wicket:bookmarkablePage=:com.my.application.CardPage
Is there any way to forbid the user from accessing this page via its URL?
I can give you additional info if there is something you need.
UPDATE
I had to do it manually: check on every page whether it is allowed to be seen, and redirect to NotAllowedPage if not (roughly like the sketch below).
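Roughly, that manual check can live in a common base page. A sketch; NotAllowedPage and the access check are placeholders for the application's own classes:

    import org.apache.wicket.RestartResponseException;
    import org.apache.wicket.markup.html.WebPage;

    // Every page that needs protection extends this base page.
    public abstract class ProtectedBasePage extends WebPage {

        public ProtectedBasePage() {
            if (!isAllowedToSee()) {
                // Abort this page and render NotAllowedPage instead, even when the
                // user typed the bookmarkable URL directly into the browser.
                throw new RestartResponseException(NotAllowedPage.class);
            }
        }

        /** Application-specific check, e.g. against roles stored in the session. */
        protected abstract boolean isAllowedToSee();
    }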
There are several ways to do this, like using Spring Security or Apache Shiro. But if you don't need anything fancy, I would just go with the functionality that is provided by Wicket itself. Just read Security with Wicket in the Wicket guide (which is, by the way, an excellent reference).
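For example, a sketch of Wicket's built-in page authorization; SecurePage, LoginPage, HomePage and MySession are placeholder names for your own marker base class, sign-in page, home page and session class:

    import org.apache.wicket.Page;
    import org.apache.wicket.authorization.strategies.page.SimplePageAuthorizationStrategy;
    import org.apache.wicket.protocol.http.WebApplication;

    public class MyApplication extends WebApplication {

        @Override
        public Class<? extends Page> getHomePage() {
            return HomePage.class; // placeholder home page
        }

        @Override
        protected void init() {
            super.init();
            // Any page extending SecurePage (e.g. CardPage) is only instantiated when
            // isAuthorized() returns true; otherwise Wicket redirects to LoginPage.
            getSecuritySettings().setAuthorizationStrategy(
                    new SimplePageAuthorizationStrategy(SecurePage.class, LoginPage.class) {
                        @Override
                        protected boolean isAuthorized() {
                            return MySession.get().isSignedIn(); // placeholder session check
                        }
                    });
        }
    }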
I am looking to develop an app that will take login details from the user, go to a website, log in, read values from the web page and then display them to the user on the phone.
Does Java have this functionality? Will I need to use JavaScript instead, maybe? Do these answers depend on the website that I am trying to access?
In my head I figure that I could just read in the parameters as strings or chars, parse the webpage for the appropriate form and "paste" the appropriate value into the form "box". However, I have never attempted anything like this, so I am completely new to the idea and don't really know where to start. I tried googling around, but any information I found was either irrelevant or conflicting.
I'm not looking for the code to do it, because I wouldn't really learn anything from that, but a finger pointed in the right direction would be great. I really do want to get better at programming, which is why I've started giving myself these little side projects.
Any help that can be offered would be great
Ian,
You can try using the http-client lib (http://hc.apache.org/httpclient-3.x/) from Apache. It lets you programmatically access a website from Java code. You will need to do the following things (see the sketch after this list):
1. Use the http-client lib to POST the data to the web site.
2. Receive the HTML response.
3. Use an HTML parser or XPath to retrieve the values from the response HTML.
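A rough sketch of those steps with HttpClient 3.x; the URL, form field names and credentials are placeholders for the actual login form of the target site:

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.methods.PostMethod;

    public class LoginAndRead {

        public static void main(String[] args) throws Exception {
            HttpClient client = new HttpClient();

            // 1. POST the login data (URL and field names are placeholders).
            PostMethod login = new PostMethod("https://example.com/login");
            login.addParameter("username", "ian");
            login.addParameter("password", "secret");
            try {
                int status = client.executeMethod(login);

                // 2. Receive the HTML response.
                String html = login.getResponseBodyAsString();
                System.out.println("HTTP status: " + status);

                // 3. Hand the HTML to a parser (e.g. jsoup or an XPath library)
                //    to pull out the values you want to show on the phone.
                System.out.println(html);
            } finally {
                login.releaseConnection();
            }
        }
    }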
You would need a script which accesses the webpage and enters the data, but in my opinion this is illegal, because you are accessing a secured area and are able to look into sensitive data. Also, accessing the page via a script is "botting" - most pages have safety precautions to prevent the execution of scripts, because most of them are harmful.
In my opinion there is no legal and easy solution to this.
I have a URL.
How can I find all the existing sub-URLs of this page?
For example,
http://tut.by/car/12324 - exists
................/car/66666 - doesn't exist
Preferably in Java.
I have already experimented with almost all the crawlers from java-source.net/open-source/crawlers - none of them can do that; they can only follow hrefs.
Thanks in advance!
That's going to be nearly impossible, if there's no index page. While many web servers will create an HTML index page for you if one isn't provided by the site creator, it's a very common practice to disable directory listing, for security reasons.
What you seek is not possible. The server defines the actual meaning of the path in a URL, and it's not possible to 'guess' unless you know a great deal about the server and how it processes the URLs.
I agree, the information you are seeking would normally be in an index page. I.e. sometimes you go to a website, delete the "page.html" part, and voila, you see all the pages and folders in that directory.
But as mentioned, this is often disabled for security reasons, so users cannot wander around.
Therefore, your other choices are to:
A) Guess: just keep trying different combinations to brute-force the page URLs (00001, 00002, 00003, etc.) - see the sketch after this list.
B) Crawl the website starting at its root, looking for links from each page to other pages on the website, until all links have been exhausted. Obviously, pages on the site with no links pointing to them will never be found.
C) Ask the owner of the website for the information you require.
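For option A, a minimal probe might look like this. The URL pattern and ID range are only examples, and be aware that some servers answer HEAD differently from GET or return 200 for "soft 404" pages:

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class UrlProbe {

        public static void main(String[] args) throws Exception {
            for (int id = 12320; id <= 12330; id++) {
                URL url = new URL("http://tut.by/car/" + id);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestMethod("HEAD"); // only the status code is needed, not the body
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);
                if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
                    System.out.println("exists: " + url);
                }
                conn.disconnect();
            }
        }
    }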
I'm trying to create a series of sites that all run as one application, but have different designs (to localise them).
My idea is to map separate domain names to the one site. E.g: www.mysite1.com maps to www.mysite.appspot.com/mysite1 and www.mysite2.com maps to www.mysite.appspot.com/mysite2
I'm guessing that there must be a URL pattern or something in web.xml to pass the name of the site to a servlet? I'd like URLs such as www.mysite.appspot.com/mysite1/forumpost/3/ to be handled by the same servlet as www.mysite.appspot.com/mysite2/forumpost/3/.
Ideally I'd like to pass the site name as a parameter to the servlet.
Surely someone has done this before, or there is a standard way of doing it? I've got a fuzzy idea about parsing the URL to take the site name out of it, but I'm pretty new to servlets and I thought someone might be able to shed some light on this situation.
Thanks!
You can't map your own subdomains of appspot.com apps (e.g. foo.mysite.appspot.com), but you can map arbitrary domains to your app directly, such as www.mysite1.com and www.mysite2.com - just add them all as aliases to your Google Apps account, and then map them to your App Engine app. Once you've got that done, you just need to check the content of the Host header in your app to route requests to the appropriate handlers (or otherwise vary the content you return).
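For example, a sketch of that Host-header check in a servlet; the domain names are the ones from your question, and what you do with the detected site name is up to you:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class SiteRouterServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            // The Host header tells you which mapped domain the visitor used.
            String host = req.getHeader("Host"); // e.g. "www.mysite2.com"
            String site = (host != null && host.contains("mysite2")) ? "mysite2" : "mysite1";

            // Make the site name available to the rest of the request handling,
            // e.g. to pick a different design/template per site.
            req.setAttribute("siteName", site);
            resp.setContentType("text/plain");
            resp.getWriter().println("Serving content for " + site);
        }
    }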
Try using a javax.servlet.Filter and forwarding to the language-specific pages based on the HTTP request header 'Accept-Language'. You can get at that with a call to javax.servlet.http.HttpServletRequest.getHeader(String).
This way your site has a single URL and the separation into language-specific pages is handled internally.
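For example, a sketch of such a Filter; the /de/ target path is a placeholder and the header parsing is deliberately simplistic:

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;

    public class LanguageFilter implements Filter {

        @Override
        public void init(FilterConfig config) {
        }

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest req = (HttpServletRequest) request;
            String acceptLanguage = req.getHeader("Accept-Language"); // e.g. "de-DE,de;q=0.9,en;q=0.8"

            if (acceptLanguage != null && acceptLanguage.toLowerCase().startsWith("de")) {
                // Forward German-speaking visitors to the German pages (path is a placeholder).
                request.getRequestDispatcher("/de/index.jsp").forward(request, response);
            } else {
                // Fall through to the default (e.g. English) pages.
                chain.doFilter(request, response);
            }
        }

        @Override
        public void destroy() {
        }
    }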