How to make a dynamic website searchable by search engines - java

I have developed a dynamic website using technologies like Ajax and Java. It constantly generates new pages with IDs (like http://www.enggheads.com/#!question/1419242644475), similar to stackoverflow.com, but my site's pages are not searchable by Google or any other search engine.
I want my pages to show up in search engine results. How can I achieve this? I have not submitted any sitemap to Google Webmaster Tools. Is a sitemap really the right solution? That would mean regenerating the sitemap each time (or at a set interval) someone adds a question on the website.
I'm really confused about how search engines find dynamically created pages such as Stack Overflow questions and Facebook profiles.

Look up how meta tags work. Every dynamic page should have its own set of meta tags and its own description.
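For instance, a servlet that renders a question page can emit a unique title and description per question. A minimal sketch, assuming servlet-based page generation; the lookup methods are hypothetical stubs standing in for your real data access:

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: render a unique <title> and meta description for each question page.
public class QuestionPageServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String id = req.getParameter("id");
        // Hypothetical data access -- replace with your own storage lookup,
        // and HTML-escape the values before writing them in real code.
        String title = lookupTitle(id);
        String summary = lookupSummary(id);

        resp.setContentType("text/html;charset=UTF-8");
        PrintWriter out = resp.getWriter();
        out.println("<!DOCTYPE html><html><head>");
        out.println("<title>" + title + " - enggheads</title>");
        out.println("<meta name=\"description\" content=\"" + summary + "\">");
        out.println("</head><body><!-- page body --></body></html>");
    }

    private String lookupTitle(String id)   { return "Question " + id; }            // stub
    private String lookupSummary(String id) { return "Summary of question " + id; } // stub
}
```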
Also it takes time for Google to index your pages.
Another reason why your website isn't shown in the results is that your keywords are too common. Google indexes websites based on keywords mentioned in the meta tags. If they are very common, there will be other popular sites ranked above yours, so your site does not make the top results.
Google also takes into consideration the popularity of your website, often called "link juice". Your website's juice increases and decreases based on how old your site is and how many relevant links point to and from it.
All the points I mentioned are just a few of the things that fall under the heading of search engine optimization (SEO).
SEO is a massive topic, and you will learn it gradually as your website grows.
On the other hand, if you want Google to push your results to the top, you can pay for that: Google runs the biggest advertising business.

This is because search engines do not crawl the fragment part of a URL (everything after the #). You can rewrite your URLs instead. This page can help: http://weblogs.asp.net/scottgu/tip-trick-url-rewriting-with-asp-net
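At the time of writing, Google's AJAX crawling scheme offered a workaround for hashbang URLs: a crawler requesting http://www.enggheads.com/#!question/1419242644475 actually fetches http://www.enggheads.com/?_escaped_fragment_=question/1419242644475, and the server can answer that request with a pre-rendered HTML snapshot. A rough sketch of such a filter; the snapshot rendering is a hypothetical placeholder you would back with your real page content:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: serve a static HTML snapshot when a crawler requests the
// _escaped_fragment_ form of a #! URL (per Google's old AJAX crawling scheme).
public class EscapedFragmentFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        String fragment = req.getParameter("_escaped_fragment_");
        if (fragment != null) {
            // e.g. fragment = "question/1419242644475"
            HttpServletResponse resp = (HttpServletResponse) response;
            resp.setContentType("text/html;charset=UTF-8");
            resp.getWriter().print(renderSnapshot(fragment)); // hypothetical
            return;
        }
        chain.doFilter(request, response); // normal users get the Ajax page
    }

    // Placeholder: render the same content your Ajax code builds client-side.
    private String renderSnapshot(String fragment) {
        return "<html><body>Snapshot for " + fragment + "</body></html>";
    }

    @Override public void init(FilterConfig config) {}
    @Override public void destroy() {}
}
```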

First of all, to be indexed by Google, Google must first FIND the URL. The best way to be found is to have many backlinks (popularity); otherwise you have to submit a sitemap or the URLs to the search engines.
Unfortunately, the query "inurl:#!" gives zero results in Google, so Luiggi Mendoza is right about it.
You can try rewriting the URLs using .htaccess to make them SEO-friendly.
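If you do submit a sitemap, regenerating it whenever a question is added (or on a schedule) is only a few lines of code. A minimal sketch, assuming the question IDs come from your own data store; note that the fragment part of a URL (everything after #) cannot be expressed in a sitemap, which is another argument for rewriting those URLs first:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Sketch: regenerate sitemap.xml from the current list of question IDs.
// Run it on a schedule (cron, ScheduledExecutorService) or after each new question.
public class SitemapWriter {

    public static void write(List<String> questionIds, String path) throws IOException {
        try (Writer out = Files.newBufferedWriter(Paths.get(path), StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            out.write("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
            for (String id : questionIds) {
                // Rewritten, fragment-free URL form (example)
                out.write("  <url><loc>http://www.enggheads.com/question/"
                        + id + "</loc></url>\n");
            }
            out.write("</urlset>\n");
        }
    }
}
```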

Related

Does the Google Dictionary API now come under Google Custom Search Engine?

I asked the same question on the Google developer forum, looking for clear direction.
I am working on a project to get the meanings of English words, and the Google search engine provides comprehensive results. I would like to have those results (meanings of words, their synonyms, audio sources from gstatic, usage graphs since 1800, etc.) as input to my program for further processing. Does the custom search engine expose an API, or does Google have an explicit API for this? Earlier I thought of screen scraping, but it would be much easier if Google provided an API for this as well.
Thanks in Anticipation.
Check out Wiktionary:
Wiktionary (whose name is a blend of the words wiki and dictionary) is a multilingual, web-based project to create a free content dictionary of all words in all languages. It is available in 172 languages and in Simple English. Like its sister project Wikipedia, Wiktionary is run by the Wikimedia Foundation, and is written collaboratively by volunteers, dubbed "Wiktionarians". Its wiki software, MediaWiki, allows almost anyone with access to the website to create and edit entries.
You can access their data using the MediaWiki API; there is no need to scrape Google search results.
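For instance, the MediaWiki API can return a Wiktionary entry's wikitext as JSON over plain HTTP. A minimal sketch using only the JDK; parse the response with whatever JSON library you prefer:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: fetch the raw wikitext of a Wiktionary entry via the MediaWiki API.
public class WiktionaryLookup {

    public static String fetchEntry(String word) throws Exception {
        String url = "https://en.wiktionary.org/w/api.php"
                + "?action=parse&format=json&prop=wikitext"
                + "&page=" + URLEncoder.encode(word, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("User-Agent", "example-dictionary-client");

        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString(); // JSON containing the entry's wikitext
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchEntry("serendipity"));
    }
}
```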

Web scraping through web pages using JSoup

I've made a web scraper to scrape pieces of information from IMDB. It traverses pages by changing the number in the URL to a different random one and then repeating the scraping process on the new page.
http://www.imdb.com/title/tt0800369/ <-- changing this number gives a new movie.
How can I do this on the BFI website? I can't see a way to go from film to film.
Thanks in advance!
Following randomly generated links is not the most efficient way to traverse the web...
You really should follow the URLs that you find on other pages. You can use crawler4j, which seems to be the easiest Java crawler to start with. There are also some alternatives.
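For example, with jsoup (which the question already uses) you can collect the outgoing links of each page you visit and enqueue those, instead of guessing IDs. A minimal sketch; the start URL and link filter are just examples:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Sketch: breadth-first crawl that follows links found on each page
// instead of generating random IDs.
public class LinkFollower {

    public static void main(String[] args) throws Exception {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        queue.add("http://www.imdb.com/title/tt0800369/"); // example start page

        while (!queue.isEmpty() && visited.size() < 20) { // small demo limit
            String url = queue.poll();
            if (!visited.add(url)) continue; // skip pages we've already seen

            Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
            System.out.println(doc.title() + "  <-  " + url);

            // "abs:href" resolves relative links against the page URL
            for (Element link : doc.select("a[href]")) {
                String next = link.attr("abs:href");
                if (next.startsWith("http://www.imdb.com/title/")) { // example filter
                    queue.add(next);
                }
            }
        }
    }
}
```

The visited set keeps the crawl from looping; crawler4j gives you the same pattern plus politeness delays and persistence out of the box.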

what are the steps to make a word search for a website?

I want to write a word search which connects to a specific website (a huge one), takes a word from the user, searches the site, and returns the strings that contain the word; it should be written in Java, as an applet. I have read some tutorials and questions on this, and understood that what has to be done is:
1. Connect to the website, get its content, and save it to a string (done with a web crawler made from my own code for connecting to the website and saving the content, plus the jsoup library to parse the HTML).
2. Save the data to a database (in my case a NoSQL database).
3. Index the data in the database.
4. Query the database to show the results.
5. Make a UI for showing the search results (I use a Swing JApplet).
Now my questions are:
1. Have I understood the steps correctly? (Please explain in detail whether any step is unnecessary.)
2. Is it necessary to have a database?
Note: I want to implement it myself, without using ready-made tools such as Lucene, Nutch, Solr, ...
Edit: Three people have told me an applet is not suitable for such a thing, so what should the replacement be?
Many thanks for your help.
You should look at using Lucene, as it does most of what you want here.
You should not use applets.
For a small data set, a database should be sufficient. Databases like MySQL come with full-text search functions.
For a bigger data set, you might want to consider Lucene or Solr.
That is one way to implement this. Another (simpler) way would be to use an existing text search / indexing engine like Lucene / Solr. Going to the effort of reimplementing the "text search / indexing" wheel using database technology strikes me as a waste of effort, unless you have a sound technical reason for doing so.
You do need to have some kind of database, because indexing a website on the fly would simply not work. Lucene will handle that.
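To give a sense of how little code the Lucene route takes, here is a minimal index-and-search sketch. It uses the Lucene 4.x-era API; constructors differ slightly between versions:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Sketch: index a few crawled page snippets in memory, then search them.
public class MiniSearch {

    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        Directory index = new RAMDirectory();

        try (IndexWriter writer =
                 new IndexWriter(index, new IndexWriterConfig(Version.LUCENE_40, analyzer))) {
            addPage(writer, "http://example.com/a", "the quick brown fox");
            addPage(writer, "http://example.com/b", "lazy dogs sleep all day");
        }

        // Search the "content" field for a user-supplied word
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(index));
        ScoreDoc[] hits = searcher.search(
                new QueryParser(Version.LUCENE_40, "content", analyzer).parse("fox"),
                10).scoreDocs;
        for (ScoreDoc hit : hits) {
            Document d = searcher.doc(hit.doc);
            System.out.println(d.get("url") + ": " + d.get("content"));
        }
    }

    private static void addPage(IndexWriter w, String url, String text) throws Exception {
        Document doc = new Document();
        doc.add(new TextField("url", url, Field.Store.YES));
        doc.add(new TextField("content", text, Field.Store.YES));
        w.addDocument(doc);
    }
}
```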
I think your choice of Java applets to build the UI is a bad idea. There are other technologies that give results that are as good or better ... without the security risk of a Java browser plugin.
Finally, another way to make your website searchable is to get Google to do it for you. Make your website content indexable, and then use Google's search APIs.

Google page index - java

Does anyone have an idea how to get the Google page index in Java?
I have been googling for the last 2-3 days without luck. Can anyone refer me to an API for that or give some suggestions on how to do it?
Lots of thanks in advance
For example, if we search for facebook in Google, we get around 22,980,000,000 results. I want to fetch this number using Java.
Make a corresponding HTTP request from Java to Google, then parse the returned HTML. There is a div with the ID resultStats; this div contains the number of results.
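A sketch of that approach using jsoup. Treat it as fragile by nature: Google's markup changes frequently, and scraping results pages may violate Google's terms of service:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Sketch: fetch a Google results page and read the #resultStats div.
public class ResultCount {

    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://www.google.com/search?q=facebook")
                .userAgent("Mozilla/5.0")
                .get();
        Element stats = doc.getElementById("resultStats");
        if (stats != null) {
            System.out.println(stats.text()); // e.g. "About 22,980,000,000 results"
        } else {
            System.out.println("resultStats div not found -- markup may have changed");
        }
    }
}
```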
I'm not sure what your real requirement is: what kind of index do you want? Google exposes quite a few APIs as RESTful services; some of them are packaged with JavaScript libraries, like the Google Maps API. There are also Java client libraries for OAuth authentication.
Information on the Custom Search API can be found at http://code.google.com/apis/customsearch/v1/overview.html. A comprehensive list of Google APIs can be accessed at https://code.google.com/apis/console
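If you prefer the supported route, the Custom Search JSON API reports an estimated total for each query. A rough sketch over plain HTTP; API_KEY and ENGINE_ID are placeholders for your own credentials, and the count appears in the searchInformation.totalResults field of the returned JSON:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: query the Custom Search JSON API and print the raw response,
// which includes searchInformation.totalResults.
public class CustomSearchCount {

    private static final String API_KEY = "YOUR_API_KEY";          // placeholder
    private static final String ENGINE_ID = "YOUR_ENGINE_ID";      // placeholder

    public static void main(String[] args) throws Exception {
        String url = "https://www.googleapis.com/customsearch/v1"
                + "?key=" + API_KEY
                + "&cx=" + ENGINE_ID
                + "&q=" + URLEncoder.encode("facebook", "UTF-8");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL(url).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // parse with your preferred JSON library
            }
        }
    }
}
```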

Wicket SEO: removing "?wicket:interface=..." from AjaxFallbackLinks

I use Wicket's AjaxFallbackLink in a number of places. This works fine for users, but it's giving us some SEO headaches.
When Google crawls one of our pages, it might be hours or days before they return and try crawling the AjaxFallbackLinks on that page. Of course since the links look like this:
http://example.com/?wicket:interface=:1869:mediaPanel:permissionsLink::IBehaviorListener:0:2
... the session is no longer valid by the time the crawler returns. This results in a ton of 404 errors on our site, which presumably harms our SEO.
My question: how can I make the Ajax links "stable" (like a BookmarkablePageLink) for search engines, but still retain the Ajax behavior for interactive users?
You can tell Google to ignore certain URL parameters by using the URL Parameter options in Google Webmaster Tools. As of July 2011, you can even tell Google what to do in the case where changing the URL parameters has an effect on the page content (e.g. paging or sorting).
To access the feature, log into your Google webmaster tools account, click on the site you want to configure, and then choose Site configuration > URL parameters. You’ll see a list of parameters Google has found on the site, along with the number of URLs Google is “monitoring” that contain each parameter.
The default behavior is “Let Googlebot decide”. This results in Google figuring out duplicates and clustering them.
http://searchengineland.com/google-adds-url-parameter-options-to-google-webmaster-tools-86769
The question for you is whether the content of the page does change when you ignore the wicket:interface params. If it does, maybe you need to explore moving to a stateless Ajax fallback, such as the one described here:
https://github.com/jolira/wicket-stateless
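Independent of that project, one common pattern is to render a normal bookmarkable href that crawlers can index, and attach an Ajax click behavior for interactive users. A rough sketch in Wicket 6-style API (method names vary across versions); PermissionsPage and the component IDs are hypothetical, and the matching HTML markup is omitted:

```java
import org.apache.wicket.ajax.AjaxEventBehavior;
import org.apache.wicket.ajax.AjaxRequestTarget;
import org.apache.wicket.ajax.attributes.AjaxRequestAttributes;
import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.markup.html.basic.Label;
import org.apache.wicket.markup.html.link.BookmarkablePageLink;

// Sketch: the link's href is a stable, bookmarkable URL that crawlers can
// index; a click behavior hijacks it for Ajax users.
public class MediaPage extends WebPage {

    public MediaPage() {
        final Label panel = new Label("mediaPanel", "collapsed");
        panel.setOutputMarkupId(true);
        add(panel);

        BookmarkablePageLink<Void> link =
                new BookmarkablePageLink<>("permissionsLink", PermissionsPage.class);
        link.add(new AjaxEventBehavior("click") {
            @Override
            protected void onEvent(AjaxRequestTarget target) {
                panel.setDefaultModelObject("expanded"); // Ajax path for users
                target.add(panel);
            }
            @Override
            protected void updateAjaxAttributes(AjaxRequestAttributes attributes) {
                super.updateAjaxAttributes(attributes);
                attributes.setPreventDefault(true); // don't also follow the href
            }
        });
        add(link);
    }
}
```

Crawlers that don't run JavaScript simply follow the stable href to PermissionsPage, so there is no session-bound wicket:interface URL for them to 404 on later.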
