Latest Step-by-Step Installation Guide for Dummies: Nutch 0.9
By Peter P. Wang, Zillionics LLC
Run it by clicking the Configure Tomcat icon.
Click the Start button to start the Apache Tomcat service.
You should then see Tomcat's default page in your browser when you go to http://localhost:8080
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<description>Peter Pu Wang
<description> Nutch spiderman
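Once things are configured, running the crawl is easy. Just use the crawl command. Its options include:
-dir dir names the directory to put the crawl in.
-threads threads determines the number of threads that will fetch in parallel.
-depth depth indicates the link depth from the root page that should be crawled.
-topN N determines the maximum number of pages that will be retrieved at each level up to the depth.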
For example, a typical call might be:
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
Start by testing your configuration with shallow crawls that sharply limit the number of pages fetched at each level (-topN), and watch the output to check that desired pages are fetched and undesirable pages are not. Once you are confident of the configuration, an appropriate depth for a full crawl is around 10. The number of pages per level (-topN) for a full crawl can range from tens of thousands to millions, depending on your resources.
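The test-then-full-crawl routine above can be sketched as a pair of commands. This is only a dry-run illustration: crawl_cmd is a hypothetical helper (not part of Nutch) that prints the command line instead of running it, and the directory names and the depth and topN values are assumptions chosen for the example.

```shell
#!/bin/sh
# Dry-run sketch: print crawl commands instead of running them.
# crawl_cmd is a hypothetical helper, not part of Nutch.
crawl_cmd() {
  # $1 = output directory (-dir), $2 = link depth (-depth), $3 = pages per level (-topN)
  printf 'bin/nutch crawl urls -dir %s -depth %s -topN %s\n' "$1" "$2" "$3"
}

# Shallow test crawl: few levels and few pages, so the output is quick to inspect.
crawl_cmd crawl-test 2 10

# Full crawl once the test output looks right (illustrative values only).
crawl_cmd crawl 10 50000
```

To actually run a printed command, copy it into your shell from the Nutch directory. Keeping the test crawl in its own folder (crawl-test here) means it never mixes with the index you later serve.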
d. Set Your Searcher Directory
Next, navigate to your Nutch webapp folder, then to WEB-INF/classes. Edit the nutch-site.xml file there and add a searcher.dir property whose value is the path to your crawl folder, written here as the placeholder your_crawl_folder_here (make sure you don't end up with two sets of <configuration></configuration> tags!).
For example, if your Nutch directory resides at C:\nutch-0.9.0 and you specified crawl as the directory after the -dir option, then enter C:\nutch-0.9.0\crawl\ instead of your_crawl_folder_here.
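As a rough sketch of what the searcher directory setting might look like, the snippet below writes a minimal nutch-site.xml containing a single searcher.dir property. The helper name write_searcher_config and the output file name nutch-site.xml.example are assumptions made for illustration. The sketch writes to a separate example file so it cannot overwrite your real configuration, and it assumes a Unix-style shell such as Cygwin.

```shell
#!/bin/sh
# Sketch: write a minimal nutch-site.xml that points the searcher at a crawl folder.
# write_searcher_config is a hypothetical helper name, not part of Nutch.
write_searcher_config() {
  crawl_dir=$1   # the folder you passed to -dir when crawling
  out_file=$2    # where to write the XML
  cat > "$out_file" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>$crawl_dir</value>
  </property>
</configuration>
EOF
}

# Example values from the text above; adjust the path to your own install.
write_searcher_config 'C:\nutch-0.9.0\crawl\' nutch-site.xml.example
cat nutch-site.xml.example
```

If your existing nutch-site.xml already has a <configuration> element, copy only the <property> block into it rather than replacing the whole file.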
Reload the application. Use the Tomcat Manager and click the "Reload" command for the nutch application, or restart Tomcat using the Windows Services tool.
Open a browser and enter the URL http://localhost:8080. The Nutch search page should appear. As long as you've defined the correct location of your Nutch index directory (as described above), clicking search should yield results.
Congratulations! It rocks!
Peter P. Wang