I am trying to integrate Apache Nutch 2.1 with a MySQL server on Windows 8. I am following the tutorial at http://nlp.solutions.asia/?p=180 and have made the following changes to apache-nutch-2.1:
Downloaded apache-nutch-2.1-src.zip and extracted it.
Uncommented the following in ivy/ivy.xml:
<dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/>
Commented out the default SQL properties and added Gora properties for MySQL in conf/gora.properties:
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=root
gora.sqlstore.jdbc.password=root
Added the required properties to conf/nutch-site.xml.
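For reference, the tutorial's nutch-site.xml addition boils down to pointing the Gora storage layer at the SQL store; a minimal sketch (the SqlStore class name is the one Nutch 2.x uses with gora-sql; the agent name is required by Nutch, but its value here is just an example):
<property>
  <name>storage.data.store.class</name>
  <!-- Persist crawl data through gora-sql, i.e. into MySQL -->
  <value>org.apache.gora.sql.store.SqlStore</value>
</property>
<property>
  <name>http.agent.name</name>
  <!-- Any non-empty identifier; Nutch refuses to crawl without one -->
  <value>My Nutch Spider</value>
</property>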
Executed the ant runtime command from the command prompt; it created the /runtime directory.
Added a seeds.txt file inside the /runtime/local/urls directory containing www.apache.nutch.org.
Added +^http://([a-z0-9]*.)*nutch.org/ to both the domain-urlfilter.txt and regex-urlfilter.txt files inside the /runtime/local/conf directory.
When I run the crawl command from a Cygwin terminal, the following exception occurs:
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Abhijeet\mapred\staging\Abhijeet530509219\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:219)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
I have read on the internet that Hadoop does not work on Windows, which should be fine since I am not using Hadoop for storing data; I am using MySQL.
Can anybody suggest what I am doing wrong?
I have used Nutch 2 on both Windows and Linux. To run it on Windows, you just need this Hadoop 1.0.3 patch installed: https://github.com/congainc/patch-hadoop_7682-1.0.x-win
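If I recall the patch's README correctly, you drop the patch jar into runtime/local/lib and then register its LocalFileSystem replacement in conf/nutch-site.xml, roughly like this (verify the class name against the repository's README):
<property>
  <name>fs.file.impl</name>
  <!-- Patched local filesystem that skips the chmod call failing on Windows (HADOOP-7682) -->
  <value>com.conga.services.hadoop.patch.hadoop_7682.WinLocalFileSystem</value>
</property>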
I am new to Selenium and am having some issues.
I am using the WebDriverManager in connection with Selenium. This is my code:
WebDriverManager.chromedriver().setup();
When I run the code on my local system (Windows 10 OS), everything runs perfectly fine. When I run my code as a web application on our Linux Server (Ubuntu 18.04, Tomcat 9), I get the following exception:
io.github.bonigarcia.wdm.config.WebDriverManagerException: Exception reading resolution cache as a properties file
at io.github.bonigarcia.wdm.cache.ResolutionCache.<init>(ResolutionCache.java:86)
at io.github.bonigarcia.wdm.WebDriverManager.getResolutionCache(WebDriverManager.java:1490)
at io.github.bonigarcia.wdm.WebDriverManager.clearResolutionCache(WebDriverManager.java:780)
at io.github.bonigarcia.wdm.WebDriverManager.handleException(WebDriverManager.java:1263)
at io.github.bonigarcia.wdm.WebDriverManager.manage(WebDriverManager.java:1060)
at io.github.bonigarcia.wdm.WebDriverManager.setup(WebDriverManager.java:393)
....
Caused by: java.io.IOException: No such file or directory
at java.base/java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.base/java.io.File.createNewFile(File.java:1035)
at io.github.bonigarcia.wdm.cache.ResolutionCache.<init>(ResolutionCache.java:75)
I am using Selenium 4.2.1 and WebDriverManager 5.1.0.
On our Linux server, I have installed Google Chrome as described here. Running
google-chrome --version
returns
Google Chrome 102.0.5005.115
so I think Chrome should be installed correctly.
Does anybody have an idea?
It seems it is failing to create the resolution cache, which is a properties file created by default at the following path: ~/.cache/selenium. You can try to create that path manually (although WebDriverManager should have been able to create it when it does not exist).
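If the underlying issue is that the Tomcat user cannot write under its home directory, you can also relocate the cache to a directory that user owns. A minimal sketch using the WebDriverManager 5.x fluent API (the path is hypothetical, and there is an equivalent wdm.cachePath system property; check your version's documentation):
import io.github.bonigarcia.wdm.WebDriverManager;

public class DriverSetup {
    public static void main(String[] args) {
        // Relocate the driver cache (which also contains the resolution cache)
        // to a directory the Tomcat user can create and write to.
        WebDriverManager.chromedriver()
                .cachePath("/opt/tomcat/wdm-cache") // hypothetical writable directory
                .setup();
    }
}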
In any case, to debug it properly, you need to check the WebDriverManager traces. For that, you need to include a Logback configuration file (for example, like this) in your project classpath. The name of this file should be src/test/resources/logback-test.xml (if you want logs only for your tests) or src/test/resources/logback.xml (if you want logs for both tests and application code). Then, you can use the following line to set the level to TRACE:
<logger name="io.github.bonigarcia" level="TRACE" />
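For completeness, a minimal logback-test.xml wrapping that logger could look like this (a standard console appender setup, nothing specific to WebDriverManager):
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- TRACE output for WebDriverManager only -->
  <logger name="io.github.bonigarcia" level="TRACE" />
  <root level="INFO">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>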
For further info about logging with SLF4J and Logback, you can see the following tutorial.
After a successful installation and configuration of HBase on top of HDFS on our local servers, I did the same configuration on our OVH VPS machines; however, I am getting a strange error.
The entire setup starts fine; however, when I try to create a table from the hbase shell, I get the following error:
2017-05-20 11:59:19,256 ERROR [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=16000] master.MasterRpcServices: Region server prdhad001,16020,1495274311971 reported a fatal error:
ABORTING region server prdhad001,16020,1495274311971: The coprocessor org.apache.hadoop.hbase.client.coprocessor.AggregateImplementation threw java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.coprocessor.AggregateImplementation
Cause: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.coprocessor.AggregateImplementation
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
and after that the entire HBase instance gets corrupted and throws numerous errors. It seems that it doesn't load some jars properly; however, the jar is present inside the lib folder.
My configuration:
Virtualization: kvm
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-514.16.1.el7.x86_64
Architecture: x86-64
Hadoop 2.7.3
HBase 1.3.0
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HBASE_CLASSPATH=/usr/local/hbase/lib/
I figured it out: inside hbase-site.xml I had the following property added:
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
which is used for the coprocessor, which we don't use at the moment. Removing this part fixed the issue. However, I guess that if we want to use the coprocessor in the future we might encounter this issue again, so any other help will be appreciated.
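One detail worth noting for later: the stack trace complains about org.apache.hadoop.hbase.client.coprocessor.AggregateImplementation, but the server-side endpoint class (the one hbase.coprocessor.region.classes expects, shipped in hbase-server) lives in a different package. If we re-enable aggregation in the future, the property should presumably look like this (worth double-checking against the HBase 1.3 documentation):
<property>
  <name>hbase.coprocessor.region.classes</name>
  <!-- Server-side endpoint; the client counterpart is
       org.apache.hadoop.hbase.client.coprocessor.AggregationClient -->
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>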
I have a program that uses an SQLite database. It works fine on Windows (as an exported jar or directly in Eclipse), but it fails when I move it to a Linux server (the plan is to run it at certain intervals as a cron job). I'm exporting it to a jar from Eclipse and packing sqlite-jdbc4-3.8.2-SNAPSHOT.jar with it. The error is this:
/$ /usr/bin/java -jar /home/username/Software.jar /home/username/
java.lang.UnsatisfiedLinkError: /tmp/sqlite-3.8.2-amd64-libsqlitejdbc.so: /tmp/sqlite-3.8.2-amd64-libsqlitejdbc.so: failed to map segment from shared object: Operation not permitted
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.sqlite.core.NativeDB._open(Ljava/lang/String;I)V
at org.sqlite.core.NativeDB._open(Native Method)
at org.sqlite.core.DB.open(DB.java:161)
at org.sqlite.core.CoreConnection.open(CoreConnection.java:145)
at org.sqlite.core.CoreConnection.<init>(CoreConnection.java:66)
at org.sqlite.jdbc3.JDBC3Connection.<init>(JDBC3Connection.java:21)
at org.sqlite.jdbc4.JDBC4Connection.<init>(JDBC4Connection.java:23)
at org.sqlite.SQLiteConnection.<init>(SQLiteConnection.java:44)
at org.sqlite.JDBC.createConnection(JDBC.java:113)
at org.sqlite.JDBC.connect(JDBC.java:87)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:207)
....
So before you ask: I've made sure that sqlite-3.8.2-amd64-libsqlitejdbc.so in /tmp/ has all permissions (rwxrwxrwx). Still, that native library is causing problems. It does get copied into the /tmp/ folder, though. That being said, I totally suck at Linux, and for that reason I'm pretty much clueless about what to try next.
What should I do? Switch connector?
EDIT:
Solved the problem by using System.setProperty("java.io.tmpdir", "/home/username/");. Apparently it couldn't execute the native library from the tmp folder, probably because the file was created by root ("failed to map segment ... Operation not permitted" is also the classic symptom of /tmp being mounted noexec). I also had to revert to sqlite-jdbc-3.7.2.jar because the new one crashes on Linux.
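A minimal sketch of that workaround (the database path is hypothetical; the property must be set before the first connection is opened, because the native library is extracted when the driver initializes, and newer xerial builds also honor a driver-specific org.sqlite.tmpdir property):
import java.sql.Connection;
import java.sql.DriverManager;

public class Main {
    public static void main(String[] args) throws Exception {
        // Extract the native library into a directory that is writable
        // and not mounted noexec, instead of the default /tmp.
        System.setProperty("java.io.tmpdir", "/home/username/");
        Class.forName("org.sqlite.JDBC"); // ensure the driver is registered
        try (Connection c = DriverManager.getConnection("jdbc:sqlite:/home/username/data.db")) {
            System.out.println("Connected: " + !c.isClosed());
        }
    }
}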
I had the same problem, and I found the solution in this GitHub issue:
JAVA_OPTS=-Djava.io.tmpdir=/path/to/some/other/tmpdir bin/cerebro
Also look at this other SO answer.
On one server, and on my Windows laptop, producing PDFs with this method works fine:
http://www.databasesandlife.com/svg-to-pdf/
But on the other server I get this error:
org.apache.batik.transcoder.TranscoderException: Error while setting up PDFDocumentGraphics2D
Enclosed Exception:
Error while setting up fonts
at org.apache.fop.svg.PDFTranscoder.transcode(PDFTranscoder.java:189)
at org.apache.batik.transcoder.XMLAbstractTranscoder.transcode(Unknown Source)
at org.apache.batik.transcoder.SVGAbstractTranscoder.transcode(Unknown Source)
at org.apache.batik.apps.rasterizer.SVGConverter.transcode(Unknown Source)
at org.apache.batik.apps.rasterizer.SVGConverter.execute(Unknown Source)
I have been Googling and searching for hours, but to no avail. What can I do?
I tried installing the following packages but they didn't help:
sudo apt-get install gsfonts gsfonts-x11 gsfonts-other batik \
libbatik-java libxmlgraphics-commons-java \
libxmlgraphics-commons-java fop sun-java6-fonts
My situation is:
Debian 6.0.3
Sun Java version "1.6.0_26"
JARs: avalon-framework-4.2.0.jar batik-all-1.7.jar commons-io-1.3.1.jar commons-logging-1.0.4.jar fop-0.95.jar log4j-1.2.15.jar xml-apis-ext.jar xmlgraphics-commons-1.3.1.jar
Honestly, it is not a good idea to create such temporary files inside the Jetty directory structure from the Debian package; in case of an update, you may get into trouble. Such a cache directory should be located under /var, for instance /var/tmp.
According to the documentation, FOP is supposed to fall back to the temporary directory in case of failure, so your finding probably deserves a bug report.
Until it is fixed, you should set the cache-file option. Disabling the cache with use-cache is another way, but probably has performance impacts.
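In the FOP configuration file that would be roughly the following (element names taken from the FOP font-cache documentation; verify against the 0.95 manual):
<fop version="1.0">
  <!-- Relocate the font cache to a writable location... -->
  <cache-file>/var/tmp/fop-fonts.cache</cache-file>
  <!-- ...or disable caching entirely, at the cost of slower font setup:
  <use-cache>false</use-cache>
  -->
</fop>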
The approach to solving this problem: in log4j.properties I turned the log level up to TRACE.
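In log4j.properties that is a one-line change (scoped here to the FOP classes; use the root logger instead if you want everything):
log4j.logger.org.apache.fop=TRACE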
There I saw the extra log before the TranscoderException that I'd seen previously:
2012-02-28 11:51:24,863 DEBUG: org.apache.fop.fonts.FontCache:
Writing font cache to /usr/share/jetty/.fop/fop-fonts.cache
org.apache.batik.transcoder.TranscoderException:
Error while setting up PDFDocumentGraphics2D
Hooray for logs! (And for logging what the program is about to do, not just what it has done, so that if the operation fails, you know what it was attempting at the time.)
On Debian, the Jetty webserver runs under the user jetty and has its home directory at /usr/share/jetty/. However, the jetty user does not have write access to its own home directory, so this ~/.fop directory could not be created.
adrian@10770-02:~$ grep jetty /etc/passwd
jetty:x:107:111::/usr/share/jetty:/bin/false
adrian@10770-02:~$ ls -ld /usr/share/jetty
drwxr-xr-x 7 root root 4096 Feb 28 11:52 /usr/share/jetty/
I don't know whether this is by design, or a bug, but creating this directory so that Jetty could write it...
sudo mkdir -p -m 0777 /usr/share/jetty/.fop
...solved the problem.
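A narrower alternative to mode 0777, assuming the Debian package's jetty user, is to hand the directory to that user instead:
sudo mkdir -p /usr/share/jetty/.fop
sudo chown jetty: /usr/share/jetty/.fop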
I'm developing this application to be used specifically with Firefox (it's for internal use). Basically, we're using the sun.security classes to read Firefox's keystore and sign data with the certs we get.
I've tested this on several machines and the results are varying, I can't seem to pinpoint the reason.
I've tested it on the latest Ubuntu release with Firefox 3.6.13, using Java 1.6.0_22, and it works there. I also have a Windows XP laptop with the same Firefox version using Java 1.6.0_17, where it works as well.
There are two other Windows XP laptops on which it will not work, both giving the same error. They're running the same version of Firefox and using Java 1.6.0_17.
The error is:
java.security.ProviderException: Could not initialize NSS
at sun.security.pkcs11.SunPKCS11.<init>(SunPKCS11.java:183)
at sun.security.pkcs11.SunPKCS11.<init>(SunPKCS11.java:86)
at SignedMessage.SigningApplet.initializeCrypto(SigningApplet.java:327)
at SignedMessage.SigningApplet.init(SigningApplet.java:84)
at sun.plugin2.applet.Plugin2Manager$AppletExecutionRunnable.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: The specified procedure could not be found.
at sun.security.pkcs11.Secmod.nssLoadLibrary(Native Method)
at sun.security.pkcs11.Secmod.initialize(Secmod.java:186)
at sun.security.pkcs11.SunPKCS11.<init>(SunPKCS11.java:179)
... 5 more
Exception: java.security.ProviderException: Could not initialize NSS
From what I can tell, it can't find the native nssLoadLibrary routine. The configuration file points to the Firefox install directory (where it can grab the nss3.dll or libnss3.so file). It does this across all PCs, and all the paths in the configuration seem to be valid.
A sample config file, for what it's worth:
name=NSS
nssDbMode=readOnly
nssModule=keystore
nssSecmodDirectory="C:\\Documents and Settings\\user\\Application Data\\Mozilla\\firefox\\Profiles/8bzd2qqm.default"
nssLibraryDirectory=C:\Program Files\Mozilla Firefox
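For context, the provider initialization the applet performs is roughly the following (a sketch of the Java 6-era API; the SunPKCS11(String) constructor was removed in later Java versions, and the config path here is hypothetical):
import java.security.KeyStore;
import java.security.Security;
import sun.security.pkcs11.SunPKCS11;

public class NssInit {
    static KeyStore loadFirefoxKeyStore() throws Exception {
        // Load NSS through the SunPKCS11 bridge using a config file like the one above.
        SunPKCS11 provider = new SunPKCS11("C:\\path\\to\\nss.cfg"); // hypothetical path
        Security.addProvider(provider);
        KeyStore ks = KeyStore.getInstance("PKCS11", provider);
        ks.load(null, null); // read-only NSS DB; pass the PIN instead of null if one is set
        return ks;
    }
}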
I was hoping someone would have a clue, or maybe some tips on getting further with debugging. I'm at a loss here.
I'm likely much too late for this to be of use to you, but I was having similar problems, and adding dist\WINXXX_DBG.OBJ\lib to my PATH resolved this issue.
Quick answer: use the x86 JDK, not the x64 JDK, with NSS and JSS.
Quick test against a NSS certificate database:
keytool -list -v -storetype pkcs11 -providerClass sun.security.pkcs11.SunPKCS11 -providerArg NSS_CONFIG_FIPS
where NSS_CONFIG_FIPS is the path to a config file pointing to an NSS database. This command will fail with a stack trace matching the questioner's error if it is a JDK issue, and succeed if the JDK is configured properly (and the config file is correct).
Note that my stack trace included the message:
Caused by: java.io.IOException: %1 is not a valid Win32 application.
I ran dumpbin /headers on the NSS DLLs and found that the Mozilla-built binaries are all 32-bit. I installed the x86 JDK and repointed JAVA_HOME, and everything began working.
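The bitness check itself is quick; dumpbin ships with Visual Studio, and the machine field of the file header distinguishes 32-bit (14C / x86) from 64-bit (8664 / x64):
dumpbin /headers nss3.dll | findstr machine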
To Vivek's point, NSS and the accompanying executables are very sensitive to the presence of the libraries. Be sure all of the .dll, .lib, and .chk files are present on the path. In particular, modutil.exe will fail certain commands without the .chk files, and the error messages are not helpful. Your NSS lib folder will need to include the NSS and NSPR lib folders, the jss4.dll and jss4.lib files, and jss4.jar.
Also note that if you build NSS yourself, the libraries will not be signed with an approved code-signing cert, which will cause problems with JCA.