Implementation of crawler4j - java

I am attempting to get the basic form of crawler4j running as seen here. I have modified the first few lines by defining the rootFolder and numberOfCrawlers as follows:
public class BasicCrawlController {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Needed parameters: ");
            System.out.println("\t rootFolder (it will contain intermediate crawl data)");
            System.out.println("\t numberOfCralwers (number of concurrent threads)");
            return;
        }
        /*
         * crawlStorageFolder is a folder where intermediate crawl data is
         * stored.
         */
        String crawlStorageFolder = args[0];
        args[0] = "/data/crawl/root";
        /*
         * numberOfCrawlers shows the number of concurrent threads that should
         * be initiated for crawling.
         */
        int numberOfCrawlers = Integer.parseInt(args[1]);
        args[1] = "7";
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder(crawlStorageFolder);
No matter how I define them, I still receive the error:
Needed parameters:
rootFolder (it will contain intermediate crawl data)
numberOfCralwers (number of concurrent threads)
I think that I need to "set the parameters in the Run Configurations" window, but I do not know what that means. How can I properly configure this basic crawler to get it up and running?

After you compile the program with javac, you need to run it by typing the following:
java BasicCrawlController "arg1" "arg2"
The error is telling you that you aren't supplying args[0] or args[1] when you run the program. Also, why is there an args[1] = "7"; after you have already read the number of crawlers from that parameter? Assigning to args after reading it has no effect on numberOfCrawlers.
From what it looks like you are trying to do, remove the argument check at the top of main (the if block that prints "Needed parameters"), because you are trying to use hard-coded values anyway. Then set the crawlStorageFolder String to your directory path and numberOfCrawlers to 7, and you won't have to specify command-line parameters. If you want to use command-line parameters instead, get rid of the hard-coded values and specify them on the command line.
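If you want the program to work both with and without command-line arguments, one option is to fall back to the hard-coded values only when the arguments are missing. The sketch below is a hypothetical helper (the class name `CrawlerArgs` is made up, and it does not call the crawler4j API itself); it uses the folder `/data/crawl/root` and the 7 crawlers from the question as defaults.

```java
/** Hypothetical helper: resolves crawl settings from args, falling back to defaults. */
final class CrawlerArgs {
    static final String DEFAULT_STORAGE_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    final String crawlStorageFolder;
    final int numberOfCrawlers;

    private CrawlerArgs(String crawlStorageFolder, int numberOfCrawlers) {
        this.crawlStorageFolder = crawlStorageFolder;
        this.numberOfCrawlers = numberOfCrawlers;
    }

    /** Uses args[0] and args[1] when present, otherwise the hard-coded defaults. */
    static CrawlerArgs fromArgs(String[] args) {
        String folder = args.length > 0 ? args[0] : DEFAULT_STORAGE_FOLDER;
        int crawlers = args.length > 1 ? Integer.parseInt(args[1]) : DEFAULT_NUMBER_OF_CRAWLERS;
        return new CrawlerArgs(folder, crawlers);
    }

    public static void main(String[] args) {
        CrawlerArgs settings = fromArgs(args);
        System.out.println("rootFolder = " + settings.crawlStorageFolder);
        System.out.println("numberOfCrawlers = " + settings.numberOfCrawlers);
        // In BasicCrawlController you would pass settings.crawlStorageFolder to
        // config.setCrawlStorageFolder(...) and use settings.numberOfCrawlers
        // as the thread count when starting the crawl.
    }
}
```

With this approach, running the class with no arguments uses the defaults, while passing two arguments overrides both.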

Related

How can I handle spaces in arguments?

My app launches and loads the file if I run myApp /file:c:\nospaces.asd from cmd, but if I run myApp /file:c:\with spaces.asd it doesn't work, because the program receives two arguments: /file:c:\with and spaces.asd.
I know that I can run myApp "/file:c:\with spaces.asd" and it works like that from cmd. However, this isn't a good solution, because if I double-click the .asd file (a custom extension) and choose to launch it with my app, main receives the path as two arguments instead of one.
How can I go about fixing this issue so that my main will receive only one argument when double clicking the file?
You could join the arguments together if the file is not found. Something like this:
import java.io.File;

public class MyApp {
    public static void main(String[] args) {
        String fileName = joinArgumentsToValidFileName(args);
        // ... load fileName here
    }

    // Joins args[0], args[1], ... with spaces until the result names an existing file
    public static String joinArgumentsToValidFileName(String[] args) {
        if (args.length == 0) {
            return "";
        }
        String fileName = args[0];
        int index = 1;
        while (!new File(fileName).exists() && index < args.length) {
            fileName += " " + args[index];
            index++;
        }
        return fileName;
    }
}
This assumes that the first argument (or arguments) must be the file name. Any additional arguments could be evaluated by remembering the offset index somehow (not included in the code above).
But note: this is not standard behavior for passing arguments to an application and could lead to confusion. So if you find a way to pass the file name as a single quote-wrapped argument, don't do this!
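One way to "remember the offset index" is to return it together with the joined name. The sketch below is a hypothetical variant (the class `JoinedFileArg` is made up); it takes the existence check as a `Predicate<String>` so the joining logic can be tested without real files, and `main` plugs in `new File(path).exists()`.

```java
import java.io.File;
import java.util.function.Predicate;

/** Hypothetical variant of joinArgumentsToValidFileName that also reports the offset. */
final class JoinedFileArg {
    final String fileName;
    final int nextIndex; // index of the first argument after the file name

    JoinedFileArg(String fileName, int nextIndex) {
        this.fileName = fileName;
        this.nextIndex = nextIndex;
    }

    /** Joins args[0], args[1], ... with spaces until exists() accepts the result. */
    static JoinedFileArg join(String[] args, Predicate<String> exists) {
        if (args.length == 0) {
            return new JoinedFileArg("", 0);
        }
        StringBuilder name = new StringBuilder(args[0]);
        int index = 1;
        while (!exists.test(name.toString()) && index < args.length) {
            name.append(' ').append(args[index]);
            index++;
        }
        return new JoinedFileArg(name.toString(), index);
    }

    public static void main(String[] args) {
        JoinedFileArg result = join(args, path -> new File(path).exists());
        System.out.println("file name: " + result.fileName);
        // Everything from nextIndex onward is treated as an ordinary extra argument
        for (int i = result.nextIndex; i < args.length; i++) {
            System.out.println("extra argument: " + args[i]);
        }
    }
}
```

If no prefix ever matches an existing file, all arguments end up joined and `nextIndex` equals `args.length`, which mirrors the original loop.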
What happens when you double-click depends on what is registered in Windows for that file type, i.e., the launch command stored in the Windows registry (the same mechanism that defines the launch command for .jar files).
These are some similar questions asked on SO you can take a look at:
Running JAR file on Windows
java can run jar from cmd but not by double clicking
They mention the useful part: make sure the registered command quotes the %1 placeholder and has the %* wildcard after it, for example "c:\...\javaw.exe" -jar "%1" %*

Packaging a jar with preconfigured command line arguments

I am wondering if there's a way to create a jar that includes some command line arguments in it, the arguments that are usually passed in the command line when one tries to start up the jar (these parameters are then passed on to the main function). Basically instead of starting my app with
java -jar myapp.jar "arg1" "arg2", I want to start my app with
java -jar myapp.jar
and have "arg1" and "arg2" passed to the main function.
The reason behind this is that I want to deploy this to different environments, and I want my jar to contain different parameters according to the environment it's being deployed at.
Or is there another way to achieve similar results?
Cheers.
PS: I'm looking for a Maven solution.
Edit: I'll add a complete example to make this a bit more clear:
Let's say I have 2 environments: "Production" and "Test". I want to run the jar the same way no matter which environment I deploy it to. So I always want to run it with:
java -jar myapp.jar
But for both environments to work correctly, I need the Production jar to start its main method with the argument "prod", and the Test jar to start its main method with the argument "test".
If I correctly understood your problem, in your main() you could define a simple logic to handle the case where you do not specify any input parameter; the logic could retrieve the desired values according to the correct platform/env.
As an example:
public class Test01
{
    public static void main(String... aaa)
    {
        // Check input
        if (aaa.length == 0) {
            /* Insert logic to retrieve the value you want, depending on the platform/environment.
             * A trivial example could be: */
            aaa = new String[2];
            aaa[0] = "First value";
            aaa[1] = "Second value";
        }
        // Processing, e.g. print the 2 input values
        System.out.println(aaa[0] + ", " + aaa[1]);
    }
}
FYI, I created a runnable jar using Eclipse and start the application with either
java -jar Test01.jar
or
java -jar Test01.jar arg1 arg2
Hope this helps!
One solution is to change main(String[] args) to read the values from environment variables if they are not present in the passed arguments.
String user;
String password;
if (args.length < 2)
{
    user = System.getenv("appUser");
    password = System.getenv("appPassword");
} else {
    user = args[0];
    password = args[1];
}
You can also create another class with a main function that will call the real one.
public class CallerMyApp {
    // main must be static, otherwise "java -cp myapp.jar CallerMyApp" cannot find it
    public static void main(String[] args) {
        String[] realArgs = {System.getenv("appUser"), System.getenv("appPassword")};
        MyApp.main(realArgs);
    }
}
Then, to execute it, run something like:
java -cp myapp.jar CallerMyApp
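Another option, closer to "a jar that contains its own parameters", is to bundle a properties file inside the jar and fall back to it when no arguments are given. The sketch below assumes a hypothetical resource named `app.properties` with a key `app.args` (neither name is a Java or Maven convention). With Maven, one way to produce a different value per environment is to enable resource filtering and define the value in separate profiles (for example `prod` and `test`), so that each build writes its own `app.args` into the jar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical launcher: falls back to arguments stored in a bundled properties file. */
final class BundledArgs {
    // Assumed resource name and key; choose your own.
    static final String RESOURCE = "/app.properties";
    static final String KEY = "app.args";

    /** Returns args unchanged if any were given, else the whitespace-separated KEY value. */
    static String[] resolve(String[] args, Properties props) {
        if (args.length > 0) {
            return args;
        }
        String packaged = props.getProperty(KEY, "").trim();
        return packaged.isEmpty() ? new String[0] : packaged.split("\\s+");
    }

    /** Loads RESOURCE from the classpath (e.g. from inside the jar); empty if absent. */
    static Properties loadBundled() throws IOException {
        Properties props = new Properties();
        try (InputStream in = BundledArgs.class.getResourceAsStream(RESOURCE)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        String[] effective = resolve(args, loadBundled());
        System.out.println(String.join(", ", effective));
        // Hand 'effective' to the real entry point here, e.g. MyApp.main(effective).
    }
}
```

Arguments given on the command line still take precedence, so the same jar can be overridden manually when needed.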

java.lang.ArrayIndexOutOfBoundsException: 0

I am learning java using a book. There is this exercise that I can't get to work properly. It adds two doubles using the java class Double. When I try to run this code in Eclipse it gives me the error in the title.
public static void main(String[] args) {
    Double d1 = Double.valueOf(args[0]);
    Double d2 = Double.valueOf(args[1]);
    double result = d1.doubleValue() + d2.doubleValue();
    System.out.println(args[0] + "+" + args[1] + "=" + result);
}
Problem
This ArrayIndexOutOfBoundsException: 0 means that the index 0 is not a valid index for your array args[], which in turn means that your array is empty.
In this particular case of a main() method, it means that no argument was passed on to your program on the command line.
Possible solutions
If you're running your program from the command line, don't forget to pass 2 arguments in the command (2, because you're accessing args[0] and args[1]).
If you're running your program in Eclipse, you should set the command line arguments in the run configuration. Go to Run > Run configurations... and then choose the Arguments tab for your run configuration and add some arguments in the program arguments area.
Note that you should handle the case where not enough arguments are given, with something like this at the beginning of your main method:
if (args.length < 2) {
    System.err.println("Not enough arguments received.");
    return;
}
This would fail gracefully instead of making your program crash.
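The same idea can be extended to arguments that are present but not numeric, which would otherwise crash with a NumberFormatException. This is a minimal sketch (the class name `AddDoubles` is made up); it uses `Double.parseDouble` instead of `Double.valueOf(...).doubleValue()`, which gives the same value without the intermediate `Double` object.

```java
/** Sketch of the book exercise with argument-count and number-format checks. */
final class AddDoubles {
    /** Returns the formatted sum, or an error message if the input is unusable. */
    static String add(String[] args) {
        if (args.length < 2) {
            return "Not enough arguments received.";
        }
        try {
            double d1 = Double.parseDouble(args[0]);
            double d2 = Double.parseDouble(args[1]);
            return args[0] + "+" + args[1] + "=" + (d1 + d2);
        } catch (NumberFormatException e) {
            return "Arguments must be numbers: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(add(args));
    }
}
```

For example, running it with the program arguments `1.5 2` prints `1.5+2=3.5`.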
This code expects to get two arguments when it's run (the args array).
The fact that accessing args[0] causes a java.lang.ArrayIndexOutOfBoundsException means you aren't passing any.

Look for previous working directory to implement "cd -"

I am currently implementing a shell with limited functionality in Java. The shell's scope has restricted requirements too; the task is to model a Unix shell as closely as I can.
While implementing the cd command, I referred to a Basic Shell Commands page, which mentions that cd can go back to the last directory I was in with the command "cd -".
I am given only an interface with the method public String execute(File presentWorkingDirectory, String stdin).
I would like to know whether there is a Java API call with which I can retrieve the previous working directory, or whether there is an existing implementation of this command.
I know one simple implementation is to declare a variable that stores the previous working directory. However, the shell itself (the part that takes in the command with its options) creates a new thread each time a command tool is executed. Hence I do not think it is advisable for the "main" thread to store the previous working directory.
Update (6-Mar-'14): Thanks for the suggestions! I have discussed this with the person coding the shell and added a variable to store the previous working directory. Below is the sample code for sharing:
import java.io.File;

public class CdTool extends ATool implements ICdTool {
    private static String previousDirectory;

    //Constructor
    /**
     * Create a new CdTool instance so that it represents an unexecuted cd command.
     *
     * @param arguments
     *            the argument that is to be passed in to execute the command
     */
    public CdTool(final String[] arguments) {
        super(arguments);
    }

    /**
     * Executes the tool with arguments provided in the constructor
     *
     * @param workingDir
     *            the current working directory path
     *
     * @param stdin
     *            the additional input from the stdin
     *
     * @return the absolute path of the new directory, or an error message
     *         if the directory could not be changed
     */
    @Override
    public String execute(final File workingDir, final String stdin) {
        setStatusCode(0);
        final String newDirectory;
        // Compare string contents with equals(); == only compares references
        if ("-".equals(this.args[0]) && previousDirectory != null) {
            newDirectory = previousDirectory;
        } else {
            newDirectory = this.args[0];
        }
        // Resolve the target once instead of calling changeDirectory() twice
        final File target = changeDirectory(newDirectory);
        if (target == null) {
            setStatusCode(DIRECTORY_ERROR_CODE);
            return DIRECTORY_ERROR_MSG;
        }
        // Remember the directory we are leaving so that "cd -" can return to it
        previousDirectory = workingDir.getAbsolutePath();
        return target.getAbsolutePath();
    }
}
P.S: Please note that this is not the full implementation of the code, and this is not the full functionality of cd.
A real shell (at least Bash) stores the current working directory path in the PWD environment variable and the old working directory path in OLDPWD. Rewriting PWD does not change your working directory, but rewriting OLDPWD really does change where cd - will take you.
Try this:
cd /tmp
echo "$OLDPWD" # /home/palec
export OLDPWD='/home'
cd - # changes working directory to /home
I don’t know how you implement the shell functionality (namely, how you represent the current working directory; usually it’s an inherent property of the process, maintained by the kernel), but I think you really have to keep the old working directory in an extra variable.
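As far as I know, the Java standard library has no call that returns the previous working directory (or that changes the JVM's own working directory), so the shell has to track it itself. The sketch below models the PWD/OLDPWD pair described above in a hypothetical `ShellDirs` object that the shell would own and pass to its tools; the class and method names are made up for illustration.

```java
import java.io.File;

/** Hypothetical per-shell state mirroring Bash's PWD and OLDPWD variables. */
final class ShellDirs {
    private File pwd;
    private File oldPwd; // null until the first successful cd

    ShellDirs(File start) {
        this.pwd = start.getAbsoluteFile();
    }

    File pwd() {
        return pwd;
    }

    /** Changes to target ("-" means OLDPWD); returns false if the target is unusable. */
    boolean cd(String target) {
        File next;
        if ("-".equals(target)) {
            if (oldPwd == null) {
                return false; // Bash reports "OLDPWD not set" in this case
            }
            next = oldPwd;
        } else {
            File candidate = new File(target);
            // Relative paths are resolved against the shell's current directory
            next = candidate.isAbsolute() ? candidate : new File(pwd, target);
        }
        if (!next.isDirectory()) {
            return false;
        }
        oldPwd = pwd;                  // remember where we were ...
        pwd = next.getAbsoluteFile();  // ... then move
        return true;
    }
}
```

Because `cd -` sets OLDPWD to the directory it leaves, running it twice in a row toggles between the same two directories, as in Bash.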
By the way, the shell also forks for each command it executes (except for the built-in ones). The current working directory is a property of a process. When a command is started, it can change its own current working directory, but that does not affect the shell’s. Only the cd command (which is a built-in) can change the shell’s current working directory.
If you want to keep more than one previous working directory, create a LinkedList, add each new presentWorkingDirectory at the end, and when you want to go back, use linkedList.removeLast() (or pollLast(), which returns null instead of throwing when the list is empty) to get the last working directory.
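A minimal sketch of that history, assuming a hypothetical `DirectoryHistory` class. It uses `ArrayDeque` through the `Deque` interface; `LinkedList` also implements `Deque` and would work the same way. Note that `ArrayDeque` rejects null elements, so only real directories should be pushed.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical history of directories the shell has left, newest last. */
final class DirectoryHistory {
    private final Deque<File> visited = new ArrayDeque<>();

    /** Call on each successful directory change with the directory being left. */
    void push(File previous) {
        visited.addLast(previous);
    }

    /** Returns the most recently left directory, or null if the history is empty. */
    File back() {
        return visited.pollLast();
    }

    int size() {
        return visited.size();
    }
}
```

Calling `back()` repeatedly walks back through the directories in reverse order of visiting, similar in spirit to Bash's `pushd`/`popd` stack.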

How to read and write text files from the main in java

The static method main receives an array of strings. The array should have two elements: the path where the files are located (at index 0) and the name of the files to process (at index 1). For example, if the name was “Walmart”, then the program should use “Walmart.cmd” (from which it will read commands) and “Walmart.pro” (from which it will read/write products).
I don't want anyone to write the code for me because this is something I need to learn. However I've been reading this through and the wording is confusing. If someone could help me understand what it wants from me through pseudo-code or an algorithm it would be greatly appreciated.
Where I'm confused is how to initialize arg[0] and arg[1] and exactly what they are being initialized to.
The main method's String array input argument consists of whatever String arguments you pass to the program's main method when you run the program. For example, here is a simple program that loops over args and prints a nice message with each argument's index and value on a separate line:
package com.example;

public class MainExample {
    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            System.out.printf("args[%d]=%s\n", i, args[i]);
        }
    }
}
Once you've compiled the program, you can run it on the command-line and pass it some arguments:
java -cp . com.example.MainExample eh? be sea 1 2 3 "multiple words"
Output:
args[0]=eh?
args[1]=be
args[2]=sea
args[3]=1
args[4]=2
args[5]=3
args[6]=multiple words
So let me explain:
Create a class Inventory: if you don't know how to create a class, search for exactly that.
The static method main: every executable class in Java (at least when run from the console) has a main method. Search for "java main method"; probably in the same place you will also see the arguments it receives.
When you learn about the arguments of the main method, you will understand the 'args' parameter it must have.
You will have to study the String class; search for "java String class".
You will have to study the File class; search for "java File class".
In the end, everything else is just logic, and I believe you have learned some at this point.
import java.io.File;

public class Inventory { // class inventory
    public static void main(String[] args) // main method
    {
        if (args.length == 2) { // check if args contains two elements
            String filePath = args[0];
            String fileName = args[1];
            filePath += System.getProperty("file.separator") + fileName;
            File fileCMD = new File(filePath + ".cmd");
            //fileCMD.createNewFile();
            File filePRO = new File(filePath + ".pro");
            //filePRO.createNewFile();
        }
        else {
            // Write the code to print a usage message (e.g. "Usage: java Inventory <path> <name>")
            // saying the number of parameters is incorrect, then exit the program.
        }
    }
}
This is what I've understood. Basically you have to write a program to create two files, one called fileName.cmd and the other fileName.pro. You have to construct the path of the files using the arguments (input parameters of the main method) and system's file separator. If the arguments don't have two elements you have to print the 'invalid' message. That's it.
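The same path construction can also be written with `java.nio.file.Path`, whose `resolve` method inserts the platform's separator for you. This is a small sketch with a made-up class name (`InventoryPaths`); it only builds the two paths and deliberately leaves reading and writing the files to you.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: derive the .cmd and .pro paths from the two main arguments. */
final class InventoryPaths {
    final Path commandFile;
    final Path productFile;

    InventoryPaths(String directory, String baseName) {
        Path dir = Paths.get(directory);
        // resolve() adds the separator, like file.separator in the answer above
        this.commandFile = dir.resolve(baseName + ".cmd");
        this.productFile = dir.resolve(baseName + ".pro");
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java InventoryPaths <path> <name>");
            return;
        }
        InventoryPaths paths = new InventoryPaths(args[0], args[1]);
        System.out.println("commands: " + paths.commandFile);
        System.out.println("products: " + paths.productFile);
    }
}
```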
Where I'm confused is how to initialize arg[0] and arg[1] and exactly what they are being initialized to.
You have to use the command line to pass the arguments and launch the program, with something like the following in cmd or a terminal:
java Inventory thePath theFileName
That's how they get initialized.
