Crawling GitHub with JGit

I'm trying to crawl a GitHub Wiki with JGit.
When I tried it with one URL, it worked perfectly fine. Then I tried it with another random URL and got an error.
Please see the extract of my code:
import java.io.File;
import java.io.IOException;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;
public class Main {
// with this URL I get an error
String url = "https://github.com/radiant/radiant.wiki.git";
// this URL works
// String url = "https://github.com/WardCunningham/Smallest-Federated-Wiki.wiki.git";
public static void main(String[] args) {
Main m = new Main();
m.jgitTest();
System.out.println("Done!");
}
public void jgitTest() {
try {
File localPath = File.createTempFile("TestGitRepository", "");
localPath.delete();
Git.cloneRepository().setURI(url).setDirectory(localPath).call();
} catch (IOException | GitAPIException e) {
System.err.println("exception: " + e.getMessage());
e.printStackTrace();
}
}
}
This is the stack trace:
Exception in thread "main" org.eclipse.jgit.dircache.InvalidPathException: Invalid path (contains separator ':'): How-To:-Create-an-Extension.textile
at org.eclipse.jgit.dircache.DirCacheCheckout.checkValidPathSegment(DirCacheCheckout.java:1243)
at org.eclipse.jgit.dircache.DirCacheCheckout.checkValidPathSegment(DirCacheCheckout.java:1225)
at org.eclipse.jgit.dircache.DirCacheCheckout.checkValidPath(DirCacheCheckout.java:1185)
at org.eclipse.jgit.dircache.DirCacheCheckout.processEntry(DirCacheCheckout.java:311)
at org.eclipse.jgit.dircache.DirCacheCheckout.prescanOneTree(DirCacheCheckout.java:290)
at org.eclipse.jgit.dircache.DirCacheCheckout.doCheckout(DirCacheCheckout.java:408)
at org.eclipse.jgit.dircache.DirCacheCheckout.checkout(DirCacheCheckout.java:393)
at org.eclipse.jgit.api.CloneCommand.checkout(CloneCommand.java:236)
at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:127)
at Main.jgitTest(Main.java:21)
at Main.main(Main.java:13)
If you visit the wiki page of the URL that doesn't work (https://github.com/radiant/radiant/wiki), you will find this page: How To: Create an Extension.
The title of this page is the cause of the error: Invalid path (contains separator ':'): How-To:-Create-an-Extension.textile.
I assume I need to escape all output.

I suppose you are on Windows. On Windows you can't create a file with ":" in its name. JGit should handle this somehow, so I suppose this is a bug in JGit.

I had the same problem with pure git, and this answer helped me:
git config core.protectNTFS false
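If you need the wiki pages as files on a Windows disk, one possible workaround (an assumption, not something tested in this thread) is to skip JGit's checkout, for example with `setNoCheckout(true)` on the clone command, then read the files from the repository yourself and write them under names Windows accepts. The sketch below covers only the renaming step. `WikiFileNames` and `toWindowsSafe` are hypothetical names, and the helper does not handle reserved device names such as `CON`, trailing dots or spaces, or two pages that end up with the same name.

```java
import java.util.regex.Pattern;

// Hypothetical helper: maps a single file name that is legal in Git but not on
// Windows (e.g. "How-To:-Create-an-Extension.textile") to one Windows accepts.
final class WikiFileNames {
    // Characters Windows rejects in file names, plus control characters 0-31.
    // Apply this to one path segment at a time, since '/' is replaced too.
    private static final Pattern RESERVED = Pattern.compile("[<>:\"/\\\\|?*\\x00-\\x1F]");

    private WikiFileNames() {}

    static String toWindowsSafe(String name) {
        return RESERVED.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        // prints How-To_-Create-an-Extension.textile
        System.out.println(toWindowsSafe("How-To:-Create-an-Extension.textile"));
    }
}
```

Replacing with `_` is an arbitrary choice; any character Windows allows would do, as long as you keep a mapping back to the original page names if you need them.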

Related

How to authorize a user using JGit

I'm creating an application in Java using JGit. As part of this I need to authenticate a user, and I want to output whether the user exists or not. Currently I get an exception when the user is not authorized. Below is my code.
import java.io.File;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.transport.UsernamePasswordCredentialsProvider;
public class AuthenticateanUser {
public static void main(String[] args) throws Exception {
final String REMOTE_URL = "https://myRepo.git";
// prepare a new folder for the cloned repository
File localPath = File.createTempFile("TestGitRepository", "");
localPath.delete();
// then clone
try (Git result = Git.cloneRepository().setURI(REMOTE_URL)
.setCredentialsProvider(new UsernamePasswordCredentialsProvider("myId", "myPwd"))
.setDirectory(localPath).call()) {
System.out.println("Having repository: " + result.status());
}
}
}
When I run the above code with correct credentials, I get the output
Having repository:XXXXX
If I give wrong credentials, I get this error:
Exception in thread "main" org.eclipse.jgit.api.errors.TransportException: https://myRepo.git: not authorized
Instead of this, I want to print "Invalid credentials".
Please let me know where I am going wrong and how I can fix this.
Thanks

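You can catch the exception, for example:
import org.eclipse.jgit.api.errors.TransportException;
try (Git result = Git.cloneRepository().setURI(REMOTE_URL)
.setCredentialsProvider(new UsernamePasswordCredentialsProvider("myId", "myPwd"))
.setDirectory(localPath).call()) {
System.out.println("Having repository: " + result.status());
} catch (TransportException te) {
// note: TransportException is not specific to bad credentials; other transport failures can raise it too
System.out.println("Invalid credentials");
}
You should not tell the user whether the account exists or not: if you tell that to an attacker, they can conclude that they already have a valid username.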

"Image not valid, but TextSearch is switched off!" error - Sikuli on Mac

I am trying to automate a desktop application on Mac using Sikuli and Eclipse.
Source code:
import org.sikuli.script.FindFailed;
import org.sikuli.script.ImagePath;
import org.sikuli.script.Screen;
public class TextEditorExample {
public static void main(String[] args) throws FindFailed {
// TODO Auto-generated method stub
Screen s=new Screen();
System.out.println(ImagePath.getBundlePath());
s.click("spotlight_icon.png");
s.find("spotlight.png");
s.type("spotlight.png","finder");
s.click("applications.png");
s.click("texteditor_icon.png");
s.find("texteditor.png");
s.type("texteditor.png","Sikuli Example");
}
}
But I'm getting the following error:
[error] Image: Image not valid, but TextSearch is switched off!
[error] RunTimeAPI: Wait: Abort: unknown
[error] RunTimeAPI: ImageMissing: spotlight_icon.png
Path of the Sikuli script:
/Users/adamin/Desktop/Automation/SikuliExample/src/TextEditorExample.java
Paths of the images:
/Users/adamin/Desktop/Automation/SikuliExample/src/spotlight_icon.png
/Users/adamin/Desktop/Automation/SikuliExample/src/spotlight.png
/Users/adamin/Desktop/Automation/SikuliExample/src/applications.png
/Users/adamin/Desktop/Automation/SikuliExample/src/texteditor_icon.png
/Users/adamin/Desktop/Automation/SikuliExample/src/texteditor.png
Can anybody help me solve this issue?

The image path is set by default to your project root folder, and Sikuli only looks for patterns there. Set the bundle path manually to wherever your files are:
ImagePath.setBundlePath("fullpath");
Alternatively, place your files in whatever folder is returned by:
System.out.println(ImagePath.getBundlePath());
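Before pointing `ImagePath.setBundlePath` at a folder, a plain-Java check can confirm that the images really are there, which rules out the `ImageMissing` cause. This is a hypothetical helper (`ImageCheck.missingImages`), not part of the Sikuli API; the folder and file names are the ones from the question.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check (plain Java, no Sikuli calls): which image files are
// missing from the folder you intend to pass to ImagePath.setBundlePath?
final class ImageCheck {
    private ImageCheck() {}

    static List<String> missingImages(Path folder, String... imageNames) {
        List<String> missing = new ArrayList<>();
        for (String name : imageNames) {
            if (!Files.isRegularFile(folder.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Folder taken from the question; adjust it to your own project layout.
        Path folder = Paths.get("/Users/adamin/Desktop/Automation/SikuliExample/src");
        List<String> missing = missingImages(folder, "spotlight_icon.png", "spotlight.png",
                "applications.png", "texteditor_icon.png", "texteditor.png");
        System.out.println(missing.isEmpty()
                ? "All images found in " + folder
                : "Missing from " + folder + ": " + missing);
    }
}
```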

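Use a Pattern with the full path to the image:
import org.sikuli.script.FindFailed;
import org.sikuli.script.Pattern;
import org.sikuli.script.Screen;
// folder that contains the images (taken from the question)
String path = "/Users/adamin/Desktop/Automation/SikuliExample/src/";
Pattern pattern = new Pattern(path + "spotlight_icon.png");
Screen s = new Screen();
try {
s.click(pattern);
} catch (FindFailed e) {
e.printStackTrace();
}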

This error most likely occurs when the image cannot be loaded. In the meantime, try this approach:
try{
String path = "path of your image";
Pattern target = new Pattern(path);
Screen scr = new Screen();
scr.click(target);
}
catch(Exception e)
{
e.printStackTrace();
}

URI Schema: Infinite command prompts are opening

I went through the following documentation and tried to register my own URI scheme, myDocs:
https://msdn.microsoft.com/en-us/library/aa767914(v=vs.85).aspx
Following is my Java program. It takes a command-line argument and opens that URL in the browser.
import java.awt.Desktop;
import java.io.IOException;
public class URIOpen {
public static void main(String args[]) {
if (args.length == 0) {
return;
}
String uri = args[0];
try {
Desktop.getDesktop().browse(java.net.URI.create(uri));
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
}
I updated the (Default) value field of the command key as follows:
"C:\Program Files (x86)\Java\jdk1.8.0_102\bin\java" -cp "C:\Users\Krishna\Documents\Study\Miscellaneous\examples" "URIOpen" "%1"
When I try to run the command myDocs:http://google.com, I end up opening infinite command prompts.
The following is my URI schema entry structure in the registry. Any help on this?

Your solution ends up opening infinite command prompts because:
you registered your custom URIOpen class to be run by the system whenever it has to handle a URI with the myDocs: scheme;
when URIOpen executes the line Desktop.getDesktop().browse(java.net.URI.create(uri));, the system again receives a URI with the same scheme (myDocs:), so it starts a new command that runs your class again, and again, and again ...
You probably want to change your code to something like this:
try {
java.net.URI theURI = java.net.URI.create(uri);
// System.out.println(theURI.getScheme()); => myDocs
String uriBrowsablePart = theURI.getRawSchemeSpecificPart();
// System.out.println(uriBrowsablePart); => http://google.com
Desktop.getDesktop().browse(java.net.URI.create(uriBrowsablePart));
// the above statement will open default browser on http://google.com
} catch (IOException e) {
System.out.println(e.getMessage());
}
Try replacing your try-catch block with my suggestion and see if it works as required.
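A slightly more defensive version of the same idea, as a sketch: unwrap the `myDocs:` part and only pass `http`/`https` targets on, so a URI that would be routed back to the same handler (for example `myDocs:myDocs:...`) is refused instead of starting yet another command prompt. `MyDocsTarget` and `browsableTarget` are hypothetical names; in `URIOpen` you would call `Desktop.getDesktop().browse(...)` on the returned URI only when it is present.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for the URIOpen handler: turns "myDocs:http://google.com"
// into "http://google.com" and refuses anything that could loop back to myDocs:.
final class MyDocsTarget {
    private static final String OWN_SCHEME = "mydocs";

    private MyDocsTarget() {}

    static Optional<URI> browsableTarget(String arg) {
        URI outer = URI.create(arg);
        // Only unwrap URIs that really use our scheme (schemes are case-insensitive).
        if (outer.getScheme() == null
                || !outer.getScheme().toLowerCase(Locale.ROOT).equals(OWN_SCHEME)) {
            return Optional.empty();
        }
        URI inner = URI.create(outer.getRawSchemeSpecificPart());
        String innerScheme = inner.getScheme() == null
                ? "" : inner.getScheme().toLowerCase(Locale.ROOT);
        // Hand only http(s) URLs to the browser; another myDocs: URI would
        // re-launch this program.
        if (innerScheme.equals("http") || innerScheme.equals("https")) {
            return Optional.of(inner);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // In URIOpen: browsableTarget(args[0]).ifPresent(u -> Desktop.getDesktop().browse(u))
        // (browse throws IOException, so wrap it accordingly).
        System.out.println(browsableTarget("myDocs:http://google.com"));
    }
}
```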

Extracting Text From JPG

I've tried this code and added the needed JAR files, but I'm still getting an error message: Exception in thread "main" java.lang.UnsatisfiedLinkError: Unable to load library 'libtesseract302'.
Is there a complete tutorial on how to extract text, and what should be done to fix this error? Any help is appreciated...
import net.sourceforge.tess4j.*;
import java.io.File;
public class ExtractTxtFromImg {
public static void main(String[] args) {
File imgFile = new File("C:\\Documents and Settings\\rueca\\Desktop\\sampleImg.jpg");
Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
// Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping
try {
String result = instance.doOCR(imgFile);
System.out.println(result);
} catch (Exception e) {
System.err.println(e.getMessage());
}
}
}

In addition to adding the JARs, you also need to add the native libraries. You can do so with -Djava.library.path="C:\[absolute path to dir containing *.dll files and such]"
Note that you need to provide the directory, not the file itself.
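To check the "directory, not the file" point, a small diagnostic can look for the native file in each directory of a search path. It assumes the library file is named the way `System.mapLibraryName` predicts (on Windows, `libtesseract302` becomes `libtesseract302.dll`); Tess4J loads Tesseract through JNA, whose lookup rules may differ, so treat a miss here as a hint rather than proof. `NativeLibFinder` is a hypothetical name.

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical diagnostic: which directory of a search path (such as the value
// passed with -Djava.library.path) contains the native library file?
final class NativeLibFinder {
    private NativeLibFinder() {}

    static Optional<File> find(String searchPath, String libName) {
        // Platform-specific file name, e.g. "libtesseract302.dll" on Windows.
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            // Each entry must be a directory; an entry pointing at the .dll itself will not match.
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        System.out.println("java.library.path = " + path);
        System.out.println("found: " + find(path, "libtesseract302"));
    }
}
```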

java.net.URL: refer to a file in a parent directory

I have a very basic question. I need a URL object, but the file is in a parent directory relative to the project.
For instance, if I do
File testFile = new File("../../data/myData.xml");
it works perfectly fine; it finds the file.
However,
URL testURL = new URL("file:///../../data/myData.xml");
gives
Exception in thread "main" java.io.FileNotFoundException: /../../data/myData.xml
Any idea how to solve or work around this without changing the location of the data?
Thanks a lot in advance
Altober

You can use this:
URL testURL = new File("../../data/myData.xml").toURI().toURL();
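The one-liner above works because `new File(...)` resolves the relative path against the working directory (the `user.dir` system property) before it is turned into an absolute `file:` URL. Below is a sketch of the same idea with `java.nio.file.Path`, which also collapses the `..` segments; `RelativeFileUrl.toUrl` is a hypothetical helper name.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Relative paths are resolved against the working directory; a "file:" URL
// string has no notion of a working directory, so resolve first, then convert.
final class RelativeFileUrl {
    private RelativeFileUrl() {}

    static URL toUrl(String relativePath) throws MalformedURLException {
        Path absolute = Paths.get(relativePath).toAbsolutePath().normalize();
        return absolute.toUri().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Both lines point to the same file; the first has the ".." segments collapsed.
        System.out.println(toUrl("../../data/myData.xml"));
        System.out.println(new File("../../data/myData.xml").toURI().toURL());
    }
}
```

Either way the result depends on the directory the program is started from; in an IDE that is usually the project folder.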

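import java.net.MalformedURLException;
import java.net.URL;
public class UrlTest { // class name is arbitrary
/**
* @param args
*/
public static void main(String[] args) {
try {
URL testUrl = new URL("file:///C:/Users/myName/Desktop/abc.txt");
System.out.println(testUrl.toString());
} catch (MalformedURLException e) {
e.printStackTrace();
}
}
}
The above code works fine; use the file: scheme and, if possible, a full path. Note the three slashes: in file://C:/... the C: would be parsed as the host name rather than as part of the path.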

Exception in thread "main" java.io.FileNotFoundException: /../../data/myData.xml
Note that it is looking for the parent directory of the root directory, not of the current directory.
I don't know if file URLs can refer to relative paths; try
new URL("file://../../data/myData.xml")
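For reference, the sketch below only parses (never opens) the two URL forms with `java.net.URI`, to show where the `/../../data/myData.xml` in the exception comes from: with `file:///` the authority is empty and the path is absolute, so `..` climbs from the root; with `file://` the first `..` is parsed as the authority (host) part rather than as a path segment. `FileUrlParsing` is a hypothetical class name.

```java
import java.net.URI;

// Parses (does not open) the file URLs discussed above.
final class FileUrlParsing {
    private FileUrlParsing() {}

    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    static String authorityOf(String url) {
        return URI.create(url).getAuthority();
    }

    public static void main(String[] args) {
        // Empty authority: the path is absolute, so ".." starts from the root.
        String threeSlashes = "file:///../../data/myData.xml";
        System.out.println(pathOf(threeSlashes));     // /../../data/myData.xml
        // Two slashes: the first ".." becomes the authority, not a path segment.
        String twoSlashes = "file://../../data/myData.xml";
        System.out.println(authorityOf(twoSlashes));  // ..
        System.out.println(pathOf(twoSlashes));       // /../data/myData.xml
    }
}
```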
