How to count inserted/deleted lines in JGit - java

When we do git log --shortstat we get the number of lines inserted, deleted, and changed. Something like:
1 file changed, 9 insertions(+), 3 deletions(-)
Please help me with getting the number of lines inserted, deleted, and changed.
I clone the repository to get the Git project onto my local machine. Here is the sample code:
RepoClone repoClone = new RepoClone();
repoClone.repoCloner();
repository = builder.setGitDir(repoClone.repoDir).setMustExist(true).build();
I am also able to get a TreeWalk:
TreeWalk treeWalk = getCommitsTreeWalk();
I am able to retrieve the file name, the number of commits per file, the LOC, and the number of developers who worked on each XML/Java file.
while (treeWalk.next()) {
    if (treeWalk.getPathString().endsWith(".xml") || treeWalk.getPathString().endsWith(".java")) {
        jsonDataset = new JSONObject();
        countDevelopers = new HashSet<String>();
        count = 0;
        logs = new Git(repository).log().addPath(treeWalk.getPathString()).call();
        for (RevCommit rev: logs) {
            countDevelopers.add(rev.getAuthorIdent().getEmailAddress());
            count++;
        }
        jsonDataset.put("FileName", treeWalk.getPathString());
        jsonDataset.put("CountDevelopers", countDevelopers.size());
        jsonDataset.put("CountCommits", count);
        jsonDataset.put("LOC", countLines(treeWalk.getPathString()));
        commitDetails.put(jsonDataset);
    }
}
Now, I want to retrieve the number of lines inserted and deleted for each file.

The following code snippet compares two commits and prints the changes. diffFormatter.scan() returns a list of DiffEntry objects, each of which describes an added, deleted or modified file. Each diff entry in turn has a list of HunkHeader objects, which describe the changes within that file.
// Create two commits to be compared
File file = new File( git.getRepository().getWorkTree(), "file.txt" );
writeFile( file, "line1\n" );
RevCommit oldCommit = commitChanges();
writeFile( file, "line1\nline2\n" );
RevCommit newCommit = commitChanges();
// Obtain tree iterators to traverse the tree of the old/new commit
ObjectReader reader = git.getRepository().newObjectReader();
CanonicalTreeParser oldTreeIter = new CanonicalTreeParser();
oldTreeIter.reset( reader, oldCommit.getTree() );
CanonicalTreeParser newTreeIter = new CanonicalTreeParser();
newTreeIter.reset( reader, newCommit.getTree() );
// Use a DiffFormatter to compare the old and new tree and return a list of changes
DiffFormatter diffFormatter = new DiffFormatter( DisabledOutputStream.INSTANCE );
diffFormatter.setRepository( git.getRepository() );
diffFormatter.setContext( 0 );
// scan() expects the old side first, then the new side
List<DiffEntry> entries = diffFormatter.scan( oldTreeIter, newTreeIter );
// Print the contents of the DiffEntries
for( DiffEntry entry : entries ) {
    System.out.println( entry );
    FileHeader fileHeader = diffFormatter.toFileHeader( entry );
    List<? extends HunkHeader> hunks = fileHeader.getHunks();
    for( HunkHeader hunk : hunks ) {
        System.out.println( hunk );
    }
}
I think with the information provided by DiffEntry and HunkHeader you should be able to get the desired --shortstat.
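One way the numbers could be tallied: in JGit, a HunkHeader can be converted into an edit list (HunkHeader.toEditList()), and each Edit describes a half-open line range on the old side (A) and on the new side (B). The length of the A range is the number of deleted lines and the length of the B range is the number of inserted lines. The sketch below shows only that arithmetic. To keep it runnable without the library it uses a hypothetical Region class as a stand-in for JGit's Edit; the class and method names are assumptions made for illustration.

```java
import java.util.List;

/** Sketch: summing changed line ranges into --shortstat style numbers. */
final class ShortStat {

    /**
     * Hypothetical stand-in for one edit region:
     * lines [beginA, endA) on the old side, lines [beginB, endB) on the new side.
     */
    static final class Region {
        final int beginA;
        final int endA;
        final int beginB;
        final int endB;

        Region(int beginA, int endA, int beginB, int endB) {
            this.beginA = beginA;
            this.endA = endA;
            this.beginB = beginB;
            this.endB = endB;
        }
    }

    /** Number of lines that exist only on the new side. */
    static int insertions(List<Region> regions) {
        int total = 0;
        for (Region r : regions) {
            total += r.endB - r.beginB;
        }
        return total;
    }

    /** Number of lines that exist only on the old side. */
    static int deletions(List<Region> regions) {
        int total = 0;
        for (Region r : regions) {
            total += r.endA - r.beginA;
        }
        return total;
    }

    /**
     * Simplified summary line. Real git output also uses singular forms
     * and omits zero counts; this sketch does not.
     */
    static String format(int files, int insertions, int deletions) {
        return files + (files == 1 ? " file changed, " : " files changed, ")
                + insertions + " insertions(+), "
                + deletions + " deletions(-)";
    }
}
```

With JGit itself, the same sums would be taken over the Edit objects of every hunk of every DiffEntry, using Edit.getLengthB() for insertions and Edit.getLengthA() for deletions. A modified line counts as one deletion plus one insertion, which matches how git reports it.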

Related

JGit IncorrectObjectTypeException while comparing commits

I have a Java program that mines a Git repository to extract method signature changes. I want to compare the differences between commits in Java, so I used this code:
List<DiffEntry> diffs= git.diff()
.setNewTree(newTreeIter)
.setOldTree(oldTreeIter)
.call();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DiffFormatter df = new DiffFormatter(out);
df.setRepository(git.getRepository());
for(DiffEntry diff : diffs)
{
    df.format(diff);
    diff.getOldId();
    if(diff.getNewPath().endsWith(".java") && diff.getChangeType()== DiffEntry.ChangeType.MODIFY){
        modifyItems=difN.getDiffs(diff.getOldId().name(),diff.getNewId().name());
    }
}
Here is the getDiffs method, which tries to compare two commits based on their IDs:
Repository repo = new FileRepository("xxx.git");
Git git = new Git(repo);
ObjectReader reader = git.getRepository().newObjectReader();
ObjectId headId = git.getRepository().resolve(headIdStr);
ObjectId oldId = git.getRepository().resolve(oldIdStr);
CanonicalTreeParser oldTreeIter = new CanonicalTreeParser();
oldTreeIter.reset(reader, oldId);
CanonicalTreeParser newTreeIter = new CanonicalTreeParser();
newTreeIter.reset(reader, headId);
List<DiffEntry> diffs= git.diff()
.setNewTree(newTreeIter)
.setOldTree(oldTreeIter)
.call();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DiffFormatter df = new DiffFormatter(out);
df.setRepository(git.getRepository());
for(DiffEntry diff : diffs)
{
    df.format(diff);
    diff.getOldId();
    String diffText = out.toString("UTF-8");
    out.reset();
}
return diffs;
I am trying to compare this commit with its previous one, but I am not sure whether this code is correct. Moreover, whenever I run this code, this error occurs:
Exception in thread "main" org.eclipse.jgit.errors.IncorrectObjectTypeException: Object e973f7fea67fd355623e1df75b0c756004afa55f is not a tree.

How do I do “git show sha1” using JGit

Basically I would like to read the contents of all files in a commit based on the commit hash.
I've tried the following:
try(RevWalk revWalk = new RevWalk(gitRepository))
{
    RevCommit commit = revWalk.parseCommit(ObjectId.fromString(commitSha));
    RevTree tree = commit.getTree();
    try(TreeWalk treeWalk = new TreeWalk(gitRepository))
    {
        treeWalk.addTree(tree);
        treeWalk.setRecursive(true);
        ObjectId entryId = null;
        while (treeWalk.next())
        {
            entryId = treeWalk.getObjectId(0);
        }
        ObjectLoader loader = gitRepository.open(entryId);
    }
    revWalk.dispose();
}
but it seems to be picking up files from previous commits as well.
EDIT: I realize that I wasn't very specific in my original post.
Let's say I make a commit (Commit1) where I add a file (File1). Then I make a commit (Commit2) where I add a different file (File2). Then I make another commit (Commit3) where I modified File2. I would now like to get the contents of File2 from Commit2 for whatever reason. Using the above, the treewalk will retrieve the contents of Commit2 AND Commit1 which is not what I want.
As you've noticed, Git does not store a commit as a diff to the prior commit; it stores a commit as a snapshot of the entire repository at that point in time.
This is not terribly obvious, because even git show <commitid> will provide you with a diff between a commit and its parent. But it becomes clear when you iterate over the contents of a commit like you've done.
If you want to emulate git show <commitid> and look at what changes were introduced by a commit, you'll need to compare it to its parent.
Git git = new Git(gitRepository);
// resolve() parses revision expressions such as "<sha>^{tree}";
// ObjectId.fromString() only accepts a plain 40-character hex ID
ObjectId newTreeId = gitRepository.resolve(commitSha + "^{tree}");
ObjectId oldTreeId = gitRepository.resolve(commitSha + "^^{tree}");
try (ObjectReader reader = gitRepository.newObjectReader()) {
    CanonicalTreeParser newTree = new CanonicalTreeParser();
    newTree.reset(reader, newTreeId);
    CanonicalTreeParser oldTree = new CanonicalTreeParser();
    oldTree.reset(reader, oldTreeId);
    DiffFormatter formatter = new DiffFormatter(System.out);
    formatter.setRepository(gitRepository);
    for (DiffEntry de : git.diff().setNewTree(newTree).setOldTree(oldTree).call())
    {
        /* Print the file diff */
        formatter.format(de);
    }
}
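The snapshot idea can be illustrated without JGit. In the sketch below each commit is modeled as a plain map from path to content ID, a hypothetical stand-in for a commit's tree, and the changes introduced by a commit are the entries that differ from its parent's map. The class name, method name and change labels are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: what a commit "introduced" is the difference between two full snapshots. */
final class SnapshotDiff {

    /**
     * Compares two snapshots (path -> content ID) and returns, per changed path,
     * one of "ADD", "MODIFY" or "DELETE". Unchanged paths are not reported.
     */
    static Map<String, String> changes(Map<String, String> parent, Map<String, String> child) {
        Set<String> paths = new TreeSet<>(parent.keySet());
        paths.addAll(child.keySet());

        Map<String, String> result = new TreeMap<>();
        for (String path : paths) {
            String before = parent.get(path);
            String after = child.get(path);
            if (before == null) {
                result.put(path, "ADD");
            } else if (after == null) {
                result.put(path, "DELETE");
            } else if (!before.equals(after)) {
                result.put(path, "MODIFY");
            }
        }
        return result;
    }
}
```

Applied to the example from the question, the snapshot of Commit2 contains both File1 and File2, but comparing it with the snapshot of Commit1 reports only File2 as added.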

How to determine who last changed a file with JGit

There is a good cookbook recipe for JGit that describes how to blame the author of a specific line in a file.
Now I want to know who last changed a file. Iterating over all lines to find the most recently changed one does not seem very elegant. Any ideas?
You can use the LogCommand with a path filter like this:
Iterable<RevCommit> iterable = git.log().addPath( "foo.txt" ).call();
RevCommit latestCommit = iterable.iterator().next();
The code looks for the latestCommit that modified foo.txt. I haven't tested the above snippet with merge commits or other commits that have more than one parent.
Note, however, that this solution may leak resources: the RevWalk that provides the iterator is created by the LogCommand but never closed.
In order to avoid the resource leak you can manually iterate the history like so:
RevCommit latestCommit = null;
String path = "file.txt";
try( RevWalk revWalk = new RevWalk( git.getRepository() ) ) {
    Ref headRef = git.getRepository().exactRef( Constants.HEAD );
    RevCommit headCommit = revWalk.parseCommit( headRef.getObjectId() );
    revWalk.markStart( headCommit );
    revWalk.sort( RevSort.COMMIT_TIME_DESC );
    revWalk.setTreeFilter( AndTreeFilter.create( PathFilter.create( path ), TreeFilter.ANY_DIFF ) );
    latestCommit = revWalk.next();
}
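The effect of combining a path filter with ANY_DIFF can be pictured on a simplified model: walk the history newest first and stop at the first commit in which the entry for the path differs from the parent's. The sketch below assumes a strictly linear history and uses a hypothetical Commit class (an ID plus a path-to-content-ID map); it does not use JGit and says nothing about merge commits.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Sketch: find the newest commit that changed a path, for a linear history only. */
final class LastChange {

    /** Hypothetical commit model: an ID plus a snapshot mapping path -> content ID. */
    static final class Commit {
        final String id;
        final Map<String, String> tree;

        Commit(String id, Map<String, String> tree) {
            this.id = id;
            this.tree = tree;
        }
    }

    /**
     * history is ordered newest first; the parent of each commit is the next element,
     * and the last element is the root commit (compared against an empty snapshot).
     */
    static Optional<String> lastCommitTouching(List<Commit> history, String path) {
        for (int i = 0; i < history.size(); i++) {
            String current = history.get(i).tree.get(path);
            String previous = i + 1 < history.size()
                    ? history.get(i + 1).tree.get(path)
                    : null;
            if (!Objects.equals(current, previous)) {
                return Optional.of(history.get(i).id);
            }
        }
        return Optional.empty();
    }
}
```

The author of the returned commit is then the person who last changed the file, which avoids inspecting individual lines.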

Lucene changing from RAMDirectory to FSDirectory - Content-Field missing

I'm new to Lucene and got stuck on a problem when switching from a RAMDirectory to an FSDirectory.
First, my code:
private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
new StandardAnalyzer(Version.LUCENE_43));
Directory DIR = FSDirectory.open(new File(INDEXLOC)); //INDEXLOC = "path/to/dir/"
// RAMDirectory DIR = new RAMDirectory();
// Index some made up content
IndexWriter writer =
new IndexWriter(DIR, iwc);
// Store both position and offset information
FieldType type = new FieldType();
type.setStored(true);
type.setStoreTermVectors(true);
type.setStoreTermVectorOffsets(true);
type.setStoreTermVectorPositions(true);
type.setIndexed(true);
type.setTokenized(true);
IDocumentParser p = DocumentParserFactory.getParser(f);
ArrayList<ParserDocument> DOCS = p.getParsedDocuments();
for (int i = 0; i < DOCS.size(); i++) {
    Document doc = new Document();
    Field id = new StringField("id", "doc_" + i, Field.Store.YES);
    doc.add(id);
    Field text = new Field("content", DOCS.get(i).getContent(), type);
    doc.add(text);
    writer.addDocument(doc);
}
writer.close();
// Get a searcher
IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(DIR));
// Do a search using SpanQuery
SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "zahl"));
TopDocs results = searcher.search(fleeceQ, 10);
for (int i = 0; i < results.scoreDocs.length; i++) {
    ScoreDoc scoreDoc = results.scoreDocs[i];
    System.out.println("Score Doc: " + scoreDoc);
}
IndexReader reader = searcher.getIndexReader();
AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
Spans spans = fleeceQ.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
int window = 2;// get the words within two of the match
while (spans.next() == true) {
    Map<Integer, String> entries = new TreeMap<Integer, String>();
    System.out.println("Doc: " + spans.doc() + " Start: " + spans.start() + " End: " + spans.end());
    int start = spans.start() - window;
    int end = spans.end() + window;
    Terms content = reader.getTermVector(spans.doc(), "content");
    TermsEnum termsEnum = content.iterator(null);
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
        // could store the BytesRef here, but String is easier for this
        // example
        String s = new String(term.bytes, term.offset, term.length);
        DocsAndPositionsEnum positionsEnum = termsEnum.docsAndPositions(null, null);
        if (positionsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            int i = 0;
            int position = -1;
            while (i < positionsEnum.freq() && (position = positionsEnum.nextPosition()) != -1) {
                if (position >= start && position <= end) {
                    entries.put(position, s);
                }
                i++;
            }
        }
    }
    System.out.println("Entries:" + entries);
}
It's just some code I found on a great website and wanted to try. Everything works great using the RAMDirectory, but if I change it to my FSDirectory, I get a NullPointerException like this:
Exception in thread "main" java.lang.NullPointerException at
com.org.test.TextDB.myMethod(TextDB.java:184) at
com.org.test.Main.main(Main.java:31)
The statement Terms content = reader.getTermVector(spans.doc(), "content"); seems to get no result and returns null, hence the exception. But why? With my RAMDirectory everything works fine.
It seems that the IndexWriter or the reader (I really don't know which) didn't write or read the field "content" properly. But I don't understand why it would be written in a RAMDirectory and not in an FSDirectory.
Does anybody have an idea?
Gave this a quick test run, and I can't reproduce your issue.
I think the most likely issue here is old documents in your index. The way this is written, every time it is run, more documents will be added to your index. Old documents from previous runs won't get deleted or overwritten; they'll just stick around. So, if you have run this before on the same directory, say, before you added the line type.setStoreTermVectors(true);, some of your results may be these old documents without term vectors, and reader.getTermVector(...) returns null if the document does not store term vectors.
Of course, anything indexed in a RAMDirectory will be dropped as soon as execution finishes, so the issue would not occur in that case.
Simple solution would be to try deleting the index directory and run it again.
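If you would rather clear the directory from code than by hand, a minimal sketch using only java.nio.file could look like this. The class and method names are made up for illustration, the path you pass in would be whatever INDEXLOC points to, and it assumes that no IndexWriter or reader still has the directory open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: remove an on-disk index directory before re-indexing. */
final class IndexDirCleaner {

    /** Deletes dir and everything below it; does nothing if dir does not exist. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

It would be called once before the IndexWriter is created, for example IndexDirCleaner.deleteRecursively(Paths.get(INDEXLOC)).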
If you want to start with a fresh index when you run this, you can set that up through the IndexWriterConfig:
private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
new StandardAnalyzer(Version.LUCENE_43));
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
That's a guess, of course, but seems consistent with the behavior you've described.

JGit detect rename in working copy

Context
I'm trying to detect possible file renames that occurred after the last commit, in a working copy.
In my example, I have a clean working copy and I run:
git mv old.txt new.txt
Running $ git status shows the expected result:
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# renamed: old.txt -> new.txt
I tried
Using a StatusCommand, I can see old.txt in the removed list, and new.txt in the added list.
But I can't find a way to link them together.
I'm aware of the existence of RenameDetector, but it works using DiffEntry, and I don't know how to get DiffEntries between HEAD and the Working Copy.
Never mind, found the answer.
JGit's API is very complicated.
TreeWalk tw = new TreeWalk(repository);
tw.setRecursive(true);
tw.addTree(CommitUtils.getHead(repository).getTree());
tw.addTree(new FileTreeIterator(repository));
RenameDetector rd = new RenameDetector(repository);
rd.addAll(DiffEntry.scan(tw));
List<DiffEntry> lde = rd.compute(tw.getObjectReader(), null);
for (DiffEntry de : lde) {
    if (de.getScore() >= rd.getRenameScore()) {
        System.out.println("file: " + de.getOldPath() + " copied/moved to: " + de.getNewPath());
    }
}
(This snippet also uses the Gitective library.)
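The score compared above is a similarity percentage from 0 to 100, and RenameDetector treats a deleted/added pair as a rename when the score reaches its rename score, which defaults to 60. To make the idea of a threshold concrete, here is a deliberately simplified line-based similarity. It is not JGit's actual algorithm, and the class and method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified illustration of a similarity score with a rename threshold. Not JGit's algorithm. */
final class RenameScoreSketch {

    /** Returns 0..100: lines shared by both files (with multiplicity) relative to the longer file. */
    static int score(List<String> oldLines, List<String> newLines) {
        int longer = Math.max(oldLines.size(), newLines.size());
        if (longer == 0) {
            return 100; // two empty files are identical
        }
        // Count how often each line occurs in the old file
        Map<String, Integer> remaining = new HashMap<>();
        for (String line : oldLines) {
            remaining.merge(line, 1, Integer::sum);
        }
        // Each line of the new file can match at most one unused old line
        int common = 0;
        for (String line : newLines) {
            Integer left = remaining.get(line);
            if (left != null && left > 0) {
                remaining.put(line, left - 1);
                common++;
            }
        }
        return common * 100 / longer;
    }

    /** A pair counts as a rename when its score reaches the threshold. */
    static boolean isRename(List<String> oldLines, List<String> newLines, int threshold) {
        return score(oldLines, newLines) >= threshold;
    }
}
```

A plain git mv leaves the content untouched, so under any such measure the pair gets the maximum score.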
If you want to use a path filter when collecting the DiffEntry list, both the old and the new path have to be included in the filter:
List<DiffEntry> diffs = git.diff()
.setOldTree(prepareTreeParser(repository, oldCommit))
.setNewTree(prepareTreeParser(repository, newCommit))
.setPathFilter(PathFilterGroup.createFromStrings(new String[]{"new/b.txt","b.txt"}))
.call();
RenameDetector rd = new RenameDetector(repository);
rd.addAll(diffs);
diffs = rd.compute();
Here is the code of the tree parser method, in case you need it:
private static AbstractTreeIterator prepareTreeParser(Repository repository, String objectId) throws IOException {
    try (RevWalk walk = new RevWalk(repository)) {
        RevCommit commit = walk.parseCommit(repository.resolve(objectId));
        RevTree tree = walk.parseTree(commit.getTree().getId());
        CanonicalTreeParser treeParser = new CanonicalTreeParser();
        try (ObjectReader reader = repository.newObjectReader()) {
            treeParser.reset(reader, tree.getId());
        }
        walk.dispose();
        return treeParser;
    }
}
