How to get conflicting lines with JGit

How to get conflicting lines with JGit - java

I'm developing the application, in which JGit is used.
After a pull, I have conflicting files. I can get it from
List<String> list = git.status().call().getConflicting();
The list contains files in conflict.
I know, that I can get conflicting files from
Map<String, int[][]> conflicts = git.pull().call().getMergeResult().getConflicts();
but it doesn't work if I restart my application. After the restart, I will have an empty map because I'm not able to redo the pull when the repository is in merging state.
How can I get conflicting lines by the name of file via JGit API?

You could try to use a ResolveMerger to re-run the merge like so:
ThreeWayMerger merger = StrategyResolve.newMerger(repository, true);
merger.merge(headCommit, fetchedCommit);
Note that the MergeCommand that is called during pull may use a different merge strategy. See MergeCommand ~ line 337 for details. However, make sure to create an in-core merger (the second argument must be true).
With merger.getMergeResults() you should be able to get the conflicting lines.
The whole approach, however, may fail because your work directory is already dirty with conflict markers (<<<<<<<<). Depending on your overall goal, I suggest reconsidering your approach to pull.
If you fetch changes from the upstream repository (without merging immediately) you can dry-run the merge as outlined above as often as necessary. The FetchResult returned by FetchCommand::call() contains information about the commit(s) that were fetched.

Related

Conditional Github Sync For Java Projects

I have two projects in Git named Zeus and Odin. Both contain some java packages, files, and libraries. One particular package named Olympos is common to both and contains 'almost' same files. Difference is that files within Olympos for Zeus may have methods that interact with DB, but those in Odin will never (though it will contain method with same name but only placeholder code). Please look at following example for further clarification:
Project: Zeus;
Package: Olympos;
File: DummyOne.java;
Method:
public void adapterDB(){
// do something to connect with DB
// do something DB specific- fire queries
}
Project: Odin;
Package: Olympos;
File: DummyOne.java;
Method:
public void adapterDB(){
// do nothing and return null;
}
So the problem- Olympos needs to be in sync at all the times for both the projects, such that when a file change is checked in to Zeus, it automatically syncs with that of Odin or vice-versa. But it has to happen conditionally:
- if new method is being checked in to any file that contains DB related operation (in Zeus), only the method header should sync to Odin, without the actual logic
- if new method is being checked in to any file that does NOT contain DB related operation (in either Zeus or Odin), it should completely match in both the packages.
Obviously, I want to shed off the additional time required to make such changes manually every time and then sync them up separately to these projects since there are like two dozen such changes every week.
Is this something possible using Git? Or there is perhaps more obvious solution outside Git (hope I am not missing elephant in the room).

Understanging Conflicts Merging Algorithm

I look at a merge marker that looked all screwed up. To give you the situation lets have this:
public void methodA() {
prepare();
try {
doSomething();
}
catch(Exception e) {
doSomethingElse();
}
}
Now comes in a merge (I use SourceTree for pull).
And the marker looks like this:
<<<<<<<<< HEAD
try {
doSomething();
}
catch(Exception e) {
doSomethingElse();
}
============================
private void methodB() {
doOtherStuff();
>>>>>>>> 9832432984384398949873ab
}
So what the pulled commit does is removing the methodA completely and adding methodB instead.
But you notice that there are some lines entirely missing.
From what I understand of the process, Git is trying a so called auto-merge and if this fails and conflicts where detected, the complete merge is expressed by parts marked with '<<<* HEAD' + before + '====' + after + '>>>* CommitID' and prepare a manual conflict resolution.
So why does it leave out some lines. It looks more like a bug to me.
I use Windows7 and the installed git version is 2.6.2.windows.1. While the newest version is 2.9, I wonder if anything is known about a git version having a merge problem of this magnitude? This is not the first time I experienced something like this... .

You are correct to be concerned: Git knows nothing of languages, and its built-in merge algorithm is based strictly on line-at-time comparisons. You do not have to use this built-in merge algorithm, but most people do because (a) it mostly just works, and (b) there are not that many alternatives.
Note that this depends on your merge strategy (-s argument); the text below is for the default recursive strategy. The resolve strategy is pretty similar to recursive; the octopus strategy applies to more than just two commits; and the ours strategy is entirely different (and is nothing like -X ours). You can also select alternative strategies or algorithms for specific files, using .gitattributes and "merge drivers". And, none of this applies to files that Git has decided to believe are "binary": for these, it does not even attempt merging. (I am not going to cover any of that here, just how the default recursive strategy treats files.)
How git merge works (when using the default -s recursive)
Merge starts with two commits: the current one (also called "ours", "local", and HEAD), and some "other" one (also called "theirs" and "remote")
Merge finds the merge base between these commits
Normally that's just one other commit: the one at the first point where the implied branches1 join up
In some special cases (multiple merge base candidates), Git must invent a "virtual merge base" (but we'll ignore these cases here)
Merge runs two diffs: git diff base local and git diff base other
These have rename detection turned on
You can run these same diffs yourself to see what merge will see
You can think of these two diffs as "what we did" and "what they did". The goal of a merge is to combine "what we did" and "what they did". The diffs are line based, come from a minimal edit distance algorithm,2 and are really just Git's guess about what we did, and what they did.
The output of the first diff (base-vs-local) tells Git which base files correspond to which local files, i.e., how to follow names from the current commit back to the base. Git can then use the base names to spot renames or deletes in the other commit as well. For the most part we can just ignore rename and delete issues, and also new-file-creation issues. Note that Git version 2.9 turns on rename detection by default for all diffs, not just merge diffs. (You can turn this on yourself in earlier Git versions by configuring diff.renames to true; see also the git config setting for diff.renameLimit.)
If a file is changed on only one side (base-to-local, or base-to-other), Git simply takes those changes. Git only has to do a three-way merge when a file is changed on both sides.
To perform a three-way merge, Git essentially walks through the two diffs (base-to-local and base-to-other), one "diff hunk" at a time, comparing the changed regions. If each hunk affects a different part of the original base file, Git just takes that hunk. If some hunk(s) affect the same part of the base file, Git tries to take one copy of whatever that change is.
For instance, if the local change says "add a close brace line" and the remote change says "add (the same place, same indentation) close brace line", Git will take just one copy of the close brace. If both say "delete a close brace line" Git will just delete the line once.
Only if the two diffs conflict—e.g., one says "add a close brace line indented 12 spaces" and the other says "add a close brace line indented 11 spaces" will Git declare a conflict. By default, Git writes the conflict into the file, showing the two sets of changes—and, if you set merge.conflictstyle to diff3, also showing the code from the merge-base version of the file.
Any non-conflicting diff hunks, Git applies. If there were conflicts, Git normally leaves the file in "conflicted merge" state. However, the two -X arguments (-X ours and -X theirs) modify this: with -X ours Git chooses "our" diff hunk in the conflict, and puts that change in, ignoring "their" change. With -X theirs Git chooses "their" diff hunk and puts that change in, ignoring "our" change. These two -X arguments guarantee that Git does not declare a conflict after all.
If Git is able to resolve everything on its own for this file, it does so: you get the base file, plus your local changes, plus their other changes, in the work-tree and in the index/staging-area.
If Git is not able to resolve everything on its own, it puts the base, other, and local versions of the file into the index/staging-area, using the three special nonzero index slots. The work-tree version is always "what Git was able to resolve, plus the conflict markers as directed by various configurable items."
Every index entry has four slots
A file such as foo.java is normally staged in slot zero. This means it is ready to go into a new commit now. The other three slots are empty, by definition, because there is a slot-zero entry.
During a conflicted merge, slot zero is left empty, and slots 1-3 are used to hold the merge base version, the "local" or --ours version, and the other or --theirs version. The work-tree holds the in-progress merge.
You can use git checkout to extract any of these versions, or git checkout -m to re-create the merge conflict. All successful git checkout commands update the work-tree version of the file.
Some git checkout commands leave the various slots undisturbed. Some git checkout commands write into slot 0, wiping out the entries in slots 1-3, so that the file is ready for commit. (To know which ones do what, you just have to memorize them. I had them wrong, in my head, for quite a while.)
You cannot run git commit until all unmerged slots have been cleared out. You can use git ls-files --unmerged to view unmerged slots, or git status for a more human-friendly version. (Hint: use git status. Use it often!)
Successful merge does not mean good code
Even if git merge successfully auto-merges everything, that does not mean the result is correct! Of course, when it stops with a conflict, this also means that Git was not able to auto-merge everything, not that what it has auto-merged on its own is correct. I like to set merge.conflictstyle to diff3 so that I can see what Git thought the base was, before it replaced that "base" code with the two sides of the merge. Often a conflict happens because the diff chose the wrong base—such as some matching braces and/or blank lines—rather than because there had to be an actual conflict.
Using the "patience" diff can held with poor base choice, at least in theory. I have not experimented with this myself. The new "compaction heuristic" in Git 2.9 is promising, but I have not experimented with this either.
You must always inspect and/or test the results of a merge. If the merge is already committed, you can edit files, build and test, git add the corrected versions, and use git commit --amend to shove the previous (incorrect) merge commit out of the way and put in a different commit with the same parents. (The --amend part of git commit --amend is false advertising. It does not change the current commit itself, because it can not; instead, it makes a new commit with the same parent IDs as the current commit, instead of the normal method of using the current commit's ID as the new commit's parent.)
You can also suppress the auto-commit of a merge with --no-commit. In practice, I have found little need for this: most merges mostly just work, and a quick eyeballing of git show -m and/or "it compiles and passes unit tests" catches problems. However, during a conflicted or --no-commit merge, a simple git diff will give you a combined diff (the same sort you get with git show without -m, after you commit the merge), which can be helpful, or may be more confusing. You can run more-specific git diff commands and/or inspect the three (base, local, other) slot entries, as Gregg noted in a comment.
Seeing what Git will see
Besides using diff3 as your merge.conflictstyle, you can see the diffs that git merge will see. All you need to do is run two git diff commands—the same two that git merge will run.
To do these, you must find—or at least, tell git diff to find—the merge base. You can use git merge-base, which literally finds the (or all) merge base(s) and prints them out:
$ git merge-base --all HEAD foo
4fb3b9e0570d2fb875a24a037e39bdb2df6c1114
This says that between the current branch and branch foo, the merge base is commit 4fb3b9e... (and there is only one such merge base). I can then run git diff 4fb3b9e HEAD and git diff 4fb3b9e foo. But there is an easier way, as long as I can assume that there is only the one merge base:
$ git diff foo...HEAD # note: three dots
This tells git diff (and only git diff) to find the merge base between foo and HEAD, and then compare that commit—that merge base—to commit HEAD. And:
$ git diff HEAD...foo # again, three dots
does the same thing, find the merge base between HEAD and foo—"merge base" is commutative so these should be the same as the other way around, like 7+2 and 2+7 are both 9—but this time diff the merge base against commit foo.1
(For other commands—things that are not git diff—the three-dot syntax produces a symmetric difference: the set of all commits that are on either branch, but not on both branches. For branches with a single merge base commit, this is "every commit after the merge base, on each branch": in other words, the union of the two branches, excluding the merge base itself and any earlier commits. For branches with multiple merge bases, this subtracts away all the merge bases. For git diff we just assume there's only the one merge base, and instead of subtracting it and its ancestors away, we use it as the left or "before" side of the diff.)
1In Git, a branch name identifies one particular commit, namely the tip of the branch. In fact, this is how branches actually work: a branch name names a specific commit, and in order to add another commit to the branch—branch here meaning the chain of commits—Git makes a new commit whose parent is the current branch-tip, then points the branch name at the new commit. The word "branch" can refer to either the branch name, or the entire chain of commits; we are supposed to figure out which one by context.
At any time, we can name one specific commit, and treat that as a branch, by taking that commit and all its ancestors: its parent, its parent's parent, and so on. When we hit a merge commit—a commit with two or more parents—in this process, we take all the parent commits, and their parents' parents, and so on.
2This algorithm is actually selectable. The default myers is based on an algorithm by Eugene Myers, but Git has a few other options.

In a merge, only the changes that contain conflicts are marked.
Changes in Rev A and different changes in Rev B, are directly merged in. Only changes in Rev A and Rev B at the same place are marked as conflicts. The user is notified that conflicts exist in the file and need to be resolved.
When you go to resolve the conflicts, the merged file with have the independent changes from both Rev A and Rev B already in place, and the conflicting markers for the conflicting sections.

Creating commit with Jgit and plumbing commands

I am trying to construct a commit with plumbing commands in JGit. Besides fetching the information, I use is basically these commands:
treeFormatter.append(folderName, FileMode.TREE, treeObjectId);
treeFormatter.append(fileName, FileMode.REGULAR_FILE, blobObjectId);
eventually
objectInserter.insert( treeFormatter );
And at the end setting the final tree into a commit. This works perfectly with some commits but with others although the files are there I can't push the repo. The bash says:
error: unpack failed: error Invalid tree (tree number): incorrectly
sorted
I found out here that
Tree entries are sorted by the byte sequence that comprises the entry name. However, for the purposes of the sort comparison, entries for tree objects are compared as if the entry name byte sequence has a trailing ASCII ‘/’ (0x2f).
So tried to add the files by a particular order based in the conversion into bytes of the object name (not file name), but comparing with actual commits from bash, I can't figure out which order does Git need to add the files.
So: Anyone knows how to use the plumbing methods in JGit to construct a commit with several files? I am pretty sure I just need the correct way of sorting objects but can't find out what is it

Just found out the solution,
You need to put the files in a particular order depending on the file name or the folder name, my problem is I was looking to the ObjectId.getName() which is this hash.

How to get all commits for a certain release in GitHub?

I know I can get all commits in a project using GET /repos/:owner/:repo/commits
Now I want to get all commits for a certain release of that project.
What should I do?

Judging by your answer to my question, you want the commits made since some tag. This will take a couple steps to complete, first you need to get the SHA for the tag in question. You'll want to use the git references API to get a specific reference. In the specific example that you linked you'll want to do
GET /repos/nasa/mct/git/refs/tags/v1.8b3
And you'll want to get the 'sha' attribute from the object stored in the 'object' attribute of the response object. With the 'sha' attribute, you'll want to use the commits API to list commits starting with that 'sha' so your request will look like this:
GET /repos/nasa/mct/commits?sha=%(sha_from_first_request)s
That will give you 30 commits per-page by default (if I remember correctly), so you should see if adding &per_page=100 to the end helps. I can't tell you exactly how to do this in Java, but I expect you'll be able to use one of the libraries written to interact with the API to make it easier.

MarkLogic: Move document from one directory to another on some condition

I'm new to MarkLogic and trying to implement following scenario with its Java API:
For each user I'll have two directories, something like:
1.1. user1/xmls/recent/
1.2. user1/xmls/archived/
When user is doing something with his xml - it's put to the "recent" directory;
When user is doing something with his next xml and "recent" directory is full (e.g. has some amount of documents, let's say 20) - the oldest document is moved to the "archived" directory;
User can request all documents from the "recent" directory and should get no more than 20 records;
User can remove something from the "recent" directory manually; In this case, if it had 20 documents, after deleting one it must have 19;
User can do something with his xmls simultaneously and "recent" directory should never become bigger than 20 entries.
Questions are:
In order to properly handle simultaneous adding of xmls to the "recent" directory, should I block whole "recent" directory when adding new entry (to actually add it, check if there are more than 20 records after adding, select the oldest 21st one and move it to the "archived" directory and do all these steps atomically)? How can I do it?
Any suggestions on how to implement this via Java API?
Is it possible to change document's URI (e.g. replace "recent" with "archived" in my case)?
Should I consider using MarkLogic's collections here?
I'm open to any suggestions and comments (as I said I'm new to MarkLogic and maybe my thoughts on how to handle described scenario are completely wrong).

You can achieve atomicity of a sequence of transactions using Multi-Statement Transactions (MST)
It is possible to MST from the Java API: http://docs.marklogic.com/guide/java/transactions#id_79848
It's not possible to change a URI. However, it is possible to use an MST to delete the old document and reinsert a new one using the new URI in one an atomic step. This would have the same effect.
Possibly, and judging from your use case, unless you must have the recent/archived information as part of the URI, it may be simpler to store this information in collections. However, you should read the documentation and evaluate for yourself: http://docs.marklogic.com/guide/search-dev/collections#chapter

Personally I would skip all the hassle with separate directories as well as collections. You would endlessly have to move files around, or changes their properties. It would be much easier to not calculate anything up front, and simply use lastModified property, or something alike, to determine most recent items at run-time.
HTH!

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.