Understanding Conflicts in Git's Merging Algorithm - java
I ran into a merge conflict whose markers looked all screwed up. To give you the situation, let's say we have this:
public void methodA() {
    prepare();
    try {
        doSomething();
    }
    catch(Exception e) {
        doSomethingElse();
    }
}
Now a merge comes in (I use SourceTree to pull), and the conflict markers look like this:
<<<<<<<<< HEAD
    try {
        doSomething();
    }
    catch(Exception e) {
        doSomethingElse();
    }
============================
private void methodB() {
    doOtherStuff();
>>>>>>>> 9832432984384398949873ab
}
So what the pulled commit does is remove methodA completely and add methodB instead.
But notice that some lines are missing entirely (for example, methodA's signature and the prepare(); call).
From what I understand of the process, Git tries a so-called auto-merge, and if that fails because conflicts were detected, the complete merge is expressed as parts marked with '<<<* HEAD' + before + '====' + after + '>>>* CommitID', ready for manual conflict resolution.
So why does it leave out some lines? It looks more like a bug to me.
I use Windows 7 and the installed Git version is 2.6.2.windows.1. The newest version is 2.9. Is anything known about a Git version having a merge problem of this magnitude? This is not the first time I have experienced something like this.
You are correct to be concerned: Git knows nothing of languages, and its built-in merge algorithm is based strictly on line-at-a-time comparisons. You do not have to use this built-in merge algorithm, but most people do because (a) it mostly just works, and (b) there are not that many alternatives.
Note that this depends on your merge strategy (-s argument); the text below is for the default recursive strategy. The resolve strategy is pretty similar to recursive; the octopus strategy applies to more than just two commits; and the ours strategy is entirely different (and is nothing like -X ours). You can also select alternative strategies or algorithms for specific files, using .gitattributes and "merge drivers". And, none of this applies to files that Git has decided to believe are "binary": for these, it does not even attempt merging. (I am not going to cover any of that here, just how the default recursive strategy treats files.)
How git merge works (when using the default -s recursive)
Merge starts with two commits: the current one (also called "ours", "local", and HEAD), and some "other" one (also called "theirs" and "remote")
Merge finds the merge base between these commits
Normally that's just one other commit: the one at the first point where the implied branches[1] join up
In some special cases (multiple merge base candidates), Git must invent a "virtual merge base" (but we'll ignore these cases here)
Merge runs two diffs: git diff base local and git diff base other
These have rename detection turned on
You can run these same diffs yourself to see what merge will see
You can think of these two diffs as "what we did" and "what they did". The goal of a merge is to combine "what we did" and "what they did". The diffs are line-based, come from a minimal edit distance algorithm,[2] and are really just Git's guess about what we did, and what they did.
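To make "line-based" concrete, here is a minimal Java sketch of a line diff between two versions of a file. It is an illustration built on assumptions, not Git's code. It uses a plain longest-common-subsequence table rather than the Myers algorithm Git uses by default, so on some inputs it may pick different (equally short) hunks. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LineDiff {
    // Returns a simple edit script: " x" = line kept, "-x" = line deleted from a,
    // "+x" = line added from b. Uses an O(n*m) longest-common-subsequence table
    // (NOT the Myers algorithm that Git uses by default).
    public static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add(" " + a.get(i));       // identical line: treated as unchanged
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("-" + a.get(i++));     // dropping a[i] keeps the longest match
            } else {
                out.add("+" + b.get(j++));     // b[j] has no partner in a
            }
        }
        while (i < n) out.add("-" + a.get(i++));
        while (j < m) out.add("+" + b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<String> base = List.of("prepare();", "try {", "doSomething();", "}");
        List<String> other = List.of("try {", "doOtherStuff();", "}");
        diff(base, other).forEach(System.out::println);
    }
}
```

On a trimmed-down version of the question's code, the sketch pairs up the identical `try {` and `}` lines as "unchanged" and reports everything else as deletions and additions. Which identical lines get matched this way is exactly the kind of guess that can make a conflict region look strange.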
The output of the first diff (base-vs-local) tells Git which base files correspond to which local files, i.e., how to follow names from the current commit back to the base. Git can then use the base names to spot renames or deletes in the other commit as well. For the most part we can just ignore rename and delete issues, and also new-file-creation issues. Note that Git version 2.9 turns on rename detection by default for all diffs, not just merge diffs. (You can turn this on yourself in earlier Git versions by configuring diff.renames to true; see also the git config setting for diff.renameLimit.)
If a file is changed on only one side (base-to-local, or base-to-other), Git simply takes those changes. Git only has to do a three-way merge when a file is changed on both sides.
To perform a three-way merge, Git essentially walks through the two diffs (base-to-local and base-to-other), one "diff hunk" at a time, comparing the changed regions. If each hunk affects a different part of the original base file, Git just takes that hunk. If some hunk(s) affect the same part of the base file, Git tries to take one copy of whatever that change is.
For instance, if the local change says "add a close brace line" and the remote change says "add (the same place, same indentation) close brace line", Git will take just one copy of the close brace. If both say "delete a close brace line" Git will just delete the line once.
Git declares a conflict only if the two diffs conflict: for example, one says "add a close brace line indented 12 spaces" and the other says "add a close brace line indented 11 spaces". By default, Git writes the conflict into the file and shows the two sets of changes. If you set merge.conflictstyle to diff3, it also shows the code from the merge-base version of the file.
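The rules above can be sketched in Java: take a change made on only one side, take one copy of an identical change, and declare a conflict otherwise. This is a deliberately simplified model with a loud assumption. All three versions must have the same number of lines, and line i must correspond across them, so there are no insertions or deletions. Real Git aligns regions using the two diffs instead. It also groups adjacent conflicting lines into one region, whereas this sketch writes one region per line. The marker labels and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreeWayLineMerge {
    // Simplified three-way merge. ASSUMPTION: base, ours and theirs have the same
    // number of lines, and line i corresponds across all three (no inserts/deletes).
    // Git instead aligns regions using the diffs base->ours and base->theirs.
    public static List<String> merge(List<String> base, List<String> ours,
                                     List<String> theirs, boolean diff3) {
        if (base.size() != ours.size() || base.size() != theirs.size()) {
            throw new IllegalArgumentException("sketch only handles equal-length inputs");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < base.size(); i++) {
            String b = base.get(i), o = ours.get(i), t = theirs.get(i);
            if (o.equals(t)) {
                out.add(o);          // unchanged, or both sides made the same change: one copy
            } else if (o.equals(b)) {
                out.add(t);          // only "they" changed this line
            } else if (t.equals(b)) {
                out.add(o);          // only "we" changed this line
            } else {
                // both sides changed the same line differently: write a conflict region
                out.add("<<<<<<< ours");
                out.add(o);
                if (diff3) {         // like merge.conflictstyle=diff3: also show the base
                    out.add("||||||| base");
                    out.add(b);
                }
                out.add("=======");
                out.add(t);
                out.add(">>>>>>> theirs");
            }
        }
        return out;
    }
}
```

Only the line both sides changed differently ends up between markers. Lines changed on one side only are already merged in when you open the file. That is why, in the question, lines outside the conflict region seem to have "vanished": Git had already applied the other side's deletion of them.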
Git applies any non-conflicting diff hunks. If there were conflicts, Git normally leaves the file in a "conflicted merge" state. However, the two -X arguments (-X ours and -X theirs) modify this. With -X ours, Git chooses "our" diff hunk in the conflict and puts that change in, ignoring "their" change. With -X theirs, Git chooses "their" diff hunk and puts that change in, ignoring "our" change. These two -X arguments guarantee that Git does not declare a conflict inside the file's contents after all. Whole-file conflicts, such as one side modifying a file that the other side deleted, can still occur.
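The key point about -X ours and -X theirs is that they only decide the conflicting hunks. A change made on just one side is still merged in. The Java sketch below shows this under a simplifying assumption: the three versions have equal length and are aligned line by line. FavorMerge and Favor are hypothetical names, not Git options.

```java
import java.util.ArrayList;
import java.util.List;

public class FavorMerge {
    enum Favor { OURS, THEIRS }

    // Sketch of -X ours / -X theirs. ASSUMPTION: equal-length, line-aligned inputs.
    // Only lines changed differently on BOTH sides are decided by `favor`;
    // a line changed on one side only is still taken from that side.
    static List<String> merge(List<String> base, List<String> ours,
                              List<String> theirs, Favor favor) {
        if (base.size() != ours.size() || base.size() != theirs.size()) {
            throw new IllegalArgumentException("sketch only handles equal-length inputs");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < base.size(); i++) {
            String b = base.get(i), o = ours.get(i), t = theirs.get(i);
            if (o.equals(t) || t.equals(b)) {
                out.add(o);                             // same change, or only we changed it
            } else if (o.equals(b)) {
                out.add(t);                             // only they changed it
            } else {
                out.add(favor == Favor.OURS ? o : t);   // real conflict: pick the favored side
            }
        }
        return out;
    }
}
```

Take base a, b, c, ours a, B1, c and theirs A, B2, c. Favoring ours yields A, B1, c, so their non-conflicting change to the first line survives. That is the difference from the ours strategy (-s ours), which keeps the current branch's tree and ignores the other side entirely.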
If Git is able to resolve everything on its own for this file, it does so: you get the base file, plus your local changes, plus their other changes, in the work-tree and in the index/staging-area.
If Git is not able to resolve everything on its own, it puts the base, other, and local versions of the file into the index/staging-area, using the three special nonzero index slots. The work-tree version is always "what Git was able to resolve, plus the conflict markers as directed by various configurable items."
Every index entry has four slots
A file such as foo.java is normally staged in slot zero. This means it is ready to go into a new commit now. The other three slots are empty, by definition, because there is a slot-zero entry.
During a conflicted merge, slot zero is left empty, and slots 1-3 are used to hold the merge base version, the "local" or --ours version, and the other or --theirs version. The work-tree holds the in-progress merge.
You can use git checkout to extract any of these versions, or git checkout -m to re-create the merge conflict. All successful git checkout commands update the work-tree version of the file.
Some git checkout commands leave the various slots undisturbed. Some git checkout commands write into slot 0, wiping out the entries in slots 1-3, so that the file is ready for commit. (To know which ones do what, you just have to memorize them. I had them wrong, in my head, for quite a while.)
You cannot run git commit until all unmerged slots have been cleared out. You can use git ls-files --unmerged to view unmerged slots, or git status for a more human-friendly version. (Hint: use git status. Use it often!)
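You may want to process the unmerged slots from a program instead of reading them by eye. A minimal Java sketch can parse the text that git ls-files --unmerged prints, which has one line per occupied slot in the form "<mode> <object> <stage><TAB><path>". The sketch only parses text you pass in; it does not run Git. It also assumes paths are not quoted (Git quotes unusual path names unless you use -z). The class name is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class UnmergedSlots {
    // Parses text in the format printed by `git ls-files --unmerged`:
    //   <mode> SP <object> SP <stage> TAB <path>
    // and returns, for each path, which of the stages 1 (base), 2 (ours), 3 (theirs) are occupied.
    static Map<String, Set<Integer>> parse(String output) {
        Map<String, Set<Integer>> stages = new TreeMap<>();
        for (String line : output.split("\n")) {
            if (line.isEmpty()) continue;
            int tab = line.indexOf('\t');
            if (tab < 0) throw new IllegalArgumentException("no TAB in: " + line);
            String[] fields = line.substring(0, tab).split(" ");  // mode, object, stage
            String path = line.substring(tab + 1);
            stages.computeIfAbsent(path, p -> new TreeSet<>()).add(Integer.parseInt(fields[2]));
        }
        return stages;
    }
}
```

A path with stages 1, 2 and 3 is an ordinary content conflict. A missing stage means that version does not exist. For example, there is no stage-3 entry when the other side deleted the file.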
Successful merge does not mean good code
Even if git merge successfully auto-merges everything, that does not mean the result is correct! Likewise, when it stops with a conflict, that only means Git was not able to auto-merge everything. It does not mean that what it did auto-merge on its own is correct. I like to set merge.conflictstyle to diff3 so that I can see what Git thought the base was, before it replaced that "base" code with the two sides of the merge. Often a conflict happens because the diff chose the wrong base (such as some matching braces and/or blank lines) rather than because there had to be an actual conflict.
Using the "patience" diff can help with poor base choice, at least in theory. I have not experimented with this myself. The new "compaction heuristic" in Git 2.9 is promising, but I have not experimented with this either.
You must always inspect and/or test the results of a merge. If the merge is already committed, you can edit files, build and test, git add the corrected versions, and use git commit --amend to shove the previous (incorrect) merge commit out of the way and put in a different commit with the same parents. (The --amend part of git commit --amend is false advertising. It does not change the current commit itself, because it can not; instead, it makes a new commit with the same parent IDs as the current commit, instead of the normal method of using the current commit's ID as the new commit's parent.)
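The difference between a normal commit and git commit --amend lies only in which parents the new commit gets. Here is a toy Java model of that. It records just an ID and a parent list (a real commit also stores a tree, author, message, and so on). The IDs are hypothetical labels rather than real hashes, and records need Java 16 or later.

```java
import java.util.List;

public class AmendModel {
    // Toy commit: only an ID and its parent IDs (real commits also carry a tree, author, message...).
    record Commit(String id, List<String> parents) {}

    // Normal commit: the new commit's parent is the current commit.
    static Commit commit(Commit current, String newId) {
        return new Commit(newId, List.of(current.id()));
    }

    // "git commit --amend": the new commit reuses the current commit's parents,
    // so the current commit is pushed aside rather than modified.
    static Commit amend(Commit current, String newId) {
        return new Commit(newId, current.parents());
    }
}
```

Amending a merge commit M whose parents are A and B produces a new commit that also has parents A and B. M itself is unchanged; the branch simply no longer points to it.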
You can also suppress the auto-commit of a merge with --no-commit. In practice, I have found little need for this: most merges mostly just work, and a quick eyeballing of git show -m and/or "it compiles and passes unit tests" catches problems. However, during a conflicted or --no-commit merge, a simple git diff will give you a combined diff (the same sort you get with git show without -m, after you commit the merge), which can be helpful, or may be more confusing. You can run more-specific git diff commands and/or inspect the three (base, local, other) slot entries, as Gregg noted in a comment.
Seeing what Git will see
Besides using diff3 as your merge.conflictstyle, you can see the diffs that git merge will see. All you need to do is run two git diff commands—the same two that git merge will run.
To do these, you must find—or at least, tell git diff to find—the merge base. You can use git merge-base, which literally finds the (or all) merge base(s) and prints them out:
$ git merge-base --all HEAD foo
4fb3b9e0570d2fb875a24a037e39bdb2df6c1114
This says that between the current branch and branch foo, the merge base is commit 4fb3b9e... (and there is only one such merge base). I can then run git diff 4fb3b9e HEAD and git diff 4fb3b9e foo. But there is an easier way, as long as I can assume that there is only the one merge base:
$ git diff foo...HEAD # note: three dots
This tells git diff (and only git diff) to find the merge base between foo and HEAD, and then compare that commit—that merge base—to commit HEAD. And:
$ git diff HEAD...foo # again, three dots
does the same thing: it finds the merge base between HEAD and foo ("merge base" is commutative, so this is the same as the other way around, just as 7+2 and 2+7 are both 9), but this time it diffs the merge base against commit foo.[1]
(For other commands—things that are not git diff—the three-dot syntax produces a symmetric difference: the set of all commits that are on either branch, but not on both branches. For branches with a single merge base commit, this is "every commit after the merge base, on each branch": in other words, the union of the two branches, excluding the merge base itself and any earlier commits. For branches with multiple merge bases, this subtracts away all the merge bases. For git diff we just assume there's only the one merge base, and instead of subtracting it and its ancestors away, we use it as the left or "before" side of the diff.)
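Both ideas can be sketched in Java on a toy in-memory commit graph: finding the merge base(s) and the three-dot symmetric difference used by non-diff commands. The graph is a map from a commit ID to its parent IDs, and all IDs are hypothetical labels. This does not read a real repository and is not efficient. A merge base here is a common ancestor that is not an ancestor of some other common ancestor, which is why criss-cross histories can have more than one.

```java
import java.util.*;

public class MergeBaseSketch {
    // All commits reachable from `start`: start, its parents, their parents, ...
    // following every parent of a merge commit.
    static Set<String> ancestors(Map<String, List<String>> parents, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(start));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) {
                todo.addAll(parents.getOrDefault(c, List.of()));
            }
        }
        return seen;
    }

    // Merge base(s): common ancestors that are not an ancestor of another common ancestor.
    // Several results correspond to `git merge-base --all` printing several commits.
    static Set<String> mergeBases(Map<String, List<String>> parents, String a, String b) {
        Set<String> common = ancestors(parents, a);
        common.retainAll(ancestors(parents, b));
        Set<String> best = new TreeSet<>(common);
        for (String c : common) {
            Set<String> older = ancestors(parents, c);
            older.remove(c);
            best.removeAll(older);   // drop anything strictly older than another common ancestor
        }
        return best;
    }

    // Symmetric difference A...B: commits reachable from either side, but not from both.
    static Set<String> symmetricDifference(Map<String, List<String>> parents, String a, String b) {
        Set<String> ra = ancestors(parents, a), rb = ancestors(parents, b);
        Set<String> both = new HashSet<>(ra);
        both.retainAll(rb);
        Set<String> result = new TreeSet<>(ra);
        result.addAll(rb);
        result.removeAll(both);
        return result;
    }
}
```

In a history where commits h1 and h2 follow the base on one branch, and f1 and foo follow it on the other, the merge base of h2 and foo is that base commit. The symmetric difference is {h1, h2, f1, foo}: the base and everything before it are excluded.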
[1] In Git, a branch name identifies one particular commit, namely the tip of the branch. In fact, this is how branches actually work: a branch name names a specific commit. To add another commit to the branch (branch here meaning the chain of commits), Git makes a new commit whose parent is the current branch-tip, then points the branch name at the new commit. The word "branch" can refer to either the branch name or the entire chain of commits; we are supposed to figure out which one by context.
At any time, we can name one specific commit, and treat that as a branch, by taking that commit and all its ancestors: its parent, its parent's parent, and so on. When we hit a merge commit—a commit with two or more parents—in this process, we take all the parent commits, and their parents' parents, and so on.
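The branch-name mechanics in this footnote can be sketched in Java with a toy in-memory repository: one map from branch name to tip commit and one from commit to parents. Everything here is hypothetical and simplified. Real Git stores branch names as refs and commits as objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BranchNames {
    // Toy repository: branch name -> tip commit ID, and commit ID -> parent IDs.
    final Map<String, String> branches = new HashMap<>();
    final Map<String, List<String>> parents = new HashMap<>();

    // Add a commit "on a branch": the new commit's parent is the current tip
    // (no parent if the branch does not exist yet), then the name moves to the new commit.
    void commitOnBranch(String branch, String newId) {
        String tip = branches.get(branch);
        parents.put(newId, tip == null ? List.of() : List.of(tip));
        branches.put(branch, newId);
    }
}
```

After two commits c1 and c2 on main, the name main points at c2 and c2's parent is c1. The "branch" in the chain-of-commits sense is whatever you reach by walking back from c2.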
[2] This algorithm is actually selectable. The default, myers, is based on an algorithm by Eugene Myers, but Git has a few other options (minimal, patience, and histogram).
In a merge, only the changes that contain conflicts are marked.
Changes in Rev A and different changes in Rev B (at different places in the file) are merged in directly. Only changes that Rev A and Rev B both make at the same place are marked as conflicts. The user is notified that the file has conflicts that need to be resolved.
When you go to resolve the conflicts, the merged file will already have the independent changes from both Rev A and Rev B in place, plus conflict markers around the conflicting sections.
Related
Conditional Github Sync For Java Projects
I have two projects in Git named Zeus and Odin. Both contain some Java packages, files, and libraries. One particular package named Olympos is common to both and contains 'almost' the same files. The difference is that files within Olympos for Zeus may have methods that interact with the DB, but those in Odin never will (though they will contain methods with the same names, with only placeholder code). Please look at the following example for further clarification:
Project: Zeus; Package: Olympos; File: DummyOne.java; Method:
public void adapterDB() {
    // do something to connect with DB
    // do something DB specific - fire queries
}
Project: Odin; Package: Olympos; File: DummyOne.java; Method:
public void adapterDB() {
    // do nothing and return null;
}
So the problem: Olympos needs to be in sync at all times for both projects, such that when a file change is checked in to Zeus, it automatically syncs with Odin, or vice versa. But it has to happen conditionally:
- if a new method is being checked in to any file that contains DB-related operations (in Zeus), only the method header should sync to Odin, without the actual logic
- if a new method is being checked in to any file that does NOT contain DB-related operations (in either Zeus or Odin), it should match completely in both packages.
Obviously, I want to shed the additional time required to make such changes manually every time and then sync them separately to these projects, since there are about two dozen such changes every week. Is this possible using Git? Or is there perhaps a more obvious solution outside Git (I hope I am not missing the elephant in the room)?
How to get conflicting lines with JGit
I'm developing an application that uses JGit. After a pull, I have conflicting files. I can get them with
Set<String> conflicting = git.status().call().getConflicting();
which contains the files in conflict. I know that I can get the conflicts from
Map<String, int[][]> conflicts = git.pull().call().getMergeResult().getConflicts();
but that doesn't work if I restart my application. After the restart, I get an empty map, because I can't redo the pull while the repository is in a merging state. How can I get the conflicting lines for a given file name via the JGit API?
You could try to use a ResolveMerger to re-run the merge, like so:
ResolveMerger merger = (ResolveMerger) MergeStrategy.RESOLVE.newMerger(repository, true);
merger.merge(headCommit, fetchedCommit);
Note that the MergeCommand that is called during pull may use a different merge strategy. See MergeCommand ~ line 337 for details. However, make sure to create an in-core merger (the second argument must be true). With merger.getMergeResults() you should be able to get the conflicting lines. The whole approach, however, may fail because your work directory is already dirty with conflict markers (<<<<<<<<). Depending on your overall goal, I suggest reconsidering your approach to pull. If you fetch changes from the upstream repository (without merging immediately), you can dry-run the merge as outlined above as often as necessary. The FetchResult returned by FetchCommand::call() contains information about the commit(s) that were fetched.
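If re-running the merge is not an option, a simpler fallback that survives an application restart is to read the conflicted work-tree file and look for the conflict markers themselves. This is an alternative technique, not a JGit API. The minimal Java sketch below assumes Git's default 7-character markers and well-formed, non-nested regions. It could be fooled by a file whose real content has lines starting with those markers. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ConflictScanner {
    // Finds conflict regions in a work-tree file's lines by looking for Git's default
    // 7-character markers. Returns {startLine, endLine} pairs (1-based, inclusive,
    // including the marker lines). ASSUMPTION: default marker size, no nested/malformed regions.
    static List<int[]> findConflicts(List<String> lines) {
        List<int[]> regions = new ArrayList<>();
        int start = -1;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith("<<<<<<< ") || line.equals("<<<<<<<")) {
                start = i + 1;                          // opening marker
            } else if (start > 0 && (line.startsWith(">>>>>>> ") || line.equals(">>>>>>>"))) {
                regions.add(new int[] {start, i + 1});  // closing marker ends the region
                start = -1;
            }
        }
        return regions;
    }
}
```

You could call it with Files.readAllLines(...) for each path returned by getConflicting().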
Creating commit with Jgit and plumbing commands
I am trying to construct a commit with plumbing commands in JGit. Besides fetching the information, I basically use these commands:
treeFormatter.append(folderName, FileMode.TREE, treeObjectId);
treeFormatter.append(fileName, FileMode.REGULAR_FILE, blobObjectId);
and eventually objectInserter.insert(treeFormatter);
At the end I set the final tree into a commit. This works perfectly with some commits, but with others I can't push the repo, even though the files are there. Git Bash says:
error: unpack failed: error Invalid tree (tree number): incorrectly sorted
I found this rule: "Tree entries are sorted by the byte sequence that comprises the entry name. However, for the purposes of the sort comparison, entries for tree objects are compared as if the entry name byte sequence has a trailing ASCII '/' (0x2f)." So I tried to add the files in a particular order based on converting the object name (not the file name) into bytes. But comparing with actual commits from Bash, I can't figure out in which order Git needs the files to be added. So: does anyone know how to use the plumbing methods in JGit to construct a commit with several files? I am pretty sure I just need the correct way of sorting objects, but I can't find out what it is.
I just found the solution: you need to append the entries in a particular order based on the file name or folder name. My problem was that I was sorting by ObjectId.getName(), which is the object's hash.
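The ordering rule quoted in the question can be expressed as a Java Comparator: compare the entry names as unsigned bytes, treating each subtree's name as if it ended in '/'. This sketch assumes names are UTF-8 encoded. Entry is a hypothetical helper type, and you would still append the sorted entries to JGit's TreeFormatter yourself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

public class TreeEntryOrder {
    // Hypothetical description of one entry to be written into a tree object.
    record Entry(String name, boolean isTree) {}

    // Sort key per the rule quoted above: the name's bytes, with a trailing '/' (0x2f)
    // appended for subtrees. ASSUMPTION: names are UTF-8 encoded.
    static byte[] sortKey(Entry e) {
        String key = e.isTree() ? e.name() + "/" : e.name();
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Compare keys as unsigned bytes (Arrays.compareUnsigned requires Java 9+).
    static final Comparator<Entry> GIT_ORDER =
            (x, y) -> Arrays.compareUnsigned(sortKey(x), sortKey(y));
}
```

This is why a file foo.txt must come before a folder foo: '.' (0x2e) sorts before '/' (0x2f), even though plain string order would put foo first.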
How to perform a 'merge' using clearcase?
I have been working on branch X and I need to move my code to branch Y. All my code consists of new classes that I started, so no one else has worked on or modified my code, and it does not exist in the branch that I'm moving the code to. So my question is: what is the process to move the code from one branch to another? I have never done it before. Do I copy and paste the classes into the new branch, or is there a tool that is usually used for this?
The key to a ClearCase merge is to do the merge in the destination view (the view associated with the branch or UCM stream you merge to). You can then start the merge with:
- cleartool merge
- the ClearCase Merge Manager: see "Howto merge using the ClearCase Merge manager"
In the Merge Manager, the first step is to select that target view. I would recommend using a dynamic view rather than a snapshot view. A snapshot view would start with an automatic update (which takes time), whereas a dynamic view starts the merge immediately. See more at "What are the differences between a snapshot view and a dynamic view?"
This supposes that you have:
- a source branch or a label that identifies the source versions you want to merge,
- a destination view with a config spec that allows creating new versions on top of a destination branch (so a config spec with -mkbranch rules in it).
See more at "About merging files and directories in base ClearCase".
Yep, there's a tool that helps you: the ClearCase Merge Manager. It has a nice GUI and helps you get the job done.
SVNClient.logMessages never returns a result
I'm using JavaHL to connect to an SVN 1.6 repository. While I managed to list the contents of the repository, I'm not able to get the item history (the comments made on the check-ins as well as the dates and the authors). As far as I can see, SVNClient.logMessages is the right method, but the callback method is never executed. I used Revision.HEAD for the path revision and a revision range object holding Revision.START and Revision.HEAD; the limit is set to 0 (which means no limit, according to the documentation). I'm trying to fetch the revision, the date, the author and the comment. If someone knows of example code using JavaHL, I may be able to find my mistake by comparing that code to mine. BTW: I know about SVNKit, but management decided not to buy it. Thus I have to use JavaHL, for which next to no sample programs exist (and the docs merely list the classes and interfaces without detailed descriptions). So please don't point me in the direction of SVNKit, as that is impossible for me. Any pointers appreciated. Gnarf
The issue has been solved. The problem was the call to SVNClient.logMessages(), specifically the revision range used. The start revision had been Revision.START, which, according to the documentation, describes the "first existing revision". The problem disappeared when I used Revision.getInstance(1) instead. Since it is reasonable to expect that any item has at least one revision (the initial one) at or after that number, it should be safe to use. Hopefully this will save anyone else from spending another two and a half days figuring it out! Gnarf