Creating a commit with JGit and plumbing commands - java

I am trying to construct a commit with plumbing commands in JGit. Apart from fetching the information, what I use is basically these commands:
treeFormatter.append(folderName, FileMode.TREE, treeObjectId);
treeFormatter.append(fileName, FileMode.REGULAR_FILE, blobObjectId);
eventually
objectInserter.insert( treeFormatter );
And at the end I set the final tree into a commit. This works perfectly with some commits, but with others, although the files are there, I can't push the repo. Bash says:
error: unpack failed: error Invalid tree (tree number): incorrectly sorted
I found out here that
Tree entries are sorted by the byte sequence that comprises the entry name. However, for the purposes of the sort comparison, entries for tree objects are compared as if the entry name byte sequence has a trailing ASCII ‘/’ (0x2f).
So I tried to add the files in a particular order based on the byte representation of the object name (not the file name), but comparing with actual commits made from bash, I can't figure out which order Git needs the files to be added in.
So: does anyone know how to use the plumbing methods in JGit to construct a commit with several files? I am pretty sure I just need the correct way of sorting the objects, but I can't find out what it is.

I just found the solution.
You need to add the entries in a particular order that depends on the file name or the folder name. My mistake was that I was sorting by ObjectId.getName(), which is the object's hash, not the entry name.
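The ordering rule quoted in the question can be sketched as a plain Java comparator. This is a minimal illustration, not JGit code: the Entry class and the method names are hypothetical, and the sketch assumes entry names are encoded as UTF-8. It compares names as unsigned bytes and treats tree (folder) entries as if their name ended in '/'.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class GitTreeOrder {
    /** Hypothetical holder for one tree entry: its name and whether it is a subtree. */
    static final class Entry {
        final String name;
        final boolean isTree;

        Entry(String name, boolean isTree) {
            this.name = name;
            this.isTree = isTree;
        }
    }

    /** Sort key: the name bytes, with a trailing '/' appended for tree entries. */
    private static byte[] sortKey(Entry e) {
        String key = e.isTree ? e.name + "/" : e.name;
        return key.getBytes(StandardCharsets.UTF_8);
    }

    /** Compares two entries by unsigned byte value of their sort keys. */
    static int compare(Entry a, Entry b) {
        byte[] x = sortKey(a);
        byte[] y = sortKey(b);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    /** Returns the entry names in the order they should be appended to the tree. */
    static List<String> sortedNames(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(GitTreeOrder::compare);
        List<String> names = new ArrayList<>();
        for (Entry e : copy) {
            names.add(e.name);
        }
        return names;
    }
}
```

With a file "a.txt", a folder "a" and a file "a0", this yields a.txt, a, a0, because '.' (0x2e) sorts before '/' (0x2f), which sorts before '0' (0x30). The treeFormatter.append(...) calls would then be made in that order.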

Related

How to get conflicting lines with JGit

I'm developing an application that uses JGit.
After a pull, I have conflicting files. I can get them from
List<String> list = git.status().call().getConflicting();
The list contains the files in conflict.
I know that I can get the conflicts from
Map<String, int[][]> conflicts = git.pull().call().getMergeResult().getConflicts();
but that doesn't work if I restart my application. After the restart I get an empty map, because I can't redo the pull while the repository is in a merging state.
How can I get the conflicting lines for a given file name via the JGit API?
You could try to use a ResolveMerger to re-run the merge like so:
ThreeWayMerger merger = StrategyResolve.newMerger(repository, true);
merger.merge(headCommit, fetchedCommit);
Note that the MergeCommand that is called during pull may use a different merge strategy. See MergeCommand ~ line 337 for details. However, make sure to create an in-core merger (the second argument must be true).
With merger.getMergeResults() you should be able to get the conflicting lines.
The whole approach, however, may fail because your work directory is already dirty with conflict markers (<<<<<<<). Depending on your overall goal, I suggest reconsidering your approach to pull.
If you fetch changes from the upstream repository (without merging immediately) you can dry-run the merge as outlined above as often as necessary. The FetchResult returned by FetchCommand::call() contains information about the commit(s) that were fetched.
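If the work directory already contains conflict markers, a different and much cruder option than re-running the merge is to scan the file text for those markers. The sketch below is plain Java and does not use the JGit API. The class and method names are hypothetical, and it assumes Git's default seven-character markers and that no ordinary line starts with them.

```java
import java.util.ArrayList;
import java.util.List;

final class ConflictScanner {
    /**
     * Returns one int[] {startLine, endLine} per conflict region, using
     * 1-based line numbers of the opening and closing marker lines.
     */
    static List<int[]> conflictRegions(List<String> lines) {
        List<int[]> regions = new ArrayList<>();
        int start = -1; // line number of the currently open "<<<<<<<" marker, if any
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith("<<<<<<<")) {
                start = i + 1;
            } else if (line.startsWith(">>>>>>>") && start > 0) {
                regions.add(new int[] {start, i + 1});
                start = -1;
            }
        }
        return regions;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines(path) for each name returned by getConflicting().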

Understanding Conflicts Merging Algorithm

I looked at a merge conflict marker that looked all screwed up. To illustrate the situation, let's start with this:
public void methodA() {
    prepare();
    try {
        doSomething();
    }
    catch(Exception e) {
        doSomethingElse();
    }
}
Now comes in a merge (I use SourceTree for pull).
And the marker looks like this:
<<<<<<<<< HEAD
try {
doSomething();
}
catch(Exception e) {
doSomethingElse();
}
============================
private void methodB() {
doOtherStuff();
>>>>>>>> 9832432984384398949873ab
}
So what the pulled commit does is remove methodA completely and add methodB instead.
But notice that some lines are missing entirely.
From what I understand of the process, Git tries a so-called auto-merge, and if this fails and conflicts were detected, the complete merge is expressed by parts marked with '<<<* HEAD' + before + '====' + after + '>>>* CommitID' to prepare for manual conflict resolution.
So why does it leave out some lines? It looks more like a bug to me.
I use Windows 7 and the installed Git version is 2.6.2.windows.1, while the newest version is 2.9. Is anything known about a Git version having a merge problem of this magnitude? This is not the first time I have experienced something like this.
You are correct to be concerned: Git knows nothing of languages, and its built-in merge algorithm is based strictly on line-at-a-time comparisons. You do not have to use this built-in merge algorithm, but most people do because (a) it mostly just works, and (b) there are not that many alternatives.
Note that this depends on your merge strategy (-s argument); the text below is for the default recursive strategy. The resolve strategy is pretty similar to recursive; the octopus strategy applies to more than just two commits; and the ours strategy is entirely different (and is nothing like -X ours). You can also select alternative strategies or algorithms for specific files, using .gitattributes and "merge drivers". And, none of this applies to files that Git has decided to believe are "binary": for these, it does not even attempt merging. (I am not going to cover any of that here, just how the default recursive strategy treats files.)
How git merge works (when using the default -s recursive)
Merge starts with two commits: the current one (also called "ours", "local", and HEAD), and some "other" one (also called "theirs" and "remote")
Merge finds the merge base between these commits
Normally that's just one other commit: the one at the first point where the implied branches[1] join up
In some special cases (multiple merge base candidates), Git must invent a "virtual merge base" (but we'll ignore these cases here)
Merge runs two diffs: git diff base local and git diff base other
These have rename detection turned on
You can run these same diffs yourself to see what merge will see
You can think of these two diffs as "what we did" and "what they did". The goal of a merge is to combine "what we did" and "what they did". The diffs are line based, come from a minimal edit distance algorithm,[2] and are really just Git's guess about what we did, and what they did.
The output of the first diff (base-vs-local) tells Git which base files correspond to which local files, i.e., how to follow names from the current commit back to the base. Git can then use the base names to spot renames or deletes in the other commit as well. For the most part we can just ignore rename and delete issues, and also new-file-creation issues. Note that Git version 2.9 turns on rename detection by default for all diffs, not just merge diffs. (You can turn this on yourself in earlier Git versions by configuring diff.renames to true; see also the git config setting for diff.renameLimit.)
If a file is changed on only one side (base-to-local, or base-to-other), Git simply takes those changes. Git only has to do a three-way merge when a file is changed on both sides.
To perform a three-way merge, Git essentially walks through the two diffs (base-to-local and base-to-other), one "diff hunk" at a time, comparing the changed regions. If each hunk affects a different part of the original base file, Git just takes that hunk. If some hunk(s) affect the same part of the base file, Git tries to take one copy of whatever that change is.
For instance, if the local change says "add a close brace line" and the remote change says "add (the same place, same indentation) close brace line", Git will take just one copy of the close brace. If both say "delete a close brace line" Git will just delete the line once.
Only if the two diffs conflict—e.g., one says "add a close brace line indented 12 spaces" and the other says "add a close brace line indented 11 spaces" will Git declare a conflict. By default, Git writes the conflict into the file, showing the two sets of changes—and, if you set merge.conflictstyle to diff3, also showing the code from the merge-base version of the file.
Any non-conflicting diff hunks, Git applies. If there were conflicts, Git normally leaves the file in "conflicted merge" state. However, the two -X arguments (-X ours and -X theirs) modify this: with -X ours Git chooses "our" diff hunk in the conflict, and puts that change in, ignoring "their" change. With -X theirs Git chooses "their" diff hunk and puts that change in, ignoring "our" change. These two -X arguments guarantee that Git does not declare a conflict after all.
If Git is able to resolve everything on its own for this file, it does so: you get the base file, plus your local changes, plus their other changes, in the work-tree and in the index/staging-area.
If Git is not able to resolve everything on its own, it puts the base, other, and local versions of the file into the index/staging-area, using the three special nonzero index slots. The work-tree version is always "what Git was able to resolve, plus the conflict markers as directed by various configurable items."
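The per-region decision described above can be reduced to a small sketch. This is a simplification for illustration only, not Git's implementation: it assumes the file has already been cut into aligned regions, and the class and enum names are hypothetical.

```java
import java.util.List;

final class ThreeWayRegion {
    enum Outcome { UNCHANGED, TAKE_OURS, TAKE_THEIRS, TAKE_EITHER, CONFLICT }

    /** Decides what to do with one region, given its base, local and other text. */
    static Outcome resolve(List<String> base, List<String> ours, List<String> theirs) {
        boolean weChanged = !base.equals(ours);
        boolean theyChanged = !base.equals(theirs);
        if (!weChanged && !theyChanged) {
            return Outcome.UNCHANGED;      // nobody touched this region
        }
        if (!theyChanged) {
            return Outcome.TAKE_OURS;      // only "what we did" applies
        }
        if (!weChanged) {
            return Outcome.TAKE_THEIRS;    // only "what they did" applies
        }
        // Both sides changed the same region: identical changes are taken once,
        // different changes are a conflict.
        return ours.equals(theirs) ? Outcome.TAKE_EITHER : Outcome.CONFLICT;
    }
}
```

Under this sketch, a close brace added with 12 spaces on one side and 11 spaces on the other is a CONFLICT, while the same brace added identically on both sides is TAKE_EITHER.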
Every index entry has four slots
A file such as foo.java is normally staged in slot zero. This means it is ready to go into a new commit now. The other three slots are empty, by definition, because there is a slot-zero entry.
During a conflicted merge, slot zero is left empty, and slots 1-3 are used to hold the merge base version, the "local" or --ours version, and the other or --theirs version. The work-tree holds the in-progress merge.
You can use git checkout to extract any of these versions, or git checkout -m to re-create the merge conflict. All successful git checkout commands update the work-tree version of the file.
Some git checkout commands leave the various slots undisturbed. Some git checkout commands write into slot 0, wiping out the entries in slots 1-3, so that the file is ready for commit. (To know which ones do what, you just have to memorize them. I had them wrong, in my head, for quite a while.)
You cannot run git commit until all unmerged slots have been cleared out. You can use git ls-files --unmerged to view unmerged slots, or git status for a more human-friendly version. (Hint: use git status. Use it often!)
Successful merge does not mean good code
Even if git merge successfully auto-merges everything, that does not mean the result is correct! Of course, when it stops with a conflict, this also means that Git was not able to auto-merge everything, not that what it has auto-merged on its own is correct. I like to set merge.conflictstyle to diff3 so that I can see what Git thought the base was, before it replaced that "base" code with the two sides of the merge. Often a conflict happens because the diff chose the wrong base—such as some matching braces and/or blank lines—rather than because there had to be an actual conflict.
Using the "patience" diff can help with poor base choice, at least in theory. I have not experimented with this myself. The new "compaction heuristic" in Git 2.9 is promising, but I have not experimented with this either.
You must always inspect and/or test the results of a merge. If the merge is already committed, you can edit files, build and test, git add the corrected versions, and use git commit --amend to shove the previous (incorrect) merge commit out of the way and put in a different commit with the same parents. (The --amend part of git commit --amend is false advertising. It does not change the current commit itself, because it can not; instead, it makes a new commit with the same parent IDs as the current commit, instead of the normal method of using the current commit's ID as the new commit's parent.)
You can also suppress the auto-commit of a merge with --no-commit. In practice, I have found little need for this: most merges mostly just work, and a quick eyeballing of git show -m and/or "it compiles and passes unit tests" catches problems. However, during a conflicted or --no-commit merge, a simple git diff will give you a combined diff (the same sort you get with git show without -m, after you commit the merge), which can be helpful, or may be more confusing. You can run more-specific git diff commands and/or inspect the three (base, local, other) slot entries, as Gregg noted in a comment.
Seeing what Git will see
Besides using diff3 as your merge.conflictstyle, you can see the diffs that git merge will see. All you need to do is run two git diff commands—the same two that git merge will run.
To do these, you must find—or at least, tell git diff to find—the merge base. You can use git merge-base, which literally finds the (or all) merge base(s) and prints them out:
$ git merge-base --all HEAD foo
4fb3b9e0570d2fb875a24a037e39bdb2df6c1114
This says that between the current branch and branch foo, the merge base is commit 4fb3b9e... (and there is only one such merge base). I can then run git diff 4fb3b9e HEAD and git diff 4fb3b9e foo. But there is an easier way, as long as I can assume that there is only the one merge base:
$ git diff foo...HEAD # note: three dots
This tells git diff (and only git diff) to find the merge base between foo and HEAD, and then compare that commit—that merge base—to commit HEAD. And:
$ git diff HEAD...foo # again, three dots
does the same thing: it finds the merge base between HEAD and foo ("merge base" is commutative, so this is the same as the other way around, just as 7+2 and 2+7 are both 9), but this time it diffs the merge base against commit foo.[1]
(For other commands—things that are not git diff—the three-dot syntax produces a symmetric difference: the set of all commits that are on either branch, but not on both branches. For branches with a single merge base commit, this is "every commit after the merge base, on each branch": in other words, the union of the two branches, excluding the merge base itself and any earlier commits. For branches with multiple merge bases, this subtracts away all the merge bases. For git diff we just assume there's only the one merge base, and instead of subtracting it and its ancestors away, we use it as the left or "before" side of the diff.)
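The symmetric difference described in the parenthetical above can be illustrated on a toy commit graph. This sketch has nothing to do with Git's real object model: commits are plain strings, the parents map is hypothetical test data, and the class name is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SymmetricDifference {
    /** All commits reachable from tip by following parent links, tip included. */
    static Set<String> reachable(Map<String, List<String>> parents, String tip) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(tip);
        while (!todo.isEmpty()) {
            String commit = todo.pop();
            if (seen.add(commit)) {
                for (String parent : parents.getOrDefault(commit, List.of())) {
                    todo.push(parent);
                }
            }
        }
        return seen;
    }

    /** Commits reachable from a or from b, but not from both. */
    static Set<String> threeDot(Map<String, List<String>> parents, String a, String b) {
        Set<String> fromA = reachable(parents, a);
        Set<String> fromB = reachable(parents, b);
        Set<String> result = new HashSet<>(fromA);
        result.addAll(fromB);
        Set<String> both = new HashSet<>(fromA);
        both.retainAll(fromB);
        result.removeAll(both);
        return result;
    }
}
```

For a graph where h2 and h1 sit on one branch, f1 on the other, and both branches share base and root, the result is {h2, h1, f1}: the merge base and everything before it drop out.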
[1] In Git, a branch name identifies one particular commit, namely the tip of the branch. In fact, this is how branches actually work: a branch name names a specific commit, and in order to add another commit to the branch (branch here meaning the chain of commits), Git makes a new commit whose parent is the current branch-tip, then points the branch name at the new commit. The word "branch" can refer to either the branch name, or the entire chain of commits; we are supposed to figure out which one by context.
At any time, we can name one specific commit, and treat that as a branch, by taking that commit and all its ancestors: its parent, its parent's parent, and so on. When we hit a merge commit—a commit with two or more parents—in this process, we take all the parent commits, and their parents' parents, and so on.
[2] This algorithm is actually selectable. The default myers is based on an algorithm by Eugene Myers, but Git has a few other options.
In a merge, only the changes that contain conflicts are marked.
Changes in Rev A and different changes in Rev B, are directly merged in. Only changes in Rev A and Rev B at the same place are marked as conflicts. The user is notified that conflicts exist in the file and need to be resolved.
When you go to resolve the conflicts, the merged file will have the independent changes from both Rev A and Rev B already in place, and the conflict markers for the conflicting sections.

Scan duplicate document with md5

For some reason I can't use MessageDigest.getInstance("MD5"), so I must write the algorithm code by hand. My project scans for duplicate documents (*.doc, *.txt, *.pdf) on an Android device. My question is: what must I write before entering the algorithm in order to scan for duplicate documents in the ROOT directory of my Android device? No directory is selected; when I press the scan button, the process begins and the ListView is shown. Can anyone help me? My project deadline is approaching. Thank you so much.
public class MD5 {
//What must I write here, so I allow to scan for duplicate document on Android root with MD5 Hash
//MD5 MANUAL ALGORITHM CODE
}
WHOLE PROCESS:
Your goal is to detect (and perhaps store information about) duplicate files.
1. First, iterate through the directories and files;
see this:
list all files from directories and subdirectories in Java
2. For each file, load it as a byte array;
see this:
Reading a binary input stream into a single byte array in Java
3. Then compute its MD5 (this is your project).
4. Store this information.
You can use a Set to detect duplicates (a Set has unique elements).
Set<String> files_hash; // each String is a string representation of MD5
if (files_hash.contains(my_md5)) // you know you have it already
or a
Map<String,String> file_and_hash; // each is file => hash
// you have to iterate to know if you have it already, or keep also a Set
ANSWER for MD5:
read algorithm:
https://en.wikipedia.org/wiki/MD5
RFC: https://www.ietf.org/rfc/rfc1321.txt
some googling ...
this presentation, step by step
http://infohost.nmt.edu/~sfs/Students/HarleyKozushko/Presentations/MD5.pdf
or try to duplicate C (or java) implementation ...
OVERALL STRATEGY
To save time and make the process faster, you must also think about how your function will be used:
If you use it once, for one single file, it is better to reduce the work by first selecting only the other files that have the same size.
If you use it regularly (and want it to be fast), scan new files regularly in the background to keep a hash database up to date. Detecting new files is straightforward.
If you want to find all duplicated files, it is better to scan everything, and use the Set strategy as well.
Hope this helps.
You'll want to recursively scan for files, then, for each file found, calculate its MD5 or whatever and store that hash value, either in a Set<...> if you only want to know if a file is a dupe, or in a Map<..., File> if you want to be able to tell which file the current file is a duplicate of.
For each file's hash, you look into the collection of already known hashes to check if that particular hash value is in it; if it is, you (most likely) have a duplicate file; if it is not, you add the new hash value to the collection and proceed with the next file.
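The scan-hash-lookup loop described in the answers could look roughly like the sketch below. Several things here are assumptions: the class and method names are hypothetical, the hash function is passed in as a parameter so that a hand-written MD5 can be plugged in, each file is read fully into memory (which may be too much for large documents), and java.nio.file is assumed to be available (on Android it only exists from API level 26).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DuplicateFinder {
    /**
     * Walks root recursively and returns a map from each duplicate file
     * to the first file seen with the same hash.
     */
    static Map<Path, Path> findDuplicates(Path root, Function<byte[], String> hasher)
            throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            // Sorted so that "first seen" is deterministic.
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, Path> firstSeen = new HashMap<>();
        Map<Path, Path> duplicates = new HashMap<>();
        for (Path file : files) {
            String hash = hasher.apply(Files.readAllBytes(file));
            Path original = firstSeen.putIfAbsent(hash, file);
            if (original != null) {
                duplicates.put(file, original); // same hash as an earlier file
            }
        }
        return duplicates;
    }
}
```

Keeping the Map from hash to first file is what lets the result say which file each duplicate matches; if only a yes/no answer is needed, a Set of hashes is enough. Filtering by file extension (*.doc, *.txt, *.pdf) could be added to the filter step.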

Create Folders recursively

I call a webservice and get the following data from it:
Name of the folder
Id of the folder
Id of the parent-folder (null if it is root)
I create ArrayLists (List<String>) for the names, the ids and the parent-ids. So the folder with the name on position "0" has the id and the parent-id on position "0" in these lists.
Now I need to recreate the same structure on my local file system. The user enters a root-directory ("C:\test" for example) that I need to use.
I guess that a recursive method would be the best thing to do, but I have no idea how to implement it.
Any ideas / hints?
I don't see how recursion helps you. I assume you get multiple sets of the data you present; your explanation implies it, though you don't say so. You also don't say what order you get them in. I'd create a hashmap, using the full path to each parent as a key, and an object representing the directory as a value. The directory object would contain pointers to all its child directories. I'd create that entire hashmap, then walk it top-down. If you don't get the data in the correct order to build it top-down, then you'll have to put them all in a list and search the list to create top-down order, or trust that you can build the list without the IDs and fill them in later.
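One way to sketch the map-based idea with the three parallel lists from the question is shown below. It is a variation, not exactly the structure described above: it keys the map by folder id rather than by full path, and it resolves each folder's path by following parent ids up to the root. The class and method names are hypothetical, and it assumes every non-null parent id appears in the id list and that there are no cycles.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FolderTreeBuilder {
    /** Creates every folder under root; the order of the input lists does not matter. */
    static void createAll(Path root, List<String> names, List<String> ids,
                          List<String> parentIds) throws IOException {
        Map<String, Integer> indexById = new HashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            indexById.put(ids.get(i), i);
        }
        for (int i = 0; i < ids.size(); i++) {
            // createDirectories also creates any missing ancestors.
            Files.createDirectories(pathOf(root, i, names, parentIds, indexById));
        }
    }

    /** Builds the full path of folder i by walking up through its parents. */
    private static Path pathOf(Path root, int i, List<String> names,
                               List<String> parentIds, Map<String, Integer> indexById) {
        String parentId = parentIds.get(i);
        Path parent = (parentId == null)
                ? root
                : pathOf(root, indexById.get(parentId), names, parentIds, indexById);
        return parent.resolve(names.get(i));
    }
}
```

A call such as FolderTreeBuilder.createAll(Paths.get("C:\\test"), names, ids, parentIds) would then rebuild the structure under the user's root directory.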

Java find where a class is used in the code - programmatically

I have a List of classes which I can iterate through. Using Java is there a way of finding out where these classes are used so that I can write it out to a report?
I know that I can find out using 'References' in Eclipse but there are too many to be able to do this manually. So I need to be able to do this programmatically. Can anyone give me any pointers please?
Edit:
This is static analysis and part of creating a bigger traceability report for non-technical people. I have comprehensive Javadocs, but they are not 'friendly' and also work in the opposite direction to how I need the report. Javadocs start from the package and work downwards, whereas I need to start at variable level and work upwards. If that makes any sense.
You could try to add a stacktrace dump somewhere in the class that isolates the specific case you are looking for.
public void someMethodInMyClass()
{
    if (conditions_are_met_to_identify)
    {
        Thread.dumpStack();
    }
    // ... original code here
}
You may have to scan all the sources and check the import statements (taking care of the * imports: you have to set up your scanner for both the fully qualified class name and its packagename.*).
EDIT: It would be great to use the eclipse search engine for this. Perhaps here is the answer
Still another approach (probably not complete):
Search Google for 'java recursively list directories and files' and get source code that will recursively list all the *.java file path/names in a project.
For each file in the list:
1: See if the file path/name is in the list of fully qualified file names you are interested in. If so, record its path/name as a match.
2: Regardless of whether it's a match or not, open the file and copy its content to a List collection. Iterate through the content list and see if the class name is present. If found, determine its path by seeing if it's in the same package as the current file you are examining. If so, you have a match. If not, you need to extract the paths from the import statements, add them to the class name, and see if it exists in your recursive list of file path/names. If still not found, add it to a 'not found' list (including what line number it was found on) so you can manually see why it was not identified.
3: Add all matches to a 'found match' list. Examine the list to ensure it looks correct.
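A rough, text-only version of the scan described in these steps might look like the sketch below. It is deliberately cruder than the steps: it only looks for the simple class name as a whole word on each line and does not resolve packages or imports, so it will also report hits in comments, strings, and same-named classes from other packages. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ClassUsageScanner {
    /** Maps each class name to a list of "relative/path.java:line" hits. */
    static Map<String, List<String>> findUsages(Path sourceRoot, List<String> classNames)
            throws IOException {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        for (String name : classNames) {
            // \b keeps "Foo" from matching inside "FooBar".
            patterns.put(name, Pattern.compile("\\b" + Pattern.quote(name) + "\\b"));
        }
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            files = walk.filter(p -> p.toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        }
        Map<String, List<String>> usages = new TreeMap<>();
        for (Path file : files) {
            List<String> lines = Files.readAllLines(file); // assumes UTF-8 sources
            for (int i = 0; i < lines.size(); i++) {
                for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                    if (e.getValue().matcher(lines.get(i)).find()) {
                        usages.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                              .add(sourceRoot.relativize(file) + ":" + (i + 1));
                    }
                }
            }
        }
        return usages;
    }
}
```

The resulting map can be written out directly as the report; the hits still need the manual sanity check mentioned in step 3.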
Not sure what you are trying to do, but in case you want to analyse code during runtime, I would use an out-of-the-box profiler that shows you what is loaded and what is allocated.
#Open source profilers: Open Source Java Profilers
On the other hand, if you want to do this yourself (During runtime) you can write your own custom profiler:
How to write a profiler?
You might also find this one useful (Although not exactly what you want):
How can I list all classes loaded in a specific class loader
http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/Instrumentation.html
If what you are looking for is just to examine your code base, there are really good tools out there as well.
#see http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
