Commit 90bd932

Linus Torvalds authored and Junio C Hamano committed
Fix up diffcore-rename scoring
The "score" calculation for diffcore-rename was totally broken.

It scaled "score" as

	score = src_copied * MAX_SCORE / dst->size;

which means that you got a 100% similarity score even if src and dest were
different, if just every byte of dst was copied from src, even if source was
much larger than dst (eg we had copied 85% of the bytes, but _deleted_ the
remaining 15%).

That's clearly bogus. We should do the score calculation relative not to
the destination size, but to the max size of the two.

This seems to fix it.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
1 parent fc66d21 commit 90bd932
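
The arithmetic in the commit message can be checked with a small standalone sketch. It is not code from diffcore-rename.c: the function names `old_score` and `new_score` are hypothetical, and the value used for `MAX_SCORE` is an assumption chosen for illustration. Only the two formulas are taken from the commit.

```c
/* Assumed value, for illustration only; the real constant is defined
 * in git's own headers. */
#define MAX_SCORE 60000UL

/* Hypothetical helper: the score as computed before this commit,
 * scaled by the destination size alone. */
unsigned long old_score(unsigned long src_size, unsigned long dst_size,
			unsigned long src_copied)
{
	(void)src_size; /* the old formula ignored the source size */
	if (dst_size < src_copied)
		return MAX_SCORE;
	if (!dst_size)
		return 0;
	return src_copied * MAX_SCORE / dst_size;
}

/* Hypothetical helper: the score as computed after this commit,
 * scaled by the larger of the two sizes. */
unsigned long new_score(unsigned long src_size, unsigned long dst_size,
			unsigned long src_copied)
{
	unsigned long max_size = (src_size > dst_size) ? src_size : dst_size;

	if (!dst_size)
		return 0;
	return src_copied * MAX_SCORE / max_size;
}
```

For a 1000-byte source whose destination keeps 850 of those bytes and drops the other 150, `old_score(1000, 850, 850)` evaluates to `MAX_SCORE` (a 100% match), while `new_score(1000, 850, 850)` evaluates to 85% of `MAX_SCORE`, which is the case the commit message describes.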

1 file changed

Lines changed: 5 additions & 7 deletions

diffcore-rename.c

@@ -133,7 +133,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * match than anything else; the destination does not even
 	 * call into this function in that case.
 	 */
-	unsigned long delta_size, base_size, src_copied, literal_added;
+	unsigned long max_size, delta_size, base_size, src_copied, literal_added;
 	unsigned long delta_limit;
 	int score;

@@ -144,9 +144,9 @@ static int estimate_similarity(struct diff_filespec *src,
 	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
 		return 0;

-	delta_size = ((src->size < dst->size) ?
-		      (dst->size - src->size) : (src->size - dst->size));
+	max_size = ((src->size > dst->size) ? src->size : dst->size);
 	base_size = ((src->size < dst->size) ? src->size : dst->size);
+	delta_size = max_size - base_size;

 	/* We would not consider edits that change the file size so
 	 * drastically. delta_size must be smaller than
@@ -174,12 +174,10 @@ static int estimate_similarity(struct diff_filespec *src,
 	/* How similar are they?
 	 * what percentage of material in dst are from source?
 	 */
-	if (dst->size < src_copied)
-		score = MAX_SCORE;
-	else if (!dst->size)
+	if (!dst->size)
 		score = 0; /* should not happen */
 	else
-		score = src_copied * MAX_SCORE / max_size;
+		score = src_copied * MAX_SCORE / max_size;
 	return score;
 }

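
The second hunk also rewrites `delta_size` as `max_size - base_size` in place of the two-way conditional. The sketch below suggests this gives the same value for any pair of sizes. The struct `sizes` and the helpers `compute_sizes` and `old_delta_size` are hypothetical names introduced for illustration; only the three expressions come from the diff.

```c
/* Hypothetical container for the three values the patched code derives. */
struct sizes {
	unsigned long max_size;
	unsigned long base_size;
	unsigned long delta_size;
};

/* Mirrors the patched computation: larger size, smaller size, and
 * their difference. */
struct sizes compute_sizes(unsigned long src_size, unsigned long dst_size)
{
	struct sizes s;

	s.max_size = (src_size > dst_size) ? src_size : dst_size;
	s.base_size = (src_size < dst_size) ? src_size : dst_size;
	s.delta_size = s.max_size - s.base_size;
	return s;
}

/* Mirrors the removed expression, kept here only for comparison. */
unsigned long old_delta_size(unsigned long src_size, unsigned long dst_size)
{
	return (src_size < dst_size) ?
		(dst_size - src_size) : (src_size - dst_size);
}
```

Because `max_size` is never smaller than `base_size`, the unsigned subtraction cannot wrap, and both forms yield the absolute difference between the two sizes.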
