Entry tags:
- bsd,
- diff,
- floss,
- open source,
- patch,
- unified diff
diff and patch
I've been interested in the BSD versions of diff and patch for a long while now. I often use a version of patch that I modified from various BSD patch implementations, all of which were based on patch version 2.0.12u8. My variation includes some support for carriage return/line feed differences among systems. I've found it useful when working with patches from Windows or DOS systems, which can accidentally introduce carriage return/line feed sequences in place of the plain line feed that POSIX systems use.
I recently saw an informative page covering some of the history of patch and diff:
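To make the carriage return/line feed issue concrete, here is a minimal sketch in Python of the kind of normalization involved. It is not the code from my patch variant. The function names strip_cr and normalize_patch are made up for illustration, and it assumes the whole patch should be converted to line feed endings before it is applied.

```python
def strip_cr(line: bytes) -> bytes:
    """Turn a trailing CR LF into a plain LF; leave other lines alone."""
    if line.endswith(b"\r\n"):
        return line[:-2] + b"\n"
    return line


def normalize_patch(data: bytes) -> bytes:
    """Normalize every line of a patch (hypothetical helper).

    splitlines(keepends=True) keeps the terminators, so a lone CR in
    the middle of a line survives the round trip unchanged.
    """
    return b"".join(strip_cr(line) for line in data.splitlines(keepends=True))


if __name__ == "__main__":
    dos_patch = b"--- a\r\n+++ b\r\n@@ -1 +1 @@\r\n-x\r\n+y\r\n"
    print(normalize_patch(dos_patch).decode("ascii"), end="")
```

A real patch program has more to decide, such as whether the target file itself uses carriage return/line feed endings, so treat this as the simplest possible case.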
https://invisible-island.net/diffstat/
It mentions that the 12u variant of patch contains no copylefted code. It also mentions a version 2.0.12.u9. So I found it interesting that the BSD versions found on NetBSD, OpenBSD and FreeBSD all seem to use version 2.0.12u8 as their basis. I'm very curious as to why they started with u8 instead of u9, but I have seen no documentation on it. There don't seem to be many differences between u8 and u9, so they probably didn't miss much that was useful. If anyone knows anything further about this or knows of projects that started with the u9 version, I'd appreciate hearing about it.
BSD operating systems such as NetBSD switched from the 4 clause BSD license to the 2 clause license. The 2 clause license is compatible with GNU licensed software while the 4 clause license is not. So it's not surprising that most of the code for BSD diff can be found under a 2 clause license. However, the diffreg.c file still carries both the 4 clause license and a 3 clause license. I've searched for alternative versions of diffreg.c under other licenses. There are other versions available, such as the diffreg.c that comes with the Plan 9 diff, and they use the same algorithm. However, they lack some of the updates found in the BSD versions, such as support for unified diff.
The BSD version of diff uses a longest common subsequence algorithm that runs in O(n^2) time in the worst case and typically in O(n log n). There is a newer algorithm by Myers that runs in O(ND) time, where N is the combined length of the two inputs and D is the size of the shortest edit script. It's used by GNU diff. I saw a request a long while ago for someone to update the BSD diff to use the Myers algorithm. I don't think anyone has ever completed the task.
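For anyone who hasn't looked at how a longest common subsequence turns into a diff, here is a small sketch in Python. It uses the textbook dynamic programming table, which always costs O(n*m) time and memory. It is not the algorithm in diffreg.c and it is not the Myers algorithm; the function names lcs_table and diff_lines are made up for illustration.

```python
def lcs_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def diff_lines(a, b):
    """Walk the table and mark each line as kept (' '), deleted ('-') or added ('+')."""
    t = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(" " + a[i])
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            # Dropping a[i] loses nothing from the common subsequence.
            out.append("-" + a[i])
            i += 1
        else:
            out.append("+" + b[j])
            j += 1
    out.extend("-" + line for line in a[i:])
    out.extend("+" + line for line in b[j:])
    return out


if __name__ == "__main__":
    for line in diff_lines(["a", "b", "c", "d"], ["a", "c", "d", "e"]):
        print(line)
```

In this example the common subsequence is a, c, d, so the edit script has two entries (delete b, add e). That count of two is the D in the Myers bound: the fewer differences there are, the less work that algorithm has to do.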
While the BSD version of diff is very usable, it would be nice to have a version of diff without the 4 clause license so that it could be used as library code alongside software that might contain GNU licensed source code. I prefer working with a more lenient license like BSD 2 clause or MIT rather than GNU when possible. So it would be great to find a version of diff that supports unified diff output and avoids both the GNU and BSD 4 clause licenses. I think the main alternative that offers diff with a more lenient license and unified diff support is Toybox. The Toybox license is BSD 0 clause (0BSD). However, Toybox doesn't support as many command line options as BSD or GNU diff.
I also just found out that there are versions of diff and patch for sbase. However, they're not in the main source code repository for sbase. You can find the source by searching the dev list at suckless.org. The sbase diff appears to be fully POSIX compliant, but it's missing several features that were added to the BSD and GNU diff programs. It supports unified diff only. The output is very close to the output of the BSD diff utility, and it also uses the longest common subsequence algorithm.
I did run across some code to convert between standard diff output and unified diff output:
https://github.com/AceHusky12/unidiff
It's listed as public domain.
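To give a feel for what such a conversion involves, here is a small Python sketch that maps a command line from standard diff output (such as 3,5c3,4) to a unified hunk header with zero lines of context. It is not taken from the unidiff project. The function name hunk_header is made up, and the sketch always writes the explicit start,count form even though diff programs commonly leave off a count of 1.

```python
import re

# Standard diff command lines look like "3,5c3,4", "2a3,4" or "3,4d2".
_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def _span(first, last, empty):
    """Return (start, count) for one side of a command."""
    first = int(first)
    if empty:
        # An empty range is reported as the line just before it.
        return first, 0
    last = int(last) if last is not None else first
    return first, last - first + 1


def hunk_header(command):
    """Build a zero-context unified hunk header (hypothetical helper)."""
    match = _COMMAND.match(command.strip())
    if match is None:
        raise ValueError("not a standard diff command: %r" % command)
    old_first, old_last, op, new_first, new_last = match.groups()
    old_start, old_count = _span(old_first, old_last, empty=(op == "a"))
    new_start, new_count = _span(new_first, new_last, empty=(op == "d"))
    return "@@ -%d,%d +%d,%d @@" % (old_start, old_count, new_start, new_count)


if __name__ == "__main__":
    for command in ("3,5c3,4", "2a3,4", "3,4d2"):
        print(command, "->", hunk_header(command))
```

The header is the easy part. A full converter also has to rewrite the < and > line prefixes as - and +, and it needs the original file if it is going to add context lines around each hunk.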
If anyone else is using an unusual variant of diff or patch or is writing their own, I'd be very interested in hearing about it. I would love to compare notes on some of the different BSD variants out there and the various features they support.