I've been looking at different diff tools trying to figure out how I would design my own cross-platform portable diff program with the features I use most. The most interesting thing I found out was how different the various outputs of the diff programs all are. There's also no one algorithm that creates the most intuitive output for a human to read. A certain algorithm may be better for one case but not for another. I'll share a summary of what I've researched about diff utilities so far.

The two main diff programs I think of are the diff from GNU diffutils and the BSD diff program that's available with many BSD operating systems. The BSD diff program traditionally uses a variant of the Longest Common Subsequence algorithm and attributes that algorithm to Harold Stone. The latest GNU diff program uses Myers' algorithm. Myers' algorithm was a breakthrough because it managed to solve the problem in O(ND) time (where N is the combined length of the two inputs and D is the size of the shortest edit script), something that was once thought to be impossible. The original GNU diff program predated Myers' algorithm, so much older versions used a different solution. I've seen some posts at the BSD site saying that one of their goals was to add a Myers' implementation to their diff program to improve speed. I haven't seen anyone complete that project to date. The speed improvement of Myers' algorithm does come at the cost of requiring more memory. Useful diff algorithms need to keep the trade-off of space versus time in mind and not come up with solutions that fail if the files being compared are arbitrarily large.
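
To make the O(ND) idea more concrete, here is a minimal sketch in C of the greedy forward search from Myers' algorithm. It only computes D, the number of inserted plus deleted lines in a shortest edit script, and does not recover the script itself. The function name myers_distance and the representation of a file as an array of strings are my own assumptions for illustration. This is not the code used by GNU diff or by any of the other tools mentioned here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the length D of a shortest edit script
   (insertions plus deletions) turning the n lines of a into the m lines
   of b, or -1 if memory allocation fails. */
int myers_distance(const char **a, int n, const char **b, int m)
{
    int max = n + m;      /* D can never exceed n + m */
    int off = max + 1;    /* shifts diagonal k into a valid array index */
    int *v = calloc((size_t)(2 * max + 3), sizeof *v);

    if (v == NULL)
        return -1;

    for (int d = 0; d <= max; d++) {
        for (int k = -d; k <= d; k += 2) {
            int x, y;

            /* v[off + k] holds the furthest x reached so far on diagonal k. */
            if (k == -d || (k != d && v[off + k - 1] < v[off + k + 1]))
                x = v[off + k + 1];      /* move down: insert a line from b */
            else
                x = v[off + k - 1] + 1;  /* move right: delete a line from a */
            y = x - k;

            /* Follow the "snake" of matching lines at no extra cost. */
            while (x < n && y < m && strcmp(a[x], b[y]) == 0) {
                x++;
                y++;
            }
            v[off + k] = x;

            if (x >= n && y >= m) {      /* reached the end of both inputs */
                free(v);
                return d;
            }
        }
    }
    free(v);
    return -1;  /* not reached: d = n + m always suffices */
}
```

The array v needs only O(N) integers for this distance-only version. Recovering the actual edit script requires remembering more of the search, which is one place where the space versus time trade-off mentioned above shows up.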

Most systems use either the GNU diff or the BSD diff or are based on them. Busybox is based on the BSD diff. I did search for other implementations that might be useful for people using an alternative to the standard GNU coreutils and diffutils. I was only able to find a few. It seems diff algorithms are more easily found in version control tools such as git. Toybox has an original implementation of diff. It's in the toys pending directory, so I assume it's not part of most standard Toybox installations. I did some experimenting with it to see if I could use it as a stand-alone program. I got as far as finding some bugs in how it displays diffs in Unidiff format with a specified number of context lines. The sbase project also has an implementation of diff. It's written to optimize for speed. It can use different algorithms depending on the situation. However, in using it, I found cases where it incorrectly marked lines as different when they were the same line. I originally assumed both projects used Myers' algorithm, but on closer examination, they probably don't. I found another C based implementation of the diff algorithm in GOT (Game of Trees version control system) which does use Myers' algorithm. I was able to get it to compile and compare two files using it. Interestingly, its output did not match the output from GNU diff.

The projects have a wide range of licenses, from GPL to BSD 0 clause. Busybox is licensed using GPLv2. I am not at all sure how they legally use the diff code based on BSD systems, which is licensed with a Caldera 4 clause license that is incompatible with GPL licenses. Possibly the Busybox developers wrote an exception for the Busybox license so they could incorporate that code. sbase uses an MIT license. The BSD based diff mainly uses a BSD 3 clause license (or 4 clause in older versions) but includes the Caldera license in the diffreg.c file. I've searched for other versions of the BSD diff tool that use other licenses. While the licenses vary, most copies of the diffreg.c file include the Caldera license. I did run across a few BSD variants that were licensed with BSD 3 clause licenses and didn't mention the Caldera license in diffreg.c, but I think this was an omission and don't think their versions of diffreg.c were actually licensed as BSD 3 clause. Finding a decent version of the diff algorithm that can be used in a library and can link with code using GNU GPL licenses or proprietary licenses is no easy task.

I still find it surprising that the various diff implementations all come up with different output and not even different versions of the GNU diff utility will necessarily have the same results. It makes it harder to verify if a diff program's output is valid or not.
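
One way to check validity without requiring two tools to agree is to test the edit script itself: the kept and deleted lines, in order, must reproduce the first file, and the kept and inserted lines must reproduce the second. The sketch below shows that idea in C. The struct edit type, its op codes and the function name script_is_valid are hypothetical names I made up for illustration. They don't come from any of the diff programs discussed here.

```c
#include <string.h>

/* Hypothetical edit script entry: op is ' ' (line kept), '-' (line deleted
   from the first file) or '+' (line inserted from the second file). */
struct edit {
    char op;
    const char *line;
};

/* Returns 1 if the script describes a correct way to turn the n lines of a
   into the m lines of b, and 0 otherwise. */
int script_is_valid(const struct edit *s, int len,
                    const char **a, int n, const char **b, int m)
{
    int i = 0;  /* next unconsumed line of a */
    int j = 0;  /* next unconsumed line of b */

    for (int e = 0; e < len; e++) {
        char op = s[e].op;

        if (op != ' ' && op != '-' && op != '+')
            return 0;
        if (op == ' ' || op == '-') {   /* must match the next line of a */
            if (i >= n || strcmp(s[e].line, a[i]) != 0)
                return 0;
            i++;
        }
        if (op == ' ' || op == '+') {   /* must match the next line of b */
            if (j >= m || strcmp(s[e].line, b[j]) != 0)
                return 0;
            j++;
        }
    }
    return i == n && j == m;  /* both files fully accounted for */
}
```

For the files "x, y" and "y, x", both "delete x, keep y, insert x" and "insert y, keep x, delete y" pass this check, which illustrates how two tools can disagree and still both be correct.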

I'd be very interested to hear what others think are necessary features in a diff program. Does an alternative to GNU diff work well enough for your situation? Are there features from one diff program you wish were part of another? I'd like to take the best from some of the various diff programs out there like sbase and toybox and combine them to make a diff that has all the functionality I use most. I'd need a tool that I could use in conjunction with programs such as diffh which can display differences in files side by side in HTML format. It would be nice to discuss this further with others looking for lightweight, efficient or cross-platform tool implementations for their systems. How would you design a diff program?

Here are some interesting articles and tips related to the C preprocessor and to creating makefiles.


This shows how to print the value of a macro:
http://c-faq.com/ansi/stringize.html
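
As a small sketch of the technique: the # operator turns a macro argument into a string literal, and a second level of macro is needed so the argument gets expanded first. The names STR_RAW, STR and BUFFER_SIZE below are placeholders I chose for illustration.

```c
#include <stdio.h>

#define STR_RAW(x) #x          /* stringizes the argument exactly as written */
#define STR(x)     STR_RAW(x)  /* expands the argument first, then stringizes */

#define BUFFER_SIZE 4096       /* hypothetical macro whose value we want to see */

/* Returns the macro's name: the argument was not expanded. */
const char *macro_name(void)
{
    return STR_RAW(BUFFER_SIZE);
}

/* Returns the macro's value as text: the argument was expanded first. */
const char *macro_value(void)
{
    return STR(BUFFER_SIZE);
}

/* Prints a line such as "BUFFER_SIZE = 4096". */
void print_macro(void)
{
    printf("%s = %s\n", macro_name(), macro_value());
}
```

Calling print_macro() from main is enough to try it out. The same two-level trick should also work inside a #pragma message or #error line on compilers that support them, if you want the value reported at compile time.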

How to use static patterns with makefiles:
https://www.gnu.org/software/make/manual/make.html#Static-Pattern


Preprocessor tips:
The second article shows how to check which include files are used by a C or C++ file.

https://mischasan.wordpress.com/2013/04/07/the-c-preprocessor-not-as-cryptic-as-youd-think/
https://mischasan.wordpress.com/2011/06/14/stupid-gcc-trick-2-finding-all-included-files-recursively/
https://mischasan.wordpress.com/2011/10/12/stupid-gcc-trick-3-list-all-built-in-gcc-define-symbols/


Make tips:
The first article has a great tip for creating directories in makefiles using "%/..:".

https://mischasan.wordpress.com/2012/07/05/two-more-cheap-gmake-tricks-creating-directories-and-printing-variables/
https://mischasan.wordpress.com/2015/03/09/gmake-cheap-trick-3/
https://mischasan.wordpress.com/2015/03/21/gmake-cheap-trick-4-for-non-recursive-make/

To see what commands make will run when invoked, you can use:
make --just-print


Nonrecursive makefile tips:

https://mischasan.wordpress.com/2013/03/30/non-recursive-make-gmake-part-1-the-basic-gnumakefile-layouts/
https://mischasan.wordpress.com/2013/03/30/non-recursive-make-gmake-part-2-rules-mk/
https://mischasan.wordpress.com/2013/04/13/non-recursive-make-part-3-a-tool-for-the-fearless/


Replacing other utilities with sed:

In case you need these in a makefile, build script or elsewhere and don't have them on your system:

https://github.com/aureliojargas/sed.sf.net/blob/master/local/docs/emulating_unix.txt


Miscellaneous compiler tips:

To stop the GNU gcc compiler after the first error, use:
-Wfatal-errors

To stop the GNU gcc compiler after N errors, use:
-fmax-errors=N
So for three errors, add the following to the command line when invoking gcc:
-fmax-errors=3


My favorite preprocessor tips:

You can use a C preprocessor to generate templates.
Here's an example to generate web pages using the preprocessor and templates:
http://www.distasis.com/cpp/mingw.htm#makeprograms

For my build system, I needed more functionality than the standard C preprocessor could offer.
Rather than reinventing the wheel and writing my own preprocessor, I found gpp has just enough capabilities beyond a standard preprocessor to handle the task.
The gpp preprocessor is available from:
https://logological.org/gpp


Know of some other preprocessor or makefile tips? Have you written an article or blog post with your tips on these topics? Please share them. They may get included in this list.

I've listed some core utilities options besides GNU. I thought I'd share something about what utilities I personally prefer to use. My main requirement in a good set of core utilities is portability. This is rather hard to find. You would think that if a utility was efficient and lightweight, it would be easy to port. However, that's not necessarily so. Many utilities that are designed for efficiency take advantage of features of a particular operating system, which makes them harder to port.

At first, I considered starting with sbase which had stated goals similar to what I was looking for, but it didn't have enough features to effectively replace the GNU core utilities when developing and building programs. While newer versions of sbase have added a lot of functionality, they've become much less portable.

My favorite source for inspiration is Minix. Earlier versions provided some interesting and fairly portable versions of a variety of utilities:
https://www.minix-vmd.org/cgi-bin/raw/source/std/1.7.5/src/commands/simple/
Some of the utilities don't have sufficient UTF-8 support or lack some newer functionality found in GNU utilities that makes them fail when attempting to build applications. However, they make a useful starting point.

In some cases, the OBase or BSD utilities do a better job than the older Minix ones and still do that job efficiently. I particularly like the version of patch found on BSD systems. It's an earlier variant of the Free Software Foundation's patch program. Unlike the FSF's version of patch which uses the GNU license, it uses a BSD style license.

For some utilities, I've consulted the POSIX standards ( http://pubs.opengroup.org/onlinepubs/9699919799/idx/utilities.html ) and rewritten them from scratch.

Rather than trying to port utilities such as the Free Software Foundation's coreutils, I thought having a lightweight, efficient, highly portable option would be a useful alternative. Many of the FSF developers have little interest in portability, making it hard to get later versions of their programs working on non-POSIX systems. I was surprised at how little interest most users have in developing portable alternatives to the core utilities that could be used to build software. Not only was there little interest, some people posted extremely negative comments when anyone suggested creating alternatives to the FSF software. I was also surprised by some of the negative reactions I read about wonderful projects like SBase.

I have a growing collection of public domain, BSD and MIT licensed alternatives to the GNU core utilities. For now, I just use them for my own projects. If you have an interest in portable utilities and tools, would like to see a viable portable alternative to the FSF's GNU coreutils or would like to further discuss related topics in a positive light, feel free to contact me. I'd enjoy talking with other developers and utility users on the topic.

You'll find some added information on my utilities and information on how to discuss the topic further at:
http://www.distasis.com/cpp/lmbld.htm

Most systems (other than BSD based ones) use GNU's core utilities. They're used by most Linux distributions. Cygwin, MinGW and gnuwin32 run ported versions of the GNU applications as well. Even Microsoft's SFU/SUA included some of the GNU utilities. However, the GNU core utilities are typically more bloated and have more feature creep than other versions of standard Unix utility programs. BSD systems have their own versions of core utilities. The latest version of Minix has adopted the BSD utilities. They tend to be less bloated than the GNU versions, but are still more bloated than other options out there. The BSD utilities also tend toward adding new features, though not to the same extent as the GNU utilities. Also, some of their utilities aren't as well optimized as the GNU versions. Busybox seems like the most viable option for a lightweight but still comprehensive version of core utilities. I'm currently using it on my Debian system instead of the GNU core utilities. Toybox is a similar alternative to Busybox. It has a better license option than Busybox, but it's lacking some features and tools that Busybox has.

Here are some links to core utility collections:

Earlier Minix alternatives
http://www.minix-vmd.org/cgi-bin/raw/source/std/1.7.5/src/commands/simple/
Earlier versions of Minix put together an interesting collection of lightweight utilities from various sources.

Busybox
https://busybox.net/
Windows ports of Busybox:
https://frippery.org/busybox/
https://github.com/rmyorston/busybox-w32
https://github.com/realthunder/busybox-w32

Toybox
http://landley.net/toybox/

Heirloom Project
http://heirloom.sourceforge.net/
Based on traditional implementations of standard Unix utilities. Not very portable to non-POSIX systems. Not as bloated as GNU or BSD core utilities.

OBase
https://github.com/chneukirchen/obase
Port of OpenBSD userland to Linux.

SBase
http://git.suckless.org/sbase
This started out as a discussion on one of the suckless.org mailing lists of how to write efficient core utilities that weren't all part of one executable like Busybox or Toybox. Some good examples were posted and the project was started. Then project development was quiet for a while. The project became active again and one of the main goals besides efficiency was UTF-8/internationalization support. It looks like they've borrowed some UTF-8 support concepts (such as Runes) from Plan 9. It's not designed to be portable to non-POSIX systems. However, it does look like they've covered replacing most of the basic core utilities with lightweight, efficient versions.

Other alternatives:
https://github.com/jbruchon/elks/tree/master/elkscmd
https://github.com/EtchedPixels/FUZIX/tree/master/Applications
https://github.com/Orc/bin
https://github.com/eltanin-os/utilchest
https://github.com/eltanin-os/cbase
https://github.com/rofl0r/hardcore-utils
http://git.musl-libc.org/cgit/noxcuse/tree/
https://github.com/pikhq/pikhq-coreutils
http://www.fefe.de/embutils/
http://skarnet.org/software/s6-portable-utils/
https://github.com/dimkr/lazy-utils
https://github.com/arsv/minitools
http://git.suckless.org/ubase
https://github.com/dcantrell/bsdutils
https://github.com/cheusov/nbase
https://github.com/rswier/swieros
https://github.com/minoca/swiss
https://github.com/mentos-team/MentOS/tree/master/programs

MinGW

May. 5th, 2016 12:29 pm
There are now several forks of MinGW and each has its pros and cons. However, there are now enough negatives to using them that I've found it necessary to build MinGW from scratch myself. The MinGW64 project uses a later version of gcc, has better compatibility for building Open Source projects and has its own thread library instead of using Red Hat's pthreads-w32. Some custom builds of MinGW64 even have POSIX threading set as the default instead of Win32 threading. That means better compatibility for C++ thread related code (since the GNU C++ library relies on POSIX threading for parts of its implementation). The MinGW project has always been more careful about licensing and making sure that the code it was using was properly licensed and legal for usage. The MinGW project did follow the example of the MinGW64 project in one key area. They switched from public domain to an MIT license for their runtime library and Win32 API. When they did so without clearly indicating that an exception could be made similar to the GNU gcc runtime license exception, I felt it was time to stop using that version of the MinGW compiler.

I'm currently working with gcc 4.9.2 compiled from source. I'm still using the older public domain APIs, but I've made several modifications for compatibility with the Win32 API (including some modifications that aren't available in the MinGW64 libraries). I have a minimal thread library that was custom written for portability. It's based on C11 thread support and includes POSIX functionality. The GNU compiler is built with POSIX threads as the default so C++ threading works as expected.

So far, I've had no reason to want to work with any other MinGW forks. The version I have does everything I need and supports all the programs I want to compile on Windows. My particular fork is continually evolving. I continue to add support for new Win32 API changes, Win32 API omissions and new C/C++ features as I need them. At some point, I hope to completely replace the runtime library with code that better supports internationalization (better UTF-8 support), C standard compatibility and other useful features.

If anyone else is finding limitations with the compilers maintained by the various MinGW and MinGW64 projects or other related forks based on these projects, I highly recommend building the GNU compiler from source on your own with the options you need most. If you're interested in discussing the GNU compiler further or want to know more about my modifications, you're welcome to use the CppDesign mailing list ( http://groups.yahoo.com/group/CppDesign ) as a forum for further discussion.
