[personal profile] lmemsm
It's hard to find public domain, Open Source and Creative Commons licensed language resources in formats that are easy for programs to work with. There is a growing collection of scanned public domain books, and among them you can find all kinds of dictionaries and references. Some sites even use OCR to convert scanned documents to text formats. However, the converted versions are usually full of typographical errors.

There are a few projects out there that use Free, Open Source or Creative Commons licensing and have a goal of creating dictionaries or other references in accessible and searchable digital format, but not a lot. It would be nice to see more projects of this sort. The results could be useful with word processors and editors (such as LibreOffice, Abiword, SciTE), electronic dictionaries (such as stardict) and games (such as anagramarama and scramble).

Here are some of the projects I've located:

The XDXF (XML Dictionary eXchange Format) project had a collection of dictionaries and language translation word lists that they were working with and converting to various formats.
They also have tools for converting between XDXF and other formats:

The Moby project is a wonderful public domain resource. It includes word lists, a thesaurus and more.

The Free Dictionaries Project also provides downloads. If you want to translate one language to another, this is a useful, free resource:

SCOWL (Spell Checker Oriented Word Lists) and Friends has useful word lists and resources for spell checker utilities:

YAWL (Yet Another Word List) is based on the updated Public Domain ENABLE (Enhanced North American Benchmark Lexicon).
You can also find Libre licensed word lists in FLOSS games such as anagramarama.

Here's a rhyming dictionary (source code and online example) that uses Moby project resources to find rhymes:

If you know of other projects or developments in this area, I'd love to hear about them ( http://www.distasis.com/connect.htm ).",public,0,,
16253,2017-03-28 07:59:00,2017-03-28 11:59:20,"My projects with rhyming and language translation and other dictionaries, word lists and thesauri","The last post mentioned other groups' projects with dictionaries and language resources. I thought I'd mention some of the projects I've been working on in this area.

I've been creating build scripts with the LM BLD project ( http://www.distasis.com/cpp/lmbld.htm ) so that I'll have automated, repeatable steps to build programs, libraries and other types of packages. Here are some of the things I've been working on.

The Moby project is a very nice dictionary resource. Using their thesaurus, I was able to create a word list and a simple dictionary in stardict format. I use it with Open Source programs like scramble.
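As a rough illustration of that kind of conversion (not the actual scripts used here), the sketch below assumes the thesaurus is a text file where each line is a comma-separated list whose first item is the headword. The function names `words_from_thesaurus` and `to_tab_delimited` are made up for this example.

```python
def words_from_thesaurus(lines):
    """Collect the unique single-word terms from comma-separated thesaurus lines."""
    words = set()
    for line in lines:
        for term in line.strip().split(","):
            term = term.strip()
            # Keep single words only; skip phrases and empty fields.
            if term and " " not in term:
                words.add(term.lower())
    return sorted(words)


def to_tab_delimited(lines):
    """Turn each line into 'headword<TAB>synonyms', a common input for dictionary converters."""
    rows = []
    for line in lines:
        head, _, rest = line.strip().partition(",")
        if head and rest:
            rows.append(f"{head}\t{rest.replace(',', ', ')}")
    return rows


# Hypothetical sample lines in the assumed layout.
sample = ["happy,glad,joyful,in good spirits", "sad,unhappy,blue"]
print(words_from_thesaurus(sample))
print(to_tab_delimited(sample))
```

The word list is useful for word games, while the tab-delimited rows can be fed to a converter that produces a simple dictionary.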

Strong's Concordance is in the public domain. I've created a translation dictionary in stardict format with it.

I happen to like the stardict dictionary format. There are several nice programs that can work with that format. I wanted something lightweight that would work well on older systems or let me create my own GUI interfaces. The closest thing I could find to what I wanted was sdcv. However, there were a few issues I had with it. The biggest is that it requires glib as a dependency and I didn't want to install GTK+ related dependencies on my systems. The second issue I ran into was that it couldn't handle some of the newer versions of the stardict format. Since the code is GNU GPL licensed, I started with it and made several modifications and customizations. The result is sdcv2 which can be linked to my own Unicode shared libraries in place of glib if desired and can work with dictionaries in more recent stardict formats. It may not make use of all the latest features in the newer formats, but it can at least access information from them.
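To give an idea of what a lightweight reader has to do, here is a minimal sketch of parsing a stardict .idx file. It is not code from sdcv or sdcv2. It assumes the commonly documented layout: each entry is a null-terminated UTF-8 word followed by a big-endian offset and a 32-bit big-endian size, with the offset being 32 bits by default or 64 bits when the .ifo file sets idxoffsetbits=64 (one of the additions in the newer format). The names `parse_idx` and `lookup` are hypothetical.

```python
import struct


def parse_idx(data, offset_bits=32):
    """Parse raw .idx bytes into a list of (word, offset, size) tuples."""
    fmt = ">QI" if offset_bits == 64 else ">II"
    width = struct.calcsize(fmt)
    entries = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\0", pos)  # the word ends at the first null byte
        word = data[pos:end].decode("utf-8")
        offset, size = struct.unpack_from(fmt, data, end + 1)
        entries.append((word, offset, size))
        pos = end + 1 + width
    return entries


def lookup(entries, dict_data, word):
    """Return the definition text for word, or None if it is not present."""
    # A linear scan keeps the sketch short; real readers search the sorted index.
    for entry_word, offset, size in entries:
        if entry_word == word:
            return dict_data[offset:offset + size].decode("utf-8")
    return None


# Tiny hand-built example standing in for uncompressed .idx and .dict files.
dict_data = b"a fruita color"
idx_data = (b"apple\0" + struct.pack(">II", 0, 7)
            + b"blue\0" + struct.pack(">II", 7, 7))
entries = parse_idx(idx_data)
print(lookup(entries, dict_data, "blue"))
```

Nothing here needs glib, which is the point: the index is simple enough that a small reader can be written against plain byte strings.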

I've seen other projects that use the sdcv library as a back-end and create their own GUI for a dictionary program. It makes sense if the program uses GTK+, but it seems awkward for Qt or other GUI programs to require GTK+ related dependencies. With sdcv2, there are no GTK+ related dependencies.

I would love to find a dictionary with a FLTK GUI, especially if it can handle stardict format. Since I haven't been able to find one, I may try to write one at some point. I've also been thinking about creating a pdcurses front end. When I use sdcv (or sdcv2) from the command line, certain systems like Windows can't handle input or output of certain Unicode characters correctly. I've added support for SDL 2.x, SDL2_ttf and the ability to work with a range of Unicode characters within the UCS-2 character set to pdcurses. I think pdcurses would make an interesting front end for a program using the sdcv2 library. It would work on any system that supports SDL 1.x or 2.x, including more unusual operating systems like Syllable and Haiku. I would like to hear from others who may be interested in or are working on similar projects.

The dictzip program compresses dictionary files. It uses an extension to the gzip format with extra fields to include information about the compressed dictionary. Files compressed with this format often use the .dz extension. You can use dictzip with stardict files to save space. dictzip is primarily a POSIX compliant program, so it doesn't convert well to certain systems. I was able to find a Windows port that limited the program's functionality, but did enough to get the job of compression done. I've made some modifications to it and am using it as a cross-platform method of compressing stardict dictionary files.
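The extra fields mentioned above live in the gzip header's optional "extra" section. The following sketch shows how a program might detect them. It assumes the layout described in the dictzip documentation: a subfield tagged "RA" holding a version, a chunk length, a chunk count and then one 16-bit compressed size per chunk, all little-endian. The function name `read_dictzip_info` is invented for this example.

```python
import struct


def read_dictzip_info(data):
    """Return dictzip chunk information from gzip header bytes, or None if absent."""
    if len(data) < 12 or data[:2] != b"\x1f\x8b":
        return None  # not a gzip stream
    flags = data[3]
    if not flags & 0x04:
        return None  # FEXTRA flag not set, so this is plain gzip
    (xlen,) = struct.unpack_from("<H", data, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        subfield_id = data[pos:pos + 2]
        (sublen,) = struct.unpack_from("<H", data, pos + 2)
        body = data[pos + 4:pos + 4 + sublen]
        if subfield_id == b"RA" and len(body) >= 6:
            version, chunk_len, chunk_count = struct.unpack_from("<HHH", body, 0)
            if len(body) < 6 + 2 * chunk_count:
                return None  # truncated chunk table
            sizes = struct.unpack_from(f"<{chunk_count}H", body, 6)
            return {"version": version, "chunk_length": chunk_len,
                    "chunk_sizes": list(sizes)}
        pos += 4 + sublen
    return None


# Hand-built header with made-up values: 2 chunks of up to 4096 bytes each.
sub = b"RA" + struct.pack("<H", 10) + struct.pack("<HHHHH", 1, 4096, 2, 100, 50)
header = b"\x1f\x8b\x08\x04" + b"\0\0\0\0" + b"\x00\x03" + struct.pack("<H", len(sub)) + sub
print(read_dictzip_info(header))
```

Because each chunk is compressed separately and its size is recorded, a reader can jump to the chunk containing a given offset instead of decompressing the whole file. Ordinary gzip tools should still be able to decompress the file since they skip the extra field.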

Several utilities and conversion programs were created for stardict in the stardict-tools project. As with stardict and sdcv, glib is a dependency for stardict-tools. There are a few tools that use a GTK+ front-end as well. I personally only use the stardict-tools to convert tab delimited files and files in babylon format to stardict. So, I modified the command line tools that do those conversions to build without glib. I also created my own makefile just to build the tools I use.
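For readers curious about what a tab delimited conversion involves, here is a simplified sketch. It is not the stardict-tools code. It assumes one "word<TAB>definition" pair per line, plain-text definitions (sametypesequence=m), 32-bit index offsets, and an index sorted case-insensitively on ASCII letters with a byte comparison as the tie-breaker. It does not handle escape sequences inside definitions. The function name `build_stardict` and the default book name are made up.

```python
import struct


def build_stardict(tab_lines, bookname="Example"):
    """Build (.ifo text, .idx bytes, .dict bytes) from 'word<TAB>definition' lines."""
    entries = []
    for line in tab_lines:
        word, sep, definition = line.rstrip("\n").partition("\t")
        if sep and word:
            entries.append((word.encode("utf-8"), definition.encode("utf-8")))
    # bytes.lower() only changes ASCII letters, which matches the assumed ordering.
    entries.sort(key=lambda e: (e[0].lower(), e[0]))

    idx = bytearray()
    dict_data = bytearray()
    for word, definition in entries:
        # Index entry: word, null byte, big-endian offset and size into .dict.
        idx += word + b"\0" + struct.pack(">II", len(dict_data), len(definition))
        dict_data += definition

    ifo = ("StarDict's dict ifo file\n"
           "version=2.4.2\n"
           f"wordcount={len(entries)}\n"
           f"idxfilesize={len(idx)}\n"
           f"bookname={bookname}\n"
           "sametypesequence=m\n")
    return ifo, bytes(idx), bytes(dict_data)


ifo, idx, dict_data = build_stardict(["blue\ta color", "Apple\ta fruit"])
print(ifo)
print(dict_data)
```

Writing the three results to files named with the same base name and the .ifo, .idx and .dict extensions would give a dictionary that stardict-compatible readers are expected to open, though that has not been verified against every reader.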

I've searched and I've yet to find a rhyming dictionary in stardict format. So, I'm working on creating one. It's a slow process. I've taken a public domain rhyming dictionary as a starting point and I'm in the process of editing it and converting it to the format I need.

I've also been searching for an Open Source C/C++ grammar checker, but I've yet to find one that I like.

These are just some of the projects I'm working on. If you're interested in comparing notes on these topics or if you have recommendations of other dictionary and word related projects you like, feel free to contact me ( http://www.distasis.com/contact.htm ).
