Comparison of desktop search software
From Wikinfo
Note 1: Editing this page requires registering at Wikinfo, which many find a nuisance, so you can instead email me any information you would like to see updated here.
My email is: the.real.monkey.d.luffy@thegrandline (replace thegrandline with gmail.com)
Note 2: Many of these projects will use Xesam in the future and as such will incorporate its features. From Xesam's website: Xesam is an umbrella project with the purpose of providing unified APIs and specs for desktop search- and metadata services. We are collaborating with several projects such as Tracker, Strigi, Beagle, Pinot, Recoll, and Nepomuk-KDE.
Feature comparison
| Feature | Beagle[1] | Tracker[2][3] | Recoll[4][5] | Strigi | Jindex | Pinot |
|---|---|---|---|---|---|---|
| Regular expressions | Partial[6] | Yes[7] | No | Yes | ? | No |
| Boolean operators: AND OR NOT | Yes | Yes[8] | Yes | Yes | ? | Yes |
| Multiple character encoding and languages | Partial[9] | Yes[10] | Yes[11] | Yes[12] | ? | Yes[13] |
| Keyword search | Yes | Yes | Yes | Yes | ? | Yes |
| Full text search | Yes | No[14] | Yes | No | ? | Yes |
| Searching exact sentences supports: line breaks | Yes | ? | Yes | No | ? | Yes[15] |
| Searching exact sentences supports: de-hyphenation on line breaks[16] | No | ? | Partial[17] | No | ? | Partial[18] |
| Searching exact sentences supports: text in columns[19] | ? | ? | Partial[20] | No | ? | Partial[21] |
| Searching exact sentences supports: non-alphanumeric characters | Partial[22] | ? | Partial[23] | No | ? | Partial[24] |
| Stemming | Yes[25] | Yes | Yes | No | ? | Yes[26] |
| Allow user tagging | No | Yes | No | No | ? | Yes |
| Restrict search to tags | N/A | Yes | N/A | N/A | ? | Yes[27] |
| Restrict search to directories | Partial[28] | Yes | Yes[29] | Yes | ? | Yes[30] |
| Metadata-based image retrieval | Yes | ? | Yes[31] | Yes | ? | Yes[32] |
| Content-based image retrieval | No | ? | No | No | ? | No |
| Thumbnails for indexed images and videos | Yes[33] | Yes | No | Partial[34] | ? | No |
| Index archive files recursively | Partial | No[35] | No | Partial | ? | Partial |
| Index removable media | Yes[36] | Yes[37] | No | No[38] | ? | No |
| Removable media is cataloged[39] | Yes[36] | N/A | N/A | N/A | ? | No |
| Different database catalogs for indexing data | Yes | Yes | Yes | Yes | ? | ? |
| File checksum (allows finding duplicate files) | No | No[40] | No | Yes | ? | No[41] |
| Back end used | Lucene.Net | ? | Xapian | CLucene and Nepomuk | ? | Xapian |
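
The "File checksum" row refers to recording a digest of each file's content, so that files with identical content can be grouped as duplicates. The following is a minimal sketch of that idea in Python. It assumes SHA-256 as the digest, and the function names `file_checksum` and `find_duplicates` are hypothetical. It does not show how Strigi or any other tool in the table implements the feature.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under `root` by checksum; return groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("/some/directory")` returns a list of lists, each holding the paths of files whose content is byte-for-byte identical. An indexer would typically store the digest alongside the other file metadata rather than recompute it for every query.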
Operating systems supported
| Operating system | Beagle | Tracker | Recoll | Strigi | Jindex | Pinot |
|---|---|---|---|---|---|---|
| Linux | Yes | Yes | Yes | Yes | Yes | Yes |
| Mac OS X | Work In Progress | Yes | Yes | Yes | ? | No |
| Windows | Work In Progress | No | No | Yes | ? | No |
Archive file types supported
| Archive type | Beagle | Tracker | Recoll | Strigi | Jindex | Pinot |
|---|---|---|---|---|---|---|
| zip | Yes | No | No | Yes | ? | No |
| rar | No | No | No | No | ? | No |
| 7-zip | No | No | No | No | ? | No |
| tar | Yes | No | No | Yes | ? | Yes |
| gzip | Yes | No | No | Yes | ? | No |
| bzip2 | Yes | No | No | Yes | ? | No |
| Disk images | ? | ? | ? | ? | ? | Partial[42] |
Databases supported for storing indexed data
| Database | Beagle | Tracker | Recoll | Strigi | Jindex | Pinot |
|---|---|---|---|---|---|---|
| SQLite | ? | Yes | No | No | ? | Partial[43] |
| Xapian | ? | ? | Yes | No | ? | Partial[44] |
External links
- Desktop search tools for GNU/Linux: the competition hots up - Tracker, Recoll Strigi and Deskbar
- Comparison of indexers: Beagle, JIndex, Tracker, Strigi (December 2006)
Notes and references
- ^ http://mail.gnome.org/archives/dashboard-hackers/2008-March/msg00012.html
- ^ http://www.gnome.org/projects/tracker/features.html
- ^ http://mail.gnome.org/archives/tracker-list/2008-March/msg00031.html
- ^ Some of this information was obtained by contacting the author through email (unfortunately the conversation is not hosted anywhere since there is no mailing list).
- ^ http://www.lesbonscomptes.com/recoll/features.html
- ^ Only wildcard query terms supported for full text searches.
- ^ Through RDF query. Additionally, future Xesam implementation will do this too.
- ^ An expression tree is planned in the near future to do other booleans.
- ^ Planned. Currently, UTF-8 is assumed by default if the encoding is not specified for the file (some files, e.g. HTML files, can specify the encoding in their metadata).
- ^ Everything is converted to UTF-8. Non-UTF8 needs the user's locales set up appropriately so that data can be successfully converted to UTF-8.
- ^ Support for multiple charsets. Internal processing and storage uses Unicode UTF-8.
- ^ Uses UTF-8
- ^ Internal conversion to UTF-8
- ^ Support for exact and precise phrase searches is planned shortly. Matching will be case-insensitive but otherwise precise, including non-alphanumeric characters.
- ^ Yes, in the sense that line breaks are removed from text prior to indexing, and at search time.
- ^ That would mean the text "wa-(line break here)ter" would be indexed as "water".
- ^ This is a function of the input filter, so the support depends on the document type. Some filters do it (e.g. pdf), some don't.
- ^ Partial. At indexing time, this depends on the filters' capabilities. At search time, v0.89 and newer de-hyphenate queries.
- ^ It is common for scientific articles to be available in PDF format where each page has two columns. This feature means that lines are indexed per column (the correct mode) rather than per page.
- ^ This is a function of the input filter. I think pdftotext handles this correctly.
- ^ Partial, this depends on filters.
- ^ String "a+b" can be searched and will return matching files with "a+b" in them, but will also return files with "a-b", i.e. the non-alphanumeric character is not matched.
- ^ Depends on the characters. Some are stripped while indexing, some not. Email addresses are correctly matched, for example: "john@x.com" is not the same search as "john x com".
- ^ Partial, some characters are dropped at indexing time.
- ^ Full text searches are always stemmed. Keyword searches are never stemmed.
- ^ Stemming is not globally set and is enabled for queries for which the language is set.
- ^ Yes, with the label: operator.
- ^ Searching by specifying a directory will only search in that directory (or directories, if the name matches multiple actual locations) but not recursively in its subdirectories.
- ^ Also allows specific file name searches with wildcards.
- ^ With the dir: operator one can restrict the query to documents in folder A or exclude those in folder B.
- ^ If the exiftool helper is installed, image metadata is indexed. But there is no support for displaying images internally.
- ^ Yes, for files that embed EXIF data.
- ^ The search service does not generate thumbnails itself. The search GUIs use the thumbnailers of the respective desktop environments (e.g. beagle-search uses the GNOME thumbnailer, kerry uses KDE thumbnail API).
- ^ No thumbnails for videos.
- ^ Will probably do so soon.
- ^ a b This is in svn trunk. http://www.mail-archive.com/dashboard-hackers@gnome.org/msg04465.html
- ^ This is in svn trunk. http://www.mail-archive.com/tracker-list@gnome.org/msg03537.html
- ^ From the developer: You have to actively add it and index it. But it's quite cumbersome. Might as well make this a no too, until it is user friendly.
- ^ i.e. it's possible to distinguish different DVDs from each other through their unique volume number and an optional user-provided string (number or name).
- ^ Not fully implemented, since the author has not found it necessary.
- ^ But on the TODO list for a while.
- ^ ISO9660 images.
- ^ SQLite is used for historical data only.
- ^ Xapian is the index back-end.
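
The notes on line breaks and de-hyphenation above describe a text normalization step: line breaks are removed before indexing and at search time, and "wa-(line break here)ter" is indexed as "water". Here is a minimal sketch of such a step in Python. The function name `normalize_for_indexing` and the regular expressions are assumptions for illustration, not the code used by Recoll, Pinot, or any other tool listed here.

```python
import re


def normalize_for_indexing(text):
    """Join words hyphenated across line breaks, then flatten line breaks."""
    # "wa-\nter" -> "water": drop a hyphen that sits directly before a line break
    # and is followed by a word character on the next line.
    text = re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)
    # Replace remaining line breaks and runs of whitespace with one space,
    # so a sentence split over several lines matches as a single phrase.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Applying the same function to both the document text and the query keeps the two consistent. One limitation of this simple rule is that a genuinely hyphenated compound broken at its hyphen (for example "well-" followed by "known") is also joined, which may explain why support is only listed as partial and left to the input filters.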
Disclaimer
This article is not present in Wikipedia because its bureaucrats classified it as not worthy of inclusion, citing a lack of third-party sources.
TODOs
- Add information about supported application data and file types.
- Add Google Desktop and others to the comparison list.

