Comparison of desktop search software

From Wikinfo


Note 1: Because editing Wikinfo requires registration, which is a nuisance for many, you can email me any information you would like to see updated here.
My email is: the.real.monkey.d.luffy@thegrandline (replace thegrandline with gmail.com)

Note 2: Many of these projects will use Xesam in the future and as such will incorporate its features. From Xesam's website: "Xesam is an umbrella project with the purpose of providing unified APIs and specs for desktop search- and metadata services. We are collaborating with several projects such as Tracker, Strigi, Beagle, Pinot, Recoll, and Nepomuk-KDE."


Feature comparison

| Feature | Beagle[1] | Tracker[2][3] | Recoll[4][5] | Strigi | Jindex | Pinot |
| --- | --- | --- | --- | --- | --- | --- |
| Regular expressions | Partial[6] | Yes[7] | No | Yes | ? | No |
| Boolean operators: AND, OR, NOT | Yes | Yes[8] | Yes | Yes | ? | Yes |
| Multiple character encoding and languages | Partial[9] | Yes[10] | Yes[11] | Yes[12] | ? | Yes[13] |
| Keyword search | Yes | Yes | Yes | Yes | ? | Yes |
| Full text search | Yes | No[14] | Yes | No | ? | Yes |
| Searching exact sentences supports: line breaks | Yes | ? | Yes | No | ? | Yes[15] |
| Searching exact sentences supports: de-hyphenation on line breaks[16] | No | ? | Partial[17] | No | ? | Partial[18] |
| Searching exact sentences supports: text in columns[19] | ? | ? | Partial[20] | No | ? | Partial[21] |
| Searching exact sentences supports: non-alphanumeric characters | Partial[22] | ? | Partial[23] | No | ? | Partial[24] |
| Stemming | Yes[25] | Yes | Yes | No | ? | Yes[26] |
| Allow user tagging | No | Yes | No | No | ? | Yes |
| Restrict search to tags | N/A | Yes | N/A | N/A | ? | Yes[27] |
| Restrict search to directories | Partial[28] | Yes | Yes[29] | Yes | ? | Yes[30] |
| Metadata-based image retrieval | Yes | ? | Yes[31] | Yes | ? | Yes[32] |
| Content-based image retrieval | No | ? | No | No | ? | No |
| Thumbnails for indexed images and videos | Yes[33] | Yes | No | Partial[34] | ? | No |
| Index archive files recursively | Partial | No[35] | No | Partial | ? | Partial |
| Index removable media | Yes[36] | Yes[37] | No | No[38] | ? | No |
| Removable media is cataloged[39] | Yes[36] | N/A | N/A | N/A | ? | No |
| Different database catalogs for indexing data | Yes | Yes | Yes | Yes | ? | ? |
| File checksum (allows finding duplicate files) | No | No[40] | No | Yes | ? | No[41] |
| Back end used | Lucene.Net | ? | Xapian | CLucene and Nepomuk | ? | Xapian |
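
The "File checksum" row refers to storing a digest of each file's contents so that files with identical contents can be grouped. The following is a minimal sketch of that idea in Python, not the implementation used by Strigi or any other tool listed here; the function names `file_digest` and `find_duplicates` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by checksum; return groups of 2 or more."""
    groups = defaultdict(list)
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("/some/dir")` (a hypothetical path) returns a list of lists, each holding the paths of files whose contents hash to the same digest. An indexer would typically store the digest alongside the other metadata rather than recompute it at query time.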

Operating systems supported

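| Operating system | Beagle | Tracker | Recoll | Strigi | Jindex | Pinot |
| --- | --- | --- | --- | --- | --- | --- |
| Linux | Yes | Yes | Yes | Yes | Yes | Yes |
| Mac OS X | Work in progress | Yes | Yes | Yes | ? | No |
| Windows | Work in progress | No | No | Yes | ? | No |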

Archive file types supported

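| Archive type | Beagle | Tracker | Recoll | Strigi | Jindex | Pinot |
| --- | --- | --- | --- | --- | --- | --- |
| zip | Yes | No | No | Yes | ? | No |
| rar | No | No | No | No | ? | No |
| 7-zip | No | No | No | No | ? | No |
| tar | Yes | No | No | Yes | ? | Yes |
| gzip | Yes | No | No | Yes | ? | No |
| bzip2 | Yes | No | No | Yes | ? | No |
| Disk images | ? | ? | ? | ? | ? | Partial[42] |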

Databases supported for storing indexed data

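| Database | Beagle | Tracker | Recoll | Strigi | Jindex | Pinot |
| --- | --- | --- | --- | --- | --- | --- |
| SQLite | ? | Yes | No | No | ? | Partial[43] |
| Xapian | ? | ? | Yes | No | ? | Partial[44] |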

See also

External links

Notes and references

  1. ^ http://mail.gnome.org/archives/dashboard-hackers/2008-March/msg00012.html
  2. ^ http://www.gnome.org/projects/tracker/features.html
  3. ^ http://mail.gnome.org/archives/tracker-list/2008-March/msg00031.html
  4. ^ Some of this information was obtained by contacting the author through email (unfortunately the conversation is not hosted anywhere since there is no mailing list).
  5. ^ http://www.lesbonscomptes.com/recoll/features.html
  6. ^ Only wildcard query terms supported for full text searches.
  7. ^ Through RDF query. Additionally, future Xesam implementation will do this too.
  8. ^ An expression tree is planned in the near future to do other booleans.
  9. ^ Planned. Currently, it's UTF-8 by default if the encoding is not specified for the file (some files, e.g. HTML files, can specify the encoding in their metadata).
  10. ^ Everything is converted to UTF-8. Non-UTF-8 data needs the user's locale set up appropriately so that it can be successfully converted to UTF-8.
  11. ^ Support for multiple charsets. Internal processing and storage uses Unicode UTF-8.
  12. ^ Uses UTF-8
  13. ^ Internal conversion to UTF-8
  14. ^ Support for exact and precise phrases is planned shortly. Matching will be case-insensitive but otherwise precise, including non-alphanumerics.
  15. ^ Yes, in the sense that line breaks are removed from text prior to indexing, and at search time.
  16. ^ That would mean the text "wa-(line break here)ter" would be indexed as "water".
  17. ^ This is a function of the input filter, so the support depends on the document type. Some filters do it (e.g. pdf), some don't.
  18. ^ Partial. At indexing time, this depends on the filters' capabilities. At search time, v0.89 and newer de-hyphenate queries.
  19. ^ It is common for scientific articles to be available in PDF format where each page has two columns. This feature means that lines are indexed per column (the correct mode) instead of per page.
  20. ^ This is a function of the input filter. I think pdftotext handles this correctly.
  21. ^ Partial, this depends on filters.
  22. ^ String "a+b" can be searched and will return matching files with "a+b" in them, but will also return files with "a-b", i.e. the non-alphanumeric character is not matched.
  23. ^ Depends on the characters. Some are stripped while indexing, some not. Email addresses are correctly matched, for example: "john@x.com" is not the same search as "john x com".
  24. ^ Partial, some characters are dropped at indexing time.
  25. ^ Full text searches are always stemmed. Keyword searches are never stemmed.
  26. ^ Stemming is not globally set and is enabled for queries for which the language is set.
  27. ^ Yes, with the label: operator.
  28. ^ Searching by specifying a directory will only search in that directory (or directories, if the name matches multiple actual locations) but not recursively in its subdirectories.
  29. ^ Also allows specific file name searches with wildcards.
  30. ^ With the dir: operator one can restrict the query to documents in folder A or exclude those in folder B.
  31. ^ If the exiftool helper is installed, image metadata is indexed. But there is no support for displaying images internally.
  32. ^ Yes, for files that embed EXIF data.
  33. ^ The search service does not generate thumbnails itself. The search GUIs use the thumbnailers of the respective desktop environments (e.g. beagle-search uses the GNOME thumbnailer, kerry uses KDE thumbnail API).
  34. ^ No thumbnails for videos.
  35. ^ Will probably do so soon.
  36. ^ a b This is in svn trunk. http://www.mail-archive.com/dashboard-hackers@gnome.org/msg04465.html
  37. ^ This is in svn trunk. http://www.mail-archive.com/tracker-list@gnome.org/msg03537.html
  38. ^ From the developer: You have to actively add it and index it. But it's quite cumbersome. Might as well make this a no too, until it is user friendly.
  39. ^ i.e. it's possible to distinguish different DVDs from each other through their unique volume number and an optional user-provided string (number or name).
  40. ^ Hasn't been fully implemented, since the author has not found it necessary.
  41. ^ But on the TODO list for a while.
  42. ^ ISO9660 images.
  43. ^ SQLite is used for historical data only.
  44. ^ Xapian is the index back-end.
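
Notes 15, 16 and 22 describe text normalization applied before indexing: removing line breaks, joining words hyphenated across a line break, and dropping non-alphanumeric characters. The following Python sketch illustrates these three steps under simple assumptions. It is not the code of Beagle, Recoll, Pinot or their input filters, and the function names are hypothetical.

```python
import re


def dehyphenate(text):
    """Join a word split by a hyphen at a line break: 'wa-\\nter' -> 'water'."""
    return re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)


def remove_line_breaks(text):
    """Replace each line break (and surrounding whitespace) with one space."""
    return re.sub(r"\s*\r?\n\s*", " ", text).strip()


def prepare_for_indexing(text):
    """De-hyphenate first, then flatten the remaining line breaks."""
    return remove_line_breaks(dehyphenate(text))


def tokenize(text):
    """Split into lowercase word tokens, dropping non-alphanumeric characters."""
    return re.findall(r"\w+", text.lower())


print(prepare_for_indexing("a glass of wa-\nter on the\ntable"))
print(tokenize("a+b") == tokenize("a-b"))
```

With a tokenizer like this, "a+b" and "a-b" produce the same tokens, which is the kind of behavior note 22 describes. Note that this naive de-hyphenation also joins genuine compounds that happen to break at the hyphen (for example "well-" followed by "known"), which may be one reason the table lists support as partial and filter-dependent.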

Categories

Category:Desktop search engines

Category:Software comparisons

Disclaimer

This article is not present on Wikipedia because their bureaucrats classified it as not worthy of inclusion, citing the lack of third-party sources.

TODOs

  • Add information about supported application data and file types.
  • Add Google Desktop and others to the comparison list.