SILVERCODERS DocToText is a powerful utility that can convert documents in many formats to plain text.
The package, available for free under the open-source GPL license, includes a console application and a C/C++ library that allows embedding the text extraction mechanism into other applications.
The utility supports MS Office binary formats: MS Word (DOC), MS Excel (XLS, XLSB), MS PowerPoint (PPT);
Rich Text Format (RTF);
OpenDocument (also known as ODF and ISO/IEC 26300, full name: OASIS Open Document Format for Office Applications): text documents (ODT), spreadsheets (ODS), presentations (ODP), graphics (ODG);
Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML or MSOOXML) documents: MS Word (DOCX), MS Excel (XLSX), MS PowerPoint (PPTX);
iWork formats (PAGES, NUMBERS, KEYNOTE);
OpenDocument Flat XML formats (FODP, FODS, FODT);
Portable Document Format (PDF);
email files (EML);
and HyperText Markup Language (HTML).
Extracting plain text from doc, xls, xlsb, ppt, rtf, odt, ods, odp, odg, docx, xlsx, pptx, pages, numbers, keynote, fodp, fods, fodt, pdf, eml and html files is useful for many tasks, such as searching, indexing or archiving. DocToText can also be used as a fast console viewer.
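An indexing or archiving job usually has to decide first which files are worth handing to DocToText at all. The following is a minimal C++ sketch of that step, assuming the file extension is a good enough signal. The function name docToTextFormatFamily and the family labels are hypothetical, the extensions are taken verbatim from the list above (including "keynote"), and the code does not call the DocToText library or console application.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper (not part of the DocToText API): returns the format
// family that covers a file's extension, based on the supported formats
// listed above, or std::nullopt if DocToText is not expected to handle it.
std::optional<std::string> docToTextFormatFamily(const std::string& path)
{
    static const std::map<std::string, std::string> families = {
        {"doc", "MS Office binary"},  {"xls", "MS Office binary"},
        {"xlsb", "MS Office binary"}, {"ppt", "MS Office binary"},
        {"rtf", "Rich Text Format"},
        {"odt", "OpenDocument"},      {"ods", "OpenDocument"},
        {"odp", "OpenDocument"},      {"odg", "OpenDocument"},
        {"docx", "Office Open XML"},  {"xlsx", "Office Open XML"},
        {"pptx", "Office Open XML"},
        {"pages", "iWork"},           {"numbers", "iWork"},
        {"keynote", "iWork"},
        {"fodt", "OpenDocument Flat XML"},
        {"fods", "OpenDocument Flat XML"},
        {"fodp", "OpenDocument Flat XML"},
        {"pdf", "PDF"},               {"eml", "Email"},
        {"html", "HTML"},
    };

    // The extension is the text after the last '.', but only if that dot
    // belongs to the file name itself rather than to a directory name.
    const std::size_t dot = path.rfind('.');
    const std::size_t sep = path.find_last_of("/\\");
    if (dot == std::string::npos || dot + 1 == path.size() ||
        (sep != std::string::npos && dot < sep))
        return std::nullopt;

    // Compare case-insensitively, so "REPORT.DOCX" matches "docx".
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const auto it = families.find(ext);
    if (it == families.end())
        return std::nullopt;
    return it->second;
}
```

For example, a crawler could skip every file for which the function returns std::nullopt and pass the remaining ones to DocToText for extraction. Keying on the extension keeps the check cheap; a more robust tool might also inspect file signatures, since extensions can be missing or wrong.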