This is a very simple library for extracting the full text from PDF and Microsoft Word 97/2000/XP documents.

There are two classes, PDFExtractor and WordExtractor. Each class has a single method, extractText, which takes an InputStream as its argument and returns the extracted text as a String.

//Example:

import java.io.FileInputStream;

FileInputStream in = new FileInputStream("test.doc");
WordExtractor extractor = new WordExtractor();

String str = extractor.extractText(in); // pass the stream to the extractor
in.close();
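
Because both classes share the same method shape (an InputStream in, a String out), a caller could choose between them by file extension. The following is only a minimal sketch of that idea, not part of this library: the TextExtractor interface, the forFileName helper and the ExtractorSketch class are hypothetical names introduced for illustration, and the "throws Exception" clause is an assumption, since the exceptions declared by extractText are not documented here.

```java
import java.io.InputStream;
import java.util.Locale;

public class ExtractorSketch {

    /** Hypothetical common shape of both extractors: one method, InputStream in, String out. */
    interface TextExtractor {
        // "throws Exception" is an assumption; the real methods may declare something narrower.
        String extractText(InputStream in) throws Exception;
    }

    /**
     * Returns the extractor matching the file extension.
     * The extractors are passed in so that this sketch compiles without the library.
     */
    static TextExtractor forFileName(String name, TextExtractor pdf, TextExtractor word) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            return pdf;
        }
        if (lower.endsWith(".doc")) {
            return word;
        }
        throw new IllegalArgumentException("Unsupported file type: " + name);
    }
}
```

With the library on the classpath, the two arguments could be supplied as lambdas that wrap the real classes, for example in -> new PDFExtractor().extractText(in) and in -> new WordExtractor().extractText(in).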


This product includes software developed by the Apache Software Foundation (http://www.apache.org/).