Channel: Java Programming Forum - Learn Java Programming - Apache POI

Help me! Convert Word document to PDF

I'm working with Apache POI on a project that converts a Word document to PDF. So far I have used Apache POI's org.apache.poi.hwpf.extractor package to get the text from the Word document:
Code:

import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class WordToPdf {
    public static void main(String[] args) throws Exception {
        // Path to the source .doc file (taken from the command line here)
        String filename = args[0];
        POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(filename));
        HWPFDocument doc = new HWPFDocument(fs);
        WordExtractor we = new WordExtractor(doc);

        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream("/Program Files/NCMSCT/HopDong.pdf"));
        document.open();

        String[] paragraphs = we.getParagraphText();
        for (int i = 0; i < paragraphs.length; i++) {
            // Strip the line break at the end of each extracted paragraph
            paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
            System.out.println("Paragraph " + i + ": " + paragraphs[i]);
            System.out.println("Length: " + paragraphs[i].length());
            // Only the plain text reaches the PDF; formatting, tables and images are lost
            document.add(new Paragraph(paragraphs[i]));
        }
        document.close();
    }
}

But this way I can't get the other objects (hyperlinks, tables, images) or the formatting of the Word document. I also tried other libraries, jdoctopdf-0.9-beta.jar and tika-parsers-0.9-jdk14.jar, but they don't keep all of the formatting from the Word document either. If anyone knows a way, please help me and reply soon. Thanks, all!
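
For anyone reading the loop above, here is a small stand-alone sketch of what the replaceAll call does to each paragraph. It uses only the JDK, so it can be run without POI or iText. The class name ParagraphCleanup, the helper stripLineBreak, and the sample strings are made up for illustration and are not part of my project.

```java
class ParagraphCleanup {
    // Same pattern as in the post: an optional Ctrl-M (\cM is carriage return),
    // an optional literal \r, then a required \n.
    static String stripLineBreak(String paragraph) {
        return paragraph.replaceAll("\\cM?\r?\n", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample paragraphs, not taken from a real document
        String[] samples = { "First paragraph\r\n", "Second paragraph\n", "No break" };
        for (String s : samples) {
            String cleaned = stripLineBreak(s);
            System.out.println("[" + cleaned + "] length=" + cleaned.length());
        }
    }
}
```

Because the pattern requires a \n, a paragraph that ends in a bare \r would be left unchanged by this call.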
