Constructor and Description |
---|
PdfExtractor(): Initializes a new PdfExtractor object. |
PdfExtractor(IDocument document): Initializes a new PdfExtractor object based on the document. |

Modifier and Type | Method and Description |
---|---|
void | bindPdf(InputStream inputStream): Binds a PDF document from a stream. |
void | bindPdf(String inputFile): Binds the input PDF file. |
void | extractAttachment() |
void | extractAttachment(String attachmentFileName): Extracts an attachment from the PDF file by attachment name. |
void | extractImage(): Extracts images from the PDF file. |
void | extractText(): Extracts text from a PDF document. |
void | extractText(com.aspose.ms.System.Text.Encoding encoding): Extracts text from a PDF document using the specified encoding. |
ByteArrayOutputStream[] | getAttachment(): Saves all attachment files to streams. |
void | getAttachment(String outputPath): Stores the attachment in a file. |
com.aspose.ms.System.Collections.ArrayList | getAttachmentInfo(): Gets the list of attachments. |
List | getAttachNames(): Returns the list of attachments in the PDF file. |
int | getEndPage(): Gets or sets the end page of the page range in which the extraction is performed. |
int | getExtractImageMode(): Gets or sets the mode of the image extraction process. |
int | getExtractTextMode(): Gets or sets the mode of the text extraction result. |
boolean | getNextImage(OutputStream outputStream): Retrieves the next image from the PDF file and stores it in a stream. |
boolean | getNextImage(OutputStream outputStream, com.aspose.ms.System.Drawing.Imaging.ImageFormat format): Retrieves the next image from the PDF file and stores it in a stream with the given image format. |
boolean | getNextImage(String outputFile): Retrieves the next image from the PDF document. |
boolean | getNextImage(String outputFile, com.aspose.ms.System.Drawing.Imaging.ImageFormat format): Retrieves the next image from the PDF document with the given image format. |
void | getNextPageText(OutputStream outputStream): Saves one page's text to a stream. |
void | getNextPageText(String outputFile): Saves one page's text to a file. |
String | getPassword(): Gets the input file's password. |
int | getResolution(): Gets the resolution for extracted images. |
int | getStartPage(): Gets or sets the start page of the page range in which the extraction is performed. |
void | getText(OutputStream outputStream): Saves text to a stream. See also: extractText(). |
void | getText(OutputStream outputStream, boolean filterNotAscii): Saves text to a stream. See also: extractText(). |
void | getText(String outputFile): Saves text to a file. See also: extractText(). |
void | getTextInternal(com.aspose.ms.System.IO.Stream outputStream) |
TextSearchOptions | getTextSearchOptions(): Gets or sets text search options. |
boolean | hasNextImage(): Checks whether more images are accessible in the PDF document. |
boolean | hasNextPageText(): Indicates whether more page text is available. |
boolean | isBidi(): Returns true when the text contains Hebrew or Arabic characters. |
void | setEndPage(int value) |
void | setExtractImageMode(int value) |
void | setExtractTextMode(int value) |
void | setPassword(String value): Sets the input file's password. |
void | setResolution(int value): Sets the resolution for extracted images. |
void | setStartPage(int value) |
void | setTextSearchOptions(TextSearchOptions value) |

public PdfExtractor()
Initializes a new PdfExtractor object.

public PdfExtractor(IDocument document)
Initializes a new PdfExtractor object based on the document.
document - The PDF document.

public int getStartPage()
Gets or sets the start page of the page range in which the extraction is performed.

    PdfExtractor ext = new PdfExtractor();
    ext.bindPdf("sample.pdf");
    ext.setStartPage(2);
    ext.setEndPage(5);
    ext.extractText();

public void setStartPage(int value)
public int getEndPage()
Gets or sets the end page of the page range in which the extraction is performed.

    PdfExtractor ext = new PdfExtractor();
    ext.bindPdf("sample.pdf");
    ext.setStartPage(2);
    ext.setEndPage(3);
    ext.extractText();

public void setEndPage(int value)
public int getExtractTextMode()
Gets or sets the mode of the text extraction result.
Value: 0 is pure text mode and 1 is raw ordering mode. Default is 0.
The example demonstrates the use of the extract text mode in a text extraction scenario.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf("D:\\Text\\text.pdf");
    // 1 selects raw ordering mode
    extractor.setExtractTextMode(1);
    extractor.extractText();
    extractor.getText("D:\\Text\\text.txt");

public void setExtractTextMode(int value)
public TextSearchOptions getTextSearchOptions()
Gets or sets text search options.
public void setTextSearchOptions(TextSearchOptions value)
public int getExtractImageMode()
Gets or sets the mode of the image extraction process.
public void setExtractImageMode(int value)
public boolean isBidi()
Returns true when the text contains Hebrew or Arabic characters. This case needs special handling because string functions change their behavior and process such text from right to left (except for numbers and other non-text characters).

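To make the right-to-left case concrete, the sketch below detects Hebrew or Arabic characters with the JDK's own java.text.Bidi.requiresBidi check. It does not call PdfExtractor, and whether isBidi() applies the same rule internally is an assumption. The class BidiCheck and the method needsBidi are hypothetical names used only for illustration.

```java
import java.text.Bidi;

final class BidiCheck {
    /**
     * Returns true if the text contains characters that need
     * right-to-left handling (for example Hebrew or Arabic letters).
     * Uses the JDK check, not PdfExtractor.isBidi().
     */
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        // Plain Latin text and digits
        System.out.println(needsBidi("Invoice 123"));
        // Hebrew letters written as Unicode escapes
        System.out.println(needsBidi("\u05E9\u05DC\u05D5\u05DD"));
        // Arabic letters mixed with Latin text and digits
        System.out.println(needsBidi("total: \u0645\u0631\u062D\u0628\u0627 42"));
    }
}
```

A check like this can be run on extracted text before applying operations such as substring or reversal that assume left-to-right order.
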
public void extractText()
Extracts text from a PDF document.
The first example demonstrates how to extract all the text from a PDF file.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf("D:\\Text\\text.pdf");
    extractor.extractText();
    extractor.getText("D:\\Text\\text.txt");

The second example demonstrates how to extract each page's text into its own text file.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf(TestPath + "Aspose.Pdf.Kit.Pdf");
    extractor.extractText();
    String prefix = TestPath + "Aspose.Pdf.Kit";
    String suffix = ".txt";
    int pageCount = 1;
    // Write one numbered text file per page
    while (extractor.hasNextPageText()) {
        extractor.getNextPageText(prefix + pageCount + suffix);
        pageCount++;
    }

public void extractText(com.aspose.ms.System.Text.Encoding encoding)
Extracts text from a PDF document using the specified encoding.
The first example demonstrates how to extract all the text from a PDF file.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf("D:\\Text\\text.pdf");
    extractor.extractText(Encoding.Unicode);
    extractor.getText("D:\\Text\\text.txt");

The second example demonstrates how to extract each page's text into its own text file.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf(TestPath + "Aspose.Pdf.Kit.Pdf");
    extractor.extractText(Encoding.Unicode);
    String prefix = TestPath + "Aspose.Pdf.Kit";
    String suffix = ".txt";
    int pageCount = 1;
    // Write one numbered text file per page
    while (extractor.hasNextPageText()) {
        extractor.getNextPageText(prefix + pageCount + suffix);
        pageCount++;
    }

encoding - The encoding of the extracted text.

public void getText(String outputFile)
Saves text to a file. See also: extractText().
outputFile - The file path and name to save the text to.

public void getText(OutputStream outputStream)
Saves text to a stream. See also: extractText().
outputStream - The stream to save the text to.

public void getTextInternal(com.aspose.ms.System.IO.Stream outputStream)

public void bindPdf(String inputFile)
Binds the input PDF file.

    PdfExtractor ext = new PdfExtractor();
    ext.bindPdf("sample.pdf");

public void bindPdf(InputStream inputStream)
Binds a PDF document from a stream.

    PdfExtractor ext = new PdfExtractor();
    InputStream stream = new FileInputStream("sample.pdf");
    ext.bindPdf(stream);

public void extractImage()
Extracts images from the PDF file.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf("sample.pdf");
    extractor.extractImage();
    int i = 1;
    while (extractor.hasNextImage()) {
        extractor.getNextImage("image-" + i + ".pdf");
        // Advance the counter so each image gets its own file
        i++;
    }

public boolean hasNextImage()
Checks whether more images are accessible in the PDF document. Note: extractImage() must be called before using this method.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf("sample.pdf");
    extractor.extractImage();
    int i = 1;
    while (extractor.hasNextImage()) {
        extractor.getNextImage("image-" + i + ".pdf");
        // Advance the counter so each image gets its own file
        i++;
    }

public boolean getNextImage(String outputFile)
Retrieves the next image from the PDF document. Note: extractImage() must be called before using this method.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf("sample.pdf");
    extractor.extractImage();
    int i = 1;
    while (extractor.hasNextImage()) {
        extractor.getNextImage("image-" + i + ".pdf");
        // Advance the counter so each image gets its own file
        i++;
    }

outputFile - The file where the image will be stored.

public boolean getNextImage(String outputFile, com.aspose.ms.System.Drawing.Imaging.ImageFormat format)
Retrieves the next image from the PDF document with the given image format. Note: extractImage() must be called before using this method.
outputFile - The file where the image will be stored.

public boolean getNextImage(OutputStream outputStream, com.aspose.ms.System.Drawing.Imaging.ImageFormat format)
Retrieves the next image from the PDF file and stores it in a stream with the given image format.
outputStream - The stream where the image data will be saved.
format - The format of the image.

public boolean getNextImage(OutputStream outputStream)
Retrieves the next image from the PDF file and stores it in a stream.
outputStream - The stream where the image data will be saved.

public List getAttachNames()
Returns the list of attachments in the PDF file. Note: extractAttachment() must be called before using this method.
The example demonstrates how to extract attachment names from a PDF file.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf("sample.pdf");
    extractor.extractAttachment();
    List attachments = extractor.getAttachNames();
    // The list is untyped, so iterate over Object
    for (Object name : attachments) {
        System.out.println(name);
    }

public void extractAttachment()
public void extractAttachment(String attachmentFileName)
Extracts an attachment from the PDF file by attachment name.
attachmentFileName - The name of the attachment to extract.

public void getAttachment(String outputPath)
Stores the attachment in a file.
outputPath - The directory path where the attachment(s) will be stored. A null or empty string means the attachment(s) will be placed in the application directory.

public boolean hasNextPageText()
Indicates whether more page text is available.
The example demonstrates the use of hasNextPageText in a text extraction scenario.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf(TestPath + "Aspose.Pdf.Kit.Pdf");
    extractor.extractText(Encoding.Unicode);
    String prefix = TestPath + "Aspose.Pdf.Kit";
    String suffix = ".txt";
    int pageCount = 1;
    while (extractor.hasNextPageText()) {
        extractor.getNextPageText(prefix + pageCount + suffix);
        pageCount++;
    }

public void getNextPageText(String outputFile)
Saves one page's text to a file.
The example demonstrates the use of getNextPageText in a text extraction scenario.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf(TestPath + "Aspose.Pdf.Kit.Pdf");
    extractor.extractText(Encoding.Unicode);
    String prefix = TestPath + "Aspose.Pdf.Kit";
    String suffix = ".txt";
    int pageCount = 1;
    while (extractor.hasNextPageText()) {
        extractor.getNextPageText(prefix + pageCount + suffix);
        pageCount++;
    }

outputFile - The file path and name to save the text to.

public void getNextPageText(OutputStream outputStream)
Saves one page's text to a stream.
The example demonstrates the use of getNextPageText in a text extraction scenario.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf(TestPath + "Aspose.Pdf.Kit.Pdf");
    extractor.extractText(Encoding.Unicode);
    String prefix = TestPath + "Aspose.Pdf.Kit";
    String suffix = ".txt";
    int pageCount = 1;
    while (extractor.hasNextPageText()) {
        // Open one output stream per page and close it when the page is written
        try (OutputStream fs = new FileOutputStream(prefix + pageCount + suffix)) {
            extractor.getNextPageText(fs);
        }
        pageCount++;
    }

outputStream - The stream to save the text to.

public void getText(OutputStream outputStream, boolean filterNotAscii)
Saves text to a stream. See also: extractText().
outputStream - The stream to save the text to.
filterNotAscii - If this parameter is true, all non-ASCII symbols will be removed.

public ByteArrayOutputStream[] getAttachment()
Saves all attachment files to streams.

    PdfExtractor extractor = new PdfExtractor();
    extractor.bindPdf(path + "Attach.pdf");
    extractor.extractAttachment();
    List names = extractor.getAttachNames();
    ByteArrayOutputStream[] tempStreams = extractor.getAttachment();
    for (int i = 0; i < tempStreams.length; i++) {
        String name = (String) names.get(i);
        // Write each attachment's bytes to a file named after the attachment
        try (OutputStream fs = new FileOutputStream(path + name)) {
            fs.write(tempStreams[i].toByteArray());
        }
    }

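The save loop in the example can also be written with java.nio.file. The following is a minimal sketch that assumes the streams and names are already in hand; the stand-in data in main takes the place of the values that getAttachment() and getAttachNames() would supply. AttachmentSaver and saveAll are hypothetical names, not part of the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class AttachmentSaver {
    /**
     * Writes streams[i] to a file named names.get(i) inside dir and
     * returns the number of files written. Hypothetical helper.
     */
    static int saveAll(ByteArrayOutputStream[] streams, List<String> names, Path dir)
            throws IOException {
        if (streams.length != names.size()) {
            throw new IllegalArgumentException("streams and names differ in length");
        }
        // Create the target directory if it does not exist yet
        Files.createDirectories(dir);
        for (int i = 0; i < streams.length; i++) {
            // try-with-resources closes the file even if the write fails
            try (OutputStream out = Files.newOutputStream(dir.resolve(names.get(i)))) {
                streams[i].writeTo(out);
            }
        }
        return streams.length;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for one attachment stream and its name
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        data.write("hello".getBytes(StandardCharsets.UTF_8));
        Path dir = Files.createTempDirectory("attachments");
        int saved = saveAll(new ByteArrayOutputStream[] {data}, Arrays.asList("note.txt"), dir);
        System.out.println(saved + " file(s) written to " + dir);
    }
}
```

ByteArrayOutputStream.writeTo copies the buffered bytes straight to the target stream, which avoids the extra array copy made by toByteArray().
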
public com.aspose.ms.System.Collections.ArrayList getAttachmentInfo()
Gets the list of attachments.
public int getResolution()
Gets the resolution for extracted images. The default value is 150. Images extracted at a higher resolution are clearer. However, increasing the resolution also increases the time and memory needed to extract images. A resolution of 150 or 300 is usually enough to get a clear image.
public void setResolution(int value)
Sets the resolution for extracted images. The default value is 150. Images extracted at a higher resolution are clearer. However, increasing the resolution also increases the time and memory needed to extract images. A resolution of 150 or 300 is usually enough to get a clear image.

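The memory cost of a higher resolution can be estimated with simple arithmetic. The sketch below assumes the resolution is expressed in dots per inch, that the image covers a hypothetical 8.5 x 11 inch area, and that the uncompressed raster uses 4 bytes per pixel. None of these figures come from the library, and ResolutionCost is a hypothetical name.

```java
final class ResolutionCost {
    /** Pixels along one side: physical length in inches times dots per inch. */
    static long pixels(double inches, int dpi) {
        return Math.round(inches * dpi);
    }

    /** Uncompressed raster size in bytes, assuming 4 bytes per pixel. */
    static long rasterBytes(double widthInches, double heightInches, int dpi) {
        return pixels(widthInches, dpi) * pixels(heightInches, dpi) * 4L;
    }

    public static void main(String[] args) {
        // Compare the two resolutions mentioned in the description
        for (int dpi : new int[] {150, 300}) {
            System.out.printf("%d dpi: %d x %d px, %d bytes%n",
                    dpi, pixels(8.5, dpi), pixels(11.0, dpi), rasterBytes(8.5, 11.0, dpi));
        }
    }
}
```

Under these assumptions, doubling the resolution from 150 to 300 doubles both pixel dimensions, so the uncompressed raster needs four times as much memory.
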
public String getPassword()
Gets input file's password.
public void setPassword(String value)
Sets input file's password.
Copyright © 2017 Aspose. All Rights Reserved.