How to Get and Search Text (Java)

Text Page

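Foxit PDF SDK provides APIs to extract, select, search, and retrieve text in PDF documents. The text content of a PDF is stored in TextPage objects, each tied to a specific page. The TextPage class gives access to information about the text on a PDF page, such as a single character, a single word, or the text within a given character range or rectangle. It is also used to construct objects of other text-related classes, which perform further operations on the text content or read specific information from it:

To search for text in the text content of a PDF page, use the TextPage object to construct a TextSearch object.
To access text that behaves like a hypertext link, use the TextPage object to construct a PageTextLinks object.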

Example:

How to extract text from a PDF page

import com.foxit.sdk.pdf.PDFDoc;
import com.foxit.sdk.pdf.PDFPage;
import com.foxit.sdk.pdf.TextPage;
...
// Assuming PDFPage page has been loaded and parsed.

// Get the text page object, parsed with the normal (default) flag.
TextPage textpage = new TextPage(page, TextPage.e_ParseTextNormal);
// Read all characters on the page, starting from character index 0.
int nCharCount = textpage.getCharCount();
String texts = textpage.getChars(0, nCharCount);
...

How to get the text within a rectangular area of a PDF document

import com.foxit.sdk.pdf.PDFDoc;
import com.foxit.sdk.pdf.PDFPage;
import com.foxit.sdk.pdf.TextPage;
import com.foxit.sdk.common.fxcrt.RectF;
...
// Assuming PDFPage page has been loaded and parsed.
...

TextPage textpage = new TextPage(page, TextPage.e_ParseTextNormal);
// Selection rectangle in PDF page coordinates.
RectF selRc = new RectF(100, 100, 250, 250);
// Get the text that falls inside the rectangle.
String selText = textpage.getTextInRect(selRc);
...
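To make the idea of "text inside a rectangle" concrete, here is a minimal stand-alone sketch that does not use the SDK. It assumes each character carries a bounding box in page coordinates and that a character counts as selected when the centre of its box lies inside the selection rectangle. The class RectTextSketch, the CharBox type, and the centre-point rule are all hypothetical; how getTextInRect actually decides which characters belong to the rectangle is not stated here.

```java
import java.util.Arrays;
import java.util.List;

class RectTextSketch {
    /** Hypothetical character record: a glyph and its bounding box in page coordinates. */
    static final class CharBox {
        final char ch;
        final float left, bottom, right, top;

        CharBox(char ch, float left, float bottom, float right, float top) {
            this.ch = ch;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }
    }

    /**
     * Returns, in list order, the characters whose box centre lies inside
     * the rectangle (left, bottom, right, top). Assumed rule, not the SDK's.
     */
    static String textInRect(List<CharBox> chars,
                             float left, float bottom, float right, float top) {
        StringBuilder sb = new StringBuilder();
        for (CharBox c : chars) {
            float cx = (c.left + c.right) / 2;
            float cy = (c.bottom + c.top) / 2;
            boolean inside = cx >= left && cx <= right && cy >= bottom && cy <= top;
            if (inside) {
                sb.append(c.ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<CharBox> chars = Arrays.asList(
                new CharBox('A', 90, 100, 98, 112),    // centre left of the rectangle
                new CharBox('B', 100, 100, 110, 112),  // centre inside
                new CharBox('C', 240, 100, 250, 112)); // centre inside
        System.out.println(textInRect(chars, 100, 100, 250, 250));
    }
}
```

A centre-point test is only one possible rule; an overlap test or a full-containment test would select a slightly different set of characters at the rectangle's edges.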

Text Search

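Foxit PDF SDK provides APIs to search for text in PDF documents, XFA documents, text pages, or PDF annotations. It offers functions for running a text search and for retrieving the search results:

To specify the search pattern and options, use TextSearch.setPattern, TextSearch.setStartPage (only meaningful when searching text in a PDF document), TextSearch.setEndPage (only meaningful when searching text in a PDF document), and TextSearch.setSearchFlags.
To run the search, use TextSearch.findNext and TextSearch.findPrev.
To get the search results, use the TextSearch.getMatchXXX() functions.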

Example:

How to search for specified text in a PDF document

import com.foxit.sdk.common.fxcrt.RectFArray;
import com.foxit.sdk.pdf.PDFDoc;
import com.foxit.sdk.pdf.TextSearch;
...
// Assuming PDFDoc doc has been loaded.
TextSearch search = new TextSearch(doc, null);

// Search every page of the document.
int start_index = 0, end_index = doc.getPageCount() - 1;
search.setStartPage(start_index);
search.setEndPage(end_index);

String pattern = "Foxit";
search.setPattern(pattern);

int flags = TextSearch.e_SearchNormal;
// To specify additional flags, combine them like this:
// flags |= TextSearch.e_SearchMatchCase;
// flags |= TextSearch.e_SearchMatchWholeWord;
// flags |= TextSearch.e_SearchConsecutive;
search.setSearchFlags(flags);

// Step through the matches and count them.
int match_count = 0;
while (search.findNext()) {
    RectFArray rect_array = search.getMatchRects();
    match_count++;
}
...
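The two ideas in the example above, combining option flags with bitwise OR and stepping through matches with a findNext-style loop, can be sketched in plain Java without the SDK. Everything in this sketch is hypothetical: the class SearchFlagsSketch, its constant names, and their numeric values are illustrative only and are not the values of the real TextSearch constants; the whole-word rule (no letter or digit on either side of the match) is likewise an assumption.

```java
class SearchFlagsSketch {
    // Illustrative flag values; the real TextSearch constants may differ.
    static final int SEARCH_NORMAL = 0x00;
    static final int SEARCH_MATCH_CASE = 0x01;
    static final int SEARCH_MATCH_WHOLE_WORD = 0x02;

    private final String text;
    private final String pattern;
    private final int flags;
    private int cursor = 0;      // index where the next search starts
    private int matchStart = -1; // start index of the most recent match

    SearchFlagsSketch(String text, String pattern, int flags) {
        this.text = text;
        this.pattern = pattern;
        this.flags = flags;
    }

    /** Advances to the next match; returns false when there are no more. */
    boolean findNext() {
        if (pattern.isEmpty()) {
            return false; // an empty pattern never matches in this sketch
        }
        boolean ignoreCase = (flags & SEARCH_MATCH_CASE) == 0;
        boolean wholeWord = (flags & SEARCH_MATCH_WHOLE_WORD) != 0;
        int len = pattern.length();
        for (int i = cursor; i + len <= text.length(); i++) {
            if (!text.regionMatches(ignoreCase, i, pattern, 0, len)) {
                continue;
            }
            int end = i + len;
            boolean leftOk = i == 0 || !Character.isLetterOrDigit(text.charAt(i - 1));
            boolean rightOk = end == text.length() || !Character.isLetterOrDigit(text.charAt(end));
            if (!wholeWord || (leftOk && rightOk)) {
                matchStart = i;
                cursor = end; // continue after this match next time
                return true;
            }
        }
        return false;
    }

    int getMatchStart() {
        return matchStart;
    }

    /** Counts matches with the same loop shape as the SDK example. */
    static int countMatches(String text, String pattern, int flags) {
        SearchFlagsSketch search = new SearchFlagsSketch(text, pattern, flags);
        int count = 0;
        while (search.findNext()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int flags = SEARCH_NORMAL;
        flags |= SEARCH_MATCH_CASE;
        flags |= SEARCH_MATCH_WHOLE_WORD;
        System.out.println(countMatches("Foxit foxit FoxitSDK Foxit.", "Foxit", flags));
    }
}
```

Because each option occupies its own bit, any combination can be passed as a single int, and the search code tests each option independently with a bitwise AND.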

Text Link

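In a PDF page, hyperlink text that points to a website, a network resource, or an email address is stored the same way as ordinary text. Before processing a text link, first call PageTextLinks.getTextLink to get a TextLink object.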

Example:

How to retrieve hyperlinks in a PDF page

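import com.foxit.sdk.pdf.PDFPage;
import com.foxit.sdk.pdf.PageTextLinks;
import com.foxit.sdk.pdf.TextLink;
import com.foxit.sdk.pdf.TextPage;
...

// Assuming PDFPage page has been loaded and parsed.
...

// Build the text page, then the collection of text links found on it.
TextPage text_page = new TextPage(page, TextPage.e_ParseTextNormal);
PageTextLinks page_textlinks = new PageTextLinks(text_page);
// Specify an index; it must be less than page_textlinks.getTextLinkCount().
TextLink text_link = page_textlinks.getTextLink(index);
String str_uri = text_link.getURI();
...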
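Since a text link is ordinary page text that merely looks like a URL or an email address, the underlying idea can be sketched without the SDK as pattern matching over a plain string. The class TextLinkSketch, its Link type, and the regular expression below are hypothetical and deliberately simple; they are not the rules PageTextLinks applies, which are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextLinkSketch {
    /** Hypothetical result type: character range [start, end) plus the resolved URI. */
    static final class Link {
        final int start;
        final int end;
        final String uri;

        Link(int start, int end, String uri) {
            this.start = start;
            this.end = end;
            this.uri = uri;
        }
    }

    // Group 1: http/https URLs. Group 2: email addresses. Simplified on purpose.
    private static final Pattern LINK = Pattern.compile(
            "(https?://[^\\s<>\"]+)|([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,})");

    /** Scans page text and returns every URL-like or email-like run it finds. */
    static List<Link> findLinks(String pageText) {
        List<Link> links = new ArrayList<>();
        Matcher m = LINK.matcher(pageText);
        while (m.find()) {
            String hit = m.group();
            // Email addresses become mailto: URIs; URLs are kept as written.
            String uri = m.group(2) != null ? "mailto:" + hit : hit;
            links.add(new Link(m.start(), m.end(), uri));
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "Visit https://www.foxit.com or mail support@example.com today";
        for (Link link : findLinks(text)) {
            System.out.println(link.start + "-" + link.end + " " + link.uri);
        }
    }
}
```

Keeping the character range next to the URI mirrors why a link object is useful: the caller can both open the target and locate the linked text on the page.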

Updated on April 22, 2020
