Enterprise-Grade PDF Reader Architecture Design

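Overview

EmbedPDF is an enterprise-grade PDF reader built on WebAssembly. It uses a plugin-based architecture and integrates with multiple front-end frameworks. This article walks through its technical architecture: the PDFium engine, the plugin system, and the framework adapters.

Project Architecture

Monorepo Structure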

embedpdf/
├── packages/                  # 40+ core packages
│   ├── core/                  # Core plugin system
│   ├── engines/               # Rendering engines
│   ├── pdfium/                # PDFium WASM wrapper
│   ├── models/                # Shared type definitions
│   ├── plugin-render/         # Rendering plugin
│   ├── plugin-annotation/     # Annotation plugin
│   ├── plugin-search/         # Search plugin
│   ├── plugin-zoom/           # Zoom plugin
│   ├── plugin-scroll/         # Scroll plugin
│   └── ...                    # Other plugins
├── examples/                  # Framework examples
│   ├── react/                 # React example
│   ├── vue/                   # Vue example
│   ├── svelte/                # Svelte example
│   └── vanilla/               # Vanilla JS example
├── viewers/                   # Prebuilt viewers
│   ├── snippet/               # Code-snippet viewer
│   ├── react/                 # React viewer
│   └── vue/                   # Vue viewer
└── website/                   # Documentation site

Architecture Layers

[Figure: PDF reader architecture]

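The PDFium Engine

Why PDFium

PDFium is the PDF rendering engine used in Google Chrome. It offers the following advantages:

  • Feature coverage: supports the PDF specification in full
  • Performance: implemented in C++, so it runs at near-native speed
  • WebAssembly: can be compiled to WASM and run in the browser
  • Free and open source: BSD license

Engine Architecture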

// packages/engines/src/lib/pdfium/engine.ts

export interface PdfEngine<T = Blob> {
  // Document operations
  openDocumentUrl(file: PdfFileUrl, options?): PdfTask<PdfDocumentObject>
  openDocumentBuffer(file: PdfFile, options?): PdfTask<PdfDocumentObject>
  getMetadata(doc: PdfDocumentObject): PdfTask<PdfMetadataObject>
  closeDocument(doc: PdfDocumentObject): PdfTask<boolean>
  
  // Page rendering
  renderPage(doc, page, options?): PdfTask<T>
  renderPageRaw(doc, page, options?): PdfTask<ImageDataLike>
  renderThumbnail(doc, page, options?): PdfTask<T>
  
  // Text and search
  search(doc, page, keyword, options?): PdfTask<SearchResult[]>
  extractTextRects(doc, page): PdfTask<PdfTextRectObject[]>
  extractText(doc, page): PdfTask<string>
  
  // Annotations
  getAnnotations(doc, page): PdfTask<PdfAnnotationObject[]>
  addAnnotation(doc, page, annotation): PdfTask<boolean>
  updateAnnotation(doc, page, annotation): PdfTask<boolean>
  deleteAnnotation(doc, page, id): PdfTask<boolean>
  
  // Forms
  getFormFields(doc, page): PdfTask<PdfWidgetAnnoObject[]>
  setFormFieldValue(doc, field, value): PdfTask<boolean>
  
  // Advanced features
  redactText(doc, page, rects, options?): PdfTask<boolean>
  flattenPage(doc, page, flags?): PdfTask<PdfPageFlattenResult>
  exportDocument(doc, options?): PdfTask<Uint8Array>
  printDocument(doc, options?): PdfTask<boolean>
}
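
Every method on the interface returns a PdfTask rather than a bare Promise, but the listing does not show how PdfTask is defined. The following is only a minimal sketch of what such a task abstraction might look like, assuming the goal is a result that settles once and can be aborted while work is still in flight. SketchTask and all of its members are hypothetical names, not EmbedPDF's actual API.

```typescript
// Hypothetical stand-in for PdfTask: a settle-once result holder with abort support.
type SketchTaskState<R> =
  | { stage: 'pending' }
  | { stage: 'resolved'; result: R }
  | { stage: 'rejected'; reason: Error }
  | { stage: 'aborted'; reason: string }

class SketchTask<R> {
  private state: SketchTaskState<R> = { stage: 'pending' }
  private waiters: Array<(state: SketchTaskState<R>) => void> = []

  // Register a callback; it fires immediately if the task has already settled.
  wait(callback: (state: SketchTaskState<R>) => void): void {
    if (this.state.stage === 'pending') {
      this.waiters.push(callback)
    } else {
      callback(this.state)
    }
  }

  resolve(result: R): void {
    this.settle({ stage: 'resolved', result })
  }

  reject(reason: Error): void {
    this.settle({ stage: 'rejected', reason })
  }

  abort(reason: string): void {
    this.settle({ stage: 'aborted', reason })
  }

  // Bridge to async/await: aborts surface as rejections.
  toPromise(): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      this.wait((state) => {
        if (state.stage === 'resolved') resolve(state.result)
        else if (state.stage === 'rejected') reject(state.reason)
        else if (state.stage === 'aborted') reject(new Error(`aborted: ${state.reason}`))
      })
    })
  }

  private settle(next: SketchTaskState<R>): void {
    if (this.state.stage !== 'pending') return // the first settlement wins
    this.state = next
    const waiters = this.waiters
    this.waiters = []
    waiters.forEach((waiter) => waiter(next))
  }
}
```

The practical benefit over a plain Promise is the abort path: a viewer can cancel the render of a page that has already scrolled out of view instead of waiting for a result it will discard.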

WASM Wrapper

// packages/pdfium/src/index.ts
import { createPDFium } from './lib/pdfium-loader'
import { PDFiumEngine } from './lib/engine'

export async function createPDFiumEngine(config: PDFiumConfig): Promise<PdfEngine> {
  // Load the WASM module
  const pdfiumModule = await createPDFium({
    wasmUrl: config.wasmUrl || '/pdfium.wasm',
    workerUrl: config.workerUrl,
    useWorker: config.useWorker ?? true,
  })
  
  // Create the engine instance
  const engine = new PDFiumEngine(pdfiumModule, config)
  
  // Initialize it
  await engine.initialize()
  
  return engine
}

// Usage example
const engine = await createPDFiumEngine({
  wasmUrl: '/assets/pdfium.wasm',
  useWorker: true,  // Run in a Web Worker to avoid blocking the UI
})

// Open a PDF
const doc = await engine.openDocumentUrl({
  url: '/documents/sample.pdf',
  password: '',
})

// Render the first page
const page = doc.pages[0]
const blob = await engine.renderPage(doc, page, {
  scale: 1.5,
  rotation: 0,
})
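
With useWorker enabled, each engine call has to cross a message boundary between the UI thread and the worker. The article does not show that protocol, so the sketch below illustrates one common approach: tagging each request with an id and resolving the matching promise when the response comes back. RpcRequest, RpcResponse, SketchPort, WorkerRpcClient, and createFakeWorkerPort are all hypothetical names, and an in-memory port stands in for a real Worker so the snippet runs anywhere.

```typescript
// Hypothetical message shapes; a real worker protocol would carry more detail.
interface RpcRequest { id: number; method: string; args: unknown[] }
interface RpcResponse { id: number; result?: unknown; error?: string }

// The minimal slice of a worker-like channel that the client needs.
interface SketchPort {
  postMessage(message: RpcRequest): void
  onmessage: ((response: RpcResponse) => void) | null
}

class WorkerRpcClient {
  private nextId = 1
  private pending = new Map<number, { resolve: (value: unknown) => void; reject: (error: Error) => void }>()

  constructor(private port: SketchPort) {
    port.onmessage = (response) => this.handleResponse(response)
  }

  // Send a request and return a promise that settles when the matching response arrives.
  call(method: string, ...args: unknown[]): Promise<unknown> {
    const id = this.nextId++
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject })
      this.port.postMessage({ id, method, args })
    })
  }

  pendingCount(): number {
    return this.pending.size
  }

  private handleResponse(response: RpcResponse): void {
    const entry = this.pending.get(response.id)
    if (!entry) return // unknown or already-settled id
    this.pending.delete(response.id)
    if (response.error !== undefined) entry.reject(new Error(response.error))
    else entry.resolve(response.result)
  }
}

// In-memory stand-in for a worker: runs the handler and replies synchronously.
function createFakeWorkerPort(
  handlers: Record<string, (...args: unknown[]) => unknown>
): SketchPort {
  const port: SketchPort = {
    onmessage: null,
    postMessage(message) {
      const handler = handlers[message.method]
      const response: RpcResponse = handler
        ? { id: message.id, result: handler(...message.args) }
        : { id: message.id, error: `unknown method: ${message.method}` }
      port.onmessage?.(response)
    },
  }
  return port
}
```

A real Worker delivers a MessageEvent whose data field holds the payload, so a thin adapter would be needed between Worker and SketchPort. The id-based correlation is what lets several render requests be in flight at once without their results getting mixed up.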

Plugin System

Core Design

// packages/core/src/lib/base/base-plugin.ts

export abstract class BasePlugin<
  TConfig = any,
  TCapability = any,
  TState = any,
  TAction extends Action = Action
> implements IPlugin<TConfig> {
  
  protected pluginStore: PluginStore<TState, TAction>
  protected coreStore: Store<CoreState, CoreAction>
  protected readonly engine: PdfEngine
  
  // Plugin configuration
  abstract getConfig(): TConfig
  
  // Plugin manifest
  abstract getManifest(): PluginManifest
  
  // Lifecycle hooks
  protected onDocumentLoadingStarted(documentId: string): void {}
  protected onDocumentLoaded(documentId: string): void {}
  protected onDocumentClosed(documentId: string): void {}
  protected onPageRendered(page: number): void {}
  protected onScaleChanged(documentId: string, scale: number): void {}
  protected onRotationChanged(documentId: string, rotation: number): void {}
  
  // Exposed capability
  protected abstract buildCapability(): TCapability
  
  public provides(): Readonly<TCapability> {
    return this.buildCapability()
  }
}

// Plugin manifest
export interface PluginManifest {
  name: string
  version: string
  description: string
  author: string
  dependencies: string[]
  capabilities: string[]
}
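
The manifest's dependencies field implies that plugins must be initialized in dependency order. The article does not show how the registry resolves that order; the sketch below assumes a plain depth-first topological sort. ManifestLike and resolveLoadOrder are hypothetical names introduced for illustration.

```typescript
// Only the manifest fields the resolver needs (mirrors name/dependencies above).
interface ManifestLike {
  name: string
  dependencies: string[]
}

// Depth-first topological sort: every plugin appears after all of its dependencies.
function resolveLoadOrder(manifests: ManifestLike[]): string[] {
  const byName = new Map<string, ManifestLike>()
  for (const manifest of manifests) byName.set(manifest.name, manifest)

  const order: string[] = []
  const done = new Set<string>()
  const inProgress = new Set<string>()

  const visit = (name: string, requiredBy: string | null): void => {
    if (done.has(name)) return
    const manifest = byName.get(name)
    if (!manifest) {
      throw new Error(`Plugin "${name}" (required by "${requiredBy}") is not registered`)
    }
    if (inProgress.has(name)) {
      throw new Error(`Circular dependency involving "${name}"`)
    }
    inProgress.add(name)
    for (const dependency of manifest.dependencies) visit(dependency, name)
    inProgress.delete(name)
    done.add(name)
    order.push(name)
  }

  for (const manifest of manifests) visit(manifest.name, null)
  return order
}

// Usage: registration order does not matter, dependencies still load first.
const loadOrder = resolveLoadOrder([
  { name: 'annotation', dependencies: ['render'] },
  { name: 'zoom', dependencies: ['render'] },
  { name: 'render', dependencies: [] },
])
// loadOrder is ['render', 'annotation', 'zoom']
```

Failing fast on a missing or circular dependency at registration time is usually preferable to discovering it later, when a plugin asks for a capability that was never built.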

Rendering Plugin Example

// packages/plugin-render/src/lib/render-plugin.ts
import { BasePlugin } from '@embedpdf/core'

export interface RenderPluginConfig {
  quality: 'low' | 'medium' | 'high'
  enableTiling: boolean
  tileSize: number
}

export interface RenderCapability {
  renderPage: (pageNumber: number, options: RenderOptions) => Promise<Blob>
  renderThumbnail: (pageNumber: number, options: ThumbnailOptions) => Promise<Blob>
  getRenderedPage: (pageNumber: number) => RenderedPage | undefined
  invalidatePage: (pageNumber: number) => void
}

export class RenderPlugin extends BasePlugin<
  RenderPluginConfig,
  RenderCapability,
  RenderState,
  RenderAction
> {
  
  // Keyed by the string cache key built in renderPage (page-scale-rotation)
  private renderCache: Map<string, RenderedPage> = new Map()
  private tileCache: Map<string, Tile> = new Map()
  
  getConfig(): RenderPluginConfig {
    return {
      quality: 'high',
      enableTiling: true,
      tileSize: 512,
    }
  }
  
  getManifest(): PluginManifest {
    return {
      name: '@embedpdf/plugin-render',
      version: '1.0.0',
      description: 'PDF page rendering plugin',
      author: 'EmbedPDF Team',
      dependencies: [],
      capabilities: ['rendering'],
    }
  }
  
  protected buildCapability(): RenderCapability {
    return {
      renderPage: this.renderPage.bind(this),
      renderThumbnail: this.renderThumbnail.bind(this),
      getRenderedPage: this.getRenderedPage.bind(this),
      invalidatePage: this.invalidatePage.bind(this),
    }
  }
  
  private async renderPage(
    pageNumber: number,
    options: RenderOptions
  ): Promise<Blob> {
    const cacheKey = `${pageNumber}-${options.scale}-${options.rotation}`
    
    // Check the cache
    if (this.renderCache.has(cacheKey)) {
      return this.renderCache.get(cacheKey)!.blob
    }
    
    // Get the current document
    const doc = this.coreStore.getState().document.current
    if (!doc) throw new Error('No document loaded')
    
    // Render the page
    const blob = await this.engine.renderPage(doc, doc.pages[pageNumber - 1], {
      scale: options.scale,
      rotation: options.rotation,
    })
    
    // Cache the result
    this.renderCache.set(cacheKey, {
      blob,
      pageNumber,
      scale: options.scale,
      timestamp: Date.now(),
    })
    
    return blob
  }
  
  // Tile rendering (splits a large page image into chunks)
  private async renderTile(
    pageNumber: number,
    tileX: number,
    tileY: number,
    zoom: number
  ): Promise<Tile> {
    const tileKey = `${pageNumber}-${tileX}-${tileY}-${zoom}`
    
    if (this.tileCache.has(tileKey)) {
      return this.tileCache.get(tileKey)!
    }
    
    // Compute the tile's region in unscaled page coordinates
    const tileSize = this.getConfig().tileSize
    const rect = {
      left: tileX * tileSize / zoom,
      top: tileY * tileSize / zoom,
      right: (tileX + 1) * tileSize / zoom,
      bottom: (tileY + 1) * tileSize / zoom,
    }
    
    // Render the tile
    const blob = await this.engine.renderPage(
      this.coreStore.getState().document.current!,
      { number: pageNumber },
      {
        scale: zoom,
        clipRect: rect,
      }
    )
    
    const tile: Tile = {
      key: tileKey,
      blob,
      x: tileX,
      y: tileY,
      zoom,
    }
    
    this.tileCache.set(tileKey, tile)
    return tile
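  }

  // buildCapability() also binds the three methods below, which the listing omits.
  // These are minimal assumed implementations, not the library's own code.
  private async renderThumbnail(
    pageNumber: number,
    options: ThumbnailOptions
  ): Promise<Blob> {
    const doc = this.coreStore.getState().document.current
    if (!doc) throw new Error('No document loaded')
    return this.engine.renderThumbnail(doc, doc.pages[pageNumber - 1], options)
  }

  private getRenderedPage(pageNumber: number): RenderedPage | undefined {
    // Return the first cached render of this page, at whatever scale it was cached
    for (const entry of this.renderCache.values()) {
      if (entry.pageNumber === pageNumber) return entry
    }
    return undefined
  }

  private invalidatePage(pageNumber: number): void {
    // Drop every cached scale/rotation variant of the page
    for (const [key, entry] of this.renderCache) {
      if (entry.pageNumber === pageNumber) this.renderCache.delete(key)
    }
  }
}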
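
Both renderCache and tileCache in the listing above grow without bound, which can become a problem for long documents viewed at several zoom levels. One common remedy is a least-recently-used cache with a fixed capacity. The sketch below is an illustration under that assumption, not something the article shows the plugin doing; LruCache and its onEvict callback are hypothetical names.

```typescript
// A small LRU cache built on Map, which iterates keys in insertion order.
class LruCache<K, V> {
  private entries = new Map<K, V>()

  constructor(
    private capacity: number,
    private onEvict?: (key: K, value: V) => void
  ) {
    if (capacity < 1) throw new Error('capacity must be at least 1')
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key) as V
    // Re-insert so this key becomes the most recently used
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: K, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used
      const oldestKey = this.entries.keys().next().value as K
      const oldestValue = this.entries.get(oldestKey) as V
      this.entries.delete(oldestKey)
      this.onEvict?.(oldestKey, oldestValue)
    }
  }

  get size(): number {
    return this.entries.size
  }
}

// Usage with the same "page-scale-rotation" key format as renderPage
const evictedKeys: string[] = []
const pageCache = new LruCache<string, number>(2, (key) => evictedKeys.push(key))
pageCache.set('1-1.5-0', 101)
pageCache.set('2-1.5-0', 102)
pageCache.get('1-1.5-0')      // page 1 is now the most recently used
pageCache.set('3-1.5-0', 103) // evicts '2-1.5-0'
```

The onEvict hook is the natural place to release whatever the cached value holds, for example an object URL created from a Blob.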

Annotation Plugin

// packages/plugin-annotation/src/lib/annotation-plugin.ts

export interface AnnotationCapability {
  // CRUD operations
  getAnnotations: (pageNumber: number) => Annotation[]
  addAnnotation: (annotation: Annotation) => Promise<void>
  updateAnnotation: (id: string, updates: Partial<Annotation>) => Promise<void>
  deleteAnnotation: (id: string) => Promise<void>
  
  // Comment (reply) operations
  addComment: (annotationId: string, comment: Comment) => Promise<void>
  getComments: (annotationId: string) => Comment[]
  
  // Selection state
  getSelectedAnnotation: () => Annotation | undefined
  setSelectedAnnotation: (id: string | undefined) => void
}

// Supported annotation types
export enum AnnotationType {
  HIGHLIGHT = 'highlight',     // Highlight
  UNDERLINE = 'underline',     // Underline
  STRIKEOUT = 'strikeout',     // Strikeout
  SQUIGGLY = 'squiggly',       // Squiggly underline
  TEXT = 'text',               // Text note
  FREETEXT = 'freetext',       // Free text
  INK = 'ink',                 // Freehand ink
  SQUARE = 'square',           // Rectangle
  CIRCLE = 'circle',           // Circle
  LINE = 'line',               // Line
  ARROW = 'arrow',             // Arrow
  STAMP = 'stamp',             // Stamp
  FILEATTACHMENT = 'fileattachment', // File attachment
}

export class AnnotationPlugin extends BasePlugin<
  AnnotationConfig,
  AnnotationCapability,
  AnnotationState,
  AnnotationAction
> {
  
  private annotations: Map<number, Annotation[]> = new Map()
  
  protected buildCapability(): AnnotationCapability {
    return {
      getAnnotations: this.getAnnotations.bind(this),
      addAnnotation: this.addAnnotation.bind(this),
      updateAnnotation: this.updateAnnotation.bind(this),
      deleteAnnotation: this.deleteAnnotation.bind(this),
      addComment: this.addComment.bind(this),
      getComments: this.getComments.bind(this),
      getSelectedAnnotation: () => this.pluginStore.getState().selectedAnnotation,
      setSelectedAnnotation: (id) => {
        this.pluginStore.dispatch({ type: 'SELECT_ANNOTATION', payload: id })
      },
    }
  }
  
  private async addAnnotation(annotation: Annotation): Promise<void> {
    // Store in memory
    const pageAnnotations = this.annotations.get(annotation.pageNumber) || []
    pageAnnotations.push(annotation)
    this.annotations.set(annotation.pageNumber, pageAnnotations)
    
    // Persist to the PDF
    const doc = this.coreStore.getState().document.current
    if (doc) {
      await this.engine.addAnnotation(doc, { number: annotation.pageNumber }, {
        type: annotation.type,
        rect: annotation.rect,
        color: annotation.color,
        contents: annotation.content,
      })
    }
    
    // Emit an event
    this.emit('ANNOTATION_ADDED', annotation)
  }
}
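
addAnnotation ends with this.emit('ANNOTATION_ADDED', annotation), but the BasePlugin listing shown earlier does not define emit. The following sketch shows one way such an emitter could be typed so that event names and payloads are checked at compile time. TypedEmitter, SketchAnnotationEvents, and the payload shapes are hypothetical and are not taken from EmbedPDF.

```typescript
// Hypothetical event map: event name -> payload type
interface SketchAnnotationEvents {
  ANNOTATION_ADDED: { id: string; pageNumber: number }
  ANNOTATION_DELETED: { id: string }
}

// Handlers are stored loosely and cast back on emit; the public API stays fully typed.
type StoredHandler = (payload: never) => void

class TypedEmitter<TEvents> {
  private handlers = new Map<keyof TEvents, Set<StoredHandler>>()

  // Subscribe to an event; the returned function unsubscribes.
  on<K extends keyof TEvents>(
    event: K,
    handler: (payload: TEvents[K]) => void
  ): () => void {
    let set = this.handlers.get(event)
    if (!set) {
      set = new Set<StoredHandler>()
      this.handlers.set(event, set)
    }
    set.add(handler)
    const registered = set
    return () => {
      registered.delete(handler)
    }
  }

  emit<K extends keyof TEvents>(event: K, payload: TEvents[K]): void {
    const set = this.handlers.get(event)
    if (!set) return
    // Iterate over a copy so handlers may unsubscribe while the event is being delivered
    Array.from(set).forEach((handler) => {
      (handler as (p: TEvents[K]) => void)(payload)
    })
  }
}

// Usage
const annotationEvents = new TypedEmitter<SketchAnnotationEvents>()
const addedIds: string[] = []
const stopListening = annotationEvents.on('ANNOTATION_ADDED', (payload) => {
  addedIds.push(payload.id)
})
annotationEvents.emit('ANNOTATION_ADDED', { id: 'a1', pageNumber: 3 })
stopListening()
annotationEvents.emit('ANNOTATION_ADDED', { id: 'a2', pageNumber: 4 }) // no listener left
```

With the event map in place, a misspelled event name or a payload with the wrong shape is rejected by the compiler rather than failing silently at runtime.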

Framework Adapters

React Adapter

// packages/plugin-render/src/react/hooks.ts
import { useCallback, useEffect, useState } from 'react'
import { useEmbedPDF } from '@embedpdf/core/react'

export function useRender() {
  const { pluginRegistry } = useEmbedPDF()
  const [renderedPages, setRenderedPages] = useState<Map<number, Blob>>(new Map())
  
  const renderPlugin = pluginRegistry.getPlugin<RenderPlugin>('@embedpdf/plugin-render')
  
  const renderPage = useCallback(async (pageNumber: number, options: RenderOptions) => {
    if (!renderPlugin) return
    
    const blob = await renderPlugin.provides().renderPage(pageNumber, options)
    setRenderedPages(prev => new Map(prev).set(pageNumber, blob))
    return blob
  }, [renderPlugin])
  
  const invalidatePage = useCallback((pageNumber: number) => {
    if (!renderPlugin) return
    renderPlugin.provides().invalidatePage(pageNumber)
    setRenderedPages(prev => {
      const next = new Map(prev)
      next.delete(pageNumber)
      return next
    })
  }, [renderPlugin])
  
  return {
    renderedPages,
    renderPage,
    invalidatePage,
  }
}

// React component
export function PDFViewer({ documentUrl }: { documentUrl: string }) {
  const { document, loading, error } = useDocument(documentUrl)
  const { renderedPages, renderPage } = useRender()
  const { scale } = useZoom()
  const { currentPage } = useScroll()
  
  useEffect(() => {
    if (document && currentPage > 0) {
      renderPage(currentPage, { scale, rotation: 0 })
    }
  }, [document, currentPage, scale, renderPage])
  
  if (loading) return <LoadingSpinner />
  if (error) return <ErrorMessage error={error} />
  
  return (
    <div className="pdf-viewer">
      <Toolbar />
      <Viewport>
        <Scroller>
          {document?.pages.map((page, index) => (
            <Page
              key={page.id}
              pageNumber={index + 1}
              blob={renderedPages.get(index + 1)}
              width={page.width * scale}
              height={page.height * scale}
              onVisible={() => renderPage(index + 1, { scale, rotation: 0 })}
            />
          ))}
        </Scroller>
      </Viewport>
    </div>
  )
}
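
The useRender hook never mutates the Map held in state. It copies it first (new Map(prev).set(...)), because React skips a re-render when the new state is the same reference as the old one. That copy-on-write pattern is shown below in isolation as two pure helpers. withEntry and withoutEntry are hypothetical names used only for this illustration.

```typescript
// Return a new Map containing everything in source plus one added or replaced entry.
function withEntry<K, V>(source: ReadonlyMap<K, V>, key: K, value: V): Map<K, V> {
  return new Map(source).set(key, value)
}

// Return a new Map containing everything in source except the given key.
function withoutEntry<K, V>(source: ReadonlyMap<K, V>, key: K): Map<K, V> {
  const next = new Map(source)
  next.delete(key)
  return next
}

// Usage: the original map is never modified, so reference comparison detects the change.
const beforeRender: ReadonlyMap<number, string> = new Map([[1, 'blob-for-page-1']])
const afterRender = withEntry(beforeRender, 2, 'blob-for-page-2')
const afterInvalidate = withoutEntry(afterRender, 1)
```

In the hook these helpers would be used as setRenderedPages(prev => withEntry(prev, pageNumber, blob)), which is equivalent to the inline version in the listing.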

Vue Adapter

// packages/plugin-render/src/vue/composables.ts
import { ref, computed, watch } from 'vue'
import { useEmbedPDF } from '@embedpdf/core/vue'

export function useRender() {
  const { pluginRegistry } = useEmbedPDF()
  const renderedPages = ref<Map<number, Blob>>(new Map())
  
  const renderPlugin = computed(() => 
    pluginRegistry.getPlugin<RenderPlugin>('@embedpdf/plugin-render')
  )
  
  const renderPage = async (pageNumber: number, options: RenderOptions) => {
    if (!renderPlugin.value) return
    
    const blob = await renderPlugin.value.provides().renderPage(pageNumber, options)
    renderedPages.value.set(pageNumber, blob)
    return blob
  }
  
  return {
    renderedPages,
    renderPage,
  }
}

// Vue component
<template>
  <div class="pdf-viewer">
    <Toolbar />
    <Viewport>
      <Scroller>
        <Page
          v-for="(page, index) in document?.pages"
          :key="page.id"
          :page-number="index + 1"
          :blob="renderedPages.get(index + 1)"
          :width="page.width * scale"
          :height="page.height * scale"
          @visible="renderPage(index + 1, { scale, rotation: 0 })"
        />
      </Scroller>
    </Viewport>
  </div>
</template>

<script setup lang="ts">
import { useDocument } from '@embedpdf/core/vue'
import { useRender } from '@embedpdf/plugin-render/vue'
import { useZoom } from '@embedpdf/plugin-zoom/vue'

const props = defineProps<{ documentUrl: string }>()

const { document, loading, error } = useDocument(props.documentUrl)
const { renderedPages, renderPage } = useRender()
const { scale } = useZoom()
</script>

Core State Management

// packages/core/src/lib/store/store.ts

export interface CoreState {
  document: {
    current: PdfDocumentObject | null
    loading: boolean
    error: Error | null
  }
  view: {
    scale: number
    rotation: number
    currentPage: number
    visiblePages: number[]
  }
}

export type CoreAction =
  | { type: 'DOCUMENT_LOADING' }
  | { type: 'DOCUMENT_LOADED'; payload: PdfDocumentObject }
  | { type: 'DOCUMENT_ERROR'; payload: Error }
  | { type: 'DOCUMENT_CLOSED' }
  | { type: 'SET_SCALE'; payload: number }
  | { type: 'SET_ROTATION'; payload: number }
  | { type: 'SET_CURRENT_PAGE'; payload: number }
  | { type: 'SET_VISIBLE_PAGES'; payload: number[] }

export const coreReducer = (state: CoreState, action: CoreAction): CoreState => {
  switch (action.type) {
    case 'DOCUMENT_LOADING':
      return { ...state, document: { ...state.document, loading: true, error: null } }
    case 'DOCUMENT_LOADED':
      return { ...state, document: { current: action.payload, loading: false, error: null } }
    case 'DOCUMENT_ERROR':
      return { ...state, document: { ...state.document, loading: false, error: action.payload } }
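    case 'DOCUMENT_CLOSED':
      return { ...state, document: { current: null, loading: false, error: null } }
    case 'SET_SCALE':
      return { ...state, view: { ...state.view, scale: action.payload } }
    case 'SET_ROTATION':
      return { ...state, view: { ...state.view, rotation: action.payload } }
    case 'SET_CURRENT_PAGE':
      return { ...state, view: { ...state.view, currentPage: action.payload } }
    case 'SET_VISIBLE_PAGES':
      return { ...state, view: { ...state.view, visiblePages: action.payload } }
    default:
      return state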
  }
}

// Store implementation
export class Store<TState, TAction extends Action> {
  private state: TState
  private listeners: Set<(state: TState, prevState: TState) => void> = new Set()
  
  constructor(
    private reducer: (state: TState, action: TAction) => TState,
    initialState: TState
  ) {
    this.state = initialState
  }
  
  getState(): TState {
    return this.state
  }
  
  dispatch(action: TAction): void {
    const prevState = this.state
    this.state = this.reducer(this.state, action)
    this.listeners.forEach(listener => listener(this.state, prevState))
  }
  
  subscribe(listener: (state: TState, prevState: TState) => void): () => void {
    this.listeners.add(listener)
    return () => this.listeners.delete(listener)
  }
}
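
Because subscribe fires on every dispatch, a plugin that only cares about one field (the scale, say) has to filter changes itself. The sketch below shows one way to do that by comparing a selected slice before and after each dispatch. MiniStore repeats the Store shape above so the snippet runs on its own; subscribeToSlice, SketchViewState, and SketchViewAction are hypothetical names.

```typescript
type StoreListener<S> = (state: S, prevState: S) => void

// Same shape as the Store class above, repeated so the example is self-contained.
class MiniStore<S, A> {
  private listeners = new Set<StoreListener<S>>()

  constructor(
    private reducer: (state: S, action: A) => S,
    private state: S
  ) {}

  getState(): S {
    return this.state
  }

  dispatch(action: A): void {
    const prevState = this.state
    this.state = this.reducer(prevState, action)
    this.listeners.forEach((listener) => listener(this.state, prevState))
  }

  subscribe(listener: StoreListener<S>): () => void {
    this.listeners.add(listener)
    return () => {
      this.listeners.delete(listener)
    }
  }
}

// Hypothetical helper: call onChange only when the selected slice actually changes.
function subscribeToSlice<S, A, T>(
  store: MiniStore<S, A>,
  select: (state: S) => T,
  onChange: (value: T, previous: T) => void
): () => void {
  return store.subscribe((state, prevState) => {
    const value = select(state)
    const previous = select(prevState)
    if (!Object.is(value, previous)) onChange(value, previous)
  })
}

interface SketchViewState { scale: number; currentPage: number }
type SketchViewAction =
  | { type: 'SET_SCALE'; payload: number }
  | { type: 'SET_CURRENT_PAGE'; payload: number }

const sketchViewReducer = (state: SketchViewState, action: SketchViewAction): SketchViewState => {
  switch (action.type) {
    case 'SET_SCALE':
      return { ...state, scale: action.payload }
    case 'SET_CURRENT_PAGE':
      return { ...state, currentPage: action.payload }
    default:
      return state
  }
}

// Usage: only the scale change reaches the callback.
const viewStore = new MiniStore(sketchViewReducer, { scale: 1, currentPage: 1 })
const observedScales: number[] = []
subscribeToSlice(viewStore, (s) => s.scale, (scale) => observedScales.push(scale))
viewStore.dispatch({ type: 'SET_CURRENT_PAGE', payload: 2 }) // ignored by the slice listener
viewStore.dispatch({ type: 'SET_SCALE', payload: 1.5 })      // observed
viewStore.dispatch({ type: 'SET_SCALE', payload: 1.5 })      // same value, ignored
```

This relies on the reducer returning new objects only for the parts of the state that changed, which is what the spread-based coreReducer above already does.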

Usage Example

// Complete usage example
import { createPDFViewer } from '@embedpdf/core'
import { RenderPlugin } from '@embedpdf/plugin-render'
import { ZoomPlugin } from '@embedpdf/plugin-zoom'
import { ScrollPlugin } from '@embedpdf/plugin-scroll'
import { AnnotationPlugin } from '@embedpdf/plugin-annotation'
import { SearchPlugin } from '@embedpdf/plugin-search'

async function initPDFViewer(container: HTMLElement, url: string) {
  const viewer = await createPDFViewer({
    container,
    engine: {
      wasmUrl: '/assets/pdfium.wasm',
    },
    plugins: [
      new RenderPlugin({ quality: 'high', enableTiling: true }),
      new ZoomPlugin({ min: 0.25, max: 5, step: 0.25 }),
      new ScrollPlugin({ mode: 'vertical', smooth: true }),
      new AnnotationPlugin({ enabledTypes: ['highlight', 'text', 'ink'] }),
      new SearchPlugin({ caseSensitive: false, wholeWord: false }),
    ],
  })
  
  // Load the document
  await viewer.loadDocument(url)
  
  // Use the capability APIs
  const render = viewer.getCapability<RenderCapability>('rendering')
  const zoom = viewer.getCapability<ZoomCapability>('zoom')
  const search = viewer.getCapability<SearchCapability>('search')
  
  // Search
  const results = await search.search('keyword', { highlight: true })
  
  // Zoom
  zoom.setScale(1.5)
  
  return viewer
}
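
The example looks capabilities up by name ('rendering', 'zoom', 'search'), which matches the capabilities array in each plugin's manifest. How getCapability is implemented is not shown; the sketch below assumes a simple name-to-plugin map that is filled in at registration. CapabilityProvider and CapabilityRegistry are hypothetical names.

```typescript
// The minimal plugin surface the registry needs (hypothetical shape).
interface CapabilityProvider {
  manifest: { name: string; capabilities: string[] }
  provides(): unknown
}

class CapabilityRegistry {
  private providers = new Map<string, CapabilityProvider>()

  // Index the plugin under every capability name its manifest declares.
  register(plugin: CapabilityProvider): void {
    for (const capability of plugin.manifest.capabilities) {
      const existing = this.providers.get(capability)
      if (existing) {
        throw new Error(
          `Capability "${capability}" is already provided by ${existing.manifest.name}`
        )
      }
      this.providers.set(capability, plugin)
    }
  }

  // The type parameter is an unchecked assertion made by the caller.
  getCapability<T>(capability: string): T {
    const plugin = this.providers.get(capability)
    if (!plugin) throw new Error(`No plugin provides capability "${capability}"`)
    return plugin.provides() as T
  }
}

// Usage with a toy zoom plugin
interface SketchZoomCapability {
  getScale(): number
  setScale(scale: number): void
}

let currentScale = 1
const capabilityRegistry = new CapabilityRegistry()
capabilityRegistry.register({
  manifest: { name: 'sketch-zoom', capabilities: ['zoom'] },
  provides: (): SketchZoomCapability => ({
    getScale: () => currentScale,
    setScale: (scale) => { currentScale = scale },
  }),
})

const zoomCapability = capabilityRegistry.getCapability<SketchZoomCapability>('zoom')
zoomCapability.setScale(1.5)
```

Callers depend on a capability name and interface rather than on a concrete plugin class, so a plugin can be swapped for another one that provides the same capability.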

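Summary

Key design points of an enterprise-grade PDF reader:

  1. WASM engine: PDFium supplies the full set of PDF processing capabilities
  2. Plugin architecture: features are modular and loaded on demand
  3. Multi-framework support: adapters for React, Vue, Svelte, and vanilla JS
  4. State management: Redux-style centralized state
  5. Performance optimization: tile rendering, caching, Web Workers
  6. Full feature set: annotations, forms, search, and printing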
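
Tile rendering pays off only when the viewer requests just the tiles that are on screen. The sketch below works out which tile indices intersect a viewport, using the same convention as renderTile earlier: tile (x, y) covers the scaled pixel range starting at x * tileSize and y * tileSize. TileRect, TileIndex, and visibleTiles are hypothetical names, and the viewport is assumed to be expressed in scaled pixels relative to the page's top-left corner.

```typescript
interface TileRect { left: number; top: number; right: number; bottom: number }
interface TileIndex { x: number; y: number }

// Return the indices of all tiles that overlap the viewport, row by row.
function visibleTiles(
  viewport: TileRect,   // in scaled pixels, relative to the page origin
  pageWidth: number,    // unscaled page width
  pageHeight: number,   // unscaled page height
  zoom: number,
  tileSize: number
): TileIndex[] {
  const scaledWidth = pageWidth * zoom
  const scaledHeight = pageHeight * zoom

  // Clamp the viewport to the page so no tiles are requested outside it
  const left = Math.max(0, viewport.left)
  const top = Math.max(0, viewport.top)
  const right = Math.min(scaledWidth, viewport.right)
  const bottom = Math.min(scaledHeight, viewport.bottom)
  if (right <= left || bottom <= top) return []

  const firstX = Math.floor(left / tileSize)
  const firstY = Math.floor(top / tileSize)
  // ceil(...) - 1 keeps an edge that sits exactly on a tile boundary
  // from pulling in the next tile
  const lastX = Math.ceil(right / tileSize) - 1
  const lastY = Math.ceil(bottom / tileSize) - 1

  const tiles: TileIndex[] = []
  for (let y = firstY; y <= lastY; y++) {
    for (let x = firstX; x <= lastX; x++) {
      tiles.push({ x, y })
    }
  }
  return tiles
}

// Usage: a 600x800 page at zoom 2 (1200x1600 scaled pixels) with 512px tiles.
// A 600x600 viewport at the top-left corner touches a 2x2 block of tiles.
const topLeftTiles = visibleTiles({ left: 0, top: 0, right: 600, bottom: 600 }, 600, 800, 2, 512)
```

Each returned index can be passed to a tile renderer and cached under a key that includes the zoom level, so tiles rendered at one zoom are not reused at another.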

The next article covers the design and practice of CI/CD pipelines.
