Tutorial#

Introduction#

python-epub3 is a Python library for managing ePub 3 books. It can also be used to work with ePub 2 books.

Install through GitHub:

pip install git+https://github.com/ChenyangGao/python-epub3

Install through PyPI:

pip install python-epub3

Reading ePub#

from epub3 import ePub

book = ePub("sample.epub")

The epub3.ePub class is used for operating on ePub files. It accepts an optional path to the ePub file as its argument.

Suppose sample.epub contains the following content.opf file:

<?xml version="1.0" encoding="UTF-8"?>
<package version="3.3" unique-identifier="pub-id" xmlns="http://www.idpf.org/2007/opf" >
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
       <dc:identifier id="pub-id">urn:uuid:bb4d4afe-f787-4d21-97b8-68f6774ba342</dc:identifier>
       <dc:title>ePub</dc:title>
       <dc:language>en</dc:language>
       <meta property="dcterms:modified">2989-06-04T00:00:00Z</meta>
    </metadata>
   <manifest>
      <item
          id="nav"
          href="nav.xhtml"
          properties="nav"
          media-type="application/xhtml+xml"/>
      <item
          id="intro"
          href="intro.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="c1"
          href="chap1.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="c1-answerkey"
          href="chap1-answerkey.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="c2"
          href="chap2.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="c2-answerkey"
          href="chap2-answerkey.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="c3"
          href="chap3.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="c3-answerkey"
          href="chap3-answerkey.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="notes"
          href="notes.xhtml"
          media-type="application/xhtml+xml"/>
      <item
          id="cover"
          href="images/cover.svg"
          properties="cover-image"
          media-type="image/svg+xml"/>
      <item
          id="f1"
          href="images/fig1.jpg"
          media-type="image/jpeg"/>
      <item
          id="f2"
          href="images/fig2.jpg"
          media-type="image/jpeg"/>
      <item
          id="css"
          href="style/book.css"
          media-type="text/css"/>
   </manifest>
    <spine
        page-progression-direction="ltr">
    <itemref
        idref="intro"/>
    <itemref
        idref="c1"/>
    <itemref
        idref="c1-answerkey"
        linear="no"/>
    <itemref
        idref="c2"/>
    <itemref
        idref="c2-answerkey"
        linear="no"/>
    <itemref
        idref="c3"/>
    <itemref
        idref="c3-answerkey"
        linear="no"/>
    <itemref
        idref="notes"
        linear="no"/>
    </spine>
</package>

Package document#

The package document is an XML document that consists of a set of elements that each encapsulate information about a particular aspect of an EPUB publication. These elements serve to centralize metadata, detail the individual resources, and provide the reading order and other information necessary for its rendering.

The following list summarizes the information found in the package document:

Metadata — mechanisms to include and/or reference information about the EPUB publication.
A manifest — identifies via URL [url], and describes via MIME media type [rfc4839], the set of publication resources.
A spine — an ordered sequence of ID references to top-level resources in the manifest from which reading systems can reach or utilize all other resources in the set. The spine defines the default reading order.
Collections — a method of encapsulating and identifying subcomponents within the EPUB publication.
Manifest fallback chains — a mechanism that defines an ordered list of top-level resources as content equivalents. A reading system can then choose between the resources based on which it is capable of rendering.
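To make the three core parts of the package document concrete, here is a minimal sketch that reads them with the standard library only. It does not use python-epub3; the function name summarize_package and the trimmed sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the path expressions below.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed-down package document (illustrative only).
OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package version="3.3" unique-identifier="pub-id" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:bb4d4afe-f787-4d21-97b8-68f6774ba342</dc:identifier>
    <dc:title>ePub</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" properties="nav" media-type="application/xhtml+xml"/>
    <item id="intro" href="intro.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="chap1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine page-progression-direction="ltr">
    <itemref idref="intro"/>
    <itemref idref="c1"/>
  </spine>
</package>
"""


def summarize_package(opf_text):
    """Return (title, manifest, spine) where manifest maps id -> href
    and spine is the ordered list of idref values."""
    # Encode to bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(opf_text.encode("utf-8"))
    title = root.findtext("opf:metadata/dc:title", namespaces=NS)
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    spine = [ref.get("idref") for ref in root.findall("opf:spine/opf:itemref", NS)]
    return title, manifest, spine


title, manifest, spine = summarize_package(OPF)
print(title, manifest, spine)
```

Collections and manifest fallback chains are left out of this sketch to keep it short.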

Metadata#

The metadata element encapsulates meta information.

The package document metadata element has two primary functions:

to provide a minimal set of meta information for reading systems to use to internally catalogue an EPUB publication and make it available to a user (e.g., to present in a bookshelf).
to provide access to all rendering metadata needed to control the layout and display of the content (e.g., fixed-layout properties).

Note

The package document does not provide complex metadata encoding capabilities. If EPUB creators need to provide more detailed information, they can associate metadata records (e.g., that conform to an international standard such as [onix] or are created for custom purposes) using the link element. This approach allows reading systems to process the metadata in its native form, avoiding the potential problems and information loss caused by translating to use the minimal package document structure.

The epub3.ePub.metadata property gives access to the metadata. It is an instance of epub3.Metadata.

Dublin Core required elements#

The minimal required metadata elements from DCMES (Dublin Core Metadata Element Set) are:

dc:identifier contains an identifier such as a UUID, DOI or ISBN.
dc:title represents an instance of a name for the EPUB publication.
dc:language specifies the language of the content of the EPUB publication.

The dc namespace prefix maps to the URI http://purl.org/dc/elements/1.1/ and is used when accessing Dublin Core metadata.

The minimal set of metadata required in the package document is defined in the content.opf file.

<package unique-identifier="pub-id">
    <metadata>
       <dc:identifier id="pub-id">urn:uuid:bb4d4afe-f787-4d21-97b8-68f6774ba342</dc:identifier>
       <dc:title>ePub</dc:title>
       <dc:language>en</dc:language>
       <meta property="dcterms:modified">2989-06-04T00:00:00Z</meta>
    </metadata>
</package>

>>> book.metadata.dc('identifier')
<DCTerm(<{http://purl.org/dc/elements/1.1/}identifier>, attrib={'id': 'pub-id'}, text='urn:uuid:bb4d4afe-f787-4d21-97b8-68f6774ba342') at 0x105338210>

>>> book.metadata.dc('title')
<DCTerm(<{http://purl.org/dc/elements/1.1/}title>, text='ePub') at 0x105313fd0>

>>> book.metadata.dc('language')
<DCTerm(<{http://purl.org/dc/elements/1.1/}language>, text='en') at 0x105357550>

>>> book.metadata.meta('[@property="dcterms:modified"]')
<Meta(<{http://www.idpf.org/2007/opf}meta>, attrib={'property': 'dcterms:modified'}, text='2989-06-04T00:00:00Z') at 0x10532dd90>

You can also use the following shortcut properties to obtain these values directly:

>>> book.identifier
'urn:uuid:bb4d4afe-f787-4d21-97b8-68f6774ba342'

>>> book.title
'ePub'

>>> book.language
'en'
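As a rough illustration of what "required" means in practice, the sketch below checks a package document for the three required Dublin Core elements and resolves the unique identifier through the package element's unique-identifier attribute. It relies only on the standard library; missing_required, unique_identifier and the sample document are hypothetical names introduced here, not part of python-epub3.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
REQUIRED = ("identifier", "title", "language")

# Sample with dc:language deliberately left out (illustrative only).
SAMPLE = """<package version="3.3" unique-identifier="pub-id" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:bb4d4afe-f787-4d21-97b8-68f6774ba342</dc:identifier>
    <dc:title>ePub</dc:title>
  </metadata>
</package>"""


def missing_required(opf_text):
    """Return the required Dublin Core elements that are absent or empty."""
    root = ET.fromstring(opf_text)
    metadata = root.find(f"{{{OPF}}}metadata")
    missing = []
    for name in REQUIRED:
        element = None if metadata is None else metadata.find(f"{{{DC}}}{name}")
        if element is None or not (element.text or "").strip():
            missing.append(f"dc:{name}")
    return missing


def unique_identifier(opf_text):
    """Return the text of the dc:identifier that package/@unique-identifier points to."""
    root = ET.fromstring(opf_text)
    wanted_id = root.get("unique-identifier")
    for element in root.iter(f"{{{DC}}}identifier"):
        if element.get("id") == wanted_id:
            return element.text
    return None


print(missing_required(SAMPLE))
print(unique_identifier(SAMPLE))
```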

Dublin Core optional elements#

All [dcterms] elements except for dc:identifier, dc:language, and dc:title are designated as OPTIONAL.

Properties in the `/terms/` namespace:	abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid
Properties in the `/elements/1.1/` namespace:	contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type
Vocabulary Encoding Schemes:	DCMIType, DDC, IMT, LCC, LCSH, MESH, NLM, TGN, UDC
Syntax Encoding Schemes:	Box, ISO3166, ISO639-2, ISO639-3, Period, Point, RFC1766, RFC3066, RFC4646, RFC5646, URI, W3CDTF
Classes:	Agent, AgentClass, BibliographicResource, FileFormat, Frequency, Jurisdiction, LicenseDocument, LinguisticSystem, Location, LocationPeriodOrJurisdiction, MediaType, MediaTypeOrExtent, MethodOfAccrual, MethodOfInstruction, PeriodOfTime, PhysicalMedium, PhysicalResource, Policy, ProvenanceStatement, RightsStatement, SizeOrDuration, Standard
DCMI Type Vocabulary:	Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text
Terms for vocabulary description:	domainIncludes, memberOf, rangeIncludes, VocabularyEncodingScheme

The `meta` element#

The meta element provides a generic means of including package metadata.

Each meta element defines a metadata expression. The property attribute takes a property data type value that defines the statement made in the expression, and the text content of the element represents the assertion. (Refer to D.1 Vocabulary association mechanisms for more information.)

This specification defines two types of metadata expressions that EPUB creators can define using the meta element:

A primary expression is one in which the expression defined in the meta element establishes some aspect of the EPUB publication. A meta element that omits a refines attribute defines a primary expression.
A subexpression is one in which the expression defined in the meta element is associated with another expression or resource using the refines attribute to enhance its meaning. A subexpression might refine a media clip, for example, by expressing its duration, or refine a creator or contributor expression by defining the role of the person.

Note

EPUB creators MAY use subexpressions to refine the meaning of other subexpressions, thereby creating chains of information.

All the [dcterms] elements represent primary expressions, and permit refinement by meta element subexpressions.
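The distinction between primary expressions and subexpressions can be sketched with the standard library as follows. This is an illustration under stated assumptions, not how python-epub3 models metadata: split_expressions is a hypothetical helper, and for brevity it keeps only the last value when a property occurs more than once.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OPF_NS = "http://www.idpf.org/2007/opf"

METADATA = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator id="creator">ChenyangGao</dc:creator>
  <meta refines="#creator" property="role" scheme="marc:relators">author</meta>
  <meta property="dcterms:modified">2989-06-04T00:00:00Z</meta>
</metadata>"""


def split_expressions(metadata_xml):
    """Separate primary expressions from subexpressions.

    Returns (primary, refinements): primary maps property -> text,
    refinements maps the refined element's id -> {property: text}.
    """
    root = ET.fromstring(metadata_xml)
    primary = {}
    refinements = defaultdict(dict)
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        prop, target = meta.get("property"), meta.get("refines")
        if prop is None:
            continue  # skip meta elements without a property attribute
        if target is None:
            primary[prop] = meta.text  # no refines attribute: primary expression
        else:
            # refines="#creator" points at the element whose id is "creator"
            refinements[target.lstrip("#")][prop] = meta.text
    return primary, dict(refinements)


primary, refinements = split_expressions(METADATA)
print(primary)
print(refinements)
```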

Note

The Meta Properties Vocabulary is the default vocabulary for use with the property attribute.

EPUB creators MAY add terms from other vocabularies as defined in D.1 Vocabulary association mechanisms.

You can also define custom metadata. For instance, the following shows how custom metadata is defined in the content.opf file. The same key can be defined more than once.

<dc:creator id="creator">ChenyangGao</dc:creator>
<meta refines="#creator" property="role" scheme="marc:relators">author</meta>
<meta refines="#creator" property="file-as" scheme="marc:relators">author</meta>

book.metadata.add("dc:creator", dict(id="creator"), text="ChenyangGao")
book.metadata.add("meta", dict(refines="#creator", property="role", scheme="marc:relators", id="role"), text="author")
book.metadata.add("meta", dict(refines="#creator", property="file-as", scheme="marc:relators", id="file-as"), text="author")

To get all meta elements, use the iterfind() method:

>>> book.metadata.iterfind('meta').list()
[<Meta(<{http://www.idpf.org/2007/opf}meta>, attrib={'property': 'dcterms:modified'}, text='2989-06-04T00:00:00Z') at 0x10532dd90>,
 <Meta(<{http://www.idpf.org/2007/opf}meta>, attrib={'refines': '#creator', 'property': 'role', 'scheme': 'marc:relators', 'id': 'role'}, text='author') at 0x1053ee3d0>,
 <Meta(<{http://www.idpf.org/2007/opf}meta>, attrib={'refines': '#creator', 'property': 'file-as', 'scheme': 'marc:relators', 'id': 'file-as'}, text='author') at 0x105e15610>]

Note

The Metadata.iterfind() method uses ElementPath to retrieve child nodes.

See the official documentation for more information.
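For readers unfamiliar with ElementPath, the following sketch shows a few typical path expressions using the standard library's xml.etree.ElementTree, which implements this syntax. Whether Metadata.iterfind() accepts exactly the same expressions is an assumption; the sample element is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping used to write namespaced tags in path expressions.
NS = {"opf": "http://www.idpf.org/2007/opf"}

METADATA = ET.fromstring("""<metadata xmlns="http://www.idpf.org/2007/opf">
  <meta property="dcterms:modified">2989-06-04T00:00:00Z</meta>
  <meta refines="#creator" property="role" scheme="marc:relators" id="role">author</meta>
  <meta refines="#creator" property="file-as" scheme="marc:relators" id="file-as">author</meta>
</metadata>""")

# All meta children of the metadata element.
all_meta = METADATA.findall("opf:meta", NS)

# Only meta elements that carry a refines attribute.
refining = METADATA.findall("opf:meta[@refines]", NS)

# The first meta element whose property attribute has the given value.
modified = METADATA.find("opf:meta[@property='dcterms:modified']", NS)

print(len(all_meta), len(refining), modified.text)
```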

The `link` element#

The link element associates resources with an EPUB publication, such as metadata records.

The metadata element MAY contain zero or more link elements, each of which identifies the location of a publication resource or a linked resource in its REQUIRED href attribute.

Resources referenced from the link element are publication resources only when they are:

referenced from the spine; or
included or embedded in an EPUB content document (e.g., a metadata record serialized as RDFa [rdfa-core] or as JSON-LD [json-ld11] embedded in an [html] script element).

In all other cases (e.g., when linking to standalone [onix] records), the resources referenced are not publication resources (i.e., are not subject to core media type requirements) and EPUB creators MUST NOT list them in the manifest.

Manifest#

The manifest element provides an exhaustive list of publication resources used in the rendering of the content.

With the exception of the package document, the manifest MUST list all publication resources regardless of whether they are container resources or remote resources.

As the package document is already identified by the container.xml file, the manifest MUST NOT specify an item element for it (i.e., a self-reference serves no purpose).

Note

The manifest is only for listing publication resources. Linked resources and the special files for processing the OCF Container (i.e., files in the META-INF directory, and the mimetype file) are restricted from inclusion.

Failure to provide a complete manifest of publication resources may lead to rendering issues. Reading systems might not unzip such resources or could prevent access to them for security reasons.
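A completeness check along these lines can be sketched with zipfile and ElementTree. This is a simplified illustration, not a validator: unlisted_resources and build_sample are hypothetical helpers, the OEBPS/ layout is an assumption, and percent-encoded or remote href values are not handled.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

OPF_NS = "http://www.idpf.org/2007/opf"


def unlisted_resources(epub_file, opf_path):
    """Return container files that are neither listed in the manifest nor exempt."""
    with zipfile.ZipFile(epub_file) as container:
        root = ET.fromstring(container.read(opf_path))
        base = posixpath.dirname(opf_path)
        # Manifest hrefs are relative to the package document's location.
        listed = {
            posixpath.normpath(posixpath.join(base, item.get("href")))
            for item in root.iter(f"{{{OPF_NS}}}item")
        }
        # The mimetype file and the package document itself are never listed.
        exempt = {"mimetype", opf_path}
        return sorted(
            name
            for name in container.namelist()
            if not name.endswith("/")  # skip directory entries
            and not name.startswith("META-INF/")
            and name not in exempt
            and name not in listed
        )


def build_sample():
    """Build a tiny in-memory container with one unlisted image."""
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.3">'
        "<manifest>"
        '<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="css" href="./style/book.css" media-type="text/css"/>'
        "</manifest></package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as container:
        container.writestr("mimetype", "application/epub+zip")
        container.writestr("META-INF/container.xml", "<container/>")
        container.writestr("OEBPS/content.opf", opf)
        container.writestr("OEBPS/nav.xhtml", "<html/>")
        container.writestr("OEBPS/style/book.css", "")
        container.writestr("OEBPS/images/orphan.jpg", b"")
    buffer.seek(0)
    return buffer


print(unlisted_resources(build_sample(), "OEBPS/content.opf"))
```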

The epub3.ePub.manifest property gives access to the manifest. It is an instance of epub3.Manifest.

The `item` element#

The item element represents a publication resource.

The epub3.ePub.manifest contains a series of items that are wrapped by the epub3.Item class.

Each item element has 3 required attributes: id, href and media-type.

<manifest>
  <item
      id="nav"
      href="nav.xhtml"
      properties="nav"
      media-type="application/xhtml+xml"/>
  <item
      id="intro"
      href="intro.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="c1"
      href="chap1.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="c1-answerkey"
      href="chap1-answerkey.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="c2"
      href="chap2.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="c2-answerkey"
      href="chap2-answerkey.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="c3"
      href="chap3.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="c3-answerkey"
      href="chap3-answerkey.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="notes"
      href="notes.xhtml"
      media-type="application/xhtml+xml"/>
  <item
      id="cover"
      href="./images/cover.svg"
      properties="cover-image"
      media-type="image/svg+xml"/>
  <item
      id="f1"
      href="./images/fig1.jpg"
      media-type="image/jpeg"/>
  <item
      id="f2"
      href="./images/fig2.jpg"
      media-type="image/jpeg"/>
  <item
      id="css"
      href="./style/book.css"
      media-type="text/css"/>
</manifest>

>>> book.manifest
{'nav': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>,
 'intro': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'intro', 'href': 'intro.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105324ed0>,
 'c1': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c1', 'href': 'chap1.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325650>,
 'c1-answerkey': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c1-answerkey', 'href': 'chap1-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325790>,
 'c2': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c2', 'href': 'chap2.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325850>,
 'c2-answerkey': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c2-answerkey', 'href': 'chap2-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325f90>,
 'c3': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c3', 'href': 'chap3.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x1053267d0>,
 'c3-answerkey': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c3-answerkey', 'href': 'chap3-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105327450>,
 'notes': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'notes', 'href': 'notes.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x1053274d0>,
 'cover': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'cover', 'href': 'images/cover.svg', 'properties': 'cover-image', 'media-type': 'image/svg+xml'}) at 0x105327a50>,
 'f1': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f1', 'href': 'images/fig1.jpg', 'media-type': 'image/jpeg'}) at 0x105355410>,
 'f2': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f2', 'href': 'images/fig2.jpg', 'media-type': 'image/jpeg'}) at 0x105563c10>,
 'css': <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'css', 'href': 'style/book.css', 'media-type': 'text/css'}) at 0x105560a50>}

>>> book.manifest.list()
[<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'intro', 'href': 'intro.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105324ed0>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c1', 'href': 'chap1.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325650>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c1-answerkey', 'href': 'chap1-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325790>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c2', 'href': 'chap2.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325850>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c2-answerkey', 'href': 'chap2-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325f90>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c3', 'href': 'chap3.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x1053267d0>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c3-answerkey', 'href': 'chap3-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105327450>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'notes', 'href': 'notes.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x1053274d0>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'cover', 'href': 'images/cover.svg', 'properties': 'cover-image', 'media-type': 'image/svg+xml'}) at 0x105327a50>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f1', 'href': 'images/fig1.jpg', 'media-type': 'image/jpeg'}) at 0x105355410>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f2', 'href': 'images/fig2.jpg', 'media-type': 'image/jpeg'}) at 0x105563c10>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'css', 'href': 'style/book.css', 'media-type': 'text/css'}) at 0x105560a50>]

You can retrieve an element from the manifest in various ways.

>>> # by index
>>> item = book.manifest[0]
>>> item
<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>

>>> # by item (itself)
>>> book.manifest[item]
<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>

>>> # by id (to the item)
>>> book.manifest[item.id]
<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>

>>> # by href (to the item)
>>> book.manifest(item.href)
<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>

You can use Manifest.filter_by_attr() to filter items by a certain attribute (defaults to 'media-type').

>>> # equal to this value: "image/jpeg"
>>> book.manifest.filter_by_attr("image/jpeg").list()
[<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f1', 'href': 'images/fig1.jpg', 'media-type': 'image/jpeg'}) at 0x105355410>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f2', 'href': 'images/fig2.jpg', 'media-type': 'image/jpeg'}) at 0x105563c10>]

>>> # starts with the specified prefix: "image"
>>> book.manifest.filter_by_attr("^image").list()
[<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'cover', 'href': 'images/cover.svg', 'properties': 'cover-image', 'media-type': 'image/svg+xml'}) at 0x105327a50>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f1', 'href': 'images/fig1.jpg', 'media-type': 'image/jpeg'}) at 0x105355410>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'f2', 'href': 'images/fig2.jpg', 'media-type': 'image/jpeg'}) at 0x105563c10>]

>>> # ends with the specified suffix: "xhtml+xml"
>>> book.manifest.filter_by_attr("$xhtml+xml").list()
[<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'intro', 'href': 'intro.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105324ed0>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c1', 'href': 'chap1.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325650>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c1-answerkey', 'href': 'chap1-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325790>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c2', 'href': 'chap2.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325850>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c2-answerkey', 'href': 'chap2-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105325f90>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c3', 'href': 'chap3.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x1053267d0>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'c3-answerkey', 'href': 'chap3-answerkey.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105327450>,
 <Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'notes', 'href': 'notes.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x1053274d0>]
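The three matching modes shown above (exact value, "^" prefix, "$" suffix) can be mimicked on plain dictionaries as in the sketch below. It only illustrates the convention as it appears in these examples; it is not python-epub3's implementation, which may support further operators. The names matches and filter_items are hypothetical.

```python
# Manifest items reduced to plain dictionaries (illustrative only).
ITEMS = [
    {"id": "nav", "href": "nav.xhtml", "media-type": "application/xhtml+xml"},
    {"id": "cover", "href": "images/cover.svg", "media-type": "image/svg+xml"},
    {"id": "f1", "href": "images/fig1.jpg", "media-type": "image/jpeg"},
    {"id": "css", "href": "style/book.css", "media-type": "text/css"},
]


def matches(value, pattern):
    """Apply the leading-marker convention: '^' = prefix, '$' = suffix, else equality."""
    if pattern.startswith("^"):
        return value.startswith(pattern[1:])
    if pattern.startswith("$"):
        return value.endswith(pattern[1:])
    return value == pattern


def filter_items(items, pattern, attr="media-type"):
    """Return the items whose attribute matches the pattern."""
    return [item for item in items if matches(item.get(attr, ""), pattern)]


print([item["id"] for item in filter_items(ITEMS, "^image")])
print([item["id"] for item in filter_items(ITEMS, "$xhtml+xml")])
print([item["id"] for item in filter_items(ITEMS, "image/jpeg")])
```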

To open a file, you can use either the Manifest.open() or the Item.open() method. Both return a file-like object, an instance of io.IOBase.

>>> item = book.manifest("nav.xhtml")
>>> item
<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'nav', 'href': 'nav.xhtml', 'properties': 'nav', 'media-type': 'application/xhtml+xml'}) at 0x105324910>

>>> book.manifest.open(item.href)
<_io.TextIOWrapper name='/var/folders/k1/3r19jl7d30n834vdmbz9ygh80000gn/T/tmpjar2_4kv/4d4b73b9-61a9-4de4-b773-5ff752b920af' encoding='utf-8'>

>>> item.open()
<_io.TextIOWrapper name='/var/folders/k1/3r19jl7d30n834vdmbz9ygh80000gn/T/tmpjar2_4kv/4d4b73b9-61a9-4de4-b773-5ff752b920af' encoding='utf-8'>

>>> item.open("rb")
<_io.BufferedReader name='/var/folders/k1/3r19jl7d30n834vdmbz9ygh80000gn/T/tmpjar2_4kv/4d4b73b9-61a9-4de4-b773-5ff752b920af'>

>>> item.open("rb", buffering=0)
<_io.FileIO name='/var/folders/k1/3r19jl7d30n834vdmbz9ygh80000gn/T/tmpjar2_4kv/4d4b73b9-61a9-4de4-b773-5ff752b920af' mode='rb' closefd=True>
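For comparison, text and binary access to a resource can also be sketched directly on the zip container with the standard library. Unlike the transcripts above, which show files in a temporary directory, this sketch reads straight from the archive; open_resource is a hypothetical helper and supports read modes only.

```python
import io
import zipfile


def open_resource(container, name, mode="r", encoding="utf-8"):
    """Open a member of an already opened ZipFile in text ("r") or binary ("rb") mode."""
    if mode not in ("r", "rb"):
        raise ValueError("only 'r' and 'rb' are supported in this sketch")
    raw = container.open(name)  # always a binary, read-only file object
    if mode == "rb":
        return raw
    # Wrap the binary stream to decode it as text.
    return io.TextIOWrapper(raw, encoding=encoding)


# Build a tiny in-memory archive to read from.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("nav.xhtml", "<html>caf\u00e9</html>")

with zipfile.ZipFile(buffer) as archive:
    with open_resource(archive, "nav.xhtml") as text_file:
        text = text_file.read()  # str
    with open_resource(archive, "nav.xhtml", "rb") as binary_file:
        data = binary_file.read()  # bytes

print(type(text).__name__, type(data).__name__)
```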

Spine#

The spine element defines an ordered list of manifest item references that represent the default reading order.

The spine MUST specify at least one EPUB content document or foreign content document.

Important

EPUB creators MUST list in the spine all EPUB and foreign content documents that are hyperlinked to from publication resources in the spine, where hyperlinking encompasses any linking mechanism that requires the user to navigate away from the current resource. Common hyperlinking mechanisms include the href attribute of the [html] a and area elements and scripted links (e.g., using DOM Events and/or form elements). The requirement to list hyperlinked resources applies recursively (i.e., EPUB creators must list all EPUB and foreign content documents hyperlinked to from hyperlinked documents, and so on.).

EPUB creators also MUST list in the spine all EPUB and foreign content documents hyperlinked to from the EPUB navigation document, regardless of whether EPUB creators include the EPUB navigation document in the spine.

Note

As hyperlinks to resources outside the EPUB container are not publication resources, they are not subject to the requirement to include in the spine (e.g., web pages and web-hosted resources).

Publication resources used in the rendering of spine items (e.g., referenced from [html] embedded content) similarly do not have to be included in the spine.

The epub3.ePub.spine property gives access to the spine. It is an instance of epub3.Spine.

The `itemref` element#

The itemref element identifies an EPUB content document or foreign content document in the default reading order.

The epub3.ePub.spine contains a series of itemrefs that are wrapped by the epub3.Itemref class.

Each itemref element has a required attribute: idref.

Important

Each itemref element MUST reference the ID [xml] of an item in the manifest via the IDREF [xml] in its idref attribute. item element IDs MUST NOT be referenced more than once.

Each referenced manifest item MUST be either a) an EPUB content document or b) a foreign content document that includes an EPUB content document in its manifest fallback chain.
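The first two rules above (every idref must resolve to a manifest item, and no item may be referenced twice) lend themselves to a short check. The sketch below uses only the standard library; spine_problems is a hypothetical helper, and it does not verify the content-document and fallback-chain requirement.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"opf": "http://www.idpf.org/2007/opf"}

# A deliberately broken package document (illustrative only).
BROKEN = """<package xmlns="http://www.idpf.org/2007/opf" version="3.3">
  <manifest>
    <item id="intro" href="intro.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="intro"/>
    <itemref idref="intro"/>
    <itemref idref="c1"/>
  </spine>
</package>"""


def spine_problems(opf_text):
    """Return human-readable descriptions of idref rule violations."""
    root = ET.fromstring(opf_text)
    manifest_ids = {item.get("id") for item in root.findall("opf:manifest/opf:item", NS)}
    idrefs = [ref.get("idref") for ref in root.findall("opf:spine/opf:itemref", NS)]
    problems = []
    if not idrefs:
        problems.append("spine is empty")
    # Counter keeps first-seen order, so problems follow spine order.
    for idref, count in Counter(idrefs).items():
        if idref not in manifest_ids:
            problems.append(f"{idref!r} is not in the manifest")
        if count > 1:
            problems.append(f"{idref!r} is referenced {count} times")
    return problems


print(spine_problems(BROKEN))
```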

Note

Although EPUB publications require an EPUB navigation document, it is not mandatory to include it in the spine.

<spine
    page-progression-direction="ltr">
   <itemref
       idref="intro"/>
   <itemref
       idref="c1"/>
   <itemref
       idref="c1-answerkey"
       linear="no"/>
   <itemref
       idref="c2"/>
   <itemref
       idref="c2-answerkey"
       linear="no"/>
   <itemref
       idref="c3"/>
   <itemref
       idref="c3-answerkey"
       linear="no"/>
   <itemref
       idref="notes"
       linear="no"/>
</spine>

>>> book.spine
{'intro': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'intro'}) at 0x105574450>,
 'c1': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c1'}) at 0x105574510>,
 'c1-answerkey': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c1-answerkey'}) at 0x1055745d0>,
 'c2': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c2'}) at 0x105574690>,
 'c2-answerkey': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c2-answerkey'}) at 0x105574790>,
 'c3': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c3'}) at 0x105574890>,
 'c3-answerkey': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c3-answerkey'}) at 0x105574990>,
 'notes': <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'notes'}) at 0x105574a50>}

>>> book.spine.iter().list()
[<Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'intro'}) at 0x105574450>,
 <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c1'}) at 0x105574510>,
 <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c1-answerkey'}) at 0x1055745d0>,
 <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c2'}) at 0x105574690>,
 <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c2-answerkey'}) at 0x105574790>,
 <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c3'}) at 0x105574890>,
 <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'c3-answerkey'}) at 0x105574990>,
 <Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'notes'}) at 0x105574a50>]

You can retrieve an element from the spine in various ways.

>>> # by index
>>> itemref = book.spine[0]
>>> itemref
<Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'intro'}) at 0x105574450>

>>> # by item
>>> item = book.manifest[itemref.idref]
>>> item
<Item(<{http://www.idpf.org/2007/opf}item>, attrib={'id': 'intro', 'href': 'intro.xhtml', 'media-type': 'application/xhtml+xml'}) at 0x105324ed0>
>>> book.spine[item]
<Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'intro'}) at 0x105574450>

>>> # by itemref (itself)
>>> book.spine[itemref]
<Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'intro'}) at 0x105574450>

>>> # by id (to the item)
>>> book.spine[itemref.idref]
<Itemref(<{http://www.idpf.org/2007/opf}itemref>, attrib={'idref': 'intro'}) at 0x105574450>
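To tie the spine back to the manifest, the sketch below resolves the default reading order to file paths and optionally skips itemrefs marked linear="no" (the attribute defaults to "yes" when absent). It is a standard-library illustration; reading_order and the shortened sample are assumptions, not python-epub3 API.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Shortened version of the sample package document (illustrative only).
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.3">
  <manifest>
    <item id="intro" href="intro.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="chap1.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1-answerkey" href="chap1-answerkey.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine page-progression-direction="ltr">
    <itemref idref="intro"/>
    <itemref idref="c1"/>
    <itemref idref="c1-answerkey" linear="no"/>
  </spine>
</package>"""


def reading_order(opf_text, include_non_linear=False):
    """Return manifest hrefs in spine order, skipping linear="no" items by default."""
    root = ET.fromstring(opf_text)
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    order = []
    for ref in root.findall("opf:spine/opf:itemref", NS):
        is_linear = ref.get("linear", "yes") != "no"
        if is_linear or include_non_linear:
            order.append(hrefs[ref.get("idref")])
    return order


print(reading_order(SAMPLE))
print(reading_order(SAMPLE, include_non_linear=True))
```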

Writing ePub#

…
