I’ve implemented a lightweight JavaScript library for breaking large HTML documents into a series of pages, obeying the stylesheet constraints (like page-break-before) provided by CSS. You can find the library, which is open-source, at codex.js. It’s technically part of the FDJT library, but it only depends on dom.js.
If you wonder why you might want something like this, take a look at my post on how The Page Isn’t Dead. For all sorts of reasons, it’s easier to read a large document when it’s intelligently broken into pages. Unfortunately, most browsers don’t do this themselves, and even most e-readers do it poorly.
The library works by constructing a series of div.codexpage blocks within a designated container. A stylesheet (codex.css) defines div.codexpage as a fixed-position element, and users of the library can extend or modify this definition.
Using the library starts with creating an instance of the CodexLayout object, e.g.
var layout=new CodexLayout();
You can set a lot of parameters, though it also tries to get default values in various ways. A more fleshed-out call might look like this:
var layout=new CodexLayout({
    page_width: 500, page_height: 500, // dimensions
    // Where to add new pages
    container: document.getElementById("MYPAGES"),
    // Prefix for page element IDs, e.g.
    // page 42 would have id MYCODEXPAGE42
    pageprefix: "MYCODEXPAGE",
    logfn: console.log, // how to log notable events
    // Layout rules:
    // Codex observes CSS declarations, but you can put
    // additional constraints here:
    forcebreakbefore: "H1",
    forcebreakafter: "div.signature",
    avoidbreakinside: "div.code",
    avoidbreakbefore: "div.signature,div.attribution",
    avoidbreakafter: "h1,h2,h3,h4,h5,h6",
    // There are some constraints which CSS doesn't include:
    codexfullpage: "div.titlepage",
    // Put this element on a page by itself, but don't
    // interrupt the narrative flow
    codexfloatpage: "div.illustration"});
Once you’ve got a layout object, you just add DOM nodes to it by calling addContent(node). It actually moves the DOM nodes, but you can always call the layout’s revert() method to restore all the nodes to where they came from.
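To make the move-then-revert idea concrete, here is a minimal sketch of one way such bookkeeping could work. It is not the library's code: makeNode, append, and MoveLog are hypothetical names, and a tiny tree model stands in for the real DOM so the example runs outside a browser.

```javascript
// Hypothetical sketch, not the library's implementation.
// A minimal tree model stands in for DOM nodes.
function makeNode(name) {
  return { name, parent: null, children: [] };
}

// Attach child to parent, before the given sibling if one is supplied.
// Like DOM insertBefore, this detaches the child from its old parent first.
function append(parent, child, before) {
  if (child.parent) {
    const siblings = child.parent.children;
    siblings.splice(siblings.indexOf(child), 1);
  }
  const index = before ? parent.children.indexOf(before) : -1;
  if (index >= 0) parent.children.splice(index, 0, child);
  else parent.children.push(child);
  child.parent = parent;
}

// Records where each node came from so the moves can be undone later.
class MoveLog {
  constructor() {
    this.moves = [];
  }

  // Move a node to a new parent, remembering its old parent and next sibling.
  move(node, target) {
    const siblings = node.parent.children;
    const next = siblings[siblings.indexOf(node) + 1] || null;
    this.moves.push({ node, parent: node.parent, next });
    append(target, node);
  }

  // Undo the moves in reverse order, so each remembered next sibling
  // is already back in place when the node before it is restored.
  revert() {
    while (this.moves.length > 0) {
      const { node, parent, next } = this.moves.pop();
      append(parent, node, next);
    }
  }
}
```

Undoing in reverse order is the design choice that matters here: a node's remembered next sibling may itself have been moved later, so it has to be restored first.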
The implementation works by duplicating the document hierarchy on each page, splitting the contents of container nodes and duplicating their contents if needed. For example, suppose a poem wrapped in a div.poem block needs to be split across multiple pages. Each page will have its own div.poem block and the first one will have a codexdupstart CSS class, the last one will have the codexdupend class, and any intervening nodes will have a simple codexdup class. By default, these CSS classes try to override any top or bottom definitions (margins, borders, padding), but designers may want to customize these definitions.
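The class assignment described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: splitBlock is a hypothetical helper, a fixed number of lines per page stands in for the real geometry-based measurement, and plain objects stand in for the duplicated div.poem elements. Only the class names codexdupstart, codexdup, and codexdupend come from the library.

```javascript
// Hypothetical sketch: split a block's lines across pages and tag each
// fragment the way the duplicated div.poem blocks are described above.
// perPage is an assumed stand-in for measuring what fits on a page.
function splitBlock(lines, perPage) {
  if (!Number.isInteger(perPage) || perPage < 1) {
    throw new RangeError("perPage must be a positive integer");
  }
  const fragments = [];
  for (let i = 0; i < lines.length; i += perPage) {
    fragments.push({ classes: ["poem"], lines: lines.slice(i, i + perPage) });
  }
  // A block that fits on one page is not a duplicate, so it gets no extra class.
  if (fragments.length > 1) {
    fragments.forEach((fragment, i) => {
      if (i === 0) fragment.classes.push("codexdupstart");
      else if (i === fragments.length - 1) fragment.classes.push("codexdupend");
      else fragment.classes.push("codexdup");
    });
  }
  return fragments;
}
```

With seven lines and three lines per page, this yields three fragments tagged codexdupstart, codexdup, and codexdupend in that order.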
The library is agnostic about how pages are navigated. By default, div.codexpage has zero opacity and div.codexpage.curpage has an opacity of 1, making it easy to change pages by moving the curpage class around (it does a little fade-in/fade-out if supported by the browser). But users of the library can override these definitions to add (for example) sliding page transitions or other effects.
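Since navigation is left to the caller, a page turn can be as small as the sketch below. showPage is a hypothetical helper, not part of the library; it assumes only that each page object exposes the standard classList interface, as DOM elements do.

```javascript
// Hypothetical helper: make the page at `index` the current one by moving
// the curpage class onto it. Returns false if the index is out of range.
function showPage(pages, index) {
  if (index < 0 || index >= pages.length) return false;
  for (const page of pages) {
    page.classList.remove("curpage");
  }
  pages[index].classList.add("curpage");
  return true;
}

// In a browser, usage might look like this (the container ID is an assumption):
//   var pages = Array.from(document.querySelectorAll("#MYPAGES div.codexpage"));
//   showPage(pages, 0);
```

Because the stylesheet ties visibility to the curpage class, any fade or slide effect stays in CSS and this function never has to change.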
There is still work to be done, but it seems to be working pretty well and it’s not too slow. It uses the browser’s underlying geometry and styling engines, which limits how fast it can really go.
Enjoy. I’ll update here if there are minor changes and post news about more significant changes.
October 20, 2011 at 11:06 pm
Huh, crazy – the concept & code is incredibly similar to something I did last year called rePublish, at github.com/blaine/republish
Good to see someone else working on similar concepts!